Thank you so much for the help and the very quick reply. I now understand what to do!


Hi Tobjan,

> I'm having a problem with my testing of pycuda that I don't understand.
> I want to understand an example of how to pointwise multiply large vectors
> (of length around 10000) using pycuda.
> So I am trying the code from here:
> about "multiply_them".  It works correctly for a vector of length 400, but
> when I replace 400 with a larger number, like 550, it immediately stops
> working.

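Blocks are limited to 512 threads, and that example launches one thread per
element in a single block, so 550 elements no longer fit. To submit more work,
you likely want to use *both* a block and a grid of non-unit size. I've updated
that doc example to show how you can specify the grid size.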
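
For reference, here is a minimal sketch of the idea. It is not necessarily
identical to the updated doc example. Apart from "multiply_them", the names
(launch_config, emulate_on_cpu, run_on_gpu) and the extra "n" kernel argument
are my own assumptions for illustration. The 512 default mirrors the limit
mentioned above; some devices allow more, so treat it as a parameter.

```python
# Sketch only: everything except the kernel name multiply_them is illustrative.

KERNEL_SOURCE = """
__global__ void multiply_them(float *dest, const float *a, const float *b, int n)
{
    // Global index: block number times block width, plus offset in the block.
    const int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)  // the last block may have threads past the end of the data
        dest[i] = a[i] * b[i];
}
"""


def launch_config(n, threads_per_block=512):
    """Return (block, grid) tuples covering n elements, one thread each."""
    num_blocks = (n + threads_per_block - 1) // threads_per_block  # ceiling division
    return (threads_per_block, 1, 1), (num_blocks, 1)


def emulate_on_cpu(a, b, threads_per_block=512):
    """Pure-Python stand-in for the kernel, to check the index arithmetic."""
    n = len(a)
    block, grid = launch_config(n, threads_per_block)
    dest = [0.0] * n
    for block_idx in range(grid[0]):
        for thread_idx in range(block[0]):
            i = block_idx * block[0] + thread_idx
            if i < n:  # same guard as in the kernel
                dest[i] = a[i] * b[i]
    return dest


def run_on_gpu(n):
    """Launch the kernel with PyCUDA; needs numpy and a CUDA-capable device."""
    import numpy
    import pycuda.autoinit  # noqa: F401  (sets up a context on import)
    import pycuda.driver as drv
    from pycuda.compiler import SourceModule

    multiply_them = SourceModule(KERNEL_SOURCE).get_function("multiply_them")
    a = numpy.random.randn(n).astype(numpy.float32)
    b = numpy.random.randn(n).astype(numpy.float32)
    dest = numpy.zeros_like(a)
    block, grid = launch_config(n)
    multiply_them(drv.Out(dest), drv.In(a), drv.In(b), numpy.int32(n),
                  block=block, grid=grid)
    return dest, a * b  # GPU result and a CPU reference to compare against
```

With 550 elements this gives two blocks of 512 threads. The 474 surplus
threads in the second block fall through the "i < n" guard and do nothing.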

> So what is wrong?  Is my idea of using larger values wrong? Should I split
> the problem up into smaller pieces?

Yes, exactly.