Hello:

I'm having a problem with my testing of pycuda that I don't understand.

I want to understand, by example, how to multiply large vectors pointwise (of length 10000, say) using pycuda.

So I am trying the code from here:

     http://documen.tician.de/pycuda/

specifically the "multiply_them" example. It works correctly for vectors of length 400, but when I replace 400 with a larger number, like 550, it immediately stops working.

Here is the error trace:

Traceback (most recent call last):
  File "test3.py", line 23, in <module>
    doublify(a_gpu, block=(100,100,1))
  File "/usr/local/Cellar/python/2.7/lib/python2.7/site-packages/pycuda-2011.1-py2.7-macosx-10.4-x86_64.egg/pycuda/driver.py", line 166, in function_call
    func.set_block_shape(*block)
pycuda._driver.LogicError: cuFuncSetBlockShape failed: invalid value

I am using OS X 10.6 with Python 2.7 and the most recent pycuda and CUDA.

So what is wrong? Is my idea of using larger values wrong? Should I split the problem up into smaller pieces?
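
To make concrete what I mean by splitting the problem up, here is a rough sketch in plain Python (it does not touch the GPU). The helper name launch_shape and the cap of 256 threads per block are my own assumptions for illustration, not something taken from the pycuda documentation, and I do not know whether this is the right approach:

```python
# Hypothetical helper (not part of pycuda): given n elements and an assumed
# cap on threads per block, work out a 1-D block shape and how many blocks
# would be needed to cover every element.
def launch_shape(n, threads_per_block=256):
    if n <= 0 or threads_per_block <= 0:
        raise ValueError("n and threads_per_block must be positive")
    # Ceiling division: round up so the last, partly filled block is counted.
    num_blocks = (n + threads_per_block - 1) // threads_per_block
    block = (threads_per_block, 1, 1)
    grid = (num_blocks, 1)
    return block, grid


block, grid = launch_shape(10000)
# Total threads launched; this can exceed n because of the rounding up.
total_threads = block[0] * grid[0]
```

If this is the right idea, I assume the kernel would then have to compute a global index from the block index and thread index, and skip any index that falls past the end of the vector, but I have not tried that yet.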

Thank you!

-Tob