Hi Andreas and others,
I want to run a function on a large 2D complex array (eventually
2^12 x 2^12 data points). However, PyCUDA does not behave as I expected. The
ElementwiseKernel approach doesn't seem to work on 2D arrays, so I used
SourceModule instead, with an explicit block size.
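When a SourceModule kernel is launched over a 2D array, each thread usually computes its column as blockIdx.x * blockDim.x + threadIdx.x and its row as blockIdx.y * blockDim.y + threadIdx.y. The grid is rounded up, so the kernel needs a bounds check for the partial blocks at the edges. Without that check, threads read and write past the end of the array. The sketch below mimics that index arithmetic in plain NumPy so it can be checked without a GPU. It assumes a (16, 16, 1) block and a C-ordered (row-major) array. The helpers launch_config and simulate_coverage are hypothetical, written only for illustration. The grid they compute is the kind of value you would pass as grid=(gx, gy) alongside block=(16, 16, 1) when calling the compiled kernel.

```python
import numpy as np


def launch_config(shape, block=(16, 16, 1)):
    """Hypothetical helper: CUDA-style grid (gx, gy) covering a 2D array.

    blockDim.x runs along columns (x), blockDim.y along rows (y).
    """
    ny, nx = shape
    bx, by, _ = block
    # Ceiling division so that partially filled edge blocks are still launched.
    return ((nx + bx - 1) // bx, (ny + by - 1) // by)


def simulate_coverage(shape, block=(16, 16, 1)):
    """Hypothetical helper: count how often each element is visited by the
    kernel's index arithmetic (pure Python, so only use small shapes)."""
    ny, nx = shape
    bx, by, _ = block
    gx, gy = launch_config(shape, block)
    hits = np.zeros(ny * nx, dtype=np.int64)
    for block_y in range(gy):
        for block_x in range(gx):
            for ty in range(by):
                for tx in range(bx):
                    x = block_x * bx + tx  # blockIdx.x * blockDim.x + threadIdx.x
                    y = block_y * by + ty  # blockIdx.y * blockDim.y + threadIdx.y
                    if x < nx and y < ny:  # bounds check for the edge blocks
                        hits[y * nx + x] += 1  # row-major (C-order) flat index
    return hits.reshape(shape)


print(launch_config((4096, 4096)))              # grid for a 2^12 x 2^12 array
print(simulate_coverage((37, 50)).min(), simulate_coverage((37, 50)).max())
```

With the bounds check in place, every element of a non-multiple-of-16 shape such as 37 x 50 is visited exactly once. Dropping the check would make the edge threads index past the end of the buffer.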
The problem now is that the C code on the GPU does not give the same
result as the NumPy calculation on the CPU; instead it produces very
large and strange numbers.
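The actual kernel is only linked, so the cause here is an assumption. One common source of "very large and strange numbers" is a dtype mismatch. NumPy creates complex arrays as complex128 (double precision) by default, while a kernel written for single-precision complex values expects complex64. If the double-precision buffer is passed unchanged, the kernel reinterprets the raw bytes. The NumPy sketch below reproduces that reinterpretation with .view() and shows the fix: cast with astype(np.complex64) before uploading, or write the kernel in double precision. It then compares against the CPU reference with np.allclose rather than exact equality.

```python
import numpy as np

# CPU reference data: NumPy complex arrays default to complex128 (double).
rng = np.random.default_rng(0)
a = rng.standard_normal((4, 4)) + 1j * rng.standard_normal((4, 4))
print(a.dtype)  # complex128

# Assumed bug: a kernel written for single-precision complex receives the
# complex128 buffer as-is. .view() reproduces that byte-level reinterpretation:
# each 16-byte complex128 is read as two 8-byte complex64 values.
misread = a.view(np.complex64)
print(misread.shape)  # (4, 8): twice as many (meaningless) elements

# Fix: convert explicitly before uploading to the GPU
# (or declare the kernel arguments in double precision instead).
a32 = a.astype(np.complex64)

# Compare GPU-style single-precision results against the double-precision
# reference with a tolerance, not exact equality.
print(np.allclose(a32, a))
```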
I also posted this question at http://stackoverflow.com/q/13031439/1768422,
where the source code is easier to read. What is going wrong in my code?