There are two errors:
1. You are trying to use a 32x32 block (1024 threads), but this size is
only supported by compute capability 2.0 devices (Teslas and probably
other new cards, look it up in the programming guide). Older cards
(such as mine) allow a maximum of 512 threads per block, so I had to
change it to 16x16 (with corresponding changes to other parts of the
code). You did not specify exactly what error you are getting, but
keep this in mind.
2. When you create the output array as
rot_im = zeros((width,height))
it has dtype=float64 by default. You have to explicitly set it to the
same type as curr_im (float32), for example by writing
rot_im = zeros((width,height)).astype(curr_im.dtype)
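To see the dtype mismatch from point 2 without a GPU, here is a small NumPy-only sketch. The sizes and the contents of curr_im are made up for illustration; only the names curr_im, width and height come from your code.

```python
import numpy as np

# Hypothetical sizes, chosen only to keep the example small.
width, height = 4, 3
curr_im = np.arange(width * height, dtype=np.float32).reshape(width, height)

# zeros() without a dtype argument gives float64 (8 bytes per element).
rot_default = np.zeros((width, height))

# Passing the dtype directly avoids the extra copy that .astype() makes.
rot_matched = np.zeros((width, height), dtype=curr_im.dtype)

print(rot_default.dtype, rot_default.nbytes)  # float64 96
print(rot_matched.dtype, rot_matched.nbytes)  # float32 48
```

A kernel that takes float* arguments reads 4-byte elements, so handing it the float64 buffer makes it misinterpret the data even though the shapes agree.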
With these changes your code works correctly on my system.
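If you want the code to run on both old and new cards, one option is to derive the block size from the device limit instead of hard-coding it. The sketch below is plain Python; the helper names square_block_side and grid_dim are my own, not part of PyCUDA. I believe the limit itself can be read in PyCUDA with device.get_attribute(pycuda.driver.device_attribute.MAX_THREADS_PER_BLOCK), but here it is simply passed in as a number.

```python
def square_block_side(max_threads_per_block):
    """Largest power-of-two side s with s * s <= max_threads_per_block."""
    side = 1
    while (2 * side) * (2 * side) <= max_threads_per_block:
        side *= 2
    return side


def grid_dim(n, block_side):
    """Number of blocks needed to cover n elements (ceiling division)."""
    return (n + block_side - 1) // block_side


# 512 is the limit on older cards, 1024 on compute capability 2.0 devices.
for limit in (512, 1024):
    side = square_block_side(limit)
    print(limit, (side, side, 1), grid_dim(1024, side))
```

For a limit of 512 this gives a 16x16 block, and for 1024 it gives 32x32, which matches the change described in point 1. The tile size used inside the kernel has to follow the same value.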
On Sun, Oct 30, 2011 at 6:28 PM, Apostolis Glenis <apostglen46(a)gmail.com> wrote:
I tried adapting the SDK naive transpose example for a class project
that I'm working on. (The class project isn't about transpose, but it
is related.)
Could you please tell me what is wrong with my code?
PyCUDA mailing list