[Hedge] Problem with Pycuda & hedge
Andreas Kloeckner
lists at informa.tiker.net
Thu Jul 21 08:07:05 PDT 2011
On Wed, 20 Jul 2011 16:16:21 -0400, Paul Cazeaux <paul_cazeaux at brown.edu> wrote:
> Hey,
> Thanks for the tip. I wanted to wait until victory to sing your praises, but that took a little longer than expected as I had to fix a few things
> (and read a whole lot of your code in the process):
>
> - I had the same problem as Peter, with a "pycuda._driver.LogicError: cuFuncSetBlockShape failed: invalid value". More precisely, I get
>     block = (
>         given.devdata.smem_granularity,
>         plan.parallelism.parallel,
>         plan.aligned_image_dofs_per_microblock
>         #//given.devdata.smem_granularity)
> set to (512,1,0). Now I think that given.devdata.smem_granularity is always going to be 512 for my laptop's GeForce 9600 GT - so I hardcoded (512, 1, 1)
> as the block shape, and it seems to work. Is there a big problem with that?
>
> - Next I got stuck on a type error for a while, with pycuda complaining about doubles in the kernel code:
> pycuda.driver.CompileError: nvcc said it demoted types in source code it compiled--this is likely not what you want.
>
> After a lot of code-reading, I think the problem is that while the fields were set as floats in kernel.cu, the constants were written as "0.569" and treated
> as doubles by the compiler, hence the complaint. Integers didn't have that problem: the example "wave.py" ran just fine with
> c = -1, but crashes if c is 1.5, for example. The reason is in the file vector_exp.py:
>
> # Make sure we do not generate integers by accident.
> # Oh, C and your broken division semantics.
>
> r = repr(num)
> if "." not in r:
>     from pytools import to_uncomplex_dtype
>     from codepy.cgen import dtype_to_ctype
>     return "%s(%s)" % (dtype_to_ctype(
>         to_uncomplex_dtype(result_dtype)), r)
> else:
>     return r
>
> This adds float() around the integer, and the compiler doesn't complain anymore. I put "or result_dtype == numpy.float32" into the if condition, so that
> non-integer constants get wrapped as well, and things are working fine now, at least for my basic operator and examples.
> For the actual operator, I have way too many variables, so I get "Error: Formal parameter space overflowed (256 bytes max) in function vector_expression".
> I guess GPUs aren't perfect.
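The constant-formatting workaround described in the quoted message can be illustrated with a small standalone sketch. This is not the code in vector_exp.py: the function name format_constant and its ctype argument are hypothetical, and the target type is passed as a plain string instead of being derived from result_dtype.

```python
def format_constant(num, ctype="float"):
    """Return C source text for a real constant (hypothetical helper).

    An integer literal such as "1" invites C integer division, and a bare
    "0.569" is a double literal in C. Wrapping either one in a cast to the
    target type keeps a single-precision kernel free of doubles.
    """
    r = repr(num)
    if "." not in r or ctype == "float":
        # e.g. -1 -> "float(-1)", 1.5 -> "float(1.5)"
        return "%s(%s)" % (ctype, r)
    # Double-precision target: a literal with a decimal point is already fine.
    return r


print(format_constant(-1))              # float(-1)
print(format_constant(1.5))             # float(1.5)
print(format_constant(0.569, "double")) # 0.569
```

Whether the cast or an "f" suffix on the literal is preferable in generated code is a design choice; the cast has the advantage of treating integers and non-integers the same way.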
Still visiting Tim, so not much time to respond--just two things: 1) I'd
be happy to merge all your fixes--thanks for spending the time to figure
out what's going on. 2) The 'too many parameters' thing is easy to make
go away. A hack that does that already exists in quite a few places
(Create a struct, copy that to the GPU and pass a pointer to
that). You're still coming to NYC next week, right? If so (or even if
not so), we can look at all this then.
Andreas
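A rough sketch of the struct approach mentioned above, assuming all kernel scalars are single-precision floats. The names pack_scalar_params, upload_params and params_t are hypothetical and not taken from hedge; only pycuda.driver.mem_alloc and pycuda.driver.memcpy_htod are actual PyCUDA calls, and they need an active CUDA context.

```python
import struct


def pack_scalar_params(scalars):
    """Pack float scalars into one buffer laid out like a C struct of floats.

    On the device side this would correspond to something like
        struct params_t { float c0; float c1; ... };   (hypothetical)
    """
    return struct.pack("%df" % len(scalars), *scalars)


def upload_params(buf):
    """Copy the packed buffer to the GPU and return the device pointer.

    Requires a CUDA context, e.g. created by `import pycuda.autoinit`.
    """
    import pycuda.driver as cuda

    dev_ptr = cuda.mem_alloc(len(buf))
    cuda.memcpy_htod(dev_ptr, buf)
    return dev_ptr


# Host-side packing only; no GPU is needed for this part.
buf = pack_scalar_params([1.5, -1.0, 0.25])
print(len(buf))  # 12 bytes: three 4-byte floats
```

The kernel then takes a single pointer to the struct in place of the individual scalars, so the argument list stays well below the 256-byte formal parameter limit no matter how many constants the expression has.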