[PyCUDA] failed test_gpuarray on GTX480
bogomips at post.pl
Tue Sep 28 14:56:47 PDT 2010
On Tue, 2010-09-28 at 00:29 -0700, jmcarval wrote:
> Thanks for your reply.
> I've read the first thread you mention, which ends without a solution.
> Maybe I'm making a huge mistake, but it does not seem to be a precision issue.
> The following code (a simplification of test_gpuarray) returns 30 from the
> CPU and 14 from the GTX480, whether with integer, float32 or float64.
> I don't get it. Can anybody explain what I'm doing wrong, please?
> import pycuda.autoinit
> import numpy
> import pycuda.gpuarray as gpuarray
> from pycuda.curandom import rand as curand
> a = numpy.array([1,2,3,4])#.astype(numpy.float32)
> a_gpu = gpuarray.to_gpu(a)
> b = a
> b_gpu = gpuarray.to_gpu(b)
> dot_ab = numpy.dot(a, b)
> dot_ab_gpu = gpuarray.dot(a_gpu, b_gpu).get()
> print "CPU dot product:", dot_ab
> print "GPU dot product:", dot_ab_gpu
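One observation that may help narrow this down: 14 is exactly 1*1 + 2*2 + 3*3, so the GPU result matches a sum that skips the last element (4*4 = 16). The sketch below only checks that arithmetic on the CPU, in plain Python, so it runs without a GPU. That the last element is being dropped is an assumption based on the numbers, not a confirmed diagnosis, and `partial_dots` is a hypothetical helper name.

```python
# CPU-only check of which partial dot products of [1, 2, 3, 4] with itself
# give 14. No PyCUDA involved; this only illustrates the arithmetic.

def partial_dots(values):
    """Return the dot product of values[:n] with itself for n = 1..len(values)."""
    return [sum(x * x for x in values[:n]) for n in range(1, len(values) + 1)]

a = [1, 2, 3, 4]
full = sum(x * x for x in a)   # what numpy.dot(a, a) computes on the CPU
partials = partial_dots(a)

print(full)
print(partials)
```

If the same pattern shows up with other input lengths (the GPU result matching the sum over all but the last element), that would point at the reduction kernel rather than at precision.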
I have an idea for checking whether the problem lies with PyCUDA,
the CUDA toolkit, or the driver.
Can you force PyCUDA to generate code for 1.x instead of sm_20?
I found that the architecture is determined in line 190 of file
arch = "sm_%d%d" % Context.get_device().compute_capability()
Try to change it to
arch = "sm_10"
and so on, and check whether you still get the incorrect 14.
If there is a simpler way of changing the architecture for which
PyCUDA generates code, feel free to use it and share it.
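For reference, the format string quoted above turns the compute capability tuple into the architecture name. A small sketch follows, with hard-coded tuples standing in for the result of Context.get_device().compute_capability(). The (2, 0) value for the GTX480 is my assumption, inferred from the sm_20 mentioned above, and `arch_name` is a hypothetical helper, not part of PyCUDA.

```python
# Illustrates how "sm_%d%d" % (major, minor) builds the architecture string.
# The tuples are hard-coded stand-ins for compute_capability(); no GPU needed.

def arch_name(compute_capability):
    """Format a (major, minor) tuple as an nvcc-style architecture name."""
    return "sm_%d%d" % compute_capability

print(arch_name((2, 0)))  # assumed capability of the GTX480
print(arch_name((1, 0)))  # the value suggested as a replacement
```

Hard-coding arch = "sm_10" simply bypasses this lookup, so the generated code targets the older architecture regardless of the device.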
Tomasz Rybak <bogomips at post.pl> GPG/PGP key ID: 2AD5 9860
Fingerprint A481 824E 7DD3 9C0E C40A 488E C654 FB33 2AD5 9860