gpuarray allocator error .. size cannot be np.int32 etc
by nithin s
Hi
Consider this code fragment:
shape = np.array(gpuarr.shape, dtype=np.int16)
another_gpuarr = gpuarray.zeros(shape, np.uint8)
This code faults at pycuda/gpuarray.py, line 81, where a call is made to
self.gpudata = self.allocator(self.size * self.dtype.itemsize)
This is because type(self.size * self.dtype.itemsize) is now np.int64
rather than a plain int, which breaks the Boost.Python call to the
allocator.
I'm sure there are many places where this kind of error happens.
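For what it's worth, the promotion can be reproduced with numpy alone, and casting back to a plain Python int sidesteps it; a minimal sketch (numpy-only, no GPU needed — the shape values here are arbitrary examples):

```python
import numpy as np

# build a shape array the way the original fragment does
shape = np.array((4, 4), dtype=np.int16)
itemsize = np.dtype(np.uint8).itemsize

# numpy arithmetic yields numpy scalar types, not plain Python ints
np_size = np.prod(shape)        # a numpy integer scalar (np.int64 on 64-bit builds)
np_nbytes = np_size * itemsize  # still a numpy scalar, not int

# workaround: cast to a plain int before handing the byte count to the allocator
nbytes = int(np_size) * itemsize
print(type(np_nbytes), type(nbytes))
```

Casting at the boundary like this keeps whatever numpy arithmetic produced the size from leaking a numpy scalar into the Boost.Python call.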
Regards
Nithin
8 years, 8 months
SparseSolve.py example
by elafrit
Hello - I'm trying to run the SparseSolve.py example. I installed the PyMetis
package after adjusting the configuration as follows:
./configure --python-exe=python2.6 --boost-inc-dir=/usr/include/boost
--boost-lib-dir=/usr/lib/ --boost-python-libname=boost_python-mt-py26
But when running the SparseSolve.py example I encountered this error:
ImportError:
/usr/local/lib/python2.6/dist-packages/PyMetis-0.91-py2.6-linux-x86_64.egg/pymetis/_internal.so:
undefined symbol: regerrorA
What does this error mean? Thanks for any suggestions.
8 years, 8 months
Problem with fp_tex2D
by Markus Wollgarten
Dear Mailing-Listeners!
Executing the following code (adapted from test_driver.py):
-------------------
import numpy
import pycuda
import pycuda.autoinit
import pycuda.driver as drv
import pycuda.gpuarray as ga
from pycuda.compiler import SourceModule
from pycuda.tools import dtype_to_ctype

print "pycuda.VERSION " + str(pycuda.VERSION)
print "Compute Capability " + str(drv.Context.get_device().compute_capability())

for tp in [numpy.float32, numpy.float64]:
    tp_cstr = dtype_to_ctype(tp)
    mod = SourceModule("""
        #include <pycuda-helpers.hpp>
        texture<fp_tex_%(tp)s, 2> my_tex;
        __global__ void copy_texture(%(tp)s *dest)
        {
          dest[threadIdx.x + threadIdx.y*8] = fp_tex2D(my_tex, threadIdx.y, threadIdx.x);
        }
        """ % {"tp": tp_cstr})
    copy_texture = mod.get_function("copy_texture")
    my_tex = mod.get_texref("my_tex")
    shape = (8, 2)
    a = numpy.random.randn(*shape).astype(tp)
    a_gpu = ga.to_gpu(a)
    a_gpu.bind_to_texref_ext(my_tex, allow_double_hack=True)
    blck = shape + (1,)
    dest = numpy.zeros(shape, dtype=tp)
    g_dest = drv.to_device(dest)
    copy_texture.prepare("P", blck, texrefs=[my_tex])
    time = copy_texture.prepared_timed_call((1, 1), g_dest)
    dest = drv.from_device(g_dest, dest.shape, dest.dtype, order='C')
    print a
    print dest
---------
returns:
pycuda.VERSION (0, 94, 2)
Compute Capability (1, 3)
[[-0.92633218 0.20489018]
[ 1.14500916 0.23236905]
[ 0.43516356 0.4719891 ]
[-0.8008799 0.81867486]
[-0.20814744 -0.55152911]
[ 0.81224 -1.37392473]
[ 1.99982738 0.11174646]
[ 0.11471771 -1.01642931]]
[[-0.92633218 -0.92633218]
[-0.92633218 -0.92633218]
[-0.92633218 -0.92633218]
[-0.92633218 -0.92633218]
[ 0. 0. ]
[ 0. 0. ]
[ 0. 0. ]
[ 0. 0. ]]
[[ -6.39043527e-01 -1.95960158e-01]
[ 1.69915072e+00 1.16279297e+00]
[ -4.03001846e-01 -1.23898467e+00]
[ 3.62089701e-01 1.84103824e-01]
[ 5.73958324e-01 -3.26678644e-04]
[ -2.28391102e-01 1.59704601e+00]
[ 1.43664545e+00 -1.15527274e-01]
[ 4.74887599e-01 7.09184358e-01]]
[[-0.63904353 -0.63904353]
[-0.63904353 -0.63904353]
[-0.63904353 -0.63904353]
[-0.63904353 -0.63904353]
[ 0. 0. ]
[ 0. 0. ]
[ 0. 0. ]
[ 0. 0. ]]
which is not what I expected, i.e. getting the same array back. The 1D
test from test_driver.py works, but for 2D I have probably done something
wrong. However, I have no clue what, and would appreciate your help very
much.
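Hard to say without running it, but one thing worth double-checking (an educated guess, not a confirmed diagnosis): for a C-ordered (row-major) numpy array of shape (8, 2), element (i, j) sits at flat offset i*2 + j, while a kernel writing dest[threadIdx.x + threadIdx.y*8] uses the flattening of a (2, 8) layout instead. A numpy-only sketch of the two flattenings:

```python
import numpy as np

a = np.arange(16, dtype=np.float32).reshape(8, 2)  # C order, shape (8, 2)
flat = a.ravel()

i, j = 5, 1
# row-major flat offset for shape (8, 2): i * ncols + j
row_major = i * 2 + j
# the kernel's indexing, i + j*8, corresponds to a (2, 8)-shaped layout
kernel_style = i + j * 8

print(flat[row_major], a[i, j], flat[kernel_style])
```

If the kernel's write index and the host array's memory layout disagree like this, the copied-back array will look scrambled even when the texture fetch itself is correct.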
Best wishes,
Markus
8 years, 8 months
adding an __int__ method to GPUArray?
by Lev Givon
Not sure if this suggestion has already been made, but would it be
possible to add an __int__() method to GPUArray that would return the
output of the instance's gpudata.__int__() method? This would
somewhat facilitate manipulating GPUArray instances with
ctypes-wrapped library functions that expect a pointer to GPU memory.
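The suggested delegation can be sketched without a GPU using stand-in classes (FakeDeviceAllocation and FakeGPUArray are hypothetical stand-ins for PyCUDA's DeviceAllocation and GPUArray; only the delegation pattern is the point):

```python
import ctypes

class FakeDeviceAllocation:
    """Stand-in for PyCUDA's DeviceAllocation: wraps a raw pointer value."""
    def __init__(self, ptr):
        self._ptr = ptr
    def __int__(self):
        return self._ptr

class FakeGPUArray:
    """Stand-in for GPUArray with the proposed __int__ method."""
    def __init__(self, gpudata):
        self.gpudata = gpudata
    def __int__(self):
        # return the output of the instance's gpudata.__int__(), as suggested
        return int(self.gpudata)

arr = FakeGPUArray(FakeDeviceAllocation(0xDEADBEEF))
# a ctypes-wrapped library function expecting a device pointer could then take:
dev_ptr = ctypes.c_void_p(int(arr))
```

With __int__ in place, int(arr) is all a ctypes wrapper needs to build the pointer argument.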
L.G.
8 years, 9 months
OpenGL missing feature
by Tomasz Rybak
Hello.
Some time ago I sent a new-style OpenGL wrapper. Don't worry, it works.
But recently I have been working on a new program and noticed that the
new-style OpenGL wrapper requires calling separate functions when mapping
buffer and surface objects. When mapping ordinary textures, they are
treated as surfaces; when a texture is backed by a buffer, one can just
map the buffer - and this is how I missed this function while testing the
wrapper's functionality. I attach a patch that adds the needed function
(it also removes an old comment about the missing feature).
A few problems with the current situation (even after applying this patch):
1. The programmer needs to remember which mapping contains a surface and
which contains a buffer. Would it be better to create separate classes
that deal with those cases separately? Even if so, could you, Andreas,
apply this patch (to have a fully working OpenGL wrapper) and deal with
changes to the API later?
2. The CUDA API returns a CUarray. So far the only way of accessing it in
PyCUDA is through GPUArray, but when attaching to a CUarray, GPUArray
creates a new object; we need the ability to attach to an existing CUarray
without changing it.
Sorry for missing this until now. The only explanation is that accessing
bare textures in CUDA is not used very often - especially as no one has
complained about this missing feature so far :-)
Best regards
Tomasz Rybak
8 years, 9 months
autoinit weirdness in cuda-gdb
by Lev Givon
I have a 64-bit Linux system that contains two GPUs. Device #0 is a
GTX 460, while device #1 is a GeForce 9300. When I attempt to run a
PyCUDA program such as
import pycuda.autoinit
print pycuda.autoinit.device.name()
print pycuda.autoinit.device.count()
in cuda-gdb via
cuda-gdb --args python -m pycuda.debug test.py
I obtain the following output:
(cuda-gdb) r
Starting program: /usr/bin/python -m pycuda.debug test.py
[Thread debugging using libthread_db enabled]
[New process 9632]
[New Thread 47944755071776 (LWP 9632)]
GeForce 9300 / nForce 730i
1
Program exited normally.
When I run the program from the command line, I obtain
GeForce GTX 460
2
Why does pycuda.autoinit not detect the first device when I run the
program from within cuda-gdb?
I'm using PyCUDA 0.94.2 and CUDA 3.2.
L.G.
8 years, 9 months
pycuda._driver.LogicError
by Pierre Castellani
Hi,
I am sorry if this is a very simple question, but I have just installed
PyCUDA 0.94.2 on Ubuntu 10.10 (64-bit) and I get the following error:
>>>import pycuda.driver as drv
>>>drv.mem_alloc( 50 )
Traceback (most recent call last):
File "/home/pierre/workspace/TeoTech/testpython/test_pycuda.py", line 2,
in <module>
drv.mem_alloc( 50 )
pycuda._driver.LogicError: cuMemAlloc failed: not initialized
Any idea on what I have done wrong?
Thanks.
8 years, 9 months
pyCUDA and streams
by Magnus Paulsson
Attached a rather contrived example of computing ffts with pyfft. The
names say it all: serial.py, streams.py, streams-time.py.
I have tried to make an example utilizing streams. However, I'm not
convinced that it actually works as expected. If you can convince me
that it works or improve the code I promise to clean it up and put it
on the wiki.
1: Checking the GPU time width plot in the Compute visual profiler does
not show any overlap between streams 1 and 2. Reading the CUDA forums,
it seems that this may be caused by the profiler itself, and that
running the code outside the profiler would not give the same
behaviour. Is this true? (And how do you profile your code if the
profiler is broken?)
2: The streamed version runs faster than the serial version. However,
I have a nagging suspicion that this speedup comes only from faster
mem-copies and not from any overlap between streams. E.g., printing the
time after each line shows that the "python time" of the first mem-copy
is ~0.3 ms, the "python time" of the first fft call is ~6 ms, and the
second fft call is ~0.1 ms. 6 ms happens to be the time of the mem-copy
according to the visual profiler!? Can anyone confirm this: is the
first fft call blocking until the data has been copied to the device?
(The get_async also seems to be blocking, according to the "python time".)
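The "python time" measurement described above can be sketched generically (pure Python, no GPU; fake_blocking_launch and fake_async_launch are made-up stand-ins for the first and second fft calls). The point is that a truly asynchronous launch should return almost immediately, so a multi-millisecond wall-clock time on a supposedly async call suggests it blocked:

```python
import time

def timed_call(fn):
    """Time only the Python-side duration of a call; return it in ms."""
    t0 = time.time()
    fn()
    return (time.time() - t0) * 1e3

# stand-in for an "async" launch that actually blocks for ~6 ms
def fake_blocking_launch():
    time.sleep(0.006)

# stand-in for a launch that merely enqueues work and returns
def fake_async_launch():
    pass

blocking_ms = timed_call(fake_blocking_launch)
async_ms = timed_call(fake_async_launch)
print("blocking: %.3f ms, async: %.3f ms" % (blocking_ms, async_ms))
```

If the first fft call's "python time" matches the profiler's mem-copy time this way, that is consistent with the call synchronizing on the preceding copy rather than overlapping with it.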
Any help appreciated.
-----------------------------------------------
Magnus Paulsson
Assistant Professor
School of Computer Science, Physics and Mathematics
Linnaeus University
Phone: +46-480-446308
Mobile: +46-70-6942987
8 years, 9 months