Hello,
While attempting to compile PyCUDA under Python 3:
$ virtualenv -p python3.2 --system-site-packages myenv
$ cd myenv
$ source bin/activate
$ git clone https://github.com/inducer/pycuda.git
$ cd pycuda
$ git submodule init
$ git submodule update
$ python setup.py install
I received:
x86_64-pc-linux-gnu-g++ -pthread -fPIC
-I/usr/lib/python3.2/site-packages/numpy/core/include
-I/usr/lib/python3.2/site-packages/numpy/core/include
-I/usr/include/python3.2 -c src/wrapper/_pvt_struct_v3.cpp -o
build/temp.linux-x86_64-3.2/src/wrapper/_pvt_struct_v3.o
src/wrapper/_pvt_struct_v3.cpp: In function ‘int s_init(PyObject*,
PyObject*, PyObject*)’:
src/wrapper/_pvt_struct_v3.cpp:1045:41: warning: deprecated conversion
from string constant to ‘char*’ [-Wwrite-strings]
src/wrapper/_pvt_struct_v3.cpp:1047:5: error: ‘PyStructType’ was not
declared in this scope
src/wrapper/_pvt_struct_v3.cpp: In function ‘PyObject*
s_unpack(PyObject*, PyObject*)’:
src/wrapper/_pvt_struct_v3.cpp:1138:5: error: ‘PyStructType’ was not
declared in this scope
src/wrapper/_pvt_struct_v3.cpp: In function ‘PyObject*
s_unpack_from(PyObject*, PyObject*, PyObject*)’:
src/wrapper/_pvt_struct_v3.cpp:1164:51: warning: deprecated conversion
from string constant to ‘char*’ [-Wwrite-strings]
src/wrapper/_pvt_struct_v3.cpp:1164:51: warning: deprecated conversion
from string constant to ‘char*’ [-Wwrite-strings]
src/wrapper/_pvt_struct_v3.cpp:1172:5: error: ‘PyStructType’ was not
declared in this scope
src/wrapper/_pvt_struct_v3.cpp: In function ‘PyObject*
s_pack(PyObject*, PyObject*)’:
src/wrapper/_pvt_struct_v3.cpp:1296:5: error: ‘PyStructType’ was not
declared in this scope
src/wrapper/_pvt_struct_v3.cpp: In function ‘PyObject*
s_pack_into(PyObject*, PyObject*)’:
src/wrapper/_pvt_struct_v3.cpp:1336:5: error: ‘PyStructType’ was not
declared in this scope
src/wrapper/_pvt_struct_v3.cpp: At global scope:
src/wrapper/_pvt_struct_v3.cpp:1414:1: warning: deprecated conversion
from string constant to ‘char*’ [-Wwrite-strings]
src/wrapper/_pvt_struct_v3.cpp:1414:1: warning: deprecated conversion
from string constant to ‘char*’ [-Wwrite-strings]
src/wrapper/_pvt_struct_v3.cpp:1414:1: warning: deprecated conversion
from string constant to ‘char*’ [-Wwrite-strings]
src/wrapper/_pvt_struct_v3.cpp:1414:1: warning: deprecated conversion
from string constant to ‘char*’ [-Wwrite-strings]
error: command 'x86_64-pc-linux-gnu-g++' failed with exit status 1
which is very similar to https://github.com/inducer/pycuda/issues/11
in that it can also be fixed by passing the -DNDEBUG flag. Would it be
possible for this fix to be ported to _pvt_struct_v3? (Or just ensure
that -DNDEBUG is always passed.)
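In the meantime, a possible manual workaround (assuming PyCUDA's usual siteconf.py mechanism, as written by configure.py; the file's exact contents vary by system, so this is only a sketch) is to add the flag to CXXFLAGS and rebuild:

```python
# Sketch of a manual workaround: in siteconf.py (generated by
# "python configure.py"), add -DNDEBUG to the C++ flags and leave the
# file's other variables unchanged, then rerun "python setup.py install".
CXXFLAGS = ["-DNDEBUG"]
```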
Also, are there any other potential issues with PyCUDA and Python 3.x
that I should be aware of?
Regards, Freddie.
I have a weird problem when using the Visual Profiler: for about two
seconds my program works fine, but after that the kernel launches become
extremely slow (total running time increases more than a hundredfold). I
wrote a small signal handler that reacts to SIGUSR1 and saw that while
the program does slowly make progress, it is always busy waiting in
func._launch_kernel in driver.py. I then tried to decrease the number of
loops my simulation runs, to bring the total run time under those crucial
two seconds for testing purposes, but the profiler runs the program
multiple times, and on the second run it is just as slow from the first
iteration.
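The handler was along these lines (a minimal sketch; the name dump_stack is arbitrary):

```python
import signal
import sys
import traceback

def dump_stack(signum, frame):
    # Print the Python stack at the point the signal arrived, to show
    # where the interpreter is currently busy.
    traceback.print_stack(frame, file=sys.stderr)

# Install the handler; the program keeps running after each dump.
signal.signal(signal.SIGUSR1, dump_stack)
```

Then `kill -USR1 <pid>` from another shell dumps the current stack without stopping the program.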
I also tried running the program with CUDA_PROFILE=1, and everything
works just fine, runtime being roughly doubled compared to running
without any profiling.
Trying to use nvprof (which, if I understand correctly, the Visual
Profiler uses underneath) just gives "Warning: Application received
signal 139".
Have you used the Visual Profiler or nvprof successfully, or noticed
similar behaviour?
In case it matters, I'm running the program on a remote headless server
over ssh -X, using CUDA 5.0.
--
Tomi Pieviläinen, +358 400 487 504
A: Because it disrupts the natural way of thinking.
Q: Why is top posting frowned upon?
I couldn't find anything on Google, so: has anyone used PyCUDA with the
MTGP generator now integrated into curandom?
--
Tomi Pieviläinen, +358 400 487 504
Hi Theodore,
Ted Kord <teddy.kord(a)gmail.com> writes:
> What's the level of support for sparse matrices and LA solvers in pyCUDA?
> Does it leverage CUSPARSE or is there a way of using Scipy, etc?
There is no support for CUSPARSE or CUSP; some of that may be covered by
scikits.cuda. As far as built-in support is concerned, there are two
sparse matrix formats and one Krylov-subspace solver (CG).
HTH,
Andreas
Hi
What's the level of support for sparse matrices and LA solvers in pyCUDA?
Does it leverage CUSPARSE or is there a way of using Scipy, etc?
--
Best regards,
Theodore
Hi all,
I am wondering if anyone has worked up a class to automatically select a
suitable thread block dimension given a function, nrow, and ncol. I know
that using OccupancyRecord I can determine the occupancy for a given
number of threads, but it does not appear to be able to solve the inverse
problem. While I know there is more to performance than occupancy, the
two do often correlate.
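One way to attack the inverse problem is simple brute force: evaluate the occupancy for every candidate block size and keep the best. A hypothetical sketch (best_block_size is a made-up name, not a PyCUDA API; the commented-out part shows how the occupancy callable might be built from pycuda.tools, untested here):

```python
# Hypothetical helper: pick the thread count that maximizes occupancy
# by evaluating every candidate (multiples of the warp size).
def best_block_size(occupancy_of, candidates=range(32, 1025, 32)):
    """Return the candidate thread count with the highest occupancy.

    occupancy_of: callable mapping a thread count to an occupancy value.
    """
    return max(candidates, key=occupancy_of)

# With a live CUDA context, the callable could be built from pycuda.tools
# (untested sketch; smem/regs come from the compiled function):
#
#   from pycuda.tools import DeviceData, OccupancyRecord
#   devdata = DeviceData()  # device of the current context
#   occ_of = lambda n: OccupancyRecord(devdata, n, shared_mem=smem,
#                                      registers=regs).occupancy
```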
Regards, Freddie.
Hi all,
I'm trying to work through the examples in the PyCUDA distribution and
on the wiki, and I run into trouble right away.
When I try demo.py, I get the following error message:
phillip@phillip-P5E:~/pygpu/pycuda/examples$ python demo.py
Traceback (most recent call last):
File "demo.py", line 22, in <module>
""")
File
"/usr/local/lib/python2.7/dist-packages/pycuda-2012.1-py2.7-linux-x86_64.egg/pycuda/compiler.py",
line 289, in __init__
self.get_surfref = self.module.get_surfref
AttributeError: 'Module' object has no attribute 'get_surfref'
I've tried to solve this problem myself in several ways. First, I
upgraded my Linux distro to Ubuntu 12.04 (oops, too far: CUDA doesn't
officially support it yet), then I upgraded to CUDA 4.2, which seems to
work. But neither of these changes made the error go away. So finally, I
tried building PyCUDA from git, but even that didn't help.
I'm not sure what to make of this message, and it doesn't seem to matter
what's in my CUDA code (I even tried an empty kernel source, but that
didn't help either). My GPU is compute capability 1.3, so it's an older
GPU.
Thanks for the help!
Phillip David
Received from Ahmed Fasih on Wed, Nov 07, 2012 at 11:07:31PM EST:
(snip)
> Thanks Lev! These gists were really useful in understanding how to use
> these functions, and they work for me too. Nonetheless, I tried and
> succeeded in breaking the second one: see
> https://gist.github.com/4036693
>
> First, I had to add "assert" in the calls to np.allclose to make sure
> I'd be informed if things weren't all close. Then I extended the
> kernel to work with multiple blocks, and finally I moved the unpinned
> test first. As I increased N from 20 to 22, both tests passed. But at
> N=23 (23 by 23 array), although the unpinned version works, the pinned
> assertion fails and PyCUDA complains that cleanup operations failed.
>
> I can't find any documented limit on the size of page-locked memory
> allocations, but it ought to be more than 3 KB, right?
I'm not aware of any such limits.
> Ubuntu 11.10, NVIDIA driver 304.51, CUDA 5, PyCUDA 2012.1, Tesla
> C2050. If you or any other kind soul is able to successfully run this
> gist, let me know! https://gist.github.com/4036693
>
> Thanks again,
> Ahmed
When N*N > 512, the mismatch between array size
(np.double().nbytes*N*N) and the default alignment assumed by
pycuda.driver.aligned_empty() (4096) prevents all of the array elements from
being properly updated; if you preallocate a device-mapped array, you
don't need to worry about setting the alignment.
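In numbers (a small sketch of the arithmetic; exceeds_default_alignment is a made-up helper): 4096 bytes of default alignment hold exactly 512 float64 elements, which is why the failures start between N=22 (484 elements) and N=23 (529 elements):

```python
import numpy as np

def exceeds_default_alignment(n, alignment=4096):
    # True when an n-by-n float64 array is larger than the default
    # alignment assumed by aligned_empty(): 4096 bytes == 512 doubles.
    return np.double().nbytes * n * n > alignment
```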
L.G.
On Fri, Nov 9, 2012 at 11:41 AM, mohsen jadidi <mohsen.jadidi(a)gmail.com> wrote:
> Hello,
> I am not sure whether this problem is related to PyCUDA's gpuarray or to
> the scikits.cuda library, so I posted it to both mailing lists in the
> hope of finding a solution.
>
> My problem is that when I try to compute the transpose of a matrix that
> was built by concatenating two matrices, I get a wrong result. To be
> more precise:
>
> a1=np.array([[1,3,4,5],[7,4,8,2],[7,5,0,9]],np.float64)
>
> temp=np.array([[3,4,5],[4,8,2],[5,0,9]],np.float64)
>
> a2=r2=np.c_[np.array([1,7,7],np.float64),temp]
>
> a1_gpu=gpuarray.to_gpu(a1)
> a2_gpu=gpuarray.to_gpu(a2)
>
>
> So far everything works fine and all four matrices hold the same
> values, a1 = a2 = a1_gpu = a2_gpu:
>
> [ 1., 3., 4., 5.]
> [ 7., 4., 8., 2.]
> [ 7., 5., 0., 9.]
>
> but now
>
> import scikits.cuda.linalg as la
>
> np.all(la.transpose(a1_gpu).get())==a1.T)
>
> returns True, but False for
>
> np.all(la.transpose(a2_gpu).get())==a2.T)
>
> my la.transpose(a2_gpu) :
>
> [ 1., 4., 0.]
> [ 7., 5., 5.]
> [ 7., 4., 2.]
> [ 3., 8., 9.]
>
> Looking at a1 and la.transpose(a2_gpu), it seems the problem is
> somehow related to the memory layout. Am I right?
Thanks for including a minimal example (NB, it had a syntax error in the
np.all() calls :). The problem seems to go away if you do the
following:
a2 = r2 = np.array(np.c_[np.array([1,7,7],np.float64),temp], order='C')
For some reason, the array concatenation munges the C versus Fortran
ordering, and calling np.array() with the order='C' keyword tells NumPy
to use C ordering explicitly. It's ugly, but it works... if anybody can
explain why NumPy has such crevices of unreason, we'd be obliged. Hope
this helps, though,
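An equivalent fix, for what it's worth, is np.ascontiguousarray, a standard NumPy helper that forces C order without rebuilding the array by hand (a sketch using the arrays from the example above):

```python
import numpy as np

temp = np.array([[3, 4, 5], [4, 8, 2], [5, 0, 9]], np.float64)
a2 = np.c_[np.array([1, 7, 7], np.float64), temp]

# Force C (row-major) ordering before handing the array to the GPU;
# this copies only if the input is not already C-ordered.
a2_c = np.ascontiguousarray(a2)
```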
Ahmed