Hello,
While attempting to compile PyCUDA under Python 3:
$ virtualenv -p python3.2 --system-site-packages myenv
$ cd myenv
$ source bin/activate
$ git clone https://github.com/inducer/pycuda.git
$ cd pycuda
$ git submodule init
$ git submodule update
$ python setup.py install
I received:
x86_64-pc-linux-gnu-g++ -pthread -fPIC
-I/usr/lib/python3.2/site-packages/numpy/core/include
-I/usr/lib/python3.2/site-packages/numpy/core/include
-I/usr/include/python3.2 -c src/wrapper/_pvt_struct_v3.cpp -o
build/temp.linux-x86_64-3.2/src/wrapper/_pvt_struct_v3.o
src/wrapper/_pvt_struct_v3.cpp: In function ‘int s_init(PyObject*,
PyObject*, PyObject*)’:
src/wrapper/_pvt_struct_v3.cpp:1045:41: warning: deprecated conversion
from string constant to ‘char*’ [-Wwrite-strings]
src/wrapper/_pvt_struct_v3.cpp:1047:5: error: ‘PyStructType’ was not
declared in this scope
src/wrapper/_pvt_struct_v3.cpp: In function ‘PyObject*
s_unpack(PyObject*, PyObject*)’:
src/wrapper/_pvt_struct_v3.cpp:1138:5: error: ‘PyStructType’ was not
declared in this scope
src/wrapper/_pvt_struct_v3.cpp: In function ‘PyObject*
s_unpack_from(PyObject*, PyObject*, PyObject*)’:
src/wrapper/_pvt_struct_v3.cpp:1164:51: warning: deprecated conversion
from string constant to ‘char*’ [-Wwrite-strings]
src/wrapper/_pvt_struct_v3.cpp:1164:51: warning: deprecated conversion
from string constant to ‘char*’ [-Wwrite-strings]
src/wrapper/_pvt_struct_v3.cpp:1172:5: error: ‘PyStructType’ was not
declared in this scope
src/wrapper/_pvt_struct_v3.cpp: In function ‘PyObject*
s_pack(PyObject*, PyObject*)’:
src/wrapper/_pvt_struct_v3.cpp:1296:5: error: ‘PyStructType’ was not
declared in this scope
src/wrapper/_pvt_struct_v3.cpp: In function ‘PyObject*
s_pack_into(PyObject*, PyObject*)’:
src/wrapper/_pvt_struct_v3.cpp:1336:5: error: ‘PyStructType’ was not
declared in this scope
src/wrapper/_pvt_struct_v3.cpp: At global scope:
src/wrapper/_pvt_struct_v3.cpp:1414:1: warning: deprecated conversion
from string constant to ‘char*’ [-Wwrite-strings]
src/wrapper/_pvt_struct_v3.cpp:1414:1: warning: deprecated conversion
from string constant to ‘char*’ [-Wwrite-strings]
src/wrapper/_pvt_struct_v3.cpp:1414:1: warning: deprecated conversion
from string constant to ‘char*’ [-Wwrite-strings]
src/wrapper/_pvt_struct_v3.cpp:1414:1: warning: deprecated conversion
from string constant to ‘char*’ [-Wwrite-strings]
error: command 'x86_64-pc-linux-gnu-g++' failed with exit status 1
which is very similar to https://github.com/inducer/pycuda/issues/11
in that it can also be fixed by passing the -DNDEBUG flag. Would it be
possible for this fix to be ported to _pvt_struct_v3? (Or just ensure
that -DNDEBUG is always passed.)
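In the meantime, one possible workaround is to add the flag to the CXXFLAGS list in the siteconf.py that configure.py generates; this is only a sketch and assumes setup.py passes those flags through to the compiler:

# siteconf.py (generated by ./configure.py); assumed to expose a CXXFLAGS list
CXXFLAGS = ['-DNDEBUG']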
Also, are there any other potential issues with PyCUDA and Python 3.x
that I should be aware of?
Regards, Freddie.
Hi blahblahblah,
I had the same error and tried your way.
I did this:
user@ubuntu:~/pycuda-2011.2.2$ python
Python 2.7.3 (default, Apr 20 2012, 22:39:59)
[GCC 4.6.3] on linux2
>>> import pycuda.driver as cuda
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "pycuda/driver.py", line 2, in <module>
from pycuda._driver import *
ImportError: No module named _driver
But I got an error here too:
user@ubuntu:~$ python
>>> import pycuda.driver as cuda
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File
"/usr/local/lib/python2.7/dist-packages/pycuda-2011.2.2-py2.7-linux-x86_64.egg/pycuda/driver.py",
line 2, in <module>
from pycuda._driver import *
ImportError: libcurand.so.4: wrong ELF class: ELFCLASS32
I am doing something wrong. Can you please elaborate?
Forwarding off-list reply.
-------- Original Message --------
Subject: Re: [PyCUDA] Contexts and Threading
Date: Sat, 29 Sep 2012 19:08:09 +0200
From: Eelco Hoogendoorn <e.hoogendoorn(a)uva.nl>
To: Freddie Witherden <freddie(a)witherden.org>
Actually, it seems I should RTFM; see the PyCUDA FAQ.
Combining threads and streams does not seem to work at all (or I am doing something really stupid). It seems you need to initialise the context in the thread, and cannot share it between threads.
At least for the thing I have in mind, creating a context per thread wouldn't really make sense; a context has a huge overhead, and getting multiple contexts to play nicely on the same device at the same time has so far eluded me as well.
That is rather disappointing, as it seems there is no way around the hacky state-machine stream nonsense if you want to run a lot of small kernels in parallel (I am thinking millions of calls, each of which would be lucky to saturate a single SMP).
Am I missing something?
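For what it's worth, the per-thread-context pattern from the FAQ would look roughly like the untested sketch below; each thread makes, uses and then pops its own context:

import threading
import pycuda.driver as drv

drv.init()

class GPUWorker(threading.Thread):
    def __init__(self, device_num):
        threading.Thread.__init__(self)
        self.device_num = device_num

    def run(self):
        # each thread creates (and owns) its own context
        ctx = drv.Device(self.device_num).make_context()
        try:
            pass  # launch kernels / do per-stream work here
        finally:
            ctx.pop()

w = GPUWorker(0)
w.start()
w.join()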
-----Original Message-----
From: Freddie Witherden
Sent: Friday, September 28, 2012 7:26 PM
To: pycuda(a)tiker.net
Subject: [PyCUDA] Contexts and Threading
Hello,
I have a question regarding how PyCUDA interacts with CUDA 4.x's
support for sharing contexts across threads.
Broadly speaking I wish to create an analogue of CUDA streams that
also support invoking arbitrary Python functions (as opposed to just
CUDA kernels and memcpy operations).
My idea is to associate a Python thread with each CUDA stream in my
application and use a Queue (import Queue) to submit either CUDA
kernels or Python functions to the queue with the core code being
along the lines of:
def queue_worker(q, comm, stream):
    while True:
        item = q.get()
        if item_is_a_cuda_kernel:
            item(stream=stream)
            stream.synchronize()
        elif item_is_a_mpireq:
            comm.Prequest.startall(item)
            comm.Prequest.waitall(item)
        else:
            item()
        q.task_done()
Allowing one to do:
q1, q2 = Queue(), Queue()
t1 = Thread(target=queue_worker, args=(q1, comm, a_stream1))
t2 = Thread(target=queue_worker, args=(q2, comm, a_stream2))
t1.start()
t2.start()
# Stick items into the queue for the thread to consume
However, this is only meaningful if it is possible to share a PyCUDA
context between threads. Can someone tell me whether this is possible
at all (at the CUDA driver level) and whether PyCUDA supports it?
Regards, Freddie.
If you accidentally pass a numpy integer type as the shape
argument to GPUArray, memory allocation fails:
import pycuda.autoinit
import pycuda.gpuarray as gpuarray
import numpy as np
gpuarray.empty(shape=np.prod(10), dtype='double')
ArgumentError: Python argument types in
pycuda._driver.mem_alloc(numpy.int64)
did not match C++ signature:
mem_alloc(unsigned long)
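Until a fix lands, the obvious workaround is to cast the shape to a plain Python int first, e.g.:

import numpy as np
import pycuda.autoinit
import pycuda.gpuarray as gpuarray

a = gpuarray.empty(shape=int(np.prod(10)), dtype='double')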
Here's a patch:
-- >8 --
Subject: [PATCH] Accept numpy scalar types in shape when creating GPUarray
If shape contained a numpy int, allocation of the data would fail,
because we'd pass a numpy int not a python int into the allocator.
Fix this by explicitly converting s to a python scalar type.
This means we can now write:
a = gpuarray.empty(shape=np.prod(some_shape), ...)
In addition raise an assertion error if, for some reason, we
were passed a non-integral shape.
---
pycuda/gpuarray.py | 4 ++++
1 file changed, 4 insertions(+)
diff --git a/pycuda/gpuarray.py b/pycuda/gpuarray.py
index 3f37016..ceb9cb3 100644
--- a/pycuda/gpuarray.py
+++ b/pycuda/gpuarray.py
@@ -162,6 +162,10 @@ class GPUArray(object):
             s = shape
             shape = (shape,)
+        if isinstance(s, np.integer):
+            s = np.asscalar(s)
+        assert isinstance(s, (int, long))
+

         if strides is None:
             if order == "F":
                 strides = _f_contiguous_strides(
Forwarding off-list reply.
-------- Original Message --------
Subject: Re: [PyCUDA] Contexts and Threading
Date: Sat, 29 Sep 2012 17:57:43 +0200
From: Eelco Hoogendoorn <e.hoogendoorn(a)uva.nl>
To: Freddie Witherden <freddie(a)witherden.org>
That is an interesting thought; I have been thinking about a similar design, where the aim is to cleanly execute algorithms that are embarrassingly parallel, in the sense of not requiring any inter-stream communication.
Indeed, it seems to me that Python threads should be a good fit for this type of application. One process per device and one thread per stream seems a natural match, given the implementation of those concepts in Python.
I don't know what kind of issues you'd run into, though; best to just start trying and see. But for what it's worth, I imagine a design pattern with an abstract subclass of Thread which creates and holds a CUDA stream. You could then implement your algorithm in a subclass thereof, and that would be fairly clean.
What bothers me, though, is that the overloaded gpuarray operators do not support stream arguments; I can't really think of an elegant way to solve that, and I suppose there are lots of problems of that nature if you start digging deeper.
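Roughly the pattern I have in mind (only a sketch; note that it assumes a CUDA context is already current in each thread, which may be exactly the hard part):

import threading
import pycuda.driver as drv

class StreamThread(threading.Thread):
    """Abstract thread that creates and holds its own CUDA stream."""

    def run(self):
        # NB: assumes a context is already current in this thread
        self.stream = drv.Stream()
        self.work(self.stream)
        self.stream.synchronize()

    def work(self, stream):
        # subclasses launch their kernels here, passing stream=stream
        raise NotImplementedError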
-----Original Message-----
From: Freddie Witherden
Sent: Friday, September 28, 2012 7:26 PM
To: pycuda(a)tiker.net
Subject: [PyCUDA] Contexts and Threading
Hello,
I have a question regarding how PyCUDA interacts with CUDA 4.x's
support for sharing contexts across threads.
Broadly speaking I wish to create an analogue of CUDA streams that
also support invoking arbitrary Python functions (as opposed to just
CUDA kernels and memcpy operations).
My idea is to associate a Python thread with each CUDA stream in my
application and use a Queue (import Queue) to submit either CUDA
kernels or Python functions to the queue with the core code being
along the lines of:
def queue_worker(q, comm, stream):
    while True:
        item = q.get()
        if item_is_a_cuda_kernel:
            item(stream=stream)
            stream.synchronize()
        elif item_is_a_mpireq:
            comm.Prequest.startall(item)
            comm.Prequest.waitall(item)
        else:
            item()
        q.task_done()
Allowing one to do:
q1, q2 = Queue(), Queue()
t1 = Thread(target=queue_worker, args=(q1, comm, a_stream1))
t2 = Thread(target=queue_worker, args=(q2, comm, a_stream2))
t1.start()
t2.start()
# Stick items into the queue for the thread to consume
However, this is only meaningful if it is possible to share a PyCUDA
context between threads. Can someone tell me whether this is possible
at all (at the CUDA driver level) and whether PyCUDA supports it?
Regards, Freddie.
Hi folks, I have a nagging feeling that this question has to have been
asked and answered before but I've only found tantalizing hints that
it is possible (and dare I say even easy). We use the NVIDIA Nsight
tool to analyze our all-C/CUDA projects and I'd like to be able to do
the same with the Python scripts I write that use the fabulous PyCUDA.
How do I go about doing this?
In early 2011 [1], a question appeared on this list that implied that
the poster was already doing this. I have tried attaching an
already-running process (in this case, IPython running inside Sage)
but get an error, "'Launching attach_launch' has encountered a
problem', with the following details:
Error in final launch sequence
Failed to execute MI command:
-target-attach 28235
Error message from debugger back end:
ptrace: Operation not permitted.
ptrace: Operation not permitted.
I'll attempt to debug this problem, but I thought to ask about
alternative ways to get this to happen, e.g., compiling the Python
code into a C representation (Cython?), etc.
Ubuntu 11.10 64-bit, Python 2.7.2, PyCUDA 2012.1.
[1] http://lists.tiker.net/pipermail/pycuda/2011-March/002934.html
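One thing that might help regardless of how the debugger attaches is compiling the kernels with device debug information so that Nsight/cuda-gdb can see them; as far as I know SourceModule hands its options list to nvcc, so something along these lines (untested sketch) should work:

import pycuda.autoinit
from pycuda.compiler import SourceModule

mod = SourceModule("""
__global__ void twice(float *a)
{
    const int i = threadIdx.x;
    a[i] *= 2.0f;
}
""", options=['-g', '-G'])  # host and device debug info for the debugger
twice = mod.get_function("twice")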
Thanks,
Ahmed
--
Ahmed Fasih
fasih.1(a)osu.edu
wuzzyview(a)gmail.com
614 547 3323 (Google Voice)
Hello,
I have a question regarding how PyCUDA interacts with CUDA 4.x's
support for sharing contexts across threads.
Broadly speaking I wish to create an analogue of CUDA streams that
also support invoking arbitrary Python functions (as opposed to just
CUDA kernels and memcpy operations).
My idea is to associate a Python thread with each CUDA stream in my
application and use a Queue (import Queue) to submit either CUDA
kernels or Python functions to the queue with the core code being
along the lines of:
def queue_worker(q, comm, stream):
    while True:
        item = q.get()
        if item_is_a_cuda_kernel:
            item(stream=stream)
            stream.synchronize()
        elif item_is_a_mpireq:
            comm.Prequest.startall(item)
            comm.Prequest.waitall(item)
        else:
            item()
        q.task_done()
Allowing one to do:
q1, q2 = Queue(), Queue()
t1 = Thread(target=queue_worker, args=(q1, comm, a_stream1))
t2 = Thread(target=queue_worker, args=(q2, comm, a_stream2))
t1.start()
t2.start()
# Stick items into the queue for the thread to consume
However, this is only meaningful if it is possible to share a PyCUDA
context between threads. Can someone tell me whether this is possible
at all (at the CUDA driver level) and whether PyCUDA supports it?
Regards, Freddie.
Hello, if I use cache_dir=False, I deactivate the cache, right?
And then my program should run more slowly, right?
But I see that it makes no difference in time. Maybe this is due to the low memory consumption of my program?
For example, my memory consumption is 208 MB.
(I calculate the memory consumption using:
import pycuda.autoinit
import pycuda.driver as drv

mem_finish = drv.mem_get_info()
print("free", "total")
print(mem_finish)
print("memory consumption (bytes) =", mem_finish[1] - mem_finish[0])
)
Is this right?
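If the goal is to see the effect of cache_dir, it is probably the kernel compilation time rather than memory that changes; a rough, untested sketch of how one might measure that:

import time
import pycuda.autoinit
from pycuda.compiler import SourceModule

src = "__global__ void scale(float *a) { a[threadIdx.x] *= 2.0f; }"

t0 = time.time()
SourceModule(src)                    # may be served from the on-disk cache
t1 = time.time()
SourceModule(src, cache_dir=False)   # forces recompilation every run
t2 = time.time()

print("with cache: %.3fs  without cache: %.3fs" % (t1 - t0, t2 - t1))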
Thanks!
Hello,
I am trying to compare the speed of a linear solver on the CPU and GPU using numpy.linalg.solve() and cula.culaDeviceDgesv().
When I test the function with about 300 samples the result on the GPU is correct, but when I increase the number of samples I have two problems:
1. When I increase to 400 or 500 samples, I get this error:
numpy array time: 0.030175s
correctness= True
Traceback (most recent call last):
File "/home/jadidi/python-workespace/kernel/linear regression/solver.py",
line 78, in
gpu_result=gpu_solve(k,y)
File "/home/jadidi/python-workespace/kernel/linear regression/solver.py",
line 61, in gpu_solve
t=cula.culaDeviceDgesv(n, nrhs, k_gpu.ptr, lda, ipiv_gpu.ptr, y_gpu.ptr,
ldb)
File
"/usr/local/lib/python2.7/dist-packages/scikits.cuda-0.042-py2.7.egg/scikits/cuda/cula.py",
line 489, in culaDeviceDgesv
culaCheckStatus(status)
File
"/usr/local/lib/python2.7/dist-packages/scikits.cuda-0.042-py2.7.egg/scikits/cuda/cula.py",
line 210, in culaCheckStatus
raise culaExceptions[status]
scikits.cuda.cula.culaRuntimeError: 4
PyCUDA WARNING: a clean-up operation failed (dead context maybe?)
cuEventDestroy failed: launch failed
PyCUDA WARNING: a clean-up operation failed (dead context maybe?)
cuEventDestroy failed: launch failed
2. When I increase to 600 samples there is no error, but the results on
the GPU are incorrect!
my code:
https://docs.google.com/document/d/1Owb20-6K_ffRuZH3FX2Vjgp4VD5YsLqXqF5jkWI…
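For what it's worth, a rough way to sanity-check the larger cases (a sketch only; k, y and gpu_solve are the names from the script above):

import numpy as np
import pycuda.autoinit
import pycuda.driver as drv

# compare the CPU and GPU solutions for the larger problem sizes
cpu_x = np.linalg.solve(k, y)
gpu_x = gpu_solve(k, y)
print("max abs difference:", np.max(np.abs(cpu_x - gpu_x)))
print("allclose:", np.allclose(cpu_x, gpu_x, atol=1e-6))

# and check that the device is not simply running out of memory
free, total = drv.mem_get_info()
print("free GPU memory: %d MiB of %d MiB" % (free // 2**20, total // 2**20))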
I appreciate any help!
Mohsen