Hello,
While attempting to compile PyCUDA under Python 3:
$ virtualenv -p python3.2 --system-site-packages myenv
$ cd myenv
$ source bin/activate
$ git clone https://github.com/inducer/pycuda.git
$ cd pycuda
$ git submodule init
$ git submodule update
$ python setup.py install
I received:
x86_64-pc-linux-gnu-g++ -pthread -fPIC
-I/usr/lib/python3.2/site-packages/numpy/core/include
-I/usr/lib/python3.2/site-packages/numpy/core/include
-I/usr/include/python3.2 -c src/wrapper/_pvt_struct_v3.cpp -o
build/temp.linux-x86_64-3.2/src/wrapper/_pvt_struct_v3.o
src/wrapper/_pvt_struct_v3.cpp: In function ‘int s_init(PyObject*,
PyObject*, PyObject*)’:
src/wrapper/_pvt_struct_v3.cpp:1045:41: warning: deprecated conversion
from string constant to ‘char*’ [-Wwrite-strings]
src/wrapper/_pvt_struct_v3.cpp:1047:5: error: ‘PyStructType’ was not
declared in this scope
src/wrapper/_pvt_struct_v3.cpp: In function ‘PyObject*
s_unpack(PyObject*, PyObject*)’:
src/wrapper/_pvt_struct_v3.cpp:1138:5: error: ‘PyStructType’ was not
declared in this scope
src/wrapper/_pvt_struct_v3.cpp: In function ‘PyObject*
s_unpack_from(PyObject*, PyObject*, PyObject*)’:
src/wrapper/_pvt_struct_v3.cpp:1164:51: warning: deprecated conversion
from string constant to ‘char*’ [-Wwrite-strings]
src/wrapper/_pvt_struct_v3.cpp:1164:51: warning: deprecated conversion
from string constant to ‘char*’ [-Wwrite-strings]
src/wrapper/_pvt_struct_v3.cpp:1172:5: error: ‘PyStructType’ was not
declared in this scope
src/wrapper/_pvt_struct_v3.cpp: In function ‘PyObject*
s_pack(PyObject*, PyObject*)’:
src/wrapper/_pvt_struct_v3.cpp:1296:5: error: ‘PyStructType’ was not
declared in this scope
src/wrapper/_pvt_struct_v3.cpp: In function ‘PyObject*
s_pack_into(PyObject*, PyObject*)’:
src/wrapper/_pvt_struct_v3.cpp:1336:5: error: ‘PyStructType’ was not
declared in this scope
src/wrapper/_pvt_struct_v3.cpp: At global scope:
src/wrapper/_pvt_struct_v3.cpp:1414:1: warning: deprecated conversion
from string constant to ‘char*’ [-Wwrite-strings]
src/wrapper/_pvt_struct_v3.cpp:1414:1: warning: deprecated conversion
from string constant to ‘char*’ [-Wwrite-strings]
src/wrapper/_pvt_struct_v3.cpp:1414:1: warning: deprecated conversion
from string constant to ‘char*’ [-Wwrite-strings]
src/wrapper/_pvt_struct_v3.cpp:1414:1: warning: deprecated conversion
from string constant to ‘char*’ [-Wwrite-strings]
error: command 'x86_64-pc-linux-gnu-g++' failed with exit status 1
which is very similar to https://github.com/inducer/pycuda/issues/11,
and like that issue it can be fixed by passing the -DNDEBUG flag. Would
it be possible to port this fix to _pvt_struct_v3? (Or just ensure
that -DNDEBUG is always passed.)
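In case it helps others, a minimal sketch of the workaround, assuming
your checkout was configured with configure.py so that the build picks
up extra compiler flags from the generated siteconf.py (CXXFLAGS is the
variable name my siteconf.py uses; adjust to match yours):

# siteconf.py (generated by configure.py)
# -DNDEBUG compiles out the assert()s that reference the
# undeclared PyStructType symbol.
CXXFLAGS = ['-DNDEBUG']

I believe configure.py also accepts a --cxxflags option that has the
same effect.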
Also, are there any other potential issues with PyCUDA and Python 3.x
that I should be aware of?
Regards, Freddie.
Hi All,
I am not sure if this is the right place to post this (an nvidia forum
would probably be better?), but since I am running my cuda kernels inside
python (using pycuda) I am going to give this list a try.
My problem is that if I run:
cuda-memcheck python -m pycuda.debug <my_pycuda_code.py>
I end up with a few global memory access violations that look like this:
========= CUDA-MEMCHECK
*** compiler output in /tmp/tmpwmbHa1
Traceback (most recent call last):
File "/usr/lib/python2.7/runpy.py", line 162, in _run_module_as_main
"__main__", fname, loader, pkg_name)
File "/usr/lib/python2.7/runpy.py", line 72, in _run_code
exec code in run_globals
File
"/usr/local/lib/python2.7/dist-packages/pycuda-2012.1-py2.7-linux-x86_64.egg/pycuda/debug.py",
line 25, in <module>
execfile(mainpyfile)
File "./layers.py", line 239, in <module>
main()
File "./layers.py", line 233, in main
filt, s1, c1, s2, c2 = extract_all_layers(img, params, index_gpu)
File "./layers.py", line 213, in extract_all_layers
test= True)
File "./layers.py", line 117, in extract_s2_c2
s2_w += [s2_d_temp.get()]
File
"/usr/local/lib/python2.7/dist-packages/pycuda-2012.1-py2.7-linux-x86_64.egg/pycuda/gpuarray.py",
line 254, in get
drv.memcpy_dtoh(ary, self.gpudata)
pycuda._driver.LaunchError: cuMemcpyDtoH failed: launch failed
PyCUDA WARNING: a clean-up operation failed (dead context maybe?)
cuMemFree failed: launch failed
PyCUDA WARNING: a clean-up operation failed (dead context maybe?)
cuMemFree failed: launch failed
PyCUDA WARNING: a clean-up operation failed (dead context maybe?)
cuModuleUnload failed: launch failed
========= Invalid __global__ read of size 4
========= at 0x00000970 in /tmp/tmpwmbHa1/kernel.cu:135
:extract_s2_matlab_investigation
========= by thread (31,20,0) in block (0,0,0)
========= Address 0xb00350d6c is out of bounds
=========
========= Invalid __global__ read of size 4
========= at 0x00000970 in /tmp/tmpwmbHa1/kernel.cu:135
:extract_s2_matlab_investigation
========= by thread (30,20,0) in block (0,0,0)
========= Address 0xb00350d68 is out of bounds
.....
========= Invalid __global__ read of size 4
========= at 0x00000970 in /tmp/tmpwmbHa1/kernel.cu:135
:extract_s2_matlab_investigation
========= by thread (0,0,0) in block (1,1,0)
========= Address 0xb00351730 is out of bounds
=========
========= Program hit error 700 on CUDA API call to cuMemcpyDtoH_v2
=========
========= Program hit error 700 on CUDA API call to cuMemFree_v2
=========
========= Program hit error 700 on CUDA API call to cuMemFree_v2
=========
========= Program hit error 700 on CUDA API call to cuModuleUnload
=========
========= ERROR SUMMARY: 1096 errors
Now, if I run my code directly (./<my_python_cuda.py>), everything works
fine (no crash), even if I run it thousands of times. The output I am
getting also makes sense: I work in computer vision, and when I compare my
pycuda code with a scipy implementation of my algorithm, they both output
the same arrays.
Now, I used cuda-gdb on the pycuda code and put breakpoints where
cuda-memcheck tells me the global memory access violations occur. I
checked all my variables and everything made sense (I ran my kernel line
by line and I didn't "notice" any out-of-bounds access). Also, I tried
"set cuda memcheck on" before starting to run my code in cuda-gdb, and it
didn't stop where the violations supposedly occur according to
cuda-memcheck. Any ideas?
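One common cause worth ruling out (a guess, since the kernel source
isn't shown): when the grid is rounded up past the array size, the extra
threads read out of bounds, but the stray accesses usually land in valid
neighboring allocations and so never crash; only cuda-memcheck flags
them. A minimal sketch of the usual bounds guard, with a hypothetical
kernel (not Youssef's code):

import numpy
import pycuda.autoinit
import pycuda.driver as drv
from pycuda.compiler import SourceModule

mod = SourceModule("""
__global__ void scale(float *a, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n)  // guard against the extra threads of a rounded-up grid
        return;
    a[i] *= 2.0f;
}
""")
scale = mod.get_function("scale")

a = numpy.arange(100, dtype=numpy.float32)
scale(drv.InOut(a), numpy.int32(a.size), block=(32, 1, 1), grid=(4, 1))
# 4 blocks x 32 threads = 128 threads for 100 elements; without the
# guard, threads 100..127 would access memory past the allocation --
# exactly the kind of read cuda-memcheck flags even when the run
# otherwise seems fine.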
PS: I am new to debugging with cuda-gdb, so please let me know if I am
missing something crucial related to cuda-memcheck.
Thank you so much!!
Youssef
--
Youssef Barhomi, MSc, MEng.
Research Software Engineer at the CLPS department
Brown University
T: +1 (617) 797 9929 | GMT -5:00
I tried to start debugging with cuda-gdb, but can't even get the demo.py
to work as shown at
http://wiki.tiker.net/PyCuda/FrequentlyAskedQuestions#Is_it_possible_to_use….
After setting the breakpoint and starting the program, it just
finishes without ever breaking when calling the doublify function.
Am I missing something? I'm using cuda 5.0.
--
Tomi Pieviläinen, +358 400 487 504
A: Because it disrupts the natural way of thinking.
Q: Why is top posting frowned upon?
Hi all,
I'm doing some experiments on page-locked memory, but I get an error that I have failed to track down when I use aligned memory to invoke the kernel function.
My code is like below:
import numpy
import pycuda.autoinit
import pycuda.driver as drv

# my_numpy_array, kernel_function, rest_parameters, block and grid are
# defined elsewhere in my program
temp = drv.aligned_empty(my_numpy_array.shape, dtype=my_numpy_array.dtype,
                         order='C')
temp = drv.register_host_memory(temp,
                                flags=drv.mem_host_register_flags.DEVICEMAP)
my_plptr = numpy.intp(temp.base.get_device_pointer())

start_event = drv.Event()
end_event = drv.Event()
start_event.record()
kernel_function(my_plptr, rest_parameters, block=block, grid=grid)
end_event.record()
end_event.synchronize()
But when executing, it reports error saying
Traceback (most recent call last):
end.synchronize()
pycuda._driver.LaunchError: cuEventSynchronize failed: launch failed
PyCUDA WARNING: a clean-up operation failed (dead context maybe?)
cuMemFree failed: launch failed
This code https://gist.github.com/4036292 works fine on the same machine.
Thanks.
John
--------------------------------
M: (+61) 415786645
Canberra, Australia
Sent with Sparrow (http://www.sparrowmailapp.com/?sig)
Daisuke Nishino <niboshi000(a)gmail.com> writes:
> Andreas,
> Thank you for your reply.
>
> When I add mem_flags=pycuda.driver.host_alloc_flags.DEVICEMAP in
> pagelocked_empty(),
> it results in error:
> pycuda._driver.LogicError: cuMemHostAlloc failed: invalid value
>
> Does this mean my card doesn't support this functionality?
> I'm not sure if it is relevant, but CAN_MAP_HOST_MEMORY entry in
> pycuda.driver.Context.get_driver().get_attributes() is 1.
http://documen.tician.de/pycuda/driver.html#pycuda.driver.ctx_flags.MAP_HOST
Andreas
Andreas,
I found that the previous error was an initialization problem.
Instead of using pycuda.autoinit, I created the context myself with the
MAP_HOST flag, and it seems to be working fine now.
Thank you so much!
My code now looks like this:
##CODE START######################
import numpy
import pycuda.compiler
import pycuda.driver as drv

drv.init()
dev = drv.Device(0)
# MAP_HOST is required for mapped page-locked memory to work
ctx = dev.make_context(drv.ctx_flags.SCHED_AUTO | drv.ctx_flags.MAP_HOST)

k = pycuda.compiler.SourceModule("""
__global__ void krnl(float* a) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    a[i] = i;
}
""").get_function("krnl")

a = drv.pagelocked_empty((10, 10), numpy.float32,
                         mem_flags=drv.host_alloc_flags.DEVICEMAP)
aa = numpy.intp(a.base.get_device_pointer())
k(aa, grid=(100, 1), block=(1, 1, 1))
ctx.pop()
##CODE END######################
On 5 December 2012 17:14, Daisuke Nishino <niboshi000(a)gmail.com> wrote:
> Andreas,
> Thank you for your reply.
>
> When I add mem_flags=pycuda.driver.host_alloc_flags.DEVICEMAP in
> pagelocked_empty(),
> it results in error:
> pycuda._driver.LogicError: cuMemHostAlloc failed: invalid value
>
> Does this mean my card doesn't support this functionality?
> I'm not sure if it is relevant, but CAN_MAP_HOST_MEMORY entry in
> pycuda.driver.Context.get_driver().get_attributes() is 1.
>
> Thanks,
> Daisuke
>
>
>
> On 5 December 2012 16:47, Andreas Kloeckner <lists(a)informa.tiker.net>wrote:
>
>> Daisuke Nishino <niboshi000(a)gmail.com> writes:
>>
>> > Hi, all.
>> >
>> > I have a problem using pagelocked memory.
>> > I allocated one with pagelocked_xxx or PageLockedMemoryPool, but how
>> can I
>> > pass it into a kernel?
>> > I put a simple code below.
>> > What should "aa" be?
>> >
>> > aa = pycuda.driver.Out(a) works just fine, but I guess it involves a
>> copy.
>>
>>
>> http://documen.tician.de/pycuda/driver.html#pycuda.driver.host_alloc_flags.…
>>
>>
>> http://documen.tician.de/pycuda/driver.html#pycuda.driver.HostPointer.get_d…
>>
>> Andreas
>>
>>
>
>
> --
> Daisuke Nishino
>
--
Daisuke Nishino
Hi, all.
I have a problem using pagelocked memory.
I allocated one with pagelocked_xxx or PageLockedMemoryPool, but how can I
pass it into a kernel?
I put a simple code below.
What should "aa" be?
aa = pycuda.driver.Out(a) works just fine, but I guess it involves a copy.
Thanks,
Daisuke
##CODE START######################
k = pycuda.compiler.SourceModule("""
__global__ void krnl(float* a) {
int i = blockIdx.x * blockDim.x + threadIdx.x;
a[i] = i;
}
""").get_function("krnl")
a = pycuda.driver.pagelocked_empty((10, 10), numpy.float32)
aa = ??
k(aa, grid=(100,1), block=(1,1,1))
##CODE END######################
--
Daisuke Nishino
Hi all,
Is there any way I can edit the kernel function in a .h or .cu file, with syntax highlighting, instead of editing it as a string?
For example, I have a kernel swap(float *a, float *b, int length) declared in swap.h and implemented in swap.cu, and in the Python file the module looks like this:
SourceModule("""
#include"swap.h"
""")
But currently it just fails to find the swap.h file.
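A minimal sketch of two possible approaches (untested against this
setup; the kernels directory below is a hypothetical placeholder):
SourceModule accepts an include_dirs argument that is forwarded to nvcc
as -I search paths, and alternatively the .cu file's contents can simply
be read and passed as the source string.

import os
from pycuda.compiler import SourceModule

kernel_dir = os.path.abspath("kernels")  # hypothetical dir holding swap.h/swap.cu

# Approach 1: let nvcc resolve the #include via include_dirs (-I)
mod = SourceModule('#include "swap.cu"\n', include_dirs=[kernel_dir])

# Approach 2: read the file yourself and hand its text to SourceModule
with open(os.path.join(kernel_dir, "swap.cu")) as f:
    mod = SourceModule(f.read())

swap = mod.get_function("swap")

Either way, the kernel itself lives in swap.cu, where the editor can
syntax-highlight it.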
Thanks!
Cheers,
John
--------------------------------
M: (+61) 415786645
Canberra, Australia
Sent with Sparrow (http://www.sparrowmailapp.com/?sig)
After some reading and profiling I realized that as long as they aren't
launched in different streams, they'll be serialized. Sorry for the
inconvenience.
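For the archives, a minimal sketch of that ordering guarantee, using
placeholder kernels in place of the original cudaCode: both launches
below omit the stream argument, so they are issued to the same (default)
stream and f2 cannot start until f1 has finished.

import numpy
import pycuda.autoinit
import pycuda.driver as drv
from pycuda.compiler import SourceModule

src = SourceModule("""
__global__ void f1(float *a) { a[threadIdx.x] = threadIdx.x; }
__global__ void f2(float *a) { a[threadIdx.x] *= 2.0f; }
""")
f1 = src.get_function("f1")
f2 = src.get_function("f2")

a_gpu = drv.mem_alloc(32 * numpy.float32().nbytes)
f1(a_gpu, block=(32, 1, 1), grid=(1, 1))  # same (default) stream...
f2(a_gpu, block=(32, 1, 1), grid=(1, 1))  # ...so f2 runs strictly after f1

res = numpy.empty(32, dtype=numpy.float32)
drv.memcpy_dtoh(res, a_gpu)  # blocks until both kernels are done
print(res)  # 0, 2, 4, ... -- f2 saw f1's results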
On Tue, Dec 4, 2012 at 11:51 AM, Leandro Demarco Vedelago
<leandrodemarco(a)gmail.com> wrote:
> Hi there. I have 2 kernels, one of which must be launched only after the
> first one has finished its computations, as it uses the results
> computed by the first.
>
> My code looks something like:
>
> def main():
> ----- Something ------
> src = SourceModule(cudaCode)
> f1 = src.get_function("f1")
> f2 = src.get_function("f2")
> f1(args1, res1, block=..., grid=...)
> f2(args2, res2, block=..., grid=...)
> ---- Something else -----
>
> where res1 is contained in args2
>
> So, I was wondering if in "normal" conditions (I'm using the same
> context and just one stream for both launches) f2 is launched only
> after f1 has ended or it's possible (maybe because of the Cuda
> scheduler) that it's launched before f1 is done, in which case I
> should seek a way to prevent this.
>
> Thanks in advance, Leandro Demarco.