Hello.
I've been packaging PyCUDA for Debian.
I run all the tests to ensure that the package works on Python 2
and Python 3. All tests pass except for one from test_driver.py:
$ python test_driver.py
============================= test session starts ==============================
platform linux2 -- Python 2.7.5 -- pytest-2.3.5
collected 21 items

test_driver.py ........F............

=================================== FAILURES ===================================
_____________________ TestDriver.test_register_host_memory _____________________

args = (<test_driver.TestDriver instance at 0x24e7d88>,), kwargs = {}
pycuda = <module 'pycuda' from '/usr/lib/python2.7/dist-packages/pycuda/__init__.pyc'>
ctx = <pycuda._driver.Context object at 0x2504488>
clear_context_caches = <function clear_context_caches at 0x1dbf848>
collect = <built-in function collect>

    def f(*args, **kwargs):
        import pycuda.driver
        # appears to be idempotent, i.e. no harm in calling it more than once
        pycuda.driver.init()
        ctx = make_default_context()
        try:
            assert isinstance(ctx.get_device().name(), str)
            assert isinstance(ctx.get_device().compute_capability(), tuple)
            assert isinstance(ctx.get_device().get_attributes(), dict)
>           inner_f(*args, **kwargs)

/usr/lib/python2.7/dist-packages/pycuda/tools.py:434:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

self = <test_driver.TestDriver instance at 0x24e7d88>

    @mark_cuda_test
    def test_register_host_memory(self):
        if drv.get_version() < (4,):
            from py.test import skip
            skip("register_host_memory only exists on CUDA 4.0 and later")

        import sys
        if sys.platform == "darwin":
            from py.test import skip
            skip("register_host_memory is not supported on OS X")

        a = drv.aligned_empty((2**20,), np.float64, alignment=4096)
>       drv.register_host_memory(a)
E       LogicError: cuMemHostRegister failed: invalid value

test_driver.py:559: LogicError
==================== 1 failed, 20 passed in 116.85 seconds =====================
This test fails both on an ION (GeForce 9400M, CC 1.1) and on a GeForce 460
(CC 2.1). I've compiled PyCUDA with gcc 4.8 and run it with kernel 3.9
and driver 304.88.
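For reference, the failing call can be isolated outside the test harness
with a few lines like these (just a sketch; it assumes pycuda.autoinit can
create a context on the device in question):

import numpy as np
import pycuda.autoinit
import pycuda.driver as drv

# allocate page-aligned host memory and register it with the driver,
# mirroring what test_register_host_memory does
a = drv.aligned_empty((2**20,), np.float64, alignment=4096)
drv.register_host_memory(a)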
Regards.
--
Tomasz Rybak GPG/PGP key ID: 2AD5 9860
Fingerprint A481 824E 7DD3 9C0E C40A 488E C654 FB33 2AD5 9860
http://member.acm.org/~tomaszrybak
Hi there!
I am currently chasing a very weird bug in my code: The following code
will consistently crash on Kepler-type GPUs (tested on a Tesla K40 and
on a GTX 780), but runs fine on my Fermi-class notebook GPU:
import numpy as np
import pycuda.autoinit
from pycuda import gpuarray
from pycuda.driver import Stream
from scikits.cuda.cublas import cublasSgemm
import scikits.cuda.autoinit
from scikits.cuda.misc import _global_cublas_handle as handle

for _ in range(3):
    n = 131
    s = slice(128, n)
    X = gpuarray.to_gpu(np.random.randn(n, 2483).astype(np.float32))
    a = gpuarray.empty((X.shape[1], 3), dtype=np.float32)
    c = gpuarray.empty((a.shape[0], X.shape[1]), dtype=np.float32)
    b = gpuarray.empty_like(X)

    m, n, k = a.shape[0], b[s].shape[1], a.shape[1]
    lda, ldb, ldc = m, k, m
    cublasSgemm(handle, 'n', 'n', m, n, k, 1.0, b[s].gpudata, lda,
                a.gpudata, ldb, 0.0, c.gpudata, ldc)

stream = Stream()
stream.synchronize()
The errors I'm getting are:
Traceback (most recent call last):
  File "<stdin>", line 22, in <module>
pycuda._driver.LogicError: cuStreamSynchronize failed: invalid/unknown error code
>>>
PyCUDA WARNING: a clean-up operation failed (dead context maybe?)
cuStreamDestroy failed: invalid/unknown error code
PyCUDA WARNING: a clean-up operation failed (dead context maybe?)
cuMemFree failed: invalid/unknown error code
PyCUDA WARNING: a clean-up operation failed (dead context maybe?)
cuMemFree failed: invalid/unknown error code
PyCUDA WARNING: a clean-up operation failed (dead context maybe?)
cuMemFree failed: invalid/unknown error code
PyCUDA WARNING: a clean-up operation failed (dead context maybe?)
cuMemFree failed: invalid/unknown error code
The Stream business at the end of the code is only there to notice the
error (by triggering an error check); copies to/from the device trigger
the same errors. The bug is extremely weird, especially since:
* the constants used seem to matter: if I change 'n' to 132, the error
goes away, and if I change the 2nd dimension of X from 2483 to 100, it
also goes away;
* the order of the allocations matters: if I allocate 'd' before 'c', the
error goes away;
* the for-loop is necessary (i.e., the error only occurs on the third
run-through).
Still, the error seems to be completely reproducible across different
machines (tried on a machine running CentOS 6 with a K40, a machine
running Ubuntu 13.10 with a K40, and a machine running Xubuntu 14.04 with
a GTX 780).
At this point I am at a complete loss. I don't know whether the error is
caused by PyCUDA, cuBLAS, or scikits.cuda (the latter seems the least
probable, since the cublasSgemm wrapper is very straightforward), or by
something else entirely. I'd appreciate any help or advice.
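For completeness, here is the same loop with a per-iteration
drv.Context.synchronize() (just a sketch using the same setup as above);
since the cuBLAS call returns asynchronously, this should surface the
error at the iteration that actually triggers it rather than at the final
stream synchronize:

import numpy as np
import pycuda.autoinit
import pycuda.driver as drv
from pycuda import gpuarray
from scikits.cuda.cublas import cublasSgemm
import scikits.cuda.autoinit
from scikits.cuda.misc import _global_cublas_handle as handle

for i in range(3):
    n = 131
    s = slice(128, n)
    X = gpuarray.to_gpu(np.random.randn(n, 2483).astype(np.float32))
    a = gpuarray.empty((X.shape[1], 3), dtype=np.float32)
    c = gpuarray.empty((a.shape[0], X.shape[1]), dtype=np.float32)
    b = gpuarray.empty_like(X)

    m, n, k = a.shape[0], b[s].shape[1], a.shape[1]
    lda, ldb, ldc = m, k, m
    cublasSgemm(handle, 'n', 'n', m, n, k, 1.0, b[s].gpudata, lda,
                a.gpudata, ldb, 0.0, c.gpudata, ldc)

    # cuCtxSynchronize reports any asynchronous error from this iteration
    try:
        drv.Context.synchronize()
    except drv.LogicError as err:
        print("iteration %d failed: %s" % (i, err))
        raise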
Cheers
Thomas
Dear Junyi,
Junyi <9jhzguy(a)gmail.com> writes:
> For the duration of the kernel call, the CPU busy-waits by default. I
> changed the make_context() portion to include the SCHED_BLOCKING_SYNC flag,
> but the kernel call just hangs. How should I trigger the release? Thanks!
Can you please supply a) a reproducing snippet of code (ideally short)
and b) some information about your system (GPU, OS, CUDA, Python
versions)?
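(For reference, the flag is normally passed as sketched below; the trivial
'twice' kernel is only an illustration. SCHED_BLOCKING_SYNC only changes
how the host waits for the GPU, i.e. it blocks instead of spinning, so an
asynchronous kernel launch by itself should not be affected.)

import numpy as np
import pycuda.driver as drv
from pycuda.compiler import SourceModule

drv.init()
dev = drv.Device(0)
# ask the driver to block (rather than spin) whenever the host waits on the GPU
ctx = dev.make_context(flags=drv.ctx_flags.SCHED_BLOCKING_SYNC)
try:
    mod = SourceModule("""
    __global__ void twice(float *a) { a[threadIdx.x] *= 2.0f; }
    """)
    twice = mod.get_function("twice")
    a = np.arange(32, dtype=np.float32)
    twice(drv.InOut(a), block=(32, 1, 1), grid=(1, 1))
    ctx.synchronize()  # this wait is what the scheduling flag affects
finally:
    ctx.pop()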
Thanks!
Andreas
Hi All,
I just installed PyCUDA and got it working on my Windows 8.1 laptop. I'm
able to run the examples in VS 2010. My question concerns the 'preferred'
development environment for PyCUDA: while it runs on Windows, I'm not able
to debug any kernel code using VS (Nvidia Nsight) the way I can with C/C++.
I've seen a bunch of threads about 'cuda-gdb', which leads me to believe
folks are using the command line on Linux to run PyCUDA. Is this the
'preferred' environment for running and debugging PyCUDA? Is there a way to
debug kernels in PyCUDA on Windows, with or without VS Nsight, or are we
forced to use a command-line interface for debugging?
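For context, the cuda-gdb workflow usually means compiling the kernel with
device debug info and running the whole Python process under cuda-gdb;
here is a rough sketch (the add_one kernel and the flags are only
illustrative):

import numpy as np
import pycuda.autoinit
import pycuda.driver as drv
from pycuda.compiler import SourceModule

# -g/-G tell nvcc to emit host and device debug information so that
# cuda-gdb can step into the kernel and show source lines
mod = SourceModule("""
__global__ void add_one(float *a)
{
    int i = threadIdx.x;
    a[i] += 1.0f;
}
""", options=["-g", "-G"])

add_one = mod.get_function("add_one")
a = np.zeros(32, dtype=np.float32)
add_one(drv.InOut(a), block=(32, 1, 1), grid=(1, 1))
print(a)

The script would then be launched as something like
cuda-gdb --args python your_script.py (script name illustrative), with a
breakpoint set on add_one from the cuda-gdb prompt.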
Thanks,
Daniel
For the duration of the kernel call, the CPU busy-waits by default. I
changed the make_context() portion to include the SCHED_BLOCKING_SYNC flag,
but the kernel call just hangs. How should I trigger the release? Thanks!
Jimmy
JeHoon Song <song.je-hoon(a)kaist.ac.kr> writes:
> Hello,
>
> I just started developing a PyCUDA application.
>
> The build process fails as follows:
>
> ...
> bpl-subset/bpl_subset/boost/type_traits/detail/cv_traits_impl.hpp:37:
> internal compiler error: in make_rtl_for_nonlocal_decl, at cp/decl.c:5067
> ...
http://wiki.tiker.net/PyCuda/FrequentlyAskedQuestions#I_have_.3Cinsert_rand…
HTH,
Andreas
Hello,
I just started developing a PyCUDA application.
The build process fails as follows:
...
bpl-subset/bpl_subset/boost/type_traits/detail/cv_traits_impl.hpp:37:
internal compiler error: in make_rtl_for_nonlocal_decl, at cp/decl.c:5067
...
I am not sure whether I have to upgrade gcc. My gcc version is as follows:
(j277)[pbs@tesla0 pycuda]$ gcc -v
Using built-in specs.
Target: x86_64-redhat-linux
Configured with: ../configure --prefix=/usr --mandir=/usr/share/man
--infodir=/usr/share/info --enable-shared --enable-threads=posix
--enable-checking=release --with-system-zlib --enable-__cxa_atexit
--disable-libunwind-exceptions --enable-libgcj-multifile
--enable-languages=c,c++,objc,obj-c++,java,fortran,ada
--enable-java-awt=gtk --disable-dssi --enable-plugin
--with-java-home=/usr/lib/jvm/java-1.4.2-gcj-1.4.2.0/jre --with-cpu=generic
--host=x86_64-redhat-linux
Thread model: posix
gcc version 4.1.2 20080704 (Red Hat 4.1.2-44)
Could you help me to fix this problem?
Best,
Je-Hoon Song
--
*Je-Hoon Song*
Ph.D. Candidate
Laboratory for Systems Biology and Bio-Inspired Engineering (SBIE),
Department of Bio and Brain Engineering, KAIST, Republic of Korea
Phone: +82-42-350-4365