Hello.
On 2014-11-05 the current Debian testing (Jessie) will be
frozen as the next step in preparation for the release.
This means that it will ship the packages, and the package
versions, that are in the archive on that particular day.
So - when can we expect a new version of PyCUDA?
To avoid any accidents, I intend to upload the
packages by 2014-10-10 - to give them time
to migrate from unstable to testing and to avoid
a rushed upload.
If there is a reasonably recent released version (of both
PyCUDA and PyOpenCL) - e.g. 2014.1.1 from September -
I'll upload that version. If not, I intend
to upload the latest git commit of the appropriate
package.
At the same time, if anyone has any suggestions
regarding packaging, please let me know. Jessie will
be the first Debian release containing both PyCUDA and PyOpenCL
for both Python 2 and Python 3, plus support for
free (libre) ICDs in the PyOpenCL package, so I'd like to have
the packages as polished as possible.
Best regards.
--
Tomasz Rybak GPG/PGP key ID: 2AD5 9860
Fingerprint A481 824E 7DD3 9C0E C40A 488E C654 FB33 2AD5 9860
http://member.acm.org/~tomaszrybak
Hi
I'm having an issue with PyCUDA that at first glance seems like it
might be similar to the one reported by Thomas Unterthiner (messages from Jun 20
2014, "Weird bug when slicing arrays on Kepler cards"). I'm also using a
Kepler card (GTX 670) and getting the same clean-up/dead-context errors.
However, unlike Thomas, I'm not using cublas. The simplest example I can
show is below; the CUDA kernel is taken directly from here:
http://devblogs.nvidia.com/parallelforall/cuda-pro-tip-write-flexible-kerne…
----------- code --------------
from numpy import random
from numpy import float32, float64, int32
import time
# CUDA
import pycuda.driver as cuda
import pycuda.autoinit
from pycuda.compiler import SourceModule
def main(n=2**12):
    # CUDA grid
    block_size = (256, 1, 1)
    grid = (n / block_size[0], 1)
    # CUDA source
    cusrc = SourceModule("""
    __global__ void saxpy(int n, double a, double *x, double *y)
    {
        for (int i = blockIdx.x * blockDim.x + threadIdx.x;
             i < n;
             i += blockDim.x * gridDim.x)
        {
            y[i] = a * x[i] + y[i];
        }
    }
    """)
    SAXPY = cusrc.get_function('saxpy')
    # data arrays
    w = 500  # arbitrary
    x = random.uniform(0, w, n)  # .astype(float32) <- same error with either float or double
    y = random.uniform(0, w, n)  # .astype(float32)
    y_o = y.copy()  # keep an unmodified copy for the comparison below
    # init gpu (input) arrays
    a = float64(24.5)
    n = int32(n)
    a_gpu = cuda.mem_alloc(a.nbytes)
    cuda.memcpy_htod(a_gpu, a)
    n_gpu = cuda.mem_alloc(n.nbytes)
    cuda.memcpy_htod(n_gpu, n)
    X_gpu = cuda.mem_alloc(x.nbytes)
    cuda.memcpy_htod(X_gpu, x)
    Y_gpu = cuda.mem_alloc(y.nbytes)
    cuda.memcpy_htod(Y_gpu, y)
    SAXPY(n_gpu, a_gpu, X_gpu, Y_gpu, grid=grid, block=block_size)
    # retrieve outputs
    cuda.memcpy_dtoh(y, Y_gpu)
    # free gpu memory
    X_gpu.free(); Y_gpu.free()
    a_gpu.free(); n_gpu.free()
    print y
    print a * x + y_o
    print y - (a * x + y_o)  # compare the two

if __name__ == '__main__':
    main()
------- output from command line -----------
Traceback (most recent call last):
File "as_cuda_loop.py", line 60, in <module>
main()
File "as_cuda_loop.py", line 49, in main
cuda.memcpy_dtoh(y,Y_gpu)
pycuda._driver.LogicError: cuMemcpyDtoH failed: invalid/unknown error code
PyCUDA WARNING: a clean-up operation failed (dead context maybe?)
cuModuleUnload failed: invalid/unknown error code
PyCUDA WARNING: a clean-up operation failed (dead context maybe?)
cuMemFree failed: invalid/unknown error code
PyCUDA WARNING: a clean-up operation failed (dead context maybe?)
cuMemFree failed: invalid/unknown error code
PyCUDA WARNING: a clean-up operation failed (dead context maybe?)
cuMemFree failed: invalid/unknown error code
PyCUDA WARNING: a clean-up operation failed (dead context maybe?)
cuMemFree failed: invalid/unknown error code
-------------------------------
I'm running Windows 8.1, PyCUDA 2013.1.1, and CUDA 6.0. I have absolutely no
idea what's going wrong here - can anyone help?
Thanks
James
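[Editorial note: for reference, a minimal self-contained sketch (assuming CUDA and PyCUDA are installed) of the more conventional way to pass the scalar kernel parameters (int n, double a) by value as numpy scalars instead of through separate mem_alloc buffers. This is only an illustrative variant of the code above, not a confirmed fix for the reported error.]

import numpy as np
import pycuda.autoinit  # creates a CUDA context on import
import pycuda.driver as cuda
from pycuda.compiler import SourceModule

mod = SourceModule("""
__global__ void saxpy(int n, double a, double *x, double *y)
{
    for (int i = blockIdx.x * blockDim.x + threadIdx.x; i < n;
         i += blockDim.x * gridDim.x)
        y[i] = a * x[i] + y[i];
}
""")
saxpy = mod.get_function("saxpy")

n = 2 ** 12
a = 24.5
x = np.random.uniform(0, 500, n)   # float64, matching the double* parameter
y = np.random.uniform(0, 500, n)
y_ref = a * x + y                  # expected result, computed on the host

x_gpu = cuda.mem_alloc(x.nbytes)
y_gpu = cuda.mem_alloc(y.nbytes)
cuda.memcpy_htod(x_gpu, x)
cuda.memcpy_htod(y_gpu, y)

# Scalars are passed by value as numpy scalars; arrays as device pointers.
saxpy(np.int32(n), np.float64(a), x_gpu, y_gpu,
      block=(256, 1, 1), grid=(n // 256, 1))

cuda.memcpy_dtoh(y, y_gpu)
print(np.abs(y - y_ref).max())     # should be close to 0.0
x_gpu.free()
y_gpu.free()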
Hello.
I've been packaging PyCUDA for Debian.
I ran all the tests to ensure that the package works on Python 2
and Python 3. All tests pass except for one from test_driver.py:
$ python test_driver.py
============================= test session starts ==============================
platform linux2 -- Python 2.7.5 -- pytest-2.3.5
collected 21 items
test_driver.py ........F............
=================================== FAILURES ===================================
_____________________ TestDriver.test_register_host_memory _____________________
args = (<test_driver.TestDriver instance at 0x24e7d88>,), kwargs = {}
pycuda = <module 'pycuda' from '/usr/lib/python2.7/dist-packages/pycuda/__init__.pyc'>
ctx = <pycuda._driver.Context object at 0x2504488>
clear_context_caches = <function clear_context_caches at 0x1dbf848>
collect = <built-in function collect>
    def f(*args, **kwargs):
        import pycuda.driver
        # appears to be idempotent, i.e. no harm in calling it more than once
        pycuda.driver.init()
        ctx = make_default_context()
        try:
            assert isinstance(ctx.get_device().name(), str)
            assert isinstance(ctx.get_device().compute_capability(), tuple)
            assert isinstance(ctx.get_device().get_attributes(), dict)
>           inner_f(*args, **kwargs)
/usr/lib/python2.7/dist-packages/pycuda/tools.py:434:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
self = <test_driver.TestDriver instance at 0x24e7d88>
    @mark_cuda_test
    def test_register_host_memory(self):
        if drv.get_version() < (4,):
            from py.test import skip
            skip("register_host_memory only exists on CUDA 4.0 and later")
        import sys
        if sys.platform == "darwin":
            from py.test import skip
            skip("register_host_memory is not supported on OS X")
        a = drv.aligned_empty((2**20,), np.float64, alignment=4096)
>       drv.register_host_memory(a)
E       LogicError: cuMemHostRegister failed: invalid value
test_driver.py:559: LogicError
==================== 1 failed, 20 passed in 116.85 seconds =====================
This test fails both on ION (GeForce 9400M, CC 1.1) and on a GeForce 460
(CC 2.1). I've compiled PyCUDA with gcc 4.8 and run it with kernel 3.9
and driver 304.88.
Regards.
--
Tomasz Rybak GPG/PGP key ID: 2AD5 9860
Fingerprint A481 824E 7DD3 9C0E C40A 488E C654 FB33 2AD5 9860
http://member.acm.org/~tomaszrybak
Forrest Pruitt <fpruitt0922(a)gmail.com> writes:
> This reeks of a permission issue. Check which users have access to
> /dev/nv*, and make sure that the user that Celery runs as also has
> access to those devices.
>
> Hope that helps,
> Andreas
>
> Just as a test, I made sure it was not a permission issue by doing a chmod
> 777 on /dev/nv*. Still getting problems initializing CUDA in the context of
> django/celery.
Can you try to run a CUDA C example code (e.g. launching it with Python's
standard 'subprocess' module)? If that works, then PyCUDA has nothing to do
with your problem. This might help you isolate the issue.
HTH,
Andreas
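[Editorial note: a minimal sketch of the subprocess check Andreas suggests above, to be run from inside the Celery/Django worker. The sample binary path below is only a placeholder assumption; point it at any compiled CUDA C example (e.g. deviceQuery) on your system.]

import subprocess

# Placeholder path (assumption): any compiled CUDA C sample binary will do,
# e.g. the deviceQuery sample shipped with the CUDA toolkit.
cmd = ["/usr/local/cuda/samples/bin/x86_64/linux/release/deviceQuery"]

# Run the binary in the same process environment as the worker and report
# its exit code; a failure here points at the environment, not at PyCUDA.
ret = subprocess.call(cmd)
print("deviceQuery exit code: %d" % ret)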
Hello,
after updating I get the following error when running a PyCUDA program:
    import pycuda.autoinit
  File "/usr/lib/python2.7/dist-packages/pycuda/autoinit.py", line 4, in <module>
    cuda.init()
pycuda._driver.RuntimeError: cuInit failed: no device
Ernst
This has also been my experience when dealing with long-running programs that allocate large fractions of the GPU memory. However, I'm not sure why normal Python reference counting is insufficient to free GPU memory as soon as the Python object container goes out of scope.
The fact that gc.collect() fixes the problem suggests that there is a reference cycle associated with each GPU memory allocation, which is why garbage collection is required to free the memory. In my application, all of my GPU arrays were attributes in instances of a Python class, so I added a __del__ method to my class to call gc.collect() for me whenever a class instance was deallocated.
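[Editorial note: a minimal sketch of the workaround described above, assuming the GPU arrays live as attributes of a container class. The class name and attribute are illustrative, not taken from the original code.]

import gc

import numpy as np
import pycuda.autoinit  # creates a CUDA context on import
import pycuda.gpuarray as gpuarray


class ImageBuffer(object):
    # Illustrative container: holds one GPUArray as an instance attribute.
    def __init__(self, shape):
        self.data = gpuarray.zeros(shape, dtype=np.float32)

    def __del__(self):
        # Force a garbage-collection pass when the container goes away, so
        # any reference cycles still holding GPU allocations get broken.
        gc.collect()


buf = ImageBuffer((1024, 1024))
del buf  # __del__ runs gc.collect(), encouraging prompt release of GPU memory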
On Jul 23, 2014, at 11:59 AM, Matthias Lee <matthias.a.lee(a)gmail.com> wrote:
> Hi all,
>
> I noticed something interesting today.
> I am working on an image processing tool which loops several times over each of a series of images. Everything is done in place and I should not be growing my memory footprint between iterations.
>
> Now when I tracked the actual GPU memory consumption I found that I would ultimately run out of GPU memory (just a short excerpt): http://i.imgur.com/AjmmpEk.png
>
> I double- and triple-checked that everything is happening in place, and started trying to delete GPU objects as soon as I'm finished with them to try to trigger the GC, but that only had limited success. I would expect the GC to kick in before the GPU runs out of memory.
>
> I then started manually calling gc.collect() every few iterations and suddenly everything started behaving and is now relatively stable. See here (note the scale difference): http://i.imgur.com/Zzq5YdC.png
>
> Is this normal? Is this a bug?
>
> Thanks,
>
> Matthias
>
> --
> Matthias Lee
> IDIES/Johns Hopkins University
> Performance @ Rational/IBM
>
> Matthias.A.Lee(a)gmail.com
> MatthiasLee(a)jhu.edu
> (320) 496 6293
>
> To know recursion, you must first know recursion.
> _______________________________________________
> PyCUDA mailing list
> PyCUDA(a)tiker.net
> http://lists.tiker.net/listinfo/pycuda
Hi Matthias,
Matthias Lee <matthias.a.lee(a)gmail.com> writes:
> I noticed something interesting today.
> I am working on an image processing tool which loops several times over
> each of a series of images. Everything is done in place and I should not be
> growing my memory footprint between iterations.
>
> Now when I tracked the actual GPU memory consumption I found that I would
> ultimately run out of GPU memory (just a short excerpt):
> http://i.imgur.com/AjmmpEk.png
>
> I double- and triple-checked that everything is happening in place, and started
> trying to delete GPU objects as soon as I'm finished with them to try to
> trigger the GC, but that only had limited success. I would expect the GC to
> kick in before the GPU runs out of memory.
>
> I then started manually calling gc.collect() every few iterations and
> suddenly everything started behaving and is now relatively stable. See here
> (note the scale difference): http://i.imgur.com/Zzq5YdC.png
>
> Is this normal? Is this a bug?
First off, you can force-free GPU memory using this, if all else fails:
http://documen.tician.de/pycuda/driver.html#pycuda.driver.DeviceAllocation.…
Next, the behavior you're seeing means that a reference cycle of some
sort must exist within the object that's holding on to your GPU
memory. (Could be PyCUDA's GPUArray--it has happened before, but I'd
consider it a bug. I'll go poke around if it is. Let me know.) A
reference cycle means that Python will only free these objects upon a GC
run (since the refcount will never return to zero on its own). Unless
told explicitly otherwise (see above), PyCUDA will only free GPU memory
once the associated Python handle objects have been pronounced unused by
the Python runtime.
PyCUDA is smart enough to force a GC run before declaring defeat on a
memory allocation, so if you're the only one using a GPU, this shouldn't
pose an issue. If you're using other libraries that also (try to)
allocate GPU memory, then this might pose an issue, because they *won't*
know to try GC'ing.
Hope that helps,
Andreas
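[Editorial note: a minimal sketch of the two escape hatches Andreas mentions above, explicitly freeing an allocation and forcing a GC pass. The array size is arbitrary.]

import gc

import numpy as np
import pycuda.autoinit  # creates a CUDA context on import
import pycuda.driver as cuda

a = np.zeros(1 << 20, dtype=np.float64)
a_gpu = cuda.mem_alloc(a.nbytes)   # DeviceAllocation handle
cuda.memcpy_htod(a_gpu, a)

# ... use a_gpu in kernels ...

# Escape hatch 1: release the GPU memory right now, without waiting for the
# Python handle object to be collected.
a_gpu.free()

# Escape hatch 2: if other allocations are kept alive by reference cycles,
# force a collection pass so their handles are finalized promptly.
gc.collect()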
Hi all,
I noticed something interesting today.
I am working on an image processing tool which loops several times over
each of a series of images. Everything is done in place and I should not be
growing my memory footprint between iterations.
Now when I tracked the actual GPU memory consumption I found that I would
ultimately run out of GPU memory (just a short excerpt):
http://i.imgur.com/AjmmpEk.png
I double- and triple-checked that everything is happening in place, and started
trying to delete GPU objects as soon as I'm finished with them to try to
trigger the GC, but that only had limited success. I would expect the GC to
kick in before the GPU runs out of memory.
I then started manually calling gc.collect() every few iterations and
suddenly everything started behaving and is now relatively stable. See here
(note the scale difference): http://i.imgur.com/Zzq5YdC.png
Is this normal? Is this a bug?
Thanks,
Matthias
--
Matthias Lee
IDIES/Johns Hopkins University
Performance @ Rational/IBM
Matthias.A.Lee(a)gmail.com
MatthiasLee(a)jhu.edu
(320) 496 6293
To know recursion, you must first know recursion.
This reeks of a permission issue. Check which users have access to
/dev/nv*, and make sure that the user that Celery runs as also has
access to those devices.
Hope that helps,
Andreas
Just as a test, I made sure it was not a permission issue by doing a chmod
777 on /dev/nv*. Still getting problems initializing CUDA in the context of
django/celery.
On Tue, Jul 15, 2014 at 12:00 PM, <pycuda-request(a)tiker.net> wrote:
> Send PyCUDA mailing list submissions to
> pycuda(a)tiker.net
>
> To subscribe or unsubscribe via the World Wide Web, visit
> http://lists.tiker.net/listinfo/pycuda
> or, via email, send a message with subject or body 'help' to
> pycuda-request(a)tiker.net
>
> You can reach the person managing the list at
> pycuda-owner(a)tiker.net
>
> When replying, please edit your Subject line so it is more specific
> than "Re: Contents of PyCUDA digest..."
>
>
> Today's Topics:
>
> 1. Re: Trouble Getting Set Up (Andreas Kloeckner)
> 2. Re: Problem with pow (elodw)
>
>
> ----------------------------------------------------------------------
>
> Message: 1
> Date: Mon, 14 Jul 2014 17:18:59 -0500
> From: Andreas Kloeckner <lists(a)informa.tiker.net>
> To: pycuda(a)tiker.net
> Subject: Re: [PyCUDA] Trouble Getting Set Up
> Message-ID: <53C45753.3090600(a)informa.tiker.net>
> Content-Type: text/plain; charset=utf-8; format=flowed
>
> Am 14.07.2014 um 16:47 schrieb Forrest Pruitt:
> > The frustrating thing is that in a stand-alone python shell, pycuda
> > behaves appropriately. It is only in a Celery process that things
> > break down.
> >
> > Any help here would be appreciated!
> >
> > If I need to provide any more information, just let me know!
> >
> This reeks of a permission issue. Check which users have access to
> /dev/nv*, and make sure that the user that Celery runs as also has
> access to those devices.
>
> Hope that helps,
> Andreas
>
>
>
> ------------------------------
>
> Message: 2
> Date: Tue, 15 Jul 2014 08:54:49 +0200
> From: elodw <apo(a)pdauf.de>
> To: pycuda(a)tiker.net
> Subject: Re: [PyCUDA] Problem with pow
> Message-ID: <53C4D039.6000404(a)pdauf.de>
> Content-Type: text/plain; charset=ISO-8859-1; format=flowed
>
>
> On 14.07.2014 20:49, Andreas Kloeckner wrote:
> > elodw <apo(a)pdauf.de> writes:
> >
> > Probably neither, since it's a two-index loop. I.e. you should probably
> > write that one from scratch to be able to map both i and j to CUDA axes.
> >
> > Hope that helps,
> > Andreas
> Thank you Andreas,
>
> perhaps you know a source on the net?
>
> Thanks in advance
> Ernst
>
>
>
> ------------------------------
>
> Subject: Digest Footer
>
> _______________________________________________
> PyCUDA mailing list
> PyCUDA(a)tiker.net
> http://lists.tiker.net/listinfo/pycuda
>
>
> ------------------------------
>
> End of PyCUDA Digest, Vol 73, Issue 10
> **************************************
>
Dear Graham,
Graham Mills <13gm10(a)queensu.ca> writes:
> Thanks for the quick reply. I was able to track that problem down to a syntax error, but I've run into a problem launching kernels on GPU arrays allocated from the device memory pool, as in the following code:
>
> import numpy
> import pycuda.autoinit
> import pycuda.gpuarray as gpua
> from pycuda.tools import DeviceMemoryPool as DMP
>
> pool=DMP()
>
> test=gpua.GPUArray((1,2),dtype=numpy.float32,allocator=pool.allocate)
>
> print(test)
>
> # [[ nan nan]]
>
> print( int( test.gpudata))
>
> # 30066083328
>
> print( test.allocator)
>
> # <bound method DeviceMemoryPool.allocate of <pycuda._driver.DeviceMemoryPool object at 0x2909e68>>
>
> # attempting to launch a kernel returns an error
> test.fill(3.)
>
>
> The error I get is as follows (depending on the kernel and the way it is launched):
>
> Traceback (most recent call last):
> File "<stdin>", line 1, in <module>
> File "/usr/local/lib/python3.2/dist-packages/pycuda-2013.1.1-py3.2-linux-x86_64.egg/pycuda/gpuarray.py", line 516, in fill
> value, self.gpudata, self.mem_size)
> File "/usr/local/lib/python3.2/dist-packages/pycuda-2013.1.1-py3.2-linux-x86_64.egg/pycuda/driver.py", line 475, in function_prepared_async_call
> arg_buf = pack(func.arg_format, *args)
> struct.error: required argument is not an integer
>
>
> Manually specifying the allocator in the same way with pycuda.driver.mem_alloc seems to work fine, though. Do you know what it might be?
Sorry for the extremely belated reply, but for the record, I am unable
to reproduce this issue.
Andreas