Hello.
I've been packaging PyCUDA for Debian.
I run all the tests to ensure that the package works on Python 2
and Python 3. All tests pass except for one from test_driver.py:
$ python test_driver.py
============================= test session starts ==============================
platform linux2 -- Python 2.7.5 -- pytest-2.3.5
collected 21 items

test_driver.py ........F............

=================================== FAILURES ===================================
_____________________ TestDriver.test_register_host_memory _____________________

args = (<test_driver.TestDriver instance at 0x24e7d88>,), kwargs = {}
pycuda = <module 'pycuda' from '/usr/lib/python2.7/dist-packages/pycuda/__init__.pyc'>
ctx = <pycuda._driver.Context object at 0x2504488>
clear_context_caches = <function clear_context_caches at 0x1dbf848>
collect = <built-in function collect>

    def f(*args, **kwargs):
        import pycuda.driver
        # appears to be idempotent, i.e. no harm in calling it more than once
        pycuda.driver.init()
        ctx = make_default_context()
        try:
            assert isinstance(ctx.get_device().name(), str)
            assert isinstance(ctx.get_device().compute_capability(), tuple)
            assert isinstance(ctx.get_device().get_attributes(), dict)
>           inner_f(*args, **kwargs)

/usr/lib/python2.7/dist-packages/pycuda/tools.py:434:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

self = <test_driver.TestDriver instance at 0x24e7d88>

    @mark_cuda_test
    def test_register_host_memory(self):
        if drv.get_version() < (4,):
            from py.test import skip
            skip("register_host_memory only exists on CUDA 4.0 and later")
        import sys
        if sys.platform == "darwin":
            from py.test import skip
            skip("register_host_memory is not supported on OS X")
        a = drv.aligned_empty((2**20,), np.float64, alignment=4096)
>       drv.register_host_memory(a)
E       LogicError: cuMemHostRegister failed: invalid value

test_driver.py:559: LogicError
==================== 1 failed, 20 passed in 116.85 seconds =====================
This test fails both on ION (GeForce 9400M, CC 1.1) and on a GeForce 460
(CC 2.1). I compiled PyCUDA with gcc 4.8 and ran it with kernel 3.9
and driver 304.88.
Regards.
--
Tomasz Rybak GPG/PGP key ID: 2AD5 9860
Fingerprint A481 824E 7DD3 9C0E C40A 488E C654 FB33 2AD5 9860
http://member.acm.org/~tomaszrybak
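For context on the failure above: cuMemHostRegister requires the host buffer to be page-aligned, which is what the test tries to arrange with aligned_empty(..., alignment=4096). A minimal host-side sketch of that alignment requirement, using only the standard library and numpy (the anonymous mmap here is an assumption standing in for PyCUDA's aligned_empty; no GPU is needed):

```python
import mmap

import numpy as np

# cuMemHostRegister wants the host buffer aligned to the page size
# (4096 bytes on typical Linux systems). An anonymous mmap is always
# page-aligned, so it can stand in for drv.aligned_empty here.
n = 2 ** 20
buf = mmap.mmap(-1, n * np.dtype(np.float64).itemsize)
a = np.frombuffer(buf, dtype=np.float64)

assert a.ctypes.data % 4096 == 0  # page-aligned, as the driver requires
assert a.shape == (2 ** 20,)
```

If a buffer with this alignment is still rejected with "invalid value", the problem is likely in the driver or in how the registration flags are passed, not in the allocation itself.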
Hello.
When updating the packages I noticed a strange difference.
Both PyCUDA and PyOpenCL contain install_requires=["pytools"]
in setup.py. But while PyOpenCL depends on pytools 2014.2,
PyCUDA depends on pytools 2011.2. Is this a typo, or can I
really put such a dependency into the Debian package?
Best regards.
Bogdan Opanchuk <mantihor(a)gmail.com> writes:
> Hi Andreas,
>
> Unfortunately, it does not quite work. The script from my first
> message gives the following error:
>
> Traceback (most recent call last):
> File "t.py", line 34, in <module>
> get_second(dest, pair, block=(400,1,1), grid=(1,1))
> File "/Users/bogdan/.pyenv/versions/3.4.0/lib/python3.4/site-packages/pycuda/driver.py",
> line 365, in function_call
> handlers, arg_buf = _build_arg_buf(args)
> File "/Users/bogdan/.pyenv/versions/3.4.0/lib/python3.4/site-packages/pycuda/driver.py",
> line 147, in _build_arg_buf
> return handlers, pack(format, *arg_data)
> struct.error: bad char in struct format
Next attempt now in git.
Andreas
Dear Danny,
Daniel Jeck <djeck1(a)jhmi.edu> writes:
> My name is Danny Jeck. I don't really want to subscribe to the pycuda mailing list, but I thought I should point out that the following code produces an error:
>
>
> import pycuda.gpuarray as gpuarray
>
> import pycuda.driver as cuda
>
> import pycuda.autoinit
>
> import numpy
>
>
> a_gpu = gpuarray.to_gpu(numpy.zeros((100,100)))
>
>
> print a_gpu>0
>
>
> The error occurs because 0 (or a float, or whatever) doesn't have a shape, and if it does, it isn't the same shape as the array. If this were a numpy array, the 0 would be "broadcast" against the larger array for the comparison. You may want to support this in future versions.
Your code works for me using current git (and I'm quite sure it should
also work with the latest release). What version do you have?
>>> import pycuda
>>> print pycuda.VERSION
Andreas
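For reference, the behaviour Danny is asking for is numpy's scalar broadcasting for comparisons; a plain-numpy sketch of what `a_gpu > 0` is expected to mirror on the host side (no GPU involved):

```python
import numpy as np

a = np.zeros((100, 100))

# numpy broadcasts the scalar 0 against the whole array, yielding an
# elementwise boolean result with the same shape as `a`.
mask = a > 0

assert mask.shape == (100, 100)
assert mask.dtype == np.bool_
assert not mask.any()  # zeros are not greater than zero
```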
Hi Bogdan,
Bogdan Opanchuk <mantihor(a)gmail.com> writes:
> Thank you for the correction. Just curious, how come in PyOpenCL it
> works with rank-0 numpy arrays (which, in my opinion, is more
> intuitive than implicitly casting a rank-1 array to a scalar)? Is it
> just a difference between PyCUDA and PyOpenCL, or a limitation of CUDA
> itself?
I think I've patched this in git. Can you please give this a try and
report back?
Thanks!
Andreas
Bogdan Opanchuk <mantihor(a)gmail.com> writes:
> Hello,
>
> Does PyCUDA support struct arguments to kernels? From the Python side
> it means an element of an array with a struct dtype (a numpy.void
> object), e.g.
>
> dtype = numpy.dtype([('first', numpy.int32), ('second', numpy.int32)])
> pair = numpy.empty(1, dtype)[0]
>
> See https://gist.github.com/Manticore/15383a1ae367bfc6efe8 for an
> example of the functionality in question. It fails on ``get_second()``
> call complaining about the second argument (the structure).
>
> An analogous code in PyOpenCL works fine, but as far as I understand
> from its source, it uses a somewhat different mechanism of argument
> passing as compared to what is employed by PyCUDA.
The following minor variant works:
https://gist.github.com/inducer/88ac86874112b0e126ce
(The point is that the argument has to be an array to be recognized. A
'scalar' of a derived dtype will not work.)
Andreas
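To illustrate the distinction Andreas describes on the numpy side (a host-only sketch; the actual kernel launch is omitted): indexing a structured array with [0] yields a numpy.void "scalar", while a length-1 slice stays an ndarray, which is the form PyCUDA's argument handling recognizes.

```python
import numpy as np

dtype = np.dtype([('first', np.int32), ('second', np.int32)])
arr = np.empty(1, dtype)
arr[0] = (10, 20)

scalar = arr[0]   # numpy.void "scalar" of a derived dtype -- not recognized
view = arr[:1]    # still an ndarray -- this is the form to pass to the kernel

assert isinstance(scalar, np.void)
assert isinstance(view, np.ndarray)
assert view['second'][0] == 20
```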
Alexander Bock <alexander.asp.bock(a)gmail.com> writes:
> I am creating some timing tests with PyCUDA for batch-loading an image
> sequence. I first tried timing a normal, synchronous transfer over global
> memory.
>
> Now I am looking to test pagelocked memory, specifically, I would like to
> test: Single-stream, pagelocked synchronous transfers, multi-stream,
> asynchronous pagelocked transfers and zero-copy memory using device mapped
> memory.
>
> For the first one, do I simply call pycuda.driver.memcpy_htod/dtoh using
> the pagelocked memory (I am using memflags=0 for creating the pagelocked
> memory, I assume it corresponds to cudaHostAllocDefault?) For the second, I
> would use the memcpy_(htod/dtoh)_async calls with more than one stream (my
> laptop supports concurrent kernels). For the final one, I would create my
> own context using pycuda.driver.make_context with the MAP_HOST flag,
> allocate the pagelocked memory using host_alloc_flags.DEVICE_MAP and call
> my kernel with the device pointer? Am I on the right track?
Yep, that sounds right.
In terms of documentation, the CUDA programming guide applies. One thing
to notice is to look at the "driver" interface, not the "runtime"
interface. The lowest layer of PyCUDA is just a coat of Python paint on that.
Example and docs contributions would be more than welcome!
Andreas
Hi,
I am creating some timing tests with PyCUDA for batch-loading an image
sequence. I first tried timing a normal, synchronous transfer over global
memory.
Now I am looking to test pagelocked memory, specifically, I would like to
test: Single-stream, pagelocked synchronous transfers, multi-stream,
asynchronous pagelocked transfers and zero-copy memory using device mapped
memory.
For the first one, do I simply call pycuda.driver.memcpy_htod/dtoh using
the pagelocked memory (I am using memflags=0 for creating the pagelocked
memory, I assume it corresponds to cudaHostAllocDefault?) For the second, I
would use the memcpy_(htod/dtoh)_async calls with more than one stream (my
laptop supports concurrent kernels). For the final one, I would create my
own context using pycuda.driver.make_context with the MAP_HOST flag,
allocate the pagelocked memory using host_alloc_flags.DEVICE_MAP and call
my kernel with the device pointer? Am I on the right track?
I had a hard time finding good tutorials/source (even in the PyCUDA
examples section), so I plan to submit some examples if I have time :)
Also...Excellent library!
Best regards,
Alexander
Hello,
Does PyCUDA support struct arguments to kernels? From the Python side
it means an element of an array with a struct dtype (a numpy.void
object), e.g.
dtype = numpy.dtype([('first', numpy.int32), ('second', numpy.int32)])
pair = numpy.empty(1, dtype)[0]
See https://gist.github.com/Manticore/15383a1ae367bfc6efe8 for an
example of the functionality in question. It fails on ``get_second()``
call complaining about the second argument (the structure).
An analogous code in PyOpenCL works fine, but as far as I understand
from its source, it uses a somewhat different mechanism of argument
passing as compared to what is employed by PyCUDA.
Best regards,
Bogdan
Hello.
Just a reminder: next week I'll be in Berlin for the AWS Summit.
If anyone wants to meet, I'll be in Berlin from Wednesday 2014-05-14
until Saturday 2014-05-17.
Best regards.