Hi all,
I've just released PyCUDA version 2011.1. This is, once again, a rather
big release. A detailed list of changes is below. In the same spirit as
PyOpenCL (whose 2011.1 release happened yesterday), I'll try to move to
smaller, more frequent releases in the future.
Have fun, and let me know if there are any issues.
Andreas
Detailed list of changes in 2011.1:
* Add support for CUDA 3.0-style OpenGL interop. (thanks to Tomasz Rybak)
* Add pycuda.driver.Stream.wait_for_event().
* Add range and slice keyword arguments to pycuda.elementwise.ElementwiseKernel.__call__(). (See the example below.)
* Document preamble constructor keyword argument to pycuda.elementwise.ElementwiseKernel.
* Add vector types, see pycuda.gpuarray.vec.
* Add pycuda.scan.
* Add support for new features in CUDA 4.0.
* Add pycuda.gpuarray.GPUArray.strides, pycuda.gpuarray.GPUArray.flags. Allow the creation of arrays in C and Fortran order.
* Adopt stateless launch interface from CUDA, deprecate old one.
* Add CURAND wrapper. (with work by Tomasz Rybak)
* Add pycuda.compiler.DEFAULT_NVCC_FLAGS.
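As a small example of the new slice support, something like this runs
an elementwise operation on only part of an array (a quick sketch):

import numpy
import pycuda.autoinit
import pycuda.gpuarray as gpuarray
from pycuda.elementwise import ElementwiseKernel

double = ElementwiseKernel("float *a", "a[i] = 2*a[i]", "double_it")
a = gpuarray.to_gpu(numpy.ones(100, dtype=numpy.float32))
double(a, slice=slice(50, 100))  # elements 0..49 are left untouched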
Hello!
I'm a bit of a pycuda newb, and today I dove into the source for the first time.
Essentially, I was trying to implement a numpy-style argmax using a
custom reduction kernel:
http://docs.scipy.org/doc/numpy/reference/generated/numpy.argmax.html
I thought this would be a pretty simple procedure, and the code I tried was:
maxloc_reduction_k = ReductionKernel(
    numpy.int32,
    arguments="float *x",
    neutral="0",
    map_expr="i",
    reduce_expr="(x[(int)a] > x[(int)b]) ? (int)a : (int)b")
What I discovered was that pycuda really didn't appreciate my using
the specified arguments (in this case, "float *x") in the reduce_expr
part.
Looking at reduction.py (in the latest release code, 0.94.2), I
couldn't really see a reason why this shouldn't be possible. I
added my own parameter to the reduction kernel's constructor ("hack")
to allow my original parameters to be passed to both _stage1 and
_stage2. This allowed me to do this:
maxloc_reduction_k = ReductionKernel(
    numpy.int32,
    hack=True,
    arguments="float *x",
    neutral="0",
    map_expr="i",
    reduce_expr="(x[(int)a] > x[(int)b]) ? (int)a : (int)b")
This way, my hacked changes only affect the reduction kernels that ask
for them.
I'm not totally clear on how to create a patch, so hopefully I didn't
screw it up. I don't know whether I'm allowed to send attachments to
the mailing list, so I used pastebin. Here are my changes:
http://pastebin.com/R5i5JveM
Just in case I messed up creating a patch, here's my full modified
version of reduction.py: http://pastebin.com/WExgBTQ9
Is there a better way to be doing this?
I considered using the current dev build of pycuda to access float2,
storing my array value in x and my index in y, but the reduction
code hasn't been updated to properly handle float2 (you can't just
assign one volatile float2 to another -- you have to copy the .x and
the .y separately).
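For reference, the float2 version I was aiming for would look roughly
like this (untested, for exactly the reason above):

from pycuda import gpuarray
from pycuda.reduction import ReductionKernel

# untested sketch: requires the reduction template to copy .x and .y
# individually through the volatile shared-memory stage
maxloc_k = ReductionKernel(
    gpuarray.vec.float2,
    neutral="make_float2(-3.4e38f, -1.0f)",
    arguments="float *x",
    map_expr="make_float2(x[i], i)",
    reduce_expr="(a.x > b.x) ? a : b")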
If this functionality is actually missing, I'll try to do the work to
implement it in a non-hackish way.
Thanks,
Ryan Marcus
Hi,
I would like to report a possible bug in pycuda. To show the issue,
I've modified demo.py by passing an option to the nvcc compiler:
mod = SourceModule("""
__global__ void doublify(float *a)
{
int idx = threadIdx.x + threadIdx.y*4;
a[idx] *= 2;
}
""", options=['-ccbin /usr/bin'])
However I get the following error when running demo.py:
python demo.py
*** compiler options is ['-ccbin /usr/bin', '-arch', 'sm_12', '-I/usr/local/lib/python2.7/dist-packages/pycuda-0.94.2-py2.7-linux-x86_64.egg/pycuda/../include/pycuda']
Traceback (most recent call last):
  File "demo.py", line 22, in <module>
    """, options=['-ccbin /usr/bin'])
  File "/usr/local/lib/python2.7/dist-packages/pycuda-0.94.2-py2.7-linux-x86_64.egg/pycuda/compiler.py", line 239, in __init__
    arch, code, cache_dir, include_dirs)
  File "/usr/local/lib/python2.7/dist-packages/pycuda-0.94.2-py2.7-linux-x86_64.egg/pycuda/compiler.py", line 229, in compile
    return compile_plain(source, options, keep, nvcc, cache_dir)
  File "/usr/local/lib/python2.7/dist-packages/pycuda-0.94.2-py2.7-linux-x86_64.egg/pycuda/compiler.py", line 108, in compile_plain
    cmdline, stdout=stdout, stderr=stderr)
pycuda.driver.CompileError: nvcc compilation of /tmp/tmp4keOYP/kernel.cu failed
[command: nvcc --cubin -ccbin /usr/bin -arch sm_12 -I/usr/local/lib/python2.7/dist-packages/pycuda-0.94.2-py2.7-linux-x86_64.egg/pycuda/../include/pycuda kernel.cu]
[stderr:
nvcc fatal : Unknown option 'ccbin /usr/bin'
]
Notice that the nvcc command listed in the output is valid and will
compile the kernel correctly, yet I am receiving an error from pycuda.
I would welcome any suggestions or workarounds. I'm using pycuda 0.94.2
built from source with Python 2.7.1+ on Ubuntu 11.04.
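(One guess I haven't verified: the error shows nvcc receiving
'ccbin /usr/bin' as a single option, so perhaps the flag and its value
need to be separate list entries, i.e.

mod = SourceModule(source, options=['-ccbin', '/usr/bin'])

where source is the same kernel string as above.)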
Thanks,
- Scott
Hello.
I'm attaching a patch that updates pycuda.tools.DeviceData and
pycuda.tools.OccupancyRecord to take new devices into consideration.
I have tried to maintain the style of those classes and introduced
changes only where necessary. I made the changes using my old notes
and the NVIDIA Occupancy Calculator; unfortunately, I do not currently
have access to a Fermi card to test them fully.
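For anyone who hasn't used these classes, they are queried roughly
like this (a sketch; threads=256 is just an example block size):

import pycuda.autoinit
from pycuda.tools import DeviceData, OccupancyRecord

devdata = DeviceData()                       # limits of the current device
occ = OccupancyRecord(devdata, threads=256)  # occupancy of 256-thread blocks
print occ.occupancy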
Best regards.
Tomasz Rybak
Hi all,
I'm observing the following behavior with the latest (git-fetched
today) pycuda and pyopencl versions on Snow Leopard 10.6.4:
$ python
>>> import pycuda.driver
>>> import pyopencl
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Library/Python/2.6/site-packages/pyopencl-0.92beta-py2.6-macosx-10.6-i386.egg/pyopencl/__init__.py", line 3, in <module>
    import pyopencl._cl as _cl
AttributeError: 'NoneType' object has no attribute '__dict__'
$ python
>>> import pyopencl
>>> import pycuda.driver
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Library/Python/2.6/site-packages/pycuda-0.94rc-py2.6-macosx-10.6-i386.egg/pycuda/driver.py", line 1, in <module>
    from _driver import *
AttributeError: 'NoneType' object has no attribute '__dict__'
This worked with the stable versions. Does anyone know why this is happening?
(One may ask why I need both libraries in the same program: I have a
set of tests for my module, which can use both CUDA and OpenCL, and it
is convenient to run all of them from a single file. It is not a
critical issue; I'm just curious.)
Best regards,
Bogdan
This is with the version from the trunk
(7804dc6d1b40b506b02a5f7a0b7bde8771f1446c).
import pycuda.driver as cuda
import pycuda.compiler
import pycuda.autoinit
import pycuda.gpuarray as gpuarray
from pycuda.elementwise import ElementwiseKernel
zero_kernel = ElementwiseKernel(
    "float *out",
    "out[i] = pdf()",
    "test",
    preamble="""
    __device__ float pdf()
    {
        return 0;
    }
    """)
size = 100
out_gpu = gpuarray.empty(size, float)
zero_kernel(out_gpu)
print all(out_gpu.get() == 0)
print all(out_gpu.get()[:size/2] == 0)
This produces the following output (for varying values of size):
False
True
The second half is the same as before the elementwise kernel call.
Oddly enough, demo_elementwise.py does not seem to produce this bug.
Hi,
I always got these warnings while compiling, but now it seems they
lead to a compile error:
pycuda.driver.CompileError: nvcc said it demoted types in source code
it compiled--this is likely not what you want.
My function is an elementwise kernel (so C code); all input arguments
and variables are floats. In the function I use logf, expf, sinf,
floorf, fmaxf, etc. Would any of these return a double? How can I find
the perpetrator?
Thanks,
Thomas
OK, I figured it out. It seems that either (a) standard algebraic
computations such as 2*a get cast to double in some cases, or (b) this
happens in cases like powf(a, 2). I replaced all plain numeric
literals with float literals (e.g. 2.0f) and the error is no longer
raised.
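In other words, the fix amounted to changes along these lines (an
illustrative sketch, not my actual kernel; 2.0 is a double literal in
C, 2.0f a float one):

from pycuda.elementwise import ElementwiseKernel

# the double literal makes 2.0 * a[i] a double computation, which nvcc
# demotes on pre-Fermi hardware; the f suffix keeps it single precision
k_bad  = ElementwiseKernel("float *a", "a[i] = 2.0 * a[i]", "k_bad")
k_good = ElementwiseKernel("float *a", "a[i] = 2.0f * a[i]", "k_good")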
-Thomas
Hi,
I want to run element-wise computations on different parts of an
array. Loading each part of the array into device memory when needed
turned out to take a lot of time and didn't really speed things up
compared to the CPU. Instead, I want to load the data array into
device memory once and then indicate which elements to look at (I do
have the numpy view/slice of the array). I have looked into different
ways of doing this but can't seem to find the right approach; any help
would be appreciated.
ElementwiseKernel seems to support range and slicing now; however, my
code is (CUDA) C and I import it as a SourceModule, which probably
means I can't use the ElementwiseKernel approach.
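The closest thing I've come up with is passing a start index and
length into my own kernel by hand, along these lines (an untested
sketch; a_gpu, start and n stand in for my actual array and slice
bounds):

import numpy
import pycuda.autoinit
from pycuda.compiler import SourceModule

mod = SourceModule("""
__global__ void scale_part(float *a, int start, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        a[start + i] *= 2.0f;
}
""")
scale_part = mod.get_function("scale_part")

# a_gpu stays in device memory; only [start, start+n) is touched
scale_part(a_gpu, numpy.int32(start), numpy.int32(n),
           block=(256, 1, 1), grid=((n + 255) // 256, 1))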
-Thomas
Hello everybody -
I am pleased to announce my first tentative bindings for CUDA in Perl.
They are available on Github at
https://github.com/run4flat/perl-CUDA-Minimal. I will push them to
CPAN after I've had a chance to get some feedback.
A few months back I asked about the design of PyCUDA, and I should
note that these bindings do *not* take those observations into
account. I wrote these bindings for my research back in January, and I
have published them in the hope of creating some enthusiasm and
developing a group of Perl hackers who would like to help me take on
the full Driver API.
Thanks!
David
P.S. Don't worry, I won't spam you guys with any more PerlCUDA
announcements. Very few Perl hackers care about CUDA and I just wanted
to make an announcement to a group of people who would actually find
it interesting. :-)
P.P.S. I've written a tutorial at
http://blogs.perl.org/users/david_mertens/2011/06/perls-first-real-cuda-bin…,
in case you want to see how it works.
--
Sent via my carrier pigeon.