Hi all,
I've just released PyCUDA version 2011.1. This is, once again, a rather
big release. A detailed list of changes is below. In the same spirit as
PyOpenCL (whose 2011.1 release happened yesterday), I'll try to move to
smaller, more frequent releases in the future.
Have fun, and let me know if there are any issues.
Andreas
Detailed list of changes in 2011.1:
* Add support for CUDA 3.0-style OpenGL interop. (thanks to Tomasz Rybak)
* Add pycuda.driver.Stream.wait_for_event().
* Add range and slice keyword argument to pycuda.elementwise.ElementwiseKernel.__call__().
* Document preamble constructor keyword argument to pycuda.elementwise.ElementwiseKernel.
* Add vector types, see pycuda.gpuarray.vec. (A short usage sketch follows this list.)
* Add pycuda.scan.
* Add support for new features in CUDA 4.0.
* Add pycuda.gpuarray.GPUArray.strides, pycuda.gpuarray.GPUArray.flags. Allow the creation of arrays in C and Fortran order.
* Adopt stateless launch interface from CUDA, deprecate old one.
* Add CURAND wrapper. (with work by Tomasz Rybak)
* Add pycuda.compiler.DEFAULT_NVCC_FLAGS.
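As a quick taste of one of the new features, here is a minimal sketch of using the vector types to build a structured host array and send it to the GPU (this assumes the dtype field names are x/y/z/w, matching the CUDA vector types):

import numpy as np
import pycuda.autoinit
import pycuda.gpuarray as gpuarray
from pycuda.gpuarray import vec

# vec.float4 is a numpy structured dtype matching CUDA's float4
a = np.empty(128, dtype=vec.float4)
a["x"] = 1
a["y"] = 2
a["z"] = 3
a["w"] = 4
a_gpu = gpuarray.to_gpu(a)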
Hello!
I'm a bit of a pycuda newb, and today I dove into the source for the first time.
Essentially, I was trying to implement a numpy-style argmax function
using a custom reduction kernel:
http://docs.scipy.org/doc/numpy/reference/generated/numpy.argmax.html
I thought this would be a pretty simple procedure, and the code I tried was:
maxloc_reduction_k = ReductionKernel(
    numpy.int32,
    arguments="float *x",
    neutral="0",
    map_expr="i",
    reduce_expr="(x[(int)a] > x[(int)b]) ? (int)a : (int)b")
What I discovered was that pycuda really didn't appreciate me using
the specified arguments (in this case, "float *x") in the reduce_expr
part.
Looking at reduction.py (in the latest release code, 0.94.2), I
couldn't really see a reason why this shouldn't be possible. I
added my own parameter to the reduction kernel's constructor ("hack")
to allow my original parameters to be passed to both _stage1 and
_stage2. This allowed me to do this:
maxloc_reduction_k = ReductionKernel(
    numpy.int32,
    hack=True,
    arguments="float *x",
    neutral="0",
    map_expr="i",
    reduce_expr="(x[(int)a] > x[(int)b]) ? (int)a : (int)b")
This way, my hacked changes would only affect some reduction kernels.
I'm not totally clear on how to create a patch, so hopefully I didn't
screw it up. I also don't know whether I'm allowed to send attachments
to the mailing list, so I used pastebin. Here are my changes:
http://pastebin.com/R5i5JveM
Just in case I messed up creating a patch, here's my full modified
version of reduction.py: http://pastebin.com/WExgBTQ9
Is there a better way to be doing this?
I considered using the current dev build of pycuda to get access to
float2, storing my array value in .x and my index in .y, but the
reduction code hasn't been updated to properly handle float2's (you
can't just assign a float2 to another volatile float2 -- you have to
copy the .x and the .y separately).
If this functionality is actually missing, I'll try to do the work to
implement it in a non-hackish way.
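In the meantime, here's a rough, untested sketch of a two-pass workaround I've been considering that avoids touching reduction.py entirely (it assumes gpuarray.max() is available in the version at hand and that the device supports integer atomics for atomicMin; the kernel and function names here are just made up):

import numpy as np
import pycuda.autoinit
import pycuda.gpuarray as gpuarray
from pycuda.elementwise import ElementwiseKernel

# Pass 2: record the smallest index at which the maximum occurs.
find_first_index = ElementwiseKernel(
    "float *x, float maxval, int *result",
    "if (x[i] == maxval) atomicMin(result, (int) i)",
    "find_first_index")

def argmax(x_gpu):
    # Pass 1: maximum value via the built-in reduction.
    maxval = np.float32(gpuarray.max(x_gpu).get())
    # Initialize the result past any valid index so atomicMin works.
    result = gpuarray.to_gpu(np.array([x_gpu.size], dtype=np.int32))
    find_first_index(x_gpu, maxval, result)
    return int(result.get()[0])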
Thanks,
Ryan Marcus
Hi,
I would like to report a possible bug with pycuda. To show the issue,
I've modified demo.py by passing an option to the nvcc compiler:
mod = SourceModule("""
__global__ void doublify(float *a)
{
    int idx = threadIdx.x + threadIdx.y*4;
    a[idx] *= 2;
}
""", options=['-ccbin /usr/bin'])
However I get the following error when running demo.py:
python demo.py
*** compiler options is ['-ccbin /usr/bin', '-arch', 'sm_12', '-I/usr/local/lib/python2.7/dist-packages/pycuda-0.94.2-py2.7-linux-x86_64.egg/pycuda/../include/pycuda']
Traceback (most recent call last):
  File "demo.py", line 22, in <module>
    """, options=['-ccbin /usr/bin'])
  File "/usr/local/lib/python2.7/dist-packages/pycuda-0.94.2-py2.7-linux-x86_64.egg/pycuda/compiler.py", line 239, in __init__
    arch, code, cache_dir, include_dirs)
  File "/usr/local/lib/python2.7/dist-packages/pycuda-0.94.2-py2.7-linux-x86_64.egg/pycuda/compiler.py", line 229, in compile
    return compile_plain(source, options, keep, nvcc, cache_dir)
  File "/usr/local/lib/python2.7/dist-packages/pycuda-0.94.2-py2.7-linux-x86_64.egg/pycuda/compiler.py", line 108, in compile_plain
    cmdline, stdout=stdout, stderr=stderr)
pycuda.driver.CompileError: nvcc compilation of /tmp/tmp4keOYP/kernel.cu failed
[command: nvcc --cubin -ccbin /usr/bin -arch sm_12 -I/usr/local/lib/python2.7/dist-packages/pycuda-0.94.2-py2.7-linux-x86_64.egg/pycuda/../include/pycuda kernel.cu]
[stderr:
nvcc fatal : Unknown option 'ccbin /usr/bin'
]
Notice that the nvcc command listed in the output is valid and will
compile the kernel correctly, yet I am receiving an error from pycuda.
I would welcome any suggestions or workarounds. I'm using pycuda 0.94.2
built from source with Python 2.7.1+ on Ubuntu 11.04.
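For what it's worth, splitting the flag into two separate list entries may sidestep the problem, since the options list appears to be handed to the nvcc subprocess one entry per argument rather than being re-split on whitespace (worth double-checking against compile_plain() in compiler.py):

mod = SourceModule("""
__global__ void doublify(float *a)
{
    int idx = threadIdx.x + threadIdx.y*4;
    a[idx] *= 2;
}
""", options=['-ccbin', '/usr/bin'])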
Thanks,
- Scott
Hello.
I attach a patch updating pycuda.tools.DeviceData and
pycuda.tools.OccupancyRecord to take new devices into consideration.
I have tried to maintain the "style" of those classes and introduced
changes only where necessary. I made the changes using my old notes
and the NVIDIA Occupancy Calculator. Unfortunately, I currently do not
have access to a Fermi card to test them fully.
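For anyone who wants to exercise the patch, the classes are typically driven along these lines (a minimal sketch; the keyword names and attributes should be checked against pycuda.tools):

import pycuda.autoinit
from pycuda.tools import DeviceData, OccupancyRecord

devdata = DeviceData()  # describes the currently active device
occ = OccupancyRecord(devdata, threads=256, shared_mem=4096, registers=20)
print "occupancy: ", occ.occupancy
print "limited by:", occ.limited_by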
Best regards.
Tomasz Rybak
This is a sorta-cross-post from stack overflow:
http://stackoverflow.com/questions/6892280/how-do-i-diagnose-a-cuda-launch-…
I'm getting an out-of-resources error when trying to launch a CUDA
kernel (through PyCUDA), and I'm wondering if it's possible to get the
system to tell me which resource it is that I'm short on. Obviously
the system knows what resource has been exhausted, I just want to
query that as well.
I've used the occupancy calculator, and everything seems okay, so
either there's a corner case not covered, or I'm using it wrong. I
know it's not registers (which seems to be the usual culprit) because
I'm using < 70 and it still fails with a 1x1x1 block and 1x1 grid on a
CC 2.1 device.
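For reference, PyCUDA can report a kernel's static resource usage via the driver's function attributes; a sketch (the kernel here is just a stand-in, and the attribute names are as I understand them from the docs), though this only shows what the kernel requests, not which limit the failing launch actually hit:

import pycuda.autoinit
import pycuda.driver as drv
from pycuda.compiler import SourceModule

mod = SourceModule("""
__global__ void my_kernel(float *a)
{
    a[threadIdx.x] *= 2.0f;
}
""")
func = mod.get_function("my_kernel")

# Per-kernel resource usage as reported by the driver:
print "registers/thread:  ", func.num_regs
print "shared mem (bytes):", func.shared_size_bytes
print "local mem (bytes): ", func.local_size_bytes
print "max threads/block: ", func.get_attribute(
        drv.function_attribute.MAX_THREADS_PER_BLOCK)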
Thanks for any help. I posted a thread on the NVidia boards:
http://forums.nvidia.com/index.php?showtopic=206261&st=0
But got no responses. If the answer is "you can't ask the system for
that information" that would be nice to know too (sort of... ;).
Thanks,
Eli
On Tue, 12 Jul 2011 19:06:52 -0700, Eli Stevens (Gmail) wrote:
> Followup:
>
> What is class pycuda.driver.Memcpy3D.src_height supposed to be set
> to?
> I can't seem to find a valid value.
See test_3d_texture() in test/test_wrapper.py.
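From memory, the relevant part looks roughly like the following (check the test itself for the authoritative version); src_height is the row count of the source allocation, which for a tightly packed host array is simply its height:

# "a" is a contiguous float32 host array of shape (d, h, w),
# "ary" a matching 3D pycuda.driver.Array.
copy = drv.Memcpy3D()
copy.set_src_host(a)
copy.set_dst_array(ary)
copy.width_in_bytes = copy.src_pitch = a.strides[1]
copy.src_height = copy.height = h
copy.depth = d
copy()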
Andreas
Hi,
I'm using pycuda on Arch Linux with CUDA 4 and have had problems with the
gcc 4.4 requirement (Arch uses gcc 4.6 as the default and provides
gcc 4.4 as a separate package).
I am aware of many possible solutions (which all work for me), but it
would be quite nice to change the gcc that nvcc uses via 'nvcc
-ccbin=/usr/bin/gcc-4.4' etc. You can obviously pass this to the compile
function in the wrapper, but it would be nice to be able to add some
options at install time (via configure.py etc.) so a PKGBUILD can
configure pycuda to 'work out of the box'.
I'm not sure whether this is already possible, or whether it would be feasible?
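The closest thing I've found so far is the pycuda.compiler.DEFAULT_NVCC_FLAGS list added in 2011.1, which I believe is used whenever no explicit options are passed to the compiler (the exact semantics are worth checking in compiler.py). Something like this at the top of a program seems like it should work, though it still isn't an install-time setting:

import pycuda.compiler
# Ask nvcc to use gcc-4.4 for every compilation in this process.
pycuda.compiler.DEFAULT_NVCC_FLAGS.extend(['-ccbin', '/usr/bin/gcc-4.4'])

import pycuda.autoinit
from pycuda.compiler import SourceModule
mod = SourceModule("__global__ void noop() { }")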
regards,
Ben Aylott
On Mon, 25 Jul 2011 17:42:10 -0400, Anthony LaTorre <tlatorre9(a)gmail.com> wrote:
> As of CUDA 4.0 it is possible to share a CUDA context across threads; is it
> also possible to do this in the latest release of pyCUDA?
If this is possible with the old-style push/pop API, then yes, otherwise
no. Since I want PyCUDA to remain backward-compatible for a few releases
of CUDA, I haven't switched it to CUDA 4's new context management.
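For reference, the old-style migration pattern looks roughly like this (an untested sketch: create the context in one thread, pop it, then push it in whichever thread needs it):

import threading
import numpy as np
import pycuda.driver as drv

drv.init()
ctx = drv.Device(0).make_context()  # current in the main thread
ctx.pop()                           # detach so another thread can use it

def worker():
    ctx.push()                      # make the shared context current here
    a_gpu = drv.mem_alloc(4*4)
    drv.memcpy_htod(a_gpu, np.zeros(4, dtype=np.float32))
    ctx.pop()                       # detach again before the thread exits

t = threading.Thread(target=worker)
t.start()
t.join()

ctx.push()                          # reattach before final cleanup
ctx.detach()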
HTH,
Andreas