I've just released PyCUDA version 2011.1. This is, once again, a rather
big release. A detailed list of changes is below. In the same spirit as
PyOpenCL (whose 2011.1 release happened yesterday), I'll try to move to
smaller, more frequent releases in the future.
Have fun, and let me know if there are any issues.
Detailed list of changes in 2011.1:
* Add support for CUDA 3.0-style OpenGL interop. (thanks to Tomasz Rybak)
* Add pycuda.driver.Stream.wait_for_event().
* Add range and slice keyword arguments to pycuda.elementwise.ElementwiseKernel.__call__().
* Document preamble constructor keyword argument to pycuda.elementwise.ElementwiseKernel.
* Add vector types, see pycuda.gpuarray.vec.
* Add pycuda.scan.
* Add support for new features in CUDA 4.0.
* Add pycuda.gpuarray.GPUArray.strides, pycuda.gpuarray.GPUArray.flags. Allow the creation of arrays in C and Fortran order.
* Adopt stateless launch interface from CUDA, deprecate old one.
* Add CURAND wrapper. (with work by Tomasz Rybak)
* Add pycuda.compiler.DEFAULT_NVCC_FLAGS.
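Regarding the new vector types: as far as I understand, pycuda.gpuarray.vec exposes NumPy dtypes whose layout matches CUDA's built-in vector types. A plain-NumPy sketch of what such a dtype looks like (the x/y/z/w field names are an assumption mirroring CUDA's float4, not necessarily PyCUDA's exact implementation):

```python
import numpy as np

# Structured dtype laid out like CUDA's built-in float4
# (an analogue of what pycuda.gpuarray.vec.float4 provides).
float4 = np.dtype([("x", np.float32), ("y", np.float32),
                   ("z", np.float32), ("w", np.float32)])

a = np.zeros(3, dtype=float4)
a["x"] = [1.0, 2.0, 3.0]      # component-wise access by field name

print(a.itemsize)  # 16 -- matches sizeof(float4) on the device
print(a["x"][1])   # 2.0
```

An array with this dtype can be filled on the host and passed to a kernel that takes a float4* argument, since the byte layout matches.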
I am implementing an iterative algorithm in PyCUDA.
Inside the while loop I need to do a reduction. I have implemented it with a
gpuarray and it works nicely, but I think that using a gpuarray inside a
loop is a bad idea because of the allocation and deallocation overhead.
So I was thinking that I could use the reduction example from the SDK.
The problem is that I need only the first value of the reduced array
returned to the host.
I couldn't find anywhere in the documentation how to do that.
Thank you in advance.
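The allocation-reuse idea behind this question can be illustrated host-side with NumPy alone (the function name and numbers below are made up for illustration): allocate the one-element reduction result once, outside the loop, instead of creating a fresh array on every iteration. With PyCUDA the analogous pattern would keep a persistent GPU scratch buffer alive across iterations and copy only its first element back to the host.

```python
import numpy as np

def shrink_until(data, tol):
    """Halve `data` until its sum drops below `tol`; return iteration count."""
    total = np.zeros((), dtype=data.dtype)  # allocated once, reused below
    n = 0
    while True:
        np.sum(data, out=total)             # reduction writes into `total`
        if total < tol:
            return n
        data = data / 2
        n += 1

print(shrink_until(np.ones(4), 1.0))  # 3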
Got it to work finally; I had a stupid mistake in it, and that is why it
2011/11/22 Apostolis Glenis <apostglen46(a)gmail.com>
> I am trying to use matrix transpose as part of a project.
> The same code that I used to rotate an image (and it worked) doesn't work
> on a matrix.
> The code is pretty much taken from the SDK.
> Here is a self-contained file that doesn't work on my system.
> Thank you in advance.
There's a bug in python (http://bugs.python.org/issue3905) that affects PyTools when PyCUDA creates a subprocess to run nvcc. Fortunately this is only the case for Windows machines, and specifically when PyCUDA is run in a UI application (shell) which itself is run from the console; something that's probably not very common. Unfortunately for me, that is _exactly_ the scenario for the IDE/shell that we use where I work (Windows XP, Python 2.6.5).
Although the python bug remains open, there's a simple workaround which involves passing subprocess.PIPE as the argument to _all_ the std* keywords when using subprocess.Popen in prefork.py. I've attached a patch, but let me know if this is not the correct forum to do so, or if there's an alternative process (e.g. a git pull request).
One simple way of reproducing this is to use IDLE:
1. From the Windows console run 'pythonw idlelib\idle.pyw -n', then in IDLE:
>>> # This fails
>>> from subprocess import Popen, PIPE
>>> Popen(['nvcc', '--version'], stdout=PIPE)
>>> # This works
>>> Popen(['nvcc', '--version'], stdin=PIPE, stdout=PIPE, stderr=PIPE)
2. Or run hello_gpu.py in that IDLE window
You should get an error like the following:
File "C:\home\dbo\argh\ext\win32_vc9\lib\python2.6\subprocess.py", line 773, in _make_inheritable
WindowsError: [Error 6] The handle is invalid
Hope that helps, and it would be good to get this into the official PyTools repo/distribution so that we don't have to maintain a patched version here! :)
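The workaround described above can be sketched as a small helper that always passes PIPE for all three standard handles (the helper name `run_quiet` is illustrative, not the actual code in prefork.py):

```python
import subprocess
import sys

def run_quiet(cmd):
    """Launch `cmd` with *all* standard handles redirected to pipes.

    Passing PIPE for stdin, stdout and stderr together is the workaround
    for http://bugs.python.org/issue3905: under pythonw on Windows,
    leaving any std* handle unredirected can make _make_inheritable
    fail with "The handle is invalid".
    """
    p = subprocess.Popen(cmd, stdin=subprocess.PIPE,
                         stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    out, err = p.communicate()
    return p.returncode, out

rc, out = run_quiet([sys.executable, "-c", "print('ok')"])
print(rc, out)
```

On platforms without the bug this behaves identically, so always supplying all three keywords costs nothing.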
Risk Technology, BoAML
On Sun, 20 Nov 2011 20:45:53 +1100, <Xiaodong.Song(a)csiro.au> wrote:
> I have a forest growth model written in Python. Now I am doing a
> time-consuming simulation using this model (i.e. I run many instances
> of the model, and each instance takes a long time). I am wondering
> whether it is possible to call this model from CUDA, in a parallel
> computing fashion.
> In short, my intention is to call this model directly from CUDA without
> altering it; is that possible?
1) Please respect my request of having such questions asked on the list
and not of me privately. The list is cc'd now.
2) Even a quick glance at the documentation would have revealed that,
no, this is not possible--unless your code fulfills some *very* specific
conditions. But do look at Copperhead by Catanzaro/Garland, which
compiles a much wider class of codes from Python to CUDA.
Sorry for the noise. I had cast my parameter correctly, but then I made a
division afterwards, so the type changed. For those who looked at the code,
this is the line that should be in it:
args += [numpy.intc(i / dtype.itemsize) for i in gpu_val.strides]
rather than:
args += [numpy.intc(i) / dtype.itemsize for i in gpu_val.strides]
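The difference between the two lines can be checked with NumPy alone: dividing after the cast promotes the value back to float64, which would hand the kernel a wrongly-typed argument (the stride and itemsize values below are made up for illustration):

```python
import numpy

itemsize = 8                 # e.g. a float64 array
strides = (64, 8)            # illustrative values only

good = [numpy.intc(i / itemsize) for i in strides]   # divide, then cast
bad  = [numpy.intc(i) / itemsize for i in strides]   # cast, then divide

print(good[0].dtype)  # intc (int32): stays an integer, as the kernel expects
print(bad[0].dtype)   # float64: the true division silently promoted the type
```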
2011/11/18 Frédéric Bastien <nouiz(a)nouiz.org>:
> I have a small example that, when run, crashes with this error:
>     fct(*args, **d)
>   File "/u/bastienf/repos/pycuda.git/build.fc9/lib.linux-x86_64-2.5/pycuda/driver.py",
> line 187, in function_call
> LaunchError: cuLaunchGrid failed: launch out of resources
> The problem is that I use only 1 thread per block and only 1 block.
> The code in the gpu function is very simple: "Z = 0;". I think I
> pass the parameters correctly when calling the gpu function. Does
> anyone have any idea what could be wrong? I attach the example of the
> Frédéric Bastien
On Thu, 17 Nov 2011 22:44:47 +1300, Igor <rychphd(a)gmail.com> wrote:
> How did you decide about the Modified Reduction.py that Ryan published
> here: http://lists.tiker.net/pipermail/pycuda/2011-June/003196.html
> It promises to do exactly what I want, yet I'm not sure it was
> committed or can still be applied as a patch to the more recent
> release of PyCUDA's ReductionKernel. I tried anyway and got a
> compilation error from nvcc, identifier "x" is undefined, for this kernel:
> maxloc = ReductionKernel(dtype_out=numpy.int32, neutral="0",
> reduce_expr="(x[(int)a] > x[(int)b]) ? (int)a : (int)b", map_expr="i",
> arguments="float *x")
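The intent of the quoted maxloc kernel, reducing over *indices* while comparing the values they point at, can be mirrored host-side with NumPy and functools.reduce. This does not fix the nvcc scoping error, but it shows what the reduction is meant to compute (the sample data is made up):

```python
from functools import reduce
import numpy

x = numpy.array([3.0, 9.0, 1.0, 7.0], dtype=numpy.float32)

# Host-side analogue of reduce_expr="(x[a] > x[b]) ? a : b" with
# map_expr="i": the reduction runs over indices and keeps the index
# whose value is larger.
maxloc = reduce(lambda a, b: a if x[a] > x[b] else b, range(len(x)))
print(maxloc)  # 1
```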
What I'd like to do is to allow registration of custom dtypes so that
dtype_to_ctype knows what C type names to spit out. This shouldn't be
much work and will have the benefit of working across all PyCUDA
routines. I'll try to give it a shot tonight.
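A minimal sketch of what such a registration hook might look like; the names `register_dtype` and `dtype_to_ctype` here are assumptions modelled on the plan in the message above, not necessarily PyCUDA's eventual API:

```python
import numpy

# Mapping from NumPy dtypes to C type names, seeded with a few builtins.
_DTYPE_TO_CTYPE = {
    numpy.dtype(numpy.int32): "int",
    numpy.dtype(numpy.float32): "float",
    numpy.dtype(numpy.float64): "double",
}

def register_dtype(dtype, ctype_name):
    """Teach the code generator a C type name for a custom dtype."""
    _DTYPE_TO_CTYPE[numpy.dtype(dtype)] = ctype_name

def dtype_to_ctype(dtype):
    """Return the registered C type name for `dtype`, or raise."""
    try:
        return _DTYPE_TO_CTYPE[numpy.dtype(dtype)]
    except KeyError:
        raise ValueError("unregistered dtype: %r" % (dtype,))

# A custom struct dtype can then be given a C-side name:
pair = numpy.dtype([("first", numpy.float32), ("second", numpy.float32)])
register_dtype(pair, "float2_pair")
print(dtype_to_ctype(pair))           # float2_pair
print(dtype_to_ctype(numpy.float32))  # float
```

With something like this, ReductionKernel and friends could emit correct C declarations for user-defined struct dtypes instead of failing on them.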