Hi everybody,
I am new to PyCUDA. I just installed everything on Windows XP and, judging from the installation log, I think I did it properly. However, when I try to run the test files provided with PyCUDA, I get this error:
Traceback (most recent call last):
  File "C:\PyCuda\test\test_gpuarray.py", line 2, in <module>
    import pycuda.autoinit
  File "C:\PyCuda\pycuda\autoinit.py", line 1, in <module>
    import pycuda.driver as cuda
  File "C:\PyCuda\pycuda\driver.py", line 1, in <module>
    from _driver import *
ImportError: No module named _driver
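For what it's worth, I'm wondering whether it matters that I'm running the tests from inside the source tree; this is what I've been using to check which copy of pycuda Python actually imports (I'm guessing the compiled _driver extension should sit next to it):

    import pycuda
    print pycuda.__file__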
How can I solve it?
Thanks, and sorry for the newbieness of this post.
den3b
I think I may be running into a memory leak using GPUArray. I have a
function using GPUArrays that works stably on single calls. If I loop over
this function from another Python script like this:
for i in xrange(m):
    do_some_gpuarray_stuff()
I can watch the memory pointers of the GPUArrays increase until I get a
launch error, presumably due to lack of memory. That is, I need GPU memory
to be freed on exit from do_some_gpuarray_stuff() so that I can repeat the
same GPU calculation many times on new data sets.
Can I manually free GPUArray instances? If not, can I somehow manually
remove all of PyCUDA's allocations from memory? Something like:
for i in xrange(m):
    do_some_gpuarray_stuff()
    de_init_pycuda_mem()
I could not find this in the docs, and I understand everything is supposed
to be handled automagically by PyCUDA, but freeing manually would be an
easy confirmation of, and workaround for, my problem. I know this can be
done completely manually with pycuda.driver, but gpuarray is already
working nicely and cleanly, except for this leak. Any input from the
experts would be much appreciated.
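In case it helps, here is a minimal sketch of the explicit freeing I have in mind; I'm assuming a GPUArray's gpudata attribute is a DeviceAllocation whose free() releases the device memory right away instead of waiting for garbage collection:

    import pycuda.autoinit
    import pycuda.gpuarray as gpuarray
    import numpy

    m = 1000
    for i in xrange(m):
        a_gpu = gpuarray.to_gpu(numpy.random.randn(1024).astype(numpy.float32))
        b_gpu = 2 * a_gpu        # stand-in for do_some_gpuarray_stuff()
        result = b_gpu.get()
        a_gpu.gpudata.free()     # release device memory immediately
        b_gpu.gpudata.free()
        del a_gpu, b_gpu         # drop the references so nothing touches freed memory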
Thanks much :)
Garrett Wright
I would like to create different views of a GPUArray, but I have no idea how
to do it.
Let A be a 2D float32 C-contiguous array. How do I create a 1D view of one
of its rows? I am not sure how to set gpudata in the view; an example would
be great. Is there a pointer-arithmetic trick?
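In case it helps to see what I mean, here is the pointer-arithmetic trick I've been toying with; I'm assuming the GPUArray constructor accepts a ready-made device pointer through its gpudata argument, and that the parent array must be kept alive by hand, since the view wouldn't hold a reference to it:

    import pycuda.autoinit
    import pycuda.gpuarray as gpuarray
    import numpy

    A = gpuarray.to_gpu(numpy.arange(12, dtype=numpy.float32).reshape(3, 4))
    row = 1
    row_bytes = A.shape[1] * A.dtype.itemsize  # stride of one row in C order
    row_ptr = int(A.gpudata) + row * row_bytes
    row_view = gpuarray.GPUArray((A.shape[1],), A.dtype, gpudata=row_ptr)
    print row_view.get()  # expect [ 4.  5.  6.  7.] while A stays alive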
Is it possible to create a float32 view of a complex64 array? I am trying to
use gpuarray.multi_take_put to copy slices between float and complex arrays,
but it requires the dtypes to match.
Also, is using gpuarray.multi_take_put the best way to copy a slice of a
linear array?
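For the slice copy itself, the fallback I've been considering is a raw device-to-device copy with explicit byte offsets; since memcpy_dtod works on bytes, I'd assume the float/complex dtype mismatch stops mattering. A sketch under that assumption, for contiguous slices:

    import pycuda.autoinit
    import pycuda.driver as drv
    import pycuda.gpuarray as gpuarray
    import numpy

    src = gpuarray.to_gpu(numpy.arange(8, dtype=numpy.float32))
    dst = gpuarray.zeros(4, numpy.float32)
    # copy src[2:6] into dst[0:4]: offset the source pointer by two elements
    item = src.dtype.itemsize
    drv.memcpy_dtod(dst.gpudata, int(src.gpudata) + 2 * item, 4 * item)
    print dst.get()  # expect [ 2.  3.  4.  5.]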
Thanks,
Amir.
Hi Bryan, thanks for the quick reply. I understand the issue now.
So it looks like devices with compute capability below 1.3 lack support
for double precision (from section 5.1.1.1 of the NVIDIA CUDA
Programming Guide).
Does this seem like a reasonable way of checking for double-precision
support in the test script? E.g.:
import pycuda.driver as cuda

cuda.init()
dev = cuda.Device(0)
if dev.compute_capability() < (1, 3):
    pass  # no double precision support, don't run those tests...
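To be concrete, in the test script I'd probably prune the dtype list rather than skip whole files; a sketch, assuming the tests can be parameterized over dtypes:

    import numpy
    dtypes = [numpy.float32]
    if dev.compute_capability() >= (1, 3):
        dtypes.append(numpy.float64)  # exercise float64 only where supported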
On Fri, Apr 23, 2010 at 9:07 PM, Bryan Catanzaro <bryan.catanzaro(a)gmail.com> wrote:
> The GPU in your MacBook doesn't support double precision, which is why
> these tests are failing.
> You're of course welcome to change the tests so that they check for double
> precision support before running them - that would probably help out others
> who end up in this situation.
>
> - bryan
>
> On Apr 23, 2010, at 5:41 PM, Chethan Pandarinath <chethan.pandarinath(a)gmail.com> wrote:
>
> [original message quoted in full; it appears as the next post below]
--
Chethan Pandarinath
chethan.pandarinath(a)gmail.com
Hi everybody,
I've been working on installing PyCUDA on my MacBook (Snow Leopard). I
think I'm close to having it working; it's been quite a long road of getting
compatible versions of boost, setuptools, pytools... anyway, I think I'm
past all that.
I can now successfully run test_math.py and test_driver.py with no errors.
I am having trouble, however, with test_gpuarray.py.
I'm attaching the output from running the test script, with the standard
error appended as well:
$ arch -i386 python test_gpuarray.py >test_gpuarray_output.txt
There are a couple of types of errors that I can see here:
E   AssertionError: (array(-4.2129004090721698e+36), -1.2509687918788644e+303, <type 'numpy.float64'>, 'min')
test_gpuarray.py:328: AssertionError

E   assert array(-6.0786212321272663e+144) == -2.6357594520767543e+301
test_gpuarray.py:357: AssertionError

E   LaunchError: cuCtxPopCurrent failed: launch failed
/Library/Python/2.6/site-packages/pycuda/tools.py:504: LaunchError

E   RuntimeError: make_default_context() wasn't able to create a context on any of the 1 detected devices
/Library/Python/2.6/site-packages/pycuda/tools.py:216: RuntimeError
To tell you the truth, I don't know how to begin debugging this. If anyone
can point me in the right direction, I'd greatly appreciate it. I'm willing
to provide any info that would help.
I'm running this on a MacBook with Snow Leopard (10.6.3) and an NVIDIA
GeForce 9400M, using Python 2.6.1.
Thanks.
Chethan
--
Chethan Pandarinath
chethan.pandarinath(a)gmail.com
Hi all,
PyCUDA's present release version (0.93) is starting to show its age, so
I've just rolled a release candidate for 0.94 after tying up a few loose
ends, such as complete CUDA 3.0 support.
Please help make sure 0.94 is solid. Go to
http://pypi.python.org/pypi/pycuda/0.94rc
to download the package, see if it works for you, and report back.
The change log for 0.94 is here:
http://documen.tician.de/pycuda/misc.html#version-0-94
but the big-ticket things in this release are:
- Support for CUDA 3.0
- Sparse matrices
- Complex numbers
Let's make this another rockin' release!
Thanks very much for your help,
Andreas
So I've been spending the last several hours trying to install PyCUDA on
Snow Leopard. Initially, seeing that CUDA 3.0 is 64-bit on OS X, I attempted
to install everything 64-bit (boost, python, pycuda, etc.). I got as far as
running test_driver.py, which froze my machine completely and required a
hard reboot. There was no error message or anything (not even the ones
others have been reporting).
OK, take 2: try to install everything 32-bit. I got boost compiled as
32-bit and did a fresh install of Sage 32-bit for Python. PyCUDA 0.93,
after patching out setuptools in favor of distutils, builds, and
test_driver.py runs to success, with the following strange warnings:
/Users/cyrus/sage/local/lib/python2.6/site-packages/pycuda-0.93-py2.6-macosx-10.5-x86_64.egg/pycuda/compiler.py:11: UserWarning: call_capture_stdout is deprecated: use call_capture_output instead
  return call_capture_stdout([nvcc, "--version"])
/Users/cyrus/sage/local/lib/python2.6/site-packages/pycuda-0.93-py2.6-macosx-10.5-x86_64.egg/pycuda/compiler.py:192: UserWarning: Reading 'lmem' from cubin failed--SourceModule metadata may be unavailable.
  warn("Reading '%s' from cubin failed--SourceModule metadata may be unavailable." % key)
/Users/cyrus/sage/local/lib/python2.6/site-packages/pycuda-0.93-py2.6-macosx-10.5-x86_64.egg/pycuda/compiler.py:192: UserWarning: Reading 'smem' from cubin failed--SourceModule metadata may be unavailable.
  warn("Reading '%s' from cubin failed--SourceModule metadata may be unavailable." % key)
/Users/cyrus/sage/local/lib/python2.6/site-packages/pycuda-0.93-py2.6-macosx-10.5-x86_64.egg/pycuda/compiler.py:192: UserWarning: Reading 'reg' from cubin failed--SourceModule metadata may be unavailable.
  warn("Reading '%s' from cubin failed--SourceModule metadata may be unavailable." % key)
This seems to mean that mod.smem and friends return None instead of the
proper values. The kernels still seem to work, however, so this is vaguely
acceptable for the moment.
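As a possible workaround for the missing metadata, I've been meaning to query the function attributes directly instead of going through the SourceModule fields; I'm assuming these attributes exist in this version and come from the driver (cuFuncGetAttribute) rather than from parsing the cubin text:

    import pycuda.autoinit
    from pycuda.compiler import SourceModule

    mod = SourceModule("""
    __global__ void twice(float *a) { a[threadIdx.x] *= 2.0f; }
    """)
    func = mod.get_function("twice")
    # register, shared, and local memory usage, straight from the driver
    print func.num_regs, func.shared_size_bytes, func.local_size_bytes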
Trying to install PyCUDA 0.94rc (from PyPI) also gets through make, but many
of the tests now fail with errors like this, and nothing works:
E   ImportError: dlopen(/Users/cyrus/sage/local/lib/python2.6/site-packages/pycuda-0.94rc-py2.6-macosx-10.5-x86_64.egg/pycuda/_pvt_struct.so, 2): no suitable image found. Did find:
E   /Users/cyrus/sage/local/lib/python2.6/site-packages/pycuda-0.94rc-py2.6-macosx-10.5-x86_64.egg/pycuda/_pvt_struct.so: mach-o, but wrong architecture
I have no idea why both 0.93 and 0.94 name the install directory
macosx-10.5-x86_64 instead of 10.6-i386.
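Before digging further, I've been sanity-checking what the interpreter itself is running as (a quick diagnostic, nothing PyCUDA-specific):

    import platform, struct
    # the pointer width gives 32 vs. 64 bit, whatever the fat binary claims
    print platform.machine(), struct.calcsize("P") * 8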
The latest git version has a strange issue during make install: the first
time I run it, it ends with an error about configure.py not having been run
(even though it had been), and running it again goes to completion. The
test_driver.py errors are the same as with 0.94rc.
For reference, PyOpenCL works fine in 64-bit mode on this machine.
Any ideas?
Cyrus
Dear NYC Pythonistas,
We at the Modi Research Group of the Earth Institute at Columbia
University are hiring another 100% Python commando to join our team of
consultants. The Modi Research Group plans energy services and
infrastructure in developing countries.
This is a full-time position with benefits, based on the university
campus. Please apply through the Columbia University website at
http://jobs.columbia.edu/applicants/Central?quickFind=118575
We will be optimizing, web-enabling, and scaling our remote sensing
system for finding houses in satellite images, and integrating it with
our existing web-based infrastructure planning system. Both systems
are written almost entirely in Python. The web framework is built
on Pylons using jQuery and OpenLayers, scaled locally using RabbitMQ
and AMQP, but we are considering Amazon EC2 for the public deployment.
Despite the serious nature of our work and the hectic pace of the lab
around us, our software team maintains a fun and relaxed environment
to maximize creativity and learning. If you are a skilled and
disciplined software engineer, you will appreciate the level of
creativity that this position requires, as well as the freedom to
incorporate cutting-edge techniques to solve challenging engineering
and mathematical problems. Our software buildout cycle is rapid, and
you will see your work deployed quickly, depending on the quality of
your code. Though we work as a team, you will largely be expected to
be self-motivated, self-directed, and able to find and fix problems
with minimal supervision. Be warned: you must love mathematics as much
as you love writing code.
If you are not a citizen, be sure to include the status of your visa.
The people in the lab are diverse in technical expertise and
background. You can read more about us and our work at
http://modi.mech.columbia.edu
Thank you for considering a position with us.
You're welcome to forward this email to anyone who is qualified and interested.
The full job description is below.
Job Title: GIS Systems Optimizer
Job Requisition Number: 058260
Department: 664-EARTH INSTITUTE
Location: Morningside
Job Type: Officer Full-Time Regular
Job Family: Engineering
Salary Grade: 12
Salary Range: Commensurate with experience
Advertised Summary Job Description
A GIS Systems Optimizer is needed to document, optimize and deploy a
remote sensing system on satellite images from twelve sites of the
Millennium Villages Project, as well as prepare papers and
presentations on the system. To document the performance of the
system, the incumbent will design experiments and run the system on
satellite images from twelve Millennium Village sites. To optimize the
system to scan images faster, s/he will lead a team of developers in
rewriting the image recognition code to run on an NVIDIA GPU (graphics
processing unit) instead of the standard CPU, a step that has been
proven in industry and the literature to accelerate image recognition
by several orders of magnitude. To make the system more robust, s/he
will design unit tests to ensure that future enhancements to the
system will not break existing features. To make the system
accessible, s/he will work with our developers on web-enabling and
deploying the system on the Amazon EC2 cloud computing platform. The
publishing component of the position will consist of submitting papers
to journals and giving demonstrations at conferences. The incumbent
will also train individuals on how to use and improve the system,
record screencasts of the system in use, and apply for grants to fund
the next stage of the project.
Minimum Qualifications for Grade
Applicant MUST meet these minimum qualifications to be considered an
applicant: Bachelor's degree required, preferably in engineering. A
minimum of 3 years of related experience in research or industry is
required.
Additional Position-Specific Minimum Qualifications
Applicant MUST meet these minimum qualifications to be considered an
applicant: Experience in building a Python-based software system is
essential, as is prior work with GIS data and satellite imagery.
Proven attention to detail and the ability to prioritize and manage
multiple projects simultaneously are a must. Excellent oral and written
communication skills are required. Must be able to work in a team and
be willing to learn new technologies quickly.
Preferred Qualifications
Preference will be given to candidates who possess a master's degree
in computer science, mathematics, or electrical engineering and who
have experience with image recognition and cloud deployment.
Requisition Open Date: 04-16-2010
Quick Link: jobs.columbia.edu/applicants/Central?quickFind=118575
EEO Statement
Columbia University is an Equal Opportunity/Affirmative Action employer.
I find myself out of my depth again. I'm playing with complex numbers
using 0.94rc (on Windows XP with CUDA 2.3). I've successfully used
simple operations (addition, multiplication) on complex numbers, which
resulted in the Mandelbrot example in the wiki.
Now I'm trying sin and log, and I'm getting errors and one positive
result (see the end). I'm not sure whether these functions are supported
yet. If anyone can point me in the right direction (and perhaps
suggest what needs implementing), then I'll have a go at fixing the
problem.
For reference, you can do the following using numpy on the CPU for verification:
In [117]: numpy.sin(numpy.array(numpy.complex64(1-1j)))
Out[117]: (1.2984576-0.63496387j)
Here are two pieces of test code. At first I used multiplication
(rather than sin()) to confirm that the code ran as expected.
Next I tried creating a simple complex64 array, passing it to the GPU
and then asking for pycuda.cumath.sin(); this results in "Error:
External calls are not supported". Here's the code and the error:
========
import pycuda.driver as drv
import pycuda.autoinit
import pycuda.gpuarray as gpuarray
import pycuda.cumath
import numpy
a = numpy.array(numpy.complex64(1-1j)) # a 0-d array holding one complex element
a_gpu = gpuarray.to_gpu(a)
pycuda.cumath.sin(a_gpu) # should produce (1.29-0.63j)
print a_gpu.get()
========
In [62]: %run complex_test.py
kernel.cu
tmpxft_00000840_00000000-3_kernel.cudafe1.gpu
tmpxft_00000840_00000000-8_kernel.cudafe2.gpu
./kernel.cu(19): Error: External calls are not supported (found non-inlined call to _ZN6pycuda3sinERKNS_7complexIfEE)
---------------------------------------------------------------------------
CompileError Traceback (most recent call last)
16 a = numpy.array(numpy.complex64(1-1j)) # make list of 1 complex element
17 a_gpu = gpuarray.to_gpu(a)
---> 18 pycuda.cumath.sin(a_gpu) # should produce (1.29-0.63j)
========
I replaced:
pycuda.cumath.sin(a_gpu) # should produce (1.29-0.63j)
with:
pycuda.cumath.log(a_gpu) # should produce (0.34-0.78j)
and the code ran without an error, but it produced the wrong result: it
prints (1-1j), which looks like a no-op.
Next I tried similar functionality using sin() in an ElementwiseKernel;
this generates the same "Error: External calls are not supported"
problem:
========
import pycuda.driver as drv
import pycuda.autoinit
import pycuda.gpuarray as gpuarray
import pycuda.cumath
import numpy
from pycuda.elementwise import ElementwiseKernel
complex_gpu = ElementwiseKernel(
        "pycuda::complex<float> *z",
        "z[i] = sin(z[i]);",
        "complex_fn",
        "#include <pycuda-complex.hpp>,"
)
a = numpy.array(numpy.complex64(1-1j)) # a 0-d array holding one complex element
a_gpu = gpuarray.to_gpu(a)
#sin((1-1j)) should produce (1.29-0.63j)
#log((1-1j)) should produce (0.34-0.78j)
complex_gpu(a_gpu)
print a_gpu.get()
========
In [56]: %run complex_test.py
*** compiler output in c:\docume~1\parc\locals~1\temp\tmpjqvxpv
kernel.cu
kernel.cudafe1.gpu
kernel.cudafe2.gpu
./kernel.cu(19): Error: External calls are not supported (found non-inlined call to _ZN6pycuda3sinERKNS_7complexIfEE)
---------------------------------------------------------------------------
CompileError Traceback (most recent call last)
C:\Panalytical\pycuda_git\pycuda0.94\pycuda\examples\complex_test.py
in <module>()
26 "z[i] = sin(z[i]);",
27 "complex_fn",
---> 28 "#include <pycuda-complex.hpp>,"
29 )
....
CompileError: nvcc compilation of c:\docume~1\parc\locals~1\temp\tmpjqvxpv\kernel.cu failed
[command: nvcc --cubin -arch sm_11 -IC:\Python26\lib\site-packages\pycuda-0.94rc-py2.6-win32.egg\pycuda\..\include\pycuda --keep kernel.cu]
WARNING: Failure executing file: <complex_test.py>
========
*However*, if I replace sin() with log():
from pycuda.elementwise import ElementwiseKernel
complex_gpu = ElementwiseKernel(
        "pycuda::complex<float> *z",
        "z[i] = log(z[i]);",
        "complex_fn",
        "#include <pycuda-complex.hpp>,"
)
then I get the correct result!
========
In [126]: %run complex_test.py
(0.346573650837-0.785398185253j)
========
Does anyone know why sin() doesn't work in either case, and why log()
works in an ElementwiseKernel but not correctly as a pycuda.cumath call?
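One more data point I plan to gather: passing the include via the preamble keyword instead of positionally, since I'm not actually sure the fourth positional argument of ElementwiseKernel is the preamble (an assumption on my part; I've also dropped what looks like a stray comma inside the include string):
========
from pycuda.elementwise import ElementwiseKernel
complex_gpu = ElementwiseKernel(
        "pycuda::complex<float> *z",
        "z[i] = sin(z[i]);",
        "complex_fn",
        preamble="#include <pycuda-complex.hpp>")
========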
Ian.
--
Ian Ozsvald (A.I. researcher, screencaster)
ian(a)IanOzsvald.com
http://IanOzsvald.com
http://morconsulting.com/
http://TheScreencastingHandbook.com
http://ProCasts.co.uk/examples.html
http://twitter.com/ianozsvald
Hi, Bryan --- Thanks for the pointer. Just to make sure I understand what's
going on: from the CUDA documentation, the correct sequence seems to be:
(1) Initialize the runtime, which creates a context.
(2) Attach PyCUDA to this context from the runtime.
(3) Run kernel code
(4) Detach
Is this right? If so, is there something about PyCUDA that makes this
difficult to do, or is it just a matter of setting up and tearing down
the driver instance in the right order? (Another thing I am not totally
clear on is what PyCUDA's lifecycle for managing contexts is. Does it leave
a single context up for as long as the Python process runs, or does it tear
contexts down as soon as there aren't any references to GPUArray instances?)
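For concreteness, here is my reading of your suggestion below as code; it just mirrors what pycuda.autoinit does, except that it registers detach instead of pop (quite possibly I have this wrong):

    import atexit
    import pycuda.driver as cuda

    cuda.init()
    context = cuda.Device(0).make_context()
    # per the suggested workaround: detach at exit instead of popping
    atexit.register(context.detach)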
I admit to being something of a novice with PyCUDA and CUDA generally, so
sorry for pestering the list.
^L
On Sun, Apr 18, 2010 at 10:17 AM, Bryan Catanzaro <bryan.catanzaro(a)gmail.com> wrote:
> It's not a solution, but the workaround I've been using is to call
> context.detach() rather than context.pop() at the end of the
> computation. If you look at pycuda.autoinit, you can see what needs
> to be done to initialize a CUDA context. Do the same thing manually,
> just change the function you register with atexit to be
> context.detach, and the errors should go away.
>
> This is just a workaround, though, and doesn't solve the underlying
> problem.
>
> - bryan
>
> On Apr 18, 2010, at 6:09 AM, Louis Theran <theran(a)temple.edu> wrote:
>
> >
> >> I'm trying to mix some ctypes wrapped CUDA runtime C libraries with
> >> PyCUDA.
> >>
> >> Even using CUDA 3.0, I am getting errors because PyCUDA seems to be
> >> trying to push/pop the current context.
> >>
> >> From the docs, it's not quite clear what I have to do in this
> >> regard. Any help from those of you successfully doing this would be
> >> much appreciated.
> >>
> >> Thanks!
> >>
> >> ^L