Hi everybody,
I am new to PyCUDA. I just installed everything on Windows XP and, judging from the installation log, I think I did it properly. However, when I try to run the test files provided with PyCUDA, I get this error:
Traceback (most recent call last):
File "C:\PyCuda\test\test_gpuarray.py", line 2, in <module>
import pycuda.autoinit
File "C:\PyCuda\pycuda\autoinit.py", line 1, in <module>
import pycuda.driver as cuda
File "C:\PyCuda\pycuda\driver.py", line 1, in <module>
from _driver import *
ImportError: No module named _driver
How can I solve it?
Thanks, and sorry for the newbieness of this post.
den3b
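For anyone hitting the same error: _driver is PyCUDA's compiled extension module, so this ImportError typically means the build step never produced (or never installed) _driver.pyd. A small, GPU-free way to check whether a compiled submodule is locatable at all is sketched below; it uses modern importlib, so take it as the idea rather than the exact API available on the Python of that era.

```python
import importlib.util

def extension_present(package, ext_name):
    """Return True if package.ext_name can be located on the import path."""
    spec = importlib.util.find_spec(package + "." + ext_name)
    return spec is not None

# For the report above one would check, e.g.:
#   extension_present("pycuda", "_driver")
# False means no _driver.pyd / _driver.so was installed where Python looks.
```

If the check comes back False, the fix is on the build/install side (rerun the build and watch for compiler errors), not in the Python code.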
Hi all,
PyCUDA's present release version (0.93) is starting to show its age, and
so I've just rolled a release candidate for 0.94 after tying up a few
loose ends, such as complete CUDA 3.0 support.
Please help make sure 0.94 is solid. Go to
http://pypi.python.org/pypi/pycuda/0.94rc
to download the package, see if it works for you, and report back.
The change log for 0.94 is here:
http://documen.tician.de/pycuda/misc.html#version-0-94
but the big-ticket things in this release are:
- Support for CUDA 3.0
- Sparse matrices
- Complex numbers
Let's make this another rockin' release!
Thanks very much for your help,
Andreas
Hi,
I built the boost 1.38 libraries from source following the instructions
on the wiki, but this generated about 5 GB of material.
Do I need all of it, or can I trim this down?
Thanks!
Hi,
I installed the latest Boost, Python 2.6, and CUDA 3.0 (all 64-bit) and
compiled the latest PyCUDA 0.94rc.
AFAIK CUDA 3.0 is 64-bit compatible.
But when I try to execute any PyCUDA code, it raises this exception:
~# python2.6 hello_gpu.py
Traceback (most recent call last):
File "hello_gpu.py", line 3, in <module>
import pycuda.autoinit
File "/opt/local/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/site-packages/pycuda/autoinit.py",
line 4, in <module>
cuda.init()
pycuda._driver.LogicError: cuInit failed: pointer is 64-bit
How can I fix this issue?
PS. Unfortunately, I can't switch to 32-bit due to some third-party dependencies.
PPS. PyOpenCL works perfectly on the same machine
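One thing worth ruling out with this kind of "pointer is 64-bit" failure is a 32/64-bit mismatch between the Python interpreter and the CUDA driver library. The driver side has to be checked against the CUDA install, but the interpreter side can be confirmed quickly and without a GPU:

```python
import platform
import struct

def interpreter_bits():
    """Pointer width of the running interpreter: 32 or 64."""
    # struct.calcsize("P") is the size of a C pointer in bytes.
    return 8 * struct.calcsize("P")

print(platform.machine(), interpreter_bits())
```

If this reports 64 while the installed libcuda is 32-bit (or vice versa), cuInit will refuse to initialize regardless of what PyCUDA does.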
--
With best regards,
Andrew Stromnov
I've attached the trace. Lines beginning with ---> are added instrumentation that I put in autoinit.py and cuda.hpp. Also, my workaround has now failed: with some versions of the code, the attempt to push a bad context happened in device_allocation::free(), and deleting objects manually helped with that. But other times it fails in ~module(), and I'm not sure how to bypass that one.
Thanks,
bryan
On Mar 25, 2010, at 7:51 PM, Andreas Klöckner wrote:
> On Donnerstag 25 März 2010, Bryan Catanzaro wrote:
>> Hi All -
>> I've been getting problems with the following error:
>>
>> terminate called after throwing an instance of 'cuda::error'
>> what(): cuCtxPushCurrent failed: invalid value
>>
>> After poking around, I discovered that context.pop(), registered using
>> atexit in pycuda.autoinit, is being called *before* all the destructors
>> for various things created during my program.
>
> This is by design. Since destructors may be called on out-of-context
> objects, they need to make sure that 'their' context is active anyway.
> In your case the context looks to have been *destroyed*, not merely
> switched. Can you run your code with CUDA tracing and send the log?
> (CUDA_TRACE=1 in siteconf.py)
>
> Andreas
Hi All -
I've been getting problems with the following error:
terminate called after throwing an instance of 'cuda::error'
what(): cuCtxPushCurrent failed: invalid value
After poking around, I discovered that context.pop(), registered using atexit in pycuda.autoinit, is being called *before* all the destructors for various things created during my program. I can work around this problem by forgoing the use of autoinit and atexit, and instead manually creating the context, manually deleting my GPUArrays, and then manually running context.pop() at the end of my program.
But this all seems wrong. Perhaps I'm doing something to confuse atexit so that it's executing before the other destructors? Has anyone else seen these problems?
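The cleanup order the manual workaround enforces can be sketched without a GPU. The classes below are dummy stand-ins for PyCUDA's context and GPUArray, not the real ones; the point is simply that one cleanup function frees the arrays first and pops the context last, instead of relying on the relative order of atexit handlers and interpreter-shutdown destructors:

```python
import atexit

log = []

class FakeContext:
    """Stand-in for the CUDA context; only records the pop."""
    def pop(self):
        log.append("context.pop")

class FakeGPUArray:
    """Stand-in for a GPUArray whose destructor needs a live context."""
    def __del__(self):
        log.append("array free")

ctx = FakeContext()
arrays = [FakeGPUArray(), FakeGPUArray()]

def cleanup():
    # Drop every array while the context is still current, *then* pop it.
    arrays.clear()
    ctx.pop()

# One registered function controls the whole teardown sequence.
atexit.register(cleanup)

cleanup()  # called directly here so the resulting order is visible
print(log)
```

With the real classes, the same shape guarantees that no destructor runs against an already-popped context.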
(Running PyCUDA from git commit b657bb2f3c3da59684c72c419b709b53e2e186aa, Mac OS X 10.6, python 2.6.1)
- bryan
Greetings everyone,
This year, there will be two days of tutorials (June 28th and 29th) before the
main SciPy 2010 conference. Each of the two tutorial tracks (intro, advanced)
will have a 3-4 hour morning and afternoon session both days, for a total of 4
intro sessions and 4 advanced sessions.
The main tutorial web page for SciPy 2010 is here:
http://conference.scipy.org/scipy2010/tutorials.html
We are currently in the process of planning the tutorial sessions. You
can help us in two ways:
Brainstorm/vote on potential tutorial topics
============================================
To help us plan the tutorials, we have set up a web site that allows everyone in
the community to brainstorm and vote on tutorial ideas/topics.
The website for brainstorming/voting is here:
http://conference.scipy.org/scipy2010/tutorialsUV.html
The tutorial committee will use this information to help select the tutorials.
Please jump in and let us know what tutorial topics you would like to see.
Tutorial proposal submissions
=============================
We are now accepting tutorial proposals from individuals or teams that would
like to present a tutorial. Tutorials should focus on covering a well-defined
topic in a hands-on manner. We want to see tutorial attendees coding!
We are pleased to offer tutorial presenters stipends this year for the first
time:
* 1 Session: $1,000 (half day)
* 2 Sessions: $1,500 (full day)
Optionally, part of this stipend can be applied to the presenter's
registration costs.
To submit a tutorial proposal please submit the following materials
to 2010tutorials(a)scipy.org by April 15:
* A short bio of the presenter or team members.
* Which track the tutorial would be in (intro or advanced).
* A short description and/or outline of the tutorial content.
* A list of Python packages that attendees will need to have installed to
follow along.
Cheers,
Brian Granger
SciPy 2010, Tutorial Chair
Hi Imran,
Thank you for the info, I'll fix the code; Python 2.5 is still widely
used. As for the ATI drivers, I thought the latest release version of
Stream (2.01) supports OpenCL. I wonder whether the terrible performance
(these tests run faster on my GF9600) and this deadlock issue are
really caused by the drivers you use... I was actually going to order a
server with an ATI GPU for my simulations (because of their advertised
GFLOPS numbers for both single and double precision), but I am
starting to reconsider that decision now.
Best regards,
Bogdan
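For reference, the "except AssertionError as e:" form Imran flags below only parses on Python 2.6+, while the 2.x-only "except AssertionError, e" breaks on 3.x. A spelling that works across 2.4-3.x is to drop the binding and fetch the active exception from sys.exc_info(). The function here is just an illustration of the pattern, not pyfft code:

```python
import sys

def checked_sqrt(x):
    try:
        assert x >= 0, "negative input"
        return x ** 0.5
    except AssertionError:
        # Portable alternative to "except AssertionError as e" (2.6+)
        # and "except AssertionError, e" (2.x only):
        e = sys.exc_info()[1]
        return "failed: %s" % e
```

This is why simply deleting the "as e" (as in Imran's patch) is the safe minimal fix when the bound exception isn't actually used.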
On Thu, Mar 25, 2010 at 12:13 PM, Imran Haque <ihaque(a)stanford.edu> wrote:
> Hi Bogdan,
>
> I also had to do the following to get the test to run:
>
> - kernel.py:45: change "except AssertionError as e:" to "except
> AssertionError:"
> - plan.py:4: add getRadixArray to import list from .kernel_helpers
>
> I was able to get the following pair of results, but then the test hung. The
> machine has prerelease ATI drivers installed, so that might be the issue.
> However, I've also encountered cases in my own work with code that is
> formally incorrect (e.g., barriers that are not uniformly executed) on which
> the Nvidia runtime does not deadlock but the ATI runtime does, so it might
> be worth checking to see if you have any situations like that.
>
> $ python test_performance.py
> Running performance tests...
> * cl, (16,), batch 131072: 1.85770988464 ms, 22.5778203296 GFLOPS
> * cl, (1024,), batch 2048: 13.0976915359 ms, 8.00580771903 GFLOPS
>
> Cheers,
>
> Imran
>
> Bogdan Opanchuk wrote:
>>
>> Hello Imran,
>>
>> kernel.py requires patching too:
>> - from .kernel_helpers import *
>> + from .kernel_helpers import log2, getRadixArray, getGlobalRadixInfo,
>> getPadding, getSharedMemorySize
>>
>> I hope this will be enough. Sorry for the inconvenience, I'm going to
>> commit it in the repository. I need to add some version check too,
>> because there will definitely be other bugs on Python 2.4, which is
>> still used by some Linux distros )
>>
>> Best regards,
>> Bogdan
>>
>> On Thu, Mar 25, 2010 at 11:36 AM, Bogdan Opanchuk <mantihor(a)gmail.com>
>> wrote:
>>
>>>
>>> Hello Imran,
>>>
>>> I tested it only on 2.6, so it can be the case. Thanks for the bug
>>> report though, this sort of compatibility is easy to add. Can you
>>> please just put "from .kernel import GlobalFFTKernel, LocalFFTKernel,
>>> X_DIRECTION, Y_DIRECTION, Z_DIRECTION" instead of this line?
>>>
>>> Best regards,
>>> Bogdan
>>>
>>> On Thu, Mar 25, 2010 at 11:19 AM, Imran Haque <ihaque(a)stanford.edu>
>>> wrote:
>>>
>>>>
>>>> Didn't work - does it require newer than Python 2.5?
>>>>
>>>> $ python test_performance.py
>>>> Running performance tests...
>>>> Traceback (most recent call last):
>>>> File "test_performance.py", line 57, in <module>
>>>> run(isCudaAvailable(), isCLAvailable(), DEFAULT_BUFFER_SIZE)
>>>> File "test_performance.py", line 52, in run
>>>> testPerformance(ctx, shape, buffer_size)
>>>> File "test_performance.py", line 22, in testPerformance
>>>> plan = ctx.getPlan(shape, context=ctx.context, wait_for_finish=True)
>>>> File "/home/ihaque/pyfft-0.3/pyfft_test/helpers.py", line 116, in
>>>> getPlan
>>>> import pyfft.cl
>>>> File
>>>> "/usr/lib/python2.5/site-packages/pyfft-0.3-py2.5.egg/pyfft/cl.py",
>>>> line 9, in <module>
>>>> from .plan import FFTPlan
>>>> File
>>>> "/usr/lib/python2.5/site-packages/pyfft-0.3-py2.5.egg/pyfft/plan.py",
>>>> line 3
>>>> from .kernel import *
>>>> SyntaxError: 'import *' not allowed with 'from .'
>>>>
>>>>
>>>> Bogdan Opanchuk wrote:
>>>>
>>>>>
>>>>> Hello Imran,
>>>>>
>>>>> (sorry, forgot to add maillist to CC)
>>>>>
>>>>> Thank you for prompt reply, results from 5870 are interesting too. If
>>>>> you have pyopencl installed, just run test_performance.py from
>>>>> pyfft_test folder, located in pyfft package. It will print the results
>>>>> in stdout.
>>>>>
>>>>> Best regards,
>>>>> Bogdan.
>>>>>
>>>>> On Thu, Mar 25, 2010 at 11:11 AM, Imran Haque <ihaque(a)stanford.edu>
>>>>> wrote:
>>>>>
>>>>>
>>>>>>
>>>>>> Hi Bogdan,
>>>>>>
>>>>>> I have access to a Radeon 5870, but it's installed in a slow host
>>>>>> machine
>>>>>> (2.8GHz dual core Pentium 4). If this is still useful, I could run a
>>>>>> test
>>>>>> for you if you can send along a quick test case.
>>>>>>
>>>>>> Cheers,
>>>>>>
>>>>>> Imran
>>>>>>
>>>>>> Bogdan Opanchuk wrote:
>>>>>>
>>>>>>
>>>>>>>
>>>>>>> By the way, if it is not too much to ask: if anybody has access to
>>>>>>> ATI
>>>>>>> 59** series card and/or GTX 295 - could you please run performance
>>>>>>> tests from the module (pyfft_test/test_performance.py) and post the
>>>>>>> results here? I suspect that the poor performance in case of OpenCL
>>>>>>> can be (partially) caused by nVidia drivers.
>>>>>>>
>>>>>>> Thank you in advance.
>>>>>>>
>>>>>>> On Sat, Mar 20, 2010 at 10:36 PM, Bogdan Opanchuk
>>>>>>> <mantihor(a)gmail.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>>
>>>>>>>> Hello all,
>>>>>>>>
>>>>>>>> I fixed some bugs in my pycudafft module and added PyOpenCL support,
>>>>>>>> so it is called just pyfft now (and it sort of resolves the question
>>>>>>>> about including it to PyCuda distribution).
>>>>>>>>
>>>>>>>> At the moment, the most annoying (me, at least) things are:
>>>>>>>> 1. OpenCL performance tests show up to 6 times slower speed as
>>>>>>>> compared to Cuda. Unfortunately, I still can't find the reason.
>>>>>>>> (The interesting thing is that PyOpenCL is still noticeably faster
>>>>>>>> than original Apple's C program with the same FFT algorithm).
>>>>>>>> 2. I tried to support different ways of using plans, including
>>>>>>>> precreated contexts, streams/queues and asynchronous execution. This
>>>>>>>> resulted in quite messy interface. Any suggestions about making it
>>>>>>>> more clear are welcome.
>>>>>>>> 3. Currently, the only criterion for kernel's block sizes is maximum
>>>>>>>> allowed by the number of used registers. Resulting occupancy in Cuda
>>>>>>>> kernels is 0.25 - 0.33 most of the time. But when I try to recompile
>>>>>>>> kernels with different block sizes in order to find maximum
>>>>>>>> occupancy,
>>>>>>>> this makes kernels even slower.
>>>>>>>>
>>>>>>>> Best regards,
>>>>>>>> Bogdan
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>> _______________________________________________
>>>>>>> PyCUDA mailing list
>>>>>>> PyCUDA(a)host304.hostmonster.com
>>>>>>> http://host304.hostmonster.com/mailman/listinfo/pycuda_tiker.net
>>>>>>>
>>>>>>>
>>>>>>>
>
Hello Imran,
kernel.py requires patching too:
- from .kernel_helpers import *
+ from .kernel_helpers import log2, getRadixArray, getGlobalRadixInfo,
getPadding, getSharedMemorySize
I hope this will be enough. Sorry for the inconvenience; I'm going to
commit the fix to the repository. I need to add a version check too,
because there will definitely be other bugs on Python 2.4, which is
still used by some Linux distros.
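A minimal version guard of the kind mentioned might look like the sketch below. The (2, 6) cutoff reflects what this thread observed (the wildcard relative import only parses on 2.6, where the module was tested); treat the exact number as an assumption:

```python
import sys

def require_python(min_version=(2, 6)):
    """Fail early with a clear message instead of a late SyntaxError."""
    if sys.version_info[:2] < min_version:
        raise ImportError(
            "this package requires Python %d.%d or newer, running %d.%d"
            % (min_version + tuple(sys.version_info[:2]))
        )
```

Calling require_python() at the top of the package's __init__ turns an obscure "SyntaxError: 'import *' not allowed with 'from .'" into an explicit, actionable error.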
Best regards,
Bogdan
On Thu, Mar 25, 2010 at 11:36 AM, Bogdan Opanchuk <mantihor(a)gmail.com> wrote:
> Hello Imran,
>
> I tested it only on 2.6, so it can be the case. Thanks for the bug
> report though, this sort of compatibility is easy to add. Can you
> please just put "from .kernel import GlobalFFTKernel, LocalFFTKernel,
> X_DIRECTION, Y_DIRECTION, Z_DIRECTION" instead of this line?
>
> Best regards,
> Bogdan
>
> On Thu, Mar 25, 2010 at 11:19 AM, Imran Haque <ihaque(a)stanford.edu> wrote:
>> Didn't work - does it require newer than Python 2.5?
>>
>> $ python test_performance.py
>> Running performance tests...
>> Traceback (most recent call last):
>> File "test_performance.py", line 57, in <module>
>> run(isCudaAvailable(), isCLAvailable(), DEFAULT_BUFFER_SIZE)
>> File "test_performance.py", line 52, in run
>> testPerformance(ctx, shape, buffer_size)
>> File "test_performance.py", line 22, in testPerformance
>> plan = ctx.getPlan(shape, context=ctx.context, wait_for_finish=True)
>> File "/home/ihaque/pyfft-0.3/pyfft_test/helpers.py", line 116, in getPlan
>> import pyfft.cl
>> File "/usr/lib/python2.5/site-packages/pyfft-0.3-py2.5.egg/pyfft/cl.py",
>> line 9, in <module>
>> from .plan import FFTPlan
>> File "/usr/lib/python2.5/site-packages/pyfft-0.3-py2.5.egg/pyfft/plan.py",
>> line 3
>> from .kernel import *
>> SyntaxError: 'import *' not allowed with 'from .'
>>
>>
>> Bogdan Opanchuk wrote:
>>>
>>> Hello Imran,
>>>
>>> (sorry, forgot to add maillist to CC)
>>>
>>> Thank you for prompt reply, results from 5870 are interesting too. If
>>> you have pyopencl installed, just run test_performance.py from
>>> pyfft_test folder, located in pyfft package. It will print the results
>>> in stdout.
>>>
>>> Best regards,
>>> Bogdan.
>>>
>>> On Thu, Mar 25, 2010 at 11:11 AM, Imran Haque <ihaque(a)stanford.edu> wrote:
>>>
>>>>
>>>> Hi Bogdan,
>>>>
>>>> I have access to a Radeon 5870, but it's installed in a slow host machine
>>>> (2.8GHz dual core Pentium 4). If this is still useful, I could run a test
>>>> for you if you can send along a quick test case.
>>>>
>>>> Cheers,
>>>>
>>>> Imran
>>>>
>>>> Bogdan Opanchuk wrote:
>>>>
>>>>>
>>>>> By the way, if it is not too much to ask: if anybody has access to ATI
>>>>> 59** series card and/or GTX 295 - could you please run performance
>>>>> tests from the module (pyfft_test/test_performance.py) and post the
>>>>> results here? I suspect that the poor performance in case of OpenCL
>>>>> can be (partially) caused by nVidia drivers.
>>>>>
>>>>> Thank you in advance.
>>>>>
>>>>> On Sat, Mar 20, 2010 at 10:36 PM, Bogdan Opanchuk <mantihor(a)gmail.com>
>>>>> wrote:
>>>>>
>>>>>
>>>>>>
>>>>>> Hello all,
>>>>>>
>>>>>> I fixed some bugs in my pycudafft module and added PyOpenCL support,
>>>>>> so it is called just pyfft now (and it sort of resolves the question
>>>>>> about including it to PyCuda distribution).
>>>>>>
>>>>>> At the moment, the most annoying (me, at least) things are:
>>>>>> 1. OpenCL performance tests show up to 6 times slower speed as
>>>>>> compared to Cuda. Unfortunately, I still can't find the reason.
>>>>>> (The interesting thing is that PyOpenCL is still noticeably faster
>>>>>> than original Apple's C program with the same FFT algorithm).
>>>>>> 2. I tried to support different ways of using plans, including
>>>>>> precreated contexts, streams/queues and asynchronous execution. This
>>>>>> resulted in quite messy interface. Any suggestions about making it
>>>>>> more clear are welcome.
>>>>>> 3. Currently, the only criterion for kernel's block sizes is maximum
>>>>>> allowed by the number of used registers. Resulting occupancy in Cuda
>>>>>> kernels is 0.25 - 0.33 most of the time. But when I try to recompile
>>>>>> kernels with different block sizes in order to find maximum occupancy,
>>>>>> this makes kernels even slower.
>>>>>>
>>>>>> Best regards,
>>>>>> Bogdan
>>>>>>
>>>>>>
>>>>>>
>>>>>
>>
>