I only tested it on 2.6, so that may be the case. Thanks for the bug
report, though; this sort of compatibility is easy to add. Could you
please replace that line with "from .kernel import GlobalFFTKernel,
LocalFFTKernel, X_DIRECTION, Y_DIRECTION, Z_DIRECTION"?
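A sketch of the suggested fix (the exact file is inferred from the traceback
below, and the compile() call only checks that the explicit-import form
parses; "from .kernel import *" was rejected as a SyntaxError inside
packages on Python 2.5):

```python
# Replacement for the star import in pyfft's plan module (file location
# inferred from the traceback; the imported names come from Bogdan's mail).
# Listing the names explicitly avoids the Python 2.5 restriction on
# "import *" with relative imports.
fixed_line = ("from .kernel import GlobalFFTKernel, LocalFFTKernel, "
              "X_DIRECTION, Y_DIRECTION, Z_DIRECTION")

# compile() parses the statement without executing the import, so it
# verifies the replacement line is syntactically valid:
compile(fixed_line, "plan.py", "exec")
print("syntax ok")
```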
On Thu, Mar 25, 2010 at 11:19 AM, Imran Haque <ihaque(a)stanford.edu> wrote:
Didn't work - does it require a newer Python?
$ python test_performance.py
Running performance tests...
Traceback (most recent call last):
  File "test_performance.py", line 57, in <module>
    run(isCudaAvailable(), isCLAvailable(), DEFAULT_BUFFER_SIZE)
  File "test_performance.py", line 52, in run
    testPerformance(ctx, shape, buffer_size)
  File "test_performance.py", line 22, in testPerformance
    plan = ctx.getPlan(shape, context=ctx.context, wait_for_finish=True)
  File "/home/ihaque/pyfft-0.3/pyfft_test/helpers.py", line 116, in getPlan
  line 9, in <module>
    from .plan import FFTPlan
    from .kernel import *
SyntaxError: 'import *' not allowed with 'from .'
Bogdan Opanchuk wrote:
(sorry, forgot to CC the mailing list)
Thank you for the prompt reply; results from a 5870 would be interesting
too. If you have pyopencl installed, just run test_performance.py from
the pyfft_test folder in the pyfft package. It will print the results.
On Thu, Mar 25, 2010 at 11:11 AM, Imran Haque <ihaque(a)stanford.edu> wrote:
> Hi Bogdan,
> I have access to a Radeon 5870, but it's installed in a slow host machine
> (2.8GHz dual core Pentium 4). If this is still useful, I could run a test
> for you if you can send along a quick test case.
> Bogdan Opanchuk wrote:
>> By the way, if it is not too much to ask: if anybody has access to an ATI
>> 59** series card and/or a GTX 295, could you please run the performance
>> tests from the module (pyfft_test/test_performance.py) and post the
>> results here? I suspect the poor performance in the OpenCL case may be
>> (partially) caused by nVidia drivers.
>> Thank you in advance.
>> On Sat, Mar 20, 2010 at 10:36 PM, Bogdan Opanchuk <mantihor(a)gmail.com> wrote:
>>> Hello all,
>>> I fixed some bugs in my pycudafft module and added PyOpenCL support,
>>> so it is now called simply pyfft (which sort of resolves the question
>>> of including it in the PyCUDA distribution).
>>> At the moment, the most annoying (me, at least) things are:
>>> 1. OpenCL performance tests show speeds up to 6 times slower than
>>> CUDA's. Unfortunately, I still can't find the reason. (Interestingly,
>>> PyOpenCL is still noticeably faster than Apple's original C program
>>> using the same FFT algorithm.)
>>> 2. I tried to support different ways of using plans, including
>>> pre-created contexts, streams/queues, and asynchronous execution. This
>>> resulted in a rather messy interface. Any suggestions for making it
>>> clearer are welcome.
>>> 3. Currently, the only criterion for choosing a kernel's block size is
>>> the maximum allowed by its register usage. The resulting occupancy of
>>> the CUDA kernels is 0.25 - 0.33 most of the time, but when I try to
>>> recompile kernels with different block sizes to maximize occupancy,
>>> they get even slower.
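For context on point 3, a rough sketch of how register pressure caps
occupancy (the hardware limits below are assumed compute-1.3 values for a
GTX 295-class GPU, not figures taken from this thread):

```python
# Register-limited occupancy estimate for an assumed compute-1.3 GPU
# (16384 registers and 32 resident warps per multiprocessor).
REGS_PER_SM = 16384
MAX_WARPS_PER_SM = 32
WARP_SIZE = 32

def occupancy(regs_per_thread, block_size):
    # How many blocks fit under the register limit
    blocks = REGS_PER_SM // (regs_per_thread * block_size)
    warps_per_block = (block_size + WARP_SIZE - 1) // WARP_SIZE
    active_warps = min(blocks * warps_per_block, MAX_WARPS_PER_SM)
    return active_warps / MAX_WARPS_PER_SM

# A 64-register kernel at block size 128 lands in the 0.25 regime
# mentioned above; a 40-register kernel reaches 0.375:
print(occupancy(64, 128))  # 0.25
print(occupancy(40, 128))  # 0.375
```

This also shows why simply changing the block size does not always help:
active warps only increase when an extra whole block fits under the
register budget.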
>>> Best regards,
>> PyCUDA mailing list