Sven Schreiber wrote:
Andrew Straw wrote:
> Sven Schreiber wrote:
>>>
>> Yes, absolutely! So really my question was meant as:
>>
>> Ubuntu 9.10 and Nvidia SDK howto?
>>
> The examples are working for me on Karmic with the attached siteconf.py,
> but I haven't gone any further. I'm using amd64 arch, the 195.30 beta
> drivers, and a GeForce GTX 260.
>
Thanks, I will keep in mind the possibility of upgrading to the beta
drivers. However, searching for "nvidia sdk gcc 4.4" I found some
instructions on how to get the CUDA SDK up and running on Ubuntu 9.10. I'll
try these soon and probably report back here to leave some hints for
future readers with the same problem.
Ok, so I have pyopencl-0.91.4 as well as pycuda-0.93 now up and running
here.
The combination is (short version):
* Driver 195.30 beta
* gcc symlink pointing to gcc-4.4 for compiling the driver kernel
  module, but pointing to gcc-4.3 for the rest
* cudatoolkit_2.3_linux_32_ubuntu9.04.run
* gpucomputingsdk_2.3b_linux.run
* For the CUDA stuff, I followed the advice in
  http://moelhave.dk/2009/12/nvidia-cuda-on-ubuntu-karmic-koala/ and, for
  Nvidia's OpenCL examples, I also changed the CXX, CC, and LINK lines in
  OpenCL/common/common_opencl.mk
I had problems with the 190.29 drivers, and while pycuda worked with the
190.53 drivers, (py)opencl didn't -- I guess the latter is expected. So
for me indeed only the 195.30 beta drivers seem to work with both.
BTW, a remark about the benchmark-all.py example file. I think the speed
comparison there is a little biased in favor of pyopencl. It compares
(almost) pure Python with pyopencl, but IMHO the more meaningful
comparison would be between Numpy vectorized code and pyopencl. AFAICS
the numpy equivalent of the pure Python code would be:
    for j in range(1000):  # number of iterations, just for comparability
        n_result = (a+b)**2 * (a/2.0)
At least the results seem to agree when checked afterwards. On my test
system I get the following timings:
* pure Python: 20.85s
* vectorized Numpy on CPU: 0.044s
* pyopencl on GPU: 0.034s
Of course I'm *not* saying that the pyopencl approach isn't fast and
useful. (My test graphics card is very low end and is on the slow PCI
bus.) But the first one or two orders of magnitude can already be
achieved without any GPU magic.
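For anyone who wants to repeat the CPU side of this comparison, here is a
minimal sketch of how the two variants could be timed against each other. The
array size, the float32 dtype, and the helper names (pure_python, vectorized,
time_call) are my own assumptions for illustration and are not taken from
benchmark-all.py. Timings will of course differ from machine to machine.

```python
import time

import numpy as np


def pure_python(a, b, iterations):
    """Element-by-element loop, repeated `iterations` times per element."""
    result = np.empty_like(a)
    for i in range(len(a)):
        for _ in range(iterations):
            result[i] = (a[i] + b[i]) ** 2 * (a[i] / 2.0)
    return result


def vectorized(a, b, iterations):
    """Same arithmetic as whole-array NumPy operations."""
    result = None
    for _ in range(iterations):  # repeated only for comparability
        result = (a + b) ** 2 * (a / 2.0)
    return result


def time_call(func, *args):
    """Return (result, elapsed seconds) for a single call."""
    start = time.perf_counter()
    out = func(*args)
    return out, time.perf_counter() - start


if __name__ == "__main__":
    n = 1000  # assumed array length, chosen for illustration
    iterations = 1000
    a = np.random.rand(n).astype(np.float32)
    b = np.random.rand(n).astype(np.float32)

    py_result, py_time = time_call(pure_python, a, b, iterations)
    np_result, np_time = time_call(vectorized, a, b, iterations)

    # Check that both variants agree before comparing timings.
    assert np.allclose(py_result, np_result)
    print("pure Python:      %.3fs" % py_time)
    print("vectorized Numpy: %.3fs" % np_time)
```

The agreement check before printing is there because a speed comparison is
only meaningful if both variants compute the same thing.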
thank you for these very cool tools,
sven