I've pulled the latest git version and built PyCUDA on Debian
unstable. I've tested on two machines: one with Fermi (GTX 460)
and one with ION (9400M). In both cases all tests pass for Python 2.7.3,
and only 2 tests fail for Python 3.2.
The failing tests exercise transfer of data from numpy to a PyCUDA GPUArray;
there is some problem with the C++ code not recognizing the numpy object type
(bytearray vs. bytes). It would help if someone could:
a) try to build PyCUDA on Python 3 and confirm, and
b) look at those failing tests.
But the good news is that we have (almost) working PyCUDA on Python 3!
Tomasz Rybak GPG/PGP key ID: 2AD5 9860
Fingerprint A481 824E 7DD3 9C0E C40A 488E C654 FB33 2AD5 9860
Dear Mr. Maze,
"Mr. Maze" <void(a)madmaze.net> writes:
> I am currently working on an application where I need to retrieve the index
> of the max value.
> Is there a way to get the index along with the max of a gpuarray?
> At the moment I am returning the array back to the host just to locate
> where the max value is.
you should be able to adapt this code here:
to collect not min and max, but max and max-index.
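The underlying idea is a tree reduction that carries the index along with the value, so the combine step compares values but keeps (index, value) pairs. Here is a pure-Python sketch of that pairwise logic (illustrative only; the function name is made up, and on the GPU the same combine step would go into the reduce expression of a reduction kernel):

```python
def argmax_reduce(values):
    """Tree-reduce a sequence to its (index, max_value) pair.

    Mimics the pairwise combine a GPU reduction performs: each round
    halves the number of surviving (index, value) pairs by keeping
    the pair with the larger value (ties keep the earlier index).
    """
    pairs = list(enumerate(values))
    while len(pairs) > 1:
        nxt = []
        for j in range(0, len(pairs) - 1, 2):
            i1, v1 = pairs[j]
            i2, v2 = pairs[j + 1]
            nxt.append((i1, v1) if v1 >= v2 else (i2, v2))
        if len(pairs) % 2:  # odd element survives to the next round
            nxt.append(pairs[-1])
        pairs = nxt
    return pairs[0]
```

With this, `argmax_reduce([3, 1, 4, 1, 5, 9, 2, 6])` yields `(5, 9)`: index 5, value 9, without a round trip to the host for the search itself.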
What is the maximum attachment size for this list? I made a file with each
demo program and its result. Some (many) demos gave errors. It could be that
there is a misconfiguration on my part, so I thought it might be better to
send the file rather than piece by piece. It is around 20 kB, if I recall
correctly.
Ubuntu 12.10, CUDA 4.2
So I've got this program using ElementwiseKernel and I want to raise its performance one more level. To my knowledge nobody has written about using shared memory with Elementwise, but that does not mean it can't be done. How can shared memory be used in an ElementwiseKernel program without completely rewriting the thing as a SourceModule? That is, how do I get an incremental improvement in my existing ElementwiseKernel program with the least code change?
I suspect shared memory is the key, since my program naturally does a lot of array work. To use shared memory, I imagine the program would need to detect how many i's there are per block, because shared memory is block-scoped (by i I mean the magic i that PyCUDA passes into an ElementwiseKernel), and this value would be used as the size of the shared-memory array to be allocated. I'm also not sure which thread should allocate the memory; probably only one thread per block should do this, but I don't know how that could be achieved. Is the thread with index 0 for x the key here? And how would an ElementwiseKernel reference that x value?
Would any of the CUDA whizkids like to propose how a program might detect the number of i's in a block of an ElementwiseKernel, and show how to use shared memory in it?
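For what it's worth, the per-block picture in CUDA terms is: the number of i's per block is blockDim.x, a __shared__ array is declared once per block (no single thread "allocates" it), and threadIdx.x == 0 is how one thread per block is singled out to write the block's result after a __syncthreads(). A pure-Python sketch of that per-block logic (an illustration of the concept, not PyCUDA code; the function name is made up):

```python
def block_sums(data, block_dim):
    """Simulate one shared-memory reduction per CUDA block.

    block_dim plays the role of blockDim.x; each block's slice of
    `data` plays the role of a __shared__ array that the block's
    threads have cooperatively filled, one element per thread i.
    """
    results = []
    for block_start in range(0, len(data), block_dim):
        # "__shared__ float s[BLOCK_DIM];" -- one array per block
        shared = data[block_start:block_start + block_dim]
        # after a __syncthreads(), the thread with threadIdx.x == 0
        # combines the shared values and writes the block's result
        results.append(sum(shared))
    return results
```

So `block_sums([1, 2, 3, 4, 5], 2)` gives `[3, 7, 5]`: one partial result per block, with the last (partially filled) block handled the way a guarded kernel would handle a trailing block.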
Malcolm Tobias <mtobias(a)wustl.edu> writes:
> Sorry to bug you, but I run a cluster at Washington Univ. in St. Louis and we recently added some GPU nodes. We have several python users, and one has requested that I install PyCUDA on our system. For the first attempt, I tried building against CUDA 5.0:
> [root@login002 pycuda-2012.1]# /export/epd-7.0.2/bin/python configure.py --cuda-root=/export/cuda-5.0
> The build failed when it attempted to link against libcuda:
> [root@login002 pycuda-2012.1]# make install
> lib -lcuda -lcurand -lpython2.7 -o build/lib.linux-x86_64-2.7/pycuda/_driver.so
> /usr/bin/ld: cannot find -lcuda
> collect2: ld returned 1 exit status
> error: command 'g++' failed with exit status 1
> make: *** [install] Error 1
> I had seen this problem before with another CUDA application. AFAICT, when going from CUDA 4.x to 5.0 they've dropped libcuda? I can't find this documented anywhere, but in 4.x they had separate installers for "toolkit" and "driver", whereas with 5.0 they packaged this all in a single installer. I know the driver must be installed, since I can run simple commands like 'nvidia-smi' and more complicated GPU applications without issues, but there's no libcuda:
> [root@gpu001 ~]# ls /usr/local/cuda-5.0/lib64/
> libcublas_device.a libcufft.so.5.0 libcusparse.so.5.0
> libcublas.so libcufft.so.5.0.35 libcusparse.so.5.0.35
> libcublas.so.5.0 libcuinj64.so libnpp.so
> libcublas.so.5.0.35 libcuinj64.so.5.0 libnpp.so.5.0
> libcudadevrt.a libcuinj64.so.5.0.35 libnpp.so.5.0.35
> libcudart.so libcurand.so libnvToolsExt.so
> libcudart.so.5.0 libcurand.so.5.0 libnvToolsExt.so.5.0
> libcudart.so.5.0.35 libcurand.so.5.0.35 libnvToolsExt.so.5.0.35
> libcufft.so libcusparse.so
> I tried hacking the PyCUDA build script to ignore the -lcuda flag, which allowed the build to succeed, but (not surprisingly) it complained about a missing symbol when I tried running the tests:
> [root@login002 test]# /export/epd-7.0.2/bin/python test_driver.py
> ImportError: /export/epd-7.0.2/lib/python2.7/site-packages/pycuda-2012.1-py2.7-linux-x86_64.egg/pycuda/_driver.so: undefined symbol: cuMemAllocPitch_v2
> I tried building against the 4.2 version of CUDA (I was assuming that since the PyCUDA release was ~1 year old it might not support 5.0), but while that builds, it won't run since the driver doesn't match the library version number.
> Any suggestions you could give about building PyCUDA would be greatly appreciated. If this is better directed to a mailing list, please let me know.
(I've cc'd the list.)
I'm guessing the reason you're not seeing libcuda.so is that it
actually gets installed along with the GPU driver, not the CUDA
Toolkit. In other words, you either have to build PyCUDA on a machine
that has a GPU (and hence the driver), or you somehow have to force the
driver to install in
the absence of a GPU.
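A quick way to check whether the driver's libcuda.so is present on a given node is to look for it directly (the directories below are common examples and may differ per distribution; this is a diagnostic sketch, not part of the PyCUDA build):

```shell
# libcuda.so is shipped by the NVIDIA driver package, not the toolkit,
# so it typically lives outside the CUDA install tree.
found=no
for d in /usr/lib/x86_64-linux-gnu /usr/lib64 /usr/lib /usr/lib/nvidia; do
    if ls "$d"/libcuda.so* >/dev/null 2>&1; then
        found=yes
        echo "libcuda found in $d"
    fi
done
[ "$found" = yes ] || echo "libcuda not found: the GPU driver is probably not installed on this node"
```

Running this on the login node versus a GPU node should make it obvious whether the link failure is just a missing driver on the build host.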
Hope that helps,
Bruce Labitt <bdlabitt(a)gmail.com> writes:
> Having troubles sending to list. I received msgs but have yet to be able to
FWIW, I'm hearing you loud and clear. Also, if your messages show up
you can assume that they've gotten picked up and distributed.
Not sure if I have an installation issue or something else entirely. I've been
testing all of the demo programs from the wiki. Many fail from the IPython
environment. I'm not sure if there is an interaction or something else. I have
a file with my results (attached).
Are these issues expected? I did try to modify one of the programs to plot
with matplotlib. That didn't work, I think because I couldn't force it to
run in interactive mode.
Some of the programs failed in both ipython and as $ python demo_prog.py
MeasureGpuarraySpeedRandom.py works in python but generates a MemoryError
Traceback in ipython.
Demo3DSurface.py gives a Segmentation Fault in both ipython and python
If someone can give me a few tips, it would be greatly appreciated.