[PyCUDA] Parallel prefix scan

Tomasz Rybak bogomips at post.pl
Tue Dec 21 14:21:59 PST 2010


Hello.
I have written the attached code, which calculates a parallel prefix sum
(scan). There are two variants: the first one (exclusive) is based on the
article by Mark Harris from NVIDIA, and the second one (inclusive) is based
on the diagram from Wikipedia.
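
For reference, the two variants differ only in whether element i is included
in its own output. The following is a minimal CPU sketch in plain Python, not
the attached GPU code; the function names are illustrative only.

```python
def exclusive_scan(values):
    """Output element i is the sum of values[0..i-1]; element 0 is 0."""
    result = []
    running = 0
    for v in values:
        result.append(running)  # store the sum *before* adding v
        running += v
    return result


def inclusive_scan(values):
    """Output element i is the sum of values[0..i]."""
    result = []
    running = 0
    for v in values:
        running += v            # add v first, then store
        result.append(running)
    return result


data = [3, 1, 7, 0, 4, 1, 6, 3]
print(exclusive_scan(data))  # [0, 3, 4, 11, 11, 15, 16, 22]
print(inclusive_scan(data))  # [3, 4, 11, 11, 15, 16, 22, 25]
```
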
The inclusive scan could still be optimised with regard to shared memory
access conflicts, as is already done in the exclusive version.
The inclusive scan also seems to be less stable numerically: its results
differ from the CPU version when scanning large arrays.
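
One possible reason for such differences (an assumption, not a diagnosis of
the attached code) is that floating-point addition is not associative, so a
parallel scan that adds elements in a different order than a sequential CPU
loop can round differently. The sketch below compares a left-to-right scan
with a step-wise scan in the style of Hillis and Steele, both run on the CPU
in double precision; the function names and the input data are made up for
illustration.

```python
def sequential_inclusive_scan(values):
    """Left-to-right running sum, as a simple CPU loop would compute it."""
    result, running = [], 0.0
    for v in values:
        running += v
        result.append(running)
    return result


def stepwise_inclusive_scan(values):
    """Hillis-Steele style scan: in each pass every element adds the
    element `offset` positions to its left, and `offset` doubles."""
    current = list(values)
    offset = 1
    while offset < len(current):
        previous = current
        current = [
            previous[i] + previous[i - offset] if i >= offset else previous[i]
            for i in range(len(previous))
        ]
        offset *= 2
    return current


# Large and small magnitudes mixed, so the order of additions matters.
data = [1e16, 1.0, -1e16, 1.0]
print(sequential_inclusive_scan(data))  # [1e+16, 1e+16, 0.0, 1.0]
print(stepwise_inclusive_scan(data))    # [1e+16, 1e+16, 0.0, 0.0]
```

The exact final sum is 2.0; both orderings lose precision here, but they lose
it in different places, so their last elements disagree.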

I would like to put this code into PyCUDA, where it would serve a similar
purpose to the reduction kernels; that is why I tried to keep a similar API.

Please give feedback.
Regards.

-- 
Tomasz Rybak <bogomips at post.pl> GPG/PGP key ID: 2AD5 9860
Fingerprint A481 824E 7DD3 9C0E C40A  488E C654 FB33 2AD5 9860
http://member.acm.org/~tomaszrybak
-------------- next part --------------
A non-text attachment was scrubbed...
Name: prefix.py
Type: text/x-python
Size: 27496 bytes
Desc: not available
URL: <http://lists.tiker.net/pipermail/pycuda/attachments/20101221/44b8dcbd/attachment-0001.py>

