On Monday, 19 October 2009, Michael Rule wrote:
I'm convinced that I need a prefix scan that gives me access to the
resultant prefix-scanned array.
So, for example, using addition, I would like a function that takes:
1 1 1 1 1 1 1 1 1 1
and returns:
0 1 2 3 4 5 6 7 8 9
It seems like this data should be generated as an intermediate step in
executing a ReductionKernel. I have not been able to figure out how this
data is accessed by browsing the GPUArray documentation. Am I missing
something obvious?
Parallel Prefix Scan is presently not implemented in PyCUDA. While reduction
is related, the scan is actually a somewhat different animal. PScan would be a
most welcome addition to PyCUDA, however. Mark Harris has written a good
introduction on how to implement it:
If you decide to follow Mark's guide, almost half your work is already done
for you--reduction occurs as part of the prefix scan, so you'll be able to
recycle a fair bit of code.
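
To make the relationship between reduction and scan concrete, here is a
rough plain-Python sketch of the up-sweep/down-sweep scheme commonly used
for GPU scans. It is not PyCUDA code, and I am assuming rather than
asserting that it matches the guide in detail. The function name
exclusive_scan is made up for illustration, and the inner loops stand in
for what would be parallel threads on the device. The first phase is an
ordinary tree reduction; the second phase pushes the partial sums back
down to produce the exclusive scan from your example.

```python
def exclusive_scan(values):
    """Exclusive prefix sum via an up-sweep (reduction) and a down-sweep.

    Hypothetical helper for illustration only; runs sequentially on the CPU.
    """
    n = len(values)
    if n == 0:
        return []

    # Pad with zeros to the next power of two so the tree is balanced.
    size = 1
    while size < n:
        size *= 2
    data = list(values) + [0] * (size - n)

    # Up-sweep: a tree reduction. Afterwards data[size - 1] holds the total,
    # and the interior slots hold partial sums of their subtrees.
    stride = 1
    while stride < size:
        for i in range(2 * stride - 1, size, 2 * stride):
            data[i] += data[i - stride]
        stride *= 2

    # Down-sweep: clear the root, then walk back down the tree. Each node
    # hands its value to its left child and (its value + old left child)
    # to its right child.
    data[size - 1] = 0
    stride = size // 2
    while stride >= 1:
        for i in range(2 * stride - 1, size, 2 * stride):
            left = data[i - stride]
            data[i - stride] = data[i]
            data[i] += left
        stride //= 2

    # Drop the padding.
    return data[:n]


if __name__ == "__main__":
    print(exclusive_scan([1] * 10))
```

On a GPU, each pass of the inner for loop would be one parallel step with
a synchronization between strides, which is why the reduction code can
largely be reused for the up-sweep half.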