On Sunday, 2010-12-26, at 17:42 +0100, Andreas Kloeckner wrote:
On Tue, 21 Dec 2010 23:21:59 +0100, Tomasz Rybak <bogomips(a)post.pl> wrote:
I have written the attached code, which calculates a
parallel prefix sum.
There are two variants - the first one (exclusive) is based on an article
by Mark Harris from NVIDIA, the second one (inclusive) is based on diagram
The inclusive scan could be optimised with regard to shared memory bank
conflicts, similarly to the exclusive version.
It also seems that the inclusive scan is less numerically stable - its
results differ from the CPU version when scanning large arrays.
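
For reference, the two variants described above can be sketched on the CPU with NumPy. This is only a minimal illustration of the definitions, not the attached GPU code; the function names and the tolerance in results_match are assumptions made for this example. Since floating-point addition is not associative, a tree-ordered GPU scan and a sequential CPU scan can legitimately differ in the last bits, so a tolerance-based comparison is probably more appropriate than exact equality.

```python
import numpy as np

def inclusive_scan(values):
    """out[i] = values[0] + ... + values[i]."""
    return np.cumsum(values)

def exclusive_scan(values):
    """out[i] = values[0] + ... + values[i-1], with out[0] = 0."""
    out = np.zeros_like(values)
    out[1:] = np.cumsum(values)[:-1]
    return out

def results_match(gpu_result, cpu_result, rtol=1e-4):
    """Compare with a relative tolerance (rtol is an assumed value)."""
    return np.allclose(gpu_result, cpu_result, rtol=rtol)

data = np.array([3, 1, 7, 0, 4, 1, 6, 3])
print(inclusive_scan(data))  # [ 3  4 11 11 15 16 22 25]
print(exclusive_scan(data))  # [ 0  3  4 11 11 15 16 22]
```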
Again, thanks for your contribution! Here are a few comments on the
- It doesn't seem like the inclusive and the exclusive version are so
dissimilar. As such, I don't think we should duplicate code for the
two. If necessary, I'd even prefer to make this code depend on Mako or
Jinja (template engines) to avoid code duplication.
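
A rough sketch of what generating both variants from one template could look like. To stay self-contained it uses string.Template from the standard library as a stand-in for Mako or Jinja, and it uses a simple single-block Hillis-Steele style scan rather than the algorithms in the attached patch. The kernel names, placeholders and make_scan_source are all hypothetical, and the CUDA source has not been compiled here.

```python
from string import Template

# Shared kernel body; only the placeholders differ between the variants.
# Assumes a single block with blockDim.x >= n and blockDim.x elements of
# dynamic shared memory.
SCAN_KERNEL = Template("""
__global__ void ${name}(const ${ctype} *in, ${ctype} *out, int n)
{
    extern __shared__ ${ctype} temp[];
    int tid = threadIdx.x;

    temp[tid] = (tid < n) ? in[tid] : ${neutral};
    __syncthreads();

    for (int offset = 1; offset < blockDim.x; offset *= 2) {
        ${ctype} a = (tid >= offset) ? temp[tid - offset] : ${neutral};
        ${ctype} b = temp[tid];
        __syncthreads();
        temp[tid] = ${scan_expr};
        __syncthreads();
    }

    if (tid < n)
        out[tid] = ${result_expr};
}
""")

def make_scan_source(name, inclusive, ctype="float",
                     scan_expr="a + b", neutral="0"):
    """Fill in the template; the variants differ only in the final write."""
    if inclusive:
        result_expr = "temp[tid]"
    else:
        result_expr = "(tid > 0) ? temp[tid - 1] : " + neutral
    return SCAN_KERNEL.substitute(name=name, ctype=ctype, neutral=neutral,
                                  scan_expr=scan_expr,
                                  result_expr=result_expr)

inclusive_src = make_scan_source("scan_inclusive", inclusive=True)
exclusive_src = make_scan_source("scan_exclusive", inclusive=False)
```

The resulting strings could presumably be handed to pycuda.compiler.SourceModule. Whether two genuinely different algorithms fit into one template this neatly is a separate question.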
Fixed code repetition, added class inheritance.
I am not sure about joining exclusive and inclusive scans - they
use different algorithms.
For now I have added functions and classes to pycuda.reduction - feel
free to move them to another module.
- Use warnings.warn, not plain print, for warnings.
- Tests should go into tests/test_gpu_array or some such.
Done, also added documentation (see patch).
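
For illustration, the warnings.warn suggestion might look roughly like the following. The padded_length helper and the message text are hypothetical and are not taken from the patch.

```python
import warnings

def padded_length(n):
    """Round n up to the next power of two, warning if padding is needed.

    Hypothetical helper; the real code may warn about something else.
    """
    padded = 1
    while padded < n:
        padded *= 2
    if padded != n:
        # Unlike print, this can be filtered, logged or turned into an
        # error by the caller; stacklevel=2 points at the calling code.
        warnings.warn("scan input of length %d padded to %d" % (n, padded),
                      RuntimeWarning, stacklevel=2)
    return padded
```

A caller who does not want the message can silence it with warnings.simplefilter("ignore", RuntimeWarning), which is not possible with a plain print.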
- Formal nitpicks: Please indent comments with the rest of the code.
PEP 8 says a=value (no spaces) for keyword arguments. Camel case in C
is yucky, too. :)
- 'Sum' is poor wording for the general associative operation that the
scan uses--use scan_op perhaps.
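
To illustrate the scan_op naming, here is a small CPU-side sketch of a scan over an arbitrary associative operation. The function names and the neutral parameter are assumptions for this example, not the interface in the patch.

```python
import operator

def scan_inclusive(values, scan_op=operator.add):
    """out[i] = scan_op(values[0], ..., values[i]) for an associative scan_op."""
    out = []
    for v in values:
        out.append(v if not out else scan_op(out[-1], v))
    return out

def scan_exclusive(values, scan_op=operator.add, neutral=0):
    """Like scan_inclusive, shifted right by one and seeded with neutral.

    neutral must be the identity element of scan_op (0 for +, 1 for *).
    """
    out = []
    acc = neutral
    for v in values:
        out.append(acc)
        acc = scan_op(acc, v)
    return out

print(scan_inclusive([3, 1, 7, 0, 4], max))          # [3, 3, 7, 7, 7]
print(scan_exclusive([1, 2, 3, 4], operator.mul, 1))  # [1, 1, 2, 6]
```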
Tomasz Rybak <bogomips(a)post.pl> GPG/PGP key ID: 2AD5 9860
Fingerprint A481 824E 7DD3 9C0E C40A 488E C654 FB33 2AD5 9860