On Tue, 17 Jan 2012 15:05:00 +0100, Tomasz Rybak
On Mon, 2012-01-16 at 20:58 -0500, Andreas
I think I found it.
As with the CUDA reduction bug (related to Fermi), it again seems
to be related to overly eager concurrency when reducing results.
According to http://oscarbg.blogspot.com/2009/10/news-from-web.html
"Actually the wavefront size is only 64 for the highend cards(48XX,
58XX, 57XX), but 32 for the middleend cards and 16 for the lowend
IMO we should use PREFERRED_WORK_GROUP_SIZE_MULTIPLE to get
non_sync_size. At the same time we lose the SIMD CPU optimisation,
and for now I do not know how to fix both at once.
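
To illustrate why non_sync_size must not exceed the real wavefront size,
here is a small pure-Python simulation (no OpenCL needed). It is only a
simplified model, not PyOpenCL's actual reduction kernel:
simulate_group_reduction and its parameters are hypothetical names, a
barrier is assumed before every step whose stride is >= non_sync_size,
and between barriers the wavefronts are assumed to run one after another,
which is one schedule the hardware is allowed to pick.

```python
def simulate_group_reduction(values, wavefront_size, non_sync_size):
    """Simulate a tree reduction (sum) in local memory for one work-group.

    Simplified model (an assumption, not the real kernel):
    - len(values) is a power of two and equals the work-group size;
    - a barrier precedes every step whose stride is >= non_sync_size;
    - lanes inside one wavefront run in lockstep;
    - between barriers, wavefront 0 runs all its steps first, then
      wavefront 1, and so on.
    """
    local = list(values)
    n = len(local)

    # Strides of the tree reduction: n/2, n/4, ..., 1.
    strides = []
    stride = n // 2
    while stride >= 1:
        strides.append(stride)
        stride //= 2

    # Group consecutive steps that have no barrier between them.
    segments = []
    for s in strides:
        if not segments or s >= non_sync_size:
            segments.append([s])      # a barrier starts a new segment
        else:
            segments[-1].append(s)    # no barrier: same segment

    for segment in segments:
        for wf_start in range(0, n, wavefront_size):
            lanes = range(wf_start, min(wf_start + wavefront_size, n))
            for s in segment:
                active = [t for t in lanes if t < s]
                # Lockstep: every active lane reads before any lane writes.
                new_values = [local[t] + local[t + s] for t in active]
                for t, v in zip(active, new_values):
                    local[t] = v
    return local[0]


data = list(range(128))
expected = sum(data)

# Wavefront of 64, non_sync_size of 64: correct.
ok_highend = simulate_group_reduction(data, 64, 64) == expected
# Wavefront of 32 but non_sync_size still 64: a stale read is possible.
ok_mismatch = simulate_group_reduction(data, 32, 64) == expected
# Wavefront of 32 with non_sync_size lowered to 32: correct again.
ok_fixed = simulate_group_reduction(data, 32, 32) == expected
```

In this model the mismatched case goes wrong because the step with
stride 32 reads elements 32..63 before the other wavefront has updated
them. Lowering non_sync_size to the real wavefront size restores the
barrier at that step.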
The attached patch fixes the problem on Loveland, without breaking anything on
While investigating this I found another problem with the reasonable_work_*
function. First, dev.warp_size_nv was raising LogicError (not
AttributeError), so I changed it to be the same as in
get_simd_group_size. Second, there was a problem with getting attributes
from a kernel that was compiled but not built. I had to add prg.build() and
__kernel and __global; without those I was getting a SEGFAULT
from the AMD OpenCL libraries.
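
The first fix can be sketched roughly as below. This is not the actual
PyOpenCL code: LogicError here is a local stand-in for
pyopencl.LogicError, FakeDevice is a hypothetical test double, and
query_warp_size is a made-up helper name. The point is only that the
"attribute not available" case has to catch both exception types.

```python
class LogicError(Exception):
    """Local stand-in for pyopencl.LogicError, so the sketch runs without OpenCL."""


class FakeDevice:
    """Hypothetical test double for a device object."""

    def __init__(self, warp_size=None):
        self._warp_size = warp_size

    @property
    def warp_size_nv(self):
        # Model a non-NVIDIA device: the query fails with LogicError
        # rather than AttributeError.
        if self._warp_size is None:
            raise LogicError("warp_size_nv is not available on this device")
        return self._warp_size


def query_warp_size(dev):
    """Return dev.warp_size_nv, or None if the device cannot report it."""
    try:
        return dev.warp_size_nv
    except (AttributeError, LogicError):
        return None
```

A caller can then fall back to another estimate (for example the
preferred work-group size multiple) whenever None comes back.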
Thank you very much for investigating this, and for your fixes. I've
changed your fix slightly, in that get_simd_group() now *uses*
reasonable_work_group_size_multiple to find its best guess at the AMD
GPU wavefront size.
I'd much appreciate it if you could check the current code and report
back. We can then debate what to do about PyOpenCL 2012.1 (yes, it'll be
The code works OK on both Loveland and ION (all tests except image on CPU
pass). I had to add pyopencl.characterize to setup.py (patch attached)
for the package to install characterize on Debian after your changes
Good catch, thanks. Applied. Now there are two options: Release as-is,
or add a bit more 'scan magic'. By that I mean a) segmented scan and b)
all those little scan-based magic tricks that Thrust can do--copy_if,
unique_by_key, etc. Given that we have a working scan, those aren't hard
to add. It would take about a week, I guess. I'll leave the choice up to
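
For readers unfamiliar with these tricks, here is a sequential
pure-Python sketch of how copy_if and a segmented scan fall out of a
plain scan. It is not PyOpenCL or Thrust code and says nothing about how
either would be implemented on the GPU; the function names and the
head-flag convention (1 marks the first element of a segment) are
assumptions made for illustration.

```python
from itertools import accumulate


def exclusive_scan(values):
    """Exclusive prefix sum: [a, b, c] -> [0, a, a+b]."""
    return list(accumulate(values, initial=0))[:-1]


def copy_if(items, predicate):
    """Stream compaction: keep the items for which predicate is true.

    The exclusive scan of the 0/1 flags gives each kept item its output
    index, so every item could be written independently (in parallel).
    """
    items = list(items)
    flags = [1 if predicate(x) else 0 for x in items]
    positions = exclusive_scan(flags)
    count = positions[-1] + flags[-1] if items else 0
    out = [None] * count
    for x, keep, pos in zip(items, flags, positions):
        if keep:
            out[pos] = x
    return out


def segmented_inclusive_scan(values, head_flags):
    """Inclusive prefix sum that restarts wherever head_flags is 1."""
    out = []
    running = 0
    for v, head in zip(values, head_flags):
        running = v if head else running + v
        out.append(running)
    return out


kept = copy_if([5, 2, 8, 1, 9], lambda x: x > 4)
seg = segmented_inclusive_scan([1, 2, 3, 4, 5], [1, 0, 1, 0, 0])
```

Here kept holds the elements greater than 4 in their original order, and
seg restarts its running sum at the third element.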
I am not in a hurry, and Debian will not freeze for some time,
so in my opinion we can wait for 2012.1.
Is there some description of the planned scan improvements? I would
like to help.
Tomasz Rybak GPG/PGP key ID: 2AD5 9860
Fingerprint A481 824E 7DD3 9C0E C40A 488E C654 FB33 2AD5 9860