[PyOpenCL] PyOpenCL 2011.2
bogomips at post.pl
Tue Jan 17 15:06:09 PST 2012
On Tue, 2012-01-17 at 10:01 -0500, Andreas Kloeckner wrote:
> On Tue, 17 Jan 2012 15:05:00 +0100, Tomasz Rybak <bogomips at post.pl> wrote:
> > On Mon, 2012-01-16 at 20:58 -0500, Andreas Kloeckner wrote:
> > > Hi Tomasz,
> > >
> > > >
> > > > I think I found it.
> > > > As with the CUDA reduction bug (related to Fermi), it again seems
> > > > to be related to overly eager concurrency when reducing results.
> > > > According to http://oscarbg.blogspot.com/2009/10/news-from-web.html
> > > > "Actually the wavefront size is only 64 for the highend cards(48XX,
> > > > 58XX, 57XX), but 32 for the middleend cards and 16 for the lowend
> > > > cards."
> > > > IMO we should use PREFERRED_WORK_GROUP_SIZE_MULTIPLE to get
> > > > non_sync_size. At the same time we lose SIMD CPU optimisation,
> > > > and I do not know for now how to fix both at once.
> > > > The attached patch fixes the problem on Loveland without breaking
> > > > anything on NVIDIA ION.
> > > >
> > > > While investigating this I found further problems with the
> > > > reasonable_work_* function. First, dev.warp_size_nv was raising
> > > > LogicError (not AttributeError), so I changed it to be the same as
> > > > in get_simd_group_size. Second, there was a problem with getting
> > > > attributes from a kernel that was compiled but not built. I had to
> > > > add prg.build() and __kernel and __global; without those I was
> > > > getting a SEGFAULT from the AMD OpenCL libraries.
> > >
> > > Thank you very much for investigating this, and for your fixes. I've
> > > changed your fix slightly, in that get_simd_group() now *uses*
> > > reasonable_work_group_size_multiple to find its best guess at the AMD
> > > GPU wavefront size.
> > >
> > > I'd much appreciate if you could check the current code and report
> > > back. We can then debate what to do about PyOpenCL 2012.1 (yes, it'll be
> > > that).
> > The code works OK on both Loveland and ION (all tests except image on
> > CPU pass). After your changes, though, I had to add
> > pyopencl.characterize to setup.py (patch attached) so that the
> > package installs characterize on Debian.
> Good catch, thanks. Applied. Now there are two options: Release as-is,
> or add a bit more 'scan magic'. By that I mean a) segmented scan and b)
> all those little scan-based magic tricks that Thrust can do--copy_if,
> unique_by_key, etc. Given that we have a working scan, those aren't hard
> to add. It would take about a week, I guess. I'll leave the choice up to
I am not in a hurry, and Debian will not freeze for some time,
so in my opinion we can wait for 2012.1.
Is there some description of the planned scan improvements? I would
like to help.
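
For readers unfamiliar with the scan-based tricks mentioned in the quoted
message, here is a rough sketch of how copy_if can be built on top of a
working scan. It is sequential pure Python for illustration only, not
PyOpenCL code; the names exclusive_scan and copy_if are hypothetical and
chosen just for this example.

```python
from itertools import accumulate


def exclusive_scan(values):
    """Exclusive prefix sum: out[i] == sum(values[:i])."""
    values = list(values)
    if not values:
        return []
    # Inclusive running sums, shifted right by one with a leading 0.
    return [0] + list(accumulate(values))[:-1]


def copy_if(items, predicate):
    """Keep the items for which predicate is true, preserving order.

    Illustrative only: a data-parallel version would run the scan and
    the scatter step on the device.
    """
    items = list(items)
    if not items:
        return []
    # 1. Flag each element: 1 if it should be kept, 0 otherwise.
    flags = [1 if predicate(x) else 0 for x in items]
    # 2. The exclusive scan of the flags gives each kept element
    #    its index in the output.
    positions = exclusive_scan(flags)
    # 3. Output length is the last position plus the last flag.
    count = positions[-1] + flags[-1]
    out = [None] * count
    # 4. Scatter: each iteration is independent of the others.
    for i, x in enumerate(items):
        if flags[i]:
            out[positions[i]] = x
    return out
```

For example, copy_if([5, 2, 8, 1, 9], lambda x: x > 4) yields the flags
[1, 0, 1, 0, 1] and the positions [0, 1, 1, 2, 2], so the kept elements
land in slots 0, 1 and 2. Because every step except the scan is
independent per element, a working parallel scan is the only hard part.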
Tomasz Rybak GPG/PGP key ID: 2AD5 9860
Fingerprint A481 824E 7DD3 9C0E C40A 488E C654 FB33 2AD5 9860