Well.. *that* worked. ;-)
I have no clear idea what it all means... I'll have to track down some docs.
Thanks for the tip!
On Jan 20, 2012, at 10:55 AM, Andreas Kloeckner wrote:
On Thu, 19 Jan 2012 15:18:55 -0700, Steve Spicklemire <steve(a)spvi.com> wrote:
First, thanks much for your reply. I tried the luxury dial (set it
to zero) and got a factor of 3 speedup! So that's encouraging. My
comparison is a similar approach with weave.inline, not threaded, all
[...]
CPU giving me 10**8 x,y pairs and computing pi in more like 2.8
seconds wall time.
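
For reference, the kind of estimate being timed here can be sketched in plain Python. This is only a minimal illustration of the Monte Carlo method, not the weave.inline or PyOpenCL code from this thread; the function name estimate_pi and the small sample count are assumptions chosen so that it runs quickly.

```python
import random
import time


def estimate_pi(n_pairs, seed=0):
    """Estimate pi from n_pairs uniform (x, y) samples in the unit square."""
    rng = random.Random(seed)  # seeded so that runs are repeatable
    inside = 0
    for _ in range(n_pairs):
        x = rng.random()
        y = rng.random()
        # Count points that fall inside the quarter circle of radius 1.
        if x * x + y * y <= 1.0:
            inside += 1
    # The quarter circle covers pi/4 of the unit square.
    return 4.0 * inside / n_pairs


if __name__ == "__main__":
    n = 200_000  # hypothetical sample count, far below the 10**8 discussed
    start = time.perf_counter()
    pi_hat = estimate_pi(n)
    elapsed = time.perf_counter() - start
    print(f"pi ~ {pi_hat:.4f} from {n} pairs in {elapsed:.3f} s wall time")
```

Timing the call with time.perf_counter gives a wall-time figure comparable in kind to the 2.8 seconds quoted above, although an interpreted loop like this one will be much slower per sample than compiled code.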
I guess I was hoping for a significant speedup going to a GPU
approach. (note I'm naturally uninterested in the actual value of pi!
I'm just trying to figure out how to get results out of a GPU. I'm
building a small cluster with 6 baby GPUs and I'd like to get smart
about making use of the resource)
I'm also a little worried about the warning I'm getting about "can't
query SIMD group size". Looking at the source it appears the platform
is returning "Apple" as a vendor, and that case is not treated in the
code that checks, so it just returns None. When I run
'dump_properties' I see that the max group size is pretty big!
Anyway, I'll try your idea of using enqueue_marker to try to track
down what's really taking the time. (I guess 60% of it *was*
generating excessively luxurious random numbers!) But I still feel I
should be able to beat the CPU by quite a lot.
and rerun your code. The driver will have written a profiler log file
that breaks down what's using time on the GPU. (This might not be true
on Apple CL if you're on a MacBook, not sure if that provides an
equivalent facility. If you find out, please report back to the list.)
Next, take into account that a GT330M lags by a factor of ~6-7 compared
to a 'real' discrete GPU, firstly in mem bandwidth (GT330M: 25 GB/s, good
discrete chip: ~180 GB/s), and, less critically, in processing
power. Also consider that your CPU can probably get to ~10 GB/s mem
bandwidth if used well.
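
To put those bandwidth numbers in perspective, here is a rough back-of-the-envelope sketch. It assumes the figures are meant in GB/s (25, 180 and 10), that each x,y pair is stored as two float32 values (8 bytes), and that the samples cross the memory bus exactly once. None of that is confirmed for the actual code in this thread, and the helper name stream_time_seconds is made up for illustration.

```python
def stream_time_seconds(n_bytes, bandwidth_gb_per_s):
    """Lower bound on the time to move n_bytes once at the given bandwidth."""
    return n_bytes / (bandwidth_gb_per_s * 1e9)


# Assumption: each (x, y) pair is two float32 values, i.e. 8 bytes.
N_PAIRS = 10**8
BYTES_PER_PAIR = 8
TOTAL_BYTES = N_PAIRS * BYTES_PER_PAIR  # 0.8 GB in total

# Bandwidth figures from the message, read as GB/s (assumption).
BANDWIDTH_GB_PER_S = {
    "GT330M": 25.0,
    "discrete GPU": 180.0,
    "CPU": 10.0,
}

if __name__ == "__main__":
    for name, bw in BANDWIDTH_GB_PER_S.items():
        t_ms = stream_time_seconds(TOTAL_BYTES, bw) * 1e3
        print(f"{name}: at least {t_ms:.1f} ms to stream the samples once")
```

These are lower bounds only. A kernel that generates its random numbers on the device and never stores them would move far less data, so under these assumptions the measured wall time is more likely dominated by something other than raw memory bandwidth.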