[PyOpenCL] Trouble understanding/applying ReductionKernel
lists at informa.tiker.net
Thu Jan 19 12:27:38 PST 2012
On Thu, 19 Jan 2012 12:30:54 -0700, Steve Spicklemire <steve at spvi.com> wrote:
> OpenCL/CUDA newbie here, trying to use PyOpenCL/PyCUDA to learn my way around (I use Python a lot!). I have examples of what I've been trying to do to get familiar with the software. I'm trying to do a Monte Carlo (MC) calculation of pi using the ReductionKernel. Here's what I've found:
> I'm running on a MacBook Pro with GeForce GT 330M graphics.
> I must be missing something basic. Both of these approaches are very
I.e. 10**8 samples in 15 s, which is roughly 6.7M samples/s. What's your
reference value? Also note that clrandom has a 'luxury' value that can be
turned down to get random numbers faster. Further, it would help to know
which part is slow. Python profiles are unfortunately unhelpful here, as the
GPU runs asynchronously and the host only blocks on the outbound data transfer
(that's clearly visible in the CL profile, PyCUDA seems a bit more
Use cl.enqueue_marker with a profiling-enabled command queue to figure
out what is actually taking the time, the reduction or the RNG.
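
A rough sketch of what that could look like, in case it helps. This is not
the original poster's code: the names profile_mc_pi, count_hits and
stage_times_ms are made up for illustration, the kernel expression is just one
way to count hits inside the quarter circle, and I am assuming a PyOpenCL
version where clrandom.rand(queue, shape, dtype) and event.profile.end are
available. Marker timestamps also include any time the device sat idle
waiting for the host, hence the warm-up pass.

```python
def stage_times_ms(marks_ns):
    """Turn ordered (label, device_timestamp_ns) marks into per-stage ms."""
    return {
        label: (t - prev_t) * 1e-6
        for (_, prev_t), (label, t) in zip(marks_ns, marks_ns[1:])
    }


def profile_mc_pi(n=10**7):
    # Imported lazily so the helper above works without an OpenCL install.
    import numpy as np
    import pyopencl as cl
    import pyopencl.clrandom as clrandom
    from pyopencl.reduction import ReductionKernel

    ctx = cl.create_some_context()
    queue = cl.CommandQueue(
        ctx, properties=cl.command_queue_properties.PROFILING_ENABLE)

    # Hypothetical map/reduce: 1 for each point inside the unit quarter circle.
    count_hits = ReductionKernel(
        ctx, np.float32, neutral="0", reduce_expr="a+b",
        map_expr="(x[i]*x[i] + y[i]*y[i] < 1.0f) ? 1.0f : 0.0f",
        arguments="__global const float *x, __global const float *y")

    # Warm-up pass so kernel compilation does not land in the timed region.
    w = clrandom.rand(queue, 1024, np.float32)
    count_hits(w, w, queue=queue).get()
    queue.finish()

    start = cl.enqueue_marker(queue)
    x = clrandom.rand(queue, n, np.float32)
    y = clrandom.rand(queue, n, np.float32)
    after_rng = cl.enqueue_marker(queue)
    hits = count_hits(x, y, queue=queue)
    after_reduction = cl.enqueue_marker(queue)
    after_reduction.wait()  # profiling data is only valid once complete

    marks = [("start", start.profile.end),
             ("rng", after_rng.profile.end),
             ("reduction", after_reduction.profile.end)]
    return 4.0 * float(hits.get()) / n, stage_times_ms(marks)
```

Calling pi_estimate, times = profile_mc_pi() should then give a dict with
"rng" and "reduction" entries in milliseconds, which is enough to see which
of the two dominates on a given device.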