Thanks for the tips, they helped a lot. I've got the main loop down to this:
# set-args ahead of time
queues = [p._enqueue_args[0] for p in plans]
kerns = [p._enqueue_args[1] for p in plans]
gsize = [p._enqueue_args[2] for p in plans]
lsize = [p._enqueue_args[3] for p in plans]
[map(cl.enqueue_nd_range_kernel, queues, kerns, gsize, lsize)
for i in xrange(n_calls)]
This appears to be low enough overhead for the graph I'm working with:
the end-to-end wall time is about the same as the total time spent in
the kernels, as far as I can tell given the noise in the timing
measurements.
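For what it's worth, here's roughly how I'd sanity-check the pure Python
dispatch overhead of that loop: replace the enqueue call with a stub that
just counts invocations, and time the loop on its own. Everything below
(the stub name, the list sizes) is illustrative rather than PyOpenCL API;
note also that on Python 3, where map() is lazy, you need to wrap it in
list() to force the calls.

```python
import timeit

# Stub standing in for cl.enqueue_nd_range_kernel -- it only counts
# calls, so the timing below measures Python dispatch overhead alone.
calls = []

def enqueue_stub(queue, kern, gsize, lsize):
    calls.append(kern)

# Illustrative pre-built argument lists, one entry per kernel in the graph.
queues = ["q"] * 8
kerns = ["k%d" % i for i in range(8)]
gsizes = [(64,)] * 8
lsizes = [(8,)] * 8

n_calls = 100
elapsed = timeit.timeit(
    # list(...) forces the lazy Python 3 map; on Python 2, map is eager.
    lambda: [list(map(enqueue_stub, queues, kerns, gsizes, lsizes))
             for _ in range(n_calls)],
    number=1)

print(len(calls))  # 8 kernels * 100 iterations = 800 stub enqueues
```

Comparing `elapsed` here against the wall time of the real loop tells you
how much of the overhead is Python itself versus the OpenCL runtime.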
On Tue, May 7, 2013 at 6:56 PM, Andreas Kloeckner wrote:
James Bergstra <james.bergstra(a)gmail.com> writes:
Hi, I have written an OpenCL program that involves relatively small
kernels. For a certain benchmarking script, the time spent in the
kernels adds up to 0.06 seconds, while the tightest Python loop I can
think of still requires 0.2 seconds to execute the 5000-or-so kernel
calls. The program involves repeatedly looping through the same kernels
with the same arguments, so I was wondering if there was a way to
enqueue several ND-range kernels at once, at least from Python's
perspective. Is there such a thing?
In other words, supposing I have kernels A and B, taking arguments x and
y, my program consists of:
A(x); B(y); A(x); B(y); ....
Ideally, I would like to enqueue 100 copies of the kernel sequence
[(A, x), (B, y)], but being able to enqueue even [(A, x), (B, y)] with
one call instead of two could be a big help.
What you're saying is that Kernel.__call__ is too slow for your current
needs.
First off, it'd be great if you could take a look at Kernel.set_args, to
see if there's any fat that could be trimmed from your perspective. I've
tried to keep this code path as quick as I could, but there might be
something I've overlooked.
Next, if there's nothing to be had in that direction, you can simply
call Kernel.set_args once and then repeatedly call
cl.enqueue_nd_range_kernel() as done in Kernel.__call__ (see source link
above). That should get reasonably close to the rate that the OpenCL API
itself can sustain.
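Concretely, the "set args once, enqueue many times" pattern looks
something like the sketch below. The kernel and enqueue function are
stubbed out with plain Python here so it runs without an OpenCL device;
with real PyOpenCL the equivalent calls are knl.set_args(...) once,
followed by repeated cl.enqueue_nd_range_kernel(queue, knl, gsize, lsize).

```python
# Stand-in for pyopencl.Kernel, so the control flow is visible without
# an OpenCL device. Only set_args() and an enqueue counter are modeled.
class FakeKernel:
    def __init__(self, name):
        self.name = name
        self.args = None
        self.enqueue_count = 0

    def set_args(self, *args):
        # One-time argument binding, hoisted out of the hot loop.
        self.args = args

def enqueue_nd_range_kernel(queue, kernel, global_size, local_size):
    # Stands in for cl.enqueue_nd_range_kernel: the kernel's arguments
    # are already bound, so each call does no per-call argument setup.
    kernel.enqueue_count += 1

A = FakeKernel("A")
B = FakeKernel("B")
A.set_args("x")        # bind arguments once...
B.set_args("y")

queue = object()       # placeholder for a real cl.CommandQueue
for _ in range(100):   # ...then enqueue the sequence repeatedly
    enqueue_nd_range_kernel(queue, A, (64,), (8,))
    enqueue_nd_range_kernel(queue, B, (64,), (8,))

print(A.enqueue_count, B.enqueue_count)  # 100 100
```

This is exactly what Kernel.__call__ does internally, minus the
per-call set_args work that the pattern above amortizes away.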
Hope that helps,