Why should the overhead be measured separately? For users of these
systems, the Python overhead is unavoidable. The time spent on the GPU
alone is an important implementation detail for people improving
systems like PyCUDA, but users see that overhead reflected in their
overall application performance, so I don't see how it can be ignored.
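To make that concrete, here's a rough sketch of the end-to-end timing a
user's application actually experiences (the elementwise multiply, the
vector size, and the repetition count are just placeholders). The wall
clock below covers kernel launch, temporary allocation, and the rest of
the Python/driver path, not only GPU execution:

    import numpy as np
    from timeit import default_timer as timer

    import pycuda.autoinit  # creates a CUDA context
    import pycuda.gpuarray as gpuarray

    n = 1 << 20  # placeholder problem size
    a = gpuarray.to_gpu(np.random.randn(n).astype(np.float32))
    b = gpuarray.to_gpu(np.random.randn(n).astype(np.float32))

    nreps = 100
    t0 = timer()
    for _ in range(nreps):
        c = a * b  # each call goes through Python and the driver
    pycuda.autoinit.context.synchronize()  # wait for queued GPU work
    print("per call, end to end: %.3g s" % ((timer() - t0) / nreps))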
- bryan
On Wed, May 30, 2012 at 9:47 PM, Andreas Kloeckner
<kloeckner(a)cims.nyu.edu> wrote:
> On Wed, 30 May 2012 20:31:40 -0700, Bryan Catanzaro
> <bcatanzaro(a)acm.org> wrote:
>> Hi Igor -
>>
>> I meant that it's more useful to know the execution time of code
>> running on the GPU from Python's perspective, since Python is the one
>> driving the work, and the execution overheads can be significant.
>>
>> What timings do you get when you use timeit rather than CUDA events?
>> Also, what GPU are you running on?
> timeit isn't really the right way to measure this, I think. There's some
> amount of Python overhead, of course, and it should be measured
> separately (and of course reduced, if possible). Once that's done, see
> how long the GPU works on its part of the job for a few vector sizes,
> and then figure out the vector size above which the Python time is as
> long as the GPU time and see where that sits compared to your typical
> data size.
>
> That would be more useful, IMO.
>
> Andreas
> --
> Andreas Kloeckner
> Room 1105A (Warren Weaver Hall), Courant Institute, NYU
> http://www.cims.nyu.edu/~kloeckner/
> +1-401-648-0599
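For concreteness, the size sweep Andreas describes might look roughly
like the sketch below. The elementwise multiply, the vector sizes, and
the repetition counts are placeholders, not anything from this thread;
the idea is just to put GPU-only time (CUDA events) next to the
end-to-end time Python sees, and find the size where the overhead stops
dominating:

    import numpy as np
    from timeit import default_timer as timer

    import pycuda.autoinit  # creates a CUDA context
    import pycuda.driver as drv
    import pycuda.gpuarray as gpuarray
    from pycuda.elementwise import ElementwiseKernel

    # Placeholder workload: elementwise multiply into a preallocated output.
    mult = ElementwiseKernel("float *a, float *b, float *c",
                             "c[i] = a[i] * b[i]", "mult")

    def gpu_seconds(a, b, c, nreps=50):
        # GPU-side time per call, measured with CUDA events.
        start, end = drv.Event(), drv.Event()
        total_ms = 0.0
        for _ in range(nreps):
            start.record()
            mult(a, b, c)
            end.record()
            end.synchronize()
            total_ms += start.time_till(end)
        return total_ms * 1e-3 / nreps

    def wall_seconds(a, b, c, nreps=50):
        # End-to-end time per call as Python sees it, overhead included.
        t0 = timer()
        for _ in range(nreps):
            mult(a, b, c)
        pycuda.autoinit.context.synchronize()
        return (timer() - t0) / nreps

    for n in (1 << 10, 1 << 14, 1 << 18, 1 << 22):
        a = gpuarray.to_gpu(np.random.randn(n).astype(np.float32))
        b = gpuarray.to_gpu(np.random.randn(n).astype(np.float32))
        c = gpuarray.empty_like(a)
        mult(a, b, c)  # warm-up: compiles the kernel outside the timed region
        print("n=%8d   gpu %.3g s   wall %.3g s"
              % (n, gpu_seconds(a, b, c), wall_seconds(a, b, c)))

Below the crossover size, the wall-clock column should sit well above
the event-based column (overhead dominates); above it, the two should
converge.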