On Wed, 30 May 2012 21:58:13 -0700, Bryan Catanzaro <bcatanzaro(a)acm.org> wrote:
> Why should the overhead be measured separately? For users of these
> systems, the Python overhead is unavoidable. The time spent running
> on the GPU alone is an important implementation detail for people
> improving systems like PyCUDA, but users of these systems see overhead
> costs exposed in their overall application performance, and so I don't
> see how the overhead can be ignored.
Because whether the overhead matters depends on data size. Since the
overhead is constant across all data sizes, it will be mostly irrelevant
for big data, whereas for tiny data it might well be a dealbreaker.
That's why I think a single number doesn't cut it.
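
As a rough illustration (this assumes PyCUDA and numpy; the sizes, the
2*x+1 operation, and the iteration count are arbitrary), something along
these lines shows the constant per-call cost being amortized as the
arrays grow:

import time
import numpy as np
import pycuda.autoinit          # creates a context on the default device
import pycuda.gpuarray as gpuarray

ctx = pycuda.autoinit.context

for n in (1000, 1000000, 10000000):
    x = gpuarray.to_gpu(np.random.rand(n).astype(np.float32))
    ctx.synchronize()

    t0 = time.time()
    for _ in range(100):
        y = 2 * x + 1           # launched from Python, queued on the GPU
    ctx.synchronize()
    t1 = time.time()

    print("n=%8d: %.1f us per operation" % (n, (t1 - t0) / 100 * 1e6))

For small arrays, the reported per-operation time is dominated by launch
and Python overhead; for large ones it approaches the kernels' actual
execution time.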
In addition, there's an underlying assumption that you'll keep the GPU
busy for a while, i.e. keep the GPU's queue saturated. If you can do that
(the ability to do so being related, again, to data size), then, since
kernel launches are asynchronous, anything Python does on top runs in
parallel with the GPU, and your net run time will be exactly the same as
if the overhead had never happened.
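
To make the saturation point concrete, here's a rough sketch (again
assuming PyCUDA; the SAXPY kernel, array size and iteration count are
just placeholders): kernel launches return to Python immediately, so
whatever Python does between launches overlaps with the GPU working
through its queue.

import numpy as np
import pycuda.autoinit          # creates a context on the default device
import pycuda.driver as cuda
import pycuda.gpuarray as gpuarray
from pycuda.elementwise import ElementwiseKernel

saxpy = ElementwiseKernel(
    "float a, float *x, float *y",
    "y[i] = a * x[i] + y[i]",
    "saxpy")

x = gpuarray.to_gpu(np.random.rand(10**7).astype(np.float32))
y = gpuarray.to_gpu(np.random.rand(10**7).astype(np.float32))

start, end = cuda.Event(), cuda.Event()
start.record()
for _ in range(50):
    saxpy(np.float32(2), x, y)  # returns right away, work piles up in the queue
    # ...Python-side bookkeeping here executes while the GPU drains the queue...
end.record()
end.synchronize()
print("GPU busy for %.1f ms" % start.time_till(end))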