[PyCUDA] pyCUDA and streams
paulsson.m at gmail.com
Mon Mar 21 12:20:39 PDT 2011
The visual profiler shows overlapping mem-copies and kernel execution for
Working.py. You are probably sitting at your computer anyway, so if you are
in doubt, try it :D
(and this was one of my original questions ... how do you profile the
code if the profiler is obviously broken?)
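One way to check for overlap without involving the profiler at all is to time the same work twice, once with everything on the default stream and once with one stream per chunk, and compare the wall-clock numbers. Below is a minimal sketch of that idea, not the code from Working.py. The kernel `busy_scale`, the helpers `chunk_bounds` and `timed_run`, and the chunk sizes are all made up for illustration; the PyCUDA calls are the usual `pagelocked_empty`, `mem_alloc`, `memcpy_htod_async` / `memcpy_dtoh_async`, and `Stream`.

```python
import time

# Hypothetical kernel, only here to keep the GPU busy for a measurable time.
KERNEL_SRC = r"""
__global__ void busy_scale(float *a, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        float x = a[i];
        for (int k = 0; k < 200; ++k)   // busy work so the kernel is not trivial
            x = x * 1.0001f + 0.5f;
        a[i] = x;
    }
}
"""


def chunk_bounds(n, n_chunks):
    """Split range(n) into at most n_chunks contiguous (start, stop) pairs."""
    step = (n + n_chunks - 1) // n_chunks
    return [(s, min(s + step, n)) for s in range(0, n, step)]


def timed_run(n=1 << 22, n_chunks=4, use_streams=True):
    """Return wall-clock milliseconds for copy-in, kernel, copy-out of all chunks."""
    # Imported here so the helper above can be used without a GPU.
    import numpy as np
    import pycuda.autoinit  # noqa: F401  (creates a context on the default device)
    import pycuda.driver as cuda
    from pycuda.compiler import SourceModule

    kernel = SourceModule(KERNEL_SRC).get_function("busy_scale")
    sizes = [stop - start for start, stop in chunk_bounds(n, n_chunks)]

    # Page-locked host buffers are needed for truly asynchronous copies.
    hosts = [cuda.pagelocked_empty(m, np.float32) for m in sizes]
    for h in hosts:
        h.fill(1.0)
    devs = [cuda.mem_alloc(h.nbytes) for h in hosts]

    # stream=None means the default stream, i.e. the serialized baseline.
    streams = [cuda.Stream() if use_streams else None for _ in sizes]

    cuda.Context.synchronize()
    t0 = time.perf_counter()
    for h, d, s in zip(hosts, devs, streams):
        cuda.memcpy_htod_async(d, h, s)
        kernel(d, np.int32(h.size), block=(256, 1, 1),
               grid=((h.size + 255) // 256, 1), stream=s)
        cuda.memcpy_dtoh_async(h, d, s)
    cuda.Context.synchronize()  # wait for all streams before stopping the clock
    return (time.perf_counter() - t0) * 1e3
```

Call `timed_run(use_streams=False)` and `timed_run(use_streams=True)` a few times each (the first call includes warm-up cost) and compare. A clearly lower time for the streamed run is consistent with copies and kernels overlapping, but it is only indirect evidence, and it assumes the device supports concurrent copy and execution in the first place.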
On Mon, Mar 21, 2011 at 8:04 PM, Andreas Kloeckner
<lists at buster.tiker.net> wrote:
> On Mon, 21 Mar 2011 19:55:31 +0100, Magnus Paulsson <paulsson.m at gmail.com> wrote:
>> > Wild theory: Maybe the print statements introduce GPU synchronization?
>> > Does your observation change with multiple loops through the code?
>> > Also note that the profiler won't help you debug overlap. If it is
>> > active, all GPU activity is synchronous.
>> > Andreas
>> No. None of the above. The "Working.py" code runs with overlap under
>> the profiler, print statements included.
> CUDA 4.0 programming guide, 184.108.40.206:
> "When an application is run via a CUDA debugger or profiler (cuda-gdb, CUDA
> Visual Profiler, Parallel Nsight), all launches are synchronous."
> (and that sentence has been around for a few versions)
> Either you are or that sentence is wrong. :)
School of Computer Science, Physics and Mathematics