[PyCUDA] Do gpu function calls and context syncing show in cProfile?

Bogdan Opanchuk mantihor at gmail.com
Wed Apr 4 07:11:17 PDT 2012


Hello Tomi,

On Thu, Apr 5, 2012 at 12:01 AM, Tomi Pieviläinen
<tomi.pievilainen at iki.fi> wrote:
> On Wed, Apr 04, 2012 at 04:35:21PM +0300, Tomi Pieviläinen wrote:
>> Hi all,
>>
>> after profiling my script, pstats shows that a significant amount of
>> time is spent in a function that only calls
>> pycuda.autoinit.context.synchronize() and a few CUDA kernel calls
>> depending on a setting (different calls in the branches of an if).
>>
>> Is it really possible that most of the time is spent processing that
>> if, or are syncing or GPU function calls somehow skipped in cProfile?

The synchronization call on the host waits for all queued kernels to
finish. This is why you see it in the profiler: the kernel invocations
themselves return quickly, so the host code simply reaches the
synchronization call and waits there. If you want to profile the calls
themselves, temporarily replace the stream argument to the kernels with
None, which will make them run synchronously.
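
To illustrate why the time lands on the synchronization call, here is a
minimal sketch in plain Python that runs without a GPU. It is only an
analogy, not PyCUDA code: FakeDevice, launch(), synchronize() and
measure() are hypothetical names made up for this example, and a
single worker thread with time.sleep() stands in for the device's kernel
queue.

```python
import time
from concurrent.futures import ThreadPoolExecutor


class FakeDevice:
    """Hypothetical stand-in for a GPU: queued work runs in order on one worker."""

    def __init__(self):
        self._pool = ThreadPoolExecutor(max_workers=1)
        self._pending = []

    def launch(self, seconds):
        # Analogous to an asynchronous kernel launch: enqueue and return at once.
        self._pending.append(self._pool.submit(time.sleep, seconds))

    def synchronize(self):
        # Analogous to a host-side synchronize: block until the queue is drained.
        for future in self._pending:
            future.result()
        self._pending.clear()

    def close(self):
        self._pool.shutdown()


def timed(fn, *args):
    """Return the wall-clock time spent inside fn(*args) on the calling thread."""
    start = time.perf_counter()
    fn(*args)
    return time.perf_counter() - start


def measure(kernel_seconds=0.05, n_kernels=4):
    """Time the launches and the final synchronize separately."""
    dev = FakeDevice()
    try:
        launch_time = sum(timed(dev.launch, kernel_seconds) for _ in range(n_kernels))
        sync_time = timed(dev.synchronize)
    finally:
        dev.close()
    return launch_time, sync_time


if __name__ == "__main__":
    launch_time, sync_time = measure()
    print(f"launches: {launch_time:.4f} s, synchronize: {sync_time:.4f} s")
```

The launches only enqueue work, so almost all of the measured time is
attributed to synchronize(), which is the same pattern a host-side
profiler such as cProfile reports for asynchronous kernel calls.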

> Well, I ran it through the line_profiler, and it seems like 99% of the
> time is spent on synchronization calls. Weird that the syncthreads
> within the kernel code doesn't cause much delay (I'm running only one
> block, so they should be equivalent, right?).

The synchronization call inside the kernel only synchronizes the warps
within a block on the device. It is unrelated to the host waiting for
queued kernels to finish, so it has nothing to do with your problem.


