I have definitely run sending, retrieving and GPU calculations in
parallel with PyOpenCL on Nvidia devices using multiple queues. I did
the profiling by wrapping the PyOpenCL events on the Python host and
enriching the timing information with the queue each event belongs to.
As far as I know it also works with image objects. If you are interested
in an MWE, please let me know.
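
The event-wrapping approach can be sketched as follows. This is only a
minimal sketch under assumptions, not the code referred to above: it
assumes every event comes from a queue created with profiling enabled, so
that event.profile.start and event.profile.end (in nanoseconds) can be
read once the event has finished. The names TimedEvent, collect and
overlap_ns are hypothetical, and the events are duck-typed.

```python
from collections import namedtuple

# Hypothetical record: one profiled command, tagged with the queue it ran on.
TimedEvent = namedtuple("TimedEvent", "queue_label name start_ns end_ns")


def collect(tagged_events):
    """Turn (queue_label, name, event) triples into TimedEvent records.

    Each event is assumed to be finished and to expose
    event.profile.start / event.profile.end in nanoseconds, as PyOpenCL
    events do on a queue created with profiling enabled.
    """
    return [TimedEvent(label, name, ev.profile.start, ev.profile.end)
            for label, name, ev in tagged_events]


def overlap_ns(records):
    """Total time (ns) during which commands from at least 2 queues ran."""
    # Sweep over start/end boundaries, counting active commands per queue.
    points = []
    for r in records:
        points.append((r.start_ns, 1, r.queue_label))
        points.append((r.end_ns, -1, r.queue_label))
    points.sort(key=lambda p: (p[0], p[1]))  # at equal times, ends come first
    active = {}
    total = 0
    previous = None
    for t, delta, label in points:
        busy_queues = sum(1 for n in active.values() if n > 0)
        if previous is not None and busy_queues >= 2:
            total += t - previous
        active[label] = active.get(label, 0) + delta
        previous = t
    return total
```

If overlap_ns stays at zero for events submitted to several queues, the
commands were executed one after the other, whatever queue they were
enqueued on.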
On 04/24/2018 09:54 PM, pyopencl-request(a)tiker.net wrote:
1. Profiling events in PyOpenCL (Jerome Kieffer)
2. Re: Profiling events in PyOpenCL (Andreas Kloeckner)
3. Re: Profiling events in PyOpenCL (Vincent Favre-Nicolin)
4. Re: Profiling events in PyOpenCL (Jerome Kieffer)
5. Re: Profiling events in PyOpenCL (Andreas Kloeckner)
6. Re: Profiling events in PyOpenCL (Jerome Kieffer)
Date: Fri, 20 Apr 2018 17:26:15 +0200
From: Jerome Kieffer <Jerome.Kieffer(a)esrf.fr>
Subject: [PyOpenCL] Profiling events in PyOpenCL
As some of you may have noticed, Nvidia dropped the capability to
profile OpenCL code as of CUDA 8. I am looking into the profiling info
available in PyOpenCL's events to see if it would be possible to re-generate
Did anybody look into this? It would save me from re-inventing the wheel.
I found some "oddities" while trying to profile multi-queue processing.
I collected ~100 events, evenly distributed in 5 queues.
Every single event has a different command queue (as obtained from
event.command_queue) but they all point to the same object at the
C-level according to their event.command_queue.int_ptr.
This would be consistent with the fact that using multiple queues works
exactly at the same speed as using only one :(
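
One way to check this observation is to group the events by the C-level
handle of their queue. The following is a minimal sketch, assuming only
that each event exposes command_queue.int_ptr as described above; the
helper name count_underlying_queues is hypothetical.

```python
from collections import Counter


def count_underlying_queues(events):
    """Count events per C-level queue handle.

    Relies on event.command_queue.int_ptr: the Python wrapper object may
    differ on every access, so queues are compared by handle rather than
    by Python object identity.
    """
    return Counter(ev.command_queue.int_ptr for ev in events)
```

A result with a single key for events that were submitted to five queues
would confirm that they all ended up on the same underlying queue.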
Did anybody manage to (actually) interleave sending buffers, retrieving
buffers and calculation on the GPU with PyOpenCL?
Thanks for your help