Hi everybody,
I have definitely achieved parallel send, retrieve, and GPU computation with
PyOpenCL on Nvidia devices using multiple queues. I did the profiling by
wrapping the PyOpenCL events on the Python host and enriching the timing
information with the queue each event belongs to. As far as I know it also
works with image objects. If you are interested in an MWE, please
let me know.
Regards
Jonathan
On 04/24/2018 09:54 PM, pyopencl-request(a)tiker.net wrote:
> Today's Topics:
>
> 1. Profiling events in PyOpenCL (Jerome Kieffer)
> 2. Re: Profiling events in PyOpenCL (Andreas Kloeckner)
> 3. Re: Profiling events in PyOpenCL (Vincent Favre-Nicolin)
> 4. Re: Profiling events in PyOpenCL (Jerome Kieffer)
> 5. Re: Profiling events in PyOpenCL (Andreas Kloeckner)
> 6. Re: Profiling events in PyOpenCL (Jerome Kieffer)
>
>
> ----------------------------------------------------------------------
>
> Message: 1
> Date: Fri, 20 Apr 2018 17:26:15 +0200
> From: Jerome Kieffer <Jerome.Kieffer(a)esrf.fr>
> To: pyopencl(a)tiker.net
> Subject: [PyOpenCL] Profiling events in PyOpenCL
> Message-ID: <20180420172615.6b1072d9(a)lintaillefer.esrf.fr>
> Content-Type: text/plain; charset=UTF-8
>
> Dear all,
>
> As some of you may have noticed, Nvidia dropped the capability to
> profile OpenCL code since Cuda 8. I am looking into the profiling info
> available in PyOpenCL's events to see whether it would be possible to
> regenerate this file.
>
> Did anybody look into this? It would save me from re-inventing the wheel.
>
> I found some "oddities" while trying to profile multi-queue processing.
> I collected ~100 events, evenly distributed across 5 queues.
>
> Every single event has a different command queue (as obtained from
> event.command_queue) but they all point to the same object at the
> C-level according to their event.command_queue.int_ptr.
>
> This would be consistent with the fact that using multiple queues runs
> at exactly the same speed as using only one :(
>
> Did anybody manage to (actually) interleave sending buffers, retrieving
> buffers, and computation on the GPU with PyOpenCL?
>
> Thanks for your help
>