> On 31.10.2017 at 03:37, Seth Thompson <thompssb(a)uah.edu> wrote:
>
> Hello All,
>
> I have been trying to debug the following test kernel, and I get an out-of-resources error on my Nvidia GTX 560M (it's old!). When I switch to the Intel CPU, the out-of-resources error goes away, but the code takes a very long time to return from the wait command. From what I can see on the forums, there seem to be three possibilities:
>
> 1. A mis-referenced pointer (I tested it on a smaller problem and saw no errors)
> 2. An overrun on an array (I tested it on a smaller problem and it gives correct results)
> 3. A watchdog timer (~5 s) being tripped on Nvidia
>
I tested your code on an Nvidia GT 750M with 2 GB of memory on macOS without problems; total run time was about 1 s.
Could it be that your GPU's memory is exhausted? (Often you are not allowed to allocate all of the installed memory.)
> I do have an atomic add command in the kernel, which can cause a slowdown. I just don't think it would be slow enough to trip a timer, but honestly I don't know and need a second pair of eyes. This could also be tied to a fundamental misconception on my part:
>
> I am using one queue. If I understand correctly, each kernel call will run in the order it was enqueued from the host. I added a wait event to my kernel call; when I remove the wait command, I get a dramatic speed-up in kernel run time. However, I cannot retrieve the data from the GPU if I remove the wait command (as if the kernel were still running even though the host had regained control of program flow). This slow data retrieval leads me to need either a wait or a finish command, but both of these are slow, which brings me back around to the out-of-resources error. Where am I going wrong? Is control returned to the host before a kernel call completes? If it is not, why would it take a long time for the data to be returned to the host in a .get() call? Do I need a wait/finish command?
>
>
>
Calling a kernel returns immediately; kernel execution takes place asynchronously to the host program. But transferring data with .get() waits for the completion of the transfer (and of all preceding tasks in the queue), so it appears slow if you don't wait for the kernel execution to finish first.
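The behaviour can be illustrated without any OpenCL at all, using a plain-Python analogy (concurrent.futures stands in for the command queue; the names here are mine, not pyopencl's):

```python
import time
from concurrent.futures import ThreadPoolExecutor

def fake_kernel():
    # stands in for device-side work that runs asynchronously
    time.sleep(0.2)
    return 42

pool = ThreadPoolExecutor(max_workers=1)

t0 = time.perf_counter()
future = pool.submit(fake_kernel)   # like enqueueing a kernel: returns at once
launch_time = time.perf_counter() - t0

t0 = time.perf_counter()
result = future.result()            # like .get(): blocks until the work is done
fetch_time = time.perf_counter() - t0

print("launch: %.3fs, fetch: %.3fs, result: %d"
      % (launch_time, fetch_time, result))
```

The time does not disappear when you drop the wait; it simply moves into whichever call is the first one to synchronize.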
Gregor
> I have attached the Python function and kernel files. I apologize if this has been answered elsewhere, but I have searched and come up empty; of course, I may be searching for the wrong things.
>
>
> Thanks & Regards
>
>
> Seth
>
> <cell_count.cl> <hang_test.py>
> _______________________________________________
> PyOpenCL mailing list
> PyOpenCL(a)tiker.net
> https://lists.tiker.net/listinfo/pyopencl
Hi everybody,
I have again a problem with my Nvidia graphics cards and pyopencl. I wrote a
simple kernel that computes the (pixelwise) median of an image and writes the
result to another image. I also wrote a simple event visualiser in Python's
matplotlib to get an idea of the execution time.
I have a simple setup: one context on one device with two queues. I enqueue my
copies on one queue and my kernel executions on the other, where each copy is
associated with an event for the kernel to wait for. What I expect is that
when I repeat this process several times, the copies should execute in
parallel with the kernel execution on the other queue. What I see is that
during kernel execution there is no parallel work on the other queue.
Is this a problem with my code or with Nvidia's implementation?
I tried this on the following setups:
TITAN Xp on NVIDIA CUDA (driver version 384.81)
Tesla K10.G2.8GB on NVIDIA CUDA (driver version 375.39)
GeForce GTX TITAN on NVIDIA CUDA (driver version 384.81)
GeForce GTX TITAN on NVIDIA CUDA (driver version 375.66)
Here's an attached image showing the execution timeline for a median
on a numpy array of size (512, 512, 512) with a median window of 7.
All code is attached. To run it, the tested cards have to be able to write to
images, and for the profiling visualisation matplotlib.pyplot is needed.
Any help is very welcome, especially if this behaviour can't be reproduced on
other (preferably non-NVIDIA) setups.
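For reference, the two-queue pattern described above as a rough sketch. pyopencl and an OpenCL device are required to actually run it; `program.median`, the buffer names, and the shapes are placeholders for the attached code:

```python
# Sketch only: requires pyopencl and an OpenCL device at runtime.
try:
    import pyopencl as cl
except ImportError:
    cl = None  # lets the sketch be read without an OpenCL stack installed

def pipelined_median(ctx, program, host_chunks, dev_in, dev_out, shape):
    """Overlap host->device copies on one queue with kernel execution on a
    second queue, synchronizing via events rather than queue-level finish()."""
    copy_q = cl.CommandQueue(ctx)
    exec_q = cl.CommandQueue(ctx)
    kernel_evt = None
    for chunk in host_chunks:
        # each copy only has to wait for the previous kernel that reads dev_in
        copy_evt = cl.enqueue_copy(
            copy_q, dev_in, chunk,
            wait_for=[kernel_evt] if kernel_evt is not None else None)
        # the kernel waits for its input copy via the event, not via finish()
        kernel_evt = program.median(exec_q, shape, None, dev_in, dev_out,
                                    wait_for=[copy_evt])
    exec_q.finish()
```

One thing worth checking, by analogy with CUDA: NVIDIA hardware generally only overlaps transfers with compute when the host memory is page-locked (e.g. a buffer created with ALLOC_HOST_PTR), so copies from pageable numpy arrays may serialize regardless of the queue setup.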
Best
Jonathan Schock
La Forma Rúa <lcrua(a)utp.edu.co> writes:
> Hi, I try to run PyOpenCL, but I get what's shown in the screenshot.
> My Python version is 3.6, and I have installed the AMD SDK.
> What else can I try?
Find the PyOpenCL DLL (_cffi.pyd or _cffi.dll) and use Dependency Walker
to see what DLL it's not finding.
Andreas
Evan Sims <wx3(a)msn.com> writes:
> I still haven't been able to get this going.
>
>
> I have tried reinstalling, and I am pretty sure that I only have one PyOpenGL. I do think that I needed to add PyOpenGL-Accelerate to have the buffer available. I am pretty sure that Conda uses pip-installed packages, but it does not seem to automatically use apt-installed packages. Conda list does not show PyOpenCL, but that seems to be due to a formatting issue. I do think it could be an installation problem, but I am out of ideas on what it could be.
>
>
> Using PyOpenGL-Accelerate seems to get rid of the buffer-attribute problem, but I still just get the following.
>
>
> Traceback (most recent call last):
> File "gl_particle_animation.py", line 147, in <module>
> cl_gl_position = cl.GLBuffer(context, mf.READ_WRITE, int(gl_position.buffer))
> File "/home/g2/Downloads/pyopencl/pyopencl/cffi_cl.py", line 2373, in __init__
> ptr, context.ptr, flags, bufobj))
> File "/home/g2/Downloads/pyopencl/pyopencl/cffi_cl.py", line 649, in _handle_error
> raise e
> pyopencl.cffi_cl.LogicError: clCreateFromGLBuffer failed: INVALID_CONTEXT
How many OpenCL implementations do you have installed?
gl_interop_demo.py silently uses the first one (that should probably be
fixed, patches welcome), but if that is not the AMD one (i.e. the one
corresponding to the GL driver), then that would explain your error.
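A sketch of selecting the platform explicitly instead of taking the first one. The vendor hint and the function name are mine; pyopencl and a current GL context are required to actually run this:

```python
# Sketch only: requires pyopencl and an active OpenGL context at runtime.
try:
    import pyopencl as cl
    from pyopencl.tools import get_gl_sharing_context_properties
except ImportError:
    cl = None

def make_gl_shared_context(vendor_hint="AMD"):
    """Create a GL-sharing context on the platform matching the GL driver,
    rather than silently using cl.get_platforms()[0]."""
    for platform in cl.get_platforms():
        if vendor_hint.lower() in platform.name.lower():
            props = [(cl.context_properties.PLATFORM, platform)]
            props += get_gl_sharing_context_properties()
            return cl.Context(properties=props)
    raise RuntimeError("no OpenCL platform matching %r found" % vendor_hint)
```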
HTH,
Andreas
PS: PLEASE keep the list cc'd on problems like this. For archival, and
also to help reduce my workload. Thanks!
Pascal Dupuis <cdemills(a)gmail.com> writes:
> 2017-10-11 11:39 GMT+02:00 Pascal Dupuis <cdemills(a)gmail.com>:
>> Hello Andreas,
>>
>> As there is no pyopencl package in CentOS 7, I compiled it from the
>> checkout sources on github.
>>
>> At first, compilation failed with lots of errors. On CentOS, software is
>> rather conservative; gcc is version 4.8.5 20150623 (Red Hat 4.8.5-16).
>> It seems that it does not enable C++11 by default. This can easily be
>> solved as:
>> env CFLAGS="-std=c++11" make
>>
>>
>
> Hello,
>
> I found the issue:
> 1) I compiled pycuda. This creates '/etc/aksetup-defaults.py' with
> CXXFLAGS = []
> 2) I compiled pyopencl. This reuses the defaults from '/etc/aksetup-defaults.py'.
> It seems pycuda doesn't require C++11, so the flag is unset. When compiling
> pyopencl with those defaults, the build hangs.
Glad to hear you were able to sort out the issue. I've cc'd the list on
this reply for archival.
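For the archive, a sketch of the fix implied above: since both builds read the same defaults file, restoring the flag there avoids the problem (the exact value is an assumption based on Pascal's report; adjust to your toolchain):

```python
# /etc/aksetup-defaults.py -- shared build defaults read by aksetup-based
# packages (pycuda and pyopencl alike)
CXXFLAGS = ["-std=c++11"]
```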
Best,
Andreas