Justin Heinermann <Justin.Heinermann@...> writes:
thanks again for the suggestions. Although your assumption was right,
the problem is a different one (see below).
Am 12.03.2013 20:47, schrieb Andreas Kloeckner:
> See if the output of 'dmesg' contains lines with 'NVRM Xid ... 13'. If
> so, then that's the GPU equivalent of a segmentation fault. If not, then
> something else may be up. In any case, the fact that the code executes
> as desired on the CPU does not prove that it is correct...
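The dmesg check suggested above could be scripted roughly as follows. This is only a sketch: the exact shape of the log line ("NVRM: Xid (<device>): <code>, ...") is an assumption based on typical nVidia driver output, not something quoted in this thread, and the helper names find_xid_codes and has_xid_13 are made up for illustration.

```python
import re

# Assumed line shape: "NVRM: Xid (<device>): <code>, ..." (not taken from this thread).
XID_PATTERN = re.compile(r"NVRM: Xid \([^)]*\): (\d+)")


def find_xid_codes(dmesg_text):
    """Return all Xid codes found in the given dmesg output, as ints, in order."""
    return [int(match.group(1)) for match in XID_PATTERN.finditer(dmesg_text)]


def has_xid_13(dmesg_text):
    """True if any Xid line in the text carries code 13."""
    return 13 in find_xid_codes(dmesg_text)
```

To use it, pass in the text of the kernel log, for example the captured output of running 'dmesg' (which may need root on some systems).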
When our application crashes with the "invalid command queue" error,
the mentioned line also appears in dmesg, so it really looks like a
We also tried a version which holds the temporary lists only in private
the input buffers are READ_ONLY and the output buffers are WRITE_ONLY,
but the problem remains.
We found out that the problem had something to do with a big number of
which led to a long run time. On the nVidia platform (both Windows and
Linux; I don't know about Mac),
processes on the GPU are killed after roughly 5-10 seconds.
I think the intention is to keep the GUI responsive.
The problem can be found here, too:
The funny thing is that this behaviour only occurs when the X11 server
If I kill the X11 server and run our application on the console, it runs as
Ok, maybe a better solution is to only use kernels which are fast enough.
We can divide our algorithm into smaller steps, which was planned anyway.
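One way to keep each kernel launch short is to split the index range into chunks and launch once per chunk. The following is a minimal sketch of that idea, not our actual code: chunk_ranges, run_in_chunks and the launch callback are hypothetical names, and a suitable chunk size has to be found by experiment.

```python
def chunk_ranges(total_items, chunk_size):
    """Yield (offset, size) pairs that cover range(total_items) in order."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    for offset in range(0, total_items, chunk_size):
        # The last chunk may be smaller than chunk_size.
        yield offset, min(chunk_size, total_items - offset)


def run_in_chunks(total_items, chunk_size, launch):
    """Call launch(offset, size) once per chunk; return the number of launches.

    launch is a hypothetical callback. With PyOpenCL it might enqueue the
    kernel for `size` work-items starting at `offset` and then wait for the
    queue to finish, so that no single launch runs long enough to time out.
    """
    launches = 0
    for offset, size in chunk_ranges(total_items, chunk_size):
        launch(offset, size)
        launches += 1
    return launches
```

For example, run_in_chunks(10, 4, launch) calls launch with (0, 4), (4, 4) and (8, 2). This only helps if the work-items are independent of each other; otherwise the algorithm itself has to be split into steps.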
So the problem is fixed, but if anyone has other ideas to solve the
runtime problem for nVidia devices,
let us know. This might also be useful for other OpenCL programmers.
I wanted to thank you for this point as I have run into this same problem a
few times during the last few months, when writing experimental not-
necessarily-optimal code. A very annoying feature and one everyone working
with nVidia GPUs, OpenCL and some OS with X11 server should be aware of.
The problem itself can be very painful to identify if the timeout point does
not pop into one's mind, as removing a line there and another here might
just squeeze the execution of the kernel to the right side of the timeout
and one will mistakenly think, e.g., "hmm, ok, so I can't perform
normalize() inside three for loops? That's a tad strange..."