Thanks again for the suggestions. Although your assumption was right,
the problem is a different one (see below).
On 12.03.2013 20:47, Andreas Kloeckner wrote:
See if the output of 'dmesg' contains lines with 'NVRM Xid ... 13'. If
so, then that's the GPU equivalent of a segmentation fault. If not, then
something else may be up. In any case, the fact that the code executes
as desired on the CPU does not prove that it is correct...
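For reference, this is roughly how we checked for those messages. The sample line below is only an assumption about the typical format of NVIDIA Xid reports; on a real system you would pipe `dmesg` instead of echoing a sample:

```shell
# Hypothetical sample dmesg line (format assumed from typical NVIDIA logs):
sample='NVRM: Xid (0000:01:00): 13, Graphics Exception on GPC 0'

# The suggested check: grep the kernel log for Xid 13 reports,
# i.e. the GPU-side equivalent of a segmentation fault.
# On a real machine: dmesg | grep -E 'NVRM.*Xid.*13'
echo "$sample" | grep -E 'NVRM.*Xid.*13'
```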
When our application crashes with the "invalid command queue" error,
the mentioned line also appears in dmesg, so it really looks like the
GPU equivalent of a segmentation fault.
We also tried a version which holds the temporary lists only in private
memory; the input buffers are READ_ONLY and the output buffers are
WRITE_ONLY, but the problem remains.
We found out that the problem has to do with a large number of
operations per kernel, which leads to a long run time. On the NVIDIA
platform (both Windows and Linux; we don't know about Mac), kernels
running on the GPU are killed after something like 5-10 seconds.
I think the intention is to keep the GUI responsive.
The problem can be found here, too:
The funny thing is that this behaviour only occurs when the X11 server
is running. If I kill the X11 server and run our application on the
console, it runs as expected.
OK, maybe a better solution is to use only kernels that are fast enough.
We can divide our algorithm into smaller steps, which was planned anyway.
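As a sketch of what we mean by dividing the work: instead of one long-running launch, enqueue the same kernel several times over sub-ranges, so each launch stays under the timeout. The helper below only computes the (offset, size) pairs; the name `split_range`, the 1-D layout, and the PyOpenCL call in the comment are our own illustration, not code from our application:

```python
def split_range(total, chunk):
    """Yield (offset, size) pairs that cover [0, total) in pieces
    of at most `chunk` work-items each."""
    for off in range(0, total, chunk):
        yield off, min(chunk, total - off)

# Each pair would then become one short kernel launch, e.g. with
# PyOpenCL (indicative only):
#   for off, size in split_range(n, 1 << 20):
#       kernel(queue, (size,), None, *args, global_offset=(off,))
#       queue.finish()  # keep each launch short

print(list(split_range(10, 4)))  # -> [(0, 4), (4, 4), (8, 2)]
```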
So the problem is fixed, but if anyone has other ideas for working
around the runtime limit on NVIDIA devices, let us know. This might
also be useful for other OpenCL programmers.
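One idea we have not tried ourselves: the NVIDIA X driver reportedly has an "Interactive" option that controls the watchdog for long-running GPU computations. Something like the following in the Device section of xorg.conf might help (whether and how it affects the timeout may depend on the driver version, so please check the driver README before relying on it):

```
Section "Device"
    Identifier "nvidia-gpu"
    Driver     "nvidia"
    # Reportedly relaxes/disables the watchdog that kills long-running
    # kernels; only sensible where GUI responsiveness does not matter.
    Option     "Interactive" "False"
EndSection
```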