I think this is a good, simple & platform-independent approach. There are very few debugging options when using pyopencl on NVIDIA GPUs.

On Fri, Mar 9, 2018 at 10:09 PM, Karl Czajkowski <karlcz@isi.edu> wrote:
On Mar 09, aseem hegshetye modulated:
> Hi,
> Is there a way to debug and/or print intermediate variables while running a C kernel on the GPU via pyopencl?

This may sound funny, but if the problem isn't too irregular, I have
sometimes found it valuable to use ndarray outputs as a "printf".
Each kernel task has a different index into the array and writes its
intermediate variable(s) along an extra temporal (step) axis.  Then I
visualize the ndarray on the host by applying a color map and slicing
through the array to see patterns, with each kernel task's state
mapped to a pixel.
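A minimal sketch of this trick follows. It assumes pyopencl, NumPy and a working OpenCL device. The kernel (`newton_sqrt`, a per-task Newton iteration for a square root) and the helpers `run_with_trace` and `trace_to_rgb` are hypothetical names chosen for illustration. The blue-to-red color map, with non-finite values painted green, is just one arbitrary choice.

```python
import numpy as np

# Hypothetical example kernel: each work-item runs a Newton iteration for
# sqrt(a[gid]) and "prints" the intermediate x into row gid, column step
# of a (n_tasks, n_steps) trace buffer.
KERNEL_SRC = """
__kernel void newton_sqrt(__global const float *a,
                          __global float *out,
                          __global float *trace,
                          const int n_steps)
{
    int gid = get_global_id(0);
    float x = a[gid];
    for (int step = 0; step < n_steps; ++step) {
        x = 0.5f * (x + a[gid] / x);
        trace[gid * n_steps + step] = x;  /* the "printf" */
    }
    out[gid] = x;
}
"""


def run_with_trace(a, n_steps):
    """Run the hypothetical kernel on an OpenCL device; return (result, trace).

    Requires pyopencl and an OpenCL platform. The import is deferred so the
    NumPy-only helper below works without them.
    """
    import pyopencl as cl

    ctx = cl.create_some_context(interactive=False)
    queue = cl.CommandQueue(ctx)
    prg = cl.Program(ctx, KERNEL_SRC).build()

    a = np.ascontiguousarray(a, dtype=np.float32)
    out = np.empty_like(a)
    # One float per (task, step): this buffer costs n_tasks * n_steps * 4 bytes.
    trace = np.empty((a.size, n_steps), dtype=np.float32)

    mf = cl.mem_flags
    a_buf = cl.Buffer(ctx, mf.READ_ONLY | mf.COPY_HOST_PTR, hostbuf=a)
    out_buf = cl.Buffer(ctx, mf.WRITE_ONLY, out.nbytes)
    trace_buf = cl.Buffer(ctx, mf.WRITE_ONLY, trace.nbytes)

    prg.newton_sqrt(queue, (a.size,), None, a_buf, out_buf, trace_buf,
                    np.int32(n_steps))
    # Blocking copies back to the host (the default for enqueue_copy).
    cl.enqueue_copy(queue, out, out_buf)
    cl.enqueue_copy(queue, trace, trace_buf)
    return out, trace


def trace_to_rgb(trace):
    """Color-map a trace array to uint8 RGB, one pixel per (task, step).

    Finite values go from blue (minimum) to red (maximum); NaN/inf, often
    the bug being hunted, are painted green.
    """
    t = np.asarray(trace, dtype=np.float64)
    finite = np.isfinite(t)
    lo, hi = (t[finite].min(), t[finite].max()) if finite.any() else (0.0, 1.0)
    span = hi - lo if hi > lo else 1.0
    norm = np.where(finite, (t - lo) / span, 0.0)

    rgb = np.zeros(t.shape + (3,), dtype=np.uint8)
    rgb[..., 0] = np.round(255 * norm)          # red rises with the value
    rgb[..., 2] = np.round(255 * (1.0 - norm))  # blue fades with the value
    rgb[~finite] = (0, 255, 0)                  # flag NaN/inf in green
    return rgb
```

On the host you might call `out, trace = run_with_trace(np.arange(1, 1025), n_steps=8)` and view `trace_to_rgb(trace)` with, e.g., matplotlib's `imshow`. Rows are tasks and columns are steps. For a 2-D NDRange, you could reshape the trace to `(height, width, n_steps)` and take `trace[:, :, k]` to get one image per step.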