On Sun, 25 Mar 2012 21:11:31 +0400, "Alexander Kiselyov"
I found a frustrating problem in pyopencl - after each kernel
host memory consumption increases by approx. 1.5 MB. Given the program's
workflow (modelling some number of iterations on the card, finishing the
kernel, reading the data via enqueue_copy, writing it to a file, then starting
the kernel again), I run out of my 4 GB of RAM after some thousands of such
In a previous project, written in C++, I solved this problem the following
way: an event object was dynamically created and passed to
enqueueNDRangeKernel(), and after reading the data back from the GPU the
event object was deleted. Obviously it's impossible to use this method in Python.
Also, it's worth noting that the memory leak occurs on both the CPU and the
GPU. I'm using an Intel CPU and an nVidia GPU.
What can be done to fix the problem?
You had me scared there for a second. I was able to reproduce the
phenomenon you describe. Fortunately, it has nothing to do with
The issue is that OpenCL does not limit the size of the queue you build
up, and this is what's causing the growth in used memory. In my
experiments, if I run a 'queue.finish()' every 100 or so submitted
kernels, a) the resulting code runs faster and b) memory usage is flat.
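
As a rough illustration of the periodic-finish idea above, here is a minimal sketch. The helper name `run_with_periodic_finish` and its parameters are made up for this example and are not part of PyOpenCL; the only thing it assumes about `queue` is that it has a `finish()` method, as `pyopencl.CommandQueue` does. The interval of 100 is just the figure from my experiments, not a tuned value.

```python
def run_with_periodic_finish(queue, launch_kernel, n_launches, finish_every=100):
    """Submit n_launches kernels, draining the queue every finish_every launches.

    queue         -- any object with a finish() method, e.g. a pyopencl.CommandQueue
    launch_kernel -- hypothetical callable taking the launch index; it is
                     expected to enqueue exactly one kernel on `queue`
    Returns the number of times queue.finish() was called.
    """
    if finish_every < 1:
        raise ValueError("finish_every must be >= 1")

    finishes = 0
    for i in range(n_launches):
        launch_kernel(i)
        # Block until everything submitted so far has completed, so the
        # pending work in the queue cannot keep growing.
        if (i + 1) % finish_every == 0:
            queue.finish()
            finishes += 1

    # Drain whatever was submitted after the last periodic finish.
    if n_launches % finish_every != 0:
        queue.finish()
        finishes += 1
    return finishes
```

With PyOpenCL, `launch_kernel` could be something like `lambda i: prg.my_kernel(queue, global_size, None, buf)`, where `prg`, `my_kernel`, `global_size` and `buf` are placeholders for your own program, kernel, grid size and buffer.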
It would be possible to stick this type of auto-finish behavior into
PyOpenCL as a non-default option, but I'm not fully convinced I should
Hope this helps,