Michael Boulton <michael.boulton(a)bristol.ac.uk> writes:
> I'm in Simon McIntosh-Smith's group at Bristol University and have been
> using PyOpenCL for a couple of weeks to convert some old Fortran code,
> but I'm having an issue with it, and Simon suggested that I talk to you.
Sure. I've cc'd the list--hope you don't mind.
> The problem is that whenever I get the OpenCL platforms (whether
> indirectly by calling create_some_context() or directly by calling
> get_platforms()), it allocates either 32 or 64 gigabytes of memory
> (seemingly at random, depending on the system and type of devices). If I
> try to delete the platform objects, the memory still stays there, so
> whenever I start a run I'm allocating a huge chunk of memory that I can
> never deallocate.
I don't think those are "real" memory allocations in the sense that they
are backed by physical system memory. You're probably seeing them in
"top" (or similar). I'm guessing they might be some sort of aperture
into which the driver maps GPU memory and various other stuff. Looking
at /proc/self/maps from within the process should give you a better idea
of what exactly is being mapped.
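For instance, here's a quick stdlib-only sketch (Linux-only; the function name is mine, nothing PyOpenCL-specific) that totals the address ranges in /proc/self/maps. Comparing the total before and after get_platforms() should show how much of that figure is merely reserved address space rather than resident memory:

```python
# Sum the sizes of all virtual address ranges mapped into this process.
# Each line of /proc/self/maps starts with "start-end" in hex.

def total_mapped_bytes(maps_path="/proc/self/maps"):
    total = 0
    with open(maps_path) as f:
        for line in f:
            start, end = line.split(None, 1)[0].split("-")
            total += int(end, 16) - int(start, 16)
    return total

if __name__ == "__main__":
    print("mapped: %.2f GiB" % (total_mapped_bytes() / 2**30))
```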
> I'm trying to do something with multiple threads at the moment. I
> am looking at what devices are available in the main thread and spawning
> one more thread for each device. If there are 2 GPUs and a CPU on the
> system, this results in it allocating over 200 GB of memory instantly,
> which is obviously not intended. Whenever I try to create a context
> after this happens, it throws a "RuntimeError: Context failed: out
> of memory".
Are you sure you're putting the contexts onto different devices?
Contexts are quite memory-hungry on the device side (on Nvidia).
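For what it's worth, here's roughly the structure I'd expect to work (a sketch, untested on your setup; `run_on_device` and `one_thread_per_device` are names I made up): create each context inside the thread that uses it, on exactly one device, so no CL state is shared between threads.

```python
import threading

def run_on_device(device):
    # Hypothetical worker: each thread builds its own context and queue
    # on a single device; nothing CL-related crosses thread boundaries.
    import pyopencl as cl
    ctx = cl.Context(devices=[device])
    queue = cl.CommandQueue(ctx)
    # ... build your program and launch kernels on `queue` here ...
    queue.finish()

def one_thread_per_device(devices):
    threads = [threading.Thread(target=run_on_device, args=(d,))
               for d in devices]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return len(threads)
```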
> (which I'm also guessing should actually show up as a pyopencl
> exception)
Could you check the type of the exception? I don't see how the current
code would throw a non-pyopencl exception.
> Before I was getting the devices like this, I was trying to do it a
> different way, but I was running into another problem which I think may
> be related to some weird internal Python thing. I was initially trying
> to create a context/command queue for each device in the main thread,
> then sending it to each spawned thread (I assume it pickles it to do
> this - I'm not that well versed in the internals of Python).
No, threads share data and address space directly. No pickles.
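A tiny demonstration (plain stdlib, nothing PyOpenCL-specific): an object handed to a thread is the very same object, not a pickled copy, so mutations made in the worker are visible to the caller:

```python
import threading

shared = {"touched_by": None}

def worker(obj):
    # Mutates the caller's dict in place -- no serialization involved.
    obj["touched_by"] = threading.current_thread().name

t = threading.Thread(target=worker, args=(shared,), name="worker-0")
t.start()
t.join()
print(shared["touched_by"])  # prints "worker-0"
```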
> The command queue would then 'become' invalid at some point:
> queue.finish() would throw an 'invalid queue' exception, but trying to
> launch a kernel using the queue would cause it to just hang silently,
> and I'd have to kill the process in Linux.
That's also how (Nvidia) OpenCL "reports" segmentation faults (for
instance), i.e. bugs in your code. Are you sure there aren't any bugs in
your code that might cause the device to crash?
Alternatively, have you looked at the output of 'dmesg' to see if
there's anything incriminating? (The messages may look like gibberish,
but they might say something important.)
Hope this helps,