First off, please make sure the list stays cc'd so that there is a
permanent record of what we find.
Michael Boulton <michael.boulton(a)bristol.ac.uk> writes:
I'm not sure which ICD loader it's using, whatever the default one is on
the systems. Is there a way to find out?
What's the last CL runtime that you installed? (Check the timestamp of
libOpenCL.so.1 to match up with install dates.) Perhaps try 'strings
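If it helps, here is a rough sketch of two checks, assuming a Linux box whose ICD loader reads vendor files from /etc/OpenCL/vendors (the usual location for the Khronos loader and ocl-icd). It lists the registered .icd files along with the library each one names, and prints the modification time of a given file such as libOpenCL.so.1. The helper names list_icds and file_mtime are made up for illustration, and the libOpenCL.so.1 path differs between distributions, so you have to pass it in yourself.

```python
import glob
import os
import time


def list_icds(vendor_dir="/etc/OpenCL/vendors"):
    """Return {icd_file_name: library_it_names} for every *.icd file in vendor_dir."""
    icds = {}
    for path in sorted(glob.glob(os.path.join(vendor_dir, "*.icd"))):
        with open(path) as f:
            # Each .icd file holds the name or path of one vendor's OpenCL library.
            icds[os.path.basename(path)] = f.read().strip()
    return icds


def file_mtime(path):
    """Modification time of path as a readable string (os.stat follows symlinks)."""
    return time.ctime(os.stat(path).st_mtime)
```

For example, print(list_icds()) and, on a Debian/Ubuntu-style layout, print(file_mtime("/usr/lib/x86_64-linux-gnu/libOpenCL.so.1")).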
This is a greatly stripped-down version of the code that causes the
The way it's used in the original code is that I have ~200 things to
run, and a process pool lets me limit it to only run as many workers
as there are devices. I then use itertools.cycle to create a cycling
iterator over the device ids (I use xrange in this example), which
passes the next free device id to each worker so it knows which device
to use. (In the real code I'm using a semaphore to make absolutely sure
they're not being used at the same time, but I don't think it's
needed?) If I'm doing something really stupid, it would be good to
know!
I've tried this with CPU-only devices, and it's fine. I believe this
fails with GPUs because fork() is unsafe once the Nvidia ICD is
initialized, which I imagine happens on the very first CL call. The
initialization probably maps some memory from the GPU into the
process's address space, and it's unclear what it means for two
processes to be fighting over a single mapping, if the mapping even
survives the fork. I asked Nvidia about this a long while back, and
their answer was, "don't do it."
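One way around this, sketched under the assumption that you're on Python 3.7 or later: ask multiprocessing for the "spawn" start method, so each worker is a fresh interpreter rather than a fork of the parent and inherits none of the parent's ICD state. All CL setup then happens inside the worker. The helper names make_spawn_executor and device_name are made up for illustration; the pyopencl calls are the standard get_platforms/get_devices/Context ones.

```python
import multiprocessing as mp
from concurrent.futures import ProcessPoolExecutor


def make_spawn_executor(n_workers):
    """Process pool whose workers are fresh interpreters, not forks of this process."""
    # With "spawn", nothing OpenCL initialized in the parent is inherited,
    # which sidesteps the fork-after-ICD-init problem.
    ctx = mp.get_context("spawn")
    return ProcessPoolExecutor(max_workers=n_workers, mp_context=ctx)


def device_name(device_index):
    """Example worker: all OpenCL setup happens inside the child process."""
    import pyopencl as cl  # imported in the worker; the parent never touches CL

    devices = [d for p in cl.get_platforms() for d in p.get_devices()]
    ctx = cl.Context(devices=[devices[device_index]])
    return ctx.devices[0].name
```

Use it from under `if __name__ == "__main__":`, e.g. `with make_spawn_executor(2) as ex: names = list(ex.map(device_name, [0, 1]))`. Spawn re-imports the main module in every worker, so the guard is required, and anything picklable you pass to the workers must be defined at module level.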
One other thing I forgot to mention: I find it a bit confusing that
platform.get_devices throws an exception when there are no devices of
the specified type available, when it would seem to make more sense to
just return an empty list. Is that just so that it causes some kind of
explicit error like how clGetDeviceIDs will return
Fixed in git, thanks.
Hope that helps,