Dear OpenCL users,
This issue is not directly related to PyOpenCL but rather to NVIDIA's OpenCL driver.
I discovered that the level of OpenCL support varies with the underlying hardware.
A typical example is `enqueue_fill_buffer` (clEnqueueFillBuffer in OpenCL), which pyopencl.array uses to zero-initialise arrays.
This function works on recent hardware (Kepler and newer)
but not on older Fermi cards (and we still have a cluster full of Tesla
In [7]: ary = pyopencl.array.zeros(queue, (10,10), "float32")
LogicError Traceback (most recent call last)
<ipython-input-7-d85341757b00> in <module>()
----> 1 ary = pyopencl.array.zeros(queue, (10,10), "float32")
/usr/lib/python3/dist-packages/pyopencl/array.py in zeros(queue, shape, dtype, order, allocator)
1973 result = Array(queue, shape, dtype,
1974 order=order, allocator=allocator)
-> 1975 result._zero_fill()
1976 return result
/usr/lib/python3/dist-packages/pyopencl/array.py in _zero_fill(self, queue, wait_for)
1191 cl.enqueue_fill_buffer(queue, self.base_data, np.int8(0),
-> 1192 self.offset, self.nbytes, wait_for=wait_for))
1194 zero = np.zeros((), self.dtype)
/usr/lib/python3/dist-packages/pyopencl/__init__.py in enqueue_fill_buffer(queue, mem, pattern, offset, size, wait_for)
1850 pattern = np.asarray(pattern)
-> 1852 return _cl._enqueue_fill_buffer(queue, mem, pattern, offset, size, wait_for)
1854 # }}}
LogicError: clEnqueueFillBuffer failed: INVALID_OPERATION
The same "bug" occurs with the PoCL driver when targeting NVIDIA GPUs,
since the corresponding low-level primitive is absent in NVVM.
I wonder whether we should work around this in our own code or whether it
could be handled at a higher level. Expecting NVIDIA to fix their driver to
conform to the specification is unrealistic. But does it
make sense to address this as part of PyOpenCL?
If so, I am willing to contribute a patch.
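For reference, the kind of user-level workaround I have in mind is sketched below. It is only an illustration, not a proposed patch: the helper name `zeros_with_fallback` and the `cl` / `cl_array` parameters are hypothetical (the parameters exist only so the logic can be exercised without a device). It tries `pyopencl.array.zeros` first and, if the driver rejects the fill with a `LogicError`, allocates with `pyopencl.array.empty` and zeroes the array with `Array.fill`, which runs a kernel instead of calling clEnqueueFillBuffer.

```python
def zeros_with_fallback(queue, shape, dtype, cl=None, cl_array=None):
    """Return a zero-filled device array even if clEnqueueFillBuffer fails.

    `cl` and `cl_array` are hypothetical injection points for testing;
    when left as None the real pyopencl modules are imported.
    """
    if cl is None:
        import pyopencl as cl
    if cl_array is None:
        import pyopencl.array as cl_array

    try:
        # Fast path: may use clEnqueueFillBuffer internally.
        return cl_array.zeros(queue, shape, dtype)
    except cl.LogicError:
        # Fallback for drivers that report INVALID_OPERATION:
        # allocate uninitialised memory, then zero it with a fill kernel.
        ary = cl_array.empty(queue, shape, dtype)
        ary.fill(0)
        return ary
```

Usage would be `ary = zeros_with_fallback(queue, (10, 10), "float32")`. Note that catching `LogicError` this broadly could also hide unrelated errors, so a real fix would probably want to check the error code or the device before falling back.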
Thanks for your advice,
I’m interested in using PyOpenCL with a Bittware 520MX board that has an Intel Stratix 10 with HBM memory. The memory shows up as 32 banks of 256 MB rather than as a single DDR memory system.
In order to use it, you must pass the flag CL_MEM_HETEROGENEOUS_INTELFPGA in calls to clCreateBuffer.
This seems like an easy addition to PyOpenCL, and I wonder whether anyone has done it already.
Otherwise, I suppose I need to get the PyOpenCL source, add the flag, and use that build rather than installing with pip? So far I haven’t found instructions for installing from source.
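Before rebuilding anything, one thing I may try is passing the flag as a plain integer, since memory flags are just bit masks OR'ed together. The sketch below rests on two assumptions I have not verified: that the flag's value is `1 << 19` (this must be checked against the vendor's `cl_ext_intelfpga.h` header), and that `pyopencl.Buffer` forwards unknown flag bits unchanged to clCreateBuffer. The helper names and the `cl` parameter are hypothetical.

```python
# ASSUMPTION: value taken from memory of the Intel FPGA extension header;
# verify against CL/cl_ext_intelfpga.h in your SDK before relying on it.
CL_MEM_HETEROGENEOUS_INTELFPGA = 1 << 19


def with_heterogeneous(flags):
    """OR the (assumed) heterogeneous-memory bit into existing mem flags."""
    return int(flags) | CL_MEM_HETEROGENEOUS_INTELFPGA


def create_hbm_buffer(ctx, nbytes, cl=None):
    """Create a read-write buffer with the vendor flag set.

    `cl` is a hypothetical injection point for testing; by default the
    real pyopencl module is imported.
    """
    if cl is None:
        import pyopencl as cl
    flags = with_heterogeneous(cl.mem_flags.READ_WRITE)
    return cl.Buffer(ctx, flags, size=nbytes)
```

If the driver rejects the buffer or the flag is silently ignored, that would suggest the constant really does need to be handled inside PyOpenCL itself.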