I am experimenting with convolution kernels for 3D arrays. I have a
scalar version that seems to work, but when I tried to reduce extra
data copies, I ran into problems. I isolated the change in my Python
code to two versions that get the source NumPy array into a PyOpenCL
array before calling the exact same OpenCL program.
First, the data-loading code that seems to work on every driver:
from numpy import float32
import pyopencl.array as cl_array

# src is a sliced view on a larger NumPy array; clq is a pyopencl CommandQueue
src = src.astype(float32, copy=True)    # explicit contiguous host copy
src_dev = cl_array.to_device(clq, src)  # enqueues a blocking host-to-device copy
This version makes an explicit host copy to consolidate the source
data into one contiguous buffer; to_device() would otherwise raise an
exception if called directly on the non-contiguous sliced view.
Second, the version that works faster on Intel and AMD CPU drivers but
shows non-determinism on the NVIDIA GPU driver:
# src is a sliced view on a larger NumPy array
src = src.astype(float32, copy=False)              # already float32, so no host copy
src_dev = cl_array.empty(clq, src.shape, float32)  # uninitialized device array
src_tmp = src_dev.map_to_host()                    # map the device buffer into host memory
src_tmp[...] = src[...]                            # consolidate the strided view into the mapped buffer
This version avoids the extra host copy because the original source is
already in float32 format, and the data is consolidated as it is
written into the host-mapped buffer.
My tests suggest that my OpenCL program is racing with the host-to-GPU
data copy: the kernel appears to see the leading portion of the
src_dev array filled with the proper values while the trailing portion
looks uninitialized. Through repeated runs with varying problem sizes,
I managed to reproduce this on some rather small test arrays that I
could inspect manually.
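To illustrate, here is a simplified sketch of the kind of check that
exposes the problem (not my exact test harness; the read-back and
comparison below are only for illustration):

import numpy as np

# Simplified sketch, not my actual test code: after filling the mapped
# buffer as above, read the device array back and compare with the source.
result = src_dev.get(queue=clq)   # blocking device-to-host read
print(np.allclose(result, src))   # expect True; this is where I intermittently
                                  # see the trailing elements differ on NVIDIA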
Is there some other synchronization call that I am supposed to make
when writing into a host-mapped array as above? Or does this look
like a bug in the interaction between PyOpenCL and the NVIDIA OpenCL
driver?
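In other words, is something along these lines required? (A sketch
only; the del and finish() below are just my guess at what might be
missing, not something I know to be the documented requirement.)

# Sketch of my guess at the missing synchronization (unverified):
src_tmp[...] = src[...]
del src_tmp   # drop the mapped array so the mapping can be released/unmapped
clq.finish()  # wait for the queue so the data is on the device before the kernel runs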
Thanks,
Karl
Hello!
I've tried to use Amazon's AWS g2.2xlarge instance and found that it
works with only half of its computing power.
Then I found that DeviceInfo.py shows that platform.version is "OpenCL
1.1 CUDA 6.5.20", while the same script on an AMD card shows "OpenCL 1.2".
Does this version difference affect the behaviour of the K520?
According to the GRID K520 specifications, it is two GPUs in one
product. Does OpenCL 1.1 see only one GPU on the K520?
How can I load the K520 to its full power?
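For reference, here is roughly how one can list what the driver
exposes (a minimal sketch using the standard pyopencl API, not the
actual DeviceInfo.py script):

import pyopencl as cl

# Print every platform and the GPU devices it exposes; on a K520 I would
# expect two GPU devices to show up if the driver reports both chips.
for platform in cl.get_platforms():
    print(platform.name, platform.version)
    for dev in platform.get_devices(device_type=cl.device_type.GPU):
        print("   ", dev.name, "- compute units:", dev.max_compute_units)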
Thank you.