I did some more digging on this error. First of all, when I re-implemented
the code in C++ I was not able to reproduce it, so at this point I'm
assuming it's a bug in PyCUDA.
By enabling CUDA traces in PyCUDA, I was able to narrow the error down to
a cuMemFree call that fails with code 700 (CUDA_ERROR_ILLEGAL_ADDRESS).
Interestingly, the error goes away if I delete the arrays manually, i.e.
the following code runs without a hitch:
    import numpy as np
    import pycuda.autoinit
    from pycuda import gpuarray
    from scikits.cuda.cublas import cublasSgemm
    import scikits.cuda.autoinit
    from scikits.cuda.misc import _global_cublas_handle as handle

    n, m, k = 131, 2483, 3
    for i in range(5):
        print(i)
        s = slice(128, n)
        b = gpuarray.empty((m, k), dtype=np.float32)
        c = gpuarray.empty((m, m), dtype=np.float32)
        a = gpuarray.zeros((n, m), dtype=np.float32)
        ks = a[s].shape[0]  # rows in the slice (3 here, same as k)
        # cuBLAS is column-major: a[s] is read as an m x ks matrix, b as ks x m.
        cublasSgemm(handle, 'n', 'n', m, m, ks, np.float32(1.0),
                    a[s].gpudata, m, b.gpudata, k, np.float32(0.0), c.gpudata, m)
        del c, a, b  # freeing the arrays explicitly avoids the error
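As a host-side sanity check that the cublasSgemm arguments above stay inside the three allocations (an out-of-bounds access by a kernel is one way to end up with an illegal-address error), the call can be emulated with NumPy. This is only a sketch assuming cuBLAS's column-major convention and its documented rule that C is not read when beta is 0; `colmajor_indices` and `sgemm_nn_colmajor` are hypothetical helpers, not part of scikits.cuda or PyCUDA, and the check says nothing about what actually happens on the device.

```python
import numpy as np


def colmajor_indices(rows, cols, ld, size, name):
    """Flat indices occupied by a column-major (rows x cols) operand with
    leading dimension `ld`, checked against the buffer length `size`."""
    if ld < max(1, rows):
        # cuBLAS rejects this case with CUBLAS_STATUS_INVALID_VALUE.
        raise ValueError("%s: leading dimension %d < rows %d" % (name, ld, rows))
    # Element (i, j) of a column-major matrix lives at flat index i + j * ld.
    idx = np.arange(rows)[:, None] + np.arange(cols)[None, :] * ld
    if idx.max() >= size:
        raise IndexError("%s: would touch element %d of a %d-element buffer"
                         % (name, idx.max(), size))
    return idx


def sgemm_nn_colmajor(m, n, k, alpha, A, lda, B, ldb, beta, C, ldc):
    """Hypothetical CPU stand-in for cublasSgemm(handle, 'n', 'n', ...).

    A, B and C are flat float32 arrays, read the way cuBLAS reads a raw
    device pointer. Returns a copy of C with the m x n result written in.
    """
    ia = colmajor_indices(m, k, lda, A.size, "A")
    ib = colmajor_indices(k, n, ldb, B.size, "B")
    ic = colmajor_indices(m, n, ldc, C.size, "C")
    result = np.float32(alpha) * (A[ia] @ B[ib])
    if beta != 0:
        # With beta == 0 cuBLAS does not read C, so garbage in C is harmless.
        result += np.float32(beta) * C[ic]
    out = C.copy()
    out[ic] = result
    return out


# Same sizes and call arguments as the repro above, but on host arrays.
n, m, k = 131, 2483, 3
rng = np.random.default_rng(0)
a = rng.standard_normal((n, m)).astype(np.float32)
b = rng.standard_normal((m, k)).astype(np.float32)
c = np.empty((m, m), dtype=np.float32)
s = slice(128, n)
ks = a[s].shape[0]

out = sgemm_nn_colmajor(m, m, ks, 1.0, a[s].ravel(), m, b.ravel(), k,
                        0.0, c.ravel(), m)
# Read back in row-major order, the column-major product equals b @ a[s].
print(np.allclose(out.reshape(m, m), b @ a[s], rtol=1e-5, atol=1e-5))
```

Passing a mismatched leading dimension, e.g. `k + 1` as `ldb`, makes the helper raise IndexError, which is the kind of mistake this check is meant to catch.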
However, if I comment out the `del` statement on the last line, the
error reappears. If I switch to a DeviceMemoryPool allocator instead, the
error appears as soon as I call `DeviceMemoryPool.free_held()`.
Cheers
Thomas