Hi PyCUDA community and author, thanks for your continued fantastic
support. Today I am running into an unexpected problem when I try to bind a
big chunk of memory to a texture. I've reduced the problem to the following
stand-alone illustrative example. I apologize that it contains some
specifics of my use case (float2 data, a 2D texture; the end use of all
this is matrix filtering):
### code starts
import pycuda.driver as cuda
import pycuda.autoinit  # initializes CUDA and creates a context
from pycuda.compiler import SourceModule

# borrowed from PyCUDA test
mod = SourceModule("""
#define CHANNELS 2
texture<float2, 2, cudaReadModeElementType> mtx_tex;

__global__ void copy_texture(float *dest)
{
    int row = threadIdx.x;
    int col = threadIdx.y;
    int w = blockDim.y;
    float2 texval = tex2D(mtx_tex, row, col);
    dest[(row*w+col)*CHANNELS + 0] = texval.x;
    dest[(row*w+col)*CHANNELS + 1] = texval.y;
}
""")

mtx_tex = mod.get_texref("mtx_tex")

width = 32*1024
height = 4330
nbytes = width * height * 2 * 4  # 2 floats per float2, 4 bytes per float

gpu = cuda.mem_alloc(nbytes)

HACK = 3
# bind a linear range of device memory to the texture reference
# (this goes through cuTexRefSetAddress); floor division keeps the byte
# count an integer (plain / would give a float on Python 3)
mtx_tex.set_address(gpu, nbytes // HACK)
### code ends
The above code runs on my Tesla C2050 for values of HACK >= 3, that is,
when asking set_address() to bind only one-third (or less) of the allocated
device memory to the texture. I need it to work for HACK = 1, but I get
this error:
### error begins
Traceback (most recent call last):
  File "texbind.py", line 26, in <module>
    mtx_tex.set_address(gpu, nbytes / HACK)
pycuda._driver.LogicError: cuTexRefSetAddress failed: invalid value
### error ends
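For reference, the size arithmetic behind the HACK values can be written out as a plain-Python sketch. It computes the byte range passed to set_address() for HACK = 1, 2, 3 and compares the implied texel count against 2**27. That cap is an assumption: it is the figure the CUDA C Programming Guide gives as the maximum width of a 1D texture reference bound to linear memory, and it is not confirmed that this is what cuTexRefSetAddress is enforcing here. The texel size is also an assumption, because the code above never calls set_format(), so the sketch checks both 8-byte (float2) and 4-byte (single float) texels. LINEAR_TEX_MAX_TEXELS, texels() and report() are hypothetical helpers for illustration only.

```python
# Size arithmetic for the set_address() call in the first example.
# ASSUMPTION: 2**27 texels is the maximum width the CUDA C Programming
# Guide lists for a 1D texture reference bound to linear memory; it is
# not confirmed that this is the limit behind the "invalid value" error.
LINEAR_TEX_MAX_TEXELS = 2 ** 27

width = 32 * 1024
height = 4330
nbytes = width * height * 2 * 4  # 2 floats per float2, 4 bytes per float


def texels(nbytes_bound, bytes_per_texel):
    """Number of whole texels covered by a byte range (hypothetical helper)."""
    return nbytes_bound // bytes_per_texel


def report(max_hack=3):
    """For each HACK value return (hack, bytes bound, fits as float2,
    fits as single float). The 8- and 4-byte texel sizes are assumptions,
    since the example never sets the texture format explicitly."""
    rows = []
    for hack in range(1, max_hack + 1):
        bound = nbytes // hack
        fits_float2 = texels(bound, 8) <= LINEAR_TEX_MAX_TEXELS
        fits_float1 = texels(bound, 4) <= LINEAR_TEX_MAX_TEXELS
        rows.append((hack, bound, fits_float2, fits_float1))
    return rows


if __name__ == "__main__":
    print("total bytes:", nbytes)
    for hack, bound, fits_float2, fits_float1 in report():
        print("HACK=%d bytes=%d fits(float2)=%s fits(float)=%s"
              % (hack, bound, fits_float2, fits_float1))
```

The full allocation is 1,135,083,520 bytes. Under the 4-byte assumption the texel count first drops below 2**27 at HACK = 3, which is at least consistent with the behaviour described above; under the 8-byte assumption only HACK = 1 exceeds it. Neither case proves the cause. PyCUDA also provides TextureReference.set_address_2d() (backed by cuTexRefSetAddress2D) for binding pitched linear memory to a 2D texture reference; it is not exercised in this post.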
I have verified (using the deviceQuery SDK sample) that these sizes are not
too large for this C2050, and I note that I seem to be able to bind the
texture to freshly copied data, i.e., I can run the following without
error after the above:
### begin code
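import numpy

# complex64 test matrix; each element corresponds to one float2 texel
data = (numpy.ones((height, width)) + 1j *
        numpy.ones((height, width))).astype(numpy.complex64)
# stack real and imaginary parts into a (height, width, 2) float32 array,
# copy it into a 2-channel CUDA array, and bind that array to the texture
carr = cuda.make_multichannel_2d_array(
    numpy.asarray(numpy.concatenate(
        (data.real[:, :, numpy.newaxis], data.imag[:, :, numpy.newaxis]), 2),
        order='C'),
    order='C')
cuda.bind_array_to_texref(carr, mtx_tex)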
### end code
This is not the prettiest way to load a complex array from the host into a
2D float2 texture, but it doesn't produce any errors and, in theory,
allocates the same amount of device memory as the first code example (NB: I
haven't verified the texture's contents with a kernel yet).
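The same (height, width, 2) float32 layout can likely be produced more simply. A complex64 element is stored as two adjacent float32 values (real, then imaginary), so a float32 view of a C-contiguous complex64 array is already interleaved like float2. The numpy-only sketch below checks that equivalence on a small stand-in matrix; the sizes and the non-constant test values are illustrative assumptions, chosen so that swapped real and imaginary parts would be detected.

```python
import numpy

# Small stand-in sizes; the post uses height = 4330, width = 32*1024.
height, width = 4, 6

# Distinct real and imaginary values so a real/imag swap would be caught.
vals = numpy.arange(height * width, dtype=numpy.float64).reshape(height, width)
data = (vals - 1j * vals).astype(numpy.complex64)

# Construction from the post: stack real and imaginary parts along a new
# trailing axis, giving a C-ordered (height, width, 2) float32 array.
stacked = numpy.asarray(numpy.concatenate(
    (data.real[:, :, numpy.newaxis], data.imag[:, :, numpy.newaxis]), 2),
    order='C')

# Shorter alternative: reinterpret the complex64 buffer as float32 pairs.
# This is a view (no copy) of the same memory as `data`.
interleaved = data.view(numpy.float32).reshape(height, width, 2)

print(stacked.dtype, stacked.shape, numpy.array_equal(stacked, interleaved))
```

If the check holds, `interleaved` could be passed to cuda.make_multichannel_2d_array(interleaved, order='C') in place of the concatenated array; that substitution has not been tested on a GPU here.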
Any hints as to what I'm doing wrong?
Thanks,
Ahmed