On Mon, 11 Apr 2011 15:25:22 +0200, Martin Hammerschmied <gestatten(a)gmail.com> wrote:
I'm having an issue with a kernel I wrote. The goal is to simulate a
recurrent neural network; the experimental code is attached. It
contains two sequential simulations: the first runs on the GPU,
the second on the CPU, just for performance comparison.
Everything seems to work just fine as long as:
- the number of timesteps (num_samples) is below 20k
- the size of the network (res_dim) is below 300 nodes
Both results are the same, and everything works as expected (GPU -> crazy
fast, CPU -> ridiculously slow). But as soon as these numbers get too
high, something strange happens. The computation seems to run, but I
get the following exception when copying the results back to
Python via gpuarray.get():
Traceback (most recent call last):
File "<ipython console>", line 1, in <module>
line 122, in runfile
File "/home/.../sandbox.py", line 71, in <module>
tmp = gpu_x.get()
line 115, in get
LaunchError: cuMemcpyDtoH failed: launch timeout
When I leave out the get() line, everything seems to work. The same
thing happens without X running.
Does someone have an idea what's going on?
CUDA reports errors belatedly. (In your case, the 'segfault' in your
kernel is reported on the next memory transfer.) If you'd like to
convince yourself that the error is there before the transfer, call
driver.Context.synchronize() right after your kernel call.
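To make that concrete, here is a minimal sketch of the pattern (assuming PyCUDA; the kernel `step` and its arguments are placeholders for illustration, not the poster's attached code):

```python
# Sketch: force an asynchronous kernel error to surface at the kernel
# launch instead of at a later memory transfer. Requires a CUDA GPU.
import numpy as np
import pycuda.autoinit
import pycuda.driver as driver
import pycuda.gpuarray as gpuarray
from pycuda.compiler import SourceModule

# Hypothetical stand-in kernel; the real network-update kernel goes here.
mod = SourceModule("""
__global__ void step(float *x, int n)
{
    int i = threadIdx.x + blockIdx.x * blockDim.x;
    if (i < n)
        x[i] *= 2.0f;
}
""")
step = mod.get_function("step")

gpu_x = gpuarray.to_gpu(np.ones(256, dtype=np.float32))
step(gpu_x, np.int32(256), block=(256, 1, 1), grid=(1, 1))

# Kernel launches are asynchronous: a failed launch is only reported by
# some *later* CUDA call. Synchronizing right here makes the error show
# up at the kernel, not at the eventual gpu_x.get() / cuMemcpyDtoH.
driver.Context.synchronize()

tmp = gpu_x.get()
```

If the synchronize() call raises, the failure really originates in the kernel launch; the cuMemcpyDtoH in the traceback is merely where CUDA first had a chance to report it. (A `launch timeout` in particular is typically the driver's watchdog killing a kernel that ran too long on a display-attached GPU.)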