Received from Ahmed Fasih on Wed, Nov 07, 2012 at 11:07:31PM EST:
Thanks Lev! These gists were really useful in understanding how to use
these functions, and they work for me too. Nonetheless, I tried and
succeeded in breaking the second one: see
First, I had to add "assert" in the calls to np.allclose to make sure
I'd be informed if things weren't all close. Then I extended the
kernel to work with multiple blocks, and finally I moved the unpinned
test first. As I increased N from 20 to 22, both tests passed. But at
N=23 (23 by 23 array), although the unpinned version works, the pinned
assertion fails and PyCUDA complains that cleanup operations failed.
I can't find any documented limit on the size of page-locked memory
allocations, but it ought to be >3 kB, right?
I'm not aware of any such limits.
Ubuntu 11.10, NVIDIA driver 304.51, CUDA 5, PyCUDA
C2050. If you or any other kind soul is able to successfully run this
gist, let me know! https://gist.github.com/4036693
When N*N > 512 (which first happens at N=23, since 23*23 = 529), the
array size (np.double().nbytes*N*N bytes) exceeds the default alignment
assumed by pycuda.driver.aligned_empty() (4096), and that mismatch
prevents all of the array elements from being properly updated. If you
preallocate a device-mapped array instead, you don't need to worry
about setting the alignment.
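
To make the arithmetic behind that threshold concrete, here is a small
sketch that only computes sizes; it does not touch PyCUDA or the GPU.
The helper names (array_nbytes, exceeds_alignment) are made up for
illustration, and the 4096 figure is simply the default alignment
quoted above.

```python
import struct

ALIGNMENT = 4096                  # default alignment quoted above (assumed here)
ITEMSIZE = struct.calcsize("d")   # size of a C double; matches np.double().nbytes

def array_nbytes(n):
    """Size in bytes of an n-by-n array of doubles."""
    return ITEMSIZE * n * n

def exceeds_alignment(n, alignment=ALIGNMENT):
    """True when the n-by-n array is larger than the alignment value."""
    return array_nbytes(n) > alignment

# Walk through the sizes tried in the gist (N = 20 .. 23).
for n in range(20, 24):
    print(n, n * n, array_nbytes(n), exceeds_alignment(n))
```

The N=22 array (484 elements) still fits within 4096 bytes, while the
N=23 array (529 elements) does not, which lines up with where the
pinned test started failing.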