[PyCUDA] global memory alignment issue
Yifei Li
yifli82 at gmail.com
Sat Jan 21 10:43:24 PST 2012
On Sat, Jan 21, 2012 at 12:17 PM, Andreas Kloeckner <lists at informa.tiker.net> wrote:
> On Sat, 21 Jan 2012 11:24:33 -0500, Yifei Li <yifli82 at gmail.com> wrote:
> > On Fri, Jan 20, 2012 at 9:58 PM, Anthony LaTorre <tlatorre9 at gmail.com> wrote:
> >
> > > add the padding field!
> >
> > I know this will work. But without the padding field, the same program
> > written in C does not have any issue.
>
> I'm really not sure what you're trying to do here. Can you explain
> properly:
>
Sorry for the confusion.
>
> 0) what are you trying to do?
>
I'm trying to see whether the struct example in the tutorial still works
without the padding field. This is the kernel function I use:
__global__ void test(DoubleOperation *a) {
    a = &a[blockIdx.x];
    printf("block %d: %d\n", blockIdx.x, a->datalen);
}
The kernel is launched using 2 blocks, each of which has one thread.
> a) what is the problem?
>
I actually have two questions.
1) The example fails to work without padding: the second block prints the
wrong 'datalen'.
However, if I use the CUDA runtime API instead of PyCUDA, the result is
correct even without the padding.
2) Since the size of the struct without padding is 12 bytes, I tried a
different struct but of the same size:
struct DoubleOperation {
    float x;
    float y;
    float z;
};
And the kernel function is changed to
__global__ void test(DoubleOperation *a) {
    a = &a[blockIdx.x];
    printf("block %d: %f %f %f\n", blockIdx.x, a->x, a->y, a->z);
}
But this time the values of x, y and z are printed correctly by both
blocks. So why does it work even though the struct's size is the same as
before?
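
A host-side check of the two layouts may help frame question 2. This is only a sketch: it uses Python's ctypes rather than PyCUDA, the class names IntThenPointer and ThreeFloats and the helper describe() are hypothetical, and it assumes the host C ABI lays the structs out the same way the CUDA compiler does for the device.

```python
import ctypes

# Hypothetical mirror of the tutorial struct with the padding field removed.
class IntThenPointer(ctypes.Structure):
    _fields_ = [("datalen", ctypes.c_int),
                ("ptr", ctypes.POINTER(ctypes.c_float))]

# Hypothetical mirror of the three-float struct from question 2.
class ThreeFloats(ctypes.Structure):
    _fields_ = [("x", ctypes.c_float),
                ("y", ctypes.c_float),
                ("z", ctypes.c_float)]

def describe(struct_type):
    """Return (size, alignment, field offsets) under the host C ABI."""
    offsets = {name: getattr(struct_type, name).offset
               for name, _ in struct_type._fields_}
    return ctypes.sizeof(struct_type), ctypes.alignment(struct_type), offsets

for struct_type in (IntThenPointer, ThreeFloats):
    size, align, offsets = describe(struct_type)
    print(struct_type.__name__, "size:", size,
          "alignment:", align, "offsets:", offsets)
```

If pointers are 8 bytes and 8-byte aligned, the first struct is 16 bytes with ptr at offset 8, while the three-float struct is 12 bytes with 4-byte alignment. Under that assumption the two structs do not have the same size, which may be why only the first one is affected.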
> b) what is this mysterious 'C program' you keep referring to? CUDA C?
>
I translated the PyCUDA code into an equivalent program that uses the CUDA
runtime API.
> c) what have you tried?
>
I tried changing the order of the fields in the struct, but the second
block still prints the wrong 'datalen':
struct DoubleOperation {
    float *ptr;
    int datalen;
};

class VecStruct:
    mem_size = 4 + numpy.intp(0).nbytes
    def __init__(self, array, struct_arr_ptr):
        self.data = cuda.to_device(array)
        cuda.memcpy_htod(int(struct_arr_ptr), numpy.intp(int(self.data)))
        cuda.memcpy_htod(int(struct_arr_ptr)+8, numpy.int32(array.size))
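
One way to compare the stride that mem_size assumes with the stride the compiler would use for the reordered struct is sketched below. It is host-only and makes no PyCUDA calls. DoubleOperationLayout and element_offsets() are hypothetical names, and the sketch assumes the device layout matches the host C ABI.

```python
import ctypes

# Hypothetical mirror of the reordered struct: pointer first, then the int.
class DoubleOperationLayout(ctypes.Structure):
    _fields_ = [("ptr", ctypes.c_void_p),
                ("datalen", ctypes.c_int32)]

def element_offsets(index):
    """Byte offsets of ptr and datalen for element `index` of a struct array."""
    stride = ctypes.sizeof(DoubleOperationLayout)  # includes any tail padding
    base = index * stride
    return (base + DoubleOperationLayout.ptr.offset,
            base + DoubleOperationLayout.datalen.offset)

# Stride computed the same way as mem_size above: 4 bytes plus a pointer.
assumed_stride = 4 + ctypes.sizeof(ctypes.c_void_p)
# Stride the C ABI uses between consecutive array elements.
actual_stride = ctypes.sizeof(DoubleOperationLayout)

print("assumed stride:", assumed_stride)
print("actual stride: ", actual_stride)
print("element 1 offsets (ptr, datalen):", element_offsets(1))
```

If the two strides differ, the host writes the second element at a different address than the one a[1] refers to in the kernel, regardless of the field order.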
> d) what were the outcomes?
>
> Andreas
>
>