I forgot to reply to the list.
On Wed, Jan 18, 2012 at 3:01 AM, Andreas Kloeckner wrote:
On Tue, 17 Jan 2012 16:55:22 -0500, Yifei Li wrote:
the '__padding' from the structure
definition and got an incorrect result.
The kernel is launched with 2 blocks and one thread in each block.
Each thread prints the 'len' field of the structure, which should be 3 for
block 0 and 2 for block 1. However, the result I got is:
block 1: 2097664
block 0: 3
There is no such problem if I write the following program in C. Any help is
It seems CUDA doesn't automatically align the pointer, without being
How do I tell CUDA to align data automatically? If this is a CUDA problem,
how come the C program does not have any issue?
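
One possible explanation, offered as a sketch rather than a diagnosis: a C compiler pads a structure so that each field sits at its natural alignment, and device code follows the same rule. In a plain C program the same compiler lays out the structure for both the code that writes it and the code that reads it, so the two always agree. If the host side instead computes the size by hand and omits the padding, the two layouts can differ. The field list below (an int followed by a pointer) is an assumption, since the structure definition is not shown here.

```python
import struct

# Assumed fields: an int length followed by a pointer (hypothetical layout).
int_size = struct.calcsize("@i")
ptr_size = struct.calcsize("@P")

# Size obtained by simply adding up the fields, with no padding.
packed_size = int_size + ptr_size

# Size with native alignment, as a C compiler would lay the fields out.
# Padding is inserted after the int so the pointer is properly aligned.
native_size = struct.calcsize("@iP")

print("hand-computed size:", packed_size)
print("compiler-style size:", native_size)
```

On a typical 64-bit system this should report 12 for the hand-computed size and 16 for the compiler-style size, which would make every array element after the first start 4 bytes later than the host expects.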
If I replace the structure
with a different structure of the same size (12 bytes)
float x, y, z;
The values of x, y and z are printed correctly.
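
A structure of three floats only needs 4-byte alignment, so its 12 bytes contain no padding and both sides agree on the stride, which may be why it prints correctly. The sketch below simulates the other case under stated assumptions: two records of an int plus a 64-bit pointer are written back to back at 12 bytes each (little-endian), then read with a 16-byte stride. The addresses are made up for illustration; the second one was chosen because 2097664 equals 0x200200, which looks like the low half of an address.

```python
import struct

# Lengths taken from the post; the 64-bit addresses are hypothetical.
records = [(3, 0x200000), (2, 0x200200)]

# Assumed host side: records packed with no padding, 12 bytes each.
buf = b"".join(struct.pack("<iQ", n, p) for n, p in records)

def read_len(data, index, stride):
    """Read the 4-byte int at the start of element `index`."""
    return struct.unpack_from("<i", data, index * stride)[0]

# Reading with the same 12-byte stride recovers both lengths.
packed = [read_len(buf, i, 12) for i in range(2)]

# Reading with a 16-byte stride lands inside the second record's pointer.
padded = [read_len(buf, i, 16) for i in range(2)]

# Three floats: 12 bytes with native alignment, so no mismatch arises.
float_stride = struct.calcsize("@3f")

print(packed)        # [3, 2]
print(padded)        # [3, 2097664]
print(float_stride)  # 12
```

If this is what is happening, either keeping the explicit padding field or sizing the host buffer from the padded layout should make block 1 print 2.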