On Sat, 28 Jan 2012 09:51:20 +1100, Bogdan Opanchuk <mantihor(a)gmail.com> wrote:
> On Sat, Jan 28, 2012 at 3:23 AM, Andreas Kloeckner wrote:
>> Indeed, inserting __syncthreads() after the shared array declaration
>> brings the error down to more reasonable values for me. Jesse, my
>> recommendation would be to use that as a workaround while we figure
>> out a more permanent fix.
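
For reference, a minimal sketch of what that workaround could look like in a kernel source string. The template below is a hypothetical, simplified reduction kernel, not PyCUDA's actual one; the names KERNEL_TEMPLATE, render_kernel and reduce_stage are assumptions made for illustration, and the block size is assumed to be a power of two.

```python
# Hypothetical, simplified reduction kernel template (NOT PyCUDA's real one).
# The only point of interest is the extra barrier after the declaration.
KERNEL_TEMPLATE = """
#define BLOCK_SIZE %(block_size)d
typedef %(out_type)s out_type;

extern "C" __global__ void reduce_stage(out_type *out, const out_type *in,
                                        unsigned int n)
{
    __shared__ out_type sdata[BLOCK_SIZE];
    __syncthreads();  // workaround: barrier right after the declaration

    unsigned int tid = threadIdx.x;
    unsigned int i = blockIdx.x * BLOCK_SIZE + tid;
    sdata[tid] = (i < n) ? in[i] : (out_type) 0;
    __syncthreads();

    // Tree reduction in shared memory; assumes BLOCK_SIZE is a power of two.
    for (unsigned int s = BLOCK_SIZE / 2; s > 0; s >>= 1) {
        if (tid < s)
            sdata[tid] += sdata[tid + s];
        __syncthreads();
    }
    if (tid == 0)
        out[blockIdx.x] = sdata[0];
}
"""


def render_kernel(out_type, block_size):
    """Fill in the element type and block size of the template."""
    return KERNEL_TEMPLATE % {"out_type": out_type, "block_size": block_size}


source = render_kernel("float", 256)
```

The rendered string is what would be handed to the compiler; nothing else about the launch changes.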
> Can't we do this:
> 1. Using "extern __shared__ out_type sdata[]" and setting the size of
> shared memory when preparing the kernel.
> 2. We can pass dtype instead of ctype to
> get_reduction_kernel_and_types(), and convert it to ctype + data size.
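
A rough sketch of the two steps proposed above, assuming a small hand-written lookup table in place of a real dtype object. DTYPE_INFO, EXTERN_DECL and shared_decl_and_size are hypothetical names; in real code the item size would more likely be read from the NumPy dtype (its itemsize attribute) than from a table like this.

```python
# Hypothetical lookup table: dtype name -> (CUDA C type name, size in bytes).
# Stands in for a real dtype-to-ctype conversion; only a few types are listed.
DTYPE_INFO = {
    "int32": ("int", 4),
    "float32": ("float", 4),
    "float64": ("double", 8),
}

# Unsized extern array: its size is supplied from the host side at launch.
EXTERN_DECL = "extern __shared__ %s sdata[];"


def shared_decl_and_size(dtype_name, block_size):
    """Return the extern declaration and the shared memory bytes per block."""
    ctype, itemsize = DTYPE_INFO[dtype_name]
    return EXTERN_DECL % ctype, itemsize * block_size


decl, nbytes = shared_decl_and_size("float64", 128)
print(decl)    # extern __shared__ double sdata[];
print(nbytes)  # 1024
```

The byte count is the value that would be given as the dynamic shared memory size when the kernel is prepared or launched.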
This issue has 'ick' written all over it. Unfortunately, the plain
extern __shared__ declaration doesn't appear to quite cut it, as Thrust
uses this bit of code here:
which casts from int4 and apparently serves to ensure alignment. (Nathan
pointed me to this.) I'm not sure why the native type wouldn't quite be
correctly aligned, so I guess I'm not fully understanding...
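
For illustration, a sketch of the general pattern being described, written as a kernel source fragment. This is not Thrust's actual code: the names ALIGNED_SHARED_SNIPPET, aligned_shared_snippet, smem and sdata are assumptions. The idea is to declare the dynamic shared buffer as int4, a 16-byte vector type with 16-byte alignment, and then view it as the element type.

```python
# Hypothetical source fragment illustrating the "cast from int4" pattern.
# It would be pasted inside a kernel body; it is not Thrust's real code.
ALIGNED_SHARED_SNIPPET = """
    // Declare the dynamic shared buffer as int4 (16 bytes per element),
    // then reinterpret it as the element type the kernel actually uses.
    extern __shared__ int4 smem[];
    %(ctype)s *sdata = reinterpret_cast<%(ctype)s *>(smem);
"""


def aligned_shared_snippet(ctype):
    """Render the fragment for a given CUDA C element type name."""
    return ALIGNED_SHARED_SNIPPET % {"ctype": ctype}


snippet = aligned_shared_snippet("double")
```

The shared memory size passed at launch stays a byte count for the element type, as before; only the declared type of the buffer changes.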