The main problem with fancy indexing is that transfers to and from
global memory become inefficient unless successive threads access
successive memory elements (somewhat simplified; read about coalescing
for more details). So you can easily implement something like
`a[:,1] = 0`, but you will have to remember that it may be slower than
`b = a.transpose(); b[1,:] = 0; a = b.transpose()`. The same applies to
random access indexing like `b[a]` where `a` is an array.
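
To make the access pattern concrete, here is a rough sketch in plain Python (no GPU involved). It assumes a row-major (C-order) layout and one thread per element written, and the helper names are made up for illustration. It only lists which flat memory offsets neighbouring threads would touch in each case:

```python
# Sketch only: assumes row-major (C-order) storage and one thread per element.
# The helper names are hypothetical; this is not PyOpenCL or reikna API.

def column_offsets(n_rows, n_cols, col):
    """Flat offsets touched by a[:, col]; thread i writes row i."""
    return [i * n_cols + col for i in range(n_rows)]

def row_offsets(n_cols, row):
    """Flat offsets touched by b[row, :]; thread j writes column j."""
    return [row * n_cols + j for j in range(n_cols)]

def strides(offsets):
    """Set of distances between offsets touched by neighbouring threads."""
    return {nxt - cur for cur, nxt in zip(offsets, offsets[1:])}

# a has shape (4, 8); a[:, 1] = 0 touches one element per row.
col = column_offsets(n_rows=4, n_cols=8, col=1)

# b is a copied as its transpose, shape (8, 4); b[1, :] = 0 touches one row.
row = row_offsets(n_cols=4, row=1)

print(col, strides(col))  # neighbouring threads are 8 elements apart
print(row, strides(row))  # neighbouring threads are 1 element apart
```

A stride of 1 is the case that can be coalesced. Whether the two extra transposes pay off depends on the array size and the device, so it is worth timing both variants.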
Allocating memory for temporary arrays may be an issue too, because GPU
memory pools are not as large as typical RAM amounts, and there's no swap
file to help (although if you hit swap in numerical calculations you're
already doing something wrong).
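
As a back-of-the-envelope check for the transposing approach, a sketch like the following estimates the peak footprint. It assumes the transpose is done out of place and that only one temporary copy is alive at a time; the function name and the default element size are made up for illustration:

```python
# Sketch only: assumes an out-of-place transpose with a single live temporary.
# The function name and the default itemsize are hypothetical.

def transpose_trick_peak_bytes(n_rows, n_cols, itemsize=4):
    """Bytes held at once: the original array plus one transposed copy."""
    array_bytes = n_rows * n_cols * itemsize
    return 2 * array_bytes

# An 8192 x 8192 array of 4-byte elements is 256 MiB on its own.
peak = transpose_trick_peak_bytes(8192, 8192)
print(peak // 2**20, "MiB")
```

If the library keeps more than one temporary around, the real peak will be higher than this estimate.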
On Sat, Jan 16, 2016 at 2:30 AM, Zac Diggum <Diggum(a)gmx.de> wrote:
Thank you for your suggestions. I must admit I'd rather stick with using
the high-level functions that come with pyopencl or reikna. Writing my
own OpenCL kernels is a little out of reach for me. I'll deal with this
when I have more complex subtasks to solve. That transposing thing of
mine works reasonably well and is still faster than padding on the host.
Newbie question: is it even possible that fancy indexing will work one
day on GPUs?
PyOpenCL mailing list