Just wondering if there are plans or a timeline for implementing NumPy-style multidimensional
and fancy indexing/slicing in PyCUDA? We use this feature a lot for repetitive data
processing. We see little performance improvement when we index on the CPU, pass the result
to the GPU, process, and pass it back. It seems to me that passing a single multidimensional
array to GPU memory and then indexing/slicing and processing entirely on the GPU would give
a substantially greater speedup.
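For concreteness, here is the kind of NumPy indexing I mean — this is plain NumPy on the CPU, just illustrating the operations (multidimensional slicing and fancy indexing with an index array) that we would like `gpuarray.GPUArray` to support directly on the device:

```python
import numpy as np

a = np.arange(24).reshape(4, 6)

# Basic multidimensional slicing: rows 1-2, every other column
sub = a[1:3, ::2]

# Fancy indexing: gather rows 0 and 3 via an integer index array
rows = np.array([0, 3])
picked = a[rows]

print(sub.shape)     # (2, 3)
print(picked.shape)  # (2, 6)
```

Today we do this slicing host-side and copy the (often much smaller) result to the device, which means every processing step pays a host-to-device round trip.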