On Tue, Aug 25, 2009 at 10:10 PM, Andreas
On Tuesday 25 August 2009, James Bergstra wrote:
Does PyCUDA support broadcasting? How can I add a vector to the rows or
columns of a 2- or 3-dimensional GpuArray?
Related: does PyCUDA support viewing of sub-regions of other
GpuArrays? Like, can I operate on just the first few rows or columns
of a matrix?
Sub-region views are implemented in 1D, but no feature that requires "true"
multidimensional arrays is implemented just yet. However, that functionality
is definitely in the plan. If you need it sooner, patches are welcome.
I've been working on this sort of thing in my own corner, and was
hoping today that PyCUDA might already have done some of the
optimization of elementwise functions for different kinds of memory
layouts and broadcasting patterns. It's not straightforward.
It probably requires the expertise of a few people to get the design
right, so I'm reluctant even to try to put a patch together. First,
it requires some changes to the data container. Some of the issues
that come up are:
- what should be the strides for broadcastable dimensions (I like 0,
but numpy does it differently)
- should strides be in data-type units or byte units
- should strides and dimensions be stored in host memory, device
memory, or both (how/when should they be synchronized?)
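To make the stride questions above concrete, here is a minimal host-side sketch in plain Python. It is not PyCUDA code, and every name in it (flat_offset, matrix_strides, and so on) is hypothetical. It assumes row-major storage and strides counted in elements, and shows how a stride of 0 lets a vector be added to every row of a matrix without copying it. The last line shows the same strides converted to byte units, assuming 4-byte float32 elements.

```python
# Hypothetical sketch: none of these names come from PyCUDA.

def flat_offset(index, strides):
    """Offset (in elements) of a multi-dimensional index, given per-dimension strides."""
    return sum(i * s for i, s in zip(index, strides))

rows, cols = 2, 3
matrix = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]  # row-major 2x3 buffer
vector = [10.0, 20.0, 30.0]              # length-3 buffer, broadcast along rows

matrix_strides = (cols, 1)  # element units: next row is cols elements away
vector_strides = (0, 1)     # stride 0: every row index maps to the same data

# Add the vector to each row by walking both buffers with their own strides.
result = [
    matrix[flat_offset((r, c), matrix_strides)]
    + vector[flat_offset((r, c), vector_strides)]
    for r in range(rows)
    for c in range(cols)
]

# The same matrix strides in byte units, assuming 4-byte (float32) elements.
itemsize = 4
matrix_byte_strides = tuple(s * itemsize for s in matrix_strides)
```

With element units the kernel indexes typed pointers directly. With byte units it has to cast and divide (or do pointer arithmetic on char*), but views with a different itemsize become representable.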
As the data structure gets more complicated, the kernels become more
complex too. My experience is that all kernels have to have a
"general" version that is pretty slow, and progressively, more and
more special cases get optimized. Kernel code generators get bloated.
How many kinds of kernels are there in PyCUDA right now? (Given that
the same code-generator can produce many elementwise kernels, I mean
to count that as one *kind* of kernel.) How many things would break
if arrays were strided?
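The "general version plus special cases" pattern I mean looks roughly like the following, sketched in plain Python on the host rather than as generated CUDA. All function names are hypothetical, strides are assumed to be in element units, and buffers are assumed to start at offset 0. The general path handles any strides (including 0 for broadcasting) but computes an offset per element. The fast path only applies when both operands are dense and row-major.

```python
import itertools

# Hypothetical sketch: none of these names come from PyCUDA.

def is_contiguous(shape, strides):
    """True if element strides describe a dense row-major layout."""
    expected = 1
    for dim, stride in zip(reversed(shape), reversed(strides)):
        if dim != 1 and stride != expected:
            return False
        expected *= dim
    return True

def add_general(a, a_strides, b, b_strides, shape):
    """Slow path: works for any strides, including 0 for broadcast dimensions."""
    out = []
    for index in itertools.product(*(range(n) for n in shape)):
        ia = sum(i * s for i, s in zip(index, a_strides))
        ib = sum(i * s for i, s in zip(index, b_strides))
        out.append(a[ia] + b[ib])
    return out

def add_contiguous(a, b):
    """Fast path: both operands are dense, so a flat loop is enough."""
    return [x + y for x, y in zip(a, b)]

def add(a, a_strides, b, b_strides, shape):
    """Dispatch to the specialized version when the layout allows it."""
    if is_contiguous(shape, a_strides) and is_contiguous(shape, b_strides):
        return add_contiguous(a, b)
    return add_general(a, a_strides, b, b_strides, shape)
```

Every extra special case (one broadcast operand, transposed input, and so on) adds another branch to the dispatcher and another kernel variant, which is where the bloat comes from.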