On Wed, 11 May 2011 13:53:41 -0400, James Bergstra <james.bergstra(a)gmail.com> wrote:
Another possibility is to start a new project that is
distinct from PyCUDA and PyOpenCL, although certainly built with
interoperability in mind. PyCUDA and PyOpenCL could be seen as
lightweight wrappers around CUDA and OpenCL. Since neither of these
standards has an n-dimensional tensor datatype, it could be seen
as a loss of focus to add one.
Disagreed, given that
a) they both already have one
b) almost everyone using them will be processing some sort of array
At the same time, a standalone project could provide a datatype that is
compatible with *both* CUDA and OpenCL.
Again, disagreed--there are enough interface differences to make this
annoying to write, ranging from explicit-vs-implicit context, to
scheduling, to CL needing buffer+index whereas CUDA is ok with just a
pointer.
What I think *is* possible is for there to be some common
infrastructure--and that's exactly where I'm headed with this package
'compyte' that you may have noticed.
The datatype consists essentially of
a float pointer and shape and stride info. Since host-allocated OpenCL
buffers can be de-referenced from non-OpenCL code, and I understand that in
some cases OpenCL-allocated buffers can be passed to CUDA routines (is this
(not as far as I know)
it would be nice if the data-type helped to gloss over
differences as much as possible.
Agreed--the two types should be as similar as humanly possible, which is
perhaps technically feasible by deriving from the same base class.
So how about starting a new project with the datatype that Fred describes,
with the small addition of a 'context' pointer or something similar, and
possibly a 'context_type' enum that says whether the allocated pointer is
an OpenCL, a CUDA, or a HOST pointer.
I'd like such a project--but I think it makes more sense to put this
into 'common' infrastructure rather than on top as a separate package.
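To make the proposal concrete, here is a rough Python sketch of the
descriptor under discussion: a data pointer plus shape and stride info, with
the suggested 'context' and 'context_type' additions. Every name in it
(ContextType, ArrayDescriptor, itemsize, offset) is hypothetical and is not
taken from PyCUDA, PyOpenCL, or compyte. The offset field is an assumption,
added only to cover the buffer+index case on the OpenCL side.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Any, Optional, Tuple


class ContextType(Enum):
    """Hypothetical tag saying where the memory was allocated."""
    HOST = 0
    CUDA = 1
    OPENCL = 2


@dataclass(frozen=True)
class ArrayDescriptor:
    """Hypothetical descriptor: pointer/handle plus shape and stride info."""
    data: int                    # raw address, or an opaque buffer handle
    shape: Tuple[int, ...]
    strides: Tuple[int, ...]     # in bytes, one entry per dimension
    itemsize: int = 4            # assumption: 4-byte floats by default
    offset: int = 0              # assumption: byte offset into the buffer
    context: Optional[Any] = None
    context_type: ContextType = ContextType.HOST

    @property
    def size(self) -> int:
        """Total number of elements."""
        count = 1
        for dim in self.shape:
            count *= dim
        return count

    def is_c_contiguous(self) -> bool:
        """True if the strides describe a dense row-major layout."""
        expected = self.itemsize
        for dim, stride in zip(reversed(self.shape), reversed(self.strides)):
            if dim != 1 and stride != expected:
                return False
            expected *= dim
        return True


row_major = ArrayDescriptor(data=0, shape=(2, 3), strides=(12, 4))
transposed = ArrayDescriptor(data=0, shape=(3, 2), strides=(4, 12))
```

Here row_major is dense, while transposed describes a view of the same
six floats with the axes swapped, so no data would need to move.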
The type can provide a python binding in C or python that supports numpy
compatibility in cases where the data is stored in a way that is compatible
with host-side pointers.
Easily doable by implementing the PEP 3118 array interface.
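As an illustration of what PEP 3118 buys, the sketch below uses only the
standard library: array.array stands in for host-side storage (an
assumption for the sake of the example, not how any of the packages
mentioned store data), and memoryview is the consumer of the buffer
protocol. NumPy consumes the same protocol, so np.asarray(view) on such an
object would share the memory in the same way.

```python
import array

# Stand-in for host-side storage: six single-precision floats.
host_storage = array.array("f", [0.0, 1.0, 2.0, 3.0, 4.0, 5.0])

# memoryview consumes the PEP 3118 buffer exported by array.array.
flat = memoryview(host_storage)

# Reinterpret the same memory as a 2x3 array; no data is copied.
# memoryview.cast needs a byte format on one side, hence the 'B' step.
matrix = flat.cast("B").cast("f", shape=[2, 3])

# Shape, strides and element format travel with the buffer.
print(matrix.shape, matrix.strides, matrix.format)

# A write through the original object is visible through the view.
host_storage[0] = 42.0
print(matrix[0, 0])
```

The point is that shape, strides, and format are part of the exported
buffer, so a consumer never has to guess the layout.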
The library can provide native C implementations of
calls to CUBLAS,
CUSPARSE, CUFFT, CURAND, and possibly some basic unary and binary
I'm hesitant to glue CU* into the interface, but I support being able to
abstractly ask for these things. (As in, "FFT this array.")
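One way to read "abstractly ask for these things" is a small dispatch
table keyed by operation and backend, so callers say "FFT this array" and
never name CUFFT directly. The sketch below is only that: register,
dispatch, and the backend strings are hypothetical names, and the "host"
implementation is a naive O(n^2) DFT used as a placeholder, not a call into
any real FFT library.

```python
import cmath

# Hypothetical registry: (operation name, backend name) -> implementation.
_IMPLEMENTATIONS = {}


def register(operation, backend):
    """Decorator that files an implementation under (operation, backend)."""
    def decorator(func):
        _IMPLEMENTATIONS[(operation, backend)] = func
        return func
    return decorator


def dispatch(operation, backend, *args):
    """Look up and call the implementation, or fail with a clear error."""
    try:
        impl = _IMPLEMENTATIONS[(operation, backend)]
    except KeyError:
        raise NotImplementedError(
            "no %r implementation for backend %r" % (operation, backend)
        ) from None
    return impl(*args)


@register("fft", "host")
def _fft_host(values):
    """Placeholder: naive discrete Fourier transform on a Python list."""
    n = len(values)
    return [
        sum(values[k] * cmath.exp(-2j * cmath.pi * m * k / n) for k in range(n))
        for m in range(n)
    ]


def fft(values, backend="host"):
    """The abstract request: 'FFT this array' on whatever backend holds it."""
    return dispatch("fft", backend, values)


spectrum = fft([1.0, 0.0, 0.0, 0.0])
```

A CUDA or OpenCL backend would register its own function under the same
operation name, leaving the caller-facing fft untouched.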
I think the main value proposition comes from implementing basic
arithmetic (with broadcasting, views, and strides) well; the FFT,
sparse, etc. are just cherries on top.
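Much of the broadcasting logic is pure shape and stride bookkeeping that is
independent of CUDA or OpenCL. A minimal sketch, assuming NumPy-style rules
(align shapes from the right, a length-1 axis stretches, and a stretched
axis gets stride 0); the function names are hypothetical:

```python
def broadcast_shape(shape_a, shape_b):
    """Return the common shape of two operands under NumPy-style rules."""
    ndim = max(len(shape_a), len(shape_b))
    # Pad the shorter shape with leading 1s so both have ndim entries.
    a = (1,) * (ndim - len(shape_a)) + tuple(shape_a)
    b = (1,) * (ndim - len(shape_b)) + tuple(shape_b)
    result = []
    for dim_a, dim_b in zip(a, b):
        if dim_a == dim_b or dim_b == 1:
            result.append(dim_a)
        elif dim_a == 1:
            result.append(dim_b)
        else:
            raise ValueError("cannot broadcast %r with %r" % (shape_a, shape_b))
    return tuple(result)


def broadcast_strides(shape, strides, target_shape):
    """Strides that view an operand as target_shape without copying.

    Axes that are missing or have length 1 get stride 0, so every index
    along them maps back to the same element.
    """
    pad = len(target_shape) - len(shape)
    padded_shape = (1,) * pad + tuple(shape)
    padded_strides = (0,) * pad + tuple(strides)
    return tuple(
        0 if dim == 1 else stride
        for dim, stride in zip(padded_shape, padded_strides)
    )


# A row of 3 floats and a column of 2 floats combine into a 2x3 result.
common = broadcast_shape((2, 1), (3,))
row_strides = broadcast_strides((3,), (4,), common)
column_strides = broadcast_strides((2, 1), (4, 4), common)
```

A kernel that takes shape and strides as arguments can then handle
broadcast operands and ordinary views through the same code path.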
This would essentially be a C package at first I think, and so
operations that require any amount of code generation should be handled by
higher level packages in different programming languages (e.g. python).
This sounds a bit like a proposal to reimplement numpy
It absolutely is--but numpy is simply not a GPU/CL-capable package, and
it won't easily grow to be one.
but the basics seem straightforward and in fact already done between
Theano and PyCUDA.
Right. (Famous last words.)
Also, I think there's not the same pressure on the interface in Python
to be good - downstream projects such as PyCUDA, Theano, and gnumpy
can make the interface more or less pythonic. It's just a matter of
refactoring. I don't know of another project that has done this.
I guess our ideas about this aren't as different as I thought at
first--the fact that you say PyCUDA is 'downstream' of that seems to say
you're talking about the same type of 'common infrastructure' that I had
in mind.
We could do a Skype conference to kick this off.