Judging from old mailing list messages, there has been interest in the
past in wrapping the AMD OpenCL BLAS library, but no one had gotten
around to it yet:
So I wrote my own wrapper the other day, only to find that Lars Ericson has
just done a very similar thing:
There are some differences, though. I'm interfacing with the new clBLAS
library as available on GitHub (https://github.com/clMathLibraries/clBLAS).
The interfaces are also different: Lars incorporates a lot more stuff in
the Cython files, so that you can write your own Cython programs to use the
BLAS functions, whereas my approach is just to make basic wrappers for the
functions so that they're easy to call on PyOpenCL Arrays from Python.
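As a rough illustration of what a wrapped GEMM call is expected to compute, here is a minimal host-side reference in NumPy. It does not use my wrapper or clBLAS at all; the name reference_gemm and the test shapes are made up for this sketch, and the commented-out comparison assumes the device result is a PyOpenCL Array that can be copied back with .get().

```python
import numpy as np


def reference_gemm(alpha, a, b, beta, c):
    """Host-side reference for BLAS GEMM: alpha * (a @ b) + beta * c."""
    return alpha * (a @ b) + beta * c


# Small single-precision test matrices (shapes chosen arbitrarily).
rng = np.random.default_rng(0)
a = rng.standard_normal((4, 3)).astype(np.float32)
b = rng.standard_normal((3, 5)).astype(np.float32)
c = rng.standard_normal((4, 5)).astype(np.float32)

expected = reference_gemm(2.0, a, b, 0.5, c)

# Hypothetical check against a GPU result held in a PyOpenCL Array
# called device_result (not defined here):
# np.testing.assert_allclose(device_result.get(), expected, rtol=1e-4)
```

Comparing against a reference like this on small matrices is a cheap sanity check before moving on to timing larger ones.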
I haven't done a lot of profiling yet, but the performance seems to be
pretty good on my NVIDIA GPU, especially for larger matrices (I haven't
done a comparison with cuBLAS, though).
My wrapper is available here:
Hopefully it's not too difficult to install; I've only tried it on Linux.