[PyCUDA] cuBLAS on gpuarray
Paul Northug
pnorthug at gmail.com
Sat Apr 17 15:29:18 PDT 2010
Ying Wai (Daniel) Fan <yfan at ...> writes:
>
>
> > 2. When I do sgemm(a, b, c) where a and b are gpuarrays, I am getting
> > c = np.dot(b, a) instead of c = np.dot(a, b). Does gpuarray convert
> > row-major format to something else (column-major?) in its internal
> > representation? Or am I calling sgemm incorrectly?
> >
> I have a wrapper for CUBLAS in my Python package PARRET. I know exactly
> what is happening here. Let me just quote what I have in PARRET's
> documentation.
>
> Since GPUArray stores matrix entries in row-major order but CUBLAS
> assumes column-major order, care needs to be taken when passing
> GPUArray objects as arguments.
>
> * No change needs to be made for BLAS 1 functions.
> * For BLAS 2 functions, the matrix is interpreted as its transpose,
> so the transp flag needs to be set accordingly.
> * For BLAS 3 functions, the input matrices and the output matrix are
> interpreted as their transposes, so the order of the matrix
> multiplication needs to be switched, while the transp flags should
> remain unchanged.
>
> Ying Wai (Daniel) Fan
>
>
That answers my question, thanks. This behavior is what confused me:
>>> import numpy as np
>>> import pycuda.autoinit
>>> import pycuda.gpuarray as gpuarray
>>> a = np.random.randn(4).astype(np.float32).reshape((2, 2), order='F')
>>> np.all(a.T == gpuarray.to_gpu(a).get())
True
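The layout rules quoted above can be checked on the CPU with NumPy alone. This is a minimal sketch, not PARRET's or PyCUDA's code: `colmajor_sgemm` and `colmajor_sgemv` are hypothetical stand-ins that read flat buffers the way a column-major BLAS would, and `raw_copy` assumes an upload/download that copies the array's bytes without looking at its memory order, which is the behavior the session above shows.

```python
import numpy as np


def raw_copy(arr):
    """Copy an array's bytes in memory order and read them back as C order.
    Assumption: this mimics a transfer that ignores the array's layout."""
    flat = np.frombuffer(arr.tobytes(order="A"), dtype=arr.dtype)
    return flat.reshape(arr.shape)


def as_column_major(buf, rows, cols):
    """View a flat buffer as a column-major library would: (i, j) -> buf[i + j*rows]."""
    return buf.reshape((rows, cols), order="F")


def colmajor_sgemm(a_buf, b_buf, m, k, n):
    """Hypothetical column-major sgemm with both transp flags 'N':
    (m x k) @ (k x n) -> m x n, returned as a flat column-major buffer."""
    a = as_column_major(a_buf, m, k)
    b = as_column_major(b_buf, k, n)
    return (a @ b).ravel(order="F")


def colmajor_sgemv(trans, a_buf, x, m, n):
    """Hypothetical column-major sgemv on an m x n matrix:
    trans='N' computes A @ x, trans='T' computes A.T @ x."""
    a = as_column_major(a_buf, m, n)
    return a @ x if trans == "N" else a.T @ x


rng = np.random.default_rng(0)

# A layout-unaware copy of a Fortran-ordered array comes back transposed.
f = np.asfortranarray(rng.standard_normal((2, 2)).astype(np.float32))
roundtrip = raw_copy(f)

# Row-major operands, flattened the way a C-ordered GPU buffer holds them.
a = rng.standard_normal((2, 3)).astype(np.float32)  # 2 x 3
b = rng.standard_normal((3, 4)).astype(np.float32)  # 3 x 4
a_buf, b_buf = a.ravel(), b.ravel()

# BLAS 3: the library sees a.T and b.T, so swap the operands (and m/n):
# b.T @ a.T = (a @ b).T, whose column-major buffer is a @ b in row-major order.
c = colmajor_sgemm(b_buf, a_buf, m=4, k=3, n=2).reshape(2, 4)

# Calling it in the "natural" order on square matrices yields b @ a instead.
sa = rng.standard_normal((3, 3)).astype(np.float32)
sb = rng.standard_normal((3, 3)).astype(np.float32)
naive = colmajor_sgemm(sa.ravel(), sb.ravel(), m=3, k=3, n=3).reshape(3, 3)

# BLAS 2: the library sees a.T (3 x 2), so set trans='T' to get a @ x.
x = rng.standard_normal(3).astype(np.float32)
y = colmajor_sgemv("T", a_buf, x, m=3, n=2)

print(np.array_equal(roundtrip, f.T))          # True
print(np.allclose(c, a @ b, atol=1e-5))        # True
print(np.allclose(naive, sb @ sa, atol=1e-5))  # True
print(np.allclose(y, a @ x, atol=1e-5))        # True
```

With real CUBLAS calls the same swap carries over to the leading dimensions: for a row-major buffer, the leading dimension passed for each matrix is its number of columns.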
Thanks for pointing me to PARRET. You have, I hope, already done what I wanted
to do. I may also switch to convolution instead of what I was doing, since I am
not getting the desired speedup with sgemm.
There is a small typo where you load the CUDA libraries with ctypes: if the
platform is not Linux, the final 'else' always gets executed.
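PARRET's loader is not reproduced in this thread, so the following is only a hypothetical sketch of the pattern the fix calls for: a single if/elif/else chain, so that exactly one branch runs per platform and the final else is reached only on an unrecognised platform. The helper name and the library file names are assumptions; real CUDA installations often use versioned file names.

```python
import sys


def cublas_library_name(platform=None):
    """Return a CUBLAS shared-library file name for a platform string.

    Hypothetical helper; the file names below are illustrative assumptions.
    """
    if platform is None:
        platform = sys.platform
    # One if/elif/else chain: exactly one branch runs for any platform.
    if platform.startswith("linux"):  # "linux2" on Python 2, "linux" on Python 3
        return "libcublas.so"
    elif platform == "darwin":
        return "libcublas.dylib"
    elif platform == "win32":
        return "cublas.dll"
    else:
        raise OSError("unsupported platform: %r" % platform)
```

For example, `cublas_library_name("darwin")` returns `"libcublas.dylib"`; the returned name would then be passed to `ctypes.CDLL` to load the library.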