I modified add_dot() to use cublasxt.cublasXtSgemm. I don't think I
need to modify dot(), because it calls add_dot() at the end. It only
calls cublasxt.cublasXtSgemm directly when my matrix is 1-D (which
it isn't). Correct?
BTW, smaller matrices work fine; the failure only occurs with larger ones.
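[Editorial note: the out-of-memory error reported below is consistent with the size of the *output* matrix, not the inputs. With dot(a, b, 'N', 'T') on two (160080, 3) float32 matrices, the result is a 160080 x 160080 float32 matrix, which is far larger than the 3 GB available per card. A quick back-of-the-envelope check:]

```python
import numpy as np

rows, cols = 160080, 3  # shape of each input matrix, from the thread below
itemsize = np.dtype(np.float32).itemsize  # 4 bytes per float32

# Each input matrix: 160080 x 3 float32
input_bytes = rows * cols * itemsize
print(input_bytes / 1024)  # ~1876 KB, matching the "about 1875 kilobytes" below

# dot(a, b, 'N', 'T') computes a @ b.T, so the output is 160080 x 160080
output_bytes = rows * rows * itemsize
print(output_bytes / 1024**3)  # ~95.5 GB -- far more than 3 GB per card
```

This is why small matrices succeed while large ones fail: the output grows quadratically with the row count, so cuMemAlloc cannot find room for the result even though each input is tiny.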
On Mon, Nov 23, 2015 at 11:35 AM, Lev Givon <lev(a)columbia.edu> wrote:
Received from Keith Brown on Mon, Nov 23, 2015 at
I have 2 small matrices of shape (160080, 3) and type
float32, and I am
calculating their dot product. While doing this, I keep getting
pycuda._driver.MemoryError: cuMemAlloc failed: out of memory.
I have 2 cards, each with 3GB of memory. Each matrix takes about 1875
kilobytes. I am not sure why this is occurring.
c_gpu = linalg.dot(a_gpu,b_gpu,'N','T',handle=handle)
My handle is a cublasXt handle (not a regular cublas handle, since
cublasXt apparently does better memory handling).
Any idea what is going on?
Did you also modify skcuda.linalg.dot() to explicitly call the cublasXt*gemm
functions rather than the stock cublas*gemm functions? The cublasXt*gemm
functions expect host memory pointers as their arguments, not GPU memory
pointers.
Bionet Group | Neurokernel Project