my experience with trying to cuda-ize svd/nmf calculations is that
they're not really a good fit for cuda. specifically, most of your
expensive operations are matrix multiplications over very long and
narrow matrices (m x k or k x n), where m ~= n (within an order of
magnitude) but k << m and k << n. even when m ~= 2^16 (the max for
cublas matrices) and k < 2^8, i was barely breaking even with normal
cpu-based blas libs.
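
a rough way to put numbers on the long-and-narrow shape is to count the work per byte in a single product. the sketch below is only a back-of-the-envelope model, not a measurement: `gemm_cost` is a hypothetical helper, it assumes float32 operands with each matrix touched once, it ignores host-to-device transfers, and it takes k = 2^8 as the upper end of the range mentioned above. the 4096 square case is an arbitrary comparison point.

```python
# rough cost model for C = A @ B with A (m x k) and B (k x n).
# assumptions (not from the original post): float32 operands, each
# matrix read or written exactly once, no host<->device transfer cost.

def gemm_cost(m, k, n, bytes_per_elem=4):
    """Return (flops, bytes_moved, flops_per_byte) for an (m x k) @ (k x n) product."""
    flops = 2 * m * k * n  # one multiply and one add per inner-product term
    bytes_moved = bytes_per_elem * (m * k + k * n + m * n)  # read A, read B, write C
    return flops, bytes_moved, flops / bytes_moved

# long-and-narrow case: m ~= n ~= 2^16, k = 2^8
skinny = gemm_cost(2**16, 2**8, 2**16)

# square product for comparison (size chosen arbitrarily for illustration)
square = gemm_cost(4096, 4096, 4096)

print("skinny flops/byte: %.1f" % skinny[2])
print("square flops/byte: %.1f" % square[2])
```

under this model the flops-per-byte ratio can never exceed k/2 for float32, however large m and n get, so a small k leaves relatively little arithmetic per byte moved. whether that is what actually limited the gpu in my runs is a guess; treat it as one plausible reading, not a diagnosis.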
ananth ranga wrote:
I am Ranga, a new member of the group. I need to find the SVD of a
matrix of size 120*100. On a CPU, with the version implemented in VTK,
it takes about 5 ms to evaluate. So I was wondering whether a PyCUDA
version could give me a better result in terms of speed.
If anyone has a PyCUDA version of the SVD calculation, could you please help me out?
PyCUDA mailing list