[PyCuda] A little test I did...
Vincent.Favre-Nicolin at cea.fr
Thu May 28 07:01:23 PDT 2009
On Thursday 28 May 2009 15:34:57, Hua Wong wrote:
> Thanks, I'm also puzzled by the results because I thought a 1e4*1e4
> matrix was already ginormous...
> I expected something like the 49x speedup seen in
> test_gpuarray_speed_random.py (size ~16000000 gives a x49 speedup).
> So I guess I'm doing something wrong somewhere. I will check the test
You could ask on the CUDA forums, but a really large speedup is unlikely here.
You are performing an elementwise multiplication of two arrays of size N, which
requires 2N element transfers from main memory to the GPU, then 2N memory reads
and N memory writes on the GPU, then another N transfers from the GPU back to
main memory, all that for only N floating point operations!
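
To put rough numbers on that count, here is a minimal sketch in plain Python
(no GPU needed). The function name elementwise_traffic is made up for
illustration, and the 4-byte item size assumes float32 arrays, which may not
match the original test.

```python
def elementwise_traffic(n, itemsize=4):
    """Tally data movement and arithmetic for c = a * b on n-element arrays.

    itemsize=4 assumes float32 data (an assumption, not from the original test).
    """
    return {
        "host_to_device_bytes": 2 * n * itemsize,  # upload a and b
        "device_read_bytes": 2 * n * itemsize,     # kernel reads a and b
        "device_write_bytes": n * itemsize,        # kernel writes c
        "device_to_host_bytes": n * itemsize,      # download c
        "flops": n,                                # one multiply per element
    }


# The 1e4 x 1e4 matrix from the quoted message holds 1e8 elements.
traffic = elementwise_traffic(10_000 * 10_000)
bus_bytes = traffic["host_to_device_bytes"] + traffic["device_to_host_bytes"]
bytes_per_flop = bus_bytes / traffic["flops"]
print("bytes moved between host and GPU per floating point operation:", bytes_per_flop)
```

Under these assumptions every single multiplication costs 12 bytes of traffic
between main memory and the GPU, before counting the reads and writes on the
card itself.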
In other words, you're pretty much limited by memory transfers. If your kernel
had a better ratio of operations to memory transfers (as in a matrix
multiplication), you'd get a better speedup.
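
One rough way to compare the two cases is to count floating point operations
per element moved between main memory and the GPU. The sketch below is only an
estimate: it assumes square m x m matrices, counts about one multiply and one
add per term of each dot product, ignores caching and on-card memory traffic,
and uses hypothetical helper names.

```python
def elementwise_ratio(n):
    """Operations per transferred element for c = a * b on n-element arrays."""
    flops = n              # one multiply per element
    transferred = 3 * n    # upload a and b, download c
    return flops / transferred


def matmul_ratio(m):
    """Operations per transferred element for C = A @ B with m x m matrices."""
    flops = 2 * m ** 3         # roughly m multiplies and m adds per output element
    transferred = 3 * m ** 2   # upload A and B, download C
    return flops / transferred


n_elements = 10_000 * 10_000   # the 1e4 x 1e4 case from the quoted message
print("elementwise:", elementwise_ratio(n_elements))
print("matmul:     ", matmul_ratio(10_000))
```

The elementwise ratio stays at 1/3 whatever the array size, while the matrix
multiplication ratio grows linearly with m, which is why the latter has much
more room to hide the cost of the transfers.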
Vincent Favre-Nicolin http://inac.cea.fr
CEA/Grenoble Institut Nanosciences & Cryogénie
Laboratoire SP2M/Nano-structures et Rayonnement Synchrotron
17, rue des Martyrs
38054 Grenoble Cedex 9 - France
Université Joseph Fourier http://www.ujf-grenoble.fr
tél: (+33) 4 38 78 95 40 fax: (+33) 4 38 78 51 38