"Roberto Colistete Jr." <roberto.colistete(a)gmail.com> writes:
It is my first post here in this PyCUDA group. I am using PyCUDA x
CUDA x Mathematica 8 CUDA to compare performance in some problems in
Up to CC 1.3, the DP/SP (FP64/FP32) performance ratio of PyCUDA was as
expected (near 1/8 or 1/12) and comparable to that obtained when running
CUDA or Mathematica 8 CUDA.
But using the same source code on any GPU device with CC 2.0/2.1
(Fermi), the performance in FP32 (SP) is poor, with:
- DP/SP ratio of approx. 1/3 to 1/2;
- a better GPU device (Tesla C2050, CC 2.0) being slower (0.77s x 0.33s) in
FP32 than an older GPU (Tesla C1060, CC 1.3), while in FP64 it is faster
(0.89s x 4.48s).
The same behaviour happens with other CC2.x GPU devices (GTX 480,
GT 540M, etc) and any Linux (Ubuntu, Fedora, etc).
Do you have an explanation for this issue? And a recommendation
to solve it?
It's unlikely that PyCUDA itself has much to do with this issue, although
the compiler flags that PyCUDA passes to nvcc might be to blame. You can
find the nvcc command line by sticking a print statement
on line 113 (or thereabouts) of pycuda/compiler.py. The resulting binary
should perform just as well as the corresponding CUDA C implementation
compiled with the same flags. If you'd like to pass different flags,
just pass an 'options' kwarg to SourceModule.
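
As a rough sketch of that last point: the snippet below builds a list of
extra nvcc flags and hands it to SourceModule through the 'options' kwarg.
The helper nvcc_options, the toy kernel, and the choice of flags
(-use_fast_math, --ptxas-options=-v) are illustrative assumptions on my
part, not a diagnosis of the slowdown; substitute whatever flags your CUDA C
build uses so that the two binaries are actually comparable.

```python
def nvcc_options(fast_math=False, verbose_ptxas=False):
    """Hypothetical helper (not part of PyCUDA): collect extra nvcc flags."""
    opts = []
    if fast_math:
        # Assumed flag choice: trades FP32 accuracy for speed.
        opts.append("-use_fast_math")
    if verbose_ptxas:
        # Makes ptxas report register and shared-memory usage per kernel.
        opts.append("--ptxas-options=-v")
    return opts


# Toy FP32 kernel, purely for illustration.
KERNEL_SOURCE = """
__global__ void scale(float *x, float a, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        x[i] *= a;
}
"""


def compile_kernel(source, options):
    """Compile `source` with PyCUDA, passing `options` through to nvcc."""
    # Imported lazily so the helper above can be used on a machine
    # without a GPU or without PyCUDA installed.
    import pycuda.autoinit  # noqa: F401  (creates a CUDA context)
    from pycuda.compiler import SourceModule

    return SourceModule(source, options=options)


opts = nvcc_options(fast_math=True, verbose_ptxas=True)
print("extra nvcc flags:", opts)
```

On a machine with a GPU, compile_kernel(KERNEL_SOURCE, opts) should return
a module whose get_function("scale") gives the callable kernel. Timing the
same kernel with and without a given flag is one way to check whether the
flags account for the FP32 difference.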