Jerome Kieffer <Jerome.Kieffer(a)esrf.fr> writes:
I am looking for a way to find the best device in a computer in order
to be able to select it for processing.
PyOpenCL offers me a max_clock_frequency and a max_compute_units for the device. Nice!
Unfortunately, on a dual Xeon 5520 + Fermi machine, the product
max_clock_frequency*max_compute_units is in favour of the CPU, but the GPU is clearly faster.
I have calculated the FLOPS per compute unit per Hz for a few devices and I got:
NVidia Fermi (GTX580): 64 FLOPS/Unit/Hz
NVidia Tesla (GTX 285): 24 FLOPS/Unit/Hz
NVidia 9600 GT: 24 FLOPS/Unit/Hz
Intel CPU: 4 FLOPS/Unit/Hz (I usually get less)
According to what I have read on the web, Kepler cards should reach 384 FLOPS/Unit/Hz.
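Here is a minimal sketch of that comparison in plain Python, so it runs without an OpenCL runtime. The device figures are assumptions for illustration only (roughly what a dual Xeon 5520 with Hyper-Threading and a GTX 580 might report through max_compute_units and max_clock_frequency). The per-unit factors are the ones listed above, and `DeviceFigures` is a hypothetical stand-in for a real device object. Weighting by FLOPS per unit per clock flips the ranking that the bare product gives.

```python
from dataclasses import dataclass


@dataclass
class DeviceFigures:
    """Hypothetical stand-in for the values a device query returns."""
    name: str
    max_compute_units: int
    max_clock_frequency: int        # MHz, as OpenCL reports it
    flops_per_unit_per_clock: int   # taken from the per-device list above


def naive_score(dev: DeviceFigures) -> int:
    # The product from the question: ignores how wide each unit is.
    return dev.max_compute_units * dev.max_clock_frequency


def peak_gflops(dev: DeviceFigures) -> float:
    # units * MHz * FLOP/unit/clock gives MFLOPS; divide by 1e3 for GFLOPS.
    return (dev.max_compute_units * dev.max_clock_frequency
            * dev.flops_per_unit_per_clock) / 1e3


# Assumed figures, for illustration only.
cpu = DeviceFigures("2x Xeon 5520 (HT)", 16, 2266, 4)
gpu = DeviceFigures("GTX 580", 16, 1544, 64)

devices = [cpu, gpu]
by_naive = max(devices, key=naive_score)
by_peak = max(devices, key=peak_gflops)

print(f"naive pick: {by_naive.name}")
print(f"peak pick:  {by_peak.name}")
for d in devices:
    print(f"{d.name}: {peak_gflops(d):.1f} GFLOPS (theoretical peak)")
```

The CPU number is probably optimistic: CPU OpenCL runtimes may count Hyper-Threading siblings as compute units, which doubles the unit count without doubling the floating-point hardware. That would fit the "I usually get less" remark.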
I have no figures for AMD cards and would be interested in getting some.
I would also like to be able to discriminate between the various NVidia
generations within pyopencl (via compute_capability_major_nv & compute_capability_minor_nv).
Any ideas are welcome.
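For the NVidia case, one possible approach is a lookup keyed on the compute capability, which PyOpenCL exposes as compute_capability_major_nv / compute_capability_minor_nv when the device advertises the cl_nv_device_attribute_query extension. The sketch below is written against any object with `type`, `extensions` and those attributes, so it can be tested without a GPU. The table uses the figures quoted above, plus an assumed 96 for compute capability 2.1 (48 cores per SM, 2 FLOP per FMA); treat it as a starting point to check, not an authoritative list. The function names are hypothetical, and AMD devices simply come back as unknown.

```python
from types import SimpleNamespace
from typing import Optional

# Device-type bit for CPUs, CL_DEVICE_TYPE_CPU in the OpenCL spec
# (pyopencl exposes the same value as cl.device_type.CPU).
CL_DEVICE_TYPE_CPU = 1 << 1


def nv_flops_per_unit(major: int, minor: int) -> Optional[int]:
    """FLOP per SM per clock for an NVidia compute capability (assumed table)."""
    if major == 1:
        return 24                        # G9x / GT200, figure quoted in the thread
    if major == 2:
        return 96 if minor >= 1 else 64  # Fermi: 48 or 32 cores/SM, 2 FLOP per FMA
    if major == 3:
        return 384                       # Kepler: 192 cores/SM, 2 FLOP per FMA
    return None                          # unknown generation: don't guess


def flops_per_unit_per_clock(device, cpu_value: int = 4) -> Optional[int]:
    """Best-guess FLOP per compute unit per clock, or None if unknown."""
    if device.type & CL_DEVICE_TYPE_CPU:
        return cpu_value
    # Only read the NV attributes when the extension is advertised;
    # on other devices the underlying query may fail with an OpenCL error.
    if "cl_nv_device_attribute_query" in device.extensions.split():
        return nv_flops_per_unit(device.compute_capability_major_nv,
                                 device.compute_capability_minor_nv)
    return None                          # e.g. AMD: no figures yet


# Fake devices standing in for pyopencl Device objects.
fermi = SimpleNamespace(type=1 << 2,
                        extensions="cl_khr_fp64 cl_nv_device_attribute_query",
                        compute_capability_major_nv=2,
                        compute_capability_minor_nv=0)
amd_gpu = SimpleNamespace(type=1 << 2, extensions="cl_khr_fp64")
xeon = SimpleNamespace(type=CL_DEVICE_TYPE_CPU, extensions="cl_khr_fp64")

for name, dev in [("fermi", fermi), ("amd_gpu", amd_gpu), ("xeon", xeon)]:
    print(name, flops_per_unit_per_clock(dev))
```

With PyOpenCL, the same function could be applied to each device from `platform.get_devices()`, since those objects provide `type` and `extensions`, and the two compute_capability_*_nv attributes on NVidia hardware.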
This spreadsheet may be of interest:
along with the following formulas:
- mem bw [GB/s]
    bus bits * bus clock [MHz] / 1e3 / 8
- flops per core, per clock
    fpus * 2
- flop rate [GFLOPS]
    cores * core clock [MHz] * fpus * 2 * 1e6 / 1e9
- how many scheduling slots per core?
    warp size * (# warps/core)
- how many scheduling slots total?
    warp size * (# warps/core) * # cores
    -> and that's just what the hardware does!
- how much register file per work item?
    -> "I'm going to make a mistake."
    reg file [KiB] * 1024 / # fpus
    reg file [KiB] * 1024 / # work items
- smem bw / WHAT?
    lmem bw / (# fpus * 2)
- gmem bw / flop?
    (gmem bw [GB/s] * 1e9) / (# cores) / (core clock [MHz] * 1e6) / (# fpus * 2)
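The formulas above can be transcribed into small Python functions to make the units explicit. The sketch assumes clocks in MHz, bus width in bits, register file in KiB and bandwidths in GB/s, which is what the scale factors in the formulas imply; local-memory bandwidth is assumed to be in bytes per clock per core, since the formula does not say. The example numbers are assumed, roughly GTX 580-class values (16 SMs, 32 FPUs each, 1544 MHz shader clock, 384-bit bus at 4008 MHz effective, 128 KiB register file, 48 warps of 32 per SM) and should be checked against the spreadsheet or vendor data. All function names are hypothetical.

```python
def mem_bw_gbs(bus_bits: int, bus_clock_mhz: float) -> float:
    # bits * MHz -> Mbit/s, /1e3 -> Gbit/s, /8 -> GB/s
    return bus_bits * bus_clock_mhz / 1e3 / 8


def flops_per_core_per_clock(fpus: int) -> int:
    # Each FPU counts as 2 FLOP per clock (one fused multiply-add).
    return fpus * 2


def flop_rate_gflops(cores: int, core_clock_mhz: float, fpus: int) -> float:
    return cores * core_clock_mhz * fpus * 2 * 1e6 / 1e9


def sched_slots_per_core(warp_size: int, warps_per_core: int) -> int:
    return warp_size * warps_per_core


def regfile_bytes_per_work_item(reg_file_kib: float, work_items: int) -> float:
    # Register file shared among all resident work items on one core.
    return reg_file_kib * 1024 / work_items


def lmem_bytes_per_flop(lmem_bw_per_clock: float, fpus: int) -> float:
    # Assumes lmem bandwidth is given in bytes per clock per core.
    return lmem_bw_per_clock / (fpus * 2)


def gmem_bytes_per_flop(gmem_bw_gbs: float, cores: int,
                        core_clock_mhz: float, fpus: int) -> float:
    # bytes/s -> per core -> per clock -> per FLOP
    return (gmem_bw_gbs * 1e9) / cores / (core_clock_mhz * 1e6) / (fpus * 2)


# Assumed GTX 580-class figures, for illustration only.
bw = mem_bw_gbs(384, 4008)
peak = flop_rate_gflops(16, 1544, 32)
slots = sched_slots_per_core(32, 48)
print(f"mem bw        {bw:.1f} GB/s")
print(f"peak          {peak:.1f} GFLOPS")
print(f"FLOP/clock    {flops_per_core_per_clock(32)} per SM")
print(f"slots         {slots} per SM, {slots * 16} total (assuming x # cores)")
print(f"registers     {regfile_bytes_per_work_item(128, 1536) / 4:.1f} 32-bit per work item")
print(f"gmem          {gmem_bytes_per_flop(bw, 16, 1544, 32):.3f} bytes/FLOP")
```

Note that flops_per_core_per_clock(32) comes out at 64, the same figure quoted for Fermi in the question.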
For reference, I put this together for a class I taught. You can watch
me blab about that here: