On Mon, 18 Oct 2010 21:47:54 +0700, arief nur andono <ariefnurandono(a)gmail.com> wrote:
> Does anybody have an implementation of LINPACK in PyCUDA?
There's an autotuned matrix multiply on the wiki, here:
> How do you count GPU flops compared to CPU, like in NVIDIA slides and
Flop counting on GPUs is mainly a manual affair. Timing can be done with
events or the profiler.
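
As a rough sketch of what "manual" means here: you work out the operation count by hand from the problem size, time the kernel with CUDA events, and divide. The helper names below (matmul_flops, gflops, time_gpu_ms) and the `launch` callable are made up for illustration and are not part of PyCUDA. The 2*m*n*k count assumes a plain dense matrix multiply, with one multiply and one add per inner-product term.

```python
def matmul_flops(m, n, k):
    """Hand-counted flops for a dense (m x k) @ (k x n) product.

    Each of the m*n outputs needs k multiplies and k adds.
    """
    return 2 * m * n * k


def gflops(flop_count, elapsed_ms):
    """Convert a flop count and an elapsed time in milliseconds to GFLOP/s."""
    return flop_count / (elapsed_ms * 1e-3) / 1e9


def time_gpu_ms(launch, repeats=10):
    """Average time of launch() in milliseconds, measured with CUDA events.

    `launch` is a hypothetical zero-argument callable that enqueues your
    kernel. PyCUDA is imported here so that the counting helpers above
    can be used on a machine without a GPU.
    """
    import pycuda.autoinit  # noqa: F401  (creates a context on the default device)
    import pycuda.driver as cuda

    start, end = cuda.Event(), cuda.Event()
    launch()  # warm-up run, excluded from the timing
    start.record()
    for _ in range(repeats):
        launch()
    end.record()
    end.synchronize()  # wait until the GPU has reached the end event
    return start.time_till(end) / repeats
```

For example, a 1024 x 1024 x 1024 multiply is 2 * 1024**3 flops, so if time_gpu_ms reports 10 ms, gflops(matmul_flops(1024, 1024, 1024), 10.0) comes out at roughly 215 GFLOP/s. The same hand count applied to a CPU timing gives you the comparison figure.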