On Thu, 12 Jun 2014 03:45:09 +0530
Abhilash Dighe <abhilash.dighe(a)gmail.com> wrote:
I was hoping to get some insight into my observations. I am using PyOpenCL
version 2 with an NVIDIA Tesla M2090 to run my kernel, which runs the SHA1
algorithm over variably sized data blocks. I'm trying to find the execution
time of my kernel, but I get different readings when I use PyOpenCL's
profiling tool and when I use the standard Python time library. My code is
structured as:
hash_start = time.time()
hash_event = prog.sha1( queue , shape , None , in_buf , out_buf , ..<other
hash_end = time.time()
add_hash_CPU_time( hash_end - hash_start )
add_hash_GPU_time( 1e-9 * ( hash_event.profile.end -
hash_event.profile.start ) )
These are the results for a test case of size 3 GB. The kernel gets called
64 times and runs 12288 threads each time.
Total OpenCL profiling time = 1.56s
Total CPU wall clock time = 13.79s
I need some help understanding the cause of this inconsistency. Or am I
making a mistake in recording the data?
Is your GPU in persistence mode? (Check with nvidia-smi.)
If not, loading and unloading the NVIDIA kernel driver can take multiple seconds.
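A minimal sketch of one way to narrow this down, assuming the queue was created with profiling enabled (which the profiling numbers above suggest). It waits on the event before stopping the wall clock, then splits the event's own timestamps into stages. The helper names event_breakdown and timed_launch are made up for illustration; event.wait() and the profile.queued, profile.submit, profile.start and profile.end attributes are PyOpenCL's.

```python
import time

NS = 1e-9  # OpenCL profiling counters are reported in nanoseconds


def event_breakdown(event):
    """Split a finished event's lifetime into stages, in seconds."""
    p = event.profile
    return {
        "queued_to_submit": NS * (p.submit - p.queued),
        "submit_to_start": NS * (p.start - p.submit),
        "start_to_end": NS * (p.end - p.start),  # the kernel itself
    }


def timed_launch(launch):
    """Run launch(), which must return an event, and wait for it.

    Returns (wall_seconds, breakdown). Waiting before reading the clock
    means the wall time covers a completed kernel, not just the enqueue.
    """
    t0 = time.perf_counter()
    event = launch()
    event.wait()
    wall = time.perf_counter() - t0
    return wall, event_breakdown(event)


# Hypothetical usage with the names from the question; other_args stands
# in for the kernel arguments that were elided there:
# wall, stages = timed_launch(
#     lambda: prog.sha1(queue, shape, None, in_buf, out_buf, *other_args))
```

If the wall time stays far above the sum of the three stages, that would suggest the extra time is spent on the host side (for example driver start-up or data transfers) rather than in the kernel.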
Data analysis unit - ESRF