On Tuesday 02 February 2010, Bonnel wrote:
I was just playing with the profiler from Nvidia and I'm wondering why
all data from the graphics card is read back. I thought memory was read
back only when using cl.enqueue_read_buffer. Here is the result I get
from profiling matrix-multiply.py:
method memory transfer size
As there is only one cl.enqueue_read_buffer call, there should be only
one memcpyDtoHasync call.
I recently had an informative conversation with someone on the Nvidia
driver team, and they indicated that CL may 'transparently' issue
transfers after kernel launches based on the flags with which the buffer
was created.
Now I'm faced with two problems. First, all the Nvidia profiler does for
me is crash. I've figured out that I can invoke it from the command line
and then find data in "opencl_profile_0.log". However, no matter what I
put in temp_cl_profiler.conf, I can't see the extra transfers you are
seeing. Can you grab and post the generated config file, perhaps by
import os; print open(os.environ["OPENCL_PROFILE_CONFIG"], "r").read()
That would be very helpful. (If you could generate a survey of what the
file can look like, that would of course help even more!)
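
For anyone who wants to grab the config file without the script dying when the variable is unset, here is a minimal sketch. It assumes only that the profiler exports the path in OPENCL_PROFILE_CONFIG, as mentioned above; the helper name read_profiler_config is made up for illustration.

```python
import os


def read_profiler_config(env_var="OPENCL_PROFILE_CONFIG"):
    """Return the text of the profiler config file, or None if unavailable.

    Hypothetical helper: it only assumes that the environment variable
    named by env_var holds the path to a readable text file.
    """
    path = os.environ.get(env_var)
    if not path or not os.path.isfile(path):
        # Variable unset, empty, or pointing at a file that does not exist.
        return None
    with open(path, "r") as f:
        return f.read()
```

Calling print(read_profiler_config()) at the top of the script being profiled should show the file contents, or None if the profiler did not set the variable.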
As far as flags were concerned, COPY_HOST_PTR was a natural suspect, but
removing that didn't change the timings. It would really help if I could
observe the extra transfers.
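
One way to check for extra transfers without the GUI is to count the entries in the log file directly. The sketch below is only that, a sketch: it assumes each record in "opencl_profile_0.log" contains a field of the form "method=[ name ]" and that comment lines start with "#". That layout is an assumption on my part, so adjust the pattern to whatever your log actually contains. The function names are hypothetical.

```python
import re
from collections import Counter

# ASSUMPTION: records look like "method=[ memcpyDtoHasync ] gputime=[ ... ]".
METHOD_RE = re.compile(r"method=\[\s*([^\]\s]+)\s*\]")


def count_methods(lines):
    """Count how often each method name appears in an iterable of log lines."""
    counts = Counter()
    for line in lines:
        if line.startswith("#"):
            # Skip header/comment lines (assumed to start with '#').
            continue
        match = METHOD_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


def count_methods_in_file(path="opencl_profile_0.log"):
    """Convenience wrapper: count method names in a log file on disk."""
    with open(path, "r") as f:
        return count_methods(f)
```

With a single cl.enqueue_read_buffer call in the script, count_methods_in_file()["memcpyDtoHasync"] would be expected to be 1; a larger number would point to the additional device-to-host copies described above.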
Thanks for posting your observations!