Hi Nicolas,

What parameters do you pass to the enqueue_read_buffer call? In particular, how many megabytes are you reading back?

I notice that the call to memcpyHtoDasync is 512MB whereas the two memcpyDtoHasync are 256MB each.
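
If it helps to compare against the profiler numbers, here is a rough sketch for working out how many bytes a full read-back of a dense array should move. The helper name transfer_bytes, the float32 element size, and the shape below are my own placeholders, not taken from your script:

```python
from math import prod


def transfer_bytes(shape, itemsize=4):
    """Bytes moved when a dense array of this shape is copied in full.

    itemsize=4 assumes float32 elements; adjust for other dtypes.
    """
    return prod(shape) * itemsize


# Hypothetical shape, chosen only for illustration; substitute the shape of
# the array you actually pass to enqueue_read_buffer.
size = transfer_bytes((1024, 1024))
print(f"{size} bytes = {size / 1e6:.2f} MB")
```

If the host array is a NumPy array, its nbytes attribute gives the same figure directly.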



On Tue, Feb 2, 2010 at 11:08 AM, Bonnel <nicolas.bonnel@univ-ubs.fr> wrote:

I was just playing with the profiler from NVIDIA and I'm wondering why all the data on the graphics card is read back. I thought memory was read back only when using cl.enqueue_read_buffer. Here is the result I get from profiling matrix-multiply.py:

method               memory transfer size
memcpyHtoDasync      5.12e+06
memcpyHtoDasync      5.12e+06
memcpyDtoHasync      2.56e+06
memcpyDtoHasync      5.12e+06
memcpyDtoHasync      2.56e+06
memcpyDtoHasync      5.12e+06

As there is only one cl.enqueue_read_buffer call, there should be only one memcpyDtoHasync call.

Nicolas Bonnel

PyOpenCL mailing list