First of all, I noticed I put a wrong title on my mail: the additional
memory transfer is from device to host.
The buffer d_c_buf is read back in the file matrix-multiply.py in the
examples packaged with pyopencl. As you will see, its size is half the
size of d_a_buf and d_b_buf.
The two memcpyHtoDasync calls of 512 MB are the transfers of matrices A
and B (filling d_a_buf and d_b_buf), and there should be one
memcpyDtoHasync of 256 MB (reading back d_c_buf to get matrix C).
It seems the three buffers d_a_buf, d_b_buf and d_c_buf are read back another
time.
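As a sanity check on those numbers, here is a minimal sketch of the size arithmetic. The shapes below are hypothetical (chosen only so that the byte counts match the profiler figures quoted above; the real dimensions in matrix-multiply.py may differ): single-precision matrices A of shape (m, k), B of shape (k, m) and C = A·B of shape (m, m), with k = 2m.

```python
# Hypothetical shapes, NOT taken from matrix-multiply.py itself: they are
# picked so that A and B occupy 512 MiB each and C occupies 256 MiB,
# matching the transfer sizes reported by the profiler.
m, k = 8192, 16384          # k = 2m
bytes_per_float = 4          # single precision (float32)
MiB = 1024 * 1024

size_a = m * k * bytes_per_float  # buffer d_a_buf
size_b = k * m * bytes_per_float  # buffer d_b_buf
size_c = m * m * bytes_per_float  # buffer d_c_buf

print(size_a // MiB, size_b // MiB, size_c // MiB)  # 512 512 256
```

With these assumed shapes, d_c_buf is indeed half the size of d_a_buf and d_b_buf, so a single read-back of C should show up as one 256 MB memcpyDtoHasync.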
David Garcia wrote:
What are the parameters that you pass to the enqueue_read_buffer call? In
particular, how many megabytes are you reading back?
I notice that the call to memcpyHtoDasync is 512MB whereas the two
memcpyDtoHasync are 256MB each.
On Tue, Feb 2, 2010 at 11:08 AM, Bonnel <nicolas.bonnel(a)univ-ubs.fr
I was just playing with the profiler from NVIDIA and I'm wondering
why all data from the graphics card are read back. I thought memory
was read back only when using cl.enqueue_read_buffer. Here is the
result I get from profiling matrix-multiply.py:
method    memory transfer size
As there is only one cl.enqueue_read_buffer call, there should be
only one memcpyDtoHasync call.
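To make the expected pattern concrete, here is a small mock (plain Python, not real pyopencl; the class and method names only mirror the API for illustration) that tallies host/device copies the way the program issues them: two uploads for A and B, and exactly one explicit read-back for C. Any further DtoH traffic in the profile would therefore come from somewhere other than the program's own enqueue calls.

```python
# Mock command queue (NOT pyopencl) that records each transfer as
# (direction, nbytes), mimicking the calls made by matrix-multiply.py.
class MockQueue:
    def __init__(self):
        self.transfers = []

    def enqueue_write_buffer(self, nbytes):   # host -> device upload
        self.transfers.append(("HtoD", nbytes))

    def enqueue_read_buffer(self, nbytes):    # device -> host read-back
        self.transfers.append(("DtoH", nbytes))

MiB = 1024 * 1024
q = MockQueue()
q.enqueue_write_buffer(512 * MiB)  # matrix A into d_a_buf
q.enqueue_write_buffer(512 * MiB)  # matrix B into d_b_buf
q.enqueue_read_buffer(256 * MiB)   # d_c_buf back to host as matrix C

dtoh = [t for t in q.transfers if t[0] == "DtoH"]
print(len(dtoh))  # 1
```

With one enqueue_read_buffer call, the profiler should show exactly one memcpyDtoHasync of 256 MB; the extra device-to-host transfers observed above are not accounted for by this pattern.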
PyOpenCL mailing list