On Tue, 27 Mar 2012 12:06:31 -0600, Ryan Haynes <rhaynesak(a)gmail.com> wrote:
I have four 54-megabyte buffers that I want to perform byte-by-byte
analysis on. I can copy the data in roughly 100 msec, which seems like
a decent transfer rate (~2 GB/second). However, when I go to execute my
kernel, the overhead of passing in my device pointers is huge: something
like 500 msec, even on a no-op kernel.
What implementation are you using?
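As a quick sanity check on the figures quoted above, here is a minimal sketch of the aggregate copy rate. It assumes decimal units (1 GB = 1000 MB) and that all four buffers are copied within the ~100 msec window. The helper name `throughput_gb_per_s` is hypothetical and exists only for illustration.

```python
def throughput_gb_per_s(num_buffers, buffer_mb, seconds):
    """Aggregate transfer rate in GB/s, using decimal units (1 GB = 1000 MB)."""
    total_mb = num_buffers * buffer_mb  # 4 * 54 = 216 MB in total
    return total_mb / seconds / 1000.0

# Figures from the quoted message: four 54 MB buffers copied in ~100 msec.
rate = throughput_gb_per_s(4, 54, 0.100)
print(f"{rate:.2f} GB/s")
```

That works out to about 2.16 GB/s, which is consistent with the "~2 GB/second" estimate. By the same numbers, the reported ~500 msec launch overhead is roughly five times the entire copy time.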