[PyCUDA] pyCUDA and streams
paulsson.m at gmail.com
Mon Mar 21 11:02:21 PDT 2011
> If by 'working' you mean 'actually overlapping', here's an additional
> subtlety. If 'exec' includes any kind of memory allocations, those are
> implicitly synchronization points--so you might be synchronizing without
> even seeing it. A memory pool would be a good solution for that (but
> would only help on the second run through).
pyFFT (and my toy code) only allocates memory at the start. Otherwise
we would not see any overlap in "Working.py".
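To illustrate the point quoted above, that a memory pool only helps on the second run through, here is a toy pure-Python sketch. It is not PyCUDA's pool (pycuda.tools.DeviceMemoryPool) and touches no GPU: ToyPool, run_once and the use of bytearray as a stand-in for the raw allocator are all made up for illustration. It only counts how often the "expensive" raw allocator is reached.

```python
from collections import defaultdict


class ToyPool:
    """Hypothetical pool: caches released blocks by size for later reuse."""

    def __init__(self, raw_alloc):
        self.raw_alloc = raw_alloc            # stand-in for a synchronizing allocator
        self.free_blocks = defaultdict(list)  # block size -> list of idle blocks
        self.raw_calls = 0                    # how often the raw allocator was hit

    def allocate(self, nbytes):
        # Reuse an idle block of the same size if one is available.
        if self.free_blocks[nbytes]:
            return self.free_blocks[nbytes].pop()
        # Otherwise fall through to the raw allocator (the costly path).
        self.raw_calls += 1
        return self.raw_alloc(nbytes)

    def release(self, block):
        # Keep the block in the pool instead of handing it back.
        self.free_blocks[len(block)].append(block)


def run_once(pool, sizes):
    """One pass: grab every buffer, then return them all to the pool."""
    blocks = [pool.allocate(n) for n in sizes]
    for block in blocks:
        pool.release(block)


pool = ToyPool(bytearray)  # bytearray(n) plays the role of the raw allocator
sizes = [1024, 4096, 1024]

run_once(pool, sizes)
first_run_calls = pool.raw_calls                      # every request is a miss

run_once(pool, sizes)
second_run_calls = pool.raw_calls - first_run_calls   # every request is a hit

print(first_run_calls, second_run_calls)
```

In this model the first pass still reaches the raw allocator for every buffer, so any implicit synchronization would still happen there; only later passes are served entirely from the pool.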
> If however 'not working' means 'wrong results', then something's even
> more fishy.
By 'working' I mean that 'exec' and the memory copies actually overlap.