[PyCUDA] PyCuda 3x slower than nvcc

Bogdan Opanchuk mantihor at gmail.com
Wed Apr 4 03:55:14 PDT 2012


Hello Michiel,

On Wed, Apr 4, 2012 at 8:39 PM, Michiel Bruinink
<Michiel.Bruinink at mapperlithography.com> wrote:
> I don't think streams will do any good, because I have seen that the memcpy
> time is a small part of the total time and it is the same for nvcc and
> pyCuda.

Streams can be used for kernel launches too, not only for memory
operations. But I agree: from your explanation, streams do not seem
to be the issue here.
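
For illustration, here is a minimal sketch of a kernel launch queued in
a stream with PyCUDA. The kernel `scale`, the helper names and the
sizes are made up for this example and have nothing to do with your
code; it needs a CUDA-capable GPU to run.

```python
# Hypothetical example kernel: multiplies every element by a factor.
KERNEL_SRC = """
__global__ void scale(float *x, float factor, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        x[i] *= factor;
}
"""


def grid_size(n, block_size):
    """Number of blocks needed to cover n elements (ceiling division)."""
    return (n + block_size - 1) // block_size


def run_in_stream(n=1 << 20, block_size=256):
    # Imported here so that the module itself loads without a GPU.
    import numpy as np
    import pycuda.autoinit  # noqa: F401 (creates a context on import)
    import pycuda.driver as drv
    from pycuda.compiler import SourceModule

    scale = SourceModule(KERNEL_SRC).get_function("scale")
    stream = drv.Stream()

    # Asynchronous copies require page-locked host memory.
    host = drv.pagelocked_empty(n, dtype=np.float32)
    host[:] = 1.0
    dev = drv.mem_alloc(host.nbytes)

    # The copies and the kernel are queued in the same stream, so they
    # execute in order relative to each other.
    drv.memcpy_htod_async(dev, host, stream)
    scale(dev, np.float32(2.0), np.int32(n),
          block=(block_size, 1, 1),
          grid=(grid_size(n, block_size), 1),
          stream=stream)
    drv.memcpy_dtoh_async(host, dev, stream)

    # Wait until everything queued in the stream has finished.
    stream.synchronize()
    return host
```

On a machine with a working PyCUDA installation, run_in_stream()
should return an array filled with 2.0.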

> The larger pyCuda execution time is pure calculation time.
> In fact, when I comment out a section of the device code, the nvcc and
> pyCuda times are almost equal.

This sounds interesting. Could you quote that section here? Or, even
better, construct two simple programs, one in Python and one in C,
which reproduce this effect?
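
On the Python side, such a reproduction could time the kernel alone
with CUDA events, so that compilation and memory transfers are left out
of the measurement. A rough sketch, assuming a placeholder kernel with
the hypothetical signature `work(float *data, int n)`; the kernel body,
names and sizes are invented here and would be replaced by the real
device code:

```python
# Placeholder kernel; substitute the device code under investigation.
KERNEL_SRC = """
__global__ void work(float *data, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        data[i] = data[i] * 2.0f + 1.0f;
}
"""


def blocks_needed(n, block_size):
    """Number of blocks needed to cover n elements (ceiling division)."""
    return (n + block_size - 1) // block_size


def time_kernel_ms(n=1 << 20, block_size=256, repeats=10):
    """Return the mean time of one kernel launch in milliseconds."""
    # Imported here so that the module itself loads without a GPU.
    import numpy as np
    import pycuda.autoinit  # noqa: F401 (creates a context on import)
    import pycuda.driver as drv
    from pycuda.compiler import SourceModule

    work = SourceModule(KERNEL_SRC).get_function("work")

    host = np.zeros(n, dtype=np.float32)
    dev = drv.mem_alloc(host.nbytes)
    drv.memcpy_htod(dev, host)

    launch = dict(block=(block_size, 1, 1),
                  grid=(blocks_needed(n, block_size), 1))

    # Warm-up launch, so one-time setup costs are not measured.
    work(dev, np.int32(n), **launch)

    start, end = drv.Event(), drv.Event()
    start.record()
    for _ in range(repeats):
        work(dev, np.int32(n), **launch)
    end.record()
    end.synchronize()

    # time_till() reports the elapsed time in milliseconds.
    return start.time_till(end) / repeats
```

The C program would measure the same loop with cudaEventRecord() and
cudaEventElapsedTime(), so that both numbers cover only the kernel
launches.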


