[PyCUDA] Amelioration of GPU speed using pycuda functions
lists at informa.tiker.net
Sun Mar 13 17:13:54 PDT 2011
On Sun, 13 Mar 2011 16:04:06 -0700 (PDT), elafrit <afrit.mariem at gmail.com> wrote:
> I wonder if I can speed up the PyCUDA code by editing the maximum number of
> threads in gpuarray.py?
The only way to find out is to try. If you do find a way to improve the
speed, please do let the list know.
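To make "trying" concrete, here is a pure-Python model of how an elementwise launch might split n elements into blocks and threads under a configurable thread cap, plus the grid-stride loop that lets a capped grid still cover every element. It is loosely modeled on the kind of splitting gpuarray.py does. The function names `launch_config` and `grid_stride_indices` are hypothetical, and the constants (warp size 32, `max_blocks=56`) are assumptions, not PyCUDA's actual values.

```python
def launch_config(n, warp_size=32, max_threads=128, max_blocks=56):
    """Hypothetical model: split n elements into (blocks, threads_per_block)."""
    if n < warp_size:
        # Tiny arrays: a single warp-sized block.
        return 1, warp_size
    if n < max_blocks * warp_size:
        # One warp per block, just enough blocks to cover n.
        return (n + warp_size - 1) // warp_size, warp_size
    if n < max_blocks * max_threads:
        # Fixed block count; grow threads per block in whole warps.
        warps = (n + warp_size - 1) // warp_size
        return max_blocks, ((warps + max_blocks - 1) // max_blocks) * warp_size
    # Large arrays: cap both; a grid-stride loop handles the remainder.
    return max_blocks, max_threads


def grid_stride_indices(n, blocks, threads):
    """List the indices each (block, thread) visits in a grid-stride loop."""
    total = blocks * threads
    visited = []
    for b in range(blocks):
        for t in range(threads):
            i = b * threads + t
            while i < n:
                visited.append(i)
                i += total
    return visited


for max_threads in (128, 256, 512):
    blocks, threads = launch_config(1_000_000, max_threads=max_threads)
    print(max_threads, blocks, threads, blocks * threads)
```

Raising the cap only changes how many threads share the work; each thread then runs fewer loop iterations. Whether that is actually faster depends on occupancy and memory behavior on the specific GPU, which is why measuring is the only reliable answer.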
I imagine that a better approach might be to try to introduce some
instruction-level parallelism (or at least to create some wiggle room for
the instruction scheduler in ptxas). That, unfortunately, is somewhat difficult.
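One common way to expose instruction-level parallelism is to have each thread handle several independent elements per loop iteration, so their memory loads can be in flight at the same time. The sketch below is a pure-Python model of that index pattern for a scalar multiply, assuming an unroll factor of 4. `UNROLL` and `ilp_scale` are hypothetical names; a real version would live in the CUDA C kernel body, not in Python.

```python
UNROLL = 4  # hypothetical: independent elements each thread handles per iteration


def ilp_scale(a, x, blocks, threads):
    """Model y = a * x with a grid-stride loop unrolled UNROLL times per thread."""
    n = len(x)
    y = [0.0] * n
    total = blocks * threads
    for b in range(blocks):
        for t in range(threads):
            i = b * threads + t
            while i < n:
                # These indices are disjoint and the loads do not depend on
                # each other, so on a GPU the scheduler could issue them back
                # to back instead of stalling on each one in turn.
                idx = [i + k * total for k in range(UNROLL) if i + k * total < n]
                vals = [x[j] for j in idx]
                for j, v in zip(idx, vals):
                    y[j] = a * v
                i += UNROLL * total
    return y
```

Each thread still visits exactly the indices `g, g + total, g + 2*total, ...` for its global id `g`; unrolling only groups them so that several independent operations are available at once.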
> And I can't understand what's really happening when I use the methods of
> gpuarray to multiply a matrix by a scalar. Is the scalar sent to the GPU
> for each element of the matrix, or only the first time? And is it
> sent as a scalar or as a gpuarray?
CPU scalars are sent as kernel parameters, once per kernel launch rather than
once per element, which is a fairly efficient way of broadcasting to all
thread blocks.
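To illustrate what "sent as a kernel parameter" means, the following CPU-side model packs the arguments of a `y[i] = a * x[i]` launch into one small buffer. That buffer is transferred once per launch, and every thread reads the same scalar from it. The layout (two 8-byte pointers, a 4-byte float, a 4-byte count) and the names `PARAM_FORMAT`, `pack_params`, and `run_scale_kernel` are hypothetical; this is not PyCUDA's actual argument marshalling.

```python
import struct

# Hypothetical layout: x pointer, y pointer, float scalar, unsigned element count,
# little-endian with standard sizes and no padding (8 + 8 + 4 + 4 = 24 bytes).
PARAM_FORMAT = "<QQfI"


def pack_params(x_ptr, y_ptr, a, n):
    """Build the argument buffer a launch of y[i] = a * x[i] might carry."""
    return struct.pack(PARAM_FORMAT, x_ptr, y_ptr, a, n)


def run_scale_kernel(params, x):
    """Model every thread reading the scalar from the same parameter buffer."""
    _, _, a, n = struct.unpack(PARAM_FORMAT, params)
    return [a * x[i] for i in range(n)]


small = pack_params(0, 0, 2.5, 10)
large = pack_params(0, 0, 2.5, 10**6)
print(len(small), len(large))  # the buffer size does not grow with n
```

The key point is that the scalar's cost is a few bytes per launch, independent of the array size. In PyCUDA itself you would simply write `x_gpu * 2.5` on a `GPUArray` and let gpuarray pass the scalar this way.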