That's an excellent problem for a GPU. However, because each problem uses a
fair amount of memory, being careful about how that memory is accessed will
dominate your performance gains (as is typical when using a GPU). For
example, tf won't fit in the shared memory or cache of a multiprocessor, so
you'll also want to divide the problem again.
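
To make the "divide the problem again" point concrete, here is a minimal CPU-side sketch in NumPy, not a GPU implementation. It only shows that the sum can be computed one tile of tf at a time, and that the full tf array is far larger than per-block shared memory. The names msin_direct, msin_tiled and tile_elems are hypothetical, and the float32 storage and the 48 KB figure (the per-block shared-memory limit on compute capability 3.0) are assumptions about how the data would be laid out on the card.

```python
import numpy as np

def msin_direct(farray, Mf, tf):
    """Reference: accumulate every term over the whole tf array at once."""
    out = np.zeros(tf.shape, dtype=np.float64)
    for f, m in zip(farray, Mf):
        out += m * 2.0 * np.cos(2.0 * np.pi * f * tf)
    return out

def msin_tiled(farray, Mf, tf, tile_elems=256):
    """Same sum, computed one tile of tf at a time (tile_elems is hypothetical)."""
    farray = np.asarray(farray, dtype=np.float64)
    Mf = np.asarray(Mf, dtype=np.float64)
    flat = np.ascontiguousarray(tf, dtype=np.float64).ravel()
    out = np.empty(flat.size, dtype=np.float64)
    for start in range(0, flat.size, tile_elems):
        t = flat[start:start + tile_elems]  # one tile of time samples
        # (n_freq, 1) * (1, tile) -> (n_freq, tile) array of phases
        phase = 2.0 * np.pi * farray[:, None] * t[None, :]
        # weight each row by its coefficient and sum over frequencies
        out[start:start + tile_elems] = 2.0 * (Mf[:, None] * np.cos(phase)).sum(axis=0)
    return out.reshape(np.shape(tf))

# Footprint of the time array from the question, assuming float32 storage
tf_bytes = 2048 * 256 * 4      # 2 MiB
shared_bytes = 48 * 1024       # assumed shared-memory limit per block (CC 3.0)
tiles_needed = -(-tf_bytes // shared_bytes)  # ceiling division
```

Each tile is independent of the others, which is what lets the work be spread over thread blocks; the tile size that actually performs well on a given card has to be found by measurement.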
If you don't need to get this working for routine usage, though, you might
just try using Numba primitives to move it to a GPU. I haven't used them,
so I can't attest that they will give you a good result. On the other hand,
this is the sort of problem that makes learning CUDA and PyCUDA easy, so
you might as well give it a shot.
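
As a starting point for giving it a shot, here is a rough PyCUDA sketch under stated assumptions, not a tuned implementation. It assigns one thread per element of tf, so each thread holds its time sample in a register and tf never has to fit in shared memory, while every thread loops over the same farray and Mf entries. The kernel name msin_kernel, the wrappers msin_gpu and msin, the block size of 256, and the use of float32 are all assumptions; with large frequency-time products, single precision may lose too much phase accuracy and double may be needed. The wrapper falls back to NumPy when PyCUDA or a device is not available.

```python
import numpy as np

KERNEL_SRC = r"""
__global__ void msin_kernel(const float *farray, const float *Mf,
                            const float *tf, float *out,
                            int n_freq, int n_time)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n_time) return;
    float t = tf[i];                  // one time sample per thread
    float acc = 0.0f;
    for (int k = 0; k < n_freq; ++k)  // all threads read the same farray[k], Mf[k]
        acc += Mf[k] * 2.0f * cosf(6.2831853f * farray[k] * t);
    out[i] = acc;
}
"""

def msin_reference(farray, Mf, tf):
    """CPU reference in double precision."""
    out = np.zeros(np.shape(tf), dtype=np.float64)
    for f, m in zip(farray, Mf):
        out += m * 2.0 * np.cos(2.0 * np.pi * f * tf)
    return out

def msin_gpu(farray, Mf, tf, threads=256):
    """Run the kernel above; requires PyCUDA and a CUDA device."""
    import pycuda.autoinit  # noqa: F401  (creates a context on a device)
    import pycuda.driver as drv
    from pycuda.compiler import SourceModule

    kernel = SourceModule(KERNEL_SRC).get_function("msin_kernel")
    f32 = np.ascontiguousarray(farray, dtype=np.float32)
    m32 = np.ascontiguousarray(Mf, dtype=np.float32)
    t32 = np.ascontiguousarray(tf, dtype=np.float32).ravel()
    out = np.empty_like(t32)
    blocks = (t32.size + threads - 1) // threads  # enough blocks to cover tf
    kernel(drv.In(f32), drv.In(m32), drv.In(t32), drv.Out(out),
           np.int32(f32.size), np.int32(t32.size),
           block=(threads, 1, 1), grid=(blocks, 1))
    return out.reshape(np.shape(tf))

def msin(farray, Mf, tf):
    """Try the GPU path; fall back to the CPU reference if it is unavailable."""
    try:
        return msin_gpu(farray, Mf, tf)
    except Exception:  # PyCUDA missing or no usable device
        return msin_reference(farray, Mf, tf)
```

Comparing msin against msin_reference on a small input is a cheap way to check the kernel before timing it on the full 10000-frequency problem.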
On Sat, Mar 28, 2015 at 8:29 AM Bruce Labitt <bdlabitt(a)gmail.com> wrote:
From reading the documentation, I am confused about whether parallelizing
this kind of function is worth doing in PyCUDA.
I'm trying to add the effect of phase noise into a radar simulation. The
simulation is written in SciPy/NumPy. Currently I am using joblib to run
on multiple cores. It is too slow for the scenarios I wish to try. It does
work for a small number of targets and reduced phase noise array sizes.
The following is the current approach:
# Function to parallelize
from numpy import cos, pi

def MSIN(farray, Mf, tf, jj):
    """
    farray  array of frequencies (size = 10000)
    Mf      array of coefficients (size = 10000)
    tf      2D array ~[2048 x 256] of time
    jj      list of indices (fraction of the problem to solve)
    """
    Msin = 0.0
    for ii in jj:
        Msin = Msin + Mf[ii] * 2.0*cos( 2.0*pi*farray[ii]*tf )
    return Msin  # partial sum for this slice of indices
# Current method to call function in parallel (multiprocessing)
from functools import reduce
from operator import add
from joblib import Parallel, delayed

# Parallel computes the function MSIN with njobs cores
MMM = Parallel(n_jobs=njobs, max_nbytes=None)(
    delayed(MSIN)(f, aa, tf1, ii) for ii in idx)
Msin = reduce(add, MMM)  # add all the results of the cores together
Any suggestions to port this to pycuda? Reasonable candidate?
In essence, it is accumulating a scalar-weighted cos function for many
elements of a 2D array. It 'feels' like it should be portable. Any
roadblocks foreseen? The 2D array of times is contiguous in the sense of
stride. But there are discontinuous jumps in time values in the array,
which I do not think is a problem.
I have from DumpProperties.py
Device #0: GeForce GTX 680M
Compute Capability: 3.0
Total Memory: 4193984 KB
Thanks in advance for any insight, or suggestions on how to attack the

PyCUDA mailing list