Not sure about the CUDA limitations; I'll let others speak to that...
But in developing the mne-python CUDA filtering code, IIRC the primary
limitation was (by far) transferring the data to and from the GPU. The FFT
computations themselves were a fraction of the total time. I suspect using
multiple jobs won't help CUDA filtering very much since the jobs would
presumably compete for the same memory bandwidth, but I would love to be
wrong about this. If it works better, it would be great to open an
mne-python issue for it, as we are always looking for speedups :)
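For the safety question below, one conservative pattern is to take a lock around each device call so only one worker touches the GPU at a time. A minimal sketch of that idea (the names here are hypothetical, and the GPU FFT is replaced with np.fft.fft so it runs without CUDA; in real code each worker process would also create its own CUDA context, e.g. by importing pycuda.autoinit inside the worker, since contexts can't be shared across processes):

```python
import multiprocessing as mp

import numpy as np


def worker(lock, data, out_q, idx):
    # The lock serializes access to the shared device, so calls from
    # different workers never interleave. np.fft.fft stands in for the
    # GPU FFT (scikits.cuda.fft) so this sketch runs anywhere.
    with lock:
        out_q.put((idx, np.fft.fft(data)))


# Use fork so the sketch works as a plain script on Linux.
ctx = mp.get_context("fork")
lock = ctx.Lock()
out_q = ctx.Queue()

signals = [np.random.randn(1024) for _ in range(4)]
procs = [ctx.Process(target=worker, args=(lock, s, out_q, i))
         for i, s in enumerate(signals)]
for p in procs:
    p.start()
# Drain the queue before joining to avoid blocking on full pipes.
results = dict(out_q.get() for _ in procs)
for p in procs:
    p.join()

ok = all(np.allclose(results[i], np.fft.fft(s))
         for i, s in enumerate(signals))
print("all FFT results match:", ok)
```

Serializing like this gives up concurrency, of course, which is consistent with the bandwidth point above: if transfers dominate, overlapping the workers' GPU calls wouldn't buy much anyway.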
On Nov 1, 2014 7:21 PM, "kjs" <bfb(a)riseup.net> wrote:
I have written an MPI routine in Python that sends jobs to N worker
processes. The root process handles file IO and the workers do
computation. In the worker processes, calls are made to the CUDA-enabled
GPU to do FFTs.
Is it safe to have N processes potentially making calls to the same GPU
at the same time? I have not made any amendments to the CUDA code,
and have little knowledge of what could possibly go wrong.
I am using python-mne with CUDA enabled to call scikits.cuda.fft.
PyCUDA mailing list