If it's not too much hassle, you could try uninstalling all CUDA5-related system packages to ensure that PyCUDA is linking to the appropriate CUDA6 library, headers, etc., but I doubt that's actually your problem.
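One quick way to confirm what the rebuilt PyCUDA actually links against is to run ldd on its compiled driver extension. A sketch — the egg path is copied from the traceback below, so adjust it to your layout:

```shell
# Print the CUDA shared libraries PyCUDA's compiled driver module resolves;
# the resolved paths show whether it is bound to the CUDA5 or CUDA6 install.
PYCUDA_SO=/home/kjs/py-virt-envs/dreateam/local/lib/python2.7/site-packages/pycuda-2014.1-py2.7-linux-x86_64.egg/pycuda/_driver.so
if [ -f "$PYCUDA_SO" ]; then
    ldd "$PYCUDA_SO" | grep -i cuda
else
    echo "not found: $PYCUDA_SO (adjust the path)"
fi
```

If the libcuda/libcufft lines still point into the old CUDA5 tree, the rebuild picked up stale libraries.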

Eric


On Thu, Nov 6, 2014 at 8:10 AM, kjs <bfb@riseup.net> wrote:
In the routine I describe below, I am beginning to see the following
error. Please note, I was able to run this routine successfully all the
way through when PyCUDA was linked against the system CUDA5. The errors
started popping up after I installed CUDA6 system-wide and recompiled
PyCUDA. I am running Debian Testing.

Traceback (most recent call last):
  File "feature_extractor.py", line 475, in <module>
    main()
  File "feature_extractor.py", line 467, in main
    fe.set_features(fname[0])
  File "feature_extractor.py", line 51, in set_features
    self.apply_filters()
  File "feature_extractor.py", line 99, in apply_filters
    n_jobs='cuda', copy = False, verbose=False)
  File "<string>", line 2, in band_stop_filter
  File "/home/kjs/py-virt-envs/dreateam/local/lib/python2.7/site-packages/mne-0.9.git-py2.7.egg/mne/utils.py", line 509, in verbose
    return function(*args, **kwargs)
  File "/home/kjs/py-virt-envs/dreateam/local/lib/python2.7/site-packages/mne-0.9.git-py2.7.egg/mne/filter.py", line 742, in band_stop_filter
    xf = _filter(x, Fs, freq, gain, filter_length, picks, n_jobs, copy)
  File "/home/kjs/py-virt-envs/dreateam/local/lib/python2.7/site-packages/mne-0.9.git-py2.7.egg/mne/filter.py", line 345, in _filter
    n_jobs=n_jobs)
  File "/home/kjs/py-virt-envs/dreateam/local/lib/python2.7/site-packages/mne-0.9.git-py2.7.egg/mne/filter.py", line 141, in _overlap_add_filter
    n_segments, n_seg, cuda_dict)
  File "/home/kjs/py-virt-envs/dreateam/local/lib/python2.7/site-packages/mne-0.9.git-py2.7.egg/mne/filter.py", line 173, in _1d_overlap_filter
    prod = fft_multiply_repeated(h_fft, seg, cuda_dict)
  File "/home/kjs/py-virt-envs/dreateam/local/lib/python2.7/site-packages/mne-0.9.git-py2.7.egg/mne/cuda.py", line 196, in fft_multiply_repeated
    x = np.array(cuda_dict['x'].get(), dtype=x.dtype, subok=True,
  File "/home/kjs/py-virt-envs/dreateam/local/lib/python2.7/site-packages/pycuda-2014.1-py2.7-linux-x86_64.egg/pycuda/gpuarray.py", line 264, in get
    drv.memcpy_dtoh(ary, self.gpudata)
pycuda._driver.LaunchError: cuMemcpyDtoH failed: launch timeout
PyCUDA WARNING: a clean-up operation failed (dead context maybe?)
cuMemFree failed: launch timeout
PyCUDA WARNING: a clean-up operation failed (dead context maybe?)
cuMemFree failed: launch timeout
PyCUDA WARNING: a clean-up operation failed (dead context maybe?)
cuMemFree failed: launch timeout
PyCUDA WARNING: a clean-up operation failed (dead context maybe?)
cuModuleUnload failed: launch timeout
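For reference, the step that dies above, fft_multiply_repeated, reduces to filtering each zero-padded segment in the frequency domain: FFT the segment, multiply pointwise by the filter's FFT, inverse FFT. A plain-NumPy sketch of that step, written for illustration rather than taken from mne's actual CUDA or CPU code:

```python
# CPU sketch of the frequency-domain multiply that the CUDA path performs
# per segment. Names (h_fft, seg) mirror the traceback's variables; this is
# an illustrative stand-in, not mne's implementation.
import numpy as np

def fft_multiply(h_fft, seg):
    """Return the segment filtered by the kernel whose FFT is h_fft."""
    return np.real(np.fft.ifft(h_fft * np.fft.fft(seg)))

n_fft = 64
h = np.zeros(n_fft)
h[0] = 1.0                      # identity impulse response
h_fft = np.fft.fft(h)           # FFT of a delta is all ones: a no-op filter
seg = np.random.RandomState(0).randn(n_fft)
out = fft_multiply(h_fft, seg)  # identity filter reproduces the segment
```

The GPU only wins on this when the host-to-device and device-to-host copies are cheap relative to the FFTs, which is Eric's point below about the transfer being the bottleneck.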

Thanks,
Kevin


kjs wrote:
>
>
> Eric Larson wrote:
>> Hey Kevin,
>>
>> Not sure about the CUDA limitations, I'll let others speak to that...
>>
>> But in developing the mne-python CUDA filtering code, IIRC the primary
>> limitation was (by far) transferring the data to and from the GPU. The FFT
>> computations themselves were a fraction of the total time. I suspect using
>> multiple jobs won't help CUDA filtering very much since the jobs would
>> presumably compete for the same memory bandwidth, but I would love to be
>> wrong about this. If it works better, it would be great to open an
>> mne-python issue for it, as we are always looking for speedups :)
>>
>> Cheers,
>> Eric
>> On Nov 1, 2014 7:21 PM, "kjs" <bfb@riseup.net> wrote:
>>
>>> Hello,
>>>
>>> I have written an MPI routine in Python that sends jobs to N worker
>>> processes. The root process handles file IO and the workers do the
>>> computation. In the worker processes, calls are made to the
>>> CUDA-enabled GPU to do FFTs.
>>>
>>> Is it safe to have N processes potentially making calls to the same GPU
>>> at the same time? I have not made any amendments to the cuda code[0],
>>> and have little knowledge of what could possibly go wrong.
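[On the safety question: each process gets its own CUDA context and the driver time-slices them, so concurrent access should not corrupt anything, but the jobs will contend for the device. If you want to bound that contention explicitly, one pattern is to guard the GPU section with a lock shared across processes. A minimal sketch, with the GPU FFT stubbed by numpy.fft so it runs anywhere — worker/run are illustrative names, not part of the MPI routine described here:]

```python
# Sketch: let at most one worker process run its "GPU" job at a time by
# guarding the device section with a shared lock. The GPU call is stubbed
# with numpy.fft; in the real routine it would be the scikits.cuda.fft call.
import multiprocessing as mp
import numpy as np

def worker(lock, data, out, idx):
    with lock:  # only one process in the "GPU" section at a time
        out[idx] = np.abs(np.fft.fft(data)).sum()

def run(n_workers=4):
    lock = mp.Lock()
    out = mp.Array('d', n_workers)      # shared result buffer
    rng = np.random.RandomState(0)
    procs = [mp.Process(target=worker,
                        args=(lock, rng.randn(1024), out, i))
             for i in range(n_workers)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
    return list(out)
```

[Whether the lock helps depends on how much of each job is the transfer itself; it mainly keeps N transfers from interleaving on the bus.]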
>>>
>>> Thanks much,
>>> Kevin
>>>
>>> [0] I am using python-mne with cuda enabled to call scikits.cuda.fft
>>> https://github.com/mne-tools/mne-python/blob/master/mne/cuda.py
>>>
>>> _______________________________________________
>>> PyCUDA mailing list
>>> PyCUDA@tiker.net
>>> http://lists.tiker.net/listinfo/pycuda
>>>
>>>
>>
>
> Thanks Andreas, this is good to know. I noticed that even though PyCUDA
> is currently only using one of two GPUs, that GPU is only ever at ~35%
> memory and ~22% processing utilization. This could be related to Eric's
> observation that the PCIe 16x bus bandwidth reaches capacity while the
> GPU is pushing out FFT'ed arrays, which would allow only one or two
> arrays onto the GPU at a time.
>
> From what I have seen, using CUDA speeds up my FFTs ~2x, though the
> workers also do many other computations on the CPU. The worst-case
> scenario is all N workers trying to send data to the GPU at the same
> time.
>
> -Kevin
>
>
>
>
