Received from Gavin Weiguang Ding on Mon, Oct 06, 2014 at 11:55:49PM EDT:
> Hi Lev,
> Thanks for the reply!
> My GPUs do support GPUDirect. And I've tested using the "simpleP2P" from
> cuda samples.
> I've been trying a little bit on that but without success. I'm new to
> pycuda and multiprocessing, so excuse me if I ask dumb questions.
> If I understand it right, I need to do pycuda.driver.init() and
> make_context() inside each process.
> But to use pycuda.driver.memcpy_peer, I need to pass the context defined in
> one process to another. When I try to pass the context with a Pipe or
> Queue from multiprocessing, I get a pickling error.
> Is this the right way of doing it, assuming the pickling error can be solved?
Since CUDA contexts are private, you can't use the context set up in one process
in another. In recent versions of CUDA, you can use its IPC API to transfer a
GPU memory address from one process to another. See
https://gist.github.com/lebedov/6408165 for an example of how to use the API.
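Along the lines of that gist, a minimal sketch of the IPC pattern might look like
the following. This assumes CUDA >= 4.1 on Linux, hardware that passes the
simpleP2P test, and a PyCUDA built with IPC support; the handle returned by
mem_get_ipc_handle is picklable, so it (not the context) is what you send
through the Pipe:

```python
# Sketch: share a device allocation between two processes via the
# CUDA IPC API. Requires CUDA hardware; not runnable without a GPU.
import numpy as np
import multiprocessing as mp

def child(conn):
    # Each process must initialize the driver and create its own
    # context; contexts are not picklable and can't cross a Pipe.
    import pycuda.driver as drv
    drv.init()
    ctx = drv.Device(0).make_context()
    try:
        handle, shape, dtype = conn.recv()   # the IPC handle IS picklable
        ptr = drv.IPCMemoryHandle(handle)    # map the other process's memory
        result = np.empty(shape, dtype)
        drv.memcpy_dtoh(result, ptr)
        conn.send(result)
    finally:
        ctx.pop()

if __name__ == '__main__':
    import pycuda.driver as drv
    mp.set_start_method('spawn')  # avoid forking an initialized CUDA context
    drv.init()
    ctx = drv.Device(0).make_context()
    try:
        x = np.arange(8, dtype=np.float32)
        x_gpu = drv.mem_alloc(x.nbytes)
        drv.memcpy_htod(x_gpu, x)
        handle = drv.mem_get_ipc_handle(x_gpu)  # exportable handle
        parent_conn, child_conn = mp.Pipe()
        p = mp.Process(target=child, args=(child_conn,))
        p.start()
        parent_conn.send((handle, x.shape, x.dtype))
        print(parent_conn.recv())  # should echo back 0..7 on working hardware
        p.join()
    finally:
        ctx.pop()
```

The key point is that only the raw memory handle travels between processes;
each side keeps its own context.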
Bionet Group | Neurokernel Project
Received from Gavin Weiguang Ding on Mon, Oct 06, 2014 at 04:30:46PM EDT:
> I'm trying to do p2p communication between 2 GPUs without going through the
> CPU memory.
> And I need to communicate between 2 processes (due to one process in Theano
> can only use 1 GPU), is that possible with pycuda? or
> specifically pycuda.driver.memcpy_peer?
Yes, but your GPU needs to support GPUDirect peer-to-peer communication.
Assuming that you have the appropriate hardware, you can also use mpi4py to
transfer data between GPUs if it has been compiled against an MPI implementation
that supports GPU-to-GPU communication (e.g., OpenMPI 1.8, MVAPICH2 2.0).
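For the mpi4py route, a rough sketch follows. It assumes a CUDA-aware MPI build
(e.g., OpenMPI 1.8 or MVAPICH2 2.0) plus versions of mpi4py and PyCUDA recent
enough to support passing GPU arrays directly as message buffers; launch it as
`mpiexec -n 2 python script.py` so each rank drives one process/GPU:

```python
# Sketch: GPU-to-GPU transfer through a CUDA-aware MPI.
# Requires MPI + CUDA hardware; not runnable standalone.
import numpy as np
from mpi4py import MPI
import pycuda.autoinit       # one CUDA context per rank
import pycuda.gpuarray as gpuarray

comm = MPI.COMM_WORLD
rank = comm.Get_rank()

N = 16
if rank == 0:
    x_gpu = gpuarray.to_gpu(np.arange(N, dtype=np.float64))
    # A CUDA-aware MPI can consume the device buffer directly,
    # without staging through host memory.
    comm.Send([x_gpu, MPI.DOUBLE], dest=1)
elif rank == 1:
    y_gpu = gpuarray.empty(N, np.float64)
    comm.Recv([y_gpu, MPI.DOUBLE], source=0)
    print(y_gpu.get())
```

Whether the transfer actually goes peer-to-peer (rather than through the host)
depends on the MPI build and the hardware topology.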
On Mon, 2014-09-29 at 14:35 +1300, Chris O'Halloran wrote:
> Hi Tomasz,
> Thanks for the confirmation. Hopefully you'll be able to resolve this
> without too much trouble.
> For now I've just installed pycuda from the 2014.1 tar file. I didn't
> uninstall the Ubuntu pycuda package, so I've probably overwritten some
> files, but I hope this hasn't broken anything else.
No, it shouldn't have.
Thanks for your patience. I've just uploaded a fixed version of PyCUDA
to Debian, version 2014.1-2. I've tested it with CUDA 5.5 and 6.0,
both versions available in Debian, and it works. I'm not sure
when this PyCUDA version will migrate to Ubuntu, though.
At the same time, this does not fix two rather important problems.
First, I do not "feel" the Ubuntu way of doing things: my attempts to
contact the people dealing with GPU support in Ubuntu have failed, and
I do not know what I should have done differently. I have an account on
Launchpad, but I do not receive any notifications of bugs. I just went
there and found 5 bugs open against PyOpenCL! Since I do not use Ubuntu,
I do not know what they have changed in their graphics stack that makes
PyOpenCL try to install the NVIDIA ICD by default.
The other problem is that I do not use PyCUDA/PyOpenCL daily. I
currently work on other stuff, which does not leave much time or mental
energy to switch to a different field. I try to react to bugs, but it's
rather mechanical now. My latest GPU is two generations behind (Fermi,
not Kepler or Maxwell) and I do not see the need to buy a new one. At
the same time I do not want to orphan my packages; the queue of
packages waiting for adoption is really long and I do not want to add
to it.
So, I have a task for the members of the list. If you are using PyCUDA
or PyOpenCL on Debian or Ubuntu and something is odd or wrong with the
packages, please let me know. Preferably do so on the mailing list, or
file a bug against the appropriate package in the Debian bug tracker,
so that others have a chance to comment on it. I might not respond
right away, but I'll try to fix it.
Tomasz Rybak GPG/PGP key ID: 2AD5 9860
Fingerprint A481 824E 7DD3 9C0E C40A 488E C654 FB33 2AD5 9860
On 29/09/14 16:17, Andreas Kloeckner wrote:
> GPUArrays don't actually care who "owns" the data, so if you're OK with
> building a GPUArray as a 'descriptor' structure (which is quick and
> lightweight) without moving any data around, then that would likely be a
> reasonable way of going about this.
> How does that sound?
That sounds like exactly what I am looking for. Do you have an API
reference for wrapping an allocation with a GPUArray structure?
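For reference, the GPUArray constructor accepts a `gpudata` argument that wraps
an existing allocation without copying; a minimal sketch (assuming a
DeviceAllocation from mem_alloc, though anything convertible to a device
pointer should work):

```python
# Sketch: wrap a pre-existing device allocation in a GPUArray
# "descriptor" without moving data. Requires CUDA hardware.
import numpy as np
import pycuda.autoinit
import pycuda.driver as drv
import pycuda.gpuarray as gpuarray

nbytes = 8 * np.dtype(np.float32).itemsize
mem = drv.mem_alloc(nbytes)                      # pre-existing allocation
drv.memcpy_htod(mem, np.arange(8, dtype=np.float32))

# Build a lightweight descriptor around it; no copy happens here.
a = gpuarray.GPUArray(shape=(8,), dtype=np.float32, gpudata=mem)
print(a.get())

# Note: the GPUArray does not own 'mem', so keep a reference to
# 'mem' alive for as long as 'a' is in use.
```

The ownership caveat in the last comment is the main thing to watch: freeing
the underlying allocation while the descriptor is still in use will leave the
GPUArray pointing at invalid memory.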