Zhangsheng Lai <dunno.noe(a)gmail.com> writes:
> My 'can access' simply means that I'm able to access the values in the
> variable in python by typing x1 or x2. My understanding is that if the
> variables are stored on different GPUs, then I should be able to type x1
> and get its values when ctx1 is active and similarly, I can type x2 and get
> the x2 values when ctx2 is active, not when ctx1 is active.
You could measure bandwidths between the host and each presumed GPU to
ascertain where the data actually resides, if you have doubts about
that/don't trust the API.
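For instance, a rough sketch of such a measurement (`bandwidth_gb_s` and `measure_host_to_device` are helper names I made up; the GPU part assumes a working PyCUDA install with at least one device):

```python
import time


def bandwidth_gb_s(nbytes, seconds):
    """Convert a transfer size and elapsed time into GB/s."""
    return nbytes / seconds / 1e9


def measure_host_to_device(device_index, nbytes=64 * 1024 * 1024):
    """Time a host->device copy on one device (requires a GPU + PyCUDA)."""
    import numpy as np
    import pycuda.driver as cuda

    cuda.init()
    ctx = cuda.Device(device_index).make_context()
    try:
        host = np.random.rand(nbytes // 8)  # float64, so nbytes total
        dev = cuda.mem_alloc(host.nbytes)
        cuda.memcpy_htod(dev, host)  # warm-up copy
        cuda.Context.synchronize()
        t0 = time.time()
        cuda.memcpy_htod(dev, host)
        cuda.Context.synchronize()
        return bandwidth_gb_s(host.nbytes, time.time() - t0)
    finally:
        ctx.pop()

# e.g.: print(measure_host_to_device(0))
```

A pageable host-to-device copy typically lands at PCIe speed (a few GB/s); comparing the numbers per device can hint at where the data actually lives.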
Andreas
Zhangsheng Lai <dunno.noe(a)gmail.com> writes:
> With the setup above, I tried to check whether, by popping ctx2 and pushing
> ctx1, I can access x1 and not x2, and vice versa: popping ctx1 and pushing
> ctx2, I can access x2 and not x1. However, I realise that I can access both
> x1 and x2 in both contexts.
Can you clarify what you mean by 'can access'? I'm guessing 'submit
kernel launches with that pointer as an argument'?
Andreas
Hi,
I'm trying to create different GPU arrays on different GPUs.
```
import numpy as np
import pycuda
import pycuda.driver as cuda
from pycuda.compiler import SourceModule
import pycuda.curandom as curandom

d = 2 ** 15
cuda.init()

dev1 = cuda.Device(1)
ctx1 = dev1.make_context()
curng1 = curandom.XORWOWRandomNumberGenerator()
x1 = curng1.gen_normal((d, d), dtype=np.float32)  # so x1 is stored in GPU 1 memory
ctx1.pop()  # clearing ctx of GPU 1

dev2 = cuda.Device(2)
ctx2 = dev2.make_context()
curng2 = curandom.XORWOWRandomNumberGenerator()
x2 = curng2.gen_normal((d, d), dtype=np.float32)  # so x2 is stored in GPU 2 memory
```
With the setup above, I tried to check whether, by popping ctx2 and pushing
ctx1, I can access x1 and not x2, and vice versa: popping ctx1 and pushing
ctx2, I can access x2 and not x1. However, I realise that I can access both
x1 and x2 in both contexts.
Thus I'm wondering whether my assumptions that x1 is stored on GPU 1 and x2
on GPU 2 are correct, or if it is actually UVA and peer access that allow me
to access both x1 and x2 even though only one of the two contexts is active.
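One check I could script (a sketch under the setup above; `free_mem_on` and `likely_device` are helper names of my own, and the reported drops will include allocator overhead):

```python
def free_mem_on(ctx):
    """Return free bytes on the device owning ctx (requires a GPU + PyCUDA)."""
    import pycuda.driver as cuda
    ctx.push()
    try:
        free, total = cuda.mem_get_info()
        return free
    finally:
        ctx.pop()


def likely_device(frees_before, frees_after):
    """Index of the device whose free memory dropped the most."""
    drops = [b - a for b, a in zip(frees_before, frees_after)]
    return max(range(len(drops)), key=drops.__getitem__)

# Usage sketch: record free memory on both devices, allocate x1 while ctx1
# is current, then record again. The array lives on the device whose free
# memory shrank by roughly d*d*4 bytes:
#   before = [free_mem_on(ctx1), free_mem_on(ctx2)]
#   ... allocate x1 ...
#   after = [free_mem_on(ctx1), free_mem_on(ctx2)]
#   print("x1 is probably on device index", likely_device(before, after))
```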
Thanks,
Zhangsheng
Dear Anthony,
"Anthony Pleticos" <anthonypleticos(a)unswalumni.com> writes:
> I would like to know where people can go for 'assistance' with difficulties
> in applying PyCUDA.
>
> I could not find it in
> https://wiki.tiker.net/PyCuda/FrequentlyAskedQuestions, and StackExchange
> does not answer my specific issue, especially when running either the
> tutorial and/or examples such as demo.py and hello_gpu in
> C:\Python36\pycuda-2017.1.1\examples.
>
> I tried to follow, step by step, the tutorial at
> https://documen.tician.de/pycuda/tutorial.html#where-to-go-from-here .
>
> The problem comes under the heading "Executing a Kernel", where I have a
> C++-like module in the py file:
>
> mod = SourceModule("""
> __global__ void method(args)
> {
>     C++ like code
> }
> """)
>
> It happens with both the sample code in your tutorial and the
> pycuda-2017.1.1\examples: I get the error message
>
> nvcc fatal : Value 'sm_21' is not defined for option 'gpu-architecture'
Generally, the mailing list (cc'd, needs subscription to post) is a good
place for requests like this. In your case, you seem to have a fairly old
GPU ("sm_21") that is no longer supported by your compiler
(nvcc). Downgrading the CUDA toolkit may help.
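To see what architecture the installed GPU actually reports, something like the following sketch could help (the helper names are mine; `Device.compute_capability()` is PyCUDA's query). If I remember right, nvcc dropped the Fermi sm_2x targets as of CUDA 9.0, while CUDA 8.0 still accepts them.

```python
def sm_arch(compute_capability):
    """Format a (major, minor) compute capability as an nvcc arch string."""
    major, minor = compute_capability
    return "sm_%d%d" % (major, minor)


def report_arch(device_index=0):
    """Query a GPU's compute capability via PyCUDA (requires a GPU)."""
    import pycuda.driver as cuda
    cuda.init()
    dev = cuda.Device(device_index)
    return sm_arch(dev.compute_capability())

# A device reporting (2, 1) is Fermi-era "sm_21", which CUDA 9.x
# toolkits no longer compile for.
```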
Andreas
I'm currently trying to build a simple FEA solver in python using an
incomplete Cholesky decomposition preconditioned conjugate gradient method.
I have exported an example stiffness matrix from my old (and slow) code into
a symmetric .mtx file. This matrix is imported into the example code at
https://wiki.tiker.net/PyCuda/Examples/SparseSolve.
Running the code resulted in errors. The first was solved by installing
pymetis. However, a new problem has now appeared. I get a type error:
"TypeError: No registered converter was able to produce a C++ rvalue of type
int from this Python object of type numpy.int32".
The full traceback:
```
In [1]: runfile('/home/bram/Documenten/TUDelft/Thesis/Phyton/topopt/src/cgCUDATest.py',
wdir='/home/bram/Documenten/TUDelft/Thesis/Phyton/topopt/src')
starting...
building...
Traceback (most recent call last):
  File "<ipython-input-56-3395cc199316>", line 1, in <module>
    runfile('/home/bram/Documenten/TUDelft/Thesis/Phyton/topopt/src/cgCUDATest.py',
    wdir='/home/bram/Documenten/TUDelft/Thesis/Phyton/topopt/src')
  File "/home/bram/.anaconda3/lib/python3.6/site-packages/spyder/utils/site/sitecustomize.py", line 705, in runfile
    execfile(filename, namespace)
  File "/home/bram/.anaconda3/lib/python3.6/site-packages/spyder/utils/site/sitecustomize.py", line 102, in execfile
    exec(compile(f.read(), filename, 'exec'), namespace)
  File "/home/bram/Documenten/TUDelft/Thesis/Phyton/topopt/src/cgCUDATest.py", line 71, in <module>
    main_cg()
  File "/home/bram/Documenten/TUDelft/Thesis/Phyton/topopt/src/cgCUDATest.py", line 21, in main_cg
    spmv = PacketedSpMV(csr_mat, 'symmetric', csr_mat.dtype)
  File "/home/bram/.anaconda3/lib/python3.6/site-packages/pycuda-2017.1-py3.6-linux-x86_64.egg/pycuda/sparse/packeted.py", line 127, in __init__
    xadj=adj_mat.indptr, adjncy=adj_mat.indices)
  File "/home/bram/.anaconda3/lib/python3.6/site-packages/pymetis/__init__.py", line 120, in part_graph
    return part_graph(nparts, xadj, adjncy, vweights, eweights, recursive)
TypeError: No registered converter was able to produce a C++ rvalue of type int from this Python object of type numpy.int32
```
Do I need to change the python files in pymetis?
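As a sketch of the kind of conversion that sidesteps this Boost.Python converter error (whether pycuda's call site in packeted.py or pymetis itself is the right place to patch, I can't say; the helper name is my own):

```python
import numpy as np


def as_plain_ints(seq):
    """Convert numpy integer scalars to plain Python ints, which
    Boost.Python's built-in converters accept."""
    return [int(v) for v in seq]


indptr = np.array([0, 2, 5], dtype=np.int32)

# A numpy.int32 is not a Python int, which is what trips up the converter:
assert not isinstance(indptr[0], int)

# After conversion, every element is a plain int:
xadj = as_plain_ints(indptr)
assert all(isinstance(v, int) for v in xadj)
```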
--
Sent from: http://pycuda.2962900.n2.nabble.com/
Zhangsheng Lai <dunno.noe(a)gmail.com> writes:
> Hi,
>
> I'm trying to do some updates to a state which is a binary array. gputid is
> a GPU thread class (https://wiki.tiker.net/PyCuda/Examples/MultipleThreads)
> and it stores the state and the index of the array to be updated in another
> class which can be accessed with gputid.mp.x_gpu and gputid.mp.neuron_gpu
> respectively. Below is my kernel that takes in the gputid and performs the
> update of the state. However, the output of the code is not consistent:
> when I run it multiple times, it sometimes runs into errors and sometimes
> executes perfectly. The error msg makes no sense to me:
>
> File "/root/anaconda3/lib/python3.6/site-packages/pycuda/driver.py", line
> 447, in function_prepared_call
> func._set_block_shape(*block)
> pycuda._driver.LogicError: cuFuncSetBlockShape failed: invalid resource
> handle
I think the right way to interpret this is that if you cause an
on-device segfault, the GPU context dies, and all the handles of objects
contained in it (including the function) become invalid.
HTH,
Andreas
Hi,
I'm trying to do some updates to a state which is a binary array. gputid is
a GPU thread class (https://wiki.tiker.net/PyCuda/Examples/MultipleThreads)
and it stores the state and the index of the array to be updated in another
class which can be accessed with gputid.mp.x_gpu and gputid.mp.neuron_gpu
respectively. Below is my kernel that takes in the gputid and performs the
update of the state. However, the output of the code is not consistent:
when I run it multiple times, it sometimes runs into errors and sometimes
executes perfectly. The error msg makes no sense to me:
```
File "/root/anaconda3/lib/python3.6/site-packages/pycuda/driver.py", line 447, in function_prepared_call
    func._set_block_shape(*block)
pycuda._driver.LogicError: cuFuncSetBlockShape failed: invalid resource handle
```
My code:
```
def local_update(gputid):
    gputid.ctx.push()  # make this thread's context current before compiling
    mod = SourceModule("""
    __global__ void local_update(int *x_gpu, float *n_gpu)
    {
        int tid = threadIdx.x + blockDim.x * blockIdx.x;
        if (tid == (int)(n_gpu[0]))
        {
            x_gpu[tid] = 1 - x_gpu[tid];
        }
    }
    """)
    x_gpu = gputid.mp.x_gpu
    n_gpu = gputid.mp.neuron_gpu
    func = mod.get_function("local_update")
    func.prepare("PP")
    grid = (1, 1)
    block = (gputid.mp.d, 1, 1)
    func.prepared_call(grid, block, x_gpu, n_gpu)
    gputid.ctx.pop()
    print('1Pain')
```
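One thing worth checking, though the traceback alone doesn't prove it: with the earlier `d = 2 ** 15`, `block = (gputid.mp.d, 1, 1)` would request 32768 threads per block, far above the usual 1024-thread hardware limit, and an invalid launch can kill the context so that later calls fail with "invalid resource handle". A sketch of splitting a 1-D launch into a legal grid/block pair (helper name is mine):

```python
def split_launch(n, max_block=1024):
    """Split n threads into a (grid, block) pair with block <= max_block."""
    block = min(n, max_block)
    grid = (n + block - 1) // block  # ceiling division
    return (grid, 1), (block, 1, 1)


# 2**15 threads no longer fit in one block:
grid, block = split_launch(2 ** 15)
assert grid == (32, 1) and block == (1024, 1, 1)

# Small sizes stay in a single block:
assert split_launch(100) == ((1, 1), (100, 1, 1))
```

The kernel's `if (tid == ...)` guard already makes over-provisioned threads harmless, so only the launch shape needs changing.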
My goal is to speed up my python FEA (finite elements analysis) with my
quadro GPU. I however have issues when I import pycuda.autoinit or
pycuda.driver into my python code. See the example from my Console:
```
In [6]: import pycuda.autoinit
Traceback (most recent call last):
  File "<ipython-input-7-78816ba4a0fc>", line 1, in <module>
    import pycuda.autoinit
  File "/home/bram/.anaconda3/lib/python3.6/site-packages/pycuda-2017.1-py3.6-linux-x86_64.egg/pycuda/autoinit.py", line 2, in <module>
    import pycuda.driver as cuda
  File "/home/bram/.anaconda3/lib/python3.6/site-packages/pycuda-2017.1-py3.6-linux-x86_64.egg/pycuda/driver.py", line 5, in <module>
    from pycuda._driver import *  # noqa
ImportError: libcurand.so.8.0: cannot open shared object file: No such file or directory
```
Some details of my setup:
- HP Zbook Studio G3 (Quadro M1000M), Ubuntu 18.04
- CUDA 9.1 (.run installer; I added the path variables to ~/.bashrc)
- nvidia-driver-390 as driver
- pycuda 2017.1 (from anaconda)
I've tried the solutions proposed by people encountering similar issues when
using tensorflow-gpu: it was proposed to make a symlink from
libcurand.so.8.0 to libcurand.so.9.1 in the terminal:
"user@device:~$ sudo ln -s libcurand.so.9.1 libcurand.so.8.0". This did not
help, however.
I've checked the installation of CUDA by running a simple vectorAdd example
in Eclipse. That worked without any issues, and profiling showed that the
GPU was working as expected.
I probably made a mistake somewhere; tell me if you need more information.
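The missing libcurand.so.8.0 suggests this PyCUDA binary was built against CUDA 8 while the installed toolkit is 9.1, so rebuilding PyCUDA against CUDA 9.1 (or installing a matching CUDA 8 runtime) seems more likely to help than symlinking. To see which CUDA libraries the dynamic loader can actually find, a small pure-Python check (the library names are just the ones involved here):

```python
from ctypes.util import find_library

# find_library searches roughly the same paths as the dynamic loader;
# on Linux it consults the ldconfig cache, so libraries reachable only
# via LD_LIBRARY_PATH may not show up here.
for name in ("curand", "cublas", "cudart"):
    path = find_library(name)
    print(name, "->", path if path else "not found")

# A library that certainly does not exist resolves to None:
assert find_library("no_such_cuda_lib_xyz") is None
```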