Alex Park <alex(a)nervanasys.com> writes:
> Was wondering if it would be possible for you to submit the more recent
> pycuda version up to pypi to serve as the default version. We've made
> nervanagpu dependent on some of the async features and are concerned that
> some users might have trouble figuring out how to grab the github or tiker
> versions.
Done, 2015.1 is out.
Andreas
Scott,
Scott Gray <sgray(a)nervanasys.com> writes:
> I'm curious if you noticed this little project:
>
> https://github.com/NVIDIA/pynvrtc
>
> With cuda 7, it seems like that could be leveraged to replace forking off
> an instance of nvcc. I think compiling cuda-c in this way should be much
> faster. I'm too busy to play with it right now, but I was wondering if you
> had any plans to integrate this?
I wasn't aware of this, thanks for pointing it out! It'd be great to
integrate this. I personally can't spare the time right now, but I'd
love to take a patch.
>
> Btw, thanks for all the work you've put into pycuda. It's truly a pleasure
> to program in. I only wish I'd written my assembler in python to start out
> with so I could integrate it with pycuda and get dynamically generated
> assembly at run time. One of these days I'll port it over. I think my
> perl days are over.
Glad to hear you're finding it useful!
Andreas
I'm trying to understand shared memory by playing with the following code:
#=========================================================
import pycuda.driver as drv
import pycuda.tools
import pycuda.autoinit
import numpy
from pycuda.compiler import SourceModule
src = '''
__global__ void reduce0(float *g_idata, float *g_odata) {
    extern __shared__ float sdata[];
    // each thread loads one element from global to shared mem
    unsigned int tid = threadIdx.x;
    unsigned int i = blockIdx.x*blockDim.x + threadIdx.x;
    sdata[tid] = g_idata[i];
    __syncthreads();
    // do reduction in shared mem
    for (unsigned int s = 1; s < blockDim.x; s *= 2) {
        if (tid % (2*s) == 0) {
            sdata[tid] += sdata[tid + s];
        }
        __syncthreads();
    }
    // write result for this block to global mem
    if (tid == 0) g_odata[blockIdx.x] = sdata[0];
}
'''
mod = SourceModule(src)
reduce0 = mod.get_function('reduce0')
a = numpy.random.randn(400).astype(numpy.float32)
dest = numpy.zeros_like(a)
reduce0(drv.In(a), drv.Out(dest), block=(400, 1, 1))
#=========================================================
I can't see anything obviously wrong with this, but I keep getting
synchronization errors and it doesn't run. I'm hoping it's something
simple that I don't understand.
Any help appreciated.
Thanks!
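[Editor's note: two things look suspect here, hedged as a guess rather than a definitive diagnosis. First, the kernel declares `extern __shared__ float sdata[]`, so the launch must pass a dynamic shared-memory byte count (in PyCUDA, the `shared=` keyword of the kernel call, e.g. `shared=400*4`); without it the shared array has size zero. Second, the tree reduction assumes a power-of-two block size: with `blockDim.x = 400`, `sdata[tid + s]` can index past the end of the block. The sketch below models only the reduction loop in plain NumPy to make that assumption visible; `block_reduce` is a hypothetical helper, not part of PyCUDA.]

```python
import numpy as np

def block_reduce(x):
    """Pure-Python model of the reduce0 tree reduction for one block.

    Mirrors the kernel's loop: at stride s, the "thread" with
    tid % (2*s) == 0 adds sdata[tid + s] into sdata[tid].  Requires
    len(x) to be a power of two; otherwise tid + s indexes past the
    end of the array, analogous to the out-of-bounds shared-memory
    read a 400-thread block would make.
    """
    sdata = x.astype(np.float32).copy()
    n = len(sdata)
    assert n & (n - 1) == 0, "block size must be a power of two"
    s = 1
    while s < n:
        for tid in range(0, n, 2 * s):
            sdata[tid] += sdata[tid + s]
        s *= 2
    return sdata[0]

a = np.random.randn(256).astype(np.float32)
print(np.allclose(block_reduce(a), a.sum(), atol=1e-3))  # True
```

Under those assumptions, a launch shaped like `reduce0(drv.In(a), drv.Out(dest), block=(256, 1, 1), shared=256*4)` on a 256-element (power-of-two) input would match what the kernel expects.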
Bruno Villasenor <br1villasen(a)gmail.com> writes:
> I'm trying to compile the simplest code that uses dynamic parallelism with
> the regular SourceModule; my code:
>
> ------------------------------------------------------------------------
> import numpy as np
> import pycuda.driver as cuda
> from pycuda.compiler import SourceModule
> import pycuda.autoinit
>
> cudaCodeString = """
> __global__ void ChildKernel(void* data){
> //Operate on data
> }
>
> __global__ void ParentKernel(void *data){
> if (threadIdx.x == 0) {
> ChildKernel<<<1, 32>>>(data);
> cudaThreadSynchronize();
> }
> __syncthreads();
> //Operate on data
> }
> """
> cudaCode = SourceModule(cudaCodeString, options=['-rdc=true' ,'-lcudart' ],
> arch='compute_35' )
>
> -------------------------------------------------------------------------------
>
> I get the following error:
> ---------------------------------------------------------------------------------
> pycuda.driver.CompileError: nvcc compilation of /tmp/tmpJJo9kU/kernel.cu
> failed
> [command: nvcc --cubin -rdc=true -lcudart -arch compute_35 -I/usr/local/lib/python2.7/dist-packages/pycuda-2014.1-py2.7-linux-x86_64.egg/pycuda/cuda kernel.cu]
> [stderr:
> nvcc fatal : Option '--cubin (-cubin)' is not allowed when compiling for a virtual compute architecture
>
> -----------------------------------------------------------------------------------
This came up before. Unfortunately, it requires a change to how PyCUDA
compiles code, and that hasn't been done yet. I'd be happy to take
patches though.
Andreas
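[Editor's note: for context, nvcc rejects `--cubin` here because `compute_35` is a virtual architecture; a cubin can only be generated for a real one such as `sm_35`. The sketch below shows what a manual invocation might look like, assuming a CUDA 6.5/7-era toolchain on PATH; file names are illustrative, and this alone does not make the kernel launchable.]

```shell
# Illustrative only: ask for a real architecture so --cubin is accepted.
# -rdc=true keeps the device code relocatable, as dynamic parallelism requires.
nvcc --cubin -rdc=true -arch=sm_35 kernel.cu -o kernel.cubin
```

Even with this, a cubin built with `-rdc=true` still has to be device-linked against cudadevrt (e.g. via the driver API's linking facilities) before its kernels can be launched, which is the PyCUDA-side change discussed in this thread.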