Hello, I am a bit confused about measuring time, so I need a little help.
I have code like:
....
Rs_gpu=gpuarray.to_gpu(np.random.rand(numPointsRs*3).astype(np.float32))
Rp_gpu=gpuarray.to_gpu(np.random.rand(3).astype(np.float32))
....
start = drv.Event()
end = drv.Event()
mod = SourceModule("""
__global__ void compute(float *Rs_mat, ...., float *Rp,.)
""")
# get the kernel function (called below)
func = mod.get_function("compute")
start.record() # start timing
func(Rs_gpu,..Rp_gpu...)
end.record() # end timing
# calculate the elapsed time
end.synchronize()
secs = start.time_till(end)*1e-3
#----- get data back from GPU-----
Rs=Rs_gpu.get()
Rp=Rp_gpu.get()
print "%s, %fsec, %s" % ('Time for Rs = ',secs,str(Rs))
print "%s, %fsec, %s" % ('Time for Rp = ',secs,str(Rp)) //here i am
computing the same thing!
My questions are:
1) Is this the correct way to measure the GPU time?
2) How can I distinguish the timing results for Rs and for Rp (if that can be done)?
Thanks!
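
For reference, a minimal sketch of per-launch event timing, where each kernel call gets its own pair of events so the Rs and Rp timings stay separate. The 'scale' kernel and the array sizes below are placeholders, not the original compute kernel:

# Minimal sketch: time each launch with its own event pair.
import numpy as np
import pycuda.autoinit
import pycuda.driver as drv
import pycuda.gpuarray as gpuarray
from pycuda.compiler import SourceModule

mod = SourceModule("""
__global__ void scale(float *v, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) v[i] *= 2.0f;
}
""")
scale = mod.get_function("scale")

def timed_launch(arr_gpu):
    n = arr_gpu.size
    start, end = drv.Event(), drv.Event()
    start.record()
    scale(arr_gpu.gpudata, np.int32(n),
          block=(256, 1, 1), grid=((n + 255) // 256, 1))
    end.record()
    end.synchronize()                   # wait until the kernel has finished
    return start.time_till(end) * 1e-3  # time_till() returns milliseconds

Rs_gpu = gpuarray.to_gpu(np.random.rand(1024 * 3).astype(np.float32))
Rp_gpu = gpuarray.to_gpu(np.random.rand(3).astype(np.float32))

print "Time for Rs = %f sec" % timed_launch(Rs_gpu)
print "Time for Rp = %f sec" % timed_launch(Rp_gpu)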
I forgot to reply to the list.
On Wed, Jan 18, 2012 at 3:01 AM, Andreas Kloeckner
<lists(a)informa.tiker.net>wrote:
> On Tue, 17 Jan 2012 16:55:22 -0500, Yifei Li <yifli82(a)gmail.com> wrote:
> > Hi all,
> >
> > I modified the example
> > http://documen.tician.de/pycuda/tutorial.html#advanced-topics by removing
> > the '__padding' from the structure definition and got an incorrect result.
> > The kernel is launched with 2 blocks and one thread in each block.
> >
> > Each thread prints the 'len' field in the structure, which should be 3 for
> > block 0 and 2 for block 1. However, the result I got is:
> >
> > block 1: 2097664
> > block 0: 3
> >
> > No such problem if I write the following program using C. Any help is
> > appreciated.
>
> It seems CUDA doesn't automatically align the pointer, without being
> told to?
> https://en.wikipedia.org/wiki/Data_structure_alignment
How do I tell CUDA to align data automatically? If this is a CUDA problem,
how come the C program does not have any issue?
If I replace the structure

struct Vec {
    int len;
    float* ptr;
};

with a different structure of the same size (12 bytes)

struct Vec {
    float x, y, z;
};

the values of x, y and z are printed correctly.
Yifei
>
>
> Andreas
>
>
On Tue, 17 Jan 2012 13:50:47 -0800, Jesse Lu <jesselu(a)stanford.edu> wrote:
> Hi everyone,
>
> Quick question, how should I copy the data from one GPUArray to another? Is
> there something better than
>
> y.set(x.get())
You can do a device-to-device copy from x.gpudata to y.gpudata.
Andreas
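
For reference, a minimal sketch of that device-to-device copy between two GPUArrays of the same dtype and size (array names here are illustrative):

import numpy as np
import pycuda.autoinit
import pycuda.driver as drv
import pycuda.gpuarray as gpuarray

x = gpuarray.to_gpu(np.arange(16, dtype=np.float32))
y = gpuarray.empty_like(x)

# copy x into y without a round trip through the host
drv.memcpy_dtod(y.gpudata, x.gpudata, x.nbytes)

print y.get()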
Hi all,
I modified the example
http://documen.tician.de/pycuda/tutorial.html#advanced-topics by removing
the '__padding' from the structure definition and got an incorrect result.
The kernel is launched with 2 blocks and one thread in each block.
Each thread prints the 'len' field in the structure, which should be 3 for
block 0 and 2 for block 1. However, the result I got is:
block 1: 2097664
block 0: 3
No such problem occurs if I write the following program using C. Any help is
appreciated.
Yifei
#include <stdio.h>
struct Vec {
* int len;*
float* data;
};
__global__ void test(Vec *a) {
Vec v = a[blockIdx.x];
printf("block %d: %d\n", blockIdx.x, v.len);
}
-------------------------------------------------- end of kernel
---------------------------------------------------------------
import numpy
import pycuda.autoinit
import pycuda.driver as cuda
from pycuda.compiler import SourceModule

class DoubleOpStruct:
    # mem_size = 8 + numpy.intp(0).nbytes
    mem_size = 4 + numpy.intp(0).nbytes          # changed: 4 instead of 8

    def __init__(self, array, struct_arr_ptr):
        data = cuda.to_device(array)
        cuda.memcpy_htod(int(struct_arr_ptr), numpy.int32(array.size))
        # cuda.memcpy_htod(int(struct_arr_ptr) + 8, numpy.intp(int(data)))
        cuda.memcpy_htod(int(struct_arr_ptr) + 4, numpy.intp(int(data)))  # changed: offset 4

struct_arr = cuda.mem_alloc(2 * DoubleOpStruct.mem_size)
do2_ptr = int(struct_arr) + DoubleOpStruct.mem_size
array1 = DoubleOpStruct(numpy.array([1, 2, 3], dtype=numpy.float32), struct_arr)
array2 = DoubleOpStruct(numpy.array([0, 4], dtype=numpy.float32), do2_ptr)

with open('test.cu', 'r') as f:
    src = f.read()
mod = SourceModule(src)
func = mod.get_function("test")
func(struct_arr, block=(1, 1, 1), grid=(2, 1))
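
For comparison, a sketch of the layout the tutorial itself uses, where the pointer stays 8-byte aligned: the device struct keeps an explicit padding int ahead of the pointer, and the host-side writes use offset 8. The class and variable names here follow the Vec example rather than the tutorial, and the offsets assume a 64-bit build where numpy.intp is 8 bytes:

# Sketch only -- mirrors the tutorial's padded layout instead of the
# 4-byte-offset version above.
import numpy
import pycuda.autoinit
import pycuda.driver as cuda
from pycuda.compiler import SourceModule

kernel_src = """
#include <stdio.h>

struct Vec {
    int len;
    int __padding;   // keeps 'data' 8-byte aligned on 64-bit builds
    float* data;
};

__global__ void test(Vec *a) {
    Vec v = a[blockIdx.x];
    printf("block %d: %d\\n", blockIdx.x, v.len);
}
"""

class VecStruct:
    mem_size = 8 + numpy.intp(0).nbytes   # len + padding, then the pointer

    def __init__(self, array, struct_arr_ptr):
        self.data = cuda.to_device(array)  # keep a reference so it is not freed
        cuda.memcpy_htod(int(struct_arr_ptr), numpy.int32(array.size))
        # the pointer value goes at offset 8, past the padding
        cuda.memcpy_htod(int(struct_arr_ptr) + 8, numpy.intp(int(self.data)))

struct_arr = cuda.mem_alloc(2 * VecStruct.mem_size)
v1 = VecStruct(numpy.array([1, 2, 3], dtype=numpy.float32), struct_arr)
v2 = VecStruct(numpy.array([0, 4], dtype=numpy.float32),
               int(struct_arr) + VecStruct.mem_size)

mod = SourceModule(kernel_src)
func = mod.get_function("test")
func(struct_arr, block=(1, 1, 1), grid=(2, 1))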
Hi,
I am trying to build pyCUDA on my machine with the following siteconf.py:
BOOST_INC_DIR = ['/usr/include']
BOOST_LIB_DIR = ['/usr/lib']
BOOST_COMPILER = 'gcc43'
USE_SHIPPED_BOOST = False
BOOST_PYTHON_LIBNAME = ['boost_python']
BOOST_THREAD_LIBNAME = ['boost_thread']
CUDA_TRACE = False
CUDA_ROOT = '/usr/local/cuda'
CUDA_ENABLE_GL = False
CUDA_ENABLE_CURAND = True
CUDADRV_LIB_DIR = ['/usr/lib']
CUDADRV_LIBNAME = ['cuda']
CUDART_LIB_DIR = ['${CUDA_ROOT}/lib64', '${CUDA_ROOT}/lib']
CUDART_LIBNAME = ['cudart']
CURAND_LIB_DIR = ['${CUDA_ROOT}/lib64', '${CUDA_ROOT}/lib']
CURAND_LIBNAME = ['curand']
CXXFLAGS = []
LDFLAGS = []
If I search /usr/lib for the BOOST_PYTHON_LIBNAME and BOOST_THREAD_LIBNAME
libraries, both appear, yet they still cause an ld error to be thrown when
attempting to build. To make that ld error go away I manually changed
siteconf.py to what you see above, and now the build completes properly.
When I run:
sudo python setup.py install
I receive the following error:
*** CUDA_ROOT not set, and nvcc not in path. Giving up.
This is weird to me because both the $CUDA_ROOT env variable and nvcc are set:
if I echo $CUDA_ROOT, I see /usr/local/cuda. I have tried this as root and
using sudo, and both print the same directory. The nvcc command also works from
the shell.
I have seen other people posting about this problem, and they all had
issues with their siteconf.py. No matter what options I give the
build, I always end up with the same error when attempting to install.
Cheers,
William Savran
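
For what it's worth, a quick way to check what the install step actually sees; the assumption here is the common case where plain sudo resets the environment, so a $CUDA_ROOT set in your shell never reaches setup.py. The file name check_env.py is hypothetical:

# check_env.py -- small diagnostic sketch.
# Run it the same way the install is run, e.g. `sudo python check_env.py`.
# If CUDA_ROOT prints as None under sudo but not in your normal shell, the
# variable is being stripped before setup.py ever sees it.
import os

print "CUDA_ROOT =", os.environ.get('CUDA_ROOT')
print "PATH =", os.environ.get('PATH')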
Hi,
I have a question on the example here
http://documen.tician.de/pycuda/tutorial.html#advanced-topics. In the
__init__ function of DoubleOpStruct:
def __init__(self, array, struct_arr_ptr):
    # this copies array on the host to device and returns a 'pointer' to
    # the array on the device, correct?
    self.data = cuda.to_device(array)
    # why memcpy_htod? memcpy_dtod would make more sense to me because
    # self.data refers to something on the device
    cuda.memcpy_htod(int(struct_arr_ptr)+8, numpy.intp(int(self.data)))
Thanks
Yifei
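
For reference, a sketch of why that copy is host-to-device (variable names here are illustrative, not from the tutorial): the device address returned by to_device() is held on the host as a plain numpy scalar, and it is that host-side value which gets written into the struct sitting on the device.

# Illustrative sketch: the pointer *value* is host data, the struct slot it
# is written into is device memory, hence memcpy_htod.
import numpy
import pycuda.autoinit
import pycuda.driver as cuda

arr = numpy.array([1, 2, 3], dtype=numpy.float32)
dev_data = cuda.to_device(arr)               # allocation on the device

ptr_value = numpy.intp(int(dev_data))        # host-side scalar holding the device address
ptr_slot = cuda.mem_alloc(ptr_value.nbytes)  # stand-in for the struct's pointer field

# host (ptr_value) -> device (ptr_slot)
cuda.memcpy_htod(ptr_slot, ptr_value)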
On Thu, 12 Jan 2012 19:39:19 +0100, Thomas Wiecki <Thomas_Wiecki(a)brown.edu> wrote:
> Seems like it is a 32 bit bug, I replicated it on another 32 bit
> machine and filed a bug report:
> http://projects.scipy.org/numpy/ticket/2017
>
> As for a temporary fix, I also register uintp32 (and intp32 for good
> luck) to DTYPES_TO_CTYPES which seems to do the trick (on 64 bit it
> will just overwrite the key but link it to the same value).
You mean uintp (not uintp32), right? I've made that fix in compyte. Can
you please verify? (requires a submodule update, fixed in both PyOpenCL
and PyCUDA)
I was a bit unsure what C type to map this to, but decided in favor of
uintptr_t, even though that requires the user to have stdint.h included,
which none of the other types do. Hope that's ok, but I am open to
suggestions.
Andreas
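
A rough sketch of the temporary registration described above. The import path is an assumption (the module holding DTYPES_TO_CTYPES has moved between PyCUDA/compyte versions, so adjust it to wherever the dict lives in your checkout), and the uintptr_t/intptr_t mapping follows the choice mentioned, so the kernel source needs stdint.h:

# Sketch only -- import path below is assumed, not verified for your version.
import numpy as np
from pycuda.compyte.dtypes import DTYPES_TO_CTYPES   # assumed location

# Register the pointer-sized integer dtypes explicitly. On 64-bit this just
# re-adds the existing uint64/int64 entries; on 32-bit it supplies the keys
# whose lookup otherwise fails because of the dtype hashing bug.
DTYPES_TO_CTYPES[np.dtype(np.uintp)] = "uintptr_t"   # needs <stdint.h> in kernels
DTYPES_TO_CTYPES[np.dtype(np.intp)] = "intptr_t"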
Can you also send an email to the numpy mailing list? This will help
get the problem fixed faster. I'm not sure they check new bug reports
very frequently.
Fred
On Thu, Jan 12, 2012 at 1:39 PM, Thomas Wiecki <Thomas_Wiecki(a)brown.edu> wrote:
> Seems like it is a 32 bit bug, I replicated it on another 32 bit
> machine and filed a bug report:
> http://projects.scipy.org/numpy/ticket/2017
>
> As for a temporary fix, I also register uintp32 (and intp32 for good
> luck) to DTYPES_TO_CTYPES which seems to do the trick (on 64 bit it
> will just overwrite the key but link it to the same value).
>
> On Thu, Jan 12, 2012 at 7:08 PM, Josh Bleecher Snyder
> <josharian(a)gmail.com> wrote:
>>> Can anyone test if this replicates with his numpy (mine is 1.6.1)?
>>
>> For what it's worth:
>>
>>
>> Ubuntu, 64bit, Python 2.6.5:
>>
>>>>> np.version.version
>> '1.3.0'
>>>>> np.dtype(np.uintp)
>> dtype('uint64')
>>>>> np.dtype(np.uintp) == np.dtype(np.uint64)
>> True
>>>>> hash(np.dtype(np.uintp))
>> 1667532113121643636
>>>>> hash(np.dtype(np.uint64))
>> 1667532113121643636
>>
>>
>> Ubuntu, 64bit, Python 2.6.5:
>>
>>>>> np.version.version
>> '1.6.0'
>>>>> np.dtype(np.uintp)
>> dtype('uint64')
>>>>> np.dtype(np.uintp) == np.dtype(np.uint64)
>> True
>>>>> hash(np.dtype(np.uintp))
>> -7981643793158015352
>>>>> hash(np.dtype(np.uint64))
>> -7981643793158015352
>>
>>
>> OS X, 64bit, Python 2.6.5:
>>
>>>>> np.version.version
>> '1.6.0'
>>>>> np.dtype(np.uintp)
>> dtype('uint64')
>>>>> np.dtype(np.uintp) == np.dtype(np.uint64)
>> True
>>>>> hash(np.dtype(np.uintp))
>> -7981643793158015352
>>>>> hash(np.dtype(np.uint64))
>> -7981643793158015352
>>
>> So...haven't been able to reproduce it here.
>>
>> -josh
>