Re: [PyCUDA] kernel argument double2 alignment
by Andreas Kloeckner

On Mon, 7 Nov 2011 14:17:03 +1300, Igor <rychphd(a)gmail.com> wrote:
> I found a place to patch in pycuda-2011.1.2\pycuda\gpuarray.py,
> class vec:
> ...
> for base_name, base_type, counts in [
>     ('char', np.int8, [1,2,3,4]),
>     ('uchar', np.uint8, [1,2,3,4]),
>     ('short', np.int16, [1,2,3,4]),
>     ('ushort', np.uint16, [1,2,3,4]),
>     ('int', np.uint32, [1,2,3,4]),
>     ('uint', np.uint32, [1,2,3,4]),
>     ('long', long_dtype, [1,2,3,4]),
>     ('ulong', ulong_dtype, [1,2,3,4]),
>     ('longlong', np.int64, [1,2]),
>     ('ulonglong', np.uint64, [1,2]),
>     ('float', np.float32, [1,2,3,4]),
>     ('ulonglong', np.float64, [1,2]),
>     ]:
> ...
> Can I just change the second occurrence of ulonglong to double? Do I
> need to recompile anything then?
Thanks for the report. I've fixed this upstream; I hope this solves your
issue.
Best,
Andreas
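To see what the typo does, here is a minimal pure-NumPy sketch. `build_vec_types` and the trimmed `TABLE` are hypothetical stand-ins modeled on the quoted loop, not PyCUDA's actual code. Because the generated types are stored by name, the misnamed float64 row silently overwrites `ulonglong1`/`ulonglong2`, and no `double2` is ever created.

```python
import numpy as np

FIELDS = ["x", "y", "z", "w"]

# Hypothetical, trimmed copy of the quoted table: (C base name, NumPy type, counts)
TABLE = [
    ("ulonglong", np.uint64, [1, 2]),
    ("float", np.float32, [1, 2, 3, 4]),
    ("ulonglong", np.float64, [1, 2]),  # the typo: should be "double"
]


def build_vec_types(table):
    """Map names such as 'float2' to structured dtypes with x, y, ... fields.

    A later row overwrites an earlier one that produces the same name.
    """
    types = {}
    for base_name, base_type, counts in table:
        for count in counts:
            name = "%s%d" % (base_name, count)
            types[name] = np.dtype([(f, base_type) for f in FIELDS[:count]])
    return types


buggy = build_vec_types(TABLE)
fixed = build_vec_types(TABLE[:-1] + [("double", np.float64, [1, 2])])

print("double2" in buggy)          # False: no double2 type exists
print(buggy["ulonglong2"]["x"])    # float64: ulonglong2 was clobbered
print(fixed["double2"].itemsize)   # 16
```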

kernel argument double2 alignment
by Igor

Hi,
How can I make a Python type that corresponds to the device built-in
double2, both in that it has x,y fields _and_ in that it is aligned on
16 rather than 8 bytes? I am passing it as an argument to a kernel that
expects a double2, but instead the kernel receives whatever is derived from:
k = int(1)
l = int(2)
# how do I align the following
dbl2 = [('x','float64'), ('y','float64')]
a2 = np.array((-0.5,-0.5), dtype=dbl2)
...
kernel(k, l, a2, ...,arr.gpudata, block=(int(16),int(16),int(1)))
It either crashes or doesn't access arr.gpudata properly. I think what
happens is that a2 is not pushed onto the parameter stack with the
alignment that the kernel declaration in CUDA expects:
__global__ void kernel(int k, int l, double2 a2, ..., double2 *arr) {
If instead I declare
struct my_double2 {double x,y;};
__global__ void kernel(int k, int l, my_double2 a2, ..., double2 *arr) {
then it works.
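A minimal sketch of the arithmetic behind this, assuming (as suspected above) that the arguments are packed back to back while the kernel expects a C-style layout in which each parameter's offset is rounded up to its alignment. `param_offsets` is a hypothetical helper; the sizes and alignments are those of CUDA's `int`, the built-in `double2` (declared 16-byte aligned) and a plain `struct { double x, y; }` (8-byte aligned).

```python
import struct


def param_offsets(params):
    """Return each parameter's byte offset for a list of (size, alignment)
    pairs, rounding every offset up to that parameter's alignment."""
    offsets, pos = [], 0
    for size, align in params:
        pos = (pos + align - 1) // align * align  # round up to a multiple of align
        offsets.append(pos)
        pos += size
    return offsets


INT = (4, 4)
DOUBLE2 = (16, 16)     # built-in double2: 16-byte alignment
MY_DOUBLE2 = (16, 8)   # struct { double x, y; }: 8-byte alignment

print(param_offsets([INT, INT, DOUBLE2]))     # [0, 4, 16]
print(param_offsets([INT, INT, MY_DOUBLE2]))  # [0, 4, 8]
# Packing k, l, a2.x, a2.y back to back with no padding takes 24 bytes,
# so the double2 data starts at byte 8.
print(struct.calcsize("<iidd"))               # 24
```

Under that assumption the kernel reads `a2` from byte 16 while the data sits at byte 8. `my_double2` only needs 8-byte alignment, so its expected offset matches the packed data, which is consistent with it working.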
What is currently the best way to pack kernel arguments in PyCUDA?
Thanks
Igor
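One way to produce a block with the padding that `double2` needs is to describe the parameter layout as a NumPy structured dtype with explicit offsets and serialize it. This is a sketch under assumptions: the offsets follow CUDA's alignment of `int` and `double2`, `PARAMS` and `pack_kernel_args` are hypothetical names, and handing these raw bytes to a PyCUDA launch is not covered here.

```python
import struct

import numpy as np

# Hypothetical layout of the parameter block for (int k, int l, double2 a2):
# a2 is placed at offset 16, leaving bytes 8..15 as padding.
DOUBLE2 = np.dtype([("x", "<f8"), ("y", "<f8")])
PARAMS = np.dtype({
    "names": ["k", "l", "a2"],
    "formats": ["<i4", "<i4", DOUBLE2],
    "offsets": [0, 4, 16],
    "itemsize": 32,
})


def pack_kernel_args(k, l, x, y):
    """Return one parameter block as bytes; np.zeros leaves the padding zeroed."""
    rec = np.zeros((), dtype=PARAMS)
    rec["k"] = k
    rec["l"] = l
    rec["a2"] = (x, y)
    return rec.tobytes()


buf = pack_kernel_args(1, 2, -0.5, -0.5)
print(len(buf))                            # 32
print(struct.unpack_from("<2d", buf, 16))  # (-0.5, -0.5)
```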

Multiplying GPUArray returns unexpected result
by Peter17

Hi, everybody,
I'm new to PyCUDA and I'm running into problems when multiplying a NumPy
array with a GPUArray.
This kind of multiplication is used, for instance, in
hedge/hedge/discretization/__init__.py:67 in CUDA mode.
I believe the following script returns an unexpected result:
----
# -*- coding: utf-8 -*-
import pycuda.autoinit
import pycuda.gpuarray as gpuarray
import numpy
a = numpy.array([1.,2.,3.,4.]).astype(numpy.float32)
a_gpu = gpuarray.to_gpu(a)
b = a
b_gpu = gpuarray.to_gpu(b)
print "Numpy NP dot NP:", numpy.dot(a, b)
print "Numpy NP dot GPU:", numpy.dot(a, b_gpu)
print "Numpy GPU dot NP:", numpy.dot(a_gpu, b)
print "Numpy GPU dot GPU:", numpy.dot(a_gpu, b_gpu)
print "GPUArray GPU dot GPU:", gpuarray.dot(a_gpu, b_gpu).get()
----
will print:
Numpy NP dot NP: 30.0
Numpy NP dot GPU: [[ 1. 2. 3. 4.] [ 2. 4. 6. 8.] [ 3. 6. 9. 12.] [ 4. 8. 12. 16.]]
Numpy GPU dot NP: [[ 1. 2. 3. 4.] [ 2. 4. 6. 8.] [ 3. 6. 9. 12.] [ 4. 8. 12. 16.]]
Numpy GPU dot GPU: [ 1. 4. 9. 16.]
GPUArray GPU dot GPU: 30.0
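A minimal pure-NumPy sketch of what seems to be happening, assuming `numpy.dot` cannot convert a `GPUArray` into a numeric array and therefore treats it as one opaque object. `FakeGPUArray` is a hypothetical stand-in that only supports elementwise `*` and `.get()`. When one operand of `numpy.dot` is scalar-like, it falls back to plain multiplication, so each element `a[i]` times the object becomes a separate array `a[i] * b`: four rows, which print as the 4x4 table above.

```python
import numpy as np


class FakeGPUArray(object):
    """Hypothetical stand-in for a device array: NumPy cannot see its data."""

    def __init__(self, data):
        self.data = np.asarray(data)

    def __mul__(self, other):
        # Elementwise product, like GPUArray * GPUArray or GPUArray * scalar
        if isinstance(other, FakeGPUArray):
            other = other.data
        return FakeGPUArray(self.data * other)

    __rmul__ = __mul__

    def get(self):
        return self.data


a = np.array([1., 2., 3., 4.], dtype=np.float32)
rows = np.dot(a, FakeGPUArray(a))  # object array: one FakeGPUArray per a[i]
print(rows.shape)                  # (4,)
print(rows[1].get())               # [2. 4. 6. 8.]
```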
In my opinion, the expected result would be:
Numpy NP dot NP: 30.0
Numpy NP dot GPU: 30.0
Numpy GPU dot NP: 30.0
Numpy GPU dot GPU: 30.0
GPUArray GPU dot GPU: 30.0
This can be obtained with the following (probably dirty) code:
----
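def do_sum(data):
    # Collapse whatever numpy.dot returned into a single number
    if isinstance(data, gpuarray.GPUArray):
        # GPU dot GPU gave the elementwise product a*b: sum it on the device
        return gpuarray.sum(data)
    elif isinstance(data, numpy.ndarray):
        # NP dot GPU gave an array of GPUArrays whose row i is a[i]*b;
        # the dot product is the sum of the diagonal entries
        res = 0
        for i in range(len(data)):
            gpu_vec = data[i].get()
            res += gpu_vec[i]
        return res
    else:
        # NP dot NP already returned a scalar
        return data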
print "Numpy NP dot NP:", do_sum(numpy.dot(a, b))
print "Numpy NP dot GPU:", do_sum(numpy.dot(a, b_gpu))
print "Numpy GPU dot NP:", do_sum(numpy.dot(a_gpu, b))
print "Numpy GPU dot GPU:", do_sum(numpy.dot(a_gpu, b_gpu))
print "GPUArray GPU dot GPU:", gpuarray.dot(a_gpu, b_gpu).get()
----
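The workaround reaches 30.0 because each unexpected result still contains the dot product. The table printed for NP dot GPU is the outer product `a[i]*b[j]`, whose diagonal (its trace) sums to the dot product, and the GPU dot GPU result is the elementwise product, whose total is the dot product. A host-only check, using plain NumPy arrays in place of the GPUArrays:

```python
import numpy as np

a = np.array([1., 2., 3., 4.], dtype=np.float32)
b = a.copy()

expected = np.dot(a, b)   # inner product: 30.0
outer = np.outer(a, b)    # the 4x4 table printed for "NP dot GPU" / "GPU dot NP"
elementwise = a * b       # [1, 4, 9, 16], printed for "GPU dot GPU"

# do_sum's ndarray branch adds outer[i][i]; its GPUArray branch sums every element
print(np.trace(outer))    # 30.0
print(elementwise.sum())  # 30.0
```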
Am I wrong?
I'm not sure whether PyCUDA's gpuarray.py or NumPy should be
responsible for calculating the sum...
What do you think?
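For host-side code that may receive either kind of array, one option is to copy device operands back with `.get()` (which `GPUArray` provides) before calling `numpy.dot`; for two device arrays, `gpuarray.dot` as in the script keeps the work on the GPU. This is a sketch, not PyCUDA API: `to_host` and `host_dot` are hypothetical helpers, and the duck-typed `hasattr(x, "get")` test is an assumption chosen so the sketch runs without a GPU.

```python
import numpy as np


def to_host(x):
    """Copy a device array to the host if it has a .get() method (as
    pycuda.gpuarray.GPUArray does); otherwise convert it with numpy.asarray."""
    if hasattr(x, "get"):
        return x.get()
    return np.asarray(x)


def host_dot(x, y):
    """Hypothetical helper: inner product computed on the host."""
    return np.dot(to_host(x), to_host(y))
```

For the arrays in the script, `host_dot(a, b_gpu)` then agrees with `numpy.dot(a, b)`.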
Thanks in advance
--
Peter Potrowl