On Mon, 7 Nov 2011 14:17:03 +1300, Igor <rychphd(a)gmail.com> wrote:
> I found a place to patch in pycuda-2011.1.2\pycuda\gpuarray.py,
> class vec:
> ...
> for base_name, base_type, counts in [
> ('char', np.int8, [1,2,3,4]),
> ('uchar', np.uint8, [1,2,3,4]),
> ('short', np.int16, [1,2,3,4]),
> ('ushort', np.uint16, [1,2,3,4]),
> ('int', np.uint32, [1,2,3,4]),
> ('uint', np.uint32, [1,2,3,4]),
> ('long', long_dtype, [1,2,3,4]),
> ('ulong', ulong_dtype, [1,2,3,4]),
> ('longlong', np.int64, [1,2]),
> ('ulonglong', np.uint64, [1,2]),
> ('float', np.float32, [1,2,3,4]),
> ('ulonglong', np.float64, [1,2]),
> ]:
> ...
> Can I just change the second occurrence of ulonglong to double? Do I
> need to recompile anything then?
Thanks for the report--I've fixed this upstream. Hope this solves your
issue.
Best,
Andreas
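For reference, here is a plain-NumPy sketch of what the fixed table entry, ('double', np.float64, [1, 2]), expands to inside class vec: a record dtype with x/y fields plus a make_double2 helper. The helper and field names follow PyCUDA's convention, but the construction below is only an illustration, not the library's actual code.

```python
import numpy as np

def make_vec_type(base_name, base_type, count):
    # Build a record dtype shaped like CUDA's built-in vector types:
    # fields named x, y, z, w, all of the same base type. Illustrative
    # only -- pycuda.gpuarray.vec generates these from its
    # (name, dtype, counts) table.
    names = ["x", "y", "z", "w"][:count]
    dtype = np.dtype([(n, base_type) for n in names])

    def maker(*args):
        # Pack the given components into a single record scalar.
        return np.array(tuple(args), dtype=dtype)

    return dtype, maker

double2, make_double2 = make_vec_type("double", np.float64, 2)
v = make_double2(1.0, 2.0)  # a 16-byte record with fields v["x"], v["y"]
```

With the upstream fix applied, gpuarray.vec.double2 and gpuarray.vec.make_double2 should exist directly, so hand-rolling the dtype like this is only needed on an unpatched 2011.1.2.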

I found a place to patch in pycuda-2011.1.2\pycuda\gpuarray.py,
class vec:
...
for base_name, base_type, counts in [
('char', np.int8, [1,2,3,4]),
('uchar', np.uint8, [1,2,3,4]),
('short', np.int16, [1,2,3,4]),
('ushort', np.uint16, [1,2,3,4]),
('int', np.uint32, [1,2,3,4]),
('uint', np.uint32, [1,2,3,4]),
('long', long_dtype, [1,2,3,4]),
('ulong', ulong_dtype, [1,2,3,4]),
('longlong', np.int64, [1,2]),
('ulonglong', np.uint64, [1,2]),
('float', np.float32, [1,2,3,4]),
('ulonglong', np.float64, [1,2]),
]:
...
Can I just change the second occurrence of ulonglong to double? Do I
need to recompile anything then?
On Mon, Nov 7, 2011 at 11:26 AM, Igor <rychphd(a)gmail.com> wrote:
> Thanks. The reason I felt I had to define my own double2 in PyCUDA is
> that it doesn't appear to be among the types exposed by gpuarray.vec.
> If it is there, how do I use it? The following (second line) does not
> work:
>
> print gpuarray.vec.make_float3(1,2,3)
> print gpuarray.vec.make_double2(1,2)
>
> (1.0, 2.0, 3.0)
> Traceback (most recent call last):
> ...
> AttributeError: class vec has no attribute 'make_double2'
>
>
>
>
> On Mon, Nov 7, 2011 at 9:19 AM, Andreas Kloeckner
> <lists(a)informa.tiker.net> wrote:
>> On Sun, 6 Nov 2011 19:23:10 +1300, Igor <rychphd(a)gmail.com> wrote:
>>> Hi,
>>> How can I make a Python type that corresponds to the device built-in
>>> double2, both in that it has x and y fields _and_ in that it is
>>> aligned on 16 rather than 8 bytes? I am passing it as an argument to
>>> a kernel that expects to receive a double2, but instead it receives
>>> whatever is derived from
>>>
>>> k = int(1)
>>> l = int(2)
>>>
>>> # how do I align the following
>>> dbl2 = [('x','float64'), ('y','float64')]
>>> a2 = np.array((-0.5,-0.5), dtype=dbl2)
>>> ...
>>> kernel(k, l, a2, ...,arr.gpudata, block=(int(16),int(16),int(1)))
>>>
>>> It either crashes or doesn't access arr.gpudata properly. I think
>>> what happens is that a2 is not aligned when pushed onto the parameter
>>> stack the way the kernel declaration in CUDA expects:
>>>
>>> __global__ void kernel(int k, int l, double2 a2, ..., double2 *arr) {
>>>
>>> If, instead, the kernel takes a hand-made struct:
>>> struct my_double2 {double x,y;};
>>> __global__ void kernel(int k, int l, my_double2 a2, ..., double2 *arr) {
>>>
>>> then it works.
>>>
>>> What is the best way to pack arguments currently in PyCUDA?
>>
>> Try using the vector types:
>> http://documen.tician.de/pycuda/array.html#vector-types
>>
>> HTH,
>> Andreas
>>
>

On Sun, 6 Nov 2011 19:23:10 +1300, Igor <rychphd(a)gmail.com> wrote:
> Hi,
> How can I make a Python type that corresponds to the device built-in
> double2, both in that it has x and y fields _and_ in that it is aligned
> on 16 rather than 8 bytes? I am passing it as an argument to a kernel
> that expects to receive a double2, but instead it receives whatever is
> derived from
>
> k = int(1)
> l = int(2)
>
> # how do I align the following
> dbl2 = [('x','float64'), ('y','float64')]
> a2 = np.array((-0.5,-0.5), dtype=dbl2)
> ...
> kernel(k, l, a2, ...,arr.gpudata, block=(int(16),int(16),int(1)))
>
> It either crashes or doesn't access arr.gpudata properly. I think what
> happens is that a2 is not aligned when pushed onto the parameter stack
> the way the kernel declaration in CUDA expects:
>
> __global__ void kernel(int k, int l, double2 a2, ..., double2 *arr) {
>
> If, instead, the kernel takes a hand-made struct:
> struct my_double2 {double x,y;};
> __global__ void kernel(int k, int l, my_double2 a2, ..., double2 *arr) {
>
> then it works.
>
> What is the best way to pack arguments currently in PyCUDA?
Try using the vector types:
http://documen.tician.de/pycuda/array.html#vector-types
HTH,
Andreas

Hi,
How can I make a Python type that corresponds to the device built-in
double2, both in that it has x and y fields _and_ in that it is aligned
on 16 rather than 8 bytes? I am passing it as an argument to a kernel
that expects to receive a double2, but instead it receives whatever is
derived from
k = int(1)
l = int(2)
# how do I align the following
dbl2 = [('x','float64'), ('y','float64')]
a2 = np.array((-0.5,-0.5), dtype=dbl2)
...
kernel(k, l, a2, ...,arr.gpudata, block=(int(16),int(16),int(1)))
It either crashes or doesn't access arr.gpudata properly. I think what
happens is that a2 is not aligned when pushed onto the parameter stack
the way the kernel declaration in CUDA expects:
__global__ void kernel(int k, int l, double2 a2, ..., double2 *arr) {
If, instead, the kernel takes a hand-made struct:
struct my_double2 {double x,y;};
__global__ void kernel(int k, int l, my_double2 a2, ..., double2 *arr) {
then it works.
What is the best way to pack arguments currently in PyCUDA?
Thanks
Igor
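The alignment problem described above can be illustrated without a GPU. Assuming the usual layout rule that each kernel argument is placed in parameter space at the next offset that is a multiple of its own alignment, a built-in double2 (size 16, alignment 16) lands at a different offset than a hand-made struct of two doubles (size 16, alignment 8). The helper below is a sketch of that rule, not of PyCUDA's actual packing code.

```python
def param_offsets(args):
    # args: list of (size, alignment) pairs, one per kernel argument.
    # Each argument starts at its predecessor's end, rounded up to the
    # argument's own alignment.
    offsets, cur = [], 0
    for size, align in args:
        cur = (cur + align - 1) // align * align  # round up to alignment
        offsets.append(cur)
        cur += size
    return offsets

# (int k, int l, double2 a2): double2 is 16-byte aligned, so a2 starts
# at offset 16.
builtin = param_offsets([(4, 4), (4, 4), (16, 16)])   # [0, 4, 16]

# (int k, int l, my_double2 a2): a plain {double x, y;} struct is only
# 8-byte aligned, so a2 starts at offset 8 -- a different layout.
handmade = param_offsets([(4, 4), (4, 4), (16, 8)])   # [0, 4, 8]
```

This mismatch is consistent with the observed behavior: if the host packs a2 with 8-byte alignment but the kernel signature says double2, then a2 and every later argument (including arr) is read from the wrong offset.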

Hi, everybody,
I'm new to PyCUDA, and I've encountered problems when multiplying a
NumPy array with a GPUArray.
This kind of multiplication is used, for instance, in
hedge/hedge/discretization/__init__.py:67 in CUDA mode.
I believe the following script returns an unexpected result:
----
# -*- coding: utf-8 -*-
import pycuda.autoinit
import pycuda.gpuarray as gpuarray
import numpy
a = numpy.array([1.,2.,3.,4.]).astype(numpy.float32)
a_gpu = gpuarray.to_gpu(a)
b = a
b_gpu = gpuarray.to_gpu(b)
print "Numpy NP dot NP:", numpy.dot(a, b)
print "Numpy NP dot GPU:", numpy.dot(a, b_gpu)
print "Numpy GPU dot NP:", numpy.dot(a_gpu, b)
print "Numpy GPU dot GPU:", numpy.dot(a_gpu, b_gpu)
print "GPUArray GPU dot GPU:", gpuarray.dot(a_gpu, b_gpu).get()
----
will print:
Numpy NP dot NP: 30.0
Numpy NP dot GPU: [[ 1. 2. 3. 4.] [ 2. 4. 6. 8.] [ 3. 6. 9. 12.] [ 4. 8. 12. 16.]]
Numpy GPU dot NP: [[ 1. 2. 3. 4.] [ 2. 4. 6. 8.] [ 3. 6. 9. 12.] [ 4. 8. 12. 16.]]
Numpy GPU dot GPU: [ 1. 4. 9. 16.]
GPUArray GPU dot GPU: 30.0
In my opinion, the expected result would be:
Numpy NP dot NP: 30.0
Numpy NP dot GPU: 30.0
Numpy GPU dot NP: 30.0
Numpy GPU dot GPU: 30.0
GPUArray GPU dot GPU: 30.0
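The nested output above can be reproduced without a GPU. NumPy cannot convert a GPUArray to an ndarray, so it wraps it in a 0-d object array, and numpy.dot with a 0-d operand degenerates to elementwise multiplication: each element of the other operand is multiplied by the whole array. The class below is a hypothetical stand-in for a GPUArray, not PyCUDA code:

```python
import numpy as np

class FakeGPUArray:
    # Stand-in for a GPUArray: numpy cannot convert it to an ndarray,
    # so np.asarray() wraps it in a 0-d object array.
    def __init__(self, data):
        self.data = np.asarray(data)

    def __rmul__(self, scalar):
        # Called once per element of the other operand by the object
        # multiply loop; returns the whole array scaled by that element.
        return scalar * self.data

a = np.array([1., 2., 3., 4.])
out = np.dot(a, FakeGPUArray([1., 2., 3., 4.]))
# out is a length-4 object array whose i-th entry is the full array
# scaled by a[i] -- which prints as the nested [[1 2 3 4] [2 4 6 8] ...]
# seen above.
```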
This can be obtained with the following (probably dirty) code:
----
def do_sum(data):
    res = 0
    if type(data).__name__ == 'GPUArray':
        res = gpuarray.sum(data)
    elif type(data).__name__ == 'ndarray':
        for i in range(len(data)):
            gpu_vec = data[i].get()
            res += gpu_vec[i]
    else:
        res = data
    return res
print "Numpy NP dot NP:", do_sum(numpy.dot(a, b))
print "Numpy NP dot GPU:", do_sum(numpy.dot(a, b_gpu))
print "Numpy GPU dot NP:", do_sum(numpy.dot(a_gpu, b))
print "Numpy GPU dot GPU:", do_sum(numpy.dot(a_gpu, b_gpu))
print "GPUArray GPU dot GPU:", gpuarray.dot(a_gpu, b_gpu).get()
----
Am I wrong?
I'm not sure which of PyCUDA's gpuarray.py and NumPy should be
responsible for calculating the sum...
What do you think?
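A less intrusive alternative (a sketch, not an official PyCUDA API): instead of repairing the mixed result after the fact, move any GPU-resident operand to the host first, so that numpy.dot only ever sees plain ndarrays. Every GPUArray has a .get() method, which the helper below relies on; the name dot_host is made up for illustration:

```python
import numpy as np

def dot_host(a, b):
    # Bring each operand to the host if it looks GPU-resident (GPUArray
    # exposes .get()); plain ndarrays and sequences pass through
    # np.asarray unchanged.
    to_host = lambda x: x.get() if hasattr(x, "get") else np.asarray(x)
    return float(np.dot(to_host(a), to_host(b)))
```

With this, all four mixed combinations give 30.0. For large vectors, gpuarray.dot(a_gpu, b_gpu) remains preferable when both operands are already on the device, since it avoids the device-to-host copies.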
Thanks in advance
--
Peter Potrowl