Hi,
I have a kernel as follows:
if (condition is true):# is always true
x=array[0,0];
if (cond2 is true):
x[i]++;
global_x[i]=x[i];
This gives me negative values in global_x. But when I declare x outside the
condition and copy x to global_x unconditionally, outside the if block, I
get correct results. This would make sense if my condition were not always true.
But since condition 1 is always true, can someone please explain why this
happens?
Thanks,
Aseem

My PyOpenCL code works perfectly when run on the CPU by looping over thread_id, but
when run on the GPU on an AWS p2.xlarge instance (Ubuntu), an array that is explicitly
initialized with 0s comes back as -1 instead of 0s. This array is of type
ctypes.c_int.
Is there some other datatype for C int that I should be using?
Thanks,
Aseem

Jérôme Parent <jerome.parent(a)lynceetec.com> writes:
> Hello,
>
> I am new to OpenCL; I am used to coding with numpy, and I am dealing with stacks
> of images. In the end I have a 3D float array of size (m,n,n).
> I wrote my first custom kernel and it works like a charm and it's so
> fast, I love it!
>
> Now I am converting other algorithms, and many of them have operations over a
> given axis. I also realized that I need to use scan.
>
> Now I am stuck with cumsum along a given axis: data is a float array
> (m,n,n). I would like to compute, as in numpy, numpy.cumsum(data, axis=0),
> i.e. perform a cumsum along the m direction for every pixel of my 3D stack.
> In the PyOpenCL documentation
> (https://documen.tician.de/pyopencl/algorithm.html#predefined-scans), there
> is an example:
>
> Code :
>
> knl = InclusiveScanKernel(context, np.int32, "a+b")
>
> n = 2**20-2**18+5
> host_data = np.random.randint(0, 10, n).astype(np.int32)
> dev_data = cl_array.to_device(queue, host_data)
>
> knl(dev_data)
> assert (dev_data.get() == np.cumsum(host_data, axis=0)).all()
>
>
> This code works for 1D input data, but I do not know how to perform a scan
> along a given axis of an n-dimensional array.
> I hope someone can help me and teach me the right "OpenCL" approach for such
> an operation.
First of all, if there is enough parallelism available in the rest
(i.e. the remaining axes) of your array, you'd be crazy to use parallel
scan! "Embarrassing" parallelization of sequential for loops for scan
will beat a parallel scan any day of the week.
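
To illustrate the "sequential loop per pixel" idea for the (m,n,n) case above, here is a minimal sketch, not a tuned implementation. It assumes a C-contiguous float32 array, so that neighbouring work-items read neighbouring addresses. The names KERNEL_SRC, cumsum_axis0_host and cumsum_axis0_cl are made up for this example; the host function only mirrors the kernel's loop in NumPy so the result can be checked without a device.

```python
import numpy as np

# OpenCL C kernel: one work-item per pixel, plain sequential loop along axis 0.
# Assumes a C-contiguous float32 array of shape (m, n, n), viewed as (m, npix).
KERNEL_SRC = """
__kernel void cumsum_axis0(__global const float *src,
                           __global float *dst,
                           const int m, const int npix)
{
    int p = get_global_id(0);          /* pixel index within one image */
    if (p >= npix) return;
    float acc = 0.0f;
    for (int k = 0; k < m; ++k) {      /* sequential walk through the stack */
        acc += src[k * npix + p];
        dst[k * npix + p] = acc;
    }
}
"""


def cumsum_axis0_host(data):
    """NumPy mirror of the kernel's loop, useful for checking results."""
    flat = np.ascontiguousarray(data, dtype=np.float32).reshape(data.shape[0], -1)
    out = np.empty_like(flat)
    acc = np.zeros(flat.shape[1], dtype=np.float32)
    for k in range(flat.shape[0]):
        acc += flat[k]
        out[k] = acc
    return out.reshape(data.shape)


def cumsum_axis0_cl(data):
    """Run the kernel with PyOpenCL (needs an OpenCL device)."""
    import pyopencl as cl
    import pyopencl.array as cl_array

    ctx = cl.create_some_context()
    queue = cl.CommandQueue(ctx)
    prg = cl.Program(ctx, KERNEL_SRC).build()

    host = np.ascontiguousarray(data, dtype=np.float32)
    m = host.shape[0]
    npix = host.size // m
    src = cl_array.to_device(queue, host)
    dst = cl_array.empty_like(src)
    # One work-item per pixel; the runtime picks the work-group size.
    prg.cumsum_axis0(queue, (npix,), None, src.data, dst.data,
                     np.int32(m), np.int32(npix))
    return dst.get()
```

With n*n pixels there are n*n independent work-items, which is usually plenty to keep a GPU busy. For a scan along the last axis instead, the same loop would read with a stride, so the data layout would probably need rethinking first.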
More generally (i.e. if things aren't quite that simple, or if you
can't get the data layout to play ball), the best solution I can offer
is loopy [1], a code generator for computational code with multi-dimensional
arrays. It is capable of emitting and transforming scans (grep the
tests). (This will be better documented once the related paper is
submitted, hopefully this summer.)
[1] https://github.com/inducer/loopy
HTH,
Andreas

Hello,
I am new to OpenCL; I am used to coding with numpy, and I am dealing with stacks
of images. In the end I have a 3D float array of size (m,n,n).
I wrote my first custom kernel and it works like a charm and it's so
fast, I love it!
Now I am converting other algorithms, and many of them have operations over a
given axis. I also realized that I need to use scan.
Now I am stuck with cumsum along a given axis: data is a float array
(m,n,n). I would like to compute, as in numpy, numpy.cumsum(data, axis=0),
i.e. perform a cumsum along the m direction for every pixel of my 3D stack.
In the PyOpenCL documentation
(https://documen.tician.de/pyopencl/algorithm.html#predefined-scans), there
is an example:
Code :
knl = InclusiveScanKernel(context, np.int32, "a+b")
n = 2**20-2**18+5
host_data = np.random.randint(0, 10, n).astype(np.int32)
dev_data = cl_array.to_device(queue, host_data)
knl(dev_data)
assert (dev_data.get() == np.cumsum(host_data, axis=0)).all()
This code works for 1D input data, but I do not know how to perform a scan
along a given axis of an n-dimensional array.
I hope someone can help me and teach me the right "OpenCL" approach for such
an operation.
Thanks,
Jérôme