[PyOpenCL] Learning OpenCL through PyOpenCL
patric.hol at gmail.com
Mon Feb 21 12:21:11 PST 2011
Thank you Andreas and Keith!
I think I'm just having a hard time getting my head around how I should
set up my workgroup size/local size, thread strides and so on, but
I'll get there eventually, I guess.
The basic examples cover how to do operations on vectors. I'm trying
to increase the dimensions and want to do operations on a matrix/array
of, let's say, Nx6 elements.
1. Is there anything wrong with the following approach? (syntax? am I
"thinking wrong" when passing arrays like this to the kernel?)
2. How can I split the other dimension between threads and workgroups
in an optimal way (like x[gid_1][gid_2], for example)? Example code
would be very appreciated.
3. I suspect that I have a syntax error in the OpenCL code when
accessing elements, like x[gid] for example. What is the correct
way to access them?
(Notice that this might not be the best example, as the calculations
could be reduced to vectors, but I will need to learn how to do
similar stuff eventually.)
#### Python ####
import numpy
import pyopencl as cl

mf = cl.mem_flags

N = 10**6
# note the tuple: numpy.random.random takes the whole shape as one argument
x = numpy.random.random((N, 6)).astype(numpy.float32)  # Nx6 array of random numbers
x_buf = cl.Buffer(ctx, mf.READ_ONLY | mf.COPY_HOST_PTR, hostbuf=x)
fx_buf = cl.Buffer(ctx, mf.WRITE_ONLY, x.nbytes)
prg.multidim_mc(queue, x.shape, None, x_buf, fx_buf, numpy.uint32(N))
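Since the kernel sees x_buf as a flat, one-dimensional buffer, it may help to check how numpy lays out the Nx6 array: in the default C (row-major) order, element (i, j) sits at flat offset i*6 + j, which is exactly the index the kernel has to compute from its global ids. A quick numpy check (a sketch, independent of OpenCL; small N just for illustration):

```python
import numpy

N = 4  # small N just for illustration
x = numpy.random.random((N, 6)).astype(numpy.float32)

# A C-contiguous (row-major) array: row i starts at flat offset i * 6,
# so element (i, j) is at flat index i * 6 + j -- the same index the
# kernel must compute from get_global_id().
flat = x.ravel()
for i in range(N):
    for j in range(6):
        assert flat[i * 6 + j] == x[i, j]
```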
#### OpenCL Kernel ####
prg = cl.Program(ctx, """
__kernel void multidim_mc(__global const float *x, __global float *fx,
                          uint const N)
{
    int gid = get_global_id(0);
    if (gid < N)   /* guard *inside* range, not gid >= N */
        /* note: ^ is XOR in C, not a power -- square by multiplying */
        fx[gid] = (x[gid] - x[gid]) * (x[gid] - x[gid])
                + (x[gid] - x[gid]) * (x[gid] - x[gid]) + ...
}
""").build()
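For question 2, one common pattern is to launch a 2D NDRange of (N, 6) so each work-item gets a row id from get_global_id(0) and a column id from get_global_id(1), then combines them into a flat row-major index, roughly `fx[row * 6 + col] = ... x[row * 6 + col] ...` on the kernel side. A sketch of that index arithmetic, simulated in plain Python (the per-element operation is a placeholder, not the actual computation):

```python
# Simulate a 2D NDRange over an (N, COLS) array: the outer loop plays the
# role of get_global_id(0) (the row), the inner loop get_global_id(1)
# (the column). Each "work-item" touches exactly one flat index.
N, COLS = 4, 6
x = [float(i) for i in range(N * COLS)]   # stand-in for the flat buffer
fx = [0.0] * (N * COLS)

for row in range(N):          # global dimension 0
    for col in range(COLS):   # global dimension 1
        gid = row * COLS + col          # row-major flat index
        fx[gid] = x[gid] * x[gid]       # placeholder per-element operation
```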
Thanks a lot for a great community!
On Sun, Feb 20, 2011 at 5:40 PM, Keith Brafford
<keith.brafford at gmail.com> wrote:
> Patric Holmvall <patric.hol at ...> writes:
>> Can someone give me
>> pointers on how I find optimum workgroup size, threads and so on for
>> Radeon HD5000 series (HD5850, Stream Processors: 1440) and AMD Phenom
>> II X4 (965BE, 3.4 GHz quadcore) for example 10^9 calculations (10^9
>> dimensional vectors that I operate on)?
>> (Or are those parameters governed by the hardware at all?)
> You should join the AMD OpenCL forum for those sorts of questions.
> My quick answer:
> 1) Your 1440 stream processors are really 288 VLIW processors with 5 slots, 4
> for single precision floating point and 1 fancy slot for other stuff.
> So for floats, you really have 1152 processors, if you can make heavy use of
> the float4 data type. Otherwise you only have 288 processors.
> 2) As far as workgroup size goes, your 288 VLIW processors are divided into 18
> "compute units" (the red rectangles that occupy most of the room in the block
> diagram, which shows 20 since it's the 5870).
> So if you make your workgroup size a multiple of 16 it will fit nicely into the
> architecture. But remember, if you are able to use the float4 data type, you
> will be able to do 64 pieces of work in 16 work items.
> --Keith Brafford
> PyOpenCL mailing list
> PyOpenCL at tiker.net
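Keith's multiple-of-16 advice is usually combined with rounding the global size up to a multiple of the chosen workgroup size; the in-kernel `if (gid < N)` guard then masks the padding work-items. A sketch of that rounding in plain Python (the helper name is made up for illustration):

```python
def round_up(global_size, group_size):
    """Smallest multiple of group_size that is >= global_size."""
    return ((global_size + group_size - 1) // group_size) * group_size

# e.g. 1000 work-items with a workgroup size of 64 (a multiple of 16)
# get padded up to 1024; the kernel's range guard skips the extra 24.
padded = round_up(1000, 64)
assert padded == 1024 and padded % 64 == 0
```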