Thank you very much for your answers, Simone and Keith!
A separate repository for examples would be very welcome. In fact, if
I could somehow get access to a large set of examples now, I would be
delighted, especially examples based on AMD/ATI hardware. I find that
I learn more when working with examples right now.
A few quick basic questions:
1. If I want to operate on a vector and assign the result to another
vector, this is done beautifully in OpenCL, for example: fx[gid] =
native_exp(-x[gid]). If I then want to sum all the elements of the
resulting vector (fx) into a scalar, is this typically done on the GPU
in OpenCL too (changing it to fx += native_exp(-x[gid])), or is this
something that is quicker/better to do on the CPU (e.g.
numpy.sum(Array))? Is it even possible to sum the results of all the
independent work items into the same variable/element? (A rough sketch
of what I mean follows below, after my second question.)
2. What I'm currently struggling with is how to choose the workgroup
size/local size, thread strides, etc., and how this is governed by a)
the amount of computation and b) my hardware. Can someone give me
pointers on how to find the optimal workgroup size, number of threads
and so on for a Radeon HD 5000 series card (HD 5850, 1440 stream
processors) and an AMD Phenom II X4 (965 BE, 3.4 GHz quad-core), for,
say, 10^9 calculations (10^9-dimensional vectors that I operate on)?
(Or are those parameters governed by the hardware at all? The second
sketch below shows how far I have got in querying the limits.)
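
Regarding question 1, here is a rough sketch of what I am currently
experimenting with, just to make the question concrete (the kernel and
variable names are only placeholders, and I am not sure which of the
two summing options below is the sensible one):

import numpy as np
import pyopencl as cl
import pyopencl.array as cl_array

ctx = cl.create_some_context()
queue = cl.CommandQueue(ctx)

n = 10**6
x = cl_array.to_device(queue, np.random.rand(n).astype(np.float32))
fx = cl_array.empty_like(x)

# Element-wise part: this already works nicely for me.
prg = cl.Program(ctx, """
__kernel void eval_f(__global const float *x, __global float *fx)
{
    int gid = get_global_id(0);
    fx[gid] = native_exp(-x[gid]);
}
""").build()
prg.eval_f(queue, (n,), None, x.data, fx.data)

# Option A: copy the result back and sum on the CPU with numpy.
total_cpu = np.sum(fx.get())

# Option B: sum on the device -- is this the recommended way to reduce?
total_gpu = cl_array.sum(fx).get()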
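
And regarding question 2, this is how far I have got: I can query the
hardware limits from PyOpenCL, but I do not know how to go from these
numbers to a good choice of workgroup size (again only a sketch, using
the device/kernel info attributes that pyopencl exposes):

import pyopencl as cl

ctx = cl.create_some_context()
dev = ctx.devices[0]

print("Device:              ", dev.name)
print("Compute units:       ", dev.max_compute_units)
print("Max work-group size: ", dev.max_work_group_size)
print("Max work-item sizes: ", dev.max_work_item_sizes)
print("Local memory (bytes):", dev.local_mem_size)

# Per-kernel limit, available after the kernel has been built:
prg = cl.Program(ctx, """
__kernel void dummy(__global float *a) { a[get_global_id(0)] *= 2.0f; }
""").build()
print("Kernel max work-group size:",
      prg.dummy.get_work_group_info(
          cl.kernel_work_group_info.WORK_GROUP_SIZE, dev))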
Yours sincerely,
Patric
On Sun, Feb 20, 2011 at 6:37 AM, Simone Mannori
<simone.mannori(a)gmail.com> wrote:
Bonjour Patric,
On 19 February 2011 11:49, Patric Holmvall <patric.hol(a)gmail.com> wrote:
Dear mailing list recipients,
I'm currently working on my bachelor's project, where I am to do heavy
calculations that are very well suited to parallel computation (quantum field
theory using path integrals and Markov chain Monte Carlo). This will be done
with OpenCL in Python, thus using PyOpenCL. This is quite a problematic
approach, though, since I'm new to both OpenCL and Python.
PyOpenCL is the perfect environment to learn OpenCL programming, but
I'm not aware of a step-by-step tutorial based on pyopencl.
You should attack the problem orthogonally ("divide et impera"):
- learn Python first. Python is a pleasure to learn and use. The web
is full of very complete and easy-to-follow tutorials;
- C (for OpenCL) is a bit more difficult to master. C means pointers,
and pointers mean trouble. Unfortunately, you cannot bypass the C
pointer mechanics, because they are used intensively to exchange data
between the two environments. To make the situation worse, the OpenCL
machinery is not easy to understand, because you first need to
understand the SIMT (single instruction, multiple threads) computing
model as applied to GPU architectures. I spent two weeks just figuring
out why my perfectly running code was so slow. NVIDIA and ATI/AMD
implementations are not perfectly equivalent, so you should also learn
how to avoid some pitfalls.
The best tutorial that I have found on the web is this one:
http://developer.amd.com/zones/OpenCLZone/universities/Pages/default.aspx
This is a very generic tutorial that covers AMD/ATI/CELL architectures.
But it is not enough: you will be obliged to read the OpenCL manuals
put online by NVIDIA and AMD/ATI. This reading is also a must because
not all OpenCL devices are made equal: you need to understand
some gory details about the hardware ("no guts, no glory").
The biggest bifurcation point is: "Do you need double precision
support (64 bits), _or_ is single precision (32 bits) enough for your
application?"
I'm doing most of my code prototyping using single precision (float),
but in the end I will be obliged to switch to double precision in
order to show consistency between the new simulations running on the
GPU and the old ones running on the CPU.
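
To make that point concrete, the kernel-side part of the switch is
roughly the following (just a sketch; cl_khr_fp64 is the extension name
from the OpenCL specification, and some ATI/AMD drivers expose
cl_amd_fp64 instead):

# Kernel source for a double-precision variant; on the host side the
# numpy arrays become float64 instead of float32.
kernel_src_double = """
#pragma OPENCL EXTENSION cl_khr_fp64 : enable

__kernel void scale(__global double *a, const double k)
{
    int gid = get_global_id(0);
    a[gid] *= k;   /* native_* math functions are float-only */
}
"""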
Why? Because
PyOpenCL is a shortcut that blocks me from directly applying 99% of all the
material/tutorials online, since they follow the implementation of OpenCL in
a C environment. For everything I learn about OpenCL, I have to "translate"
it to PyOpenCL. It is possible with this approach; however, it feels like
very slow progress that takes a lot of effort. Since I can't seem to find any
tutorials on learning OpenCL through PyOpenCL on the web, does anyone
on the mailing list know of such tutorials? Learning Python isn't the issue
here, and the approach isn't up for change. The question is HOW I'm going to
learn/do this in an efficient way.
Finally (for the moment), GPU programming (also using pyopencl) is
like driving an F1 race car:
you need to invest a lot of time in test driving and tuning. Most of
the things that you know about "cars" do not work on an F1.
A question for the pyopencl development team:
"Is it possible to get the right to commit some (#-commented)
example code into the project?"
Another possibility is to create a completely separate code repository
for the "examples": a lot of people like me would be able to contribute
examples / application code without taking the risk of committing
garbage into the "core" C/C++/Python engine.
Thanks to all of you for keeping this fantastic Open Source project running :-)
Simone Mannori