I read your documentation. The project is more than just a collection
of implementations. How useful is it to abstract between PyCUDA
and PyOpenCL? Personally, I probably won't use that part, but I do want
to abstract between CUDA and OpenCL.
I like the idea of making a code generator that does transformations on
the input before doing other computation. This is something I wanted
the Theano code generator to do, but I never found the time to implement it.
What do the current parameters derive_s_from_lp and derive_lp_from_s mean?
Also, the code section is not something I would call readable... Is that only
because I have never used Mako? Andreas, I think you have used Mako; do you
find it readable? I'm not sure that forcing people to use Mako is a good idea.
Can we do something about that?
I still think that we need to provide the user with more than just a common
GPU ndarray object. We also need to provide functions that operate on it.
But I'm not sure how we should do this.
Andreas, do you have an idea?
On Wed, Jul 18, 2012 at 10:29 AM, Bogdan Opanchuk <mantihor(a)gmail.com> wrote:
> Hi all,
> Some of you may remember compyte discussions last year when I made the
> suggestion of creating a library with a compilation of GPGPU
> algorithms, working both with PyOpenCL and PyCuda. Long story short, I
> have finally found some time and created a prototype. The preliminary
> tutorial can be found at http://tigger.publicfields.net/tutorial.html
> and the project itself at https://github.com/Manticore/tigger . The
> examples are working and those few tests I have are running. The code
> in tigger.core is a mess, but I'm working on it.
> At this stage this library is a prototype (or even a proof of concept)
> whose fate is not sealed. My current plans are to refactor tigger.core
> and tigger.cluda (sorry for stealing the name, Andreas, I can change
> it :) over the course of a week or two and start filling it with
> actual algorithms. One of the first will be FFT, thus deprecating
> pyfft, list of other plans is in TODO.rst. On the other hand, the
> library could be made a part of compyte, although I'm not sure it'll
> fit its goals.
> Anyway, any sort of input is appreciated. Those who want to use the
> library for practical applications may want to wait for the next
> version, which is supposed to be somewhat stable.
First, please make sure to keep the list cc'd on replies, for archival.
Alexander Kiselyov <yl3gdy(a)archlinux.us> writes:
> So how to address them correctly? I thought that it would be sufficient to
> copy to a buffer a list of 3 item lists, tell that global size is the
> length of that list, and address elements of the array in OpenCL code as
> .x, .y, .z, using usual get_global_id(0) values as indexes. Am I wrong?
> The specification doesn't mention any peculiarities with 3-vectors.
Use pyopencl.array.vec.double3 as the dtype.
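The reason a plain buffer of 3-item lists will not line up: per the OpenCL 1.1 spec (section 6.1.5), 3-component vector types have the size and alignment of the corresponding 4-component type, so a double3 occupies 32 bytes, not 24. A minimal NumPy sketch of the layout that pyopencl.array.vec.double3 provides (the field names here mirror the .x/.y/.z components, plus an explicit padding slot):

```python
import numpy as np

# double3 is stored like a double4: three components plus one padding
# slot, i.e. 32 bytes per element (OpenCL 1.1, section 6.1.5).
double3 = np.dtype({
    "names": ["x", "y", "z", "padding"],
    "formats": [np.float64, np.float64, np.float64, np.float64],
})

assert double3.itemsize == 32  # not 24

# A host-side array with this dtype has the same memory layout as
# __global double3* expects on the device.
a = np.zeros(16, dtype=double3)
a["x"] = np.arange(16)
```

If you upload 24-byte-per-element host data instead, the device reads with a 32-byte stride and the components end up shifted, which matches the symptoms you describe.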
> Also, I forgot to mention that, when running the code on an Intel CPU, it
> quite often segfaults after finishing the calculations (about 1 in 3 times).
Sure, that would explain that, too.
I've just observed some very strange OpenCL behaviour with double3, on both
the Intel and Nvidia SDKs. It can be shown with the following example: I sent
an array of double3 (in __global) and a double value (in __const) to the
device, and added the value to the x and y components of each array item.
Only half of the items were actually incremented. I'm puzzled as to whether
this is some obscure error of mine; the Python driver code has been used
before and worked correctly.
Here (http://pastebin.com/40e5cya3) is the kernel, and here
(http://pastebin.com/yjfZRxVL) is my Python code for running OpenCL. Note
that the printf's in the kernel aren't triggered at runtime.
I'm using Arch Linux 64-bit, the Intel OpenCL SDK version 2012, the Nvidia
SDK with drivers 302.17, and pyopencl 2012.1. Both SDKs support the OpenCL
1.1 specification. I wonder how to use double3 properly and what I'm doing
wrong.
please make sure to keep the list cc'd for archival.
Andrew Miller <ajm09c(a)acu.edu> writes:
> Thanks for all your work on this, these are excellent tools!
> I'm sorry that I keep bringing up errors, but it seems that the dot product
> has some unusual behavior when a combination of float and complex arrays
> are used.
> PyOpenCL gives an incorrect answer when calculating dot on a complex and
> float array, and an error when using a float and complex array (just
> switching the order of the arguments).
This should be fine with all possible type combinations now (in the git tree).
Thanks for the report!
On Monday 16 Jul 2012 07:58:31 Andreas Kloeckner wrote:
> Alex Leach <albl500(a)york.ac.uk> writes:
> > For the foreseeable future then, will OpenCL-empowered software only be
> > available for people willing (and able) to compile from source code?
> Let me be even more explicit here. The compiler for the compute device
> is *part* of the OpenCL implementation. If you have a device that
> supports OpenCL, you have a compiler for that device (*). And that
> compiler is completely independent of the compiler you use for 'host
> code'. It's right there in the libcuda.so (libnvidia-compiler.so
> really), libintelocl64.so, libamdocl64.so, or what have you. Many of
> these compilers don't even fork a new process to do their thing.
> (*) At least theoretically, it's possible to ship CL implementations
> without a compiler, but all of the ones I'm aware of do.
It does, thanks!
P.S. Sorry, I wrote a couple of replies to Bogdan and Tomasz which bypassed the list.
On Monday 16 Jul 2012 21:56:13 Bogdan Opanchuk wrote:
> Hi Alex,
> On Mon, Jul 16, 2012 at 9:42 PM, Alex Leach <albl500(a)york.ac.uk> wrote:
> > Thanks; that helps a lot in clearing things up, but does lead me to ask
> > another question..
> > Once PyOpenCL is built, does it still require the external C/C++ compiler,
> > or is the OpenCL compiler included in the PyOpenCL build?
> No, PyOpenCL builds a binary library which links to the OpenCL library
> (also binary, part of the driver). After that, Python calls are routed
> through this library to the driver, without any need for an external
> compiler.
Thanks for the explanation. Your replies (Tomasz' and Andreas' included) are
clearing things up nicely!
> Although, regarding embedded systems, take into account what Tomasz
> said. I do not know much about embedded OpenCL, I only worked with it
> on Linux and Mac. It seems that embedded drivers do not have this
> compiler function in the API and you have to precompile kernels before
> deployment (but they still will not require additional compilation
> after that).
Embedded systems are a beast I'll leave for another day, or even a year (as I
don't actually have any, other than an Android phone).
On Saturday 14 Jul 2012 14:34:52 Sean True wrote:
> On Sat, Jul 14, 2012 at 11:04 AM, Lucas Beyer
> <beyer(a)aices.rwth-aachen.de> wrote:
> > Yes, imagine the user replaces his old nVidia GPU by an AMD card he just
> > bought, now (in addition to all the other hassle) he will get very
> > strange error messages from your program and need to reinstall it.
> > This may sound far fetched but it happens.
> I think Luke has this right: OpenCL code needs to be recompiled for
> new GPU/CPU combinations
> regularly. You can't ship a precompiled or even compile-at-install
> version that will work well for
> general cases. The compiler has a pretty good caching mechanism. Using
> it is a good idea.
> -- Sean
Thanks for the replies. I did think, after Andreas' email, that even driver
updates would probably require recompiling OpenCL kernels. What I can't figure
out is how an OpenCL-empowered program could be distributed to the average
Windows, Mac or even tablet user; people who probably don't have a compiler or
development tools installed. Are these users completely outside the scope of
OpenCL? I thought one of the points of OpenCL was to overcome dissimilarities
in heterogeneous computing environments. Is that just at the source-code level?
The cache does look powerful, and from a developer's standpoint, I'm very
impressed with its thorough checks. But from a system administrator's point of
view, I thought it would be nice to just have a shared, importable library of
callable functions (i.e. kernels), much like a C, C++ or Cython extension
becomes once built. I haven't come across a desktop application that
requires a compiler to be installed, and I can't think of any applications that
distribute gcc along with the application.
So my original perception of OpenCL seemed to be very naive; I assumed it'd be
similar to OpenGL (or my perception of it at least), in that programs like
Firefox etc. can utilise the GPU without (re-)compiling the program.
For the foreseeable future then, will OpenCL-empowered software only be
available for people willing (and able) to compile from source code?
I've cc'd the list for archival. In the future, please ask there directly.
Alex Leach <albl500(a)york.ac.uk> writes:
> I have had my eye on pyopencl for a while now, and finally have something worth
> writing in OpenCL (UPGMA algorithm), and wondered if it was possible to pre-
> compile kernels for later use.
> I've had a look through the wiki, documen.tician contents and read the
> Adventures in OpenCl tutorials a while ago, but don't think I've seen any
> specific mention of how to do this. (I imagine it's embarrassingly simple)
> Any pointers or help with doing this would be much appreciated.
The simple version of this is to run Program(...).build() once
for all your kernels. They'll be in PyOpenCL's binary cache at that
point, and recompiling the same kernel should be near-instantaneous
(really, however long the CL implementation you're using takes to
reload a binary). If you want to do your own binary handling (why?),
you can get the binaries of your kernels once compiled, and then use the
second form of the Program constructor to reload them.
Andrew Miller <ajm09c(a)acu.edu> writes:
> I've found some unexpected errors while working with PyOpenCL, specifically
> when using one-dimensional, complex arrays.
> pycuda.array.dot(a,b) doesn't work with a complex vector.
> pycuda.array.sum(a) also gives an error when a is complex
> Are these features that will be added in the future, or is there just not
> full support for complex vectors?
This works in git now.
Thanks for the report,