Dear all,
we are trying to implement a k-nearest-neighbor search on GPUs with
PyOpenCL. The goal of the algorithm is: for a given target point,
find the k nearest points from a given set (the training data). The distance
between two points is the squared Euclidean distance.
One of our implementations is a brute-force approach, which aims
at processing big data sets in parallel, e.g. 1 million training points and
several million targets (test data). For every target point, one kernel
instance is created which finds the k nearest points among the
training points.
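A minimal sketch of what one such kernel instance does (simplified to k = 1
and with illustrative names, not our actual code):

# illustrative brute-force kernel: one work-item per test point, k = 1;
# our real kernel keeps a list of the k best distances instead
KNN_KERNEL_SRC = """
__kernel void find_nn(__global const float *train,   /* n_train x dim */
                      __global const float *test,    /* n_test  x dim */
                      __global float *min_dist,      /* n_test */
                      __global int   *min_idx,       /* n_test */
                      const int n_train, const int dim)
{
    int gid = get_global_id(0);          /* one work-item per test point */
    float best = INFINITY;
    int best_i = -1;
    for (int i = 0; i < n_train; ++i) {
        float d = 0.0f;
        for (int j = 0; j < dim; ++j) {
            float diff = test[gid*dim + j] - train[i*dim + j];
            d += diff * diff;            /* squared Euclidean distance */
        }
        if (d < best) { best = d; best_i = i; }
    }
    min_dist[gid] = best;
    min_idx[gid]  = best_i;
}
"""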
Our problem is the following: everything works fine for small data sets,
and the results are as expected on both the GPU (GeForce GTX 650 with
nVidia driver 313.09) and the CPU (Intel Core i5-3450 with the AMD APP SDK),
running Ubuntu 12.10 and PyOpenCL 2013.1-py2.7-linux-x86_64.
But if we increase the size of the data sets, the GPU version crashes
with the following error:
> File "brutegpu.py", line 65, in query
> cl.enqueue_copy(self.queue, d_min, self.d_min_buf).wait()
> File "/usr/local/lib/python2.7/dist-packages/
> pyopencl-2013.1-py2.7-linux-x86_64.egg/pyopencl/__init__.py",
> line 935, in enqueue_copy
> return _cl._enqueue_read_buffer(queue, src, dest, **kwargs)
> pyopencl.LogicError: clEnqueueReadBuffer failed: invalid command queue
The CPU version still works fine with 1 million training points
and 1 million test points. Attached you can find the corresponding
source code as a minimal working example, which consists of one
host Python file and one OpenCL kernel file.
We would highly appreciate any help - maybe we made a
mistake which is already known to you.
So the big question for us is: why does it work on the CPU, and why doesn't it
work on the GPU?
Are there nVidia-specific pitfalls for such big data sets?
The compiler says:
> ptxas info : Compiling entry function 'find_knn' for 'sm_30'
> ptxas info : Function properties for find_knn
> 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
> ptxas info : Used 17 registers, 336 bytes cmem[0], 4 bytes cmem[3]
Or are there any rules for using a kernel on big data sets, such as choosing
the work-group sizes or respecting maximum memory usage?
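For reference, the relevant device limits can be queried from PyOpenCL; a
minimal sketch (variable names are just illustrative):

import pyopencl as cl

ctx = cl.create_some_context()
dev = ctx.devices[0]
# device limits that matter for large problems
print(dev.max_work_group_size)   # upper bound on the local (work-group) size
print(dev.max_mem_alloc_size)    # largest single buffer allocation allowed
print(dev.global_mem_size)       # total global memory on the device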
The error message "invalid command queue" is confusing, and I wasn't able
to find any helpful information (except that "invalid command queue" often
means a segfault inside the kernel, but I could not find any wrong array
address yet).
Maybe one of you could have a look at our code and find some stupid
mistake.
We would be very grateful for every hint.
Best regards,
Justin Heinermann,
University Oldenburg
I have a *very* specific bug in pyopencl: when I use round(88.9f) with
pyopencl from git (2014-04-08) on an Nvidia Ti780, it gives me 88.0
instead of 89.0.
- If I change the gfx card to my older GTX590, it will work.
- If I round doubles instead of floats, it works on both cards.
- If I write a test in C++, it works on both cards.
Can anyone with a Ti780 confirm this bug?
Side info: I have compiled pyopencl with ENABLE_GL=True and
CL_PRETEND_VERSION="1.1" due to missing clCreateSubDevices in nvidia OpenCL.
OS is Ubuntu 14.04.
My test is:
import pyopencl as cl
import numpy as np

ctx = cl.create_some_context()
que = cl.CommandQueue(ctx)
cl_prg = cl.Program(ctx,
    "__kernel void doit(__global float *a) { a[0] = round(88.9f); }").build()
a = np.zeros(1, dtype=np.float32)
A = cl.Buffer(ctx, cl.mem_flags.READ_WRITE | cl.mem_flags.COPY_HOST_PTR,
              hostbuf=a)
cl_prg.doit(que, (1,), None, A)
que.finish()
cl.enqueue_copy(que, a, A)
print a[0]
"Keith R. Brafford" <keith.brafford(a)gmail.com> writes:
> Wait...it was my fault! I forgot I had commented out the context
> acquisition line and replaced it with my own before I realized I had a
> wrongly compiled pyopencl:
>
> #context = cl.Context(properties=[(cl.context_properties.PLATFORM,
> platform)] + get_gl_sharing_context_properties())
> context = cl.Context(properties=[(cl.context_properties.PLATFORM,
> platform)] )
>
> I returned it to its correct state and it works fine now.
>
> Any idea why the pip thing doesn't do it for me though?
Hmm, I did have a typo in the file name. It should be ".aksetup-defaults.py".
Andreas
Wait...it was my fault! I forgot I had commented out the context
acquisition line and replaced it with my own before I realized I had a
wrongly compiled pyopencl:
#context = cl.Context(properties=[(cl.context_properties.PLATFORM,
platform)] + get_gl_sharing_context_properties())
context = cl.Context(properties=[(cl.context_properties.PLATFORM,
platform)] )
I returned it to its correct state and it works fine now.
Any idea why the pip thing doesn't do it for me though?
--Keith Brafford
On Tue, Apr 29, 2014 at 10:24 PM, Keith R. Brafford <
keith.brafford(a)gmail.com> wrote:
> I tried that, and no luck:
>
> >>> pyopencl.have_gl()
>
> False
>
> Then I did:
>
> pip uninstall pyopencl
>
> followed by:
>
> git clone http://git.tiker.net/trees/pyopencl.git
>
> cd pyopencl
>
> git submodule init
>
> git submodule update
>
> python configure.py --cl-enable-gl
>
> python setup.py build
>
> make
>
> python setup.py install
>
> And now I get this:
>
> >>> pyopencl.have_gl()
>
> True
>
>
> So...Yay! I can run gl_interop_demo.py and it works well, but when I try
> to run the particle demo I get a "Segfault 11" error.
>
> Segmentation fault: 11
>
> with a "Python quit unexpectedly" message box popping up.
>
> Any ideas?
>
> --Keith Brafford
>
>
> On Tue, Apr 29, 2014 at 10:00 PM, Andreas Kloeckner <
> lists(a)informa.tiker.net> wrote:
>
>> "Keith R. Brafford" <keith.brafford(a)gmail.com> writes:
>>
>> > I am trying to use pip to install pyopencl on OSX, but I can't get
>> > pyopencl.have_gl() to return True.
>> >
>> > How can I tell pip to compile the module such that I can get gl
>> > interoperability?
>>
>> Create a $HOME/.aksetup-defualts.py with one line:
>>
>> CL_HAVE_GL = True
>>
>> Then run the pip install. That should do it. If not, come back and yell
>> at me.
>>
>> Andreas
>>
>
>
"Keith R. Brafford" <keith.brafford(a)gmail.com> writes:
> I am trying to use pip to install pyopencl on OSX, but I can't get
> pyopencl.have_gl() to return True.
>
> How can I tell pip to compile the module such that I can get gl
> interoperability?
Create a $HOME/.aksetup-defualts.py with one line:
CL_HAVE_GL = True
Then run the pip install. That should do it. If not, come back and yell
at me.
Andreas
I am trying to use pip to install pyopencl on OSX, but I can't get
pyopencl.have_gl() to return True.
How can I tell pip to compile the module such that I can get gl
interoperability?
Thanks,
--Keith Brafford
Hi all,
IPython integration for PyOpenCL has just landed in git. Documentation
and an example here:
http://documen.tician.de/pyopencl/misc.html#ipython-integration
The idea is that you can define CL kernels in a special type of cell in
the IPython notebook, and when you evaluate that, you automatically get
all the kernels in your cell as objects to play with in your IPython
notebook.
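Roughly, usage looks like this (see the linked documentation for the exact
extension and magic names; this is only a sketch, and the kernel here is made
up for illustration):

# cell 1: load the extension and create a context named "ctx"
%load_ext pyopencl.ipython_ext
import pyopencl as cl
ctx = cl.create_some_context()

# cell 2: the cell magic compiles the source and drops the kernel objects
#         (here: one named "double_it") into the notebook namespace
%%cl_kernel
__kernel void double_it(__global float *a)
{
    int gid = get_global_id(0);
    a[gid] = 2.0f * a[gid];
}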
Got comments? suggestions? Don't hold back!
Andreas
Hello, I'm new to PyOpenCL.
After searching for a while, I couldn't find any bignum implementation
for OpenCL in C.
But fortunately, I'm more experienced with Python, where the only limit
on the size of integers is the available RAM.
How may I use PyOpenCL to get these CPU-intensive computations done on the
GPU with OpenCL?
Regards.
On 27/04/2014 19:02, William Shipman wrote:
> I'm not aware of any arbitrary precision arithmetic libraries either, and I
> do wonder if they would be worthwhile on a GPU. But there are alternatives.
>
> You can use double-double (128 bit) and quad-double (256 bit) precision.
> Every number gets represented using two or four double precision numbers,
> roughly doubling (or quadrupling) the number of accurate digits. Every
> operation then takes these two (four) components for every number. You
> should have a look at http://crd-legacy.lbl.gov/~dhbailey/mpdist/ for C++
> and Fortran implementations of double-double and quad-double precision
> arithmetic. Of course, you will have to turn this into OpenCL code.
> Please check the license documentation that is posted at this link to make
> sure that the license is compatible with your intended uses.
>
>
>
> On 27 April 2014 19:00, Cellier <lcellier(a)lycee-joliverie.fr> wrote:
>
>> I don't need the basic operations (such as logab + - ) to be done in
>> parallel. I just want to perform operations on them. It doesn't matter if
>> it is done serially.
>> I need to parallelize things at a higher level: see
>> https://fr.wikipedia.org/wiki/Wikipédia:Oracle/semaine_17_2014#Math.C3.A9matiques_:_Trouver_des_nombres_premiers_qui_loge_sur_512_Ko_en_binaire
>> (sorry for the language).
>>
>>
>
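For reference, the core of the double-double trick quoted above is an
error-free "two-sum"; a tiny, purely illustrative Python sketch (the function
names are made up, and Python floats are IEEE doubles, so the same idea
carries over to OpenCL):

def two_sum(a, b):
    # error-free transformation (Knuth): s + e equals a + b exactly
    s = a + b
    bb = s - a
    e = (a - (s - bb)) + (b - bb)
    return s, e

def dd_add(x, y):
    # add two double-double values, each carried as a (hi, lo) pair
    s, e = two_sum(x[0], y[0])
    e += x[1] + y[1]
    return two_sum(s, e)

hi, lo = dd_add((1.0, 1e-20), (1.0, 1e-20))
# hi == 2.0, lo == 2e-20: the low part keeps what a plain double addition loses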
256 bits is far from the 512288 bits I need.
I expect to get performance by running thousands of Miller-Rabin tests in
parallel, so it doesn't matter if each individual test is done serially.
I'm also asking how I can use Python integers (which are only
limited by the size of the RAM) within the kernel, with PyOpenCL.
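A common approach is to split each big integer into fixed-size limbs in a
numpy array on the host and let the kernel work on those; a minimal sketch
(the names are made up for illustration, and the kernel itself is not shown):

import numpy as np
import pyopencl as cl

def int_to_limbs(n, num_limbs):
    # split an arbitrary-precision Python int into 32-bit little-endian limbs
    limbs = np.zeros(num_limbs, dtype=np.uint32)
    for i in range(num_limbs):
        limbs[i] = n & 0xFFFFFFFF
        n >>= 32
    return limbs

n = 2**521 - 1                        # a Mersenne prime, just as an example
limbs = int_to_limbs(n, 32)           # 32 limbs * 32 bits = 1024 bits

ctx = cl.create_some_context()
queue = cl.CommandQueue(ctx)
buf = cl.Buffer(ctx, cl.mem_flags.READ_ONLY | cl.mem_flags.COPY_HOST_PTR,
                hostbuf=limbs)
# a kernel would then implement multi-precision arithmetic on these limb
# arrays, e.g. one Miller-Rabin candidate per work-item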