Dear all,
we are trying to implement a k-nearest-neighbor search on GPUs with
PyOpenCL. The goal of the algorithm is: for a given target point,
find the k nearest points from a given set (the training data). The
distance between two points is the squared Euclidean distance.
One of our implementations is a brute-force approach that aims
to process big data sets in parallel, e.g. 1 million training points and
several million targets (test data). For every target point one kernel
instance is created, which finds the k nearest points among the
training points.
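To make the setup concrete, here is a heavily stripped-down sketch of the approach (simplified to the single nearest neighbor, with an illustrative kernel signature and small sizes; the attached files contain the real k-NN code):

import numpy as np
import pyopencl as cl

KNN_SRC = """
__kernel void find_knn(__global const float *train,
                       __global const float *test,
                       __global float *min_dist,
                       __global int   *min_idx,
                       const int n_train,
                       const int dim)
{
    int gid = get_global_id(0);          /* one work-item per target point */
    float best = FLT_MAX;
    int best_i = -1;
    for (int i = 0; i < n_train; i++) {
        float d = 0.0f;
        for (int j = 0; j < dim; j++) {
            float diff = train[i*dim + j] - test[gid*dim + j];
            d += diff * diff;            /* squared Euclidean distance */
        }
        if (d < best) { best = d; best_i = i; }
    }
    min_dist[gid] = best;
    min_idx[gid]  = best_i;
}
"""

ctx = cl.create_some_context()
queue = cl.CommandQueue(ctx)
mf = cl.mem_flags

n_train, n_test, dim = 100000, 10000, 3   # the real runs use ~10**6 points each
train = np.random.rand(n_train, dim).astype(np.float32)
test  = np.random.rand(n_test, dim).astype(np.float32)

d_train = cl.Buffer(ctx, mf.READ_ONLY | mf.COPY_HOST_PTR, hostbuf=train)
d_test  = cl.Buffer(ctx, mf.READ_ONLY | mf.COPY_HOST_PTR, hostbuf=test)
min_dist = np.empty(n_test, dtype=np.float32)
min_idx  = np.empty(n_test, dtype=np.int32)
d_dist = cl.Buffer(ctx, mf.WRITE_ONLY, min_dist.nbytes)
d_idx  = cl.Buffer(ctx, mf.WRITE_ONLY, min_idx.nbytes)

prg = cl.Program(ctx, KNN_SRC).build()
# one kernel instance per target point, as described above
prg.find_knn(queue, (n_test,), None, d_train, d_test, d_dist, d_idx,
             np.int32(n_train), np.int32(dim))
cl.enqueue_copy(queue, min_dist, d_dist)
cl.enqueue_copy(queue, min_idx, d_idx).wait()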
Our problem is the following: everything works fine for small data sets,
and the results are as expected on both the GPU (GeForce GTX 650 with
NVIDIA driver 313.09) and the CPU (Intel Core i5-3450 with the AMD APP SDK),
running Ubuntu 12.10 and PyOpenCL 2013.1-py2.7-linux-x86_64.
But if we increase the size of the data sets, the GPU version crashes
with the following error:
> File "brutegpu.py", line 65, in query
> cl.enqueue_copy(self.queue, d_min, self.d_min_buf).wait()
> File "/usr/local/lib/python2.7/dist-packages/
> pyopencl-2013.1-py2.7-linux-x86_64.egg/pyopencl/__init__.py",
> line 935, in enqueue_copy
> return _cl._enqueue_read_buffer(queue, src, dest, **kwargs)
> pyopencl.LogicError: clEnqueueReadBuffer failed: invalid command queue
The CPU version still works fine with 1 million training points
and 1 million test points. Attached you can find the corresponding
source code as a working minimal example, which consists of one
host Python file and one OpenCL kernel file.
We would highly appreciate any help - maybe we made a
mistake that is already known to you.
So the big question for us is: why does it work on the CPU, and why doesn't
it work on the GPU?
Are there NVIDIA-specific pitfalls for such big data sets?
The compiler says:
> ptxas info : Compiling entry function 'find_knn' for 'sm_30'
> ptxas info : Function properties for find_knn
> 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
> ptxas info : Used 17 registers, 336 bytes cmem[0], 4 bytes cmem[3]
Or are there any rules for using a kernel on big data sets, such as setting
the work-group sizes or limiting the memory usage?
The error message "invalid command queue" is confusing, and I wasn't able
to find any helpful information (except that oftentimes "invalid command
queue" means a segfault, but I could not find any wrong array address yet).
Maybe one of you could have a look at our code and find some stupid
mistake.
We would be very grateful for any hint.
Best regards,
Justin Heinermann,
University of Oldenburg
Dear Python/OpenCL community,
I am pretty new to (py)opencl and encountered a problem; maybe it is a lack of understanding of OpenCL, but I found strange Python seg-faults:
test program:
#!/usr/bin/python
import numpy, pyopencl

ctx = pyopencl.create_some_context()
# a 1024x1024 single-channel float32 array to upload as a read-only image
data = numpy.random.random((1024, 1024)).astype(numpy.float32)
# this call segfaults on the setup described below
img = pyopencl.image_from_array(ctx, ary=data, mode="r", norm_int=False, num_channels=1)
print img
System: Debian sid, pyopencl 2012.1 (the same code works on Debian stable with v2011.2)
Here is the backtrace obtained with GDB:
0x0000000000000000 in ?? ()
(gdb) bt
#0 0x0000000000000000 in ?? ()
#1 0x00007ffff340c253 in pyopencl::create_image_from_desc(pyopencl::context const&, unsigned long, _cl_image_format const&, _cl_image_desc&, boost::python::api::object) () from /usr/lib/python2.7/dist-packages/pyopencl/_cl.so
#2 0x00007ffff342de36 in _object* boost::python::detail::invoke<boost::python::detail::install_holder<pyopencl::image*>, pyopencl::image* (*)(pyopencl::context const&, unsigned long, _cl_image_format const&, _cl_image_desc&, boost::python::api::object), boost::python::arg_from_python<pyopencl::context const&>, boost::python::arg_from_python<unsigned long>, boost::python::arg_from_python<_cl_image_format const&>, boost::python::arg_from_python<_cl_image_desc&>, boost::python::arg_from_python<boost::python::api::object> >(boost::python::detail::invoke_tag_<false, false>, boost::python::detail::install_holder<pyopencl::image*> const&, pyopencl::image* (*&)(pyopencl::context const&, unsigned long, _cl_image_format const&, _cl_image_desc&, boost::python::api::object), boost::python::arg_from_python<pyopencl::context const&>&, boost::python::arg_from_python<unsigned long>&, boost::python::arg_from_python<_cl_image_format const&>&, boost::python::arg_from_python<_cl_image_desc&>&, boost::python::arg_from_python<boost::python::api::object>&) () from /usr/lib/python2.7/dist-packages/pyopencl/_cl.so
#3 0x00007ffff342e06f in boost::python::detail::caller_arity<5u>::impl<pyopencl::image* (*)(pyopencl::context const&, unsigned long, _cl_image_format const&, _cl_image_desc&, boost::python::api::object), boost::python::detail::constructor_policy<boost::python::default_call_policies>, boost::mpl::vector6<pyopencl::image*, pyopencl::context const&, unsigned long, _cl_image_format const&, _cl_image_desc&, boost::python::api::object> >::operator()(_object*, _object*) ()
from /usr/lib/python2.7/dist-packages/pyopencl/_cl.so
#4 0x00007ffff311715b in boost::python::objects::function::call(_object*, _object*) const ()
from /usr/lib/libboost_python-py27.so.1.49.0
#5 0x00007ffff3117378 in ?? () from /usr/lib/libboost_python-py27.so.1.49.0
#6 0x00007ffff3120593 in boost::python::detail::exception_handler::operator()(boost::function0<void> const&) const ()
from /usr/lib/libboost_python-py27.so.1.49.0
#7 0x00007ffff3445983 in boost::detail::function::function_obj_invoker2<boost::_bi::bind_t<bool, boost::python::detail::translate_exception<pyopencl::error, void (*)(pyopencl::error const&)>, boost::_bi::list3<boost::arg<1>, boost::arg<2>, boost::_bi::value<void (*)(pyopencl::error const&)> > >, bool, boost::python::detail::exception_handler const&, boost::function0<void> const&>::invoke(boost::detail::function::function_buffer&, boost::python::detail::exception_handler const&, boost::function0<void> const&) () from /usr/lib/python2.7/dist-packages/pyopencl/_cl.so
#8 0x00007ffff3120373 in boost::python::handle_exception_impl(boost::function0<void>) ()
from /usr/lib/libboost_python-py27.so.1.49.0
#9 0x00007ffff3115635 in ?? () from /usr/lib/libboost_python-py27.so.1.49.0
Thanks for your help.
If you are not able to reproduce this bug, I should report it to Debian.
Cheers,
--
Jérôme Kieffer
Data analysis unit - ESRF
Hi.
I am interested in getting PyOpenCL to work with PyPy, an
implementation of Python with a JIT (www.pypy.org). Has there been any
discussion or thought about doing this? PyPy has a basic
implementation of numpy called numpypy that I contribute to, and it
has a rudimentary numpy-compatible C interface available as an
external module at
https://bitbucket.org/antocuni/numpypy_c
The PyPy team has a CPython-compatible replacement for ctypes called
cffi, which is JIT-friendly on PyPy and no slower than ctypes on
CPython.
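For anyone who has not used cffi, calling into C looks roughly like this (ABI mode, nothing PyOpenCL-specific):

import cffi

ffi = cffi.FFI()
# declare the C function we want to call
ffi.cdef("int printf(const char *format, ...);")
lib = ffi.dlopen(None)   # the C library already loaded into the process
lib.printf("hello from C via cffi\n")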
So it seems like all the pieces exist to get started. Is anyone else
interested in getting the work done?
Or are there blocking issues I do not understand?
Matti
Dear Andreas,
I am currently working on a Cython-based wrapper for the OpenCL FFT library from AMD: https://github.com/geggo/gpyfft
For this I need to create a pyopencl Event instance from a cl_event returned by the library. I attached a patch against recent pyopencl that adds this possibility, similar to the from_cl_mem_as_int() method of the MemoryObject class. Could you please add this to pyopencl?
Thanks for your help
Gregor
Hi Alex,
Alex Nitz <alex.nitz(a)ligo.org> writes:
> I am mostly a pycuda user, but am investigating trying to use some of my
> codes with pyopencl. My codes make heavy use of the numpy-like array. I
> noticed that there doesn't seem to be a "__getitem__" function defined
> yet, although the buffer objects themselves have one.
>
> My needs are basically met by the version that is in pycuda, so I have
> created a short patch to add the same behavior to pyopencl. It is fairly
> limited in that it only supports 1-dimensional, non-strided slices. Is a
> more comprehensive functionality already in the works? If not, would it be
> possible to get this patch applied?
First of all, thanks for your contribution! I'm a bit hesitant to apply
this patch, because sub-buffers (which your patch implicitly uses; see
clCreateSubBuffer in the CL spec) are allowed to have alignment
requirements that can make this routine fail. The better way to implement
this is to keep the original buffer and store an offset to the intended
beginning of the data. I'll introduce this after the 2013.1 release,
which is due soon (as soon as I sort out the current Mac trouble).
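For illustration, the alignment that sub-buffer origins must respect can be queried like this (a small sketch, not part of your patch):

import pyopencl as cl

ctx = cl.create_some_context()
dev = ctx.devices[0]
# CL_DEVICE_MEM_BASE_ADDR_ALIGN: sub-buffer origins must be aligned to
# this many *bits*, so a slice starting at an arbitrary element (say a[3:])
# may not be representable as a sub-buffer at all.
print dev.mem_base_addr_align

Keeping the parent buffer plus an integer offset sidesteps that restriction entirely.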
Andreas
Pedro Marcal <pedrovmarcal(a)gmail.com> writes:
> Can someone please show me how to pass a dict down to the GPU and access it
> with a C code? I have a 100MByte NLP dict I would like to access.
Dear Pedro,
that'll require a bit more work than just saying 'transfer this', for a
number of reasons.
- First, dicts (and generally most Python data structures) are very
pointer-heavy. But the GPU has a distinct memory space and thus
pointers are invalidated when transferring.
- Next, dicts rely on the Python run time system, which is available on
the host, but not the CL device (GPU).
- Finally, and most fundamentally, GPUs like data structures that are
compatible with data-parallel computing; dicts don't quite fit the bill.
But you might be able to build something using this recently added
PyOpenCL functionality:
http://documen.tician.de/pyopencl/algorithm.html#building-many-variable-siz…
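As a rough illustration of the kind of flat, pointer-free layout that maps well onto the device (hand-rolled here, independent of the functionality linked above):

import numpy as np

# Purely illustrative: flatten an {int: float} dict into two parallel,
# sorted arrays that can be transferred as plain buffers. String keys
# would additionally need to be hashed or encoded as fixed-size integers.
d = {17: 0.5, 3: 1.25, 42: -2.0}
keys = np.array(sorted(d), dtype=np.int32)
vals = np.array([d[k] for k in sorted(d)], dtype=np.float32)
# On the device, each work-item can then binary-search `keys` and read
# the matching entry from `vals`.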
Hope that helps,
Andreas
Can someone please show me how to pass a dict down to the GPU and access it
from C code? I have a 100 MByte NLP dict I would like to access.
Thanks,
Pedro
On Sat, 2013-05-18 at 16:35 -0400, a cow-like object wrote:
> Hi all,
>
>
> Any idea how I could get access to my HD4000 in Debian? PyOpenCL
> seems to be working fine using the 'AMD Accelerated Parallel
> Processing' device, which I'm guessing is my CPU? But I'd like to be
> able to use the HD4000.
>
>
> I installed beignet0.0.1 and beignet-dev packages and naively created
> a cl.icd file under /etc/OpenCL/vendors that had just a 'libcl.so'
> entry (and temporarily removed amdocl64.icd), but then got this error:
>
>
> 1 cl.get_platforms()
> LogicError: clGetPlatformIDs failed: platform not found khr
>
I've tried to use beignet (0.0.1, available in Debian) but failed.
I've tried it on both AMD and Intel CPUs; on AMD I got a
"Bad instruction" signal and the program was killed, and on Intel
I got the same error as you.
It's a pity that it does not work, but I do not intend to investigate
it for the time being. If you find a solution, please write to the
list so we can test PyOpenCL with another library.
Best regards.
--
Tomasz Rybak GPG/PGP key ID: 2AD5 9860
Fingerprint A481 824E 7DD3 9C0E C40A 488E C654 FB33 2AD5 9860
http://member.acm.org/~tomaszrybak
Hi all,
Any idea how I could get access to my HD4000 in Debian? PyOpenCL seems to
be working fine using the 'AMD Accelerated Parallel Processing' device,
which I'm guessing is my CPU? But I'd like to be able to use the HD4000.
I installed the beignet 0.0.1 and beignet-dev packages and naively created a
cl.icd file under /etc/OpenCL/vendors that had just a 'libcl.so' entry (and
temporarily removed amdocl64.icd), but then got this error:
1 cl.get_platforms()
LogicError: clGetPlatformIDs failed: platform not found khr
Any steps on how to get Beignet working would be greatly appreciated.
Thank you.
a cow-like object <acowlikeobject(a)gmail.com> writes:
> Many thanks, Andreas. It did run fine with asserts disabled/removed.
>
> I'm very new to this, so pardon my ignorance, but what is knl an instance
> of? The original code doesn't actually have a knl. And how will the
> answer verify your hunch?
Ah, sorry. I meant:
print prg.sum.num_args
I was thinking of "knl = prg.sum" as an instance of cl.Kernel.
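In other words, something along these lines (a toy program, not your code):

import pyopencl as cl

ctx = cl.create_some_context()
# a stand-in kernel called `sum`, just to show where num_args comes from
prg = cl.Program(ctx, """
__kernel void sum(__global const float *a,
                  __global const float *b,
                  __global float *res)
{
    int gid = get_global_id(0);
    res[gid] = a[gid] + b[gid];
}
""").build()

knl = prg.sum        # an instance of cl.Kernel
print knl.num_args   # prints 3 for this kernel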
Andreas