Invalid command Queue when using big data sets on nVidia
by Justin Heinermann
Dear all,
we are trying to implement a k-nearest-neighbor search on GPUs with
PyOpenCL. The goal of the algorithm is: for a given target point, find
the k nearest points from a given set (the training data). The distance
between two points is computed as the squared Euclidean distance.
One of our implementations is a brute-force approach, which aims at
processing big data sets in parallel, e.g. 1 million training points and
several million targets (test data). For every target point, one kernel
instance is created which finds the k nearest points among the
training points.
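For orientation, here is a stripped-down sketch of the host-side setup
(the find_knn argument list is simplified and the kernel file name is a
placeholder; the sizes are for illustration only):

import numpy as np
import pyopencl as cl

ctx = cl.create_some_context()
queue = cl.CommandQueue(ctx)
mf = cl.mem_flags

n_train, n_test, k, dim = 1000000, 1000000, 10, 2  # sizes for illustration only
train = np.random.rand(n_train, dim).astype(np.float32)
test = np.random.rand(n_test, dim).astype(np.float32)

d_train = cl.Buffer(ctx, mf.READ_ONLY | mf.COPY_HOST_PTR, hostbuf=train)
d_test = cl.Buffer(ctx, mf.READ_ONLY | mf.COPY_HOST_PTR, hostbuf=test)
d_min = np.empty((n_test, k), dtype=np.float32)          # k smallest distances per target
d_min_buf = cl.Buffer(ctx, mf.WRITE_ONLY, d_min.nbytes)

prg = cl.Program(ctx, open("knn_kernel.cl").read()).build()
# one work item per target/test point, global size = n_test
prg.find_knn(queue, (n_test,), None,
             d_train, d_test, d_min_buf,
             np.int32(n_train), np.int32(k))
cl.enqueue_copy(queue, d_min, d_min_buf).wait()          # this is the call that fails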
Our problem is the following: everything works fine for small data sets,
and the results are as expected both on the GPU (GeForce GTX 650 with
NVIDIA driver 313.09) and on the CPU (Intel Core i5-3450 with the AMD APP
SDK), running Ubuntu 12.10 and PyOpenCL 2013.1-py2.7-linux-x86_64.
But if we increase the size of the data sets, the GPU version crashes
with the following error:
> File "brutegpu.py", line 65, in query
> cl.enqueue_copy(self.queue, d_min, self.d_min_buf).wait()
> File "/usr/local/lib/python2.7/dist-packages/
> pyopencl-2013.1-py2.7-linux-x86_64.egg/pyopencl/__init__.py",
> line 935, in enqueue_copy
> return _cl._enqueue_read_buffer(queue, src, dest, **kwargs)
> pyopencl.LogicError: clEnqueueReadBuffer failed: invalid command queue
The CPU version still works fine with 1 million training points
and 1 million test points. Attached you can find the corresponding
source code as a working minimal example, which consists of one
host Python file and one OpenCL kernel file.
We would highly appreciate any help - maybe we made a
mistake which is already known to you.
So the big question for us is: why does it work on the CPU, and why
doesn't it work on the GPU?
Are there NVIDIA-specific pitfalls for such big data sets?
The compiler says:
> ptxas info : Compiling entry function 'find_knn' for 'sm_30'
> ptxas info : Function properties for find_knn
> 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
> ptxas info : Used 17 registers, 336 bytes cmem[0], 4 bytes cmem[3]
Or are there any rules for running a kernel on big data sets, such as
how to choose the work group sizes or how much memory may be used?
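For reference, here is a small snippet to query those limits (standard
PyOpenCL device and kernel work-group queries; the kernel file name is
the same placeholder as above):

import pyopencl as cl

ctx = cl.create_some_context()
queue = cl.CommandQueue(ctx)
dev = queue.device

print "max work group size:    ", dev.max_work_group_size
print "max mem alloc size (MB):", dev.max_mem_alloc_size // (1024 * 1024)
print "global mem size (MB):   ", dev.global_mem_size // (1024 * 1024)

# the limit for a particular kernel can be lower than the device limit:
prg = cl.Program(ctx, open("knn_kernel.cl").read()).build()
print "find_knn work group size:", prg.find_knn.get_work_group_info(
    cl.kernel_work_group_info.WORK_GROUP_SIZE, dev)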
The error message "invalid command queue" is confusing, and I wasn't able
to find any helpful information (except that "invalid command queue" often
means a segfault, but I could not find any wrong array address yet).
Maybe one of you could have a look at our code and find some stupid
mistake.
We would be very grateful for every hint.
Best regards,
Justin Heinermann,
University Oldenburg
Segmentation fault in pyopencl.image_from_array
by Jerome Kieffer
Dear Python/OpenCL community,
I am pretty new to (py)opencl and have encountered a problem - maybe it
is a lack of understanding of OpenCL on my part, but I am seeing strange
Python segfaults.
Test program:
#!/usr/bin/python
import numpy, pyopencl
ctx = pyopencl.create_some_context()
data=numpy.random.random((1024,1024)).astype(numpy.float32)
img = pyopencl.image_from_array(ctx, ary=data, mode="r", norm_int=False, num_channels=1)
print img
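For what it is worth, my understanding is that image_from_array should
amount to roughly the following hand-written version (the channel order
and flags are my guess for a single-channel, non-normalized float32
image); maybe this helps to locate where things go wrong:

import numpy, pyopencl

ctx = pyopencl.create_some_context()
data = numpy.random.random((1024, 1024)).astype(numpy.float32)

fmt = pyopencl.ImageFormat(pyopencl.channel_order.INTENSITY,
                           pyopencl.channel_type.FLOAT)
mf = pyopencl.mem_flags
# note: the image shape is (width, height), i.e. the numpy shape reversed
img = pyopencl.Image(ctx, mf.READ_ONLY | mf.COPY_HOST_PTR, fmt,
                     shape=data.shape[::-1], hostbuf=data)
print img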
System: Debian sid with pyopencl 2012.1 (the same code works on Debian stable with v2011.2).
Here is the backtrace obtained with GDB:
0x0000000000000000 in ?? ()
(gdb) bt
#0 0x0000000000000000 in ?? ()
#1 0x00007ffff340c253 in pyopencl::create_image_from_desc(pyopencl::context const&, unsigned long, _cl_image_format const&, _cl_image_desc&, boost::python::api::object) () from /usr/lib/python2.7/dist-packages/pyopencl/_cl.so
#2 0x00007ffff342de36 in _object* boost::python::detail::invoke<boost::python::detail::install_holder<pyopencl::image*>, pyopencl::image* (*)(pyopencl::context const&, unsigned long, _cl_image_format const&, _cl_image_desc&, boost::python::api::object), boost::python::arg_from_python<pyopencl::context const&>, boost::python::arg_from_python<unsigned long>, boost::python::arg_from_python<_cl_image_format const&>, boost::python::arg_from_python<_cl_image_desc&>, boost::python::arg_from_python<boost::python::api::object> >(boost::python::detail::invoke_tag_<false, false>, boost::python::detail::install_holder<pyopencl::image*> const&, pyopencl::image* (*&)(pyopencl::context const&, unsigned long, _cl_image_format const&, _cl_image_desc&, boost::python::api::object), boost::python::arg_from_python<pyopencl::context const&>&, boost::python::arg_from_python<unsigned long>&, boost::python::arg_from_python<_cl_image_format const&>&, boost::python::arg_from_python<_cl_image_desc&>&, boost::python::arg_from_python<boost::python::api::object>&) () from /usr/lib/python2.7/dist-packages/pyopencl/_cl.so
#3 0x00007ffff342e06f in boost::python::detail::caller_arity<5u>::impl<pyopencl::image* (*)(pyopencl::context const&, unsigned long, _cl_image_format const&, _cl_image_desc&, boost::python::api::object), boost::python::detail::constructor_policy<boost::python::default_call_policies>, boost::mpl::vector6<pyopencl::image*, pyopencl::context const&, unsigned long, _cl_image_format const&, _cl_image_desc&, boost::python::api::object> >::operator()(_object*, _object*) ()
from /usr/lib/python2.7/dist-packages/pyopencl/_cl.so
#4 0x00007ffff311715b in boost::python::objects::function::call(_object*, _object*) const ()
from /usr/lib/libboost_python-py27.so.1.49.0
#5 0x00007ffff3117378 in ?? () from /usr/lib/libboost_python-py27.so.1.49.0
#6 0x00007ffff3120593 in boost::python::detail::exception_handler::operator()(boost::function0<void> const&) const ()
from /usr/lib/libboost_python-py27.so.1.49.0
#7 0x00007ffff3445983 in boost::detail::function::function_obj_invoker2<boost::_bi::bind_t<bool, boost::python::detail::translate_exception<pyopencl::error, void (*)(pyopencl::error const&)>, boost::_bi::list3<boost::arg<1>, boost::arg<2>, boost::_bi::value<void (*)(pyopencl::error const&)> > >, bool, boost::python::detail::exception_handler const&, boost::function0<void> const&>::invoke(boost::detail::function::function_buffer&, boost::python::detail::exception_handler const&, boost::function0<void> const&) () from /usr/lib/python2.7/dist-packages/pyopencl/_cl.so
#8 0x00007ffff3120373 in boost::python::handle_exception_impl(boost::function0<void>) ()
from /usr/lib/libboost_python-py27.so.1.49.0
#9 0x00007ffff3115635 in ?? () from /usr/lib/libboost_python-py27.so.1.49.0
Thanks for your help.
If you are not able to reproduce this bug, I will report it to Debian.
Cheers,
--
Jérôme Kieffer
Data analysis unit - ESRF
PyOpenCL on PyPy
by Matti Picus
Hi.
I am interested in getting PyOpenCL to work with PyPy, an
implementation of Python with a JIT (www.pypy.org). Has there been any
discussion or thought about doing this? PyPy has a basic
implementation of numpy called numpypy that I contribute to, and it
has a rudimentary numpy-compatible C interface available as an
external module at
https://bitbucket.org/antocuni/numpypy_c
The PyPy team has a CPython-compatible replacement for ctypes called
cffi, which is JIT-friendly on PyPy and no slower than ctypes on
CPython.
So it seems like all the pieces exist to get started. Is anyone else
interested in getting the work done?
Or are there blocking issues I do not understand?
Matti
create Event from cl_event
by Gregor Thalhammer
Dear Andreas,
I am currently working on a Cython-based wrapper for the OpenCL FFT library from AMD: https://github.com/geggo/gpyfft
For this I need to create a pyopencl Event instance from a cl_event returned by the library. I have attached a patch against recent pyopencl that adds this possibility, similar to the from_cl_mem_as_int() method of the MemoryObject class. Could you please add this to pyopencl?
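To make the use case concrete, here is a rough sketch of how I would like to use it from the wrapper (the name from_cl_event_as_int below is just shorthand for the constructor the patch adds, chosen by analogy with from_cl_mem_as_int; it is not an existing PyOpenCL method):

import pyopencl as cl

def wrap_library_event(raw_cl_event):
    # raw_cl_event: the cl_event handle returned by the AMD FFT enqueue call,
    # handed over from the Cython layer as a plain integer.
    # "from_cl_event_as_int" is a placeholder name for the kind of
    # constructor the attached patch adds.
    return cl.Event.from_cl_event_as_int(raw_cl_event)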
Thanks for your help
Gregor
Re: [PyOpenCL] __getitem__ for pyopencl array
by Andreas Kloeckner
Hi Alex,
Alex Nitz <alex.nitz(a)ligo.org> writes:
> I am mostly a pycuda user, but am investigating trying to use some of my
> codes with pyopencl. My codes make heavy use of the numpy-like array. I
> noticed that there doesn't seem to be a "__getitem__" function defined
> yet, although the buffer objects themselves have one.
>
> My needs are basically met by the version that is in pycuda, so I have
> created a short patch to add the same behavior to pyopencl. It is fairly
> limited in that it only supports 1-dimensional, non-strided slices. Is
> more comprehensive functionality already in the works? If not, would it be
> possible to get this patch applied?
First of all, thanks for your contribution! I'm a bit hesitant to apply
this patch, because sub-buffers (which your patch implicitly uses, see
clCreateSubBuffer in the CL spec) are allowed to have alignment
requirements that make this routine fail. The better way to implement
this is to use the original buffer and store an offset to the intended
beginning of the data. I'll introduce this after the 2013.1 release,
which is due soon (as soon as I sort out the current Mac trouble).
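To make that concrete, here is a minimal sketch of the offset idea (not
the actual implementation, just its shape): a slice keeps a reference to
the parent array's buffer plus a byte offset, instead of calling
clCreateSubBuffer.

import numpy as np
import pyopencl as cl
import pyopencl.array as cl_array

class SlicedView(object):
    """Sketch only: a 1-D, non-strided slice that reuses the parent's
    buffer and records a byte offset, avoiding clCreateSubBuffer and
    its alignment requirements."""
    def __init__(self, parent, start, stop):
        self.base_data = parent.data                    # the parent's cl.Buffer, shared
        self.offset = start * parent.dtype.itemsize     # byte offset into it
        self.shape = (stop - start,)
        self.dtype = parent.dtype

ctx = cl.create_some_context()
queue = cl.CommandQueue(ctx)
a = cl_array.to_device(queue, np.arange(10, dtype=np.float32))
view = SlicedView(a, 2, 7)   # no new allocation, no alignment constraint
# kernels operating on "view" would be passed base_data together with offset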
Andreas
Re: [PyOpenCL] FFT
by Gregor Thalhammer
On 30.6.2013 at 02:11, Alex Nitz wrote:
> Hello All,
>
> I was wondering if anyone is currently doing FFT's in conjunction with pyopencl. I have used pyfft in the past, but it is a bit limited for my needs. Has anyone had luck interfacing to any of the preexisting fft libraries (amd's, apple's, etc) from within python?
>
I put together a wrapper for the AMD FFT library, which provides decent performance; see https://github.com/geggo/gpyfft
Gregor
> -Alex
FFT
by Alex Nitz
Hello All,
I was wondering if anyone is currently doing FFTs in conjunction with
pyopencl. I have used pyfft in the past, but it is a bit limited for my
needs. Has anyone had luck interfacing with any of the preexisting FFT
libraries (AMD's, Apple's, etc.) from within Python?
-Alex
OpenCL shared memory issue.
by Jerome Kieffer
Dear PyOpenCL community,
I have two implementations of the same algorithm using shared (local) memory, and I do not understand why one works while the other does not. The one that gives wrong results is the second!
hist is cyclic, and we want to average each bin over a window of 3 (itself and its two neighbours), repeated 6 times.
WORKGROUP_SIZE = 128 in this case.
__local volatile float hist[36];
__local volatile float hist2[WORKGROUP_SIZE];
int lid0 = get_local_id(0);
int j, prev, next;

/* Version 1 (works): apply the smoothing 6 times, updating hist in place */
for (j = 0; j < 6; j++) {
    if (lid0 == 0) {
        hist2[0] = hist[0]; // save unmodified hist
        hist[0] = (hist[35] + hist[0] + hist[1]) / 3.0f;
    }
    barrier(CLK_LOCAL_MEM_FENCE);
    if (0 < lid0 && lid0 < 35) {
        hist2[lid0] = hist[lid0];
        hist[lid0] = (hist2[lid0-1] + hist[lid0] + hist[lid0+1]) / 3.0f;
    }
    barrier(CLK_LOCAL_MEM_FENCE);
    if (lid0 == 35) {
        hist[35] = (hist2[34] + hist[35] + hist[0]) / 3.0f;
    }
    barrier(CLK_LOCAL_MEM_FENCE);
}

/* Version 2 (gives wrong results): ping-pong hist -> hist2 -> hist,
   i.e. two smoothing passes per iteration, 3 iterations = 6 passes */
for (j = 0; j < 3; j++) {
    if (lid0 < 36) {
        prev = (lid0 == 0 ? 35 : lid0 - 1);
        next = (lid0 == 35 ? 0 : lid0 + 1);
        hist2[lid0] = (hist[prev] + hist[lid0] + hist[next]) / 3.0f;
    }
    barrier(CLK_LOCAL_MEM_FENCE);
    if (lid0 < 36) {
        prev = (lid0 == 0 ? 35 : lid0 - 1);
        next = (lid0 == 35 ? 0 : lid0 + 1);
        hist[lid0] = (hist2[prev] + hist2[lid0] + hist2[next]) / 3.0f;
    }
    barrier(CLK_LOCAL_MEM_FENCE);
}
Do you have an idea why the second version is wrong?
We tested on two platforms (an NVIDIA GPU and an AMD CPU) and spent the whole day debugging :(
How would you debug something like this?
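One way to narrow it down is to compare the kernel output with a plain
NumPy reference of the intended smoothing (assuming a Jacobi-style
update, i.e. each pass reads only the previous pass's values):

import numpy as np

def smooth_reference(hist, passes=6):
    # cyclic 3-point box filter, repeated `passes` times;
    # np.roll supplies the wrap-around neighbours (bin 35 next to bin 0)
    h = np.asarray(hist, dtype=np.float32).copy()
    for _ in range(passes):
        h = (np.roll(h, 1) + h + np.roll(h, -1)) / 3.0
    return h

# compare with the 36-bin hist copied back from the device, e.g.:
# print np.max(np.abs(smooth_reference(hist_initial) - hist_from_device))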
Thanks a lot.
Cheers,
--
Jérôme Kieffer
Data analysis unit - ESRF
Re: [PyOpenCL] incompatibility with llvm opencl support on mac
by Andreas Kloeckner
Hi Bogdan, Shaun,
Bogdan Opanchuk <mantihor(a)gmail.com> writes:
> On Sat, Jan 26, 2013 at 1:50 PM, Andreas Kloeckner
> <lists(a)informa.tiker.net> wrote:
>> Can anyone with a Mac say something about whether these issues exist on
>> 10.8 (Mountain Lion?), too?
>
> I get no errors in test_wrapper.py and six errors (seemingly
> originating in two test cases times three available devices) in
> test_array.py:
> "CVMS_ERROR_COMPILER_FAILURE: CVMS compiler has crashed or hung
> building an element."
> and
> "Error getting function data from server".
First off, sorry for the thread necromancy. :)
I've recently done a bit of work on getting PyOpenCL to pass most of its
tests on OS X. There are only two or three "Error getting function data
from server" messages left (out of 80-odd tests), everything else seemed
to work on the (relatively old) MBP (5,5) on which I tried, on both GPU
and CPU. I'm about ready to call these bugs in Apple's implementation,
but I'm willing to listen to dissenting opinions.
If you could give recent git a shot and report back, I'd be much obliged.
Thanks,
Andreas