Invalid command Queue when using big data sets on nVidia
by Justin Heinermann
Dear all,
we are trying to implement a K nearest neighbor search on GPUs with
PyOpenCL. The goal of the algorithm is: For a given target point,
find the nearest points from a given set (training data). The distance
between two points is computed by the squared euclidean distance.
One of our implementations is a brute force approach, which aims
at processing big data sets in parallel, e.g. 1 million training data and
some millions of targets (test data). For every target point one kernel
instance is created which finds the k nearest points out of the
training points.
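As a host-side reference for what each kernel instance is meant to compute, the brute-force search described above can be sketched in plain Python. This is only an illustration and not the attached code: the names `squared_euclidean` and `knn_brute_force` and the tiny data set are made up for the example.

```python
import heapq


def squared_euclidean(p, q):
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def knn_brute_force(train, target, k):
    """Return the indices of the k training points nearest to target."""
    # Pair every distance with its index; nsmallest keeps only the k best
    # candidates and breaks distance ties by the lower index.
    dists = ((squared_euclidean(p, target), i) for i, p in enumerate(train))
    return [i for _, i in heapq.nsmallest(k, dists)]


# Hypothetical toy data: four training points, two targets.
train = [(0.0, 0.0), (1.0, 1.0), (5.0, 5.0), (0.5, 0.0)]
targets = [(0.1, 0.0), (4.0, 4.0)]

# One independent search per target, mirroring one kernel instance per target.
neighbors = [knn_brute_force(train, t, k=2) for t in targets]
print(neighbors)
```

A reference like this can be used to check GPU results on small inputs before scaling up.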
Our problem is the following. Everything works fine for small data sets
and the results are as expected on both GPU (GeForce GTX 650 with
nVidia driver 313.09) and CPU (Intel Core i5-3450 with AMD APP SDK)
running Ubuntu 12.10, PyOpenCL 2013.1-py2.7-linux-x86_64.
But if we increase the size of the data sets, the GPU version crashes
with the following error:
> File "brutegpu.py", line 65, in query
> cl.enqueue_copy(self.queue, d_min, self.d_min_buf).wait()
> File "/usr/local/lib/python2.7/dist-packages/
> pyopencl-2013.1-py2.7-linux-x86_64.egg/pyopencl/__init__.py",
> line 935, in enqueue_copy
> return _cl._enqueue_read_buffer(queue, src, dest, **kwargs)
> pyopencl.LogicError: clEnqueueReadBuffer failed: invalid command queue
The CPU version still works fine with 1 million training points
and 1 million test points. Attached you can find the corresponding
source code as a minimal working example, which consists of one
host Python file and one OpenCL kernel file.
We would highly appreciate any help - maybe we made a
mistake which is already known to you.
So the big question for us is: Why is it working on CPU and why isn't it
working on the GPU?
Are there nVidia-specific pitfalls for such big data sets?
The compiler says:
> ptxas info : Compiling entry function 'find_knn' for 'sm_30'
> ptxas info : Function properties for find_knn
> 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
> ptxas info : Used 17 registers, 336 bytes cmem[0], 4 bytes cmem[3]
Or are there any rules for using a kernel for big data sets such as setting
the work group sizes or maximum memory usage?
The error message "invalid command queue" is confusing and I wasn't able
to find any helpful information (except that "invalid command queue" often
means a segfault, but I could not find any wrong array address yet).
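The cause is not established here. One thing that might be worth trying, assuming the failure depends on how much work a single kernel launch covers, is to process the targets in batches and enqueue one launch per batch. The sketch below only computes the batch boundaries; `chunk_ranges` and the batch size of 65536 are hypothetical choices for illustration, not values taken from the attached code.

```python
def chunk_ranges(n_items, chunk_size):
    """Yield (start, stop) index pairs that together cover range(n_items)."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    for start in range(0, n_items, chunk_size):
        # The last batch may be shorter than chunk_size.
        yield start, min(start + chunk_size, n_items)


# Hypothetical sizes: 1 million targets, 65536 targets per launch.
n_targets = 1000000
batches = list(chunk_ranges(n_targets, 65536))

# Each (start, stop) pair would correspond to one kernel launch over
# targets[start:stop], followed by a read-back of that slice of results.
print(len(batches), batches[0], batches[-1])
```

If small batches succeed where one large launch fails, that would at least narrow the problem down to launch size rather than to an indexing error.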
Maybe one of you could have a look at our code and find some stupid
mistake.
We would be very grateful for every hint.
Best regards,
Justin Heinermann,
University Oldenburg
5 years, 3 months
Segmentation fault in pyopencl.image_from_array
by Jerome Kieffer
Dear Python/OpenCL community,
I am pretty new to (Py)OpenCL and have encountered a problem. Maybe it is due to my lack of understanding of OpenCL, but I found strange Python segfaults:
test program:
#!/usr/bin/python
import numpy, pyopencl
ctx = pyopencl.create_some_context()
data=numpy.random.random((1024,1024)).astype(numpy.float32)
img = pyopencl.image_from_array(ctx, ary=data, mode="r", norm_int=False, num_channels=1)
print img
System: Debian sid with PyOpenCL 2012.1 (the same code works with Debian stable and v2011.2)
Here is the backtrace obtained with GDB:
0x0000000000000000 in ?? ()
(gdb) bt
#0 0x0000000000000000 in ?? ()
#1 0x00007ffff340c253 in pyopencl::create_image_from_desc(pyopencl::context const&, unsigned long, _cl_image_format const&, _cl_image_desc&, boost::python::api::object) () from /usr/lib/python2.7/dist-packages/pyopencl/_cl.so
#2 0x00007ffff342de36 in _object* boost::python::detail::invoke<boost::python::detail::install_holder<pyopencl::image*>, pyopencl::image* (*)(pyopencl::context const&, unsigned long, _cl_image_format const&, _cl_image_desc&, boost::python::api::object), boost::python::arg_from_python<pyopencl::context const&>, boost::python::arg_from_python<unsigned long>, boost::python::arg_from_python<_cl_image_format const&>, boost::python::arg_from_python<_cl_image_desc&>, boost::python::arg_from_python<boost::python::api::object> >(boost::python::detail::invoke_tag_<false, false>, boost::python::detail::install_holder<pyopencl::image*> const&, pyopencl::image* (*&)(pyopencl::context const&, unsigned long, _cl_image_format const&, _cl_image_desc&, boost::python::api::object), boost::python::arg_from_python<pyopencl::context const&>&, boost::python::arg_from_python<unsigned long>&, boost::python::arg_from_python<_cl_image_format const&>&, boost::python::arg_from_python<_cl_image_desc&>&, boost::python::arg_from_python<boost::python::api::object>&) () from /usr/lib/python2.7/dist-packages/pyopencl/_cl.so
#3 0x00007ffff342e06f in boost::python::detail::caller_arity<5u>::impl<pyopencl::image* (*)(pyopencl::context const&, unsigned long, _cl_image_format const&, _cl_image_desc&, boost::python::api::object), boost::python::detail::constructor_policy<boost::python::default_call_policies>, boost::mpl::vector6<pyopencl::image*, pyopencl::context const&, unsigned long, _cl_image_format const&, _cl_image_desc&, boost::python::api::object> >::operator()(_object*, _object*) ()
from /usr/lib/python2.7/dist-packages/pyopencl/_cl.so
#4 0x00007ffff311715b in boost::python::objects::function::call(_object*, _object*) const ()
from /usr/lib/libboost_python-py27.so.1.49.0
#5 0x00007ffff3117378 in ?? () from /usr/lib/libboost_python-py27.so.1.49.0
#6 0x00007ffff3120593 in boost::python::detail::exception_handler::operator()(boost::function0<void> const&) const ()
from /usr/lib/libboost_python-py27.so.1.49.0
#7 0x00007ffff3445983 in boost::detail::function::function_obj_invoker2<boost::_bi::bind_t<bool, boost::python::detail::translate_exception<pyopencl::error, void (*)(pyopencl::error const&)>, boost::_bi::list3<boost::arg<1>, boost::arg<2>, boost::_bi::value<void (*)(pyopencl::error const&)> > >, bool, boost::python::detail::exception_handler const&, boost::function0<void> const&>::invoke(boost::detail::function::function_buffer&, boost::python::detail::exception_handler const&, boost::function0<void> const&) () from /usr/lib/python2.7/dist-packages/pyopencl/_cl.so
#8 0x00007ffff3120373 in boost::python::handle_exception_impl(boost::function0<void>) ()
from /usr/lib/libboost_python-py27.so.1.49.0
#9 0x00007ffff3115635 in ?? () from /usr/lib/libboost_python-py27.so.1.49.0
Thanks for your help.
If you are not able to reproduce this bug, I will report it to Debian.
Cheers,
--
Jérôme Kieffer
Data analysis unit - ESRF
5 years, 10 months
About Beignet on Ivy-Bridge GPU
by Jerome Kieffer
Hello,
Just a short message to tell you that the "beignet" OpenCL driver was
released earlier this week (version 0.3).
This driver uses the GPU integrated in the last two generations of Intel processors.
While I was not able to compile it, the Debian team made a package which works. Thanks to them.
In [1]: import pyopencl
In [2]: ctx = pyopencl.create_some_context()
Choose platform:
[0] <pyopencl.Platform 'Intel(R) OpenCL' at 0x259dfc0>
[1] <pyopencl.Platform 'Experiment Intel Gen OCL Driver' at 0x7fee1dc2e020>
[2] <pyopencl.Platform 'AMD Accelerated Parallel Processing' at 0x7fee199df520>
Choice [0]:1
Set the environment variable PYOPENCL_CTX='1' to avoid being asked again.
In [3]: queue = pyopencl.CommandQueue(ctx)
In [4]: import pyopencl.array, scipy.misc
In [9]: lgpu=pyopencl.array.to_device(queue, scipy.misc.lena().astype("float32"))
In [10]: inv_lena=255.0-lgpu
In [13]: ilena=inv_lena.get()
In [14]: ilena==255-scipy.misc.lena()
Out[14]:
array([[ True, True, True, ..., False, False, False],
[ True, True, True, ..., False, False, False],
[ True, True, True, ..., False, False, False],
...,
[False, False, False, ..., False, False, False],
[False, False, False, ..., False, False, False],
[False, False, False, ..., False, False, False]], dtype=bool)
It is not (yet) perfect but it is starting to be usable.
Cheers,
--
Jérôme Kieffer
Data analysis unit - ESRF
6 years, 1 month
Re: [PyOpenCL] About Beignet on Ivy-Bridge GPU
by Jerome Kieffer
On Sat, 26 Oct 2013 18:22:16 -0400
Tyler Hardin <tghardin1(a)catamount.wcu.edu> wrote:
> Thanks for the info. I was actually thinking about asking if PyOpenCL would
> work with beignet.
I also noticed screen image corruption. It looks like there is a memory
leak, with OpenCL memory writes ending up in the video memory.
Cheers,
--
Jérôme Kieffer
Data analysis unit - ESRF
6 years, 1 month
Many kernel calls under Nvidia leads to Memory Leak?
by David Higgins
Hi guys,
This is perhaps a tiny bit off topic, but the people here are the most
experienced at actually using OpenCL in the real world. I'm running a
simulation where I need to run the same OpenCL kernel an enormous number
of times (greater than 10^8). It's the exact same kernel, it's the
output of the previous kernel call transformed a tiny bit and fed back
for an update. My code runs perfectly under AMD GPU and CPU, Intel and
Apple GPU and CPU. But my compute cluster uses Nvidia under Linux and
that's where I run into problems.
If I run my code for long enough it eventually gobbles up all of the
memory. It takes approx 250,000,000 kernel calls for this to happen each
time but I need to run a simulation 2-3 times longer than this to get
meaningful results.
I've read that Nvidia prefers to handle resources in a way that seems a
little odd to anyone who's played with the Linux kernel but I guess is
somewhat in keeping with the OpenCL spec, if you're not so aware of how
things are done elsewhere. This conversation was a lot of help to me
http://www.khronos.org/message_boards/showthread.php/8777-Possible-Memory...
My question is, has anyone on the list experienced this problem before?
If you have, is it enough for me to actually implement an 'event'
argument for my buffer reads and kernel launches and then release
this event after each use? (It appears from the conversations
online that Nvidia may be allocating these event handles even if you
pass in a NULL parameter to the calls, and it's these event handles which
are using up all of the memory.) I intend trying this approach tomorrow,
but as the simulation takes over a day to reach memory saturation I'd
rather somebody with experience feeds back to me anything they know
about the issue.
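A related pattern, separate from releasing events explicitly, is to drain the command queue every so many launches so that the driver has a chance to retire finished commands. Whether this avoids the growth described above is an assumption and is not confirmed in this thread. The sketch below uses plain callables as stand-ins so that it runs without a GPU; `run_with_periodic_sync` and `sync_every` are hypothetical names.

```python
def run_with_periodic_sync(launch, finish, n_calls, sync_every=1000):
    """Call launch(i) n_calls times and finish() after every sync_every calls.

    Returns the total number of finish() calls, including the final one.
    """
    if sync_every <= 0:
        raise ValueError("sync_every must be positive")
    n_syncs = 0
    for i in range(n_calls):
        launch(i)
        if (i + 1) % sync_every == 0:
            finish()
            n_syncs += 1
    finish()  # drain whatever is still queued at the end
    return n_syncs + 1


# Stand-ins for illustration. With PyOpenCL, launch could wrap the kernel
# call and finish could be the queue's finish() method.
launched = []
finished = []
total_syncs = run_with_periodic_sync(
    launched.append,
    lambda: finished.append(len(launched)),
    n_calls=2500,
    sync_every=1000,
)
print(total_syncs, finished)
```

The cost is one host/device synchronisation per `sync_every` launches, so the interval trades throughput against how much work can pile up in the queue.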
Thanks a lot!
Dave.
6 years, 1 month
Installation issue ...
by Jerome Kieffer
Hello,
I know it is not the perfect place to ask this question but I am
struggling with the installation of OpenCL on an AMD graphics card
(Radeon 6670) under Debian (unstable).
After re-installation and 2 subsequent upgrades from Debian 6->7->sid ...
now the fglrx driver works, at least X starts and all OpenCL modules
are installed, all in version 13.4 from AMD.
When launching clinfo it just does nothing. Nothing in the logs, clinfo
can be killed without harm. It is the same story with pyopencl.create_some_context().
An strace of clinfo ends with:
poll([{fd=6, events=POLLIN|POLLOUT}], 1, 4294967295) = 1 ([{fd=6, revents=POLLOUT}])
writev(6, [{"\235\7\3\0\0\0\0\0\271\0\0\0", 12}, {NULL, 0}, {"", 0}], 3[ProcFGLGetDriverData] Extension ATIFGLEXTENSION: wrong screen number
) = 12
poll([{fd=6, events=POLLIN}], 1, 4294967295
Does any of you have an idea?
Cheers,
--
Jérôme Kieffer
Data analysis unit - ESRF
6 years, 1 month
Re: [PyOpenCL] CL-GL interoperability
by xhaju
On Tuesday 15 Oct 2013 11:05:05 Andreas Kloeckner wrote:
> xhaju.tm(a)gmail.com writes:
> > Hello,
> >
> > I'm having problems installing pyopencl-gl interoperability in my system:
> >
> > When I try to run the gl-interop example at pyopencl-2013.1/examples/
> > I get the following traceback:
> >
> > python gl_interop_demo.py
> >
> > Traceback (most recent call last):
> > File "gl_interop_demo.py", line 81, in <module>
> >
> > initialize()
> >
> > File "gl_interop_demo.py", line 41, in initialize
> >
> > + get_gl_sharing_context_properties(),
> >
> > File
> > "/usr/local/lib/python2.7/dist-packages/pyopencl-2013.1-py2.7-linux-
> >
> > x86_64.egg/pyopencl/tools.py", line 426, in
> > get_gl_sharing_context_properties>
> > (ctx_props.GL_CONTEXT_KHR, gl_platform.GetCurrentContext()))
> >
> > AttributeError: type object 'context_properties' has no attribute
> > 'GL_CONTEXT_KHR'
> >
> > I have tried adding the option --no-cl-enable-gl, but configure.py does
> > not accept it complaining that the option does not exist. Using
> > --cl-enable-gl succeeds, but the example fails, and
> > pyopencl.context_properties only has
> >
> > __module__ : pyopencl._cl
> > OFFLINE_DEVICES_AMD : 16447
> > __reduce__ : <Boost.Python.function object at 0x212e550>
> > to_string : <classmethod object at 0x21fb3d0>
> > PLATFORM : 4228
> > __doc__ : None
> > __init__ : <built-in function __init__>
>
> The first thing is to check if you're actually compiling with GL
> enabled--the traceback above doesn't make it seem that way.
>
> The way to check is to look in your 'siteconf.py' to see if
> "CL_ENABLE_GL" is set to "True". If not, set it to that, 'rm -Rf build',
> 'python setup.py install' and go from there.
>
> Another thing is to check your cl_ext.h header for the presence of
> cl_khr_gl_sharing. If that's there, then it should be picked up. If not,
> that's a problem to be fixed. (install newer/different CL headers
> maybe?)
>
> Hope that helps,
> Andreas
So I think the problem was in the GL bit:
Although I was removing siteconf.py (because I did not add the cl_enable_gl option
in the beginning), I was not removing the 'build' directory. Once I did that, the
compiler complained about not having GL/gl.h. I then installed the mesa drivers...
and voila, the example is working.
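The siteconf.py check suggested above can also be scripted. The sketch below assumes siteconf.py consists of plain `NAME = literal` assignments; `siteconf_flag` is a hypothetical helper and the example contents are made up for illustration.

```python
import ast


def siteconf_flag(siteconf_source, name="CL_ENABLE_GL"):
    """Return the literal value assigned to name in the source, or None."""
    value = None
    for node in ast.parse(siteconf_source).body:
        if isinstance(node, ast.Assign):
            for target in node.targets:
                if isinstance(target, ast.Name) and target.id == name:
                    # literal_eval only accepts literals, so nothing in the
                    # file is executed.
                    value = ast.literal_eval(node.value)
    return value


# Illustrative contents only; a real file would be read from the PyOpenCL
# source directory with open("siteconf.py").read().
example = "CL_TRACE = False\nCL_ENABLE_GL = True\nCL_INC_DIR = []\n"
gl_enabled = siteconf_flag(example)
print(gl_enabled)
```

If the flag is set but GL_CONTEXT_KHR is still missing after installation, a stale 'build' directory, as in this case, is worth ruling out first.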
Thank you very much for your help!
David
6 years, 1 month
CL-GL interoperability
by xhaju.tm@gmail.com
Hello,
I'm having problems installing pyopencl-gl interoperability in my system:
When I try to run the gl-interop example at pyopencl-2013.1/examples/
I get the following traceback:
python gl_interop_demo.py
Traceback (most recent call last):
File "gl_interop_demo.py", line 81, in <module>
initialize()
File "gl_interop_demo.py", line 41, in initialize
+ get_gl_sharing_context_properties(),
File "/usr/local/lib/python2.7/dist-packages/pyopencl-2013.1-py2.7-linux-
x86_64.egg/pyopencl/tools.py", line 426, in get_gl_sharing_context_properties
(ctx_props.GL_CONTEXT_KHR, gl_platform.GetCurrentContext()))
AttributeError: type object 'context_properties' has no attribute
'GL_CONTEXT_KHR'
I have tried adding the option --no-cl-enable-gl, but configure.py does not accept it
complaining that the option does not exist. Using --cl-enable-gl succeeds, but the
example fails, and pyopencl.context_properties only has
__module__ : pyopencl._cl
OFFLINE_DEVICES_AMD : 16447
__reduce__ : <Boost.Python.function object at 0x212e550>
to_string : <classmethod object at 0x21fb3d0>
PLATFORM : 4228
__doc__ : None
__init__ : <built-in function __init__>
I'm running Ubuntu 13.04.
The versions of python and numpy are
python 2.7
numpy 1.7.1
Do you have any idea why this is happening?
Thanks!
David
6 years, 1 month
PyOpenCL and PyGTK TLS issue
by Antoine Martin
Hi,
Does anyone know why this works:
python -c "import gtk;import pyopencl;context=pyopencl.create_some_context(False);pyopencl.Program(context,'').build()"
But this does not:
python -c "import pyopencl;context=pyopencl.create_some_context(False);pyopencl.Program(context,'').build();import gtk"
It fails with:
Traceback (most recent call last):
File "<string>", line 1, in <module>
File "/usr/lib64/python2.7/site-packages/gtk-2.0/gtk/__init__.py",
line 40, in <module>
from gtk import _gtk
ImportError: dlopen: cannot load any more object with static TLS
Note: it is the build() call that ends up triggering this TLS conflict.
Take it out and it works.
The same thing happens with PyCUDA.
Re-ordering the imports worked around this particular issue, but I seem
to be getting other issues when I do that ('atexit' not firing and other
weirdness).
So this may have just papered over the real issue. How would I fix that?
(preferably without touching pygtk2...)
Thanks
Antoine
6 years, 1 month