Dear all,
we are trying to implement a k-nearest-neighbor search on GPUs with
PyOpenCL. The goal of the algorithm is: for a given target point,
find the k nearest points from a given set (training data). The distance
between two points is computed as the squared Euclidean distance.
One of our implementations is a brute-force approach, which aims
at processing big data sets in parallel, e.g. 1 million training points and
several million targets (test data). For every target point one kernel
instance is created which finds the k nearest points among the
training points.
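To illustrate the structure, a simplified sketch (not our actual attached code;
the fixed DIM and K values and all names are placeholders) looks roughly like this:

import numpy as np
import pyopencl as cl

KERNEL = """
#define DIM 2
#define K 5

__kernel void find_knn(__global const float *train,    /* n_train   x DIM */
                       __global const float *targets,  /* n_targets x DIM */
                       __global float *knn_dist,       /* n_targets x K   */
                       const int n_train)
{
    int gid = get_global_id(0);

    /* K smallest squared distances seen so far, kept sorted ascending */
    float best[K];
    for (int j = 0; j < K; ++j)
        best[j] = INFINITY;

    for (int i = 0; i < n_train; ++i) {
        float d = 0.0f;
        for (int c = 0; c < DIM; ++c) {
            float diff = train[i*DIM + c] - targets[gid*DIM + c];
            d += diff * diff;
        }
        /* insert d into the sorted list of the K smallest distances */
        for (int j = 0; j < K; ++j) {
            if (d < best[j]) {
                for (int m = K - 1; m > j; --m)
                    best[m] = best[m-1];
                best[j] = d;
                break;
            }
        }
    }
    for (int j = 0; j < K; ++j)
        knn_dist[gid*K + j] = best[j];
}
"""

ctx = cl.create_some_context()
queue = cl.CommandQueue(ctx)
prg = cl.Program(ctx, KERNEL).build()

n_train, n_targets, k = 10000, 10000, 5
train = np.random.rand(n_train, 2).astype(np.float32)
targets = np.random.rand(n_targets, 2).astype(np.float32)

mf = cl.mem_flags
d_train = cl.Buffer(ctx, mf.READ_ONLY | mf.COPY_HOST_PTR, hostbuf=train)
d_targets = cl.Buffer(ctx, mf.READ_ONLY | mf.COPY_HOST_PTR, hostbuf=targets)
knn = np.empty((n_targets, k), dtype=np.float32)
d_knn = cl.Buffer(ctx, mf.WRITE_ONLY, knn.nbytes)

# one work item per target point
prg.find_knn(queue, (n_targets,), None,
             d_train, d_targets, d_knn, np.int32(n_train))
cl.enqueue_copy(queue, knn, d_knn).wait()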
Our problem is the following: everything works fine for small data sets,
and the results are as expected on both the GPU (GeForce GTX 650 with
NVIDIA driver 313.09) and the CPU (Intel Core i5-3450 with AMD APP SDK),
running Ubuntu 12.10, PyOpenCL 2013.1-py2.7-linux-x86_64.
But if we increase the size of the data sets, the GPU version crashes
with the following error:
> File "brutegpu.py", line 65, in query
> cl.enqueue_copy(self.queue, d_min, self.d_min_buf).wait()
> File "/usr/local/lib/python2.7/dist-packages/
> pyopencl-2013.1-py2.7-linux-x86_64.egg/pyopencl/__init__.py",
> line 935, in enqueue_copy
> return _cl._enqueue_read_buffer(queue, src, dest, **kwargs)
> pyopencl.LogicError: clEnqueueReadBuffer failed: invalid command queue
The CPU version still works fine with 1 million training points
and 1 million test points. Attached you can find the corresponding
source code as a working minimal example, which consists of one
host Python file and one OpenCL kernel file.
We would highly appreciate any help - maybe we made a
mistake which is already known to you.
So the big question for us is: Why is it working on the CPU, and why isn't it
working on the GPU?
Are there NVIDIA-specific pitfalls for such big data sets?
The compiler says:
> ptxas info : Compiling entry function 'find_knn' for 'sm_30'
> ptxas info : Function properties for find_knn
> 0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
> ptxas info : Used 17 registers, 336 bytes cmem[0], 4 bytes cmem[3]
Or are there any rules for using a kernel on big data sets, such as how to
choose the work-group sizes or how much memory may be used?
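For reference, a small sketch (not from the attached code) of how the relevant
device limits can be queried with PyOpenCL:

import pyopencl as cl

ctx = cl.create_some_context()
for dev in ctx.devices:
    print dev.name
    print "  max work-group size:", dev.max_work_group_size
    print "  max mem alloc size :", dev.max_mem_alloc_size
    print "  global mem size    :", dev.global_mem_size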
The error message "invalid command queue" is confusing, and I wasn't able
to find any helpful information (except that oftentimes "invalid command
queue" means a segfault, but I could not find any wrong array address yet).
Maybe one of you could have a look at our code and find some silly
mistake.
We would be very grateful for any hint.
Best regards,
Justin Heinermann,
University Oldenburg
Dear Python/OpenCL community,
I am pretty new to (py)opencl and encountered a problem; maybe it is a lack of
understanding of OpenCL, but I found strange Python seg-faults with this test
program:
#!/usr/bin/python
import numpy, pyopencl

ctx = pyopencl.create_some_context()
data = numpy.random.random((1024, 1024)).astype(numpy.float32)
img = pyopencl.image_from_array(ctx, ary=data, mode="r", norm_int=False, num_channels=1)
print img
System: Debian sid, pyopencl 2012.1 (the same code works with Debian stable and v2011.2).
Here is the backtrace obtained with GDB:
0x0000000000000000 in ?? ()
(gdb) bt
#0 0x0000000000000000 in ?? ()
#1 0x00007ffff340c253 in pyopencl::create_image_from_desc(pyopencl::context const&, unsigned long, _cl_image_format const&, _cl_image_desc&, boost::python::api::object) () from /usr/lib/python2.7/dist-packages/pyopencl/_cl.so
#2 0x00007ffff342de36 in _object* boost::python::detail::invoke<boost::python::detail::install_holder<pyopencl::image*>, pyopencl::image* (*)(pyopencl::context const&, unsigned long, _cl_image_format const&, _cl_image_desc&, boost::python::api::object), boost::python::arg_from_python<pyopencl::context const&>, boost::python::arg_from_python<unsigned long>, boost::python::arg_from_python<_cl_image_format const&>, boost::python::arg_from_python<_cl_image_desc&>, boost::python::arg_from_python<boost::python::api::object> >(boost::python::detail::invoke_tag_<false, false>, boost::python::detail::install_holder<pyopencl::image*> const&, pyopencl::image* (*&)(pyopencl::context const&, unsigned long, _cl_image_format const&, _cl_image_desc&, boost::python::api::object), boost::python::arg_from_python<pyopencl::context const&>&, boost::python::arg_from_python<unsigned long>&, boost::python::arg_from_python<_cl_image_format const&>&, boost::python::arg_from_python<_cl_image_desc&>&, boost::python::arg_from_python<boost::python::api::object>&) () from /usr/lib/python2.7/dist-packages/pyopencl/_cl.so
#3 0x00007ffff342e06f in boost::python::detail::caller_arity<5u>::impl<pyopencl::image* (*)(pyopencl::context const&, unsigned long, _cl_image_format const&, _cl_image_desc&, boost::python::api::object), boost::python::detail::constructor_policy<boost::python::default_call_policies>, boost::mpl::vector6<pyopencl::image*, pyopencl::context const&, unsigned long, _cl_image_format const&, _cl_image_desc&, boost::python::api::object> >::operator()(_object*, _object*) ()
from /usr/lib/python2.7/dist-packages/pyopencl/_cl.so
#4 0x00007ffff311715b in boost::python::objects::function::call(_object*, _object*) const ()
from /usr/lib/libboost_python-py27.so.1.49.0
#5 0x00007ffff3117378 in ?? () from /usr/lib/libboost_python-py27.so.1.49.0
#6 0x00007ffff3120593 in boost::python::detail::exception_handler::operator()(boost::function0<void> const&) const ()
from /usr/lib/libboost_python-py27.so.1.49.0
#7 0x00007ffff3445983 in boost::detail::function::function_obj_invoker2<boost::_bi::bind_t<bool, boost::python::detail::translate_exception<pyopencl::error, void (*)(pyopencl::error const&)>, boost::_bi::list3<boost::arg<1>, boost::arg<2>, boost::_bi::value<void (*)(pyopencl::error const&)> > >, bool, boost::python::detail::exception_handler const&, boost::function0<void> const&>::invoke(boost::detail::function::function_buffer&, boost::python::detail::exception_handler const&, boost::function0<void> const&) () from /usr/lib/python2.7/dist-packages/pyopencl/_cl.so
#8 0x00007ffff3120373 in boost::python::handle_exception_impl(boost::function0<void>) ()
from /usr/lib/libboost_python-py27.so.1.49.0
#9 0x00007ffff3115635 in ?? () from /usr/lib/libboost_python-py27.so.1.49.0
Thanks for your help.
If you are not able to reproduce this bug, I should report it to Debian.
Cheers,
--
Jérôme Kieffer
Data analysis unit - ESRF
Dear Andreas,
I am currently working on a Cython-based wrapper for the OpenCL FFT library from AMD: https://github.com/geggo/gpyfft
For this I need to create a pyopencl Event instance from a cl_event returned by the library. I attached a patch against recent pyopencl that adds this possibility, similar to the from_cl_mem_as_int() method of the MemoryObject class. Could you please add this to pyopencl?
Thanks for your help
Gregor
Sorry if there are two copies of this message.
I sent it to the list but received no confirmation
(nor any error), and I checked that the archive does not show
any message from January.
I can see that there is already a new version (2013.1) in the docs,
marked "in development". I would like it not to be released
before the problems with the parallel prefix scan are fixed.
The problems with scan are only visible on the APU Loveland. They do not
occur on ION, nor on a GTX 460. I do not have access to a machine
with an NVIDIA CC 3.x device, so I cannot test the prefix scan there.
I first encountered these problems in August and mentioned them in an
email to the list on 2012-08-08 ("Python3 test failures").
Only recently have I had some time and energy to look more closely into them.
Tests still fail on the recent git version c31944d1e81a.
The failing tests are now in test_algorithm.py, in the third group (marked
scan-related, starting at line 418). I'll describe my observations
of the test_scan function.
My APU has 2 Compute Units. GenericScanKernel chooses
k_group_size to be 4096, max_scan_wg_size to be 256,
and max_intervals to be 6.
The first error occurs when there is enough work to fill two Compute
Units - in my case 2**12+5 elements. It looks like there is a problem with
passing the partial result from the computations on the first CU to the
second one. The prefix sum is computed correctly on the second half of the
array, but it starts from the wrong value. I have printed the
interval_results array and observed that the error (the difference between
the correct value of the interval's first element and the actual one) is not
the value of any of the elements of interval_results, nor is it the
difference between interval_results elements. On the other hand, the
difference between the actual and expected value is similar (i.e. in the
same range) to the difference between interval_results[4] and
interval_results[3]. In the test I have run just now the error is 10724571
and the difference is 10719275; I am not sure if this is relevant, though.
The errors are not repeatable - sometimes they occur for small arrays
(e.g. for 2**12+5), sometimes for larger ones (the test I ran
just now failed for ExclusiveScan of size 2**24+5). The test
failures also depend on the order of the tests - after changing the order
of the elements of the array scan_test_counts I got failures for different
sizes, but always for sizes larger than 2**12. It might be
some race condition, but I do not understand the new scan fully
and cannot point my finger at one place.
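For what it's worth, a minimal stand-alone prefix sum along the lines of
test_scan, at the failing size mentioned above, would look roughly like this
(my own reconstruction, not the actual test code):

import numpy as np
import pyopencl as cl
import pyopencl.array as cl_array
from pyopencl.scan import GenericScanKernel

ctx = cl.create_some_context()
queue = cl.CommandQueue(ctx)

n = 2**12 + 5
host = np.random.randint(0, 100, n).astype(np.int32)
dev = cl_array.to_device(queue, host)

# in-place inclusive prefix sum
knl = GenericScanKernel(
    ctx, np.int32,
    arguments="__global int *ary",
    input_expr="ary[i]",
    scan_expr="a+b", neutral="0",
    output_statement="ary[i] = item;")

knl(dev, queue=queue)
assert (dev.get() == np.cumsum(host)).all()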
If there is any additional test I can perform, please let me know.
I'll try to investigate it further, but I am not sure whether
I'll get anywhere.
Best regards.
--
Tomasz Rybak GPG/PGP key ID: 2AD5 9860
Fingerprint A481 824E 7DD3 9C0E C40A 488E C654 FB33 2AD5 9860
http://member.acm.org/~tomaszrybak
Hi Jonny,
Jonathan Hunt <hunt(a)braincorporation.com> writes:
> I'd be happy to test for you on Apple (10.8.3 with NVIDIA GeForce GT 650M
> 1024 MB / i7). However, I'm not quite sure how to run the tests (I did
> look, but I missed any existing documentation).
>
> I installed the latest pyopencl in a virtualenv, but running test_algorithm.py
> spits out:
> "platform darwin -- Python 2.7.2 -- pytest-2.3.4
> collected 46 items
>
> test_algorithm.py .
> "
> and then appears to hang for quite some time (with no CPU load). It seems to
> be the same for the other test files.
>
> Pyopencl works fine for me in normal operation on this machine so I'm
> probably doing something wrong.
Thanks for offering. I've known that something fishy is going on with
PyOpenCL's tests on OS X for a while, but since I don't have access to a
Mac, I've never been able to debug it. I'll try to get access (hopefully
lasting enough to integrate into my continuous integration setup) and
figure out what's going on.
Nonetheless, I think (hope?) that this issue is separate from the AMD issue I
mentioned in my original email...
Andreas
Hi all,
I was wondering something about OpenCL's execution model.
Here's a quote from AMD's documentation [1], page 132:
Execution of kernel dispatches can overlap if there are no
dependencies between them and if there are resources available in
the GPU. This is critical when writing benchmarks: it is important
that the measurements are accurate and that "false dependencies" do
not cause unnecessary slowdowns. An example of a false dependency is:
a. Application creates a kernel “foo”.
b. Application creates input and output buffers.
c. Application binds input and output buffers to kernel “foo”.
d. Application repeatedly dispatches “foo” with the same parameters.
If the output data is the same each time, then this is a false dependency because
there is no reason to stall concurrent execution of dispatches. To avoid stalls,
use multiple output buffers. The number of buffers required to get peak
performance depends on the kernel.
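To make that concrete, here is a quick sketch of my own (kernel body and names
made up) of the "multiple output buffers" pattern in PyOpenCL terms:

import numpy as np
import pyopencl as cl

ctx = cl.create_some_context()
queue = cl.CommandQueue(ctx)
mf = cl.mem_flags

prg = cl.Program(ctx, """
__kernel void foo(__global const float *src, __global float *dst)
{
    int gid = get_global_id(0);
    dst[gid] = 2.0f * src[gid];
}
""").build()

n = 1024 * 1024
host_in = np.random.rand(n).astype(np.float32)
d_in = cl.Buffer(ctx, mf.READ_ONLY | mf.COPY_HOST_PTR, hostbuf=host_in)

# Two output buffers used round-robin, so (per the AMD text) dispatch i+1
# does not have to wait for whatever still depends on dispatch i's output.
d_out = [cl.Buffer(ctx, mf.WRITE_ONLY, host_in.nbytes) for _ in range(2)]

for i in range(8):
    prg.foo(queue, (n,), None, d_in, d_out[i % 2])
queue.finish()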
Now, I thought OpenCL would only look at the events passed to wait_for when
determining which kernels are allowed to run concurrently with which other
kernels. The text above sounds as if some dependency information is also
inferred from which mem objects a kernel uses - in particular, that two
kernels aren't allowed to write to the same buffer at the same time.
Is that AMD-specific, or is that part of the spec?
I'd be grateful for any clues.
Thanks!
Andreas
[1] http://developer.amd.com/download/AMD_Accelerated_Parallel_Processing_OpenC…
Hi all,
I'd like a show of hands of the number of people who are using or have
somewhat recently used PyOpenCL on Python 2.4 or 2.5. I'm not dropping
support for these versions just yet, but might at some point. For now,
I'm just looking for data.
Thanks!
Andreas
Hi all,
if you've used any of the things in pyopencl.algorithm (from git; they have
not yet been in a release), here's an important heads-up:
I've made an incompatible change to these interfaces to make sure they
support wait_for arguments and return events. In some cases, this means
that result tuples have a different length now. (Yes, those should be
namedtuple()s, I know. Watch for a related post.)
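For illustration, the pattern now looks roughly like this (a sketch using
copy_if; the particular arguments here are made up):

import numpy as np
import pyopencl as cl
import pyopencl.array as cl_array
from pyopencl.algorithm import copy_if

ctx = cl.create_some_context()
queue = cl.CommandQueue(ctx)

ary = cl_array.arange(queue, 1000, dtype=np.int32)

# wait_for is accepted, and the result tuple now carries an event
out, count, evt = copy_if(ary, "ary[i] > 500", wait_for=None)
evt.wait()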
If this affects you, I'm sorry.
Andreas
That's strange.
I attach the full list of packages installed on my machine, which runs OpenCL
on an NVIDIA GTX 460.
Some packages are for CUDA (I've installed the CUDA SDK packages from
experimental), some for AMD OpenCL (run on the CPU).
I know it is possible to run OpenCL (and PyOpenCL) on Debian
(with contrib and non-free enabled) without needing to install
non-packaged software.
To have working OpenCL you need:
1. An ICD management library - it loads OpenCL implementations.
   I recommend ocl-icd-libopencl1, which is open source and works
   without problems with various OpenCL implementations. I had some
   problems with the NVIDIA ICD loader.
2. An OpenCL implementation - for you it is nvidia-opencl-icd.
   Its version depends on the installed driver, so if you are using
   a driver from experimental you should install nvidia-opencl-icd
   from experimental, and so on.
3. As you are writing on this list - PyOpenCL ;-)
   You can try to install the packaged pyopencl (python-pyopencl
   or python3-pyopencl, depending on your needs). It should
   report any missing dependencies.
My advice: install packaged software. Mixing .debs with
the official NVIDIA drivers can cause problems. I know that Debian
contains a not-so-recent PyOpenCL, but we are in a freeze now,
so no new versions of packages are being accepted.
Regarding the lack of symbols - I also do not have any symbols
in libnvidia-opencl.so. As for AMD:
$ nm /usr/lib/x86_64-linux-gnu/libamdocl64.so
nm: /usr/lib/x86_64-linux-gnu/libamdocl64.so: no symbols
and clinfo shows both platforms.
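Once everything is installed, a quick check from Python (just a suggestion)
is to list the platforms and devices that PyOpenCL can see:

import pyopencl as cl

for platform in cl.get_platforms():
    print platform.name, [dev.name for dev in platform.get_devices()]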
Best regards.
--
Tomasz Rybak GPG/PGP key ID: 2AD5 9860
Fingerprint A481 824E 7DD3 9C0E C40A 488E C654 FB33 2AD5 9860
http://member.acm.org/~tomaszrybak
Hi James,
James Bergstra <james.bergstra(a)gmail.com> writes:
> I've come across a similar issue, I think: I'm on Debian unstable and I
> haven't seen my GTX 280 show up in the platforms or devices yet. I've tried
> all of these, I think:
>
> nvidia-opencl-dev - NVIDIA OpenCL development files
> nvidia-libopencl1 - NVIDIA OpenCL library
> nvidia-libopencl1-ia32 - please switch to multiarch nvidia-libopencl1:i386
> nvidia-opencl-common - NVIDIA OpenCL driver
> nvidia-opencl-icd - NVIDIA OpenCL ICD
> nvidia-opencl-icd-ia32 - please switch to multiarch nvidia-opencl-icd:i386
> nvidia-libopencl1-dev - NVIDIA OpenCL development files
>
> I've also tried installing drivers from the official cuda 5 distribution
> (the one not using .deb files from nvidia's website).
>
> I feel like I'm stumped on what filename I should put into the nvidia.icd
> file in /etc/OpenCL/vendors.
It's either just the file name of the .so (which is then searched for using
/etc/ld.so.conf and LD_LIBRARY_PATH) or the full absolute path of the .so.
> The nvidia-opencl-common package sets up nvidia.icd to point to:
> libnvidia-opencl.so.1
>
> The only matching library I can find is:
> /usr/lib/x86_64-linux-gnu/libnvidia-opencl.so.1
>
> This library has no symbols:
> $ nm /usr/lib/x86_64-linux-gnu/libnvidia-opencl.so.304.88
> 0000000000a5a190 b cudbgApiClientPid
> 0000000000a5a194 b cudbgApiClientRevision
> 0000000000130eb0 t cudbgApiInit
> 0000000000a3b744 b cudbgDebuggerInitialized
> 0000000000a5a164 b cudbgIpcFlag
> 0000000000a5a170 b cudbgRpcEnabled
> 0000000000a5a160 b cudbgSessionId
That's normal, I think.
$ nm /usr/lib/x86_64-linux-gnu/libamdocl64.so
nm: /usr/lib/x86_64-linux-gnu/libamdocl64.so: no symbols
But:
$ nm -D /usr/lib/x86_64-linux-gnu/libamdocl64.so | egrep T
00000000002d5d80 T clBuildProgram
00000000002cdde0 T clCreateBuffer
00000000002bee60 T clCreateCommandQueue
00000000002bfa60 T clCreateContext
00000000002bfda0 T clCreateContextFromType
(snip)
> Perhaps unsurprisingly, no nvidia platform appears when I run clinfo. When
> I install the amd opencl package, the icd file points to a library that has
> cl* symbols, and the AMD platform reveals my CPU as a device. All good.
>
> What do I have to do to get my nvidia card to be visible as an opencl
> device?
Do you have access to /dev/nvi*? Run your code under strace and see if it
open()s the library and/or the /dev/nvi* device files.
HTH,
Andreas