Dear Python/OpenCL community,
I am pretty new to (py)opencl and encountered a problem; maybe it is
just a lack of understanding of OpenCL on my part, but I ran into a
strange Python seg-fault with the following test program:
#!/usr/bin/python
import numpy, pyopencl
ctx = pyopencl.create_some_context()
data=numpy.random.random((1024,1024)).astype(numpy.float32)
img = pyopencl.image_from_array(ctx, ary=data, mode="r", norm_int=False, num_channels=1)
print img
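For comparison, the same image can also be created through the lower-level
pyopencl.Image constructor. This is only a sketch (it assumes the device
supports single-channel float images), and I have not checked whether this
path avoids the crash:
import numpy, pyopencl
ctx = pyopencl.create_some_context()
data = numpy.random.random((1024, 1024)).astype(numpy.float32)
fmt = pyopencl.ImageFormat(pyopencl.channel_order.R,
                           pyopencl.channel_type.FLOAT)
mf = pyopencl.mem_flags
# shape is (width, height); the array is square here, so order does not matter
img = pyopencl.Image(ctx, mf.READ_ONLY | mf.COPY_HOST_PTR, fmt,
                     shape=data.shape[::-1], hostbuf=data)
print img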
System: Debian sid, pyopencl 2012.1 (the same code works on Debian
stable with v2011.2)
Here is the backtrace obtained with GDB:
0x0000000000000000 in ?? ()
(gdb) bt
#0 0x0000000000000000 in ?? ()
#1 0x00007ffff340c253 in pyopencl::create_image_from_desc(pyopencl::context const&, unsigned long, _cl_image_format const&, _cl_image_desc&, boost::python::api::object) () from /usr/lib/python2.7/dist-packages/pyopencl/_cl.so
#2 0x00007ffff342de36 in _object* boost::python::detail::invoke<boost::python::detail::install_holder<pyopencl::image*>, pyopencl::image* (*)(pyopencl::context const&, unsigned long, _cl_image_format const&, _cl_image_desc&, boost::python::api::object), boost::python::arg_from_python<pyopencl::context const&>, boost::python::arg_from_python<unsigned long>, boost::python::arg_from_python<_cl_image_format const&>, boost::python::arg_from_python<_cl_image_desc&>, boost::python::arg_from_python<boost::python::api::object> >(boost::python::detail::invoke_tag_<false, false>, boost::python::detail::install_holder<pyopencl::image*> const&, pyopencl::image* (*&)(pyopencl::context const&, unsigned long, _cl_image_format const&, _cl_image_desc&, boost::python::api::object), boost::python::arg_from_python<pyopencl::context const&>&, boost::python::arg_from_python<unsigned long>&, boost::python::arg_from_python<_cl_image_format const&>&, boost::python::arg_from_python<_cl_image_desc&>&, boost::python::arg_from_python<boost::python::api::object>&) () from /usr/lib/python2.7/dist-packages/pyopencl/_cl.so
#3 0x00007ffff342e06f in boost::python::detail::caller_arity<5u>::impl<pyopencl::image* (*)(pyopencl::context const&, unsigned long, _cl_image_format const&, _cl_image_desc&, boost::python::api::object), boost::python::detail::constructor_policy<boost::python::default_call_policies>, boost::mpl::vector6<pyopencl::image*, pyopencl::context const&, unsigned long, _cl_image_format const&, _cl_image_desc&, boost::python::api::object> >::operator()(_object*, _object*) ()
from /usr/lib/python2.7/dist-packages/pyopencl/_cl.so
#4 0x00007ffff311715b in boost::python::objects::function::call(_object*, _object*) const ()
from /usr/lib/libboost_python-py27.so.1.49.0
#5 0x00007ffff3117378 in ?? () from /usr/lib/libboost_python-py27.so.1.49.0
#6 0x00007ffff3120593 in boost::python::detail::exception_handler::operator()(boost::function0<void> const&) const ()
from /usr/lib/libboost_python-py27.so.1.49.0
#7 0x00007ffff3445983 in boost::detail::function::function_obj_invoker2<boost::_bi::bind_t<bool, boost::python::detail::translate_exception<pyopencl::error, void (*)(pyopencl::error const&)>, boost::_bi::list3<boost::arg<1>, boost::arg<2>, boost::_bi::value<void (*)(pyopencl::error const&)> > >, bool, boost::python::detail::exception_handler const&, boost::function0<void> const&>::invoke(boost::detail::function::function_buffer&, boost::python::detail::exception_handler const&, boost::function0<void> const&) () from /usr/lib/python2.7/dist-packages/pyopencl/_cl.so
#8 0x00007ffff3120373 in boost::python::handle_exception_impl(boost::function0<void>) ()
from /usr/lib/libboost_python-py27.so.1.49.0
#9 0x00007ffff3115635 in ?? () from /usr/lib/libboost_python-py27.so.1.49.0
Thanks for your help.
If you are not able to reproduce this bug, I should probably report it
to Debian instead.
Cheers,
--
Jérôme Kieffer
Data analysis unit - ESRF
Dear Andreas,
I am currently working on a Cython-based wrapper for the OpenCL FFT
library from AMD: https://github.com/geggo/gpyfft
For this I need to create a pyopencl Event instance from a cl_event
returned by the library. I have attached a patch against recent
pyopencl that adds this possibility, similar to the
from_cl_mem_as_int() method of the MemoryObject class. Could you
please add this to pyopencl?
Thanks for your help
Gregor
Sorry if there are two copies of this message.
I sent it to the list but received no confirmation (nor any error),
and I checked that the archive does not show any messages from
January.
I can see that there is already a new version (2013.1) in the docs,
marked "in development". I would like it not to be released before
the problems with the parallel prefix scan are fixed.
The scan problems are only visible on the APU (Loveland). They do not
occur on ION, nor on a GTX 460. I do not have access to a machine
with an NVIDIA CC 3.x card, so I cannot test the prefix scan there.
I first encountered these failures in August and mentioned them in an
email to the list from 2012-08-08 ("Python3 test failures"). Only
recently have I had the time and eagerness to look at them more
closely.
Tests still fail on recent git version c31944d1e81a.
The failing tests are now in test_algorithm.py, in the third group
(marked scan-related, starting at line 418). I'll describe my
observations of the test_scan function.
My APU has 2 compute units. GenericScanKernel chooses k_group_size
to be 4096, max_scan_wg_size to be 256, and max_intervals to be 6.
The first error occurs when there is enough work to fill two compute
units - in my case 2**12+5 elements. It looks like there is a problem
with passing the partial result from the computations on the first CU
to the second one. The prefix sum is computed correctly on the second
half of the array, but it starts from the wrong value. I have printed
the interval_results array and observed that the error (the difference
between the correct value of the interval's first element and the
actual one) is not the value of any element of interval_results, nor
is it the difference between two interval_results elements. On the
other hand, the difference between the actual and expected value is
similar (i.e. in the same range) to the difference between
interval_results[4] and interval_results[3]. In the test I have just
run, the error is 10724571 and that difference is 10719275; I am not
sure whether this is relevant, though.
The errors are not repeatable - sometimes they occur for small arrays
(e.g. for 2**12+5), sometimes for larger ones (the test I have just
run failed for an ExclusiveScan of size 2**24+5). The failures also
depend on the order of the tests - after changing the order of the
elements of the scan_test_counts array I got failures for different
sizes, but always for sizes larger than 2**12. It might be some race
condition, but I do not understand the new scan fully and cannot
point my finger at one place.
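In case it helps to reproduce, here is a minimal stand-alone sketch
along the lines of what test_scan exercises (a simple sum scan only;
the real tests drive GenericScanKernel through test_algorithm.py):
import numpy as np
import pyopencl as cl
import pyopencl.array as cl_array
from pyopencl.scan import InclusiveScanKernel

ctx = cl.create_some_context()
queue = cl.CommandQueue(ctx)

n = 2**12 + 5  # first size where I see the failure
host_data = np.random.randint(0, 100, n).astype(np.int32)
dev_data = cl_array.to_device(queue, host_data)

knl = InclusiveScanKernel(ctx, np.int32, "a+b", neutral="0")
knl(dev_data)  # scans in place on the device

assert (dev_data.get() == np.cumsum(host_data)).all(), "scan mismatch"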
If there is any additional test I can perform please let me know.
I'll try to investigate it further, but I am not sure whether I'll
get anywhere.
Best regards.
--
Tomasz Rybak GPG/PGP key ID: 2AD5 9860
Fingerprint A481 824E 7DD3 9C0E C40A 488E C654 FB33 2AD5 9860
http://member.acm.org/~tomaszrybak
Hi
Using Python 3, sys.platform returns "linux" on my machine (whereas
Python 2 returns "linux2"). The code in tools.py checks only for linux2.
After I patched this line
if sys.platform == "linux2":
to
if sys.platform == "linux2" or sys.platform == "linux":
everything worked out nicely in my project. Was this simply forgotten or
is there a particular reason that get_gl_sharing_context_properties is
not supported on Python3/linux?
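For reference, a startswith-based check would cover both values (just a
sketch of the alternative; I don't know whether you would prefer this
for tools.py):
import sys

# "linux2" on Python 2, "linux" on Python 3.3 and later
if sys.platform.startswith("linux"):
    pass  # Linux-specific GL sharing properties go here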
Best, Marko
Hello again,
I checked again, and I need to reduce the number of threads I run by a
factor of 4 to avoid an "out of memory" error. This seems very strange,
since the idea is that I want to reuse the same memory for the seeds
etc. when running the initialization kernel as when running my main
kernel. Is there something wrong in my kernel invocation?
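One thing I noticed while re-reading my host code (quoted below),
though I am not sure it is related: OpenCL only accepts a host pointer
to clCreateBuffer together with COPY_HOST_PTR or USE_HOST_PTR, so my
mf.READ_WRITE-only call with hostbuf might not be doing what I think.
A sketch of the two variants I could use instead (nbrOfThreads is a
placeholder here):
import numpy as np
import pyopencl as cl

ctx = cl.create_some_context()
mf = cl.mem_flags

nbrOfThreads = 1024  # placeholder; in my code this is the global size
# 28 uint32 words of ranluxcl state per work-item, matching dummyBuffer
state_bytes = nbrOfThreads * 28 * np.dtype(np.uint32).itemsize

# Variant 1: let the driver allocate the state table, size only
ranluxcltab = cl.Buffer(ctx, mf.READ_WRITE, size=state_bytes)

# Variant 2: copy the zeroed host array in; COPY_HOST_PTR must accompany hostbuf
dummyBuffer = np.zeros(nbrOfThreads * 28, dtype=np.uint32)
ranluxcltab = cl.Buffer(ctx, mf.READ_WRITE | mf.COPY_HOST_PTR,
                        hostbuf=dummyBuffer)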
Cheers, Calle
On Mon, Jan 28, 2013 at 3:40 PM, Calle Snickare <problembarnet(a)gmail.com> wrote:
> Hello,
> I am currently trying to implement Ranlux in one of my programs. My kernel
> will be re-run several times with the same seeds, so I don't want to
> include the Ranlux initialization in it as I only want to do this once
> (right?). I also want to make sure to use the same memory between the runs.
> So I figure I can solve this by having two kernels: one that
> initializes Ranlux (run once at the beginning) and my "main" kernel.
> Both will be written in the same C file.
>
> Here is some of the code. At first I had some strange errors getting it to
> work. Now I can get it to run, but it feels like it runs out of memory
> quicker than it should. Am I approaching this the wrong way?
>
>
> Host code:
> ctx = cl.create_some_context()
> queueProperties = cl.command_queue_properties.PROFILING_ENABLE
> queue = cl.CommandQueue(ctx, properties=queueProperties)
>
> mf = cl.mem_flags
> dummyBuffer = np.zeros(nbrOfThreads * 28, dtype=np.uint32)
> ins = cl.array.to_device(queue, (np.random.randint(0, high = 2 ** 31 - 1,
> size = (nbrOfThreads))).astype(np.uint32))
> ranluxcltab = cl.Buffer(ctx, mf.READ_WRITE, size=0, hostbuf=dummyBuffer)
>
> kernelCode_r = open(os.path.dirname(__file__) + 'ranlux_test_kernel.c',
> 'r').read()
> kernelCode = kernelCode_r % replacements
>
> prg = (cl.Program(ctx, kernelCode).build(options=programBuildOptions))
>
> kernel_init = prg.ranlux_init_kernel
> kernelObj_init = kernel_init(queue, globalSize, localSize, ins.data,
> ranluxcltab)
>
> kernelObj_init.wait()
>
> kernel = prg.ranlux_test_kernel
> kernelObj = kernel(queue, globalSize, localSize, ins.data, ranluxcltab)
> kernelObj.wait()
>
> Kernel Code:
> #pragma OPENCL EXTENSION cl_khr_fp64 : enable
> #define RANLUXCL_SUPPORT_DOUBLE
> #include "pyopencl-ranluxcl.cl" // Ranlux source-code
> #define RANLUXCL_LUX 4
>
> __kernel void ranlux_init_kernel(__global uint *ins, __global
> ranluxcl_state_t *ranluxcltab)
> {
> //ranluxclstate stores the state of the generator.
> ranluxcl_state_t ranluxclstate;
>
> ranluxcl_initialization(ins, ranluxcltab);
> }
>
> __kernel void ranlux_test_kernel(__global uint *ins, __global
> ranluxcl_state_t *ranluxcltab)
> {
> uint threadId = get_global_id(0) + get_global_id(1) *
> get_global_size(0);
>
> //ranluxclstate stores the state of the generator.
> ranluxcl_state_t ranluxclstate;
>
> //Download state into ranluxclstate struct.
> ranluxcl_download_seed(&ranluxclstate, ranluxcltab);
>
> double randomnr;
> randomnr = ranluxcl64(&ranluxclstate);
> /* DO STUFF */
>
>
> //Upload state again so that we don't get the same
> //numbers over again the next time we use ranluxcl.
> ranluxcl_upload_seed(&ranluxclstate, ranluxcltab);
> }
>
>
> Cheers,
> Calle
>
Hello guys,
I've been going crazy trying to build PyOpenCL in Fedora 18.
I'm getting 2 different sets of errors. I'll attach the build commands for both.
I tried reporting the bug to a bugzilla or similar tracker but didn't
find one; sorry if I missed it.
I appreciate your help.
Commands used:
# make clean; rm siteconf.py; git checkout v2012.1; \
  git submodule init; git submodule update; \
  ./configure.py --cl-inc-dir=$AMDAPPSDKROOT/include --cl-lib-dir=$AMDAPPSDKROOT/lib/x86_64; \
  make &> 2012.1_build.log
rm -Rf build
rm -f tags
M bpl-subset
M pyopencl/compyte
Previous HEAD position was dcd70eb... Ignore distribute tarballs.
HEAD is now at 8c46f4b... Bump version.
Submodule path 'bpl-subset': checked out 'b193f4230411e3ef5557ebee908453f38a205b43'
Submodule path 'pyopencl/compyte': checked out '389cf828b67bdddc83afed6d79bd448076432ec6'
# make clean; rm siteconf.py; git checkout v2011.1; \
  git submodule init; git submodule update; \
  ./configure.py --cl-inc-dir=$AMDAPPSDKROOT/include --cl-lib-dir=$AMDAPPSDKROOT/lib/x86_64; \
  make &> 2011.1_build.log
rm -Rf build
rm -f tags
M bpl-subset
M pyopencl/compyte
Previous HEAD position was 8c46f4b... Bump version.
HEAD is now at a7666aa... Merge branch 'master' of t:src/pyopencl
Submodule path 'bpl-subset': checked out '85bf4439e019df899a2a659fff529b11ea37270d'
Submodule path 'pyopencl/compyte': checked out '52aecae2c0019caa81342ab79b47f60601a6a1b1'
# make clean; rm siteconf.py; git checkout v0.92; \
  git submodule init; git submodule update; \
  ./configure.py --cl-inc-dir=$AMDAPPSDKROOT/include --cl-lib-dir=$AMDAPPSDKROOT/lib/x86_64; \
  make &> 0.92_build.log
rm -Rf build
rm -f tags
warning: unable to rmdir pyopencl/compyte: Directory not empty
M bpl-subset
Previous HEAD position was a7666aa... Merge branch 'master' of t:src/pyopencl
HEAD is now at dcd70eb... Ignore distribute tarballs.
Submodule path 'bpl-subset': checked out '19b13cae1f63e75dbf18b90020e4877639ab9f0e'
--
It's hard to be free... but I love to struggle. Love isn't asked for;
it's just given. Respect isn't asked for; it's earned!
Renich Bon Ciric
http://www.woralelandia.com/
http://www.introbella.com/
Dear OpenCL users,
I am looking for a way to find the best device in a computer so that I
can select it for processing.
PyOpenCL gives me max_clock_frequency and max_compute_units for each
device. Nice!
Unfortunately, on a dual-Xeon 5520 + Fermi machine, the product
max_clock_frequency * max_compute_units favours the CPU, but the GPU
is clearly faster!
I have calculated the FLOPS per compute unit per Hz for a few devices
and I got:
NVidia Fermi (GTX580): 64 FLOPS/Unit/Hz
NVidia Tesla (GT285): 24 FLOPS/Unit/Hz
NVidia GT9600: 24 FLOPS/Unit/Hz
Intel CPU: 4 FLOPS/Unit/Hz (I usually get less)
According to some readings on the web, Kepler cards should reach
384 FLOPS/Unit/Hz (unchecked).
I have no figures for AMD cards and would be interested in getting
some; I would also like to be able to discriminate between the various
NVidia generations within pyopencl (via compute_capability_major_nv &
compute_capability_minor_nv?).
Any ideas are welcome.
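For the record, here is the kind of heuristic I have in mind. Only a
sketch: the per-architecture figures are the unchecked numbers above,
and compute_capability_major_nv only works where the
cl_nv_device_attribute_query extension is exposed.
import pyopencl as cl

def flops_per_cu_per_hz(dev):
    # Rough, per-architecture guesses; these are the unchecked figures above.
    if dev.type & cl.device_type.GPU:
        if "NVIDIA" in dev.vendor.upper():
            try:
                cc = (dev.compute_capability_major_nv,
                      dev.compute_capability_minor_nv)
            except (cl.LogicError, AttributeError):
                return 24   # extension not exposed: assume an older part
            if cc >= (3, 0):
                return 384  # Kepler (unchecked)
            if cc >= (2, 0):
                return 64   # Fermi
            return 24       # Tesla generation and older
        return 64           # AMD and others: no figures yet, pure guess
    return 4                # CPUs

def peak_score(dev):
    # max_clock_frequency is reported in MHz; fine for ranking purposes
    return (dev.max_compute_units * dev.max_clock_frequency
            * flops_per_cu_per_hz(dev))

devices = [d for p in cl.get_platforms() for d in p.get_devices()]
best = max(devices, key=peak_score)
print("best guess:", best.name.strip())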
Cheers,
--
Jérôme Kieffer
Data analysis unit - ESRF
Hello,
I am currently trying to implement Ranlux in one of my programs. My kernel
will be re-run several times with the same seeds, so I don't want to
include the Ranlux initialization in it as I only want to do this once
(right?). I also want to make sure to use the same memory between the runs.
So I figure I can solve this by having two kernels: one that
initializes Ranlux (run once at the beginning) and my "main" kernel.
Both will be written in the same C file.
Here is some of the code. At first I had some strange errors getting it to
work. Now I can get it to run, but it feels like it runs out of memory
quicker than it should. Am I approaching this the wrong way?
Host code:
ctx = cl.create_some_context()
queueProperties = cl.command_queue_properties.PROFILING_ENABLE
queue = cl.CommandQueue(ctx, properties=queueProperties)
mf = cl.mem_flags
dummyBuffer = np.zeros(nbrOfThreads * 28, dtype=np.uint32)
ins = cl.array.to_device(queue, (np.random.randint(0, high = 2 ** 31 - 1,
size = (nbrOfThreads))).astype(np.uint32))
ranluxcltab = cl.Buffer(ctx, mf.READ_WRITE, size=0, hostbuf=dummyBuffer)
kernelCode_r = open(os.path.dirname(__file__) + 'ranlux_test_kernel.c',
'r').read()
kernelCode = kernelCode_r % replacements
prg = (cl.Program(ctx, kernelCode).build(options=programBuildOptions))
kernel_init = prg.ranlux_init_kernel
kernelObj_init = kernel_init(queue, globalSize, localSize, ins.data,
ranluxcltab)
kernelObj_init.wait()
kernel = prg.ranlux_test_kernel
kernelObj = kernel(queue, globalSize, localSize, ins.data, ranluxcltab)
kernelObj.wait()
Kernel Code:
#pragma OPENCL EXTENSION cl_khr_fp64 : enable
#define RANLUXCL_SUPPORT_DOUBLE
#include "pyopencl-ranluxcl.cl" // Ranlux source-code
#define RANLUXCL_LUX 4
__kernel void ranlux_init_kernel(__global uint *ins, __global
ranluxcl_state_t *ranluxcltab)
{
//ranluxclstate stores the state of the generator.
ranluxcl_state_t ranluxclstate;
ranluxcl_initialization(ins, ranluxcltab);
}
__kernel void ranlux_test_kernel(__global uint *ins, __global
ranluxcl_state_t *ranluxcltab)
{
uint threadId = get_global_id(0) + get_global_id(1) *
get_global_size(0);
//ranluxclstate stores the state of the generator.
ranluxcl_state_t ranluxclstate;
//Download state into ranluxclstate struct.
ranluxcl_download_seed(&ranluxclstate, ranluxcltab);
double randomnr;
randomnr = ranluxcl64(&ranluxclstate);
/* DO STUFF */
//Upload state again so that we don't get the same
//numbers over again the next time we use ranluxcl.
ranluxcl_upload_seed(&ranluxclstate, ranluxcltab);
}
Cheers,
Calle
Филипп Жинкин <xnerhx(a)gmail.com> writes:
> Is it correct that pyopencl.array.dot is not aimed to cover
> numpy.dot's functionality (since for matrices it does not perform a
> matrix multiplication the way numpy's dot does)?
> Are there any plans to add a default matrix-multiplication routine to
> PyOpenCL?
> Or maybe it's already there and I've just missed it? :)
It will happen at some point, but it's not there yet.
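In the meantime, a quick sketch of what array.dot currently computes,
so the difference from numpy.dot is visible (it reduces over the
elementwise product):
import numpy as np
import pyopencl as cl
import pyopencl.array as cl_array

ctx = cl.create_some_context()
queue = cl.CommandQueue(ctx)

a = np.random.rand(4, 4).astype(np.float32)
b = np.random.rand(4, 4).astype(np.float32)
a_dev = cl_array.to_device(queue, a)
b_dev = cl_array.to_device(queue, b)

# A single scalar: the sum over the elementwise product ...
print(cl_array.dot(a_dev, b_dev).get())
# ... which matches np.sum(a * b), not the matrix product np.dot(a, b).
print(np.sum(a * b))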
Andreas
Hi all,
Is it correct that pyopencl.array.dot is not aimed to cover numpy.dot's
functionality (since for matrices it does not perform a matrix
multiplication the way numpy's dot does)?
Are there any plans to add a default matrix-multiplication routine to
PyOpenCL?
Or maybe it's already there and I've just missed it? :)
Thanks,
Filipp.