Hey!
After many attempts, I was unable to package a project containing pyopencl
into a self-contained executable using cxfreeze. It gives me the error:
===============
File
"/usr/local/lib/python2.7/dist-packages/cx_Freeze/initscripts/Console.py",
line 27, in <module>
exec(code, m.__dict__)
File "test.py", line 4, in <module>
import pyopencl
File "/usr/local/lib/python2.7/dist-packages/pyopencl/__init__.py", line
79, in <module>
_DEFAULT_INCLUDE_OPTIONS = ["-I", _find_pyopencl_include_path()]
File "/usr/local/lib/python2.7/dist-packages/pyopencl/__init__.py", line
71, in _find_pyopencl_include_path
return resource_filename(Requirement.parse("pyopencl"), "pyopencl/cl")
File "/usr/local/lib/python2.7/dist-packages/pkg_resources.py", line 949,
in resource_filename
return get_provider(package_or_requirement).get_resource_filename(
File "/usr/local/lib/python2.7/dist-packages/pkg_resources.py", line 212,
in get_provider
return working_set.find(moduleOrReq) or require(str(moduleOrReq))[0]
File "/usr/local/lib/python2.7/dist-packages/pkg_resources.py", line 741,
in require
needed = self.resolve(parse_requirements(requirements))
File "/usr/local/lib/python2.7/dist-packages/pkg_resources.py", line 626,
in resolve
raise DistributionNotFound(req)
===================
This is what I get when freezing the single Python line
"import pyopencl"
into an executable. All the other packages on which pyopencl depends cause
no issue. The error can be worked around by manually copying
/usr/local/lib/python2.7/dist-packages/pyopencl-2014.1-py2.7.egg-info/
into the folder that contains the executable generated by cxfreeze. Changing
/usr/local/lib/python2.7/dist-packages/pyopencl/__init__.py:_find_pyopencl_include_path()
so that it returns '' also solves the issue. Both fixes look pretty hackish
to me, though, and I have no other idea of how to solve this properly.
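At minimum, the egg-info copy could be automated in a cx_Freeze setup script
rather than done by hand after every build. A sketch (the egg-info path and
version are taken from the traceback above and may differ on your system):

```python
# setup.py -- freeze test.py while shipping pyopencl's egg-info directory,
# which pkg_resources needs at runtime (sketch; paths are assumptions)
from cx_Freeze import setup, Executable

egg_info = "/usr/local/lib/python2.7/dist-packages/pyopencl-2014.1-py2.7.egg-info"

setup(
    name="test",
    version="0.1",
    options={"build_exe": {
        # (source, destination) pairs are copied next to the executable
        "include_files": [(egg_info, "pyopencl-2014.1-py2.7.egg-info")],
    }},
    executables=[Executable("test.py")],
)
```

This only automates the workaround; it does not address the underlying
reliance of _find_pyopencl_include_path() on pkg_resources metadata.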
The issue is simple to reproduce:

    pip install cxfreeze
    cd /tmp
    echo "import pyopencl" > test.py
    cxfreeze test.py
    ./dist/test
Any idea?
Best regards,
Philippe
Hi,
I had the opportunity to benchmark open source OpenCL drivers (POCL on the
CPU, Beignet on the GPU) against proprietary ones, and they behave very
well now!

Test computer: a MacBook Pro 13" with an Iris 5100 GPU integrated into the
Haswell processor (i5-4308U), running Debian Jessie (or Mac OS X).

The code used is described on pages 7-14 of this document:
http://pdebuyl.be/tmp/esp2014_draft.pdf

It consists of a map operation (cast and multiplications/divisions)
followed by a sparse-matrix dense-vector multiplication, implemented either
as an array of structs (a method called LUT, better suited to the CPU) or as
a struct of arrays (called CSR, better suited to the GPU). CSR is implemented
using parallel reduction within a workgroup. All OpenCL methods use single
precision floating point arithmetic with Kahan summation, while the OpenMP
code uses double precision arithmetic.

The benchmark is the execution time in milliseconds of the complete
treatment for input images of various sizes (from 1 to 16 Mpixel).
Each figure is the best timing out of 3, each averaged over 10 runs, using
the timeit module from Python.
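The timing protocol (best of 3, each averaged over 10 runs) can be
reproduced with timeit.repeat; here is a sketch in which process() is a
stand-in for the actual image treatment, not the benchmark code itself:

```python
import timeit

def process():
    # stand-in for the actual image treatment being benchmarked
    return sum(i * i for i in range(1000))

# 3 measurements, each timing 10 consecutive runs; report the best average
runs = timeit.repeat(process, repeat=3, number=10)
best_ms = 1000.0 * min(runs) / 10
print("best of 3, averaged over 10 runs: %.3f ms" % best_ms)
```

Taking the minimum of the repeats filters out timing noise from other
processes, while averaging over 10 runs smooths per-run jitter.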
Reference timings:
1D_CPU_LUT_OpenMP
Img size Linux/gcc Apple/clang
1.02 12.12 13.451
2.10 30.14 35.307
4.19 63.79 87.110
6.22 96.17 130.77
11.90 222.15 265.94
16.78 270.42 359.93
1D_CPU_CSR_OpenMP
Img size Linux/gcc Apple/clang
1.02 12.31 12.256
2.10 30.20 33.220
4.19 64.34 76.948
6.22 88.82 111.60
11.90 206.82 218.81
16.78 280.03 443.35
Execution on the CPU:
1D_CPU_LUT_OpenCL
Img size AMD Intel Apple POCL
1.02 13.11 8.25 9.7813 8.47
2.10 29.85 15.20 20.563 17.85
4.19 58.08 32.77 47.877 47.19
6.22 97.88 53.04 80.372 62.53
11.90 184.29 125.52 149.33 135.89
16.78 261.21 149.31 205.81 190.14
1D_CPU_CSR_OpenCL
Img size AMD Intel Apple POCL
1.02 16.96 10.05 9.8027 10.02
2.10 37.12 18.46 21.904 21.35
4.19 82.78 42.24 46.961 59.89
6.22 133.41 70.17 68.312 73.87
11.90 271.61 182.41 143.57 178.77
16.78 346.55 222.82 212.17 260.62
Execution on the integrated GPU:
1D_GPU_LUT_OpenCL
Img size Beignet Apple
1.02 7.50 10.066
2.10 14.44 16.345
4.19 28.91 34.538
6.22 ----- 37.570
11.90 ----- 68.443
16.78 ----- 78.333
no data: MemoryError (only 256MB on GPU)
1D_GPU_CSR_OpenCL
Img size Beignet Apple
1.02 3.95 6.0475
2.10 7.55 13.324
4.19 15.62 23.255
6.22 23.88 33.352
11.90 45.63 55.099
16.78 68.78 82.569
It is funny to notice that this laptop GPU outperforms an Intel Xeon Phi
accelerator, which is much more expensive than the whole laptop, using
the same code.
Cheers,
--
Jérôme Kieffer
Data analysis unit - ESRF
Hi all,
While optimizing some host-device data transfers, I came across the little piece of code given below.
My questions are: why is so much time spent launching the non-blocking copy? What can I do to get a 'real' non-blocking call, so that I can do some computation on the host before waiting for the copy to complete?
In the example, launch time ~ profile time, where launch time is the time spent in the cl.enqueue_copy call and profile time comes from the event profiling information. I was expecting wait time ~ profile time.
The result on a K20m is :
In [15]: print "Launch time=", t_wait - t_start
Launch time= 0.373787879944
In [16]: print "Wait time", t_end - t_wait
Wait time 0.0372970104218
In [17]: print "Profile time", 1e-9 * (evt.profile.end - evt.profile.start)
Profile time 0.338622592
Thanks,
Jean-Matthieu.
import time
import pyopencl as cl
import numpy as np

ctx = cl.create_some_context()
queue = cl.CommandQueue(ctx, properties=cl.command_queue_properties.PROFILING_ENABLE)

data = np.zeros((512, 512, 512), dtype=np.float64)  # 1 GiB host array
data_cl = cl.Buffer(ctx, cl.mem_flags.READ_WRITE, size=data.nbytes)

# warm-up copy so the device buffer is actually allocated
cl.enqueue_copy(queue, data_cl, data)
queue.finish()

t_start = time.time()
evt = cl.enqueue_copy(queue, data_cl, data, is_blocking=False)
t_wait = time.time()
evt.wait()
t_end = time.time()
print "Launch time=", t_wait - t_start
print "Wait time", t_end - t_wait
print "Profile time", 1e-9 * (evt.profile.end - evt.profile.start)
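One common explanation (an assumption about the driver's behavior, not
something the profile above proves) is that transfers from pageable host
memory are staged through an internal pinned buffer, so most of the work
happens synchronously inside the "non-blocking" enqueue. A workaround worth
trying is to keep the host data in a mapped ALLOC_HOST_PTR buffer, which is
typically backed by pinned (page-locked) memory; a sketch:

```python
import pyopencl as cl
import numpy as np

ctx = cl.create_some_context()
queue = cl.CommandQueue(ctx)
mf = cl.mem_flags

shape = (512, 512, 512)
nbytes = int(np.prod(shape)) * np.dtype(np.float64).itemsize

# device-side destination buffer
data_cl = cl.Buffer(ctx, mf.READ_WRITE, size=nbytes)

# host-side staging buffer, usually backed by pinned memory
host_buf = cl.Buffer(ctx, mf.READ_WRITE | mf.ALLOC_HOST_PTR, size=nbytes)
data, _ = cl.enqueue_map_buffer(queue, host_buf, cl.map_flags.WRITE,
                                0, shape, np.float64)

data[...] = 0.0  # fill the pinned array on the host

# the enqueue should now return quickly and overlap with host work
evt = cl.enqueue_copy(queue, data_cl, data, is_blocking=False)
# ... do host-side computation here ...
evt.wait()
```

Whether this actually overlaps depends on the implementation; it would be
worth re-running the launch/wait/profile measurement with this variant.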
On Tue, 7 Oct 2014 13:39:11 +0200
Ewald Zietsman <ewald.zietsman(a)gmail.com> wrote:
> Are you running double precision calculations? The Iris doesn't support
> those AFAIK. I had trouble with that, but I don't think I got a big report
> like that. I set the environment variable on mine so it always uses the CPU.
I think you spotted it right ... it looks like borderline behavior which specifically crashes:
https://github.com/kif/pyFAI/issues/137
Cheers,
--
Jérôme Kieffer
tel +33 476 882 445
So I have this in the kernel I'm using at the moment
#define PYOPENCL_DEFINE_CDOUBLE
#pragma OPENCL EXTENSION cl_khr_fp64: enable
This part caused failures on my Mac, but it runs on the NVIDIA GTX 760 in my
desktop. I haven't tried running my code in parallel with single precision
yet; I should probably try that at some point.
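If the failures really are double-precision related, one option is to guard
the pragma with a runtime check instead of enabling it unconditionally. A
sketch (how you assemble the kernel source from this preamble is up to you):

```python
import pyopencl as cl

ctx = cl.create_some_context()
dev = ctx.devices[0]

# cl_khr_fp64 is the standard extension advertising double support;
# device.extensions is a space-separated string of extension names
has_fp64 = "cl_khr_fp64" in dev.extensions

preamble = ""
if has_fp64:
    preamble = "#pragma OPENCL EXTENSION cl_khr_fp64: enable\n"
else:
    print("Device %s has no fp64 support; falling back to float" % dev.name)

# prepend `preamble` to the kernel source before building the program
```

This at least turns a potential segfault into an explicit, debuggable
fallback path.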
I only just started learning about all this stuff, learning more as I need.
Good luck and post back here if you find the answer.
On Tue, 7 Oct 2014 13:39:11 +0200
Ewald Zietsman <ewald.zietsman(a)gmail.com> wrote:
> Are you running double precision calculations?
Normally not. This crash occurred during a test suite;
I will double-check in this direction.
I wish it had failed gracefully instead of seg-faulting :(
> The Iris doesn't support
> those AFAIK. I had trouble with that, but I don't think I got a big report
> like that. I set the environment variable on mine so it always uses the CPU.
I got noticeable speed-ups: 400 ms for 1-core serial code, 200 ms for OpenCL on the CPU, and 80 ms on the GPU.
So it looks interesting to test.
Thanks for the hint.
--
Jérôme Kieffer
tel +33 476 882 445
Hi Ewald,
Ewald Zietsman <ewald.zietsman(a)gmail.com> writes:
> I'm trying to figure out exactly why I'm getting the above error. I'm
> assuming I'm trying to allocate too big a chunk of memory at a time; I got
> around this before by splitting the problem into smaller bits, doing
> them one by one, and concatenating the results. Now I'd like to figure out
> what the limits of the hardware are. Can I do this via the PyOpenCL API?
I've previously gotten this error for accessing out-of-bounds memory on
the GPU, less so for running out of memory. I'm mentioning this to
suggest that you keep causes other than 'out of memory' in mind, as
out-of-memory errors generally raise different codes.
To answer your actual question, it turns out that finding out how much
memory is available in a single chunk is actually not very easy, and
since CL allows allocating memory lazily, even alloc-and-fail loops
aren't bulletproof unless you actually access that memory. The best
guess comes from a combination of
http://documen.tician.de/pyopencl/runtime.html#pyopencl.device_info.GLOBAL_…
and
http://documen.tician.de/pyopencl/runtime.html#pyopencl.device_info.MAX_MEM…
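For reference, those two device queries look like this in PyOpenCL (a
minimal sketch; which device you inspect is up to you):

```python
import pyopencl as cl

ctx = cl.create_some_context()
dev = ctx.devices[0]

total = dev.global_mem_size          # total global memory on the device
max_alloc = dev.max_mem_alloc_size   # largest single allocation allowed

print("global memory: %.1f MB" % (total / 1024.0 ** 2))
print("max single allocation: %.1f MB" % (max_alloc / 1024.0 ** 2))
```

Note that max_mem_alloc_size is often a fraction (commonly a quarter) of
global_mem_size, and, per the caveat above, a successful allocation still
isn't a guarantee until the memory is actually touched.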
HTH,
Andreas
Hi All,
I'm trying to figure out exactly why I'm getting the above error. I'm
assuming I'm trying to allocate too big a chunk of memory at a time; I got
around this before by splitting the problem into smaller bits, doing
them one by one, and concatenating the results. Now I'd like to figure out
what the limits of the hardware are. Can I do this via the PyOpenCL API?