Hi,
I was wondering if anyone had tested combining multiple cards of
different types (AMD and NVIDIA), and whether it can lead to any sort of
conflict?
Right now I have an NVIDIA 9800GT for display and a GTX 295 for GPU
computation, and I'm thinking about adding an AMD HD 7970.
I want to use them for distributed computation (multiple threads
computing different parts of a data set), so I was wondering if there
would be any conflict:
- for pyopencl to compile kernels in parallel for two platforms (probably
not, I think I already tested parallel computing with CPU+GPU)
- for amd and nvidia drivers to play along nicely...
Any advice?
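For reference, the host-side partitioning I have in mind looks roughly like this. It is a minimal CPU-only sketch: the worker function and the 2-way split are placeholders, and in a real run each worker thread would create its own pyopencl Context and CommandQueue for its platform (NVIDIA or AMD), which is elided here.

```python
from concurrent.futures import ThreadPoolExecutor

import numpy as np

def process_chunk(chunk):
    # Placeholder for per-device work: a real worker would build its own
    # pyopencl Context/CommandQueue here and launch a kernel on its chunk.
    return chunk * chunk

data = np.arange(16, dtype=np.float32)
chunks = np.array_split(data, 2)  # one chunk per card/platform

with ThreadPoolExecutor(max_workers=2) as pool:
    results = list(pool.map(process_chunk, chunks))

out = np.concatenate(results)
print(out[:4])  # → [0. 1. 4. 9.]
```

Since each OpenCL platform exposes its own contexts and queues, the two vendors' runtimes should not need to interact at all beyond sharing the host process.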
--
Vincent Favre-Nicolin http://inac.cea.fr
CEA/Grenoble Institut Nanosciences & Cryogénie
Laboratoire SP2M/Nano-structures et Rayonnement Synchrotron
17, rue des Martyrs
38054 Grenoble Cedex 9 - France
Université Joseph Fourier http://www.ujf-grenoble.fr
tél: (+33) 4 38 78 95 40 fax: (+33) 4 38 78 51 38
Hi all,
Trying to get pyopencl up and running, but I'm clearly missing something
fundamental. If I try to follow the bare-bones example:
from pyfft.cl import Plan
import numpy
import pyopencl as cl
import pyopencl.array as cl_array
ctx = cl.create_some_context(interactive=False)
queue = cl.CommandQueue(ctx)
But then creating a plan leads to a motley collection of errors:
plan = Plan((16, 16), queue=queue)
Build on <pyopencl.Device 'Tahiti' on 'AMD Accelerated Parallel
Processing' at 0x21b5f40>:
"/tmp/OCLAA4IMX.cl", line 123: warning: double-precision constant is
represented as single-precision constant because double is not
enabled
const float2 w1 = complex_ctr((float)0.707106781187,
(float)0.707106781187);
[Lots of these]
"/tmp/OCLAA4IMX.cl", line 983: error: too many initializer values
float2 a[8] = {0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0};
[Two of these]
Clearly, there are twice as many of these initializer values as a[8] can
reasonably hold, but I don't know why. I thought it was perhaps related
to the single/double precision warnings, but I can make those warnings
disappear by explicitly declaring the dtype for the plan, and the error
persists.
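(For completeness, the single/double warnings by themselves should be harmless: they just say a literal like 0.707106781187 gets rounded to single precision when doubles are disabled. A quick numpy check of that rounding, which is independent of the initializer error:)

```python
import numpy as np

exact = 0.707106781187             # double-precision literal from the kernel
single = float(np.float32(exact))  # what the device stores without doubles

print(single == exact)             # False: a rounding error is introduced
print(abs(single - exact) < 1e-7)  # True: but it is tiny
```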
test_array fails on one test (test_elwise_kernel)
test_clmath fails on test_fmod and test_bessel_j
test_wrapper fails on one test (test_image_2d), but informs me that this
is not unusual, though still bad.
demo.py runs fine.
I'm trying to get this going under Ubuntu 11.10.
Thanks for any help,
Alex
On Mon, 30 Apr 2012 14:36:05 -0400, Devin Jeanpierre <jeanpierreda(a)gmail.com> wrote:
> On Mon, Apr 30, 2012 at 11:42 AM, Andreas Kloeckner
> <lists(a)informa.tiker.net> wrote:
> > FWIW, your program finishes just fine for me, using AMD's CPU CL and
> > Intel's most recent '2012' CPU CL package. I don't have their version
> > 1.5 around any more, but if there was a hang with that version, I'd
> > imagine that would have been a bug that they've resolved.
>
> I just removed Intel's 1.5 SDK and replaced it with 2012, and removed
> and rebuilt PyOpenCL. I still have this hang.
>
> I'm not sure what else to try next.
Can you strace that program, see what it's doing when it hangs?
Andreas
Hi Devin,
On Sun, 29 Apr 2012 13:54:51 -0400, Devin Jeanpierre <jeanpierreda(a)gmail.com> wrote:
> My system is an Intel core i5 laptop, no GPU. It's running Fedora (the
> most recent one, I've never been able to find version numbers), and
> I'm using the Intel OpenCL implementation (version 1.5) for 64-bit
> Linux. My PyOpenCL version is 2011.2. Python is 2.7.2.
>
> In getting started for OpenCL development, I started writing a kernel.
> That kernel did not ever halt. I've pared down the arguments to the
> bare minimum -- if I remove one of those arguments, then the kernel
> halts. That behavior is confusing and alarming. The arguments aren't
> _used_, but as long as they are part of the function signature they
> cause the OpenCL kernel to work forever rather than halting.
>
> Furthermore, top reports 350+% CPU usage, so it's not blocking or
> anything -- it's busy working. If my NDRange only has one work-item,
> then I get about 100% CPU usage. So even just a single kernel instance
> takes forever.
>
> In an effort to rule out PyOpenCL problems, I managed to rule them in:
> a C port of the Python program, with the same kernel, did not take
> forever to execute. It's not an exact copy, and I don't really know
> what I'm doing, so if there are different semantics please point them
> out, as they could also explain differing behavior.
>
> Here is a link to the Python and C code that reproduces and doesn't
> reproduce this problem, respectively:
>
> http://bpaste.net/show/X8Gcr5b2mL3QcgwABdqY/
>
> If you can offer any help or advice, it'd be much appreciated.
FWIW, your program finishes just fine for me, using AMD's CPU CL and
Intel's most recent '2012' CPU CL package. I don't have their version
1.5 around any more, but if there was a hang with that version, I'd
imagine that would have been a bug that they've resolved.
HTH,
Andreas
My system is an Intel core i5 laptop, no GPU. It's running Fedora (the
most recent one, I've never been able to find version numbers), and
I'm using the Intel OpenCL implementation (version 1.5) for 64-bit
Linux. My PyOpenCL version is 2011.2. Python is 2.7.2.
In getting started for OpenCL development, I started writing a kernel.
That kernel did not ever halt. I've pared down the arguments to the
bare minimum -- if I remove one of those arguments, then the kernel
halts. That behavior is confusing and alarming. The arguments aren't
_used_, but as long as they are part of the function signature they
cause the OpenCL kernel to work forever rather than halting.
Furthermore, top reports 350+% CPU usage, so it's not blocking or
anything -- it's busy working. If my NDRange only has one work-item,
then I get about 100% CPU usage. So even just a single kernel instance
takes forever.
In an effort to rule out PyOpenCL problems, I managed to rule them in:
a C port of the Python program, with the same kernel, did not take
forever to execute. It's not an exact copy, and I don't really know
what I'm doing, so if there are different semantics please point them
out, as they could also explain differing behavior.
Here is a link to the Python and C code that reproduces and doesn't
reproduce this problem, respectively:
http://bpaste.net/show/X8Gcr5b2mL3QcgwABdqY/
If you can offer any help or advice, it'd be much appreciated.
-- Devin
P.S. I asked a question on Stack Overflow some time ago, but I now
have reason to believe it's specifically a PyOpenCL problem rather
than OpenCL in general, so I figured I'd post here.
If you want to see the original, it is here:
http://stackoverflow.com/questions/10306669/opencl-kernel-hangs-forever-unl…
It has had no responses, but this is a cross-post so I figured I'd let you know.
On Wed, 25 Apr 2012 12:17:33 -0400, Andrea Borsic <Andrea.Borsic(a)dartmouth.edu> wrote:
> Hi Andreas,
>
> I'd like to ask you a question (but I don't plan to bother you in the
> future with PyOpenCL questions), I hope you don't mind.
>
> I am at a stage where I am trying to figure out whether I can use
> PyOpenCL for my work, and wanted to ask for some information regarding
> copying only parts of a buffer from host to device.
>
> In most of the problems in medical imaging I am dealing with I need to
> copy data from a 3D array to the GPU in a slice by slice fashion. Whole
> 3D volumes do not fit on the GPU, but fortunately many algorithms can
> work in "sliding slice" way, where only a certain thickness of the
> volume is copied to the GPU, and as the algorithm proceeds a new slice
> is added and one is discarded from the back of the stack. The bottom
> line is that I need to copy only one 2D slice at a time from host to
> device, from a 3D array.
>
> I currently use, in C, clEnqueueWriteBuffer pointing to the base of the
> 3D buffer, specifying the current slice offset and numbers of bytes to copy.
>
> I am looking to do the same with PyOpenCL, I did various searches on the
> internet, but I am not quite sure that I understand how to specify an
> offset and num_bytes in a memory transfer operation.
>
> From the documentation of pyopencl.enqueue_copy, it seems that for
> host <-> Buffer copies the supported parameter is "device_offset", but I
> would need actually to specify a host_offset, to address the single
> slice, and a byte_count, to read only 1 slice. These parameters seem to
> be available only for Buffer <-> Buffer transfers (which I assume are
> GPU <-> GPU transfers?)
>
> What's the correct way of copying only a subset of a host memory buffer
> to the device, and vice versa (writing back from device to host into a
> particular range within a buffer)?
(cc'ing pyopencl list)
I assume your data is sitting in a numpy array on the host. Then all you
need to do is enqueue_copy(dev_buf, host[1000:2000]), i.e. pass the
desired slice of the numpy array to enqueue_copy, instead of the whole
array. Obviously, this will only work if the numpy array resulting from
the slice access is contiguous in host memory.
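A quick way to check the contiguity requirement: if the volume is a C-ordered numpy array and you slice along the first (slowest-varying) axis, the slice is a contiguous view and can be handed to enqueue_copy directly; slicing a later axis is not contiguous. (The array shape and slice indices below are made up for illustration.)

```python
import numpy as np

vol = np.zeros((64, 128, 128), dtype=np.float32)  # z, y, x

# A slab of slices along axis 0 stays contiguous in host memory.
slab = vol[10:12]
print(slab.flags["C_CONTIGUOUS"])  # True

# The same thickness cut along the last axis is strided, not contiguous,
# and would need an explicit np.ascontiguousarray() copy first.
col = vol[:, :, 10:12]
print(col.flags["C_CONTIGUOUS"])  # False
```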
Hope this helps!
Andreas
On Wed, 25 Apr 2012 15:46:36 -0500, Robert Kirby <robert.c.kirby(a)gmail.com> wrote:
> Still have an invalid work group size. See attached.
Oh, misunderstanding--it's either:
prg.stiffmat(queue, (num_cells*num_bf, num_bf), (num_bf, num_bf), ...)
(OpenCL native grid sizing)
*or*
prg.stiffmat(queue, (num_cells,), (num_bf, num_bf), ..., g_times_l=True)
(CUDA-like grid sizing)
That said, your example does execute on the CPU CL implementations I
tested, but crashes with a segfault.
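The arithmetic relating the two call forms, assuming g_times_l multiplies the global size by the local size axis by axis (writing the 1-D grid (num_cells,) as (num_cells, 1) so the per-axis product is well defined; the sizes are made-up examples):

```python
# Made-up sizes for illustration.
num_cells, num_bf = 100, 4

local = (num_bf, num_bf)

# OpenCL-native form: the global size is the TOTAL number of work-items.
global_native = (num_cells * num_bf, num_bf)

# CUDA-like form (g_times_l=True): the launch is written as a grid of
# work-groups, which PyOpenCL multiplies by the local size per axis.
grid = (num_cells, 1)
global_cuda_like = tuple(g * l for g, l in zip(grid, local))

print(global_cuda_like == global_native)  # → True
```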
HTH,
Andreas
On Tue, 24 Apr 2012 13:18:55 -0500, "Brennan, Brian" <brian.brennan(a)ttu.edu> wrote:
> Hello Andreas,
> I had a quick question about contexts and choosing the platform to run on. When I run my code I am prompted with:
>
> Choose platform:
> [0] <pyopencl.Platform 'Intel(R) OpenCL' at 0x1594c00>
> [1] <pyopencl.Platform 'AMD Accelerated Parallel Processing' at 0x7f0fc2245060>
> Choice [0]:
>
> Now, I can set a default value through the shell with: export PYOPENCL_CTX=0
>
> but I am trying to run comparisons between GPU and CPU run times and
> would like to class my script twice with each value '0' and '1' being
> set in the code. Is there a command I can type into my script to set
> this rather than being prompted each time?
(cc'ing PyOpenCL list)
Undocumented keyword argument:
cl.create_some_context(answers=[0])
:)
That said, what create_some_context does is really easily done manually,
too:
plat = cl.get_platforms()[plat_index]
dev = plat.get_devices()[dev_index]
ctx = cl.Context([dev])
Andreas
Hello.
I am trying to build PyOpenCL on both Python2 and Python3.
Build fails for Python3 while building _pvt_struct:
building '_pvt_struct' extension
gcc -pthread -fwrapv -Wall -O3 -DNDEBUG -fPIC
-I/usr/lib/python3/dist-packages/numpy/core/include
-I/usr/lib/python3/dist-packages/numpy/core/include
-I/usr/include/python3.2mu -c src/wrapper/_pvt_struct_v3.cpp -o
build/temp.linux-x86_64-3.2/src/wrapper/_pvt_struct_v3.o
src/wrapper/_pvt_struct_v3.cpp:80:26: error: ‘_Bool’ does not name a
type
src/wrapper/_pvt_struct_v3.cpp: In function ‘PyObject* nu_bool(const
char*, const formatdef*)’:
src/wrapper/_pvt_struct_v3.cpp:467:5: error: ‘_Bool’ was not declared in
this scope
src/wrapper/_pvt_struct_v3.cpp:467:15: error: expected ‘;’ before ‘x’
src/wrapper/_pvt_struct_v3.cpp:468:21: error: ‘x’ was not declared in
this scope
src/wrapper/_pvt_struct_v3.cpp: In function ‘int np_bool(char*,
PyObject*, const formatdef*)’:
src/wrapper/_pvt_struct_v3.cpp:674:5: error: ‘_Bool’ was not declared in
this scope
src/wrapper/_pvt_struct_v3.cpp:674:15: error: expected ‘;’ before ‘x’
src/wrapper/_pvt_struct_v3.cpp:678:5: error: ‘x’ was not declared in
this scope
src/wrapper/_pvt_struct_v3.cpp: At global scope:
src/wrapper/_pvt_struct_v3.cpp:745:24: error: ‘_Bool’ was not declared
in this scope
src/wrapper/_pvt_struct_v3.cpp:745:41: error: ‘_Bool’ was not declared
in this scope
src/wrapper/_pvt_struct_v3.cpp: In function ‘int
prepare_s(PyStructObject*)’:
src/wrapper/_pvt_struct_v3.cpp:1326:13: error: invalid conversion from
‘void*’ to ‘formatcode* {aka _formatcode*}’ [-fpermissive]
src/wrapper/_pvt_struct_v3.cpp: In function ‘int s_init(PyObject*,
PyObject*, PyObject*)’:
src/wrapper/_pvt_struct_v3.cpp:1409:41: warning: deprecated conversion
from string constant to ‘char*’ [-Wwrite-strings]
src/wrapper/_pvt_struct_v3.cpp: In function ‘PyObject*
s_unpack(PyObject*, PyObject*)’:
src/wrapper/_pvt_struct_v3.cpp:1513:48: error: invalid conversion from
‘void*’ to ‘char*’ [-fpermissive]
src/wrapper/_pvt_struct_v3.cpp:1455:1: error: initializing argument 2
of ‘PyObject* s_unpack_internal(PyStructObject*, char*)’ [-fpermissive]
src/wrapper/_pvt_struct_v3.cpp: In function ‘PyObject*
s_unpack_from(PyObject*, PyObject*, PyObject*)’:
src/wrapper/_pvt_struct_v3.cpp:1528:51: warning: deprecated conversion
from string constant to ‘char*’ [-Wwrite-strings]
src/wrapper/_pvt_struct_v3.cpp:1528:51: warning: deprecated conversion
from string constant to ‘char*’ [-Wwrite-strings]
src/wrapper/_pvt_struct_v3.cpp: At global scope:
src/wrapper/_pvt_struct_v3.cpp:1778:1: warning: deprecated conversion
from string constant to ‘char*’ [-Wwrite-strings]
src/wrapper/_pvt_struct_v3.cpp:1778:1: warning: deprecated conversion
from string constant to ‘char*’ [-Wwrite-strings]
src/wrapper/_pvt_struct_v3.cpp:1778:1: warning: deprecated conversion
from string constant to ‘char*’ [-Wwrite-strings]
src/wrapper/_pvt_struct_v3.cpp:1778:1: warning: deprecated conversion
from string constant to ‘char*’ [-Wwrite-strings]
src/wrapper/_pvt_struct_v3.cpp:1781:14: error: redefinition of
‘PyTypeObject PyStructType’
src/wrapper/_pvt_struct_v3.cpp:12:21: error: ‘PyTypeObject PyStructType’
previously declared here
error: command 'gcc' failed with exit status 1
make[1]: *** [override_dh_auto_install] Error 1
Is there something I am missing, or is it a bug in the PyOpenCL sources?
Also, am I correct in assuming that there is no strict
dependency on matplotlib (only some examples use it)?
If so, and if I manage to build PyOpenCL on both Python2 and Python3,
next Debian (Wheezy) could contain PyOpenCL for all supported Python
versions.
Best regards.
--
Tomasz Rybak GPG/PGP key ID: 2AD5 9860
Fingerprint A481 824E 7DD3 9C0E C40A 488E C654 FB33 2AD5 9860
http://member.acm.org/~tomaszrybak
On an NVIDIA Quadro 2000 laptop, GL interop does this:
C:\Users\Keith\Desktop\pyopencl-2011.2\examples>python gl_interop_demo.py
Traceback (most recent call last):
File "gl_interop_demo.py", line 81, in <module>
initialize()
File "gl_interop_demo.py", line 42, in initialize
devices = [platform.get_devices()[0]])
pyopencl.LogicError: Context failed: invalid gl sharegroup reference khr
C:\Users\Keith\Desktop\pyopencl-2011.2\examples>
If I change it to platform [1] (AMD CPU) or platform [2] (Intel CPU) it works.
I've seen this happen before, but I forgot how we fixed it.
--Keith Brafford