Hi all,
thanks to the hard work of Marko Bencun and Yichao Yu, the next version
of PyOpenCL will be substantially different internally from the previous
one. In particular, the wrapper will no longer be built using
Boost.Python but instead using cffi 1.0's ahead-of-time mode. One main
consequence of this is that PyOpenCL now works on PyPy.
This new code is now on the git master branch. (It used to live on the
'cffi' branch. The old Boost wrapper is now on the
'deprecated-boost-python' branch.)
From a user's perspective, nothing should have changed--on all machines
I have access to, PyOpenCL passes the same tests as before, on any
Python version more recent than 2.6, including PyPy. Nonetheless, before
I go ahead and release a new PyOpenCL based on this code, I'd like to
get as many of you as I can to try it and report back. If you package
PyOpenCL, or if you have a Mac or a Windows machine, I'd especially like
to hear from you.
Thanks!
Andreas
Hi all,
I'm writing about PyOpenCL's support for complex numbers. As of right
now, PyOpenCL's complex numbers are typedefs of float2 and double2,
which makes complex+complex addition and real*complex multiplication match the
desired semantics, but lots of other operators silently do the wrong
thing, such as
- complex*complex
- real+complex
I've come to regard this as a major flaw; I can't count the number of
times I've had to hunt down bugs related to it, and so I'd like to get
rid of it. I've thought about ways of doing this in a
backward-compatible manner, and they all strike me as flawed, so I'd
prefer to move to a simple struct (supporting .real and .imag as
well as the old .x and .y members) in one big change.
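To make the pitfall concrete, here is a plain-Python sketch (the helper names are mine, not PyOpenCL's) contrasting the componentwise product that float2's built-in `*` performs with a true complex product:

```python
def componentwise_mul(a, b):
    # What OpenCL's float2 '*' does: (a.x*b.x, a.y*b.y).
    # Silently wrong if a and b are meant to be complex numbers.
    return (a[0] * b[0], a[1] * b[1])

def complex_mul(a, b):
    # Correct complex product: (ac - bd, ad + bc)
    # for a = (a.x + a.y*i), b = (b.x + b.y*i).
    return (a[0] * b[0] - a[1] * b[1],
            a[0] * b[1] + a[1] * b[0])

a = (1.0, 2.0)   # 1 + 2i
b = (3.0, 4.0)   # 3 + 4i

print(componentwise_mul(a, b))  # (3.0, 8.0) -- the silent wrong answer
print(complex_mul(a, b))        # (-5.0, 10.0) -- matches (1+2j)*(3+4j)
```

The componentwise result compiles and runs without complaint, which is exactly why these bugs are hard to spot.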
If you have code depending on PyOpenCL's complex number support and are
opposed to this change, please speak up now. I'll make the change in git
today to give you a preview.
What do you think?
Andreas
> Am 02.07.2015 um 09:39 schrieb Bogdan Opanchuk <bogdan(a)opanchuk.net>:
>
> (did not CC to the mail list by mistake)
>
> Hi Andreas,
>
> I tried to compile & install it on OS X 10.10.4, with the default clang (from Xcode 6.4) and Pythons 2.7.9, 3.4.3, and pypy-2.6.0 (installed via pyenv). Strangely enough, I have not encountered the problem Gregor reported: PyOpenCL compiles successfully and seems to work fine with my programs.
>
Hi,
as a follow-up to my initial report, I could successfully build recent pyopencl with Python 2.7.6 from python.org (now on OS X 10.10.4), but not with Anaconda Python 2.7. I opened an issue for Anaconda at https://github.com/ContinuumIO/anaconda-issues/issues/373; perhaps someone there knows how to resolve this.
Gregor
Hi,
Has anyone tried OpenCL on Xeon Phi systems (e.g. Stampede)? If so, how
did you get it to work? In particular, what runtime libraries did you use?
Benson
I recently upgraded my MacBook Pro, only to discover that Apple no longer uses Nvidia and has switched to AMD,
meaning I have had to move from PyCUDA to PyOpenCL. I have run several Apple-supplied OpenCL demos,
but the demo code from http://documen.tician.de/pyopencl/ seems to give odd output, and I was hoping someone could
confirm whether this is correct, or tell me what steps I should take. I installed PyOpenCL following the instructions at the bottom of http://wiki.tiker.net/PyOpenCL/Installation/Mac
Warmest regards,
Justin
$ python ./pyopencl/pyopencl/examples/demo.py
Choose platform:
[0] <pyopencl.Platform 'Apple' at 0x7fff0000>
Choice [0]:
Choose device(s):
[0] <pyopencl.Device 'Intel(R) Core(TM) i7-4980HQ CPU @ 2.80GHz' on 'Apple' at 0xffffffff>
[1] <pyopencl.Device 'Iris Pro' on 'Apple' at 0x1024500>
[2] <pyopencl.Device 'AMD Radeon R9 M370X Compute Engine' on 'Apple' at 0x1021c00>
Choice, comma-separated [0]:2
Set the environment variable PYOPENCL_CTX=':2' to avoid being asked again.
[ 0. 0. 0. ..., 0. 0. 0.] <— Is this correct?
0.0 <— Is this correct?
Config is:
Model Name: MacBook Pro
Model Identifier: MacBookPro11,5
Processor Name: Intel Core i7
Processor Speed: 2.8 GHz
Number of Processors: 1
Total Number of Cores: 4
L2 Cache (per Core): 256 KB
L3 Cache: 6 MB
Memory: 16 GB
Boot ROM Version: MBP114.0172.B04
SMC Version (system): 2.30f2
System Version: OS X 10.10.4 (14E46)
Kernel Version: Darwin 14.4.0
Boot Volume: Macintosh HD
Boot Mode: Normal
Secure Virtual Memory: Enabled
Time since boot: 4 days 1:41
AMD Radeon R9 M370X:
Chipset Model: AMD Radeon R9 M370X
Type: GPU
Bus: PCIe
PCIe Lane Width: x8
VRAM (Total): 2048 MB
Vendor: ATI (0x1002)
Device ID: 0x6821
Revision ID: 0x0083
ROM Revision: 113-C5670E-777
gMux Version: 4.0.20 [3.2.8]
EFI Driver Version: 01.00.777
Displays:
Color LCD:
Display Type: Retina LCD
Resolution: 2880 x 1800 Retina
Retina: Yes
Pixel Depth: 32-Bit Color (ARGB8888)
Main Display: Yes
Mirror: Off
Online: Yes
Built-In: Yes
[ end ]
Hi all,
We have ported a CUDA implementation to an OpenCL implementation. The
CUDA version was running in a python application using pyCUDA, so now
I'm looking into pyOpenCL to add this new implementation to our
applications.
I've managed to have it up and running on a desktop CPU (Intel) and GPU
(NVIDIA).
The challenge now is to run pyOpenCL on a server (centos linux) with
Xeon Phi cards. The host CPU runs the demo.py nicely. However, the Xeon
Phi card returns all zeros in the memory.
The demo.py does recognize the cards:
>>> ctx = cl.create_some_context()
Choose platform:
[0] <pyopencl.Platform 'Intel(R) OpenCL' at 0x7f9e20>
Choice [0]:
Choose device(s):
[0] <pyopencl.Device 'Intel(R) Xeon(R) CPU E5-2680 v3 @ 2.50GHz' on
'Intel(R) OpenCL' at 0x7e76d8>
[1] <pyopencl.Device 'Intel(R) Many Integrated Core Acceleration Card'
on 'Intel(R) OpenCL' at 0xe07c38>
[2] <pyopencl.Device 'Intel(R) Many Integrated Core Acceleration Card'
on 'Intel(R) OpenCL' at 0x7da438>
[3] <pyopencl.Device 'Intel(R) Many Integrated Core Acceleration Card'
on 'Intel(R) OpenCL' at 0xfa8488>
[4] <pyopencl.Device 'Intel(R) Many Integrated Core Acceleration Card'
on 'Intel(R) OpenCL' at 0xfa9b28>
[5] <pyopencl.Device 'Intel(R) Many Integrated Core Acceleration Card'
on 'Intel(R) OpenCL' at 0xfab208>
[6] <pyopencl.Device 'Intel(R) Many Integrated Core Acceleration Card'
on 'Intel(R) OpenCL' at 0xfac8e8>
When I print the resulting array using device 0 (host CPU):
>>> print(res_np)
[ 1.13724446 0.91993028 1.07355368 ..., 0.70078576 1.66417909
1.3580389 ]
When I print the resulting array using device 1 (Xeon Phi card):
>>> print(res_np)
[ 0. 0. 0. ..., 0. 0. 0.]
The compiler says:
/home/me/.local/lib/python2.7/site-packages/pyopencl/__init__.py:59:
CompilerWarning: From-source build succeeded, but resulted in
non-empty logs:
Build on <pyopencl.Device 'Intel(R) Many Integrated Core
Acceleration Card' on 'Intel(R) OpenCL' at 0x1662be8> succeeded, but
said:
Compilation started
Compilation done
Linking started
Linking done
Device build started
Device build done
Build started
Kernel <sum> was successfully vectorized (16)
Done.
warn(text, CompilerWarning)
Am I missing a library? Do I need to install something related to
pyOpenCL on the cards?
Any help is very much appreciated!
Sven
Hi Lars,
Lars.Ericson(a)wellsfargo.com writes:
> The reason it doesn't have a Windows installer is that a process of
> post-build host-based tuning which was used on clAmdBlas has been
> replaced by direct access to driver properties in the compilation of
> clBlas. This means that each user of the package has to compile it on
> their machine before they can use it, which also means that wrappers
> for the package to pyOpenCL have to be built at that time. In
> addition, if you have a machine with multiple OpenCL devices (for
> example I have AMD and Intel OpenCL on my workstation, with the Intel
> CPU chip acting as a separate platform and device), I don't know if
> the build is correct and optimal for all devices and platforms on the
> machine at build time or only correct and optimal for the AMD device.
Thanks for sharing your wrapper. So clBlas does tuning at the actual
build time of the library? That seems a little weird, given that the
device that the dgemm would target might not even be available until
runtime... and since OpenCL makes JIT so easy, I get this even less.
Looking at the source, it seems that the library might ship a binary
called clblastune that a user could run to redo the hardware tuning:
https://github.com/clMathLibraries/clBLAS/blob/9731ea2a270509211a47bf6cf9df…
If the tuning could be done after the fact, I don't really see the
obstacle to a Windows installer.
Andreas
Hi,
I wrapped the DGEMM from AMD's deprecated clAmdBlas library as a matrix_multiply function for PyOpenCL as follows:
python matrix_multiply_setup.py build_ext --inplace
where matrix_multiply_setup.py is:
from distutils.core import setup
from distutils.extension import Extension
from Cython.Build import cythonize
import numpy as np

extensions = [
    Extension(name='matrix_multiply',
              sources=['matrix_multiply.pyx'],
              include_dirs=["C:\\Program Files (x86)\\AMD APP SDK\\2.9-1\\include",
                            "C:\\Program Files (x86)\\AMD\\clAmdBlas\\include",
                            np.get_include()],
              library_dirs=["C:\\Program Files (x86)\\AMD APP SDK\\2.9-1\\bin\\x86_64",
                            "c:\\Program Files (x86)\\AMD\\clAmdBlas\\bin64"],
              libraries=['clAmdBlas', 'OpenCL'])
]

extensions = cythonize(extensions)

setup(
    ext_modules=extensions
)
and matrix_multiply.pyx is:
import numpy as np
cimport numpy as np
import pyopencl as cl
import pyopencl.array as cla
import pyopencl.clrandom as clr
import pyopencl.clmath as clm
from clAmdBlas cimport *

def blas_setup():
    clAmdBlasSetup()

def blas_teardown():
    clAmdBlasTeardown()

def matrix_multiply(A_g, B_g, C_g, queue):
    (M, K) = A_g.shape
    N = B_g.shape[1]
    cdef cl_event event = NULL
    # Recover the raw OpenCL handles from the PyOpenCL objects
    # via their int_ptr attributes.
    cdef intptr_t queue_p = <intptr_t> queue.int_ptr
    cdef cl_command_queue cq = <cl_command_queue> queue_p
    cdef intptr_t A_g_p = A_g.data.int_ptr
    cdef cl_mem bufA = <cl_mem> A_g_p
    cdef intptr_t B_g_p = B_g.data.int_ptr
    cdef cl_mem bufB = <cl_mem> B_g_p
    cdef intptr_t C_g_p = C_g.data.int_ptr
    cdef cl_mem bufC = <cl_mem> C_g_p
    err = clAmdBlasDgemm(clAmdBlasRowMajor, clAmdBlasNoTrans, clAmdBlasNoTrans,
                         M, N, K, 1.0, bufA, K, bufB, N, 0.0, bufC, N,
                         1, &cq, 0, NULL, &event)
    return err
where clAmdBlas.pxd is:
from libc.stdint cimport intptr_t, uintptr_t

cdef extern from "clAmdBlas.h":
    enum:
        CL_SUCCESS = 0
    enum clAmdBlasStatus:
        clAmdBlasSuccess = CL_SUCCESS
    enum clAmdBlasOrder:
        clAmdBlasRowMajor = 0
    enum clAmdBlasTranspose:
        clAmdBlasNoTrans = 0
    ctypedef unsigned int cl_uint
    ctypedef double cl_double
    ctypedef void* cl_mem
    ctypedef void* cl_command_queue
    ctypedef void* cl_event
    ctypedef void* cl_platform_id
    ctypedef void* cl_device_id
    ctypedef void* cl_context
    clAmdBlasStatus clAmdBlasSetup()
    void clAmdBlasTeardown()
    clAmdBlasStatus clAmdBlasDgemm(clAmdBlasOrder order, clAmdBlasTranspose transA, clAmdBlasTranspose transB,
                                   size_t M, size_t N, size_t K, cl_double alpha, const cl_mem A, size_t lda,
                                   const cl_mem B, size_t ldb, cl_double beta, cl_mem C, size_t ldc,
                                   cl_uint numCommandQueues, cl_command_queue *commandQueues,
                                   cl_uint numEventsInWaitList, const cl_event *eventWaitList, cl_event *events)
Once matrix_multiply.pyd is built, it can be used from a pure-Python PyOpenCL program, for example as follows (where queue is a PyOpenCL command queue):
import pyopencl as cl
import pyopencl.array as cla
import matrix_multiply
import numpy as np

A = np.ascontiguousarray(np.ones((2, 2)))
B = np.ascontiguousarray(np.ones((2, 2)))
bufA = cla.to_device(queue, A)
bufB = cla.to_device(queue, B)
bufC = cla.zeros(queue, shape=(2, 2), dtype=np.float64)
matrix_multiply.blas_setup()
matrix_multiply.matrix_multiply(bufA, bufB, bufC, queue)
matrix_multiply.blas_teardown()
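For reference, with alpha = 1.0 and beta = 0.0 the DGEMM call above computes C = A*B, so the expected device result can be checked against plain numpy on the host (a sketch I added; it does not touch OpenCL):

```python
import numpy as np

# Host-side reference for the wrapped DGEMM call:
# with alpha = 1.0 and beta = 0.0, C = A @ B.
A = np.ones((2, 2))
B = np.ones((2, 2))
C_expected = A @ B

# For 2x2 all-ones inputs, every entry of C is the dot product of a
# row of ones with a column of ones, i.e. 2.0. bufC.get() should
# match this array after the device call completes.
print(C_expected)
```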
Note though that clAmdBlas is deprecated in favor of clBLAS on GitHub: https://github.com/clMathLibraries/clBLAS, which doesn't have a Windows installer, whereas clAmdBlas did.
The reason it doesn't have a Windows installer is that a process of post-build host-based tuning which was used on clAmdBlas has been replaced by direct access to driver properties in the compilation of clBlas. This means that each user of the package has to compile it on their machine before they can use it, which also means that wrappers for the package to pyOpenCL have to be built at that time. In addition, if you have a machine with multiple OpenCL devices (for example I have AMD and Intel OpenCL on my workstation, with the Intel CPU chip acting as a separate platform and device), I don't know if the build is correct and optimal for all devices and platforms on the machine at build time or only correct and optimal for the AMD device.
Thanks,
Lars Ericson
Quantitative Analytics Consultant
Market & Institutional Risk Management
Wells Fargo Bank, N.A. | 301 S. College St., 4th Floor | Charlotte, NC 28202-6000
MAC D1053-04X
Tel 704-410-2219 | Cell 917-891-1639
lars.ericson(a)wellsfargo.com