Hi all,

We have ported a CUDA implementation to OpenCL. The CUDA version ran inside a Python application using PyCUDA, so I'm now looking into PyOpenCL to add the new implementation to our applications.
I've managed to get it up and running on a desktop CPU (Intel) and GPU (NVIDIA).

The challenge now is to run PyOpenCL on a server (CentOS Linux) with Xeon Phi cards. On the host CPU, demo.py runs fine. On the Xeon Phi cards, however, the result array read back from the device is all zeros.

demo.py does detect the cards:
>>> ctx = cl.create_some_context()
Choose platform:
[0] <pyopencl.Platform 'Intel(R) OpenCL' at 0x7f9e20>
Choice [0]:
Choose device(s):
[0] <pyopencl.Device 'Intel(R) Xeon(R) CPU E5-2680 v3 @ 2.50GHz' on 'Intel(R) OpenCL' at 0x7e76d8>
[1] <pyopencl.Device 'Intel(R) Many Integrated Core Acceleration Card' on 'Intel(R) OpenCL' at 0xe07c38>
[2] <pyopencl.Device 'Intel(R) Many Integrated Core Acceleration Card' on 'Intel(R) OpenCL' at 0x7da438>
[3] <pyopencl.Device 'Intel(R) Many Integrated Core Acceleration Card' on 'Intel(R) OpenCL' at 0xfa8488>
[4] <pyopencl.Device 'Intel(R) Many Integrated Core Acceleration Card' on 'Intel(R) OpenCL' at 0xfa9b28>
[5] <pyopencl.Device 'Intel(R) Many Integrated Core Acceleration Card' on 'Intel(R) OpenCL' at 0xfab208>
[6] <pyopencl.Device 'Intel(R) Many Integrated Core Acceleration Card' on 'Intel(R) OpenCL' at 0xfac8e8>
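To test each card without going through the prompt every time, `create_some_context()` also reads the `PYOPENCL_CTX` environment variable (e.g. `PYOPENCL_CTX='0:1'` for platform 0, device 1), or a context can be created for one device explicitly. A minimal sketch, assuming the platform/device ordering in the listing above; `ctx_selector` and `make_context` are hypothetical helper names:

```python
try:
    import pyopencl as cl
except ImportError:  # keeps the sketch importable where PyOpenCL is missing
    cl = None


def ctx_selector(platform_index, device_index):
    """Return a PYOPENCL_CTX value such as '0:1' (platform 0, device 1)."""
    return "%d:%d" % (platform_index, device_index)


def make_context(platform_index=0, device_index=0):
    """Create a context on exactly one device, without prompting."""
    if cl is None:
        raise RuntimeError("PyOpenCL is not installed")
    platform = cl.get_platforms()[platform_index]
    device = platform.get_devices()[device_index]
    return cl.Context([device])
```

For example, `PYOPENCL_CTX=0:1 python demo.py` should pick the first Xeon Phi card from the listing, and `make_context(0, 1)` does the same from inside Python.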

When I print the resulting array using device 0 (host CPU):
>>> print(res_np)
[ 1.13724446  0.91993028  1.07355368 ...,  0.70078576  1.66417909
  1.3580389 ]

When I print the resulting array using device 1 (Xeon Phi card):
>>> print(res_np)
[ 0.  0.  0. ...,  0.  0.  0.]
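To narrow this down, the same computation can be run on every device and checked against a NumPy reference. A sketch, assuming demo.py is the stock PyOpenCL example with an element-wise `sum` kernel (the kernel name matches the build log below); `classify`, `run_sum` and `check_all_devices` are hypothetical helpers:

```python
import numpy as np

try:
    import pyopencl as cl
except ImportError:  # keeps the sketch importable where PyOpenCL is missing
    cl = None

# Element-wise sum, as in the PyOpenCL demo (assumed to match demo.py).
KERNEL_SRC = """
__kernel void sum(__global const float *a_g,
                  __global const float *b_g,
                  __global float *res_g)
{
    int gid = get_global_id(0);
    res_g[gid] = a_g[gid] + b_g[gid];
}
"""


def classify(res_np, expected):
    """Label a read-back result; assumes the expected result is not all zeros."""
    if not np.any(res_np):
        return "all zeros"
    if np.allclose(res_np, expected):
        return "ok"
    return "mismatch"


def run_sum(device, a_np, b_np):
    """Run the sum kernel on one device and copy the result back to the host."""
    ctx = cl.Context([device])
    queue = cl.CommandQueue(ctx)
    mf = cl.mem_flags
    a_g = cl.Buffer(ctx, mf.READ_ONLY | mf.COPY_HOST_PTR, hostbuf=a_np)
    b_g = cl.Buffer(ctx, mf.READ_ONLY | mf.COPY_HOST_PTR, hostbuf=b_np)
    res_g = cl.Buffer(ctx, mf.WRITE_ONLY, a_np.nbytes)
    prg = cl.Program(ctx, KERNEL_SRC).build()
    prg.sum(queue, a_np.shape, None, a_g, b_g, res_g).wait()  # wait for the kernel
    res_np = np.empty_like(a_np)
    cl.enqueue_copy(queue, res_np, res_g)  # blocking copy by default
    return res_np


def check_all_devices(n=50000):
    """Print one verdict per device visible on every platform."""
    a_np = np.random.rand(n).astype(np.float32)
    b_np = np.random.rand(n).astype(np.float32)
    expected = a_np + b_np
    for platform in cl.get_platforms():
        for device in platform.get_devices():
            print(device.name, "->", classify(run_sum(device, a_np, b_np), expected))
```

Calling `check_all_devices()` on the server would show whether every card returns zeros or only some of them.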

The compiler says:
/home/me/.local/lib/python2.7/site-packages/pyopencl/__init__.py:59: CompilerWarning: From-source build succeeded, but resulted in non-empty logs:
Build on <pyopencl.Device 'Intel(R) Many Integrated Core Acceleration Card' on 'Intel(R) OpenCL' at 0x1662be8> succeeded, but said:

Compilation started
Compilation done
Linking started
Linking done
Device build started
Device build done
Build started
Kernel <sum> was successfully vectorized (16)
Done.
  warn(text, CompilerWarning)
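The warning itself says the build succeeded; PyOpenCL raises it only because the log is non-empty. The per-device log can also be fetched directly, with the warning silenced. A sketch; `build_with_logs` and `parse_vectorization` are hypothetical names, and the regular expression is written only for the Intel log line quoted above:

```python
import re
import warnings

try:
    import pyopencl as cl
except ImportError:  # keeps the sketch importable where PyOpenCL is missing
    cl = None

# Matches Intel's log line, e.g. "Kernel <sum> was successfully vectorized (16)".
_VEC_RE = re.compile(r"Kernel <(\w+)> was successfully vectorized \((\d+)\)")


def parse_vectorization(log):
    """Map kernel name -> vector width reported in an Intel OpenCL build log."""
    return {name: int(width) for name, width in _VEC_RE.findall(log)}


def build_with_logs(ctx, source):
    """Build without emitting CompilerWarning; return (program, [(device, log)])."""
    with warnings.catch_warnings():
        warnings.simplefilter("ignore", cl.CompilerWarning)
        prg = cl.Program(ctx, source).build()
    logs = [(dev, prg.get_build_info(dev, cl.program_build_info.LOG))
            for dev in prg.get_info(cl.program_info.DEVICES)]
    return prg, logs
```

A width of 16 for a float kernel corresponds to 512-bit vectors, which is consistent with the kernel having been compiled for the Xeon Phi's vector units.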

Am I missing a library? Do I need to install something on the cards for PyOpenCL?
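For reference, the OpenCL and driver versions each device reports can be collected with a short script, which may help pin down a runtime mismatch. A sketch; `describe`, `format_report` and `report_all` are hypothetical helpers, and the fields chosen are just the ones that seem relevant here:

```python
try:
    import pyopencl as cl
except ImportError:  # keeps the sketch importable where PyOpenCL is missing
    cl = None


def describe(device):
    """Collect device properties relevant to a driver/runtime problem."""
    return {
        "name": device.name,
        "accelerator": bool(device.type & cl.device_type.ACCELERATOR),
        "opencl_version": device.version,
        "driver_version": device.driver_version,
        "global_mem_mb": device.global_mem_size // (1024 * 1024),
    }


def format_report(rows):
    """Render one line per device from describe()-style dicts."""
    return "\n".join(
        "%(name)s | accel=%(accelerator)s | %(opencl_version)s | "
        "driver %(driver_version)s | %(global_mem_mb)d MB" % row
        for row in rows
    )


def report_all():
    """Describe every device on every platform."""
    return format_report(describe(d)
                         for p in cl.get_platforms() for d in p.get_devices())
```

Running `print(report_all())` on the server gives one line per CPU and Xeon Phi device.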

Any help is very much appreciated!
Sven