I am trying to use FP16 (half-precision) with pycuda. However, I have encountered an issue when trying to use it with element-wise kernels.
If I try a very simple kernel:
import numpy as np
import pycuda.autoinit  # initializes a CUDA context
import pycuda.gpuarray as cua
from pycuda.elementwise import ElementwiseKernel as CU_ElK

cu_options = ['-use_fast_math', '-D__CUDA_NO_HALF_OPERATORS__', '-D__CUDA_NO_HALF2_OPERATORS__']
testk = CU_ElK(name='testk', operation="d[i] *= 2", preamble='#include <cuda_fp16.h>',
               options=cu_options, arguments="float *d")
cu_d = cua.empty(128, dtype=np.float32)
testk(cu_d)  # compilation is triggered at the first call
(the kernel does not even use half-precision; merely including the fp16 header is enough to trigger the issue)
This works on macOS (it only requires -D__CUDA_NO_HALF_OPERATORS__ to avoid duplicate operator definitions), but on Debian 9 and Ubuntu 20.04 it fails with a series of errors like:
/usr/include/c++/8/bits/stl_pair.h(446): error: this declaration may not have extern "C" linkage
which come from cuda_fp16.h pulling in STL headers (std::move, etc.).
This happens because the kernel is compiled inside an 'extern "C"' block, which is necessary to avoid C++ name mangling so that the element-wise kernel function can still be looked up by name.
The workaround is to include the cuda_fp16.h header _before_ the 'extern "C"' block - I've tested this and it runs without a hitch.
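To make the workaround concrete, here is a minimal sketch (the helper name is mine, not a pycuda API) of assembling kernel source with the C++ header placed before the extern "C" block; the result could then be compiled with SourceModule(..., no_extern_c=True):

```python
# Hypothetical helper (not part of pycuda): put C++ headers before the
# extern "C" block so STL templates keep C++ linkage, while the kernel
# itself keeps an unmangled name that can be retrieved with get_function().
def build_source(cpp_preamble, kernel_body):
    return cpp_preamble + '\nextern "C" {\n' + kernel_body + '\n}\n'

src = build_source('#include <cuda_fp16.h>',
                   '__global__ void testk(float *d)\n{ d[threadIdx.x] *= 2; }')
# src would then be passed to SourceModule(src, no_extern_c=True, options=cu_options)
```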
So my question is how to proceed: as much as possible I'd like to use pycuda directly, without having to maintain a derived version of SourceModule and the element-wise code.
I see two options:
1) if there is a way to have an element-wise kernel with no_extern_c=True - but then I don't know how to resolve the name-mangling issue to access the kernel function?
2) add a 'cpp_preamble' option to SourceModule and ElementwiseKernel (and the other kernel generators) to insert a preamble before the 'extern "C"' block
I could propose a PR for 2), but I'd like to know whether that would be acceptable in pycuda. Note that it also removes the need for -D__CUDA_NO_HALF_OPERATORS__.
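For reference on option 1: under the Itanium C++ ABI, a kernel compiled without extern "C" gets a mangled name - e.g. void testk(float *) becomes _Z5testkPf - and that is the name get_function() would have to be called with. A toy sketch of the scheme for this simple case (illustration only; real compilers handle far more cases):

```python
# Grossly simplified Itanium-ABI name mangling, for illustration only:
# '_Z' + <name length><name> + one code per parameter type.
def mangle(name, params):
    codes = {'float*': 'Pf', 'double*': 'Pd', 'int': 'i'}
    return '_Z' + str(len(name)) + name + ''.join(codes[p] for p in params)

print(mangle('testk', ['float*']))  # -> _Z5testkPf
```

This is part of why option 2) is attractive: keeping the extern "C" block avoids having to know or compute mangled names at all.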
Vernon Perry <vernonperry(a)protonmail.com> writes:
> My CUDA install was just via apt; do you suggest doing it the
> old-fashioned way from Nvidia itself?
Please keep the list cc'd for archival.
Via apt from Ubuntu's package sources? Or from some other source (check
your /etc/apt/sources.list*)? If it was from Ubuntu, then that's a bug
in their packages... An inconsistent software state is exactly what apt
dependencies are designed to prevent. Maybe reboot to activate a newly
installed driver.
I've installed PyCUDA using several different methods, including pip, apt, and compiling from source, but it appears there is still a conflict with the version of CUDA that I am running:
$ nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2019 NVIDIA Corporation
Built on Sun_Jul_28_19:07:16_PDT_2019
Cuda compilation tools, release 10.1, V10.1.243
Python 3.8.2 (default, Apr 27 2020, 15:53:34)
[GCC 9.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import pycuda.driver as cuda
>>> import pycuda.autoinit
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/home/gp/.local/lib/python3.8/site-packages/pycuda/autoinit.py", line 5, in <module>
pycuda._driver.LogicError: cuInit failed: system has unsupported display driver / cuda driver combination
(2019, 1, 2)
Any ideas? I can't seem to find the version requirements anywhere on github or elsewhere. I am running a clean install of Ubuntu 20.04 LTS.
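In case it helps narrow things down, the driver/toolkit mismatch that the cuInit error suggests can be sanity-checked by comparing the toolkit release reported by `nvcc --version` with the CUDA version the display driver supports (shown in the `nvidia-smi` banner). A hedged sketch - the regexes assume the usual output formats of those two tools, and the sample strings are illustrative:

```python
import re

# Parse the toolkit release from `nvcc --version` output.
def toolkit_release(nvcc_output):
    m = re.search(r'release (\d+)\.(\d+)', nvcc_output)
    return (int(m.group(1)), int(m.group(2)))

# Parse the driver's supported CUDA version from the `nvidia-smi` banner.
def driver_cuda_version(smi_output):
    m = re.search(r'CUDA Version: (\d+)\.(\d+)', smi_output)
    return (int(m.group(1)), int(m.group(2)))

nvcc_out = "Cuda compilation tools, release 10.1, V10.1.243"   # from the message above
smi_out = "| NVIDIA-SMI 418.87  Driver Version: 418.87  CUDA Version: 10.1 |"  # example
# The driver must support a CUDA version at least as new as the toolkit.
ok = driver_cuda_version(smi_out) >= toolkit_release(nvcc_out)
```

If the driver's version is older than the toolkit's, cuInit fails exactly as shown above, and updating (or rebooting into) a newer driver is the fix.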
Thanks in advance