I am trying to use FP16 (half precision) with PyCUDA. However, I have encountered an issue when trying to use it in element-wise kernels.
If I try a very simple kernel:
import numpy as np
from pycuda.elementwise import ElementwiseKernel as CU_ElK
import pycuda.gpuarray as cua
cu_options = ['-use_fast_math', '-D__CUDA_NO_HALF_OPERATORS__', '-D__CUDA_NO_HALF2_OPERATORS__']
testk = CU_ElK(name='testk', operation="d[i] *= 2", preamble='#include <cuda_fp16.h>',
               options=cu_options, arguments="float *d")
cu_d = cua.empty(128, dtype=np.float32)
testk(cu_d)  # compilation is triggered by the first call
(the kernel does not even use half precision; merely including the fp16 header is enough to trigger the issue)
This works on macOS (it only requires -D__CUDA_NO_HALF_OPERATORS__ to avoid duplicate-symbol linkage errors), but on Debian 9 and Ubuntu 20.04 it fails with a batch of errors like:
/usr/include/c++/8/bits/stl_pair.h(446): error: this declaration may not have extern "C" linkage
which come from cuda_fp16.h pulling in STL headers (std::move, etc.).
This is due to the kernel being compiled inside an 'extern "C"' block, which is necessary to avoid C++ name mangling and still be able to look up the element-wise kernel function by name.
The workaround is to include the cuda_fp16.h header _before_ the 'extern "C"' block - I've tested this and it runs without a hitch.
So my question is how to proceed - I'd like, as much as possible, to use PyCUDA directly without having to write derived versions of SourceModule and the element-wise code.
I see two options:
1) if there is a way to have an element-wise kernel with no_extern_c=True - but then how do I resolve the name-mangling issue to access the kernel function?
2) add a 'cpp_preamble' option to SourceModule and ElementwiseKernel (and others) that is emitted before the 'extern "C"' block
I could propose a PR for 2), but I'd like to know whether that would be acceptable in PyCUDA. Note that it also removes the need for -D__CUDA_NO_HALF_OPERATORS__.
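For reference, option 1) can be approximated today by assembling the module source by hand and passing no_extern_c=True, wrapping only the kernel itself in extern "C" so its symbol stays unmangled while the STL-using header sits outside the block. A minimal sketch (the SourceModule call needs a working CUDA setup, so it is left commented; the kernel shown is a hypothetical stand-in):

```python
# Sketch: place cuda_fp16.h outside the extern "C" region, but keep the
# kernel itself inside one so its symbol name is not mangled and can be
# retrieved with get_function().
preamble = '#include <cuda_fp16.h>\n'
kernel = '''
extern "C" __global__ void testk(float *d, unsigned n)
{
    unsigned i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        d[i] *= 2;
}
'''
source = preamble + kernel

# With a working CUDA install, this compiles cleanly even though the
# header is processed before the extern "C" block:
# from pycuda.compiler import SourceModule
# mod = SourceModule(source, no_extern_c=True)
# testk = mod.get_function("testk")
```

The key point is simply the ordering: the header must be emitted before the extern "C" region, which is exactly what a 'cpp_preamble' option would automate.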
ESRF-The European Synchrotron http://www.esrf.eu
X-Ray NanoProbe (XNP) group
71, Avenue des Martyrs
Tel: +33 4 76 88 28 11
On leave from Univ. Grenoble Alpes
Co-editor, J. Synchrotron Radiation http://journals.iucr.org/s/
Director, HERCULES school http://hercules-school.eu
Vernon Perry <vernonperry(a)protonmail.com> writes:
> My CUDA install was just via apt; do you suggest doing it the
> old-fashioned way from Nvidia itself?
Please keep the list cc'd for archival.
Via apt from Ubuntu's package sources? Or from some other sources (check
your /etc/apt/sources.list*)? If it was from Ubuntu, then that's a bug
in their packages... Inconsistent software state is exactly what apt
dependencies are designed to prevent. Maybe reboot to activate a newly
installed driver?
I've installed PyCUDA using several different methods, including pip, apt, and compiling from source, but it would appear there is still a conflict with the version of CUDA that I am running:
$ nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2019 NVIDIA Corporation
Built on Sun_Jul_28_19:07:16_PDT_2019
Cuda compilation tools, release 10.1, V10.1.243
Python 3.8.2 (default, Apr 27 2020, 15:53:34)
[GCC 9.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import pycuda.driver as cuda
>>> import pycuda.autoinit
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/home/gp/.local/lib/python3.8/site-packages/pycuda/autoinit.py", line 5, in <module>
pycuda._driver.LogicError: cuInit failed: system has unsupported display driver / cuda driver combination
(2019, 1, 2)
Any ideas? I can't seem to find the version requirements anywhere on GitHub or elsewhere. I am running a clean install of Ubuntu 20.04 LTS.
Thanks in advance
I need to access the tridiagonal solving routines gtsv2StridedBatch and gtsvInterleavedBatch from the cuSPARSE library in a Python/C program. Is there a way to access/link to the cuSPARSE library using PyCUDA?
For background, I’m hoping to port a Python (with inner loops in C) application to leverage GPU computing. The most intensive part of the computation is solving tridiagonal systems, so I was excited to see that the standard cuSPARSE library includes routines for this. But I’m struggling to see how to access cuSPARSE from any of the existing Python-to-CUDA interfaces. For PyCUDA itself, I couldn’t find an example of calling into cuSPARSE in the documentation. Pyculib has bindings for an older version of cuSPARSE, but it’s not maintained and I couldn’t get it installed easily: https://pyculib.readthedocs.io/en/latest/cusparse.html. CuPy seems to support only a very small fraction of cuSPARSE: https://docs-cupy.chainer.org/en/stable/reference/sparse.html .
I’ll need to write my own kernels for part of the project, so PyCUDA seems ideal. Any advice you can offer would be very appreciated.
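One route that avoids a wrapper library entirely is loading libcusparse with ctypes and passing PyCUDA device pointers (e.g. int(gpuarray.gpudata)) as the array arguments. A minimal sketch, assuming the CUDA 10-era cusparseDgtsv2StridedBatch C signature; this only declares the prototype and degrades gracefully when CUDA is not installed:

```python
import ctypes
import ctypes.util

# Success status code for cusparseStatus_t, per the cuSPARSE documentation.
CUSPARSE_STATUS_SUCCESS = 0

def load_cusparse():
    """Try to load libcusparse; return None when CUDA is not available."""
    name = ctypes.util.find_library("cusparse") or "libcusparse.so"
    try:
        lib = ctypes.CDLL(name)
    except OSError:
        return None
    # Declare the C prototype of the double-precision strided-batch
    # tridiagonal solver: all array arguments are *device* pointers.
    lib.cusparseDgtsv2StridedBatch.restype = ctypes.c_int
    lib.cusparseDgtsv2StridedBatch.argtypes = [
        ctypes.c_void_p,   # cusparseHandle_t (from cusparseCreate)
        ctypes.c_int,      # m: size of each tridiagonal system
        ctypes.c_void_p,   # dl: lower diagonals (device pointer)
        ctypes.c_void_p,   # d:  main diagonals (device pointer)
        ctypes.c_void_p,   # du: upper diagonals (device pointer)
        ctypes.c_void_p,   # x:  right-hand sides, overwritten with solution
        ctypes.c_int,      # batchCount: number of systems
        ctypes.c_int,      # batchStride: distance between systems
        ctypes.c_void_p,   # pBuffer: workspace (device pointer)
    ]
    return lib
```

In a real program you would also call cusparseCreate to obtain the handle and the corresponding _bufferSizeExt routine to size pBuffer before invoking the solver; check every returned status against CUSPARSE_STATUS_SUCCESS.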
Associate Professor and Associate Department Head
Department of Molecular and Cellular Biology, University of Arizona
phone: (520) 626-0569, office: LSS 325, web: http://gutengroup.mcb.arizona.edu
I was wondering if there are any binaries for OpenGL-enabled PyCUDA for Windows 10. My understanding (thanks, Andreas) is that I will probably need to build it on my own. Since I never did that, I went online and found some resources here (https://stackoverflow.com/questions/19634073/pip-install-pycuda-on-windows) with the source code from here (https://files.pythonhosted.org/packages/5e/3f/5658c38579b41866ba21ee1b5020b…). Specifically:
1. Downloaded the source code from pythonhosted and untarred it.
2. In the main folder I ran:
>> python configure.py
3. Then I went to siteconf.py and enabled OpenGL on line 9:
CUDA_ENABLE_GL = True
4. Finally I ran the commands:
>> python setup.py build
>> python setup.py install
After that, I tried to import PyCUDA in an IPython window and got:
ModuleNotFoundError: No module named 'pycuda._driver'
And obviously no pycuda.gl either.
I actually have a MSVC working and the CUDA toolkit successfully installed. I was running PyCUDA successfully on miniconda till I ran the stackoverflow suggestion. Now PyCUDA is broken and I can't recover from that yet. When I do recover, I would like to try something else if possible and if any of you can help me, that would be great.
My application involves collecting some data from a digitizer, uploading and doing some number crunching using a Titan V with PyCUDA and displaying the result through the OpenGL interoperability capacity. So far, I have been pulling the result out of the Titan V, building the image in the CPU and displaying it. For a 128x128 pixel image, I get something like 5 Hz refresh rate and I hope to speed it up to 20 Hz or higher by using the interoperability module. I am using Python 3.7.1 and VS 2017 version 15.9.8.
Thanks for any insights you may have.
Cordially,
Fabio.
"thierry.moudiki" <thierry.moudiki(a)protonmail.com> writes:
> Hi Andreas,
> I'm interested in using your package PyCUDA, and I have one question about it (just to make sure that I understand how it works). In the example presented here: https://documen.tician.de/pycuda/index.html, when you call the sourced function `multiply_them`:
> - The option `block` has 3 elements in case it's a 1D, 2D or 3D block,
> and each tuple element is the number of threads per block. E.g: if I
> use threadIdx.x and threadIdx.y with blocks of 400 threads each, will
> I have `block = (400, 400, 1)`?
> - For the option `grid`, I'm not sure. Is it: we can have 1D or 2D
> grids? What about this tuple's elements? Tuple element == Number of
> blocks per grid?
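To make the question concrete: `block` is always a 3-tuple of threads per block along x, y, and z, and its product may not exceed the device limit (1024 threads per block on current GPUs), so block=(400, 400, 1) would be rejected; `grid` is a 2- or 3-tuple counting blocks per grid along each axis. A sketch of a hypothetical helper (not part of PyCUDA) that picks a valid 1D launch configuration:

```python
import math

def launch_dims_1d(n, threads_per_block=256):
    """Pick block/grid tuples for an n-element 1D kernel launch.

    block: (threads_x, threads_y, threads_z) -- threads per block,
           whose product must stay within the device limit (typically 1024).
    grid:  (blocks_x, blocks_y) -- blocks per grid; enough blocks are
           launched so that blocks_x * threads_x >= n.
    """
    block = (threads_per_block, 1, 1)
    grid = (math.ceil(n / threads_per_block), 1)
    return block, grid
```

For example, launch_dims_1d(400) gives block=(256, 1, 1) and grid=(2, 1): two blocks of 256 threads cover 400 elements, with the kernel expected to bounds-check the extra threads.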
In the future, please address requests like this to the mailing
list. I've cc'd them for archival.
Please make sure the list stays cc'd for archival.
"Guralnik,Dan" <danguralnik(a)ufl.edu> writes:
> Andreas, I'm so sorry, should've done it myself so you have more info. Here is what happens:
> C:\Users\danguralnik\Documents\GitHub\kodlab-uma-sims\mice\smooth>python cuda_test.py
> Traceback (most recent call last):
> File "cuda_test.py", line 23, in <module>
> File "C:\ProgramData\Anaconda3\lib\site-packages\pycuda\compiler.py", line 291, in __init__
> arch, code, cache_dir, include_dirs)
> File "C:\ProgramData\Anaconda3\lib\site-packages\pycuda\compiler.py", line 254, in compile
> return compile_plain(source, options, keep, nvcc, cache_dir, target)
> File "C:\ProgramData\Anaconda3\lib\site-packages\pycuda\compiler.py", line 137, in compile_plain
> stderr=stderr.decode("utf-8", "replace"))
> pycuda.driver.CompileError: nvcc compilation of C:\Users\DANGUR~1\AppData\Local\Temp\tmppoj82k3c\kernel.cu failed
> [command: nvcc --cubin -arch sm_75 -m64 -Ic:\programdata\anaconda3\lib\site-packages\pycuda\cuda kernel.cu]
> nvcc fatal : Cannot find compiler 'cl.exe' in PATH
> Is this something about environment variables?
Yes. The Visual Studio compilers also need to be on your PATH
environment variable. VS installs a batch file vcvars.bat that should
arrange for that.
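For example, in the same command prompt used to launch Python, one can call the vcvars batch file first; the path below is only an example for a VS 2017 Community install and varies by Visual Studio version and edition:

```shell
REM Put cl.exe on PATH for this shell session (example path; adjust to
REM your Visual Studio version/edition):
call "C:\Program Files (x86)\Microsoft Visual Studio\2017\Community\VC\Auxiliary\Build\vcvars64.bat"

REM Verify that nvcc can now find the host compiler:
where cl
nvcc --version
```

Any Python process started from this shell (and hence PyCUDA's nvcc invocations) inherits the updated PATH.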
I have just installed pycuda on a new machine running anaconda3 and cuda 10.1.243_426 for Windows 10, and tested the installation by running a program that had been run successfully on another machine, but with anaconda2.
My program breaks on a call to gpuarray.zeros, following a call to gpuarray.to_gpu (no other PyCUDA calls are made before that). Both calls are trying to establish 2D arrays of dtype=np.float32.
The relevant imports are:
import pycuda.driver as cuda
import pycuda.gpuarray as gpuarray
from pycuda.compiler import SourceModule
I've attached the error trace from python below. I'll be grateful if you could offer an explanation/fix.
Traceback (most recent call last):
File "C:\ProgramData\Anaconda3\lib\site-packages\pycuda\tools.py", line 428, in context_dependent_memoize
KeyError: <pycuda._driver.Context object at 0x000001A0CCE991B0>
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "mouse_base.py", line 832, in <module>
File "mouse_base.py", line 812, in main
File "mouse_base.py", line 587, in __init__
File "mouse_base.py", line 641, in __init__
File "C:\ProgramData\Anaconda3\lib\site-packages\pycuda\gpuarray.py", line 1068, in zeros
File "C:\ProgramData\Anaconda3\lib\site-packages\pycuda\gpuarray.py", line 549, in fill
func = elementwise.get_fill_kernel(self.dtype)
File "<C:\ProgramData\Anaconda3\lib\site-packages\decorator.py:decorator-gen-13>", line 2, in get_fill_kernel
File "C:\ProgramData\Anaconda3\lib\site-packages\pycuda\tools.py", line 432, in context_dependent_memoize
result = func(*args)
File "C:\ProgramData\Anaconda3\lib\site-packages\pycuda\elementwise.py", line 496, in get_fill_kernel
File "C:\ProgramData\Anaconda3\lib\site-packages\pycuda\elementwise.py", line 161, in get_elwise_kernel
arguments, operation, name, keep, options, **kwargs)
File "C:\ProgramData\Anaconda3\lib\site-packages\pycuda\elementwise.py", line 147, in get_elwise_kernel_and_types
keep, options, **kwargs)
File "C:\ProgramData\Anaconda3\lib\site-packages\pycuda\elementwise.py", line 75, in get_elwise_module
File "C:\ProgramData\Anaconda3\lib\site-packages\pycuda\compiler.py", line 291, in __init__
arch, code, cache_dir, include_dirs)
File "C:\ProgramData\Anaconda3\lib\site-packages\pycuda\compiler.py", line 254, in compile
return compile_plain(source, options, keep, nvcc, cache_dir, target)
File "C:\ProgramData\Anaconda3\lib\site-packages\pycuda\compiler.py", line 78, in compile_plain
checksum.update(preprocess_source(source, options, nvcc).encode("utf-8"))
File "C:\ProgramData\Anaconda3\lib\site-packages\pycuda\compiler.py", line 55, in preprocess_source
pycuda.driver.CompileError: nvcc preprocessing of C:\Users\DANGUR~1\AppData\Local\Temp\tmphnd7bq1w.cu failed
[command: nvcc --preprocess -arch sm_75 -m64 -Ic:\programdata\anaconda3\lib\site-packages\pycuda\cuda C:\Users\DANGUR~1\AppData\Local\Temp\tmphnd7bq1w.cu --compiler-options -EP]
I have a PyCUDA code which uses two kernels. Both kernels run well
separately, but when I put them together, there is a memory problem:
"LogicError: cuMemcpyDtoH failed: an illegal memory access was encountered".
In the second kernel, "DotKernel", I can't change the values of any shared
or global array. Could you please have a look at the code? Thank you
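For context, a common cause of this error in reduction-style kernels is an unguarded global read/write or a missing __syncthreads() around the shared-memory phase. The actual DotKernel is not shown in the message, so the following is only a hypothetical sketch of the usual safe pattern for a shared-memory dot product, written as a PyCUDA kernel source string:

```python
# Hypothetical kernel source illustrating the usual safe pattern:
# bounds-checked global accesses, and a barrier between the shared-memory
# write and each step of the tree reduction.
dot_kernel_source = """
__global__ void DotKernel(const float *a, const float *b,
                          float *partial, unsigned n)
{
    __shared__ float cache[256];            // one slot per thread in the block
    unsigned i = blockIdx.x * blockDim.x + threadIdx.x;

    cache[threadIdx.x] = (i < n) ? a[i] * b[i] : 0.0f;  // guard global reads
    __syncthreads();                        // all writes done before reducing

    for (unsigned s = blockDim.x / 2; s > 0; s >>= 1) {
        if (threadIdx.x < s)
            cache[threadIdx.x] += cache[threadIdx.x + s];
        __syncthreads();                    // barrier on every reduction step
    }
    if (threadIdx.x == 0)
        partial[blockIdx.x] = cache[0];     // one partial sum per block
}
"""
```

If the real DotKernel writes to shared or global arrays without such guards and barriers, an out-of-bounds index in either kernel can surface later as cuMemcpyDtoH failing, since the error is only reported at the next synchronizing call.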