Hi Andreas,
On 29/09/14 16:17, Andreas Kloeckner wrote:
> GPUArrays don't actually care who "owns" the data, so if you're OK with
> building a GPUArray as a 'descriptor' structure (which is quick and
> lightweight) without moving any data around, then that would likely be a
> reasonable way of going about this.
>
> How does that sound?
That sounds like exactly what I am looking for. Do you have an API
reference for wrapping an allocation with a GPUArray structure?
Regards, Freddie.
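[For reference, the GPUArray constructor accepts a `gpudata` argument, so a
descriptor can be built around an existing allocation without copying any
data. A minimal, untested sketch (requires a CUDA device; the size `n` and
the allocation `dev_ptr` below are placeholders standing in for your own):]

```python
# Sketch: wrap an existing device allocation in a GPUArray "descriptor"
# without moving any data. `n` and `dev_ptr` are placeholder names.
import numpy as np
import pycuda.autoinit  # noqa: F401 -- creates a context
import pycuda.driver as drv
import pycuda.gpuarray as gpuarray

n = 1024
# Stands in for an allocation you already hold.
dev_ptr = drv.mem_alloc(n * np.dtype(np.float32).itemsize)

# Build the descriptor; `gpudata` points at the existing memory.
x_gpu = gpuarray.GPUArray((n,), np.float32, gpudata=dev_ptr)
```

Note that the GPUArray does not take ownership of the memory, so `dev_ptr`
must be kept alive for as long as the array is in use.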
Hi all,
In my application I need to perform a (relatively) simple reduction of
the form: sum(f(x[i], y[i])) over two device allocations x, y. If
possible I would very much like to use the rather nice reduction code
already in the pycuda.reduction module.
However, the module only appears to work with GPUArrays, not raw
device allocations. Is anyone aware of any simple workarounds for this
-- other than creating actual GPUArrays for my data?
Regards, Freddie.
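[For what it's worth, once the allocations are wrapped as GPUArrays, a
reduction of this shape fits pycuda.reduction.ReductionKernel directly. An
untested sketch, with f(a, b) = a*b used as a stand-in for the real f
(requires a CUDA device):]

```python
# Sketch: sum(f(x[i], y[i])) via ReductionKernel, here with
# f(a, b) = a*b (i.e. a dot product) as a stand-in.
import numpy as np
import pycuda.autoinit  # noqa: F401
import pycuda.gpuarray as gpuarray
from pycuda.reduction import ReductionKernel

krnl = ReductionKernel(np.float32, neutral="0",
                       reduce_expr="a+b",
                       map_expr="x[i]*y[i]",
                       arguments="const float *x, const float *y")

x = gpuarray.to_gpu(np.random.rand(1000).astype(np.float32))
y = gpuarray.to_gpu(np.random.rand(1000).astype(np.float32))
result = krnl(x, y).get()  # scalar sum back on the host
```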
On Thu, 2014-09-18 at 22:30 +1200, Chris O'Halloran wrote:
[ cut ]
> However, I think I've determined that the opengl part of the
> python-pycuda package hasn't been enabled properly.
>
>
> Normally when you install pycuda from the tar package and run
> configure you need to pass the --cuda-enable-gl option. I note in
> pycuda 2014 that has changed to --no-cuda-enable-gl
>
>
Thanks for trying PyCUDA packages.
> I find on my machine I can
>
>
> import pycuda.driver as cuda_drv
>
> import pycuda.compiler as Source_module
>
>
> etc
>
>
> but I cannot
>
> import pycuda.gl
>
Sorry - it's my fault.
OpenGL is enabled, but the Python files from the pycuda/gl/ package
are not installed when the package is built. I'll look into it
and let you know when it's fixed.
Best regards.
--
Tomasz Rybak GPG/PGP key ID: 2AD5 9860
Fingerprint A481 824E 7DD3 9C0E C40A 488E C654 FB33 2AD5 9860
http://member.acm.org/~tomaszrybak
[sending to pycuda(a)tiker.net again, I think I replied to the wrong
address last time so I'm not sure they ended up on the list]
On 2014-09-25 00:53, Andreas Kloeckner wrote:
> Thomas Unterthiner <thomas_unterthiner(a)web.de> writes:
>> The C library uses the runtime API, thus it does not do any explicit
>> context management. It calls cudaDeviceReset() before it returns to
>> Python (which as far as I understand should undo any implicit context
>> allocations it did before, according to
>> http://developer.download.nvidia.com/compute/cuda/4_1/rel/toolkit/docs/onli…
>> ).
>>
>> If needed, I can see if I can provide a minimal C library that exhibits
>> the behavior. I just wanted to make sure I didn't have any mistakes in
>> my PyCUDA code beforehand.
>
> I seem to remember that what Nvidia hacked together in terms of their runtime/driver API
> interoperability requires that the runtime API (in your case the C
> library) be in charge of managing CUDA contexts.
>
> Andreas
>
From what little I found by googling around, mixing runtime-API and
driver-API context management seems to be quite a mess. I found no clear
indication of how to
handle this within the C library. Do you by any chance have any
pointers? (I have forgotten to point this out before: I do have the
source code to the C library and can modify it if there's something I
can do from there. But I'd rather not rewrite the whole thing using the
driver API if it can be avoided).
Ideally, what I'd want is to shut down PyCUDA completely before calling
the C library, and re-initialize it from the ground up again
afterwards. But for some reason this doesn't seem to work the way I
envisioned. So either
1) I forgot some steps when shutting down PyCUDA
2) I forgot some steps when re-initializing PyCUDA
3) the C library doesn't clean up properly before exiting
4) as both PyCUDA and the C library operate within the same
process/thread, I can't avoid some sort of co-dependence between the two
I was hoping it would be one of the cases 1-3, as these are probably
easiest to remedy. Can you confirm that I did 1 and 2 correctly (the
code I used for these two steps is included in the first email to the
list)?
Cheers
Thomas
Hi!
I have a program that makes extensive use of pycuda, but also calls out
to a C library which also uses CUDA internally (it does not share any
state or memory with the pycuda code, and uses the CUDA runtime API).
However, after the call to the C library ends, all my PyCUDA calls fail
with a "LogicError: cuFuncSetBlockShape failed: invalid handle". (The
same call with the same parameters worked fine before calling out to the
C library).
I have tried explicitly initializing/shutting down the PyCUDA contexts,
but I still can't get stuff to work. The relevant parts of my program
look as follows:
# initialize PyCUDA
def init_pycuda(gpu_id):
    import pycuda.driver as pycuda_drv
    global __pycuda_context, __pycuda_device
    pycuda_drv.init()
    __pycuda_device = pycuda_drv.Device(gpu_id)
    __pycuda_context = __pycuda_device.make_context()
    import scikits.cuda.misc
    scikits.cuda.misc.init()
init_pycuda(0)
use_pycuda()
# trying to shut down PyCUDA
import scikits.cuda.misc
from pycuda.tools import clear_context_caches
cuda_memory_pool.free_held()
cuda_hostmemory_pool.free_held()
scikits.cuda.misc.shutdown()
__pycuda_context.pop()
clear_context_caches()
__pycuda_context = None
__pycuda_device = None
# ... now I'm calling out to the other library
call_external_library()
init_pycuda(0)
use_pycuda() # this will now fail with a LogicError
As said before, the C library uses the CUDA runtime API: it calls
cudaSetDevice to initialize and cudaDeviceReset at the end. Is
there something I'm overlooking w.r.t. how to (de)initialize PyCUDA?
Cheers
Thomas
Hello all,
Following up on the above post.
I've had pycuda working successfully on an Ubuntu 12.04 laptop with an
Optimus setup, switching between the Nvidia and Intel graphics cards.
I then upgraded to 14.04 but have subsequently found Bumblebee to be a bit
broken, and am now using nvidia-prime as the nvidia driver.
Trying to keep everything tidy, I thought I'd install python-pycuda using
apt-get and it was very pleasing to see all the cuda toolkit automagically
install. Great work. I'm not too fussy about being on the latest cuda
toolkit since I'm a bit of a hobbyist.
However, I think I've determined that the opengl part of the python-pycuda
package hasn't been enabled properly.
Normally when you install pycuda from the tar package and run configure you
need to pass the --cuda-enable-gl option. I note in pycuda 2014 that has
changed to --no-cuda-enable-gl
I find on my machine I can
import pycuda.driver as cuda_drv
import pycuda.compiler as Source_module
etc
but I cannot
import pycuda.gl
I've downloaded the Ubuntu source deb package with a view to recompiling to
enable GL, but I get so many compilation errors I thought it best to
describe the issue on this mailing list. Reading debian/rules, it seems
as though --cuda-enable-gl is selected, so I'm not really sure how to
proceed. A lot of modifications have been made to the Ubuntu source
package.
I should add that code that doesn't rely on pycuda.gl compiles and runs
fine.
Thanks for all the packaging work.
Cheers and regards,
Chris O'Halloran
Bruce Labitt <bdlabitt(a)gmail.com> writes:
> I'm trying to port an FDTD code to pycuda and have run into a problem. The
> error occurs when using slices.
>
> Known error?
>
> All variables are gpuarray. Fails in main time loop.
>
> In [4]: run fdtd_pycuda.py
> [traceback identical to the original post below; in short:]
>
> RuntimeError: only contiguous arrays may be used as arguments to this
> operation
>
> ey.shape = (111, 110, 111)
> kh_tot = 111
> ke_tot = 110
>
> Shapes are appropriate.
>
> Work arounds for slices? Straight numpy implementation works fine.
None yet. I'd appreciate patches, but for now the linear algebra
functionality in PyCUDA only works on contiguous arrays.
Andreas
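[One possible workaround is to do the index arithmetic inside an
ElementwiseKernel over the full contiguous arrays instead of slicing, so
every kernel argument stays contiguous. An untested sketch with toy shapes
and names, not the FDTD code's (requires a CUDA device):]

```python
# Sketch: out[i, j, k] = ey[i, j, k+1] - ey[i, j, k] without slicing,
# so all kernel arguments stay contiguous. Toy shapes.
import numpy as np
import pycuda.autoinit  # noqa: F401
import pycuda.gpuarray as gpuarray
from pycuda.elementwise import ElementwiseKernel

ni, nj, nk = 8, 8, 8
ey = gpuarray.to_gpu(np.random.rand(ni, nj, nk).astype(np.float32))
out = gpuarray.zeros((ni, nj, nk - 1), np.float32)

diff_k = ElementwiseKernel(
    "float *out, const float *ey, int nj, int nk",
    # Map the flat output index i back to (a, b, c), then read ey at
    # (a, b, c+1) and (a, b, c).
    "int c = i % (nk - 1);"
    "int b = (i / (nk - 1)) % nj;"
    "int a = i / ((nk - 1) * nj);"
    "out[i] = ey[a*nj*nk + b*nk + c + 1] - ey[a*nj*nk + b*nk + c]",
    "diff_k")
diff_k(out, ey, np.int32(nj), np.int32(nk))
```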
I'm trying to port an FDTD code to pycuda and have run into a problem. The
error occurs when using slices.
Known error?
All variables are gpuarray. Fails in main time loop.
In [4]: run fdtd_pycuda.py
---------------------------------------------------------------------------
RuntimeError Traceback (most recent call last)
/usr/lib/python2.7/dist-packages/IPython/utils/py3compat.pyc in
execfile(fname, *where)
202 else:
203 filename = fname
--> 204 __builtin__.execfile(filename, *where)
/home/bruce/FDTD/fdtd_pycuda.py in <module>()
554
555 bx[1:ie_tot,:,:] = D1hx[1:ie_tot,:,:] * bx[1:ie_tot,:,:] -
D2hx[1:ie_tot,:,:] * \
--> 556 ( ( ez[1:ie_tot, 1:jh_tot, :] - ez[1:ie_tot, 0:je_tot,
:]) - ( ey[1:ie_tot,:,1:kh_tot] - ey[1:ie_tot,:,0:ke_tot] ) ) / delta
557 """
558 above line generates RunTimeError: only contiguous arrays
may be used as arguments to this operation
/usr/local/lib/python2.7/dist-packages/pycuda-2014.1-py2.7-linux-x86_64.egg/pycuda/gpuarray.pyc
in __sub__(self, other)
425 if isinstance(other, GPUArray):
426 result = self._new_like_me(_get_common_dtype(self,
other))
--> 427 return self._axpbyz(1, other, -1, result)
428 else:
429 if other == 0:
/usr/local/lib/python2.7/dist-packages/pycuda-2014.1-py2.7-linux-x86_64.egg/pycuda/gpuarray.pyc
in _axpbyz(self, selffac, other, otherfac, out, add_timer, stream)
308 assert self.shape == other.shape
309 if not self.flags.forc or not other.flags.forc:
--> 310 raise RuntimeError("only contiguous arrays may "
311 "be used as arguments to this operation")
312
RuntimeError: only contiguous arrays may be used as arguments to this
operation
Evaluating the following generates the RuntimeError.
In [17]: ( ey[1:ie_tot,:,1:kh_tot] - ey[1:ie_tot,:,0:ke_tot] )
---------------------------------------------------------------------------
RuntimeError Traceback (most recent call last)
<ipython-input-17-1c6dfb98d933> in <module>()
----> 1 ( ey[1:ie_tot,:,1:kh_tot] - ey[1:ie_tot,:,0:ke_tot] )
/usr/local/lib/python2.7/dist-packages/pycuda-2014.1-py2.7-linux-x86_64.egg/pycuda/gpuarray.pyc
in __sub__(self, other)
425 if isinstance(other, GPUArray):
426 result = self._new_like_me(_get_common_dtype(self,
other))
--> 427 return self._axpbyz(1, other, -1, result)
428 else:
429 if other == 0:
/usr/local/lib/python2.7/dist-packages/pycuda-2014.1-py2.7-linux-x86_64.egg/pycuda/gpuarray.pyc
in _axpbyz(self, selffac, other, otherfac, out, add_timer, stream)
308 assert self.shape == other.shape
309 if not self.flags.forc or not other.flags.forc:
--> 310 raise RuntimeError("only contiguous arrays may "
311 "be used as arguments to this operation")
312
RuntimeError: only contiguous arrays may be used as arguments to this
operation
ey.shape = (111, 110, 111)
kh_tot = 111
ke_tot = 110
Shapes are appropriate.
Work arounds for slices? Straight numpy implementation works fine.
Thanks,
Bruce
Hi,
Does anyone have any experience or tips on distributing an application
using PyCUDA for users/computers that have a suitable GPU & driver but
otherwise unprepared for PyCUDA, i.e. not the full C++ compiler + CUDA
SDK toolchain?
Otherwise, I suspect the compiler cache is a place to start: I would
compile all possible kernels ahead of time, persist the cache, and at
runtime load the cache so that no compilation is necessary. Any
information on that would be welcome.
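[One way to sketch the precompile-and-ship idea, using
pycuda.compiler.compile at packaging time and module_from_buffer at
runtime. Untested; the kernel is a toy, and a cubin is specific to the GPU
architecture it was compiled for, so you would need one per target arch:]

```python
# Sketch: precompile a kernel to a cubin at packaging time, then load
# it at runtime without nvcc being present on the user's machine.
import pycuda.autoinit  # noqa: F401
from pycuda.compiler import compile
from pycuda.driver import module_from_buffer

source = """
__global__ void double_it(float *x) { x[threadIdx.x] *= 2.0f; }
"""

# Packaging time (needs the full toolchain). `arch` must match the
# GPUs you intend to ship to; "sm_30" is only an example.
cubin = compile(source, arch="sm_30")
with open("double_it.cubin", "wb") as f:
    f.write(cubin)

# Runtime on the user's machine (no compiler required):
with open("double_it.cubin", "rb") as f:
    mod = module_from_buffer(f.read())
double_it = mod.get_function("double_it")
```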
Thanks,
Marmaduke
Bogdan Opanchuk <mantihor(a)gmail.com> writes:
> P.S. Fixes include corrected invocations of prepare() and prepared_call(),
> of course.
Updated. For future reference--you can't sign up for user accounts, but
you *can* edit anonymously on that wiki.
Andreas