Hello.
I've been packaging PyCUDA for Debian.
I run all the tests to ensure that the package works on Python 2
and Python 3. All tests pass except for one from test_driver.py:
$ python test_driver.py
============================= test session starts ==============================
platform linux2 -- Python 2.7.5 -- pytest-2.3.5
collected 21 items

test_driver.py ........F............

=================================== FAILURES ===================================
_____________________ TestDriver.test_register_host_memory _____________________

args = (<test_driver.TestDriver instance at 0x24e7d88>,), kwargs = {}
pycuda = <module 'pycuda' from '/usr/lib/python2.7/dist-packages/pycuda/__init__.pyc'>
ctx = <pycuda._driver.Context object at 0x2504488>
clear_context_caches = <function clear_context_caches at 0x1dbf848>
collect = <built-in function collect>

    def f(*args, **kwargs):
        import pycuda.driver
        # appears to be idempotent, i.e. no harm in calling it more than once
        pycuda.driver.init()
        ctx = make_default_context()
        try:
            assert isinstance(ctx.get_device().name(), str)
            assert isinstance(ctx.get_device().compute_capability(), tuple)
            assert isinstance(ctx.get_device().get_attributes(), dict)
>           inner_f(*args, **kwargs)

/usr/lib/python2.7/dist-packages/pycuda/tools.py:434:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

self = <test_driver.TestDriver instance at 0x24e7d88>

    @mark_cuda_test
    def test_register_host_memory(self):
        if drv.get_version() < (4,):
            from py.test import skip
            skip("register_host_memory only exists on CUDA 4.0 and later")
        import sys
        if sys.platform == "darwin":
            from py.test import skip
            skip("register_host_memory is not supported on OS X")
        a = drv.aligned_empty((2**20,), np.float64, alignment=4096)
>       drv.register_host_memory(a)
E       LogicError: cuMemHostRegister failed: invalid value

test_driver.py:559: LogicError
===================== 1 failed, 20 passed in 116.85 seconds ====================
This test fails both on ION (GeForce 9400M, CC 1.1) and GeForce 460
(CC 2.1). I've compiled PyCUDA with gcc 4.8, run with kernel 3.9
and drivers 304.88.
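For convenience, the failure boils down to these two calls (the same ones
the test makes):

import numpy as np
import pycuda.autoinit
import pycuda.driver as drv

a = drv.aligned_empty((2**20,), np.float64, alignment=4096)
drv.register_host_memory(a)  # LogicError: cuMemHostRegister failed: invalid value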
Regards.
--
Tomasz Rybak GPG/PGP key ID: 2AD5 9860
Fingerprint A481 824E 7DD3 9C0E C40A 488E C654 FB33 2AD5 9860
http://member.acm.org/~tomaszrybak
Hi,
When using PyCUDA with the CUDA 5.5 release candidate, I get a
segmentation fault when Python exits.
I guess it's a problem somewhere in the cleanup process.
The segfault can be reproduced by running:
$ python -c "import pycuda.autoinit"
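For what it's worth, pycuda.autoinit boils down to roughly the following
(a simplified sketch), so the crash is presumably somewhere in the atexit
teardown:

import atexit
import pycuda.driver as drv

drv.init()                       # initialize the CUDA driver API
device = drv.Device(0)           # grab the first device
context = device.make_context()  # push a fresh context onto this thread

def _finish_up():
    # At interpreter exit, autoinit pops the context and clears caches;
    # the segfault appears to happen around this point.
    context.pop()
    from pycuda.tools import clear_context_caches
    clear_context_caches()

atexit.register(_finish_up)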
Setup: Python 2.7.3rc2, CUDA 5.5, PyCUDA 2012.1, Boost 1.48.0
Anyone got ideas how to fix it?
Best,
Soren
Hello.
For some time I've been working on adding new features from CURAND
into PyCUDA (git@github.com:rybaktomasz/pycuda.git, branch curand-41).
I have added MRG32k3a and Poisson generation to all existing classes.
The branch contains documentation and tests, and all tests pass on my
hardware.
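For a taste, the new generators follow the existing curandom interface; a
minimal sketch (class and method names are from my branch and may still
change before a merge):

import numpy as np
import pycuda.autoinit
from pycuda import curandom

gen = curandom.MRG32k3aRandomNumberGenerator()
uniform = gen.gen_uniform((1000,), np.float32)       # uniform samples on the GPU
poisson = gen.gen_poisson((1000,), np.uint32, 10.0)  # Poisson samples, lambda = 10
print(uniform.get()[:5])
print(poisson.get()[:5])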
I was not able to add Mtgp32; this generator is different from the ones
already existing in CURAND. I do not think I will be able to add
it soon, and at the same time I would not like my additions
to rot in my repository.
IMO it would be best to remove Mtgp32 from the curand-41 branch,
open a pull request, and incorporate this code into PyCUDA.
This way everyone will be able to use the new generators,
and someone else will be able to work on the remaining features.
Any thoughts, remarks, suggestions?
--
Tomasz Rybak GPG/PGP key ID: 2AD5 9860
Fingerprint A481 824E 7DD3 9C0E C40A 488E C654 FB33 2AD5 9860
http://member.acm.org/~tomaszrybak
Hi Ahmed,
That is more or less how I ended up solving the problem.
This would certainly be a useful addition to PyCUDA as a whole, or just
to gpuarray.
Perhaps I'll submit a patch when I get the chance; roughly what I did is
sketched below.
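A minimal untested sketch of that recipe, assuming a float32 source (so
memset_d32 covers exactly one element per 32-bit word) and the shapes from
the numpy example:

import numpy as np
import pycuda.autoinit
import pycuda.driver as drv
import pycuda.gpuarray as gpuarray

y = gpuarray.to_gpu(np.ones((10, 10), dtype=np.float32))

# Allocate the padded 20x20 destination and zero-fill it.
h, w = 20, 20
itemsize = y.dtype.itemsize
x_ptr = drv.mem_alloc(h * w * itemsize)
drv.memset_d32(x_ptr, 0, h * w)  # count is in 32-bit words

# Copy y into the top-left corner with a single strided 2D copy.
copy = drv.Memcpy2D()
copy.set_src_device(y.gpudata)
copy.set_dst_device(x_ptr)
copy.src_pitch = y.shape[1] * itemsize       # bytes per source row
copy.dst_pitch = w * itemsize                # bytes per destination row
copy.width_in_bytes = y.shape[1] * itemsize  # bytes copied per row
copy.height = y.shape[0]                     # number of rows
copy(aligned=True)

# Wrap the raw allocation so it behaves like a normal GPUArray.
x = gpuarray.GPUArray((h, w), np.float32, gpudata=x_ptr)
print(x.get())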
Thanks,
Matthias
On Sun, Jun 16, 2013 at 8:48 PM, Ahmed Fasih <wuzzyview(a)gmail.com> wrote:
> On Fri, Jun 14, 2013 at 2:01 PM, Matthias Lee <matthias.a.lee(a)gmail.com>
> wrote:
> > Hi,
> > I need to add some zero padding around a 2D gpuarray. In numpy I have
> > usually accomplished this by slicing:
> > x = np.zeros((20, 20))
> > y = np.ones((10, 10))
> > x[0:10,0:10]=y
> >
> > I had hoped this would work similarly in pycuda with gpuarrays, but it
> > seems it's not implemented yet.
> > Is there a preferred method of doing this?
>
> Matthias, I don't have an easy or exact answer, but I hope the
> following helps. In the past when I have had to do this (to do 2D
> zero-padding before 2D FFT), I recall doing the following:
>
> - `mem_alloc` to allocate new memory to store all of `x`,
> - `memset_d32` to fill it with zeros,
> - `Memcpy2D` by giving its `set_src_device` method `y`'s pointer,
> viz., `y.gpudata` and a few other things it needs.
>
> This is probably an answer that you were fearing, but I thought it
> might help if you knew that others had successfully done this through
> this long-winded method. Once you figure out the details, it's easily
> put into a reusable and testable function... which might be a valuable
> addition to PyCUDA? (I can't share code due to proprietary software
> requirements :-/.)
>
> Best,
> Ahmed
>
--
Matthias Lee
IDIES/Johns Hopkins University
Performance @ Rational/IBM
Matthias.A.Lee(a)gmail.com
MatthiasLee(a)jhu.edu
(320) 496 6293
To know recursion, you must first know recursion.
On Fri, Jun 14, 2013 at 2:01 PM, Matthias Lee <matthias.a.lee(a)gmail.com> wrote:
> Hi,
> I need to add some zero padding around a 2D gpuarray. In numpy I have
> usually accomplished this by slicing:
> x = np.zeros((20, 20))
> y = np.ones((10, 10))
> x[0:10,0:10]=y
>
> I had hoped this would work similarly in pycuda with gpuarrays, but it seems
> it's not implemented yet.
> Is there a preferred method of doing this?
Matthias, I don't have an easy or exact answer, but I hope the
following helps. In the past when I have had to do this (to do 2D
zero-padding before 2D FFT), I recall doing the following:
- `mem_alloc` to allocate new memory to store all of `x`,
- `memset_d32` to fill it with zeros,
- `Memcpy2D` by giving its `set_src_device` method `y`'s pointer,
viz., `y.gpudata` and a few other things it needs.
This is probably an answer that you were fearing, but I thought it
might help if you knew that others had successfully done this through
this long-winded method. Once you figure out the details, it's easily
put into a reusable and testable function... which might be a valuable
addition to PyCUDA? (I can't share code due to proprietary software
requirements :-/.)
Best,
Ahmed
Hi,
I need to add some zero padding around a 2D gpuarray. In numpy I have
usually accomplished this by slicing:
x = np.zeros((20, 20))
y = np.ones((10, 10))
x[0:10,0:10]=y
I had hoped this would work similarly in pycuda with gpuarrays, but it
seems it's not implemented yet.
Is there a preferred method of doing this?
Thanks,
Matthias
--
Matthias Lee
IDIES/Johns Hopkins University
Performance @ Rational/IBM
Matthias.A.Lee(a)gmail.com
MatthiasLee(a)jhu.edu
(320) 496 6293
To know recursion, you must first know recursion.
On 12/06/2013 22:40, Massimo Becker wrote:
> If you really want a simple benchmark for speed comparison I recommend
> a matrix multiplication example.
>
> The thing you will really see when comparing the runtime of CUDA
> kernels to the runtime of equivalent CPU functions is the cost of
> transferring your data from CPU memory to GPU memory and back.
>
> For small datasets with little computation, you will see that the
> decrease in compute time when using CUDA is not enough to offset the
> overhead of doing the memory transfer. While with larger datasets that
> require intense computation on each piece of data, the decrease in
> compute time greatly outweighs the overhead of doing the memory transfer.
>
> Another interesting benchmark is to look at the runtime of the CUDA
> kernel broken down into time to copy data from CPU memory to GPU
> memory, time for GPU computation, and time to copy data from GPU
> memory back to CPU memory. I haven't tried this with the latest Kepler
> cards, but historically what you will see is a rather large fixed cost
> of doing the memory transfers.
>
> Many of the programs that see the greatest speed improvement are not
> only making use of the GPU for computation, but also acknowledge the
> memory transfer cost and do something clever to compensate for it. The
> fastest speedups are also achieved by making use of the special
> caches/memory types found on the card.
>
> In short,
> your new Kepler hardware is much, much faster than you think, and the
> best results are achieved when the hardware architecture is fully
> utilized in the application.
>
> Regards,
> Max
>
>
>
> On Wed, Jun 12, 2013 at 1:24 PM, Andreas Kloeckner
> <lists(a)informa.tiker.net> wrote:
>
> Pierre Castellani <pcastell12(a)gmail.com> writes:
> > I have bought a Kepler GPU in order to do some numerical calculation on it.
> >
> > I would like to use PyCUDA (it looks to me like the best solution).
> >
> > Unfortunately, when I am running a test like MeasureGpuarraySpeedRandom
> > <http://wiki.tiker.net/PyCuda/Examples/MeasureGpuarraySpeedRandom?action=ful…>
> >
> > I get the following results:
> > Size     |Time GPU       |Size/Time GPU|Time CPU         |Size/Time CPU|GPU vs CPU speedup
> > ---------+---------------+-------------+-----------------+-------------+------------------
> > 1024     |0.0719905126953|14224.0965047|3.09289598465e-05|33108129.2446|0.000429625497701
> > 2048     |0.0727789160156|28140.0179079|5.74035215378e-05|35677253.6795|0.000788738341822
> > 4096     |0.07278515625  |56275.2106478|0.00010898976326 |37581511.1208|0.00149741745261
> > 8192     |0.0722379931641|113402.928863|0.000164551048279|49783942.9508|0.00227790171171
> > 16384    |0.0720771630859|227311.94318 |0.000254381122589|64407294.9802|0.00352928877467
> > 32768    |0.0722085107422|453796.923149|0.00044281665802 |73999022.8609|0.0061324718301
> > 65536    |0.0720480078125|909615.713047|0.000749320983887|87460516.133 |0.0104003012247
> > 131072   |0.0723209472656|1812365.64171|0.00153271682739 |85516122.5202|0.0211932626071
> > 262144   |0.0727287304688|3604407.75345|0.00305026916504 |85941268.0706|0.041940360369
> > 524288   |0.0723101269531|7250547.35888|0.00601688781738 |87136076.9741|0.0832094766101
> > 1048576  |0.0627352734375|16714297.1178|0.0123564978027  |84860291.0582|0.196962524042
> > 2097152  |0.0743136047363|28220297.0431|0.026837512207   |78142563.4322|0.361138613882
> > 4194304  |0.074144744873 |56569133.8905|0.0583531860352  |71877891.9367|0.787017153206
> > 8388608  |0.0736544189453|113891442.226|0.121150952148   |69240958.0877|1.64485653248
> > 16777216 |0.0743454406738|225665701.191|0.242345166016   |69228597.6891|3.2597179305
> > 33554432 |0.0765948486328|438076875.912|0.484589794922   |69242960.4412|6.32666300112
> > 67108864 |0.0805058410645|833589999.343|0.970654882812   |69137718.45  |12.0569497813
> > 134217728|0.0846059753418|1586385919.64|1.94103554688    |69147485.8439|22.9420621774
> > 268435456|0.094531427002 |2839642482.01|3.88270039062    |69136278.6189|41.0731173089
> > 536870912|0.111502416992 |4814881385.37|7.7108625        |69625273.6967|69.1542184286
> >
> > I was not expecting fantastic results, but not this bad either.
>
> I've added a note to the documentation of the function you're
> using to benchmark:
>
> http://documen.tician.de/pycuda/array.html#pycuda.curandom.rand
>
> That should answer your concerns.
>
> I'd like to have a word with whoever came up with the idea that this
> was a valid benchmark. Random number generation is a bad problem to
> use. Parallel RNGs are more complicated than sequential ones. So
> claiming that both do the same amount of work is... mistaken. But even
> neglecting this basic fact, the notion that all RNGs are somehow
> comparable or do comparable amounts of work is also completely off.
> There are subtle tradeoffs in how much work is done and how 'good'
> (uncorrelated, ...) the RN sequence and its subsequences are:
>
> https://www.xkcd.com/221/
>
> If you'd like to assess how viable GPUs and PyCUDA are, I'd suggest
> you use a more well-defined workload, such as "compute 10^8 sines and
> cosines", or, even better, the thing that you'd actually like to do.
>
> Andreas
>
> _______________________________________________
> PyCUDA mailing list
> PyCUDA(a)tiker.net
> http://lists.tiker.net/listinfo/pycuda
>
> --
> Respectfully,
> Massimo 'Max' J. Becker
> Computer Scientist / Software Engineer
> Commercial Pilot - SEL/MEL
> (425)-239-1710
Thanks for all the advice and answers.
I will look closely into the target computation that I should reach in
order to evaluate the performance gain.
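As a first step I will time the transfer and compute phases separately
with CUDA events, along the lines of this sketch (the sine kernel and the
sizes are placeholders, not my real workload):

import numpy as np
import pycuda.autoinit
import pycuda.driver as drv
from pycuda.compiler import SourceModule

mod = SourceModule("""
__global__ void sines(float *a)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    a[i] = sinf(a[i]);
}
""")
sines = mod.get_function("sines")

a = np.random.randn(1 << 24).astype(np.float32)
a_gpu = drv.mem_alloc(a.nbytes)

def timed(fn):
    # Time fn() with a pair of CUDA events; returns milliseconds.
    start, end = drv.Event(), drv.Event()
    start.record()
    fn()
    end.record()
    end.synchronize()
    return start.time_till(end)

t_h2d = timed(lambda: drv.memcpy_htod(a_gpu, a))
t_kern = timed(lambda: sines(a_gpu, block=(256, 1, 1), grid=(len(a) // 256, 1)))
t_d2h = timed(lambda: drv.memcpy_dtoh(a, a_gpu))
print("H2D: %.3f ms  kernel: %.3f ms  D2H: %.3f ms" % (t_h2d, t_kern, t_d2h))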
Thanks again,
Pierre.
Hi all,
I have bought a Kepler GPU in order to do some numerical calculation on it.
I would like to use PyCUDA (it looks to me like the best solution).
Unfortunately, when I am running a test like MeasureGpuarraySpeedRandom
<http://wiki.tiker.net/PyCuda/Examples/MeasureGpuarraySpeedRandom?action=ful…>
I get the following results:
Size     |Time GPU       |Size/Time GPU|Time CPU         |Size/Time CPU|GPU vs CPU speedup
---------+---------------+-------------+-----------------+-------------+------------------
1024     |0.0719905126953|14224.0965047|3.09289598465e-05|33108129.2446|0.000429625497701
2048     |0.0727789160156|28140.0179079|5.74035215378e-05|35677253.6795|0.000788738341822
4096     |0.07278515625  |56275.2106478|0.00010898976326 |37581511.1208|0.00149741745261
8192     |0.0722379931641|113402.928863|0.000164551048279|49783942.9508|0.00227790171171
16384    |0.0720771630859|227311.94318 |0.000254381122589|64407294.9802|0.00352928877467
32768    |0.0722085107422|453796.923149|0.00044281665802 |73999022.8609|0.0061324718301
65536    |0.0720480078125|909615.713047|0.000749320983887|87460516.133 |0.0104003012247
131072   |0.0723209472656|1812365.64171|0.00153271682739 |85516122.5202|0.0211932626071
262144   |0.0727287304688|3604407.75345|0.00305026916504 |85941268.0706|0.041940360369
524288   |0.0723101269531|7250547.35888|0.00601688781738 |87136076.9741|0.0832094766101
1048576  |0.0627352734375|16714297.1178|0.0123564978027  |84860291.0582|0.196962524042
2097152  |0.0743136047363|28220297.0431|0.026837512207   |78142563.4322|0.361138613882
4194304  |0.074144744873 |56569133.8905|0.0583531860352  |71877891.9367|0.787017153206
8388608  |0.0736544189453|113891442.226|0.121150952148   |69240958.0877|1.64485653248
16777216 |0.0743454406738|225665701.191|0.242345166016   |69228597.6891|3.2597179305
33554432 |0.0765948486328|438076875.912|0.484589794922   |69242960.4412|6.32666300112
67108864 |0.0805058410645|833589999.343|0.970654882812   |69137718.45  |12.0569497813
134217728|0.0846059753418|1586385919.64|1.94103554688    |69147485.8439|22.9420621774
268435456|0.094531427002 |2839642482.01|3.88270039062    |69136278.6189|41.0731173089
536870912|0.111502416992 |4814881385.37|7.7108625        |69625273.6967|69.1542184286
I was not expecting fantastic results, but not this bad either.
Up to around 4M numbers the CPU is faster.
Another strange result: the GPU timing is constant up to 16M numbers. I
assume that is, in fact, only the transaction cost, which looks quite
significant at about 70 ms.
I am under Xubuntu 12.04, CUDA 5, i7-2600(a)3.4GHz, mem(a)1.3GHz,
Python 2.7, running Nsight.
I will be happy to get any comments on this.
Many thanks in advance,
Pierre.
Alright, last spam of the day: the problem is solved. I had to reinstall
the developer drivers I had from the original CUDA package I installed.
This solved my low-res mode problem, and the PyCUDA test scripts now run
fine.
Thanks,
CK
On Mon, Jun 10, 2013 at 4:20 PM, Kay, Christina, Danielle <ckay(a)bu.edu> wrote:
> Thanks for the help. I was able to pull up the nvidia version (304.54)
> but I couldn't get anything for cuda grep-ing libcuda1 (or libcuda) so the
> issue may be there. I attempted restarting my computer and the drivers
> fell apart, I'm stuck in low graphics mode and I'm now attempting to fix
> that. When installing pycuda I tried adding the link to nvidia-current for
> the libcuda.so. I thought I had successfully installed the developer
> drivers with CUDA but I either broke that trying to get pycuda to work or
> it never actually worked to begin with.
>
>
> On Mon, Jun 10, 2013 at 3:16 PM, Jerome Kieffer <Jerome.Kieffer(a)esrf.fr> wrote:
>
>> On Mon, 10 Jun 2013 14:54:25 -0400
>> Christina Kay <ckay(a)bu.edu> wrote:
>>
>> > So I'm running Ubuntu 10.04, I've had CUDA running fine for a while and
>> > I'm trying to install pycuda. I got through the installation process,
>> > can open up python and import pycuda, but when I try to run any of the
>> > test scripts I get the following error.
>>
>> > I know the header and driver do not match. I don't know how to make
>> > them match. I'm not the most experienced in troubleshooting linux
>> > installations, so the more specific the help I can get, the better.
>>
>> Most of the time this kind of error goes away by rebooting...
>> If it fails, one should investigate:
>>
>> For the kernel version:
>> cat /proc/driver/nvidia/version
>> NVRM version: NVIDIA UNIX x86_64 Kernel Module 304.88 Wed Mar 27
>> 14:26:46 PDT 2013
>> GCC version: gcc version 4.6.3 (Debian 4.6.3-14)
>>
>> what we are interested in is 304.88
>>
>> For the cuda version, please type:
>> dpkg -l |grep libcuda1
>> ii libcuda1:amd64 304.88-1
>> amd64 NVIDIA CUDA runtime library
>>
>> in my case they match (great!)
>>
>> If CUDA was not installed by your system's package management, this
>> probably means two installations are cohabiting (not correctly).
>>
>> Hope this helps
>> --
>> Jérôme Kieffer
>> Data analysis unit - ESRF
>>
>> _______________________________________________
>> PyCUDA mailing list
>> PyCUDA(a)tiker.net
>> http://lists.tiker.net/listinfo/pycuda
>>
>
>
Thanks for the help. I was able to pull up the nvidia version (304.54) but
I couldn't get anything for cuda grep-ing libcuda1 (or libcuda) so the
issue may be there. I attempted restarting my computer and the drivers
fell apart, I'm stuck in low graphics mode and I'm now attempting to fix
that. When installing pycuda I tried adding the link to nvidia-current for
the libcuda.so. I thought I had successfully installed the developer
drivers with CUDA but I either broke that trying to get pycuda to work or
it never actually worked to begin with.
On Mon, Jun 10, 2013 at 3:16 PM, Jerome Kieffer <Jerome.Kieffer(a)esrf.fr> wrote:
> On Mon, 10 Jun 2013 14:54:25 -0400
> Christina Kay <ckay(a)bu.edu> wrote:
>
> > So I'm running Ubuntu 10.04, I've had CUDA running fine for a while and
> > I'm trying to install pycuda. I got through the installation process,
> > can open up python and import pycuda, but when I try to run any of the
> > test scripts I get the following error.
>
> > I know the header and driver do not match. I don't know how to make
> > them match. I'm not the most experienced in troubleshooting linux
> > installations, so the more specific the help I can get, the better.
>
> Most of the time this kind of error goes away by rebooting...
> If it fails, one should investigate:
>
> For the kernel version:
> cat /proc/driver/nvidia/version
> NVRM version: NVIDIA UNIX x86_64 Kernel Module 304.88 Wed Mar 27
> 14:26:46 PDT 2013
> GCC version: gcc version 4.6.3 (Debian 4.6.3-14)
>
> what we are interested in is 304.88
>
> For the cuda version, please type:
> dpkg -l |grep libcuda1
> ii libcuda1:amd64 304.88-1
> amd64 NVIDIA CUDA runtime library
>
> in my case they match (great!)
>
> If CUDA was not installed by your system's package management, this
> probably means two installations are cohabiting (not correctly).
>
> Hope this helps
> --
> Jérôme Kieffer
> Data analysis unit - ESRF
>
> _______________________________________________
> PyCUDA mailing list
> PyCUDA(a)tiker.net
> http://lists.tiker.net/listinfo/pycuda
>