Re: [PyCuda] An error:
by Andreas Klöckner
And the PyCUDA version?
Please keep the list cc'd.
Andreas
On Wednesday 18 March 2009, William King wrote:
> python dump_properties.py
> 1 device(s) found.
> Device #0: GeForce 8600 GT
> Compute Capability: 1.1
> Total Memory: 261440 KB
> CLOCK_RATE: 1242000
> GPU_OVERLAP: 1
> MAX_BLOCK_DIM_X: 512
> MAX_BLOCK_DIM_Y: 512
> MAX_BLOCK_DIM_Z: 64
> MAX_GRID_DIM_X: 65535
> MAX_GRID_DIM_Y: 65535
> MAX_GRID_DIM_Z: 1
> MAX_PITCH: 262144
> MAX_REGISTERS_PER_BLOCK: 8192
> MAX_SHARED_MEMORY_PER_BLOCK: 16384
> MAX_THREADS_PER_BLOCK: 512
> MULTIPROCESSOR_COUNT: 4
> TEXTURE_ALIGNMENT: 256
> TOTAL_CONSTANT_MEMORY: 65536
> WARP_SIZE: 32
>
> Andreas Klöckner wrote:
> > On Wednesday 18 March 2009, William King wrote:
> >> I have a Nvidia 8600 GT
> >>
> >> http://www.newegg.com/Product/Product.aspx?Item=N82E16814143105
> >>
> >>
> >> python test_driver.py
> >> ......E........
> >> ======================================================================
> >> ERROR: test_mempool (__main__.TestCuda)
> >> ----------------------------------------------------------------------
> >> Traceback (most recent call last):
> >> File "test_driver.py", line 284, in test_mempool
> >> queue.append(pool.allocate(1<<e))
> >> MemoryError: memory_pool::allocate failed: out of memory - failed to
> >> free memory for allocation
> >>
> >> ----------------------------------------------------------------------
> >> Ran 15 tests in 8.286s
> >>
> >> FAILED (errors=1)
> >
> > Using what version of PyCUDA? Can you send the output of
> > 'examples/dump_properties.py'?
> >
> > Andreas
10 years, 9 months
An error:
by William King
I have a Nvidia 8600 GT
http://www.newegg.com/Product/Product.aspx?Item=N82E16814143105
python test_driver.py
......E........
======================================================================
ERROR: test_mempool (__main__.TestCuda)
----------------------------------------------------------------------
Traceback (most recent call last):
File "test_driver.py", line 284, in test_mempool
queue.append(pool.allocate(1<<e))
MemoryError: memory_pool::allocate failed: out of memory - failed to
free memory for allocation
----------------------------------------------------------------------
Ran 15 tests in 8.286s
FAILED (errors=1)
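For context on what the failing test exercises: pool.allocate(1<<e) requests power-of-two-sized blocks from PyCUDA's pooling allocator, which (roughly speaking) retains freed blocks binned by size so later same-sized requests can skip a fresh cudaMalloc; the "failed to free memory for allocation" wording suggests the pool also tries to release held blocks and retry before giving up. A toy sketch of that idea in pure Python (ToyPool is hypothetical; the real pool's binning and out-of-memory retry logic are more involved):

```python
class ToyPool:
    """Freelist allocator binned by block size (simplified sketch)."""
    def __init__(self, capacity):
        self.capacity = capacity   # total bytes we may hand out
        self.in_use = 0
        self.free_bins = {}        # size -> list of recycled blocks

    def allocate(self, nbytes):
        bin_ = self.free_bins.get(nbytes)
        if bin_:                   # reuse a recycled block: no "cudaMalloc"
            return bin_.pop()
        if self.in_use + nbytes > self.capacity:
            # what test_mempool hit; the real pool first tries to
            # release blocks it is holding, then retries
            raise MemoryError("out of memory")
        self.in_use += nbytes
        return bytearray(nbytes)   # stand-in for device memory

    def free(self, block):
        # return the block to its size bin instead of the driver
        self.free_bins.setdefault(len(block), []).append(block)

pool = ToyPool(capacity=1 << 20)
blocks = [pool.allocate(1 << e) for e in range(10)]  # like the test's loop
```

The point of the binning is that a workload which repeatedly allocates and frees same-sized buffers pays the driver allocation cost only once per size.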
Re: [PyCuda] async memcpy
by Andreas Klöckner
On Wednesday 18 March 2009, Nicholas Tung wrote:
> Did you get the messages below? Not much reply necessary, but that
> [1-line] bug should be fixed.
Yes, I had gotten them. And then I also managed to forget about them. Sorry
about that.
> >> Also, I think you have a bug...
> >>
> >> ~device_allocation()
> >> {
> >> if (m_valid)
> >> free();
> >> }
> >>
> >> however,
> >> ~pooled_allocation()
> >> { free(); }
Well-spotted. Fixed in git.
> >> Also, do you typically build using python setup.py build?
Yes.
> >> It doesn't
> >> seem to detect c++ file changes for me...
Distutils kinda sucks. It does detect changes in the .cpp files, but not in
any included headers. rm -Rf build, python setup.py install. Ick. :)
> > By the way, I know this is probably my problem, but have you ever
> > encountered, "Fatal Python error: GC object already tracked"
No, not that I recall.
> > For some reason, my multithreaded code isn't printing stack traces where
> > I'd like it to... any suggestions would be helpful (can pdb handle mt
> > code well?)
I have nearly no experience with threaded Python code, and I'm not too keen on
changing that.
Andreas
How to use float4 textures?
by Ahmed Fasih
Greetings. I see that float2 textures in CUDA are seamlessly filled by
complex64 Numpy arrays, but what Python representation should an array
have if I want to load it as a float4 texture? (I want a 1d texture.)
Would I create a 4Nx1 float32 array in Python in C (not Fortran)
ordering and use the ArrayDescriptor class with num_channels=4 and
then the Memcpy2D class...? That doesn't sound quite right (I'll try it
though), so thanks for any advice!
Ahmed
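Not an authoritative answer, but the layout half of the question can be checked in pure numpy: N float4 texels are just an (N, 4) float32 array in C order, the same way a complex64 array matches float2. A sketch (numpy only; the ArrayDescriptor/Memcpy2D plumbing is left out):

```python
import numpy as np

# N "float4" texels, stored as an (N, 4) float32 array in C order.
# Row i is (x, y, z, w) for texel i, laid out contiguously -- exactly
# the interleaving a float4 texture fetch expects.
N = 8
texels = np.arange(N * 4, dtype=np.float32).reshape(N, 4)
assert texels.flags.c_contiguous

# Flattening gives the raw x0,y0,z0,w0,x1,y1,... byte stream:
flat = texels.ravel()

# Analogy: complex64 <-> float2. Viewing a complex64 array as float32
# doubles the length and interleaves (re, im) pairs:
c = np.array([1 + 2j, 3 + 4j], dtype=np.complex64)
pairs = c.view(np.float32)  # [1., 2., 3., 4.]
```

Whether that buffer then needs Memcpy2D or a plain 1D copy depends on how the texture is set up, but the host-side representation itself is just this interleaved C-ordered float32 array.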
Re: [PyCuda] async memcpy
by Nicholas Tung
On Sat, Mar 14, 2009 at 15:37, Nicholas Tung <ntung(a)ntung.com> wrote:
> On Fri, Mar 13, 2009 at 21:28, Andreas Klöckner <lists(a)informa.tiker.net>wrote:
>
>> On Friday 13 March 2009, Nicholas Tung wrote:
>> > On Wed, Mar 11, 2009 at 22:44, Andreas Klöckner
>> > <lists(a)informa.tiker.net> wrote:
>> > > On Thursday 12 March 2009, Nicholas Tung wrote:
>> > > > Is there merit in creating a "main device thread" and letting Python
>> > > > threads post requests to it [memcpy, kernel invocation, etc.], which
>> > > > would be synchronized through streams [typically one stream per
>> > > > thread, but modifications for passing data between threads]? If so,
>> > > > I'd be happy to contribute to any implementation.
>> > >
>> > > This would probably have a non-negligible latency penalty, rendering
>> > > the approach useful to only a few applications. If you write something
>> > > like this, please make it available so people with similar needs can
>> > > find it. I'd also have no problem sticking it into examples/.
>> >
>> > [from above, the "main device thread" is "DeviceContextThread" (could be
>> > multiple if there are multiple contexts) and the "Python threads post
>> > requests..." are "ExecutionThreads" below]
>> >
>> > The memory freeing gets kind of bad though; right now, I have a
>> > DeviceContextThread which keeps a list of all memory allocated, and
>> > makes the main thread drop the ref counts when ExecutionThreads no
>> > longer have any references. This can get tricky, because one has to
>> > ensure that ExecutionThreads [instances] don't have any references to
>> > memory, as the thread objects stick around after the thread actually
>> > closes. The other unfortunate aspect is that it's potentially slow and
>> > unintuitive "if getrefcount(ref) == 3".
>>
>> All types of memory handle in PyCUDA have an explicit 'free()'. Use that.
>> Forget refcounts.
>
>
> Are you suggesting writing c++-like code and tracking every piece of
> memory? I think this might take too much time since I haven't done it since
> the beginning...
>
> Also, I think you have a bug...
>
> ~device_allocation()
> {
> if (m_valid)
> free();
> }
>
> however,
> ~pooled_allocation()
> { free(); }
>
> adding the if(m_valid) [to avoid the exception in the code below] seems to
> help with some problems... will get back to you with more later.
>
> Also, do you typically build using python setup.py build? It doesn't seem
> to detect c++ file changes for me...
>
> thanks,
> Nicholas
[cc'ing to list]
By the way, I know this is probably my problem, but have you ever
encountered,
"Fatal Python error: GC object already tracked"
For some reason, my multithreaded code isn't printing stack traces where I'd
like it to... any suggestions would be helpful (can pdb handle mt code
well?)
Thanks,
Nicholas
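Andreas's advice in the quoted exchange (use the explicit free(), forget refcounts) maps naturally onto a with-block, which sidesteps the "thread objects stick around" problem entirely: cleanup no longer depends on when references die. A sketch with a stand-in class (FakeAlloc and owning are hypothetical names for illustration, not pycuda API):

```python
from contextlib import contextmanager

class FakeAlloc:
    """Stand-in for a GPU allocation that has an explicit free()."""
    def __init__(self, nbytes):
        self.nbytes = nbytes
        self.freed = False
    def free(self):
        self.freed = True

@contextmanager
def owning(alloc):
    # Deterministic cleanup: free() runs when the block exits,
    # independent of refcounts, GC timing, or lingering thread objects.
    try:
        yield alloc
    finally:
        alloc.free()

buf = FakeAlloc(1 << 20)
with owning(buf):
    assert not buf.freed   # usable inside the block
assert buf.freed           # freed on exit, no refcount games
```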
curandom.rand() OverflowError: long int too large to convert to int
by Michael Freitas
I have compiled pycuda 0.92 on a macbook pro (8600M GT). For examples
that use curandom.rand() I get the following error.
Freitas-MacBook-Pro:examples mfreitas$ python demo_elementwise.py
Traceback (most recent call last):
File "demo_elementwise.py", line 7, in ?
a_gpu = curand((50,))
File "/opt/local/lib/python2.4/site-packages/pycuda-0.92-py2.4-macosx-10.5-i386.egg/pycuda/curandom.py",
line 215, in rand
result.gpudata, numpy.random.randint(2**32), result.size)
File "mtrand.pyx", line 700, in mtrand.RandomState.randint
OverflowError: long int too large to convert to int
I think this is a numpy-related error. Has anybody encountered this
error and found a solution? Both python 2.4 and 2.5 yield the same
error.
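The failure is indeed in numpy rather than pycuda: randint's upper bound has to fit the platform's C long, and on a 32-bit Python build (as was common on OS X 10.5) 2**32 does not, which is most likely what raises the OverflowError in the traceback. A sketch of a workaround, if one were to edit the numpy.random.randint(2**32) call in curandom.py locally:

```python
import numpy as np

# randint's bound must fit a C long; 2**32 overflows on 32-bit builds.
# A bound of 2**31 - 1 fits on every platform and is plenty for a seed:
seed = np.random.randint(2**31 - 1)
assert 0 <= seed < 2**31 - 1
```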
Re: [PyCuda] async memcpy
by Andreas Klöckner
On Wednesday 11 March 2009, you wrote:
> Hi Andreas,
>
> For asynchronous memory copies, what do I use to load data to a
> page-locked array?
> Does page_locked[:] = original[:] work?
That should do it. Even though it comes out of pagelocked_something(), it's
just a plain old numpy array that happens to live in special memory.
Andreas
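The reason page_locked[:] = original works is worth spelling out: slice assignment copies element-wise into the array's existing buffer instead of rebinding the name, so the data actually lands in the page-locked memory. This is plain numpy semantics and can be demonstrated with an ordinary array standing in for the page-locked one:

```python
import numpy as np

# An ordinary numpy array stands in for the one a pagelocked allocator
# would return; this sketch is numpy-only.
pagelocked = np.zeros(4, dtype=np.float32)
original = np.array([1, 2, 3, 4], dtype=np.float32)

addr_before = pagelocked.ctypes.data  # address of the underlying buffer
pagelocked[:] = original              # copies INTO the existing buffer
assert pagelocked.ctypes.data == addr_before  # same memory, now filled

# By contrast, plain `pagelocked = original` would merely rebind the
# name, and the special buffer would no longer hold the data.
```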
Error raised at _get_nvcc_version---unable to use cuda.SourceModule
by Minjae Kim
Hello,
I am hitting an error that prevents me from running the example tutorial
project from the PyCuda website.
The example code uses cuda.SourceModule to compile the kernel code and load
it onto the device. When I run the script, that line raises:
OSError: nvcc was not found (is it on the PATH?) [[Errno 2] No such file or
directory]
So I traced the error to its source, the file driver.py at
/usr/lib/python2.5/site-packages/pycuda-xxxxxxxxx/pycuda/
From driver.py:

@memoize
def _get_nvcc_version(nvcc):
    from subprocess import Popen, PIPE
    try:
        return Popen([nvcc, "--version"], stdout=PIPE).communicate()[0]
    except OSError, e:
        raise OSError, "%s was not found (is it on the PATH?) [%s]" % (
            nvcc, str(e))
Now, I know that nvcc is on the PATH: typing nvcc --version in a terminal
displays the correct version information.
So I am suspecting there is something even more profound going on, or I am
missing something obvious.
Any help is greatly appreciated.
Best,
Minjae
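One common cause of exactly this symptom: the PATH the Python process sees is not the PATH of the interactive shell (for example, when Python is launched from an IDE, a cron job, or a GUI launcher that never reads the shell profile). Since Popen searches the process's own PATH, a quick check from inside the failing script can settle it; a sketch (find_on_path is an illustrative helper, not pycuda code):

```python
import os

def find_on_path(prog):
    """Search the PATH this *process* sees, the way the OS would for Popen."""
    for d in os.environ.get("PATH", "").split(os.pathsep):
        cand = os.path.join(d, prog)
        if os.path.isfile(cand) and os.access(cand, os.X_OK):
            return cand
    return None

# If this prints None while `which nvcc` succeeds in a terminal, the
# Python process was started with a different PATH than your shell.
print(find_on_path("nvcc"))
```

If it does print None, the fix is to export the CUDA bin directory into the environment the script is actually launched from, not just the shell profile.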
Texture followup
by Holger Rapp
Hi,
To get more experience with pycuda and textures, I just hacked together a
script which rotates images. It should be able to handle any image that PIL
can read, and it shouldn't need anything but numpy and PIL.
The script is quite verbose, so it should be fairly easy to follow; maybe
it would make a good contribution for the demo folder in pycuda. If you
think so too, Andreas, feel free to include it.
Thanks for your support!
Holger
----------------------------------------
Dipl.-Phys. Holger Rapp
Institut für Mess- und Regelungstechnik
Universität Karlsruhe (TH)
Engler-Bunte-Ring 21
76131 Karlsruhe, Germany
Geb. 40.32, Zi. 232, zweite Etage
Tel: +49 (0)721 / 608-2341
Fax: +49 (0)721 / 661874
Mail: Rapp(a)mrt.uka.de
Web: www.mrt.uni-karlsruhe.de
----------------------------------------
Using Textures with pycuda
by Holger Rapp
Hey,
I discovered pycuda today (after deciding it is time to finally put my
cuda card to some use) and was very thrilled to find a library so much
in my style: write code in a nice language, make sure the inner loop
is fast! I was also quite pleased how fast I got my toy example (1d
cubic spline interpolation) to run on the gpu. I then turned to my
real problem.
I need cubic interpolation of numpy arrays, so that I can sample my pixels
at arbitrary positions (e.g. x=2.345, y=pi). I used scipy.interpolate
before, but now I'm looking for a CUDA implementation. I found
http://www.dannyruijters.nl/cubicinterpolation/
which seems like exactly what I want, but I was unable to get it to work
with pycuda: the kernels rely on texture<> for their input, and I have not
understood how I can feed the memory of my numpy array as a texture to the
kernels. I have some vague understanding of textures in CUDA in general,
so I think some preprocessing is needed (supplying alignment information,
specifying how the data should be addressed, and transferring it to the
CUDA device). Does somebody have sample code using pycuda? Something
simple like a rotation kernel would be a perfect example!
Hope somebody can help!
Greetings and thanks for the great work with pycuda. I will follow it
closely.
Holger
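Since a rotation kernel keeps coming up as the canonical texture example: what a tex2D() fetch with linear filtering does in hardware can be sketched on the CPU with numpy, which is handy for checking a GPU version against a reference. This is an illustrative stand-in, not pycuda code; it uses inverse mapping with bilinear sampling and clamp-to-edge addressing (analogous to cudaAddressModeClamp):

```python
import numpy as np

def rotate_bilinear(img, theta):
    """Rotate a 2D array about its center by inverse mapping.

    For each output pixel we compute where it came from in the source
    and sample that point bilinearly -- the fetch a linearly-filtered
    texture performs in hardware on the GPU.
    """
    h, w = img.shape
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    ys, xs = np.mgrid[0:h, 0:w]
    c, s = np.cos(theta), np.sin(theta)
    sx = c * (xs - cx) - s * (ys - cy) + cx   # source coordinates
    sy = s * (xs - cx) + c * (ys - cy) + cy
    x0 = np.clip(np.floor(sx).astype(int), 0, w - 2)
    y0 = np.clip(np.floor(sy).astype(int), 0, h - 2)
    fx = np.clip(sx - x0, 0.0, 1.0)           # clamp-to-edge addressing
    fy = np.clip(sy - y0, 0.0, 1.0)
    return (img[y0, x0] * (1 - fx) * (1 - fy)
            + img[y0, x0 + 1] * fx * (1 - fy)
            + img[y0 + 1, x0] * (1 - fx) * fy
            + img[y0 + 1, x0 + 1] * fx * fy)
```

A GPU kernel would compute the same sx, sy per thread and replace the four reads and weights with a single tex2D(tex, sx, sy) call.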