Hi everybody,
I am new to PyCUDA. I just installed everything on Windows XP and, from the installation log, I think the installation went properly. However, when I try to run the test files provided with PyCUDA I get this error:
Traceback (most recent call last):
File "C:\PyCuda\test\test_gpuarray.py", line 2, in <module>
import pycuda.autoinit
File "C:\PyCuda\pycuda\autoinit.py", line 1, in <module>
import pycuda.driver as cuda
File "C:\PyCuda\pycuda\driver.py", line 1, in <module>
from _driver import *
ImportError: No module named _driver
How can I solve it?
Thanks, and sorry for the newbieness of this post.
den3b
Hi,
Regarding my complex arithmetic wrapper (complex.py) posted previously
on this mailing list, the following changes need to be made to make it
work. One usage scenario of the wrapper is arithmetic operations on the
output of CUFFT, which is in the float2 datatype on the GPU.
diff --git a/pycuda/tools.py b/pycuda/tools.py
index f91a6d5..d7af3a9 100644
--- a/pycuda/tools.py
+++ b/pycuda/tools.py
@@ -371,6 +371,8 @@ def dtype_to_ctype(dtype, with_fp_tex_hack=False):
return "fp_tex_double"
else:
return "double"
+ elif dtype == numpy.complex64:
+ return "float2"
else:
raise ValueError, "unable to map dtype '%s'" % dtype
@@ -447,6 +449,7 @@ def parse_c_arg(c_arg):
elif tp in ["char"]: dtype = numpy.int8
elif tp in ["unsigned char"]: dtype = numpy.uint8
elif tp in ["bool"]: dtype = numpy.bool
+ elif tp in ["float2"]: dtype = numpy.complex64
else: raise ValueError, "unknown type '%s'" % tp
return arg_class(dtype, name)
Daniel
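For readers who want to see what the mapping in the patch does, here is a minimal standalone sketch. The function name `dtype_to_ctype_sketch` and the lookup table are illustrative stand-ins (the real function is `pycuda.tools.dtype_to_ctype`), but the `complex64 -> "float2"` entry is exactly what the patch adds so that CUFFT output can be addressed from kernels:

```python
import numpy

# Illustrative stand-in for pycuda.tools.dtype_to_ctype; the dict and the
# function name are hypothetical, but the complex64 -> "float2" entry
# mirrors what the patch above adds for CUFFT's single-precision output.
_DTYPE_TO_CTYPE = {
    numpy.dtype(numpy.float32): "float",
    numpy.dtype(numpy.float64): "double",
    numpy.dtype(numpy.complex64): "float2",  # CUFFT single-precision complex
}

def dtype_to_ctype_sketch(dtype):
    """Map a numpy dtype to the matching CUDA C type name."""
    dtype = numpy.dtype(dtype)
    try:
        return _DTYPE_TO_CTYPE[dtype]
    except KeyError:
        raise ValueError("unable to map dtype '%s'" % dtype)

print(dtype_to_ctype_sketch(numpy.complex64))  # float2
```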
I'm trying to install pycuda on Mac OS X 10.6.1 and got the following
error:
dbmacpro:make -j 4
ctags -R src || true
ctags: illegal option -- R
usage: ctags [-BFadtuwvx] [-f tagsfile] file ...
/Library/Frameworks/Python.framework/Versions/2.6/Resources/Python.app/
Contents/MacOS/Python setup.py build
/Users/dbelll/src/pycuda-0.93/ez_setup.py:93: UserWarning: Module
pkg_resources was already imported from /Users/dbelll/src/pycuda-0.93/
pkg_resources.pyc, but /Library/Frameworks/Python.framework/Versions/
2.6/lib/python2.6/site-packages/distribute-0.6.8-py2.6.egg is being
added to sys.path
import pkg_resources
/Users/dbelll/src/pycuda-0.93/ez_setup.py:93: UserWarning: Module site
was already imported from /Library/Frameworks/Python.framework/
Versions/2.6/lib/python2.6/site.pyc, but /Library/Frameworks/
Python.framework/Versions/2.6/lib/python2.6/site-packages/
distribute-0.6.8-py2.6.egg is being added to sys.path
import pkg_resources
/Users/dbelll/local/include/boost-1_39/boost/python.hpp
/Users/dbelll/local/lib/libboost_python-xgcc40-mt.so
/Users/dbelll/local/lib/libboost_python-xgcc40-mt.dylib
/Users/dbelll/local/lib/libboost_thread-xgcc40-mt.so
/Users/dbelll/local/lib/libboost_thread-xgcc40-mt.dylib
/usr/local/cuda/bin/nvcc
/usr/local/cuda/include/cuda.h
/usr/local/cuda/lib/libcuda.so
/usr/local/cuda/lib/libcuda.dylib
running build
running build_py
running build_ext
--------------------------------------------------------------------------
Sorry, your build failed. Try rerunning configure with different
options.
--------------------------------------------------------------------------
Traceback (most recent call last):
File "setup.py", line 325, in <module>
main()
File "setup.py", line 317, in main
("include/cuda", glob.glob("src/cuda/*.hpp"))
File "/Users/dbelll/src/pycuda-0.93/aksetup_helper.py", line 12, in
setup
setup(*args, **kwargs)
File "/Library/Frameworks/Python.framework/Versions/2.6/lib/
python2.6/distutils/core.py", line 152, in setup
dist.run_commands()
File "/Library/Frameworks/Python.framework/Versions/2.6/lib/
python2.6/distutils/dist.py", line 975, in run_commands
self.run_command(cmd)
File "/Library/Frameworks/Python.framework/Versions/2.6/lib/
python2.6/distutils/dist.py", line 995, in run_command
cmd_obj.run()
File "/Library/Frameworks/Python.framework/Versions/2.6/lib/
python2.6/distutils/command/build.py", line 134, in run
self.run_command(cmd_name)
File "/Library/Frameworks/Python.framework/Versions/2.6/lib/
python2.6/distutils/cmd.py", line 333, in run_command
self.distribution.run_command(command)
File "/Library/Frameworks/Python.framework/Versions/2.6/lib/
python2.6/distutils/dist.py", line 995, in run_command
cmd_obj.run()
File "/Users/dbelll/src/pycuda-0.93/setuptools/command/
build_ext.py", line 46, in run
_build_ext.run(self)
File "/Library/Frameworks/Python.framework/Versions/2.6/lib/
python2.6/distutils/command/build_ext.py", line 340, in run
self.build_extensions()
File "/Library/Frameworks/Python.framework/Versions/2.6/lib/
python2.6/distutils/command/build_ext.py", line 449, in build_extensions
self.build_extension(ext)
File "/Users/dbelll/src/pycuda-0.93/setuptools/command/
build_ext.py", line 175, in build_extension
_build_ext.build_extension(self,ext)
File "/Library/Frameworks/Python.framework/Versions/2.6/lib/
python2.6/distutils/command/build_ext.py", line 460, in build_extension
ext_path = self.get_ext_fullpath(ext.name)
File "/Library/Frameworks/Python.framework/Versions/2.6/lib/
python2.6/distutils/command/build_ext.py", line 637, in get_ext_fullpath
filename = self.get_ext_filename(ext_name)
File "/Users/dbelll/src/pycuda-0.93/setuptools/command/
build_ext.py", line 85, in get_ext_filename
ext = self.ext_map[fullname]
KeyError: '_driver'
make: *** [all] Error 1
dbmacpro:
My siteconf.py is:
BOOST_INC_DIR = ['/Users/dbelll/local/include/boost-1_39']
BOOST_LIB_DIR = ['/Users/dbelll/local/lib']
BOOST_COMPILER = 'gcc43'
BOOST_PYTHON_LIBNAME = ['boost_python-xgcc40-mt']
BOOST_THREAD_LIBNAME = ['boost_thread-xgcc40-mt']
CUDA_TRACE = False
CUDA_ROOT = '/usr/local/cuda'
CUDA_ENABLE_GL = False
CUDADRV_LIB_DIR = []
CUDADRV_LIBNAME = ['cuda']
CXXFLAGS = []
LDFLAGS = []
Has anyone run into this same problem? Any suggestions on what to try
would be appreciated.
... Dwight Bell
Hey PyCUDA folks,
Is there someplace where we can share our code?
Should we make some place?
Does anyone want some place?
I'm attaching some self-organizing map code I messed with. It makes a pretty picture ;)
Hi,
Is there a particular paper or conference presentation that you'd like
cited for PyCUDA in academic papers? It's the least we can do for your
efforts!
Thanks,
Imran
Ohhh, Ian, thanks :-) I confess to being very entry level with only a few
days here and there at present. I've got a background in SMP, distributed
computation and general multi-core work (and 8 bit machine code from
way-back-when) but I've yet to read up on the CUDA architecture properly.
When I get a moment I'll plumb your code in and give it a go - 100x is more
like what I was hoping for :-)
i.
On 29 January 2010 02:55, Ian Cullinan <Ian.Cullinan(a)nicta.com.au> wrote:
> You think that's a speedup? :P
>
> You're only using one of the multiprocessors in your GPU! (Because you're
> launching a 1x1 grid). Try this on for size:
>
>
> ======
>
>
>
> import pycuda.driver as drv
> import pycuda.tools
> import pycuda.autoinit
> import numpy
> import numpy.linalg as la
> from pycuda.compiler import SourceModule
>
> blocks = 64
> block_size = 128
> nbr_values = blocks * block_size
> n_iter = 100000
>
> #############
> # GPU SECTION
>
> mod = SourceModule("""
> __global__ void addone(float *dest, float *a, int n_iter)
> {
> const int i = blockDim.x*blockIdx.x + threadIdx.x;
> for(int n = 0; n < n_iter; n++) {
> a[i] = sin(a[i]);
> }
> dest[i] = a[i];
> }
> """)
>
> addone = mod.get_function("addone")
>
> a = numpy.ones(nbr_values).astype(numpy.float32)
> a += 1 # a is now an array of 2s
>
> dest = numpy.zeros_like(a)
>
> start = drv.Event()
> end = drv.Event()
> start.record()
>
> addone(drv.Out(dest), drv.In(a), numpy.int32(n_iter), grid=(blocks,1),
> block=(block_size,1,1))
>
> #stop timer
> end.record()
> end.synchronize()
> secs = start.time_till(end)*1e-3
> print "GPU time:", secs
> print "GPU result starts with...", dest[:3]
>
>
> #############
> # CPU SECTION
>
> a = numpy.ones(nbr_values).astype(numpy.float32)
> a += 1 # a is now an array of 2s
> start.record()
>
> for i in range(n_iter):
> a = numpy.sin(a)
>
> #stop timer
> end.record()
> end.synchronize()
> secs = start.time_till(end)*1e-3
> print "CPU time:", secs
> print "CPU result starts with...", a[:3]
>
>
> ======
>
>
>
> (I reduced the number of iterations so it doesn't take forever on the CPU).
> On my machine (3GHz Core 2 Duo, GTX280, Linux), I get:
>
> GPU time: 0.0843682250977
> GPU result starts with... [ 0.00547702 0.00547702 0.00547702]
> CPU time: 8.12050439453
> CPU result starts with... [ 0.00547701 0.00547701 0.00547701]
>
> So, about a 100x speedup for the GPU version. It would be more of a speedup with
> more iterations (the overhead of copying the data to the GPU and back is the
> same regardless). In fact, for such a small amount of data (only 32K) you
> can probably significantly increase the size of the data without incurring
> much more copying overhead - setting up the transfer is expensive, copying a
> few KB isn't.
>
> Have a play with the params and enjoy.
>
> Next thing to get even more speed is to copy the data to shared memory
> within each block, do the computation there and then copy the result back to
> main memory when you're done. The NVIDIA docs and whitepapers should make it
> fairly clear how to achieve that :)
>
> Cheers,
> Ian Cullinan
>
> ________________________________________
> From: pycuda-bounces(a)tiker.net [pycuda-bounces(a)tiker.net] On Behalf Of Ian
> Ozsvald [ian(a)ianozsvald.com]
> Sent: Friday, 29 January 2010 1:29 AM
> To: pycuda(a)tiker.net
> Subject: [PyCUDA] Very simple speed testing code for another beginner...
>
--
Ian Ozsvald (Professional Screencaster)
ian(a)ProCasts.co.uk
http://ProCasts.co.uk/examples.html
http://TheScreencastingHandbook.com
http://IanOzsvald.com + http://ShowMeDo.com
http://twitter.com/ianozsvald
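To make the shared-memory suggestion in Ian Cullinan's reply concrete, here is a hypothetical, untested sketch of such a kernel. (Worth noting: in this particular benchmark each thread only ever touches its own element, so the compiler can keep it in a register anyway; shared memory really pays off when the threads of a block share data.)

```cuda
// Hypothetical shared-memory variant of the addone kernel (untested sketch).
// Launch from PyCUDA with shared=block_size*4 so the dynamic shared array
// holds one float per thread.
__global__ void addone_shared(float *dest, float *a, int n_iter)
{
    extern __shared__ float s[];                  // blockDim.x floats
    const int i = blockDim.x * blockIdx.x + threadIdx.x;

    s[threadIdx.x] = a[i];                        // one global-memory read
    for (int n = 0; n < n_iter; n++)
        s[threadIdx.x] = sinf(s[threadIdx.x]);    // iterate in on-chip memory
    dest[i] = s[threadIdx.x];                     // one global-memory write
}
```

From Python it would be called like the original, with the extra `shared` argument, e.g. `addone_shared(drv.Out(dest), drv.In(a), numpy.int32(n_iter), grid=(blocks,1), block=(block_size,1,1), shared=block_size*4)`.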
Python has many different packages, including ones for performing
statistics. Are there any that use CUDA for statistics such as OLS
regressions?
I'm attempting to calculate a 40 variable correlation matrix for each of
240 months using a sample size of 3000, i.e. 40x40x240=384,000
correlation calculations, each using 3000 data points.
Links and/or advice are much appreciated.
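Not a CUDA answer, but as a CPU baseline for sanity-checking any GPU version: with plain numpy, each month's 40x40 correlation matrix is a single `numpy.corrcoef` call over that month's block of samples. The array layout and names below are assumptions about how the data might be organised (and the sizes are scaled down from the 240/40/3000 in the question so the sketch runs quickly):

```python
import numpy

# Assumed layout: data[month] is a (n_vars, n_samples) block.
# The question's real sizes are 240 months x 40 variables x 3000 samples;
# smaller sizes are used here so the sketch runs fast.
rng = numpy.random.RandomState(0)
n_months, n_vars, n_samples = 12, 40, 300
data = rng.randn(n_months, n_vars, n_samples)

# One n_vars x n_vars correlation matrix per month; numpy.corrcoef
# treats rows as variables and columns as observations.
corr = numpy.empty((n_months, n_vars, n_vars))
for m in range(n_months):
    corr[m] = numpy.corrcoef(data[m])

print(corr.shape)  # (12, 40, 40)
```

Each resulting matrix is symmetric with ones on the diagonal, which is a cheap correctness check to run against a GPU implementation.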
Here is some very simple speed testing code - maybe it is useful to another
beginner. I've used it to convince myself that GPUs really do go faster
than CPUs (this is useful here in the office to show my physics colleagues).
The code was adapted from hello_gpu.py. It has two halves: first it does a
simple calculation many times on the GPU, then it does the same
calculation on the CPU. Both times it uses drv.Event to measure how long the
operations took.
Roughly speaking on my WinXP Intel Core2 Duo 2.66GHz CPU (1 CPU used) the
9800GT GPU comes out 20-55* faster than the CPU.
In the code below a value for sin is calculated 2,000,000 times in a 400
element array. A 20-30* speedup holds for tan, sin, addition, sqrt, exp.
The pow function shows a 55* speedup. If you want to do your own testing
then replace the two references to 'sin' with your chosen function.
Remember that 2,000,000 is also written twice so change it in both places to
alter the number of iterations. Extra note - the final result for 'tan'
diverges quickly, 'sin' and others seem to be mostly stable.
If you lower the iterations from 2,000,000 to 200 (in both places), both
the CPU and GPU complete their tasks in roughly the same time.
I did a variation where 'dest' is removed and 'float *a' is referenced by
drv.InOut(a) (so a is the input parameter and is also used for the output
result) - I didn't observe any obvious speed difference.
Side note - I'm also using the NVIDIA System Monitor, I've selected all the
GPU outputs along with CPU outputs so they hover as transparent displays at
the top of the screen. Whenever the GPU is invoked you see the GPU Usage,
Cooler and Temp change.
HTHs another newbie,
Ian.
======
# based on hello_gpu.py
import pycuda.driver as drv
import pycuda.tools
import pycuda.autoinit
import numpy
import numpy.linalg as la
from pycuda.compiler import SourceModule
nbr_values = 400
#############
# GPU SECTION
mod = SourceModule("""
__global__ void addone(float *dest, float *a)
{
const int i = threadIdx.x;
for(int n = 0; n < 2000000; n++) {
a[i] = sin(a[i]);
}
dest[i] = a[i];
}
""")
addone = mod.get_function("addone")
a = numpy.ones(nbr_values).astype(numpy.float32)
a += 1 # a is now an array of 2s
dest = numpy.zeros_like(a)
start = drv.Event()
end = drv.Event()
start.record()
addone(drv.Out(dest), drv.In(a), block=(nbr_values,1,1))
#stop timer
end.record()
end.synchronize()
secs = start.time_till(end)*1e-3
print "GPU time:", secs
print "GPU result starts with...", dest[:3]
#############
# CPU SECTION
a = numpy.ones(nbr_values).astype(numpy.float32)
a += 1 # a is now an array of 2s
start.record()
for i in range(2000000):
a = numpy.sin(a)
#stop timer
end.record()
end.synchronize()
secs = start.time_till(end)*1e-3
print "CPU time:", secs
print "CPU result starts with...", a[:3]
======
--
Ian Ozsvald (Professional Screencaster)
ian(a)ProCasts.co.uk
http://ProCasts.co.uk/examples.html
http://TheScreencastingHandbook.com
http://IanOzsvald.com + http://ShowMeDo.com
http://twitter.com/ianozsvald
After attempting to go through the process of building with mingw32 on Win7,
I discovered that the include file features.h is missing.
A quick search didn't turn up much, other than that this file seems to be
missing from mingw and breaks the build process.
There is also a newlib-mingw32 package in Debian which is used to
cross-compile from Linux. After extracting this header and including it, the
extension compiles. When I run it, it crashes.
I'm running python 2.6.4 win32, numpy 1.4.0, cuda toolkit 2.3.0 (32Bit),
trying both 32 and x64 versions of the driver.
Can anyone give me advice on this problem?
Thanks,
Robert
From: Robert Pickel [mailto:robert.pickel@gmail.com]
Sent: Wednesday, January 20, 2010 5:54 PM
To: pycuda(a)tiker.net
Subject: Windows - 7 Support - Package
Hello,
I'm new to the list. A quick search didn't show any post asking if pycuda
can run under windows 7.
Are there any pre-built windows packages out there?
Thanks,
Robert
Hi Andreas/Ying Wai, I see a discussion you've had about complex number
support:
http://www.mail-archive.com/pycuda@tiker.net/msg00788.html
I also see the 'complex' tag:
http://git.tiker.net/pycuda.git/commit/296810d8c57f7620cfcc959f73f6aefbb021…
and I've merged the code with mine.
When I try to run demo_complex.py I get an error (below) - should the demo
work without an error? Given the discussion you were both having, I'm not
clear whether the complex support is finished or not.
Here's the error I get; it looks like nvcc is unhappy. If there's something
I could debug, feel free to give me some pointers.
I'm using MSVC 9 (Visual Studio 2008) on WinXP with the current 'master' (I
could try Mac/gcc if that's useful?):
C:\Panalytical\pycuda_git\pycuda\examples>python demo_complex.py
kernel.cu
...
C:/Python26/lib/site-packages/pycuda-0.94beta-py2.6-win32.egg/pycuda/../include/pycuda\pycuda-complex.hpp(299):
error: calling a __device__ function from a __host__ function is not allowed
...
C:/Python26/lib/site-packages/pycuda-0.94beta-py2.6-win32.egg/pycuda/../include/pycuda\pycuda-complex.hpp(437):
error: calling a __device__ function from a __host__ function is not allowed
2 errors detected in the compilation of
"C:\DOCUME~1\parc\LOCALS~1\Temp/tmpxft_000003f4_00000000-6_kernel.cpp1.ii".
Traceback (most recent call last):
File "demo_complex.py", line 20, in <module>
preamble="#include <pycuda-complex.hpp>",)
File
"C:\Python26\lib\site-packages\pycuda-0.94beta-py2.6-win32.egg\pycuda\elementwise.py",
line 108, in __init__
arguments, operation, name, keep, options, **kwargs)
File
"C:\Python26\lib\site-packages\pycuda-0.94beta-py2.6-win32.egg\pycuda\elementwise.py",
line 83, in get_elwise_kernel_and_types
keep, options, **kwargs)
File
"C:\Python26\lib\site-packages\pycuda-0.94beta-py2.6-win32.egg\pycuda\elementwise.py",
line 72, in get_elwise_module
options=options, keep=keep)
File
"C:\Python26\lib\site-packages\pycuda-0.94beta-py2.6-win32.egg\pycuda\compiler.py",
line 214, in __init__
arch, code, cache_dir, include_dirs)
File
"C:\Python26\lib\site-packages\pycuda-0.94beta-py2.6-win32.egg\pycuda\compiler.py",
line 193, in compile
return compile_plain(source, options, keep, nvcc, cache_dir)
File
"C:\Python26\lib\site-packages\pycuda-0.94beta-py2.6-win32.egg\pycuda\compiler.py",
line 86, in compile_plain
raise CompileError, "nvcc compilation of %s failed" % cu_file_path
pycuda.driver.CompileError: nvcc compilation of
c:\docume~1\parc\locals~1\temp\tmpbz4jgs\kernel.cu failed
--
Ian Ozsvald (Professional Screencaster)
ian(a)ProCasts.co.uk
http://ProCasts.co.uk/examples.html
http://TheScreencastingHandbook.com
http://IanOzsvald.com + http://ShowMeDo.com
http://twitter.com/ianozsvald