Hi all,
I'm observing the following behavior with the latest (git-fetched today)
PyCUDA and PyOpenCL versions on Snow Leopard 10.6.4:
$ python
>>> import pycuda.driver
>>> import pyopencl
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/Library/Python/2.6/site-packages/pyopencl-0.92beta-py2.6-macosx-10.6-i386.egg/pyopencl/__init__.py",
line 3, in <module>
import pyopencl._cl as _cl
AttributeError: 'NoneType' object has no attribute '__dict__'
$ python
>>> import pyopencl
>>> import pycuda.driver
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/Library/Python/2.6/site-packages/pycuda-0.94rc-py2.6-macosx-10.6-i386.egg/pycuda/driver.py",
line 1, in <module>
from _driver import *
AttributeError: 'NoneType' object has no attribute '__dict__'
This worked with the stable versions. Does anyone know why this is happening?
(One may ask why I need both libraries in the same program. I have
a set of tests for my module, which can use both CUDA and OpenCL,
and it is convenient to run all the tests from a single file.
Although it is not a critical issue, I'm just curious.)
Best regards,
Bogdan
Hi,
I want to run element-wise computations on different parts of an
array. Loading each part of the array to device memory when needed turned
out to use up a lot of time and not really speed things up compared to the
CPU. Instead, I want to load the data array into device memory once and
provide pointers indicating which elements to look at (I do have the numpy
view/slice of the array). I looked into different ways of doing this
but can't seem to find the right approach; any help would be
appreciated.
ElementwiseKernel seems to support ranges and slicing now; however, my
code is (CUDA) C and I load it as a SourceModule, which probably
means I can't use the ElementwiseKernel approach.
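One possible direction (a sketch under assumptions, since the original SourceModule code isn't shown): keep a single copy of the data on the device and hand the raw kernel a pointer offset computed from the numpy view. The element offset of a view relative to its base array can be read off the two `__array_interface__` data pointers; with PyCUDA one could then pass something like `int(dev_ptr) + elem_offset * itemsize` as the kernel's data argument (hypothetical usage, not shown here).

```python
import numpy as np

# Recover the offset of a numpy view relative to its contiguous base
# array. This is the host-side half of the idea; the device pointer
# arithmetic mentioned above is an assumption, not tested here.
base = np.arange(16, dtype=np.float32)
view = base[4:12]                      # the part we want to process

byte_offset = (view.__array_interface__['data'][0]
               - base.__array_interface__['data'][0])
elem_offset = byte_offset // base.itemsize
print(elem_offset)  # 4
```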
-Thomas
Hi,
I've been endlessly trying to install PyCUDA on a Red Hat machine, but
to no avail. It would be much appreciated if I could get some help.
I am able to get past the configure part of the installation, but when I
run "make", the problem occurs. Here is my siteconf.py file:
BOOST_INC_DIR = ['/usr/local/include/boost/']
BOOST_LIB_DIR = ['/usr/lib']
BOOST_COMPILER = 'gcc4.1.2'
BOOST_PYTHON_LIBNAME = ['boost_python']
BOOST_THREAD_LIBNAME = ['boost_thread']
CUDA_TRACE = False
CUDA_ROOT = '/usr/local/cuda/'
CUDA_ENABLE_GL = False
CUDADRV_LIB_DIR = ['/usr/lib']
CUDADRV_LIBNAME = ['cuda']
CXXFLAGS = ['-DBOOST_PYTHON_NO_PY_SIGNATURES']
LDFLAGS = []
I believe I built Boost with gcc version 4.1.2.
The error I'm getting is:
/usr/local/include/boost/type_traits/remove_const.hpp:61: instantiated
from ‘boost::remove_const<<unnamed>::pooled_host_allocation>’
/usr/local/include/boost/python/object/pointer_holder.hpp:127:
instantiated from ‘void* boost::python::objects::pointer_holder<Pointer,
Value>::holds(boost::python::type_info, bool) [with Pointer =
std::auto_ptr<<unnamed>::pooled_host_allocation>, Value =
<unnamed>::pooled_host_allocation]’
src/wrapper/mempool.cpp:278: instantiated from here
/usr/local/include/boost/type_traits/detail/cv_traits_impl.hpp:38: internal
compiler error: in make_rtl_for_nonlocal_decl, at cp/decl.c:5067
I only included the end of the output; if you want the entire thing, let me
know. The error seems to point to a gcc problem. I've read through
your archives, but that doesn't seem to solve this problem.
If someone could shed some light on this issue, I would very much appreciate it.
thanks
-nhieu
Hi Thomas, Andreas,
I believe there are some errors in the implementation. I'm
basing my comments only on the exclusive version.
The final call to finish adds each of the partial sums to
every element of the result. That is to say, if my array size was
1024x1024 and each thread block worked on 1024 elements, my partial
sum array would be as large as 1024, and the last (or second-to-last)
block would have to iterate over 1024 sums to produce the result.
Isn't this wrong? Shouldn't the partial sums themselves be prefix-scanned,
and then each block add its associated partial-sum output to each of its
elements? That way the loop for (int i = 1; i <= blockIdx.x; i++) is
not needed.
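For reference, a minimal numpy sketch of the two-level exclusive scan described above (function names are mine, not taken from the PyCUDA code): each "block" is scanned locally, the per-block totals are themselves exclusive-scanned once, and each block then adds a single offset, so no block has to loop over all earlier partial sums.

```python
import numpy as np

def exclusive_scan(a):
    """Serial exclusive prefix sum (stand-in for one block's scan)."""
    return np.concatenate(([0], np.cumsum(a[:-1])))

def two_level_scan(a, block):
    blocks = a.reshape(-1, block)
    # 1) scan each block independently
    local = np.stack([exclusive_scan(b) for b in blocks])
    # 2) scan the per-block totals once -- this replaces the per-block
    #    loop "for i in 1..blockIdx.x" with a single scan of the sums
    offsets = exclusive_scan(blocks.sum(axis=1))
    # 3) each block adds its single offset to all of its elements
    return (local + offsets[:, None]).ravel()

print(two_level_scan(np.arange(8), 4).tolist())  # [0, 0, 1, 3, 6, 10, 15, 21]
```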
Regards
Nithin.
PS: Please feel free to ignore this if it has already been observed. Do
let me know, though. :)
Hi,
is there any way to interpret a gpuarray's dtype as a vector type? I
understand that it is tied to numpy.dtype, but I also saw code in
pycuda/tools.py that interprets pycuda::complex<float>.
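One possible route (my assumption, not a confirmed PyCUDA mechanism): build a numpy structured dtype whose layout matches the CUDA vector type, e.g. float4, so that one host-side element corresponds to one device-side vector.

```python
import numpy as np

# A host-side mirror of CUDA's float4: four packed 32-bit floats.
# Whether gpuarray handles this dtype end-to-end is an assumption here.
float4 = np.dtype([('x', np.float32), ('y', np.float32),
                   ('z', np.float32), ('w', np.float32)])

a = np.zeros(3, dtype=float4)
a['x'] = [1.0, 2.0, 3.0]
print(float4.itemsize)   # 16, same as sizeof(float4)
print(a['x'].tolist())   # [1.0, 2.0, 3.0]
```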
Regards
Nithin
Hi,
consider this code fragment:
shape = np.array(gpuarr.shape,dtype=np.int16)
another_gpuarr = gpuarray.zeros(shape, np.uint8)
This code faults at pycuda/gpuarray.py, line 81,
where a call is made to
self.gpudata = self.allocator(self.size * self.dtype.itemsize)
This is because type(self.size * self.dtype.itemsize) is now
np.int64 and not int, which messes up the call to the Boost.Python
allocator.
I'm sure there are many places where this kind of error happens.
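A small numpy-only illustration of the type problem (the fix shown, wrapping the result in int(), is my suggestion for user code, not the upstream patch):

```python
import numpy as np

# Arithmetic on numpy integer scalars yields numpy scalars, not Python
# ints, which is what trips up a strictly-typed Boost.Python signature.
shape = np.array((4, 4), dtype=np.int16)
size = shape[0] * shape[1]
print(type(size) is int)        # False: it is a numpy integer scalar
print(type(int(size)) is int)   # True: explicit coercion fixes it
```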
Regards
Nithin
Hello - I'm trying to run the SparseSolve.py example. I installed the
PyMetis package after fixing the configuration like this:
./configure --python-exe=python2.6 --boost-inc-dir=/usr/include/boost
--boost-lib-dir=/usr/lib/ --boost-python-libname=boost_python-mt-py26
But when running the SparseSolve.py example I encountered this error:
ImportError:
/usr/local/lib/python2.6/dist-packages/PyMetis-0.91-py2.6-linux-x86_64.egg/pymetis/_internal.so:
undefined symbol: regerrorA
What does this error mean? Thanks for any suggestions.
Hi, I'm trying to include cuPrintf.cu (
http://code.google.com/p/stanford-cs193g-sp2010/wiki/TutorialHelloWorld) in
my pycuda kernel code, but it says the file is not found.
Where should I put cuPrintf.cu so that the pycuda kernel can use it? I put
cuPrintf.cu in the same folder as the Python code, but it doesn't work.
First, thanks for developing PyCUDA. I just started playing with it last
week and already have code that outperforms the numpy version 10-100
fold. However, some things are still unclear to me, so I will mix
explaining how I understand things with asking questions. Please correct
me if my understanding is faulty.
1: gpuarray: I only use gpuarray to send data to the device, even if
I then run my own kernels or scikit.cuda on the data. However, as the
following example demonstrates, you have to make copies of the numpy
array before sending it to the GPU to ensure consistent indexing
(C-contiguous storage without any strange strides) for multi-dimensional
arrays.
import pycuda.autoinit
import pycuda.gpuarray as gpuarray
import numpy as N
a=N.zeros((2,2))
a[0,1] = 1
a[1,0] = 2
print "\na=\n",a
print "\ngpu a=\n",gpuarray.to_gpu(a).get()
aT=a.T
print "\na^T=\n",aT
print aT.__array_interface__
print "\ngpu a^T=\n",gpuarray.to_gpu(aT).get()
print aT.copy().__array_interface__
print "\ngpu a^T.copy()=\n",gpuarray.to_gpu(aT.copy()).get()
Note that the gpuarray.to_gpu(aT) is not transposed as it should be.
However, making the copy cures this.
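As a cheaper alternative to an unconditional .copy(), np.ascontiguousarray copies only when the array is not already C-contiguous, so it could be used before every transfer (a suggestion on my part, not something the example above requires):

```python
import numpy as np

# A transposed view is not C-contiguous, which is what garbles the
# transfer; ascontiguousarray produces a C-ordered copy only if needed.
a = np.zeros((2, 2))
aT = a.T
print(aT.flags['C_CONTIGUOUS'])                        # False
print(np.ascontiguousarray(aT).flags['C_CONTIGUOUS'])  # True
```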
2: Async to device: I read that you need page-locked memory on the
host for the async copies to work.
Does pycuda.gpuarray.to_gpu_async(x) lock the memory of the numpy
array x or copy the data to a locked memory area?
3: gpuarray.get_async(): Is control returned to python before the
transfer is completed (as async would indicate)? How do I check when
the transfer is complete? Page-locked memory? Do I have to create
streams and events to make async copies work?
4: Streams: My understanding is that each stream is executed serially,
while different streams run in parallel, except stream "0",
which waits for all other streams to finish before starting. Any
simple example?
Time for lunch ... I'll come back with more questions.
-Magnus
--
-----------------------------------------------
Magnus Paulsson
Assistant Professor
School of Computer Science, Physics and Mathematics
Linnaeus University
Phone: +46-480-446308
Mobile: +46-70-6942987
Hello everybody, I am a newcomer.
I ran an example from http://documen.tician.de/pycuda/, but I got the
following error:
>pythonw -u "test1.py"
Traceback (most recent call last):
File "test1.py", line 12, in <module>
""")
File "C:\Python26\lib\site-packages\pycuda\compiler.py", line 235, in
__init__
arch, code, cache_dir, include_dirs)
File "C:\Python26\lib\site-packages\pycuda\compiler.py", line 201, in
compile
"pycuda-compiler-cache-v1-%s" % _get_per_user_string())
File "C:\Python26\lib\site-packages\pycuda\compiler.py", line 139, in
_get_per_user_string
checksum.update(environ["HOME"])
File "C:\Python26\lib\os.py", line 423, in __getitem__
return self.data[key.upper()]
KeyError: 'HOME'
How do I solve this? Please help me.
I am using Windows XP and Python 2.6.6.
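A hedged workaround (my suggestion, untested on XP): the traceback shows the compiler cache reading environ["HOME"], which Windows XP does not set by default, so one can derive HOME from the Windows profile variables before importing pycuda.

```python
import os

# Populate HOME from USERPROFILE or HOMEDRIVE+HOMEPATH so that
# pycuda.compiler's per-user cache-directory lookup succeeds.
# Run this before "import pycuda.compiler".
if "HOME" not in os.environ:
    home = os.environ.get("USERPROFILE") or (
        os.environ.get("HOMEDRIVE", "") + os.environ.get("HOMEPATH", ""))
    if home:
        os.environ["HOME"] = home
print("HOME" in os.environ)
```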