On Mon, 12 Dec 2011 21:37:38 -0500, Yifei Li <yifli82(a)gmail.com> wrote:
> Hi all,
>
> It is said in CUDA 4.0 programming guide that "Blocks are organized into a
> one-dimensional, two-dimensional, or three-dimensional
> grid of thread blocks". Does PyCuda currently support 3-dimensional
> grid?
Nope, this will happen once PyCUDA switches over to stateless launches
internally, as described in my previous email. CUDA's stateful launch
interface is 2D-grid-only, and PyCUDA uses that internally for greater
backwards compatibility.
Andreas
On Wed, 14 Dec 2011 09:15:19 -0500, Thomas Wiecki <Thomas_Wiecki(a)brown.edu> wrote:
> This is getting very weird. I went into the function with pdb now.
> np.dtype('uint32') is in DTYPE_TO_NAME, but for some reason the lookup
> fails:
>
> KeyError: dtype('uint32')
> > /usr/local/lib/python2.7/dist-packages/pycuda-2011.2.2-py2.7-linux-i686.egg/pycuda/compyte/dtypes.py(104)dtype_to_ctype()
> 103 print np.dtype('uint32') in DTYPE_TO_NAME
> --> 104 print DTYPE_TO_NAME[dtype]
> 105 raise ValueError, "unable to map dtype '%s'" % dtype
>
> ipdb> dtype
> dtype('uint32')
> ipdb> np.dtype('uint32') == dtype
> True
> ipdb> DTYPE_TO_NAME[np.dtype('uint32')]
> 'unsigned'
> ipdb> DTYPE_TO_NAME[dtype]
> *** KeyError: dtype('uint32')
I'm as confused as you. Can you go up the call stack and see who made
that dtype, and how?
Andreas
On Fri, 30 Dec 2011 20:03:44 +0100, Thomas Wiecki <Thomas_Wiecki(a)brown.edu> wrote:
> Hi Andreas,
>
> glad to see that you followed up on this issue. I will try to boil it
> down, but I noticed that when I was investigating the issue back then, I
> could not easily reproduce the problem. For some reason the dict-stored
> dtype was hashing to something else. Can you easily point me to the
> place where the dict is created?
Sure.
- Initial fill in pycuda.compyte.dtypes._fill_dtype_registry().
- Vector types added in pycuda.gpuarray._create_vector_types().
Andreas
PS: Please make sure to keep the list cc'd.
Hi Thomas,
I asked on the numpy list regarding our dtype hashability issues, and it
seems PyCUDA's usage of dtypes is perfectly legitimate, as indicated by
Robert Kern's reply which I've forwarded below. To put us in a position
where we can file this as a bug with the numpy guys, would you be able
to boil your example down to a simple test script (that, ideally, only
uses numpy)?
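Something along these lines is the shape I have in mind (hypothetical -- the
way to actually trigger the mismatch is exactly what we're looking for):

import numpy as np

d1 = np.dtype('uint32')
d2 = np.empty(1, np.uint32).dtype   # same dtype, obtained another way

# If two dtypes compare equal but hash differently, a dict keyed on
# one cannot be looked up with the other -- the KeyError seen above.
assert d1 == d2
assert hash(d1) == hash(d2)
table = {d1: "unsigned"}
assert table[d2] == "unsigned"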
Andreas
On Fri, 30 Dec 2011 11:55:51 -0500, Yifei Li <yifli82(a)gmail.com> wrote:
> Hi,
>
> I got the following error when trying to call some pycuda code in a CPU
> thread:
>
> explicit_context_dependent failed: invalid context - no currently active
> context?
>
> More details: I use PyQt for the GUI, and hence QThread is used for
> threading. In Qt, all the GUI stuff lives in the main thread.
>
> There is no such problem after I moved the pycuda code out of my QThread.
>
> Has anyone had this problem? Any help is appreciated.
PyCUDA + threads is possible, but somewhat complicated to manage correctly:
http://wiki.tiker.net/PyCuda/Examples/MultipleThreads
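The gist, condensed (a sketch under my assumptions -- see the wiki example
above for the authoritative version): each thread must create, use, and pop
its own context.

import threading
import numpy as np
import pycuda.driver as drv

drv.init()   # initialize the driver once, in the main thread

class GPUThread(threading.Thread):
    def run(self):
        # each thread pushes its own context on entry...
        self.ctx = drv.Device(0).make_context()
        try:
            a = np.random.randn(100).astype(np.float32)
            a_gpu = drv.mem_alloc(a.nbytes)
            drv.memcpy_htod(a_gpu, a)
            # ... kernel launches would go here ...
        finally:
            # ...and pops it on exit, otherwise later CUDA calls fail
            # with "invalid context"
            self.ctx.pop()

threads = [GPUThread() for i in range(2)]
for t in threads: t.start()
for t in threads: t.join()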
HTH,
Andreas
On Wed, 14 Dec 2011 13:41:12 -0600, "Pazzula, Dominic J " <dominic.j.pazzula(a)citi.com> wrote:
> The subset index seems to drop (or be exclusive of) the last value. This can be confusing if you are using range() to generate the array of indices: instead of the traditional range(start, end+1), you need range(start, end+2), since both subset_dot() and range() are exclusive of the last value.
>
> Is this behavior as intended?
>
> import numpy as np
> import pycuda as cuda
> import pycuda.gpuarray
> import pycuda.autoinit
>
> n = 3
>
> a = np.array(range(0,n**2),ndmin=2)
> a = a.astype(np.float32)
> print a
> g_a = cuda.gpuarray.to_gpu(a)
> g_b = cuda.gpuarray.to_gpu(a)
>
> subset = cuda.gpuarray.to_gpu(np.array(range(2,5)))
>
> print "Subset Array", subset;
> x = cuda.gpuarray.subset_dot(subset,g_a,g_b)
> print "a[2:5] dot g[2:5]", x
>
> ------
> [[ 0. 1. 2. 3. 4. 5. 6. 7. 8.]]
> Subset Array [2 3 4]
> a[2:5] dot g[2:5] 13.0
The issue is that 'subset' ends up being an array of 64-bit integers,
but the subset_* routines tacitly assume that you're giving them 32-bit
integers. This is fixed in git, although for performance reasons I would
still recommend 32-bit ints for 'subset', if you can get away with it.
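That is, build the index array with an explicit 32-bit dtype, e.g.:

subset = cuda.gpuarray.to_gpu(np.arange(2, 5, dtype=np.int32))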
Andreas
On Wed, 14 Dec 2011 13:41:12 -0600, "Pazzula, Dominic J " <dominic.j.pazzula(a)citi.com> wrote:
> The subset index seems to drop (or be exclusive of) the last value. [...]
For the record: this is not the intended behavior. I'm debugging this,
but I haven't got very far yet.
Andreas
On Sat, 24 Dec 2011 12:48:37 +0100, Martin Kempf <martin.kempf(a)gmail.com> wrote:
> Hi Andreas,
>
> Am 23.12.2011 12:54, schrieb Andreas Kloeckner:
> > Hi Martin,
> >
> > first of all, sorry for taking so long to reply to this. I had a busy
> > end of the (US) semester, with teaching, projects and all.
> no problem, thanks for the answer! Meanwhile I have read more on this
> topic, and it has helped me understand your clarifications. But there are
> still some points I am curious about:
> > Now onward to the usefulness of both of these packages in conjunction
> > with PyCUDA. cgen can be useful if more 'textual' ways of generating
> > code don't work for your specific application. That said, I have found
> > that textual generation is sufficient in very many settings, and only
> > very few types of codegen require the flexibility that cgen offers. (but
> > those do exist!)
>
> Is the example on loopy found in this paper [1] a case where the
> flexibility of cgen is needed? Where can I find more information on
> loopy?
That's loopy as of two prototypes ago. I'll release the current version
of loopy as soon as I submit the article that goes with it.
> > However, I've found that using cgen when it
> > isn't required ends up resulting in odd-looking code that's harder to
> > maintain than necessary. I am still using cgen in my projects (loopy,
> > yet to be announced, being the most recent one)--I'm just more judicious
> > about its use.
> >
> > codepy (in its compile-link variety) can also be used with PyCUDA. Bryan
> > Catanzaro has done this in Copperhead, where he uses codepy to drive
> > nvcc to compile host-side (!) Python extension modules that execute CUDA
> > code. This removes much of PyCUDA from the picture, as now you're using
> > the CUDA run-time interface, rather than the driver interface.
>
> Is this achieved by using the CudaModule of CodePy, combined with the
> NVCCToolchain?
Correct. Bryan contributed that code.
> It is an interesting topic I came across, as it is the topic of my
> seminar [2] at the University of Applied Sciences in Rapperswil,
> Switzerland [3].
Great! Good luck with your seminar talk.
Andreas
On Fri, 23 Dec 2011 15:24:12 -0500, Yifei Li <yifli82(a)gmail.com> wrote:
> Hi folks,
>
> I did the following to bind a gpuarray to a 3D texture with multiple
> channels:
>
> a_gpu = gpuarray.to_gpu(np.random.randn(d, h, w, 4).astype(np.float32))
> a_gpu.bind_to_texref_ext(mytex, channels=4)
>
> And I got all zeros when the values in texture were printed
>
> However, if I change the 3D texture to a 1D texture with the same number of
> channels, the values in the 1D texture are correct.
>
> Does bind_to_texref_ext only work with 1D textures?
bind_to_texref_ext assumes you're binding a 1D array. Binding flat
memory to a texture only works in 1D and 2D; 3D is not allowed by
CUDA. A convenience function for the 2D case has not been created yet.
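For an actual 3D texture, you need to go through a 3D CUDA array, roughly
like this (a sketch from memory, single-channel for simplicity -- for your
four-channel case the descriptor's num_channels and the pitch computation
change accordingly):

import numpy as np
import pycuda.autoinit
import pycuda.driver as drv

d, h, w = 4, 8, 16
a = np.random.randn(d, h, w).astype(np.float32)

# describe and allocate a 3D CUDA array
descr = drv.ArrayDescriptor3D()
descr.width = w
descr.height = h
descr.depth = d
descr.format = drv.dtype_to_array_format(a.dtype)
descr.num_channels = 1
descr.flags = 0
ary = drv.Array(descr)

# copy the host data into it
copy = drv.Memcpy3D()
copy.set_src_host(a)
copy.set_dst_array(ary)
copy.width_in_bytes = copy.src_pitch = a.strides[1]
copy.src_height = copy.height = h
copy.depth = d
copy()

# bind the array (not flat memory) to the texture reference;
# 'mytex' refers to the texref from your code above
mytex.set_array(ary)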
HTH,
Andreas