[PyCUDA] index multiple blocks and grids
mikethesoilsguy at yahoo.com
Thu Mar 24 15:30:20 PDT 2011
Thanks for the explanation... that definitely helps. But how does the indexing
work for a 2D case?
z1 = numpy.zeros((1024)).astype(numpy.float32)
int idy = ??
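(For reference, the usual CUDA pattern extends the 1D formula to a second dimension: `idx = blockIdx.x*blockDim.x + threadIdx.x` and `idy = blockIdx.y*blockDim.y + threadIdx.y`. The arithmetic can be sanity-checked in plain NumPy without a GPU; the block shape (16,16,1) and grid shape (2,2) below are illustrative, not taken from the thread:)

```python
import numpy as np

# Illustrative launch configuration: block=(16,16,1), grid=(2,2)
# covers a 32x32 array with one thread per element.
block_dim_x, block_dim_y = 16, 16
grid_dim_x, grid_dim_y = 2, 2

width = block_dim_x * grid_dim_x    # 32 columns
height = block_dim_y * grid_dim_y   # 32 rows
covered = np.zeros((height, width), dtype=np.float32)

# Emulate what each CUDA thread would compute:
#   int idx = blockIdx.x*blockDim.x + threadIdx.x;
#   int idy = blockIdx.y*blockDim.y + threadIdx.y;
for block_idx_y in range(grid_dim_y):
    for block_idx_x in range(grid_dim_x):
        for thread_idx_y in range(block_dim_y):
            for thread_idx_x in range(block_dim_x):
                idx = block_idx_x * block_dim_x + thread_idx_x
                idy = block_idx_y * block_dim_y + thread_idx_y
                covered[idy, idx] += 1.0

# Every element is visited exactly once, so the indexing has no
# gaps and no collisions.
assert (covered == 1.0).all()
```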
From: Lev Givon <lev at columbia.edu>
To: Mike Tischler <mikethesoilsguy at yahoo.com>
Cc: pycuda at tiker.net
Sent: Thu, March 24, 2011 6:13:54 PM
Subject: Re: [PyCUDA] index multiple blocks and grids
Received from Mike Tischler on Thu, Mar 24, 2011 at 03:41:30PM EDT:
> I'm new to CUDA and PyCUDA, and am having a problem indexing multiple grids.
> I'm using an older CUDA-enabled card (Quadro FX 1700) before I begin writing
> a larger GPU application. I've been trying to understand the relationship between
> threads, blocks, and grids in the context of my individual card. To do so, I've
> set up a simple script.
> However, what if I have an array that's 1024 in length? If I understand the
> documentation correctly, block=(16,16,1) is the max value (256 threads) allowed
> for my hardware, which means I have to increase the number of grids. If I
> change the parameters of my script to:
> z1 = numpy.zeros((1024)).astype(numpy.float32)
> How do I correctly index the array locations in my kernel function given
> multiple grids (z1[???]=???) ? There is a gridDim property, but not gridIdx
> property, like with threads and blocks.
threadIdx identifies the thread in a single block. To access a 1D
array of 1024 elements assuming a maximum of 256 threads per block,
you can combine the values in threadIdx and blockIdx, e.g.,
int idx = blockIdx.x*blockDim.x + threadIdx.x;
and launch the kernel with a thread block with dimensions (256, 1, 1) and a
grid with dimensions (4, 1). See Chapter 2 of the CUDA Programming
Guide for more info.
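(The index arithmetic above can be checked in plain NumPy, independent of the GPU. This sketch emulates the suggested launch configuration, block=(256,1,1) and grid=(4,1), and confirms that every element of a 1024-length array is written exactly once:)

```python
import numpy as np

# block=(256,1,1), grid=(4,1): 4 blocks of 256 threads cover 1024 elements.
block_dim = 256
grid_dim = 4

z1 = np.zeros(block_dim * grid_dim, dtype=np.float32)

# Emulate, per thread:  int idx = blockIdx.x*blockDim.x + threadIdx.x;
indices = []
for block_idx in range(grid_dim):
    for thread_idx in range(block_dim):
        idx = block_idx * block_dim + thread_idx
        indices.append(idx)
        z1[idx] = idx  # each thread writes its own element

# Indices 0..1023 each appear exactly once: no gaps, no overlap.
assert sorted(indices) == list(range(1024))
```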