Sorry for the delay in responding.
On Fri, 2011-07-29 at 02:15 -0400, Andreas Kloeckner wrote:
On Mon, 21 Mar 2011 20:15:35 +0100, "Tomasz Rybak" wrote:
> I attach patch updating pycuda.tools.DeviceData and
> to take new devices into consideration. I have tried to maintain "style"
> those classes
> and introduced changes only when necessary. I have done changes using my old
> and NVIDIA Occupancy Calculator. Unfortunately I currently do not have
> access to Fermi
> to test those fully.
- self.smem_granularity = 16
+ if dev.compute_capability() >= (2,0):
+     self.smem_granularity = 128
+ else:
+     self.smem_granularity = 512
Way back in March, you submitted this patch, where smem_granularity is
documented as the number of threads taking part in a simultaneous smem
access. The new values just seem wrong. What am I missing, or rather,
what did you have in mind?
I have taken those values from CUDA_Occupancy_Calculator.xls,
from sheet "GPU Data", cells C11-H12.
Sorry for the mess. It looks like I misunderstood the meaning of
smem_granularity. I assumed (based on the xls file) that it was the
minimum size of shared memory that can be allocated. That is also how it
looks from the source code of OccupancyRecord (tools.py:294):
alloc_smem = _int_ceiling(shared_mem, devdata.smem_granularity)
If I understand it correctly, it computes the amount of allocated shared
memory, rounding it up to a multiple of smem_granularity.
Under that assumption, my patch makes sense: one can allocate shared
memory in blocks of 512 for 1.x devices, and in blocks of 128 for 2.x devices.
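The rounding described above can be sketched as follows. This is only an illustration: round_up_to_multiple is a hypothetical stand-in for _int_ceiling, not the code from tools.py, and the 512/128 values are the allocation unit sizes read from the spreadsheet under the assumption that smem_granularity means "allocation unit".

```python
def round_up_to_multiple(value, granularity):
    """Round value up to the next multiple of granularity (integers only).

    Hypothetical helper; assumed to behave like _int_ceiling in tools.py.
    """
    return ((value + granularity - 1) // granularity) * granularity


# Allocation unit sizes taken from the Occupancy Calculator spreadsheet
# (assumption: these are what smem_granularity was meant to hold).
ALLOC_UNIT_1X = 512  # compute capability 1.x
ALLOC_UNIT_2X = 128  # compute capability 2.x

requested = 600  # shared memory requested by a kernel

alloc_1x = round_up_to_multiple(requested, ALLOC_UNIT_1X)
alloc_2x = round_up_to_multiple(requested, ALLOC_UNIT_2X)

print(alloc_1x)  # 1024
print(alloc_2x)  # 640
```

Under this reading the number of threads never enters the computation, which is the mismatch with the documentation.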
So I do not understand why there is a difference between the documentation
" .. attribute:: smem_granularity
The number of threads that participate in banked, simultaneous
access to shared memory."
and the code, which does not take threads into consideration when
dealing with smem_granularity.
In any case, I've reverted them to 16/32 in git.
Why those values (where did you get the original 16 from)?
Tomasz Rybak <bogomips(a)post.pl> GPG/PGP key ID: 2AD5 9860
Fingerprint A481 824E 7DD3 9C0E C40A 488E C654 FB33 2AD5 9860