The hardware-independent way to solve this is to call clGetKernelWorkGroupInfo() with CL_KERNEL_WORK_GROUP_SIZE to get the maximum work-group size (i.e. the most work-items per work-group) that you can enqueue for that kernel on that device.

See also CL_KERNEL_LOCAL_MEM_SIZE to find out how much local memory that kernel uses.

Andreas, I assume that this is also exposed through PyOpenCL somehow :)



On Thu, Apr 22, 2010 at 4:37 PM, Michael Rule <> wrote:
thank you, that solved all my problems! (I mean... that one
problem, which was pervasive in my program)

On Thu, Apr 22, 2010 at 4:25 PM, Andreas Klöckner
<> wrote:
> On Donnerstag 22 April 2010, Michael Rule wrote:
>> NV Tesla T10,
>> I assume it's something I am doing wrong...
> If you're using NV hardware, it helps to know the CUDA literature. You
> must supply a local_size to partition your work into NV's thread
> blocks. If you don't, you're only submitting one thread block, which
> has hardware size limits.
> Andreas
> PS: Please keep replies on the list. Thanks.

PyOpenCL mailing list