The hardware-independent way to solve this is to call clGetKernelWorkGroupInfo() with CL_KERNEL_WORK_GROUP_SIZE to get the maximum work-group size (the largest number of work-items per work-group) that you can enqueue for that kernel on a given device.

See also CL_KERNEL_LOCAL_MEM_SIZE to find out how much local memory that kernel uses on the device.

Andreas, I assume that this is also exposed through PyOpenCL somehow :)
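
Something like this should do it (a sketch from memory -- I believe the PyOpenCL spelling is Kernel.get_work_group_info, and the kernel here is a made-up example, so check the docs):

    import pyopencl as cl

    ctx = cl.create_some_context()
    dev = ctx.devices[0]

    prg = cl.Program(ctx, """
        __kernel void scale(__global float *a)
        { a[get_global_id(0)] *= 2.0f; }
        """).build()
    knl = prg.scale

    # max work-items per work-group for this kernel on this device
    max_wg_size = knl.get_work_group_info(
        cl.kernel_work_group_info.WORK_GROUP_SIZE, dev)

    # local memory (in bytes) this kernel uses on this device
    local_mem = knl.get_work_group_info(
        cl.kernel_work_group_info.LOCAL_MEM_SIZE, dev)

    print("max work-group size: %d" % max_wg_size)
    print("local mem used: %d bytes" % local_mem)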

Cheers,

David


On Thu, Apr 22, 2010 at 4:37 PM, Michael Rule <mrule7404@gmail.com> wrote:
Thank you, that solved all my problems! (I mean... that one problem,
which was pervasive in my program.)

On Thu, Apr 22, 2010 at 4:25 PM, Andreas Klöckner
<lists@informa.tiker.net> wrote:
> On Donnerstag 22 April 2010, Michael Rule wrote:
>> NV Tesla T10,
>> I assume it's something I am doing wrong...
>
> If you're using NV hardware, it helps to know the CUDA literature. You
> must supply a local_size to partition your work into NV's thread
> blocks. If you don't, you're only submitting one thread block, which
> has hardware size limits.
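>
> For example (a sketch -- the kernel and sizes are made up; the point
> is the explicit local_size in the third argument of the kernel call):
>
>     import numpy as np
>     import pyopencl as cl
>
>     ctx = cl.create_some_context()
>     queue = cl.CommandQueue(ctx)
>
>     n = 1024 * 1024                    # multiple of the block size
>     a = np.ones(n, dtype=np.float32)
>     a_buf = cl.Buffer(ctx,
>         cl.mem_flags.READ_WRITE | cl.mem_flags.COPY_HOST_PTR,
>         hostbuf=a)
>
>     prg = cl.Program(ctx, """
>         __kernel void twice(__global float *a)
>         { a[get_global_id(0)] *= 2.0f; }
>         """).build()
>
>     # 256 work-items per group -> one NV thread block of 256 threads
>     prg.twice(queue, (n,), (256,), a_buf)
>
> rather than passing None for the local_size.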
>
> Andreas
>
> PS: Please keep replies on the list. Thanks.
>
