The hardware-independent way to solve this is to call clGetKernelWorkGroupInfo()
with the CL_KERNEL_WORK_GROUP_SIZE parameter, which gives you the maximum
work-group size (i.e. the largest number of work-items per work-group) that you
can enqueue for that kernel.
See also CL_KERNEL_LOCAL_MEM_SIZE to find out how much local memory that kernel
will use.
Andreas, I assume that this is also exposed through PyOpenCL somehow :)
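If so, something along these lines should do it, if I'm reading the PyOpenCL
docs right (untested sketch; the trivial "scale" kernel is just a placeholder):

    import pyopencl as cl

    ctx = cl.create_some_context()
    dev = ctx.devices[0]

    # placeholder kernel, only here so the sketch is self-contained
    src = "__kernel void scale(__global float *a) { a[get_global_id(0)] *= 2.0f; }"
    knl = cl.Program(ctx, src).build().scale

    # maximum work-group size the device will accept for this kernel
    max_wg_size = knl.get_work_group_info(
        cl.kernel_work_group_info.WORK_GROUP_SIZE, dev)

    # local memory the kernel uses, in bytes
    local_mem = knl.get_work_group_info(
        cl.kernel_work_group_info.LOCAL_MEM_SIZE, dev)

    print(max_wg_size, local_mem)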
On Thu, Apr 22, 2010 at 4:37 PM, Michael Rule <mrule7404(a)gmail.com> wrote:
thank you, that solved all my problems! (I mean...
problem, which was pervasive in my program)
On Thu, Apr 22, 2010 at 4:25 PM, Andreas Klöckner wrote:
On Donnerstag 22 April 2010, Michael Rule wrote:
NV Tesla T10,
I assume it's something I am doing wrong...
If you're using NV hardware, it helps to know the CUDA literature. You
must supply a local_size to partition your work into NV's thread
blocks. If you don't, you're only submitting one thread block, which has
hardware size limits.
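To make the difference concrete, a rough PyOpenCL sketch of the two call
shapes; the kernel, buffer, and work-group size of 256 are made up purely
for illustration:

    import numpy as np
    import pyopencl as cl

    ctx = cl.create_some_context()
    queue = cl.CommandQueue(ctx)

    # made-up example kernel and data
    src = "__kernel void scale(__global float *a) { a[get_global_id(0)] *= 2.0f; }"
    knl = cl.Program(ctx, src).build().scale

    a = np.ones(1024 * 1024, dtype=np.float32)
    a_buf = cl.Buffer(ctx, cl.mem_flags.READ_WRITE | cl.mem_flags.COPY_HOST_PTR,
                      hostbuf=a)

    # local_size=None leaves the work-group split up to the implementation ...
    knl(queue, a.shape, None, a_buf)

    # ... while an explicit local_size of (256,) partitions the work into
    # work-groups of 256 work-items, which map onto NV's thread blocks
    knl(queue, a.shape, (256,), a_buf)
    queue.finish()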
PS: Please keep replies on the list. Thanks.