[PyOpenCL] Why do calls with a large number of threads fail?