[PyOpenCL] Large buffers slow kernel calls?