[PyOpenCL] Fixed: curious slowness of PyOpenCL matrix-multiply example