This may be caused by memory access collisions and/or a lack of coalesced memory access. This technical report gives some pointers:
Do you use atomic operations? Or maybe you have too many thread fences?
I have no problem starting many threads: the number of threads alone is not the issue.
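To make the coalescing point concrete, here is a minimal pure-Python simulation. It needs no GPU. It also bears on the question of using fewer threads and iterating over the data. It compares two hypothetical ways of splitting N_DATA elements across a fixed number of work-items N_ITEMS. In the "chunked" split, each work-item walks its own contiguous block. In the "grid-stride" split, work-item gid handles gid, gid + N_ITEMS, and so on. In an OpenCL kernel the grid-stride version is typically written as `for (size_t i = get_global_id(0); i < n; i += get_global_size(0))`. The function names below are made up for illustration. The model only tracks which element index each work-item reads on a given loop iteration. It does not model caches or the memory controller, and whether this is what causes the K80 slowdown is an assumption.

```python
def chunked_indices(gid, n_items, n_data):
    """Hypothetical 'chunked' split: work-item gid handles one contiguous block."""
    chunk = (n_data + n_items - 1) // n_items  # ceiling division
    start = gid * chunk
    return list(range(start, min(start + chunk, n_data)))


def strided_indices(gid, n_items, n_data):
    """Grid-stride split: work-item gid handles gid, gid + n_items, gid + 2*n_items, ..."""
    return list(range(gid, n_data, n_items))


def addresses_at_step(split, step, n_items, n_data):
    """Element index each work-item reads on loop iteration `step` (None once it is done)."""
    out = []
    for gid in range(n_items):
        idx = split(gid, n_items, n_data)
        out.append(idx[step] if step < len(idx) else None)
    return out


N_ITEMS, N_DATA = 4, 16
# Chunked: neighbouring work-items read far-apart elements (poorly coalesced).
print(addresses_at_step(chunked_indices, 0, N_ITEMS, N_DATA))  # [0, 4, 8, 12]
# Grid-stride: neighbouring work-items read neighbouring elements (coalesced).
print(addresses_at_step(strided_indices, 0, N_ITEMS, N_DATA))  # [0, 1, 2, 3]
```

Both splits cover every element exactly once, so either one lets a fixed, modest number of work-items process arbitrarily large data. The grid-stride form is usually preferred on GPUs. On every iteration, adjacent work-items touch adjacent addresses, and the hardware can combine those reads into fewer memory transactions.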
On 6-6-2018 at 8:37, aseem hegshetye wrote:
Hi, does GPU speed drop exponentially as the number of threads increases beyond a certain number? I used to set the number of threads equal to the number of transactions in the data under consideration. On a Tesla K80 I see an exponential drop in speed above 30290 threads. If this is true, is it best practice to keep the number of threads low and iterate over the data to get results at optimum speed? How do I find the best number of threads for a GPU?
_________________
PyOpenCL mailing list
PyOpenCL@tiker.net
https://lists.tiker.net/listinfo/pyopencl