The findings below assume I already have a 20 million * 57 bit int array on the GPU.
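
For reference, a minimal sketch of how a timing run like the one quoted below can be set up with event profiling. The kernel name `process` and its body are placeholders rather than the actual code, and the array shape simply mirrors the description above (roughly 4.5 GB of int32 on the device).

import numpy as np
import pyopencl as cl
import pyopencl.array as cl_array

# Sketch only: time one kernel launch per candidate global work size using
# OpenCL event profiling. The kernel body is a placeholder.
ctx = cl.create_some_context()
queue = cl.CommandQueue(
    ctx, properties=cl.command_queue_properties.PROFILING_ENABLE)

n_rows, n_cols = 20_000_000, 57
data = cl_array.zeros(queue, n_rows * n_cols, dtype=np.int32)  # ~4.5 GB on the device

prg = cl.Program(ctx, """
__kernel void process(__global int *data, const long n)
{
    long gid = get_global_id(0);
    long gsize = get_global_size(0);
    /* Each work-item strides over its share of the elements. */
    for (long i = gid; i < n; i += gsize)
        data[i] += 1;
}
""").build()

for global_size in (10_000, 20_000, 24_000, 30_000):
    evt = prg.process(queue, (global_size,), None,
                      data.data, np.int64(data.size))
    evt.wait()
    print(global_size, (evt.profile.end - evt.profile.start) * 1e-9, "s")
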
> On Jun 6, 2018, at 3:05 AM, aseem hegshetye <aseem.hegshetye(a)gmail.com> wrote:
>
> Hi,
> I did some testing with the number of threads: I varied the thread count and recorded the time in seconds it took for the PyOpenCL kernel to execute.
> Following are the results:
> No. of threads --- Time in seconds
> 10,000 --- 202
> 20,000 --- 170
> 24,000 --- 209
> 30,000 --- 224
> 30,714 --- 659
> Thanks
> Aseem
>
>> On Wed, Jun 6, 2018 at 1:54 AM, Sven Warris <sven(a)warris.nl> wrote:
>> Hi Aseem,
>>
>> This may be caused by memory access collisions and/or a lack of coalesced memory access. This technical report gives some pointers:
>> https://www2.eecs.berkeley.edu/Pubs/TechRpts/2016/EECS-2016-143.pdf
>> Do you use atomic operations? Or maybe you have too many thread fences?
>> I have no problem starting many threads: the number of threads alone is not the issue.
>>
>> Cheers,
>> Sven
>>
>>
>> On 6-6-2018 at 8:37, aseem hegshetye wrote:
>>> Hi,
>>> Does GPU speed drop sharply as the number of threads increases beyond a certain point? I have been setting the number of threads equal to the number of transactions in the data under consideration.
>>> For a Tesla K80 I see a sharp drop in speed above 30,290 threads.
>>> If so, is it best practice to keep the number of threads low and have each thread iterate over the data to get optimal speed?
>>> How do I find the best number of threads for a GPU?
>>>
>>> Thanks
>>> Aseem
>>>
>>>
>>
>>
>>
>
Hi Aseem,
This may be caused by memory access collisions and/or a lack of coalesced
memory access. This technical report gives some pointers:
https://www2.eecs.berkeley.edu/Pubs/TechRpts/2016/EECS-2016-143.pdf
Do you use atomic operations? Or maybe you have too many thread fences?
I have no problem starting many threads: the number of threads alone is
not the issue.
Cheers,
Sven
On 6-6-2018 at 8:37, aseem hegshetye wrote:
> Hi,
> Does GPU speed drop sharply as the number of threads increases beyond
> a certain point? I have been setting the number of threads equal to
> the number of transactions in the data under consideration.
> For a Tesla K80 I see a sharp drop in speed above 30,290 threads.
> If so, is it best practice to keep the number of threads low and have
> each thread iterate over the data to get optimal speed?
> How do I find the best number of threads for a GPU?
>
> Thanks
> Aseem
>
>
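
To make the coalescing point concrete, here is an illustrative pair of kernels (placeholders, not code from this thread) that do the same per-work-item summation over a flat int array. In the first, adjacent work-items read adjacent addresses on every iteration, which coalesces well; in the second, each work-item walks its own contiguous chunk, so adjacent work-items are far apart in memory and the loads coalesce poorly.

# Illustrative kernel sources only; both compute per-work-item partial sums
# over the same flat int array, but with different memory access patterns.
ACCESS_PATTERNS_SRC = """
__kernel void walk_coalesced(__global const int *data,
                             __global long *partial, const long n)
{
    long gid = get_global_id(0);
    long gsize = get_global_size(0);
    long acc = 0;
    /* Stride by the global size: work-items gid and gid+1 always read
       neighbouring addresses, so the loads coalesce. */
    for (long i = gid; i < n; i += gsize)
        acc += data[i];
    partial[gid] = acc;
}

__kernel void walk_chunked(__global const int *data,
                           __global long *partial, const long n)
{
    long gid = get_global_id(0);
    long gsize = get_global_size(0);
    long chunk = (n + gsize - 1) / gsize;
    long acc = 0;
    /* Each work-item owns a contiguous chunk: work-items gid and gid+1
       read addresses that are `chunk` elements apart, which does not
       coalesce. */
    for (long i = gid * chunk; i < (gid + 1) * chunk && i < n; ++i)
        acc += data[i];
    partial[gid] = acc;
}
"""

Building both with cl.Program and timing them against the same buffer shows the difference directly on the hardware in question.
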
Hi,
Does GPU speed drop sharply as the number of threads increases beyond a
certain point? I have been setting the number of threads equal to the
number of transactions in the data under consideration.
For a Tesla K80 I see a sharp drop in speed above 30,290 threads.
If so, is it best practice to keep the number of threads low and have
each thread iterate over the data to get optimal speed?
How do I find the best number of threads for a GPU?
Thanks
Aseem
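
On the last question, one common starting point (a heuristic sketch, not a rule from this thread) is to size the launch from the device's properties rather than from the number of transactions, and let each work-item loop over several transactions:

import pyopencl as cl

# Heuristic sketch: pick a launch size from device properties instead of
# tying it to the transaction count. The factor of 8 is an assumption to
# tune, not a fixed rule.
ctx = cl.create_some_context()
dev = ctx.devices[0]

compute_units = dev.max_compute_units      # e.g. 13 per GK210 GPU on a Tesla K80
max_wg = dev.max_work_group_size

local_size = min(256, max_wg)              # a common, safe work-group size
global_size = compute_units * local_size * 8   # enough groups to keep every unit busy

print("compute units:", compute_units)
print("local size:", local_size, "global size:", global_size)

Each work-item then strides over the transactions (as in the timing sketch near the top of this thread) instead of mapping one work-item to one transaction.
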
Hi,
My PyOpenCL kernel just stops responding for specific data.
The code runs smoothly if the data has a total of 30,296 transactions, but it becomes unresponsive when the data has 30,297.
If I take that one extra transaction and run the code separately on it, it works fine.
The code runs smoothly on another dataset with more than 40k transactions.
So my conclusion is that the data is not the problem, the code is not the problem, and the size of the data is not the problem.
Can someone please advise what the problem could be, or how to track it down?
Thanks
Aseem
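
One way to narrow this down, offered as a workaround sketch rather than a diagnosis: keep every launch small and fixed-size by batching the transactions on the host, so no single launch has to cover all of them and a hang can be pinned to a specific batch. The kernel name `process_batch`, its argument list, and the batch size are placeholders.

import numpy as np

# Workaround sketch: enqueue one short kernel launch per fixed-size batch of
# transactions. `prg.process_batch` is a placeholder kernel assumed to take
# (buffer, start index, count).
BATCH = 10_000

def run_in_batches(queue, prg, trans_buf, n_trans):
    for start in range(0, n_trans, BATCH):
        count = min(BATCH, n_trans - start)
        prg.process_batch(queue, (count,), None,
                          trans_buf, np.int64(start), np.int64(count))
    queue.finish()   # block until all batches have drained

If one particular batch still hangs, that points at a specific slice of the data; if none do, the single large launch was likely just running too long in one go.
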