[PyCUDA] weird "if branch" in all tutorial examples

ericyosho ericyosho at gmail.com
Tue Sep 27 08:08:45 PDT 2011


I'm not sure whether this is the right place to ask, but since the
question is so elementary, I would appreciate an explanation.
Every CUDA tutorial example, e.g., one that doubles each element of an
array, has the following lines in its kernel function:

int idx = blockIdx.x * blockDim.x + threadIdx.x; // a unique value for each thread (e.g., for a 1D launch)
if (idx < N) // N is the number of elements in the array
    a[idx] *= 2;

An "if" branch is a rather expensive operation, so why do we want every
thread to perform this check?
Since only one kernel function is allowed to run on each device
at a time, why don't we let each thread double its own associated
value and afterwards simply copy the first N elements back to the host?
Basically, we would omit the "if" check and execute the "double values"
line unconditionally.
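
To make the question concrete, here is a small host-side Python sketch
(no GPU or PyCUDA involved). It assumes a 1D launch rounded up to whole
blocks and the index formula idx = blockIdx.x * blockDim.x + threadIdx.x;
N = 1000, a block size of 256, and the helper names launch_shape and
simulate_kernel are made up for illustration. It only counts which thread
indices would land at or beyond N if the check were dropped.

```python
# Host-side simulation only; nothing here runs on a GPU.
# N and BLOCK_SIZE are illustrative assumptions, not values from any tutorial.
N = 1000          # number of array elements
BLOCK_SIZE = 256  # threads per block


def launch_shape(n, block_size):
    """Return (num_blocks, total_threads) for a 1D grid rounded up to cover n."""
    num_blocks = (n + block_size - 1) // block_size  # ceiling division
    return num_blocks, num_blocks * block_size


def simulate_kernel(a, n, block_size, guarded):
    """Mimic idx = blockIdx.x * blockDim.x + threadIdx.x for every thread.

    Doubles a[idx] for in-range threads and returns the list of idx values
    that would have touched memory outside a[0:n].
    """
    num_blocks, _ = launch_shape(n, block_size)
    out_of_range = []
    for block_idx in range(num_blocks):
        for thread_idx in range(block_size):
            idx = block_idx * block_size + thread_idx
            if guarded and idx >= n:
                continue  # the "if (idx < N)" check: this thread does nothing
            if idx < n:
                a[idx] *= 2
            else:
                # On a real device this would be a write past the end of a.
                out_of_range.append(idx)
    return out_of_range


num_blocks, total_threads = launch_shape(N, BLOCK_SIZE)
unguarded = simulate_kernel(list(range(N)), N, BLOCK_SIZE, guarded=False)
guarded = simulate_kernel(list(range(N)), N, BLOCK_SIZE, guarded=True)
print(num_blocks, total_threads, len(unguarded), len(guarded))
```

With these assumed numbers the launch has 4 blocks and 1024 threads, so
24 threads get an idx of 1000 or more. Skipping the check would therefore
seem to require padding the device array to a whole number of blocks.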

This approach seems more straightforward.
Am I missing anything?

Best,
Zhe Yao
--------------
Department of Electrical and Computer Engineering
McGill University
Montreal, QC, Canada
H3A 2A7

zhe.yao at mail.mcgill.ca


