On Wed, Aug 12, 2009 at 11:31 AM, M. Badawy<m.cohomology(a)gmail.com> wrote:
Well put. Thing is, I'm attending a summer school
on CUDA right now, and
it seems that micromanaging the threads, blocks, warps,
registers, etc. is not for the faint of heart.
Think of it as a puzzle. I'll take memory bank conflict avoidance
over sudoku any day :)
I am not a programmer,
and I doubt that I will ever have the time to do all this fine-tuning
to achieve optimal performance. This also depends on the code, so it
may not be that hard for a lot of tasks that lend themselves well to
parallelization.
What problem are you trying to solve? Maybe you're blessed with a
large problem with ridiculously fine-grained parallelism.
An interesting remark mentioned today is that
there is a lot of
experimentation going on right now to automate the fine-tuning process,
and it was mentioned that a certain algorithm managed to squeeze 15~20%
more performance out of the code than the human-optimized version. The
optimizations done by the algorithm would have taken a person weeks to
implement. These fine-tuning features will be implemented in CUDA later,
but it seems not any time soon.
My guess is that once CUDA gets smart enough, it may then
be easier for
the non-professional programmer to use any tool whatsoever without
worrying too much about performance.
I would not hold my breath :) I would be surprised if CUDA
programming changes qualitatively before some completely
different architecture with a different programming model comes along.