Alistair McDougall <alistair.mcdougall(a)pg.canterbury.ac.nz> writes:
I have previously been using PyCUDA on a
Tesla C2075 as part of my
astrophysics research. We recently installed a Tesla K40c and I was hoping
to just run the same code on the new card; however, I am receiving "pycuda
._driver.LaunchError: cuLaunchKernel failed: launch out of resources"
errors.
A quick Google search for "PyCUDA Tesla K40c" returned a minimal set of
results, which led me to wonder whether anyone has tried running PyCUDA on
this card.
As the PyCUDA code works fine on an older card, I am unsure why a newer
card would return an "out of resources" error when it should have more
resources available. Is there something obvious I have overlooked that
explains why the same code will not run on the new Tesla card?
Things like the number of registers and the shared memory available per SM
do vary between cards, and it's quite possible that your old code uses too
much of a given resource on a given card. Shrinking the block size
tends to let the code run, at the expense of some performance.
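
If it helps, here is a rough sketch of how you could compare the kernel's
per-thread resource usage against the card's per-block limits. The kernel
here is just a dummy placeholder; swap in the real source and kernel name
from the failing code.

# Minimal sketch (placeholder kernel; replace with your own source/name).
import pycuda.autoinit
import pycuda.driver as drv
from pycuda.compiler import SourceModule

dev = pycuda.autoinit.device
print("Card:", dev.name())
print("Max registers/block: ",
      dev.get_attribute(drv.device_attribute.MAX_REGISTERS_PER_BLOCK))
print("Max shared mem/block:",
      dev.get_attribute(drv.device_attribute.MAX_SHARED_MEMORY_PER_BLOCK))
print("Max threads/block:   ",
      dev.get_attribute(drv.device_attribute.MAX_THREADS_PER_BLOCK))

mod = SourceModule("""
__global__ void my_kernel(float *x) { x[threadIdx.x] *= 2.0f; }
""")
func = mod.get_function("my_kernel")
print("Registers per thread:    ", func.num_regs)
print("Static shared mem (B):   ", func.shared_size_bytes)
print("Kernel max threads/block:", func.max_threads_per_block)
# A launch fails with "out of resources" when, e.g.,
# num_regs * block_size exceeds the per-block register limit,
# which is why a smaller block size usually gets it running.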
HTH,
Andreas