As a general note: Once you sort out the resources issue, it is *very* important to retune
your block and grid sizes after switching from compute capability 2.0 (Tesla C2075) to
compute capability 3.x (Tesla K40c). When I first switched my code to the new architecture,
I saw almost no improvement, or even actual regressions, in performance. It wasn't until I
re-benchmarked different grid configurations that I discovered the problem.
In fact, I now sometimes include an auto-tuning stage in my CUDA programs to dynamically
select from a range of reasonable block sizes based on runtime benchmarks of my important
kernels.
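Such an auto-tuning stage can be sketched in plain Python. This is a minimal illustration, not the poster's actual code: the timing callback `time_launch` and the candidate sizes are hypothetical placeholders for whatever kernel-launch timing you already have.

```python
def autotune_block_size(time_launch, candidates=(64, 128, 256, 512, 1024), repeats=3):
    """Return the candidate block size with the lowest measured launch time.

    `time_launch` is a caller-supplied callable that runs the kernel once
    with the given block size and returns the elapsed time in seconds.
    """
    best_size, best_time = None, float("inf")
    for block_size in candidates:
        # Take the best of several repeats to reduce timing noise.
        elapsed = min(time_launch(block_size) for _ in range(repeats))
        if elapsed < best_time:
            best_size, best_time = block_size, elapsed
    return best_size

# Example with a synthetic cost model standing in for a real kernel launch:
fake_cost = {64: 2.0e-3, 128: 1.1e-3, 256: 0.9e-3, 512: 1.4e-3, 1024: 3.0e-3}
best = autotune_block_size(lambda bs: fake_cost[bs])
```

In a real program you would run this once at startup (or cache the result per GPU model), with `time_launch` wrapping an actual kernel invocation bracketed by CUDA events.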
On Apr 2, 2014, at 1:46 AM, Jerome Kieffer <Jerome.Kieffer(a)esrf.fr> wrote:
On Wed, 2 Apr 2014 17:41:59 +1300
Alistair McDougall <alistair.mcdougall(a)pg.canterbury.ac.nz> wrote:
I have previously been using PyCUDA on a Tesla C2075 as part of my
astrophysics research. We recently installed a Tesla K40c and I was hoping
to just run the same code on the new card; however, I am receiving
"pycuda._driver.LaunchError: cuLaunchKernel failed: launch out of resources".
A quick Google search for "PyCUDA Tesla K40c" returned a minimal set of
results, which led me to wonder: has anyone tried running PyCUDA on this
card?
I ran into similar bugs with our K20 and was
scratching my head for a while, until people from Nvidia told me that
driver 319 had problems with the GK110-based Tesla cards.
Driver 331 has been running without glitches for a while now.
Hope this helps.
PyCUDA mailing list