bob zigon <bob.zigon(a)gmail.com> writes:
If a kernel is called from within a python loop, how
frequently is nvcc called?
If the kernel is essentially static, I would hope that nvcc is called once irregardless
the number of times the loop iterates.
On the other hand, if the kernel is templated, and the template is a function of the
counter, it seems to me that nvcc would need to be called on every iteration.
Creating a SourceModule is somewhat expensive, and it's definitely
something that you should avoid doing in the inner loop of your
application. Just hold on to the module handle.
PyCUDA tries to be smart about not recompiling when not necessary, but
even in the no-recompile case, it has to look up the kernel in its cache
on disk and check that no include files have changed. Hence my advice.
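
For the templated case, here is a rough sketch of what "hold on to the
module handle" could look like: keep one module per distinct template
parameter in an in-memory cache, so the build step runs once per
instantiation rather than once per loop iteration. Note that build_module,
get_module, KERNEL_TEMPLATE and compile_count are hypothetical names made up
for this illustration, and build_module is only a stand-in for
pycuda.compiler.SourceModule so that the sketch runs without a GPU.

```python
from functools import lru_cache

# Kernel source with one template parameter filled in by string formatting.
KERNEL_TEMPLATE = """
__global__ void scale(float *x)
{
    x[threadIdx.x] *= %(factor)d;
}
"""

compile_count = 0  # counts how often the (stand-in) build step runs


def build_module(source):
    # Hypothetical stand-in for pycuda.compiler.SourceModule(source).
    # In real code, construct the SourceModule here instead.
    global compile_count
    compile_count += 1
    return ("module", source)


@lru_cache(maxsize=None)
def get_module(factor):
    # One build per distinct template parameter; repeated calls with the
    # same factor return the module already held in memory.
    return build_module(KERNEL_TEMPLATE % {"factor": factor})


for i in range(1000):
    mod = get_module(i % 4)  # only 4 distinct instantiations exist

print(compile_count)  # prints 4
```

A plain lru_cache knows nothing about CUDA contexts, so treat this as an
illustration of the caching idea only, not as a drop-in replacement for
what PyCUDA does internally.
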
If you look at the PyCUDA GPUArray, it caches *readily instantiated*
SourceModules by way of pycuda.tools.context_dependent_memoize. That
way, it only incurs the instantiation penalty for each genuinely new