Moving the elementwise kernel instantiation to the beginning of my code
fixed the issue and also made my code a tad faster.
I have not managed to replicate the issue outside of my development
environment.
Thanks for the pointer.
On Sat, Jul 27, 2013 at 6:07 PM, Andreas Kloeckner
Matthias Lee <matthias.a.lee(a)gmail.com> writes:
> I'm working on a bit of pycuda for image processing. Today I have run
> into an interesting issue: when I run a high number of iterations of my
> code, I end up getting an error executing one of my Elementwise kernels.
> It is not always the same one and can happen at slightly different points
> (these are extraordinarily simple kernels, nothing complicated).
> pytools.prefork.ExecError: error invoking 'nvcc --preprocess -arch sm_35
> /tmp/tmplAgMsh.cu --compiler-options -P': [Errno 12] Cannot allocate memory
> This error is not directly in the pycuda codebase, but happens in
> pycuda/compiler.py in the preprocess_source function when it calls out to nvcc
> (see stack trace: http://pastebin.com/WQUFeTSG).
> At this point in time I am neither out of memory on the Host (10GB+ free)
> nor on the Device (3GB+ free). I have the feeling this may have something to
> do with stack size or open file-descriptors.
> Has anyone seen this before?
No, I haven't--sorry. If you could provide a small script that
reproduces this behavior, I'd have a better chance of figuring out
what's going on.
In the meantime, your report suggests that you're creating
ElementwiseKernel instances in a loop. That's usually a bad idea,
because it incurs a compilation cost (or, at best, a cache lookup cost)
on every trip through the loop. It's usually much better to cache or
reuse a single ElementwiseKernel instance.
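
To illustrate the reuse advice above, here is a minimal sketch, not taken from the original code. The helper names get_elementwise_kernel and scale_repeatedly, the kernel name "scale_in_place", and the float32 in-place scaling operation are all hypothetical; only ElementwiseKernel(arguments, operation, name) comes from pycuda. The idea is to memoize kernel construction so that nvcc is invoked at most once per distinct kernel, no matter how many iterations run.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def get_elementwise_kernel(arguments, operation, name="kernel"):
    """Build each distinct kernel once; repeated calls return the same object.

    Hypothetical helper: the cache key is the (arguments, operation, name)
    tuple of strings, so identical requests never trigger a second build.
    """
    # Imported lazily so this module can be loaded on a machine without a GPU.
    from pycuda.elementwise import ElementwiseKernel
    return ElementwiseKernel(arguments, operation, name)


def scale_repeatedly(x_gpu, factor, iterations):
    """Scale x_gpu in place `iterations` times using one shared kernel.

    Assumes x_gpu is a float32 pycuda.gpuarray.GPUArray (an assumption made
    for this sketch, not something stated in the thread).
    """
    # Kernel lookup happens once, outside the loop.
    scale = get_elementwise_kernel(
        "float a, float *x",   # C argument list
        "x[i] = a * x[i]",     # per-element operation
        "scale_in_place",      # hypothetical kernel name
    )
    for _ in range(iterations):
        scale(factor, x_gpu)
    return x_gpu
```

Creating the kernel once at module import time, as described at the top of this message, achieves the same effect; the cached helper is just convenient when kernels are requested from several places.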
IDIES/Johns Hopkins University
Performance @ Rational/IBM