Moving GitHub issue #85 (https://github.com/inducer/loopy/issues/85) to
mailing list:
> @maedoc wrote:
> I'd like to start working on a target for C augmented with OpenMP or OpenACC
> pragmas. This is more a question than an issue: would it be reasonable to
> start by overriding behavior of CASTBuilder.emit_sequential_loop based on
> iname tags? Or is there a better place to start?
FWIW, while I'm not opposed to the idea of having such targets in loopy, they
don't expose vectorization as explicitly as ISPC or CL or CUDA, and so
they're less of a natural fit. Personally, I think using ISPC results in more
control over program performance.
I would surely like to use ISPC on any AVX CPU. Intel's CL driver is really
great too but not supported for KNL. Anyway, for ISPC on a ppc64le system,
it needs to be compiled from source and even then might not support POWER8
vector instructions. Someone on their mailing list reported getting it to
work on POWER8 through the general vector header approach but never shared
What I have in mind at least for OpenMP is pretty simple actually: adding
`#pragma omp parallel for` to the loop for the outermost g.N-tagged iname, and
`#pragma omp simd` to the loop for the l.N-tagged iname.
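
As a rough illustration of that plan (this is a hand-written sketch, not loopy output; the kernel `double_it` and the extents `GROUPS` and `LANES` are made up for the example), the generated C might look something like this, assuming a g.0 iname becomes the outer loop and an l.0 iname the inner one:

```c
/* Hypothetical extents of the g.0- and l.0-tagged inames. */
enum { GROUPS = 4, LANES = 8 };

/* Hypothetical kernel: out[i] = 2 * in[i] for i in [0, GROUPS * LANES). */
void double_it(float *out, const float *in)
{
    /* g.0: iterations are independent, so distribute them over threads. */
    #pragma omp parallel for
    for (int g = 0; g < GROUPS; ++g) {
        /* l.0: ask the compiler to vectorize across the inner iname. */
        #pragma omp simd
        for (int l = 0; l < LANES; ++l) {
            int i = g * LANES + l;
            out[i] = 2.0f * in[i];
        }
    }
}
```

Built without OpenMP enabled, the pragmas are ignored and the loops simply run sequentially, so the result is the same either way.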
OpenACC OTOH I am unfamiliar with, but I will spend some time the week after
next at a GPU hackathon in Juelich, so I figured it was worth considering as
well. But
if it turns out to be easier to target CUDA directly, so much the better.
I think the right tag to use for OpenMP loops would be
The current target structure isn't ideally set up for that--having an
ostensibly parallel loop go through emit_sequential_loop is wrong at least by
name. So I wouldn't be opposed to some restructuring of the target interface
to avoid doing things that are nonsensical in name.
Sounds like I should be looking at how ISPC or other targets with explicit ILP
I won't be able to contribute a PR for a few weeks, so if you prefer to close the
issue, that's fine with me too.