I'm working through an issue with assigning temporaries inside vectorized
loops in loo.py, and I thought I'd ask which direction we want the
implementation to take.
The issue is easiest to see with this kernel:
Basically, the kernel reads a flag from a global temporary ('mask') and uses
it in an if-statement to decide whether to update the 'a' array.
The issue (without my current patch) is that
will always convert 'temp' to a vector dtype. This is a problem because
a) it is unnecessary, since 'temp' doesn't actually depend on the
vector-iname ('j'), and b) OpenCL doesn't appear to allow vector dtypes in
if-statement conditions; see my gist log of the unpatched test here:
which fails with:
    1:17:7: error: statement requires expression of scalar type ('long4' (vector of 4 'long' values) invalid)
My solution was to add a simple parsing extension to temporary variable
assignment: a temporary variable created with

    <type:s> variable = whatever

will be forced to a scalar dtype regardless of its status in a vectorized
loop. This is reasonably easy for me to specify in code, and it fixes at
least some of the problems.
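To make the proposed syntax concrete, here is a rough Python sketch of how a
declaration such as '<int32:s> temp = mask[0]' could be split into the
variable name, dtype, and scalar/vector override. The regex, 'TempDecl', and
'parse_temp_decl' are hypothetical illustrations, not loopy's actual
instruction parser, and 'mask[0]' is just a placeholder right-hand side.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Hypothetical syntax from the proposal: "<dtype[:s|:v]> name = expr".
# Illustration only; this is NOT loopy's real instruction parser.
_DECL_RE = re.compile(
    r"^\s*<\s*(?P<dtype>[A-Za-z_][A-Za-z0-9_]*)?\s*(?::(?P<mode>[sv]))?\s*>"
    r"\s*(?P<name>[A-Za-z_][A-Za-z0-9_]*)\s*=\s*(?P<rhs>.+?)\s*$"
)


@dataclass
class TempDecl:
    name: str
    dtype: Optional[str]  # None: dtype left to inference, as with a bare "<>"
    force: Optional[str]  # "scalar" (:s), "vector" (:v), or None (no override)
    rhs: str


def parse_temp_decl(line: str) -> TempDecl:
    """Parse one temporary declaration using the hypothetical :s / :v suffix."""
    m = _DECL_RE.match(line)
    if m is None:
        raise ValueError(f"not a temporary declaration: {line!r}")
    force = {"s": "scalar", "v": "vector", None: None}[m.group("mode")]
    return TempDecl(m.group("name"), m.group("dtype"), force, m.group("rhs"))
```

For example, '<> test = 1' parses with no dtype and no override, so the
vectorizer would still be free to decide what to do with 'test'.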
However, a few more issues remain:
1. This is a bit hackish. Ideally we could apply the heuristic that a
temporary is only converted to a vector dtype if the instruction assigning
it directly depends on the vector-index. However, this would fail for
something like

    <> test = 1

if we did in fact want 'test' to be a vector variable. I could extend my
parsing solution so that the user could add :v to the temporary variable
declaration to force it to a vector dtype.
2. The more serious issue is that if-statements conditioned on a vector
dtype are simply not supported in OpenCL. Ideally we should try to convert
these to OpenCL select() calls where possible, and throw an exception if the
enclosed instruction is not a simple assignment. This opens up further
questions: do we allow multiple statements inside such an if-statement?
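A rough sketch of the heuristic from item 1 combined with the proposed :s / :v
overrides, assuming we already know which names (inames, arrays, other
temporaries) each temporary's assignment reads. That dependency map, 'deps',
and the 'vector_temporaries' helper are hypothetical. The sketch also
propagates through temporaries, so a temporary computed from a vectorized
temporary is vectorized too, which goes slightly beyond "directly depends".

```python
from typing import Dict, Optional, Set


def vector_temporaries(
    deps: Dict[str, Set[str]],
    vec_iname: str,
    force: Optional[Dict[str, str]] = None,
) -> Set[str]:
    """Return the temporaries that should get a vector dtype.

    Hypothetical helper: deps maps each temporary to the names its assignment
    reads; force maps a temporary to "scalar" or "vector", mimicking the
    proposed :s / :v annotations.
    """
    force = force or {}
    # Explicit :v overrides are vectorized unconditionally.
    result = {t for t, mode in force.items() if mode == "vector"}
    changed = True
    while changed:  # propagate until no new temporary becomes vector
        changed = False
        for temp, reads in deps.items():
            if temp in result or force.get(temp) == "scalar":
                continue  # already vector, or pinned to scalar by :s
            # Vectorize if the assignment reads the vector iname directly,
            # or reads a temporary that is already vectorized.
            if vec_iname in reads or reads & result:
                result.add(temp)
                changed = True
    return result
```

With deps = {'temp': {'mask'}, 'idx': {'j'}, 'val': {'idx', 'temp'},
'test': set()}, only 'idx' and 'val' come out vectorized, so the flag read
from 'mask' stays scalar; passing {'test': 'vector'} as the override covers
the '<> test = 1' case from item 1.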
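For item 2, a Python sketch of what the select conversion might look like.
'cl_select' emulates OpenCL's vector select(a, b, c) lane by lane (each lane
takes b where the most significant bit of the corresponding lane of c is
set), and 'lower_vector_if' is a hypothetical helper that rewrites the body
of a vector-dependent if into select calls, raising for anything other than
a simple assignment. The tuple representation of statements and the 'a_vec'
name are made up for illustration.

```python
from typing import List, Sequence, Tuple

Stmt = Tuple[str, ...]  # hypothetical: ("assign", lhs, rhs) or another kind


def cl_select(a: Sequence[int], b: Sequence[int], c: Sequence[int]) -> List[int]:
    """Lane-wise emulation of OpenCL's vector select(a, b, c): lane i takes
    b[i] if the most significant bit of c[i] is set, otherwise a[i].
    For signed integer lanes, 'MSB set' is the same as c[i] < 0."""
    return [bi if ci < 0 else ai for ai, bi, ci in zip(a, b, c)]


def lower_vector_if(cond: str, body: Sequence[Stmt]) -> List[str]:
    """Rewrite 'if (cond) { body }' with a vector-typed cond into selects.

    Hypothetical helper: only simple assignments are accepted, anything else
    raises. Each assignment becomes its own select, which is one possible
    answer to the multiple-statements question. Comparing cond against 0
    turns 0/1 flags into the -1/0 lanes that select() tests, since vector
    relational operators yield -1 for true. select() also requires the mask
    lanes to match the data lanes in bit width; that is not checked here."""
    lowered = []
    for stmt in body:
        if stmt[0] != "assign":
            raise NotImplementedError(
                f"cannot lower {stmt[0]!r} inside a vector-dependent if to select()")
        _, lhs, rhs = stmt
        lowered.append(f"{lhs} = select({lhs}, {rhs}, ({cond}) != 0);")
    return lowered
```

For example, cl_select([10, 20, 30, 40], [1, 2, 3, 4], [0, -1, 0, -1])
gives [10, 2, 30, 4], i.e. only the lanes whose flag is set are updated.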
So to recap:
- Does this seem like a workable patch to force temporaries to scalar /
vector dtypes?
- Should I code up the heuristic mentioned above (i.e., only vectorize
temporaries if they directly depend on the vector-iname)?
- Should we throw an exception for a vector-dependent if-statement that
contains instructions other than simple assignments? And what about
multiple statements?