Hello all,
I have an old MacBook with a discrete GeForce GPU, and have run into
the following problem. A simplified example is here:
https://gist.github.com/fjarri/9aff0474868e2faf438f7e8229d194ec
Basically, what I'm trying to do:
- create a two-device context
- create a buffer
- split it into two subregions to use on each device
- run a kernel on each device in parallel, each one working on its own
subregion
- get the result back on the host
(the expected result is [0 1 2 3 4 5 6 7 0 1 2 3 4 5 6 7])
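
For readers who do not want to open the gist, the steps above can be sketched roughly as follows. This is an illustration under assumptions, not the code from the gist: it assumes PyOpenCL, 32-bit integer elements, and a kernel that simply writes its global id. The names KERNEL_SRC, split_regions, expected_result and run are made up for this example.

```python
# Hypothetical kernel: every work item stores its own global id.
KERNEL_SRC = """
__kernel void fill(__global int *dest)
{
    const int i = get_global_id(0);
    dest[i] = i;
}
"""


def split_regions(items_per_device, itemsize, n_devices):
    """Return one (origin, size) pair in bytes per device, back to back."""
    size = items_per_device * itemsize
    return [(i * size, size) for i in range(n_devices)]


def expected_result(items_per_device, n_devices):
    """Each kernel writes its global id, so every subregion reads 0..n-1."""
    return list(range(items_per_device)) * n_devices


def run(devices, items_per_device=8):
    """Run the steps on a list of pyopencl.Device objects."""
    # Imported here so the helpers above work without an OpenCL installation.
    import numpy as np
    import pyopencl as cl

    itemsize = np.dtype(np.int32).itemsize
    regions = split_regions(items_per_device, itemsize, len(devices))
    total_bytes = regions[-1][0] + regions[-1][1]

    # Multi-device context, one queue per device, one shared buffer.
    ctx = cl.Context(devices=devices)
    queues = [cl.CommandQueue(ctx, device=d) for d in devices]
    buf = cl.Buffer(ctx, cl.mem_flags.READ_WRITE, size=total_bytes)

    # One subregion per device. The OpenCL specification requires the origin
    # to be aligned to CL_DEVICE_MEM_BASE_ADDR_ALIGN for at least one device
    # in the context, so very small subregions may be rejected on some
    # platforms.
    subs = [buf.get_sub_region(origin, size) for origin, size in regions]

    # Launch the same kernel on every device, each on its own subregion.
    prg = cl.Program(ctx, KERNEL_SRC).build()
    events = [prg.fill(q, (items_per_device,), None, sub)
              for q, sub in zip(queues, subs)]
    cl.wait_for_events(events)

    # Read the whole buffer back on the host (blocking copy).
    result = np.empty(items_per_device * len(devices), dtype=np.int32)
    cl.enqueue_copy(queues[0], result, buf)
    return result
```

With two devices, run(devices) would be compared against expected_result(8, 2), which is the 16-element list quoted above.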
First, it turned out that if the context includes an NVIDIA card, the
buffer must be created with the cl.mem_flags.ALLOC_HOST_PTR flag;
otherwise the program crashes as soon as a subregion is used in a kernel.
If the context is created on a CPU + Iris Pro (the other two devices
available), everything works fine without this flag, giving the expected
result.
After fixing that, the program finishes without crashing when run on a CPU
+ GeForce or Iris Pro + GeForce context, but the result is [0 1 2 3 4 5 6 7
0 0 0 0 0 0 0 0] - that is, the second kernel (on the GeForce device)
either did not run, or its changes to the subregion were not incorporated
into the whole buffer. Uncommenting the explicit migration at the end of
the example does not help either. Does anyone know what I'm missing here?
Or is it an NVIDIA/Apple bug?
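
To make the two workarounds and the symptom concrete, here is a rough sketch, again assuming PyOpenCL and 32-bit integer elements. It is not taken from the gist, and make_buffer, read_back and mismatched_regions are hypothetical names. The first helper creates the buffer with ALLOC_HOST_PTR, the second optionally migrates the buffer before reading it, and the third reports which subregions do not hold the expected 0..n-1 sequence.

```python
def make_buffer(ctx, nbytes):
    """Create the parent buffer with the ALLOC_HOST_PTR flag set."""
    import pyopencl as cl

    flags = cl.mem_flags.READ_WRITE | cl.mem_flags.ALLOC_HOST_PTR
    return cl.Buffer(ctx, flags, size=nbytes)


def read_back(queue, buf, n_items, migrate=True):
    """Copy the whole buffer to the host, optionally migrating it first."""
    import numpy as np
    import pyopencl as cl

    if migrate:
        # Assumption: default flags (0), which request migration to the
        # device that `queue` belongs to.
        cl.enqueue_migrate_mem_objects(queue, [buf], flags=0)
        queue.finish()

    result = np.empty(n_items, dtype=np.int32)
    cl.enqueue_copy(queue, result, buf)  # blocking by default
    return result


def mismatched_regions(result, items_per_device):
    """Return indices of subregions that do not read 0..items_per_device-1."""
    want = list(range(items_per_device))
    chunks = [list(result[i:i + items_per_device])
              for i in range(0, len(result), items_per_device)]
    return [n for n, chunk in enumerate(chunks) if chunk != want]
```

Applied to the output reported above, mismatched_regions flags only the second subregion, the one that was handed to the GeForce.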