I have a question about OpenCL's execution model.
Here's a quote from AMD's documentation, page 132:
Execution of kernel dispatches can overlap if there are no
dependencies between them and if there are resources available in
the GPU. This is critical when writing benchmarks; it is important
that the measurements are accurate and that “false dependencies” do
not cause unnecessary slowdowns. An example of false dependency is:
a. Application creates a kernel “foo”.
b. Application creates input and output buffers.
c. Application binds input and output buffers to kernel “foo”.
d. Application repeatedly dispatches “foo” with the same parameters.
If the output data is the same each time, then this is a false dependency because
there is no reason to stall concurrent execution of dispatches. To avoid stalls,
use multiple output buffers. The number of buffers required to get peak
performance depends on the kernel.
Now, I thought OpenCL would only look at the events passed to wait_for when
determining which kernels are allowed to run concurrently with which other
kernels. This passage suggests that some dependency information is also inferred
from which memory objects a kernel uses; in particular, that two kernels are not
allowed to write to the same buffer at the same time.
Is that behavior AMD-specific, or is it part of the OpenCL specification?
I'd be grateful for any clues.