Alexandre Linhares <linhares(a)clubofrome.org.br> writes:
> Suppose the results from copy_if() are few, in relation to the buffer.
> Wouldn't it be advantageous (performance-wise) to create a results-sized
> buffer, copy the results there, and minimize buffer transfer?
>
> In other words, am I introducing any bugs?
>
> Source code here:
> https://stackoverflow.com/questions/28249558/pyopencl-copy-if-is-it-possibl…
>
> Thank you for your work. It is wonderful!
http://documen.tician.de/pyopencl/algorithm.html#pyopencl.algorithm.copy_if
The interface to copy_if is as it is on purpose. When the scan that does
the main operation starts, we don't yet know how large the result will
be, so we have to allocate the output at the full input length.
From here, there are two possible ways forward, and which one you choose
is a trade-off.
- You can allocate a 'right-sized' output array and copy your result to
  it. This is what you're suggesting, if I understand you right. The
  advantage is that you don't waste memory on entries that are unused.
  The downside is that the copy takes time. This is usually what you'd
  do if the result is intended to be long-lived.
  Btw, you can achieve this just via out[:count].copy().
- You can also just take a smaller view of the results array via
  out[:count], knowing that the larger array is being kept alive
  behind it. This is a good solution if the result is short-lived.
If you're mainly concerned about copy costs, out[:count].get() should
be even faster than your solution, because you don't incur the cost of
allocating the temporary and copying into it.
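
To make the two options concrete, here is a minimal sketch. The helper
name shrink_result and the demo function are hypothetical and not part of
PyOpenCL; the sketch assumes that copy_if returns (out, count, event) with
count as an on-device scalar that must be fetched with .get() before it
can be used as a slice bound. demo() needs PyOpenCL and a working OpenCL
device, so it is defined but not called.

```python
def shrink_result(out, count, long_lived):
    """Return the first `count` entries of `out` (hypothetical helper).

    `out` and `count` are what copy_if returned. `count` may be an
    on-device scalar (fetched via .get()) or a plain integer.
    """
    n = int(count.get()) if hasattr(count, "get") else int(count)
    view = out[:n]          # for device arrays: a view, no data is copied
    if long_lived:
        return view.copy()  # right-sized array, pays for one extra copy
    return view             # cheap, but keeps the full-length array alive


def demo():
    # Requires PyOpenCL and an OpenCL device; imports are local so that
    # shrink_result stays usable without them.
    import numpy as np
    import pyopencl as cl
    import pyopencl.array as cl_array
    from pyopencl.algorithm import copy_if

    ctx = cl.create_some_context()
    queue = cl.CommandQueue(ctx)

    host = np.arange(1000, dtype=np.int32)
    ary = cl_array.to_device(queue, host)

    # Keep the multiples of 100. `out` has the full input length.
    out, count, event = copy_if(ary, "ary[i] % 100 == 0")

    # Long-lived result: device-side copy into a right-sized array.
    kept = shrink_result(out, count, long_lived=True)

    # Only the host copy is needed: transfer just the view, no temporary.
    on_host = shrink_result(out, count, long_lived=False).get()
    return kept, on_host
```

Which branch is cheaper depends on how long the result lives, which is
exactly the information copy_if does not have.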
Since copy_if cannot make that trade-off for you (for example, it can't
know how long-lived the result is intended to be), the interface is the
way it is, allowing you to achieve a cost-optimal solution for your
situation with just a tiny bit of extra work. I'd love to take a patch
to the documentation that explains this, if you have time.
Hope that helps,
Andreas
Dear Andreas et al.,
Suppose the results from copy_if() are few, in relation to the buffer.
Wouldn't it be advantageous (performance-wise) to create a results-sized
buffer, copy the results there, and minimize buffer transfer?
In other words, am I introducing any bugs?
Source code here:
https://stackoverflow.com/questions/28249558/pyopencl-copy-if-is-it-possibl…
Thank you for your work. It is wonderful!
please... Help! :)
--Alex
Neal Becker <ndbecker2(a)gmail.com> writes:
> I'm assuming if I use nvidia driver I have to use their libOpenCL?
No, you're free to use whichever one you please. libOpenCL only
dispatches between different drivers ("ICDs"). See here for more:
http://wiki.tiker.net/OpenCLHowTo
Nvidia's is perhaps the worst choice, although it is viable as long as
you either install matching headers or configure PyOpenCL with
    CL_PRETEND_VERSION = "1.1"
in siteconf.py.
Better choices include AMD's libOpenCL or this open-source one:
https://forge.imag.fr/projects/ocl-icd/
Hope that helps,
Andreas