On Tue, Mar 13, 2018 at 9:48 AM, <torwartwechsel(a)yahoo.de> wrote:
> Hello,
>
>
> I do not have experience in GPU (CUDA) computation.
>
> I googled a lot, but I did not find how to calculate the covariance on the
> GPU using pycuda or skcuda.
>
> All I found is the following code snippet
>
> https://github.com/OrangeOwlSolutions/cuBLAS/blob/master/Covariance.cu
>
> Is there a simpler way?
>
>
> Kind regards,
> Till
The arrayfire library provides GPU support and a covariance function
that is accessible via Python bindings. I did encounter issues when
trying to use the bindings recently, however, so YMMV.
https://github.com/arrayfire/arrayfire
https://github.com/arrayfire/arrayfire-python
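For what it's worth, the covariance computation itself is just a centered
matrix product, so on the GPU almost all of the work maps onto a single GEMM
call (which is essentially what the linked cuBLAS snippet does). Here is a
minimal NumPy sketch of the math; a skcuda version would do the same product
with skcuda.linalg.dot after centering on the device:

```python
import numpy as np

# n observations (rows) of d variables (columns); random demo data
rng = np.random.default_rng(0)
X = rng.standard_normal((100, 4))

# Center each column, then one matrix product gives the covariance:
#   C = Xc^T @ Xc / (n - 1)
# On the GPU this maps to a single GEMM (cublasSgemm in the linked
# snippet, or skcuda.linalg.dot on gpuarrays).
Xc = X - X.mean(axis=0)
C = Xc.T @ Xc / (X.shape[0] - 1)

# Cross-check against NumPy's reference implementation
assert np.allclose(C, np.cov(X, rowvar=False))
```

This is only the host-side math, of course; the point is that no special
"covariance kernel" is needed, just a mean-subtraction and a matrix multiply.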
--
L
Emanuel Rietveld <e.j.rietveld(a)gmail.com> writes:
> If I understand correctly, the current PyCUDA multithreading examples
> assume you create a separate context for each thread.
>
> If I want to use CUDA 4.0+'s one-context-per-process model instead,
> how would I do that in PyCUDA?
>
> I think you'd call cudaSetDevice instead of cuCtxCreate? Does the
> equivalent exist in PyCUDA? If it does not, can I add it?
Yes, in fact that would be very welcome. PyCUDA has some complicated and
brittle logic in place to manage CUDA's context stacks that I've been
meaning to rip out. Here's an example:
https://github.com/inducer/pycuda/blob/master/src/cpp/cuda.hpp#L525
Patches that get rid of all that code and simplify it would be very
welcome.
Andreas
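For reference, PyCUDA does expose Device.retain_primary_context(), which
attaches to the device's primary context -- the same per-process context the
runtime API (cudaSetDevice) uses -- rather than creating a fresh one with
make_context(). A sketch (untested here, needs a CUDA-capable machine):

```python
import numpy as np
import pycuda.driver as cuda
import pycuda.gpuarray as gpuarray

cuda.init()
dev = cuda.Device(0)

# Attach to the primary context instead of dev.make_context().
# Threads that push this same context object share one context,
# so device allocations are visible across all of them.
ctx = dev.retain_primary_context()
ctx.push()

a = gpuarray.to_gpu(np.arange(8, dtype=np.float32))
print(a.get())

ctx.pop()
```

Whether this plays nicely with PyCUDA's internal context-stack bookkeeping
in a multithreaded program is exactly the part Andreas describes as brittle,
so treat the above as a starting point, not a guarantee.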
If I understand correctly, the current PyCUDA multithreading examples
assume you create a separate context for each thread.
If I want to use CUDA 4.0+'s one-context-per-process model instead,
how would I do that in PyCUDA?
I think you'd call cudaSetDevice instead of cuCtxCreate? Does the
equivalent exist in PyCUDA? If it does not, can I add it?
I need to share memory between threads.
Cheers,
Emanuel