Re: [Hedge] PointEvaluator is slow in CUDA
by Andreas Kloeckner

Dear Peter,
On Thu, 19 Jan 2012 16:33:34 +0100, Peter17 <peter017(a)gmail.com> wrote:
> I am encountering a problem when trying to get the value of the fields
> at some nodes every time step.
>
> The code I am using is similar to the one in examples/maxwell/inhom-cavity.py:
>
> point_getter = discr.get_point_evaluator(numpy.array(<coordinates>))
> ...
> val = point_getter(fields)
>
> except that I am using several point_getters, in order to plot the
> curve (field value vs. time) for each point.
>
> My problem is that:
> * the execution is becoming slower and slower as I increase the number
> of points in CUDA:
> * the execution is slower in GPU (1 Tesla C1060) than in MPI (4 cores
> of Xeon X5650):
>
> * MPI-4
> ** No getter: ~ 109 s. for 400 steps
> ** 1 getter: ~ 117 s. for 400 steps
> ** 10 getters: ~ 117 s. for 400 steps
> ** 20 getters: ~ 117 s. for 400 steps
> * 1 GPU
> ** No getter: ~ 66 s. for 400 steps
> ** 1 getter: ~ 67 s. for 400 steps
> ** 10 getters: ~ 156 s. for 400 steps
> ** 20 getters: ~ 281 s. for 400 steps
>
> I made some tests and this issue seems related to
> hedge/discretization/__init__.py:54+:
>     class _PointEvaluator(object):
>         ...
>         def __call__(self, field):
>             ...
>             result[i] = numpy.dot(self.interp_coeff, field[i][self.el_range])
>
> The numpy.dot() product seems to consume much of the time (when I
> remove this instruction, I get a constant time of ~ 66 s. in CUDA).
>
> The difference of speed between MPI and GPU might be due to the fact
> that MPI-4 will divide the mesh in 4 smaller parts. Alternatively, it
> could be related to the difference of data type, and so to this issue:
> [1], but I'm not totally sure about how all this works...
>
> Is there a simpler way to get the value of a point? I am only using
> nodes of the mesh, so Hedge should already know the value without
> interpolating anything...
Call discr.convert_volume(fields, kind="numpy") once per time step and
then give that host vector to all your point evaluators. That should save
you large amounts of time, because it won't incur a separate GPU-to-host
transfer for every evaluator. It's possible to be even cleverer and save
more time, but this might be good enough.
HTH,
Andreas
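
A minimal sketch of the idea in the reply above, assuming nothing about
hedge's internals: FakeDeviceField, PointEvaluator, evaluate_naive and
evaluate_converted_once are hypothetical stand-ins, not hedge's API. The
fake field only counts how often it is copied to the host, which is the
cost the reply suggests paying once per step instead of once per evaluator.

```python
import numpy as np


class FakeDeviceField:
    """Hypothetical stand-in for a GPU-resident field; counts host copies."""

    def __init__(self, host_data):
        self._host_data = np.asarray(host_data, dtype=np.float64)
        self.transfers = 0

    def get(self):
        # Models a device-to-host copy (the role pycuda's GPUArray.get() plays).
        self.transfers += 1
        return self._host_data.copy()


class PointEvaluator:
    """Hypothetical evaluator: interpolation coefficients dotted with the
    slice of the field that belongs to one element."""

    def __init__(self, interp_coeff, el_range):
        self.interp_coeff = np.asarray(interp_coeff, dtype=np.float64)
        self.el_range = el_range

    def __call__(self, field):
        # Copy to the host only if the field is not already a numpy array.
        host = field if isinstance(field, np.ndarray) else field.get()
        return float(np.dot(self.interp_coeff, host[self.el_range]))


def evaluate_naive(evaluators, device_field):
    # Every evaluator triggers its own device-to-host copy.
    return [ev(device_field) for ev in evaluators]


def evaluate_converted_once(evaluators, device_field):
    # One copy per time step; in hedge this is the convert_volume(...,
    # kind="numpy") step from the reply (assumed, not verified here).
    host_field = device_field.get()
    return [ev(host_field) for ev in evaluators]
```

With N evaluators, the naive variant performs N copies per time step and
the converted-once variant performs one, which is consistent with the
GPU timings growing with the number of getters.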

PointEvaluator is slow in CUDA
by Peter17

Dear Andreas,

I am encountering a problem when trying to get the value of the fields
at some nodes every time step.

The code I am using is similar to the one in examples/maxwell/inhom-cavity.py:

    point_getter = discr.get_point_evaluator(numpy.array(<coordinates>))
    ...
    val = point_getter(fields)

except that I am using several point_getters, in order to plot the
curve (field value vs. time) for each point.

My problem is that:
* the execution is becoming slower and slower as I increase the number
  of points in CUDA:
* the execution is slower in GPU (1 Tesla C1060) than in MPI (4 cores
  of Xeon X5650):

* MPI-4
** No getter: ~ 109 s. for 400 steps
** 1 getter: ~ 117 s. for 400 steps
** 10 getters: ~ 117 s. for 400 steps
** 20 getters: ~ 117 s. for 400 steps
* 1 GPU
** No getter: ~ 66 s. for 400 steps
** 1 getter: ~ 67 s. for 400 steps
** 10 getters: ~ 156 s. for 400 steps
** 20 getters: ~ 281 s. for 400 steps

I made some tests and this issue seems related to
hedge/discretization/__init__.py:54+:

    class _PointEvaluator(object):
        ...
        def __call__(self, field):
            ...
            result[i] = numpy.dot(self.interp_coeff, field[i][self.el_range])

The numpy.dot() product seems to consume much of the time (when I
remove this instruction, I get a constant time of ~ 66 s. in CUDA).

The difference of speed between MPI and GPU might be due to the fact
that MPI-4 will divide the mesh in 4 smaller parts. Alternatively, it
could be related to the difference of data type, and so to this issue:
[1], but I'm not totally sure about how all this works...

Is there a simpler way to get the value of a point? I am only using
nodes of the mesh, so Hedge should already know the value without
interpolating anything...

Thanks in advance

Best regards

[1] http://lists.tiker.net/pipermail/pycuda/2011-November/003471.html

--
Peter Potrowl

Re: [Hedge] Memory leak in CUDA backend
by Andreas Kloeckner

On Tue, 10 Jan 2012 09:59:09 +0100, Peter17 <peter017(a)gmail.com> wrote:
> Dear Andreas,
>
> Did you have time to look at my new fix?
Yep, just merged it. Once again, sorry for the delay. Thanks for the
patch.
> Do you have any update about the new loopy-based backend for Hedge?
No updates other than it's still going to happen. Hopefully sooner
rather than later, but only parts of my time are under my own control.
Andreas

Re: [Hedge] Memory leak in CUDA backend
by Peter17

Dear Andreas,

Did you have time to look at my new fix?

Do you have any update about the new loopy-based backend for Hedge?

Thanks in advance

Best regards

--
Peter Potrowl