On Thu, 19 Jan 2012 16:33:34 +0100, Peter17 <peter017(a)gmail.com> wrote:
I am encountering a problem when trying to get the value of the field
at some nodes at every time step.
The code I am using is similar to the one in examples/maxwell/inhom-cavity.py:
    point_getter = discr.get_point_evaluator(numpy.array(<coordinates>))
    val = point_getter(fields)
except that I am using several point_getters, in order to plot the
curve (field value vs. time) for each point.
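In plain NumPy, the pattern I mean looks roughly like this (stub getters standing in for the hedge point evaluators, and a made-up field; only the loop structure matters):

```python
import numpy

# Stand-ins for discr.get_point_evaluator(...) results: each getter maps
# the field vector to the value at one point (hypothetical linear weights).
weights = [numpy.full(100, 0.01), numpy.zeros(100)]
weights[1][42] = 1.0
point_getters = [lambda fields, w=w: numpy.dot(w, fields) for w in weights]

# One time series per point, sampled at every time step.
history = [[] for _ in point_getters]
for step in range(400):
    fields = numpy.sin(numpy.linspace(0, 1, 100) + 0.01*step)  # fake field
    for hist, getter in zip(history, point_getters):
        hist.append(getter(fields))
```

Afterwards each entry of history can be plotted against the step number.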
My problem is that:
* the execution becomes slower and slower as I increase the number
of point getters when using CUDA;
* the execution is slower on the GPU (1 Tesla C1060) than with MPI (4
cores of a Xeon X5650):

* MPI, 4 cores of a Xeon X5650:
** no getter: ~109 s for 400 steps
** 1 getter: ~117 s for 400 steps
** 10 getters: ~117 s for 400 steps
** 20 getters: ~117 s for 400 steps
* 1 GPU (Tesla C1060):
** no getter: ~66 s for 400 steps
** 1 getter: ~67 s for 400 steps
** 10 getters: ~156 s for 400 steps
** 20 getters: ~281 s for 400 steps
I made some tests, and the issue seems to be related to the point
evaluator's __call__ method:

    def __call__(self, field):
        result[i] = numpy.dot(self.interp_coeff, field[i][self.el_range])
The numpy.dot() call seems to consume most of the time (when I remove
that line, I get a constant runtime of ~66 s with CUDA).
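For reference, the interpolation itself is only a small dot product; in plain NumPy (with made-up sizes and weights) it is cheap, which suggests the cost on the GPU side comes from fetching field[i][self.el_range] per call rather than from the arithmetic:

```python
import numpy

# Hypothetical sizes: 10 nodes per element, 3 field components.
interp_coeff = numpy.ones(10) / 10.0   # stand-in interpolation weights
el_range = slice(40, 50)               # nodes of the element containing the point
field = [numpy.arange(100, dtype=numpy.float64) for _ in range(3)]

# Per-component point value, as in the __call__ line quoted above:
result = numpy.array([numpy.dot(interp_coeff, f[el_range]) for f in field])
print(result)  # mean of nodes 40..49 in each component: [44.5 44.5 44.5]
```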
The speed difference between MPI and the GPU might be due to the fact
that the 4-process MPI run divides the mesh into 4 smaller parts.
Alternatively, it could be related to the difference in data types, and
so to this issue:
, but I'm not totally sure how all of this works...
Is there a simpler way to get the value at a point? I am only using
nodes of the mesh, so Hedge should already know the value without
having to interpolate.
Call discr.convert_volume(kind="numpy") once and then give that vector
to your point evaluators. That should save you large amounts of time,
because it won't incur a GPU transfer for every evaluator. It's possible
to be even cleverer and save more time, but this might be good enough.
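Sketched in plain NumPy (with a stand-in convert_volume; in hedge the real call does one bulk device-to-host copy per time step instead of one per evaluator), the suggested pattern is:

```python
import numpy

def convert_volume(gpu_field):
    # Stand-in for discr.convert_volume(..., kind="numpy"):
    # one bulk transfer of all components off the device.
    return [numpy.asarray(f) for f in gpu_field]

def make_point_evaluator(interp_coeff, el_range):
    # Stand-in for discr.get_point_evaluator(...): interpolates each
    # field component at one point from host-side numpy data.
    def evaluate(numpy_field):
        return numpy.array([numpy.dot(interp_coeff, f[el_range])
                            for f in numpy_field])
    return evaluate

field = [numpy.linspace(0.0, 1.0, 200) for _ in range(3)]
getters = [make_point_evaluator(numpy.full(10, 0.1), slice(10*k, 10*k + 10))
           for k in range(20)]

numpy_field = convert_volume(field)          # convert once per time step...
values = [g(numpy_field) for g in getters]   # ...then reuse for all getters
```

With 20 getters this replaces 20 per-evaluator transfers per step with a single conversion, which is where the quoted GPU timings were losing their time.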