Sorry for the delay; I was working on many other things at the same
time and I was sick. As the conference is May 7-9, I need to finish a
draft proposal next week, so things should advance faster.
2011/4/22 Andreas Kloeckner <lists(a)informa.tiker.net>:
On Fri, 22 Apr 2011 08:35:03 -0400, Frédéric Bastien <nouiz(a)nouiz.org> wrote:
> Thanks for your modifications. Here are some questions/comments:
> I'm not sure I understand the idea in the to_gpu() function that puts
> the same strides on the GpuArray as on the host, even though we copy
> the data to make it C-contiguous before we transfer it. That way the
> strides in the GpuArray won't represent what is on the GPU. In that
> case, I think we should set the strides to those of the contiguous
> version of the data. That way the strides on the GpuArray represent
> what is on the GPU, not the original strides.
I disagree. As long as the host data is contiguous (C or Fortran, or in
principle any other way), a straight copy is the fastest and most
logical thing to do, IMO.
I agree with what you said, but I was trying to say something else.
Here is an example that should make clearer what I mean:
import numpy
import pycuda.autoinit
import pycuda.gpuarray

x = numpy.random.rand(5, 4)
print x.shape    # (5, 4)
print x.strides  # (32, 8)
x_strided = numpy.random.rand(5, 4)[::2]
print x_strided.shape    # (3, 4)
print x_strided.strides  # (64, 8)
x_fortran = numpy.asfortranarray(x)
print x_fortran.shape    # (5, 4)
print x_fortran.strides  # (8, 40)

# A contiguous C-ordered array transfers fine and keeps its strides.
gx = pycuda.gpuarray.to_gpu(x)
assert gx.shape == x.shape
assert gx.strides == x.strides
assert (gx.get() == x).all()

# A contiguous Fortran-ordered array works too.
gx_fortran = pycuda.gpuarray.to_gpu(x_fortran)
assert gx_fortran.shape == x_fortran.shape
assert gx_fortran.strides == x_fortran.strides
assert (gx_fortran.get() == x_fortran).all()

# A non-contiguous array currently fails:
gx_NOT_strided = pycuda.gpuarray.to_gpu(x_strided)
# Traceback (most recent call last):
#   File "<stdin>", line 1, in <module>
#   line 784, in to_gpu
#   line 222, in set

# What I would expect after the fix (note that the strides would be
# those of a contiguous copy, not of the original view):
assert gx_NOT_strided.shape == x_strided.shape
assert gx_NOT_strided.strides != x_strided.strides
assert (gx_NOT_strided.get() == x_strided).all()
As you can see, the last case doesn't work. I think the attached patch
strided_transfer.diff should fix that.
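For reference, here is a minimal sketch of the behavior I have in mind
(the helper name to_gpu_any and the host-side compaction are my
assumptions; the actual patch may do this differently):

import numpy
import pycuda.gpuarray

def to_gpu_any(ary):
    # Sketch: accept non-contiguous host arrays by compacting them on
    # the host before the transfer. The resulting GPUArray then carries
    # the strides of the contiguous copy, which match what actually
    # sits on the GPU.
    if not (ary.flags.c_contiguous or ary.flags.f_contiguous):
        ary = numpy.ascontiguousarray(ary)
    return pycuda.gpuarray.to_gpu(ary)

With this, to_gpu_any(x_strided).strides would be (32, 8), the strides
of the contiguous copy, rather than the original (64, 8).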
> When I talked about having a strided GpuArray, I was thinking of
> having a new object with a different attribute name for the gpudata
> attribute. The goal is to make it incompatible with current code
> unless that code is modified.
Are you talking about package code or user code? As far as user code is
concerned, the new code accepts the same arrays that the old code did,
and it doesn't change the semantics of any of them. Thus no need to
switch to a different type, IMO. In general, nothing generates
non-contiguous GPU arrays for now.
Suppose that today I implement slicing (gpu_array[::2]) that generates
strided GPUArrays, and I modify all PyCUDA code to check whether its
input is contiguous before working. Then everything in PyCUDA will be
consistent: no code in PyCUDA will use this new strided array, but an
error will be raised if it gets used.
Even in that case, I still see a problem that will give bad results
without an error or warning. I think this problem WILL happen (this is
why I keep discussing it).
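To make the check concrete, here is a minimal sketch of the kind of
guard I mean (the helper names are mine; for simplicity it only accepts
C order, while a real check would also accept Fortran order):

def _contiguous_strides(shape, itemsize):
    # Strides that a C-contiguous array of this shape would have.
    strides = []
    s = itemsize
    for dim in reversed(shape):
        strides.append(s)
        s *= dim
    return tuple(reversed(strides))

def checked_kernel_call(gpu_ary):
    # Kernels written for contiguous data should refuse strided input
    # so that it fails loudly instead of silently computing garbage.
    if gpu_ary.strides != _contiguous_strides(gpu_ary.shape,
                                              gpu_ary.dtype.itemsize):
        raise ValueError("non-contiguous GPUArray not supported here")
    # ... launch the kernel as before ...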
The problem is that a user may have written his own GPU code and
forgotten to modify it to check the strides. If he starts using the new
strided GPUArray with his own code, he will get bad results without any
error or warning. It is easy to say that it is the user's fault (and it
is...), but consider the case where a few people write GPU code and
many people use it (like in our lab). If the person who writes the GPU
code doesn't have time to update all of it and some other user starts
using strided GPU arrays, it gets harder to call it the user's fault.
Also think of people using strided GPU arrays with PyCULA or other
libraries that claim to support pycuda.gpuarray. In that case,
everybody needs to update their code at the same time to make this
work.
I have a simple workaround for this: write a class that inherits from
GPUArray. In its __init__, call GPUArray.__init__ and then rename
self.gpudata to self.gpunddata. Also, don't allow GPUArray itself to
have non-contiguous strides. That way, if someone uses old code with a
strided GPU array, it will raise an error (without a good message, but
we could document it well). To make the interface a little easier, we
can keep self.gpudata when the array is contiguous, so people will be
able to use the new class with old code as long as they don't use the
stride feature.
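Here is a minimal sketch of that workaround, assuming GPUArray keeps
shape, strides, dtype and gpudata as plain attributes (the class name
GPUNdArray and the contiguity test are mine, not final):

import pycuda.gpuarray

class GPUNdArray(pycuda.gpuarray.GPUArray):
    # Expose the device pointer as .gpunddata, and hide .gpudata on
    # non-contiguous arrays so old code fails loudly instead of
    # silently reading a strided buffer as if it were contiguous.
    def __init__(self, *args, **kwargs):
        pycuda.gpuarray.GPUArray.__init__(self, *args, **kwargs)
        self.gpunddata = self.gpudata
        # Compute the C-contiguous strides for this shape (a real
        # version would also accept Fortran order).
        contig = []
        s = self.dtype.itemsize
        for dim in reversed(self.shape):
            contig.append(s)
            s *= dim
        if self.strides != tuple(reversed(contig)):
            # Old code reading .gpudata now gets an AttributeError.
            del self.gpudata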
I prefer having two array classes and deprecating one of them to having
wrong results computed silently. As I said, I don't think it is hard or
long to do; I will try to write one next week.
Great--please do keep me (and the lists!) in the loop, and consider
whether similar changes are needed in PyOpenCL as well as PyCUDA.
I will keep the list and you updated. I have always thought that
PyOpenCL had a Python interface similar to PyCUDA's, so I think the
changes to PyCUDA could be redone in PyOpenCL without too much
difficulty. But I could be wrong, as I haven't used it.
> I think we can compare the current Python GPU base array objects that
> exist, i.e. there are many variants that are not directly compatible,
> but very close.
I disagree--GPUArray was explicitly conceived as a numpy array
workalike, and it'll slowly grow to get closer to that goal.
I was referring to the fact that in Theano we currently use our own
CudaNdarray object, which is not directly compatible with your
GpuArray. There is also a different base type in CudaMat. At that
level, I still think we need to find a common base object that could be
used by everybody in the Python GPU community. That would help make
code more portable and would help keep the community closer together
(instead of fragmented, as it is now).