I ran into the following problem with PyCUDA. When I call, for example,
import numpy as np
import pycuda.autoinit  # creates a CUDA context on the default device
import pycuda.gpuarray as gpu
a = np.array([ [ 0, 1, 2 ], [ 1, 2, 3 ] ], dtype=np.float64)
a_gpu = gpu.to_gpu(np.transpose(a))
a_gpu does not contain the transposed array but the original one.
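Looking into the NumPy source, I found that the transpose operation does
not move any data. It just returns a view with the strides swapped, which
sets the FORTRAN (F_CONTIGUOUS) flag. This can easily be checked by
inspecting np.transpose(a).flags.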
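A minimal NumPy-only sketch of that check (no GPU needed; the names a and t are just for illustration). It shows that the transpose shares memory with the original, has swapped strides, and is Fortran- rather than C-contiguous:

```python
import numpy as np

a = np.array([[0, 1, 2], [1, 2, 3]], dtype=np.float64)
t = np.transpose(a)

# Transposing returns a view: no data is copied.
print(np.shares_memory(a, t))   # True
# The strides are swapped (float64 = 8 bytes per element).
print(a.strides, t.strides)     # (24, 8) (8, 24)
# So the view is Fortran-contiguous, not C-contiguous.
print(t.flags['C_CONTIGUOUS'], t.flags['F_CONTIGUOUS'])  # False True
```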
GPUArray.set just copies the raw data and ignores the strides. So if the
FORTRAN flag is set, the GPU array receives the elements in the original
array's memory order rather than in transposed order. Does anybody know
an easy fix for this behavior?
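One commonly used workaround, offered here as an untested sketch rather than a confirmed PyCUDA fix, is to force a C-contiguous copy with np.ascontiguousarray before uploading. The snippet below simulates the upload with NumPy only: the helper raw_copy is hypothetical and assumes the copy takes the bytes in memory order and reinterprets them as a C-ordered array of the same shape.

```python
import numpy as np

def raw_copy(ary):
    """Hypothetical stand-in for a stride-ignoring upload: take the bytes
    in memory order and read them back as a C-ordered array."""
    # order="A" yields Fortran order for a Fortran-contiguous input,
    # i.e. the bytes exactly as they sit in memory.
    raw = ary.tobytes(order="A")
    return np.frombuffer(raw, dtype=ary.dtype).reshape(ary.shape)

a = np.array([[0, 1, 2], [1, 2, 3]], dtype=np.float64)
t = np.transpose(a)

# Without a copy, the element order is still that of the original array.
broken = raw_copy(t)
# np.ascontiguousarray makes a C-ordered copy, so the raw bytes now match t.
fixed = raw_copy(np.ascontiguousarray(t))

print(broken)
print(fixed)
```

With PyCUDA this would presumably become a_gpu = gpu.to_gpu(np.ascontiguousarray(np.transpose(a))), at the cost of one extra host-side copy.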