[PyCUDA] pinned memory
Nick Rayrider
nick.rayrider at googlemail.com
Fri Oct 7 08:56:40 PDT 2011
On Fri, Oct 7, 2011 at 5:15 PM, Andreas Kloeckner
<lists at informa.tiker.net> wrote:
> On Fri, 7 Oct 2011 17:03:21 +0200, Nick Rayrider <nick.rayrider at googlemail.com> wrote:
>> On Fri, Oct 7, 2011 at 4:00 PM, Andreas Kloeckner
>> <lists at informa.tiker.net> wrote:
>> > On Fri, 7 Oct 2011 15:38:59 +0200, Nick Rayrider <nick.rayrider at googlemail.com> wrote:
>> >> On Thu, Oct 6, 2011 at 2:47 PM, Andreas Kloeckner
>> >> <lists at informa.tiker.net> wrote:
>> >> > On Thu, 6 Oct 2011 12:25:12 +0200, Nick Rayrider <nick.rayrider at googlemail.com> wrote:
>> >> >> Hi,
>> >> >>
>> >> >> First, thanks for this fine piece of software.
>> >> >>
>> >> >> While I was optimizing my kernels, NVIDIA's Visual Profiler recommended
>> >> >> that I use more pinned memory. I read the PyCUDA documentation [1]
>> >> >> and tried to understand the sparse solve example [3], but I could not
>> >> >> work out how to turn an existing numpy array into pinned memory. I did
>> >> >> not find further examples of PageLockedMemoryPool [2].
>> >> >
>> >> > import numpy as np
>> >> > from pycuda.tools import PageLockedMemoryPool
>> >> > pool = PageLockedMemoryPool()
>> >> > empty_pinned_array = pool.allocate((300, 300), np.float64)
>> >> >
>> >> > Now empty_pinned_array is backed by pinned storage, and when you memcpy
>> >> > to/from it, it'll go faster.
>> >>
>> >> Thanks for the fast answer. I read that you shouldn't use In() as it
>> >> performs a copy [1], so I tried the following version, to no avail.
>> >> What am I missing? Probably something about how numpy and PyCUDA
>> >> handle pointers...
>> >>
>> >> from pycuda.tools import PageLockedMemoryPool
>> >> pool = PageLockedMemoryPool()
>> >> empty_pinned_array = pool.allocate(data.shape,np.float32)
>> >> empty_pinned_array = gpuarray.to_gpu(data)
>> >> my_kernel(empty_pinned_array,... )
>> >>
>> >> [1] http://lists.tiker.net/pipermail/pycuda/2009-August/001784.html
>> >
>> > If you'd like to pass a GPUArray to a kernel, you need to pass
>> > empty_pinned_array.gpudata.
>>
>> I think I am totally on the wrong path. I attached what I have so far.
>> The example compiles and 'calculates' the correct results, but the
>> visual profiler still reports the "host mem transfer" as
>> pageable.
>
> The transfer off the device behind the 'get' is still pageable; the
> transfer in shouldn't be.
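
For reference, here is a minimal sketch of the workflow discussed above, assuming PyCUDA and a CUDA-capable device are available. In the earlier snippet the name empty_pinned_array is rebound to the result of gpuarray.to_gpu(data), so the pinned buffer is never used. Here the existing array is first copied into the pinned buffer and the upload is made from there. The helper names to_pinned and gpu_roundtrip are hypothetical, and downloading with GPUArray.get(ary=...) into a second pinned buffer is an assumption about how to avoid the pageable 'get', not something confirmed in this thread.

```python
import numpy as np


def to_pinned(data, allocate):
    """Copy `data` into a buffer obtained from allocate(shape, dtype).

    With allocate=PageLockedMemoryPool().allocate the buffer is pinned.
    (Hypothetical helper, not part of PyCUDA.)
    """
    buf = allocate(data.shape, data.dtype)
    buf[...] = data  # host-to-host copy into the (pinned) buffer
    return buf


def gpu_roundtrip(data):
    """Upload `data` from pinned memory, then download into pinned memory."""
    # Imported here so that to_pinned() stays usable without a CUDA device.
    import pycuda.autoinit  # noqa: F401  (creates a context on the default device)
    import pycuda.gpuarray as gpuarray
    from pycuda.tools import PageLockedMemoryPool

    pool = PageLockedMemoryPool()
    pinned_in = to_pinned(data, pool.allocate)
    d_data = gpuarray.to_gpu(pinned_in)  # host-to-device copy from pinned memory

    # A kernel would be launched here, with d_data.gpudata as its argument.

    pinned_out = pool.allocate(data.shape, data.dtype)
    d_data.get(ary=pinned_out)  # device-to-host copy into pinned memory
    return pinned_out
```

On a machine with a GPU, gpu_roundtrip(np.random.rand(300, 300).astype(np.float32)) should return an array equal to its input. Whether the profiler then reports both transfers as pinned has not been verified here.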
This is what the profiler outputs. Can anyone confirm it?
# CUDA_PROFILE_LOG_VERSION 2.0
# CUDA_DEVICE 0 Quadro FX 580
# CUDA_PROFILE_CSV 1
# TIMESTAMPFACTOR fffff6d463c55b20
method      gputime  cputime  occupancy  memtransferhostmemtype
fill        3.072    17       0.042
memcpyHtoD  5.024    14                  0
kernel      2.208    12       0.333
memcpyDtoH  5.024    26                  0
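
To read the last column without the Visual Profiler, the rows above can be decoded with a short script. This is only a sketch. It assumes that on memcpy rows the final field is memtransferhostmemtype (the occupancy field being empty there), and that 0 means pageable and 1 means page-locked. Neither assumption is stated in this thread. The names HOST_MEM_TYPES and memcpy_host_types are made up for illustration.

```python
# Assumed meaning of the memtransferhostmemtype codes (not stated in this thread).
HOST_MEM_TYPES = {"0": "pageable", "1": "page-locked"}

# Rows copied from the profiler output above.
LOG = """\
method gputime cputime occupancy memtransferhostmemtype
fill 3.072 17 0.042
memcpyHtoD 5.024 14 0
kernel 2.208 12 0.333
memcpyDtoH 5.024 26 0
"""


def memcpy_host_types(log_text):
    """Return a list of (method, host memory type) for the memcpy rows."""
    result = []
    for line in log_text.splitlines():
        fields = line.split()
        # Skip blank lines, '#' header lines, and rows that are not transfers.
        if not fields or not fields[0].startswith("memcpy"):
            continue
        # Assumption: on memcpy rows the last field is memtransferhostmemtype.
        code = fields[-1]
        result.append((fields[0], HOST_MEM_TYPES.get(code, "unknown (%s)" % code)))
    return result


for method, mem_type in memcpy_host_types(LOG):
    print(method, mem_type)
```

Under those assumptions, both transfers in the log above are reported as pageable.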