does anyone have any thoughts? is this feasible?
On Mon, Nov 23, 2015 at 3:14 PM, Keith Brown <keith6014(a)gmail.com> wrote:
Thanks all for the replies.
My goal is simple. At least, I thought it was simple :-)
I have a function where I calculate the dot product.
I need to do this 8k times. The max size of 'a' and 'b' are (3 million,
For smaller sizes of 'a' and 'b', linalg.dot works great. But I want a
more efficient way using the GPU.
Perhaps the GPU isn't the way to go, since the memory required is too large?
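(If it helps the discussion, here is a rough sketch of the kind of divide and conquer I have in mind: compute the big product one tile of rows at a time, so the full n x n result never has to exist at once. This assumes the step after the dot product can consume each tile and throw it away, e.g. some per-row reduction. Sizes are shrunk here for illustration; the real n is 160080 or more.)

```python
import numpy as np

# Shrunk sizes for illustration; the real problem has n ~ 160080 (up to 3 million)
n, k = 10000, 3
a = np.random.rand(n, k).astype(np.float32)
b = np.random.rand(n, k).astype(np.float32)

tile = 1024  # rows of the result per step; each slab is tile * n * 4 bytes
row_max = np.empty(n, dtype=np.float32)  # example per-row reduction
for start in range(0, n, tile):
    stop = min(start + tile, n)
    block = a[start:stop] @ b.T              # (tile, n) slab of the full product
    row_max[start:stop] = block.max(axis=1)  # consume the slab, then discard it
```

Each slab here is only about 40 MB, so the same loop shape should also work with GPU slices, provided the final answer is some reduction of the big matrix rather than the whole matrix itself.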
On Mon, Nov 23, 2015 at 2:26 PM, Stanley Seibert <stan(a)mtrr.org> wrote:
> From the cuBLAS-XT description:
> "By using a streaming design, cuBLAS-XT efficiently manages transfers across the
PCI-Express bus automatically, which allows input and output data to be stored on the
host’s system memory. This provides out-of-core operation – the size of operand data is
only limited by system memory size, not by GPU on-board memory size.”
> So I don’t think cuBLAS-XT can help unless you have more than 95 GB of system RAM.
If that is not the case, I think you have to step back and think about what you need to do
with this array ultimately, and where you want to stage the data if you need to compute
all 95 GB of it at once.
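(Quick sanity check on that 95 GB figure, using just the shapes from the thread:)

```python
import numpy as np

n = 160080
itemsize = np.dtype(np.float32).itemsize  # 4 bytes per element
result_gib = n * n * itemsize / 2**30     # size of the full (n, n) product
print(round(result_gib, 1))               # ~95.5 GiB, vs. a 3 GB card
```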
>> On Nov 23, 2015, at 12:58 PM, Keith Brown <keith6014(a)gmail.com> wrote:
>> Correct. My result matrix will be too large.
>> I would think cublasXT would take care of this for me. I thought it
>> would do some sort of divide and conquer.
>> Is there a way to attack this sort of problem?
>> On Mon, Nov 23, 2015 at 11:38 AM, Jonas Bardino <bardino(a)nbi.ku.dk> wrote:
>>> Ehmm, I'm not sure I understand exactly what you do, but to me it sounds
>>> like you try to calculate the dot product of a 160080 x 3 matrix and a
>>> similar one transposed, i.e. a 3 x 160080 matrix. That would give you a
>>> 160080 x 160080 matrix result - which surely won't fit your 3GB of GPU
>>> memory.
>>> Cheers, Jonas
>>> On 2015-11-23 17:10, Keith Brown wrote:
>>>> I have 2 small matrices of shape (160080, 3), type float32, and I am
>>>> calculating their dot product. While doing this, I keep getting
>>>> pycuda._driver.MemoryError: cuMemAlloc failed: out of memory.
>>>> I have 2 cards, each with 3GB of memory. Each matrix takes about 1875
>>>> kilobytes. I am not sure why this is occurring.
>>>> c_gpu = linalg.dot(a_gpu,b_gpu,'N','T',handle=handle)
>>>> My handle is a cublasXt handle (not regular cublas, since blasXt apparently
>>>> does better memory handling).
>>>> Any idea what is going on?
>>>> PyCUDA mailing list