GPUArray class gives negative "s" with large size gpuarray
by takayanagi.tetsuya＠jp.panasonic.com

Hi all,
I have developed a Lattice Boltzmann Method code with PyCUDA at our company for simulating air flow.
For three-dimensional fluid flow, I need to handle large gpuarrays such as arr[velocity][Z][Y][X].
My code runs correctly with a relatively small gpuarray such as (27, 300, 300, 300).
But changing the gpuarray size from (27, 300, 300, 300) to (27, 450, 450, 450) gives the following error.
Error message:
OverflowError: can't convert negative int to unsigned
To debug it, I am testing the following simple code, which also raises the error if I specify a large array shape such as (27, 450, 450, 450).
//
// sample code start
//
import math
import numpy as np
import pycuda.gpuarray as gpuarray
from pycuda.compiler import SourceModule
import pycuda.autoinit
module = SourceModule("""
__global__ void plus_one_3d(int nx, int ny, int nz, int nv, float *arr){
    const int x = threadIdx.x + blockDim.x * blockIdx.x;
    const int y = threadIdx.y + blockDim.y * blockIdx.y;
    const int z = threadIdx.z + blockDim.z * blockIdx.z;
    // Elements per velocity component; 64-bit so the flat index cannot overflow
    const size_t nxyz = (size_t)nx * ny * nz;
    // Flat index of cell (x, y, z) within one velocity component
    const size_t ijk = ((size_t)z * ny + y) * nx + x;
    if (x < nx && y < ny && z < nz){
        for (int c = 0; c < nv; c++){
            arr[nxyz * c + ijk] += 1.0f;
        }
    }
}
""")
plus_one = module.get_function("plus_one_3d")
num_x, num_y, num_z = np.int32(450), np.int32(450), np.int32(450)
nv = np.int32(27)
arr_gpu = gpuarray.zeros([nv, num_z, num_y, num_x], dtype=np.float32)
threads_per_block = (6, 6, 6)
block_x = math.ceil(num_x / threads_per_block[0])
block_y = math.ceil(num_y / threads_per_block[1])
block_z = math.ceil(num_z / threads_per_block[2])
blocks_per_grid = (block_x, block_y, block_z)
plus_one(num_x, num_y, num_z, nv, arr_gpu, block=threads_per_block, grid=blocks_per_grid)
arr = arr_gpu.get()
print("result :", arr)
//
// sample code end
//
Debugging with PyCharm shows that the variable "s" in the GPUArray class becomes negative when I specify the shape (27, 450, 450, 450) for a gpuarray.
But s is calculated correctly when I specify the shape (27, 300, 300, 300). I think something is wrong with the data type of s.
Any advice?
Best wishes,
t-tetsuya
5 months, 3 weeks
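
A possible explanation, offered as a sketch rather than a confirmed diagnosis: the shape (27, 450, 450, 450) holds 2,460,375,000 elements, which exceeds the largest signed 32-bit integer (2,147,483,647), while (27, 300, 300, 300) holds only 729,000,000. If s is accumulated from the np.int32 shape entries in 32-bit arithmetic (an assumption about the GPUArray internals), it would wrap around to a negative value. The helper names element_count and wrap_to_int32 below are hypothetical and use only plain Python integers.

```python
INT32_MAX = 2**31 - 1  # largest signed 32-bit integer: 2147483647


def element_count(shape):
    """Exact number of elements, using Python's unbounded integers."""
    count = 1
    for dim in shape:
        count *= int(dim)  # int() drops any fixed-width NumPy type
    return count


def wrap_to_int32(value):
    """Value a signed 32-bit integer would hold after two's-complement wraparound."""
    return (value + 2**31) % 2**32 - 2**31


for shape in [(27, 300, 300, 300), (27, 450, 450, 450)]:
    exact = element_count(shape)
    # Print the exact count, whether it fits in int32, and the wrapped value
    print(shape, exact, exact <= INT32_MAX, wrap_to_int32(exact))
```

If that assumption holds, passing plain Python integers as the shape, for example gpuarray.zeros((int(nv), int(num_z), int(num_y), int(num_x)), dtype=np.float32), may avoid the negative s. The array would still need roughly 9.8 GB of device memory as float32, and any flat index above 2,147,483,647 inside a kernel needs a 64-bit integer type.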

How to free page-locked memory?
by Rengan Xu

Hi All,
In PyCUDA, what is the API to free the allocated page-locked memory? In
CUDA, we have cudaFreeHost(void* ptr) to free the page-locked memory, but I
didn't find the corresponding API in PyCUDA. Any help would be appreciated.
Regards,
Rengan
6 months
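
A minimal sketch of one way to do this, assuming the memory was allocated with pycuda.driver.pagelocked_empty or one of its siblings such as pagelocked_zeros. According to the PyCUDA documentation, the base attribute of the returned NumPy array holds the host allocation object, and that object's free() method releases the memory immediately instead of waiting for garbage collection. The names free_pagelocked and demo are hypothetical; demo needs PyCUDA and a CUDA-capable device, so it is defined but not called here.

```python
def free_pagelocked(host_array):
    """Release the page-locked buffer backing host_array right away.

    Assumes host_array.base is the allocation object created by PyCUDA.
    The array must not be read or written after this call.
    """
    allocation = host_array.base
    if allocation is None or not hasattr(allocation, "free"):
        raise TypeError("array is not backed by a freeable host allocation")
    allocation.free()


def demo():
    """Allocate and free pinned memory; requires PyCUDA and a CUDA device."""
    import numpy as np
    import pycuda.autoinit  # noqa: F401  (creates a CUDA context)
    import pycuda.driver as cuda

    pinned = cuda.pagelocked_empty((1024, 1024), dtype=np.float32)
    pinned.fill(0.0)
    free_pagelocked(pinned)  # do not touch `pinned` after this line
```

Without an explicit free(), the memory should be released once the array and its allocation object are garbage collected, so dropping every reference to the array (for example with del) is often enough.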