Hi all,
I have developed a Lattice Boltzmann Method code with PyCUDA at our company for simulating air flow.
I need to handle a large gpuarray such as arr[velocity][Z][Y][X] for 3-dimensional fluid flow.
My code runs correctly with a relatively small gpuarray size such as (27, 300, 300, 300).
But changing the gpuarray size from (27, 300, 300, 300) to (27, 450, 450, 450) gives the following error.
Error message:
OverflowError: can't convert negative int to unsigned
To debug it, I'm testing the following simple code, which also raises the error if I specify a large numpy array size such as (27, 450, 450, 450).
//
// sample code start
//
import math
import numpy as np
import pycuda.gpuarray as gpuarray
from pycuda.compiler import SourceModule
import pycuda.autoinit
module = SourceModule("""
__global__ void plus_one_3d(int nx, int ny, int nz, int nv, float *arr){
    const int x = threadIdx.x + blockDim.x * blockIdx.x;
    const int y = threadIdx.y + blockDim.y * blockIdx.y;
    const int z = threadIdx.z + blockDim.z * blockIdx.z;
    // Cells per velocity component; 64-bit, because nv * nx * ny * nz
    // (27 * 450^3 = 2,460,375,000) overflows a signed 32-bit int.
    const long long nxyz = (long long)nx * ny * nz;
    // Linear cell index within one component.
    const long long ijk = (long long)nx * ny * z + nx * y + x;
    if (x < nx && y < ny && z < nz){
        for (int c = 0; c < nv; c++){
            arr[nxyz * c + ijk] += 1.0f;
        }
    }
}
""")
plus_one = module.get_function("plus_one_3d")
num_x, num_y, num_z = np.int32(450), np.int32(450), np.int32(450)
nv = np.int32(27)
arr_gpu = gpuarray.zeros([nv, num_z, num_y, num_x], dtype=np.float32)
threads_per_block = (6, 6, 6)
block_x = math.ceil(num_x / threads_per_block[0])
block_y = math.ceil(num_y / threads_per_block[1])
block_z = math.ceil(num_z / threads_per_block[2])
blocks_per_grid = (block_x, block_y, block_z)
plus_one(num_x, num_y, num_z, nv, arr_gpu, block=threads_per_block, grid=blocks_per_grid)
arr = arr_gpu.get()
print("result :", arr)
//
// sample code end
//
Debugging with PyCharm shows that the variable "s" of the GPUArray class becomes negative when I specify the shape (27, 450, 450, 450).
But s is calculated correctly when I specify the shape (27, 300, 300, 300). I think the data type of s is wrong somewhere.
Any advice?
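To show what I suspect is happening, here is a minimal CPU-only sketch (no PyCUDA needed): the shape is built from np.int32 scalars, and the total element count 27 * 450^3 = 2,460,375,000 no longer fits in a signed 32-bit integer, so a product accumulated in int32 wraps negative:

```python
import numpy as np

shape = (27, 450, 450, 450)

# True element count: larger than 2**31 - 1 = 2,147,483,647.
true_size = 27 * 450**3

# If the size is accumulated in 32-bit integers, it wraps negative:
wrapped = np.prod(np.array(shape, dtype=np.int32), dtype=np.int32)

print(true_size)     # 2460375000
print(int(wrapped))  # -1834592296
```

A possible workaround might be to pass plain Python ints rather than np.int32 scalars as the shape, e.g. gpuarray.zeros((27, 450, 450, 450), dtype=np.float32), so that the size product is computed with arbitrary-precision Python integers, though I am not certain how PyCUDA computes s internally.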
Best wishes,
t-tetsuya
Hi all,
In PyCUDA, what is the API to free allocated page-locked memory? In
CUDA, we have cudaFreeHost(void* ptr) to free page-locked memory, but I
didn't find the corresponding API in PyCUDA. Any help would be appreciated.
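For context, here is a sketch of what I would expect to work, assuming the memory was allocated with pycuda.driver.pagelocked_empty(): the returned numpy array's .base seems to hold the underlying host allocation object, which exposes a free() method, but I'm not sure this is the intended public API (requires a CUDA-capable GPU to run):

```python
import numpy as np
import pycuda.autoinit  # creates a CUDA context on import
import pycuda.driver as drv

# Allocate page-locked (pinned) host memory as a numpy array.
a = drv.pagelocked_empty((1024,), dtype=np.float32)

# The array's .base holds the host allocation object; its free()
# method appears to release the pinned memory immediately, roughly
# the counterpart of cudaFreeHost(). Otherwise the memory is
# released when the array is garbage-collected.
a.base.free()
```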
Regards,
Rengan