[PyCUDA] Alright, now what... :)
William King
quentusrex at gmail.com
Sun Nov 22 20:22:18 PST 2009
OK, I'm testing PyCUDA now. I think I'll eventually want to use PyOpenCL, but
until I've tested that I'll use this.
So, it is working now. How would I get an MD5 sum computed on the GPU? Would
the hashing code go into the kernel, and would the kernel then run once per 'thread'?
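To make the "one kernel invocation per thread" idea concrete, here is a rough CPU-only sketch of how the work would be mapped. It is not GPU code: `hashlib.md5` stands in for MD5 rounds that would have to be written in CUDA C inside the kernel, and the names `md5_kernel`, `launch` and `block_size` are made up for illustration.

```python
import hashlib

def md5_kernel(thread_id, inputs, outputs):
    """What one GPU thread would do: hash the single input it owns."""
    if thread_id < len(inputs):  # bounds guard, as a real kernel would need
        outputs[thread_id] = hashlib.md5(inputs[thread_id]).hexdigest()

def launch(inputs, block_size=256):
    """Emulate a 1-D grid launch serially; a GPU runs these threads in parallel."""
    outputs = [None] * len(inputs)
    n_blocks = (len(inputs) + block_size - 1) // block_size  # round up
    for block_idx in range(n_blocks):
        for thread_idx in range(block_size):
            # Same index arithmetic as blockIdx.x * blockDim.x + threadIdx.x
            thread_id = block_idx * block_size + thread_idx
            md5_kernel(thread_id, inputs, outputs)
    return outputs

candidates = [b"", b"a", b"abc"]
digests = launch(candidates)
```

So yes, under this model every thread runs the same kernel body and uses its own index to pick the input it hashes.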
Also, on an NVIDIA GeForce 9500 GT, how many threads can run at one
time? http://www.nvidia.com/object/product_geforce_9500gt_us.html
It is compute capability 1.1, so it has 32 multiprocessors, so isn't
it 32*768 = 24,576 threads?
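For what it's worth, the arithmetic can be checked against the deviceQuery output in this message, which reports 4 multiprocessors and 32 cores per card. The 768 figure is an assumption here: it is the resident-thread limit per multiprocessor that I understand applies to compute capability 1.0/1.1, and it is only reachable if register and shared memory use allow it.

```python
# Values copied from the deviceQuery output in this message.
multiprocessors = 4
cores = 32

# Assumption: resident-thread limit per multiprocessor for compute capability 1.1.
resident_threads_per_mp = 768

cores_per_mp = cores // multiprocessors
max_resident_threads = multiprocessors * resident_threads_per_mp

print("cores per multiprocessor:", cores_per_mp)
print("max resident threads per card:", max_resident_threads)
```

On those numbers the 32 is the core count, not the multiprocessor count, so the per-card figure would be 4*768 rather than 32*768. A grid can still contain many more threads than that; the extra blocks simply wait for a free multiprocessor.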
Can PyCUDA automatically max out all the cards in the machine? If not, how
would I tell it to use both cards?
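As far as I can tell, a CUDA context is tied to one device, so nothing spreads work across cards automatically. One common pattern is a host thread per card, each creating its own context. The sketch below assumes that pattern; `split_work`, `device_worker` and `run_on_all_devices` are hypothetical helper names, and the kernel launch itself is left as a placeholder.

```python
import threading

def split_work(n_items, n_devices):
    """Return one (start, stop) index range per device, covering n_items."""
    base, extra = divmod(n_items, n_devices)
    ranges, start = [], 0
    for i in range(n_devices):
        stop = start + base + (1 if i < extra else 0)
        ranges.append((start, stop))
        start = stop
    return ranges

def device_worker(device_id, work_range, results):
    """Runs in its own host thread; assumes cuda.init() was already called."""
    import pycuda.driver as cuda  # imported lazily so split_work needs no GPU
    ctx = cuda.Device(device_id).make_context()
    try:
        # Placeholder: compile and launch kernels for work_range here.
        results[device_id] = work_range
    finally:
        ctx.pop()  # deactivate this thread's context before it exits

def run_on_all_devices(n_items):
    """Start one worker thread per CUDA device and wait for all of them."""
    import pycuda.driver as cuda
    cuda.init()
    n_devices = cuda.Device.count()
    results = {}
    threads = [
        threading.Thread(target=device_worker, args=(i, r, results))
        for i, r in enumerate(split_work(n_items, n_devices))
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

With two cards, `run_on_all_devices(1000)` would hand items 0-499 to device 0 and 500-999 to device 1, assuming PyCUDA is installed and both devices are visible.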
Also, here are the devices I'm using:
root@quentusrex-desktop:~/NVIDIA_GPU_Computing_SDK/C/bin/linux/release# ./deviceQuery
CUDA Device Query (Runtime API) version (CUDART static linking)
There are 2 devices supporting CUDA
Device 0: "GeForce 9500 GT"
CUDA Driver Version: 2.30
CUDA Runtime Version: 2.30
CUDA Capability Major revision number: 1
CUDA Capability Minor revision number: 1
Total amount of global memory: 1073020928 bytes
Number of multiprocessors: 4
Number of cores: 32
Total amount of constant memory: 65536 bytes
Total amount of shared memory per block: 16384 bytes
Total number of registers available per block: 8192
Warp size: 32
Maximum number of threads per block: 512
Maximum sizes of each dimension of a block: 512 x 512 x 64
Maximum sizes of each dimension of a grid: 65535 x 65535 x 1
Maximum memory pitch: 262144 bytes
Texture alignment: 256 bytes
Clock rate: 1.38 GHz
Concurrent copy and execution: Yes
Run time limit on kernels: Yes
Integrated: No
Support host page-locked memory mapping: No
Compute mode: Default (multiple host threads can use this device simultaneously)
Device 1: "GeForce 9500 GT"
CUDA Driver Version: 2.30
CUDA Runtime Version: 2.30
CUDA Capability Major revision number: 1
CUDA Capability Minor revision number: 1
Total amount of global memory: 1073479680 bytes
Number of multiprocessors: 4
Number of cores: 32
Total amount of constant memory: 65536 bytes
Total amount of shared memory per block: 16384 bytes
Total number of registers available per block: 8192
Warp size: 32
Maximum number of threads per block: 512
Maximum sizes of each dimension of a block: 512 x 512 x 64
Maximum sizes of each dimension of a grid: 65535 x 65535 x 1
Maximum memory pitch: 262144 bytes
Texture alignment: 256 bytes
Clock rate: 1.38 GHz
Concurrent copy and execution: Yes
Run time limit on kernels: Yes
Integrated: No
Support host page-locked memory mapping: No
Compute mode: Default (multiple host threads can use this device simultaneously)
Test PASSED
Press ENTER to exit...