Yes, that was indeed the problem. It works really nicely now; I'm getting
speedups of up to ~5x.
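For reference, the dtype mix-up can be seen with numpy alone, no GPU needed. The following is only a small sketch: the variable names are made up for illustration, and it assumes the kernel writes one 4-byte float per element, as its "float *out" signature suggests. That would explain why only the first half of the buffer was touched.

```python
import numpy as np

size = 100

# Python's built-in float is read by numpy as float64 (8 bytes per element).
buf = np.empty(size, float)
assert buf.dtype == np.float64

# A kernel declared with "float *out" writes 4-byte values, so `size`
# writes cover only part of an 8-byte-per-element buffer.
kernel_bytes_written = size * np.dtype(np.float32).itemsize
fraction_written = kernel_bytes_written / float(buf.nbytes)
print(fraction_written)

# Requesting float32 explicitly makes the buffer match the kernel signature;
# on the GPU side the equivalent would be gpuarray.empty(size, np.float32).
fixed = np.empty(size, np.float32)
assert fixed.nbytes == kernel_bytes_written
```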
Regarding argument type checking, would it be possible to have a
switch for type checking as an argument to ElementwiseKernel?
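To make the request concrete, here is a rough sketch of the kind of switch meant, written as a user-side wrapper rather than as anything PyCUDA provides. CheckedKernel, expected_dtypes and check_types are hypothetical names invented for this example. The wrapper only relies on each array argument exposing a .dtype attribute, so a plain function and numpy arrays stand in for the kernel and the GPU arrays below.

```python
import numpy as np


class CheckedKernel(object):
    """Hypothetical wrapper that checks argument dtypes before a kernel call."""

    def __init__(self, kernel, expected_dtypes, check_types=True):
        self.kernel = kernel
        self.expected_dtypes = [np.dtype(d) for d in expected_dtypes]
        # The "switch": set to False to skip the checks and avoid their cost.
        self.check_types = check_types

    def __call__(self, *args):
        if self.check_types:
            for pos, (arg, expected) in enumerate(
                    zip(args, self.expected_dtypes)):
                # Scalars without a dtype attribute are passed through unchecked.
                actual = getattr(arg, "dtype", None)
                if actual is not None and actual != expected:
                    raise TypeError(
                        "argument %d has dtype %s, expected %s"
                        % (pos, actual, expected))
        return self.kernel(*args)


# Stand-in for a compiled kernel, so the sketch runs without a GPU.
def fake_kernel(out):
    out.fill(0)
    return out


checked = CheckedKernel(fake_kernel, [np.float32])
checked(np.empty(4, np.float32))      # dtype matches, call goes through

try:
    checked(np.empty(4, float))       # float64 buffer is rejected
except TypeError as exc:
    print(exc)
```

With check_types=False the call is forwarded unchanged, so the fast path stays as it is today.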
On Wed, Jun 29, 2011 at 3:13 PM, Andreas Kloeckner wrote:
On Wed, 29 Jun 2011 12:56:09 -0400, Thomas Wiecki wrote:
> This is with the version from the trunk
> import pycuda.driver as cuda
> import pycuda.compiler
> import pycuda.autoinit
> import pycuda.gpuarray as gpuarray
> from pycuda.elementwise import ElementwiseKernel
> zero_kernel = ElementwiseKernel(
>     "float *out",
>     "out[i] = pdf()",
>     preamble="""
> __device__ float pdf()
> {
>     return 0;
> }
> """)
> size = 100
> out_gpu = gpuarray.empty(size, float)
> zero_kernel(out_gpu)
> print all(out_gpu.get() == 0)
> print all(out_gpu.get()[:size/2] == 0)
> Produces output (for varying size):
> The second half is the same as before the elementwise kernel call.
What's slightly treacherous here (but this is numpy's fault) is that
"float" in the gpuarray.empty argument refers to Python's "float", which
numpy will read as "float64", i.e. double precision. In the interest of
speed, ElementwiseKernel does not do argument type checking. Maybe it