Hi, I'm invoking a kernel using pycuda.driver.Out and passing the time_kernel=True argument so that the call returns the time in seconds. Does this time include the device-to-host copy time? Thanks! I'm happy to report a preliminary 165x speedup over our existing radar imaging implementation, thanks to PyCUDA, after 5 days of development.