[PyOpenCL] typecast real to complex

Karsten Wiesner karsten.wiesner at aei.mpg.de
Thu Mar 8 05:30:57 PST 2012


Hi Andreas

On 3/8/12 1:55 AM, Andreas Kloeckner wrote:
> Hi Alex, Karsten,
>
> On Wed, 7 Mar 2012 16:43:22 -0500, Alex Nitz<alex.nitz at ligo.org>  wrote:
>> This is what I am getting on an NVIDIA GTX580.
>> [snip]
>> :20:20: error: can't convert between vector values of different size
>> ('cfloat_t __attribute__((address_space(1)))' (aka 'float2
>> __attribute__((address_space(1)))') and 'double')
>>              z[i] = cfloat_cast(a*x[i]) + cfloat_cast(b);
>>                     ^~~~~~~~~~~~~~~~~~~
>> :20:33: note: instantiated from:
>>              z[i] = cfloat_cast(a*x[i]) + cfloat_cast(b);
> I've committed what I think might be a workaround for this to git. Can
> you please try it out and report back?
>
> Andreas
>

I rebuilt at commit:

* commit 9e925431b16a09a1e18794a30b95f820d56b037a (HEAD, origin/master,
origin/HEAD, mast
| Author: Andreas Kloeckner<inform at tiker.net>
| Date:   Wed Mar 7 19:52:08 2012 -0500
|
|     Try and work around an Nvidia compiler issue in cfloat_cast().
|
|     (Issue reported by Karsten Wiesner and Alex Nitz)

But I still get the error on different GPUs
(oclDeviceQuery output is below):


Build on<pyopencl.Device 'nForce 980a/780a SLI' on 'NVIDIA CUDA' at
0xe31f60>:

:20:20: error: can't convert between vector values of different size
('cfloat_t __attribute__((address_space(1)))' (aka 'float2
__attribute__((address_space(1)))') and 'double')
             z[i] = cfloat_cast(a*x[i]) + cfloat_cast(b);
                    ^~~~~~~~~~~~~~~~~~~
:20:33: note: instantiated from:
             z[i] = cfloat_cast(a*x[i]) + cfloat_cast(b);
                                ~^~~~~


Build on<pyopencl.Device 'Tesla C1060' on 'NVIDIA CUDA' at 0x1add190>:

:20:20: error: can't convert between vector values of different size
('cfloat_t __attribute__((address_space(1)))' (aka 'float2
__attribute__((address_space(1)))') and 'double')
             z[i] = cfloat_cast(a*x[i]) + cfloat_cast(b);
                    ^~~~~~~~~~~~~~~~~~~
:20:33: note: instantiated from:
             z[i] = cfloat_cast(a*x[i]) + cfloat_cast(b);
                                ~^~~~~

Build on<pyopencl.Device 'Tesla C2050' on 'NVIDIA CUDA' at 0x19214f0>:

:20:20: error: can't convert between vector values of different size
('cfloat_t __attribute__((address_space(1)))' (aka 'float2
__attribute__((address_space(1)))') and 'double')
             z[i] = cfloat_cast(a*x[i]) + cfloat_cast(b);
                    ^~~~~~~~~~~~~~~~~~~
:20:33: note: instantiated from:
             z[i] = cfloat_cast(a*x[i]) + cfloat_cast(b);
                                ~^~~~~

Cheers, Karsten

PS.:

[oclDeviceQuery] starting...
/usr/local/nvidia/OpenCL/bin/linux/release/oclDeviceQuery Starting...

OpenCL SW Info:

  CL_PLATFORM_NAME: 	NVIDIA CUDA
  CL_PLATFORM_VERSION: 	OpenCL 1.1 CUDA 4.1.1
  OpenCL SDK Revision: 	7027912


OpenCL Device Info:

  3 devices found supporting OpenCL:

  ---------------------------------
  Device Tesla C2050
  ---------------------------------
   CL_DEVICE_NAME: 			Tesla C2050
   CL_DEVICE_VENDOR: 			NVIDIA Corporation
   CL_DRIVER_VERSION: 			290.10
   CL_DEVICE_VERSION: 			OpenCL 1.1 CUDA
   CL_DEVICE_OPENCL_C_VERSION: 		OpenCL C 1.1
   CL_DEVICE_TYPE:			CL_DEVICE_TYPE_GPU
   CL_DEVICE_MAX_COMPUTE_UNITS:		14
   CL_DEVICE_MAX_WORK_ITEM_DIMENSIONS:	3
   CL_DEVICE_MAX_WORK_ITEM_SIZES:	1024 / 1024 / 64
   CL_DEVICE_MAX_WORK_GROUP_SIZE:	1024
   CL_DEVICE_MAX_CLOCK_FREQUENCY:	1147 MHz
   CL_DEVICE_ADDRESS_BITS:		32
   CL_DEVICE_MAX_MEM_ALLOC_SIZE:		671 MByte
   CL_DEVICE_GLOBAL_MEM_SIZE:		2687 MByte
   CL_DEVICE_ERROR_CORRECTION_SUPPORT:	yes
   CL_DEVICE_LOCAL_MEM_TYPE:		local
   CL_DEVICE_LOCAL_MEM_SIZE:		48 KByte
   CL_DEVICE_MAX_CONSTANT_BUFFER_SIZE:	64 KByte
   CL_DEVICE_QUEUE_PROPERTIES:		CL_QUEUE_OUT_OF_ORDER_EXEC_MODE_ENABLE
   CL_DEVICE_QUEUE_PROPERTIES:		CL_QUEUE_PROFILING_ENABLE
   CL_DEVICE_IMAGE_SUPPORT:		1
   CL_DEVICE_MAX_READ_IMAGE_ARGS:	128
   CL_DEVICE_MAX_WRITE_IMAGE_ARGS:	8
   CL_DEVICE_SINGLE_FP_CONFIG:		denorms INF-quietNaNs round-to-nearest round-to-zero round-to-inf fma

   CL_DEVICE_IMAGE<dim>			2D_MAX_WIDTH	 32768
					2D_MAX_HEIGHT	 32768
					3D_MAX_WIDTH	 2048
					3D_MAX_HEIGHT	 2048
					3D_MAX_DEPTH	 2048

   CL_DEVICE_EXTENSIONS:			cl_khr_byte_addressable_store
					cl_khr_icd
					cl_khr_gl_sharing
					cl_nv_compiler_options
					cl_nv_device_attribute_query
					cl_nv_pragma_unroll
					cl_khr_global_int32_base_atomics
					cl_khr_global_int32_extended_atomics
					cl_khr_local_int32_base_atomics
					cl_khr_local_int32_extended_atomics
					cl_khr_fp64


   CL_DEVICE_COMPUTE_CAPABILITY_NV:	2.0
   NUMBER OF MULTIPROCESSORS:		14
   NUMBER OF CUDA CORES:			448
   CL_DEVICE_REGISTERS_PER_BLOCK_NV:	32768
   CL_DEVICE_WARP_SIZE_NV:		32
   CL_DEVICE_GPU_OVERLAP_NV:		CL_TRUE
   CL_DEVICE_KERNEL_EXEC_TIMEOUT_NV:	CL_FALSE
   CL_DEVICE_INTEGRATED_MEMORY_NV:	CL_FALSE
   CL_DEVICE_PREFERRED_VECTOR_WIDTH_<t>	CHAR 1, SHORT 1, INT 1, LONG 1, FLOAT 1, DOUBLE 1


  ---------------------------------
  Device Tesla C1060
  ---------------------------------
   CL_DEVICE_NAME: 			Tesla C1060
   CL_DEVICE_VENDOR: 			NVIDIA Corporation
   CL_DRIVER_VERSION: 			290.10
   CL_DEVICE_VERSION: 			OpenCL 1.0 CUDA
   CL_DEVICE_TYPE:			CL_DEVICE_TYPE_GPU
   CL_DEVICE_MAX_COMPUTE_UNITS:		30
   CL_DEVICE_MAX_WORK_ITEM_DIMENSIONS:	3
   CL_DEVICE_MAX_WORK_ITEM_SIZES:	512 / 512 / 64
   CL_DEVICE_MAX_WORK_GROUP_SIZE:	512
   CL_DEVICE_MAX_CLOCK_FREQUENCY:	1296 MHz
   CL_DEVICE_ADDRESS_BITS:		32
   CL_DEVICE_MAX_MEM_ALLOC_SIZE:		1023 MByte
   CL_DEVICE_GLOBAL_MEM_SIZE:		4095 MByte
   CL_DEVICE_ERROR_CORRECTION_SUPPORT:	no
   CL_DEVICE_LOCAL_MEM_TYPE:		local
   CL_DEVICE_LOCAL_MEM_SIZE:		16 KByte
   CL_DEVICE_MAX_CONSTANT_BUFFER_SIZE:	64 KByte
   CL_DEVICE_QUEUE_PROPERTIES:		CL_QUEUE_OUT_OF_ORDER_EXEC_MODE_ENABLE
   CL_DEVICE_QUEUE_PROPERTIES:		CL_QUEUE_PROFILING_ENABLE
   CL_DEVICE_IMAGE_SUPPORT:		1
   CL_DEVICE_MAX_READ_IMAGE_ARGS:	128
   CL_DEVICE_MAX_WRITE_IMAGE_ARGS:	8
   CL_DEVICE_SINGLE_FP_CONFIG:		INF-quietNaNs round-to-nearest round-to-zero round-to-inf fma

   CL_DEVICE_IMAGE<dim>			2D_MAX_WIDTH	 4096
					2D_MAX_HEIGHT	 16383
					3D_MAX_WIDTH	 2048
					3D_MAX_HEIGHT	 2048
					3D_MAX_DEPTH	 2048

   CL_DEVICE_EXTENSIONS:			cl_khr_byte_addressable_store
					cl_khr_icd
					cl_khr_gl_sharing
					cl_nv_compiler_options
					cl_nv_device_attribute_query
					cl_nv_pragma_unroll
					cl_khr_global_int32_base_atomics
					cl_khr_global_int32_extended_atomics
					cl_khr_local_int32_base_atomics
					cl_khr_local_int32_extended_atomics
					cl_khr_fp64


   CL_DEVICE_COMPUTE_CAPABILITY_NV:	1.3
   NUMBER OF MULTIPROCESSORS:		30
   NUMBER OF CUDA CORES:			240
   CL_DEVICE_REGISTERS_PER_BLOCK_NV:	16384
   CL_DEVICE_WARP_SIZE_NV:		32
   CL_DEVICE_GPU_OVERLAP_NV:		CL_TRUE
   CL_DEVICE_KERNEL_EXEC_TIMEOUT_NV:	CL_FALSE
   CL_DEVICE_INTEGRATED_MEMORY_NV:	CL_FALSE
   CL_DEVICE_PREFERRED_VECTOR_WIDTH_<t>	CHAR 1, SHORT 1, INT 1, LONG 1, FLOAT 1, DOUBLE 1


  ---------------------------------
  Device nForce 980a/780a SLI
  ---------------------------------
   CL_DEVICE_NAME: 			nForce 980a/780a SLI
   CL_DEVICE_VENDOR: 			NVIDIA Corporation
   CL_DRIVER_VERSION: 			290.10
   CL_DEVICE_VERSION: 			OpenCL 1.0 CUDA
   CL_DEVICE_TYPE:			CL_DEVICE_TYPE_GPU
   CL_DEVICE_MAX_COMPUTE_UNITS:		1
   CL_DEVICE_MAX_WORK_ITEM_DIMENSIONS:	3
   CL_DEVICE_MAX_WORK_ITEM_SIZES:	512 / 512 / 64
   CL_DEVICE_MAX_WORK_GROUP_SIZE:	512
   CL_DEVICE_MAX_CLOCK_FREQUENCY:	1200 MHz
   CL_DEVICE_ADDRESS_BITS:		32
   CL_DEVICE_MAX_MEM_ALLOC_SIZE:		128 MByte
   CL_DEVICE_GLOBAL_MEM_SIZE:		253 MByte
   CL_DEVICE_ERROR_CORRECTION_SUPPORT:	no
   CL_DEVICE_LOCAL_MEM_TYPE:		local
   CL_DEVICE_LOCAL_MEM_SIZE:		16 KByte
   CL_DEVICE_MAX_CONSTANT_BUFFER_SIZE:	64 KByte
   CL_DEVICE_QUEUE_PROPERTIES:		CL_QUEUE_OUT_OF_ORDER_EXEC_MODE_ENABLE
   CL_DEVICE_QUEUE_PROPERTIES:		CL_QUEUE_PROFILING_ENABLE
   CL_DEVICE_IMAGE_SUPPORT:		1
   CL_DEVICE_MAX_READ_IMAGE_ARGS:	128
   CL_DEVICE_MAX_WRITE_IMAGE_ARGS:	8
   CL_DEVICE_SINGLE_FP_CONFIG:		INF-quietNaNs round-to-nearest round-to-zero round-to-inf fma

   CL_DEVICE_IMAGE<dim>			2D_MAX_WIDTH	 4096
					2D_MAX_HEIGHT	 16383
					3D_MAX_WIDTH	 2048
					3D_MAX_HEIGHT	 2048
					3D_MAX_DEPTH	 2048

   CL_DEVICE_EXTENSIONS:			cl_khr_byte_addressable_store
					cl_khr_icd
					cl_khr_gl_sharing
					cl_nv_compiler_options
					cl_nv_device_attribute_query
					cl_nv_pragma_unroll
					cl_khr_global_int32_base_atomics
					cl_khr_global_int32_extended_atomics


   CL_DEVICE_COMPUTE_CAPABILITY_NV:	1.1
   NUMBER OF MULTIPROCESSORS:		1
   NUMBER OF CUDA CORES:			8
   CL_DEVICE_REGISTERS_PER_BLOCK_NV:	8192
   CL_DEVICE_WARP_SIZE_NV:		32
   CL_DEVICE_GPU_OVERLAP_NV:		CL_FALSE
   CL_DEVICE_KERNEL_EXEC_TIMEOUT_NV:	CL_FALSE
   CL_DEVICE_INTEGRATED_MEMORY_NV:	CL_TRUE
   CL_DEVICE_PREFERRED_VECTOR_WIDTH_<t>	CHAR 1, SHORT 1, INT 1, LONG 1, FLOAT 1, DOUBLE 0


   ---------------------------------
   2D Image Formats Supported (71)
   ---------------------------------
   #     Channel Order   Channel Type

   1     CL_R            CL_FLOAT
   2     CL_R            CL_HALF_FLOAT
   3     CL_R            CL_UNORM_INT8
   4     CL_R            CL_UNORM_INT16
   5     CL_R            CL_SNORM_INT16
   6     CL_R            CL_SIGNED_INT8
   7     CL_R            CL_SIGNED_INT16
   8     CL_R            CL_SIGNED_INT32
   9     CL_R            CL_UNSIGNED_INT8
   10    CL_R            CL_UNSIGNED_INT16
   11    CL_R            CL_UNSIGNED_INT32
   12    CL_A            CL_FLOAT
   13    CL_A            CL_HALF_FLOAT
   14    CL_A            CL_UNORM_INT8
   15    CL_A            CL_UNORM_INT16
   16    CL_A            CL_SNORM_INT16
   17    CL_A            CL_SIGNED_INT8
   18    CL_A            CL_SIGNED_INT16
   19    CL_A            CL_SIGNED_INT32
   20    CL_A            CL_UNSIGNED_INT8
   21    CL_A            CL_UNSIGNED_INT16
   22    CL_A            CL_UNSIGNED_INT32
   23    CL_RG           CL_FLOAT
   24    CL_RG           CL_HALF_FLOAT
   25    CL_RG           CL_UNORM_INT8
   26    CL_RG           CL_UNORM_INT16
   27    CL_RG           CL_SNORM_INT16
   28    CL_RG           CL_SIGNED_INT8
   29    CL_RG           CL_SIGNED_INT16
   30    CL_RG           CL_SIGNED_INT32
   31    CL_RG           CL_UNSIGNED_INT8
   32    CL_RG           CL_UNSIGNED_INT16
   33    CL_RG           CL_UNSIGNED_INT32
   34    CL_RA           CL_FLOAT
   35    CL_RA           CL_HALF_FLOAT
   36    CL_RA           CL_UNORM_INT8
   37    CL_RA           CL_UNORM_INT16
   38    CL_RA           CL_SNORM_INT16
   39    CL_RA           CL_SIGNED_INT8
   40    CL_RA           CL_SIGNED_INT16
   41    CL_RA           CL_SIGNED_INT32
   42    CL_RA           CL_UNSIGNED_INT8
   43    CL_RA           CL_UNSIGNED_INT16
   44    CL_RA           CL_UNSIGNED_INT32
   45    CL_RGBA         CL_FLOAT
   46    CL_RGBA         CL_HALF_FLOAT
   47    CL_RGBA         CL_UNORM_INT8
   48    CL_RGBA         CL_UNORM_INT16
   49    CL_RGBA         CL_SNORM_INT16
   50    CL_RGBA         CL_SIGNED_INT8
   51    CL_RGBA         CL_SIGNED_INT16
   52    CL_RGBA         CL_SIGNED_INT32
   53    CL_RGBA         CL_UNSIGNED_INT8
   54    CL_RGBA         CL_UNSIGNED_INT16
   55    CL_RGBA         CL_UNSIGNED_INT32
   56    CL_BGRA         CL_UNORM_INT8
   57    CL_BGRA         CL_SIGNED_INT8
   58    CL_BGRA         CL_UNSIGNED_INT8
   59    CL_ARGB         CL_UNORM_INT8
   60    CL_ARGB         CL_SIGNED_INT8
   61    CL_ARGB         CL_UNSIGNED_INT8
   62    CL_INTENSITY    CL_FLOAT
   63    CL_INTENSITY    CL_HALF_FLOAT
   64    CL_INTENSITY    CL_UNORM_INT8
   65    CL_INTENSITY    CL_UNORM_INT16
   66    CL_INTENSITY    CL_SNORM_INT16
   67    CL_LUMINANCE    CL_FLOAT
   68    CL_LUMINANCE    CL_HALF_FLOAT
   69    CL_LUMINANCE    CL_UNORM_INT8
   70    CL_LUMINANCE    CL_UNORM_INT16
   71    CL_LUMINANCE    CL_SNORM_INT16

   ---------------------------------
   3D Image Formats Supported (71)
   ---------------------------------
   #     Channel Order   Channel Type

   1     CL_R            CL_FLOAT
   2     CL_R            CL_HALF_FLOAT
   3     CL_R            CL_UNORM_INT8
   4     CL_R            CL_UNORM_INT16
   5     CL_R            CL_SNORM_INT16
   6     CL_R            CL_SIGNED_INT8
   7     CL_R            CL_SIGNED_INT16
   8     CL_R            CL_SIGNED_INT32
   9     CL_R            CL_UNSIGNED_INT8
   10    CL_R            CL_UNSIGNED_INT16
   11    CL_R            CL_UNSIGNED_INT32
   12    CL_A            CL_FLOAT
   13    CL_A            CL_HALF_FLOAT
   14    CL_A            CL_UNORM_INT8
   15    CL_A            CL_UNORM_INT16
   16    CL_A            CL_SNORM_INT16
   17    CL_A            CL_SIGNED_INT8
   18    CL_A            CL_SIGNED_INT16
   19    CL_A            CL_SIGNED_INT32
   20    CL_A            CL_UNSIGNED_INT8
   21    CL_A            CL_UNSIGNED_INT16
   22    CL_A            CL_UNSIGNED_INT32
   23    CL_RG           CL_FLOAT
   24    CL_RG           CL_HALF_FLOAT
   25    CL_RG           CL_UNORM_INT8
   26    CL_RG           CL_UNORM_INT16
   27    CL_RG           CL_SNORM_INT16
   28    CL_RG           CL_SIGNED_INT8
   29    CL_RG           CL_SIGNED_INT16
   30    CL_RG           CL_SIGNED_INT32
   31    CL_RG           CL_UNSIGNED_INT8
   32    CL_RG           CL_UNSIGNED_INT16
   33    CL_RG           CL_UNSIGNED_INT32
   34    CL_RA           CL_FLOAT
   35    CL_RA           CL_HALF_FLOAT
   36    CL_RA           CL_UNORM_INT8
   37    CL_RA           CL_UNORM_INT16
   38    CL_RA           CL_SNORM_INT16
   39    CL_RA           CL_SIGNED_INT8
   40    CL_RA           CL_SIGNED_INT16
   41    CL_RA           CL_SIGNED_INT32
   42    CL_RA           CL_UNSIGNED_INT8
   43    CL_RA           CL_UNSIGNED_INT16
   44    CL_RA           CL_UNSIGNED_INT32
   45    CL_RGBA         CL_FLOAT
   46    CL_RGBA         CL_HALF_FLOAT
   47    CL_RGBA         CL_UNORM_INT8
   48    CL_RGBA         CL_UNORM_INT16
   49    CL_RGBA         CL_SNORM_INT16
   50    CL_RGBA         CL_SIGNED_INT8
   51    CL_RGBA         CL_SIGNED_INT16
   52    CL_RGBA         CL_SIGNED_INT32
   53    CL_RGBA         CL_UNSIGNED_INT8
   54    CL_RGBA         CL_UNSIGNED_INT16
   55    CL_RGBA         CL_UNSIGNED_INT32
   56    CL_BGRA         CL_UNORM_INT8
   57    CL_BGRA         CL_SIGNED_INT8
   58    CL_BGRA         CL_UNSIGNED_INT8
   59    CL_ARGB         CL_UNORM_INT8
   60    CL_ARGB         CL_SIGNED_INT8
   61    CL_ARGB         CL_UNSIGNED_INT8
   62    CL_INTENSITY    CL_FLOAT
   63    CL_INTENSITY    CL_HALF_FLOAT
   64    CL_INTENSITY    CL_UNORM_INT8
   65    CL_INTENSITY    CL_UNORM_INT16
   66    CL_INTENSITY    CL_SNORM_INT16
   67    CL_LUMINANCE    CL_FLOAT
   68    CL_LUMINANCE    CL_HALF_FLOAT
   69    CL_LUMINANCE    CL_UNORM_INT8
   70    CL_LUMINANCE    CL_UNORM_INT16
   71    CL_LUMINANCE    CL_SNORM_INT16

oclDeviceQuery, Platform Name = NVIDIA CUDA, Platform Version = OpenCL 1.1 CUDA 4.1.1, SDK Revision = 7027912, NumDevs = 3, Device = Tesla C2050, Device = Tesla C1060, Device = nForce 980a/780a SLI

System Info:

  Local Time/Date =  14:28:48, 03/08/2012
  CPU Name: AMD Phenom(tm) 9750 Quad-Core Processor
  # of CPU processors: 4
  Linux version 2.6.32.28-atlas-generic (root at bob) (gcc version 4.3.2 (Debian 4.3.2-1.1) ) #1 SMP Wed Feb 2 09:00:17 CET 2011
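
A side note on the device listing above: the build error mentions 'double', and only the two Tesla cards advertise cl_khr_fp64. Since the Teslas fail as well, missing fp64 support alone does not explain the error, but it is worth knowing per device. The following is a minimal sketch, assuming the oclDeviceQuery text layout shown above; the helper name fp64_support and the shortened sample text are hypothetical, made up for illustration.

```python
import re

def fp64_support(query_text):
    """Map each device name to True if cl_khr_fp64 appears in its section.

    Hypothetical helper: assumes every device section starts with a
    CL_DEVICE_NAME line, as in the oclDeviceQuery output above.
    """
    support = {}
    current = None
    for line in query_text.splitlines():
        m = re.match(r"\s*CL_DEVICE_NAME:\s*(.+?)\s*$", line)
        if m:
            # A new device section begins; assume no fp64 until seen.
            current = m.group(1)
            support[current] = False
        elif current is not None and "cl_khr_fp64" in line.split():
            support[current] = True
    return support

# Shortened sample in the same layout as the listing above.
sample = """\
   CL_DEVICE_NAME:          Tesla C2050
   CL_DEVICE_EXTENSIONS:    cl_khr_byte_addressable_store
                            cl_khr_fp64
   CL_DEVICE_NAME:          nForce 980a/780a SLI
   CL_DEVICE_EXTENSIONS:    cl_khr_byte_addressable_store
                            cl_khr_global_int32_extended_atomics
"""

for name, has_fp64 in fp64_support(sample).items():
    print(name, "fp64" if has_fp64 else "no fp64")
```

Feeding it the full oclDeviceQuery text instead of the sample should give one entry per device.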





