OpenCL
Peter Holvenstot
OpenCL
- Designed as an API and language specification
- Standards maintained by the Khronos Group
- Current versions: 1.0, 1.1, and 1.2
- Manufacturers release their own SDKs and drivers
- Major backers: Apple, AMD/ATI, Intel
OpenCL
- Alternative to CUDA
- Not limited to ATI GPUs
- Designed for “heterogeneous computing”
- Executable on many devices, including CPUs, GPUs, DSPs, and FPGAs
OpenCL
- Similar structure to CUDA: host programs and kernels
- A 'context' groups a set of compute devices (with their memory objects and command queues)
- Kernels are executed by 'processing elements'
- Kernels can be compiled at run time or at build time
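To make the host/kernel split concrete, here is a minimal plain-Python emulation, not OpenCL code. The names `vec_add_kernel`, `enqueue_1d`, and `host_program` are hypothetical. The kernel is written from the point of view of a single work-item; the host chooses the size of the index space and supplies the buffers. A real host program would instead create a context and command queue, build the program (at run time or from a prebuilt binary), and launch the kernel through calls such as `clBuildProgram` and `clEnqueueNDRangeKernel`.

```python
from typing import Callable, List, Sequence


# Hypothetical "kernel": runs once per work-item, identified by its global ID.
# In OpenCL C the ID would come from get_global_id(0); here it is a parameter.
def vec_add_kernel(gid: int, a: Sequence[float], b: Sequence[float],
                   out: List[float]) -> None:
    out[gid] = a[gid] + b[gid]


# Hypothetical "host" helper: launches the kernel over a 1-D index space.
# A real device runs work-items in parallel; this loop runs them one by one.
def enqueue_1d(kernel: Callable[..., None], global_size: int, *args) -> None:
    for gid in range(global_size):
        kernel(gid, *args)


def host_program() -> List[float]:
    a = [1.0, 2.0, 3.0, 4.0]
    b = [10.0, 20.0, 30.0, 40.0]
    out = [0.0] * len(a)  # host allocates the output buffer before the launch
    enqueue_1d(vec_add_kernel, len(a), a, b, out)
    return out


if __name__ == "__main__":
    print(host_program())  # [11.0, 22.0, 33.0, 44.0]
```

Keeping the kernel free of loops over the data is the point of the design: the host decides how many work-items exist, and each work-item handles exactly one element.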
OpenCL
- Task parallelism: many kernels running at once
- OpenCL 1.2: a device can be partitioned into sub-devices, down to a single compute unit
- OpenCL 1.2: built-in kernels for device-specific functionality
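The "down to a single compute unit" case can be illustrated with the equal-partitioning rule that OpenCL 1.2 exposes through `clCreateSubDevices` with `CL_DEVICE_PARTITION_EQUALLY`: the device is split into as many sub-devices as fit, each with the requested number of compute units, and any remainder goes unused. The sketch below only models that arithmetic in plain Python; `partition_equally` is a hypothetical helper, not part of any OpenCL binding, and the 8-unit device is an assumed example.

```python
from typing import List


def partition_equally(compute_units: int, units_per_subdevice: int) -> List[int]:
    """Return the compute-unit count of each resulting sub-device.

    Models equal partitioning: as many sub-devices as fit, each with exactly
    `units_per_subdevice` units; leftover units are not assigned.
    """
    # Rejecting impossible requests is this sketch's own choice, not a
    # statement about which error code a real OpenCL driver returns.
    if units_per_subdevice <= 0 or units_per_subdevice > compute_units:
        raise ValueError("invalid partition size")
    count = compute_units // units_per_subdevice
    return [units_per_subdevice] * count


if __name__ == "__main__":
    print(partition_equally(8, 1))  # [1, 1, 1, 1, 1, 1, 1, 1]
    print(partition_equally(8, 3))  # [3, 3]  (2 units left over)
```

Partitioning with one unit per sub-device lets independent kernels (task parallelism) be pinned to separate compute units instead of competing for the whole device.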
Advantages
- Same code can be run on different devices
- Can also be run on NVIDIA GPUs!
- AMD/ATI is working to integrate compute elements into other platforms (Accelerated Processing Units)
- Limited library of portable math routines
- Covers the most common BLAS and FFT routines
Performance
Disadvantages
- No “official” implementation
- Vendors may meet the specification or add restrictions
  - e.g., Apple adds restrictions on work-group size
- Devices need appropriate settings to perform well
- Different capabilities → different performance
- Solution: a tuning/load-balancing framework
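A tuning framework has to respect each device's limits. One small piece of that is choosing a work-group (local) size: in OpenCL 1.x the global size must be a multiple of the local size, and the local size cannot exceed the limits reported for the device and kernel (`CL_DEVICE_MAX_WORK_GROUP_SIZE`, `CL_KERNEL_WORK_GROUP_SIZE`). The hypothetical helper below sketches one simple policy, the largest valid divisor; a fuller tuner might also time several candidates on each device.

```python
def pick_local_size(global_size: int, max_work_group_size: int) -> int:
    """Largest work-group size that divides global_size and fits the limit.

    OpenCL 1.x requires global_size % local_size == 0, and local_size must
    not exceed the maximum reported for the device/kernel.
    """
    if global_size <= 0 or max_work_group_size <= 0:
        raise ValueError("sizes must be positive")
    # Walk down from the largest allowed size; 1 always divides global_size.
    for local in range(min(global_size, max_work_group_size), 0, -1):
        if global_size % local == 0:
            return local
    return 1  # not reached; kept so every path returns an int


if __name__ == "__main__":
    print(pick_local_size(1024, 256))  # 256
    print(pick_local_size(1000, 256))  # 250
    print(pick_local_size(997, 256))   # 1 (prime global size)
    print(pick_local_size(1024, 1))    # 1 (very restrictive device limit)
```

The prime case shows why padding the global size up to a convenient multiple is a common alternative: otherwise a restrictive divisor can force tiny work-groups and poor performance.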
Non-Optimized Performance
Restrictions
- No recursion, variadic functions, or function pointers
- Kernels cannot dynamically allocate memory on the device
- No native variable-length arrays; double precision is optional
- Some can be worked around with extensions (e.g., cl_khr_fp64 for double precision)
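A common workaround for the missing recursion is to rewrite the algorithm as a loop over an explicit stack whose size is fixed in advance, which also satisfies the no-allocation and no-VLA rules. The sketch below shows the pattern in plain Python for readability; in OpenCL C the stack would be a private array such as `int stack[64]`. The heap-ordered tree layout, the `MAX_STACK` capacity, and the name `tree_sum` are assumptions for illustration.

```python
from typing import Sequence

MAX_STACK = 64  # fixed capacity, like a private array sized at compile time


def tree_sum(nodes: Sequence[int]) -> int:
    """Sum a complete binary tree stored in heap order, without recursion.

    Node i has children 2*i + 1 and 2*i + 2. The stack is preallocated once,
    mirroring a kernel that cannot allocate memory or use variable-length arrays.
    """
    n = len(nodes)
    if n == 0:
        return 0
    stack = [0] * MAX_STACK  # preallocated storage; never grows
    stack[0] = 0             # push the root index
    top = 1
    total = 0
    while top > 0:
        top -= 1
        i = stack[top]       # pop
        total += nodes[i]
        right, left = 2 * i + 2, 2 * i + 1
        # Pending entries never exceed tree depth + 1, far below MAX_STACK
        # for any tree with fewer than 2**62 nodes.
        if right < n:
            stack[top] = right
            top += 1
        if left < n:
            stack[top] = left
            top += 1
    return total


if __name__ == "__main__":
    print(tree_sum([1, 2, 3, 4, 5, 6, 7]))  # 28
```

Pushing the right child before the left reproduces the visiting order of the recursive pre-order traversal it replaces.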
Terminology
CUDA → OpenCL
- Scalar Core → Stream Core
- Streaming Multiprocessor → Compute Unit
- Warp → Wavefront
- PTX → Intermediate Language (IL)
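The Warp/Wavefront pairing matters for tuning: the hardware executes work-items in fixed-width groups, so a work-group whose size is not a multiple of that width leaves lanes idle. The sketch assumes a warp width of 32 (NVIDIA) and a wavefront width of 64 (typical for AMD GPUs of this generation; check the vendor documentation for a specific device). The helper names are hypothetical.

```python
WARP_WIDTH = 32       # NVIDIA warp size
WAVEFRONT_WIDTH = 64  # assumed AMD wavefront size (typical, not universal)


def hardware_groups(work_group_size: int, width: int) -> int:
    """Number of warps/wavefronts needed to run one work-group (ceiling division)."""
    return (work_group_size + width - 1) // width


def idle_lanes(work_group_size: int, width: int) -> int:
    """Lanes left unused in the last, partially filled warp/wavefront."""
    return hardware_groups(work_group_size, width) * width - work_group_size


if __name__ == "__main__":
    for size in (48, 96):
        print(size,
              hardware_groups(size, WARP_WIDTH), idle_lanes(size, WARP_WIDTH),
              hardware_groups(size, WAVEFRONT_WIDTH), idle_lanes(size, WAVEFRONT_WIDTH))
    # 48 2 16 1 16
    # 96 3 0 2 32
```

A work-group of 96 fills NVIDIA warps exactly but wastes a third of the second AMD wavefront, one concrete reason the same code needs different settings on different devices.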
Terminology
CUDA → OpenCL
- Host Memory → Host Memory
- Global/Device Memory → Global Memory
- Constant Memory → Constant Memory
- Shared Memory → Local Memory
- Registers, Local Memory (per-thread) → Private Memory
Terminology
CUDA → OpenCL
- Grid → NDRange
- Block → Work group
- Thread → Work item
- Thread ID → Global ID
- Block Index → Work-group ID
- Thread Index → Local ID
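The index terms are linked by one formula. In one dimension, OpenCL's `get_global_id(0)` equals `get_group_id(0) * get_local_size(0) + get_local_id(0)` (plus the global work offset, supported from OpenCL 1.1), which is the same value CUDA code computes as `blockIdx.x * blockDim.x + threadIdx.x`. The plain-Python sketch below enumerates a small 1-D NDRange to show that every work-item gets a unique global ID; the helper names are hypothetical.

```python
from typing import List, Tuple


def global_id(group_id: int, local_size: int, local_id: int,
              global_offset: int = 0) -> int:
    """1-D OpenCL relation between work-group ID, local ID, and global ID.

    CUDA equivalent (no offset): blockIdx.x * blockDim.x + threadIdx.x.
    """
    return group_id * local_size + local_id + global_offset


def enumerate_ndrange(global_size: int, local_size: int) -> List[Tuple[int, int, int]]:
    """List (group_id, local_id, global_id) for every work-item of a 1-D NDRange."""
    if global_size % local_size != 0:
        raise ValueError("OpenCL 1.x requires global_size to be a multiple of local_size")
    num_groups = global_size // local_size  # get_num_groups(0); CUDA gridDim.x
    return [
        (g, l, global_id(g, local_size, l))
        for g in range(num_groups)
        for l in range(local_size)
    ]


if __name__ == "__main__":
    for item in enumerate_ndrange(8, 4):
        print(item)  # (0, 0, 0), (0, 1, 1), ..., (1, 3, 7)
```

With a global size of 8 and a local size of 4 there are two work-groups; local ID 2 in work-group 1 maps to global ID 6.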