Is There a Real Difference between DSPs and GPUs? by Stephanie Mitchell and Tim Knudtson
Main Topics
  Examples Used in this Presentation
  D.S.P. Processor
  Features of the D.S.P. Processor
  D.S.P. Architecture
  D.S.P. Programming
  G.P.U. Processor
  Features of the G.P.U. Processor
  G.P.U. Architecture
  G.P.U. Programming
  Conclusions
Examples Used in this Presentation
Information is given for the following processors:
  Digital Signal Processor (DSP): TigerSHARC
  Graphics Processor (GPU): Nvidia GeForce 6 Series
D.S.P. Processor
A digital signal processor (DSP) is a specialized microprocessor designed specifically for digital signal processing, generally in real time. Programmable DSPs are tuned to efficiently execute the computationally intensive loops that typically characterize digital signal processing algorithms (e.g., FIR and IIR filters).
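The kind of loop a DSP is tuned for can be sketched with a direct-form FIR filter. This is a minimal illustration in Python, not code from the presentation: the function name `fir_filter` and the moving-average coefficients are assumptions chosen for the example.

```python
def fir_filter(samples, taps):
    """Direct-form FIR: y[n] = sum_k taps[k] * x[n - k], with x[n] = 0 for n < 0."""
    output = []
    for n in range(len(samples)):
        acc = 0.0  # running sum, the role a DSP accumulator plays
        for k, coeff in enumerate(taps):
            if n - k >= 0:
                acc += coeff * samples[n - k]  # one multiply-accumulate per tap
        output.append(acc)
    return output


# Hypothetical 4-tap moving average applied to a unit step.
taps = [0.25, 0.25, 0.25, 0.25]
step = [1.0] * 6
smoothed = fir_filter(step, taps)
```

The inner loop performs one multiply and one add per coefficient for every output sample, which is why the number of taps times the sample rate dominates the cost of real-time filtering.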
Features of the D.S.P. Processor
  Designed for real-time processing
  Optimum performance with streaming data
  Separate program and data memories (Harvard architecture)
  Special instructions for SIMD operations
  No hardware support for multitasking
  The ability to act as a direct memory access device if in a host environment
D.S.P. Architecture
Memory architecture
  DSPs often use special memory architectures that are able to fetch multiple data and/or instructions at the same time:
    Harvard architecture
    Modified von Neumann architecture
  Use of direct memory access
  Memory-address calculation unit
D.S.P. Architecture … continued
Data operations
  Saturation arithmetic: operations that overflow clamp at the maximum (or minimum) value the register can hold rather than wrapping around (maximum + 1 stays at maximum instead of wrapping to minimum, as it does on many general-purpose CPUs).
  Fixed-point arithmetic is often used to speed up arithmetic processing.
  Single-cycle operations increase the benefits of pipelining.
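The difference between saturating and wrapping arithmetic, and the use of fixed-point values, can be sketched as follows. This assumes 16-bit signed registers and the common Q15 fixed-point format (value = integer / 2^15); neither is stated in the slides, and the function names are hypothetical.

```python
INT16_MAX = 32767
INT16_MIN = -32768


def saturating_add16(a, b):
    """Add two 16-bit signed values, clamping at the register limits."""
    return max(INT16_MIN, min(INT16_MAX, a + b))


def wrapping_add16(a, b):
    """Add two 16-bit signed values with two's-complement wraparound."""
    return (a + b + 32768) % 65536 - 32768


def q15_multiply(a, b):
    """Multiply two Q15 fixed-point values and saturate the result."""
    product = (a * b) >> 15  # rescale the Q30 product back to Q15
    return max(INT16_MIN, min(INT16_MAX, product))


saturated = saturating_add16(INT16_MAX, 1)  # stays at the maximum
wrapped = wrapping_add16(INT16_MAX, 1)      # wraps to the minimum
quarter = q15_multiply(16384, 16384)        # 0.5 * 0.5 in Q15
```

For audio or control signals, clamping at full scale distorts far less than a wraparound that flips a large positive sample into a large negative one.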
D.S.P. Programming
  Floating-point unit integrated directly into the data path
  Special looping hardware with low-overhead or zero-overhead looping capability
  Multiply-accumulate (MAC) operations, which are well suited to all kinds of matrix operations, such as convolution for filtering, dot products, or even polynomial evaluation
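To see why one MAC primitive covers both dot products and polynomial evaluation, consider this small sketch. The helper names `mac`, `dot_product`, and `horner` are illustrative assumptions; on a DSP each call to `mac` would correspond to a single hardware instruction.

```python
def mac(acc, a, b):
    """One multiply-accumulate step: acc + a * b."""
    return acc + a * b


def dot_product(xs, ys):
    """Sum of pairwise products, one MAC per element pair."""
    acc = 0
    for x, y in zip(xs, ys):
        acc = mac(acc, x, y)
    return acc


def horner(coeffs, x):
    """Evaluate a polynomial; coeffs are ordered highest degree first."""
    acc = 0
    for c in coeffs:
        acc = mac(c, acc, x)  # acc = c + acc * x (Horner's rule)
    return acc


dot = dot_product([1, 2, 3], [4, 5, 6])
poly = horner([2, -3, 1], 4)  # 2*x**2 - 3*x + 1 at x = 4
```

Both routines reduce to a loop whose body is a single MAC, which is the pattern that zero-overhead looping hardware is meant to run at one iteration per cycle.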
D.S.P. Programming … continued
  Instructions to increase parallelism: SIMD, VLIW, superscalar architecture
  Specialized instructions for modulo addressing in ring buffers, and a bit-reversed addressing mode for FFT cross-referencing
  Digital signal processors sometimes use time-stationary encoding to simplify hardware and increase coding efficiency
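The two addressing modes mentioned above can be emulated in software, which shows what the hardware saves. This is a sketch under assumptions: the `RingBuffer` class and `bit_reversed_indices` function are hypothetical names, and a DSP would compute these addresses in its address unit rather than in program code.

```python
class RingBuffer:
    """Fixed-size delay line; the write index wraps with modulo arithmetic."""

    def __init__(self, size):
        self.data = [0] * size
        self.head = 0  # next write position

    def push(self, sample):
        self.data[self.head] = sample
        self.head = (self.head + 1) % len(self.data)  # modulo addressing

    def newest_first(self):
        """Return the stored samples from most recent to oldest."""
        n = len(self.data)
        return [self.data[(self.head - 1 - i) % n] for i in range(n)]


def bit_reversed_indices(n):
    """Return the bit-reversed ordering of 0..n-1; n must be a power of two."""
    bits = n.bit_length() - 1
    order = []
    for i in range(n):
        rev = 0
        for b in range(bits):
            if i & (1 << b):
                rev |= 1 << (bits - 1 - b)
        order.append(rev)
    return order


delay_line = RingBuffer(3)
for sample in [1, 2, 3, 4]:
    delay_line.push(sample)
recent = delay_line.newest_first()
fft_order = bit_reversed_indices(8)
```

A radix-2 FFT reads or writes its data in the `fft_order` sequence, so generating it in the addressing hardware removes a separate reordering pass.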
G.P.U. Processor
A Graphics Processing Unit or GPU (also occasionally called a Visual Processing Unit or VPU) is a dedicated graphics rendering device for a personal computer, workstation, or game console. The GPU is the main processing unit in the architecture of every graphics card used in computers or game consoles.
Features of the G.P.U. Processor
  GPU architecture offers a large degree of parallelism.
  It supports Single Instruction, Multiple Data (SIMD) execution.
  Most GPUs have two different types of processing units:
    Vertex processor (or vertex shader): responsible for mathematical operations
    Pixel (or fragment) processor: responsible for texturing operations
  The third stage is for detailed processing and may change from one architecture to another.
G.P.U. Architecture
Processing Unit
  Focus on floating-point math
  fp32 and fp16 precision support for intermediate calculations
  6 four-wide fp32 vector operations in the vertex shaders, plus 1 scalar multifunction op
  16 four-wide fp32 vector operations in the fragment processors, plus 16 four-wide fp32 MULs
  Dedicated fp16 normalization hardware
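The trade-off between fp16 and fp32 for intermediate calculations can be illustrated by rounding the same value to each format. This sketch assumes the IEEE 754 binary16 and binary32 layouts that Python's `struct` module provides; the slides do not specify the GPU's exact number formats, and the helper names are hypothetical.

```python
import struct


def to_fp16(x):
    """Round x to the nearest IEEE 754 half-precision value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def to_fp32(x):
    """Round x to the nearest IEEE 754 single-precision value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


value = 0.1
err16 = abs(to_fp16(value) - value)  # rounding error in half precision
err32 = abs(to_fp32(value) - value)  # rounding error in single precision
```

Half precision carries roughly three decimal digits, which is often enough for colors and normalized vectors, while fp32 is kept for values such as positions that need a wider range.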
G.P.U. Architecture … continued
Memory
  Uses dedicated but standard memory architectures (e.g., DRAM)
  Multiple small independent memory partitions for improved latency
  Memory is used to store buffers and, optionally, textures
  In low-end systems (e.g., Intel 855GM), system memory is shared as the graphics memory
G.P.U. Architecture … continued
Cache
  Texture caches (2 level)
    Shared between vertex processors and fragment processors
    Cache processed/filtered textures
  Vertex caches
    Cache processed and unprocessed vertices
    Improve computation and fetch performance
  Z and buffer cache and write queues
G.P.U. Programming
Optimization
  Texture caches (2 level)
  Super-scalability resulting in high parallelism
  SIMD (single instruction, multiple data) structure
  RISC (reduced instruction set computer) architecture
  Neither a board design nor an extra high-speed data link is necessary
  A programmable pipeline (shading and lighting calculations programmed by the user)
Running non-graphical applications on GPUs has been named GPGPU, or General-Purpose Computation on GPUs.
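The GPGPU idea can be sketched as a small per-element program applied independently to every cell of a 2D grid, much as a fragment program runs once per pixel. The code below is an illustrative model only: `run_kernel` and `make_saxpy` are hypothetical names, nested lists stand in for textures, and Python executes the cells sequentially where a GPU would process many in parallel.

```python
def make_saxpy(a, tex_x, tex_y):
    """Build a per-cell kernel computing a * x + y from two input grids."""
    def kernel(x, y):
        return a * tex_x[y][x] + tex_y[y][x]
    return kernel


def run_kernel(kernel, width, height):
    """Apply the kernel once per (x, y) cell; no cell reads another's output."""
    return [[kernel(x, y) for x in range(width)] for y in range(height)]


tex_x = [[1, 2], [3, 4]]
tex_y = [[10, 20], [30, 40]]
result = run_kernel(make_saxpy(2, tex_x, tex_y), width=2, height=2)
```

Because every cell depends only on the inputs and not on other outputs, the work can be spread across as many fragment processors as the hardware offers.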
Conclusions
The answer to the title of this presentation, "Is There a Real Difference between DSPs and GPUs?":
There is no ‘real’ difference, simply because these two technologies are always in competition with one another and both architectures offer a large degree of parallelism at a relatively low cost. But …
Conclusions … continued
Their pipelines have different units.
The GPU is a specialist in gaming graphics, so its units are:
  Vertex Unit = transforms primitives from the global 3D coordinate system into a 2D coordinate system
  Rasterizer Unit = converts primitives into square fragments
  Fragment Unit = computes the final color for each fragment (e.g., from a texture)
  Composing Unit = combines fragments with the current rendering
The DSP is a specialist in digital signal processing, so its units are:
  Data ALU Unit = performs multiply/accumulate and other ALU operations
  Address Generation Unit (AGU) = performs memory operand address calculation
  Program Control Pipeline (PCP) Unit = performs all other instructions (branches, loops, bit tests, etc.)