Evolution of the Graphics Processing Units Dr. Zhijie Xu
A few words about me Research Interests: Simulation – VR – Game – (back to) VR and CG. Fascinated by R&D in CG and by progress in rendering devices: retina displays and brain-wave control
Outline History of the GPUs Process Paradigm and Programming Model Current Research Hotspots Future Trend
Foreword from the IEEE Visualization 2005 Conference Desktop computer architecture is at a turning point. In the last two years, CPU speeds have nearly stopped increasing and all major CPU manufacturers have announced multi-core, parallel processors. Future performance improvements will predominantly come from parallelism rather than from an ever-increasing uni-processor clock speed. Commodity graphics processors (GPUs), in contrast, already contain many parallel processing units and are capable of sustaining computation rates greater than ten times that of a modern CPU. The GPU programming model, however, is very different from traditional CPU models.
What is a GPU? “A Graphics Processing Unit or GPU (also occasionally called Visual Processing Unit or VPU) is a dedicated graphics rendering device for a personal computer or game console. Modern GPUs are very efficient at manipulating and displaying computer graphics, and their highly-parallel structure makes them more effective than typical CPUs for a range of complex algorithms.” - Definition from wikipedia.org Radeon 9800 Pro
History of GPUs The pre-GPU era: VGAs in the 80s. 4 (or even 5) generations of GPUs in the last decade. Fixed functions vs. programmability. API support: OpenGL, Direct3D (v6.0 to v9.0). Shader models (v1.0 – v3.0).
History of GPUs – generations in terms of functionality
First-Generation GPUs: up to 1998; Nvidia’s TNT2, ATi’s Rage, and 3dfx’s Voodoo3; DX6 feature set.
Second-Generation GPUs: Nvidia’s GeForce256 and GeForce2, ATi’s Radeon7500, and S3’s Savage3D; T&L; OpenGL and DX7; configurable.
Third-Generation GPUs: 2001; GeForce3/4Ti, Radeon8500, MS’s Xbox; OpenGL ARB, DX7/8; vertex programmability + ASM.
Fourth-Generation GPUs: 2002 onwards; GeForce FX family, Radeon 9700; OpenGL + extensions, DX9; vertex/pixel programmability + HLSL; 0.13μ process, 125M T/C, 200M T/S.
I just saw the Radeon X1900 last Thursday.
History of GPUs – generations in terms of stream processing
Pre-NV2x: no explicit support for stream processing. Kernel operations are usually hidden in the API and provide too little flexibility for general use.
NV2x: kernel stream operations are now explicitly under the programmer's control, but only for vertex processing (fragments still use the old paradigms). The lack of branching support severely hampers flexibility, but some algorithms can be run (notably, low-precision fluid simulation).
RD3xx: increased performance and precision, with limited support for branching/looping in both vertex and fragment processing. The model is now flexible enough to cover many purposes.
NV4x: very flexible branching support, although some limitations still exist on the number of operations to be executed and on recursion depth. Performance is estimated at 20 to 44 GFLOPS.
What are GPUs capable of?
Why shift from CPU to GPU? Why not just keep increasing the CPU speed and leave the GPU to handle what it does best? CPU speed is reaching a bottleneck (how many transistors can be integrated on a chip). Solutions: in the future, nanotechnology; in the short term, dual-core machines (double CPUs), clustered CPUs, …, even grid computing and supercomputing. GPUs face the same problem, but still have room to press on thanks to their task-specific design and parallel paradigm.
Hunger for More Computational Power – volume, speed, accuracy Supercomputing (parallel computing). Applications: particle dynamics, network analysis, finite element analysis, ocean tide analysis, virtual universe simulation, airplane design, other military simulation, etc. Japanese Earth Simulator, champion of 2002 (5120 NEC CPUs). IBM Blue Gene, winner in 2005 (65536 dual-core PowerPC CPUs). What’s missing in the formula? - COST
Process Paradigm and Programming Model Real-time computer graphics hardware is transitioning from supporting a few fixed algorithms to being fully programmable. At the same time, the performance of graphics processors (GPUs) is increasing at a rapid rate because GPUs can effectively exploit the enormous parallelism available in graphics computations. These improvements in GPU flexibility and performance are likely to continue in the future, and will allow developers to write increasingly sophisticated and diverse programs that execute on the GPU.
From Sequential to Parallel Paradigm
Conventional, sequential paradigm:
    for (int i = 0; i < 100 * 4; i++)
        result[i] = source0[i] + source1[i];
Parallel SIMD paradigm, packed registers:
    for (int el = 0; el < 100; el++)
        vector_sum(result[el], source0[el], source1[el]);
Parallel stream paradigm (SIMD/MIMD):
    streamElements 100
    streamElementFormat 4 numbers
    elementKernel
    result = kernel(source0, source1)
Stream processing is a relatively new, yet very successful paradigm to allow parallel processing at never-seen-before efficiency with minimal effort. Compared to existing architectures, stream processors are able to provide up to 20X the performance at the same power dissipation and die size.
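The three paradigms on this slide can be compared side by side in ordinary C++. This is only a CPU-side sketch, assuming 4-number elements as on the slide. `Element`, `add_sequential`, `run_kernel` and `add_kernel` are hypothetical names introduced for illustration, and `vector_sum` is written as a plain loop rather than a real SIMD instruction.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

// One stream element holds 4 numbers ("streamElementFormat 4 numbers").
using Element = std::array<float, 4>;

// Sequential paradigm: one scalar addition per iteration over flat arrays.
std::vector<float> add_sequential(const std::vector<float>& source0,
                                  const std::vector<float>& source1) {
    assert(source0.size() == source1.size());
    std::vector<float> result(source0.size());
    for (std::size_t i = 0; i < source0.size(); ++i)
        result[i] = source0[i] + source1[i];
    return result;
}

// SIMD-style paradigm: one call adds a whole 4-wide element.
// Real SIMD hardware would perform the four additions in a single
// instruction; this loop only imitates that behaviour.
void vector_sum(Element& result, const Element& a, const Element& b) {
    for (std::size_t lane = 0; lane < 4; ++lane)
        result[lane] = a[lane] + b[lane];
}

// The kernel: what happens to ONE element, with no reference to any other.
Element add_kernel(const Element& a, const Element& b) {
    Element out{};
    vector_sum(out, a, b);
    return out;
}

// Stream paradigm: the programmer supplies only the kernel. The runtime
// (this hypothetical helper) owns the iteration, so a GPU would be free
// to process the elements in any order or all at once.
template <typename Kernel>
std::vector<Element> run_kernel(Kernel kernel,
                                const std::vector<Element>& source0,
                                const std::vector<Element>& source1) {
    assert(source0.size() == source1.size());
    std::vector<Element> result(source0.size());
    for (std::size_t el = 0; el < source0.size(); ++el)
        result[el] = kernel(source0[el], source1[el]);
    return result;
}
```

Calling `run_kernel(add_kernel, source0, source1)` on two streams of 100 elements corresponds to the `result = kernel(source0, source1)` line of the stream pseudocode. The point of the design is that the loop no longer appears in the programmer's code.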
GPU Rendering Pipeline Source: nVidia, “Vertex Shader Introduction”
Data Flow in the Pipeline
A scene description: vertices, triangles, colors, lighting.
Transformations that map the scene to a camera viewpoint.
“Effects”: texturing, shadow mapping, lighting calculations.
Rasterizing: converting geometry into pixels.
Pixel processing: depth tests, stencil tests, and other per-pixel operations.
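The stages listed above can be imitated in a few lines of software. This is a deliberately simplified sketch, not how any real GPU is built: it assumes vertices already in normalized coordinates in [-1, 1], skips the "effects" stage by using one flat integer color per triangle, and implements only the depth test among the per-pixel operations. All names (`Vertex`, `Framebuffer`, `to_screen`, `edge`, `draw_triangle`) are hypothetical.

```cpp
#include <array>
#include <cassert>
#include <limits>
#include <vector>

struct Vertex { float x, y, z; };   // scene description: a position

struct Framebuffer {
    int width, height;
    std::vector<int> color;         // one integer "color" per pixel, 0 = background
    std::vector<float> depth;       // depth buffer, smaller z = closer
    Framebuffer(int w, int h)
        : width(w), height(h), color(w * h, 0),
          depth(w * h, std::numeric_limits<float>::infinity()) {
        assert(w > 0 && h > 0);
    }
};

// Transformation stage: map [-1, 1] coordinates to pixel coordinates.
Vertex to_screen(const Vertex& v, const Framebuffer& fb) {
    return { (v.x + 1.0f) * 0.5f * fb.width,
             (v.y + 1.0f) * 0.5f * fb.height,
             v.z };
}

// Edge function: 2D cross product of (b - a) and (p - a).
float edge(const Vertex& a, const Vertex& b, float px, float py) {
    return (b.x - a.x) * (py - a.y) - (b.y - a.y) * (px - a.x);
}

// Rasterizing and pixel processing for one triangle.
void draw_triangle(Framebuffer& fb, const std::array<Vertex, 3>& tri, int color) {
    const Vertex a = to_screen(tri[0], fb);
    const Vertex b = to_screen(tri[1], fb);
    const Vertex c = to_screen(tri[2], fb);
    const float area = edge(a, b, c.x, c.y);
    if (area == 0.0f) return;       // degenerate triangle covers no pixels
    for (int y = 0; y < fb.height; ++y) {
        for (int x = 0; x < fb.width; ++x) {
            const float px = x + 0.5f, py = y + 0.5f;   // pixel centre
            // Barycentric weights; dividing by area makes the inside test
            // independent of the triangle's winding order.
            const float w0 = edge(b, c, px, py) / area;
            const float w1 = edge(c, a, px, py) / area;
            const float w2 = edge(a, b, px, py) / area;
            if (w0 < 0 || w1 < 0 || w2 < 0) continue;   // pixel is outside
            const float z = w0 * a.z + w1 * b.z + w2 * c.z;
            const int idx = y * fb.width + x;
            if (z < fb.depth[idx]) {                    // depth test
                fb.depth[idx] = z;
                fb.color[idx] = color;
            }
        }
    }
}
```

Each pixel in the inner loop is processed independently of the others, which is the property that lets graphics hardware run this stage in parallel.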
The Motivation for High-Level Languages
Graphics hardware has become increasingly powerful. Programming powerful hardware with assembly code is hard. GeForce FX supports programs more than 1,000 assembly instructions long. Programmers need the benefits of a high-level language: easier programming, easier code reuse, easier debugging.
Assembly:
    …
    DP3 R0, c[11].xyzx, c[11].xyzx;
    RSQ R0, R0.x;
    MUL R0, R0.x, c[11].xyzx;
    MOV R1, c[3];
    MUL R1, R1.x, c[0].xyzx;
    DP3 R2, R1.xyzx, R1.xyzx;
    RSQ R2, R2.x;
    MUL R1, R2.x, R1.xyzx;
    ADD R2, R0.xyzx, R1.xyzx;
    DP3 R3, R2.xyzx, R2.xyzx;
    RSQ R3, R3.x;
    MUL R2, R3.x, R2.xyzx;
    DP3 R2, R1.xyzx, R2.xyzx;
    MAX R2, c[3].z, R2.x;
    MOV R2.z, c[3].y;
    MOV R2.w, c[3].y;
    LIT R2, R2;
    ...
HLL:
    float3 cSpecular = pow(max(0, dot(Nf, H)), phongExp).xxx;
    float3 cPlastic = Cd * (cAmbient + cDiffuse) + Cs * cSpecular;
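To see what the two HLL lines compute, the same arithmetic can be written in plain C++. This is a sketch under assumptions: `Vec3` and its operators are hypothetical stand-ins for the shader vector type, and all color terms are assumed to be 3-component. The `DP3`/`RSQ`/`MUL` triplet in the assembly is the usual normalize pattern. Reading the `ADD` followed by another such triplet as a half-vector computation is an interpretation, not something the slide states.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

struct Vec3 { float x, y, z; };   // hypothetical stand-in for a shader float3

Vec3 operator+(Vec3 a, Vec3 b) { return {a.x + b.x, a.y + b.y, a.z + b.z}; }
Vec3 operator*(Vec3 a, Vec3 b) { return {a.x * b.x, a.y * b.y, a.z * b.z}; } // component-wise
Vec3 operator*(Vec3 a, float s) { return {a.x * s, a.y * s, a.z * s}; }
float dot(Vec3 a, Vec3 b) { return a.x * b.x + a.y * b.y + a.z * b.z; }

// DP3 (dot with itself), RSQ (reciprocal square root), MUL (scale):
// three assembly instructions for what is one call here.
Vec3 normalize(Vec3 v) {
    const float len = std::sqrt(dot(v, v));
    assert(len > 0.0f);
    return v * (1.0f / len);
}

// Assumed meaning of the ADD + normalize group: the half vector between
// a light direction L and a view direction V (both names are assumptions).
Vec3 half_vector(Vec3 L, Vec3 V) {
    return normalize(normalize(L) + normalize(V));
}

// First HLL line: pow(max(0, dot(Nf, H)), phongExp)
float specular_term(Vec3 Nf, Vec3 H, float phongExp) {
    return std::pow(std::max(0.0f, dot(Nf, H)), phongExp);
}

// Second HLL line: Cd * (cAmbient + cDiffuse) + Cs * cSpecular,
// with the scalar specular term replicated to three components (.xxx).
Vec3 plastic(Vec3 Cd, Vec3 Cs, Vec3 cAmbient, Vec3 cDiffuse, float spec) {
    return Cd * (cAmbient + cDiffuse) + Cs * Vec3{spec, spec, spec};
}
```

The contrast with the assembly listing is the slide's argument in miniature: the intent (normalize, half vector, clamp, power) is visible in the function names, whereas in the register-level code it has to be reverse-engineered.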
GPU Programming Game Applications: per-pixel lighting; vertex displacement; fur and shine (ATi demos); various shading models (Treasure box and RenderMonkey); bump map creation and the virtual earth.
One more reason to have a decent graphics card with a decent GPU mounted: the Microsoft Windows Vista operating system. To be released at the end of this year. Aero Glass 3D interface. More than half of all PCs (more than 63% of 203 million PCs) won’t support it because their integrated graphics adaptors only support the 2D interface of Windows 2000 and Windows XP. Aero Glass is part of Vista’s interface, Aero, which requires the graphics card to support DirectX 9.0c, for example, Nvidia GeForce5900. In 2005, over 22.3 million standalone graphics cards (market value over 10 billion dollars) were sold globally, of which more than 72% (13.4 million) can support Aero Glass. Microsoft announced last week that its next big game title, Ring II, will only run on Vista. Vista causes legal battles with PC manufacturers.
Non-Game Applications: GPGPU Recent advances in programmability and architectural design have enabled the use of GPUs for general-purpose computation. Applications in: linear algebra; geometric computing; database and stream mining; GPU ray tracing; advanced image processing; computational fluid dynamics (CFD) and finite element analysis.
Problems That Need to Be Solved Significant barriers exist for the developer who wishes to use the inexpensive power of commodity graphics hardware, whether for in-game simulation of physics or for conventional computational science. These chips are designed for and driven by video game development; the programming model is unusual, the programming environment is tightly constrained, and the underlying architectures are largely secret.
Potential Research Areas GPGPU building blocks. Mapping computational concepts to the GPU: linear algebra; sorting and searching; geometric computing. High-level languages and debugging tools. Computational building blocks. Math: linear algebra, finite difference, finite element. General algorithms: searching, sorting, etc.
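As one illustration of "mapping computational concepts to the GPU", sorting can be recast as a fixed sequence of data-parallel passes. The slide does not name an algorithm; the bitonic sorting network below is swapped in as a well-known example because every compare-and-swap within a pass is independent of the others. It runs on the CPU, assumes a power-of-two input size, and `bitonic_pass` and `bitonic_sort` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// One pass of the network. Element i is paired with element i ^ j, and
// the pairs do not overlap, so all of them could be handled in parallel.
// On a GPU each pass could correspond to one kernel invocation over the
// whole array; here it is a plain loop.
void bitonic_pass(std::vector<float>& data, std::size_t k, std::size_t j) {
    for (std::size_t i = 0; i < data.size(); ++i) {
        const std::size_t partner = i ^ j;
        if (partner <= i) continue;             // handle each pair once
        const bool ascending = (i & k) == 0;    // direction of this block
        if ((ascending && data[i] > data[partner]) ||
            (!ascending && data[i] < data[partner]))
            std::swap(data[i], data[partner]);
    }
}

// Full sort: the pass schedule depends only on the array size, never on
// the data, so no data-dependent branching is needed between passes.
void bitonic_sort(std::vector<float>& data) {
    const std::size_t n = data.size();
    assert(n != 0 && (n & (n - 1)) == 0);       // assumption: power of two
    for (std::size_t k = 2; k <= n; k <<= 1)
        for (std::size_t j = k >> 1; j > 0; j >>= 1)
            bitonic_pass(data, k, j);
}
```

A sequential sort such as quicksort does fewer comparisons, but its control flow depends on the data. The network trades extra comparisons for a fixed, branch-free structure, which suits hardware with limited branching support.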
Progress on GPGPU GPGPU programming libraries: GLIT, Accelerator. Increased pressure on manufacturers from "GPGPU users" to improve hardware design, usually focusing on adding more flexibility to the programming model.
Summary The graphics processor (GPU) on today's commodity video cards has evolved into an extremely powerful and flexible processor. The latest graphics architectures provide tremendous memory bandwidth and computational horsepower, with fully programmable vertex and pixel processing units that support vector operations up to full IEEE floating point precision. High-level languages have emerged for graphics hardware, making this computational power accessible. Architecturally, GPUs are highly parallel streaming processors optimized for vector operations, with both MIMD (vertex) and SIMD (pixel) pipelines. GPUs are capable of general-purpose computation beyond the graphics applications for which they were designed. However, the barriers to application programming still need to be removed.