ATI GPUs and Graphics APIs
Mark Segal
ATI Hardware
  - X1K series
  - 8 SIMD vertex engines, 16 SIMD fragment (pixel) engines
  - 3-component vector + scalar ALUs
  - X1900: 48 fragment ALU cores
  - Dynamic flow control
  - 256 MB, 512 MB configurations
  - 650 MHz engine, 775 MHz memory (X1900 XTX)
X1K Fragment Processor Features
  - Dynamic Flow Control
    - Branching (IF…ELSE), Looping, Subroutines
  - 128-bit (4 x 32) Floating-Point Processing
    - For pixel and vertex shaders
  - Longer Shaders (512 instructions)
  - X1900 XT
    - 120 Gflops peak compute (fragment processors only)
    - 60 Gflops (measured) on dense matrix-matrix multiply
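The peak and measured figures above can be related with a short calculation. This is a minimal sketch, assuming the conventional count of 2n^3 floating-point operations for an n x n dense matrix multiply; the slides do not say how flops were counted. The function names and the example matrix size are hypothetical.

```python
# Assumption: an n x n dense matrix multiply costs 2*n^3 flops
# (one multiply and one add per inner-product term).

PEAK_GFLOPS = 120.0      # from the slide: X1900 XT peak, fragment processors only
MEASURED_GFLOPS = 60.0   # from the slide: measured on dense matrix-matrix multiply


def matmul_flops(n: int) -> int:
    """Flop count for an n x n dense matrix multiply under the 2*n^3 convention."""
    return 2 * n ** 3


def achieved_gflops(n: int, seconds: float) -> float:
    """Throughput in Gflops for an n x n multiply that took `seconds`."""
    return matmul_flops(n) / seconds / 1e9


def efficiency(measured_gflops: float, peak_gflops: float) -> float:
    """Fraction of peak throughput actually achieved."""
    return measured_gflops / peak_gflops


if __name__ == "__main__":
    # The measured rate is half of the quoted peak.
    print("fraction of peak:", efficiency(MEASURED_GFLOPS, PEAK_GFLOPS))

    # Hypothetical example: time a 1024 x 1024 multiply would take
    # if it ran at the measured rate.
    n = 1024
    seconds = matmul_flops(n) / (MEASURED_GFLOPS * 1e9)
    print("seconds for n = 1024 at the measured rate:", seconds)
```

Going the other way, `achieved_gflops(n, seconds)` turns a timed run into a figure that can be compared against the quoted peak.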
Threading
  - When a fragment program hits a stall, switch to another fragment that’s ready to go
    - E.g. texture read takes many cycles
  - Latency hiding
    - Many fragments in various stages of completion at any one time
    - Multiple calculations in flight
  - Requires storage for stalled fragments’ data
    - Can use unused temporary registers if available
  - Flow control
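The latency-hiding idea above can be illustrated with a toy simulation. This is a sketch under simplifying assumptions, not a model of the actual X1K scheduler: one ALU issues one instruction per cycle, a texture read stalls only the fragment that issued it, and `max_in_flight` stands in for the storage available for stalled fragments' data. The program, the latency value, and all names are hypothetical.

```python
def simulate(num_fragments, program, tex_latency, max_in_flight):
    """Run `program` for every fragment on one ALU.

    Returns (total_cycles, busy_cycles), where busy_cycles counts the
    cycles in which the ALU issued an instruction.
    """
    pending = num_fragments   # fragments not yet started
    resident = []             # per fragment: [next instruction index, cycle it is ready]
    finished = 0
    cycle = 0
    busy = 0

    while True:
        # Retire fragments whose last instruction has completed.
        still_running = []
        for frag in resident:
            if frag[0] == len(program) and frag[1] <= cycle:
                finished += 1
            else:
                still_running.append(frag)
        resident = still_running
        if finished == num_fragments:
            return cycle, busy

        # Admit new fragments while there is storage for their state.
        while pending > 0 and len(resident) < max_in_flight:
            resident.append([0, cycle])
            pending -= 1

        # Issue one instruction from the first fragment that is ready.
        # If every resident fragment is stalled, the ALU idles this cycle.
        for frag in resident:
            if frag[0] < len(program) and frag[1] <= cycle:
                op = program[frag[0]]
                frag[0] += 1
                # A texture read stalls only this fragment for tex_latency cycles.
                frag[1] = cycle + 1 + (tex_latency if op == "tex" else 0)
                busy += 1
                break
        cycle += 1


if __name__ == "__main__":
    program = ["alu", "tex", "alu", "alu"]   # hypothetical fragment program
    for in_flight in (1, 4, 32):
        total, busy = simulate(64, program, tex_latency=20, max_in_flight=in_flight)
        print(f"in flight: {in_flight:2d}  cycles: {total:5d}  ALU utilization: {busy / total:.2f}")
```

With a single fragment in flight, the ALU sits idle for the whole texture latency. Allowing more resident fragments lets other work fill those cycles, at the cost of more storage for per-fragment state.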
Graphics Programming Interfaces
  - Provide software interface to graphics hardware
  - Lowest level:
    - Expose full functionality of hardware at full performance
    - Hide device-specific details
    - Limit interface changes generation to generation
  - Higher levels:
    - Simplify application programming
    - E.g. for graphics: scene graph, shading languages
  - Current interfaces are no longer really lowest-level
New low-level interface
  - Distinguish two characteristics
    - Data path, routing, memory
    - Parallel data processors
  - Expose programmability; jettison fixed function
    - Machine language
    - Compiler for higher-level languages
    - Use libraries
  - Expose memory capabilities and routing
  - Stripped-down interface
  [Figure: pipeline diagram showing host, Command Processor, Vertex Processor, Rasterizer, Fragment Processor, Per-Pixel Operations, and Graphics Memory; stages are marked P (programmable) or F (fixed)]
Compare with OpenGL
  - No Begin/End or immediate mode
  - No vertex transform
  - No texture environment
  - OpenGL is an application layered on this
  - Benefit: simplified driver
    - Much less state management
    - No software path
    - Better support, faster addition of new features
  - Better match to GPGPU
  - Benefit: greater control over memory usage
Conclusion
  - Good image quality requires lots of computation
  - Recent GPUs have lots of computational power
    - Don’t forget details, like memory
  - Starting to see use for effects in games
    - Games still drive the market, and they always need more performance
  - Current graphics APIs aren’t quite up to the task of presenting the hardware’s computational abilities outside of graphics
Games and Numerical Applications
  - Game physics
    - Collision detection, rigid body dynamics, particle systems, fluid (water) simulation, cloth and hair, etc.
    - Collision detection + response “shaders”
  - Game play physics vs. effects physics
    - Game play physics affects game outcome
    - Effects physics affects display only
      - E.g. water, trees, rubble, cloth and hair
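One way to picture a collision detection + response "shader" is as a small kernel applied independently to every particle, much as a fragment program is applied to every pixel. The sketch below is illustrative only: the one-dimensional particle, the ground plane at y = 0, explicit Euler integration, the gravity constant, and the restitution value are all assumptions, and none of the names come from the slides.

```python
from dataclasses import dataclass

GRAVITY = -9.8  # assumed acceleration along y, in units per second squared


@dataclass(frozen=True)
class Particle:
    y: float    # height above the assumed ground plane at y = 0
    vy: float   # vertical velocity


def step_particle(p: Particle, dt: float, restitution: float = 0.5) -> Particle:
    """Per-particle kernel: integrate, detect a ground hit, respond."""
    # Integrate velocity, then position (explicit Euler, chosen for brevity).
    vy = p.vy + GRAVITY * dt
    y = p.y + vy * dt

    # Detection: the particle has passed below the ground plane.
    if y < 0.0:
        # Response: move it back to the surface and reflect its velocity,
        # keeping only a fraction of the speed.
        y = 0.0
        vy = -vy * restitution
    return Particle(y, vy)


def step_all(particles, dt):
    """Apply the kernel to every particle; no particle depends on another."""
    return [step_particle(p, dt) for p in particles]


if __name__ == "__main__":
    particles = [Particle(y=1.0, vy=0.0), Particle(y=0.05, vy=-1.0)]
    for _ in range(3):
        particles = step_all(particles, dt=0.1)
        print(particles)
```

Because each particle is updated without reading any other particle's state, the loop in `step_all` maps naturally onto parallel fragment processors. In the slide's terms this would be effects physics, since the result only feeds the display.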