Real-World GPGPU
Mark Harris
NVIDIA Developer Technology

Copyright © NVIDIA Corporation 2004

GPGPU Research Promises Big Speedups
- Researchers have tried many applications on GPUs:
  - physically based simulation
  - image processing
  - scientific computing
  - computer vision
  - computational finance
  - medical imaging
  - bioinformatics
  - databases and data mining
  - sorting
  - ray tracing
- Research results promise big speedups:
  - LU-GPU dense linear system solver: 10x CPU (UNC)
  - GPUTeraSort: 2006 Indy PennySort Champion (UNC)
  - ClawHMMER streaming sequence search: 5-20x CPU (Stanford)

Raw Data Promises High Perf, Too
[Chart: GPU observed GFLOPS vs. CPU theoretical peak GFLOPS; series: NVIDIA GPU pixel shader GFLOPS]

Real-World Performance Gains
Do research results and high peak performance translate to real application speedups?
- Real-World Applications
  - Medical Imaging (Mercury Computer Systems)
  - Electromagnetic Simulations (NVIDIA Partner)
  - Game Physics (Havok)

© 2006 Mercury Computer Systems, Inc.

Digital Breast Tomosynthesis (DBT)
- 100X reconstruction speed-up with NVIDIA Quadro FX 4500 GPU
- From hours to minutes
- Facilitates clinical use
- Improved diagnostic value
  - Clearer images
  - Fewer obstructions
  - Earlier detection
- Advanced Imaging Solution of the Year
- Pioneering DBT work at Massachusetts General Hospital
[Diagram: X-ray tube, axis of rotation, compression paddle, compressed breast, digital detector; 11 low-dose X-ray projections feed an extremely computationally intense reconstruction]
“Mercury reduced reconstruction time from 5 hours to 5 minutes, making DBT clinically viable. …among 70 women diagnosed with breast cancer, DBT pinpointed 7 cases not seen with mammography”

Electromagnetic Simulation
- 3D Finite-Difference and Finite-Element Modeling of:
  - Cell phone irradiation
  - MRI Design / Modeling
  - Printed Circuit Boards
  - Radar Cross Section (Military)
- Computationally Intensive!
- Large speedups with Quadro GPUs
[Figure: Pacemaker with Transmit Antenna]
[Chart: speedup of commercial, optimized, mature software by # of Quadro FX 4500 GPUs; baseline: single CPU, 3.x GHz; bar labels: 1X, 5X, 10X, 18X]

Havok FX Physics on NVIDIA GPUs
- Physics-based effects on a massive scale
- 10,000s of objects at high frame rates
  - Rigid bodies
  - Particles
  - Fluids
  - Cloth
  - and more

Dedicated Performance For Physics
Performance Measurement: 15,000 Boulder Scene, Frame Rate
- CPU Physics: 6.2 fps
  - Dual Core P4EE GHz
  - GeForce 7900GTX SLI
  - CPU Multi-threading enabled
- GPU Physics: 64.5 fps
  - Dual Core P4EE GHz
  - GeForce 7900GTX SLI
  - CPU Multi-threading enabled

GPGPU Performance Strategies
- Choose applications with high arithmetic intensity
  - Arithmetic intensity = arithmetic / bandwidth
  - The top game physics kernels have very high arithmetic intensity: > 1500 cycles per collision, ~100 texture fetches
- Leverage the strengths of all processors in the system
  - GPUs: data-parallel computation
  - CPUs: sequential computation
  - Multi-core CPUs: task-parallel computation
- Find the parallelism in the application
  - Data dependencies can make a problem appear sequential
  - Divide the work into batches of independent parallelism
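
As a rough worked example of the arithmetic intensity ratio, the sketch below plugs in the two figures quoted for a game-physics collision kernel. The function name and the choice to count cycles against texture fetches (rather than bytes of bandwidth) are assumptions made for illustration only.

```python
def arithmetic_intensity(arithmetic_ops: float, memory_ops: float) -> float:
    """Ratio of arithmetic work to memory traffic, in whatever units the caller counts."""
    if memory_ops <= 0:
        raise ValueError("memory_ops must be positive")
    return arithmetic_ops / memory_ops


# Figures quoted for a game-physics collision kernel.
cycles_per_collision = 1500    # lower bound ("> 1500 cycles")
fetches_per_collision = 100    # approximate ("~100 texture fetches")

collision_ai = arithmetic_intensity(cycles_per_collision, fetches_per_collision)
print(collision_ai)  # 15.0 cycles of arithmetic per texture fetch, at minimum
```

A kernel that does this much arithmetic per fetch spends most of its time computing rather than waiting on memory, which is the property the first strategy asks for.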

Rigid Body Dynamics Overview
- 3 phases to every simulation time step
  - Integrate positions and velocities
  - Detect collisions
  - Resolve collisions
- Integration is very parallel
  - No dependencies between objects: use the GPU
- Detecting collisions is basically scene traversal
  - The CPU is good at this: use it
- Resolving collisions is the tricky one
  - Is it parallel enough for the GPU?
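
The three phases can be sketched as a single time-step function. This is a toy illustration, not Havok's implementation: the `Body` type, the constant gravity, the ground plane at y = 0 standing in for scene traversal, and the restitution value are all assumptions. The point it tries to show is that `integrate` reads and writes only one body, so that phase has no dependencies between objects.

```python
from dataclasses import dataclass

GRAVITY = (0.0, -9.81, 0.0)  # assumed constant acceleration


@dataclass(frozen=True)
class Body:
    position: tuple  # (x, y, z)
    velocity: tuple  # (vx, vy, vz)


def integrate(body, dt):
    """Phase 1: semi-implicit Euler. Touches only this body, so all bodies
    could be updated at the same time (the data-parallel phase)."""
    velocity = tuple(v + g * dt for v, g in zip(body.velocity, GRAVITY))
    position = tuple(p + v * dt for p, v in zip(body.position, velocity))
    return Body(position, velocity)


def detect_collisions(bodies):
    """Phase 2: toy stand-in for scene traversal. Returns the indices of
    bodies that have sunk below a ground plane at y = 0."""
    return [i for i, b in enumerate(bodies) if b.position[1] < 0.0]


def resolve_collisions(bodies, contacts, restitution=0.5):
    """Phase 3: toy response. Snap each contacting body back to the plane
    and reflect its vertical velocity, scaled by the restitution."""
    bodies = list(bodies)
    for i in contacts:
        (x, _, z), (vx, vy, vz) = bodies[i].position, bodies[i].velocity
        bodies[i] = Body((x, 0.0, z), (vx, -vy * restitution, vz))
    return bodies


def step(bodies, dt):
    bodies = [integrate(b, dt) for b in bodies]  # independent per body
    contacts = detect_collisions(bodies)         # traversal-style work
    return resolve_collisions(bodies, contacts)  # the hard-to-parallelize part


falling = Body((0.0, 1.0, 0.0), (0.0, 0.0, 0.0))
landing = Body((0.0, 0.001, 0.0), (0.0, -1.0, 0.0))
after = step([falling, landing], dt=0.1)
```

In this toy the resolve phase is also independent per body because every contact is against the ground; contacts between pairs of bodies share state, which is what makes the real resolve phase harder to parallelize.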

Is Game Physics A Data Parallel Task?
[Diagram labels: Body; Contacts & Velocities; Solve Collisions; New Velocities]
Slide courtesy of Andrew Bond, Havok

Is Game Physics A Data Parallel Task?
[Diagram: Contacts -> Solve Collisions -> New Velocities, with Solve Collisions divided into Batch 1, Batch 2, ... Batch M; each batch contains Solve link 1, Solve link 2, ... Solve link N]
Slide courtesy of Andrew Bond, Havok
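
One way to form such batches is a greedy pass that never puts two links touching the same body into the same batch. The sketch below is an assumption about how batching could work, not a description of Havok FX: the function name `batch_links` and the representation of a link as a pair of body ids are hypothetical.

```python
def batch_links(links):
    """Greedily place each link (a pair of body ids) into the first batch
    in which neither of its bodies already appears."""
    batches = []  # batches[k] is the list of links in batch k
    busy = []     # busy[k] is the set of body ids used by batch k
    for a, b in links:
        for k, used in enumerate(busy):
            if a not in used and b not in used:
                batches[k].append((a, b))
                used.update((a, b))
                break
        else:
            # Every existing batch already touches body a or body b.
            batches.append([(a, b)])
            busy.append({a, b})
    return batches


# Four bodies touching in a chain: 0-1, 1-2, 2-3.
links = [(0, 1), (1, 2), (2, 3)]
batches = batch_links(links)
print(batches)  # [[(0, 1), (2, 3)], [(1, 2)]]
```

Links within one batch share no bodies, so they can be solved in parallel; the batches themselves are processed one after another, because a later batch reads velocities written by an earlier one.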

Conclusion
- Real-World GPGPU is just beginning!
- Questions?