Real-World GPGPU
Mark Harris, NVIDIA Developer Technology

Presentation transcript:

Copyright © NVIDIA Corporation 2004
GPGPU Research Promises Big Speedups
- Researchers have tried many applications on GPUs:
  - physically-based simulation
  - image processing
  - scientific computing
  - computer vision
  - computational finance
  - medical imaging
  - bioinformatics
  - databases and data mining
  - sorting
  - ray tracing
- Research results promise big speedups:
  - LU-GPU dense linear system solver: 10x CPU (UNC)
  - GPUTeraSort: 2006 Indy PennySort Champion (UNC)
  - ClawHMMER streaming sequence search: 5-20x CPU (Stanford)

Raw Data Promises High Perf, Too
[Chart: NVIDIA GPU pixel shader GFLOPS (GPU observed GFLOPS) compared with CPU theoretical peak GFLOPS]

Real-World Performance Gains
- Do research results and high peak performance translate to real application speedups?
- Real-World Applications:
  - Medical Imaging (Mercury Computer Systems)
  - Electromagnetic Simulations (NVIDIA Partner)
  - Game Physics (Havok)

© 2006 Mercury Computer Systems, Inc.
Digital Breast Tomosynthesis (DBT)
- 100X reconstruction speed-up with NVIDIA Quadro FX 4500 GPU
  - From hours to minutes
  - Facilitates clinical use
- Improved diagnostic value
  - Clearer images
  - Fewer obstructions
  - Earlier detection
- 11 low-dose X-ray projections
- Extremely computationally intense reconstruction
- Advanced Imaging Solution of the Year
- “Mercury reduced reconstruction time from 5 hours to 5 minutes, making DBT clinically viable. …among 70 women diagnosed with breast cancer, DBT pinpointed 7 cases not seen with mammography”
- Pioneering DBT work at Massachusetts General Hospital
[Diagram labels: X-ray tube, axis of rotation, compression paddle, compressed breast, digital detector]

Electromagnetic Simulation
- 3D Finite-Difference and Finite-Element Modeling of:
  - Cell phone irradiation
  - MRI design / modeling
  - Printed circuit boards
  - Radar cross section (military)
- Computationally intensive!
- Large speedups with Quadro GPUs
- Commercial, optimized, mature software
[Image: pacemaker with transmit antenna]
[Chart labels: Single CPU, 3.x GHz; 1X, 5X, 10X, 18X; x-axis: # Quadro FX 4500 GPUs]

Havok FX Physics on NVIDIA GPUs
- Physics-based effects on a massive scale
- 10,000s of objects at high frame rates
  - Rigid bodies
  - Particles
  - Fluids
  - Cloth and more

Dedicated Performance For Physics
- Performance measurement: 15,000 boulder scene, frame rate
- CPU Physics configuration: Dual Core P4EE GHz, GeForce 7900GTX SLI, CPU multi-threading enabled
- GPU Physics configuration: Dual Core P4EE GHz, GeForce 7900GTX SLI, CPU multi-threading enabled
- Frame rate: 6.2 fps (CPU physics), 64.5 fps (GPU physics)

GPGPU Performance Strategies
- Choose applications with high Arithmetic Intensity
  - Arithmetic Intensity = Arithmetic / Bandwidth
  - Game physics top kernels = very high A.I.
  - > 1500 cycles per collision, ~100 texture fetches
- Leverage strengths of all processors in the system
  - GPUs: data-parallel computation
  - CPUs: sequential computation
  - Multi-core CPUs: task-parallel computation
- Find the parallelism in the application
  - Data dependencies can make a problem appear sequential
  - Divide into batches of independent parallelism
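The arithmetic intensity ratio on this slide can be worked through with the collision figures it quotes. The sketch below is only illustrative: the function name `arithmetic_intensity` is hypothetical, and treating "cycles" as the arithmetic term and "texture fetches" as the bandwidth term is an assumption about how the slide's numbers are meant to be combined.

```python
def arithmetic_intensity(arithmetic_ops: float, memory_accesses: float) -> float:
    """Ratio of arithmetic work to memory traffic; higher values favor the GPU."""
    if memory_accesses <= 0:
        raise ValueError("memory_accesses must be positive")
    return arithmetic_ops / memory_accesses


# Approximate figures from the slide: more than 1500 cycles and
# roughly 100 texture fetches per collision (assumed to be comparable units).
collision_ai = arithmetic_intensity(1500, 100)
print(collision_ai)  # 15.0
```

Under these assumptions a collision kernel does on the order of 15 cycles of arithmetic per texture fetch, which is the sense in which the slide calls it "very high A.I."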

Rigid Body Dynamics Overview
- 3 phases to every simulation time step:
  - Integrate positions and velocities
  - Detect collisions
  - Resolve collisions
- Integration is very parallel
  - No dependencies between objects: use the GPU
- Detecting collisions is basically scene traversal
  - The CPU is good at this: use it
- Resolving collisions is a tricky one
  - Is it parallel enough for the GPU?
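Why the integration phase has no dependencies between objects can be seen in a small sketch. This is not Havok's integrator: the `Body` type, the semi-implicit Euler update, and the gravity value are all assumptions chosen for illustration, and plain Python stands in for what would be a GPU kernel with one thread (or fragment) per body.

```python
from dataclasses import dataclass
from typing import List, Tuple

Vec3 = Tuple[float, float, float]


@dataclass
class Body:  # hypothetical minimal rigid-body state (no rotation)
    position: Vec3
    velocity: Vec3


def integrate(body: Body, dt: float, gravity: Vec3 = (0.0, -10.0, 0.0)) -> Body:
    """Advance one body by dt (semi-implicit Euler). Reads only this body's state."""
    velocity = tuple(v + g * dt for v, g in zip(body.velocity, gravity))
    position = tuple(p + v * dt for p, v in zip(body.position, velocity))
    return Body(position, velocity)


def integrate_all(bodies: List[Body], dt: float) -> List[Body]:
    # Every call is independent of the others, so this loop is a pure map:
    # the data-parallel work the slide assigns to the GPU.
    return [integrate(b, dt) for b in bodies]


bodies = [
    Body((0.0, 10.0, 0.0), (1.0, 0.0, 0.0)),
    Body((5.0, 2.0, 0.0), (0.0, 0.0, 0.0)),
]
stepped = integrate_all(bodies, dt=0.5)
```

Because `integrate` touches only its own body, the bodies can be processed in any order, or all at once, without changing the result.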

Is Game Physics A Data Parallel Task?
[Diagram labels: Body; Contacts & Velocities; Solve Collisions; New Velocities]
Slide courtesy of Andrew Bond, Havok

Is Game Physics A Data Parallel Task?
[Diagram labels: Contacts; Solve Collisions, split into Batch 1, Batch 2, ..., Batch M, each containing Solve link 1, Solve link 2, ..., Solve link N; New Velocities]
Slide courtesy of Andrew Bond, Havok
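One way to obtain batches like those in the diagram is to group contact links so that no two links in the same batch touch the same body. The slides do not say how Havok FX builds its batches; the greedy first-fit scheme below, the `batch_links` name, and the representation of a link as a pair of body ids are all assumptions for illustration.

```python
def batch_links(links):
    """Greedily place each link (body_a, body_b) in the first batch
    in which neither body already appears."""
    batches = []          # batches[i] is a list of links
    bodies_in_batch = []  # bodies_in_batch[i] is the set of body ids used by batches[i]
    for a, b in links:
        for batch, used in zip(batches, bodies_in_batch):
            if a not in used and b not in used:
                batch.append((a, b))
                used.update((a, b))
                break
        else:
            # No existing batch is free of both bodies: open a new one.
            batches.append([(a, b)])
            bodies_in_batch.append({a, b})
    return batches


links = [(0, 1), (1, 2), (2, 3), (0, 3), (4, 5)]
batches = batch_links(links)
print(batches)  # [[(0, 1), (2, 3), (4, 5)], [(1, 2), (0, 3)]]
```

Links within one batch share no bodies, so they could be solved in parallel; the batches themselves are processed one after another, since a later batch may read velocities that an earlier batch wrote.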

Conclusion
- Real-World GPGPU is just beginning!
- Questions?