Graphics Processors CMSC 411. GPU graphics processing model Texture / Buffer Texture / Buffer Vertex Geometry Fragment CPU Displayed Pixels Displayed.

Slides:



Advertisements
Similar presentations
Vectors, SIMD Extensions and GPUs COMP 4611 Tutorial 11 Nov. 26,
Advertisements

Lecture 38: Chapter 7: Multiprocessors Today’s topic –Vector processors –GPUs –An example 1.
Instructor Notes We describe motivation for talking about underlying device architecture because device architecture is often avoided in conventional.
Optimization on Kepler Zehuan Wang
Graphics Hardware CMSC 435/634. Transform Shade Clip Project Rasterize Texture Z-buffer Interpolate Vertex Fragment Triangle A Graphics Pipeline.
Prepared 5/24/2011 by T. O’Neil for 3460:677, Fall 2011, The University of Akron.
Utilization of GPU’s for General Computing Presenter: Charlene DiMeglio Paper: Aspects of GPU for General Purpose High Performance Computing Suda, Reiji,
Graphics Hardware CMSC 435/634. Transform Shade Clip Project Rasterize Texture Z-buffer Interpolate Vertex Fragment Triangle A Graphics Pipeline.
GPUs. An enlarging peak performance advantage: –Calculation: 1 TFLOPS vs. 100 GFLOPS –Memory Bandwidth: GB/s vs GB/s –GPU in every PC and.
1 Threading Hardware in G80. 2 Sources Slides by ECE 498 AL : Programming Massively Parallel Processors : Wen-Mei Hwu John Nickolls, NVIDIA.
Dynamic Warp Formation and Scheduling for Efficient GPU Control Flow Wilson W. L. Fung Ivan Sham George Yuan Tor M. Aamodt Electrical and Computer Engineering.
Control Flow Virtualization for General-Purpose Computation on Graphics Hardware Ghulam Lashari Ondrej Lhotak University of Waterloo.
1 ITCS 6/8010 CUDA Programming, UNC-Charlotte, B. Wilkinson, Jan 19, 2011 Emergence of GPU systems and clusters for general purpose High Performance Computing.
ATI GPUs and Graphics APIs Mark Segal. ATI Hardware X1K series 8 SIMD vertex engines, 16 SIMD fragment (pixel) engines 3-component vector + scalar ALUs.
Evolutions of GPU Architectures Andrew Coile CMPE220 3/2007.
Evolution of the Programmable Graphics Pipeline Patrick Cozzi University of Pennsylvania CIS Spring 2011.
Accelerating Machine Learning Applications on Graphics Processors Narayanan Sundaram and Bryan Catanzaro Presented by Narayanan Sundaram.
University of Michigan Electrical Engineering and Computer Science Amir Hormati, Mehrzad Samadi, Mark Woh, Trevor Mudge, and Scott Mahlke Sponge: Portable.
GPU Tutorial 이윤진 Computer Game 2007 가을 2007 년 11 월 다섯째 주, 12 월 첫째 주.
GPU Graphics Processing Unit. Graphics Pipeline Scene Transformations Lighting & Shading ViewingTransformations Rasterization GPUs evolved as hardware.
GPGPU overview. Graphics Processing Unit (GPU) GPU is the chip in computer video cards, PS3, Xbox, etc – Designed to realize the 3D graphics pipeline.
Emergence of GPU systems for general purpose high performance computing ITCS 4145/5145 April 4, 2013 © Barry Wilkinson CUDAIntro.ppt.
GPU Programming with CUDA – Accelerated Architectures Mike Griffiths
1 ITCS 4/5010 CUDA Programming, UNC-Charlotte, B. Wilkinson, Dec 31, 2012 Emergence of GPU systems and clusters for general purpose High Performance Computing.
GPGPU Ing. Martino Ruggiero Ing. Andrea Marongiu
© David Kirk/NVIDIA and Wen-mei W. Hwu, 2007 ECE 498AL, University of Illinois, Urbana-Champaign 1 ECE 498AL Lectures 7: Threading Hardware in G80.
Mapping Computational Concepts to GPUs Mark Harris NVIDIA Developer Technology.
BY: ALI AJORIAN ISFAHAN UNIVERSITY OF TECHNOLOGY 2012 GPU Architecture 1.
Extracted directly from:
GPU Shading and Rendering Shading Technology 8:30 Introduction (:30–Olano) 9:00 Direct3D 10 (:45–Blythe) Languages, Systems and Demos 10:30 RapidMind.
Cg Programming Mapping Computational Concepts to GPUs.
Lecture 8 : Manycore GPU Programming with CUDA Courtesy : Prof. Christopher Cooper’s and Prof. Chowdhury’s course note slides are used in this lecture.
Multiprocessing. Going Multi-core Helps Energy Efficiency William Holt, HOT Chips 2005 Adapted from UC Berkeley "The Beauty and Joy of Computing"
© David Kirk/NVIDIA and Wen-mei W. Hwu, ECE 498AL, University of Illinois, Urbana-Champaign 1 CS 395 Winter 2014 Lecture 17 Introduction to Accelerator.
Emergence of GPU systems and clusters for general purpose high performance computing ITCS 4145/5145 April 3, 2012 © Barry Wilkinson.
GPU Computation Strategies & Tricks Ian Buck NVIDIA.
Multi-Core Development Kyle Anderson. Overview History Pollack’s Law Moore’s Law CPU GPU OpenCL CUDA Parallelism.
© David Kirk/NVIDIA and Wen-mei W. Hwu, ECE 498AL, University of Illinois, Urbana-Champaign 1 ECE 498AL Lectures 8: Threading Hardware in G80.
ICAL GPU 架構中所提供分散式運算 之功能與限制. 11/17/09ICAL2 Outline Parallel computing with GPU NVIDIA CUDA SVD matrix computation Conclusion.
1)Leverage raw computational power of GPU  Magnitude performance gains possible.
May 8, 2007Farid Harhad and Alaa Shams CS7080 Overview of the GPU Architecture CS7080 Final Class Project Supervised by: Dr. Elias Khalaf By: Farid Harhad.
David Angulo Rubio FAMU CIS GradStudent. Introduction  GPU(Graphics Processing Unit) on video cards has evolved during the last years. They have become.
Lecture 3: Computer Architectures
Mapping Computational Concepts to GPUs Mark Harris NVIDIA.
Computer Architecture Lecture 24 Parallel Processing Ralph Grishman November 2015 NYU.
GPGPU introduction. Why is GPU in the picture Seeking exa-scale computing platform Minimize power per operation. – Power is directly correlated to the.
My Coordinates Office EM G.27 contact time:
Advanced Science and Technology Letters Vol.43 (Multimedia 2013), pp Superscalar GP-GPU design of SIMT.
Emergence of GPU systems for general purpose high performance computing ITCS 4145/5145 © Barry Wilkinson GPUIntro.ppt Oct 30, 2014.
NVIDIA® TESLA™ GPU Based Super Computer By : Adam Powell Student # For COSC 3P93.
Our Graphics Environment Landscape Rendering. Hardware  CPU  Modern CPUs are multicore processors  User programs can run at the same time as other.
Emergence of GPU systems for general purpose high performance computing ITCS 4145/5145 July 12, 2012 © Barry Wilkinson CUDAIntro.ppt.
GPU Architecture and Its Application
CMSC 611: Advanced Computer Architecture
CS427 Multicore Architecture and Parallel Computing
Graphics Processing Unit
Chapter 4 Data-Level Parallelism in Vector, SIMD, and GPU Architectures Topic 17 NVIDIA GPU Computational Structures Prof. Zhang Gang
Graphics Hardware CMSC 491/691.
Emergence of GPU systems for general purpose high performance computing ITCS 4145/5145 © Barry Wilkinson GPUIntro.ppt Nov 4, 2013.
Presented by: Isaac Martin
Graphics Processing Unit
Coe818 Advanced Computer Architecture
UMBC Graphics for Games
Ray Tracing on Programmable Graphics Hardware
6- General Purpose GPU Programming
CSE 502: Computer Architecture
Multicore and GPU Programming
CIS 6930: Chip Multiprocessor: GPU Architecture and Programming
CIS 6930: Chip Multiprocessor: Parallel Architecture and Programming
Presentation transcript:

Graphics Processors CMSC 411

GPU graphics processing model Texture / Buffer Texture / Buffer Vertex Geometry Fragment CPU Displayed Pixels Displayed Pixels

GPU computation NVIDIA GeForce 7800 – ~860M vertices/sec – ~172M triangles/sec – ~6.9G fragments/sec – ~10.3G texels/sec – ~165 GFLOPS – ~2.4x increase/year 3 GHz Dual-core Pentium 4 – ~24.6 GFLOPS – ~1.5x increase/year [Luebke, GPGPU SIGGRAPH Course, 2005; Kilgard, Real-Time Shading SIGGRAPH course, 2006]

Pipeline vs. Parallel Pipeline – Break task up – Overlap sub-tasks Parallel – Processor per data element – Each runs full task – Many data elements …

NVIDIA GeForce 6 [Kilgaraff and Fernando, GPU Gems 2]

GeForce 6 Vertex Pixel Triangle Pipeline … Parallel More Parallel More Pipeline More Parallel

Old GPU graphics processing model Texture / Buffer Texture / Buffer Vertex Geometry Fragment CPU Displayed Pixels Displayed Pixels

New GPU graphics processing model Texture / Buffer Texture / Buffer CPU Displayed Pixels Displayed Pixels Vertex Geometry Fragment

NVIDIA G80 [NVIDIA 8800 Architectural Overview, NVIDIA TB _v01, November 2006]

Multiple Instruction, Multiple Data (MIMD) Multiple communicating processes

Single Instruction, Multiple Data (SIMD) Vector/Array computers

Architecture (MIMD vs SIMD) CTRL ALU CTRL ALU CTRL ALU MIMD(CPU-Like)SIMD (GPU-Like) CTRL Flexibility Horsepower Ease of Use

SIMD Branching if( x ) // mask threads // issue instructions else // invert mask // issue instructions // unmask Threads agree, issue if Threads disagree, issue if AND else Threads agree, issue else

SIMD Looping while(x) // update mask // do stuff They all run ‘till the last one’s done…. Useful Useless

Streaming Processors

NVIDIA Fermi [Beyond3D NVIDIA Fermi GPU and Architecture Analysis, 2010]

NVIDA Fermi SM [NVIDIA, NVIDIA’s Next Generation CUDA Compute Architecture: Fermi, 2009]

Architecture: Latency CPU: Make one thread go very fast Avoid the stalls –Branch prediction –Out-of-order execution –Memory prefetch –Big caches GPU: Make 1000 threads go very fast Hide the stalls –HW thread scheduler –Swap threads to hide stalls

NVIDIA Kepler [NVIDIA, NVIDIA’s Next Generation CUDA Compute Architecture: Kepler GK110, 2012]

Kepler SMX [NVIDIA, NVIDIA’s Next Generation CUDA Compute Architecture: Kepler GK110, 2012]