Lecture 7: GPU & GPGPU

Overview
- Traditional graphics pipeline
- Programmable graphics pipeline
  - Vertex shader
  - Fragment (pixel) shader
- Brief intro to Cg
- GPGPU (general-purpose computation on GPUs)

Generation I: 3dfx Voodoo (1996)
- One of the first true 3D game cards
- Worked by supplementing a standard 2D video card
- Did not do vertex transformations: these were done on the CPU
- Did do texture mapping and z-buffering
- http://accelenation.com/?ac.id.123.2
[Diagram: Vertex Transforms → Primitive Assembly → Rasterization and Interpolation → Raster Operations → Frame Buffer, split between CPU and GPU, connected over PCI]

Generation II: GeForce/Radeon 7500 (1999)
- Main innovation: shifting the transformation and lighting calculations to the GPU
- Allowed multi-texturing, enabling bump maps, light maps, and others
- Faster AGP bus instead of PCI
- http://accelenation.com/?ac.id.123.5
[Diagram: Vertex Transforms → Primitive Assembly → Rasterization and Interpolation → Raster Operations → Frame Buffer, all on the GPU, connected over AGP]

Generation III: GeForce3/Radeon 8500 (2001)
- For the first time, allowed a limited amount of programmability in the vertex pipeline
- Also allowed volume texturing and multisampling (for antialiasing)
- http://accelenation.com/?ac.id.123.7
[Diagram: Vertex Transforms → Primitive Assembly → Rasterization and Interpolation → Raster Operations → Frame Buffer on the GPU, over AGP; the vertex stage now runs small vertex shaders]

Generation IV: Radeon 9700/GeForce FX (2002)
- The first generation of fully programmable graphics cards
- Different versions have different resource limits on fragment/vertex programs
- http://accelenation.com/?ac.id.123.8
[Diagram: Vertex Transforms → Primitive Assembly → Rasterization and Interpolation → Raster Operations → Frame Buffer, over AGP, with a programmable vertex shader and a programmable fragment processor]

Traditional Graphics Pipeline
[Diagram: Application (CPU) sends vertices (3D) and graphics state to the GPU → Transform & Light → transformed, lit vertices (2D) → Assemble Primitives → screen-space triangles (2D) → Rasterize → fragments (pre-pixels) → Shade → final pixels (color, depth) in video memory (textures); render-to-texture feeds results back into the pipeline]
- A simplified graphics pipeline
- Note that pipe widths vary
- Many caches, FIFOs, and so on are not shown

Pipeline: Transform
- Transform & light
- Transform from "world space" to "image space"
- Compute per-vertex lighting

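ModelView Transformation
- Vertices are mapped from object space to eye (camera) space
  - M = model transformation (scene): object space → world space
  - V = view transformation (camera): world space → eye space
- Each matrix transform is applied to each vertex in the input stream. Think of this as a kernel operator.

$$
\begin{pmatrix} X' \\ Y' \\ Z' \\ W' \end{pmatrix} = V \, M \begin{pmatrix} X \\ Y \\ Z \\ 1 \end{pmatrix}
$$

(Column-vector convention: M is applied first, then V.)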
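The kernel view can be sketched in C++ (a runnable stand-in for shader code; none of these names come from the slides). It assumes column vectors and row-major storage, composes VM once, and then applies it independently to each vertex, which is what makes the operation parallel:

```cpp
#include <array>
#include <vector>

// Column-vector convention: points are 4x1 columns; matrices multiply on the left.
using Vec4 = std::array<float, 4>;
using Mat4 = std::array<std::array<float, 4>, 4>;  // m[row][col]

// Matrix product r = a * b.
Mat4 mul(const Mat4& a, const Mat4& b) {
    Mat4 r{};  // zero-initialized
    for (int i = 0; i < 4; ++i)
        for (int j = 0; j < 4; ++j)
            for (int k = 0; k < 4; ++k)
                r[i][j] += a[i][k] * b[k][j];
    return r;
}

// Apply a 4x4 matrix to one homogeneous point.
Vec4 transform(const Mat4& m, const Vec4& p) {
    Vec4 r{};
    for (int i = 0; i < 4; ++i)
        for (int k = 0; k < 4; ++k)
            r[i] += m[i][k] * p[k];
    return r;
}

// The "kernel": the same composite matrix V*M is applied to every vertex of
// the stream, and no iteration depends on another, so all could run in parallel.
std::vector<Vec4> modelView(const Mat4& model, const Mat4& view,
                            const std::vector<Vec4>& vertices) {
    const Mat4 vm = mul(view, model);  // model applied first, then view
    std::vector<Vec4> out;
    out.reserve(vertices.size());
    for (const Vec4& v : vertices) out.push_back(transform(vm, v));
    return out;
}

// Helper: a pure translation matrix.
Mat4 translation(float tx, float ty, float tz) {
    return {{{1, 0, 0, tx}, {0, 1, 0, ty}, {0, 0, 1, tz}, {0, 0, 0, 1}}};
}
```

For example, `modelView(translation(1, 2, 3), translation(0, 0, -5), {{0, 0, 0, 1}})` moves the object-space origin to (1, 2, -2) in eye space.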

Lighting
- Lighting information is combined with normals and other parameters at each vertex in order to create new colors:
  Color(v) = emissive + ambient + diffuse + specular
- Each term on the right-hand side is a function of the vertex color, position, normal, and material properties.
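As a minimal sketch of the four-term sum, the C++ below evaluates one point light in the style of classic fixed-function lighting. The Blinn-Phong half-vector specular term, the single light, the absence of attenuation, and all type and function names are assumptions made for illustration:

```cpp
#include <algorithm>
#include <cmath>

struct Vec3 { float x, y, z; };

Vec3 operator+(Vec3 a, Vec3 b) { return {a.x + b.x, a.y + b.y, a.z + b.z}; }
Vec3 operator-(Vec3 a, Vec3 b) { return {a.x - b.x, a.y - b.y, a.z - b.z}; }
Vec3 operator*(float s, Vec3 v) { return {s * v.x, s * v.y, s * v.z}; }
// Component-wise product: modulates a material color by a light color.
Vec3 operator*(Vec3 a, Vec3 b) { return {a.x * b.x, a.y * b.y, a.z * b.z}; }
float dot(Vec3 a, Vec3 b) { return a.x * b.x + a.y * b.y + a.z * b.z; }
Vec3 normalize(Vec3 v) { return (1.0f / std::sqrt(dot(v, v))) * v; }

struct Material { Vec3 emissive, ambient, diffuse, specular; float shininess; };
struct Light    { Vec3 position, ambient, diffuse, specular; };

// Color(v) = emissive + ambient + diffuse + specular for one point light.
Vec3 lightVertex(Vec3 pos, Vec3 normal, Vec3 eyePos, const Material& m, const Light& L) {
    Vec3 n = normalize(normal);
    Vec3 l = normalize(L.position - pos);  // direction to the light
    Vec3 e = normalize(eyePos - pos);      // direction to the eye
    Vec3 h = normalize(l + e);             // Blinn-Phong half vector
    float nl = std::max(dot(n, l), 0.0f);  // Lambertian diffuse factor
    // Specular only where the surface faces the light.
    float spec = nl > 0.0f ? std::pow(std::max(dot(n, h), 0.0f), m.shininess) : 0.0f;
    return m.emissive + m.ambient * L.ambient + nl * (m.diffuse * L.diffuse)
         + spec * (m.specular * L.specular);
}
```

When the light is behind the surface, only the emissive and ambient terms remain, which is why unlit regions of such models are never fully black unless both are zero.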

Pipeline: Rasterizer
- Converts the geometric representation (vertices) to an image representation (fragments)
- Fragment = image fragment: a pixel plus associated data (color, depth, stencil, etc.)
- Interpolates per-vertex quantities across pixels
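A common way to implement the interpolation step is with barycentric weights computed from edge functions. The sketch below uses hypothetical names and plain screen-space linear interpolation (real hardware also applies perspective correction); the signs of the weights double as the inside/outside test:

```cpp
#include <optional>

struct P2 { float x, y; };

// Edge function: twice the signed area of triangle (a, b, p).
float edge(P2 a, P2 b, P2 p) { return (b.x - a.x) * (p.y - a.y) - (b.y - a.y) * (p.x - a.x); }

// Interpolate a per-vertex scalar (e.g. one color channel or depth) at pixel
// position p. Returns std::nullopt when p lies outside the triangle.
std::optional<float> interpolate(P2 v0, P2 v1, P2 v2, float a0, float a1, float a2, P2 p) {
    float area = edge(v0, v1, v2);
    if (area == 0.0f) return std::nullopt;  // degenerate triangle
    float w0 = edge(v1, v2, p) / area;      // weight of v0
    float w1 = edge(v2, v0, p) / area;      // weight of v1
    float w2 = edge(v0, v1, p) / area;      // weight of v2 (w0 + w1 + w2 == 1)
    if (w0 < 0.0f || w1 < 0.0f || w2 < 0.0f) return std::nullopt;
    return w0 * a0 + w1 * a1 + w2 * a2;
}
```

For the triangle (0,0), (4,0), (0,4) with attribute values 0, 4, 8, the point (1,1) has weights 0.5, 0.25, 0.25 and receives the value 3.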

Pipeline: Shade
- Fragment processors (multiple, in parallel)
- Compute a color for each pixel
- Optionally read colors from textures (images)

The Modern Graphics Pipeline
[Diagram: Application (CPU) sends vertices (3D) and graphics state to the GPU → vertex processor (Transform & Light) → transformed, lit vertices (2D) → Assemble Primitives → screen-space triangles (2D) → Rasterize → fragments (pre-pixels) → fragment processor (Shade) → final pixels (color, depth) in video memory (textures), with render-to-texture]
- Programmable vertex processor!
- Programmable pixel processor!

The Current Graphics Pipeline
[Diagram: Application (CPU) sends vertices (3D) and graphics state to the GPU → vertex processor → transformed, lit vertices (2D) → Assemble Primitives and geometry processor → screen-space triangles (2D) → Rasterize → fragments (pre-pixels) → fragment processor → final pixels (color, depth) in video memory (textures), with render-to-texture]
- Programmable primitive assembly!
- More flexible memory access!

NVIDIA GeForce 6800 3D Pipeline
[Diagram: Vertex → Triangle Setup → Z-Cull → Shader Instruction Dispatch → Fragment (with L2 texture cache) → Fragment Crossbar → Composite → four Memory Partitions]

Precision
- 32-bit IEEE floating point throughout the pipeline:
  - Framebuffer
  - Textures
  - Fragment processor
  - Vertex processor
  - Interpolants

Vertex Processor
- Fully programmable (SIMD / MIMD)
- Processes 4-vectors (RGBA / XYZW)
- Capable of scatter but not gather
  - Can change the location of the current vertex
  - Cannot read info from other vertices
  - Can only read a small constant memory
- Latest GPUs: vertex texture fetch
  - Random-access memory for vertices
  - Gather (but not from the vertex stream itself)

Vertex processor capabilities
- 4-vector FP32 operations
- Condition codes + true data-dependent control flow
  - Conditional branches, subroutine calls, jump table
  - Useful for avoiding extra work, e.g.:
    - Don't do animation or skinning if the vertex will be clipped
    - Do displacement mapping only for vertices near the silhouette
- Transcendental arithmetic instructions (e.g., COS)
- User clip-plane support
- Texture reads (up to 4 textures, unlimited lookups)

Vertex processor limitations
- No arbitrary memory write
- No "vertex kill"
  - Can put a vertex off-screen
  - Can make degenerate primitives
- Only 32-bit floating-point texture formats supported (for vertex texture fetch)

Fragment Processor
- Fully programmable (SIMD)
- Processes 4-component vectors (RGBA / XYZW)
- Random-access memory read (textures)
- Capable of gather but not scatter
  - RAM read (texture fetch), but no RAM write
  - Output address fixed to a specific pixel
- Typically more useful than the vertex processor
  - More fragment pipelines than vertex pipelines
  - Direct output (the fragment processor is at the end of the pipeline)
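The scatter/gather distinction comes down to which side of the assignment carries the computed address. A sequential C++ sketch with hypothetical helper names:

```cpp
#include <cstddef>
#include <vector>

// Gather (fragment-processor style): the output slot i is fixed, but it may
// READ from any computed input address: out[i] = in[idx[i]].
std::vector<float> gather(const std::vector<float>& in, const std::vector<std::size_t>& idx) {
    std::vector<float> out(idx.size());
    for (std::size_t i = 0; i < idx.size(); ++i) out[i] = in[idx[i]];
    return out;
}

// Scatter (vertex-processor style): each input element WRITES to a computed
// output address: out[idx[i]] = in[i]. A vertex program does this implicitly
// by changing where its vertex lands on screen.
std::vector<float> scatter(const std::vector<float>& in, const std::vector<std::size_t>& idx,
                           std::size_t outSize) {
    std::vector<float> out(outSize, 0.0f);
    // In this sequential sketch later writes win on collisions; parallel
    // hardware gives no such ordering guarantee.
    for (std::size_t i = 0; i < in.size(); ++i) out[idx[i]] = in[i];
    return out;
}
```

With `in = {10, 20, 30}` and `idx = {2, 0, 1}`, `gather` yields `{30, 10, 20}` and `scatter` yields `{20, 30, 10}`.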

Fragment processor: texture mapping
- Texture reads are just another instruction
- Allows computed texture coordinates, nested to arbitrary depth
  - This is a big difference between NVIDIA and ATI right now
- Allows multiple uses of a single texture unit
- Optional LOD control: can specify the filter extent
- Think of it as a memory-read instruction with optional user-controlled filtering
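Treating a texture read as a filtered memory read can be sketched on the CPU as a bilinear lookup. The `Texture` type, clamp-to-edge addressing, and the texel-center convention are assumptions; the function is named after Cg's `tex2D` only for readability and is not its implementation. Feeding the result of one lookup in as the coordinate of another is a computed-coordinate read:

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

// A single-channel texture stored row-major (hypothetical minimal type).
struct Texture {
    int w, h;
    std::vector<float> texels;
    float at(int x, int y) const {  // clamp-to-edge addressing
        x = std::max(0, std::min(x, w - 1));
        y = std::max(0, std::min(y, h - 1));
        return texels[static_cast<std::size_t>(y) * w + x];
    }
};

// Bilinear lookup at normalized coordinates (u, v); texel centers are assumed
// to sit at ((i + 0.5) / w, (j + 0.5) / h).
float tex2D(const Texture& t, float u, float v) {
    float x = u * t.w - 0.5f, y = v * t.h - 0.5f;
    int x0 = static_cast<int>(std::floor(x)), y0 = static_cast<int>(std::floor(y));
    float fx = x - x0, fy = y - y0;  // fractional filter weights
    float top    = (1 - fx) * t.at(x0, y0)     + fx * t.at(x0 + 1, y0);
    float bottom = (1 - fx) * t.at(x0, y0 + 1) + fx * t.at(x0 + 1, y0 + 1);
    return (1 - fy) * top + fy * bottom;
}
```

For a 2×1 texture holding `{0, 1}`, sampling halfway between the two texel centers (u = 0.5) returns 0.5, while sampling at a texel center returns that texel exactly.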

Fragment processor capabilities
- Dynamic branching
- Conditional fragment-kill instruction
- Read access to window-space position
- Read/write access to fragment Z (but not stencil)
- Multiple render targets
- Built-in derivative instructions
  - Partial derivatives w.r.t. screen-space x or y
  - Useful for anti-aliasing shaders
- FP32, FP16, and fixed-point data
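Derivative instructions are commonly implemented by shading pixels in 2×2 quads and differencing neighboring values. The sketch below assumes that quad layout and uses forward differences; the names mirror Cg's `ddx`, `ddy`, and `fwidth` (the last defined as |ddx| + |ddy|), but these are CPU stand-ins, not Cg's implementation:

```cpp
#include <array>
#include <cmath>

// Values of some shader expression f at the four pixels of a 2x2 quad:
// q[0] = (x, y), q[1] = (x+1, y), q[2] = (x, y+1), q[3] = (x+1, y+1).
using Quad = std::array<float, 4>;

// Coarse screen-space partial derivatives by forward differences.
float ddx(const Quad& q) { return q[1] - q[0]; }
float ddy(const Quad& q) { return q[2] - q[0]; }

// Per-pixel footprint of f; an anti-aliasing shader can use it as a filter
// width, e.g. to soften the edge of a procedural stripe pattern.
float fwidth(const Quad& q) { return std::fabs(ddx(q)) + std::fabs(ddy(q)); }
```

For f(x, y) = 3x + 2y sampled at the quad starting at (0, 0), the values are `{0, 3, 2, 5}`, giving ddx = 3, ddy = 2, and fwidth = 5.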

Fragment processor limitations
- Dynamic branching is less efficient than in the vertex processor
  - Especially for non-coherent branching (regions smaller than roughly 30×30 pixels)
  - Can do a lot with condition codes
- No indexed reads from registers
  - I.e., no indexed arrays
  - Must use texture reads instead
- No arbitrary memory write
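Why incoherent branching hurts can be illustrated with a toy cost model: pixels that share one instruction stream must execute both sides of a branch whenever they disagree. The batch size and per-branch costs here are free parameters, not NVIDIA's actual branch granularity:

```cpp
#include <cstddef>
#include <vector>

// Illustrative SIMD cost model (an assumption, not a vendor specification):
// pixels are shaded in fixed-size batches sharing one instruction stream. A
// batch pays for a side of the if/else if at least one of its pixels takes it.
long branchCost(const std::vector<bool>& takesIf, std::size_t batch,
                long ifCost, long elseCost) {
    long total = 0;
    for (std::size_t start = 0; start < takesIf.size(); start += batch) {
        bool anyIf = false, anyElse = false;
        for (std::size_t i = start; i < takesIf.size() && i < start + batch; ++i) {
            if (takesIf[i]) anyIf = true; else anyElse = true;
        }
        if (anyIf) total += ifCost;      // some pixel needs the "if" side
        if (anyElse) total += elseCost;  // some pixel needs the "else" side
    }
    return total;
}
```

With 8 pixels, batches of 4, and costs 10 and 3, a coherent mask such as `{1,1,1,1,0,0,0,0}` costs 13, while an alternating mask costs 26 because every batch runs both sides.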

GPU vendor differences
- Note: this slide will be dated almost instantly
- NVIDIA: as described in the previous slides
- ATI hardware today (the Radeon X1900 XT is the current high-end part):
  - No vertex texture fetch (but good render-to-vertex-array)
  - Far fewer levels of computed texture coordinates
  - Better at fine-grained (less coherent) dynamic branching
- ATI Xenos (Xbox 360 chip):
  - Unified shader model: vertex processor == pixel processor
  - Scatter support: shaders can write to arbitrary memory locations

Cg: C for Graphics
- Cg is a high-level GPU programming language
- Designed by NVIDIA and Microsoft
- Competes with the (quite similar) OpenGL Shading Language, a.k.a. GLSL or GLslang

Programming in assembly is painful

Assembly:
```
…
FRC R2.y, C11.w;
ADD R3.x, C11.w, -R2.y;
MOV H4.y, R2.y;
ADD H4.x, -H4.y, C4.w;
MUL R3.xy, R3.xyww, C11.xyww;
ADD R3.xy, R3.xyww, C11.z;
TEX H5, R3, TEX2, 2D;
ADD R3.x, R3.x, C11.x;
TEX H6, R3, TEX2, 2D;
…
```

Cg:
```
…
L2weight = timeval - floor(timeval);
L1weight = 1.0 - L2weight;
ocoord1 = floor(timeval)/64.0 + 1.0/128.0;
ocoord2 = ocoord1 + 1.0/64.0;
L1offset = f2tex2D(tex2, float2(ocoord1, 1.0/128.0));
L2offset = f2tex2D(tex2, float2(ocoord2, 1.0/128.0));
…
```

- Easier to read and modify
- Cross-platform
- Combine pieces
- etc.

Some points in the design space
- CPU languages
  - C: close to the hardware; general purpose
  - C++, Java, Lisp: require memory management
  - RenderMan: specialized for shading
- Real-time shading languages
  - Stanford shading language
  - Creative Labs shading language

Design strategy
- Start with C (and a bit of C++)
  - Minimizes the number of decisions
  - Gives you known mistakes instead of unknown ones
- Allow subsetting of the language
- Add features desired for GPUs
  - To support the GPU programming model
  - To enable high performance
- Tweak to make it fit together well

How are GPUs different from CPUs?
- The GPU is a stream processor
  - Multiple programmable processing units
  - Connected by data flows
[Diagram: Application → Vertex Processor → Assembly & Rasterization → Fragment Processor → Framebuffer Operations → Framebuffer, with Textures feeding the processors]

How are GPUs different from CPUs?
- Greater variation in basic capabilities (as of Cg's design, circa 2002; later GPUs, as described in the earlier slides, added branching and vertex texture fetch)
  - Most processors don't yet support branching
  - Vertex processors don't support texture mapping
  - Some processors support additional data types
- The compiler can't hide these differences
- A least-common-denominator language is too restrictive
- Cg exposes differences via language profiles (lists of capabilities and data types)
- Over time, profiles will converge

How are GPUs different from CPUs?
- Optimized for 4-vector arithmetic
  - Useful for graphics: colors, vectors, texcoords
  - Easy way to get high performance/cost
- C philosophy says: expose these HW data types
- Cg has vector data types and operations, e.g., float2, float3, float4
  - Makes it obvious how to get high performance
- Cg also has matrix data types, e.g., float3x3, float3x4, float4x4

How are GPUs different from CPUs?
- No support for pointers
  - Arrays are first-class data types in Cg
- No integer data type
  - Cg adds a "bool" data type for boolean operations
  - This change isn't obvious except when declaring variables

Cg basic data types
- All profiles: float, bool
- All profiles with texture lookups: sampler1D, sampler2D, sampler3D, samplerCUBE
- NV_fragment_program profile:
  - half: half-precision float
  - fixed: fixed point in [-2, 2)

Cg Example
The following fragment program implements a (very) simple toon shader:
- Flat 3-tone shading
  - Highlight
  - Base color
  - Shadow
- Black silhouettes

Cg Example – part 1
```
// Get eye-space eye vector.
// In:
//   eye_space position = TEX7
//   eye space T = (TEX4.x, TEX5.x, TEX6.x) denormalized
//   eye space B = (TEX4.y, TEX5.y, TEX6.y) denormalized
//   eye space N = (TEX4.z, TEX5.z, TEX6.z) denormalized
fragout frag program main(vf30 In)
{
    float m = 30;                             // power
    float3 hiCol = float3( 1.0, 0.1, 0.1 );   // lit color
    float3 lowCol = float3( 0.3, 0.0, 0.0 );  // dark color
    float3 specCol = float3( 1.0, 1.0, 1.0 ); // specular color
    // Get eye-space eye vector.
    float3 e = normalize( -In.TEX7.xyz );
    // Get eye-space normal vector.
    float3 n = normalize(float3(In.TEX4.z, In.TEX5.z, In.TEX6.z));
```

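Cg Example – part 2
```
    // Silhouette test: 0 where the normal is nearly perpendicular to the eye vector.
    float edgeMask = (dot(e, n) > 0.4) ? 1 : 0;
    // Fixed light position in eye space.
    float3 lpos = float3(3,3,3);
    float3 l = normalize(lpos - In.TEX7.xyz);
    // Half vector between the light and eye directions.
    float3 h = normalize(l + e);
    // 1 inside the specular highlight, 0 elsewhere.
    float specMask = (pow(dot(h, n), m) > 0.5) ? 1 : 0;
    // 1 where the surface faces the light (lit tone), 0 in shadow (dark tone).
    float hiMask = (dot(l, n) > 0.4) ? 1 : 0;
    // Pick the lit or dark tone, add the highlight, and black out silhouettes.
    float3 ocol1 = edgeMask * (lerp(lowCol, hiCol, hiMask) + (specMask * specCol));
    fragout O;
    O.COL = float4(ocol1.x, ocol1.y, ocol1.z, 1);
    return O;
}
```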
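To check the toon shader's logic without a GPU, here is a line-by-line C++ port. The `float3`/`float4` structs and helper functions are minimal stand-ins for Cg's built-ins (Cg's `lerp(a, b, t)` is `a + t*(b - a)`); `pos` plays the role of `In.TEX7.xyz` and `normal` the interpolated eye-space normal:

```cpp
#include <cmath>

struct float3 { float x, y, z; };
struct float4 { float x, y, z, w; };

float3 operator+(float3 a, float3 b) { return {a.x + b.x, a.y + b.y, a.z + b.z}; }
float3 operator-(float3 a, float3 b) { return {a.x - b.x, a.y - b.y, a.z - b.z}; }
float3 operator-(float3 v) { return {-v.x, -v.y, -v.z}; }
float3 operator*(float s, float3 v) { return {s * v.x, s * v.y, s * v.z}; }
float dot(float3 a, float3 b) { return a.x * b.x + a.y * b.y + a.z * b.z; }
float3 normalize(float3 v) { return (1.0f / std::sqrt(dot(v, v))) * v; }
float3 lerp(float3 a, float3 b, float t) { return a + t * (b - a); }

// CPU port of the Cg toon shader; the eye sits at the eye-space origin.
float4 toonShade(float3 pos, float3 normal) {
    const float m = 30;                      // specular power
    const float3 hiCol{1.0f, 0.1f, 0.1f};    // lit color
    const float3 lowCol{0.3f, 0.0f, 0.0f};   // dark color
    const float3 specCol{1.0f, 1.0f, 1.0f};  // specular color
    float3 e = normalize(-pos);              // eye vector
    float3 n = normalize(normal);
    float edgeMask = (dot(e, n) > 0.4f) ? 1.0f : 0.0f;  // 0 near silhouettes
    float3 l = normalize(float3{3, 3, 3} - pos);        // light at (3,3,3)
    float3 h = normalize(l + e);
    float specMask = (std::pow(dot(h, n), m) > 0.5f) ? 1.0f : 0.0f;
    float hiMask = (dot(l, n) > 0.4f) ? 1.0f : 0.0f;
    float3 c = edgeMask * (lerp(lowCol, hiCol, hiMask) + specMask * specCol);
    return {c.x, c.y, c.z, 1.0f};
}
```

A normal perpendicular to the view direction yields black (silhouette), a normal facing away from the light yields the dark tone (0.3, 0, 0), and a normal aligned with both the light and the eye yields the lit tone plus the white highlight.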

GPGPU
- The graphics processing unit (GPU) on commodity video cards has evolved into an extremely flexible and powerful processor:
  - Programmability
  - Precision
  - Power
- GPGPU: an emerging field seeking to harness GPUs for general-purpose computation

Motivation: Computational Power
- GPUs are fast
  - Compute: 3.0 GHz dual-core Pentium 4: 24.6 GFLOPS; NVIDIA GeForce 7800: 165 GFLOPS
  - Memory bandwidth: 1066 MHz FSB Pentium Extreme Edition: 8.5 GB/s; ATI Radeon X850 XT Platinum Edition: 37.8 GB/s
- GPUs are getting faster, faster (their performance grows at a higher annual rate)
  - CPUs: 1.4× annual growth
  - GPUs: 1.7× (pixels) to 2.3× (vertices) annual growth
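The quoted figures can be sanity-checked with simple arithmetic. Assuming "1066 MHz" denotes the FSB's effective rate of about 1066 million transfers per second over a 64-bit (8-byte) bus, and compounding the quoted annual growth factors over an arbitrarily chosen five-year span:

$$
1066 \times 10^{6}\ \tfrac{\text{transfers}}{\text{s}} \times 8\ \tfrac{\text{B}}{\text{transfer}} \approx 8.5 \times 10^{9}\ \text{B/s} = 8.5\ \text{GB/s}
$$

$$
\frac{165}{24.6} \approx 6.7\times\ \text{(GFLOPS)}, \qquad \frac{37.8}{8.5} \approx 4.4\times\ \text{(bandwidth)}
$$

$$
1.4^{5} \approx 5.4, \qquad 1.7^{5} \approx 14.2, \qquad 2.3^{5} \approx 64.4
$$

So over five years the CPU rate gives roughly a 5× gain, while the GPU rates give roughly 14× to 64×, which is the gap that "getting faster, faster" describes.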

Motivation: Computational Power

Motivation: Flexible and Precise
- Modern GPUs are deeply programmable
  - Programmable pixel, vertex, and video engines
  - Solidifying high-level language support
- Modern GPUs support high precision
  - 32-bit floating point throughout the pipeline
  - High enough for many (not all) applications

Motivation: The Potential of GPGPU
- The power and flexibility of GPUs make them an attractive platform for general-purpose computation
- Example applications range from in-game physics simulation to conventional computational science
- Goal: make the inexpensive power of the GPU available to developers as a sort of computational coprocessor

Problems: Difficult to Use
- GPUs are designed for and driven by video games
  - The programming model is unusual
  - Programming idioms are tied to computer graphics
  - The programming environment is tightly constrained
- Underlying architectures are:
  - Inherently parallel
  - Rapidly evolving (even in basic feature set!)
  - Largely secret
- Can't simply "port" CPU code!

GPGPU
- Why use the GPU for general-purpose computing?
- How do we program it?
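One common answer to the "how" question in this era was to map computation onto the graphics pipeline: arrays become textures, a kernel becomes a fragment program that may gather from its inputs but writes only its own pixel, drawing a screen-sized quad runs the kernel over every element, and render-to-texture feeds each pass's result into the next ("ping-pong"). The C++ below simulates that mapping on the CPU; all names are hypothetical and it is not a graphics API:

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// A 1D "texture" standing in for a GPU texture holding an array.
using Texture1D = std::vector<float>;

// "Fragment program": average of self and clamped neighbors (a gather).
float smoothKernel(const Texture1D& in, std::size_t i) {
    std::size_t l = (i == 0) ? 0 : i - 1;
    std::size_t r = (i + 1 == in.size()) ? i : i + 1;
    return (in[l] + in[i] + in[r]) / 3.0f;
}

// "Draw a quad": run the kernel once per output pixel. No pixel reads the
// output texture, so all pixels are independent.
void drawQuad(const Texture1D& in, Texture1D& out) {
    for (std::size_t i = 0; i < out.size(); ++i) out[i] = smoothKernel(in, i);
}

// Ping-pong: alternate the roles of two textures across passes.
Texture1D run(Texture1D data, int passes) {
    Texture1D scratch(data.size());
    for (int p = 0; p < passes; ++p) {
        drawQuad(data, scratch);   // render into the scratch texture
        std::swap(data, scratch);  // this pass's output is the next pass's input
    }
    return data;
}
```

For example, one pass over `{0, 0, 3, 0, 0}` produces `{0, 1, 1, 1, 0}`. Keeping separate read and write textures is the point of the design: a fragment program cannot read the pixel it is currently writing.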