Programmable Graphics Hardware CS 446: Real-Time Rendering & Game Technology David Luebke University of Virginia.


Recap: Advanced Texturing
Billboards
  – Screen-aligned, world-aligned
Point sprites
Imposters
  – Trees, buildings, portal textures, billboard clouds
  – Dynamic imposters for “caching” rendering results
Depth textures
Multitexturing
  – Low-res light maps, hi-res decals, etc.

Textures: Other Important Stuff
Render to texture – framebuffer objects (FBOs)
  – Multiple render targets
Environment maps
  – Sphere map, cube maps (hardware supported)
Shadow maps
  – A depth texture rendered from light source (more later)
Relief textures
  – Demo now, details later

Textures: Still More Stuff
Normal maps – especially for bump mapping
  – Gloss maps, reflectance maps, etc.
Generally:
  – Think of textures as global memory for fragment programs, with built-in filtering
  – Just starting to be able to access textures in vertex programs too (NVIDIA hardware only, today)
Deferred shading
Projective texture mapping

Next topic: Cg
Many of the techniques we discuss in this class do not depend on programmable graphics hardware
  – But even those are often easier to implement!
And programmable graphics opens up an endless number of tricks and techniques that could not have been efficiently implemented before
So, the next topic is a brief intro to Cg
  – My apologies to those of you who’ve seen this
  – My apologies to those of you who haven’t

Acknowledgement & Aside
Much of this lecture comes from Bill Mark’s SIGGRAPH 2002 course talk on NVIDIA’s programmable graphics technology
For this reason, and because the lab is outfitted with NVIDIA cards, we will focus on NVIDIA tech
I try to mention similarities and differences with ATI, the other main GPU vendor, in lecture and slides
Note: many/most images are from NVIDIA as well

The Graphics Pipeline
A simplified graphics pipeline
  – Note that pipe widths vary
  – Many caches, FIFOs, and so on not shown
[Diagram: CPU (Application) → Vertices (3D) → GPU: Transform & Light → Xformed, Lit Vertices (2D) → Assemble Primitives → Screenspace triangles (2D) → Rasterize → Fragments (pre-pixels) → Shade → Final Pixels (Color, Depth). Graphics State and Video Memory (Textures) feed the GPU stages; Render-to-texture writes results back to video memory.]

GPU Pipeline: Transform
Transform & light (a.k.a. vertex processor)
  – Transform from “world space” to “image space”
  – Compute per-vertex lighting
Courtesy Mark Harris
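The transform step above can be sketched on the CPU as a matrix multiply followed by a perspective divide and a viewport mapping. This is only an illustration: `mat_vec`, `to_image_space`, and `TOY_PROJECTION` are hypothetical names, and the toy matrix is an assumption chosen to keep the arithmetic easy to follow, not the matrix any real driver uses.

```python
def mat_vec(m, v):
    """Multiply a 4x4 matrix (list of rows) by a 4-vector."""
    return [sum(m[r][c] * v[c] for c in range(4)) for r in range(4)]


def to_image_space(mvp, position, width, height):
    """Transform a 3D point to pixel coordinates.

    mvp is a hypothetical combined model-view-projection matrix.
    """
    x, y, z, w = mat_vec(mvp, [position[0], position[1], position[2], 1.0])
    # Perspective divide: clip space -> normalized device coordinates.
    ndc_x, ndc_y, ndc_z = x / w, y / w, z / w
    # Viewport mapping: [-1, 1] -> [0, width] x [0, height].
    return ((ndc_x + 1.0) * 0.5 * width, (ndc_y + 1.0) * 0.5 * height, ndc_z)


# Toy projection (an assumption for illustration): w receives -z, so points
# farther down the negative z axis shrink toward the image center.
TOY_PROJECTION = [
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.0],
    [0.0, 0.0, -1.0, 0.0],
]
```

For example, `to_image_space(TOY_PROJECTION, (1.0, 0.0, -2.0), 640, 480)` places the point three quarters of the way across a 640x480 image, at half its height.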

GPU Pipeline: Rasterize
Rasterizer
  – Convert geometric rep. (vertex) to image rep. (fragment)
      Fragment = image fragment
  – Pixel + associated data: color, depth, stencil, etc.
  – Interpolate per-vertex quantities across pixels
Courtesy Mark Harris
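A minimal sketch of what "interpolate per-vertex quantities across pixels" means, assuming a 2D triangle already in screen space and plain barycentric weights. Real rasterizers interpolate perspective-correctly and far more efficiently; `edge` and `rasterize` are hypothetical helper names.

```python
def edge(a, b, p):
    """Twice the signed area of the triangle (a, b, p)."""
    return (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0])


def rasterize(v0, v1, v2, c0, c1, c2, width, height):
    """Return {(x, y): color} for pixels whose centers lie inside the triangle."""
    area = edge(v0, v1, v2)
    fragments = {}
    if area == 0:
        return fragments  # a degenerate triangle produces no fragments
    for y in range(height):
        for x in range(width):
            p = (x + 0.5, y + 0.5)  # sample at the pixel center
            # Barycentric weights; dividing by the signed area makes the
            # inside test work for either vertex winding order.
            w0 = edge(v1, v2, p) / area
            w1 = edge(v2, v0, p) / area
            w2 = edge(v0, v1, p) / area
            if w0 >= 0 and w1 >= 0 and w2 >= 0:
                # Each fragment's color is a weighted mix of the vertex colors.
                fragments[(x, y)] = tuple(
                    w0 * a + w1 * b + w2 * c for a, b, c in zip(c0, c1, c2)
                )
    return fragments
```

Each dictionary entry plays the role of a fragment: a pixel location plus the data interpolated for it.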

GPU Pipeline: Shade
Fragment processors (multiple in parallel)
  – Compute a color for each pixel
  – Optionally read colors from textures (images)
Courtesy Mark Harris

The Modern Graphics Pipeline
[Diagram: CPU (Application) → Vertices (3D) → GPU: Vertex Processor (replaces Transform & Light) → Xformed, Lit Vertices (2D) → Assemble Primitives → Screenspace triangles (2D) → Rasterize → Fragments (pre-pixels) → Fragment Processor (replaces Shade) → Final Pixels (Color, Depth). Graphics State and Video Memory (Textures) feed the GPU stages; Render-to-texture writes results back to video memory.]
Programmable vertex processor!
Programmable pixel processor!

The Coming Soon Graphics Pipeline
[Diagram: CPU (Application) → Vertices (3D) → GPU: Vertex Processor → Xformed, Lit Vertices (2D) → Geometry Processor (replaces Assemble Primitives) → Screenspace triangles (2D) → Rasterize → Fragments (pre-pixels) → Fragment Processor → Final Pixels (Color, Depth). Graphics State and Video Memory (Textures) feed the GPU stages; Render-to-texture writes results back to video memory.]
Programmable primitive assembly!
More flexible memory access!

Precision
32-bit IEEE floating-point throughout pipeline
  – Framebuffer
  – Textures
  – Fragment processor
  – Vertex processor
  – Interpolants

Multiple data types in hardware
Can support 32-bit IEEE floating point throughout pipeline
  – Vertices, interpolants, framebuffer, textures, computations
Fragment processor also supports:
  – 16-bit “half” floating point, 12-bit fixed point
  – These may be faster than 32-bit
Framebuffer/textures also support:
  – Large variety of fixed-point formats
      E.g., classical 8-bit per component RGBA, BGRA, etc.
  – These formats use less memory bandwidth than FP32
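To get a feel for the precision trade-off among these formats, the sketch below rounds a value through IEEE half (FP16), IEEE single (FP32), and an 8-bit fixed-point channel using Python's standard `struct` module. The helper names are hypothetical, and the 8-bit mapping (scale by 255, round, clamp to [0, 1]) is the usual normalized convention, assumed here rather than taken from the slides. The 12-bit fixed-point format is not modeled.

```python
import struct


def round_to(fmt, value):
    """Round a Python float through the given struct format and back."""
    return struct.unpack(fmt, struct.pack(fmt, value))[0]


def to_half(value):
    return round_to("<e", value)  # IEEE 754 binary16 ("half")


def to_single(value):
    return round_to("<f", value)  # IEEE 754 binary32 (FP32)


def to_unorm8(value):
    """Quantize a [0, 1] value to an 8-bit fixed-point channel and back."""
    clamped = min(max(value, 0.0), 1.0)
    return round(clamped * 255) / 255
```

Comparing `to_half(0.1)` with `to_single(0.1)` shows the half-precision result sitting visibly further from 0.1, which is the cost paid for the smaller, potentially faster format.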

Vertex processor capabilities
4-vector FP32 operations
Condition codes + true data-dependent control flow
  – Conditional branches, subroutine calls, jump table
  – Useful for avoiding extra work, e.g.:
      Don’t do animation, skinning if vertex will be clipped
      Do displacement mapping only for vertices near silhouette
  – Transcendental arithmetic instructions (e.g. COS)
User clip-plane support
Texture reads (up to 4 textures, unlimited lookups)

Vertex processor limitations
No arbitrary memory write
No “vertex kill”
  – Can put vertex off-screen
  – Can make degenerate primitives
Only 32-bit texture formats supported

NV40-G70 vertex processor resources
instructions per program
Other statistics (NV30, not sure about NV40-G70):
  – 16 temporary 4-vector registers
  – 256 “uniform” parameter registers
  – 2 address registers (4-vector)
  – 6 clip-distance outputs

Fragment processor: texture mapping
Texture reads are just another instruction
Allows computed texture coordinates, nested to arbitrary depth
  – This is a big difference w/ NVIDIA and ATI right now
Allows multiple uses of a single texture unit
Optional LOD control – can specify filter extent
Think of it as a memory-read instruction, with optional user-controlled filtering

Fragment processor capabilities
Dynamic branching
Conditional fragment-kill instruction
Read access to window-space position
Read/write access to fragment Z (but not stencil)
Multiple render targets
Built-in derivative instructions
  – Partial derivatives w.r.t. screen-space x or y
  – Useful for anti-aliasing shaders
FP32, FP16, and fixed-point data
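The derivative instructions can be pictured as finite differences between neighboring pixels. The sketch below assumes a simplified model in which pixels are shaded in 2x2 quads and differences are taken inside the quad. That model and the name `quad_derivatives` are assumptions for illustration, not a description of any specific chip.

```python
def quad_derivatives(f, x, y):
    """Approximate df/dx and df/dy at pixel (x, y).

    f(x, y) is any per-pixel quantity, e.g. a texture coordinate.
    Differences are taken within the 2x2 pixel quad containing (x, y).
    """
    x0 = x - (x % 2)  # left column of the quad
    y0 = y - (y % 2)  # top row of the quad
    ddx = f(x0 + 1, y) - f(x0, y)  # horizontal neighbor difference
    ddy = f(x, y0 + 1) - f(x, y0)  # vertical neighbor difference
    return ddx, ddy
```

An anti-aliasing shader might use something like `max(abs(ddx), abs(ddy))` as a filter width: the faster a quantity changes from pixel to pixel, the more it needs to be blurred.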

Fragment processor limitations
Dynamic branching less efficient than vertex proc.
  – Especially for non-coherent branching (<~ 30x30 pixels)
  – Can do a lot with condition codes
No indexed reads from registers
  – I.e., no indexed arrays
  – Must use texture reads instead
No arbitrary memory write
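The "use texture reads instead" workaround amounts to storing the array in a 1D texture and sampling each texel at its center. A minimal CPU-side sketch, assuming nearest-neighbor filtering and clamp-to-edge addressing (`tex1d_nearest` and `indexed_read` are hypothetical names):

```python
def tex1d_nearest(texels, u):
    """Nearest-neighbor lookup into a 1D texture; u is nominally in [0, 1]."""
    n = len(texels)
    i = min(max(int(u * n), 0), n - 1)  # clamp-to-edge addressing
    return texels[i]


def indexed_read(texels, index):
    """Emulate table[index] with a texture read at the texel's center."""
    u = (index + 0.5) / len(texels)
    return tex1d_nearest(texels, u)
```

Sampling at `(index + 0.5) / n` rather than `index / n` keeps the lookup away from texel boundaries, where rounding could select the neighboring entry.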

Fragment processor resources
instructions
Nearly unlimited constants
  – Each constant counts as one instruction
16 texture units (NV30, still?), reuse as often as desired
10 FP32 x 4 perspective-correct inputs (e.g. tex coords)
Up to bit framebuffer “color” outputs
  – Can pack as 4 x FP32, 8 x FP16, etc.
Can also set the depth output
  – 24 or 32 bits, depending on stencil
  – Changing depth in fragment program may disable Z-optimizations

GPU vendor differences
Note: this slide will be dated almost instantly
NVIDIA: as described in previous slides
ATI hardware today (1900XT current high-end part):
  – No vertex texture fetch (but good render-to-vertex-array)
  – Far fewer levels of computed texture coordinates
  – Better at fine-grained (less coherent) dynamic branching
ATI Xenos (Xbox 360 chip):
  – Unified shader model: vertex proc == pixel proc
  – Scatter support: shaders can write arbitrary memory loc

Cg – “C for Graphics”
Cg is a high-level GPU programming language
Designed by NVIDIA and Microsoft
Competes with the (quite similar) GL Shading Language, a.k.a. GLslang

Programming in assembly is painful
Easier to read and modify
Cross-platform
Combine pieces
etc.

Assembly:
    …
    FRC R2.y, C11.w;
    ADD R3.x, C11.w, -R2.y;
    MOV H4.y, R2.y;
    ADD H4.x, -H4.y, C4.w;
    MUL R3.xy, R3.xyww, C11.xyww;
    ADD R3.xy, R3.xyww, C11.z;
    TEX H5, R3, TEX2, 2D;
    ADD R3.x, R3.x, C11.x;
    TEX H6, R3, TEX2, 2D;
    …

Cg:
    …
    L2weight = timeval - floor(timeval);
    L1weight = 1.0 - L2weight;
    ocoord1 = floor(timeval)/ /128.0;
    ocoord2 = ocoord /64.0;
    L1offset = f2tex2D(tex2, float2(ocoord1, 1.0/128.0));
    L2offset = f2tex2D(tex2, float2(ocoord2, 1.0/128.0));
    …
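The first two Cg statements above are easy to check by hand: the fractional part of `timeval` becomes one weight and its complement the other, so the two always sum to one. A small Python sketch of just those two lines, where `blend_weights` is a hypothetical wrapper name and the texture-coordinate lines are left out because parts of them are missing from the transcript:

```python
import math


def blend_weights(timeval):
    """Mirror of the Cg lines computing L1weight and L2weight."""
    l2_weight = timeval - math.floor(timeval)  # fractional part of timeval
    l1_weight = 1.0 - l2_weight                # complement, so the pair sums to 1
    return l1_weight, l2_weight
```

At `timeval = 2.25` the result is weighted three quarters toward the first sample and one quarter toward the second, which is the behavior expected of a crossfade between two consecutive entries.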

Some points in the design space
CPU languages
  – C – close to the hardware; general purpose
  – C++, Java, Lisp – require memory management
  – RenderMan – specialized for shading
Real-time shading languages
  – Stanford shading language
  – Creative Labs shading language

Design strategy
Start with C (and a bit of C++)
  – Minimizes number of decisions
  – Gives you known mistakes instead of unknown ones
Allow subsetting of the language
Add features desired for GPUs
  – To support GPU programming model
  – To enable high performance
Tweak to make it fit together well

How are GPUs different from CPUs?
1. GPU is a stream processor
  – Multiple programmable processing units
  – Connected by data flows
[Diagram: Application → Vertex Processor → Assembly & Rasterization → Fragment Processor → Framebuffer Operations → Framebuffer, with Textures feeding the processors.]
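The stream-processor idea can be sketched with Python generators: each stage is a small kernel that consumes one stream and produces another, and the stages are connected only through that data flow. The stage names and the scaling "kernel" are hypothetical, chosen to keep the example short.

```python
def vertex_stage(vertices, scale):
    """Per-vertex kernel: each output depends only on one input vertex."""
    for x, y in vertices:
        yield (x * scale, y * scale)


def assemble_stage(vertices):
    """Group the vertex stream into triangles, three vertices at a time."""
    batch = []
    for v in vertices:
        batch.append(v)
        if len(batch) == 3:
            yield tuple(batch)
            batch = []
    # Any leftover vertices (fewer than three) never form a primitive.


def run_pipeline(vertices, scale):
    # Stages are chained by data flow; no stage touches shared mutable state.
    return list(assemble_stage(vertex_stage(vertices, scale)))
```

Because no kernel invocation depends on another, the hardware is free to run many of them in parallel, which is where the GPU's throughput comes from.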

Cg separates vertex & fragment programs
[Diagram: Application → Vertex Processor → Assembly & Rasterization → Fragment Processor → Framebuffer Operations → Framebuffer, with Textures feeding the processors. A separate Program is attached to the Vertex Processor and to the Fragment Processor.]

Cg programs have two kinds of inputs
Varying inputs (streaming data)
  – e.g. normal vector – comes with each vertex
  – This is the default kind of input
Uniform inputs (a.k.a. graphics state)
  – e.g. modelview matrix
Note: Outputs are always varying

    vout MyVertexProgram(float4 normal,
                         uniform float4x4 modelview)
    {
      …
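The varying/uniform distinction can be mimicked on the CPU: the uniform is bound once for the whole batch, while the varying input changes on every invocation. In this sketch `my_vertex_program` is a hypothetical Python stand-in for the slide's `MyVertexProgram`, and simply multiplying the normal by the matrix is a simplification made for illustration.

```python
def my_vertex_program(normal, modelview):
    """Hypothetical stand-in for the slide's MyVertexProgram.

    normal    -- varying input: a different 4-vector for every vertex
    modelview -- uniform input: the same 4x4 matrix for the whole batch
    """
    return [sum(modelview[r][c] * normal[c] for c in range(4)) for r in range(4)]


def draw(normals, modelview):
    # The uniform is bound once; the varying stream supplies one value per call.
    return [my_vertex_program(n, modelview) for n in normals]
```

The returned list has one entry per input vertex, echoing the slide's note that outputs are always varying.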

Binding VP outputs to FP inputs
a) Let compiler do it
  – Define a single structure
  – Use it for vertex-program output
  – Use it for fragment-program input

    struct vout {
      float4 color;
      float4 texcoord;
      …
    };
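The single-structure approach can be illustrated with one shared record type that the vertex stage returns and the fragment stage accepts. This is a Python analogy only: `VOut`, `vertex_program`, and `fragment_program` are hypothetical, and a real pipeline would rasterize and interpolate between the two stages instead of passing the record through directly.

```python
from dataclasses import dataclass


@dataclass
class VOut:
    """Mirror of the slide's vout struct: the one type both stages share."""
    color: tuple
    texcoord: tuple


def vertex_program(position, color):
    # Hypothetical vertex program: passes color through, derives a texcoord.
    return VOut(color=color, texcoord=(position[0], position[1], 0.0, 1.0))


def fragment_program(inputs: VOut):
    # Hypothetical fragment program: scales the color by the first texcoord.
    s = inputs.texcoord[0]
    return tuple(channel * s for channel in inputs.color)
```

Because both functions refer to fields by name, neither needs to know which hardware register carries which value. That is the convenience the compiler-managed binding offers.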

Binding VP outputs to FP inputs
b) Do it yourself
  – Specify register bindings for VP outputs
  – Specify register bindings for FP inputs
  – May introduce HW dependence
  – Necessary for mixing Cg with assembly

    struct vout {
      float4 color    : TEX3;
      float4 texcoord : TEX5;
      …
    };

Some inputs and outputs are special
E.g. the position output from vert prog
  – This output drives the rasterizer
  – It must be marked

    struct vout {
      float4 color;
      float4 texcoord;
      float4 position : HPOS;
    };