Evolution of the Programmable Graphics Pipeline (Lecture 2)
Original slides by Suresh Venkatasubramanian; updates by Joseph Kider

Course Roadmap
► Graphics Pipeline (GLSL)
► GPGPU (GLSL)
  - Briefly
► GPU Computing (CUDA, OpenCL)
► Choose your own adventure
  - Student Presentation
  - Final Project
► Goal: Prepare you for your presentation and project

Lecture Outline
► A historical perspective on the graphics pipeline
  - Dimensions of innovation
  - Where we are today
  - Fixed-function vs. programmable pipelines
► A closer look at the fixed-function pipeline
  - Walk through the sequence of operations
  - Reinterpret these as stream operations
► We can program the fixed-function pipeline!
  - Some examples
► What constitutes data and memory, and how access affects program design

The evolution of the pipeline

Elements of the graphics pipeline:
1. A scene description: vertices, triangles, colors, lighting
2. Transformations that map the scene to a camera viewpoint
3. "Effects": texturing, shadow mapping, lighting calculations
4. Rasterizing: converting geometry into pixels
5. Pixel processing: depth tests, stencil tests, and other per-pixel operations

Parameters controlling the design of the pipeline:
1. Where is the boundary between CPU and GPU?
2. What transfer method is used?
3. What resources are provided at each step?
4. What units can access which GPU memory elements?
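The five elements above can be viewed as a chain of stream transformations. A toy Python sketch of that view (all function names and the scene layout are invented for illustration; a real pipeline runs these stages in hardware):

```python
# A toy model of the pipeline elements listed above, written as a chain of
# stage functions, each consuming the previous stage's output stream.

def transform(scene, view):
    # Stage 2: map each vertex from scene space toward the camera viewpoint
    # (here just a 2D translation, for brevity).
    tx, ty = view
    scene["vertices"] = [(x - tx, y - ty) for (x, y) in scene["vertices"]]
    return scene

def rasterize(scene):
    # Stage 4: convert geometry into pixels. Here we simply snap vertex
    # positions to integer pixel coordinates, producing a fragment stream.
    return [(round(x), round(y)) for (x, y) in scene["vertices"]]

def per_pixel_ops(fragments):
    # Stage 5: per-pixel operations. Here we just keep one fragment per
    # pixel location (a stand-in for depth/stencil resolution).
    return sorted(set(fragments))

scene = {"vertices": [(1.2, 3.7), (4.9, 0.1), (1.2, 3.7)]}
pixels = per_pixel_ops(rasterize(transform(scene, view=(1, 0))))
print(pixels)  # [(0, 4), (4, 0)]
```

The point of the sketch is the shape of the computation: a stream of vertices flows through fixed stages, and each stage is a map or filter over that stream.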

Generation I: 3dfx Voodoo (1996)
► One of the first true 3D game cards
► Worked by supplementing a standard 2D video card
► Did not do vertex transformations: these were done on the CPU
► Did do texture mapping and z-buffering
[Pipeline diagram: the CPU performs vertex transforms; across the PCI bus, the GPU performs primitive assembly, rasterization and interpolation, and raster operations, writing to the frame buffer]

Aside: Mario Kart 64
► High fragment load / low vertex load

Aside: Mario Kart Wii
► High fragment load / low vertex load?

Generation II: GeForce/Radeon 7500 (1998)
► Main innovation: shifting the transformation and lighting calculations to the GPU
► Allowed multi-texturing: bump maps, light maps, and others
► Faster AGP bus instead of PCI
[Pipeline diagram: the GPU now performs vertex transforms, primitive assembly, rasterization and interpolation, and raster operations, writing to the frame buffer; connected over the AGP bus]

Generation III: GeForce3/Radeon 8500 (2001)
► For the first time, allowed a limited amount of programmability in the vertex pipeline (small vertex shaders)
► Also allowed volume texturing and multi-sampling (for antialiasing)
[Pipeline diagram: as Generation II, with small vertex shaders feeding the vertex transform stage; AGP bus]

Generation IV: Radeon 9700/GeForce FX (2002)
► The first generation of fully programmable graphics cards
► Different versions have different resource limits on fragment/vertex programs
[Pipeline diagram: programmable vertex shader and programmable fragment processor, with texture memory accessible to the fragment processor; AGP bus]

Generation IV.V: GeForce 6/X800 (2004)
► Simultaneous rendering to multiple buffers
► True conditionals and loops
► PCIe bus
► Vertex texture fetch
[Pipeline diagram: as Generation IV, with the vertex shader able to fetch from texture memory; PCIe bus]
Slide adapted from Suresh Venkatasubramanian and Joe Kider

NVIDIA NV40 Architecture
► 6 vertex shader units
► 16 fragment shader units
► Vertex texture fetch
[Image from GPU Gems 2]

D3D 10 Pipeline
[Image from David Blythe]

Generation IV.V: GeForce 6/X800 (2004)
Not exactly a quantum leap, but…
► Simultaneous rendering to multiple buffers
► True conditionals and loops
► Higher-precision throughput in the pipeline (64 bits end-to-end, compared to 32 bits earlier)
► PCIe bus
► More memory / program length / texture accesses
► Texture access by the vertex shader
[Pipeline diagram: programmable vertex shader and programmable fragment processor, both with access to texture memory]

Generation V: GeForce 8800/HD 2900 (2006)
A complete quantum leap:
► Ground-up redesign of the GPU
► Support for DirectX 10, and all it implies (more on this later)
► Geometry shader
► Support for general GPU programming
► Shared memory (NVIDIA only)
[Pipeline diagram: input assembler → programmable vertex shader → programmable geometry shader → programmable pixel shader → raster operations → output merger]

Fixed-function pipeline
[Pipeline diagram: a 3D application or game issues 3D API commands through the 3D API (OpenGL or Direct3D); the GPU command and data stream crosses the CPU-GPU boundary (AGP/PCIe) to the GPU front end, which emits the vertex index stream; the vertex processor turns pre-transformed vertices into transformed vertices; primitive assembly produces assembled primitives; rasterization and interpolation emit the pixel location stream and pre-transformed fragments; the fragment processor produces transformed fragments; raster operations apply pixel updates to the frame buffer]

Geometry Shaders: Point Sprites

Geometry Shaders
[Image from David Blythe]

NVIDIA G80 Architecture
[Slide from David Luebke]

Why Unify Shader Processors?
[Slide from David Luebke]

Unified Shader Processors
[Slide from David Luebke]

Terminology

Shader Model | Direct3D | OpenGL | Video card example
2            | 9        | 2.x    | NVIDIA GeForce 6800, ATI Radeon X800
4            | 10       | 3.x    | NVIDIA GeForce 8800, ATI Radeon HD 2900
5            | 11       | 4.x    | NVIDIA GeForce GTX 480, ATI Radeon HD 5870

Shader Capabilities
[Table courtesy of A K Peters, Ltd.]

Evolution of the Programmable Graphics Pipeline
[Slide from Mike Houston]

Evolution of the Programmable Graphics Pipeline
► Not covered today:
  - SM 5 / D3D 11 / GL 4
  - Tessellation shaders (*cough* student presentation *cough*)
► Later this semester: NVIDIA Fermi
  - Dual warp scheduler
  - Configurable L1 / shared memory
  - Double precision
  - …

New Tool: AMD System Monitor
► Released 01/04/2011

A closer look at the fixed-function pipeline

Pipeline Input
► Vertex: position (x, y, z), color (r, g, b, a), normal (Nx, Ny, Nz), texture coordinates (tx, ty, [tz])
► Image: F(x, y) = (r, g, b, a)
► Material properties

ModelView Transformation
► Vertices are mapped from object space to world space
► M = model transformation (scene)
► V = view transformation (camera)

  [X']         [X]
  [Y']  = M·V· [Y]
  [Z']         [Z]
  [W']         [1]

Each matrix transform is applied to each vertex in the input stream. Think of this as a kernel operator.
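A minimal sketch of this stage as a kernel operation: one 4x4 matrix applied uniformly to every vertex of the stream. The matrices and vertices below are made-up illustration values.

```python
# Apply a 4x4 matrix to a homogeneous 4-vector.
def mat_vec(m, v):
    return [sum(m[r][c] * v[c] for c in range(4)) for r in range(4)]

# Compose two 4x4 matrices.
def mat_mul(a, b):
    return [[sum(a[r][k] * b[k][c] for k in range(4)) for c in range(4)]
            for r in range(4)]

# Example: model matrix M translates by (1, 2, 3); view matrix V is identity.
M = [[1, 0, 0, 1], [0, 1, 0, 2], [0, 0, 1, 3], [0, 0, 0, 1]]
V = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]]
MV = mat_mul(M, V)   # composed once, then applied to every vertex

vertices = [[0, 0, 0, 1], [1, 1, 1, 1]]          # homogeneous (X, Y, Z, 1)
transformed = [mat_vec(MV, v) for v in vertices]  # the kernel/map step
print(transformed)  # [[1, 2, 3, 1], [2, 3, 4, 1]]
```

Note the structure: M·V is computed once, and the only per-element work is the same matrix-vector product for every vertex, which is exactly what makes the stage a stream kernel.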

Lighting
Lighting information is combined with normals and other parameters at each vertex in order to create new colors:

  Color(v) = emissive + ambient + diffuse + specular

Each term on the right-hand side is a function of the vertex color, position, normal, and material properties.

Clipping/Projection/Viewport (3D)
► More matrix transformations operate on a vertex to map it into viewport space
► Note that a vertex may be eliminated from the input stream (if it is clipped)
► The viewport is two-dimensional; however, the vertex z-value is retained for depth testing

The clip test is the first example of a conditional in the pipeline. However, it is not a fully general conditional. Why?
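The restricted nature of this conditional is easy to see in a sketch: clipping can only *discard* elements of the stream, never branch into arbitrary computation. The cubic view volume bounds below are illustration values.

```python
# Clip-test sketch: drop vertices outside a [-1, 1]^3 view volume.
# The conditional can only filter the stream; it cannot compute anything new.
def clip(vertices, lo=-1.0, hi=1.0):
    return [(x, y, z) for (x, y, z) in vertices
            if lo <= x <= hi and lo <= y <= hi and lo <= z <= hi]

stream = [(0.5, 0.5, 0.0), (2.0, 0.0, 0.0), (-0.9, 0.3, 0.8)]
print(clip(stream))  # the vertex at x = 2.0 is eliminated
```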

Rasterization + Interpolation
► All primitives are now converted to fragments
► Data type change! Vertices become fragments
► Fragment attributes: (r, g, b, a), (x, y, z, w), (tx, ty), …
► Texture coordinates are interpolated from the texture coordinates of the vertices

This gives us a linear interpolation operator for free. VERY USEFUL!
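What "linear interpolation for free" means, as a sketch: fragments along an edge receive attribute values blended from the two endpoint vertices. The texture coordinates below are made-up values.

```python
# Linear interpolation of a fragment attribute between two vertices.
def lerp(a, b, t):
    return tuple(x + (y - x) * t for x, y in zip(a, b))

# Texture coordinates at the two endpoint vertices of an edge:
uv0, uv1 = (0.0, 0.0), (1.0, 0.5)

# Fragments at parametric positions along the edge get interpolated values:
print([lerp(uv0, uv1, t) for t in (0.0, 0.5, 1.0)])
# [(0.0, 0.0), (0.5, 0.25), (1.0, 0.5)]
```

The hardware performs this interpolation for every fragment attribute anyway, which is why later tricks (like the shaded cones below) can piggyback on it at no extra cost.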

Per-fragment operations
► The rasterizer produces a stream of fragments
► Each fragment undergoes a series of tests of increasing complexity

Test 1: Scissor
  If (fragment lies in a fixed rectangle) let it pass, else discard it

Test 2: Alpha
  If (fragment.a >= reference value) let it pass, else discard it

The scissor test is analogous to the clipping operation, in fragment space instead of vertex space. The alpha test is a slightly more general conditional. Why?

Per-fragment operations
► Stencil test: S(x, y) is the stencil buffer value for the fragment with coordinates (x, y)
  - If f(S(x, y)), let the pixel pass, else kill it
  - Update S(x, y) conditionally, depending on f(S(x, y)) and g(D(x, y))
► Depth test: D(x, y) is the depth buffer value
  - If g(D(x, y)), let the pixel pass, else kill it
  - Update D(x, y) conditionally

Per-fragment operations
► Stencil and depth tests are more general conditionals. Why?
► These are the only tests that can change the state of internal storage (stencil buffer, depth buffer)
► One of the update operations for the stencil buffer is a "count" operation. Remember this!
► Unfortunately, stencil and depth buffers have lower precision (8 and 24 bits, respectively)
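The test chain from the last three slides can be sketched as a sequence of kill-or-pass predicates. This is an illustrative CPU model only: the buffer layout, predicate choices, and field names are invented, and the stencil update is omitted for brevity.

```python
# Run a fragment through scissor, alpha, and depth tests in order.
# Only the depth test can update internal storage (the depth buffer).
def run_tests(frag, depth_buf, scissor, alpha_ref):
    x, y, z, a = frag["x"], frag["y"], frag["z"], frag["a"]
    # Scissor test: fixed-rectangle containment.
    if not (scissor[0] <= x < scissor[2] and scissor[1] <= y < scissor[3]):
        return False
    # Alpha test: compare against a reference value.
    if a < alpha_ref:
        return False
    # Depth test: pass only if nearer than the stored value, then update.
    if z >= depth_buf[(x, y)]:
        return False
    depth_buf[(x, y)] = z
    return True

depth = {(0, 0): 1.0}
f1 = {"x": 0, "y": 0, "z": 0.5, "a": 1.0}
f2 = {"x": 0, "y": 0, "z": 0.7, "a": 1.0}  # farther than f1's stored depth
print(run_tests(f1, depth, scissor=(0, 0, 4, 4), alpha_ref=0.5))  # True
print(run_tests(f2, depth, scissor=(0, 0, 4, 4), alpha_ref=0.5))  # False
```

Note that the scissor and alpha tests are pure filters, while the depth test both filters and writes state, which is exactly the asymmetry the slides emphasize.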

Post-processing
► Blending: pixels are accumulated into final framebuffer storage

  new-val = old-val op pixel-value

If op is +, we can sum all the (say) red components of pixels that pass all tests.

Problem: in generations <= IV, blending can only be done in 8-bit channels (the channels sent to the video card), so precision is limited. We could use accumulation buffers, but they are very slow.
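A sketch of additive blending with the 8-bit precision problem made explicit; the channel values are illustration numbers.

```python
# Additive blending into one 8-bit framebuffer channel.
# Values saturate at the channel maximum instead of wrapping.
def blend_add(old, new, bits=8):
    limit = (1 << bits) - 1       # 255 for an 8-bit channel
    return min(old + new, limit)

fb = 0
for red in (100, 100, 100):       # three fragments pass all the tests
    fb = blend_add(fb, red)
print(fb)  # 255: the true sum (300) is lost to the 8-bit clamp
```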

Quick Review: Buffers
► Color buffers
  - Front-left
  - Front-right
  - Back-left
  - Back-right
► Depth buffer (z-buffer)
► Stencil buffer
► Accumulation buffer

Quick Review: Tests
► Scissor test: If (fragment lies inside rectangle) keep, else delete
► Alpha test: compare the fragment's alpha value against a reference value
► Stencil test: compare the fragment against a stencil map
► Depth test: compare a fragment's depth to the depth value already present in the depth buffer. Comparison functions:
  - Never
  - Always
  - Less
  - Less-Equal
  - Greater-Equal
  - Greater
  - Not-Equal

Readback = Feedback
What is the output of a "computation"?
1. Display on screen
2. Render to a buffer and retrieve the values (readback)

Readbacks are VERY slow!
► PCI and AGP buses are asymmetric: DMA enables fast transfer TO the graphics card. The reverse transfer has traditionally not been required, and is much slower.
► PCIe is symmetric, but still very slow compared to GPU speeds.
► This motivates the idea of a "pass" being an atomic, "unit cost" operation.

What options do we have?
1. Render to off-screen buffers like the accumulation buffer
2. Copy from the framebuffer to texture memory?
3. Render directly to a texture?

Time for a puzzle…

An Example: Voronoi Diagrams.

Definition
► You are given n sites (p1, p2, p3, …, pn) in the plane (think of each site as having a color)
► Any point p in the plane is closest to some site pj. Color p with color j.
► Compute this colored map on the plane.

In other words, compute the nearest-neighbor diagram of the sites.

Example
So how do we do this on the graphics card? Note that this does not use any programmable features of the card.

Hint: Think in one dimension higher The lower envelope of “cones” centered at the points is the Voronoi diagram of this set of points.

The Procedure
► To compute the lower envelope, we need to determine, at each pixel, the fragment having the smallest depth value.
► This can be done with a simple depth test.
  - Allow a fragment to pass only if it is smaller than the current depth buffer value, and update the buffer accordingly.
► The fragment that survives has the correct color.
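A CPU sketch of what the cone trick computes: at each pixel, the depth of site i's cone equals the distance to site i, so the depth test's per-pixel minimum leaves each pixel colored by its nearest site. The grid size and site positions are illustration values.

```python
# Brute-force model of the depth-test Voronoi computation:
# each pixel keeps the index of the site whose "cone" is nearest (smallest depth).
def voronoi(width, height, sites):
    image = {}
    for x in range(width):
        for y in range(height):
            # Depth of site i's cone at this pixel = distance to site i.
            dists = [((x - sx) ** 2 + (y - sy) ** 2) ** 0.5
                     for (sx, sy) in sites]
            image[(x, y)] = min(range(len(sites)), key=lambda i: dists[i])
    return image

img = voronoi(4, 4, sites=[(0, 0), (3, 3)])
print(img[(0, 1)], img[(3, 2)])  # 0 1 : each pixel takes its nearest site
```

On the GPU, the inner minimum is not a loop at all: every cone is rasterized once, and the depth buffer resolves the minimum per pixel in hardware.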

Let's make this more complicated
► The 1-median of a set of sites is a point q* that minimizes the sum of distances from all sites to itself:

  q* = arg min_q Σ_p d(p, q)

A First Step
Can we compute, for each pixel q, the value

  F(q) = Σ_p d(p, q) ?

We can use the cone trick from before and, instead of computing the minimum depth value, compute the sum of all depth values using blending. What's the catch?

We can't blend depth values!
► Using texture interpolation helps here.
► Instead of drawing a single cone, we draw a shaded cone, with an appropriately constructed texture map.
► Then, a fragment having depth z has color component 1.0 * z.
► Now we can blend the colors.
► OpenGL has an aggregation operator that will return the overall min.

Warning: we are ignoring issues of precision.
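Putting the two passes together, here is a CPU sketch of the blended-cones computation: F(q) is the sum of distances at every pixel (what additive blending of shaded cones produces), and the answer is the pixel minimizing F (what a min-aggregation pass returns). The grid and sites are illustration values, and precision issues are ignored as the slide warns.

```python
# Brute-force model of the 1-median computation described above.
def one_median(width, height, sites):
    best, best_val = None, float("inf")
    for x in range(width):
        for y in range(height):
            # "Blending" pass: sum the shaded-cone values (distances) at (x, y).
            f = sum(((x - sx) ** 2 + (y - sy) ** 2) ** 0.5
                    for (sx, sy) in sites)
            # "Min-aggregation" pass: keep the pixel with the smallest F.
            if f < best_val:
                best, best_val = (x, y), f
    return best

print(one_median(5, 5, sites=[(0, 0), (4, 0), (2, 4)]))  # (2, 1)
```

On the GPU, the two passes are exactly the two hardware mechanisms named above: additive blending computes F at every pixel in parallel, and the aggregation operator extracts the minimum.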

Now we apply a streaming perspective…

Two kinds of data
► Stream data (data associated with vertices and fragments)
  - Color/position/texture coordinates
  - Functionally similar to member variables in a C++ object
  - Can be used for limited message passing: I modify an object's state and send it to you
► "Persistent" data (associated with buffers)
  - Depth, stencil, textures
  - Can be modified by multiple fragments in a single pass
  - Functionally similar to a global array, BUT each fragment only gets one location to change
  - Can be used to communicate across passes

Who has access?
► Memory "connectivity" in the graphics use of a GPU is tricky.
► In a traditional C program, all global variables can be written by all routines.
► In the fixed-function pipeline, certain data is private:
  - A fragment cannot change the depth or stencil value of a location different from its own.
  - The framebuffer can be copied to a texture; a depth buffer cannot be copied this way, and neither can a stencil buffer.
  - Only a stencil buffer can count (efficiently).
► In the fixed-function pipeline, depth and stencil buffers can be used in a multi-pass computation only via readbacks.
► A texture cannot be written directly.
► In programmable GPUs, the memory connectivity becomes more open, but there are still constraints.

Understanding access constraints and memory "connectivity" is a key step in programming the GPU.

How does this relate to stream programs?
► The most important question to ask when programming the GPU is: what can I do in one pass?
► Limitations on memory connectivity mean that a step in a computation may often have to be deferred to a new pass.
► For example, when computing the second smallest element, we could not store the current minimum in read/write memory.
► Thus, the "communication" of this value has to happen across a pass.
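The second-smallest example can be sketched as the two-pass pattern it forces. This is a CPU model of the structure, not GPU code: pass 1 computes the minimum (a min-aggregation pass), and pass 2 re-streams the elements, discarding anything equal to the stored minimum so that a second min-aggregation finds the second smallest. The stored minimum is only *read* in pass 2, mirroring the constraint that a pass cannot both use and update that value in general read/write memory.

```python
# Two-pass sketch: the first pass's result is fixed ("readback") before the
# second pass runs; no single pass sees both the running minimum and the data.
def second_smallest(depths):
    first = min(depths)                           # pass 1: min-aggregation
    survivors = [d for d in depths if d > first]  # pass 2: filter out the min
    return min(survivors)                         # pass 2: min over the rest

print(second_smallest([0.8, 0.3, 0.5, 0.3]))  # 0.5
```

Note that duplicates of the minimum are also discarded in pass 2, which matches treating the stored value as a strict cutoff rather than removing a single element.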

Graphics pipeline
[Pipeline diagram: the fixed-function pipeline redrawn as a vertex pipeline and a fragment pipeline. The 3D application or game sends 3D API commands through OpenGL or Direct3D; the GPU command and data stream crosses the CPU-GPU boundary to the GPU front end, which emits the vertex index stream. The GPU front end and primitive assembly form the vertex pipeline (producing assembled primitives); rasterization and interpolation (producing the pixel location stream), raster operations, and the frame buffer (receiving pixel updates) form the fragment pipeline]