David Luebke 1 11/24/2015 Programmable Graphics Hardware.

Slides:



Advertisements
Similar presentations
Fragment level programmability in OpenGL Evan Hart
Advertisements

COMPUTER GRAPHICS CS 482 – FALL 2014 NOVEMBER 10, 2014 GRAPHICS HARDWARE GRAPHICS PROCESSING UNITS PARALLELISM.
Lecture 38: Chapter 7: Multiprocessors Today’s topic –Vector processors –GPUs –An example 1.
Understanding the graphics pipeline Lecture 2 Original Slides by: Suresh Venkatasubramanian Updates by Joseph Kider.
CS-378: Game Technology Lecture #9: More Mapping Prof. Okan Arikan University of Texas, Austin Thanks to James O’Brien, Steve Chenney, Zoran Popovic, Jessica.
9/25/2001CS 638, Fall 2001 Today Shadow Volume Algorithms Vertex and Pixel Shaders.
Informationsteknologi Wednesday, December 12, 2007Computer Graphics - Class 171 Today’s class OpenGL Shading Language.
The Programmable Graphics Hardware Pipeline Doug James Asst. Professor CS & Robotics.
CS5500 Computer Graphics © Chun-Fa Chang, Spring 2007 CS5500 Computer Graphics April 19, 2007.
GLSL I May 28, 2007 (Adapted from Ed Angel’s lecture slides)
A Crash Course on Programmable Graphics Hardware Li-Yi Wei 2005 at Tsinghua University, Beijing.
Cg: C for Graphics Jon Moon Based on slides by Eugene Lee.
ATI GPUs and Graphics APIs Mark Segal. ATI Hardware X1K series 8 SIMD vertex engines, 16 SIMD fragment (pixel) engines 3-component vector + scalar ALUs.
The programmable pipeline Lecture 10 Slide Courtesy to Dr. Suresh Venkatasubramanian.
Vertex & Pixel Shaders CPS124 – Computer Graphics Ferdinand Schober.
Cg Overview Cg is the High Level language from NVIDIA for programming GPUs, developed in close collaboration with Microsoft Cg is 100% compatible with.
Computer Science – Game DesignUC Santa Cruz Adapted from Jim Whitehead’s slides Shaders Feb 18, 2011 Creative Commons Attribution 3.0 (Except copyrighted.
GPU Tutorial 이윤진 Computer Game 2007 가을 2007 년 11 월 다섯째 주, 12 월 첫째 주.
GPU Graphics Processing Unit. Graphics Pipeline Scene Transformations Lighting & Shading ViewingTransformations Rasterization GPUs evolved as hardware.
Cg Kevin Bjorke GDC NVIDIA CONFIDENTIAL A Whole New World with Cg Graphics Program Written in Cg “C” for Graphics Compiled & Optimized Low Level,
Cg – C for Graphics Eric Vidal CS 280. History General graphics processing languages – Renderman shading language (1988) Assembly languages for graphics.
REAL-TIME VOLUME GRAPHICS Christof Rezk Salama Computer Graphics and Multimedia Group, University of Siegen, Germany Eurographics 2006 Real-Time Volume.
GPU Programming Robert Hero Quick Overview (The Old Way) Graphics cards process Triangles Graphics cards process Triangles Quads.
Enhancing GPU for Scientific Computing Some thoughts.
Programmable Pipelines. Objectives Introduce programmable pipelines ­Vertex shaders ­Fragment shaders Introduce shading languages ­Needed to describe.
Real-time Graphical Shader Programming with Cg (HLSL)
Geometric Objects and Transformations. Coordinate systems rial.html.
Graphics Graphics Korea University cgvr.korea.ac.kr 1 Using Vertex Shader in DirectX 8.1 강 신 진
OpenGL Shading Language (Advanced Computer Graphics) Ernest Tatum.
GPU Shading and Rendering Shading Technology 8:30 Introduction (:30–Olano) 9:00 Direct3D 10 (:45–Blythe) Languages, Systems and Demos 10:30 RapidMind.
Programmable Pipelines. 2 Objectives Introduce programmable pipelines ­Vertex shaders ­Fragment shaders Introduce shading languages ­Needed to describe.
Chris Kerkhoff Matthew Sullivan 10/16/2009.  Shaders are simple programs that describe the traits of either a vertex or a pixel.  Shaders replace a.
Cg Programming Mapping Computational Concepts to GPUs.
Real-Time Shading Using Programmable Graphics Hardware Introduction, Setup and Examples Wan-Chun Ma National Taiwan University.
1 Dr. Scott Schaefer Programmable Shaders. 2/30 Graphics Cards Performance Nvidia Geforce 6800 GTX 1  6.4 billion pixels/sec Nvidia Geforce 7900 GTX.
The programmable pipeline Lecture 3.
CSE 690: GPGPU Lecture 6: Cg Tutorial Klaus Mueller Computer Science, Stony Brook University.
The GPU Revolution: Programmable Graphics Hardware David Luebke University of Virginia.
Computer Graphics The Rendering Pipeline - Review CO2409 Computer Graphics Week 15.
Finding Body Parts with Vector Processing Cynthia Bruyns Bryan Feldman CS 252.
GRAPHICS PIPELINE & SHADERS SET09115 Intro to Graphics Programming.
CS662 Computer Graphics Game Technologies Jim X. Chen, Ph.D. Computer Science Department George Mason University.
Programmable Pipelines Ed Angel Professor of Computer Science, Electrical and Computer Engineering, and Media Arts Director, Arts Technology Center University.
The Cg Runtime Cyril Zeller. Cg Pipeline Graphics programs are written in Cg and compiled to low-level assembly code... Cg Runtime API...
A User-Programmable Vertex Engine Erik Lindholm Mark Kilgard Henry Moreton NVIDIA Corporation Presented by Han-Wei Shen.
CSE 381 – Advanced Game Programming GLSL. Rendering Revisited.
Computer Graphics 3 Lecture 6: Other Hardware-Based Extensions Benjamin Mora 1 University of Wales Swansea Dr. Benjamin Mora.
Processor Structure and Function Chapter8:. CPU Structure  CPU must:  Fetch instructions –Read instruction from memory  Interpret instructions –Instruction.
Week 3 Lecture 4: Part 2: GLSL I Based on Interactive Computer Graphics (Angel) - Chapter 9.
David Luebke 1 1/25/2016 Programmable Graphics Hardware.
09/25/03CS679 - Fall Copyright Univ. of Wisconsin Last Time Shadows Stage 2 outline.
Ray Tracing using Programmable Graphics Hardware
What are shaders? In the field of computer graphics, a shader is a computer program that runs on the graphics processing unit(GPU) and is used to do shading.
Mesh Skinning Sébastien Dominé. Agenda Introduction to Mesh Skinning 2 matrix skinning 4 matrix skinning with lighting Complex skinning for character.
OpenGL Shading Language
Programmable Graphics Hardware CS 446: Real-Time Rendering & Game Technology David Luebke University of Virginia.
Programming with OpenGL Part 3: Shaders Ed Angel Professor of Emeritus of Computer Science University of New Mexico 1 E. Angel and D. Shreiner: Interactive.
GLSL I.  Fixed vs. Programmable  HW fixed function pipeline ▪ Faster ▪ Limited  New programmable hardware ▪ Many effects become possible. ▪ Global.
An Introduction to the Cg Shading Language Marco Leon Brandeis University Computer Science Department.
COMP 175 | COMPUTER GRAPHICS Remco Chang1/XX13 – GLSL Lecture 13: OpenGL Shading Language (GLSL) COMP 175: Computer Graphics April 12, 2016.
第七课 GPU & GPGPU.
A Crash Course on Programmable Graphics Hardware
Graphics Processing Unit
Chapter 6 GPU, Shaders, and Shading Languages
Phil Tayco Slide version 1.0 Created Oct 2, 2017
UMBC Graphics for Games
CS5500 Computer Graphics April 17, 2006 CS5500 Computer Graphics
RADEON™ 9700 Architecture and 3D Performance
CIS 441/541: Introduction to Computer Graphics Lecture 15: shaders
CS 480/680 Fall 2011 Dr. Frederick C Harris, Jr. Computer Graphics
Presentation transcript:

David Luebke 1 11/24/2015 Programmable Graphics Hardware

David Luebke 2 11/24/2015 Admin ● Handout: Cg in two pages ● OLS 002a lab ● Impact on homework

David Luebke 3 11/24/2015 Acknowledgement & Aside ● The bulk of this lecture comes from slides from Bill Mark’s SIGGRAPH 2002 course talk on NVIDIA’s programmable graphics technology ● For this reason, and because the lab is outfitted with GeForce 3 cards, we will focus on NVIDIA tech

David Luebke 4 11/24/2015 Outline ● Programmable graphics ■ NVIDIA’s next-generation technology: GeForceFX (code name NV30) ● Programming programmable graphics ■ NVIDIA’s Cg language

David Luebke 5 11/24/2015 GPU Programming Model Application Vertex Processor Fragment Processor Assembly & Rasterization Framebuffer Operations Framebuffer GPU CPU Textures

David Luebke 6 11/24/2015 ● Framebuffer ● Textures ● Fragment processor ● Vertex processor ● Interpolants 32-bit IEEE floating-point throughout pipeline

David Luebke 7 11/24/2015 Hardware supports several other data types ● Fragment processor also supports: ■ 16-bit “half” floating point ■ 12-bit fixed point ■ These may be faster than 32-bit on some HW ● Framebuffer/textures also support: ■ Large variety of fixed-point formats ■ E.g., classical 8-bit per component ■ These formats use less memory bandwidth than FP32

David Luebke 8 11/24/2015 Vertex processor capabilities ● 4-vector FP32 operations, as in GeForce3/4 ● True data-dependent control flow ■ Conditional branch instruction ■ Subroutine calls, up to 4 deep ■ Jump table (for switch statements) ● Condition codes ● New arithmetic instructions (e.g. COS) ● User clip-plane support

David Luebke 9 11/24/2015 Vertex processor has high resource limits ● 256 instructions per program (effectively much higher w/branching) ● 16 temporary 4-vector registers ● 256 “uniform” parameter registers ● 2 address registers (4-vector) ● 6 clip-distance outputs

David Luebke 10 11/24/2015 Fragment processor has clean instruction set ● General and orthogonal instructions ● Much better than previous generation ● Same syntax as vertex processor: MUL R0, R1.xyz, R2.yxw; ● Full set of arithmetic instructions: RCP, RSQ, COS, EXP, …

David Luebke 11 11/24/2015 Fragment processor has flexible texture mapping ● Texture reads are just another instruction (TEX, TXP, or TXD) ● Allows computed texture coordinates, nested to arbitrary depth ● Allows multiple uses of a single texture unit ● Optional LOD control – specify filter extent ● Think of it as… A memory-read instruction, with optional user-controlled filtering

David Luebke 12 11/24/2015 Additional fragment processor capabilities ● Read access to window-space position ● Read/write access to fragment Z ● Built-in derivative instructions ■ Partial derivatives w.r.t. screen-space x or y ■ Useful for anti-aliasing ● Conditional fragment-kill instruction ● FP32, FP16, and fixed-point data

David Luebke 13 11/24/2015 Fragment processor limitations ● No branching ■ But, can do a lot with condition codes ● No indexed reads from registers ■ Use texture reads instead ● No memory writes

David Luebke 14 11/24/2015 Fragment processor has high resource limits ● 1024 instructions ● 512 constants or uniform parameters ■ Each constant counts as one instruction ● 16 texture units ■ Reuse as many times as desired ● 8 FP32 x 4 perspective-correct inputs ● 128-bit framebuffer “color” output (use as 4 x FP32, 8 x FP16, etc…)

David Luebke 15 11/24/2015 NV30 CineFX Technology Summary Application Vertex Processor Fragment Processor Assembly & Rasterization Framebuffer Operations Framebuffer Textures FP32 throughout pipeline Clean instruction sets True branching in vertex processor Dependent texture in fragment processor High resource limits

David Luebke 16 11/24/2015 Programming in assembly is painful … FRC R2.y, C11.w; ADD R3.x, C11.w, -R2.y; MOV H4.y, R2.y; ADD H4.x, -H4.y, C4.w; MUL R3.xy, R3.xyww, C11.xyww; ADD R3.xy, R3.xyww, C11.z; TEX H5, R3, TEX2, 2D; ADD R3.x, R3.x, C11.x; TEX H6, R3, TEX2, 2D; … … L2weight = timeval – floor(timeval); L1weight = 1.0 – L2weight; ocoord1 = floor(timeval)/ /128.0; ocoord2 = ocoord /64.0; L1offset = f2tex2D(tex2, float2(ocoord1, 1.0/128.0)); L2offset = f2tex2D(tex2, float2(ocoord2, 1.0/128.0)); … L2weight = timeval – floor(timeval); L1weight = 1.0 – L2weight; ocoord1 = floor(timeval)/ /128.0; ocoord2 = ocoord /64.0; L1offset = f2tex2D(tex2, float2(ocoord1, 1.0/128.0)); L2offset = f2tex2D(tex2, float2(ocoord2, 1.0/128.0)); … Easier to read and modify Cross-platform Combine pieces etc. Assembly

David Luebke 17 11/24/2015 Quick Demo

David Luebke 18 11/24/2015 Cg – C for Graphics ● Cg is a GPU programming language ● Designed by NVIDIA and Microsoft ● Compilers available in beta versions from both companies

David Luebke 19 11/24/2015 Design goals for Cg ● Enable algorithms to be expressed… ■ Clearly, and ■ Efficiently ● Provide interface continuity ■ Focus on DX9-generation HW and beyond ■ But provide support for DX8-class HW too ■ Support both OpenGL and Direct3D ● Allow easy, incremental adoption

David Luebke 20 11/24/2015 Easy adoption for applications ● Avoid owning the application’s data ■ No scene graph ■ No buffering of vertex data ● Compiler sits on top of existing APIs ■ User can examine assembly-code output ■ Can compile either at run time, or at application- development time ● Allow partial adoption e.g. Use Cg vertex program with assembly fragment program ● Support current hardware

David Luebke 21 11/24/2015 Some points in the design space ● CPU languages ■ C – close to the hardware; general purpose ■ C++, Java, lisp – require memory management ■ RenderMan – specialized for shading ● Real-time shading languages ■ Stanford shading language ■ Creative Labs shading language

David Luebke 22 11/24/2015 Design strategy ● Start with C (and a bit of C++) ■ Minimizes number of decisions ■ Gives you known mistakes instead of unknown ones ● Allow subsetting of the language ● Add features desired for GPU’s ■ To support GPU programming model ■ To enable high performance ● Tweak to make it fit together well

David Luebke 23 11/24/2015 How are current GPU’s different from CPU? 1. GPU is a stream processor ■ Multiple programmable processing units ■ Connected by data flows Application Vertex Processor Fragment Processor Assembly & Rasterization Framebuffer Operations Framebuffer Textures

David Luebke 24 11/24/2015 Cg uses separate vertex and fragment programs Application Vertex Processor Fragment Processor Assembly & Rasterization Framebuffer Operations Framebuffer Textures Program

David Luebke 25 11/24/2015 Cg programs have two kinds of inputs ● Varying inputs (streaming data) ■ e.g. normal vector – comes with each vertex ■ This is the default kind of input ● Uniform inputs (a.k.a. graphics state) ■ e.g. modelview matrix ● Note: Outputs are always varying vout MyVertexProgram(float4 normal, uniform float4x4 modelview) { … vout MyVertexProgram(float4 normal, uniform float4x4 modelview) { …

David Luebke 26 11/24/2015 Two ways to bind VP outputs to FP inputs a) Let compiler do it ■ Define a single structure ■ Use it for vertex-program output ■ Use it for fragment-program input struct vout { float4 color; float4 texcoord; … }; struct vout { float4 color; float4 texcoord; … };

David Luebke 27 11/24/2015 Two ways to bind VP outputs to FP inputs b) Do it yourself ■ Specify register bindings for VP outputs ■ Specify register bindings for FP inputs ■ May introduce HW dependence ■ Necessary for mixing Cg with assembly struct vout { float4 color : TEX3 ; float4 texcoord : TEX5; … }; struct vout { float4 color : TEX3 ; float4 texcoord : TEX5; … };

David Luebke 28 11/24/2015 Some inputs and outputs are special ● e.g. the position output from vert prog ■ This output drives the rasterizer ■ It must be marked struct vout { float4 color; float4 texcoord; float4 position : HPOS; }; struct vout { float4 color; float4 texcoord; float4 position : HPOS; };

David Luebke 29 11/24/2015 How are current GPU’s different from CPU? 2. Greater variation in basic capabilities ■ Most processors don’t yet support branching ■ Vertex processors don’t support texture mapping ■ Some processors support additional data types Compiler can’t hide these differences Least-common-denominator is too restrictive We expose differences via language profiles (list of capabilities and data types) Over time, profiles will converge Compiler can’t hide these differences Least-common-denominator is too restrictive We expose differences via language profiles (list of capabilities and data types) Over time, profiles will converge

David Luebke 30 11/24/2015 How are current GPU’s different from CPU? 3. Optimized for 4-vector arithmetic ■ Useful for graphics – colors, vectors, texcoords ■ Easy way to get high performance/cost C philosophy says: expose these HW data types Cg has vector data types and operations e.g. float2, float3, float4 Makes it obvious how to get high performance Cg also has matrix data types e.g. float3x3, float3x4, float4x4 C philosophy says: expose these HW data types Cg has vector data types and operations e.g. float2, float3, float4 Makes it obvious how to get high performance Cg also has matrix data types e.g. float3x3, float3x4, float4x4

David Luebke 31 11/24/2015 Some vector operations // // Clamp components of 3-vector to [minval,maxval] range // float3 clamp(float3 a, float minval, float maxval) { a = (a < minval.xxx) ? minval.xxx : a; a = (a > maxval.xxx) ? maxval.xxx : a; return a; } // // Clamp components of 3-vector to [minval,maxval] range // float3 clamp(float3 a, float minval, float maxval) { a = (a < minval.xxx) ? minval.xxx : a; a = (a > maxval.xxx) ? maxval.xxx : a; return a; } Swizzle – replicate and/or rearrange components. ? : is per-component for vectors Comparisons between vectors are per-component, and produce vector result

David Luebke 32 11/24/2015 Cg has arrays too ● Declared just as in C ● But, arrays are distinct from built-in vector types: float4 != float[4] ● Language profiles may restrict array usage vout MyVertexProgram(float3 lightcolor[10], …) { … vout MyVertexProgram(float3 lightcolor[10], …) { …

David Luebke 33 11/24/2015 How are current GPU’s different from CPU? 4. No support for pointers ■ Arrays are first-class data types in Cg 5. No integer data type ■ Cg adds “bool” data type for boolean operations ■ This change isn’t obvious except when declaring vars

David Luebke 34 11/24/2015 Cg basic data types ● All profiles: ■ float ■ bool ● All profiles with texture lookups: ■ sampler1D, sampler2D, sampler3D, samplerCUBE ● NV_fragment_program profile: ■ half -- half-precision float ■ fixed -- fixed point [-2,2)

David Luebke 35 11/24/2015 Other Cg capabilities ● Function overloading ● Function parameters are value/result ■ Use “out” modifier to declare return value ● “discard” statement – fragment kill void foo (float a, out float b) { b = a; } void foo (float a, out float b) { b = a; } if (a > b) discard; if (a > b) discard;

David Luebke 36 11/24/2015 Cg Built-in functions ● Texture mapping (in fragment profiles) ● Math ■ Dot product ■ Matrix multiply ■ Sin/cos/etc. ■ Normalize ● Misc ■ Partial derivative (when supported) ● See spec for more details

David Luebke 37 11/24/2015 New vector operators ● Swizzle – replicate/rearrange elements a = b.xxyy; ● Write mask – selectively over-write a.w = 1.0; ● Vector constructor builds vector a = float4(1.0, 0.0, 0.0, 1.0);

David Luebke 38 11/24/2015 Change to constant-typing mechanism ● In C, it’s easy to accidentally use high precision half x, y; x = y * 2.0; // Double-precision multiply! ● Not in Cg x = y * 2.0; // Half-precision multiply ● Unless you want to x = y * 2.0f; // Float-precision multiply

David Luebke 39 11/24/2015 Dot product, Matrix multiply ● Dot product ■ dot(v1,v2); // returns a scalar ● Matrix multiplications: ■ matrix-vector: mul(M, v); // returns a vector ■ vector-matrix: mul(v, M); // returns a vector ■ matrix-matrix: mul(M, N); // returns a matrix

David Luebke 40 11/24/2015 Demos and Examples ● Show Assn 4a skeleton code ● Show Cg effects browser ( ■ Fresnel ■ Simple lighting

David Luebke 41 11/24/2015 Cg runtime API helps applications use Cg ● Compile a program ● Select active programs for rendering ● Pass “uniform” parameters to program ● Pass “varying” (per-vertex) parameters ● Load vertex-program constants ● Other housekeeping

David Luebke 42 11/24/2015 Runtime is split into three libraries ● API-independent layer – cg.lib ■ Compilation ■ Query information about object code ● API-dependent layer – cgGL.lib and cgD3D.lib ■ Bind to compiled program ■ Specify parameter values ■ etc. ● NB: New API introduced since the following slide ■ See user’s manual for details ■ Pay attention to the basic idea here

David Luebke 43 11/24/2015 Runtime API for OpenGL // Create cgContext to hold vertex-profile code VertexContext = cgCreateContext(); // Add vertex-program source text to vertex-profile context // This is where compilation currently occurs cgAddProgram(VertexContext, CGVertProg, cgVertexProfile, NULL); // Get handle to 'main' vertex program VertexProgramIter = cgProgramByName(VertexContext, "main"); cgGLLoadProgram(VertexProgramIter, ProgId); VertKdBind = cgGetBindByName(VertexProgramIter, "Kd"); TestColorBind = cgGetBindByName(VertexProgramIter, "I.TestColor"); texcoordBind = cgGetBindByName(VertexProgramIter, "I.texcoord"); // Create cgContext to hold vertex-profile code VertexContext = cgCreateContext(); // Add vertex-program source text to vertex-profile context // This is where compilation currently occurs cgAddProgram(VertexContext, CGVertProg, cgVertexProfile, NULL); // Get handle to 'main' vertex program VertexProgramIter = cgProgramByName(VertexContext, "main"); cgGLLoadProgram(VertexProgramIter, ProgId); VertKdBind = cgGetBindByName(VertexProgramIter, "Kd"); TestColorBind = cgGetBindByName(VertexProgramIter, "I.TestColor"); texcoordBind = cgGetBindByName(VertexProgramIter, "I.texcoord");

David Luebke 44 11/24/2015 Runtime API for OpenGL // // Bind uniform parameters // cgGLBindUniform4f(VertexProgramIter, VertKdBind, 1.0, 1.0, 0.0, 1.0); … // Prepare to render cgGLEnableProgramType(cgVertexProfile); cgGLEnableProgramType(cgFragmentProfile); … // Immediate-mode vertex glNormal3fv(&CubeNormals[i][0]); cgGLBindVarying2f(VertexProgramIter, texcoordBind, 0.0, 0.0); cgGLBindVarying3f(VertexProgramIter, TestColorBind, 1.0, 0.0, 0.0); glVertex3fv(&CubeVertices[CubeFaces[i][0]][0]); // // Bind uniform parameters // cgGLBindUniform4f(VertexProgramIter, VertKdBind, 1.0, 1.0, 0.0, 1.0); … // Prepare to render cgGLEnableProgramType(cgVertexProfile); cgGLEnableProgramType(cgFragmentProfile); … // Immediate-mode vertex glNormal3fv(&CubeNormals[i][0]); cgGLBindVarying2f(VertexProgramIter, texcoordBind, 0.0, 0.0); cgGLBindVarying3f(VertexProgramIter, TestColorBind, 1.0, 0.0, 0.0); glVertex3fv(&CubeVertices[CubeFaces[i][0]][0]);

David Luebke 45 11/24/2015 Cg Summary ● C-like language ● With capabilities for GPU’s ● Compatible with Microsoft’s HLSL ● Use with OpenGL or DirectX ● NV20/DX8 and beyond ● NV30 + Cg = You control the graphics pipeline

David Luebke 46 11/24/2015 More Information ● The Cg Tutorial book ● NVIDIA’s “Learn About Cg” page: ■ ● For information, inspiration, and examples, a web forum on writing Cg shaders: ■ ● The Cg user’s manual: ■ ftp://download.nvidia.com/developer/cg/Cg_1.2.1/Docs/Cg_Toolkit.pdf ftp://download.nvidia.com/developer/cg/Cg_1.2.1/Docs/Cg_Toolkit.pdf ● Cg downloads—compiler, plug-ins, etc. ■