NVIDIA Graphics and Cg Mark Kilgard Graphics Software Engineer NVIDIA Corporation GPU Shading and Rendering Course 3 July 30, 2006.

Slides:



Advertisements
Similar presentations
Programming with OpenGL - Getting started - Hanyang University Han Jae-Hyek.
Advertisements

COMPUTER GRAPHICS SOFTWARE.
COMPUTER GRAPHICS CS 482 – FALL 2014 NOVEMBER 10, 2014 GRAPHICS HARDWARE GRAPHICS PROCESSING UNITS PARALLELISM.
Lecture 38: Chapter 7: Multiprocessors Today’s topic –Vector processors –GPUs –An example 1.
Understanding the graphics pipeline Lecture 2 Original Slides by: Suresh Venkatasubramanian Updates by Joseph Kider.
Status – Week 257 Victor Moya. Summary GPU interface. GPU interface. GPU state. GPU state. API/Driver State. API/Driver State. Driver/CPU Proxy. Driver/CPU.
RealityEngine Graphics Kurt Akeley Silicon Graphics Computer Systems.
Graphics Hardware CMSC 435/634. Transform Shade Clip Project Rasterize Texture Z-buffer Interpolate Vertex Fragment Triangle A Graphics Pipeline.
Prepared 5/24/2011 by T. O’Neil for 3460:677, Fall 2011, The University of Akron.
GI 2006, Québec, June 9th 2006 Implementing the Render Cache and the Edge-and-Point Image on Graphics Hardware Edgar Velázquez-Armendáriz Eugene Lee Bruce.
GRAPHICS AND COMPUTING GPUS Jehan-François Pâris
The Programmable Graphics Hardware Pipeline Doug James Asst. Professor CS & Robotics.
Control Flow Virtualization for General-Purpose Computation on Graphics Hardware Ghulam Lashari Ondrej Lhotak University of Waterloo.
1 Shader Performance Analysis on a Modern GPU Architecture Victor Moya, Carlos González, Jordi Roca, Agustín Fernández Jordi Roca, Agustín Fernández Department.
A Crash Course on Programmable Graphics Hardware Li-Yi Wei 2005 at Tsinghua University, Beijing.
© David Kirk/NVIDIA and Wen-mei W. Hwu, ECE408, University of Illinois, Urbana-Champaign 1 Programming Massively Parallel Processors Chapter.
1 ITCS 6/8010 CUDA Programming, UNC-Charlotte, B. Wilkinson, Jan 19, 2011 Emergence of GPU systems and clusters for general purpose High Performance Computing.
Status – Week 277 Victor Moya.
GPU Simulator Victor Moya. Summary Rendering pipeline for 3D graphics. Rendering pipeline for 3D graphics. Graphic Processors. Graphic Processors. GPU.
ATI GPUs and Graphics APIs Mark Segal. ATI Hardware X1K series 8 SIMD vertex engines, 16 SIMD fragment (pixel) engines 3-component vector + scalar ALUs.
Evolutions of GPU Architectures Andrew Coile CMPE220 3/2007.
Evolution of the Programmable Graphics Pipeline Patrick Cozzi University of Pennsylvania CIS Spring 2011.
Status – Week 283 Victor Moya. 3D Graphics Pipeline Akeley & Hanrahan course. Akeley & Hanrahan course. Fixed vs Programmable. Fixed vs Programmable.
The programmable pipeline Lecture 10 Slide Courtesy to Dr. Suresh Venkatasubramanian.
Status – Week 260 Victor Moya. Summary shSim. shSim. GPU design. GPU design. Future Work. Future Work. Rumors and News. Rumors and News. Imagine. Imagine.
GPU Tutorial 이윤진 Computer Game 2007 가을 2007 년 11 월 다섯째 주, 12 월 첫째 주.
GPU Graphics Processing Unit. Graphics Pipeline Scene Transformations Lighting & Shading ViewingTransformations Rasterization GPUs evolved as hardware.
Under the Hood: 3D Pipeline. Motherboard & Chipset PCI Express x16.
High Performance in Broad Reach Games Chas. Boyd
Background image by chromosphere.deviantart.com Fella in following slides by devart.deviantart.com DM2336 Programming hardware shaders Dioselin Gonzalez.
REAL-TIME VOLUME GRAPHICS Christof Rezk Salama Computer Graphics and Multimedia Group, University of Siegen, Germany Eurographics 2006 Real-Time Volume.
GPU Programming Robert Hero Quick Overview (The Old Way) Graphics cards process Triangles Graphics cards process Triangles Quads.
Programmable Pipelines. Objectives Introduce programmable pipelines ­Vertex shaders ­Fragment shaders Introduce shading languages ­Needed to describe.
Real-time Graphical Shader Programming with Cg (HLSL)
Graphics Hardware and Graphics in Video Games COMP136: Introduction to Computer Graphics.
Computer Graphics Graphics Hardware
Programmable Pipelines. 2 Objectives Introduce programmable pipelines ­Vertex shaders ­Fragment shaders Introduce shading languages ­Needed to describe.
Chris Kerkhoff Matthew Sullivan 10/16/2009.  Shaders are simple programs that describe the traits of either a vertex or a pixel.  Shaders replace a.
The Graphics Rendering Pipeline 3D SCENE Collection of 3D primitives IMAGE Array of pixels Primitives: Basic geometric structures (points, lines, triangles,
1 Dr. Scott Schaefer Programmable Shaders. 2/30 Graphics Cards Performance Nvidia Geforce 6800 GTX 1  6.4 billion pixels/sec Nvidia Geforce 7900 GTX.
The programmable pipeline Lecture 3.
Stream Processing Main References: “Comparing Reyes and OpenGL on a Stream Architecture”, 2002 “Polygon Rendering on a Stream Architecture”, 2000 Department.
GRAPHICS PIPELINE & SHADERS SET09115 Intro to Graphics Programming.
CS662 Computer Graphics Game Technologies Jim X. Chen, Ph.D. Computer Science Department George Mason University.
Accelerated Stereoscopic Rendering using GPU François de Sorbier - Université Paris-Est France February 2008 WSCG'2008.
Ritual ™ Entertainment: Next-Gen Effects on Direct3D ® 10 Sam Z. Glassenberg Program Manager Microsoft ® – Direct3D ® Doug Service Director of Technology.
Xbox MB system memory IBM 3-way symmetric core processor ATI GPU with embedded EDRAM 12x DVD Optional Hard disk.
Advanced Computer Graphics Spring 2014 K. H. Ko School of Mechatronics Gwangju Institute of Science and Technology.
May 8, 2007Farid Harhad and Alaa Shams CS7080 Overview of the GPU Architecture CS7080 Final Class Project Supervised by: Dr. Elias Khalaf By: Farid Harhad.
Havok FX Physics on NVIDIA GPUs. Copyright © NVIDIA Corporation 2004 What is Effects Physics? Physics-based effects on a massive scale 10,000s of objects.
Computer Graphics 3 Lecture 6: Other Hardware-Based Extensions Benjamin Mora 1 University of Wales Swansea Dr. Benjamin Mora.
Fateme Hajikarami Spring  What is GPGPU ? ◦ General-Purpose computing on a Graphics Processing Unit ◦ Using graphic hardware for non-graphic computations.
David Luebke 1 1/25/2016 Programmable Graphics Hardware.
© David Kirk/NVIDIA and Wen-mei W. Hwu, ECE408, University of Illinois, Urbana-Champaign 1 Programming Massively Parallel Processors Lecture.
From Turing Machine to Global Illumination Chun-Fa Chang National Taiwan Normal University.
Ray Tracing using Programmable Graphics Hardware
© David Kirk/NVIDIA and Wen-mei W. Hwu, ECE408, University of Illinois, Urbana-Champaign 1 GPU.
An Introduction to the Cg Shading Language Marco Leon Brandeis University Computer Science Department.
Computer Graphics Graphics Hardware
Programmable Shaders Dr. Scott Schaefer.
A Crash Course on Programmable Graphics Hardware
Graphics on GPU © David Kirk/NVIDIA and Wen-mei W. Hwu,
Graphics Processing Unit
Chapter 6 GPU, Shaders, and Shading Languages
From Turing Machine to Global Illumination
Graphics Processing Unit
Computer Graphics Graphics Hardware
RADEON™ 9700 Architecture and 3D Performance
CIS 441/541: Introduction to Computer Graphics Lecture 15: shaders
CIS 6930: Chip Multiprocessor: GPU Architecture and Programming
Presentation transcript:

NVIDIA Graphics and Cg Mark Kilgard Graphics Software Engineer NVIDIA Corporation GPU Shading and Rendering Course 3 July 30, 2006

Outline NVIDIA graphics hardware – seven years for GeForce + the future Cg—C for Graphics – the cross-platform GPU programming language

Seven Years of GeForce ProductNew Features OpenGL Version Direct3D Version 2000GeForce 256 Hardware transform & lighting, configurable fixed-point shading, cube maps, texture compression, anisotropic texture filtering 1.3DX7 2001GeForce3 Programmable vertex transformation, 4 texture units, dependent textures, 3D textures, shadow maps, multisampling, occlusion queries 1.4DX8 2002GeForce4 Ti 4600 Early Z culling, dual-monitor 1.4DX GeForce FX Vertex program branching, floating-point fragment programs, 16 texture units, limited floating-point textures, color & depth compression 1.5DX9 2004GeForce 6800 Ultra Vertex textures, structured fragment branching, non-power-of-two textures, generalized floating-point textures, floating-point texture filtering and blending, dual-GPU 2.0DX9c 2005GeForce 7800 GTX Transparency antialiasing, quad-GPU 2.0DX9c 2006GeForce 7900 GTX Single-board dual-GPU, process efficiency 2.1DX9c

2006: the GeForce 7900 GTX board 512MB/256-bit GDDR MHz effective 8 pieces of 8Mx32 16x PCI-Express DVI x 2 sVideo TV Out SLI Connector

2006: the GeForce 7900 GTX GPU 278 million transistors 650 MHz core clock 1,600 MHz GDDR3 effective memory clock 256-bit memory interface Notable Functionality Non-power-of-two textures with mipmaps Floating-point (fp16) blending and filtering sRGB color space texture filtering and frame buffer blending Vertex textures 16x anisotropic texture filtering Dynamic vertex and fragment branching Double-rate depth/stencil-only rendering Early depth/stencil culling Transparency antialiasing

2006: GeForce 7950 GX2, SLI-on-a-card DVI x 2 sVideo TV Out 16x PCI-Express Sandwich of two printed circuit boards Two GeForce 7 Series GPUs 500 Mhz core 1 GB video memory 512 MB per GPU 1,200 Mhz effective Effective 512-bit memory interface!

GeForce Peak Vertex Processing Trends Millions of vertices per second Vertex units ×8 rate for trivial 4x4 vertex transform exceeds peak setup rates—allows excess vertex processing Assumes Alternate Frame Rendering (AFR) SLI Mode

GeForce Peak Triangle Setup Trends Millions of triangles per second assumes 50% face culling Assumes Alternate Frame Rendering (AFR) SLI Mode

GeForce Peak Memory Bandwidth Trends Gigabytes per second 128-bit interface256-bit interface Two physical 256-bit memory interfaces

Effective GPU Memory Bandwidth Compression schemes – Lossless depth and color (when multisampling) compression – Lossy texture compression (S3TC / DXTC) – Typically assumes 4:1 compression Avoid useless work – Early killing of fragments (Z cull) – Avoid useless blending and texture fetches Very clever memory controller designs – Combining memory accesses for improved coherency – Caches for texture fetches

NVIDIA Graphics Core and Memory Clock Rates Megahertz (Mhz) DDR memory transition— memory rates double physical clock rate

GeForce Peak Texture Fetch Trends Millions of texture fetches per second Texture units 2×4 2×4 2×4 2× ×24 assuming no texture cache misses

GeForce Peak Depth/Stencil-only Fill assuming no read-modify-write Millions of depth/stencil pixel updates per second double speed depth-stencil only

GeForce Transistor Count and Semiconductor Process Millions of transistors Process (nm) More performance with fewer transistors: Architectural & process efficiency!

GeForce 7900 GTX Parallelism

16 GeForce FX 5900 GeForce 6800 Ultra Vertex Fragment 2 nd Texture Fetch Raster Color Raster Depth GeForce 7900 GTX Hardware Unit

2005: Comparison to CPU Pentium Extreme Edition GHz Dual Core 230M Transistors 90nm process 206 mm^2 2 x 1MB Cache 25.6 GFlops GeForce 7800 GTX 430 MHz 302M Transistors 110nm process 326 mm^2 313 GFlops (shader) 1.3 TFlops (total)

2006: Comparison to CPU Intel Core 2 Extreme X GHz Dual Core 291M Transistors 65nm process 143 mm^2 4MB Cache 23.2 GFlops GeForce 7900 GTX 650 MHz 278M Transistors 90nm process 196 mm^2 477 GFlops (shader) 2.1 TFlops (total)

Giga Flops Imbalance Theoretical programmable IEEE 754 single-precision Giga Flops

Future NVIDIA GPU directions DirectX 10 feature set – Massive graphics functionality upgrade Language and tool support – Performance tuning and content development Improved GPGPU – Harness the bandwidth & Gflops for non-graphics Multi-GPU systems innovation – Next-generation SLI

DirectX 10-class GPU functionality Generalized programmability, including – Integer instructions – Efficient branching – Texture size queries, unfiltered texel fetches, & offset fetches – Shadow cube maps for omni-directional shadowing – Sourcing constants from bind-able buffer objects Per-primitive programmable processing – Emits zero or more strips of triangles/points/lines – New line and triangle adjacency primitives – Output to multiple viewports and buffers

Per-primitive processing example: Automatic silhouette edge rendering emit edge of adjacent triangles that face opposite directions New triangle adjacency primitive = 3 conventional vertices + 3 vertices for adjacent triangles

More DirectX 10-class GPU functionality Better blending – Improved blending control for multiple draw buffers – sRGB and 32-bit floating-point framebuffer blending Streamed output of vertex processing to buffers – Render to vertex array Texture improvements – Indexing into an “array” of 2D textures – Improved render-to-texture – Luminance-alpha compressed formats – Compact High Dynamic Range texture formats – Integer texture formats – 32-bit floating-point texture filtering

Uses of DirectX 10 functionality GPU Marching Cubes GPU Cloth Styled Line Drawing Deformable Collisions Sparkling SpritesTable-free Noise Deep WavesGPU Fluid Simulation

DirectX 10-class functionality parity Feature parity – DirectX 10-class features available via OpenGL – Cross API portability of programmable shading content through Cg Performance parity – 3D API agnostic performance parity on all Windows operating systems System support parity – Linux, Mac, FreeBSD, Solaris – Shared code base for drivers

Multi-GPU Support Original SLI was just the beginning – Quad-SLI – SLI support infuses all NVIDIA product design and development New SLI APIs for application-control of multiple GPUs SLI for notebooks – Better thermals and power

Vertex Cores Fragment Cores Raster Color Cores Raster Depth Cores GeForce 7900 GTX Hardware Unit GeForce 7900 GTX Quad SLI

Cg: C for Graphics

Cg as it exists today – High-level, inspired mostly by C – Graphics focused API-independent – GLSL tied to OpenGL; HLSL tied to Direct3D; Cg works for both Platform-independent – Cg works on PlayStation 3, ATI, NVIDIA, Linux, Solaris, Mac OS X, Windows, etc. Production language and system – Cg 1.5 is part of 3D content creation tool chains – Portability of Cg shaders is important

Evolution of Cg C (AT&T, 1970’s) C++ (AT&T, 1983) Java (Sun, 1994) RenderMan (Pixar, 1988) PixelFlow Shading Language (UNC, 1998) Real-Time Shading Language (Stanford, 2001) Cg / HLSL (NVIDIA/Microsoft, 2002) IRIS GL (SGI, 1982) OpenGL (ARB, 1992) Direct3D (Microsoft, 1995) Reality Lab (RenderMorphics, 1994) General-purpose languages Graphics Application Program Interfaces Shading Languages

Cg 1.5 Current release of Cg – Supports Windows, Linux, Mac (including x86 Macs) + now Solaris – Shader Model 3.0 profiles for Direct3D 9.0c – Matches Sony’s PlayStation 3 Cg support – Tool chain support: FX Composer 2.0 New functionality – Procedural effects generation – Combined programs for multiple domains – New GLSL profiles to compile Cg to GLSL Improved compiler optimization

FX Composer for Cg shader authoring Shaders are assets – Portability matters So express shaders in a multi- platform, multi-API language – That’s Cg

Cg Directions DirectX 10-class feature support – Primitive (geometry) programs – Constant buffers – Interpolation modes – Read-write index-able temporaries – New texture targets: texture arrays, shadow cube maps Incorporate established C++ features, examples: – Classes – Templates – Operator overloading – But not runtime features like new/delete, RTTI, or exceptions

Why C++? Already inspiration for much of Cg – Think of Cg’s first-class vectors simply as classes Functionality in C++ is well-understood and popular C++ is biased towards compile-time abstraction – Rather than more run-time focus of Java and C# – Compile-time abstraction is good since GPUs lack the run-time support for heaps, garbage collection, exceptions, and run-time polymorphism

Logical Programmable Graphics Pipeline 3D Application or Game 3D API: OpenGL or Direct3D Driver Programmable Vertex Processor Primitive Assembly Rasterization & Interpolation 3D API Commands Transformed Vertices Assembled Polygons, Lines, and Points GPU Command & Data Stream Programmable Fragment Processor Rasterized Pre-transformed Fragments Transformed Fragments Raster Operations Framebuffer Pixel Updates GPU Front End Pre-transformed Vertices Vertex Index Stream Pixel Location Stream CPU – GPU Boundary Program vertex and fragment domains

Future Logical Programmable Graphics Pipeline 3D Application or Game 3D API: OpenGL or Direct3D Driver Programmable Vertex Processor Primitive Assembly Rasterization & Interpolation 3D API Commands Transformed Vertices Output assembled Polygons, Lines, and Points GPU Command & Data Stream Programmable Fragment Processor Rasterized Pre-transformed Fragments Transformed Fragments Raster Operations Framebuffer Pixel Updates GPU Front End Pre-transformed Vertices Vertex Index Stream Pixel Location Stream CPU – GPU Boundary Programmable Primitive Processor Input assembled Polygons, Lines, and Points New per-primitive “geometry” programmable domain

Pass Through Geometry Program Example BufferInit flatColor; TRIANGLE void passthru(AttribArray position : POSITION, AttribArray texCoord : TEXCOORD0) { flatAttrib(flatColor:COLOR); for (int i=0; i<position.length; i++) { emitVertex(position[i], texCoord[i]); } Primitive’s attributes arrive as “templated” attribute arrays Length of attribute arrays depends on the input primitive mode, 3 for TRIANGLE Bundles a vertex based on parameter values and semantics Makes sure flat attributes are associated with the proper provoking vertex convention flatColor initialized from constant buffer 6

Conclusions NVIDIA GPUs – Expect more compute and bandwidth increases >> CPUs – DirectX 10 = large functionality upgrade for graphics Cg, the only cross-API, multi-platform language for programmable shading – Think shaders as content, not GPU programs trapped inside applications