DX10, Batching, and Performance Considerations Bryan Dudash NVIDIA Developer Technology.

Slides:



Advertisements
Similar presentations
A Real Time Radiosity Architecture for Video Games
Advertisements

Introduction to Direct3D 10 Course Porting Game Engines to Direct3D 10: Crysis / CryEngine2 Carsten Wenzel.
Deferred Shading Optimizations
COMPUTER GRAPHICS CS 482 – FALL 2014 NOVEMBER 10, 2014 GRAPHICS HARDWARE GRAPHICS PROCESSING UNITS PARALLELISM.
Lecture 38: Chapter 7: Multiprocessors Today’s topic –Vector processors –GPUs –An example 1.
Understanding the graphics pipeline Lecture 2 Original Slides by: Suresh Venkatasubramanian Updates by Joseph Kider.
Status – Week 257 Victor Moya. Summary GPU interface. GPU interface. GPU state. GPU state. API/Driver State. API/Driver State. Driver/CPU Proxy. Driver/CPU.
The Art and Technology Behind Bioshock’s Special Effects
Real-Time Rendering TEXTURING Lecture 02 Marina Gavrilova.
Damon Rocco.  Tessellation: The filling of a plane with polygons such that there is no overlap or gap.  In computer graphics objects are rendered as.
Rasterization and Ray Tracing in Real-Time Applications (Games) Andrew Graff.
Morphing and Animation GPU Graphics Gary J. Katz University of Pennsylvania CIS 665 Adapted from articles taken from ShaderX 3, 4 and 5 And GPU Gems 1.
Particle Systems GPU Graphics. Sample Particle System Fire and SmokeWater.
The programmable pipeline Lecture 10 Slide Courtesy to Dr. Suresh Venkatasubramanian.
GPU Graphics Processing Unit. Graphics Pipeline Scene Transformations Lighting & Shading ViewingTransformations Rasterization GPUs evolved as hardware.
GEOMETRY SHADER. Breakdown  Basics  Review – graphics pipeline and shaders  What can the geometry shader do?  Working with the geometry shader  GLSL.
GPU Programming Robert Hero Quick Overview (The Old Way) Graphics cards process Triangles Graphics cards process Triangles Quads.
Real-time Graphical Shader Programming with Cg (HLSL)
CSE 381 – Advanced Game Programming Basic 3D Graphics
4.7. I NSTANCING Introduction to geometry instancing.
Geometric Objects and Transformations. Coordinate systems rial.html.
Chris Kerkhoff Matthew Sullivan 10/16/2009.  Shaders are simple programs that describe the traits of either a vertex or a pixel.  Shaders replace a.
UW EXTENSION CERTIFICATE PROGRAM IN GAME DEVELOPMENT 2 ND QUARTER: ADVANCED GRAPHICS Textures.
Overview [See Video file] Architecture Overview.
TERRAIN SET09115 Intro to Graphics Programming. Breakdown  Basics  What do we mean by terrain?  How terrain rendering works  Generating terrain 
CS 450: COMPUTER GRAPHICS REVIEW: INTRODUCTION TO COMPUTER GRAPHICS – PART 2 SPRING 2015 DR. MICHAEL J. REALE.
The programmable pipeline Lecture 3.
1 Introduction to Computer Graphics with WebGL Ed Angel Professor Emeritus of Computer Science Founding Director, Arts, Research, Technology and Science.
Stream Processing Main References: “Comparing Reyes and OpenGL on a Stream Architecture”, 2002 “Polygon Rendering on a Stream Architecture”, 2000 Department.
Computer Graphics The Rendering Pipeline - Review CO2409 Computer Graphics Week 15.
Advanced Computer Graphics Advanced Shaders CO2409 Computer Graphics Week 16.
GRAPHICS PIPELINE & SHADERS SET09115 Intro to Graphics Programming.
09/16/03CS679 - Fall Copyright Univ. of Wisconsin Last Time Environment mapping Light mapping Project Goals for Stage 1.
CSCE 552 Spring D Models By Jijun Tang. Triangles Fundamental primitive of pipelines  Everything else constructed from them  (except lines and.
Advanced Computer Graphics Spring 2014 K. H. Ko School of Mechatronics Gwangju Institute of Science and Technology.
Emerging Technologies for Games Deferred Rendering CO3303 Week 22.
Review on Graphics Basics. Outline Polygon rendering pipeline Affine transformations Projective transformations Lighting and shading From vertices to.
Havok FX Physics on NVIDIA GPUs. Copyright © NVIDIA Corporation 2004 What is Effects Physics? Physics-based effects on a massive scale 10,000s of objects.
Computer Graphics 3 Lecture 6: Other Hardware-Based Extensions Benjamin Mora 1 University of Wales Swansea Dr. Benjamin Mora.
Maths & Technologies for Games Graphics Optimisation - Batching CO3303 Week 5.
From Turing Machine to Global Illumination Chun-Fa Chang National Taiwan Normal University.
Emerging Technologies for Games Capability Testing and DirectX10 Features CO3301 Week 6.
Ray Tracing using Programmable Graphics Hardware
What are shaders? In the field of computer graphics, a shader is a computer program that runs on the graphics processing unit(GPU) and is used to do shading.
Mesh Skinning Sébastien Dominé. Agenda Introduction to Mesh Skinning 2 matrix skinning 4 matrix skinning with lighting Complex skinning for character.
Parallel Lighting Directional Lighting Distant Lighting (take your pick) Paul Taylor 2010.
The Graphics Pipeline Revisited Real Time Rendering Instructor: David Luebke.
Programming with OpenGL Part 3: Shaders Ed Angel Professor of Emeritus of Computer Science University of New Mexico 1 E. Angel and D. Shreiner: Interactive.
GLSL I.  Fixed vs. Programmable  HW fixed function pipeline ▪ Faster ▪ Limited  New programmable hardware ▪ Many effects become possible. ▪ Global.
COMP 175 | COMPUTER GRAPHICS Remco Chang1/XX13 – GLSL Lecture 13: OpenGL Shading Language (GLSL) COMP 175: Computer Graphics April 12, 2016.
Build your own 2D Game Engine and Create Great Web Games using HTML5, JavaScript, and WebGL. Sung, Pavleas, Arnez, and Pace, Chapter 5 Examples 1.
- Introduction - Graphics Pipeline
Perspective, Scene Design, and Basic Animation
Graphics Processing Unit
Deferred Lighting.
Chapter 6 GPU, Shaders, and Shading Languages
The Graphics Rendering Pipeline
CS451Real-time Rendering Pipeline
Chapter VI OpenGL ES and Shader
Introduction to Programmable Hardware
Graphics Processing Unit
Introduction to geometry instancing
Programming with OpenGL Part 3: Shaders
Computer Graphics Introduction to Shaders
CIS 441/541: Introduction to Computer Graphics Lecture 15: shaders
03 | Creating, Texturing and Moving Objects
Emerging Technologies for Games Review & Revision Strategy
OpenGL-Rendering Pipeline
CS 480/680 Fall 2011 Dr. Frederick C Harris, Jr. Computer Graphics
CIS 6930: Chip Multiprocessor: GPU Architecture and Programming
Presentation transcript:

DX10, Batching, and Performance Considerations Bryan Dudash NVIDIA Developer Technology

The Point of this talk  “The attempt to combine wisdom and power has only rarely been successful and then only for a short while. “- Albert Einstein  DX10 has many new features  Not just geometry shader!  Opportunity to re-org your graphics architecture

Agenda  Short History of DX9 performance  DX10 Performance Potential  Case Study: Geometry Particle System  Case Study: Skinned Instanced Characters  Conclusions

DX9 and Draw calls  “I wasted time, and now doth time waste me ”- William Shakespeare  Most DX9 games are CPU bound  Often this is because of high #’s of draw calls  Developers have mostly learned this by now  Often reducing draw calls isn’t trivial  Render state changes necessitate new draw

DX9 Instancing  Not really in the API  ID3DDevice9::SetStreamS ourceFreq()  Set up 2 streams  Modulus Vertex Stream 0  Base Mesh Stream  Divide Vertex Stream 1  Instance Data Stream (x 0 y 0 z 0 ) (n x0 n y0 n z0 ) (x 1 y 1 z 1 ) (n x1 n y1 n z1 ) … (x 99 y 99 z 99) (n x99 n y99 n z99 ) 0 1 … Vertex Stream 0 worldMatrix 0 worldMatrix 1 … worldMatrix … Vertex Stream 1

DX9 Instancing Performance  Test scene that draws 1 million diffuse shaded polys  Changing the batch size, changes the # of drawn instances  For small batch sizes, can provide an extreme win  There is a fixed overhead from adding the extra data into the vertex stream  The sweet spot changes based on CPU Speed, GPU speed, engine overhead, etc ………………………So what about Direct3D10?

How much faster is DX10? “I was gratified to be able to answer promptly. I said I don't know. “ – Mark Twain  But not just instancing  Fundamentally alter graphics data flow  Increase parallelism  Push more data processing to GPU

DX10 Performance Features  General Instancing Support  General data “buffer” concept  Texture Arrays  Geometry Shader  Yury will cover this in great detail later  Stream Out

General Instancing Support  Fundamentally unchanged  But, fundamentally in the API  Single draw just a special case  More useable due to other DX10 features (x 0 y 0 z 0 ) (n x0 n y0 n z0 ) (x 1 y 1 z 1 ) (n x1 n y1 n z1 ) … (x 99 y 99 z 99) (n x99 n y99 n z99 ) 0 1 … Vertex Data Buffer worldMatrix 0 worldMatrix 1 … worldMatrix … Instance Data Buffer

Instance ID  Unique “system” value  Incremented per instance  Custom per instance processing Instance 0 Instance 1 Instance 2 Color = float4(0,ID/2,0,0);

Data Buffer Object  Input Assembler accepts  Vertex Buffer  Index Buffer  General Buffer  Can only render to a general Buffer  And limited to 8k elements at a time  Multiple passes can get you a R2VB

Texture Arrays  All texture types can be used as an array  Indexable from Shader  Handy for instancing to store different maps for different instances

Texture Arrays and MRT  Interesting tradeoff  Texture Array is one big texture  With clamp constraints per “element” in the array  Can output tris from GS to different slice  Possibly not writing to all slices  Adds extra VS/GS operations  Regular MRT writes to all MRTs  Fixed B/W usage  But lower GS/VS ops

Geometry Shader  Handy to allow us to offload MORE work from CPU  Yury will go over GS potential in great depth

Stream Out  Data output from Geometry Shader  Or Vertex Shader if GS is NULL  Early out rendering pipeline before the Rasterization stage  Allows us to fill dynamic vertex buffers and use in later pass.  Even as per instance data

Case Study: Instanced Particles  Particle simulation takes up a lot of CPU  Updating a particle buffer costs bandwidth  Often particle system just for effects  Game object don’t need to know particle positions  Geometry particles are cool!  More accurate lighting than sprites  Debris, broken glass, lava blobs

Basic Idea  Simulation done in first pass  Position results used in second pass  Each particle is an instanced mesh  Buffer0 and Buffer1 swapped every frame Buffer1 Instance Data Buffer0 Position & Velocity Data VB0 Mesh Data Vertex Shader Instanced Rendering Stream Out Pass 1 Pass 2

Key Bits  Stream Out  Stream out into an instance data buffer  Do particle simulation in VS  Instance data  Vec4 – Position.xyz, lifetime  Vec3 – Velocity.xyz  On CPU Maintain freelist  “inject” updates into instance stream  UpdateSubresource with a subrect

D3D10_INPUT_ELEMENT_DESC { L"POSITION", 0, DXGI_FORMAT_R32G32B32_FLOAT, 0, 0, D3D10_INPUT_PER_VERTEX_DATA, 0 }, { L"TEXTURE0", 0, DXGI_FORMAT_R32G32_FLOAT, 0, 12, D3D10_INPUT_PER_VERTEX_DATA, 0 }, { L"NORMAL", 0, DXGI_FORMAT_R32G32B32_FLOAT, 0, 20, D3D10_INPUT_PER_VERTEX_DATA, 0 }, { L"particlePosition", 0, DXGI_FORMAT_R32G32B32A32_FLOAT, 1, 0, D3D10_INPUT_PER_INSTANCE_DATA, 1 }, { L"particleVelocity", 0, DXGI_FORMAT_R32G32B32_FLOAT, 1, 16, D3D10_INPUT_PER_INSTANCE_DATA, 1 }, Note the 4x32 Format

Considerations  Collision  Can handle simple collision primitives in shader  Works for effects, not interactive objects  Dead Particles  Assign a special NAN value to be interpreted as dead particle

Extensions  Motion Blur  Setup final output to a RT  Use velocity data to calculate blur  Already have velocity from simulation  Add in simple collision primitives  Sphere  Box  Terrain texture

Case Study: Skinned Instancing  Would like to draw many animated characters  Often these characters require upwards of a dozen draw calls EACH  Lots of VS constants updated per draw  For palette skinning  We’d like to batch together same mesh characters

Basic Idea  Encode all animations into a texture  A single character mesh  Contains same info for traditional palette skinning  Each instance uses different animation  Time controlled by CPU Mesh VB Instance Animation Data Animation Data (Texture) Vertex Shader Rasterization VTF UpdateSubResource

Key Bits  Vertex Texture (VTF)  All animations  Vertex Mesh Stream (static)  Vertex Data (ref pose)  Bone indices & weights  Instance Stream (dynamic)  Animation offset  Frame offset  Time lerp

Animation Texture  A “texel” is a row of the bone matrix  4 texels form a single bone  Example  50 bone, 60 frame animation  12,000 pixels  Easily stored in a 128x128 Animation Frame Animation All Animations

Animation Texture  Cannot be 1D Texture or generic Buffer  Max size is 8192  Could be a Texture Array  Thus we encode our data linearly into a 2D texture

Load Bone HLSL Function // Calculate a UV for the bone for this vertex float2 uv = float2(0,0); // if this texture were 1D, what would be the offset? uint baseIndex = animationOffset + frameOffset + (4*bone); // Now turn that into 2D coords uint baseU = baseIndex%g_InstanceMatricesWidth; uint baseV = baseIndex/g_InstanceMatricesWidth; uv.x = (float)baseU / (float)g_InstanceMatricesWidth; uv.y = (float)baseV / (float)g_InstanceMatricesHeight; // Note that we assume the width of the texture is an even multiple of 4, // otherwise we'd have to recalculate the V component PER lookup float2 uvOffset = float2(1.0/(float)g_InstanceMatricesWidth,0); float4 mat1 = g_txInstanceMatrices.Sample( g_samPoint,float4(uv.xy,0,0)); float4 mat2 = g_txInstanceMatrices.Sample( g_samPoint,float4(uv.xy + uvOffset.xy,0,0)); float4 mat3 = g_txInstanceMatrices.Sample( g_samPoint,float4(uv.xy + 2*uvOffset.xy,0,0)); float4 mat4 = g_txInstanceMatrices.Sample( g_samPoint,float4(uv.xy + 3*uvOffset.xy,0,0)); return float4x4(mat1,mat2,mat3,mat4);

Considerations  This example is necessarily simple  Non-main characters/cutscenes  Real games have lots of data dependencies  Physics/Collision  In game cutscenes?  Processing and data loads onto GPU  But GPU is most often idle

Extensions  Use Texture Array to store single animation in a slice  Use TextureArray to encode multiple maps  Normals as well as Albedo  Conditionally kill geometry in GS  Armor, Shields, etc  Animation palette evaluation in GPU pass  Output the animation textures.

Conclusions  “Deliberation is the work of many men. Action, of one alone. “ – Charles De Gaulle  Instancing is more useful in DX10  Working with data is easier  Think about how you can restructure your data  More opportunity for GPU simulation

Questions? 