Firaxis LORE And other uses of D3D11.

Presentation transcript:

Firaxis LORE And other uses of D3D11

Low Overhead Rendering Engine
Or, how I learned to render 15,000+ batches at 60 FPS

Overview
- Civ 5 is a big game that covers 6000 years of history
- The entire map can be populated/polluted with all sorts of things the user creates
- Need to be able to render a huge number of possibly disparate types of objects

Early Goals
- Build a brand new engine for Civilization V
- Like the game, we wanted the graphics engine to be able to ‘stand the test of time’
- Decided while D3D11 was in alpha to build the engine natively for the D3D11 architecture, and map backwards to DX9

Step 1: Cutting the overhead down
- Shaders start in Firaxis Shading Language (FSL), a superset of HLSL
- FSL compiles into a CPP and header file: all shader constants are mapped to structs, grouped into packages where all packages have the same bindings
- Model code is templated: the FSL-generated header is then bound with template code
- Result is a tiny amount of code that fills out the required shading and barely shows up in profiling
[Diagram labels: FSL Files; CPP / H; Template Code; Compile Time Glue Code]
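
The generated-struct-plus-template idea can be sketched as follows. This is only an illustration under assumptions: the slides do not show FSL output, so `UnitShaderConstants`, `UnitShader`, `DrawPacket`, `makePacket` and the shader id are all hypothetical names, not Firaxis code.

```cpp
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical output of the shader compiler: one plain struct per
// constant package. Names and layout are illustrative only.
struct UnitShaderConstants {
    float worldMatrix[16];
    float teamColor[4];
};

// Hypothetical generated description tying a shader to its constant struct.
struct UnitShader {
    using Constants = UnitShaderConstants;
    static constexpr int kShaderId = 7;
};

// A queued draw: shader id plus a byte copy of its constants.
struct DrawPacket {
    int shaderId;
    std::vector<unsigned char> payload;
};

// Template glue: the compiler checks at build time that the model code
// passes the struct this shader expects; at run time it is one memcpy.
template <typename Shader>
DrawPacket makePacket(const typename Shader::Constants& constants) {
    DrawPacket packet;
    packet.shaderId = Shader::kShaderId;
    packet.payload.resize(sizeof(constants));
    std::memcpy(packet.payload.data(), &constants, sizeof(constants));
    return packet;
}
```

Calling `makePacket<UnitShader>(constants)` with the wrong struct type fails to compile, which is one plausible reading of "compile time glue code".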

Step 2: Abstracting the Rendering
- Still have to support DX9; might have to support consoles in the future
- Might have to write a ‘driver’
- Our solution: make DX9 ‘look like’ DX11
- Started with as restricted a design as possible, and expanded as we needed to

Packetized Rendering
- Stateless rendering, much simpler than D3D
- Command based: all rendering is performed by self-contained commands
- A command set may contain a list of surfaces to render, each with a shader constant payload
- A surface is an immutable bundle of an IB, VB, textures, shader def, etc.
- All state is bundled into packages: alpha state, Z state, etc.
- Commands reference one of these state packages
- Entire frame is queued up
- Minimal per-frame allocation
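
A minimal sketch of what such self-contained packets could look like, assuming integer handles in place of real GPU objects. The type names (`Surface`, `SurfaceDraw`, `RenderCommand`, `AlphaState`, `DepthState`) and the `isSelfContained` check are hypothetical and only mirror the terms used on the slide.

```cpp
#include <cstdint>
#include <memory>
#include <vector>

// Handles are plain integers here; a real engine would wrap GPU objects.
using Handle = std::uint32_t;

// State packages: every field is set, nothing is inherited from a previous draw.
struct AlphaState { bool blendEnabled; };
struct DepthState { bool testEnabled; bool writeEnabled; };

// Immutable after construction: all members are const.
struct Surface {
    const Handle indexBuffer;
    const Handle vertexBuffer;
    const Handle shader;
    const std::vector<Handle> textures;
};

struct SurfaceDraw {
    std::shared_ptr<const Surface> surface;
    std::vector<float> shaderConstants;  // per-draw payload
};

struct RenderCommand {
    std::shared_ptr<const AlphaState> alpha;
    std::shared_ptr<const DepthState> depth;
    std::vector<Handle> renderTargets;
    std::vector<SurfaceDraw> draws;
};

// True when the command carries everything needed to execute it.
bool isSelfContained(const RenderCommand& cmd) {
    if (!cmd.alpha || !cmd.depth || cmd.renderTargets.empty()) return false;
    for (const SurfaceDraw& d : cmd.draws)
        if (!d.surface) return false;
    return true;
}
```

Because a command names its state packages explicitly, executing it never depends on whatever the previous command left bound.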

Only 5 Types of commands
- COMMAND_RENDER_BATCHES
  - A list of surfaces to render into 1 or more render targets, with alpha and Z state bundles
  - Surfaces have IB, VB, sampler and texture bundles. All required state is specified
- COMMAND_GENERATE_MIPS
- COMMAND_RESOLVE_RENDERTEXTURE
- COMMAND_COPY_RENDERTEXTURE
- COMMAND_COPY_RESOURCE
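
With so few command types, a back end can reduce to a single switch. The sketch below uses the five command names from the slide; the `Command` struct, the `BackendStats` counters and the `execute` function are assumptions standing in for real API calls.

```cpp
#include <vector>

enum CommandType {
    COMMAND_RENDER_BATCHES,
    COMMAND_GENERATE_MIPS,
    COMMAND_RESOLVE_RENDERTEXTURE,
    COMMAND_COPY_RENDERTEXTURE,
    COMMAND_COPY_RESOURCE
};

struct Command { CommandType type; };

// Hypothetical tallies; each increment stands in for back-end work.
struct BackendStats {
    int draws = 0;
    int mipGenerations = 0;
    int transfers = 0;
};

// Walks a queued frame and dispatches each command by type.
void execute(const std::vector<Command>& frame, BackendStats& stats) {
    for (const Command& c : frame) {
        switch (c.type) {
            case COMMAND_RENDER_BATCHES:        ++stats.draws; break;
            case COMMAND_GENERATE_MIPS:         ++stats.mipGenerations; break;
            case COMMAND_RESOLVE_RENDERTEXTURE:
            case COMMAND_COPY_RENDERTEXTURE:
            case COMMAND_COPY_RESOURCE:         ++stats.transfers; break;
        }
    }
}
```

A DX9 and a D3D11 back end would each only need to implement these five cases, which fits the earlier goal of making DX9 ‘look like’ DX11.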

Packetized Rendering
[Diagram labels: Rendering System; Command Stream; Rendering Engine; D3D/Driver]

Step 3: Threading
[Diagram labels: Job Manager; Job (repeated); Command Stream (repeated); Rendering System]

Why do we queue up the entire frame?
- Would seem like additional overhead, but perf analysis shows it is a net win
- Internal command setup is super cheap, just some mem copies
- Engine cache coherency is vastly better
- D3D driver cache coherency is much better with one giant dump
- Very low % of total CPU time spent in submission
- Allows us to filter redundant D3D calls; call overhead adds up
- Fast even in DX9
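
Filtering redundant calls becomes straightforward once the whole frame sits in a queue. A minimal sketch, assuming state is modelled as numbered slots holding non-negative integer values; `StateChange` and `filterRedundant` are hypothetical names, and the slides do not describe how the real filter works.

```cpp
#include <vector>

struct StateChange { int slot; int value; };

// Drops any change that would set a slot to the value it already holds.
// Slots start as "unknown" (-1), so the first write always goes through.
// Assumes slot indices are in [0, slotCount) and values are non-negative.
std::vector<StateChange> filterRedundant(const std::vector<StateChange>& queued,
                                         int slotCount) {
    std::vector<int> current(slotCount, -1);
    std::vector<StateChange> needed;
    for (const StateChange& change : queued) {
        if (current[change.slot] == change.value) continue;  // already set
        current[change.slot] = change.value;
        needed.push_back(change);
    }
    return needed;
}
```

Only the surviving changes would be forwarded to the API, so each skipped entry saves one call's worth of overhead.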

Implementation advantages
- Once the ‘stateless’ concept is grasped, code maintenance is easy
- Next to no state leaking (flickering alpha, textures, etc.)
- Because rendering is packetized, individual jobs need little or no communication with each other
- NO THREADING BUGS
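
The reason packetized jobs need no communication can be illustrated with a small sketch: each job records into its own command stream, and the streams are joined in a fixed order afterwards. This is an assumption-laden toy, not the engine's job manager: it spawns one `std::thread` per job, represents commands as strings, and the names `recordJob` and `buildFrame` are hypothetical.

```cpp
#include <cstddef>
#include <functional>
#include <string>
#include <thread>
#include <vector>

using CommandStream = std::vector<std::string>;

// Each job writes only to its own stream, so no locks are needed.
void recordJob(int jobIndex, int batchCount, CommandStream& out) {
    for (int i = 0; i < batchCount; ++i)
        out.push_back("job" + std::to_string(jobIndex) +
                      ":batch" + std::to_string(i));
}

// Runs the jobs in parallel, then joins the streams in job order so the
// submitted frame is the same regardless of thread timing.
CommandStream buildFrame(int jobCount, int batchesPerJob) {
    std::vector<CommandStream> streams(jobCount);
    std::vector<std::thread> workers;
    for (int j = 0; j < jobCount; ++j)
        workers.emplace_back(recordJob, j, batchesPerJob, std::ref(streams[j]));
    for (std::thread& w : workers) w.join();

    CommandStream frame;
    for (const CommandStream& s : streams)
        frame.insert(frame.end(), s.begin(), s.end());
    return frame;
}
```

A production engine would use a pooled job system rather than a thread per job; the point here is only that the recording step shares no mutable state.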

Threaded D3D11 submission
- Top issue: generally high driver overhead for batch submission
- But: D3D11 has multithreaded submission
- Command Streams do not necessarily map 1:1 to CommandLists
- Civilization V can change how it submits via settings in the config files
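
In D3D11 itself, command lists are recorded on deferred contexts (`ID3D11Device::CreateDeferredContext`, `ID3D11DeviceContext::FinishCommandList`) and replayed on the immediate context with `ExecuteCommandList`. How Civilization V groups its command streams into those lists is not described, so the sketch below only shows one possible non-1:1 mapping: `groupStreams` is a hypothetical helper and `listCount` stands in for a config-file setting.

```cpp
#include <vector>

// Splits stream indices 0..streamCount-1 into at most `listCount`
// contiguous groups of near-equal size. Each group would be recorded
// into one command list.
std::vector<std::vector<int>> groupStreams(int streamCount, int listCount) {
    std::vector<std::vector<int>> groups;
    if (streamCount <= 0 || listCount <= 0) return groups;
    const int base = streamCount / listCount;
    const int extra = streamCount % listCount;
    int next = 0;
    for (int g = 0; g < listCount; ++g) {
        const int size = base + (g < extra ? 1 : 0);
        if (size == 0) break;  // more lists requested than streams exist
        std::vector<int> group;
        for (int i = 0; i < size; ++i) group.push_back(next++);
        groups.push_back(group);
    }
    return groups;
}
```

With `listCount` set to 1 everything lands in a single list, which approximates a single-threaded submission path for drivers where command lists bring no benefit.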

Step 4: Gloating over results
- Wildly surpassed commonly held beliefs on # of batches possible, especially with threading

Test        Driver with native CL support    Driver without CL support
Units       1686*                            931
Landmarks   1152*                            673
Lategame    3616*                            2052

*Believed to be GPU limited

Conclusions
- High-throughput rendering is possible IF:
  - Care is taken to reduce application overhead
  - Job-based, payload-based rendering
  - Redundant state and calls filtered
  - Use D3D11 command lists
- Engine can peg 12 threads at 97% (sans driver)

D3D11 Features: Tessellation Major addition to D3D11 API [Screenshot]

Terrain
- Civ5 contains one of the most complex terrain systems ever made
- Completely procedural process
- Uses the GPU to raytrace and anti-alias shadows
- Caching system to deal with cases where terrain is too big

Tessellation
- Terrain is very high detail, roughly 64x64 heightmap data per hex
- Triangle count, when zoomed out, can be in the millions
- Used tessellation as a ‘drop-in’

Tessellation Cont.
- Simple bicubic Beta Spline patches
- Adjusted global tessellation as camera moved in and out
- A strict performance increase: 10%-40% faster, on both AMD and Nvidia hardware
- More adaptive techniques would work even better, but didn’t have time to implement them

Leaders

Leader Rendering
- Largely done with DX10.1 rendering tech
- New variable bit rate compression technology implemented for D3D11
- 2.5 GB of texture data reduced to 150 MB; can be decompressed on the GPU
- Details forthcoming; research is in the publication submission process
- Extensive use of UAVs

Future Stuff, NO AO

Future Stuff (CS), AO

Q&A