1
Firaxis LORE And other uses of D3D11
2
Low Overhead Rendering Engine
Or, how I learned to Render 15,000+ batches at 60 FPS
3
Overview
Civ 5 is a big game; it covers 6000 years of history.
The entire map can be populated (or polluted) with all sorts of things the user creates.
Need to be able to render a huge number of possibly disparate object types.
5
Early Goals
Build a brand new engine for Civilization V.
Like the game, we wanted the graphics engine to be able to ‘stand the test of time’.
Decided, while D3D11 was in alpha, to build the engine natively for the D3D11 architecture and map backwards to DX9.
6
Step 1: Cutting the overhead down
Shaders start in Firaxis Shading Language (FSL), a superset of HLSL.
FSL compiles into a CPP and header file: all shader constants are mapped to structs, grouped into packages where all packages have the same bindings.
Model code is templated: the FSL-generated header is bound with the template code.
The result is a tiny amount of code that fills out the required shading data and barely shows up in profiling.
[Diagram: FSL Files, CPP / H, Template Code, Compile Time, Glue Code]
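The slides do not show FSL output, so the following is only a minimal sketch of the idea, assuming the generated header exposes a plain constant struct with a known binding slot. `UnitShaderConstants`, `PayloadStream`, `ModelBinder` and the slot number are hypothetical names chosen for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <type_traits>
#include <vector>

// Hypothetical stand-in for a header the FSL compiler might emit:
// shader constants become a plain struct with a fixed binding slot.
struct UnitShaderConstants {
    static constexpr unsigned kBindingSlot = 2;  // assumed slot, not from the slides
    float worldMatrix[16];
    float tintColor[4];
};

// Hypothetical payload buffer: constants are appended with a single memcpy.
struct PayloadStream {
    std::vector<unsigned char> bytes;

    template <typename Constants>
    std::size_t Write(const Constants& c) {
        static_assert(std::is_trivially_copyable<Constants>::value,
                      "constants must be memcpy-able");
        const std::size_t offset = bytes.size();
        bytes.resize(offset + sizeof(Constants));
        std::memcpy(bytes.data() + offset, &c, sizeof(Constants));
        return offset;  // where this payload starts in the stream
    }

    template <typename Constants>
    Constants Read(std::size_t offset) const {
        assert(offset + sizeof(Constants) <= bytes.size());
        Constants c;
        std::memcpy(&c, bytes.data() + offset, sizeof(Constants));
        return c;
    }
};

// Templated model code: bound at compile time to whichever generated
// struct a model uses, so no runtime lookup of constant names is needed.
template <typename Constants>
struct ModelBinder {
    static std::size_t Submit(PayloadStream& out, const Constants& c) {
        return out.Write(c);
    }
    static constexpr unsigned Slot() { return Constants::kBindingSlot; }
};
```

Calling `ModelBinder<UnitShaderConstants>::Submit(stream, constants)` appends one payload and returns its offset; the per-draw cost in this sketch is one buffer resize and one memory copy.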
7
Step 2: Abstracting the Rendering
Still have to support DX9, and might have to support consoles in the future.
Might have to write a ‘driver’.
Our solution: make DX9 ‘look like’ DX11.
Started with as restricted a design as possible, and expanded as we needed to.
8
Packetized Rendering
Stateless rendering, much simpler than D3D.
Command based: all rendering is performed by self-contained commands.
A command set may contain a list of surfaces to render, each with a shader constant payload.
A surface is an immutable bundle of an IB, VB, textures, shader def, etc.
All state is bundled into packages: alpha state, Z state, etc. Commands reference one of these state packages.
The entire frame is queued up.
Minimal per-frame allocation.
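The data layout described above could be sketched roughly as follows. All type and field names (`Handle`, `Surface`, `StatePackage`, `Batch`, `RenderCommand`, `FrameQueue`) are hypothetical; the slides name the concepts but not the actual structures.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

using Handle = std::uint32_t;  // hypothetical opaque resource id

// Bundle of geometry, textures and shader definition; treated as
// immutable after creation (batches only hold const pointers to it).
struct Surface {
    Handle indexBuffer;
    Handle vertexBuffer;
    Handle textureBundle;
    Handle shaderDef;
};

// All fixed-function state travels together in one package.
struct StatePackage {
    Handle alphaState;
    Handle zState;
};

// One surface to draw plus the location of its shader constant payload.
struct Batch {
    const Surface* surface;
    std::uint32_t payloadOffset;
};

// A self-contained command: it names its state package explicitly,
// so nothing is inherited from whatever was rendered before it.
struct RenderCommand {
    const StatePackage* state;
    std::vector<Batch> batches;
};

// The whole frame is queued before anything is submitted.
struct FrameQueue {
    std::vector<RenderCommand> commands;

    std::size_t BatchCount() const {
        std::size_t total = 0;
        for (const RenderCommand& c : commands) total += c.batches.size();
        return total;
    }

    // clear() keeps the outer vector's capacity for the next frame.
    void Reset() { commands.clear(); }
};
```

Because every `RenderCommand` carries a pointer to its complete state package, a command can be replayed in isolation, which is what makes the rendering stateless from the caller's point of view.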
9
Only 5 Types of commands
COMMAND_RENDER_BATCHES: a list of surfaces to render into 1 or more render targets, with alpha and Z state bundles. Surfaces have IB, VB, sampler and texture bundles. All required state is specified.
COMMAND_GENERATE_MIPS
COMMAND_RESOLVE_RENDERTEXTURE
COMMAND_COPY_RENDERTEXTURE
COMMAND_COPY_RESOURCE
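With only five command types, a backend has a very small surface to implement. The sketch below assumes a hypothetical `Backend` interface that a DX9 or DX11 implementation would override; only the five `COMMAND_*` names come from the slides, and `COMMAND_TYPE_COUNT`, `Dispatch` and `LoggingBackend` are illustrative.

```cpp
#include <cassert>
#include <string>

enum CommandType {
    COMMAND_RENDER_BATCHES,
    COMMAND_GENERATE_MIPS,
    COMMAND_RESOLVE_RENDERTEXTURE,
    COMMAND_COPY_RENDERTEXTURE,
    COMMAND_COPY_RESOURCE,
    COMMAND_TYPE_COUNT  // hypothetical sentinel, not from the slides
};

// Hypothetical backend interface: one entry point per command type.
struct Backend {
    virtual ~Backend() {}
    virtual void RenderBatches() = 0;
    virtual void GenerateMips() = 0;
    virtual void ResolveRenderTexture() = 0;
    virtual void CopyRenderTexture() = 0;
    virtual void CopyResource() = 0;
};

// Routes a queued command to the matching backend entry point.
inline void Dispatch(CommandType type, Backend& backend) {
    switch (type) {
        case COMMAND_RENDER_BATCHES:        backend.RenderBatches();        break;
        case COMMAND_GENERATE_MIPS:         backend.GenerateMips();         break;
        case COMMAND_RESOLVE_RENDERTEXTURE: backend.ResolveRenderTexture(); break;
        case COMMAND_COPY_RENDERTEXTURE:    backend.CopyRenderTexture();    break;
        case COMMAND_COPY_RESOURCE:         backend.CopyResource();         break;
        case COMMAND_TYPE_COUNT:
            assert(false && "sentinel is not a real command");
            break;
    }
}

// Toy backend that records which entry points were hit.
struct LoggingBackend : Backend {
    std::string log;
    void RenderBatches() override        { log += "render;"; }
    void GenerateMips() override         { log += "mips;"; }
    void ResolveRenderTexture() override { log += "resolve;"; }
    void CopyRenderTexture() override    { log += "copy_rt;"; }
    void CopyResource() override         { log += "copy_resource;"; }
};
```

A real backend would take the command's payload as an argument; it is omitted here to keep the dispatch shape visible.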
10
Packetized Rendering
[Diagram: Rendering System, Command Stream, Rendering Engine, D3D/Driver]
11
Step 3: Threading
[Diagram: Job Manager, Jobs, Command Streams, Rendering System]
12
Why do we queue up entire Frame?
It would seem like additional overhead, but perf analysis shows it is a net win.
Internal command setup is super cheap, just some memory copies.
Engine cache coherency is vastly better.
D3D driver cache coherency is much better with one giant dump.
A very low % of total CPU time is spent in submission.
Allows us to filter redundant D3D calls; call overhead adds up.
Fast even in DX9.
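Filtering redundant calls is straightforward once the whole frame is available as data. The following is a minimal sketch of that idea, assuming each queued draw names its alpha and Z state by id; `QueuedDraw`, `SubmitStats` and `SubmitFrame` are hypothetical, and the counters stand in for real API calls.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

const std::uint32_t kUnset = 0xFFFFFFFFu;  // "no state bound yet"

// Hypothetical queued draw: state is referenced by id.
struct QueuedDraw {
    std::uint32_t alphaState;
    std::uint32_t zState;
};

struct SubmitStats {
    int stateCalls = 0;  // state changes that would reach the API
    int drawCalls = 0;
};

// Walks the queued frame and only issues a state change when the
// requested state differs from what is already bound.
inline SubmitStats SubmitFrame(const std::vector<QueuedDraw>& frame) {
    SubmitStats stats;
    std::uint32_t boundAlpha = kUnset;
    std::uint32_t boundZ = kUnset;
    for (const QueuedDraw& d : frame) {
        if (d.alphaState != boundAlpha) {
            boundAlpha = d.alphaState;
            ++stats.stateCalls;
        }
        if (d.zState != boundZ) {
            boundZ = d.zState;
            ++stats.stateCalls;
        }
        ++stats.drawCalls;
    }
    return stats;
}
```

For four draws that use alpha states 1, 1, 2, 2 and the same Z state, this issues three state changes instead of eight.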
13
Implementation advantages
Once the ‘stateless’ concept is grasped, code maintenance is easy.
Next to no state leaking (flickering alpha, textures, etc.).
Because rendering is packetized, individual jobs need little or no communication with each other.
NO THREADING BUGS
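The reason jobs need so little communication can be illustrated with a small sketch in which each job records into its own command stream. This uses plain `std::thread` rather than the engine's job manager, and `Command`, `CommandStream` and `BuildFrame` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

struct Command { int surfaceId; };           // hypothetical minimal command
using CommandStream = std::vector<Command>;  // one stream per job

// Each job writes only to its own stream, so recording needs no locks.
// The streams are handed to the submission stage after all jobs finish.
inline std::vector<CommandStream> BuildFrame(int jobCount, int surfacesPerJob) {
    assert(jobCount >= 0 && surfacesPerJob >= 0);
    std::vector<CommandStream> streams(static_cast<std::size_t>(jobCount));
    std::vector<std::thread> workers;
    workers.reserve(streams.size());
    for (int job = 0; job < jobCount; ++job) {
        workers.emplace_back([&streams, job, surfacesPerJob] {
            CommandStream& out = streams[static_cast<std::size_t>(job)];
            for (int i = 0; i < surfacesPerJob; ++i) {
                out.push_back(Command{job * surfacesPerJob + i});
            }
        });
    }
    for (std::thread& t : workers) t.join();
    return streams;
}
```

The only synchronization point in this sketch is the final `join`; the stream order is fixed by job index, so the result does not depend on thread scheduling.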
14
Threaded D3D11 submission
Top issues: generally high driver overhead for batch submission.
But: D3D11 has multithreaded submission.
Command streams do not necessarily map 1:1 to command lists.
Civilization V can change how it submits via settings in the config files.
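In D3D11, a deferred context records work into a command list with `ID3D11DeviceContext::FinishCommandList`, and the immediate context replays it with `ExecuteCommandList`. How Civilization V groups its command streams into command lists is not described in the slides; the sketch below assumes a simple contiguous partition driven by a hypothetical "number of command lists" setting. `StreamRange` and `PartitionStreams` are illustrative names.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Half-open range [begin, end) of command stream indices
// that would be recorded into one command list.
struct StreamRange {
    std::size_t begin;
    std::size_t end;
};

// Splits streamCount streams across at most requestedLists command lists,
// keeping stream order and balancing the list sizes to within one stream.
inline std::vector<StreamRange> PartitionStreams(std::size_t streamCount,
                                                 std::size_t requestedLists) {
    std::vector<StreamRange> ranges;
    if (streamCount == 0) return ranges;
    // Clamp to [1, streamCount]: never zero lists, never an empty list.
    const std::size_t lists =
        std::max<std::size_t>(1, std::min(requestedLists, streamCount));
    const std::size_t base = streamCount / lists;
    const std::size_t extra = streamCount % lists;
    std::size_t begin = 0;
    for (std::size_t i = 0; i < lists; ++i) {
        const std::size_t count = base + (i < extra ? 1 : 0);
        ranges.push_back(StreamRange{begin, begin + count});
        begin += count;
    }
    assert(begin == streamCount);
    return ranges;
}
```

With 10 streams and a setting of 4 lists this yields ranges of 3, 3, 2 and 2 streams; a setting of 1 collapses everything into a single list.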
15
Step 4: Gloating over results
Wildly surpassed commonly held beliefs on the number of batches possible, especially with threading.

Test        Driver with native CL support   Driver without CL support
Units       1686*                           931
Landmarks   1152*                           673
Lategame    3616*                           2052

*Believed to be GPU limited
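The slide does not state the unit of these figures, but the ratio between the two columns is unit-free. A small helper makes the comparison explicit; `Speedup` is a hypothetical name.

```cpp
#include <cassert>

// Ratio of the result with native command list (CL) support
// to the result without it.
inline double Speedup(double withCommandLists, double withoutCommandLists) {
    assert(withoutCommandLists > 0.0);
    return withCommandLists / withoutCommandLists;
}
```

Applied to the table, this gives roughly 1.81x for Units (1686 / 931), 1.71x for Landmarks (1152 / 673) and 1.76x for Lategame (3616 / 2052). Since the starred results are believed to be GPU limited, these ratios may understate the CPU-side gain.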
17
Conclusions
High-throughput rendering is possible IF:
Care is taken to reduce application overhead
Rendering is job based and payload based
Redundant state and calls are filtered
D3D11 command lists are used
Engine can peg 12 threads at 97% (sans driver)
18
D3D11 Features: Tessellation
Major addition to D3D11 API [Screenshot]
19
Terrain
Civ5 contains one of the most complex terrain systems ever made.
Complete procedural process.
Uses the GPU to raytrace and anti-alias shadows.
Caching system to deal with cases where the terrain is too big.
20
Tessellation
Terrain is very high detail, roughly 64x64 heightmap data per hex.
Triangle count, when zoomed out, can be in the millions.
Used tessellation as a ‘drop-in’.
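A quick back-of-the-envelope check shows how fast the triangle count grows. The sketch assumes each hex is triangulated as a regular grid at full heightmap resolution, two triangles per grid cell; the engine's actual mesh layout is not given in the slides, and both function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Triangles in a regular grid with samplesPerSide x samplesPerSide
// height samples: (n - 1)^2 cells, two triangles per cell.
inline std::uint64_t TrianglesPerHex(std::uint64_t samplesPerSide) {
    assert(samplesPerSide >= 2);
    const std::uint64_t cells = (samplesPerSide - 1) * (samplesPerSide - 1);
    return cells * 2;
}

// Number of hexes needed to reach a given triangle count (rounded up).
inline std::uint64_t HexesToReach(std::uint64_t triangleCount,
                                  std::uint64_t samplesPerSide) {
    const std::uint64_t perHex = TrianglesPerHex(samplesPerSide);
    return (triangleCount + perHex - 1) / perHex;
}
```

Under that assumption a 64x64 hex has 7,938 triangles, so 126 visible hexes already exceed one million triangles.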
21
Tessellation Cont.
Simple bicubic Beta-spline patches.
Adjusted global tessellation as the camera moved in and out.
A strict performance increase: 10%-40% faster, on both AMD and Nvidia hardware.
More adaptive techniques would work even better, but we didn’t have time to implement them.
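A single global tessellation factor driven by camera height might look like the sketch below. The linear falloff and the `nearHeight` / `farHeight` parameters are assumptions for illustration; the slides only say the factor was adjusted as the camera moved. The 1 to 64 range is the limit D3D11 places on tessellation factors.

```cpp
#include <algorithm>
#include <cassert>

const float kMinTessFactor = 1.0f;   // D3D11 lower bound
const float kMaxTessFactor = 64.0f;  // D3D11 upper bound

// Returns the maximum factor when the camera is at or below nearHeight,
// the minimum factor at or above farHeight, and a linear blend between.
inline float GlobalTessFactor(float cameraHeight, float nearHeight, float farHeight) {
    assert(farHeight > nearHeight);
    float t = (cameraHeight - nearHeight) / (farHeight - nearHeight);
    t = std::min(1.0f, std::max(0.0f, t));  // clamp to [0, 1]
    return kMaxTessFactor + t * (kMinTessFactor - kMaxTessFactor);
}
```

The result would typically be written to a constant buffer once per frame and read by the hull shader for every patch, which is what keeps this approach a drop-in change.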
22
Leaders
23
Leader Rendering
Largely done with DX10.1 rendering tech.
New variable bit rate compression technology implemented for D3D11.
2.5 GB of texture data reduced to 150 MB, which can be decompressed on the GPU.
Details forthcoming; research is in the publication submission process. Extensive use of UAVs.
24
Future Stuff, NO AO
25
Future Stuff (CS), AO
26
Q&A