Porting Unity to new APIs Aras Pranckevičius Unity Technologies.

Slides:



Advertisements
Similar presentations
Introduction to Direct3D 10 Course Porting Game Engines to Direct3D 10: Crysis / CryEngine2 Carsten Wenzel.
Advertisements

GPGPU Programming Dominik G ö ddeke. 2Overview Choices in GPGPU programming Illustrated CPU vs. GPU step by step example GPU kernels in detail.
DirectX11 Performance Reloaded
Quick Review of Apr 10 material B+-Tree File Organization –similar to B+-tree index –leaf nodes store records, not pointers to records stored in an original.
June 28 th – July 1 st 2006 Implementing Usability: Insights to improve your chances  CFUnited 2007.
Tailoring Needs Chapter 3. Contents This presentation covers the following: – Design considerations for tailored data-entry screens – Design considerations.
THQ/Gas Powered Games Supreme Commander and Supreme Commander: Forged Alliance Thread for Performance.
EECE476: Computer Architecture Lecture 21: Faster Branches Branch Prediction with Branch-Target Buffers (not in textbook) The University of British ColumbiaEECE.
Data Analytics and Dynamic Languages Lee E. Edlefsen, Ph.D. VP of Engineering 1.
CSC1016 Coursework Clarification Derek Mortimer March 2010.
1 Hash-Based Indexes Yanlei Diao UMass Amherst Feb 22, 2006 Slides Courtesy of R. Ramakrishnan and J. Gehrke.
1 Programming & Programming Languages Overview l Machine operations and machine language. l Example of machine language. l Different types of processor.
SM3121 Software Technology Mark Green School of Creative Media.
Garbage Collection and High-Level Languages Programming Languages Fall 2003.
CS 4730 Level Design CS 4730 – Computer Game Design Credit: Several slides from Walker White (Cornell)
Advanced Web 2012 Lecture 4 Sean Costain PHP Sean Costain 2012 What is PHP? PHP is a widely-used general-purpose scripting language that is especially.
Chocolate Bar! luqili. Milestone 3 Speed 11% of final mark 7%: path quality and speed –Some cleverness required for full marks –Implement some A* techniques.
Mark Nelson What are game engines? Fall 2013
Unit 1 – Improving Productivity Ryan Taroni Instructions ~ 100 words per box.
Overview [See Video file] Architecture Overview.
1 Java Inheritance. 2 Inheritance On the surface, inheritance is a code re-use issue. –we can extend code that is already written in a manageable manner.
1 Software Construction and Evolution - CSSE 375 Exception Handling – Logging & Special Situations Steve Chenoweth Office: Moench Room F220 Phone: (812)
Twelve Ways to Fool the Masses When Giving Performance Results on Parallel Supercomputers – David Bailey (1991) Eileen Kraemer August 25, 2002.
Next-Generation Graphics APIs: Similarities and Differences Tim Foley NVIDIA Corporation
File System Implementation Chapter 12. File system Organization Application programs Application programs Logical file system Logical file system manages.
IT253: Computer Organization
CRT State Stuff Dana Robinson The HDF Group. In the next slide, I show a single executable linked to three dlls. Two dlls and the executable were built.
How to Build a CPU Cache COMP25212 – Lecture 2. Learning Objectives To understand: –how cache is logically structured –how cache operates CPU reads CPU.
CMSC 341 B- Trees D. Frey with apologies to Tom Anastasio.
1 Design and Integration: Part 2. 2 Plus Delta Feedback Reading and lecture repeat Ambiguous questions on quizzes Attendance quizzes Boring white lecture.
Refactoring & Testability. Testing in OOP programming No life in flexible methodologies and for refactoring- infected developers without SOME kind of.
1 Some Real Problem  What if a program needs more memory than the machine has? —even if individual programs fit in memory, how can we run multiple programs?
Operating Systems Lecture 14 Segments Adapted from Operating Systems Lecture Notes, Copyright 1997 Martin C. Rinard. Zhiqing Liu School of Software Engineering.
Final Project Ideas Patrick Cozzi University of Pennsylvania CIS Fall 2014.
Basic Systems and Software. Were we left off Computers are programmable (formal) machines. Digital information is stored as a series of two states (1.
Dynamic Gaze-Contingent Rendering Complexity Scaling By Luke Paireepinart.
Computer Graphics 3 Lecture 6: Other Hardware-Based Extensions Benjamin Mora 1 University of Wales Swansea Dr. Benjamin Mora.
EECS 473 Advanced Embedded Systems Misc. things Midterm Review.
Query Processing CS 405G Introduction to Database Systems.
CSE 143 Lecture 13 Recursive Backtracking slides created by Ethan Apter
From Use Cases to Implementation 1. Structural and Behavioral Aspects of Collaborations  Two aspects of Collaborations Structural – specifies the static.
INF3110 Group 2 EXAM 2013 SOLUTIONS AND HINTS. But first, an example of compile-time and run-time type checking Imagine we have the following code. What.
1 How will execution time grow with SIZE? int array[SIZE]; int sum = 0; for (int i = 0 ; i < ; ++ i) { for (int j = 0 ; j < SIZE ; ++ j) { sum +=
1 Threads in Java Jingdi Wang. 2 Introduction A thread is a single sequence of execution within a program Multithreading involves multiple threads of.
8/3/2007CMSC 341 BTrees1 CMSC 341 B- Trees D. Frey with apologies to Tom Anastasio.
From Use Cases to Implementation 1. Mapping Requirements Directly to Design and Code  For many, if not most, of our requirements it is relatively easy.
Lecture 5 Page 1 CS 111 Online Process Creation Processes get created (and destroyed) all the time in a typical computer Some by explicit user command.
Gallium3D Graphics Done Right
Processor support devices Part 2: Caches and the MESI protocol
CSE 374 Programming Concepts & Tools
A Real Problem What if you wanted to run a program that needs more memory than you have? September 11, 2018.
Some Real Problem What if a program needs more memory than the machine has? even if individual programs fit in memory, how can we run multiple programs?
Process Creation Processes get created (and destroyed) all the time in a typical computer Some by explicit user command Some by invocation from other running.
Morgan Kaufmann Publishers Memory & Cache
The Small batch (and Other) solutions in Mantle API
More Selections BIS1523 – Lecture 9.
Navigating huge (UE4 (rendering)) code
UMBC Graphics for Games
How to improve (decrease) CPI
Genome Workbench Chuong Huynh NIH/NLM/NCBI New Delhi, India
HW for Computer Graphics
Arithmetic and Decisions
Hold up, wait a minute, let me put some async in it
Discussing an OVS/OVN Split
UE4 Vulkan Updates & Tips
January 15, 2004 Adrienne Noble
CIS 441/541: Introduction to Computer Graphics Lecture 15: shaders
Update : about 8~16% are writes
An Incremental Rendering VM
From Use Cases to Implementation
Presentation transcript:

Porting Unity to new APIs Aras Pranckevičius Unity Technologies

Warm up quotes! “I expect DX12 to give people an appreciation for what DX11 was doing for them” (Tom Forsyth) “Low-level graphics APIs are built for engine developers, not humans” (Tim Sweeney) “GL -> Vulkan is like complaining that your Prius is too slow, so someone gives you all the pieces of a Ferrari” (Brandon Jones)

New APIs are easy when… You’re starting a new engine now, or Have a good console engine, or You have no customers, or It’s a hobby/toy engine, or Somehow made all the right decisions 10 years ago

Bending an existing engine Nai ̈ ve change to use new APIs Shader languages (only Metal & Vulkan) Resource bindings (only DX12 & Vulkan) Threaded work creation …sync, mem alloc, etc. (won’t cover those)

Nai ̈ ve porting Same engine, same structure, same ideas –Just get to whatever you had in DX11/DX9/GL Metal: –Much less code than in GLES –Much faster than iOS GLES DX12: –Similar amount of code to DX11 –Much slower than DX11 

SHADING LANGUAGES

What we have DX12: HLSL –No problem. Same as DX11. Metal: MetalSL –Nice language, but no one else uses it. –Cross-compile (currently: glsl-optimizer) Vulkan: SPIR-V –“bytecode”; will have GLSL -> SPIR-V tools.

What we want HLSL -> SPIR-V, possibly built on clang –Analysis / codegen / upgrade / refactor tools there –Extend HLSL as needed (towards C++/MetalSL like language?) SPIR-V -> LLVM -> SPIR-V xplatform optimizer Platform backends: SPIR-V to... –HLSL/PSSL (D3D9/11/12, XB1, PS4 etc.) –GLSL (GL/ES) –MetalSL (Metal)

RESOURCE BINDINGS

Why so slow? Initial DX12/Vulkan port will not run fast Lots of “late” decisions –e.g. at Draw time, “so what has changed? which tables need update?” etc. –That is the “driver cost”! But drivers were heavily optimized already. Your new code is not.

APIs assume that… You have a lot of knowledge –Which things change and when –Which things don’t change –Layouts of shader resources / constants Ideally you have that knowledge! –Your existing code might not (yet)

Example Constant/Uniform buffers –actually started a while ago in DX10/GL3 Ideal use case: shader CB matches C struct –Works if engine & shaders are developed in sync –We don’t have this luxury :( –People can be on latest Unity, and use shaders written 3 years ago No clear fit into ideal CB/UB model!

More assumption examples Similar now with DX12 descriptor tables –“you can prebuild a lot of static ones upfront” Or with Pipeline State Objects –“you know final PSO state quite early on” In current Unity code that’s not quite true –…yet

Resource Binding, Now Descriptor tables (DX12) & CBs (DX12/Metal): –Linearly allocate –Mostly one-time use only –Essentially everything is treated as dynamic data

Resource Binding, Soon “per draw” vs “everything else” frequencies Fast path when shader data matches what code knows about –Generic “it works” code otherwise

THREADED WORK CREATION

Threaded Work Creation Command buffer “creation” and “submission” separate Can create from multiple threads! –Just like some consoles could for a while

Current rendering (Unity 5.2) Dual-thread –MainThread: high level render logic –RenderThread: does API calls and a bit more –Ringbuffer with our own “command buffer” in between Works on all platforms/APIs –…except WebGL since no threads there yet

Threading the renderer Just make rendering logic etc. be freely threadable –Turns out: quite a lot of mutable/global state  e.g. “I’ll just have a small cache here” –make it per-thread cache? –thread-safe cache? –stop caching? –do something else? and so on

Current DX12 situation Unity 5.2 beta –No big high level changes –No threaded cmdbuffers –All DX12 stuff is on one thread DX12 results not bad –But not great yet either Will improve: –Handle resources better –Threaded cmdbuffers

Current general situation Right now (Unity 5.2) –Not taking full advantage of new APIs yet –A lot of code cleanups happened, benefited all platforms Soon TM –More cleanups and threading for all platforms –Better resource binding for DX12 (and 11 too!) –More threading for DX12/Metal/consoles..we’re not doing Vulkan just yet, but keeping it in mind