The Small Batch (and Other) Solutions in Mantle API
Guennadi Riguer, Mantle Chief Architect
Problems: abstraction level; small batch performance; platform efficiency.
Wrong Abstraction Level: the current abstraction is an unpredictable big black box. It is too high-level to be fast and too low-level to be simple, so the current situation is neither fast nor simple.
Small Batch Performance: [Figure: batch-count scale from 10K to 100K, marking "Most games today", "Really optimized games", and "Where you want to be (Mantle target)".]
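To see why the gap between 10K and 100K matters, assume (the slide does not state the unit) that the figures are draw calls per frame, and that the whole frame could be spent on submission. The per-draw CPU budget is then the frame time divided by the draw count. A minimal C sketch; `ns_per_draw` is a hypothetical helper, not part of any API:

```c
#include <assert.h>

/* Hypothetical helper: average CPU time available per draw call, in
 * nanoseconds, under the simplifying assumption that the entire frame is
 * spent submitting draws (no game logic, no other driver work). */
static double ns_per_draw(double frames_per_second, double draws_per_frame)
{
    assert(frames_per_second > 0.0);
    assert(draws_per_frame > 0.0);
    const double frame_ns = 1e9 / frames_per_second; /* frame time in ns */
    return frame_ns / draws_per_frame;
}
```

At 60 fps, `ns_per_draw(60.0, 10000.0)` is about 1667 ns and `ns_per_draw(60.0, 100000.0)` about 167 ns: under these assumptions, reaching the higher target means cutting per-draw CPU cost roughly tenfold.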
Previous Solutions: geometry instancing; geometry shaders; texture atlases and arrays; uber-shaders; command recorders; …
Why Did They Fail? Development and content-creation limitations; trading driver overhead for engine performance; trading CPU performance for GPU overhead.
What Is Mantle? A lower-level API with a focus on performance, built to empower developers to do what they want.
Why Mantle? Better performance; predictable performance; developer control.
Key Features
Execution Model: [Diagram: application threads submit work into GPU queues (graphics, compute, and DMA) that feed the GPU; the final build of the diagram shows a second GPU with its own graphics, compute, and DMA queues.]
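The queue structure can be modeled in a few lines of C. This is a single-threaded sketch under stated assumptions: one FIFO per engine type (graphics, compute, DMA), with the application appending command buffer IDs and each GPU engine draining only its own queue. `QueueType`, `queue_submit`, and `queue_consume` are hypothetical names, not Mantle API identifiers, and a real multi-threaded submitter would need synchronization that this sketch omits.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of the execution model: one queue per engine type.
 * None of these names are Mantle API identifiers. */
typedef enum { QUEUE_GRAPHICS, QUEUE_COMPUTE, QUEUE_DMA, QUEUE_COUNT } QueueType;

#define QUEUE_CAPACITY 64

typedef struct {
    int    cmd_buffers[QUEUE_CAPACITY]; /* IDs of submitted command buffers */
    size_t head;                        /* next entry the engine will consume */
    size_t tail;                        /* next free slot for the application */
} Queue;

typedef struct {
    Queue queues[QUEUE_COUNT]; /* each engine has its own independent queue */
} Gpu;

/* Application side: append a recorded command buffer to the chosen queue.
 * Returns 0 on success, -1 if the queue is full. Not thread-safe. */
static int queue_submit(Gpu *gpu, QueueType type, int cmd_buffer_id)
{
    assert(type < QUEUE_COUNT);
    Queue *q = &gpu->queues[type];
    if (q->tail - q->head == QUEUE_CAPACITY)
        return -1;
    q->cmd_buffers[q->tail % QUEUE_CAPACITY] = cmd_buffer_id;
    q->tail++;
    return 0;
}

/* Engine side: take the oldest command buffer from this engine's queue.
 * Returns its ID, or -1 if the queue is empty. */
static int queue_consume(Gpu *gpu, QueueType type)
{
    assert(type < QUEUE_COUNT);
    Queue *q = &gpu->queues[type];
    if (q->head == q->tail)
        return -1;
    int id = q->cmd_buffers[q->head % QUEUE_CAPACITY];
    q->head++;
    return id;
}
```

Because each engine consumes only its own queue, a DMA upload submitted after a graphics job does not wait behind it in a single stream; any ordering between queues would have to be expressed separately, which this sketch does not model.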
Memory & Resources: the application controls memory; the application handles hazards; resources are generalized.
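One way an application can take control of memory is to reserve a large block up front and sub-allocate resources from it itself, rather than requesting a separate allocation per resource. The sketch below is a minimal linear (bump) sub-allocator in C, assuming power-of-two alignment. `LinearHeap`, `heap_alloc`, and `heap_reset` are hypothetical names; a real engine would track offsets inside a GPU memory object rather than a plain counter.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical sub-allocator over one large, application-owned block. */
typedef struct {
    uint64_t size;   /* total bytes in the block */
    uint64_t offset; /* first unused byte */
} LinearHeap;

/* Returns the offset of a new sub-allocation, or UINT64_MAX if it does not
 * fit. `alignment` must be a non-zero power of two. */
static uint64_t heap_alloc(LinearHeap *heap, uint64_t bytes, uint64_t alignment)
{
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    /* Round the current offset up to the requested alignment. */
    uint64_t aligned = (heap->offset + alignment - 1) & ~(alignment - 1);
    if (aligned > heap->size || bytes > heap->size - aligned)
        return UINT64_MAX; /* out of space */
    heap->offset = aligned + bytes;
    return aligned;
}

/* Releases every sub-allocation at once. Only safe when the GPU no longer
 * uses any of them; tracking that is the application's job. */
static void heap_reset(LinearHeap *heap)
{
    heap->offset = 0;
}
```

Because `heap_reset` frees everything in one step, it pairs naturally with per-frame transient data, but it is correct only if the application knows the GPU has finished with that frame: the hazard-handling responsibility from the same slide.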
Pre-build & Pre-validate: pipelines; resource binding; multi-use command buffers.
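Multi-use command buffers let the cost of building and validating commands be paid once and then amortized over many submissions. The C sketch below illustrates that idea under simplifying assumptions: commands are recorded into a buffer, the buffer is sealed, and it is then submitted repeatedly without being re-encoded. The command format and the names `CmdBuffer`, `cmd_record`, `cmd_end`, and `cmd_submit` are hypothetical; as a stand-in for GPU execution, `cmd_submit` just sums the vertex counts of the recorded draws.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical command encoding; real command buffers hold
 * hardware-specific packets, not this struct. */
typedef enum { CMD_BIND_PIPELINE, CMD_DRAW } CmdType;

typedef struct {
    CmdType  type;
    unsigned arg; /* pipeline ID for CMD_BIND_PIPELINE, vertex count for CMD_DRAW */
} Cmd;

#define MAX_CMDS 256

typedef struct {
    Cmd    cmds[MAX_CMDS];
    size_t count;
    int    sealed; /* set by cmd_end; contents are immutable afterwards */
} CmdBuffer;

/* Recording: this is where building (and, in a real driver, validation)
 * cost is paid, once. */
static void cmd_record(CmdBuffer *cb, CmdType type, unsigned arg)
{
    assert(!cb->sealed && cb->count < MAX_CMDS);
    cb->cmds[cb->count].type = type;
    cb->cmds[cb->count].arg = arg;
    cb->count++;
}

static void cmd_end(CmdBuffer *cb) { cb->sealed = 1; }

/* Submission reuses the recorded commands as-is: nothing is re-encoded or
 * re-validated. Stand-in for GPU execution: returns total vertices drawn. */
static unsigned long cmd_submit(const CmdBuffer *cb)
{
    assert(cb->sealed);
    unsigned long vertices = 0;
    for (size_t i = 0; i < cb->count; i++)
        if (cb->cmds[i].type == CMD_DRAW)
            vertices += cb->cmds[i].arg;
    return vertices;
}
```

Submitting the same sealed buffer every frame leaves its contents unchanged, which is what makes reuse safe without re-validation in this model.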
Platform Considerations: APUs and SoCs are here; it is no longer CPU vs. GPU; it is a race to low power.
Power Efficiency: StarSwarm
[Chart: FPS, power (W), and FPS/W.]
The selection of resolutions and settings was made to create both CPU- and GPU-limited cases for completeness of the produced results. Test configuration: StarSwarm, RTS preset, 1080p, on an A10-7800 APU at 3.5 GHz with 4 GB of 2133 MHz RAM and an A88X-Pro motherboard.
Power Efficiency: Games
[Chart: FPS, power (W), and FPS/W.]
The selection of resolutions and settings was made to create both CPU- and GPU-limited cases for completeness of the produced results. Test configurations: Battlefield 4, BrokenFlightDeck level, at 720p and 1080p with Medium settings and SSAO enabled, on an A10-7800 APU at 3.5 GHz with 4 GB of 2133 MHz RAM and an A88X-Pro motherboard; Thief built-in benchmark on the same hardware with Low settings at 720x480 and Normal settings at 720p.
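Both power slides report FPS/W. Because a watt is one joule per second, the metric reduces to frames rendered per joule of energy:

```latex
\[
\frac{\text{FPS}}{\text{W}}
  = \frac{\text{frames}/\text{s}}{\text{J}/\text{s}}
  = \frac{\text{frames}}{\text{J}}
\]
```

At a fixed frame rate, FPS/W therefore scales inversely with power draw.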
Lessons Learned
Think Outside the Box: don’t just make something faster, avoid doing it at all; design the API and driver together.
New Driver Design: even a little synchronization kills many-core performance; a “thick” driver means cache pollution; “make it the application’s problem”.
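One common way to avoid synchronization while recording is to give each worker thread its own command storage, so that recording never touches shared state; the only sync point is waiting for all workers before submission. The C sketch below shows that data layout under stated assumptions: threads are identified by a small integer `tid`; `FrameRecorder`, `record`, and `gather` are hypothetical names; and for brevity the recording calls in a usage example would come from a single thread rather than real workers.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_THREADS 8
#define CMDS_PER_THREAD 1024

/* Hypothetical per-thread recording slot. Each worker owns exactly one
 * slot, so recording needs no lock. A production version would also pad
 * slots to a cache-line boundary to avoid false sharing. */
typedef struct {
    unsigned cmds[CMDS_PER_THREAD];
    size_t   count;
} ThreadCmdBuffer;

typedef struct {
    ThreadCmdBuffer per_thread[MAX_THREADS];
} FrameRecorder;

/* Called only by thread `tid`, only on its own slot: no synchronization. */
static void record(FrameRecorder *fr, int tid, unsigned cmd)
{
    assert(tid >= 0 && tid < MAX_THREADS);
    ThreadCmdBuffer *tb = &fr->per_thread[tid];
    assert(tb->count < CMDS_PER_THREAD);
    tb->cmds[tb->count++] = cmd;
}

/* Called by the submitting thread after all workers have finished (that
 * join is the single sync point). Copies the per-thread commands into `out`
 * in thread order and returns how many were copied. */
static size_t gather(const FrameRecorder *fr, int nthreads,
                     unsigned *out, size_t out_cap)
{
    assert(nthreads >= 0 && nthreads <= MAX_THREADS);
    size_t n = 0;
    for (int t = 0; t < nthreads; t++) {
        for (size_t i = 0; i < fr->per_thread[t].count; i++) {
            assert(n < out_cap);
            out[n++] = fr->per_thread[t].cmds[i];
        }
    }
    return n;
}
```

Submission order here is fixed by thread index rather than by the order in which threads happened to record, so the result is deterministic without any locking during recording.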
Lots of Little Things… The whole is greater than the sum of its parts.
Future HW Considerations: hardware support for small batches, anyone? Key issues are command-processing bottlenecks and getting more operations and batches in flight.
Challenges: programming is harder; the ecosystem must change; applications must “do the right thing”.
Summary: Mantle fixes the abstraction level, improves platform efficiency, and leads the industry transformation.
Questions?
Attribution: © 2014 Advanced Micro Devices, Inc. All rights reserved. AMD, the AMD Arrow logo and combinations thereof are trademarks of Advanced Micro Devices, Inc. in the United States and/or other jurisdictions. DirectX is a registered trademark of Microsoft Corporation.