The Small batch (and Other) solutions in Mantle API

Presentation transcript:

The Small batch (and Other) solutions in Mantle API Guennadi Riguer – Mantle Chief Architect

Problems: abstraction level; small batch performance; platform efficiency

Wrong Abstraction Level: unpredictable big black box. Current situation: neither fast nor simple; too high to be fast, too low to be simple

Small Batch Performance (chart, scale marked at 10K and 100K): most games today; really optimized games; where you want to be (Mantle target)
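A rough way to see why the 100K mark is demanding is to divide the frame time by the batch count. The sketch below assumes the scale counts batches (draw calls) per frame and assumes a 60 FPS target; the slide states neither, and the function name is made up for illustration.

```python
def cpu_budget_per_batch_us(batches_per_frame: int, fps: float = 60.0) -> float:
    """Microseconds of CPU time available per batch.

    Assumes (hypothetically) that the entire frame time on one thread is
    spent submitting batches; real engines have far less to spare.
    """
    if batches_per_frame <= 0 or fps <= 0:
        raise ValueError("batches_per_frame and fps must be positive")
    frame_time_us = 1_000_000.0 / fps
    return frame_time_us / batches_per_frame


# The two marks that appear on the slide's scale.
for batches in (10_000, 100_000):
    budget = cpu_budget_per_batch_us(batches)
    print(f"{batches:>7} batches/frame -> {budget:.3f} us per batch")
```

Under these assumptions the budget drops from about 1.667 us per batch at 10K to about 0.167 us per batch at 100K, which leaves little room for per-call validation or locking in the driver.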

Previous Solutions: geometry instancing; geometry shaders; texture atlases & arrays; uber-shaders; command recorders; …

Why They Failed? Development and content creation limitations; trading driver overhead for engine performance; trading CPU performance for GPU overhead

What is Mantle? Lower level API; focus on performance; empower developers to do what they want

Why Mantle? Better performance; predictable performance; developer control

Key Features

Execution Model (diagram labels): Application: app thread, …; Queues: graphics, compute, DMA; GPU: graphics, compute, DMA, …
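The execution model can be pictured with a small toy simulation: several application threads each record their own command buffer and hand it to a queue of the matching type. This is only a sketch in Python, not Mantle code; ToyDevice, CommandBuffer, and record_and_submit are hypothetical names, and the only detail taken from the slide is the three queue types.

```python
import queue
import threading
from dataclasses import dataclass, field

ENGINES = ("graphics", "compute", "dma")  # queue types named on the slide


@dataclass
class CommandBuffer:
    """Hypothetical command buffer: a plain list owned by one thread."""
    engine: str
    commands: list = field(default_factory=list)

    def record(self, command: str) -> None:
        # Recording touches only thread-local state, so it needs no lock.
        self.commands.append(command)


class ToyDevice:
    """Hypothetical device exposing one submission queue per engine type."""

    def __init__(self) -> None:
        self.queues = {engine: queue.Queue() for engine in ENGINES}

    def submit(self, cmd_buf: CommandBuffer) -> None:
        # Submission is the only point where threads share state.
        self.queues[cmd_buf.engine].put(cmd_buf)

    def drain(self, engine: str) -> list:
        """Stand-in for the GPU consuming everything queued for an engine."""
        executed = []
        q = self.queues[engine]
        while not q.empty():
            executed.extend(q.get().commands)
        return executed


def record_and_submit(device: ToyDevice, engine: str, thread_id: int, count: int) -> None:
    cmd_buf = CommandBuffer(engine)
    for i in range(count):
        cmd_buf.record(f"{engine}-cmd-{thread_id}-{i}")
    device.submit(cmd_buf)


device = ToyDevice()
threads = [
    threading.Thread(target=record_and_submit, args=(device, engine, tid, 3))
    for tid, engine in enumerate(ENGINES)
]
for t in threads:
    t.start()
for t in threads:
    t.join()

executed = {engine: device.drain(engine) for engine in ENGINES}
print({engine: len(cmds) for engine, cmds in executed.items()})
```

The design point the sketch tries to capture is that recording is per-thread and unsynchronized, while the queues are the single shared hand-off to the GPU.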

Memory & Resources: application controls memory; application handles hazards; generalized resources
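"Application handles hazards" can be illustrated with a toy state machine in which the application, not the driver, issues every state change before a resource is reused. The sketch is an assumption-laden illustration in Python: Resource, transition, use, and the state names are hypothetical, and the check inside use stands in for an optional debug layer rather than anything the slide describes.

```python
from dataclasses import dataclass


@dataclass
class Resource:
    """Hypothetical GPU resource with an application-tracked usage state."""
    name: str
    state: str = "undefined"


def transition(resource: Resource, new_state: str) -> None:
    """Explicit state change issued by the application (barrier-like)."""
    resource.state = new_state


def use(resource: Resource, required_state: str) -> str:
    """Toy debug check; a thin driver would trust the application instead."""
    if resource.state != required_state:
        raise RuntimeError(
            f"hazard: {resource.name} is '{resource.state}', "
            f"needs '{required_state}'"
        )
    return f"{resource.name} used as {required_state}"


shadow_map = Resource("shadow_map")
log = []

# Pass 1: render into the shadow map.
transition(shadow_map, "render_target")
log.append(use(shadow_map, "render_target"))

# Pass 2: sample it in a shader. The application must transition first.
transition(shadow_map, "shader_read")
log.append(use(shadow_map, "shader_read"))

print(log)
```

Skipping the second transition would raise in this toy; in a real low-level API the equivalent mistake is the application's responsibility to avoid.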

Pre-build & Pre-validate: pipelines; resource binding; multi-use command buffers
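The idea of paying validation cost once and replaying the result many times can be sketched as follows. This is a toy model in Python under stated assumptions: Pipeline, CommandBuffer, submit, and the shader names are hypothetical and do not correspond to Mantle entry points.

```python
class Pipeline:
    """Hypothetical pipeline object, validated once at creation time."""
    validations = 0  # counts how often the expensive check runs

    def __init__(self, vertex_shader: str, pixel_shader: str) -> None:
        if not vertex_shader or not pixel_shader:
            raise ValueError("both shader stages are required")
        Pipeline.validations += 1
        self.vertex_shader = vertex_shader
        self.pixel_shader = pixel_shader


class CommandBuffer:
    """Hypothetical multi-use command buffer: record once, submit often."""

    def __init__(self) -> None:
        self.commands = []
        self.sealed = False

    def _record(self, command: tuple) -> None:
        if self.sealed:
            raise RuntimeError("command buffer already ended")
        self.commands.append(command)

    def bind_pipeline(self, pipeline: Pipeline) -> None:
        self._record(("bind_pipeline", pipeline))

    def draw(self, vertex_count: int) -> None:
        self._record(("draw", vertex_count))

    def end(self) -> None:
        self.sealed = True


def submit(cmd_buf: CommandBuffer) -> int:
    """Replay a sealed buffer; no re-validation happens here."""
    if not cmd_buf.sealed:
        raise RuntimeError("command buffer must be ended before submit")
    return len(cmd_buf.commands)


pipeline = Pipeline("vs_main", "ps_main")  # validated here, once

cmd_buf = CommandBuffer()
cmd_buf.bind_pipeline(pipeline)
for _ in range(100):
    cmd_buf.draw(36)
cmd_buf.end()

# Reuse the same recorded buffer for three frames.
replayed = [submit(cmd_buf) for _ in range(3)]
print(Pipeline.validations, replayed)
```

In this toy the per-frame cost is only the replay of an already recorded list, while the state check runs a single time when the pipeline is built.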

Platform Considerations: APUs & SoCs are here; no longer CPU vs. GPU; race to low power

Power Efficiency - StarSwarm (charts: FPS, Power (W), FPS/W). The selection of resolutions and settings was made to create both CPU-limited and GPU-limited cases for completeness of the produced results. StarSwarm using RTS preset @1080p running on APU A10-7800 @ 3.5GHz, 4GB 2133MHz RAM, A88X-Pro M/B.

Power Efficiency - Games (charts: FPS, Power (W), FPS/W). The selection of resolutions and settings was made to create both CPU-limited and GPU-limited cases for completeness of the produced results. Battlefield 4 BrokenFlightDeck level @720p and @1080p, MEDIUM settings with SSAO enabled, running on APU A10-7800 @ 3.5GHz, 4GB 2133MHz RAM, A88X-Pro M/B. Thief built-in benchmark running on the same hardware with low settings at 720x480 and normal settings at 720p.
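The FPS/W metric on the power efficiency charts is simply frame rate divided by power draw. The transcript does not preserve the measured values, so the sketch below uses placeholder inputs that are purely hypothetical and are not results from the talk; the function names are also made up for illustration.

```python
def fps_per_watt(fps: float, power_w: float) -> float:
    """Frames per second delivered per watt of platform power."""
    if power_w <= 0:
        raise ValueError("power_w must be positive")
    return fps / power_w


def relative_gain(baseline: float, candidate: float) -> float:
    """Fractional improvement of candidate over baseline (0.5 means +50%)."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return candidate / baseline - 1.0


# Placeholder inputs, NOT measurements from the slides.
baseline_eff = fps_per_watt(fps=30.0, power_w=60.0)
candidate_eff = fps_per_watt(fps=45.0, power_w=60.0)

print(f"baseline:  {baseline_eff:.2f} FPS/W")
print(f"candidate: {candidate_eff:.2f} FPS/W")
print(f"gain:      {relative_gain(baseline_eff, candidate_eff):+.0%}")
```

Reporting FPS/W next to raw FPS matters for the CPU-limited and GPU-limited cases alike, since a higher frame rate at higher power may not be an efficiency win.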

Lessons Learned

Think Outside the Box: don’t just make something faster… avoid doing it completely; design API and driver together

New Driver Design: even a bit of sync kills many-core performance; “thick” driver = cache pollution; “make it the application’s problem”

Lots of Little Things… The whole is greater than the sum of its parts

Future HW Considerations: HW small batch, anyone? Command processing bottlenecks; more operations/batches in flight

Challenges: programming is harder; ecosystem must change; applications must “do the right thing”

Summary: Mantle fixes abstraction level; Mantle improves platform efficiency; Mantle leads industry transformation

Questions?

Disclaimer & Attribution The information presented in this document is for informational purposes only and may contain technical inaccuracies, omissions and typographical errors. The information contained herein is subject to change and may be rendered inaccurate for many reasons, including but not limited to product and roadmap changes, component and motherboard version changes, new model and/or product releases, product differences between differing manufacturers, software changes, BIOS flashes, firmware upgrades, or the like. AMD assumes no obligation to update or otherwise correct or revise this information. However, AMD reserves the right to revise this information and to make changes from time to time to the content hereof without obligation of AMD to notify any person of such revisions or changes. AMD MAKES NO REPRESENTATIONS OR WARRANTIES WITH RESPECT TO THE CONTENTS HEREOF AND ASSUMES NO RESPONSIBILITY FOR ANY INACCURACIES, ERRORS OR OMISSIONS THAT MAY APPEAR IN THIS INFORMATION. AMD SPECIFICALLY DISCLAIMS ANY IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR ANY PARTICULAR PURPOSE. IN NO EVENT WILL AMD BE LIABLE TO ANY PERSON FOR ANY DIRECT, INDIRECT, SPECIAL OR OTHER CONSEQUENTIAL DAMAGES ARISING FROM THE USE OF ANY INFORMATION CONTAINED HEREIN, EVEN IF AMD IS EXPRESSLY ADVISED OF THE POSSIBILITY OF SUCH DAMAGES. ATTRIBUTION © 2014 Advanced Micro Devices, Inc. All rights reserved. AMD, the AMD Arrow logo and combinations thereof are trademarks of Advanced Micro Devices, Inc. in the United States and/or other jurisdictions. DirectX is a registered trademark of Microsoft Corporation.