Panel Discussion: The Future of I/O From a CPU Architecture Perspective
Brad Benton, AMD, Inc.
#OFADevWorkshop, March 30 – April 2, 2014

Issues
– The move to exascale involves more parallel processing across more processing elements: GPUs, FPGAs, specialized accelerators, Processing In Memory (PIM), and so on.
– These elements frequently do not live in the same coherent memory domain.
– The result is substantial data movement just to bring data to the processing element that needs it.

Heterogeneous System Architecture (HSA)
The HSA Foundation:
– "Not-for-Profit Industry Consortium of SOC and SOC IP vendors, OEMs, academia, OSVs and ISVs defining a consistent heterogeneous platform architecture to make it dramatically easier to program heterogeneous parallel devices"
– Develops open, royalty-free industry specifications and APIs for heterogeneous computing
Goal: bring accelerators (and other devices) forward as first-class citizens, with:
– A unified address space
– Operation in pageable system memory
– Full memory coherence
– User-mode dispatch/scheduling (see the sketch below)
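To make the user-mode queueing goal concrete, here is a minimal sketch of opening such a queue, written against the HSA runtime API (hsa.h) that the Foundation later standardized in HSA 1.0. It is an illustration of the model rather than code from the talk, and error handling is omitted.

```c
/* Minimal sketch, assuming the HSA 1.0 runtime API (hsa.h); names may
 * differ from the pre-1.0 software available at the time of this talk. */
#include <hsa.h>
#include <stdio.h>

/* Callback: remember the first GPU agent we find. */
static hsa_status_t find_gpu(hsa_agent_t agent, void *data) {
    hsa_device_type_t type;
    hsa_agent_get_info(agent, HSA_AGENT_INFO_DEVICE, &type);
    if (type == HSA_DEVICE_TYPE_GPU) {
        *(hsa_agent_t *)data = agent;
        return HSA_STATUS_INFO_BREAK;   /* stop iterating */
    }
    return HSA_STATUS_SUCCESS;
}

int main(void) {
    hsa_agent_t gpu;
    hsa_queue_t *queue;

    hsa_init();
    hsa_iterate_agents(find_gpu, &gpu);

    /* The queue lives in ordinary user memory and is written directly by
     * the application -- no kernel driver on the dispatch path. */
    hsa_queue_create(gpu, 4096, HSA_QUEUE_TYPE_SINGLE,
                     NULL, NULL, UINT32_MAX, UINT32_MAX, &queue);

    printf("user-mode queue at %p, %u packet slots\n",
           (void *)queue->base_address, queue->size);

    hsa_queue_destroy(queue);
    hsa_shut_down();
    return 0;
}
```

Once created, the queue is owned and written directly by application code; the dispatch path sketched after the next slide never enters the kernel.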

HSA Foundation Membership
50+ members, across multiple categories of membership.
Founding members: AMD, ARM, Imagination, MediaTek, Qualcomm, Samsung, and Texas Instruments.

Application-level Access to HSA Devices
The application talks directly to the hardware:
– No system call
– No kernel driver involved
– Hardware scheduling
– Low dispatch latency
Work is submitted in a standardized binary packet format, the Architected Queueing Language (AQL); a dispatch sketch follows below.
[Figure: applications A, B, and C each enqueue their packets on user-mode queues served directly by the HSA device.]
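The sketch below fills in what a single AQL kernel dispatch looks like with the HSA 1.0 runtime definitions: reserve a slot, fill in an hsa_kernel_dispatch_packet_t, publish the header, and ring the doorbell. It assumes a queue, kernel object, kernel arguments, and completion signal created elsewhere, and it omits the queue-full check; this is an illustrative sketch, not code from the presentation.

```c
/* Sketch of user-mode AQL dispatch, assuming HSA 1.0 packet definitions.
 * 'queue', 'kernel_object', 'kernarg' and 'completion' are assumed to
 * have been set up elsewhere; the queue-full check is omitted. */
#include <hsa.h>
#include <string.h>

void enqueue_dispatch(hsa_queue_t *queue, uint64_t kernel_object,
                      void *kernarg, hsa_signal_t completion)
{
    /* Claim a slot in the user-mode queue: an atomic add, no syscall. */
    uint64_t index = hsa_queue_add_write_index_relaxed(queue, 1);
    hsa_kernel_dispatch_packet_t *pkt =
        (hsa_kernel_dispatch_packet_t *)queue->base_address +
        (index & (queue->size - 1));

    memset(pkt, 0, sizeof(*pkt));
    pkt->setup = 1 << HSA_KERNEL_DISPATCH_PACKET_SETUP_DIMENSIONS; /* 1-D grid */
    pkt->workgroup_size_x = 256;
    pkt->workgroup_size_y = 1;
    pkt->workgroup_size_z = 1;
    pkt->grid_size_x = 1024 * 1024;
    pkt->grid_size_y = 1;
    pkt->grid_size_z = 1;
    pkt->kernel_object = kernel_object;   /* code descriptor */
    pkt->kernarg_address = kernarg;       /* arguments in shared memory */
    pkt->completion_signal = completion;

    /* Publish the packet type last (with system-scope fences), then ring
     * the doorbell; the device picks the packet up directly. */
    uint16_t header = HSA_PACKET_TYPE_KERNEL_DISPATCH << HSA_PACKET_HEADER_TYPE;
    header |= HSA_FENCE_SCOPE_SYSTEM << HSA_PACKET_HEADER_ACQUIRE_FENCE_SCOPE;
    header |= HSA_FENCE_SCOPE_SYSTEM << HSA_PACKET_HEADER_RELEASE_FENCE_SCOPE;
    __atomic_store_n(&pkt->header, header, __ATOMIC_RELEASE);
    hsa_signal_store_relaxed(queue->doorbell_signal, (hsa_signal_value_t)index);
}
```

The whole path is ordinary user-space memory traffic plus a doorbell write, which is what gives the low dispatch latency claimed on the slide.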

HSA is Designed to go Beyond the GPU
[Figure: CPU and GPU joined by shared memory, coherency, and user-mode queues, together with other agents: audio processor, video hardware, DSP, image signal processing, fixed-function accelerator, and encode/decode engines.]

HSA and I/O Devices
Okay, HSA looks interesting from the viewpoint of accelerators, but how is this related to I/O? I/O devices can participate as well: HSA and its programming model are designed to allow any I/O device capable of handling page faults to participate in the shared virtual address space.

HSA is Designed to go Beyond the GPU (continued)
[Figure: the same diagram as above, with I/O devices added alongside the CPU, GPU, audio processor, video hardware, DSP, image signal processing, and fixed-function accelerator on the shared-memory, coherent, user-mode-queue fabric.]

HSA CPU/GPU Queueing
[Figure: HSA queueing between the CPU and the GPU.]

HSA CPU/GPU Queueing + NIC Queueing
[Figure: HSA queueing between the CPU and the GPU, extended to include a NIC.]
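The same queueing model could in principle carry network work. The sketch below is purely hypothetical: it imagines a NIC exposed as an HSA agent and drives it with the generic agent-dispatch packet that AQL defines. The NIC queue, the NIC_CMD_SEND code, and the argument layout are invented for illustration and are not part of any HSA or OpenFabrics specification.

```c
/* Hypothetical sketch: drive a NIC exposed as an HSA agent using the
 * generic hsa_agent_dispatch_packet_t from the HSA 1.0 runtime.  The
 * NIC queue, NIC_CMD_SEND, and the argument meanings are assumptions. */
#include <hsa.h>
#include <string.h>

#define NIC_CMD_SEND 0x0001   /* hypothetical vendor-defined command */

void enqueue_nic_send(hsa_queue_t *nic_queue, void *buf, size_t len,
                      uint64_t dest, hsa_signal_t completion)
{
    uint64_t index = hsa_queue_add_write_index_relaxed(nic_queue, 1);
    hsa_agent_dispatch_packet_t *pkt =
        (hsa_agent_dispatch_packet_t *)nic_queue->base_address +
        (index & (nic_queue->size - 1));

    memset(pkt, 0, sizeof(*pkt));
    pkt->type = NIC_CMD_SEND;                 /* agent-defined function code */
    pkt->arg[0] = (uint64_t)(uintptr_t)buf;   /* pageable buffer: same VA the CPU uses */
    pkt->arg[1] = (uint64_t)len;
    pkt->arg[2] = dest;                       /* hypothetical destination handle */
    pkt->completion_signal = completion;

    __atomic_store_n(&pkt->header,
        (uint16_t)(HSA_PACKET_TYPE_AGENT_DISPATCH << HSA_PACKET_HEADER_TYPE),
        __ATOMIC_RELEASE);
    hsa_signal_store_relaxed(nic_queue->doorbell_signal,
                             (hsa_signal_value_t)index);
}
```

The point of the sketch is only that a page-fault-capable NIC sitting in the shared virtual address space could be fed from user mode exactly like a GPU, with no registration or copy of the send buffer.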

HSA IOMMU/Device Interactions
[Figure: HSA IOMMU and device interactions.]

HSA and I/O Devices: Implications for the OFA Software Stack
– Unified memory addressing via system-wide accessible page tables
– Support for no-pinning and on-demand-paging options (see the sketch below)
– Aligns with the requirements that MPI and PGAS brought to the OFIWG to simplify memory registration
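For a flavor of the "no pinning, on-demand paging" bullet in today's terms, the sketch below registers ordinary malloc'd memory with the libibverbs on-demand paging extension (IBV_ACCESS_ON_DEMAND), which was being proposed around the time of this talk. It uses the existing verbs API rather than anything HSA-specific, support depends on the device, and error handling is abbreviated.

```c
/* Sketch: registering pageable memory with on-demand paging (ODP) in
 * libibverbs, so pages are faulted in by the device rather than pinned
 * up front.  Assumes a protection domain 'pd' already allocated with
 * ibv_alloc_pd(); ODP support varies by device and driver version. */
#include <infiniband/verbs.h>
#include <stdio.h>
#include <stdlib.h>

struct ibv_mr *register_odp(struct ibv_pd *pd, size_t len)
{
    void *buf = malloc(len);              /* ordinary pageable memory */
    if (!buf)
        return NULL;

    /* IBV_ACCESS_ON_DEMAND asks the HCA to handle page faults itself,
     * removing the pinning that classic ibv_reg_mr() implies. */
    struct ibv_mr *mr = ibv_reg_mr(pd, buf, len,
                                   IBV_ACCESS_LOCAL_WRITE |
                                   IBV_ACCESS_REMOTE_WRITE |
                                   IBV_ACCESS_ON_DEMAND);
    if (!mr) {
        fprintf(stderr, "ODP registration not supported on this device\n");
        free(buf);
        return NULL;
    }
    return mr;
}
```

This is the direction the slide points to: if page tables are system-wide and devices can fault pages in, memory registration collapses to little more than an access-rights grant, which is what the MPI and PGAS communities asked the OFIWG to simplify.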

Disclaimer & Attribution
The information presented in this document is for informational purposes only and may contain technical inaccuracies, omissions and typographical errors. The information contained herein is subject to change and may be rendered inaccurate for many reasons, including but not limited to product and roadmap changes, component and motherboard version changes, new model and/or product releases, product differences between differing manufacturers, software changes, BIOS flashes, firmware upgrades, or the like. AMD assumes no obligation to update or otherwise correct or revise this information. However, AMD reserves the right to revise this information and to make changes from time to time to the content hereof without obligation of AMD to notify any person of such revisions or changes.
AMD MAKES NO REPRESENTATIONS OR WARRANTIES WITH RESPECT TO THE CONTENTS HEREOF AND ASSUMES NO RESPONSIBILITY FOR ANY INACCURACIES, ERRORS OR OMISSIONS THAT MAY APPEAR IN THIS INFORMATION. AMD SPECIFICALLY DISCLAIMS ANY IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR ANY PARTICULAR PURPOSE. IN NO EVENT WILL AMD BE LIABLE TO ANY PERSON FOR ANY DIRECT, INDIRECT, SPECIAL OR OTHER CONSEQUENTIAL DAMAGES ARISING FROM THE USE OF ANY INFORMATION CONTAINED HEREIN, EVEN IF AMD IS EXPRESSLY ADVISED OF THE POSSIBILITY OF SUCH DAMAGES.
Attribution: © 2014 Advanced Micro Devices, Inc. All rights reserved. AMD, the AMD Arrow logo and combinations thereof are trademarks of Advanced Micro Devices, Inc. in the United States and/or other jurisdictions. SPEC is a registered trademark of the Standard Performance Evaluation Corporation (SPEC). Other names are for informational purposes only and may be trademarks of their respective owners.

Thank You
#OFADevWorkshop