Harmony: A Run-Time for Managing Accelerators Sponsor: LogicBlox Inc. Gregory Diamos and Sudhakar Yalamanchili.

Presentation transcript:

Harmony: A Run-Time for Managing Accelerators Sponsor: LogicBlox Inc. Gregory Diamos and Sudhakar Yalamanchili

SCHOOL OF ELECTRICAL AND COMPUTER ENGINEERING | GEORGIA INSTITUTE OF TECHNOLOGY CERCS

Software Challenges of Heterogeneity: Programming Model, Execution Model, Portability, Performance.

Pooled Accelerator Execution Model Instance. Heterogeneous multiprocessor systems are viewed as a pool of processors, each potentially with a unique ISA and system interface. Applications that make full use of these systems must include binaries compatible with each accelerator ISA.

Execution Model. Configuration of the machine model: the architecture description specifies the configuration of accelerators and processors and communicates QoS requirements. Programming model: accelerator-based code segments are compiled for a specific device/driver combination. [Figure labels: Source Program, System Architecture Description, Compilation Environment, HVM; Control Thread, Kernel, Stream, Stream Elements; Multicore processor 1, Accelerator 1, ACC, Local Memory, DMA, Cache, FIFO, Memory.]

Goals of Harmony. Low overhead: comparable to or better than hand-tuned applications. System configuration agnostic: correct execution on a system with any number and type of heterogeneous architectures. No code modification required. Scalable: EP application performance should scale with the number of devices. Familiar: do not require any more than the current programming model of threaded applications for homogeneous architectures.

Key Idea. Accelerator kernel deployment is based on static and dynamic inter-kernel dependencies. Inspired by ILP scheduling techniques. Kernels are “issued” to accelerators and their execution is “committed” to release dependent kernels. [Figure labels: From Application, op, Dependence resolution, Ready Buffer, Issue.]
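The issue/commit idea on this slide can be illustrated with a small dependency-driven scheduler. This is a minimal sketch and not Harmony's implementation: the function name `schedule`, the kernel names, and the assumption that every issued kernel finishes within one step are all hypothetical.

```python
from collections import deque


def schedule(kernels, deps, num_accelerators):
    """Return the kernels issued at each step (hypothetical model).

    kernels: kernel names in program order.
    deps: dict mapping a kernel to the set of kernels it depends on.
    Assumes every issued kernel completes within a single step.
    """
    # Unresolved dependencies of each kernel.
    pending = {k: set(deps.get(k, ())) for k in kernels}
    # The "ready buffer": kernels with no unresolved dependencies.
    ready = deque(k for k in kernels if not pending[k])
    queued = set(ready)
    steps = []
    while ready:
        # Issue at most one ready kernel per accelerator.
        count = min(num_accelerators, len(ready))
        batch = [ready.popleft() for _ in range(count)]
        steps.append(batch)
        # Commit: each finished kernel releases its dependents.
        for done in batch:
            for k in kernels:
                if done in pending[k]:
                    pending[k].discard(done)
                    if not pending[k] and k not in queued:
                        ready.append(k)
                        queued.add(k)
    return steps


example_deps = {"fft": {"load"}, "filter": {"load"}, "store": {"fft", "filter"}}
example_steps = schedule(["load", "fft", "filter", "store"], example_deps, 2)
```

With two accelerators, `fft` and `filter` are issued together once `load` commits, and `store` waits until both have committed.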

Harmony Architecture & Operation

Harmony Runtime Operation. Accelerator kernels are mapped to specific architectures based on the architectures in the system, the available implementations, and performance. Results are forwarded to waiting functions. Can support speculation. Results are committed in order.
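One way to picture the mapping step is a lookup that filters devices by available implementations and then ranks them by an estimated run time. This is a sketch under stated assumptions, not the Harmony runtime's actual policy: `select_device`, the device and architecture names, and the timing table are invented for illustration.

```python
def select_device(kernel, implementations, devices, est_time):
    """Pick a device for `kernel` (hypothetical policy).

    implementations: dict kernel -> set of architectures that have a binary.
    devices: dict device id -> architecture present in the system.
    est_time: dict (kernel, architecture) -> estimated run time in seconds.
    """
    available = implementations.get(kernel, set())
    # Keep only devices whose architecture has an implementation of the kernel.
    candidates = [
        (est_time.get((kernel, arch), float("inf")), dev)
        for dev, arch in devices.items()
        if arch in available
    ]
    if not candidates:
        raise LookupError(f"no implementation of {kernel!r} for any device")
    # Prefer the lowest estimated run time.
    return min(candidates)[1]


impls = {"matmul": {"x86", "gpu"}}
timing = {("matmul", "x86"): 2.0, ("matmul", "gpu"): 0.5}
with_gpu = select_device("matmul", impls, {"cpu0": "x86", "gpu0": "gpu", "fpga0": "fpga"}, timing)
without_gpu = select_device("matmul", impls, {"cpu0": "x86", "fpga0": "fpga"}, timing)
```

The same application runs unchanged on both system configurations; only the selected device differs.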

Application Development. Programmer-supplied (Harmony) checks on entry/exit to accelerator kernels. Marshalling of operands when an accelerator kernel is invoked. May employ multiple (static) implementations corresponding to multiple accelerators.
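The three points above could fit together roughly as follows. This is an illustrative sketch only: the slide does not describe Harmony's API, so `invoke`, the check callbacks, and the length-prefixed float buffer format are all assumptions.

```python
import struct


def marshal_operands(values):
    """Pack floats into a length-prefixed, little-endian byte buffer."""
    return struct.pack(f"<I{len(values)}f", len(values), *values)


def unmarshal_operands(buf):
    """Inverse of marshal_operands."""
    (n,) = struct.unpack_from("<I", buf, 0)
    return list(struct.unpack_from(f"<{n}f", buf, 4))


def invoke(implementations, arch, values, entry_check, exit_check):
    """Run the implementation for `arch` with entry and exit checks.

    implementations: dict architecture -> callable (one static
    implementation per accelerator type; hypothetical structure).
    """
    if not entry_check(values):
        raise ValueError("entry check failed")
    # Operands cross the host/accelerator boundary as a flat byte buffer.
    buf = marshal_operands(values)
    result = implementations[arch](unmarshal_operands(buf))
    if not exit_check(values, result):
        raise RuntimeError("exit check failed")
    return result


def scale_by_two(xs):
    return [2 * x for x in xs]


scale_impls = {"x86": scale_by_two, "gpu": scale_by_two}
scaled = invoke(
    scale_impls,
    "gpu",
    [1.0, 2.0, 3.0],
    entry_check=lambda xs: len(xs) > 0,
    exit_check=lambda xs, ys: len(xs) == len(ys),
)
```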

Preliminary Performance Evaluation. [Figure: Matrix Multiplication; annotations: 3.1% Overhead, 3.8% Overhead.]

Scheduling Overhead

Extensions to FPGAs. Maintain the base Harmony deployment model. Accelerator pools. Associate a Harmony thread with each FPGA-based accelerator. Virtualize the FPGA fabric. Demand-driven vs. static configuration of the fabric. Adapt existing register-allocation-based scheduling techniques. Example: Virtualized Packet Schedulers (Sponsor: RNET Technologies), Poster Session.
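Demand-driven configuration can be pictured as treating regions of the fabric like a small set of registers: an accelerator is loaded when a kernel needs it, and another one is evicted if no region is free. The sketch below uses least-recently-used eviction, which is an assumption made here for illustration; the slide does not say which allocation policy is adapted, and the `Fabric` class and its slot model are hypothetical.

```python
from collections import OrderedDict


class Fabric:
    """Hypothetical FPGA fabric with a fixed number of configurable slots."""

    def __init__(self, num_slots):
        self.num_slots = num_slots
        self.loaded = OrderedDict()  # accelerator name -> None, oldest use first
        self.reconfigurations = 0

    def request(self, accel):
        """Ensure `accel` is configured; return the evicted accelerator, if any."""
        if accel in self.loaded:
            # Already configured: only refresh its recency.
            self.loaded.move_to_end(accel)
            return None
        evicted = None
        if len(self.loaded) >= self.num_slots:
            # No free slot: evict the least recently used accelerator,
            # loosely analogous to spilling a register.
            evicted, _ = self.loaded.popitem(last=False)
        self.loaded[accel] = None
        self.reconfigurations += 1
        return evicted


fabric = Fabric(num_slots=2)
evictions = [fabric.request(name) for name in ["fft", "encrypt", "fft", "decrypt"]]
```

A static configuration corresponds to loading a fixed set once and never calling the eviction path.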

FPGA-Based Accelerator Architecture. [Figure labels: Volatile (DRAM), Nonvolatile (FLASH), PCIe/Hypertransport/CSI Interface, PowerPC, Encrypt, Decrypt, FFT, Memory Controller, Switch, NI.]

Accelerator Configuration. Address translation in the NI allows isolated paths between accelerators and memory. [Figure labels: Host (DRAM), Host Driver, Harmony Thread, PCIe/Hypertransport/CSI Interface, PowerPC, Memory Controller, Switch, NI, Encrypt, Decrypt, FFT, Volatile (DRAM), Nonvolatile (FLASH).]

Heterogeneous Virtual Machines. Virtualization of accelerator resources. Consolidation and sharing of accelerators. PIs: A. Gavrilovska, K. Schwan, S. Yalamanchili. [Figure labels: User Software, Guest OS, Virtual Machine Monitor, SW Resources, HW Resources, CPU, ACC, DMA, FIFO, Local Memory, Cache, Network; isolation, security, legacy systems.]

Questions?