Operating Systems Should Manage Accelerators
Sankaralingam Panneerselvam, Michael M. Swift
Computer Sciences Department, University of Wisconsin, Madison, WI
HotPar 2012
Accelerators
Specialized execution units for common tasks: data-parallel computation, encryption, video encoding, XML parsing, network processing, etc.
Tasks are offloaded to accelerators for high performance, energy efficiency, or both
Examples: GPU, wire-speed processor, APU, DySER, crypto accelerator
Why accelerators?
Moore's law and Dennard scaling: from single core to multi-core
Dark silicon: more transistors, but a limited power budget [Esmaeilzadeh et al., ISCA '11]
Accelerators trade area for specialized logic: performance up by 250x and energy efficiency by 500x [Hameed et al., ISCA '10]
Result: heterogeneous architectures
Heterogeneity everywhere
Focus on diverse heterogeneous systems: processing elements with different properties in the same system
Our work: how can the OS support this architecture?
1. Abstract heterogeneity
2. Enable flexible task execution
3. Multiplex shared accelerators among applications
Outline
Motivation
Classification of accelerators
Challenges
Our model: Rinnegan
Conclusion
IBM wire-speed processor
(diagram: accelerator complex, CPU cores, and memory controller on one chip)
Classification of accelerators
Accelerators are classified based on their accessibility:
Acceleration devices
Co-processors
Asymmetric cores
Classification

| Property | Acceleration devices | Co-processors | Asymmetric cores |
|---|---|---|---|
| Systems | GPU, APU, crypto accelerators in Sun Niagara chips, etc. | C-cores, DySER, IBM wire-speed processor | NVIDIA's Kal-El processor, scalable cores such as WiDGET |
| Location | Off- or on-chip device | On-chip device | Regular core that can execute threads |
| Accessibility: control | Device drivers | ISA instructions | Thread migration, RPC |
| Accessibility: data | DMA or zero copy | System memory | Cache coherency |
| Resource contention | Multiple applications might want to use the device | Sharing across multiple cores could cause contention | Multiple threads might want to access the special core |

Takeaways:
1) Applications must deal with different classes of accelerators with different access mechanisms
2) Resources must be shared among multiple applications
Challenges
Task invocation: execution on different processing elements
Virtualization: sharing across multiple applications
Scheduling: time-multiplexing the accelerators
Task invocation
Current systems decide statically where to execute a task
Programmer problems where the system can help:
Data granularity
Power budget limitations
Presence/availability of an accelerator
(diagram: an AES task dispatched either to a CPU core or to a crypto accelerator)
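The factors above can be folded into a single runtime decision. A minimal sketch follows; the threshold value, target names, and function are illustrative assumptions, not part of the paper:

```python
# Hypothetical sketch: deciding where to run an AES task based on the
# three factors the slide lists. Threshold and target names are invented.

OFFLOAD_THRESHOLD = 4096  # bytes; below this, offload overhead dominates

def choose_target(data_len, accel_present, power_budget_ok):
    """Pick an execution target from data granularity, power budget,
    and accelerator availability."""
    if not accel_present:
        return "cpu-aes"            # no accelerator in this system
    if not power_budget_ok:
        return "cpu-aes"            # accelerator would exceed the budget
    if data_len < OFFLOAD_THRESHOLD:
        return "cpu-aes"            # too little data to amortize offload
    return "crypto-accelerator"

print(choose_target(64, True, True))       # small buffer stays on the CPU
print(choose_target(1 << 20, True, True))  # large buffer is offloaded
```

The point of routing this decision through the system rather than the programmer is that availability and power budget change at run time, while a statically chosen target cannot.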
Virtualization
Data addressing for accelerators: accelerators need to understand virtual addresses, so the OS must provide virtual-address translations
Process preemption after launching a computation: the system must be aware of computation completion
(diagram: accelerator units and CPU cores sharing a process's virtual address space)
Scheduling
Scheduling is needed to prioritize access to shared accelerators
User-mode access to accelerators complicates scheduling decisions: the OS cannot interpose on every request
Outline
Motivation
Classification of accelerators
Challenges
Our model: Rinnegan
Conclusion
Rinnegan
Goals: applications leverage the heterogeneity; the OS enforces system-wide policies
Rinnegan provides support for different classes of accelerators:
Flexible task execution via the accelerator stub
Safe multiplexing via the accelerator agent
System policy enforcement via the accelerator monitor
Accelerator stub
Entry point for accelerator invocation, and a dispatcher
Abstracts different processing elements
Binding: selecting the best implementation for the task
Static: during compilation
Early: at process start
Late: at call time
(diagram: a process calls the stub, which either encrypts through AES instructions or offloads to the crypto accelerator)
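The three binding times differ only in when the stub fixes its implementation choice. A minimal sketch, with invented implementation names and an assumed size cutoff for the late-binding case:

```python
# Hypothetical sketch of static, early, and late binding in a stub.
# Implementation names and the 4096-byte cutoff are illustrative.

def encrypt_cpu(data):   return ("cpu", data)    # AES instructions
def encrypt_accel(data): return ("accel", data)  # crypto accelerator

# Static binding: fixed when the program is built (a compile-time choice,
# modeled here as a module constant).
STATIC_IMPL = encrypt_cpu

# Early binding: chosen once at process start, e.g. by probing hardware.
def bind_early(accel_present):
    return encrypt_accel if accel_present else encrypt_cpu

# Late binding: the stub re-decides at every call, so it can react to
# accelerator availability and input size.
def crypt_stub(data, accel_available):
    if accel_available and len(data) > 4096:
        return encrypt_accel(data)
    return encrypt_cpu(data)
```

Late binding is the most flexible but pays a per-call decision cost; static binding pays nothing at run time but cannot adapt.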
Accelerator agent
Manages an accelerator
Roles of an agent:
Virtualize the accelerator
Forward requests to the accelerator
Implement scheduling decisions
Expose accounting information to the OS
(diagram: a process's stub reaches accelerator agents in kernel space, which drive an asymmetric core via thread migration or a device via its driver; a process can also communicate directly with the accelerator)
Accelerator monitor
Monitors information such as resource utilization
Responsible for system-wide energy and performance goals
Acts as an online modeling tool: helps decide where to schedule a task
(diagram: the monitor sits in kernel space alongside the accelerator agent, observing stub traffic to the accelerator)
Programming model
Based on task-level parallelism
Task creation and execution:
Write: different implementations of the task
Launch: similar to a function invocation
Fetch result: supports synchronous and asynchronous invocation
Examples: Intel TBB, Cilk

application() {
    task = cryptStub(inputData, outputData);
    ...
    waitFor(task);
}

cryptStub(input, output) {
    if (factor1)
        output = implmn_AES(input);   // AES instructions on the CPU
    else if (factor2)
        output = implmn_CACC(input);  // crypto accelerator
}
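The launch/wait pattern above maps directly onto a futures-style API. A minimal runnable sketch, assuming a thread pool as a stand-in for the accelerator and an invented CPU implementation; none of these names are the Rinnegan API:

```python
# Hypothetical sketch of the slide's task model using Python futures:
# launch looks like a function call, and the result can be fetched
# synchronously (wait) or later, asynchronously.

from concurrent.futures import ThreadPoolExecutor

def implmn_aes(data):
    # Stand-in for the CPU (AES-instruction) implementation.
    return data[::-1]

def crypt_stub(pool, data):
    # A real stub would late-bind between implementations here; this
    # sketch always takes the CPU path.
    return pool.submit(implmn_aes, data)   # launch: like a function call

with ThreadPoolExecutor(max_workers=1) as pool:
    task = crypt_stub(pool, b"secret")
    # ... the application continues with other work here ...
    out = task.result()                    # waitFor(task): synchronous fetch
```

Asynchronous fetching falls out for free: the caller can poll `task.done()` or attach a completion callback instead of blocking in `result()`.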
Summary
(diagram: a process invokes the stub in user space; the accelerator agent and monitor sit in kernel space alongside other kernel components; CPU cores and the accelerator sit in hardware; direct communication between process and accelerator is also possible)
Related work
Task invocation based on data granularity: Merge, Harmony
Generating device-specific implementations of the same task: GPU Ocelot
Resource scheduling, sharing between multiple applications: PTask, Pegasus
Conclusions
Accelerators will be common in future systems
Rinnegan embraces accelerators through its stub mechanism, agent, and monitoring components
Other challenges:
Data movement between devices
Power measurement
Thank You