Some Opportunities and Obstacles in Cross-Layer and Cross-Component (Power) Management Onur Mutlu NSF CPOM Workshop, 2/10/2012.


What Are These Layers and Components?

Well-defined (mostly) interfaces between transformation layers and components/resources create abstractions.
 Transformation layers (top to bottom): Problem, Algorithm, Program/Language, Runtime System (VM, OS, MM), ISA, Microarchitecture, Logic, Circuits, Electrons (with the User at the top)
 Components/resources: Computation, Communication, Storage/Memory

Layers and Components: Pros and Cons

Advantages of abstraction layers
 Improved productivity at each layer/component: no need to worry about decisions made in higher/lower levels
 Designer sanity

Disadvantages
 Overall suboptimal design: uncoordinated decisions contradict each other and degrade efficiency

An Example: Lack of OS-Architecture Coordination

[Figure: a low OS-level priority application (Core 0) running alongside a high OS-level priority application (Core 1)]

Moscibroda and Mutlu, "Memory Performance Attacks: Denial of Memory Service in Multi-Core Systems," USENIX Security 2007.
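The coordination gap on this slide can be made concrete with a toy model. The sketch below, which is an illustration rather than the paper's actual simulator, shows how an FR-FCFS-style memory controller that prioritizes row-buffer hits (and knows nothing about OS priorities) ends up serving a streaming thread's requests ahead of an older request from a higher-OS-priority thread.

```python
# Toy FR-FCFS model (illustrative assumption, not from the talk or paper):
# the controller picks the oldest row-buffer hit if one exists, else the
# oldest request overall. OS priority never enters the decision.

def fr_fcfs_pick(queue, open_row):
    """Pick the oldest row-hit request if any, else the oldest request.
    Each request is a tuple (arrival_time, thread, row)."""
    hits = [r for r in queue if r[2] == open_row]
    return min(hits or queue)  # min by arrival time => oldest

# Thread "stream" hammers one row; thread "priority" arrived first but
# targets a different (closed) row.
queue = [(0, "priority", 7), (1, "stream", 3), (2, "stream", 3), (3, "stream", 3)]
open_row = 3
served = []
while queue:
    req = fr_fcfs_pick(queue, open_row)
    queue.remove(req)
    open_row = req[2]          # serving a request keeps/opens its row
    served.append(req[1])
print(served)  # ['stream', 'stream', 'stream', 'priority']
```

Even though the high-OS-priority request is the oldest, every row hit from the streaming thread is served first, which is exactly the kind of cross-layer contradiction the slide points at.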

Layers and Components: Pros and Cons

Advantages of abstraction layers
 Improved productivity at each layer/component: no need to worry about decisions made in higher/lower levels
 Designer sanity

Disadvantages
 Overall suboptimal design: uncoordinated decisions contradict each other and degrade efficiency
 Waste due to overprovisioning at each level
 System-level metrics are hard to optimize: application-level QoS, user experience, TCO, total energy, ... (e.g., minimize energy while satisfying QoS)

Research question: How do we eliminate the disadvantages without sacrificing the advantages of layers?

One General Problem

Local optimizations are relatively easy to think of, and local metrics are easy to optimize; global optimization is much less so.

We may be lacking unifying principles that span layers to
 Coordinate information flow and policies across layers
 Guide/ease the co-design and management of resources within and across layers

How Do We Fix the Problem?

 Find and employ cross-cutting (unifying) design and resource management principles, and ensure these principles are conveyed in the interfaces
 Design and enhance interfaces driven by those principles, e.g., the application-OS-architecture interface for resource management
 Develop design tools that enable global optimization: cross-layer evaluation infrastructures
 Change mentality: be open-minded to proposals that co-design multiple layers; weigh performance- vs. energy- vs. user-centric design; provide commensurate focus on important subsystems (memory and interconnect); develop new execution models

Interconnect Energy Scaling

[Chart courtesy of Shekhar Borkar, Intel]

One Unifying Principle: Criticality/Slack

Dynamically identify critical/bottleneck tasks (pieces of work) and spend most of the energy on them:
 Not all instructions are equally important [Fields+ ISCA'01, '02]
 Not all network packets are equally important [Das+ ISCA'10]
 Not all useful cache blocks are equally important [Qureshi+ ISCA'06]
 Not all threads are equally important [Bhattacharjee+ ISCA'09]
 Not all queries are equally important
 Not all applications are equally important

Create a criticality interface between layers and components (application/compiler ↔ OS/runtime ↔ hardware), and enable ways of discovering and expressing task criticality and managing resources based on criticality/slack.
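One way to picture such a criticality interface is as a tag that each layer attaches to its units of work, which a resource manager then uses to decide where to spend the power budget. The sketch below is a hypothetical minimal example (the `Task` type, the slack field, and the two power modes are all assumptions, not anything proposed in the talk): tasks with the least slack get the scarce high-power slots, and everything else runs in a low-energy mode.

```python
# Hypothetical criticality interface: each task carries a slack estimate
# conveyed from the layer above; lower slack means more critical.
from dataclasses import dataclass

@dataclass
class Task:
    name: str
    slack_cycles: int  # estimated slack; lower => more critical

def assign_power_modes(tasks, high_power_slots):
    """Give the 'high' power mode to the most critical (least-slack) tasks;
    the rest run in 'low' mode, spending little energy on non-critical work."""
    by_criticality = sorted(tasks, key=lambda t: t.slack_cycles)
    modes = {}
    for i, t in enumerate(by_criticality):
        modes[t.name] = "high" if i < high_power_slots else "low"
    return modes

tasks = [Task("net_pkt", 10), Task("bg_scan", 5000), Task("query", 40)]
print(assign_power_modes(tasks, high_power_slots=2))
# {'net_pkt': 'high', 'query': 'high', 'bg_scan': 'low'}
```

The point of the interface is that the slack numbers can originate at any layer (compiler, runtime, application) while the allocation decision is made at another, which is exactly the cross-layer information flow the slide calls for.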

Exploiting Criticality/Slack

 Identify the criticality/slack present in each task
 Convey criticality information across layers/components
 Co-design the layers and resources so that they can prioritize tasks based on criticality/slack
 Manage execution to minimize slack and spend little energy on non-critical tasks

Example: identify bottleneck threads that cause serialization and dynamically ship them to high-power, high-performance cores [Suleman+ ASPLOS'09, Joao+ ASPLOS'12].
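The bottleneck-shipping example can be sketched in a few lines, in the spirit of bottleneck identification and scheduling [Suleman+ ASPLOS'09, Joao+ ASPLOS'12]: account for how many cycles threads spend waiting on each code segment (a lock, a barrier, a serial stage), and accelerate the segment causing the most serialization on a fast core. The class name and accounting model below are illustrative assumptions, not the papers' actual hardware interface.

```python
# Simplified bottleneck accounting (illustrative, not the papers' interface).
from collections import defaultdict

class BottleneckTable:
    def __init__(self):
        # bottleneck id -> total cycles threads have spent waiting on it
        self.waited_cycles = defaultdict(int)

    def record_wait(self, bottleneck_id, cycles):
        """Called (conceptually by hardware) when a thread waits on a bottleneck."""
        self.waited_cycles[bottleneck_id] += cycles

    def pick_for_fast_core(self):
        """The bottleneck currently causing the most serialization
        is the one worth shipping to a high-performance core."""
        if not self.waited_cycles:
            return None
        return max(self.waited_cycles, key=self.waited_cycles.get)

table = BottleneckTable()
table.record_wait("lock_A", 120)
table.record_wait("barrier_1", 900)
table.record_wait("lock_A", 300)
print(table.pick_for_fast_core())  # barrier_1 caused the most waiting
```

Because the table is updated dynamically, the accelerated bottleneck changes as the program's critical path shifts, which is the "dynamically ship them" part of the slide.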

Exploiting Dynamic Criticality: Other Examples

 Identify critical instructions and execute them in fast pipelines [Fields+ ISCA'02]
 Identify critical threads and slow down the others [Bhattacharjee+ ISCA'09]
 Identify critical/limiter threads in a parallel application and prioritize them in the memory controller [Ebrahimi+ MICRO'11]
 Identify latency-sensitive applications/threads and prioritize them in the memory controller [Kim+ MICRO'10] or throughout the entire shared memory system [Ebrahimi+ ASPLOS'10]
 Identify interconnect requests that are critical from the core's point of view and prioritize them in routers [Das+ ISCA'10]

Thread Cluster Memory Scheduling [Kim+ MICRO'10]

1. Group threads into two clusters: a memory-non-intensive cluster and a memory-intensive cluster
2. Prioritize the non-intensive cluster
3. Apply different policies to each cluster

TCM, a heterogeneous scheduling policy, provides the best fairness and system throughput.
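The clustering step above can be sketched as follows. This is a rough, simplified model of TCM's grouping [Kim+ MICRO'10]: rank threads by memory intensity (misses per kilo-instruction), admit the least intensive threads into the non-intensive cluster until a fraction of total intensity is used up, and give that cluster higher memory-scheduling priority. The threshold model and names are assumptions for illustration, not the paper's exact formulation.

```python
# Simplified TCM-style clustering (illustrative threshold model).
def cluster_threads(mpki, cluster_threshold=0.25):
    """mpki: dict mapping thread -> misses per kilo-instruction.
    Returns (non_intensive, intensive) thread lists. The non-intensive
    cluster takes the lowest-MPKI threads, up to cluster_threshold of
    the total MPKI in the system."""
    total = sum(mpki.values())
    budget = cluster_threshold * total
    non_intensive, intensive = [], []
    used = 0.0
    for t in sorted(mpki, key=mpki.get):  # least memory-intensive first
        if used + mpki[t] <= budget:
            non_intensive.append(t)
            used += mpki[t]
        else:
            intensive.append(t)
    return non_intensive, intensive

mpki = {"gcc": 1.0, "mcf": 40.0, "lbm": 30.0, "povray": 0.5}
print(cluster_threads(mpki))
# (['povray', 'gcc'], ['lbm', 'mcf'])
```

Prioritizing the non-intensive cluster improves throughput cheaply (those threads rarely occupy memory bandwidth), while a separate policy, in TCM a periodically shuffled ranking, manages fairness among the intensive threads.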

Other Unifying Principles and Research Issues

 Consolidate many tasks to maximize efficiency while providing the QoS each task needs: a QoS interface across layers (application ↔ OS/runtime ↔ architecture ↔ microarchitecture)
 Coordinate the management of shared resources across components and layers. Goal: no efficiency loss due to contradictory policies. Policy co-design (e.g., among cores, interconnect, memory controllers, and memory); criticality can again help here
 Exploit heterogeneity at all layers: both architectural/microarchitectural and technology heterogeneity can provide large efficiency gains

Role of Education and Funding

 We need to break the layers in education first to be truly successful: cross-layer undergraduate courses
 Fund both within-layer and cross-layer approaches
 Fund many areas widely; there is likely no silver bullet
