Power Containers: An OS Facility for Fine-Grained Power and Energy Management on Multicore Servers Kai Shen, Arrvindh Shriraman, Sandhya Dwarkadas, Xiao.

Slides:

Advertisements

Similar presentations

Verifiable Resource Accounting for Cloud Computing Services Vyas Sekar, Petros Maniatis ISTC for Secure Computing 1.

Advertisements

1 Hardware Support for Isolation Krste Asanovic U.C. Berkeley MURI “DHOSA” Site Visit April 28, 2011.

Computer Abstractions and Technology

1 A Self-Tuning Cache Architecture for Embedded Systems Chuanjun Zhang*, Frank Vahid**, and Roman Lysecky *Dept. of Electrical Engineering Dept. of Computer.

1 MemScale: Active Low-Power Modes for Main Memory Qingyuan Deng, David Meisner*, Luiz Ramos, Thomas F. Wenisch*, and Ricardo Bianchini Rutgers University.

Trace Analysis Chunxu Tang. The Mystery Machine: End-to-end performance analysis of large-scale Internet services.

Introduction CSCI 444/544 Operating Systems Fall 2008.

Energy Conservation in Datacenters through Cluster Memory Management and Barely-Alive Memory Servers Vlasia Anagnostopoulou Susmit.

Green Cloud Computing Hadi Salimi Distributed Systems Lab, School of Computer Engineering, Iran University of Science and Technology,

Institute of Networking and Multimedia, National Taiwan University, Jun-14, 2014.

SEDA: An Architecture for Well- Conditioned, Scalable Internet Services Matt Welsh, David Culler, and Eric Brewer Computer Science Division University.

CloudScale: Elastic Resource Scaling for Multi-Tenant Cloud Systems Zhiming Shen, Sethuraman Subbiah, Xiaohui Gu, John Wilkes.

SEDA: An Architecture for Well-Conditioned, Scalable Internet Services Matt Welsh, David Culler, and Eric Brewer Computer Science Division University of.

Energy Efficient Web Server Cluster Andrew Krioukov, Sara Alspaugh, Laura Keys, David Culler, Randy Katz.

Techniques for Efficient Processing in Runahead Execution Engines Onur Mutlu Hyesoon Kim Yale N. Patt.

Energy Model for Multiprocess Applications Texas Tech University.

New Challenges in Cloud Datacenter Monitoring and Management

FlashFQ: A Fair Queueing I/O Scheduler for Flash-Based SSDs Kai Shen and Stan Park University of Rochester USENIX-ATC

Power Containers: An OS Facility for Fine-Grained Power and Energy Management on Multicore Servers Kai Shen, Arrvindh Shriraman, Sandhya Dwarkadas, Xiao.

CS 423 – Operating Systems Design Lecture 22 – Power Management Klara Nahrstedt and Raoul Rivas Spring 2013 CS Spring 2013.

University of Karlsruhe, System Architecture Group Balancing Power Consumption in Multiprocessor Systems Andreas Merkel Frank Bellosa System Architecture.

Multi-core Programming Thread Profiler. 2 Tuning Threaded Code: Intel® Thread Profiler for Explicit Threads Topics Look at Intel® Thread Profiler features.

OPTIMAL SERVER PROVISIONING AND FREQUENCY ADJUSTMENT IN SERVER CLUSTERS Presented by: Xinying Zheng 09/13/ XINYING ZHENG, YU CAI MICHIGAN TECHNOLOGICAL.

November , 2009SERVICE COMPUTATION 2009 Analysis of Energy Efficiency in Clouds H. AbdelSalamK. Maly R. MukkamalaM. Zubair Department.

Cloud Computing Energy efficient cloud computing Keke Chen.

Heterogeneity and Dynamicity of Clouds at Scale: Google Trace Analysis [1] 4/24/2014 Presented by: Rakesh Kumar [1 ]

Profile Driven Component Placement for Cluster-based Online Services Christopher Stewart (University of Rochester) Kai Shen (University of Rochester) Sandhya.

Temperature Aware Load Balancing For Parallel Applications Osman Sarood Parallel Programming Lab (PPL) University of Illinois Urbana Champaign.

Critical Power Slope Understanding the Runtime Effects of Frequency Scaling Akihiko Miyoshi, Charles Lefurgy, Eric Van Hensbergen Ram Rajamony Raj Rajkumar.

Hadi Salimi Distributed Systems Lab, School of Computer Engineering, Iran University of Science and Technology, Fall 2010 Performance.

C OMPUTER O RGANIZATION AND D ESIGN The Hardware/Software Interface 5 th Edition Chapter 1 Computer Abstractions and Technology Sections 1.5 – 1.11.

1 Multiprocessor and Real-Time Scheduling Chapter 10 Real-Time scheduling will be covered in SYSC3303.

1 Distributed Energy-Efficient Scheduling for Data-Intensive Applications with Deadline Constraints on Data Grids Cong Liu and Xiao Qin Auburn University.

1 Exploring Custom Instruction Synthesis for Application-Specific Instruction Set Processors with Multiple Design Objectives Lin, Hai Fei, Yunsi ACM/IEEE.

Smita Vijayakumar Qian Zhu Gagan Agrawal 1.  Background  Data Streams  Virtualization  Dynamic Resource Allocation  Accuracy Adaptation  Research.

OPERETTA: An Optimal Energy Efficient Bandwidth Aggregation System Karim Habak†, Khaled A. Harras‡, and Moustafa Youssef† †Egypt-Japan University of Sc.

1 Computation Spreading: Employing Hardware Migration to Specialize CMP Cores On-the-fly Koushik Chakraborty Philip Wells Gurindar Sohi

VGreen: A System for Energy Efficient Manager in Virtualized Environments G. Dhiman, G Marchetti, T Rosing ISLPED 2009.

1 Virtual Machine Memory Access Tracing With Hypervisor Exclusive Cache USENIX ‘07 Pin Lu & Kai Shen Department of Computer Science University of Rochester.

An OBSM method for Real Time Embedded Systems Veronica Eyo Sharvari Joshi.

OPERATING SYSTEMS CS 3530 Summer 2014 Systems with Multi-programming Chapter 4.

Embedded System Lab. 정범종 A_DRM: Architecture-aware Distributed Resource Management of Virtualized Clusters H. Wang et al. VEE, 2015.

Fault Tolerance Benchmarking. 2 Owerview What is Benchmarking? What is Dependability? What is Dependability Benchmarking? What is the relation between.

Rassul Ayani 1 Performance of parallel and distributed systems  What is the purpose of measurement?  To evaluate a system (or an architecture)  To compare.

Lev Finkelstein ISCA/Thermal Workshop 6/ Overview 1.Motivation (Kevin) 2.Thermal issues (Kevin) 3.Power modeling (David) 4.Thermal management (David)

June 30 - July 2, 2009AIMS 2009 Towards Energy Efficient Change Management in A Cloud Computing Environment: A Pro-Active Approach H. AbdelSalamK. Maly.

1 Adaptive Parallelism for Web Search Myeongjae Jeon Rice University In collaboration with Yuxiong He (MSR), Sameh Elnikety (MSR), Alan L. Cox (Rice),

FPGA-Based System Design: Chapter 6 Copyright  2004 Prentice Hall PTR Topics n Low power design. n Pipelining.

Threads. Readings r Silberschatz et al : Chapter 4.

Department of Electrical and Computer Engineering University of Wisconsin - Madison Optimizing Total Power of Many-core Processors Considering Voltage.

Equalizer: Dynamically Tuning GPU Resources for Efficient Execution Ankit Sethia* Scott Mahlke University of Michigan.

Determining Optimal Processor Speeds for Periodic Real-Time Tasks with Different Power Characteristics H. Aydın, R. Melhem, D. Mossé, P.M. Alvarez University.

Use your money elsewhere! HOW TO REDUCE TECHNOLOGY HARDWARE EXPENSES.

Rakesh Kumar Keith Farkas Norman P Jouppi,Partha Ranganathan,Dean M.Tullsen University of California, San Diego MICRO 2003 Speaker ： Chun-Chung Chen Single-ISA.

Advanced Operating Systems CS6025 Spring 2016 Processes and Threads (Chapter 2)

Measuring Performance II and Logic Design

Overview Motivation (Kevin) Thermal issues (Kevin)

Computer Architecture: Parallel Task Assignment

Green cloud computing 2 Cs 595 Lecture 15.

Resource Aware Scheduler – Initial Results

Effective Data-Race Detection for the Kernel

PA an Coordinated Memory Caching for Parallel Jobs

A Framework for Automatic Resource and Accuracy Management in A Cloud Environment Smita Vijayakumar.

Department of Computer Science University of California, Santa Barbara

Energy Efficient Scheduling in IoT Networks

Operating Systems : Overview

Hardware Counter Driven On-the-Fly Request Signatures

CS510 - Portland State University

Request Behavior Variations

Department of Computer Science University of California, Santa Barbara

Presentation transcript:

Power Containers: An OS Facility for Fine-Grained Power and Energy Management on Multicore Servers Kai Shen, Arrvindh Shriraman, Sandhya Dwarkadas, Xiao Zhang, and Zhuan Chen University of Rochester Shriraman is now at Simon Fraser University; Zhang is now at Google. ASPLOS 20131

Problem Overview and Motivation ASPLOS  Power/energy are important for data centers and servers  Dynamic power consumption on modern hardware  Power proportionality: 5% idle power on Intel SandyBridge  Workload dependency: 3:2 power ratio on same machine  A server runs many requests from users; needs per-request power accounting and control  Request is basic unit of user- initiated work  Carbon footprint of a web search?  Power viruses in cloud  Request is basic unit of load distribution

Power Containers on Multicore Servers ASPLOS  Challenges:  Concurrent, fine-grained request activities over multiple stages  Complex multicore power sharing; insufficient direct measurement  Approach: event-based power modeling + request tracking  Contribution on power modeling:  Attribute multicore shared power  Use coarse-grain measurement to enhance power model online  Contribution on request tracking:  Online OS request tracking/control; transparent to applications

Multicore Power Attribution to Concurrent Tasks ASPLOS  Bellosa’s event-based power model  Assumes direct, linear relationship between one’s power and physical activities (instructions, cache references, bus activities, …)  On a multicore, maintenance of shared resources consumes certain power when at least one core is active  Attribution of such shared power is not affected by one’s own physical activities, but rather depends on the level of multicore sharing  Our approach: evenly attribute to concurrently running requests  Implemented efficiently by local estimation of multicore sharing level (to avoid expensive global coordination and cross-core interrupts)

Measurement-Assisted Online Model Correction ASPLOS  Event-based power model is questioned on today’s system  [McCulough et al. 2011] showed poor results  We find large model errors due to different characteristics between production and calibration workloads (worse for unusual workloads)  Our approach  Coarse-grain power measurement can help re-calibrate power model  Measurement has delays, needs alignment with modeled power

Request Power Accounting ASPLOS  Request context tracking in a multi-stage server  Past techniques are not in- band [Magpie/X-Ray], or at app/library level [Google Dapper]  Our approach  [X-Trace] net flow tracking  Track request propagation events (sockets, forks, sync, …) at the OS  Attribute and control power at proper request context

Online Overhead ASPLOS  Overhead depends on frequency of periodic activities; low overhead (<1%) for sufficiently frequent activities  Periodic maintenance of an active power container  Reading hardware counters, power modeling, updating request stats  2948 cycles, 1656 instructions, 16 floating point ops, 3 LLC refs  1µsec, 10 µJs  Measurement-assisted online model recalibration  Computation of least-square linear regression (16 µsecs for ~30 samples)

Request Power/Energy Accounting Results ASPLOS  Distribution of average request power  Distribution of request energy usage

Validation of Request Power Accounting ASPLOS  Challenge of validation:  Hard to know the ground truth – no way to directly measure request power or energy use on a multicore server  Validation approach #1:  Aggregate accounted request energy over a time duration to estimate total system energy usage; then match with measurement  No more than 9% error across six workloads on three machines  Validation approach #2:  Use accounted request energy to estimate system energy usage at hypothetical workload conditions (different mixes/rates of requests)  No more than 11% error for workloads with stable-behavior requests

Per-Request Power Control ASPLOS  Cap max power use of a server machine  Set equal per-request power budget for fairness  Power containers track request power usage, throttle high-power requests using core duty-cycle modulation  A case study of Google App Engine with power viruses  Power containers: 33% slowdown for power viruses; 2% slowdown for normal requests  Alternatively slow down all requests indiscriminately (13%) to lower the max power

Heterogeneity-Aware Request Distribution ASPLOS  Different workloads/requests exhibit different levels of machine preferences in energy efficiency  Energy usage on Intel SandyBridge over energy usage on Woodcrest  Power containers can capture fine-grained request energy profiles and distribute load in a server cluster for high efficiency  Case study shows 25% energy saving compared to alternatives

Conclusions ASPLOS  High-level messages:  Fine-grained power accounting and control are important for multicore server efficiency and dependability  Increasingly so on more power-proportional hardware and dynamic-behavior, user-controlled workloads (web 2.0, cloud computing, …)  Possible to support these at the OS (low overhead on today’s processor hardware, requiring no application assistance)  Contributions:  Enhanced modeling to account for multicore shared power  Use online measurement to mitigate power model inaccuracy  OS-level online request power tracking and control  Utility in power virus control and adaptive request distribution