Power Management in Data Centers: Theory & Practice Mor Harchol-Balter Computer Science Dept Carnegie Mellon University 1 Anshul Gandhi, Sherwin Doroudi,

Presentation transcript:

Slide 1: Power Management in Data Centers: Theory & Practice
Mor Harchol-Balter, Computer Science Dept, Carnegie Mellon University
Anshul Gandhi, Sherwin Doroudi, Alan Scheller-Wolf, Mike Kozuch

Slide 2: Power is Expensive
Annual U.S. data center energy consumption: 100 billion kWh, or 7.4 billion dollars.
- Electricity consumed by 9 million homes
- As much CO2 as all of Argentina
Sadly, most of this energy is wasted.
[energystar.gov], [McKinsey & Co.], [Gartner]

Slide 3: Most Power is Wasted
Servers are busy only 5-30% of the time on average, but they're left ON, wasting power. [Gartner Report] [NYTimes]
- BUSY server: 200 Watts
- IDLE server: 140 Watts
- OFF server: 0 Watts
Server: Intel Xeon E quad-core 2.27 GHz, 16 GB memory
Setup time: 260 s, at 200 W
ALWAYS ON: provision for peak, because we don't know future load.
[Figure: load vs. time]
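The busy and idle power figures on this slide can be turned into a quick back-of-the-envelope calculation. The sketch below assumes a simple two-level power model (a server draws either the busy or the idle wattage and nothing in between); the function names are illustrative and not from the talk.

```python
BUSY_W = 200.0  # power of a busy server, from the slide
IDLE_W = 140.0  # power of an idle server, from the slide


def always_on_power(utilization: float) -> float:
    """Average power (W) of a server that is never switched off.

    Assumes the server is busy a fraction `utilization` of the time
    and idle the rest of the time (two-level power model).
    """
    if not 0.0 <= utilization <= 1.0:
        raise ValueError("utilization must be between 0 and 1")
    return utilization * BUSY_W + (1.0 - utilization) * IDLE_W


def idle_share(utilization: float) -> float:
    """Fraction of the server's energy that is consumed while idle."""
    return (1.0 - utilization) * IDLE_W / always_on_power(utilization)


if __name__ == "__main__":
    for u in (0.05, 0.30):  # the 5-30% busy range quoted on the slide
        print(f"busy {u:.0%} of the time: "
              f"{always_on_power(u):.0f} W average, "
              f"{idle_share(u):.0%} of energy spent idle")
```

Under this model, even at the upper end of the quoted utilization range, well over half of an always-on server's energy goes to idling.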

Slide 4: Talk Thesis
Key: Setup Cost
ALWAYS ON
+ Low response time
- Wastes power
ON/OFF
- High response time
+ Might save power
[Figure: response time T vs. power P, locating ALWAYS ON and ON/OFF]

Slide 5: Outline
- Part I: Theory (M/M/k)
  Understanding the challenges:
  -- What is the effect of setup time?
- Part II: Systems Implementation
  Dynamic power management in practice

Slide 6: M/M/1/Setup [Welch '64]
- Arrivals: λ jobs/sec; service: μ jobs/sec
- Server turns off when idle.
- Setup ~ Exp(α)
[Figure: Markov chain with states (# busy servers, # jobs in system): (0,0), (0,1), (0,2), (0,3), ... and (1,1), (1,2), (1,3), ...]
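The M/M/1/Setup chain is small enough to solve directly. The sketch below assumes the usual reading of the diagram: λ is the arrival rate, μ the service rate, and α the setup rate; state (0, n) means the server is off or in setup with n jobs waiting, and (1, n) means it is busy with n jobs in the system. It computes the stationary distribution on a truncated state space (the truncation level `max_jobs` is an assumption of this sketch) and compares the resulting mean response time with the known closed form for exponential setup times, E[T] = 1/(μ − λ) + 1/α.

```python
def mm1_setup_mean_response(lam: float, mu: float, alpha: float,
                            max_jobs: int = 2000) -> float:
    """Mean response time E[T] of M/M/1/Setup via a truncated chain."""
    if not 0 < lam < mu:
        raise ValueError("need 0 < lam < mu for stability")
    # Unnormalised stationary probabilities, indexed by number of jobs n.
    off = [1.0]  # (0, n): server off (n = 0) or in setup (n >= 1)
    on = [0.0]   # (1, n): server busy; state (1, 0) does not exist
    for n in range(max_jobs):
        # Cut between n and n+1 jobs: lam * (off[n] + on[n]) = mu * on[n+1]
        on.append(lam * (off[n] + on[n]) / mu)
        # Balance at (0, n+1): lam * off[n] = (lam + alpha) * off[n+1]
        off.append(off[n] * lam / (lam + alpha))
    total = sum(off) + sum(on)
    mean_jobs = sum(n * (off[n] + on[n]) for n in range(len(off))) / total
    return mean_jobs / lam  # Little's law: E[T] = E[N] / lam


if __name__ == "__main__":
    lam, mu, alpha = 1.0, 2.0, 1.0  # illustrative rates, not from the talk
    numeric = mm1_setup_mean_response(lam, mu, alpha)
    closed_form = 1.0 / (mu - lam) + 1.0 / alpha
    print(f"numeric E[T] = {numeric:.6f}, closed form = {closed_form:.6f}")
```

In the single-server case, the closed form shows that setup adds a full mean setup time 1/α to every job's response time on average, which is why setup hurts so much when k is small.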

Slide 7: M/M/k/Setup (k=4)
Setup ~ Exp(α)
[Figure: Markov chain for k=4]

Slide 8: M/M/k/Setup (k=4)
Setup ~ Exp(α)
Open for 50 years

Slide 9: M/M/k/Setup (k=4)
Setup ~ Exp(α)
Solvable only numerically: Matrix-Analytic (MA)

Slide 10: M/M/k/Setup (k=4)
Setup ~ Exp(α)
Wide Open

Slide 11: New Technique: RRR [Sigmetrics 13]
Recursive Renewal Reward (RRR)
- Exact. No iteration. No infinite sums.
- Yields transforms of response time & power.
- Closed-form for all chains that are skip-free in the horizontal direction and DAG in the vertical direction.
[Figure: Markov chain split into a finite portion and an infinite repeating portion]

Slide 12: New Technique: RRR [Sigmetrics 13]
N = # jobs
[Figure: Markov chain split into a finite portion and an infinite repeating portion]

Slide 13: Results of Analysis
[Figure: two plots against k, one of E[T] (s) and one of E[P] (kW), each comparing ON/OFF with OPT (ON/OFF with 0 setup)]

Slide 14: Outline
- Part I: Theory (M/M/k)
  What is the effect of setup time?
  -- Setup hurts a lot when k is small
  -- But setup is much less painful when k is large
  -- ON/OFF allows us to achieve near-optimal power
- Part II: Systems Implementation
  Dynamic power management in practice
  -- Arrivals: NOT Poisson. Very unpredictable!
  -- Servers are time-sharing
  -- Job sizes highly variable
  -- Metric: T95 < 500 ms
  -- Setup time = 260 s
[Figure: demand vs. time]

Slide 15: Our Data Center
- Power-aware load balancer
- 28 application servers
- 7 Memcached servers
- 500 GB DB
- Unknown load
- Key-value workload: mix of CPU & I/O
- 1 request = 120 ms, 3000 KV pairs
- Provision capacity to meet SLA
- SLA: T95 < 500 ms
- Setup time: 260 s

Slide 16: Provisioning
At a single server: an arrival rate of 60 req/s gives T95 = 450 ms.
[Figure: T95 vs. arrival rate at a single server]
[Figure: arrival rate r(t) vs. time t, with the capacity provisioned by AlwaysOn and by ON/OFF]
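The single-server measurement suggests a simple provisioning rule: keep each server at or below 60 req/s so that T95 stays under the 500 ms SLA. The sketch below applies that rule to a made-up arrival-rate trace; the trace values and function name are hypothetical, and the ON/OFF line deliberately ignores the 260 s setup time, so it shows the ideal capacity rather than what a real ON/OFF policy achieves.

```python
import math

PER_SERVER_RATE = 60.0  # req/s one server handles with T95 = 450 ms (slide)


def servers_needed(arrival_rate: float) -> int:
    """Servers required so that no server sees more than 60 req/s."""
    if arrival_rate < 0:
        raise ValueError("arrival rate must be non-negative")
    return max(1, math.ceil(arrival_rate / PER_SERVER_RATE))


if __name__ == "__main__":
    # Hypothetical request rates r(t) in req/s, one sample per interval.
    trace = [300.0, 800.0, 1500.0, 650.0]

    always_on = servers_needed(max(trace))       # provision for the peak
    on_off = [servers_needed(r) for r in trace]  # track r(t), zero setup

    print("AlwaysOn servers:", always_on)
    print("ON/OFF servers per interval:", on_off)
```

The gap between the AlwaysOn count and the per-interval ON/OFF counts is the capacity that sits idle, and drawing idle power, outside the peak.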

Slide 17: AlwaysOn vs. ON/OFF
- AlwaysOn: T95 = 291 ms, Pavg = 2,323 W
- ON/OFF: T95 = 11,003 ms, Pavg = 1,281 W ("I'm late, I'm late!")
[Figure: number of servers vs. time (min) for each policy; legend: line = load, o = k busy+idle, x = k busy+idle+setup]

Slide 18: ON/OFF Variants
Reactive / Control-Theoretic: [Leite, Kusic, Mosse '10] [Nathuji, Kansal, Ghaffarkhah '10] [Fan, Weber, Barroso '07] [Wang, Chen '08] [Wood, Shenoy, … '07] [Horvath, Skadron '08] [Urgaonkar, Chandra '05] [Bennani, Menasce '05] [Gmach et al. '08]
Predictive: [Krioukov, …, Culler, Katz '10] [Castellanos et al. '05] [Chen, He, …, Zhao '08] [Chen, Das, …, Gautam '05] [Bobroff, Kuchut, Beaty '07]
-- All suffer from setup lag.
-- All too quick to shut servers off.
-- All provision based on arrival rate.

Slide 19: ON/OFF vs. ON/OFF+padding
x = 100
- ON/OFF: T95 = 11,003 ms, Pavg = 1,281 W
- ON/OFF+padding: T95 = 487 ms, Pavg = 2,218 W
[Figure: number of servers vs. time (min) for each policy; legend: line = load, o = n busy+idle, x = n busy+idle+setup]

Slide 20: A Better Idea: AutoScale [Transactions on Computer Systems 2012]
Existing ON/OFF policies are too quick to turn servers off, then suffer huge setup lag.
Two new ideas:
- Wait some time (t_wait) before turning an idle server off.
- "Un-balance" load: pack jobs onto as few servers as possible without violating SLAs.
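The two ideas can be sketched in a few lines. This is a minimal illustration, not the AutoScale implementation: the `Server` class, the function names, the per-server limit of 10 jobs, and the value of `t_wait` used below are all assumptions made for the example. Packing is modelled as "send each request to the lowest-index server that still has room", and delayed-off as "switch a server off only after it has been idle for `t_wait` seconds".

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Server:
    index: int
    jobs: int = 0
    on: bool = True
    idle_since: Optional[float] = 0.0  # None while the server has jobs


def route(servers: List[Server], max_jobs: int = 10) -> Optional[Server]:
    """Packing: pick the lowest-index ON server with spare room."""
    for s in sorted(servers, key=lambda srv: srv.index):
        if s.on and s.jobs < max_jobs:
            s.jobs += 1
            s.idle_since = None
            return s
    return None  # no room anywhere; the caller must queue or scale up


def complete(server: Server, now: float) -> None:
    """Record a job completion; start the idle timer if the server empties."""
    server.jobs -= 1
    if server.jobs == 0:
        server.idle_since = now


def delayed_off(servers: List[Server], now: float, t_wait: float) -> List[int]:
    """Turn off servers that have been idle for at least t_wait seconds."""
    turned_off = []
    for s in servers:
        if s.on and s.idle_since is not None and now - s.idle_since >= t_wait:
            s.on = False
            turned_off.append(s.index)
    return turned_off


if __name__ == "__main__":
    farm = [Server(i) for i in range(3)]
    for _ in range(12):
        route(farm)
    print([s.jobs for s in farm])                    # jobs per server
    print(delayed_off(farm, now=120.0, t_wait=120.0))  # servers switched off
```

Because packing concentrates requests on the low-index servers, the high-index servers are the ones whose idle timers expire, so capacity shrinks from the top while the busy servers stay warm.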

Slide 21: Scaling Up via AutoScale [Transactions on Computer Systems 2012]
When scaling up capacity, # jobs/server is a more robust indicator than current request rate.
But it is not so obvious how to use # jobs/server:
- 10 jobs/server → 60 req/s
- 30 jobs/server → 120 req/s
Going from 10 to 30 jobs/server triples # jobs/server, but the load only doubles: the relationship is non-linear.
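One way to use # jobs/server despite the non-linearity is to map it back to a request rate through a measured profile before scaling. The sketch below is an assumption-laden illustration: it interpolates linearly between the two operating points on the slide, adds an assumed (0, 0) point, clamps above 30 jobs/server, and uses hypothetical function names. It is not the scaling rule from the AutoScale paper.

```python
import math

# (jobs per server, req/s per server); the last two points are from the
# slide, the (0, 0) point is an assumption of this sketch.
PROFILE = [(0.0, 0.0), (10.0, 60.0), (30.0, 120.0)]
TARGET_RATE = 60.0  # req/s per server that keeps T95 within the SLA


def rate_per_server(jobs_per_server: float) -> float:
    """Estimate req/s per server by interpolating the measured profile."""
    if jobs_per_server < 0:
        raise ValueError("jobs per server must be non-negative")
    if jobs_per_server >= PROFILE[-1][0]:
        return PROFILE[-1][1]  # clamp: no measurements beyond this point
    for (x0, y0), (x1, y1) in zip(PROFILE, PROFILE[1:]):
        if x0 <= jobs_per_server <= x1:
            return y0 + (y1 - y0) * (jobs_per_server - x0) / (x1 - x0)
    raise AssertionError("unreachable: profile covers the whole range")


def servers_needed(current_servers: int, jobs_per_server: float) -> int:
    """Servers needed to bring every server back to the target rate."""
    total_rate = current_servers * rate_per_server(jobs_per_server)
    return math.ceil(total_rate / TARGET_RATE)


if __name__ == "__main__":
    k, observed = 10, 30.0  # hypothetical: 10 servers, 30 jobs on each
    print("non-linear rule:", servers_needed(k, observed))
    print("naive linear rule:", math.ceil(k * observed / 10.0))
```

With 10 servers each holding 30 jobs, scaling in proportion to # jobs/server would triple the farm, while the profile-based rule only doubles it, matching the doubling of load.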

Slide 22: Why AutoScale works
Theorem (loosely): As k → ∞, AutoScale's capacity approaches square-root staffing. [Performance Evaluation, 2010 (b)]
Theorem: As k → ∞, M/M/k with DelayedOff + Packing [rest of the statement not preserved in the transcript]
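For reference, square-root staffing provisions k = R + β√R servers, where R = λ/μ is the offered load and β is a constant chosen for the desired quality of service. The sketch below only evaluates that rule; the value β = 1 and the example loads are assumptions, and nothing here reproduces the theorem's proof or its exact statement.

```python
import math


def square_root_staffing(arrival_rate: float, service_rate: float,
                         beta: float = 1.0) -> int:
    """Number of servers under the rule k = R + beta * sqrt(R)."""
    if arrival_rate < 0 or service_rate <= 0:
        raise ValueError("need arrival_rate >= 0 and service_rate > 0")
    offered_load = arrival_rate / service_rate  # R = lambda / mu
    return math.ceil(offered_load + beta * math.sqrt(offered_load))


if __name__ == "__main__":
    for load in (100.0, 10_000.0):  # hypothetical offered loads
        k = square_root_staffing(load, 1.0)
        print(f"R = {load:.0f}: k = {k}, spare capacity = {k / load - 1:.1%}")
```

The spare capacity grows like √R, so as a fraction of R it shrinks as the farm gets larger, which is consistent with setup costs mattering less for large k.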

Slide 23: ON/OFF vs. AutoScale
- ON/OFF: T95 = 11,003 ms, Pavg = 1,281 W ("I'm late, I'm late!")
- AutoScale: T95 = 491 ms, Pavg = 1,297 W
Within 30% of OPT power on all our traces!
Facebook cluster-testing AS
[Figure: number of servers vs. time (min) for each policy; legend: line = load, o = busy+idle, x = busy+idle+setup]

Slide 24: Results [Transactions on Computer Systems 2012]

AlwaysOn            ON/OFF                 AutoScale
T95      Pavg       T95         Pavg       T95      Pavg
291 ms   2,323 W    11,003 ms   1,281 W    491 ms   1,297 W
271 ms   2,205 W    3,802 ms    759 W      466 ms   1,016 W
289 ms   2,363 W    4,227 ms    1,391 W    470 ms   1,679 W
377 ms   2,263 W    > 1 min     849 W      556 ms   1,412 W

Slide 25: Conclusion
Dynamic power management: managing the setup cost
Part I: Effect of setup in M/M/k
- First analysis of M/M/k/setup and M/M/∞/setup
- Introduced RRR technique for analyzing repeating Markov chains
- Effect of setup cost is very high for small k, but diminishes as k increases
Part II: Managing the setup cost in data centers
- Non-Poisson arrival process; load unknown; unpredictable spikes
- Leaving servers AlwaysOn wastes power, but setup can be deadly.
- Lesson: Don't rush to turn servers off.
- Proposed AutoScale with DelayedOff, Packing routing & Non-linear Scaling.
- Demonstrated effectiveness of AutoScale in practice.
- Proved AutoScale has very good asymptotic behavior.

Slide 26: Thank You!
References
- Anshul Gandhi, Sherwin Doroudi, Mor Harchol-Balter, Alan Scheller-Wolf. "Exact Analysis of the M/M/k/setup Class of Markov Chains via Recursive Renewal Reward." Proceedings of ACM SIGMETRICS 2013 Conference, June
- Anshul Gandhi, Mor Harchol-Balter, Ram Raghunathan, and Mike Kozuch. "AutoScale: Dynamic, Robust Capacity Management for Multi-Tier Data Centers." ACM Transactions on Computer Systems, vol. 30, no. 4, Article 14, 2012, pp
- Anshul Gandhi, Mor Harchol-Balter, and Ivo Adan. "Server farms with setup costs." Performance Evaluation, vol. 67, no. 11, 2010, pp
- Anshul Gandhi, Varun Gupta, Mor Harchol-Balter, and Michael Kozuch. "Optimality Analysis of Energy-Performance Trade-off for Server Farm Management." Performance Evaluation, vol. 67, no. 11, 2010, pp