Power Management in Data Centers: Theory & Practice Mor Harchol-Balter Computer Science Dept Carnegie Mellon University 1 Anshul Gandhi, Sherwin Doroudi,

Presentation transcript:

Power Management in Data Centers: Theory & Practice
Mor Harchol-Balter, Computer Science Dept, Carnegie Mellon University
Anshul Gandhi, Sherwin Doroudi, Alan Scheller-Wolf, Mike Kozuch

Power is Expensive
Annual U.S. data center energy consumption: 100 billion kWh, or 7.4 billion dollars. [energystar.gov], [McKinsey & Co.], [Gartner]
- Electricity consumed by 9 million homes
- As much CO2 as all of Argentina
Sadly, most of this energy is wasted.

Most Power is Wasted
Servers are busy only 5-30% of the time on average, but they are left ON, wasting power. [Gartner Report] [NYTimes]
- BUSY server: 200 Watts
- IDLE server: 140 Watts
- OFF server: 0 Watts
- Setup time: 260 s at 200 W
Server: Intel Xeon E quad-core 2.27 GHz, 16 GB memory.
ALWAYS ON: provision for peak, because we don't know future load.
[Figure: load vs. time.]
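The power figures on this slide support a quick back-of-the-envelope calculation. The sketch below is not from the talk; it only combines the slide's numbers (200 W busy, 140 W idle, 260 s setup at 200 W). The function names and the 30% utilization used in the example are illustrative assumptions, and the break-even estimate ignores response time entirely.

```python
# Power numbers taken from the slide; everything else is illustrative.
BUSY_W = 200.0   # power draw of a busy server
IDLE_W = 140.0   # power draw of an idle server that is left on
SETUP_S = 260.0  # time to bring an OFF server back up
SETUP_W = 200.0  # power drawn during setup


def always_on_avg_power(busy_fraction: float) -> float:
    """Average power of a server that is never turned off."""
    return busy_fraction * BUSY_W + (1.0 - busy_fraction) * IDLE_W


def idle_energy_share(busy_fraction: float) -> float:
    """Fraction of an always-on server's energy spent while idle."""
    idle_part = (1.0 - busy_fraction) * IDLE_W
    return idle_part / always_on_avg_power(busy_fraction)


def break_even_idle_seconds() -> float:
    """Idle period beyond which OFF + setup uses less energy than idling."""
    return SETUP_S * SETUP_W / IDLE_W


if __name__ == "__main__":
    print(always_on_avg_power(0.30))   # average watts at 30% utilization
    print(idle_energy_share(0.30))     # share of energy spent idle
    print(break_even_idle_seconds())   # seconds of idleness to break even
```

At 30% utilization the average draw is about 158 W, of which roughly 62% is spent idle, and a server must stay idle for about 371 s before turning it off saves any energy.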

Talk Thesis
Metrics: Response Time, T, and Power, P. Key: Setup Cost.
ALWAYS ON: + Low response time; - Wastes power
ON/OFF: - High response time; + Might save power?

Outline
Part I: Theory – M/M/k
-- What is the effect of setup time?
Part II: Systems Implementation
-- Dynamic power management in practice

M/M/1/Setup [Welch ’64]
State (i, j): i = # busy servers, j = # jobs in system.
States: (0,0), (0,1), (0,2), (0,3), ... and (1,1), (1,2), (1,3), ...
Arrivals at λ jobs/sec; service at μ jobs/sec; Setup ~ Exp(α).
Server turns off when idle.
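For the single-server case the mean response time has a well-known closed form: the M/M/1 mean plus the mean setup time, E[T] = 1/(μ − λ) + 1/α. The sketch below states that formula and checks it against a small simulation. The simulation recursion, the function names, and the parameter values are illustrative assumptions, not taken from the slides.

```python
import random


def mm1_setup_mean_response(lam: float, mu: float, alpha: float) -> float:
    """E[T] for M/M/1 with Exp(alpha) setup: M/M/1 mean plus mean setup."""
    if lam >= mu:
        raise ValueError("unstable: need lam < mu")
    return 1.0 / (mu - lam) + 1.0 / alpha


def simulate_mm1_setup(lam, mu, alpha, n_jobs=200_000, seed=1):
    """FCFS simulation; the server turns off whenever the system empties."""
    rng = random.Random(seed)
    arrival = 0.0
    prev_departure = 0.0
    total_response = 0.0
    for _ in range(n_jobs):
        arrival += rng.expovariate(lam)
        if arrival >= prev_departure:
            # System was empty, so the server is off: this job pays the setup.
            start = arrival + rng.expovariate(alpha)
        else:
            # Server is busy or already setting up: wait for the previous job.
            start = prev_departure
        prev_departure = start + rng.expovariate(mu)
        total_response += prev_departure - arrival
    return total_response / n_jobs


if __name__ == "__main__":
    lam, mu, alpha = 0.5, 1.0, 0.5  # hypothetical rates
    print(mm1_setup_mean_response(lam, mu, alpha))
    print(simulate_mm1_setup(lam, mu, alpha))
```

With these hypothetical rates the formula gives 4.0, and the simulated mean should land close to it.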

M/M/k/Setup (k=4)
Setup ~ Exp(α)
-- Open for 50 years
-- Solvable only numerically: Matrix-Analytic (MA)
-- Not even approximated

New Technique: RRR [Sigmetrics 13]
Recursive Renewal Reward (RRR)
- Exact. No iteration. No infinite sums.
- Yields transforms of response time & power.
- Closed-form for all chains that are skip-free in the horizontal direction and DAG in the vertical direction.
[Figure: Markov chain with a finite portion and an infinite repeating portion.]

Results of Analysis
[Figures: E[T] (s) vs. k and E[P] (kW) vs. k, each comparing ON/OFF against OPT (ON/OFF with 0 setup).]
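The M/M/k/Setup chain can also be explored by simulation. The sketch below is not the RRR technique from the talk; it is a plain Markov-chain simulation that assumes the state is (busy servers, jobs), that min(jobs − busy, k − busy) servers are in setup, and that a server turns off as soon as no job is waiting for it. It estimates E[T] through Little's law. All rates and the function name are hypothetical.

```python
import random


def simulate_mmk_setup(k, lam, mu, alpha, n_events=400_000, seed=1):
    """Estimate E[T] for M/M/k/Setup by walking its Markov chain."""
    rng = random.Random(seed)
    busy, jobs = 0, 0
    job_time_area = 0.0  # integral of (number of jobs) over time
    elapsed = 0.0
    for _ in range(n_events):
        in_setup = min(jobs - busy, k - busy)
        rate_arrival = lam
        rate_departure = busy * mu
        rate_setup = in_setup * alpha
        total = rate_arrival + rate_departure + rate_setup
        hold = 1.0 / total  # expected holding time in this state
        job_time_area += jobs * hold
        elapsed += hold
        u = rng.random() * total
        if u < rate_arrival:
            jobs += 1
        elif u < rate_arrival + rate_departure:
            jobs -= 1
            # If no job is waiting, the freed server turns off.
            busy = min(busy, jobs)
        else:
            busy += 1  # one setup completes
    mean_jobs = job_time_area / elapsed
    return mean_jobs / lam  # Little's law: E[T] = E[N] / lam


if __name__ == "__main__":
    mu, alpha = 1.0, 0.5            # hypothetical service and setup rates
    for k in (1, 4, 16):
        lam = 0.5 * k               # keep per-server load at 0.5
        print(k, simulate_mmk_setup(k, lam, mu, alpha))
```

At a fixed per-server load, the estimated E[T] should shrink as k grows, consistent with the observation that setup is much less painful in large systems.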

Outline
Part I: Theory – M/M/k. What is the effect of setup time?
-- Setup hurts a lot when k is small
-- But setup is much less painful when k is large
-- ON/OFF allows us to achieve near-optimal power
Part II: Systems Implementation. Dynamic power management in practice.
-- Arrivals: NOT Poisson. Very unpredictable!
-- Servers are time-sharing
-- Job sizes highly variable
-- Metric: T95 < 500 ms
-- Setup time = 260 s
[Figure: demand vs. time.]

Our Data Center
Power-aware load balancer → 28 application servers → 7 Memcached servers → 500 GB DB
Arrival rate: unknown
- Key-value workload: mix of CPU & I/O
- 1 job = 1 to 3000 KV pairs, 120 ms total on avg
- SLA: T95 < 500 ms
- Setup time: 260 s

Provisioning
At a single server: an arrival rate of 60 req/s gives T95 = 450 ms.
[Figure: T95 vs. arrival rate at a single server.]
AlwaysOn: how many servers for arrival rate r(t)?
ON/OFF: how many servers for arrival rate r(t)?
[Figures: r(t) vs. t for each policy.]
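One natural reading of this slide is that a server handles up to 60 req/s within the SLA, so a policy needs about r(t)/60 servers. The sketch below illustrates that reading. The rounding rule, the cap at 28 servers, the function names, and the sample trace are assumptions made for illustration, and the ON/OFF version ignores the 260 s setup time.

```python
import math

PER_SERVER_RATE = 60.0  # req/s one server handles with T95 around 450 ms
MAX_SERVERS = 28        # application servers in the testbed


def servers_needed(rate: float) -> int:
    """Servers required for a given arrival rate, at least 1, at most 28."""
    return min(MAX_SERVERS, max(1, math.ceil(rate / PER_SERVER_RATE)))


def always_on(trace):
    """AlwaysOn: provision once, for the peak of the trace."""
    return servers_needed(max(trace))


def on_off(trace):
    """Idealized ON/OFF: track r(t) exactly, ignoring setup time."""
    return [servers_needed(r) for r in trace]


if __name__ == "__main__":
    trace = [120.0, 300.0, 900.0, 450.0]  # hypothetical r(t) samples, req/s
    print(always_on(trace))
    print(on_off(trace))
```

For this hypothetical trace AlwaysOn keeps 15 servers up throughout, while the idealized ON/OFF policy uses 2, 5, 15, and 8.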

AlwaysOn: T95 = 291 ms, Pavg = 2,323 W
ON/OFF: T95 = 11,003 ms, Pavg = 1,281 W ("I’m late, I’m late!")
[Figures: number of servers vs. time (min) for each policy. Legend: _ load; o k busy+idle; x k busy+idle+setup.]

ON/OFF Variants
Reactive / Control-Theoretic: [Leite, Kusic, Mosse ’10] [Nathuji, Kansal, Ghaffarkhah ’10] [Fan, Weber, Barroso ’07] [Wang, Chen ’08] [Wood, Shenoy, … ’07] [Horvath, Skadron ’08] [Urgaonkar, Chandra ’05] [Bennani, Menasce ’05] [Gmach et al. ’08]
Predictive: [Krioukov, …, Culler, Katz ’10] [Castellanos et al. ’05] [Chen, He, …, Zhao ’08] [Chen, Das, …, Gautam ’05] [Bobroff, Kuchut, Beaty ’07]
-- All suffer from setup lag.
-- All too quick to shut servers off.
-- All provision based on arrival rate.

ON/OFF: T95 = 11,003 ms, Pavg = 1,281 W
ON/OFF+padding (x = 100): T95 = 487 ms, Pavg = 2,218 W
[Figures: number of servers vs. time (min) for each policy. Legend: _ load; o k busy+idle; x k busy+idle+setup.]

A Better Idea: AutoScale [Transactions on Computer Systems 2012]
Existing ON/OFF policies are too quick to turn servers off … then suffer huge setup lag.
Two new ideas:
- Wait some time (t_wait) before turning an idle server off.
- “Un-balance” load: pack jobs onto as few servers as possible w/o violating SLAs.
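The two ideas can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the `Server` class, the per-server job cap `cap`, and the function names are hypothetical, and the sketch assumes packing means "send each job to the lowest-indexed ON server that is still under its cap."

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Server:
    on: bool = True
    jobs: int = 0
    idle_since: Optional[float] = 0.0  # None while the server has jobs


def dispatch(servers: List[Server], cap: int) -> Optional[int]:
    """Packing: route to the first ON server with fewer than `cap` jobs."""
    for idx, server in enumerate(servers):
        if server.on and server.jobs < cap:
            server.jobs += 1
            server.idle_since = None
            return idx
    return None  # every ON server is at its cap


def complete(servers: List[Server], idx: int, now: float) -> None:
    """Record a job completion and start the idle timer if needed."""
    server = servers[idx]
    server.jobs -= 1
    if server.jobs == 0:
        server.idle_since = now


def expire_idle(servers: List[Server], now: float, t_wait: float) -> List[int]:
    """Delayed off: turn off only servers idle for at least t_wait."""
    turned_off = []
    for idx, server in enumerate(servers):
        idle = server.on and server.jobs == 0 and server.idle_since is not None
        if idle and now - server.idle_since >= t_wait:
            server.on = False
            turned_off.append(idx)
    return turned_off
```

Because jobs are packed onto the low-indexed servers, the high-indexed ones are the first to sit idle long enough to be turned off, while a brief lull does not trigger a shutdown.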

Scaling Up via AutoScale [Transactions on Computer Systems 2012]
- Arrival rate, r(t), is an insufficient indicator of load.
- # jobs/server is a more robust indicator: jobs/server reacts similarly to all forms of load (job size, arrival rate).
- Jobs/server grows super-linearly!
[Figure: scaling factor when load goes from 1x to 2x, for job size and for arrival rate.]
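A simple queueing model gives some intuition for both claims. The worked example below assumes each server behaves like an M/M/1 queue (the same mean holds under processor sharing); this is an illustrative assumption and not the model used in the talk. Here ρ is the per-server load, λ the per-server arrival rate, and E[S] the mean job size.

```latex
\[
  E[N] = \frac{\rho}{1-\rho}, \qquad \rho = \lambda \, E[S]
\]
\[
  \rho = 0.3:\; E[N] = \frac{0.3}{0.7} \approx 0.43
  \qquad\qquad
  \rho = 0.6:\; E[N] = \frac{0.6}{0.4} = 1.5
\]
```

Doubling the load multiplies the number of jobs per server by 3.5, which is super-linear. Since ρ depends only on the product λ E[S], doubling the arrival rate and doubling the job size have the same effect on jobs per server under this assumption.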

Why AutoScale works
Theorem (loosely): As k → ∞, AutoScale’s capacity approaches square-root staffing. [Performance Evaluation, 2010 (b)]
Theorem: As k → ∞, M/M/k with DelayedOff + Packing
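For reference, the standard square-root staffing rule provisions k = R + β√R servers, where R = λ/μ is the offered load. The sketch below computes that rule. It illustrates the classical rule only, not the theorem itself; the function name and the value of β are assumptions.

```python
import math


def square_root_staffing(lam: float, mu: float, beta: float = 1.0) -> int:
    """Servers under the classical rule k = R + beta * sqrt(R), R = lam / mu."""
    offered_load = lam / mu
    return math.ceil(offered_load + beta * math.sqrt(offered_load))


if __name__ == "__main__":
    # Hypothetical numbers: offered load of 100 servers' worth of work.
    print(square_root_staffing(lam=100.0, mu=1.0, beta=1.0))
```

With an offered load of 100 and β = 1 the rule asks for 110 servers, so the safety margin grows only with the square root of the load.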

ON/OFF: T95 = 11,003 ms, Pavg = 1,281 W ("I’m late, I’m late!")
AutoScale: T95 = 491 ms, Pavg = 1,297 W
Within 30% of OPT power on all our traces!
Facebook cluster-testing AutoScale.
[Figures: number of servers vs. time (min) for each policy. Legend: _ load; o k busy+idle; x k busy+idle+setup.]

Results [Transactions on Computer Systems 2012]

AlwaysOn            ON/OFF                AutoScale
T95      Pavg       T95         Pavg      T95      Pavg
291 ms   2323 W     11,003 ms   1281 W    491 ms   1297 W
271 ms   2205 W     3,802 ms    759 W     466 ms   1016 W
289 ms   2363 W     4,227 ms    1391 W    470 ms   1679 W
377 ms   2263 W     > 1 min     849 W     556 ms   1412 W
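The table can be summarized with a short calculation. The sketch below copies the AlwaysOn and AutoScale columns from the table and derives the power saved by AutoScale and whether each row meets the 500 ms SLA stated earlier in the talk. The variable and function names are illustrative.

```python
SLA_MS = 500  # T95 target from the talk

# (AlwaysOn T95 ms, AlwaysOn Pavg W, AutoScale T95 ms, AutoScale Pavg W),
# one tuple per table row, in the order the rows appear.
ROWS = [
    (291, 2323, 491, 1297),
    (271, 2205, 466, 1016),
    (289, 2363, 470, 1679),
    (377, 2263, 556, 1412),
]


def power_saving_pct(always_on_w: float, autoscale_w: float) -> float:
    """Percentage of AlwaysOn power saved by AutoScale."""
    return 100.0 * (always_on_w - autoscale_w) / always_on_w


savings = [round(power_saving_pct(row[1], row[3]), 1) for row in ROWS]
sla_met = [row[2] < SLA_MS for row in ROWS]

if __name__ == "__main__":
    for saving, ok in zip(savings, sla_met):
        print(f"saving {saving}% | AutoScale meets SLA: {ok}")
```

AutoScale saves roughly 29% to 54% of AlwaysOn power across the four rows and meets the SLA in all but the last, where T95 is 556 ms.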

Conclusion
Dynamic power management means managing the setup cost.
Part I: Effect of setup in M/M/k
- First analysis of M/M/k/setup and M/M/∞/setup.
- Introduced the RRR technique for analyzing repeating Markov chains.
- Effect of setup cost is very high for small k, but diminishes as k increases.
Part II: Managing the setup cost in data centers
- Non-Poisson arrival process; load unknown; unpredictable spikes.
- Leaving servers AlwaysOn wastes power, but setup can be deadly.
- Lesson: don’t rush to turn servers off.
- Proposed AutoScale with DelayedOff, Packing routing & Non-linear Scaling.
- Demonstrated effectiveness of AutoScale in practice and theory.

Discussion Items
- Scaling stateful servers?
- Tradeoffs between architectures: “Should we separate stateful from stateless?”
See HotCloud 2012. See Middleware 2012 – best of both.
[Figure: power-aware load balancer with unknown arrival rate r, application servers, Memcached, DB.]

References
Anshul Gandhi, Sherwin Doroudi, Mor Harchol-Balter, Alan Scheller-Wolf. "Exact Analysis of the M/M/k/setup Class of Markov Chains via Recursive Renewal Reward." ACM SIGMETRICS 2013 Conference, June
Anshul Gandhi, Mor Harchol-Balter, R. Raghunathan, Mike Kozuch. "AutoScale: Dynamic, Robust Capacity Management for Multi-Tier Data Centers." ACM Transactions on Computer Systems, vol. 30, no. 4, Article 14, 2012, pp
Anshul Gandhi, Timothy Zhu, Mor Harchol-Balter, Mike Kozuch. "SOFTScale: Stealing Opportunistically for Transient Scaling." Middleware
Timothy Zhu, Anshul Gandhi, Mor Harchol-Balter, Mike Kozuch. "Saving Cash by Using Less Cache." HotCloud
Anshul Gandhi, Mor Harchol-Balter, and Ivo Adan. "Server farms with setup costs." Performance Evaluation, vol. 67, no. 11, 2010, pp
Anshul Gandhi, Varun Gupta, Mor Harchol-Balter, and Michael Kozuch. "Optimality Analysis of Energy-Performance Trade-off for Server Farm Management." Performance Evaluation, vol. 67, no. 11, 2010, pp