Power Cost Reduction in Distributed Data Centers
Yuan Yao, University of Southern California
Joint work with Longbo Huang, Abhishek Sharma, Leana Golubchik, and Michael Neely
IBM Student Workshop for Frontiers of Cloud Computing 2011
Paper to appear in Infocom 2012

Background and motivation
Data centers are growing in number and size…
– Number of servers: Google (~1M)
– Data centers built in multiple locations: IBM owns and operates hundreds of data centers worldwide
…and in power cost!
– Google spends ~$100M/year on power
– Reduce power cost while taking QoS into account

Existing Approaches
Power efficient hardware design
System design/Resource management
– Use existing infrastructure
– Exploit options in routing and resource management in data centers

Existing Approaches
Power cost reduction through algorithm design
– Server level: power-speed scaling [Wierman09]
– Data center level: rightsizing [Gandhi10, Lin11]
– Inter data center level: geographical load balancing [Qureshi09, Liu11]
[Figure labels: $5/kWh, $2/kWh, job]

Our Approach: SAVE
We provide a framework that exploits the options at all of these levels (server, data center, inter data center)
+ Temporal volatility of power prices
= StochAstic power redUction schEme (SAVE)
[Figure labels: Server level, Data center level, Inter data center level, Job arrived, Job served]

Our Model: data center and workload
M geographically distributed data centers
Each data center contains a front end server and a back end cluster
Workloads A_i(t) (i.i.d.) arrive at front end servers and are routed to one of the back end clusters
[Figure label: μ_ji(t)]

Our Model: server operation and cost
Back end cluster of data center i contains N_i servers
– N_i(t) servers active
Service rate of active servers: b_i(t) ∈ [0, b_max]
Power price at data center i: p_i(t) (i.i.d.)
Power usage at data center i:
Power cost at data center i:

Our Model: two time scale
The system we model operates on two time scales
– At t = kT, change the number of active servers N_j(t)
– In every time slot, change the service rate b_j(t)
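
The two time scales above can be sketched as a loop in which the number of active servers is revisited only at frame boundaries t = kT, while the service rate is revisited in every slot. This is a minimal illustration and not the authors' implementation; the function name `two_time_scale_schedule` and the action labels are hypothetical.

```python
def two_time_scale_schedule(num_slots, T):
    """Return, for each slot t, the list of decisions revisited in that slot.

    Hypothetical helper: it only illustrates the timing of the two controls.
    """
    schedule = []
    for t in range(num_slots):
        actions = ["service_rate"]               # b_j(t) may change in every slot
        if t % T == 0:                           # frame boundary, t = kT
            actions.insert(0, "active_servers")  # N_j(t) may change only here
        schedule.append((t, actions))
    return schedule


schedule = two_time_scale_schedule(num_slots=6, T=3)
for t, actions in schedule:
    print(t, actions)
```

With T = 3, the number of active servers is revisited in slots 0 and 3 only, and stays fixed in between.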

Our Model: summary
Input: power prices p_i(t), job arrivals A_i(t)
Two time scale control action:
Queue evolution:
Objective: minimize the time average power cost subject to all constraints on Π, and queue stability
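
A queue evolution of this kind can be sketched as follows, assuming the standard discrete-time backlog update max(Q - served, 0) + arrived that is common in queue stability analysis. The exact form used in the talk may differ; the names `next_backlog`, `step_queues`, and `routed` are hypothetical, and the sketch uses the nominal routed amounts without capping them by the front end backlog.

```python
def next_backlog(backlog, served, arrived):
    """Assumed update: serve what is possible, then add new arrivals."""
    return max(backlog - served, 0.0) + arrived


def step_queues(q_front, q_back, arrivals, routed, n_active, rate):
    """Advance all front end and back end queues by one time slot.

    routed[i][j] is the work sent from front end i to back end j (assumed layout).
    Back end j can serve n_active[j] * rate[j] units of work in the slot.
    """
    m = len(q_front)
    new_front = [
        next_backlog(q_front[i], sum(routed[i]), arrivals[i]) for i in range(m)
    ]
    new_back = [
        next_backlog(
            q_back[j],
            n_active[j] * rate[j],
            sum(routed[i][j] for i in range(m)),
        )
        for j in range(m)
    ]
    return new_front, new_back


# Illustrative numbers only, for two data centers.
front, back = step_queues(
    q_front=[5, 0],
    q_back=[2, 1],
    arrivals=[3, 1],
    routed=[[2, 1], [0, 0]],
    n_active=[1, 2],
    rate=[1.0, 0.5],
)
print(front, back)
```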

SAVE: intuitions
SAVE operates at both front end and back end
Front end routing:
– When, choose μ_ij(t) > 0
Back end server management:
– Choose small N_j(t) and b_j(t) to reduce the power cost f_j(t)
– When is large, choose large N_j(t) and b_j(t) to stabilize the queue

SAVE: how it works
Front end routing:
– In every time slot t, choose μ_ij(t) to maximize
Back end server management: choose V > 0
– At time slots t = kT, choose N_j(t) to minimize
– In every time slot τ, choose b_j(τ) to minimize
Serve jobs and update queue sizes
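
The flavor of these decisions can be sketched under the assumption of a generic drift-plus-penalty rule from Lyapunov optimization, in which V weights power cost against queue backlog. The objective expressions, the per-server power curve `power_per_server`, the candidate rate grid, and the routing cap `mu_max` are all assumptions made for illustration and are not taken from the slide.

```python
def choose_service_rate(backlog, n_active, price, V, rates, power_per_server):
    """Pick the rate b that minimizes V * cost(b) - backlog * n_active * b.

    Assumed objective: a large backlog rewards fast service, a large V
    rewards cheap operation.
    """
    def score(b):
        cost = price * n_active * power_per_server(b)
        return V * cost - backlog * n_active * b

    return min(rates, key=score)


def route(front_backlog, back_backlogs, mu_max):
    """Send up to mu_max to the least loaded back end, but only if the
    front end backlog exceeds that back end's backlog (assumed rule)."""
    j = min(range(len(back_backlogs)), key=lambda k: back_backlogs[k])
    mu = [0.0] * len(back_backlogs)
    if front_backlog > back_backlogs[j]:
        mu[j] = mu_max
    return mu


def power_curve(b):
    """Hypothetical convex power curve per active server."""
    return 1.0 + b * b


rates = [0.0, 0.5, 1.0]
fast = choose_service_rate(10.0, 3, 2.0, V=1.0, rates=rates, power_per_server=power_curve)
slow = choose_service_rate(10.0, 3, 2.0, V=100.0, rates=rates, power_per_server=power_curve)
print(fast, slow)
print(route(5.0, [7.0, 2.0, 4.0], mu_max=3.0))
```

In this toy setting, the same backlog leads to full speed when V is small and to no service when V is large, which mirrors the cost versus delay trade-off that V is meant to control.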

SAVE: performance
Theorem on performance of our approach:
– Delay of SAVE ≤ O(V)
– Power cost of SAVE ≤ Power cost of OPTIMAL + O(1/V)
– OPTIMAL can be any scheme that stabilizes the queues
V controls the trade-off between average queue size (delay) and average power cost: a larger V lowers the power cost but increases the delay
SAVE is suited for delay tolerant workloads

Experimental Setup
We simulate data centers at 7 locations
– Real world power prices
– Poisson arrivals
We use synthetic workloads that mimic MapReduce jobs
Power cost is computed from:
– Power consumption of active servers
– Power usage effectiveness
– Power consumption of servers in sleep
– Power price
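
One plausible way to combine these four components is sketched below. The way the terms are combined (price times power usage effectiveness times the sum of active and sleep power) is an assumption, since only the component names are given here; the function name `power_cost` and all numbers are illustrative.

```python
def power_cost(price, pue, n_total, n_active, p_active, p_sleep):
    """Assumed cost for one time slot at one data center.

    price    : power price
    pue      : power usage effectiveness (facility overhead multiplier)
    p_active : power drawn by one active server
    p_sleep  : power drawn by one sleeping server
    """
    server_power = n_active * p_active + (n_total - n_active) * p_sleep
    return price * pue * server_power


# Illustrative values: 10 servers, 4 of them active.
cost = power_cost(price=2.0, pue=1.5, n_total=10, n_active=4,
                  p_active=200.0, p_sleep=10.0)
print(cost)
```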

Experimental Setup: Heuristics for comparison
Local Computation
– Send jobs to the local back end
Load Balancing
– Evenly split jobs among all back ends
Low Price (similar to [Qureshi09])
– Send more jobs to places with low power prices
All servers are activated in the three heuristics above
Instant On/Off
– Routing is the same as in Load Balancing
– Data center i tunes N_i(t) and b_i(t) every time slot to minimize its power cost
– No additional cost for activating servers or putting them to sleep
– Unrealistic
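
The three routing heuristics can be sketched as functions that split one front end's arrivals across the back ends. The function names are hypothetical, and the inverse-price weighting in `low_price` is only one assumed way to "send more jobs to places with low power prices".

```python
def local_computation(i, arrivals, num_sites):
    """Send all arrivals of front end i to its own back end."""
    split = [0.0] * num_sites
    split[i] = arrivals
    return split


def load_balancing(arrivals, num_sites):
    """Split arrivals evenly among all back ends."""
    return [arrivals / num_sites] * num_sites


def low_price(arrivals, prices):
    """Split arrivals in proportion to 1/price (assumed weighting)."""
    weights = [1.0 / p for p in prices]
    total = sum(weights)
    return [arrivals * w / total for w in weights]


print(local_computation(1, 6.0, 3))
print(load_balancing(6.0, 3))
print(low_price(7.0, [1.0, 2.0, 4.0]))
```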

Experimental Results
As V increases, power cost reduction grows from ~0.1% to ~18%
SAVE is more effective for delay tolerant workloads
[Figure: relative power cost reduction as compared to Local Computation]
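
The metric in the figure, relative power cost reduction with respect to Local Computation, can be computed as sketched below, assuming the usual definition (baseline minus scheme, divided by baseline). The function name and the input numbers are hypothetical and are not measurements from the experiments.

```python
def relative_reduction(baseline_cost, scheme_cost):
    """Fraction of the baseline cost saved by the scheme (assumed definition)."""
    return (baseline_cost - scheme_cost) / baseline_cost


# Made-up costs, used only to show the arithmetic.
reduction = relative_reduction(baseline_cost=200.0, scheme_cost=170.0)
print(f"{reduction:.1%}")
```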

Experimental Results: Power Usage
Our approach also reduces power usage
We record the actual power usage (not cost) of all schemes in our experiments

Summary
We propose a two time scale, non work conserving control algorithm aimed at reducing power cost in distributed data centers.
Our work facilitates an explicit power cost vs. delay trade-off.
We derive analytical bounds on the time average power cost and service delay achieved by our algorithm.
Through simulations, we show that our approach can reduce the power cost by as much as 18% and that it also reduces power usage.

Future work
Other problems on power reduction in data centers
– Scheduling algorithms to save power
– Delay sensitive workloads
– Virtualized environments, when migration is available

Questions?
Please check out our paper:
– "Data Centers Power Reduction: A two Time Scale Approach for Delay Tolerant Workloads", to appear in Infocom 2012
Contact info: