2 Energy Efficiency and Cloud Computing David Patterson, UC Berkeley Reliable Adaptive Distributed Systems Lab Image: John Curley

Presentation transcript:

2 Energy Efficiency and Cloud Computing David Patterson, UC Berkeley Reliable Adaptive Distributed Systems Lab Image: John Curley

3 Outline Energy Proportionality vs. Reality Turning Off Servers vs. Ensuring Full ROI Turning Off Servers and Reliability Defining Cloud Computing RAD Lab Vision Datacenter OS and Energy Efficiency Datacenter Store and Energy Efficiency

4 Datacenter Is New “Server” “Program” == Web search, , map/GIS, … “Computer” == 1000’s computers, storage, network Warehouse-sized facilities and workloads New datacenter ideas ( ): truck container (Sun), floating (Google), datacenter-in-a-tent (Microsoft) How to enable innovation in new services without first building & capitalizing a large company? photos: Sun Microsystems and datacenterknowledge.com

5 Tie to Cloud Computing Cloud Computing saves energy? Don’t buy machines for local use that are often idle Better to ship bits as photons vs. ship electrons over transmission lines to spin disks locally – Clouds use nearby (hydroelectric) power – Leverage economies of scale of cooling, power distribution

6 Tie to Cloud Computing Techniques developed to stop using idle servers to save money in Cloud Computing can also be used to save power – Up to Cloud Computing Provider to decide what to do with idle resources New Requirement: Scale DOWN and up – Who decides when to scale down in a datacenter? – How can Datacenter storage systems improve energy?

7 Energy Proportional Computing It is surprisingly hard to achieve high levels of utilization of typical servers: most of the time, 10% to 50% Figure 1. Average CPU utilization of more than 5,000 servers during a six-month period. Servers are rarely completely idle and seldom operate near their maximum utilization, instead operating most of the time at between 10 and 50 percent of their maximum. “The Case for Energy-Proportional Computing,” Luiz André Barroso, Urs Hölzle, IEEE Computer, December 2007

8 Energy Proportionality? How close to “Energy Proportionality”? 10% of peak utilization => 10% of peak power? “The Case for Energy-Proportional Computing,” Barroso and Hölzle, IEEE Computer, Dec. 2007 [Figure label: Energy Proportionality]

9 Benchmarking Power SPECPower benchmark December 2007 –Run ~SPECJBB Java benchmark (requests/s) –Vary requests/s in 10% increments:100% to 0% –Single number sum of requests / sum of power 1.5 years for companies to compare results, innovate, and tune hardware and software –Publish results every quarter: > 100 results –Average result improved 3X in 1.5 years –Benchmarketing or real progress?
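
The "single number" on this slide can be sketched as follows: sum the throughput over the load levels from 100% down to 0% in 10% steps and divide by the sum of power. The peak throughput, the idle and peak wattages, and the linear power model are illustrative assumptions, not published benchmark results.

```python
def overall_ops_per_watt(peak_ops, idle_watts, peak_watts):
    """Sum of requests/s over all load levels divided by the sum of power."""
    # Load levels 100%, 90%, ..., 10%, 0% (the last one is active idle).
    loads = [level / 10 for level in range(10, -1, -1)]
    total_ops = sum(peak_ops * load for load in loads)
    # Assumed linear power model between idle draw and peak draw.
    total_watts = sum(idle_watts + (peak_watts - idle_watts) * load
                      for load in loads)
    return total_ops / total_watts

# Hypothetical server: 100,000 requests/s at peak, 150 W idle, 300 W peak.
score = overall_ops_per_watt(100_000, 150, 300)
# Same server if it were perfectly energy proportional (0 W when idle).
proportional_score = overall_ops_per_watt(100_000, 0, 300)
print(round(score, 1), round(proportional_score, 1))
```

Because idle power counts in the denominator at every load level, a server with lower idle draw scores higher even when its peak throughput and peak power are unchanged.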

10 SPECPower Results SPECpower 2008: – Average of 23 results from 2Q – 50% utilization => 74% of peak power – 10% utilization => 54% of peak power Save power by consolidating and turning off: 5 computers @ 10% = 270% vs. 1 computer @ 50% = 74% Save 2/3 of power (during slower periods) [Figure label: Energy Proportionality]
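
The consolidation arithmetic on this slide can be checked with a short sketch. It uses the two averages quoted above (54% of peak power at 10% utilization, 74% at 50%) and assumes, as a simplification, that a server that is turned off draws no power.

```python
# Fraction of peak power drawn at a given utilization (averages from the slide).
POWER_AT_UTILIZATION = {0.10: 0.54, 0.50: 0.74}

def consolidation_savings(n_servers, low_util, high_util):
    """Power saved by moving the load of n lightly used servers onto one."""
    before = n_servers * POWER_AT_UTILIZATION[low_util]  # all servers on
    after = POWER_AT_UTILIZATION[high_util]              # one on, rest off (0 W assumed)
    return before, after, 1 - after / before

before, after, saved = consolidation_savings(5, 0.10, 0.50)
print(f"before={before:.2f} after={after:.2f} saved={saved:.0%}")
```

Five servers at 10% utilization draw 2.7 times one server's peak power, while one server at 50% draws 0.74 times, so the saving is a little over two thirds.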

11 But Powering off Hurts Hardware Reliability? Theory: if turn on and off infrequently, could IMPROVE reliability! Which is better: hot and idle vs. turned off and no wear but cycle temperature? Disks: MTTF measured in powered on hours –50,000 start/stops guaranteed (~1/hour over lifetime) – More years if fewer powered on hours per year? Integrated Circuits: there is small effect of being powered on vs. temperature cycle of off and on –One paper says improve lifetime by 1.4X if turn off 50% with infrequent power cycles (~1/hour over lifetime)

12 Small Experiment DETER Project at ISI and Berkeley 64 Nodes at ISI: Turn off when idle one hour 64 Identical nodes at Berkeley: Always on Ran for 18 months (so far) Failures –ISI ≤ 3 failures –Berkeley 5 failures (but more temperature variation) Didn’t hurt reliability (for small experiment)

13 Tradeoff: Turning Off vs. Ensuring Full ROI Given diurnal patterns and high power even when idle, turn off computers and consolidate during traditional slow periods – Problem: Existing monitoring software assumes broken if server doesn’t respond: change monitoring software or ??? Given huge capital investment in power and cooling, to maximize ROI, increase workload of other valuable tasks during traditional slow periods

14 Case for Getting Value Cost of Internet-Scale Datacenter – James Hamilton, perspectives.mvdirona.com – Keynote, Int’l Symp. Computer Arch., 6/23/09 Largest cost is server and storage H/W –Followed by cooling, power distribution, power –People costs (1000+:1 server:admin) –Services’ interest: work-done-per-$ (or joule) –Networking $ varies: very low to dominant, depending upon service

15 Example Monthly Costs 50,000 servers @ $2k/server; 15 MW facility @ $200M; $0.07 per kWh Power $ ≈ 1/3 of servers $, < power and cooling infrastructure $ [Pie chart: Power; Power and Cooling Infrastructure; Servers]
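
A rough way to see where these proportions come from is to amortize each capital cost per month and compare it with the monthly power bill. The sketch below reads the slide as 50,000 servers at $2k each in a 15 MW facility costing $200M. The 3-year server life, the 15-year facility life, straight-line amortization without interest, and full rated power draw all month are assumptions made here for illustration, not figures from the talk.

```python
HOURS_PER_MONTH = 730  # average number of hours in a month

def monthly_costs(n_servers, dollars_per_server, facility_dollars,
                  facility_kw, dollars_per_kwh,
                  server_life_months=36, facility_life_months=180):
    """Straight-line monthly cost of servers, infrastructure, and power.

    The amortization periods are assumptions; interest is ignored and the
    facility is assumed to draw its full rated power all month.
    """
    servers = n_servers * dollars_per_server / server_life_months
    infrastructure = facility_dollars / facility_life_months
    power = facility_kw * HOURS_PER_MONTH * dollars_per_kwh
    return {"servers": servers, "infrastructure": infrastructure, "power": power}

costs = monthly_costs(50_000, 2_000, 200e6, 15_000, 0.07)
for name, dollars in costs.items():
    print(f"{name}: ${dollars / 1e6:.2f}M per month")
```

Under these assumptions power comes out at roughly a quarter to a third of the server cost and below the infrastructure cost, which matches the ordering on the slide. Different amortization periods would shift the exact ratios.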

16 Given Costs, Why Turn Off? Only saving part of 20% of monthly costs Better to run batch jobs (MapReduce) overnight to add value to company – (Or rent idle machines to others) How much value do you really get from batch jobs? Electric utility mandated reductions on crisis days (or pay more all year)? Still true in future as Hardware costs fall and Power costs rise?

17 Example Monthly Costs 50,000 servers @ $1k/server; 15 MW facility @ $200M; $0.10 per kWh Power $ = servers $, > power and cooling infrastructure $ [Pie chart: Power; Power & Cooling Infrastructure; Servers]

18 DatacenterS Reduce Cost? Rather than elaborate, expensive batteries and diesel generators, rely on other datacenters to take over on failure Reduces cooling and power infrastructure costs per datacenter, making power a larger fraction of monthly costs

19 Outline Energy Proportionality vs. Reality Turning Off Servers vs. Ensuring Full ROI Turning Off Servers and Reliability Defining Cloud Computing RAD Lab Vision Datacenter OS and Energy Efficiency Datacenter Store and Energy Efficiency

20 But... What is cloud computing, exactly?

21 “It’s nothing (new)” “...we’ve redefined Cloud Computing to include everything that we already do... I don’t understand what we would do differently... other than change the wording of some of our ads.” Larry Ellison, CEO, Oracle (Wall Street Journal, Sept. 26, 2008)

22 Above the Clouds: A Berkeley View of Cloud Computing abovetheclouds.cs.berkeley.edu 2/09 White paper by RAD Lab PI’s and students –Clarify terminology around Cloud Computing –Quantify comparison with conventional computing –Identify Cloud Computing challenges and opportunities Why can we offer new perspective? –Strong engagement with industry –Users of cloud computing in research and teaching last 18 months Goal: stimulate discussion on what’s really new –Without resorting to weather analogies ad nauseam

23 Utility Computing Arrives Amazon Elastic Compute Cloud (EC2) “Compute unit” rental: $ /hr. –1 CU ≈ GHz 2007 AMD Opteron/Xeon core No up-front cost, no contract, no minimum Billing rounded to nearest hour; pay-as-you-go storage also available A new paradigm (!) for deploying services?
“Instances”          Platform   Cores   Memory    Disk
Small - $0.10 / hr   32-bit     1       1.7 GB    160 GB
Large - $0.40 / hr   64-bit     4       7.5 GB    850 GB – 2 spindles
XLarge - $0.80 / hr  64-bit     8       15.0 GB   1690 GB – 3 spindles

24 What Is it? What’s New? Old idea: Software as a Service (SaaS) –Basic idea predates MULTICS –Software hosted in the infrastructure vs. installed on local servers or desktops; dumb (but brawny) terminals –Recently: “[HW, Infrastructure, Platform] as a service” ?? HaaS, IaaS, PaaS poorly defined, so we avoid New: pay-as-you-go utility computing –Illusion of infinite resources on demand –Fine-grained billing: release == don’t pay –Earlier examples: Sun, Intel Computing Services—longer commitment, more $$$/hour, no storage –Public (utility) vs. private clouds

25 Why Now (Not Then)? “The Web Space Race”: Build-out of extremely large datacenters (10,000s of commodity PCs) –Build-out driven by growth in demand (more users) => Infrastructure software: e.g., Google File System => Operational expertise: failover, DDoS, firewalls... –Discovered economy of scale: 5-7x cheaper than provisioning a medium-sized (100s machines) facility More pervasive broadband Internet Commoditization of HW & SW –Fast Virtualization –Standardized software stacks

26 Classifying Clouds Instruction Set VM (Amazon EC2, 3Tera) Managed runtime VM (Microsoft Azure) Framework VM (Google AppEngine) Tradeoff: flexibility/portability vs. “built in” functionality [Spectrum: EC2 (lower-level, less managed), Azure, AppEngine (higher-level, more managed)]

27 Cloud Economics 101 Cloud Computing User: Static provisioning for peak - wasteful, but necessary for SLA [Figure: two plots. “Statically provisioned” data center: Machines vs. Time, showing Demand, Capacity, and unused resources. “Virtual” data center in the cloud: $ vs. Time, showing Demand and Capacity.]

28 Cloud Economics 101 Cloud Computing Provider: Could save energy [Figure: two plots. “Statically provisioned” data center: Machines vs. Time, showing Demand, Capacity, and unused resources. Real data center in the cloud: Energy vs. Time, showing Demand and Capacity.]

29 Risk of Under-utilization Under-utilization results if “peak” predictions are too optimistic [Figure: static data center, Resources vs. Time, showing Demand, Capacity, and unused resources]

30 Risks of Under Provisioning [Figure: three plots of Resources vs. Time (days 1 to 3), each showing Demand and Capacity; annotations: lost revenue, lost users]

31 New Scenarios Enabled by “Risk Transfer” to Cloud Not (just) Capital Expense vs. Operation Expense! “Cost associativity”: 1,000 CPUs for 1 hour same price as 1 CPU for 1,000 hours –Washington Post converted Hillary Clinton’s travel documents to post on WWW <1 day after released –RAD Lab graduate students demonstrate improved Hadoop (batch job) scheduler on 1,000 servers Major enabler for SaaS startups –Animoto traffic doubled every 12 hours for 3 days when released as Facebook plug-in –Scaled from 50 to >3500 servers –...then scaled back down
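
“Cost associativity” follows directly from per-instance-hour billing, as the sketch below illustrates. The $0.10 per hour rate is the Small instance price quoted on slide 23; the assumption that the job parallelizes perfectly is made here only to keep the example short.

```python
def rental_cost(n_instances, hours_each, dollars_per_hour=0.10):
    """Pay-as-you-go cost: instances x hours x hourly rate."""
    return n_instances * hours_each * dollars_per_hour

def elapsed_hours(total_cpu_hours, n_instances):
    """Wall-clock time, assuming the job parallelizes perfectly."""
    return total_cpu_hours / n_instances

wide = rental_cost(1000, 1)     # 1,000 CPUs for 1 hour
narrow = rental_cost(1, 1000)   # 1 CPU for 1,000 hours
print(wide, narrow, elapsed_hours(1000, 1000), elapsed_hours(1000, 1))
```

The bill is identical in both cases, but the wide run finishes in one hour instead of roughly six weeks, which is what makes the burst scenarios on this slide affordable.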

32 Hybrid / Surge Computing Keep a local “private cloud” running same protocols as public cloud When need more, “surge” onto public cloud, and scale back when need fulfilled Saves energy (and capital expenditures) by not buying and deploying power distribution, cooling, machines that are mostly idle

33 Outline Energy Proportionality vs. Reality Turning Off Servers vs. Ensuring Full ROI Turning Off Servers and Reliability Defining Cloud Computing RAD Lab Vision Datacenter OS and Energy Efficiency Datacenter Store and Energy Efficiency

34 RAD Lab 5-year Mission Enable 1 person to develop, deploy, operate next-generation Internet application Key enabling technology: statistical machine learning –debugging, power management, performance prediction, … Highly interdisciplinary faculty and students –PI’s: Fox/Katz/Patterson (systems/networks), Jordan (machine learning), Stoica (networks & P2P), Joseph (systems/security), Franklin (databases) –2 postdocs, ~30 PhD students, ~5 undergrads

35 Successes Predict performance of complex software system when demand is scaled up Automatically add/drop servers to fit demand, without violating Service Level Agreement (SLA) Distill millions of lines of log messages into an operator-friendly “decision tree” that pinpoints “unusual” incidents/conditions Recurring theme: cutting-edge Statistical Machine Learning (SML) works where simpler methods have failed

36 RAD Lab Prototype: System Architecture [Diagram labels: Drivers; new apps, equipment, global policies (e.g., SLA); offered load, resource utilization, etc.; Chukwa & XTrace (monitoring); training data; Ruby on Rails environment; VM monitor; local OS functions; Chukwa trace coll.; web svc APIs; Web 2.0 apps; SCADS; Director; performance & cost models; Log Mining; Automatic Workload Evaluation (AWE)]

37 Automatic Management of a Datacenter As datacenters grow, need to automatically manage the applications and resources –examples: deploy applications, change configuration, add/remove virtual machines, recover from failures Director: –mechanism for executing datacenter actions Advisors: –intelligence behind datacenter management

38 Director Framework [Diagram labels: Datacenter(s); VM; Drivers; Director; config; monitoring data; Advisor (performance model); Advisor (workload model)]

39 Director Framework Director –issues low-level/physical actions to the DC/VMs (e.g., request a VM, start/stop a service) –manages configuration of the datacenter (list of applications, VMs, …) Advisors –update performance, utilization metrics –use workload, performance models –issue logical actions to the Director (e.g., start an app, add 2 app servers)
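
The split between Advisors (logical actions) and the Director (low-level actions) can be illustrated with a minimal sketch. All class, method, and VM names here are hypothetical, and the fixed requests-per-server capacity rule is a stand-in assumption for the statistical performance models the slides describe; it is not the RAD Lab implementation.

```python
import math
from dataclasses import dataclass, field

@dataclass
class Director:
    """Executes low-level actions and tracks datacenter configuration."""
    vms: list = field(default_factory=list)
    log: list = field(default_factory=list)

    def add_app_servers(self, count):
        # Logical action -> low-level actions: request a VM, start a service.
        for _ in range(count):
            vm = f"vm-{len(self.vms)}"
            self.vms.append(vm)
            self.log.append(f"request {vm}; start app server")

    def remove_app_servers(self, count):
        # Scale down: stop the service and release the most recent VMs.
        for _ in range(min(count, len(self.vms))):
            vm = self.vms.pop()
            self.log.append(f"stop app server; release {vm}")

class Advisor:
    """Chooses logical actions from a deliberately simple capacity model."""
    def __init__(self, requests_per_server):
        self.requests_per_server = requests_per_server  # assumed capacity

    def advise(self, director, offered_load):
        needed = max(1, math.ceil(offered_load / self.requests_per_server))
        delta = needed - len(director.vms)
        if delta > 0:
            director.add_app_servers(delta)
        elif delta < 0:
            director.remove_app_servers(-delta)

director = Director()
advisor = Advisor(requests_per_server=100)
for load in (250, 900, 120):  # requests/s over time: scale up, then down
    advisor.advise(director, load)
    print(load, len(director.vms))
```

Keeping the Advisor unaware of how VMs are requested or released means the policy can be replaced by a learned model without touching the mechanism, which is the separation the slide draws.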

40 What About Storage? Easy to imagine how to scale up and scale down computation Databases don’t scale down, and usually run into limits when scaling up What would it mean to have datacenter storage that could scale up and down as well, so as to save energy for storage in idle times?

41 SCADS: Scalable, Consistency-Adjustable Data Storage Goal: Provide web application developers with scale independence as site grows –No changes to application –Cost / User doesn’t increase as users increase –Latency / Request doesn’t increase as users increase Key Innovations –Performance safe query language –Declarative performance/consistency tradeoffs –Automatic scale up and down using machine learning (Director/Advisor)

42 Beyond 2/3 Energy Conservation Upper Bound? What if heterogeneous servers in data center? –Performance nodes: 1U to 2U servers, 2-4 sockets, 16 GB DRAM, 4 disks –Storage nodes: 4U to 8U servers, 2-4 sockets, 32 GB - 64 GB DRAM, 48 disks (e.g., Sun Thumper) 1 replica on Storage node, 2 or more replicas on Performance nodes If 10 Watts / disk, 250W per node (no disks): 1*(250 + 48*10) = 730 Watts vs. 12*(250 + 4*10) = 3480 Watts Could save 80% heterogeneous vs. 67% homogeneous when trying to save power
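
The comparison on this slide, one 48-disk storage node against twelve 4-disk performance nodes holding the same 48 disks, can be reproduced with a few lines. The per-node and per-disk wattages are the slide's figures; the function name is only illustrative.

```python
NODE_WATTS = 250  # per node, excluding disks (from the slide)
DISK_WATTS = 10   # per disk (from the slide)

def node_power(n_nodes, disks_per_node):
    """Total watts for n identical nodes with the given number of disks each."""
    return n_nodes * (NODE_WATTS + disks_per_node * DISK_WATTS)

storage = node_power(1, 48)      # one 48-disk storage node
performance = node_power(12, 4)  # twelve 4-disk performance nodes
saving = 1 - storage / performance
print(storage, performance, f"{saving:.0%}")
```

The per-node overhead of 250 W dominates: spreading 48 disks over twelve nodes pays that overhead twelve times, so keeping only the storage node on during quiet periods saves about 79% of the power.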

43 Overall Power Savings? Assumptions: Peak needs 10X servers, 50 hours per week is peak load, rest of week 10% utilization (=> 2/3 power) Homogeneous, everything on: 50 hrs @ full load + 118 hrs @ 2/3 power ≈ 130 full-load hours Heterogeneous, turn off when load is low: 50 hrs @ full load + 118 hrs * 10% @ 100% load ≈ 62 full-load hours Saves 1/2 of power bill of data center
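
The weekly totals can be rebuilt from the stated assumptions: 168 hours in a week, 50 of them at peak, and the remaining 118 either idling at about 2/3 of peak power or served by 10% of the machines at full load. This is a sketch of that arithmetic; the variable names are illustrative.

```python
HOURS_PER_WEEK = 168
PEAK_HOURS = 50
OFF_PEAK_HOURS = HOURS_PER_WEEK - PEAK_HOURS  # 118

# Homogeneous, everything on: off-peak servers draw about 2/3 of peak power.
homogeneous = PEAK_HOURS + OFF_PEAK_HOURS * 2 / 3
# Heterogeneous, turn off when load is low: 10% of capacity runs at full load.
heterogeneous = PEAK_HOURS + OFF_PEAK_HOURS * 0.10
saving = 1 - heterogeneous / homogeneous
print(round(homogeneous), round(heterogeneous), f"{saving:.0%}")
```

The result is measured in full-load hours per week, so the ratio of the two totals is the fraction of the power bill that remains, a little under one half.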

44 Conclusion Long way to go before Energy Proportionality: ≈ ½ peak power even when (benchmark) system is idle Scaling down helps energy conservation Cloud Computing will transform IT industry –Pay-as-you-go utility computing leveraging economies of scale of Cloud provider –1000 CPUs for 1 hr = 1 CPU for 1000 hrs Cloud Computing offers financial incentive for systems to scale down as well as up – New CC challenges: Director, Scalable Store

45 Backup Slides

46 Microsoft’s Chicago Modular Datacenter

47 The Million Server Datacenter square meter housing 400 containers –Each container contains 2500 servers –Integrated computing, networking, power, cooling systems 300 MW supplied from two power substations situated on opposite sides of the datacenter Dual water-based cooling systems circulate cold water to containers, eliminating need for air conditioned rooms

48 IT Carbon Footprint [Chart values: 820m tons CO2, 360m tons CO2, 260m tons CO2] Worldwide IT carbon footprint: 2% = 830m tons CO2 Comparable to the global aviation industry Expected to grow to 4% by 2020

49 Thermal Image of Typical Cluster Rack Rack Switch M. K. Patterson, A. Pratt, P. Kumar, “From UPS to Silicon: an end-to-end evaluation of datacenter efficiency”, Intel Corporation

50 DC Networking and Power 96 x 1 Gbit port Cisco datacenter switch consumes around 15 kW, approximately 100x a typical dual-processor Google server @ 145 W High port density drives network element design, but such high power density makes it difficult to tightly pack them with servers Alternative distributed processing/communications topology under investigation by various research groups

51 DC Networking and Power Within DC racks, network equipment often the “hottest” components in the hot spot Network opportunities for power reduction –Transition to higher speed interconnects (10 Gb/s) at DC scales and densities –High function/high power assists embedded in network element (e.g., TCAMs) Recent Work: –Y. Chen, T. Wang, R. H. Katz, “Energy Efficient Ethernet Encodings,” IEEE LCN, –G. Ananthanarayanan, R. H. Katz, “Greening the Switch,” Usenix HotPower’08 Workshop.