DIMMer: A case for turning off DIMMs in clouds

Presentation transcript:

DIMMer: A case for turning off DIMMs in clouds
Dongli Zhang, Moussa Ehsan, Michael Ferdman, Radu Sion
National Security Institute

Inconsistency between Capacity and Demand
- 200 billion kWh consumed by data centers worldwide in 2010
- Google (2011): continuous electricity usage equivalent to 200,000 homes
- Capacity >>> demand
- Energy waste of up to 90%

Today, companies such as Google, Amazon, and Facebook run immense data centers with enormous computation and storage capacity. As online services have become pervasive, data centers worldwide consumed more than 200 billion kWh in 2010, a huge amount of energy. Google also announced in 2011 that its facilities have a continuous electricity usage equivalent to powering 200,000 homes. So, do today's data centers really require such a huge amount of energy? Actually, no. According to a 2012 report by The New York Times, up to 90% of the energy is wasted because data centers cannot modulate their capacity to match demand. So, can we reduce the energy cost by dynamically provisioning servers in the data center?

Google Data Center Peak Power Usage Distribution (2007 and 2012)

The answer is no, because servers may participate in distributed services (e.g., file serving), and frequent power cycling may increase the failure rates of server components such as power supplies and fans. We therefore examined the peak power usage distribution of Google data centers, using figures from the book "The Datacenter as a Computer: An Introduction to the Design of Warehouse-Scale Machines" (first and second editions). According to Google's data from both 2007 and 2012, CPU and DRAM account for more than 50% of data center power. So, can we dynamically provision server components such as DRAM and CPU?

DIMMer Vision
- Agile framework for workload-driven scalability and power reduction
- Power off idle DRAM ranks
- Power off idle CPU sockets
- Save background power

We envision DIMMer, an agile framework for workload-driven scalability and power reduction in data centers. During periods of low resource utilization, e.g., low memory utilization, DIMMer powers off all idle DRAM ranks in data center servers to save DRAM background power. DIMMer dynamically provisions memory capacity at the granularity of DRAM ranks, so the energy cost of additional DRAM capacity is paid only when that capacity is requested by the system. This is an opportunity to reduce the energy consumption of server systems while having little impact on performance.

Vision for Implementation
- Modifications to the OS kernel (e.g., Linux) and to hardware
- Rank-aware page allocator: maintain a free list for each DRAM rank
- Page migrator: migrate pages in the background and at low priority
- Expose registers in the CPU and DRAM controller to allow full electrical power-off (provided by the hardware manufacturer)
- Enable DIMMer only during low workload

Implementing DIMMer requires modifications to the memory management subsystem of the OS kernel and to the DRAM controller. A rank-aware page allocator controls which rank each memory page is allocated from, which means maintaining a page free list for each rank. To consolidate memory pages, a page migrator is also required; page migrations happen in the background and at low priority. Functionality similar to the allocator and migrator has already been implemented in prior work. DIMMer also requires hardware support: to fully electrically power off DRAM, we hope hardware manufacturers will expose the corresponding registers in future generations of DRAM controllers. Finally, DIMMer is only enabled during low workload and disabled during peak workload. A minimal sketch of the per-rank free-list idea follows.
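The slides contain no code, so the following is a minimal, self-contained C sketch of the per-rank free-list idea, not the authors' implementation: each rank keeps its own free list, allocations are served from ranks already marked active, and a rank whose free list again holds all of its pages becomes a power-off candidate. All names, sizes, and structures here are illustrative assumptions.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

#define NUM_RANKS      16          /* 8 DIMMs x 2 ranks, as in the example server  */
#define PAGES_PER_RANK 4096        /* illustrative; real ranks hold far more pages */

struct page {
    struct page *next;             /* singly linked free list within one rank      */
    int          rank;             /* rank this physical page belongs to           */
};

struct rank {
    struct page *free_list;        /* per-rank free list (the core DIMMer idea)    */
    size_t       free_pages;
    bool         active;           /* powered on and accepting allocations?        */
};

static struct rank ranks[NUM_RANKS];
static struct page pages[NUM_RANKS][PAGES_PER_RANK];

/* Build one free list per rank; initially only rank 0 is active. */
static void init_ranks(void)
{
    for (int r = 0; r < NUM_RANKS; r++) {
        ranks[r].free_list  = NULL;
        ranks[r].free_pages = 0;
        ranks[r].active     = (r == 0);
        for (int p = 0; p < PAGES_PER_RANK; p++) {
            pages[r][p].rank = r;
            pages[r][p].next = ranks[r].free_list;
            ranks[r].free_list = &pages[r][p];
            ranks[r].free_pages++;
        }
    }
}

/* Allocate from the lowest-numbered active rank; power on an additional rank
 * only when every active rank is exhausted, keeping the active footprint small. */
static struct page *alloc_page(void)
{
    for (int pass = 0; pass < 2; pass++) {
        for (int r = 0; r < NUM_RANKS; r++) {
            if (pass == 0 && !ranks[r].active)
                continue;          /* first pass: active ranks only             */
            if (!ranks[r].free_list)
                continue;
            ranks[r].active = true;/* second pass may power a rank back on      */
            struct page *pg = ranks[r].free_list;
            ranks[r].free_list = pg->next;
            ranks[r].free_pages--;
            return pg;
        }
    }
    return NULL;                   /* out of memory                             */
}

/* Return a page to its rank; a fully free rank is a candidate for power-off. */
static void free_page(struct page *pg)
{
    struct rank *rk = &ranks[pg->rank];
    pg->next = rk->free_list;
    rk->free_list = pg;
    rk->free_pages++;
    if (rk->free_pages == PAGES_PER_RANK) {
        rk->active = false;        /* in DIMMer, this rank could be powered off */
        printf("rank %d is idle and can be powered off\n", pg->rank);
    }
}

int main(void)
{
    init_ranks();
    struct page *pg = alloc_page();
    if (pg)
        free_page(pg);             /* rank 0 becomes fully free again           */
    return 0;
}
```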

DRAM: Self-Refresh Mode
- Low-power mode: consumes less energy than ACT mode, and data is not lost
- State of the art: hot ranks and cold ranks, maximizing self-refresh time
- Idle 8GB DIMM power = 2.3W!
- Self-refresh reduces active time; DIMMer reduces active capacity

You may ask: instead of powering off DRAM, why not use self-refresh mode? Self-refresh is a low-power mode that consumes less energy than the active (ACT) mode and preserves data. State-of-the-art work consolidates hot memory pages on hot ranks and cold pages on cold ranks to maximize the time DRAM ranks spend in self-refresh. So why not just consolidate memory pages and switch idle DRAM ranks into self-refresh mode instead of powering them off? To answer this question, we measured the power consumption of a single 8GB DIMM under workloads ranging from idle to more than 2GB/s. We found that simply keeping a DIMM powered on with near-zero memory traffic consumes a constant 2.3W, which is more than 50% of the DIMM's power at peak throughput for cloud workloads. To summarize the difference: self-refresh reduces the active time of DRAM, while DIMMer reduces the active capacity of DRAM. Even in self-refresh mode, the idle power is still being consumed. A quick back-of-the-envelope based on the 2.3W figure follows.
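The following arithmetic is mine, not from the slides, but it uses only the 2.3W figure quoted above: what the background power of one idle DIMM costs over a month, and what the "more than 50% of peak" statement implies for the DIMM's peak power.

\[
2.3\,\mathrm{W} \times 720\,\mathrm{h} \approx 1.66\,\mathrm{kWh} \text{ per idle DIMM per month};
\qquad
2.3\,\mathrm{W} > 0.5 \times P_{\mathrm{peak}} \;\Rightarrow\; P_{\mathrm{peak}} < 4.6\,\mathrm{W}.
\]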

DIMMer vs. Self-Refresh
- CPU+DRAM power: self-refresh 66.00W, DIMMer 64.16W
- Saving: only 3%!
- Percentage of time in ACT_STBY/PRE_STBY: active ranks 100%, idle ranks 0%

Now let's compare DIMMer with self-refresh mode in a more detailed case. We assume a dual-socket server with two CPUs and 8 DIMMs, i.e., 16 ranks. Memory pages are consolidated onto active ranks, so idle ranks hold no active memory pages. As an aggressive case, we assume active ranks spend 100% of their time in ACT_STBY/PRE_STBY and idle ranks spend 0%. Suppose one DIMM, i.e., two ranks, is idle. Switching the idle ranks into self-refresh mode gives a total CPU+DRAM power of 66.00W; powering off the unused ranks, DIMMer consumes 64.16W. The saving over self-refresh mode is only 3% — hardly impressive!

DIMMer vs. Self-Refresh
- Idle CPU socket power = 16W
- CPU+DRAM power: self-refresh 57.12W, DIMMer 37.44W
- Saving: 35%!
- Percentage of time in ACT_STBY/PRE_STBY: active ranks 100%, idle ranks 0%

The story changes if we power off an entire DRAM channel and its socket. According to our measurements, the idle power consumption of an 8-core CPU is 16W. With self-refresh mode, the CPU plus DRAM power consumption is 57.12W; after powering off the entire socket, DIMMer consumes 37.44W. This time the saving of DIMMer over self-refresh is 35%. The primary saving comes from powering off the CPU: if we merely switch DRAM into self-refresh mode, we cannot power off the idle CPU. These are single-server examples, though. Is DIMMer applicable to today's data centers? The numbers behind the 3% and 35% figures are worked out below.
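For reference, the 3% and 35% savings follow directly from the power numbers quoted on these two slides:

\[
\frac{66.00\,\mathrm{W} - 64.16\,\mathrm{W}}{66.00\,\mathrm{W}} = \frac{1.84}{66.00} \approx 2.8\% \approx 3\%,
\qquad
\frac{57.12\,\mathrm{W} - 37.44\,\mathrm{W}}{57.12\,\mathrm{W}} = \frac{19.68}{57.12} \approx 34.5\% \approx 35\%.
\]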

Google Cluster Trace
- http://code.google.com/p/googleclusterdata/
- 12,000+ servers
- Machines, jobs, and tasks
- Workload traces (CPU, memory, storage) with normalized data
- Resource usage at 5-minute granularity over one month
- Spawned a number of seminal results (SoCC, ICDCS, ICAC, IC2E, ...)

The Google cluster trace is published online by Google. It covers more than 12,000 servers and contains detailed information about machines, jobs, and tasks, including CPU, memory, and storage utilization. It was collected at 5-minute granularity over one month and has spawned a number of seminal results.

Our Assumptions
- Maximum server DRAM capacity = 64GB
- Maximum CPU sockets per server = 2
- OS kernel resource usage and job dependencies are not modeled
- DRAM capacity in data centers keeps increasing

The Google cluster trace used for our study provides relative memory capacities normalized to the maximum machine capacity present in the cluster. We estimate per-machine DRAM capacity and the number of DIMMs from the distribution of machine counts in the dataset, the approximate date of cluster deployment, and the fact that Google populated all server DRAM slots at that time. Machines with unusual memory capacities (likely due to partial memory failures) were ignored. We assume most servers have 32GB of memory, i.e., 4 channels and 16 ranks, and that the maximum DRAM capacity is 64GB with at most 2 CPU sockets.

Saving: DRAM
- Consolidate memory pages onto the minimum number of DRAM ranks on each server
- Monthly DRAM rank count (total and idle)
- 50% memory utilization; 50% of DRAM ranks are idle
- Save 30MWh per month

We assume memory consolidation can happen only within each server, and we counted the total and idle DRAM ranks in each 5-minute time slot over the month. On average, memory utilization is 50% and 50% of DRAM ranks can be powered off; in total, powering off idle DRAM ranks saves 30MWh per month. Because self-refresh power is proportional to DRAM capacity, the savings of DIMMer are likely to be even higher in future systems. A sketch of the per-slot idle-rank calculation follows.
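The slides only describe the counting procedure; the C sketch below shows one plausible way to derive the idle-rank count from per-slot memory utilization. The 32GB/16-rank server configuration and the `slot` record layout are assumptions for illustration, not the trace's actual schema.

```c
#include <math.h>
#include <stddef.h>
#include <stdio.h>

#define RANKS_PER_SERVER   16      /* assumed typical server: 32GB, 4 channels, 16 ranks */
#define GB_PER_RANK        2.0     /* 32GB / 16 ranks                                    */
#define SERVER_DRAM_GB     32.0

/* One 5-minute utilization sample for one server (illustrative record layout). */
struct slot {
    double mem_util;               /* fraction of server DRAM in use, 0.0 .. 1.0 */
};

/* Ranks that must stay powered on to hold the in-use pages, assuming perfect
 * consolidation within the server: ceil(used GB / GB per rank), at least one. */
static int active_ranks(const struct slot *s)
{
    double used_gb = s->mem_util * SERVER_DRAM_GB;
    int ranks = (int)ceil(used_gb / GB_PER_RANK);
    return ranks < 1 ? 1 : ranks;
}

int main(void)
{
    /* A toy month: three 5-minute samples for one server, averaging 50% utilization. */
    struct slot samples[] = { { 0.50 }, { 0.25 }, { 0.75 } };
    long total = 0, idle = 0;

    for (size_t i = 0; i < sizeof samples / sizeof samples[0]; i++) {
        int act = active_ranks(&samples[i]);
        total += RANKS_PER_SERVER;
        idle  += RANKS_PER_SERVER - act;
    }
    /* With these samples, half of the rank-slots come out idle,
     * in line with the 50% figure quoted on the slide. */
    printf("idle rank-slots: %ld of %ld (%.0f%%)\n",
           idle, total, 100.0 * idle / total);
    return 0;
}
```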

Saving: CPU
- Consolidate workloads onto the minimum number of CPU sockets on each server
- Monthly CPU socket count (total and idle)
- 50% CPU utilization; 20% of CPU sockets are idle
- No PCI devices (e.g., NICs) involved
- Save 52MWh per month
- DRAM + CPU savings: 30MWh + 52MWh = 82MWh ≈ 51 metric tons of CO2

We also counted the total and idle CPU sockets in each 5-minute time slot over the month. Unlike DRAM ranks, which can be powered off independently, a CPU can be powered off only when all of its cores and all DRAM ranks attached to its socket are idle. We note that powering off a CPU may have other system implications, such as rendering some PCIe devices unavailable, which may place additional constraints on DIMMer. This is an opportunity for future research, for example, powering off some of a system's NICs for further energy reduction. Changes to the cluster resource management framework could relax this constraint and maximize the number of CPUs that can be powered off. The socket power-off condition is sketched below.
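As an illustration of the rule just described (a hypothetical sketch, not the authors' code): a socket may be powered off only if every core on it is idle and every DRAM rank attached to its channels is idle. The core and rank counts are assumptions matching the earlier example server.

```c
#include <stdbool.h>
#include <stdio.h>

#define CORES_PER_SOCKET  8
#define RANKS_PER_SOCKET  8        /* 4 DIMMs x 2 ranks per socket, as assumed earlier */

struct socket {
    bool core_idle[CORES_PER_SOCKET];
    bool rank_idle[RANKS_PER_SOCKET];  /* ranks attached to this socket's channels */
};

/* A CPU socket can be powered off only when ALL of its cores are idle AND all
 * DRAM ranks behind its memory channels are idle (their pages migrated away). */
static bool socket_can_power_off(const struct socket *s)
{
    for (int c = 0; c < CORES_PER_SOCKET; c++)
        if (!s->core_idle[c])
            return false;
    for (int r = 0; r < RANKS_PER_SOCKET; r++)
        if (!s->rank_idle[r])
            return false;
    return true;
}

int main(void)
{
    struct socket s;
    for (int c = 0; c < CORES_PER_SOCKET; c++) s.core_idle[c] = true;
    for (int r = 0; r < RANKS_PER_SOCKET; r++) s.rank_idle[r] = true;
    printf("all idle: can power off = %d\n", socket_can_power_off(&s));

    s.rank_idle[3] = false;        /* one rank still holds active pages */
    printf("one busy rank: can power off = %d\n", socket_can_power_off(&s));
    return 0;
}
```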

DIMMer Cost
- OS memory capacity resizing
- Non-interleaved address mapping
- Page migration
- Reduced cache capacity
(Slide figure: physical memory address space with 4KB pages being added and removed, and a disk cache region.)

The story is not finished: we have demonstrated the opportunity, but DIMMer also has costs, which we examine next.

Page Migration
- 8GB node-to-node migration takes 13.5s
- Observation from the Google cluster trace: 30.2% of pages are cache pages; 11.2% are cache pages not mapped into any process
- Tricks: only migrate hot cache pages and anonymous pages; migrate out of band

First, let's discuss the page migration cost. We measured node-to-node page migration, which we consider the most expensive case. According to our experiments, if an 8GB migration happens every 30 minutes for one month, the total monthly page migration cost is 210kWh, which is 0.26% of DIMMer's total savings, and each migration takes 13.5 seconds on average. Moreover, to power off a rank we do not need to migrate all of its pages: according to our analysis of the Google trace, 30.2% of pages are cache pages and 11.2% are cache pages not mapped into any process, and many cache pages are used only once. Hence the tricks above: migrate only hot cache pages and anonymous pages, and do so out of band. A user-space approximation of the node-to-node migration measurement is sketched below.
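The slides do not show how the 13.5s figure was measured. Below is a plausible user-space approximation (my assumption, not the authors' setup) that times moving a target process's resident pages from NUMA node 0 to node 1 using libnuma's numa_migrate_pages(3); the node numbers and the choice of timing a whole-process move are illustrative.

```c
/* Build with: gcc -O2 migrate_timing.c -lnuma
 * Moves all pages of a target process from NUMA node 0 to node 1 and reports
 * the elapsed time, roughly mirroring the node-to-node measurement above. */
#include <numa.h>
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

int main(int argc, char **argv)
{
    if (argc != 2) {
        fprintf(stderr, "usage: %s <pid>\n", argv[0]);
        return 1;
    }
    if (numa_available() < 0) {
        fprintf(stderr, "NUMA is not available on this system\n");
        return 1;
    }

    int pid = atoi(argv[1]);
    struct bitmask *from = numa_parse_nodestring("0");   /* source node */
    struct bitmask *to   = numa_parse_nodestring("1");   /* target node */

    struct timespec t0, t1;
    clock_gettime(CLOCK_MONOTONIC, &t0);

    /* numa_migrate_pages() moves the pages of 'pid' that currently reside on
     * the 'from' nodes to the 'to' nodes; it returns the number of pages that
     * could not be moved, or a negative value on error. */
    int rc = numa_migrate_pages(pid, from, to);

    clock_gettime(CLOCK_MONOTONIC, &t1);
    double secs = (t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) / 1e9;

    if (rc < 0)
        perror("numa_migrate_pages");
    else
        printf("migration finished in %.2f s (%d pages not moved)\n", secs, rc);

    numa_bitmask_free(from);
    numa_bitmask_free(to);
    return 0;
}
```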

OS Memory Capacity Resizing
- Liu et al. (TPDS 2014): coarse-grained VM memory hotplug
- Dell PowerEdge 1950, Intel Quad-Core Xeon E5450 3GHz
- VM-based 1GB memory addition: 0.43s
- VM-based 1GB memory removal: 0.3s (heavily loaded, running TPC-C)
- Trick: reserve an extra DRAM rank to handle spontaneous demand spikes

Powering DRAM on and off incurs latency, because the kernel needs to resize its memory capacity. Here we use experimental data from Liu et al.'s paper on coarse-grained VM memory hotplug, published in TPDS in 2014. According to their data, a VM-based 1GB memory addition takes 0.43s, and a VM-based 1GB memory removal under heavy load (running TPC-C) takes 0.3s. A sketch of the Linux memory-hotplug interface this kind of resizing builds on follows.
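For context, capacity resizing on Linux ultimately goes through the memory-hotplug sysfs interface: each memory block under /sys/devices/system/memory/memoryN can be taken offline or brought back online by writing to its "state" file. The sketch below is a minimal illustration of that interface, not DIMMer itself; the block number is an arbitrary example, and the operation requires root, CONFIG_MEMORY_HOTREMOVE, and a block whose pages are actually removable.

```c
#include <stdio.h>

static int set_memory_block_state(int block, const char *state)
{
    char path[128];
    snprintf(path, sizeof path,
             "/sys/devices/system/memory/memory%d/state", block);

    FILE *f = fopen(path, "w");
    if (!f) {
        perror(path);
        return -1;
    }
    int ok = (fputs(state, f) >= 0);   /* write "offline" or "online"      */
    if (fclose(f) != 0 || !ok) {       /* the write is committed on close  */
        perror(path);
        return -1;
    }
    return 0;
}

int main(void)
{
    /* Block 32 is an arbitrary example; which blocks map to which DIMM/rank
     * is platform-specific and would come from firmware tables. */
    if (set_memory_block_state(32, "offline") == 0)
        printf("memory block 32 offlined\n");

    if (set_memory_block_state(32, "online") == 0)
        printf("memory block 32 back online\n");
    return 0;
}
```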

Reduced Cache Capacity
- Web server disk cache, Zhu et al. (HotCloud 2012)
- Sacrificing a few percent of cache hit rate reduces cache cost by 90%
- Tricks: hot cache pages on active ranks, cold cache pages on idle ranks

To power off DRAM, we may have to reduce the disk cache capacity, but this does not bring an obvious performance impact. We use experimental evidence on web server disk caches from prior work to support this: by sacrificing a few percent of hit rate, the cache cost can be reduced by almost 90%. Although one cannot reliably infer the performance implications of reducing the disk cache for other workloads, these web server results are promising. Moreover, cache pages can be classified as hot and cold, so we can place hot cache pages on hot ranks and cold cache pages on cold ranks.

Non-Interleaved Address Mapping
- Channel interleaving
- Rank interleaving
- Ferdman et al. (ASPLOS 2012): cloud workloads (MapReduce, web front end, media streaming, web search) underutilize memory bandwidth, using under 25% of the available bandwidth
- VipZonE (CODES+ISSS 2012): 1.03% execution overhead
- Tricks: reserve certain interleaving-enabled channels; power off parallel ranks across channels

Finally, let's discuss interleaving. DRAM uses both channel and rank interleaving. Channel interleaving improves memory bandwidth by interleaving physical pages across DIMMs on multiple channels. Rank interleaving reduces memory latency by spreading each page across many ranks, enabling concurrent accesses: the controller can open a row in one rank while another rank is being accessed. A toy model of how interleaved and non-interleaved mappings place addresses across channels and ranks is sketched below.
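To make the trade-off concrete, here is a toy C model, entirely illustrative (real memory controllers use more elaborate hash functions and bit positions): with an interleaved mapping, consecutive cache lines spread across the channels (and, at coarser granularity, across the ranks), so no rank is ever idle; with a non-interleaved mapping, a large contiguous region lands on a single rank, which DIMMer can then power off.

```c
#include <stdint.h>
#include <stdio.h>

#define CHANNELS        4
#define RANKS_PER_CHAN  4
#define LINE_BITS       6          /* 64-byte cache lines            */
#define RANK_GB         2ULL       /* 2GB per rank in the toy system */

/* Interleaved mapping: channel and rank come from low address bits just above
 * the cache-line offset, so consecutive lines hit different channels/ranks. */
static void map_interleaved(uint64_t pa, int *chan, int *rank)
{
    *chan = (int)((pa >> LINE_BITS) % CHANNELS);
    *rank = (int)((pa >> (LINE_BITS + 2)) % RANKS_PER_CHAN);  /* 2 bits for channel */
}

/* Non-interleaved mapping: channel and rank come from high address bits, so a
 * contiguous 2GB region maps entirely to one rank (which can be turned off). */
static void map_non_interleaved(uint64_t pa, int *chan, int *rank)
{
    uint64_t rank_index = pa / (RANK_GB << 30);
    *chan = (int)(rank_index / RANKS_PER_CHAN) % CHANNELS;
    *rank = (int)(rank_index % RANKS_PER_CHAN);
}

int main(void)
{
    /* Four consecutive cache lines. */
    for (uint64_t pa = 0; pa < 4 * 64; pa += 64) {
        int c1, r1, c2, r2;
        map_interleaved(pa, &c1, &r1);
        map_non_interleaved(pa, &c2, &r2);
        printf("pa=%5llu  interleaved: chan %d rank %d   "
               "non-interleaved: chan %d rank %d\n",
               (unsigned long long)pa, c1, r1, c2, r2);
    }
    return 0;
}
```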

Total Cost of Ownership (TCO)
- Model by James Hamilton: http://perspectives.mvdirona.com/
- Cost of power: $0.10/kWh
- Cost of facility: $50M
- Number of servers: 12,583
- Cost per server: $2K
- Server amortization: 36 months
- Power Usage Effectiveness (PUE): 1.2
- Monthly savings: 0.6% of total cost, 1.4% of power cost, 3.15% excluding cooling

The story is not over yet. Let me show the total cost of ownership, i.e., what fraction of the total data center cost DIMMer's savings represent. We use the cost model by James Hamilton with the inputs listed above. With this cost model, DIMMer's saving amounts to 0.6% of the total monthly data center cost. An illustrative dollar figure is worked out below.
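As a rough, illustrative dollar figure (my back-of-the-envelope, not taken from the slides): pricing the 82MWh of monthly IT-side savings at $0.10/kWh, and assuming cooling scales with IT load via the PUE of 1.2, gives

\[
82{,}000\,\mathrm{kWh} \times 1.2 \times \$0.10/\mathrm{kWh} \approx \$9{,}840 \text{ per month.}
\]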

DIMMer Summary
- DIMMer vision: power off idle DRAM and CPU
- Vision for implementation
- Improvement over self-refresh
- Benefit/cost: 50% of DRAM and 18.8% of CPU background energy
- Total cost of ownership
- Future work: energy-efficient scheduler

It has long been recognized that most server hardware exhibits disproportionately high energy consumption when operating at low utilization [41]. To mitigate this effect, low-power operating modes have been introduced, and techniques have been developed to maximize the time spent in low-power states. DIMMer builds on this work and observes that dynamic control of memory capacity presents an additional opportunity to reduce the energy consumption of server systems. Using a Google cluster trace as well as in-house experiments, we estimate savings of up to 50% of DRAM and 18.8% of CPU background energy. At $0.10/kWh, this corresponds to 0.6% of total data center cost. Some people think 0.6% is a lot; you decide!

Dongli Zhang
Doctoral Student @ Stony Brook University
Email: dozhang@cs.stonybrook.edu