Chih-Hsun Chou Daniel Wong Laxmi N. Bhuyan

DynSleep: Fine-grained Power Management for a Latency-Critical Data Center Application. Chih-Hsun Chou, Daniel Wong, Laxmi N. Bhuyan

Outline. Background & Motivation (Data Center Workload Characteristics; Existing Approaches). DynSleep. Prototype with Memcached. Experimental Evaluation.

Data Center Latency-Critical Workload Characteristics. Server utilization: lightly loaded, with short-term variability. Request processing: ON/OFF execution pattern, non-deterministic. Poor energy efficiency at low server utilization.

Power Saving Opportunities. Target QoS is defined at peak load. Low-utilization servers create latency slack, which can be exploited for power saving. Latency slack is the gap between the target tail latency and the tail latency under light load.

Existing Approaches. DVFS reduces the processing rate, but there is limited room for down scaling, the power saving is limited, and per-core control is not common. Sleep states are limited by the length of idle periods.

DVFS operating points. From 2.7 GHz to 1.2 GHz: 56% frequency reduction, 13% voltage reduction, 61% power reduction.

Frequency (GHz) | 2.7 | 2.4 | 2.1 | 1.8 | 1.5 | 1.2
Voltage (V) | 0.99 | 0.96 | 0.94 | 0.92 | 0.90 | 0.88
Active Power (W) | 3.42 | 2.93 | 2.49 | 2.05 | 1.68 | 1.31

Sleep states.

State | State transition time | Target residency | Power
C0 | N/A | | 3~3.5 W
C1 | 1 μs | | 1.2 W
C3 | 59 μs | 156 μs | 0.13 W
C6 | 89 μs | 300 μs | 0 W
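To make "limited by the length of idle periods" concrete, here is a minimal sketch that picks the deepest sleep state whose target residency fits an idle period, using the C3 and C6 values from the sleep-state table. It rests on assumptions that are not from the slides: C1 is treated as the fallback for any shorter idle period (its target residency is not given), transition energy is ignored, and the function names and the example idle-period lengths are hypothetical.

```python
# (state, target residency in µs, power in W), deepest first; values from the slide table.
DEEP_STATES = [("C6", 300.0, 0.0), ("C3", 156.0, 0.13)]
# Assumption: C1 is used whenever no deeper state fits (its residency is not listed).
FALLBACK = ("C1", 1.2)


def pick_state(idle_us):
    """Return (state, power in W) for the deepest state whose residency fits idle_us."""
    for name, residency_us, power_w in DEEP_STATES:
        if idle_us >= residency_us:
            return name, power_w
    return FALLBACK


def idle_energy_uj(idle_periods_us):
    """Idle energy in µJ (W x µs), ignoring transition costs (a simplification)."""
    return sum(pick_state(t)[1] * t for t in idle_periods_us)


scattered = [100.0] * 10   # hypothetical: ten short gaps, each too short for C3
merged = [1000.0]          # the same total idle time as one long gap

print(pick_state(100.0)[0], idle_energy_uj(scattered))   # C1, about 1200 µJ
print(pick_state(1000.0)[0], idle_energy_uj(merged))     # C6, 0 µJ under this model
```

Under this simplified model, the same total idle time costs far less energy when it arrives as one long period, because only the long period qualifies for a deep state.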

Observations. Short idle periods cause high idle power. Traffic is variable. Fine-grained control is needed over the time and space domains. Our solution: DynSleep.

DynSleep: Overview. Utilize per-core sleep states (space domain). Postpone request service to transform scattered idle periods into a longer one suitable for a deep sleep state, which reduces idle power. Dynamically determine the core wake-up time so that the target tail latency constraint is satisfied (time domain).
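The wake-up decision described here can be sketched as follows. This is a simplified illustration and not the authors' calculator: it assumes FIFO service, a fixed per-request service time, and a fixed wake-up latency. The names `Request` and `latest_wakeup_us` and the 50 µs service time are hypothetical; the 686 µs target and the 89 µs C6 transition time are taken from the slides.

```python
from dataclasses import dataclass


@dataclass
class Request:
    arrival_us: float  # arrival time of the request, in µs


def latest_wakeup_us(queue, target_us, service_us, wake_latency_us):
    """Latest time the core can start waking up so every queued request meets the target.

    Assumptions: FIFO order, constant service time, constant wake-up latency.
    """
    latest = float("inf")  # empty queue: no reason to wake up yet
    for position, req in enumerate(queue, start=1):
        deadline = req.arrival_us + target_us
        # The request at position k completes k service times after the core is active.
        must_be_active_by = deadline - position * service_us
        latest = min(latest, must_be_active_by - wake_latency_us)
    return latest


queue = [Request(0.0)]
print(latest_wakeup_us(queue, 686.0, 50.0, 89.0))   # one pending request

queue.append(Request(10.0))                          # a new arrival triggers an update
print(latest_wakeup_us(queue, 686.0, 50.0, 89.0))   # wake-up time moves earlier
```

Recomputing on every arrival is what lets the wake-up time track the queue: each new request can only keep the wake-up time the same or pull it earlier.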

DynSleep: Example. [Figure: timeline starting at t=0 with requests R1, R2, and R3 arriving at t=A1, t=A2, and t=A3, the target tail latency for each request, and wake-up times W1 and W3.]

DynSleep: Power consumption behavior. [Figure: power over time for the baseline and for DynSleep, showing active, shallow sleep, and deep sleep phases.]

Case Study: Memcached. Clients send requests to the Memcached server. libevent monitors network sockets through epoll for request arrivals. Threads and requests are independent. [Figure: clients and Memcached worker threads; each worker thread runs libevent, reads and parses data from the fd, processes the request, and sends the response.]

Memcached with DynSleep. Two separate threads: a libevent thread and a request processing thread. A core is woken up by a wakeup signal. [Figure: clients, libevent thread, and request processing thread; the DynSleep Manager and DynSleep Calculator register or update a timer; thread communication through fd1 and fd2; sleep signal and wakeup signal.]
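The two-thread arrangement with a registered and updated timer can be illustrated with a user-level analogy. This is a hedged sketch only: it uses Python threads and `threading.Timer` in place of the prototype's libevent thread, fds, and core sleep states, and the class name `DynSleepManagerSketch`, its methods, and the delays are hypothetical.

```python
import queue
import threading


class DynSleepManagerSketch:
    """Analogy only: a blocked worker thread stands in for a sleeping core."""

    def __init__(self):
        self.pending = queue.Queue()      # requests handed over by the event thread
        self.wakeup = threading.Event()   # stands in for the wakeup signal
        self._timer = None
        self._lock = threading.Lock()

    def on_request(self, req, delay_s):
        """Event-thread side: queue the request and register or update the wake-up timer."""
        self.pending.put(req)
        with self._lock:
            if self._timer is not None:
                self._timer.cancel()      # update: replace the previously registered timer
            self._timer = threading.Timer(delay_s, self.wakeup.set)
            self._timer.daemon = True
            self._timer.start()

    def worker_once(self, timeout_s=2.0):
        """Processing-thread side: block until the wakeup signal, then drain the queue."""
        if not self.wakeup.wait(timeout_s):
            return []
        self.wakeup.clear()
        served = []
        while True:
            try:
                served.append(self.pending.get_nowait())
            except queue.Empty:
                return served


manager = DynSleepManagerSketch()
manager.on_request("R1", delay_s=0.20)   # first arrival registers a timer
manager.on_request("R2", delay_s=0.05)   # second arrival updates it to fire sooner
print(manager.worker_once())             # both postponed requests are served together
```

The point of the split is that the event side stays responsive to arrivals while the processing side remains blocked until the timer fires, so postponed requests are served in one batch.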

Evaluation: Experiment Setup. A client server and a request processing server connected over 10G Ethernet. Intel Xeon E5-2697 v2 12-core processor, which supports only per-core DFS. On-chip energy sensors with a 1 kHz sampling rate.

Evaluation: Power Saving. At low to medium load, larger latency slack leads to high power saving with DynSleep. At high load, DynSleep's power saving is comparable to that of the DVFS scheme. DynSleep significantly outperforms the per-core DFS scheme.

Evaluation: Latency Distribution. At load 0.3, the 95th-percentile latency is 187 µs for the baseline, 448 µs for DFS, and 665 µs for DynSleep, against a target of 686 µs. The baseline has about 500 µs of latency slack. The DFS (DVFS) scheme still leaves a large gap because of the limited voltage/frequency down scaling. DynSleep effectively closes the gap by postponing request processing.
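The slack figures follow directly from the reported numbers. The short calculation below uses only the values on this slide; the function and variable names are illustrative.

```python
TARGET_US = 686  # target tail latency from the slide, in µs
P95_US = {"Baseline": 187, "DFS": 448, "DynSleep": 665}  # 95th-percentile latency at load 0.3


def latency_slack_us(p95_us, target_us=TARGET_US):
    """Unused latency budget: target tail latency minus measured tail latency."""
    return target_us - p95_us


for scheme, p95 in P95_US.items():
    slack = latency_slack_us(p95)
    print(f"{scheme}: slack {slack} µs ({slack / TARGET_US:.0%} of target)")
```

The remaining slack is 499 µs for the baseline, 238 µs for DFS, and 21 µs for DynSleep, which is the sense in which DynSleep closes the gap.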

Evaluation: Load Changes. At low load, DFS cannot fully utilize the latency slack. At high load, DFS lacks responsiveness and frequently violates the constraint. DynSleep responds instantaneously to load changes because of its request-level updates.

Conclusion. A major source of energy inefficiency is idle power: idle periods are non-deterministic and short. We propose DynSleep, which reshapes the idle period pattern, utilizes deep sleep states, and dynamically wakes up cores to meet the strict QoS constraint. Our memcached prototype demonstrates up to 65% power saving.