
Presentation transcript:

Sabyasachi Ghosh Mark Redekopp Murali Annavaram Ming-Hsieh Department of EE USC KnightShift: Enhancing Energy Efficiency by Shifting the I/O Burden to a Management Processor

Outline
- Datacenter energy concerns
- Direct-attached storage issues
- KnightShift solution
- IPMI
- Modifications to IPMI
- Trace description
- Results
- Ongoing work and conclusions

Datacenter Energy Concerns
- Datacenter energy costs are a key concern
- Common-case utilizations are very low, but not zero
- Servers are not energy efficient at low utilization
- Consolidation and power-down of idle servers are effective solutions
- Long wakeup latencies from shutdown/low-power modes are being mitigated
- However, direct-attached storage (DAS) datacenters cannot benefit from consolidation

Direct-Attached Storage Architecture
- Data is distributed across disks attached to individual nodes
- Client requests arrive at a load balancer (1)
- The load balancer assigns each request to one node (2)
- Satisfying a request may require data from multiple nodes (3a)
- Each remote node receives the data request, accesses its local disk (3b), and generates a response to the requestor
- The requestor performs the necessary computation on the consolidated data and sends a response to the client (4)
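The request flow on this slide can be sketched as a toy simulation. The node names and the shard-to-node mapping below are illustrative placeholders, not from the talk:

```python
# Minimal sketch of the DAS request flow: a load balancer assigns a
# request to one node, which fans out to the peers holding the needed
# data; each peer reads its local disk, and the assigned node
# consolidates the partial results. All names are hypothetical.
import random

NODES = ["node0", "node1", "node2", "node3"]

def handle_request(data_shards):
    """data_shards maps a shard name to the node whose local disk holds it."""
    requestor = random.choice(NODES)            # (2) load balancer assigns a node
    partials = []
    for shard, owner in data_shards.items():    # (3a) fan out to remote nodes
        partials.append(f"{owner}:{shard}")     # (3b) each node reads its local disk
    return requestor, partials                  # (4) consolidated response to client

req, parts = handle_request({"a": "node1", "b": "node3"})
print(req in NODES, len(parts))
```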

Server Power under DAS
- Servers lack energy proportionality at low utilization: power at 10% utilization is (much) more than 10% of the power at peak utilization
- Energy proportionality is not just a CPU problem: memory, disks, and fans are major sources of power consumption, and motherboard components (voltage regulators, PCI slots) also consume power
- CPUs are in fact becoming more energy proportional: CPU power scales with utilization, up to a limit, using DVFS, clock gating, etc.
- Achieving an energy-proportional server requires putting all motherboard components to sleep
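A back-of-the-envelope illustration of the proportionality gap described above, using assumed (not measured) idle and peak figures and a simple linear power model:

```python
# Illustrative calculation with assumed numbers: a server whose idle
# power is half of peak draws far more than 10% of peak power at 10%
# utilization, which is the lack of proportionality noted on this slide.
P_PEAK = 300.0   # watts at full load (assumed)
P_IDLE = 150.0   # watts when idle (assumed)

def server_power(util):
    """Linear power model between idle and peak (util in [0, 1])."""
    return P_IDLE + (P_PEAK - P_IDLE) * util

frac_of_peak = server_power(0.10) / P_PEAK
print(round(frac_of_peak, 2))   # 0.55: 55% of peak power for 10% of the work
```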

KnightShift as a Solution
- KnightShift handles remote I/O requests using a low-power subsystem
- The main server sleeps during low utilization while the data on its disks remains available
- The low-power subsystem is called the Knight
- The Knight has the following properties:
  - Closely attached to the main server, with access to its disk data
  - Electrically isolated from the main server
  - Capable of receiving, interpreting, and servicing remote requests
  - Transparent to the outside world

Intelligent Platform Management Interface
- The Intelligent Platform Management Interface (IPMI) is a widely implemented standard for out-of-band server management
- Admins can remotely monitor server health with sensors, power the server on and off, and install software
- At the core of IPMI is the Baseboard Management Controller (BMC)
- The BMC uses the same network interface as the primary system, and even the same IP address
- The BMC has its own embedded CPU, flash memory, and separate power rails

IPMI as a Knight
- IPMI satisfies most properties of a Knight: it is electrically isolated and transparently handles network packets
- However, it does not have access to the primary server's disks
- Modifications to IPMI:
  - Extend the I/O hub with a 2-input mux that switches the disks between the primary server and the Knight as needed
  - The BMC must be able to handle disk access requests and understand a few filesystems
- The BMC is already highly capable and can do complex network packet filtering
- Knight capabilities are further enhanced when the BMC supports the same ISA as the primary server

Using the Knight for System-level Power Saving
- The primary server's memory is turned off; the BMC's flash memory is used as I/O buffers
- Dirty disk data cached in primary memory is drained to disk before sleeping
- The Knight can even handle non-I/O requests with limited compute demands if it supports the same ISA (IBM's ASMA supports the full ISA)
- The Knight is best for handling stateless workloads; many e-commerce transactions are stateless
- Primary server sleep time increases significantly because the entire server (except the disks) is turned off, not just a single component
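The shift-to-Knight transition this slide describes can be sketched as a small state machine. Every name here is hypothetical; this is only an illustration of the ordering (drain dirty data, power down primary memory, redirect I/O buffering to BMC flash), not the actual implementation:

```python
# Hypothetical sketch of the shift-to-Knight sequence: dirty data is
# drained to disk first, then primary memory is powered off and I/O
# buffering moves to the BMC's flash. All names are illustrative.
class Server:
    def __init__(self):
        self.dirty_pages = ["p1", "p2"]   # disk data cached in primary memory
        self.memory_on = True
        self.io_buffers = "primary_dram"
        self.active = "primary"

    def shift_to_knight(self):
        self.dirty_pages.clear()          # drain dirty data to disk first
        self.memory_on = False            # primary server memory turned off
        self.io_buffers = "bmc_flash"     # BMC flash serves as the I/O buffer
        self.active = "knight"

s = Server()
s.shift_to_knight()
print(s.active, s.memory_on, s.io_buffers)
```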

Trace-Based Evaluation
- Minute-granularity utilization traces from USC's production datacenter
- Compute, mail, and NFS file server clusters; in particular, these clusters use DAS
- Detailed SAR traces collected for 9 days
- Servers are underutilized, as the graph shows: CPU utilization is at or below 10% for nearly 90% of the time

CPU Utilization vs. System Utilization
- CPU utilization is closely tied to overall system utilization (also shown in prior work, Fan 2007)
- The figure shows CPU utilization on the primary Y-axis and disk utilization on the secondary Y-axis for SCF

Ideal-Case Power Savings
- Power versus utilization for current servers is derived from SPECweb power benchmarks
- Assume power consumption in ideal servers scales quadratically with utilization, so ideal machine power at 1/10 utilization is 1/100 of peak power
- There is a huge gap between current and ideal system power consumption
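The ideal-scaling assumption on this slide is simple enough to state in a few lines (the peak-power figure is assumed, only the quadratic shape comes from the slide):

```python
# The slide's ideal model: power scales quadratically with utilization,
# so 1/10 utilization costs 1/100 of peak power. Peak wattage is assumed.
def ideal_power(util, p_peak=300.0):
    """Ideal server power at a given utilization (util in [0, 1])."""
    return p_peak * util ** 2

print(ideal_power(0.10))   # 3 W, i.e. 1/100 of the 300 W peak
```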

KnightShift Power Savings
- When the trace shows CPU utilization < 10%, the Knight is assumed ON; Knight power is constant at 1/100 of primary server power
- When the trace shows CPU utilization > 10%, the primary is assumed ON; primary server power is proportional to utilization (based on current server data from SPECweb)
- At wakeup, the primary consumes 100% power
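The two-mode headroom model on this slide can be written directly. The utilization trace below is made up for illustration, and the sketch ignores the 100%-power wakeup transient the slide mentions:

```python
# Sketch of the slide's power model: below the 10% utilization threshold
# the Knight runs at 1/100 of primary peak power; above it, the primary
# is on and its power tracks utilization. Wakeup transients are ignored.
P_PRIMARY_PEAK = 1.0     # primary peak power, normalized to 1
KNIGHT_POWER = 0.01      # constant, 1/100 of primary peak
THRESHOLD = 0.10

def knightshift_power(util):
    if util < THRESHOLD:
        return KNIGHT_POWER            # Knight serves the request
    return P_PRIMARY_PEAK * util       # primary on, power proportional to load

# Minute-level CPU utilization samples (made up for illustration)
trace = [0.02, 0.05, 0.08, 0.40, 0.03, 0.07]
energy = sum(knightshift_power(u) for u in trace)
print(round(energy, 2))
```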

Power Savings vs. Performance Degradation
- Response time grows when operating with the Knight
- Across a range of assumed Knight capabilities, response time increases by up to 11% over the original
- Energy savings increase as the Knight becomes more capable, giving the primary server more opportunities to sleep

Conclusion
- Datacenter energy consumption is a serious concern
- Consolidating and powering down idle servers is an effective approach, but it does not work for direct-attached storage datacenters
- KnightShift uses the IPMI-based BMC as a low-power subsystem to handle remote I/O
- The Knight exploits IPMI's unique characteristics to handle remote I/O requests
- A trace-based evaluation studies the current headroom: traces were collected for 9 days from several clusters in the USC datacenter
- Headroom studies show a 2.5X improvement in energy consumption with the Knight
- Going forward, we plan to use a mix of analytical (queuing) models and an emulation-based implementation of KnightShift