
1 KnightShift: Enhancing Energy Efficiency by Shifting the I/O Burden to a Management Processor
Sabyasachi Ghosh, Mark Redekopp, Murali Annavaram
Ming-Hsieh Department of EE, USC
http://usc.edu/dept/ee/scip

2 Outline
- Datacenter energy concerns
- Direct-attached storage issues
- KnightShift solution
- IPMI
- Modifications to IPMI
- Trace description
- Results
- On-going work and conclusions

3 Datacenter Energy Concerns
- Datacenter energy costs are a key concern
- Common-case utilizations are very low, but not zero
- Servers are not energy efficient at low utilizations
- Consolidation and power-down are effective solutions
- Long wakeup latencies from shutdown/low-power modes are being mitigated
- Except that direct-attached storage (DAS) datacenters cannot benefit from consolidation

4 Direct-Attached Storage Architecture
- Data is distributed on disks attached to individual nodes
- Client requests arrive at a load balancer (1)
- Load balancer assigns the request to one node (2)
- Satisfying a request requires data from multiple nodes (3a)
  - Each remote node gets the data request
  - Remote nodes access their local disks (3b)
  - Remote nodes generate a response to the requestor
- Requestor performs the necessary computation on the consolidated data
- Requestor sends a response to the client (4)
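The request flow on this slide can be sketched as a small simulation. This is an illustrative sketch only: the `Node` class, the round-robin balancer, and the use of a plain sum as the requestor's "computation" are assumptions made for the example and do not come from the presentation.

```python
from itertools import cycle


class Node:
    """A server with a direct-attached disk (modelled here as a dict)."""

    def __init__(self, name, disk):
        self.name = name
        self.disk = disk  # key -> value stored on this node's local disk

    def read_local(self, keys):
        # Step 3b: a node can only return what its own disk holds.
        return {k: self.disk[k] for k in keys if k in self.disk}

    def handle(self, keys, cluster):
        # Step 3a: the requestor asks every node (itself included) for data.
        gathered = {}
        for node in cluster:
            gathered.update(node.read_local(keys))
        # The requestor computes on the consolidated data (assumed: a sum).
        return sum(gathered[k] for k in keys)


def make_balancer(cluster):
    # Steps 1-2: hypothetical round-robin load balancer.
    order = cycle(cluster)
    return lambda keys: next(order).handle(keys, cluster)


cluster = [Node("n0", {"a": 1}), Node("n1", {"b": 2}), Node("n2", {"c": 3})]
dispatch = make_balancer(cluster)
result = dispatch(["a", "b", "c"])  # Step 4: response returned to the client
```

Every request touches disks on nodes that were not assigned the request, which is why a DAS node's data has to stay reachable even when that node has no work of its own.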

5 Server Power under DAS
- Servers show a lack of energy proportionality at low utilization
  - Power at 10% utilization is (much) more than 10% of the power at peak utilization
- Energy proportionality is not just a CPU problem
  - Memory, disks, and fans are a major source of power consumption
  - Motherboard components (voltage regulators, PCI slots) also consume power
- CPUs are in fact becoming more energy proportional
  - Power scales down to a limit using DVFS, clock gating, ...
- Achieving an energy-proportional server requires putting all motherboard components to sleep

6 KnightShift as a Solution
- KnightShift: handle remote I/O requests using a low-power subsystem
  - Main server sleeps during low utilization while maintaining availability of data on the disks
- The low-power subsystem is called the Knight
- The Knight has the following properties
  - Closely attached to the main server to access its disk data
  - Electrically isolated from the main server
  - Capable of receiving, interpreting, and servicing remote requests
  - Transparent to the outside world

7 Intelligent Platform Management Interface
- Intelligent Platform Management Interface (IPMI) is a widely implemented standard for out-of-band server management
  - Admins can remotely monitor server health with sensors, power the server on/off, and install software
- At the core of IPMI is the Baseboard Management Controller (BMC)
  - BMC uses the same network interface as the primary system and even the same IP address
  - Embedded CPU, flash memory, separate power rails

8 IPMI as a Knight
- IPMI satisfies most properties of a Knight
  - Electrically isolated
  - Transparently handles network packets
- However, it does not have access to the primary server disks
- Modify IPMI
  - Modify the I/O hub with a 2-input mux that switches between primary and Knight as needed
  - BMC must be able to handle disk access requests and understand a few filesystems
  - BMC is already highly capable and can do complex network packet filtering
  - Knight capabilities are further enhanced when the BMC supports the same ISA
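The 2-input mux can be pictured as a switch that gives exactly one side access to the disks at a time. The following toy model is only a sketch of that idea: the `DiskMux` class, the `Owner` enum, and the dict standing in for a disk are hypothetical names invented for illustration, not an interface described in the slides or in IPMI.

```python
from enum import Enum


class Owner(Enum):
    PRIMARY = "primary"
    KNIGHT = "knight"


class DiskMux:
    """Toy model of a 2-input mux: exactly one side owns the disks."""

    def __init__(self):
        self.owner = Owner.PRIMARY  # assumed default: primary server is awake

    def select(self, owner):
        self.owner = owner

    def read(self, requester, disk, key):
        # Only the currently selected side may reach the disks.
        if requester is not self.owner:
            raise PermissionError(f"{requester.value} is not connected to the disks")
        return disk[key]


disk = {"block0": b"data"}
mux = DiskMux()
mux.select(Owner.KNIGHT)  # primary server goes to sleep
payload = mux.read(Owner.KNIGHT, disk, "block0")
```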

9 Using Knight for System-level Power Saving
- Primary server memory is turned off
  - BMC's flash memory is used for I/O buffers
  - Dirty disk data cached in primary memory is drained to disk
- Knight can handle even non-I/O requests
  - Requests with limited compute demands
  - Support for the same ISA
  - IBM ASMA supports full ISA
- Knight is best for handling stateless workloads
  - Many e-commerce transactions are stateless
- Significantly increases primary server sleep time by turning off the entire server (except disks), not just any single component

10 Trace Based Evaluation
- Minute-granularity utilization traces from USC's production datacenter
  - Compute, mail, and NFS file server clusters
  - In particular, the clusters use DAS
- Detailed SAR traces collected for 9 days
- Servers are underutilized, as can be seen from the graph
  - CPU is 10% utilized for nearly 90% of the time
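A statistic of this kind can be computed from a utilization trace in a few lines. The sketch below is an assumption-laden illustration: the sample values are made up, the function name is hypothetical, and it reads the slide's figure as "samples at or below a 10% threshold", which is one possible interpretation.

```python
def fraction_at_or_below(samples, threshold=10.0):
    """Fraction of utilization samples (in percent) at or below threshold."""
    if not samples:
        raise ValueError("no samples")
    return sum(1 for s in samples if s <= threshold) / len(samples)


# Made-up minute-granularity CPU utilization samples, in percent.
trace = [3, 5, 8, 2, 45, 7, 9, 4, 6, 80]
low_fraction = fraction_at_or_below(trace)
```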

11 CPU Utilization vs. System Utilization
- CPU utilization is closely tied to overall system utilization (also shown in prior work (Fan2007))
- Figure shows CPU utilization on the Y-axis and disk utilization on the secondary Y-axis for SCF

12 Ideal Case Power Savings
- Derived power versus utilization for current servers from SpecWEB power benchmarks
- Assume power consumption in ideal servers scales quadratically with performance
  - Ideal machine power at 1/10 utilization is 1/100 of the peak power
- Huge gap between current and ideal system power consumption
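The quadratic assumption can be written as P_ideal(u) = P_peak * u^2, where u is utilization as a fraction of peak. A minimal sketch follows; the function name and the normalization of peak power to 1.0 are choices made for this example.

```python
def ideal_power(utilization, peak_power=1.0):
    """Ideal-server power under the slide's quadratic assumption."""
    if not 0.0 <= utilization <= 1.0:
        raise ValueError("utilization must be in [0, 1]")
    return peak_power * utilization ** 2


# Power as a fraction of peak at a few utilization levels.
for u in (0.1, 0.25, 0.5, 1.0):
    print(f"utilization {u:.2f} -> ideal power {ideal_power(u):.4f} of peak")
```

At u = 0.1 this gives 1/100 of peak power, matching the slide's example.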

13 KnightShift Power Savings
- When the trace shows CPU utilization < 10%, assume the Knight is ON
  - Knight power is constant at 1/100 of primary server power
- When the trace shows CPU utilization > 10%, assume the primary is ON
  - Primary server power is proportional to utilization (based on current server data from SpecWEB)
  - At wakeup the primary consumes 100% power
- Figure legend: Primary Server ON, Knight ON
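The rules on this slide amount to a simple per-sample power model, sketched below. Several things here are assumptions rather than the authors' method: the linear `current_server_power` curve with a 60% idle floor stands in for the SpecWEB-derived data, Knight power is taken as 1/100 of the primary's peak, a sample exactly at the threshold is treated as primary ON, the primary is assumed awake before the trace starts, and the trace values are made up.

```python
def current_server_power(util, idle_fraction=0.6):
    """ASSUMED linear model of a current server, as a fraction of peak.

    The slides use SpecWEB-derived data instead; idle_fraction is invented.
    """
    return idle_fraction + (1.0 - idle_fraction) * util


def knightshift_power(trace, threshold=0.10, knight_power=0.01):
    """Per-sample power (fraction of primary peak) under the slide's rules."""
    powers = []
    knight_was_on = False  # assumption: primary is awake before the trace
    for util in trace:
        if util < threshold:
            powers.append(knight_power)  # Knight ON, primary asleep
            knight_was_on = True
        elif knight_was_on:
            powers.append(1.0)  # wakeup sample: primary at 100% power
            knight_was_on = False
        else:
            powers.append(current_server_power(util))  # primary ON
    return powers


# Made-up utilization trace, as fractions of peak.
trace = [0.05, 0.05, 0.50, 0.50, 0.02]
powers = knightshift_power(trace)
baseline = sum(current_server_power(u) for u in trace)
shifted = sum(powers)
savings = 1.0 - shifted / baseline
```

The wakeup penalty means that very short low-utilization gaps can cost more than they save, so the achievable savings depend on how long the idle periods in the trace last.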

14 Power Savings vs Performance Degradation
- Response time grows when operating with the Knight
  - Assuming a range of Knight capabilities, the response time increases by as much as 11% of the original time
- Energy savings increase as the Knight becomes more capable, giving more opportunities for the primary server to sleep

15 Conclusion
- Datacenter energy consumption is a serious concern
  - Consolidating and powering down idle servers is an effective approach
  - It does not work for direct-attached storage datacenters
- KnightShift uses an IPMI-based BMC as a low-power subsystem to handle remote I/O
  - Knight exploits IPMI's unique characteristics to handle remote I/O requests
- Trace-based evaluation to study the current headroom
  - Traces collected for 9 days from the USC datacenter for several clusters
  - Headroom studies show a 2.5X improvement in energy consumption with the Knight
- Going forward, the plan is to use a mix of analytical (queuing) models and an emulation-based implementation of KnightShift

