Reducing Energy Consumption of Disk Storage Using Power-Aware Cache Management Q. Zhu, F. David, C. Devaraj, Z. Li, Y. Zhou, P. Cao* University of Illinois.

Similar presentations
Dissemination-based Data Delivery Using Broadcast Disks.

Reducing Energy Consumption of Disk Storage Using Power Aware Cache Management Qingbo Zhu, Francis M. David, Christo F. Deveraj, Zhenmin Li, Yuanyuan Zhou.
Conserving Disk Energy in Network Servers ACM 17th annual international conference on Supercomputing Presented by Hsu Hao Chen.
Song Jiang1 and Xiaodong Zhang1,2 1College of William and Mary
The Performance Impact of Kernel Prefetching on Buffer Cache Replacement Algorithms (ACM SIGMETRIC 05 ) ACM International Conference on Measurement & Modeling.
ARC: A SELF-TUNING, LOW OVERHEAD REPLACEMENT CACHE
Energy Efficiency through Burstiness Athanasios E. Papathanasiou and Michael L. Scott University of Rochester, Computer Science Department Rochester, NY.
1 A Hybrid Adaptive Feedback Based Prefetcher Santhosh Verma, David Koppelman and Lu Peng Louisiana State University.
1 Improving Direct-Mapped Cache Performance by the Addition of a Small Fully-Associative Cache and Prefetch Buffers By Sreemukha Kandlakunta Phani Shashank.
1 Storage-Aware Caching: Revisiting Caching for Heterogeneous Systems Brian Forney Andrea Arpaci-Dusseau Remzi Arpaci-Dusseau Wisconsin Network Disks University.
1 Conserving Energy in RAID Systems with Conventional Disks Dong Li, Jun Wang Dept. of Computer Science & Engineering University of Nebraska-Lincoln Peter.
Workloads Experimental environment prototype real sys exec- driven sim trace- driven sim stochastic sim Live workload Benchmark applications Micro- benchmark.
RIMAC: Redundancy-based hierarchical I/O cache architecture for energy-efficient, high- performance storage systems Xiaoyu Yao and Jun Wang Computer Architecture.
SIGMOD 2006University of Alberta1 Approximately Detecting Duplicates for Streaming Data using Stable Bloom Filters Presented by Fan Deng Joint work with.
Increasing the Cache Efficiency by Eliminating Noise Philip A. Marshall.
An Adaptable Benchmark for MPFS Performance Testing A Master Thesis Presentation Yubing Wang Advisor: Prof. Mark Claypool.
Dynamic Power Management for Systems with Multiple Power Saving States Sandy Irani, Sandeep Shukla, Rajesh Gupta.
Energy Efficient Prefetching – from models to Implementation 6/19/ Adam Manzanares and Xiao Qin Department of Computer Science and Software Engineering.
On the Limits of Leakage Power Reduction in Caches Yan Meng, Tim Sherwood and Ryan Kastner UC, Santa Barbara HPCA-2005.
SAIU: An Efficient Cache Replacement Policy for Wireless On-demand Broadcasts Jianliang Xu, Qinglong Hu, Dik Lun Department of Computer Science in HK University.
Overview: Memory Memory Organization: General Issues (Hardware) –Objectives in Memory Design –Memory Types –Memory Hierarchies Memory Management (Software.
Proteus: Power Proportional Memory Cache Cluster in Data Centers Shen Li, Shiguang Wang, Fan Yang, Shaohan Hu, Fatemeh Saremi, Tarek Abdelzaher.
An Intelligent Cache System with Hardware Prefetching for High Performance Jung-Hoon Lee; Seh-woong Jeong; Shin-Dug Kim; Weems, C.C. IEEE Transactions.
Minimizing Cache Usage in Paging Alejandro Salinger University of Waterloo Joint work with Alex López-Ortiz.
1 Ekow J. Otoo Frank Olken Arie Shoshani Adaptive File Caching in Distributed Systems.
Storage Allocation in Prefetching Techniques of Web Caches D. Zeng, F. Wang, S. Ram Appeared in proceedings of ACM conference in Electronic commerce (EC’03)
PARAID: The Gear-Shifting Power-Aware RAID Charles Weddle, Mathew Oldham, An-I Andy Wang – Florida State University Peter Reiher – University of California,
Exploiting Flash for Energy Efficient Disk Arrays Shimin Chen (Intel Labs) Panos K. Chrysanthis (University of Pittsburgh) Alexandros Labrinidis (University.
1 An SLA-Oriented Capacity Planning Tool for Streaming Media Services Lucy Cherkasova, Wenting Tang, and Sharad Singhal HPLabs,USA.
OPTIMAL SERVER PROVISIONING AND FREQUENCY ADJUSTMENT IN SERVER CLUSTERS Presented by: Xinying Zheng 09/13/ XINYING ZHENG, YU CAI MICHIGAN TECHNOLOGICAL.
© 2009 IBM Corporation 1 Improving Consolidation of Virtual Machines with Risk-aware Bandwidth Oversubscription in Compute Clouds Amir Epstein Joint work.
« Performance of Compressed Inverted List Caching in Search Engines » Proceedings of the International World Wide Web Conference Commitee, Beijing 2008)
Minimizing Cache Usage in Paging Alejandro López-Ortiz, Alejandro Salinger University of Waterloo.
1 Tuning Garbage Collection in an Embedded Java Environment G. Chen, R. Shetty, M. Kandemir, N. Vijaykrishnan, M. J. Irwin Microsystems Design Lab The.
10/18: Lecture topics Memory Hierarchy –Why it works: Locality –Levels in the hierarchy Cache access –Mapping strategies Cache performance Replacement.
COT 4600 Operating Systems Fall 2009 Dan C. Marinescu Office: HEC 304 Office hours: Tu-Th 3:00-4:00 PM.
1 Virtual Machine Memory Access Tracing With Hypervisor Exclusive Cache USENIX ‘07 Pin Lu & Kai Shen Department of Computer Science University of Rochester.
CUHK Learning-Based Power Management for Multi-Core Processors YE Rong Nov 15, 2011.
Improving Disk Throughput in Data-Intensive Servers Enrique V. Carrera and Ricardo Bianchini Department of Computer Science Rutgers University.
MIAO ZHOU, YU DU, BRUCE CHILDERS, RAMI MELHEM, DANIEL MOSSÉ UNIVERSITY OF PITTSBURGH Writeback-Aware Bandwidth Partitioning for Multi-core Systems with.
Computer Organization CS224 Fall 2012 Lessons 41 & 42.
Best Available Technologies: External Storage Overview of Opportunities and Impacts November 18, 2015.
Energy Efficient Prefetching and Caching Athanasios E. Papathanasiou and Michael L. Scott. University of Rochester Proceedings of 2004 USENIX Annual Technical.
Ensieea Rizwani An energy-efficient management mechanism for large-scale server clusters By: Zhenghua Xue, Dong, Ma, Fan, Mei 1.
Project Summary Fair and High Throughput Cache Partitioning Scheme for CMPs Shibdas Bandyopadhyay Dept of CISE University of Florida.
Spin-down Disk Model Not Spinning Spinning & Ready Spinning & Access Spinning & Seek Spinning up Spinning down Inactivity Timeout threshold* Request Trigger:
Sunpyo Hong, Hyesoon Kim
Video Caching in Radio Access network: Impact on Delay and Capacity
Jiahao Chen, Yuhui Deng, Zhan Huang 1 ICA3PP2015: The 15th International Conference on Algorithms and Architectures for Parallel Processing. zhangjiajie,
1 IP Routing table compaction and sampling schemes to enhance TCAM cache performance Author: Ruirui Guo, Jose G. Delgado-Frias Publisher: Journal of Systems.
Dynamic Power Management Using Online Learning Gaurav Dhiman, Tajana Simunic Rosing (CSE-UCSD) Existing DPM policies do not adapt optimally with changing.
Silberschatz, Galvin and Gagne ©2011 Operating System Concepts Essentials – 8 th Edition Chapter 9: Virtual Memory.
Performance directed energy management using BOS technique
Green cloud computing 2 Cs 595 Lecture 15.
Less is More: Leveraging Belady’s Algorithm with Demand-based Learning
18742 Parallel Computer Architecture Caching in Multi-core Systems
System Control based Renewable Energy Resources in Smart Grid Consumer
Energy-Efficient Address Translation
Fine-Grain CAM-Tag Cache Resizing Using Miss Tags
CARP: Compression Aware Replacement Policies
Predictive Performance
Performance metrics for caches
Performance metrics for caches
CARP: Compression-Aware Replacement Policies
Adapted from slides by Sally McKee Cornell University
Qingbo Zhu, Asim Shankar and Yuanyuan Zhou
Performance metrics for caches
CS 3410, Spring 2014 Computer Science Cornell University
Page Cache and Page Writeback
Performance metrics for caches
Presentation transcript:

Reducing Energy Consumption of Disk Storage Using Power-Aware Cache Management Q. Zhu, F. David, C. Devaraj, Z. Li, Y. Zhou, P. Cao* University of Illinois at Urbana-Champaign & *Cisco Systems Inc. HPCA '04 Presented by: Justin Kliger & Scott Schneider

Motivation Reduce energy consumption, targeting large data centers such as the EMC Symmetrix (pictured), which pairs terabytes of disk storage with up to 128 GB of non-volatile memory cache

Motivation Dealing with large data stores and large caches that are separated from the application servers

Motivation Data centers consume huge amounts of power: power density (W/ft²) is expected to increase by up to 25% annually, and storage devices already account for 27% of a data center's power consumption. Significance of reducing energy consumption: limits costs to these data centers, keeps energy costs from becoming prohibitive and blocking data-center expansion, and has positive environmental impacts.

Motivation Focus on the cache replacement algorithm: conserve energy by reshaping the average idle time of disks. Create a power-aware algorithm: designate priority disks so that some disks can greatly increase their idle time, and selectively keep blocks from priority disks in the cache to decrease power usage.

Outline Background for Disk Power Model Off-line analysis Online algorithm Evaluation & Results Write Policies Conclusion, Impact & Further Research

Disk Power Model Conventional disks have three states: Active and Idle consume full power; Standby consumes less power but requires a spin-up to satisfy a request. Gurumurthi et al. proposed multi-speed disks: lower rotational speeds consume less energy, and the transition from a lower speed to the next higher speed costs less than switching from standby to active.

Disk Power Model Their disk model uses these proposed multi-speed disks. Multi-speed disks can be configured to service requests at all speeds or only at the highest speed; servicing requests only at the highest speed makes them essentially multi-state disks, as opposed to two-state disks. Their model uses 4 intermediate lower-power modes.

Disk Power Management Oracle disk power management (DPM): the term for when the entire request sequence is known ahead of time, so perfect power management is possible; it provides an upper bound on energy saved. Upon request completion, Oracle DPM examines the interval length t until the next request: if t is greater than the break-even time, it spins the disk down immediately.

Disk Power Management The minimum energy consumption is the lower envelope of the consumption lines for the individual states; online algorithms use the crossover points of these lines as thresholds.
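The break-even decision above can be sketched as follows; the power and transition-energy constants are hypothetical placeholders, not the paper's disk parameters:

```python
# Sketch of the oracle DPM decision rule. All numbers are illustrative,
# not taken from the paper's disk model.
P_IDLE = 10.2        # hypothetical idle (full-speed) power, watts
P_STANDBY = 2.5      # hypothetical standby power, watts
E_TRANSITION = 60.0  # hypothetical spin-down + spin-up energy, joules

def break_even_time():
    """Idle interval at which spinning down pays off, i.e. the t where
    P_IDLE * t == E_TRANSITION + P_STANDBY * t."""
    return E_TRANSITION / (P_IDLE - P_STANDBY)

def oracle_decision(t):
    """With perfect future knowledge, spin down immediately iff the
    coming idle interval t exceeds the break-even time."""
    return "spin down" if t > break_even_time() else "stay idle"
```

An online policy without future knowledge would instead wait until the elapsed idle time crosses the same threshold before spinning down.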

DPM and Cache Replacement [Diagram, built up in stages: requests pass through the cache replacement policy, then DPM, then the disk]

Power-Aware Off-line Algorithms The optimal cache-hit algorithm (Belady's) can be suboptimal for power consumption. [Figure 3: An example showing Belady's algorithm is not energy-optimal] Belady's choice yields 6 misses but 24 time units at high energy consumption; an alternative eviction yields 7 misses but only 16 time units at high energy consumption, and so uses less total energy.

Power-Aware Off-line Algorithms Energy-Optimal Algorithm: a polynomial-time dynamic-programming algorithm was developed, but it does not help in building an online algorithm; details not included. Off-line Power-Aware Greedy algorithm (OPG): more realistic, used as the comparison point in the traces; considers future deterministic misses.

Power-Aware Off-line Algorithms OPG: evicts the block with the minimum energy penalty: OL(L_i) + OL(F_i) − OL(L_i + F_i) under Practical DPM, or LE(L_i) + LE(F_i) − LE(L_i + F_i) under Oracle DPM. Time complexity is O(n²). It is a heuristic because it considers only the current set of deterministic misses.
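OPG's eviction choice can be sketched as below, with LE(t) (the minimum energy an oracle DPM spends over an idle interval of length t) replaced by a toy two-option cost model; every constant here is illustrative, not the paper's:

```python
# Sketch of OPG's minimum-energy-penalty eviction. LE is a toy
# stand-in for the paper's oracle-DPM energy function; all numbers
# are illustrative.
def LE(t, p_active=10.0, e_transition=60.0, p_standby=2.5):
    # Either stay at full power for the whole interval, or pay a fixed
    # transition cost and idle at standby power: take the cheaper.
    return min(p_active * t, e_transition + p_standby * t)

def energy_penalty(L, F):
    """Extra energy caused by evicting a block whose disk was last
    accessed L time units ago and will be accessed again F units
    ahead: the resulting miss splits one idle interval of length
    L + F into two shorter ones."""
    return LE(L) + LE(F) - LE(L + F)

def opg_evict(blocks):
    """blocks: dict name -> (L, F). Return the minimum-penalty victim."""
    return min(blocks, key=lambda b: energy_penalty(*blocks[b]))
```

Because LE is concave, splitting a long interval never reduces energy, so the penalty is always non-negative.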

Power-Aware Online Algorithm Insight gained from off-line analysis: avoid evicting blocks with large energy penalties Small increases in idle time can have big energy gains

Online Approach Online algorithm goals Use the cache replacement policy to reshape each disk’s access pattern Give priority to blocks from inactive disks to increase average idle time Allow average interval time of some disks to decrease so that interval time of already idle disks can increase Overall energy consumption is reduced

Other Online Factors Potential energy savings are also determined by Percentage of capacity misses must be high in a workload; cold misses can’t be avoided Distribution of accesses determine actual interval lengths; larger deviation from the mean has more opportunity for savings An online algorithm needs to identify these properties for each disk to make good decisions

Tracking Cold Misses Use a Bloom filter to track cold misses: allocate a vector v of m bits and k independent hash functions h_1, h_2, ..., h_k; each accessed block a is tested against, and then sets, the bits h_1(a), ..., h_k(a). [Animated example with m = 13 omitted] Identifying a cold miss is always correct; false positives are possible for non-cold misses. For 1.6M blocks with v set to 2M bits and k = 7, false positives happen 0.82% of the time.
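A cold-miss tracker along these lines can be sketched as a small Bloom filter; the salted-hash construction and sizes below are illustrative choices, not the paper's implementation:

```python
import hashlib

class ColdMissTracker:
    """Bloom-filter sketch for cold-miss detection. A block reported
    unseen is certainly a cold miss (no false negatives); a block
    reported seen may, rarely, be a misclassified cold miss."""

    def __init__(self, m=2_000_000, k=7):
        self.m, self.k = m, k
        self.bits = bytearray(m)  # one byte per bit, for simplicity

    def _positions(self, block_id):
        # k "independent" hashes via salted SHA-256 (illustrative).
        for i in range(self.k):
            h = hashlib.sha256(f"{i}:{block_id}".encode()).digest()
            yield int.from_bytes(h[:8], "big") % self.m

    def is_cold_miss(self, block_id):
        positions = list(self._positions(block_id))
        cold = not all(self.bits[p] for p in positions)
        for p in positions:  # record this access
            self.bits[p] = 1
        return cold
```

With the slide's sizing (2M bits, k = 7, 1.6M blocks), the reported false-positive rate is 0.82%.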

Distribution Estimate Disk access distribution is estimated using an epoch-based histogram technique: an approximation method estimates the cumulative distribution function of a disk's interval lengths, F(x) = P[X ≤ x].

Distribution Estimate In each epoch: track the interval length between consecutive accesses to a disk; each interval falls into a discrete range; summing a range's count with those of all ranges above it approximates the probability of intervals at least that long.
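One way to realize the epoch histogram is sketched below; the bucket boundaries and the bucket-granularity CDF approximation are illustrative choices, not the paper's exact scheme:

```python
import bisect

class IntervalHistogram:
    """Epoch-based sketch of the CDF estimate F(x) = P[X <= x] for a
    disk's idle-interval lengths, with fixed bucket boundaries
    (illustrative values, in seconds)."""

    def __init__(self, boundaries=(1, 2, 5, 10, 30, 60, 300)):
        self.boundaries = boundaries
        self.counts = [0] * (len(boundaries) + 1)  # last = overflow
        self.total = 0

    def record(self, interval):
        # Bucket i covers intervals up to boundaries[i].
        self.counts[bisect.bisect_left(self.boundaries, interval)] += 1
        self.total += 1

    def cdf(self, x):
        """Approximate P[X <= x] as the fraction of recorded intervals
        in buckets whose upper edge is at most x."""
        if self.total == 0:
            return 0.0
        full = bisect.bisect_right(self.boundaries, x)
        return sum(self.counts[:full]) / self.total
```

The tail probability P[X > x] used for disk classification is then simply 1 − cdf(x). At the end of each epoch the counts would be reset (or aged) for the next epoch.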

Power Aware Cache Management Dynamically track cold misses and the cumulative distribution of interval lengths for each disk. Each epoch, classify disks: priority disks have a "small" percentage of cold misses and, with "high" probability, large interval lengths; regular disks are all the rest.

Power Aware Cache Management The basic idea: reshape the access pattern to keep priority disks idle PA can be combined with other algorithms such as LIRS, ARC, MQ, etc.

Example: PA-LRU PA-LRU employs two LRU stacks: LRU0 keeps regular-disk blocks, LRU1 keeps priority-disk blocks. Blocks are evicted from the bottom of LRU0 first, then from the bottom of LRU1. Parameters: α, the cold-miss percentage threshold; p, the cumulative probability threshold; β, the interval-length threshold used with the CDF; and the epoch length.
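The two-stack eviction order can be sketched as follows; the per-epoch disk classification is assumed to be supplied by the classification engine, and all names are illustrative:

```python
from collections import OrderedDict

class PALRU:
    """Sketch of PA-LRU's two-stack structure. Only the eviction order
    is shown; the priority/regular split is taken as given per epoch."""

    def __init__(self, capacity, priority_disks):
        self.capacity = capacity
        self.priority_disks = set(priority_disks)
        self.lru0 = OrderedDict()  # regular-disk blocks
        self.lru1 = OrderedDict()  # priority-disk blocks

    def access(self, block, disk):
        """Touch a block; return the evicted block, or None."""
        for stack in (self.lru0, self.lru1):
            stack.pop(block, None)
        target = self.lru1 if disk in self.priority_disks else self.lru0
        target[block] = disk  # most recently used at the end
        if len(self.lru0) + len(self.lru1) > self.capacity:
            victim_stack = self.lru0 if self.lru0 else self.lru1
            return victim_stack.popitem(last=False)[0]  # LRU bottom
        return None
```

Because LRU0 is always drained first, priority-disk blocks linger in the cache, which is exactly what lengthens the idle intervals of already-idle disks.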

Overall Design [Diagram, built up in stages: the PA classification engine feeds the cache replacement algorithm, which sits above DPM and the disks]

Evaluation Specifications of the disk: [table omitted]. Additionally, 4 more low-speed power modes: 12000 RPM, 9000 RPM, 6000 RPM, 3000 RPM.

Evaluation Two system traces: OLTP and Cello96. Cache size: 128 MBytes for OLTP, 32 MBytes for Cello96.

Evaluation Compared 4 algorithms: Belady's, OPG, LRU, PA-LRU. Also measured disk energy consumption with an infinite cache size, which provides a lower bound (only cold misses reach the disk). Practical DPM uses the thresholds identified earlier as competitive with Oracle DPM. PA-LRU uses 4 parameters: epoch length = 15 minutes, α = 50%, p = 80%, β = 5 seconds.

Results PA-LRU saves 16% more energy than LRU on the OLTP trace; however, it saves only 2-3% more energy than LRU on the Cello96 trace.

Results How PA-LRU improves performance: [figure comparing LRU and PA-LRU omitted]

Write Policies Four write policies: WB (Write-Back) only writes dirty blocks upon eviction; WT (Write-Through) writes dirty blocks immediately; WBEU (Write-Back with Eager Updates) writes a block immediately if its disk is active, otherwise waits; WTDU (Write-Through with Deferred Update) writes dirty blocks to a log if the target disk is in a low-power mode.
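The timing rules of the four policies can be summarized in one dispatch function; the predicate names and return strings are illustrative, not from the paper:

```python
# Sketch of when each write policy persists a dirty block, given the
# target disk's power state and whether the block is being evicted.
# Names and return values are illustrative.
def write_action(policy, disk_in_low_power, on_eviction):
    if policy == "WT":    # write-through: always write immediately
        return "disk now"
    if policy == "WB":    # write-back: write only on eviction
        return "disk now" if on_eviction else "defer in cache"
    if policy == "WBEU":  # eager updates: write now while disk is active
        if not disk_in_low_power:
            return "disk now"
        return "disk now" if on_eviction else "defer in cache"
    if policy == "WTDU":  # deferred update: log while disk is low-power
        return "log now" if disk_in_low_power else "disk now"
    raise ValueError(f"unknown policy: {policy}")
```

The point of WBEU and WTDU is visible in the low-power branches: neither forces a spin-up just to persist a dirty block.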

Write Policies Evaluation Synthetic traces varied write/read ratios and interarrival times. WB vs. WT: Write-Back consistently better, by up to 20%. WBEU vs. WT: WBEU consistently better, by up to 65%. WTDU vs. WT: WTDU consistently better, by up to 55%.

Conclusion Effective analysis for off-line algorithm (OPG) Designed and evaluated Power-Aware online algorithm (PA-LRU), which can use 16% less energy than LRU Considered write policy effects on energy savings

Impact of work Theoretical off-line analysis of power-aware caching policies. Identification of requirements for an online power-aware caching algorithm. Published in 2004; 3 self-citations plus Pinheiro & Bianchini (ICS '04) and Papathanasiou & Scott (USENIX '04).

Further Research Reduce # of parameters (PB-LRU) Online algorithm applied to single disks Prefetching Consider storage cache energy consumption Evaluate in real storage system context