LRFU (Least Recently/Frequently Used) Block Replacement Policy


LRFU (Least Recently/Frequently Used) Block Replacement Policy. Sang Lyul Min, Dept. of Computer Engineering, Seoul National University

Why file cache? The processor - disk speed gap. In the 1950s: processor IBM 701, 17,000 instructions/sec; disk IBM 305 RAMAC, density 0.002 Mbits/sq. in., average seek time 500 ms. In the 1990s: processor IBM PowerPC 603e, 350,000,000 instructions/sec (x 20,000); disk IBM Deskstar 5, density 1,319 Mbits/sq. in. (x 600,000), average seek time 10 ms (only x 50).

File Cache. In the storage hierarchy (processor, main memory, disk controller, disks), the file cache (or buffer cache) resides in main memory, while the disk cache resides in the disk controller.

Operating System 101. LRU replacement keeps blocks ordered from the LRU block to the MRU block in a linked list; a new reference moves the referenced block to the MRU end in O(1) time. LFU replacement keeps blocks ordered from the LFU block to the MFU block; a heap gives O(log n) time per reference, while a sorted list would take O(n).
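The two classical structures above can be sketched as follows. This is a minimal Python illustration with hypothetical class names, not code from the talk; the LFU heap uses lazy deletion of stale entries, one common way to get O(log n) updates.

```python
import heapq
import itertools
from collections import OrderedDict

class LRUCache:
    """LRU list: the block at the front is least recently used."""
    def __init__(self, capacity):
        self.capacity = capacity
        self.blocks = OrderedDict()              # LRU first, MRU last

    def reference(self, block):
        if block in self.blocks:
            self.blocks.move_to_end(block)       # hit: move to the MRU end, O(1)
            return True
        if len(self.blocks) >= self.capacity:
            self.blocks.popitem(last=False)      # evict the LRU block, O(1)
        self.blocks[block] = True
        return False

class LFUCache:
    """LFU heap keyed on reference count; stale heap entries are skipped lazily."""
    def __init__(self, capacity):
        self.capacity = capacity
        self.count = {}                          # block -> reference count
        self.heap = []                           # (count, tiebreak, block)
        self.seq = itertools.count()

    def reference(self, block):
        hit = block in self.count
        if not hit and len(self.count) >= self.capacity:
            while True:                          # pop until a live entry appears
                cnt, _, victim = heapq.heappop(self.heap)
                if self.count.get(victim) == cnt:
                    del self.count[victim]       # evict the LFU block
                    break
        self.count[block] = self.count.get(block, 0) + 1
        heapq.heappush(self.heap, (self.count[block], next(self.seq), block))  # O(log n)
        return hit
```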

Operating System 101. LRU: advantage, high adaptability; disadvantage, short-sighted. LFU: advantage, long-sighted; disadvantage, cache pollution.

Motivation: [hit-rate plots comparing the replacement policies for cache sizes of 20, 60, 100, 200, 300, and 500 blocks]

Observation Both recency and frequency affect the likelihood of future references The relative impact of each is largely determined by cache size

Goal A replacement algorithm that allows a flexible trade-off between recency and frequency

Results: the LRFU (Least Recently/Frequently Used) Replacement Algorithm (1) subsumes both the LRU and LFU algorithms, (2) subsumes their implementations, and (3) yields better performance than both.

CRF (Combined Recency and Frequency) Value. If block b was referenced at times t1, t2, and t3 and the current time is tc, then Ctc(b) = F(δ1) + F(δ2) + F(δ3), where δ1 = tc - t1, δ2 = tc - t2, and δ3 = tc - t3.

CRF (Combined Recency and Frequency) Value: an estimate of how likely a block is to be referenced in the future. Every reference to a block contributes to the CRF value of the block, and a reference's contribution is determined by the weighing function F(x).
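The definition can be written directly as a sum over reference times. A small Python sketch, with hypothetical names and λ = 0.5 assumed purely for illustration:

```python
def crf(reference_times, now, F):
    """CRF value of a block: sum of F(now - t) over all its reference times t."""
    return sum(F(now - t) for t in reference_times)

# Weighing function from the talk: F(x) = (1/2)**(lam * x), lam assumed 0.5 here.
F = lambda x, lam=0.5: 0.5 ** (lam * x)
```

A recent reference (small now - t) contributes more than an old one, and every extra reference raises the CRF value.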

Hints and Constraints on F(x): it should be monotonically decreasing, should subsume LRU and LFU, and should allow an efficient implementation.

Conditions for LRU and LFU. LRU condition: if F(x) satisfies F(x) ≥ Σ_{i≥1} F(x+i) for all x, so that a block referenced once at the current time outweighs a block referenced at every older time, then the LRFU algorithm becomes the LRU algorithm. LFU condition: if F(x) = c for a constant c, then the LRFU algorithm becomes the LFU algorithm.

Weighing function: F(x) = (1/2)^(λx). Meaning: a reference's contribution to the target block's CRF value is halved after every 1/λ time units.

Properties of F(x) = (1/2)^(λx). Property 1: when λ = 0 (i.e., F(x) = 1), LRFU becomes LFU; when λ = 1 (i.e., F(x) = (1/2)^x), LRFU becomes LRU; when 0 < λ < 1, it falls between the two. λ thus selects a point on a spectrum from the LFU extreme (F(x) = 1) to the LRU extreme (F(x) = (1/2)^x), where x is the current time minus the reference time.

Results: the LRFU (Least Recently/Frequently Used) Replacement Algorithm (1) subsumes both the LRU and LFU algorithms, (2) subsumes their implementations, and (3) yields better performance than both.

Difficulties of a Naive Implementation. Enormous space overheads: information about the time of every reference to each block. Enormous time overheads: computation of the CRF value of every block at each time.

Update of the CRF value over time. Let δ = t2 - t1. Then:
Ct2(b) = F(δ1 + δ) + F(δ2 + δ) + F(δ3 + δ)
       = (1/2)^(λ(δ1+δ)) + (1/2)^(λ(δ2+δ)) + (1/2)^(λ(δ3+δ))
       = [(1/2)^(λδ1) + (1/2)^(λδ2) + (1/2)^(λδ3)] × (1/2)^(λδ)
       = Ct1(b) × F(δ)

Properties of F(x) = (1/2)^(λx). Property 2: with F(x) = (1/2)^(λx), Ctk(b) can be computed from Ctk-1(b) as Ctk(b) = Ctk-1(b) × F(δ) + F(0), where δ = tk - tk-1. Implications: only two variables are required per block to maintain the CRF value, one for the time of the last reference and the other for the CRF value at that time.
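Property 2 can be checked against the naive definition. A Python sketch with hypothetical names and λ = 0.5 assumed, keeping only the two per-block variables:

```python
LAM = 0.5
F = lambda x: 0.5 ** (LAM * x)   # weighing function, lambda assumed 0.5

class BlockInfo:
    """Per-block state: just the last reference time and the CRF at that time."""
    def __init__(self, t):
        self.last = t
        self.crf = F(0)

    def reference(self, t):
        # Property 2: C_tk(b) = C_tk-1(b) * F(tk - tk-1) + F(0)
        self.crf = self.crf * F(t - self.last) + F(0)
        self.last = t

def naive_crf(times, now):
    """Naive definition: sum over all reference times (what we avoid storing)."""
    return sum(F(now - t) for t in times)
```

The incremental value after references at times 1, 3, and 7 matches the naive sum exactly.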

Difficulties of a Naive Implementation. Enormous space overheads: information about the time of every reference to each block. Enormous time overheads: computation of the CRF value of every block at each time.

Properties of F(x) = (1/2)^(λx). Property 3: if Ct(a) > Ct(b) and neither a nor b is referenced after t, then Ct'(a) > Ct'(b) for all t' > t. Why? Ct'(a) = Ct(a)F(δ) > Ct(b)F(δ) = Ct'(b), since F(δ) > 0. Implications: reordering of blocks is needed only upon a block reference, so a heap data structure can maintain the ordering of blocks with O(log n) time complexity.
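Property 3 holds because aging every CRF value by the same interval multiplies each by the same positive factor F(δ), so the relative order never changes between references. A tiny Python sketch (hypothetical names, λ = 0.5 assumed):

```python
LAM = 0.5
F = lambda x: 0.5 ** (LAM * x)   # weighing function, lambda assumed 0.5

def age(crf, dt):
    """Age a CRF value by dt time units with no new reference: C_{t+dt} = C_t * F(dt)."""
    return crf * F(dt)
```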

Optimized Implementation: only the blocks that can compete with a currently referenced block need to be kept ordered.

Optimized Implementation. Blocks are split between a heap (blocks that can still compete) and a linked list (blocks that cannot). Reference to a new block: 1. the tail of the linked list is replaced, 2. the heap root is demoted to the linked list, 3. the new block enters the heap, 4. the heap is restored. Reference to a block in the heap: 1. the heap is restored around the referenced block. Reference to a block in the linked list: 1. a heap block is demoted to the linked list, 2. the referenced block is promoted into the heap, 3. the heap is restored.

Question What is the maximum number of blocks that can potentially compete with a currently referenced block?

Answer: a block b whose last reference is dthreshold or more time units old cannot compete with a currently referenced block a, even if b was referenced at every earlier time dthreshold, dthreshold+1, dthreshold+2, ..., once F(dthreshold) + F(dthreshold+1) + F(dthreshold+2) + ... < F(0).

Properties of F(x) = (1/2)^(λx). Property 4: dthreshold = log_((1/2)^λ) (1 - (1/2)^λ). When λ → 0, dthreshold = ∞; when λ → 1, dthreshold = 1. Archi & Network LAB, Seoul National University
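Property 4 follows from summing the geometric series Σ_{i≥d} (1/2)^(λi) = (1/2)^(λd) / (1 - (1/2)^λ) and requiring it to stay below F(0) = 1. A Python sketch with a hypothetical function name:

```python
import math

def d_threshold(lam):
    """Smallest distance d with F(d) + F(d+1) + ... < F(0), for F(x) = (1/2)**(lam*x).
    The geometric sum is (1/2)**(lam*d) / (1 - (1/2)**lam), giving the log formula."""
    if lam <= 0.0:
        return math.inf                      # LFU extreme: every block can compete
    r = 0.5 ** lam
    return math.ceil(math.log(1.0 - r) / math.log(r))
```

The extremes match the slide: d_threshold goes to 1 as λ goes to 1 (LRU) and to infinity as λ goes to 0 (LFU).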

Optimized implementation (Cont'd). At the LRU extreme (λ = 1), the heap degenerates to a single element and all other blocks stay in the linked list, so a reference costs O(1); at the LFU extreme (λ = 0), the linked list is null and every block is in the heap, so a reference costs O(log n).
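Putting the pieces together, here is a deliberately simplified LRFU sketch in Python. All names are hypothetical, λ = 0.5 is assumed, and it scans all blocks on eviction instead of maintaining the heap/linked-list split, so it shows the policy itself, not the optimized O(log n) implementation:

```python
LAM = 0.5
F = lambda x: 0.5 ** (LAM * x)   # weighing function, lambda assumed 0.5

class LRFUCache:
    """Minimal LRFU policy sketch: per block, only the last reference time
    and the CRF value at that time are stored (Property 2)."""
    def __init__(self, capacity):
        self.capacity = capacity
        self.blocks = {}                 # block -> (last_time, crf_at_last_time)
        self.t = 0

    def reference(self, block):
        self.t += 1
        hit = block in self.blocks
        if hit:
            last, crf = self.blocks[block]
            # Property 2: C_tk = C_tk-1 * F(delta) + F(0)
            self.blocks[block] = (self.t, crf * F(self.t - last) + F(0))
        else:
            if len(self.blocks) >= self.capacity:
                # Evict the block with the smallest current CRF value.
                victim = min(
                    self.blocks,
                    key=lambda b: self.blocks[b][1] * F(self.t - self.blocks[b][0]),
                )
                del self.blocks[victim]
            self.blocks[block] = (self.t, F(0))
        return hit
```

With the sequence a a a b c on a 2-block cache, b is evicted: c's miss finds a's aged CRF still above b's, combining frequency (a's three references) with recency.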

Results: the LRFU (Least Recently/Frequently Used) Replacement Algorithm (1) subsumes both the LRU and LFU algorithms, (2) subsumes their implementations, and (3) yields better performance than both.

Correlated References

LRFU with correlated references. A masking function Gc(x) discounts references that occur within a correlated period. C'tk(b), the CRF value when correlated references are considered, can be derived incrementally from C'tk-1(b):
C'tk(b) = F(tk - tk) + Σi F(tk - ti) × Gc(ti+1 - ti)
        = F(tk - tk-1) × [F(0) × Gc(tk - tk-1) + C'tk-1(b) - F(0)] + F(0)
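The incremental form can be sketched with an assumed step masking function Gc, under which a re-reference inside the correlated period contributes nothing extra. Hypothetical names; λ = 0.5 and a correlated period of 2 time units are assumptions for illustration:

```python
LAM, PERIOD = 0.5, 2.0
F = lambda x: 0.5 ** (LAM * x)                 # weighing function, lambda assumed 0.5
Gc = lambda x: 0.0 if x <= PERIOD else 1.0     # assumed step masking function

def update_crf(prev_crf, dt):
    """C'_tk = F(dt) * [F(0) * Gc(dt) + C'_tk-1 - F(0)] + F(0), with dt = tk - tk-1."""
    return F(dt) * (F(0) * Gc(dt) + prev_crf - F(0)) + F(0)
```

A re-reference within the period (dt = 1) leaves the CRF value of a once-referenced block unchanged, while one outside it (dt = 3) raises it.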

Trace-driven simulation. Sprite client trace: a collection of block references from a Sprite client; contains 203,808 references to 4,822 unique blocks. DB2 trace: a collection of block references from a DB2 installation; contains 500,000 references to 75,514 unique blocks.

Effects of λ on the performance: [plots of hit rate versus λ for (a) the Sprite client trace and (b) the DB2 trace]

Combined effects of λ and the correlated period: [hit-rate surfaces over the correlated period and λ for (a) the Sprite client trace and (b) the DB2 trace]

Previous works. FBR (Frequency-Based Replacement): introduces the correlated reference concept. LRU-K: replaces blocks based on the time of the Kth-to-last non-correlated reference, and discriminates well between frequently and infrequently used blocks; problems: it ignores the most recent K-1 references, and it has linear space complexity because it keeps the last K reference times. 2Q and sLRU: use two queues or two segments and move only the hot blocks to the main part of the disk cache; they work very well for used-only-once blocks.

Comparison of the LRFU policy with other policies: [plots of hit rate versus cache size (# of blocks) for (a) the Sprite client trace and (b) the DB2 trace]

Implementation of the LRFU algorithm: in the buffer cache of the FreeBSD 3.0 operating system. Benchmark: the SPEC SDET benchmark, which simulates a multi-programming environment, consists of concurrent shell scripts each with about 150 UNIX commands, and reports its results in scripts/hour.

SDET benchmark results: [hit rate and SDET throughput (scripts/hour) as a function of λ]

Conclusions LRFU (Least Recently/Frequently Used) Replacement Algorithm that (1) subsumes both the LRU and LFU algorithms (2) subsumes their implementations (3) yields better performance than them

Future Research Dynamic version of the LRFU algorithm LRFU algorithm for heterogeneous workloads File requests vs. VM requests Disk block requests vs. Parity block requests (RAID) Requests to different files (index files, data files)

People REAL PEOPLE (Graduate students) Lee, Donghee Choi, Jongmoo Kim, Jong-Hun Guides (Professors) Noh, Sam H. Min, Sang Lyul Cho, Yookun Kim, Chong Sang http://archi.snu.ac.kr/symin/

Adaptive LRFU policy. Adjust λ periodically depending on the evolution of the workload, using the LRU policy as the reference model to quantify how good (or bad) the locality of the workload has been. Algorithm: if the locality measure improved during period i, the λ value for period i+1 is updated in the same direction; otherwise the direction is reversed.
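The direction-reversing update can be sketched as simple hill climbing. This is an assumption-laden illustration with hypothetical names and a fixed step size, not the talk's exact algorithm:

```python
def adapt_lambda(lam, step, improved):
    """Hill-climbing sketch: keep moving lambda in the same direction while the
    locality measure improves, reverse the direction otherwise; lambda is
    clamped to its valid range [0, 1]."""
    if not improved:
        step = -step                      # reverse direction
    lam = min(1.0, max(0.0, lam + step))  # clamp to the LFU..LRU spectrum
    return lam, step
```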

Results of the Adaptive LRFU: [hit-rate plots for the Sprite client, Workstation 54, and DB2 workloads]