LRFU (Least Recently/Frequently Used) Block Replacement Policy
Sang Lyul Min, Dept. of Computer Engineering, Seoul National University
Why file cache? The processor - disk speed gap:
1950's: Processor - IBM 701, 17,000 ins/sec. Disk - IBM 305 RAMAC, density 0.002 Mbits/sq. in, average seek time 500 ms.
1990's: Processor - IBM PowerPC 603e, 350,000,000 ins/sec (x 20,000). Disk - IBM Deskstar 5, density 1,319 Mbits/sq. in (x 600,000), average seek time 10 ms (x 50).
File Cache [diagram]: the file cache (buffer cache) sits in main memory between the processor and the disk controller; the disk controller keeps its own disk cache in front of the disks.
Operating System 101 [diagrams]: LRU replacement keeps blocks on a list ordered from the LRU block to the MRU block; a new reference moves the block to the MRU end, giving O(1) complexity. LFU replacement keeps blocks in a heap ordered from the LFU block to the MFU block; a new reference updates the block's count and restores the heap, giving O(log n) complexity (a naive unordered implementation would be O(n)).
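To make the two baselines concrete, here is a minimal sketch (not from the slides) of both policies, with LRU as an ordered map updated in O(1) and LFU as a lazy (count, block) min-heap updated in O(log n); the class names and the `access` interface are illustrative only, and block ids are assumed to be hashable and comparable (e.g. ints).

```python
from collections import OrderedDict
import heapq

class LRUCache:
    def __init__(self, size):
        self.size, self.blocks = size, OrderedDict()   # ordered from LRU to MRU

    def access(self, block):
        if block in self.blocks:
            self.blocks.move_to_end(block)             # block becomes the MRU block, O(1)
        else:
            if len(self.blocks) >= self.size:
                self.blocks.popitem(last=False)        # evict the LRU block, O(1)
            self.blocks[block] = True

class LFUCache:
    def __init__(self, size):
        self.size = size
        self.count = {}    # resident block -> reference count
        self.heap = []     # lazy min-heap of (count, block); stale entries are skipped

    def access(self, block):
        if block not in self.count and len(self.count) >= self.size:
            while True:                                # pop until a non-stale minimum is found
                cnt, victim = heapq.heappop(self.heap)
                if self.count.get(victim) == cnt:
                    del self.count[victim]             # evict the LFU block
                    break
        self.count[block] = self.count.get(block, 0) + 1
        heapq.heappush(self.heap, (self.count[block], block))
```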
Operating System 101: LRU - Advantage: high adaptability; Disadvantage: short-sighted. LFU - Advantage: long-sighted; Disadvantage: cache pollution.
Motivation [a series of plots for cache sizes of 20, 60, 100, 200, 300, and 500 blocks]
Observation Both recency and frequency affect the likelihood of future references The relative impact of each is largely determined by cache size
Goal A replacement algorithm that allows a flexible trade-off between recency and frequency
Results LRFU (Least Recently/Frequently Used) Replacement Algorithm that (1) subsumes both the LRU and LFU algorithms, (2) subsumes their implementations, and (3) yields better performance than both
CRF (Combined Recency and Frequency) Value [timeline diagram]: if block b was referenced at times t1, t2, and t3 and the current time is tc, then C_tc(b) = F(δ1) + F(δ2) + F(δ3), where δ1 = tc - t1, δ2 = tc - t2, and δ3 = tc - t3.
CRF (Combined Recency and Frequency) Value Estimate of how likely a block will be referenced in the future Every reference to a block contributes to the CRF value of the block A reference’s contribution is determined by weighing function F(x)
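As a worked example (not from the slides), the definition above can be written directly as a short function; the name `crf` and the placeholder weighing function are illustrative, since the actual F(x) is only fixed two slides later.

```python
# Naive CRF computation straight from the definition: sum the weighed
# contributions of all past references to the block.
def crf(reference_times, current_time, F):
    return sum(F(current_time - t) for t in reference_times)

# Placeholder weighing function for this example only.
F = lambda x: 0.5 ** x
print(crf([1, 4, 8], current_time=10, F=F))   # = F(9) + F(6) + F(2)
```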
Hints and Constraints on F(x): it should be monotonically decreasing; it should subsume LRU and LFU; it should allow an efficient implementation.
Conditions for LRU and LFU. LRU Condition: if F(x) satisfies F(i) >= F(i+1) + F(i+2) + ... for every i (a single recent reference outweighs any number of older ones), then the LRFU algorithm becomes the LRU algorithm; the timing diagram contrasts block a, referenced once just before the current time, with block b, referenced many times but only at older times i+1, i+2, i+3, .... LFU Condition: if F(x) = c (a constant), then the LRFU algorithm becomes the LFU algorithm, since the CRF value reduces to c times the reference count.
Weighing function F(x): F(x) = (1/2)^(λx). Meaning: a reference's contribution to the target block's CRF value is halved after every 1/λ time units.
Properties of F(x) = (1/2)^(λx). Property 1 (LRU/LFU spectrum): when λ = 0 (i.e., F(x) = 1), the policy becomes LFU; when λ = 1 (i.e., F(x) = (1/2)^x), the policy becomes LRU; when 0 < λ < 1, it lies between LFU and LRU. [Plot of F(x) versus x = current time - reference time, from the LFU extreme F(x) = 1 to the LRU extreme F(x) = (1/2)^x.]
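A tiny, self-contained demonstration of this spectrum (the reference times and block names are made up for illustration): under λ = 0 the block with more references wins, under λ = 1 the most recently referenced block wins.

```python
# Block 'a' was referenced once, recently; block 'b' five times, but long ago.
def crf(ref_times, now, lam):
    return sum(0.5 ** (lam * (now - t)) for t in ref_times)

a_refs, b_refs, now = [9], [1, 2, 3, 4, 5], 10
for lam in (0.0, 0.5, 1.0):
    winner = 'a' if crf(a_refs, now, lam) > crf(b_refs, now, lam) else 'b'
    print(f"lambda = {lam}: block {winner} has the larger CRF value")
# lambda = 0 favours b (pure frequency, LFU); lambda = 1 favours a (pure recency, LRU)
```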
Results LRFU (Least Recently/Frequently Used) Replacement Algorithm that (1) subsumes both the LRU and LFU algorithms, (2) subsumes their implementations, and (3) yields better performance than both
Difficulties of a Naive Implementation: enormous space overhead - the time of every reference to each block must be kept; enormous time overhead - the CRF value of every block must be recomputed at each time.
Update of the CRF value over time. Let δ = t2 - t1. Then
C_t2(b) = F(δ1 + δ) + F(δ2 + δ) + F(δ3 + δ)
        = (1/2)^(λ(δ1 + δ)) + (1/2)^(λ(δ2 + δ)) + (1/2)^(λ(δ3 + δ))
        = [(1/2)^(λδ1) + (1/2)^(λδ2) + (1/2)^(λδ3)] · (1/2)^(λδ)
        = C_t1(b) × F(δ)
Properties of F(x) = (1/2)^(λx). Property 2: with F(x) = (1/2)^(λx), C_tk(b) can be computed from C_tk-1(b) as follows: C_tk(b) = C_tk-1(b) · F(δ) + F(0), where δ = tk - tk-1. Implications: only two variables are required per block to maintain the CRF value - one for the time of the last reference, the other for the CRF value at that time.
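A minimal sketch of this two-variable bookkeeping (the class and field names are illustrative, not the paper's):

```python
LAM = 0.5
def F(x):
    return 0.5 ** (LAM * x)

class BlockInfo:
    """Per-block state: only the last reference time and the CRF value at that time."""
    def __init__(self, now):
        self.last = now
        self.crf = F(0)

    def reference(self, now):
        # Property 2: C_tk(b) = C_tk-1(b) * F(tk - tk-1) + F(0)
        self.crf = self.crf * F(now - self.last) + F(0)
        self.last = now

b = BlockInfo(now=0)
b.reference(now=3)
b.reference(now=4)
print(b.crf)   # equals F(4-0) + F(4-3) + F(0), i.e. the sum over the full reference history
```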
Difficulties of a Naive Implementation: enormous space overhead - the time of every reference to each block must be kept; enormous time overhead - the CRF value of every block must be recomputed at each time.
Properties of F(x) = (1/2)^(λx). Property 3: if C_t(a) > C_t(b) and neither a nor b is referenced after t, then C_t'(a) > C_t'(b) for all t' > t. Why? C_t'(a) = C_t(a) · F(t' - t) > C_t(b) · F(t' - t) = C_t'(b), since F(t' - t) > 0. Implications: reordering of blocks is needed only upon a block reference, so a heap data structure can maintain the ordering of blocks with O(log n) time complexity.
Optimized Implementation [diagram]: only the blocks that can still compete with a currently referenced block are kept in the heap; the remaining blocks are kept in a linked list.
Optimized Implementation [diagrams of the three cases]: (1) reference to a new block - the victim is replaced, a block is demoted from the heap to the linked list, the new block is inserted, and the heap is restored; (2) reference to a block already in the heap - its CRF value is updated and the heap is restored; (3) reference to a block in the linked list - the block is promoted into the heap, a block is demoted to the linked list, and the heap is restored.
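For concreteness, here is a self-contained LRFU sketch that keeps every resident block in a single lazy min-heap (the heap-plus-linked-list split above is omitted). It is not the paper's implementation: to keep heap keys fixed between references, each entry stores the CRF value projected to time 0 (crf × 2^(λ·last)); by Property 3 this preserves the ordering of the true decayed CRF values, but the projection grows with time, so it is only illustrative.

```python
import heapq

LAM = 0.1
F = lambda x: 0.5 ** (LAM * x)

class LRFUCache:
    def __init__(self, size):
        self.size = size
        self.info = {}   # resident block -> (crf_at_last_reference, last_reference_time)
        self.heap = []   # lazy min-heap of (projected_crf, last_reference_time, block)

    @staticmethod
    def _key(crf, last):
        # CRF projected to time 0; its ordering equals that of the decayed CRFs (Property 3)
        return crf * 2.0 ** (LAM * last)

    def access(self, block, now):
        crf, last = self.info.get(block, (0.0, now))
        new_crf = crf * F(now - last) + F(0)              # Property 2 update
        if block not in self.info and len(self.info) >= self.size:
            while True:                                   # skip stale entries of re-referenced blocks
                _, entry_time, victim = heapq.heappop(self.heap)
                cur = self.info.get(victim)
                if cur is not None and cur[1] == entry_time:
                    del self.info[victim]                 # evict the block with the smallest CRF
                    break
        self.info[block] = (new_crf, now)
        heapq.heappush(self.heap, (self._key(new_crf, now), now, block))

# Usage: feed (block, time) pairs; the cache keeps at most `size` blocks resident.
cache = LRFUCache(size=2)
for t, blk in enumerate(["a", "b", "a", "c", "a", "b"]):
    cache.access(blk, now=t)
print(sorted(cache.info))   # the two blocks with the largest CRF values remain
```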
Question What is the maximum number of blocks that can potentially compete with a currently referenced block?
[Timing diagrams] Even a block that was referenced at every time unit can no longer compete with a just-referenced block once its most recent reference is d_threshold or more time units old, because F(d_threshold) + F(d_threshold + 1) + F(d_threshold + 2) + ... < F(0).
Properties of F(x) = (1/2)^(λx). Property 4: d_threshold = ceil( log_(1/2)(1 - (1/2)^λ) / λ ). When λ → 0, d_threshold → ∞; when λ = 1, d_threshold = 1.
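A quick numeric check of this formula (as reconstructed above), showing how the number of blocks that must stay in the heap shrinks as λ moves from the LFU extreme toward the LRU extreme:

```python
import math

def d_threshold(lam):
    # smallest d with F(d) + F(d+1) + ... < F(0), for F(x) = (1/2)**(lam * x)
    return math.ceil(math.log(1 - 0.5 ** lam, 0.5) / lam)

for lam in (1.0, 0.5, 0.1, 0.01):
    print(f"lambda = {lam}: d_threshold = {d_threshold(lam)}")
# lambda = 1 gives 1 (LRU extreme); as lambda approaches 0 the threshold grows without bound
```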
Optimized Implementation (Cont'd): at the LRU extreme (λ = 1) the heap holds a single element and all other blocks are kept in the linked list, so each reference costs O(1); at the LFU extreme (λ = 0) the linked list is empty and all blocks are kept in the heap, so each reference costs O(log n).
Results LRFU (Least Recently/Frequently Used) Replacement Algorithm that (1) subsumes both the LRU and LFU algorithms, (2) subsumes their implementations, and (3) yields better performance than both
Correlated References
LRFU with correlated references: a masking function Gc(x) discounts references that are followed by another reference within a correlated period. C'_tk(b), the CRF value when correlated references are considered, can be derived from C'_tk-1(b):
C'_tk(b) = F(tk - tk) + Σ_{i=1..k-1} F(tk - ti) · Gc(t_{i+1} - ti)
         = F(tk - tk-1) · [F(0) · Gc(tk - tk-1) + C'_tk-1(b) - F(0)] + F(0)
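A minimal sketch of this masked update; the simple 0/1 masking function and the correlated-period length used here are assumptions for illustration only.

```python
CORRELATED_PERIOD = 2                                    # illustrative value, not from the slides
LAM = 0.1
F = lambda x: 0.5 ** (LAM * x)
Gc = lambda x: 0.0 if x <= CORRELATED_PERIOD else 1.0    # assumed 0/1 masking function

def masked_reference(prev_crf, prev_time, now):
    # C'_tk(b) = F(tk - tk-1) * [F(0) * Gc(tk - tk-1) + C'_tk-1(b) - F(0)] + F(0)
    return F(now - prev_time) * (F(0) * Gc(now - prev_time) + prev_crf - F(0)) + F(0)

print(masked_reference(F(0), prev_time=0, now=1))    # close follow-up: earlier reference is masked
print(masked_reference(F(0), prev_time=0, now=10))   # distant follow-up: both references contribute
```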
Trace-driven simulation. Sprite client trace: a collection of block references from a Sprite client; contains 203,808 references to 4,822 unique blocks. DB2 trace: a collection of block references from a DB2 installation; contains 500,000 references to 75,514 unique blocks.
Effects of λ on performance [plots of hit rate versus λ for (a) the Sprite client trace and (b) the DB2 trace]
Combined effects of λ and the correlated period [plots of hit rate versus correlated period for (a) the Sprite client trace and (b) the DB2 trace]
Previous work. FBR (Frequency-Based Replacement) algorithm: introduced the correlated-reference concept. LRU-K algorithm: replaces blocks based on the time of the K'th-to-last non-correlated reference; discriminates well between frequently and infrequently used blocks; problems - it ignores the most recent K-1 references and needs space linear in K to keep the last K reference times. 2Q and sLRU algorithms: use two queues or two segments and move only the hot blocks to the main part of the disk cache; work very well for "used-only-once" blocks.
Comparison of the LRFU policy with other policies [plots of hit rate versus cache size (# of blocks) for (a) the Sprite client trace and (b) the DB2 trace]
Implementation of the LRFU algorithm: in the buffer cache of the FreeBSD 3.0 operating system. Benchmark: the SPEC SDET benchmark, which simulates a multi-programming environment, consists of concurrent shell scripts each with about 150 UNIX commands, and reports results in scripts/hour.
SDET benchmark results [plots of hit rate and SDET throughput (scripts/hour)]
Conclusions LRFU (Least Recently/Frequently Used) Replacement Algorithm that (1) subsumes both the LRU and LFU algorithms, (2) subsumes their implementations, and (3) yields better performance than both
Future Research Dynamic version of the LRFU algorithm LRFU algorithm for heterogeneous workloads File requests vs. VM requests Disk block requests vs. Parity block requests (RAID) Requests to different files (index files, data files)
People REAL PEOPLE (Graduate students) Lee, Donghee Choi, Jongmoo Kim, Jong-Hun Guides (Professors) Noh, Sam H. Min, Sang Lyul Cho, Yookun Kim, Chong Sang http://archi.snu.ac.kr/symin/
Adaptive LRFU policy: adjust λ periodically depending on the evolution of the workload, using the LRU policy as the reference model to quantify how good (or bad) the locality of the workload has been. Algorithm of the Adaptive LRFU policy: if the measured value for the current period is greater than that for the previous period, the λ value for period i+1 is updated in the same direction; else the direction is reversed.
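A hedged sketch of this direction-based adjustment. The slide does not spell out the quantity being compared, so a generic per-period metric is assumed here purely for illustration; the function name and step size are made up.

```python
def adapt_lambda(lam, step, prev_metric, cur_metric, lo=0.0, hi=1.0):
    """One adaptation step: keep moving lambda in the same direction while the
    (assumed) per-period metric improves; reverse the direction when it does not."""
    if cur_metric <= prev_metric:
        step = -step                      # last adjustment did not help: reverse direction
    lam = min(hi, max(lo, lam + step))    # keep lambda within the [0, 1] spectrum
    return lam, step

# Usage: call once per period with the metric measured for that period.
lam, step = 0.5, 0.05
lam, step = adapt_lambda(lam, step, prev_metric=0.71, cur_metric=0.74)
```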
Results of the Adaptive LRFU [plots for the Client Workstation 54 and DB2 traces]