LHD: Improving Cache Hit Rate By Maximizing Hit Density

Presentation transcript:

LHD: Improving Cache Hit Rate by Maximizing Hit Density
Nathan Beckmann (CMU), Haoxian Chen (U. Penn), Asaf Cidon (Stanford & Barracuda Networks)
USENIX NSDI 2018

A key-value cache is 100× faster than the database: the web server sees roughly 100 µs for a cache hit versus roughly 10 ms for a database access.

Key-value cache hit rate determines web application performance. In a Facebook study [Atikoglu, Sigmetrics '12], at a 98% cache hit rate, improving the hit rate by just 1% gives a 35% speedup (old latency: 374 µs; new latency: 278 µs). Even small hit rate improvements cause significant speedup.
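
To see why one extra point of hit rate matters so much, here is a minimal back-of-the-envelope sketch using the round ~100 µs hit and ~10 ms miss latencies from the previous slide (the study's measured latencies differ, so the numbers below do not match the 374 µs / 278 µs figures exactly):

```python
def avg_latency_us(hit_rate, hit_us=100.0, miss_us=10_000.0):
    """Average request latency: hits are fast, misses go to the database."""
    return hit_rate * hit_us + (1.0 - hit_rate) * miss_us

old = avg_latency_us(0.98)  # 298 µs
new = avg_latency_us(0.99)  # 199 µs
print(f"speedup: {old / new:.2f}x")  # ~1.5x: misses dominate average latency
```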

Choosing the right eviction policy is hard. Key-value caches have unique challenges: variable object sizes and variable workloads. Prior policies are heuristics that combine recency and frequency; they have no theoretical foundation, require hand-tuning (making them fragile to workload changes), and no single policy works for all workloads. One prior system simulates many cache policy configurations to find the right one per workload [Waldspurger, ATC '17].

Goal: an eviction policy that auto-tunes across workloads.

The "big picture" of key-value caching. Goal: maximize the cache hit rate. Constraint: limited cache space. Uncertainty: in practice, we don't know what is accessed when. Difficulty: objects have variable sizes.

Where does cache space go? Consider what happens on a short trace (… A B B A C B A B D A B C D A B C B …), drawn as a space × time diagram: each cached object occupies a rectangle whose height is its size and whose width is the time it spends in the cache, ending either in a hit or in an eviction.

Where does cache space go? In the diagram, a green box is a rectangle that ends in a hit, and a red box earns zero hits before eviction. We want to fit as many green boxes as possible. Each box costs resources equal to its area: the cost is proportional to the object's size and the time it spends in the cache.

The Key Idea: Hit Density

Our metric: hit density (HD). Hit density combines hit probability and expected cost:

$$\text{hit density} = \frac{\text{object's hit probability}}{\text{object's size} \times \text{object's expected lifetime}}$$

The least hit density (LHD) policy evicts the object with the smallest hit density. But how do we predict these quantities?
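
As a toy worked example (all numbers made up), hit density captures how a high hit probability can still be a poor use of cache space:

```python
def hit_density(hit_prob, size, expected_lifetime):
    """Hits per unit of cache resource (byte x access)."""
    return hit_prob / (size * expected_lifetime)

a = hit_density(hit_prob=0.9, size=100, expected_lifetime=1000)  # 9.0e-06
b = hit_density(hit_prob=0.2, size=10,  expected_lifetime=100)   # 2.0e-04
# LHD evicts A despite its 90% hit probability: A ties up 100 bytes for
# ~1000 accesses per hit, a far worse use of space than B.
print(a < b)  # True
```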

Estimating hit density (HD). An object's age is the number of accesses since the object was last requested. Define two random variables: $H$, the hit age (e.g., $P[H=100]$ is the probability an object hits after 100 accesses), and $L$, the lifetime (e.g., $P[L=100]$ is the probability an object hits or is evicted after 100 accesses). It is easy to estimate HD from these quantities:

$$HD = \frac{\sum_{a=1}^{\infty} P[H=a]}{\text{size} \times \sum_{a=1}^{\infty} a \, P[L=a]}$$

where the numerator is the hit probability and the sum in the denominator is the expected lifetime.

Example: estimating HD from an object's age. Monitor the distributions of $H$ and $L$ online, then estimate HD using conditional probability. By definition, an object of age $a$ was not requested at age $\le a$, so we can ignore all events before $a$. For a candidate of age $a$:

$$P[\text{hit} \mid \text{age } a] = \frac{\sum_{x=a}^{\infty} P[H=x]}{\sum_{x=a}^{\infty} P[L=x]}$$

$$E[L-a \mid \text{age } a] = \frac{\sum_{x=a}^{\infty} (x-a) \, P[L=x]}{\sum_{x=a}^{\infty} P[L=x]}$$
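
A minimal sketch of this age-conditioned estimator over empirical histograms of $H$ and $L$ (the array representation and truncated age range are my assumptions, not the paper's implementation):

```python
import numpy as np

def hit_density(age, size, hit_age_pdf, lifetime_pdf):
    """Estimate hit density for an object of the given age and size.

    hit_age_pdf[x]  ~ P[H = x]: probability of a hit at age x
    lifetime_pdf[x] ~ P[L = x]: probability of a hit or eviction at age x
    Both are empirical distributions monitored online, truncated at a
    maximum age.
    """
    # Condition on surviving to `age`: ignore all events before it.
    h_tail = hit_age_pdf[age:].sum()   # sum_{x>=a} P[H=x]
    l_tail = lifetime_pdf[age:].sum()  # sum_{x>=a} P[L=x]
    if l_tail == 0.0:
        return 0.0                     # no information; treat as worthless
    hit_prob = h_tail / l_tail         # P[hit | age a]
    xs = np.arange(age, len(lifetime_pdf))
    remaining = ((xs - age) * lifetime_pdf[age:]).sum() / l_tail  # E[L-a | age a]
    return hit_prob / (size * max(remaining, 1e-9))
```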

LHD by example. Users ask repeatedly for a set of common objects plus some user-specific objects; the common objects are more popular, the user-specific ones less popular. The best hand-tuned policy for this app: cache the common media, plus as much user-specific content as fits.

Probability of referencing an object again: the common objects are modeled as a scan, the user-specific objects as a Zipf distribution.

LHD by example: what's the hit density? For the common (scanned) objects, hit probability is high and older objects are closer to the scan's peak, so expected lifetime decreases with age: hit density is large and increasing. For the user-specific (Zipf) objects, hit probability is low and older objects are probably unpopular, so expected lifetime increases with age: hit density is small and decreasing.

LHD by example: policy summary. Since the common objects' hit density is large and increasing while the user-specific objects' is small and decreasing, LHD automatically implements the best hand-tuned policy: first protect the common media, then cache the most popular user content.

Improving LHD using additional object features. Conditional probability lets us easily add information: condition $H$ and $L$ on additional informative object features, e.g., which app requested this object, or how long this object has taken to hit in the past. Features inform decisions, so LHD learns the "right" policy with no hard-coded heuristics (see the sketch below).
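
One plausible shape for this conditioning, sketched under my own assumptions (per-app histogram pairs keyed by an `app_id`; the paper's feature set and data layout may differ):

```python
from collections import defaultdict
import numpy as np

MAX_AGE = 1 << 16  # truncated, coarsened age range (assumption)

class ClassStats:
    """Per-feature-class hit-age and lifetime event counts."""
    def __init__(self):
        self.hits = np.zeros(MAX_AGE)       # hits observed at each age
        self.evictions = np.zeros(MAX_AGE)  # evictions observed at each age

stats = defaultdict(ClassStats)

def record_hit(app_id, age):
    stats[app_id].hits[min(age, MAX_AGE - 1)] += 1

def record_eviction(app_id, age):
    stats[app_id].evictions[min(age, MAX_AGE - 1)] += 1

def pdfs(app_id):
    """Normalize one class's counts into P[H = x] and P[L = x]."""
    s = stats[app_id]
    total = max(s.hits.sum() + s.evictions.sum(), 1.0)
    hit_age_pdf = s.hits / total                   # P[H = x]
    lifetime_pdf = (s.hits + s.evictions) / total  # P[L = x]: hit or eviction
    return hit_age_pdf, lifetime_pdf
```

These per-class distributions plug directly into the age-conditioned estimator sketched earlier, giving each app (or other feature class) its own hit-density curve.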

LHD gets more hits than prior policies (in the accompanying figure, lower is better).

LHD gets more hits across many traces

LHD needs much less space

Why does LHD do better? Case study vs. AdaptSize [Berger et al., NSDI '17]. AdaptSize improves LRU by bypassing most large objects. LHD instead admits all objects, so it gets more hits from the biggest objects; and it evicts big objects quickly, so the smallest objects survive longer and get more hits too.

RankCache: Translating Theory to Practice

The problem: prior complex policies require complex data structures, and the resulting synchronization means poor scalability and unacceptable request throughput. Policies like GDSF require $O(\log N)$ heaps; even $O(1)$ LRU is sometimes too slow because of synchronization. Many key-value systems therefore approximate LRU with CLOCK or FIFO (MemC3 [Fan, NSDI '13], MICA [Lim, NSDI '14], …). Can LHD achieve request throughput similar to production systems?

RankCache makes LHD fast: track information approximately (e.g., coarsen ages); precompute hit density as a table indexed by age, app id, etc.; randomly sample objects to find a victim (similar to Redis, Memshare [Cidon, ATC '17], and [Psounis, INFOCOM '01]); and tolerate rare races in the eviction policy. A sketch of the sampled eviction path appears two slides below.

Making hits fast: metadata is updated locally, with no global data structure, giving the same scalability benefits that CLOCK and FIFO have over LRU.

Making evictions fast: on a miss, sample a handful of objects at random, look up each one's precomputed hit density, evict the one with the least, and repeat if a race is detected. There is no global synchronization, so scalability is great (even better than CLOCK/FIFO!).
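
A minimal single-threaded sketch of this eviction path (the dict-based cache, the sample size, and the `hd_table` keyed by coarsened age are all illustrative assumptions; a real implementation samples slab memory directly and handles concurrent updates):

```python
import random

SAMPLE_SIZE = 8  # candidates per eviction (assumption)

def coarsen(age, shift=4):
    """Bucket ages coarsely so the precomputed table stays small."""
    return age >> shift

def evict_one(cache, hd_table, now):
    """cache: key -> (size, last_access_time)
    hd_table: coarsened age bucket -> precomputed hit density per unit size
    """
    # Randomly sample candidate victims (a real system samples slab
    # memory directly rather than materializing the key list).
    candidates = random.sample(list(cache), min(SAMPLE_SIZE, len(cache)))

    def hd(key):
        size, last = cache[key]
        # Per-object hit density: table value divided by object size.
        return hd_table.get(coarsen(now - last), 0.0) / size

    victim = min(candidates, key=hd)
    # A concurrent implementation would retry here if another thread
    # raced us and already evicted or touched the victim.
    del cache[victim]
    return victim
```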

Memory management: many key-value caches use slab allocators (e.g., memcached), which bound fragmentation and are fast, but which lack a global eviction policy and thus get a poor hit ratio. RankCache's strategy is to balance the victim hit density across slab classes, similar to Cliffhanger [Cidon, NSDI '16] and GD-Wheel [Li, EuroSys '15]. With this balancing, slab classes incur negligible impact on hit rate (see the sketch below).
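
One plausible reading of "balance victim hit density across slab classes", sketched with a hypothetical `victim_hd` callback standing in for the sampling routine above:

```python
def pick_slab_class(slab_classes, victim_hd):
    """Choose which slab class to evict from.

    slab_classes: iterable of class ids (size buckets)
    victim_hd(c): hit density of the best eviction candidate in class c
    """
    # Always evicting from the class whose victim is cheapest drives the
    # classes' victim hit densities toward one another, approximating a
    # single global LHD policy on top of a slab allocator.
    return min(slab_classes, key=victim_hd)
```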

Throughput (an optimization there isn't time to cover in detail): GDSF and LRU don't scale, and CLOCK doesn't scale when there are even a few misses, because serial bottlenecks dominate. RankCache scales well with or without misses, giving LHD the best throughput.

Related work. Conditional probabilities have been used for eviction policies in CPU caches: EVA [Beckmann, HPCA '16, '17], which assumes fixed object sizes and uses a different ranking function. Prior replacement policies include, for key-value caches, Hyperbolic [Blankstein, ATC '17], policy simulation [Waldspurger, ATC '17], AdaptSize [Berger, NSDI '17], and Cliffhanger [Cidon, NSDI '16], and beyond key-value caches, ARC [Megiddo, FAST '03], SLRU [Karedla, Computer '94], and LRU-K [O'Neil, Sigmod '93]. These are heuristic based and require tuning or simulation.

Future directions. Dynamic latency/bandwidth optimization: smoothly and dynamically switch between optimizing hit ratio and byte-hit ratio. Optimizing end-to-end response latency: an app touches multiple objects per request, and once one such object is evicted, the others should be evicted too. Modeling cost, e.g., to maximize write endurance in flash/NVM: predict which objects are worth writing from memory to second-tier storage.

Thank you!