LHD: Improving Cache Hit Rate by Maximizing Hit Density


1 LHD: Improving Cache Hit Rate by Maximizing Hit Density
Nathan Beckmann (CMU), Haoxian Chen (U. Penn), Asaf Cidon (Stanford & Barracuda Networks). USENIX NSDI 2018.

2 Outline Background and motivation Least hit density (LHD) RankCache
Conclusion and future work

3 Outline Background and motivation Least hit density (LHD) RankCache
Conclusion and future work

4 Key-value cache A key-value cache is 100X faster than the database.
Web application servers typically send requests to the cache cluster over the network with latencies of about 100 µs, while querying the much slower database takes about 10 ms.

5 Key-value cache Key-value cache hit rate determines web application performance. At 98% cache hit rate, +1% hit rate --> 35% speedup (old latency: 374 μs, new latency: 278 μs) [Facebook study: Atikoglu, Sigmetrics ’12]. Even small hit rate improvements cause significant speedup.
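The arithmetic behind this sensitivity can be sketched with average latency as a weighted mix of hit and miss latency. This is a minimal sketch using the round-trip figures from the previous slide (100 µs for a cache hit, 10 ms for a database query); the Facebook study's exact latencies differ, but the effect is the same:

```python
def avg_latency_us(hit_rate, hit_us=100.0, miss_us=10_000.0):
    """Average request latency when hits go to the cache and misses to the database."""
    return hit_rate * hit_us + (1.0 - hit_rate) * miss_us

# Near a high hit rate, each extra percent removes a large share of the misses:
lat_98 = avg_latency_us(0.98)  # ~298 µs
lat_99 = avg_latency_us(0.99)  # ~199 µs
```

With these assumed numbers, going from 98% to 99% hit rate cuts average latency by roughly a third, in line with the slide's point that small hit-rate gains yield large speedups.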

6 Key-value cache Goal: Maximize cache hit rate.
Constraint: Limited cache space. Uncertainty: In practice, we don’t know what is accessed when. Difficulty: Objects have variable sizes.

7 Eviction policy Eviction policies show up in many contexts, e.g., OS page management, database buffer management, web proxies, and processors. Least recently used (LRU) is widely used. It uses only recency, i.e., how long it has been since an object was last referenced, to decide which object to evict. However, LRU has common pathologies that hurt its performance: it assumes that recently used objects are always more valuable, but common access patterns like repeated scans (e.g., AB...Z AB...Z, touching each object from A to Z in order) violate this assumption. For such scans, most recently used (MRU) is the correct policy to adopt.
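The scan pathology is easy to reproduce. A minimal sketch (assuming unit-sized objects and a toy single-threaded cache, not the paper's implementation) compares LRU and MRU eviction on a repeated A–Z scan that does not fit in the cache:

```python
from collections import OrderedDict

def simulate(trace, capacity, evict_mru=False):
    """Count hits for a toy cache of `capacity` unit-sized objects."""
    cache = OrderedDict()  # keys kept in recency order, oldest first
    hits = 0
    for key in trace:
        if key in cache:
            hits += 1
            cache.move_to_end(key)             # mark as most recently used
        else:
            if len(cache) >= capacity:
                # last=True pops the newest entry (MRU); last=False pops the oldest (LRU)
                cache.popitem(last=evict_mru)
            cache[key] = True
    return hits

# Repeated scan over 26 objects with room for only 10 of them
trace = [chr(c) for c in range(ord('A'), ord('Z') + 1)] * 10
lru_hits = simulate(trace, capacity=10)                  # LRU: every access misses
mru_hits = simulate(trace, capacity=10, evict_mru=True)  # MRU: retains a stable subset
```

Under LRU, each object is evicted just before it is needed again, so the scan gets zero hits; MRU pins most of the cache and hits repeatedly on the objects it retains.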

8 Eviction policy Choosing the right eviction policy is hard.
Key-value caches have unique challenges: variable object sizes and variable workloads. Prior policies are heuristics that combine recency and frequency, with no theoretical foundation. They require hand-tuning --> fragile to workload changes. No policy works for all workloads. A prior system simulates many cache policy configurations to find the right one per workload [Waldspurger, ATC ‘17].

9 GOAL: Auto-tuning Eviction Policy Across Workloads

10 Outline Background and motivation Least hit density (LHD) RankCache
Conclusion and future work

11 Cache space The figure shows how cache space is used over time.
Throughout this paper, time is measured in accesses, not wall-clock time. Each block represents an object; cache space is shown vertically, and time increases from left to right.

12 The Key Idea: Hit Density

13 Hit density (HD) Cache space is shown vertically, and time increases from left to right; each block thus represents a single object lifetime. Each block is colored green or red, indicating whether it ends in a hit or an eviction, respectively. The goal is to fit as many green blocks into the figure as possible. Each object consumes resources proportional to both its size (block height) and the time it spends in the cache (block width), so its cost is proportional to size and time spent in cache. Hence, the replacement policy wants to keep small objects that hit quickly.

14 Hit density (HD) Hit density combines hit probability and expected cost: HD = P / (S × L), where P is the object's hit probability, S its size, and L its expected lifetime. Hit density measures an object’s contribution to the cache’s hit rate, in units of hits per byte-access. The least hit density (LHD) policy evicts the object with the smallest hit density. To rank objects, LHD must compute their hit probability and the expected time they will spend in the cache. (We assume that an object’s size is known and does not change.)
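As a sketch, the eviction rule on this slide is just an argmin over predicted hit densities; the candidate tuples below are hypothetical values, not measurements from the paper:

```python
def hit_density(hit_prob, size, expected_lifetime):
    """HD = P / (S * L): hits contributed per byte of space per access of time."""
    return hit_prob / (size * expected_lifetime)

def choose_victim(candidates):
    """LHD: evict the object with the smallest predicted hit density."""
    return min(candidates, key=lambda k: hit_density(*candidates[k]))

# key -> (hit probability, size in bytes, expected lifetime in accesses)
candidates = {"small-hot": (0.9, 100, 50), "large-cold": (0.1, 1000, 200)}
victim = choose_victim(candidates)  # "large-cold"
```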

15 Hit density (HD) Age is the number of accesses since an object was last referenced; each reference resets its age to zero. For example, if an object enters the cache at access T, hits at accesses T + 4 and T + 6, and is evicted at access T + 12, then it has hits at ages 4 and 2 and is evicted at age 6.
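The slide's example can be checked mechanically. A small helper (hypothetical, not from the paper) converts absolute access times into ages, resetting the age at each reference:

```python
def event_ages(enter_time, event_times):
    """Ages of successive hits (and the final eviction); each reference resets age to 0."""
    last, ages = enter_time, []
    for t in event_times:
        ages.append(t - last)
        last = t
    return ages

# Enters at T = 0, hits at T+4 and T+6, evicted at T+12:
ages = event_ages(0, [4, 6, 12])  # hits at ages 4 and 2, eviction at age 6
```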

16 LHD by example Consider an example application that scans repeatedly over a few objects and accesses many other objects with a Zipf-like popularity distribution. In this case, each scanned object is accessed frequently, whereas each Zipf-like object is accessed much less frequently. The cache should therefore ideally keep the scanned objects and evict the Zipf-like objects when necessary.

17

18 LHD by example This figure ("Summary of access pattern") illustrates the application’s access pattern, namely the distribution of time (measured in accesses) between references to the same object. Scanned objects produce a characteristic peak around a single reference time, since they are all accessed together. Zipf-like objects yield a long tail of reference times. Note that in this example 70% of references go to Zipf-like objects and 30% to scanned objects, but the long tail of popularity among Zipf-like objects gives each one a low reference probability in this figure.

19 LHD by example This figure shows the distribution of hits and evictions vs. an object’s age under LHD on this example application. LHD keeps the scanned objects and the popular Zipf-like references, as desired. (Figure: distribution of hits and evictions.)

20 LHD by example This figure plots the predicted hit density for objects of different ages. Hit density is high up until the scanning peak, because LHD predicts that an object may be one of the scanned objects and might hit quickly. It drops after the scanning peak, because LHD learns these are Zipf-like objects, unlikely to hit quickly. (Figure: predicted hit density.)

21 Hit density (HD) Hit density combines hit probability and expected cost: HD = P / (S × L), where P is the object's hit probability, S its size, and L its expected lifetime. The least hit density (LHD) policy evicts the object with the smallest hit density. LHD’s challenge is to predict an object’s hit density without knowing whether the object will result in a hit or an eviction, nor how long it will spend in the cache. To rank objects, LHD must compute their hit probability and the expected time they will spend in the cache. (We assume that an object’s size is known and does not change.)

22 Hit density (HD) LHD infers these quantities in real time using probability distributions; specifically, LHD uses distributions of hit age and eviction age. Random variables: H – hit age (e.g., P[H = 100] is the probability that an object hits 100 accesses after its last reference); L – lifetime (e.g., P[L = 100] is the probability that an object hits or is evicted 100 accesses after its last reference). HD is easy to estimate from these quantities: HD = (Σ_a P[H = a]) / (S × Σ_a a · P[L = a]). Age is the number of accesses since an object was last referenced.

23 Hit density (HD) Estimate HD using conditional probability.
Monitor the distributions of H and L online. By definition, an object of age a wasn’t requested at any age ≤ a --> ignore all events before a. Conditioning on age a:
Hit probability = P[H > a] / P[L > a] = (Σ_{x>a} P[H = x]) / P[L > a]
Expected remaining lifetime = E[L − a | L > a] = (Σ_{x>a} (x − a) · P[L = x]) / P[L > a]
The P[L > a] terms cancel in the ratio, giving HD(a) = (Σ_{x>a} P[H = x]) / (S × Σ_{x>a} (x − a) · P[L = x]). Here the random variables H and L give hit and lifetime age; that is, P[H = a] is the probability that an object hits at age a, and P[L = a] is the probability that an object hits or is evicted at age a. Since an object of age a has reached age a, it cannot hit or be evicted at any earlier age.
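A sketch of this conditional estimate, assuming the monitored distributions are kept as dense histograms indexed by age (since the P[L > a] terms cancel, the code works with the unnormalized tail sums):

```python
def hit_density_at_age(age, hist_hit, hist_life, size):
    """Estimate HD for an object of the given age.

    hist_hit[x] ~ P[H = x] and hist_life[x] ~ P[L = x]; events at ages
    <= `age` are impossible for this object, so they are ignored.
    """
    max_age = len(hist_life)
    hit_mass = sum(hist_hit[x] for x in range(age + 1, max_age))
    life_mass = sum((x - age) * hist_life[x] for x in range(age + 1, max_age))
    return hit_mass / (size * life_mass) if life_mass else 0.0

# Toy distributions: hits at ages 2 and 3; evictions only at age 3
hist_hit = [0.0, 0.0, 0.2, 0.3]
hist_life = [0.0, 0.0, 0.2, 0.8]
```

An object that survives past age 2 is re-ranked using only the tails of both distributions, so its predicted hit density changes as it ages.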

24 Improving predictions
To incorporate reference frequency, they count how many times each object has been referenced and gather separate hit and eviction age distributions for each reference count. To generalize, an object belongs to an equivalence class c; e.g., c could be all objects that have been referenced twice. LHD predicts this object’s hit density as:
HD(a, c) = (Σ_{x>a} P[H = x | c]) / (S × Σ_{x>a} (x − a) · P[L = x | c])
where P[H = a | c] and P[L = a | c] are the conditional hit and lifetime age distributions for class c. Using classification to improve predictions.

25 LHD LHD gets more hits than prior policies.
Fig. 4 shows the miss ratio (%) across many cache sizes (GB) for LHD, LRU, and three prior policies: GDSF [4, 16], AdaptSize [9], and Hyperbolic [11]. GDSF and Hyperbolic use different ranking functions based on object recency, frequency, and size (e.g., Eq. 1). AdaptSize probabilistically admits objects to an LRU cache to avoid polluting the cache with large objects. Based on these results, LHD is configured to classify by last hit age (16 classes) and application id (16 classes).

26 LHD LHD needs much less space, across different traces.
They evaluate 512M requests from each trace, ignoring the first 128M to warm up the cache. For shorter traces, the trace is replayed if it terminates, to equalize trace length across results.

27 Outline Background and motivation Least hit density (LHD) RankCache
Conclusion and future work

28 RankCache: Translating Theory to Practice

29 The problem Prior complex policies require complex data structures.
Inter-thread synchronization --> poor scalability --> unacceptable request throughput. Policies like GDSF require O(log N) heaps. Even O(1) LRU is sometimes too slow because of synchronization: since naïve LRU lists require global synchronization, most key-value caches in fact use approximations of LRU, like CLOCK and FIFO, that eliminate synchronization except during evictions (MemC3 [Fan, NSDI ‘13], MICA [Lim, NSDI ‘14]...). Can LHD achieve request throughput similar to production systems?

30 RankCache makes LHD fast
Track information approximately (e.g., coarsen ages). Precompute HD as a table indexed by age, app id, etc. Randomly sample objects to find a victim. Similar to Redis, Memshare [Cidon, ATC ‘17], [Psounis, INFOCOM ’01].
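A sketch of the eviction path these bullets describe, with a hypothetical table layout: hit densities precomputed per (class, coarsened age bucket), and a small random sample ranked per eviction. Names and the bucket width are illustrative, not RankCache's actual data structures:

```python
import random

def evict_by_sampling(cache, hd_table, age_bucket=16, num_samples=64):
    """Pick a victim from a random sample using a precomputed HD table.

    cache: key -> (class_id, age, size).
    hd_table: (class_id, age // age_bucket) -> predicted hits per access;
    dividing by size yields hits per byte-access.
    """
    sample = random.sample(list(cache), min(num_samples, len(cache)))

    def density(key):
        cls, age, size = cache[key]
        return hd_table[(cls, age // age_bucket)] / size

    return min(sample, key=density)

cache = {"a": (0, 5, 10), "b": (0, 40, 10), "c": (1, 5, 1000)}
hd_table = {(0, 0): 0.5, (0, 2): 0.01, (1, 0): 0.5}
victim = evict_by_sampling(cache, hd_table)  # "c": large, so lowest density
```

Because each eviction touches only the sampled keys and a read-only table, threads need no global ordering structure; this is what removes the synchronization that serializes LRU lists and GDSF heaps.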

31 Making evictions fast It would be very expensive to compute Eq. 5 for all objects on every cache miss. Instead, two key techniques make LHD practical: (i) precomputation and (ii) sampling. No global synchronization --> great scalability!

32 Making hits fast Metadata updated locally --> no global data structure Same scalability benefits as CLOCK, FIFO vs. LRU

33 Throughput They present throughput at 90% and 100% hit ratio, where the former represents a realistic deployment and the latter peak performance. RankCache scales nearly as well as random eviction because sampling avoids nearly all synchronization, whereas LRU and GDSF barely scale because they serialize all operations; serial bottlenecks dominate their throughput. Similarly, CLOCK performs well at 100% hit ratio, but it serializes evictions and underperforms RankCache with tags at a 10% miss ratio. Finally, using tags lowers throughput at a 100% hit ratio, but improves performance at a 10% miss ratio. (a) 90% hit ratio. (b) 100% hit ratio.

34 Outline Background and motivation Least hit density (LHD) RankCache
Conclusion and future work

35 Conclusion The authors introduce hit density, an intuitive, workload-agnostic metric for ranking objects during eviction. Dynamic ranking enables LHD to adapt its eviction strategy to different application workloads over time without any hand tuning. RankCache is a key-value cache based on memcached that efficiently implements LHD (and other policies). On average, LHD needs 8× less space than LRU , 2.4× less than GDSF and 2.9× less than AdaptSize. Finally, at 16 threads, RankCache achieves 16× higher throughput than list-based LRU and, at 90% hit rate, 2× higher throughput than CLOCK .

36 Future directions Dynamic latency / bandwidth optimization:
smoothly and dynamically switch between optimizing hit ratio and byte-hit ratio. Optimizing end-to-end response latency: an app touches multiple objects per request; if one such object is evicted, the others should be evicted too. Modeling cost, e.g., to maximize write endurance in flash / NVM: predict which objects are worth writing from memory to second-tier storage.

37 Thanks & questions

38 Related Work Using conditional probabilities for eviction policies in CPU caches: EVA [Beckmann, HPCA ‘16, ’17] (fixed object sizes, different ranking function). Prior replacement policies: key-value: Hyperbolic [Blankstein, ATC ‘17], Simulations [Waldspurger, ATC ‘17], AdaptSize [Berger, NSDI ‘17], Cliffhanger [Cidon, NSDI ‘16]...; non key-value: ARC [Megiddo, FAST ’03], SLRU [Karedla, Computer ‘94], LRU-K [O’Neil, Sigmod ‘93]... These are heuristic-based and require tuning or simulation.

39 LHD Case study vs. AdaptSize [Berger et al, NSDI ’17]. Why does LHD do better?
AdaptSize improves LRU by bypassing most large objects. LHD admits all objects --> more hits from big objects. LHD evicts big objects quickly --> small objects survive longer --> more hits. (a) AdaptSize. (b) LHD.

40 Memory management Many key-value caches use slab allocators (e.g., memcached): bounded fragmentation and fast, but no global eviction policy --> poor hit ratio. Strategy: balance victim hit density across slab classes. Similar to Cliffhanger [Cidon, NSDI ’16] and GDWheel [Li, EuroSys ’15]. Slab classes incur negligible impact on hit rate.
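The balancing strategy can be sketched as comparing the cheapest victim in each slab class and reclaiming from the class whose victim is least valuable. This is a hypothetical helper, assuming each class exposes the hit density of its lowest-ranked object:

```python
def slab_class_to_reclaim(victim_hd):
    """Approximate a global LHD ranking across slab classes.

    victim_hd: slab class (chunk size in bytes) -> hit density of that
    class's cheapest victim. Reclaim a chunk from the class whose victim
    has the lowest hit density.
    """
    return min(victim_hd, key=victim_hd.get)

# The 128-byte class holds the least valuable victim, so it gives up memory
chosen = slab_class_to_reclaim({64: 0.01, 128: 0.002, 1024: 0.5})  # 128
```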

