Practical LFU implementation for Web Caching
George Karakostas (Telcordia), Dimitrios N. Serpanos (University of Patras)
A simple caching environment
Basic assumptions:
1. The number of all Web pages N is known (only the order of magnitude matters).
2. The system is closed (yeah, right… but we won't care).
3. The requests for Web pages follow Zipf's Law (plenty of experimental evidence).
4. The requests are statistically independent (a very strong assumption, counterintuitive?).
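Zipf-like distributions: More generally, the $i$-th most popular page is requested with probability $p_i \propto 1/i^{\alpha}$, where $\alpha$ is a constant between 0 and 1, depending on the particular request stream.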
Popularities according to Zipf's Law, where $\alpha = 1$.
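To make the popularity model concrete, here is a minimal sketch that computes Zipf-like popularities and the hit rate obtained by always holding the C most popular pages, which is the target Perfect-LFU aims for under assumptions 3 and 4. The function names `zipf_popularities` and `ideal_hit_rate` are illustrative and not from the slides.

```python
def zipf_popularities(n, alpha=1.0):
    """Return p_1..p_n with p_i proportional to 1 / i**alpha, normalised to sum to 1."""
    weights = [1.0 / (i ** alpha) for i in range(1, n + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def ideal_hit_rate(popularities, cache_size):
    """Hit rate under independent requests when the cache holds the
    cache_size most popular pages: the sum of their popularities."""
    return sum(sorted(popularities, reverse=True)[:cache_size])


if __name__ == "__main__":
    p = zipf_popularities(1000, alpha=1.0)
    print(ideal_hit_rate(p, 10))
```

With α = 1 the popularity of page i is inversely proportional to its rank, so a small cache already covers a large share of the requests.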
Our motivation: Serpanos & Wolf prove analytically the optimality of Perfect-LFU under assumptions 3 and 4. Breslau et al. studied the implications of assumptions 3 and 4; they give evidence for a Zipf-like distribution of page requests, and for the optimality of Perfect-LFU as a cache replacement policy. But, if so...
Why don't people use Perfect-LFU? Answer: because it is 'Perfect' (i.e. impractical). Perfect-LFU needs to store statistics for all the pages requested since the beginning of cache operation. Hence the resources (time/space) needed are of order N.
Our contribution: We show that under assumptions 1-4 we can efficiently approximate the Perfect-LFU hit rate within any constant ε.
Chernoff bounds: Theorem [Chernoff]: The sum of R i.i.d. random variables is close to its expected value with very high probability.
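For reference, one standard multiplicative form of the Chernoff bound is given below. It is not necessarily the exact statement used on the original slide; the symbols $X_j$, $X$, $\mu$ and $\delta$ are introduced here for illustration.

```latex
Let $X_1, \dots, X_R$ be independent random variables taking values in $\{0, 1\}$,
let $X = \sum_{j=1}^{R} X_j$, and let $\mu = \mathbb{E}[X]$.
Then for every $0 < \delta \le 1$,
\[
  \Pr\bigl[\, |X - \mu| \ge \delta \mu \,\bigr] \;\le\; 2 \exp\!\left( -\frac{\delta^{2} \mu}{3} \right).
\]
```

If $X_j$ indicates that the $j$-th request is for a fixed page with popularity $p$, then $\mu = R\,p$, so the failure probability decays exponentially in $R\,p$.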
Observation 1: Under our assumptions, the number of requests for a page in a random trace is close to its expected value, i.e. proportional to its popularity. Observation 2: With a small R we can distinguish the most popular objects.
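The two observations can be checked empirically with a small simulation. This is a sketch under assumptions 3 and 4 (independent requests drawn from a Zipf distribution); the values N = 100 and R = 20000, the seed, and the helper names are arbitrary choices made for illustration.

```python
import random
from collections import Counter


def zipf_popularities(n, alpha=1.0):
    """Return p_1..p_n with p_i proportional to 1 / i**alpha."""
    weights = [1.0 / (i ** alpha) for i in range(1, n + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def sample_counts(popularities, num_requests, seed=0):
    """Draw num_requests independent requests and count them per page.
    Pages are numbered from 0, so page 0 is the most popular one."""
    rng = random.Random(seed)
    pages = range(len(popularities))
    trace = rng.choices(pages, weights=popularities, k=num_requests)
    return Counter(trace)


popularities = zipf_popularities(100)
counts = sample_counts(popularities, 20000)
for page in range(3):
    expected = 20000 * popularities[page]
    # Observed count next to its expectation R * p_i.
    print(page + 1, counts[page], round(expected))
```

The observed counts should sit close to R·p_i for the popular pages, which is what allows a modest sample to rank them correctly.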
Window-LFU: A simple variation of Perfect-LFU. Instead of keeping statistics for all pages, keep them only for a sample of the request stream (called the window) of size |W|, which depends on the cache size C and the error parameter ε. Cache the C most frequent pages in the sample.
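The policy described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the class name `WindowLFU` is hypothetical, the window size is passed in as a plain parameter because its formula in terms of C and ε is not reproduced here, and the window is taken to be the most recent requests.

```python
from collections import Counter, deque


class WindowLFU:
    """Keep request counts only for the last window_size requests and
    cache the cache_size most frequent pages in that window."""

    def __init__(self, cache_size, window_size):
        self.cache_size = cache_size
        self.window_size = window_size  # assumption: supplied by the caller
        self.window = deque()           # most recent requests, oldest first
        self.counts = Counter()         # page -> occurrences inside the window
        self.cache = set()

    def request(self, page):
        """Process one request; return True on a cache hit."""
        hit = page in self.cache
        self.window.append(page)
        self.counts[page] += 1
        if len(self.window) > self.window_size:
            # Slide the window: forget the oldest request.
            old = self.window.popleft()
            self.counts[old] -= 1
            if self.counts[old] == 0:
                del self.counts[old]
        # Recomputed from scratch for clarity, not for efficiency.
        top = self.counts.most_common(self.cache_size)
        self.cache = {p for p, _ in top}
        return hit


if __name__ == "__main__":
    lfu = WindowLFU(cache_size=2, window_size=6)
    print([lfu.request(p) for p in "aababc"], sorted(lfu.cache))
```

The space used is proportional to the window size rather than to N, which is the point of the variation. A production version would update the cache incrementally instead of re-ranking all counted pages on every request.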
Theorem: Under our assumptions, Window-LFU approximates the Perfect-LFU hit rate within ε.
Window placement: Observation: Under our assumptions, any sample of size |W| will achieve the Perfect-LFU hit rate.
[Figure labels: New request, Request stream, CACHE]
Locality: There are two different types of locality phenomena: temporal locality and popularity. Our window will be the |W| most recent requests, to take advantage of temporal locality as well.
Simulation results
Conclusions / Open problems: Window-LFU is an efficient implementation of LFU. It takes advantage of the different types of locality to achieve better performance in practice than Perfect-LFU. How can we determine the window size dynamically? (A simple doubling heuristic performs very well.) How can we detect that the Zipf-like distribution parameters (N, α) have changed?