Practical LFU implementation for Web Caching. George Karakostas, Telcordia; Dimitrios N. Serpanos, University of Patras.

A simple caching environment

Basic assumptions.
1. The number of all Web pages N is known (only the order of magnitude matters).
2. The system is closed (yeah, right… but we won't care).
3. The requests for Web pages follow Zipf's law (plenty of experimental evidence).
4. The requests are statistically independent (a very strong assumption; counterintuitive(?)).

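Zipf-like distributions. More generally, page i (ranked by popularity) is requested with probability $p_i = \Omega / i^{\alpha}$, where $\Omega = \bigl(\sum_{j=1}^{N} j^{-\alpha}\bigr)^{-1}$ normalizes the popularities of the N pages and $\alpha$ is a constant whose value depends on the particular request stream.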
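A minimal Python sketch of the Zipf-like popularity law, assuming pages are ranked 1..N by popularity. The function name `zipf_popularities` is hypothetical.

```python
def zipf_popularities(n, alpha):
    """Return [p_1, ..., p_n] with p_i = omega / i**alpha (Zipf-like law)."""
    weights = [1.0 / i**alpha for i in range(1, n + 1)]
    # omega normalizes the weights so the popularities sum to 1
    omega = 1.0 / sum(weights)
    return [omega * w for w in weights]


if __name__ == "__main__":
    p = zipf_popularities(1000, 1.0)
    # With alpha = 1, page 1 is exactly twice as popular as page 2
    print(p[0] / p[1])  # prints 2.0
```

Smaller values of alpha flatten the distribution, so the most popular pages carry a smaller share of the requests.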

Popularities according to Zipf's law, where α = 1 (so that $p_i = \Omega / i$).

Our motivation. Serpanos & Wolf prove analytically that Perfect-LFU is optimal under assumptions 3 and 4. Breslau et al. studied the implications of assumptions 3 and 4, giving evidence for a Zipf-like distribution of page requests and for the optimality of Perfect-LFU as a cache replacement policy. But, if so...

Why don't people use Perfect-LFU? Answer: because it is ‘Perfect’ (i.e., impractical). Perfect-LFU must store statistics for every page requested since the cache began operating, so the resources (time and space) it needs are of order N.
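To make the cost concrete, here is a hedged Python sketch of Perfect-LFU. The class name and the tie rule (a requested page whose count equals that of the least popular cached page replaces it) are assumptions, not taken from the slides. The point is the `counts` table: it keeps an entry for every page ever requested, even after eviction, so it can grow to N entries.

```python
from collections import Counter


class PerfectLFU:
    """Perfect-LFU sketch: all-time counts for every page ever requested."""

    def __init__(self, cache_size):
        self.cache_size = cache_size  # assumed to be at least 1
        self.counts = Counter()       # all-time request counts, up to N entries
        self.cache = set()

    def request(self, page):
        """Process one request; return True on a hit."""
        hit = page in self.cache
        self.counts[page] += 1
        if not hit:
            if len(self.cache) < self.cache_size:
                self.cache.add(page)
            else:
                # Keep the C most frequent pages: replace the least frequent
                # cached page if the requested page is at least as frequent.
                victim = min(self.cache, key=lambda q: self.counts[q])
                if self.counts[page] >= self.counts[victim]:
                    self.cache.remove(victim)
                    self.cache.add(page)
        return hit


if __name__ == "__main__":
    lfu = PerfectLFU(cache_size=1)
    print([lfu.request(x) for x in "aaabbb"])
    # prints [False, True, True, False, False, False]
```

With a one-page cache, b displaces a only on its third request, once its all-time count catches up with a's.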

Our contribution. We show that under assumptions 1-4 we can efficiently approximate the Perfect-LFU hit rate to within any constant ε.

Chernoff bounds. Theorem [Chernoff]: The sum of R i.i.d. bounded (e.g., 0/1-valued) random variables is close to its expected value with very high probability.
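The exact bound on the slide did not survive extraction. One standard form, for a sum of independent 0/1 variables, is given below. Taking $X_t = 1$ exactly when the t-th request is for page i gives $\mu = R\,p_i$, which is what connects the bound to Observation 1.

```latex
\Pr\bigl[\,|X - \mu| \ge \delta\mu\,\bigr] \le 2\exp\!\left(-\frac{\delta^{2}\mu}{3}\right),
\qquad X = \sum_{t=1}^{R} X_t,\quad X_t \in \{0,1\},\quad \mu = \mathbb{E}[X],\quad 0 < \delta < 1.
```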

Observation 1: Under our assumptions, the number of requests for a page in a random trace is close to its expected value, i.e., proportional to its popularity.
Observation 2: With a small R, we can distinguish the most popular objects.
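A small Python experiment illustrating both observations under assumptions 3 and 4: draw R independent requests from a Zipf-like distribution, then compare per-page counts with their expectations. The helper names, N = 1000, α = 1 and R = 50,000 are illustrative choices, not values from the paper.

```python
import random
from collections import Counter


def zipf_popularities(n, alpha):
    # p_i = omega / i**alpha, normalized so the popularities sum to 1
    weights = [1.0 / i**alpha for i in range(1, n + 1)]
    omega = 1.0 / sum(weights)
    return [omega * w for w in weights]


def sample_counts(popularity, r, seed=0):
    """Draw r independent requests (assumption 4) and count them per page."""
    rng = random.Random(seed)
    pages = range(1, len(popularity) + 1)  # page i has popularity[i - 1]
    return Counter(rng.choices(pages, weights=popularity, k=r))


if __name__ == "__main__":
    r = 50_000
    p = zipf_popularities(1000, 1.0)
    counts = sample_counts(p, r)
    # Observation 1: each count is close to its expectation r * p_i
    for page in (1, 2, 10):
        print(page, counts[page], round(r * p[page - 1]))
    # Observation 2: the most frequent pages in the sample are the most popular
    print(sorted(page for page, _ in counts.most_common(3)))
```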

Window-LFU. A simple variation of Perfect-LFU: instead of keeping statistics for all pages, keep them only for a sample of the request stream (called the window) of size |W|, which depends on C, the cache size, and ε, the error parameter. Cache the C most frequent pages in the sample.
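A minimal Window-LFU sketch in Python. The slide's formula for |W| in terms of C and ε is not reproduced here, so `window_size` is simply a parameter. The window is taken to be the |W| most recent requests, as the Locality slide proposes. The admission rule (replace the least frequent cached page when the requested page is at least as frequent within the window) is an assumption about how "cache the C most frequent pages" could be maintained one request at a time.

```python
from collections import Counter, deque


class WindowLFU:
    """Window-LFU sketch: counts cover only the window_size most recent
    requests, so the statistics table has at most window_size entries."""

    def __init__(self, cache_size, window_size):
        self.cache_size = cache_size  # assumed to be at least 1
        self.window_size = window_size
        self.window = deque()         # most recent requests, oldest first
        self.counts = Counter()       # per-page counts inside the window
        self.cache = set()

    def request(self, page):
        """Process one request; return True on a hit."""
        hit = page in self.cache
        # Slide the window: add the new request and forget the oldest one.
        self.window.append(page)
        self.counts[page] += 1
        if len(self.window) > self.window_size:
            old = self.window.popleft()
            self.counts[old] -= 1
            if self.counts[old] == 0:
                del self.counts[old]
        if not hit:
            if len(self.cache) < self.cache_size:
                self.cache.add(page)
            else:
                # Pages that left the window count as 0 (Counter default).
                victim = min(self.cache, key=lambda q: self.counts[q])
                if self.counts[page] >= self.counts[victim]:
                    self.cache.remove(victim)
                    self.cache.add(page)
        return hit


if __name__ == "__main__":
    w = WindowLFU(cache_size=2, window_size=4)
    print([w.request(x) for x in "aabaccc"])
    # prints [False, True, False, True, False, True, True]
```

The memory saving is the `counts` table: it is bounded by `window_size` rather than growing toward N as in Perfect-LFU.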

Theorem: Under our assumptions,

Window placement. Observation: Under our assumptions, any sample of size |W| achieves the Perfect-LFU hit rate to within ε.
[Figure: request stream, new request, cache]
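The observation can be checked numerically under assumptions 3 and 4. With independent requests, a fixed cache content hits with probability equal to the total popularity of its pages, and the best content, which Perfect-LFU converges to, is the C most popular pages. This sketch compares that ideal with the top-C pages of a random sample of w requests. The helper names and the values w = 20,000, C = 10, N = 1000 and α = 1 are illustrative, not the paper's |W|.

```python
import random
from collections import Counter


def zipf_popularities(n, alpha):
    # p_i = omega / i**alpha, normalized so the popularities sum to 1
    weights = [1.0 / i**alpha for i in range(1, n + 1)]
    omega = 1.0 / sum(weights)
    return [omega * w for w in weights]


def hit_rate(cached_pages, popularity):
    """Hit rate of a fixed cache content under independent requests:
    the total popularity of the cached pages (pages numbered from 1)."""
    return sum(popularity[page - 1] for page in cached_pages)


def top_c_from_sample(popularity, c, w, seed=0):
    """Return the c most frequent pages in a sample of w independent requests."""
    rng = random.Random(seed)
    sample = rng.choices(range(1, len(popularity) + 1), weights=popularity, k=w)
    return [page for page, _ in Counter(sample).most_common(c)]


if __name__ == "__main__":
    p = zipf_popularities(1000, 1.0)
    c = 10
    ideal = hit_rate(range(1, c + 1), p)  # cache the c truly most popular pages
    for seed in range(3):
        sampled = hit_rate(top_c_from_sample(p, c, 20_000, seed), p)
        print(f"seed {seed}: ideal {ideal:.4f}, from sample {sampled:.4f}")
```

Any gap comes from near-ties between pages ranked around C, which contribute almost equally to the hit rate anyway.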

Locality. There are two different types of locality phenomena: temporal locality and popularity. Our window will be the |W| most recent requests, to take advantage of temporal locality as well.
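A toy trace, not the paper's simulation, shows why a window of recent requests helps with temporal locality: page a is hot for 50 requests, then page b is hot for 50. The function `lfu_hits` is hypothetical. It uses an assumed admission rule (on a miss, the requested page replaces the least frequent cached page if it is at least as frequent), and `window_size=None` stands for Perfect-LFU's all-time counts.

```python
from collections import Counter, deque


def lfu_hits(trace, cache_size, window_size=None):
    """Count the hits of LFU replacement on a trace.

    window_size=None keeps all-time counts (Perfect-LFU); otherwise only the
    window_size most recent requests are counted (a sliding window)."""
    counts, window, cache, hits = Counter(), deque(), set(), 0
    for page in trace:
        hit = page in cache
        hits += hit
        counts[page] += 1
        if window_size is not None:
            window.append(page)
            if len(window) > window_size:
                old = window.popleft()
                counts[old] -= 1
                if counts[old] == 0:
                    del counts[old]
        if not hit:
            if len(cache) < cache_size:
                cache.add(page)
            else:
                # Assumed rule: evict the least frequent cached page if the
                # requested page is at least as frequent.
                victim = min(cache, key=lambda q: counts[q])
                if counts[page] >= counts[victim]:
                    cache.remove(victim)
                    cache.add(page)
    return hits


if __name__ == "__main__":
    trace = ["a"] * 50 + ["b"] * 50  # page a is hot, then page b
    print(lfu_hits(trace, cache_size=1))                  # prints 49
    print(lfu_hits(trace, cache_size=1, window_size=10))  # prints 94
```

With a one-page cache, all-time counts keep a cached for the whole second phase (b catches up only on its last request, which is still a miss), while a 10-request window lets b in after its fifth request.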

Simulation results

Conclusions / Open problems. Window-LFU is an efficient implementation of LFU. It takes advantage of the different types of locality to achieve, in practice, better performance than Perfect-LFU. Open problems: How can we determine the window size dynamically? (A simple doubling heuristic performs very well.) How can we detect that the Zipf-like distribution parameters (N, α) have changed?