Slide 1: Probabilistic Models for Web Caching
David Starobinski, David Tse, UC Berkeley
Conference and Workshop on Stochastic Networks, Madison, Wisconsin, June 2000
Slide 2: Overview
– Web caching goals
– Caching levels
– Classical caching algorithms and the Independent Reference (IR) model
– Web caching issues
– New algorithms and analysis for Web caches
– Discussion
Slide 3: Web Caching Goals
– Reduce response latency
– Reduce bandwidth consumption
– Reduce server load
– Exploit the locality of reference
Slide 4: Web Caching Levels
[Figure: clients connected to servers across the Internet, with caches at three levels – browser cache (client side), proxy cache, and reverse proxy (server side)]
Slide 5: Caching: Performance
– Cache buffers have finite capacity
– Goal: maximize the proportion of requests served by the cache (the hit ratio)
– Need to devise algorithms that keep the "hot" documents in the cache
Slide 6: Caching Algorithms
– LRU
– FIFO
– CLIMB (Transpose)
Slide 7: LRU (Least Recently Used)
The buffer is arranged as a stack.
[Figure: cache stack holding pages 1–5; page 5 is requested]
Slide 8: LRU (ii)
[Figure: stack state after the request for page 5]
Slide 9: LRU (iii)
[Figure: page 3 is requested]
Slide 10: LRU (iv)
[Figure: stack state after the request for page 3]
Slide 11: CLIMB (Transpose)
[Figure: cache stack holding pages 1–5]
Slide 12: CLIMB (ii)
[Figure: after a request for page 3, it swaps positions with the page directly above it, giving the order 1 3 2 4 5]
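The update rules illustrated on slides 7–12 are easy to state in code. Below is a minimal sketch (not from the slides): the cache is an ordered list with position 0 as the top of the stack, and the convention that CLIMB inserts a missed page at the bottom is an assumption.

```python
def lru_update(cache, page, capacity):
    """LRU: move the requested page to the top; on a miss with a full cache, evict the bottom page."""
    if page in cache:
        cache.remove(page)            # hit: pull the page out of its current slot
    elif len(cache) >= capacity:
        cache.pop()                   # miss with a full cache: evict the least recently used page
    cache.insert(0, page)             # the requested page goes to the top of the stack

def climb_update(cache, page, capacity):
    """CLIMB (transpose): on a hit, swap the requested page with the page directly above it."""
    if page in cache:
        i = cache.index(page)
        if i > 0:
            cache[i - 1], cache[i] = cache[i], cache[i - 1]   # climb one position toward the top
    else:
        if len(cache) >= capacity:
            cache.pop()               # miss with a full cache: evict the bottom page
        cache.append(page)            # assumption: a new page enters at the bottom

# The example from the slides: requesting page 3 in the cache [1, 2, 3, 4, 5]
lru = [1, 2, 3, 4, 5]
lru_update(lru, 3, capacity=5)        # -> [3, 1, 2, 4, 5]
climb = [1, 2, 3, 4, 5]
climb_update(climb, 3, capacity=5)    # -> [1, 3, 2, 4, 5]
```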
Slide 13: Analysis: The IR Model
– N: total number of pages
– p_i: the probability that page i (i = 1, 2, …, N) is requested, independent of previous requests
Remarks:
– The model is mostly justified for proxy caches
– Studies show that web page popularity follows a Zipf law
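A small sketch of how a request trace is generated under the IR model with Zipf-law popularities; the function names and the exponent value are illustrative, not taken from the slides.

```python
import random

def zipf_popularities(N, alpha=1.0):
    """Zipf law: p_i proportional to 1 / i**alpha, normalized to sum to 1."""
    weights = [1.0 / (i ** alpha) for i in range(1, N + 1)]
    total = sum(weights)
    return [w / total for w in weights]

def ir_requests(p, num_requests, seed=0):
    """IR model: every request is drawn i.i.d. from the popularity distribution p."""
    rng = random.Random(seed)
    pages = range(1, len(p) + 1)
    return rng.choices(pages, weights=p, k=num_requests)

p = zipf_popularities(N=100)
trace = ir_requests(p, num_requests=10_000)   # i.i.d. sample of page ids
```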
Slide 14: Cache Algorithms
– K: storage capacity of the cache (in pages)
– Ideally, place the K pages with the greatest values of p_i into the cache
– Problem: the values p_i are unknown a priori
Slide 15: LRU, FIFO, CLIMB Analysis
– Under the IR model, the cache dynamics can be described by a Markov chain
– Each state {I_1, I_2, …, I_K} represents the identity (URL) and ordering of the pages within the cache
Slide 16: LRU – Stationary Probabilities
– The stationary distribution allows the hit ratio to be computed
– Similar results hold for FIFO and CLIMB
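For reference, the classical form of the LRU stationary distribution under the IR model, for a cache state (i_1, …, i_K) listed from most to least recently used, is

```latex
\pi(i_1, \dots, i_K) \;=\; \prod_{k=1}^{K} \frac{p_{i_k}}{1 - p_{i_1} - \cdots - p_{i_{k-1}}},
```

and the hit ratio follows by summing, over all states, the probability that the next request is already in the cache:

```latex
\text{hit ratio} \;=\; \sum_{(i_1, \dots, i_K)} \pi(i_1, \dots, i_K)\,\bigl(p_{i_1} + \cdots + p_{i_K}\bigr).
```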
Slide 17: Analysis – Summary
– CLIMB achieves the best hit ratio, followed by LRU, then FIFO
– The convergence rate is much faster for LRU and FIFO than for CLIMB
– Some mathematical issues are still open
Slide 18: New Issues
– Non-uniform page size
– Non-uniform access costs
  – Nearby vs. distant servers
  – Underloaded vs. overloaded servers
– Page updates
Slide 19: The Extended IR Model (Size)
– Same assumptions as in the IR model, plus:
– The size of page i is s_i
– The cache size is K
Slide 20: Off-Line Problem
With the p_i and s_i known, choosing which pages to cache is a Knapsack Problem!
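Written out with an introduced indicator variable x_i (1 if page i is cached, 0 otherwise), the off-line problem is

```latex
\max_{x_i \in \{0,1\}} \; \sum_{i=1}^{N} p_i\, x_i
\quad \text{subject to} \quad \sum_{i=1}^{N} s_i\, x_i \le K .
```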
Slide 21: Heuristics
– Place in the cache the documents with the greatest p_i/s_i values
– This performs at worst a factor of two worse than the optimal solution (except in extreme cases)
– Goal: devise new on-line algorithms that learn to order documents according to their p_i/s_i values
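A minimal sketch of the off-line density heuristic (the function name and input format are illustrative): it greedily packs pages in decreasing order of p_i/s_i.

```python
def greedy_by_density(pages, K):
    """Greedy knapsack heuristic: take pages in decreasing p_i / s_i order while they fit.

    `pages` is a list of (page_id, p_i, s_i) tuples and K is the cache size.
    Returns the ids of the cached pages.
    """
    chosen, used = [], 0
    for page_id, p, s in sorted(pages, key=lambda t: t[1] / t[2], reverse=True):
        if used + s <= K:
            chosen.append(page_id)
            used += s
    return chosen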
Slide 22: Size-LRU Algorithm
– Set s_min = min{s_1, s_2, …, s_N}
– A randomized algorithm
– When page i is requested:
  – Act like LRU with probability s_min/s_i
  – Otherwise, do not change the cache ordering
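A sketch of this randomized update rule. The slide does not spell out how misses and evictions interact with page sizes, so the convention below (an admitted page goes to the top, and pages are evicted from the bottom until everything fits) is an assumption.

```python
import random

def size_lru_update(cache, page, sizes, K, rng=random):
    """Size-LRU: with probability s_min / s_i act like LRU; otherwise leave the cache unchanged."""
    s_min = min(sizes.values())
    if rng.random() > s_min / sizes[page]:
        return                                    # do not change the cache ordering
    if page in cache:
        cache.remove(page)                        # hit: pull the page out of its current slot
    cache.insert(0, page)                         # the requested page goes to the top
    while sum(sizes[p] for p in cache) > K:
        cache.pop()                               # assumption: evict from the bottom until it fits
```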
Slide 23: Result
– IR model: LRU – p_i
– Extended IR model: Size-LRU – p_i/s_i
Size-LRU is dual to LRU
Slide 24: Example: Size-LRU Stationary Probabilities
Slide 25: Numerical Example
– N = 100 documents
– Page popularity
– Heavy-tailed document size
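A sketch of how a comparison in this spirit can be simulated, reusing zipf_popularities, ir_requests, and size_lru_update from the earlier sketches; the Pareto size distribution and the 10% cache budget are assumptions, not the authors' exact setup.

```python
import random

def sized_lru_update(cache, page, sizes, K):
    """Plain LRU with non-uniform sizes: move to the top, evict from the bottom until it fits."""
    if page in cache:
        cache.remove(page)
    cache.insert(0, page)
    while sum(sizes[p] for p in cache) > K:
        cache.pop()

def hit_ratio(trace, update, **kwargs):
    """Fraction of requests served from the cache when it is driven by the given update rule."""
    cache, hits = [], 0
    for page in trace:
        if page in cache:
            hits += 1
        update(cache, page, **kwargs)
    return hits / len(trace)

N = 100
p = zipf_popularities(N)                        # popularities from the earlier sketch
rng = random.Random(1)
sizes = {i: 1 + int(rng.paretovariate(1.5)) for i in range(1, N + 1)}   # heavy-tailed sizes
trace = ir_requests(p, num_requests=50_000)     # i.i.d. requests under the IR model

K = sum(sizes.values()) // 10                   # cache budget: roughly 10% of the total size
print("LRU hit ratio:     ", hit_ratio(trace, sized_lru_update, sizes=sizes, K=K))
print("Size-LRU hit ratio:", hit_ratio(trace, size_lru_update, sizes=sizes, K=K))
```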
Slide 26: Numerical Example (continued)
[Figure: results of the numerical example]
Slide 27: Summary
– New issues in Web caching
– Size-LRU algorithm: dual to LRU
– Extensions for the cost issue: on-going research
The End