Published by Mabel Russell. Modified over 9 years ago.
1
Web Caching and Content Distribution: A View From the Interior
Syam Gadde, Jeff Chase (Duke University)
Michael Rabinovich (AT&T Labs - Research)
2
Overview
- Analytical tools have evolved to predict the behavior of large-scale Web caches.
  - Are results from existing large-scale caches consistent with the predictions?
    - NLANR
  - What do the models predict for Content Distribution/Delivery Networks (CDNs)?
- Goal: answer these questions by extending the models to predict interior cache behavior.
3
Generalized Cache/CDN (External View)
[Diagram labels: Clients; Web Caches; CDNs; Origin Servers; {request, reply}; {push, request, reply}]
4
Generalized Cache/CDN (Internal View)
[Diagram labels: bound client populations; Leaf Caches; Request Routing Function ƒ; Interior Caches; root caches; reverse proxies]
5
Goals and Limitations
- Focus on interior cache behavior.
  - Assume leaf caches are ubiquitous.
  - Model CDNs as interior caches.
- Focus on hit ratio (percentage of accesses absorbed by the “cloud”).
  - Ignore push replication; at best it merely reduces some latencies by moving data earlier.
- Focus on “typical” static Web objects.
  - Ignore streaming media and dynamic content.
6
Outline
- Analytical model
  - applied to interior nodes of cache hierarchies
  - applied to CDNs
- Implications of the model for CDNs in the presence of ubiquitous leaf caching
- Match model with observations from the NLANR cache hierarchy
- Conclusion
7
Analytical Model
- [Wolman/Voelker/Levy et al., SOSP 1999]
  - refines [Breslau/Cao et al., 1999], and others
- Approximates asymptotic cache behavior assuming Zipf-like object popularity
  - caches have sufficient capacity
- Parameters:
  - λ = per-client request rate
  - μ = rate of object change
  - p_c = percentage of objects that are cacheable
  - α = Zipf parameter (object popularity)
8
Cacheable Hit Ratio: the Formula
- C_N is the hit ratio for cacheable objects achievable by a population of size N with a universe of n objects. [Wolman/Voelker/Levy et al., SOSP 99]
9
Inside the Hit Ratio Formula
- Approximates a sum over a universe of n objects...
- ...of the probability of access to each object x...
- ...times the probability that x was accessed since its last change.
- C is just a normalizing constant for the Zipf-like popularity distribution (a PDF). C = 1/Ω in [Breslau/Cao 99].
- 0 < α < 1
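The formula image itself did not survive in this transcript. The following is a minimal sketch of the computation the slide describes, assuming the form published by Wolman et al.: each object x has access probability 1/(C x^α), and the probability that x was accessed since its last change is rate/(rate + μ), where rate = λ N/(C x^α). The function name, the use of a plain discrete sum instead of the integral approximation, the single change rate μ (the slides use two classes), and the universe size n = 100,000 are assumptions made for illustration.

```python
def cacheable_hit_ratio(N, n=100_000, lam=590.0, mu=1.0 / 186.0, alpha=0.8):
    """Estimate C_N for a population of N clients over n objects.

    lam   : per-client request rate (requests/day)
    mu    : rate of object change (changes/day); one class only (assumption)
    alpha : Zipf parameter
    """
    # Unnormalized Zipf-like weights 1/x^alpha for objects ranked 1..n.
    weights = [x ** -alpha for x in range(1, n + 1)]
    C = sum(weights)  # normalizing constant

    total = 0.0
    for w in weights:
        p = w / C                  # probability a request is for this object
        rate = lam * N * p         # requests/day for this object from N clients
        total += p * rate / (rate + mu)  # accessed since its last change
    return total


if __name__ == "__main__":
    for population in (100, 10_000, 1_000_000):
        print(population, round(cacheable_hit_ratio(population), 3))
```

Under these assumptions the estimate rises with N and approaches 1 as the population grows, which is the asymptotic behavior the model is meant to capture.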
10
An Idealized Hierarchy
[Diagram labels: Level 1 (Root), N_1 clients; Level 2, N_2 clients]
- Assume the trees are symmetric to simplify the math.
- Ignore individual caches and solve for each level.
11
Hit Ratio at Interior Level i
- C_N gives us the hit ratio for a complete subtree covering population N.
- The hit ratio predicted at level i, or at any cache in level i, is given by:
  “the hits for N_i (at level i) minus the hits captured by level i+1, over the miss stream from level i+1”
12
Root Hit Ratio
- Predicted hit ratio for cacheable objects, observed at the root of a two-level cache hierarchy (i.e., where r_2 = R p_c):
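The formula images for the interior and root hit ratios are missing from this transcript. Going only by the verbal description ("the hits for N_i minus the hits captured by level i+1, over the miss stream from level i+1"), the cacheable interior hit ratio can be sketched as (C_{N_i} - C_{N_{i+1}}) / (1 - C_{N_{i+1}}). This is a reading of that description, not a copy of the slide's formula. The function names and the input values below are hypothetical.

```python
def interior_hit_ratio(c_level, c_below):
    """Cacheable hit ratio observed at level i.

    c_level : C_N for the population covered at level i (e.g. the root)
    c_below : C_N for the population covered by each cache at level i+1
    """
    if not 0.0 <= c_below < 1.0:
        raise ValueError("c_below must be in [0, 1)")
    hits_at_level = c_level - c_below   # hits not already captured below
    miss_stream_below = 1.0 - c_below   # requests that reach level i
    return hits_at_level / miss_stream_below


def marginal_hit_ratio(c_level, c_below):
    """Hits at level i as a fraction of all cacheable requests."""
    return c_level - c_below


# Illustrative inputs only: the whole tree would hit 90% of cacheable
# requests, and each level-2 subtree on its own would hit 60%.
root_ratio = interior_hit_ratio(0.9, 0.6)     # about 0.75
root_marginal = marginal_hit_ratio(0.9, 0.6)  # about 0.30
```

The two functions separate what the root observes on its own request stream from what it contributes to the total, which is the distinction the later plots rely on.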
13
Generalizing to CDNs
[Diagram labels: Leaf Caches (demand side), N_L clients each; Request Routing Function ƒ(leaf, object, state); Interior Caches (supply side), N_I clients; N clients]
- Symmetry assumption: ƒ is stable and “balanced”.
14
[Diagram labels: CDN1; CDN2; Servers; Leaf Caches; Interior Caches]
15
[Diagram labels: Servers; Leaf Caches; Interior Caches, N_I clients]
16
[Diagram labels: Servers; Leaf Caches]
What happens to C_N if we partition the object universe?
17
Servers Leaf Caches
21
[Diagram labels: CDN1; CDN2; Servers; Leaf Caches]
22
Hit ratio in CDN caches
- Given the symmetry and balance assumptions, the cacheable hit ratio at the interior (CDN) nodes is:
- N_I is the covered population at each CDN cache.
- N_L is the population at each leaf cache.
23
Analysis
- We apply the model to gain insight into interior cache behavior with:
  - varying leaf cache populations (N_L)
    - e.g., bigger leaf caches
  - varying ratio of interior to leaf cache populations (N_I/N_L)
    - e.g., more specialized interior caches
  - Zipf parameter changes
    - e.g., more concentrated popularity
24
Analysis (cont’d)
- Fixed parameters (unless noted otherwise):
  - λ (client request rate) = 590 reqs./day
  - μ (rate of object change) =
    - once every 14 days (popular objects, 0.3%)
    - once every 186 days (unpopular objects)
  - p_c (percent of requests cacheable) = 60%
  - α (Zipf parameter - object popularity) = 0.8
25
[Plot] Cacheable interior hit ratio observed at interior level, fixing the interior/leaf population ratio. Axes: cacheable hit ratio vs. increasing N_I and N_L.
26
[Plot] Interior hit ratio as percentage of all cacheable requests, fixing the interior/leaf population ratio. Axes: marginal cacheable hit ratio vs. increasing N_I and N_L.
27
[Plot] Cacheable interior hit ratio, fixing the leaf population. Axes: cacheable hit ratio vs. increasing “bushiness”.
28
[Plot] Cacheable interior hit ratio as percentage of all requests, fixing the leaf population. Axes: marginal cacheable hit ratio vs. increasing “bushiness”.
29
[Plot] Cacheable interior hit ratio as percentage of all requests, varying the Zipf parameter, with N_L fixed at 1024 clients. Axis: cacheable hit ratio.
30
[Plot] Cacheable interior hit ratio as percentage of all requests, varying the Zipf parameter, with N_I/N_L fixed at 64K. Axis: cacheable hit ratio.
31
Conclusions (I)
- Interior hit ratio captures the effectiveness of upstream caches at reducing access traffic filtered by leaf/edge caches.
  - Hit ratios grow rapidly with covered population.
  - Edge cache populations (N_L) are key: is it one thousand or one million?
  - With large N_L, interior ratios are deceptive. At N_L = 10^5, interior hit ratios might be 90%, but the CDN sees less than 20% of the requests.
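To see why a high interior ratio can be deceptive, multiply it by the fraction of requests that actually reach the interior. The short sketch below uses the slide's round figures (90% and 20%) as upper bounds; the function name is hypothetical.

```python
def share_of_all_requests(interior_hit_ratio, fraction_reaching_interior):
    """Fraction of all client requests that are served by the interior caches."""
    return interior_hit_ratio * fraction_reaching_interior


# Slide figures: interior hit ratio of about 90%, with the CDN seeing
# less than 20% of requests. The product bounds the CDN's overall share.
upper_bound = share_of_all_requests(0.90, 0.20)  # about 0.18
```

So even at a 90% interior hit ratio, the CDN would serve less than roughly 18% of all requests; the leaf caches absorb the rest or the requests go to the origin.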
32
Correlating with NLANR Observations
- Do the predictions match observations from existing large-scale caches?
- Observations made from traces provided by NLANR (10/12/99).
  - Observed total hit ratio at the (unified) root is 32%.
  - 200 of the 914 leaf caches in the trace account for 95% of requests.
  - The daily request rate indicates the population is on the order of tens of thousands.
  - What is the predicted N?
33
Model vs. Reality
- NLANR roots cooperate; we filter the traces to determine the unified root hit ratio.
- NLANR caches are bounded; traces imply that capacity misses are low at 16GB.
- Analysis assumes the population is balanced across the 200 leaves of consequence.
- Analysis must compensate for objects determined to be uncacheable at a leaf.
34
[Plot] Cacheable interior hit ratio, varying the percentage of requests detected as uncacheable by leaves; 200+ leaf caches. Axis: cacheable hit ratio.
35
[Plot] Cacheable interior hit ratio, varying the percentage of requests detected as uncacheable at request time; 1000 clients per leaf cache. Axis: cacheable hit ratio.
36
Conclusions (II)
- NLANR root effectiveness is around 32% today; it is serving its users well.
- The NLANR experiment could validate the model, but more data from the experiment is needed.
  - E.g., covered populations, leaf summaries
- The model suggests that the population covered by NLANR is relatively small.
  - With larger N and N_L, higher root hit ratios are expected, with lower marginal benefit.
38
Modeling CDNs
- If the routing function satisfies three properties:
  - an interior cache sees all requests for each assigned object x from a population of size N_I
  - every interior cache sees an equivalent object popularity distribution (n/ held constant)
  - all requests are routed through leaf caches that serve N_L clients
- then the interior cacheable hit ratio is:
39
Hit ratio with detected uncacheable documents
- p_u is the percentage of uncacheable requests detected at request time (and not forwarded to parents):
40
Cache Hierarchies
- As introduced by the Harvest project
  - k levels of demand-side caches arranged in a tree (for now)
  - clients are bound to leaves
  - each node’s miss stream routes to its parent
- As extended by NLANR (Squid)
  - NLANR-operated root caches cooperate by partitioning URL space
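The tree structure described above (clients bound to leaves, each node's miss stream routed to its parent) can be sketched in a few lines. This is an illustrative toy, not Harvest or Squid code: the class name, the unbounded in-memory store, and the absence of object expiry and of root partitioning are all simplifying assumptions.

```python
class CacheNode:
    """One cache in a demand-side hierarchy; misses are forwarded to the parent."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent   # None means this node is a root
        self.store = set()     # unbounded cache (assumption: no capacity misses)
        self.requests = 0
        self.hits = 0

    def request(self, url):
        """Return the name of whoever served the object."""
        self.requests += 1
        if url in self.store:
            self.hits += 1
            return self.name
        # Miss: forward to the parent, or to the origin server at the root.
        source = self.parent.request(url) if self.parent else "origin"
        self.store.add(url)    # keep a copy on the way back down
        return source


root = CacheNode("root")
leaf_a = CacheNode("leaf_a", parent=root)
leaf_b = CacheNode("leaf_b", parent=root)

served_by = [
    leaf_a.request("/x"),  # misses at leaf_a and root, fetched from origin
    leaf_b.request("/x"),  # misses at leaf_b, hits at root
    leaf_a.request("/x"),  # hits at leaf_a, root never sees it
]
```

In this run the root sees only two of the three requests and hits on one of them, which illustrates why an interior cache's observed hit ratio depends on what the leaves have already filtered out.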
41
Cache Hierarchies Illustrated
[Diagram labels: Servers; Level 1 (Root); Level 2]