Web Caching and Content Delivery. Caching for a Better Web Performance is a major concern in the Web Proxy caching is the most widely used method to improve.

Slides:



Advertisements
Similar presentations
Computer Science Generating Streaming Access Workload for Performance Evaluation Shudong Jin 3nd Year Ph.D. Student (Advisor: Azer Bestavros)
Advertisements

The Cache Location Problem. Overview TERCs Vs. Proxies Stability Cache location.
Latency-sensitive hashing for collaborative Web caching Presented by: Xin Qi Yong Yang 09/04/2002.
Workloads Experimental environment prototype real sys exec- driven sim trace- driven sim stochastic sim Live workload Benchmark applications Micro- benchmark.
1 Content Delivery Networks iBAND2 May 24, 1999 Dave Farber CTO Sandpiper Networks, Inc.
Memory System Characterization of Big Data Workloads
1 School of Computing Science Simon Fraser University, Canada Modeling and Caching of P2P Traffic Mohamed Hefeeda Osama Saleh ICNP’06 15 November 2006.
Spring 2003CS 4611 Content Distribution Networks Outline Implementation Techniques Hashing Schemes Redirection Strategies.
Web Caching Schemes1 A Survey of Web Caching Schemes for the Internet Jia Wang.
1 CPSC : Project Brainstorming Session Carey Williamson Department of Computer Science University of Calgary.
October 14, 2002MASCOTS Workload Characterization in Web Caching Hierarchies Guangwei Bai Carey Williamson Department of Computer Science University.
An Analysis of Internet Content Delivery Systems Stefan Saroiu, Krishna P. Gommadi, Richard J. Dunn, Steven D. Gribble, and Henry M. Levy Proceedings of.
CSE 291 System Services for the World Wide Web Winter 2000 Geoffrey M. Voelker.
Peer-to-Peer Based Multimedia Distribution Service Zhe Xiang, Qian Zhang, Wenwu Zhu, Zhensheng Zhang IEEE Transactions on Multimedia, Vol. 6, No. 2, April.
Improving Proxy Cache Performance: Analysis of Three Replacement Policies Dilley, J.; Arlitt, M. A journal paper of IEEE Internet Computing, Volume: 3.
1 Internet Protocols and Network Performance Issues Carey Williamson iCORE Professor Department of Computer Science University of Calgary.
Web Caching Robert Grimm New York University. Before We Get Started  Interoperability testing  Type theory 101.
CDNs & Replication Prof. Vern Paxson EE122 Fall 2007 TAs: Lisa Fowler, Daniel Killebrew, Jorge Ortiz.
Submitting: Barak Pinhas Gil Fiss Laurent Levy
Flash Crowds And Denial of Service Attacks: Characterization and Implications for CDNs and Web Sites Aaron Beach Cs395 network security.
Anycast Jennifer Rexford Advanced Computer Networks Tuesdays/Thursdays 1:30pm-2:50pm.
Internet Cache Pollution Attacks and Countermeasures Yan Gao, Leiwen Deng, Aleksandar Kuzmanovic, and Yan Chen Electrical Engineering and Computer Science.
Web Caching Robert Grimm New York University. Before We Get Started  Illustrating Results  Type Theory 101.
Squirrel: A decentralized peer- to-peer web cache Paul Burstein 10/27/2003.
Web Caching Schemes For The Internet – cont. By Jia Wang.
World Wide Web Caching: Trends and Technology Greg Barish and Katia Obraczka USC Information Science Institute IEEE Communications Magazine, May 2000 Presented.
Caching and Content Distribution Networks. Web Caching r As an example, we use the web to illustrate caching and other related issues browser Web Proxy.
Content Distribution Networks (CDNs) Mike Freedman COS 461: Computer Networks Lectures: MW 10-10:50am in Architecture N101
Content Distribution Network (CDN) Performance Punit Shah CSE581 Internet Technologies OGI, OHSU 2002, Jan 16th.
ACDN: A CDN for Applications Pradnya Karbhari Michael Rabinovich Zhen Xiao Fred Douglis AT&T Labs -- Research.
Large-Scale Web Caching and Content Delivery Jeff Chase CPS 212: Distributed Information Systems Fall 2000.
1 Content Distribution Networks. 2 Replication Issues Request distribution: how to transparently distribute requests for content among replication servers.
On the Use and Performance of Content Distribution Networks Balachander Krishnamurthy Craig Wills Yin Zhang Presenter: Wei Zhang CSE Department of Lehigh.
Distributing Content Simplifies ISP Traffic Engineering Abhigyan Sharma* Arun Venkataramani* Ramesh Sitaraman*~ *University of Massachusetts Amherst ~Akamai.
Network Traffic Modeling Punit Shah CSE581 Internet Technologies OGI, OHSU 2002, March 6.
Infrastructure for Better Quality Internet Access & Web Publishing without Increasing Bandwidth Prof. Chi Chi Hung School of Computing, National University.
On the Scale and Performance of Cooperative Web Proxy Caching University of Washington Alec Wolman, Geoff Voelker, Nitin Sharma, Neal Cardwell, Anna Karlin,
Web Cache Replacement Policies: Properties, Limitations and Implications Fabrício Benevenuto, Fernando Duarte, Virgílio Almeida, Jussara Almeida Computer.
World Wide Web Caching: Trends and Technologys Gerg Barish & Katia Obraczka USC Information Sciences Institute, USA,2000.
Segment-Based Proxy Caching of Multimedia Streams Authors: Kun-Lung Wu, Philip S. Yu, and Joel L. Wolf IBM T.J. Watson Research Center Proceedings of The.
Web Caching and Content Distribution: A View From the Interior Syam Gadde Jeff Chase Duke University Michael Rabinovich AT&T Labs - Research.
Understanding the Performance of Web Caching System with an Analysis Model and Simulation Xiaosong Hu Nur Zincir-Heywood Sep
Adaptive Web Caching CS411 Dynamic Web-Based Systems Flying Pig Fei Teng/Long Zhao/Pallavi Shinde Computer Science Department.
PROP: A Scalable and Reliable P2P Assisted Proxy Streaming System Computer Science Department College of William and Mary Lei Guo, Songqing Chen, and Xiaodong.
Web Cache Consistency. “Requirements of performance, availability, and disconnected operation require us to relax the goal of semantic transparency.”
NUS.SOC.CS Roger Zimmermann (based in part on slides by Ooi Wei Tsang) 1 Proxy Caching for Streaming Media.
Hot Systems, Volkmar Uhlig
Web Prefetching Lili Qiu Microsoft Research March 27, 2003.
Content Delivery Networks: Status and Trends Speaker: Shao-Fen Chou Advisor: Dr. Ho-Ting Wu 5/8/
Video Caching in Radio Access network: Impact on Delay and Capacity
On the Placement of Web Server Replicas Yu Cai. Paper On the Placement of Web Server Replicas Lili Qiu, Venkata N. Padmanabhan, Geoffrey M. Voelker Infocom.
An Analysis of Internet Content Delivery Systems 19 rd November, 2007 Youngsub CSE, SNU.
Web Proxy Caching: The Devil is in the Details Ramon Caceres, Fred Douglis, Anja Feldmann Young-Ho Suh Network Computing Lab. KAIST Proceedings of the.
Content Distribution Networks (CDNs)
/ Fast Web Content Delivery An Introduction to Related Techniques by Paper Survey B Li, Chien-chang R Sung, Chih-kuei.
On the scale and performance of cooperative Web proxy caching 2/3/06.
John S. Otto Mario A. Sánchez John P. Rula Fabián E. Bustamante Northwestern, EECS.
Multicast in Information-Centric Networking March 2012.
Does Internet media traffic really follow the Zipf-like distribution? Lei Guo 1, Enhua Tan 1, Songqing Chen 2, Zhen Xiao 3, and Xiaodong Zhang 1 1 Ohio.
Fault – Tolerant Distributed Multimedia Streaming Web Application By Nirvan Sagar – Srishti Ganjoo – Syed Shahbaaz Safir
Clustered Web Server Model
Proxy Caching for Streaming Media
The Impact of Replacement Granularity on Video Caching
Memory Management for Scalable Web Data Servers
On the Scale and Performance of Cooperative Web Proxy Caching
Distributed Content in the Network: A Backbone View
Distributed Systems CS
Zipf-Distributions & Caching
Network Traffic Modeling
EE 122: Lecture 22 (Overlay Networks)
Presentation transcript:

Web Caching and Content Delivery

Caching for a Better Web Performance is a major concern in the Web Proxy caching is the most widely used method to improve Web performance Duplicate requests to the same document served from cache Hits reduce latency, bandwidth demand, server load Misses increase latency (extra hops) ClientsProxy CacheServers Hits Misses Internet [Source: Geoff Voelker]

Proxy Caching How should we build caching systems for the Web? Seminal paper [Chankhunthod96] Proxy caches [Duska97] Akamai DNS interposition [Karger99] Cooperative caching [Tewari99, Fan98, Wolman99] Popularity distributions [Breslau99] Proxy filtering and transcoding [Fox et al] Consistency [Tewari,Cao et al] Replica placement for CDNs [et al] [Voelker]

Issues for Web Caching Binding clients to proxies, handling failover Manual configuration, router-based “transparent caching”, WPAD (Web Proxy Automatic Discovery) Proxy may confuse/obscure interactions between server and client. Consistency management At first approximation the Web is a wide-area read-only file service...but it is much more than that. caching responses vs. caching documents deltas Prefetching, scale, request routing, scale, performance Web caching vs. content distribution (CDNs, e.g., Akamai)

End-to-End Content Delivery request stream Internet hosting network request distributor surrogate caches CDN servers proxies server array + storage upstreamdownstream

Proxy Cache Effectiveness How to measure Web cache effectiveness (goals)? Hit ratio Savings in bandwidth or server load Reduction in perceived user latency What factors determine/limit effectiveness? Capacity? User population? Proxy placement in the network? Updates and invalidations?

Web Traffic Characterization Research question: how do goals and traffic behavior shape strategies for deploying and managing proxy caches? Replacement policy: what objects to retain in cache? Large vs. small, relative importance of popularity and stability Deployment: where to place the cache? Close to server or client? How many users per cache? Prefetching? Since the Web is in active deployment on a large-scale, Web traffic characterization is an empirical science. Science of mass behavior: observe and test hypotheses.

Zipf [Breslau/Cao99] and others observed that Web accesses can be modeled using Zipf-like probability distributions. Rank objects by popularity: lower rank i ==> more popular. The probability that any given reference is to the ith most popular object is p i Not to be confused with p c, the percentage of cacheable objects. Zipf says: “p i is proportional to 1/i , for some  with 0 <  < 1 ”. Higher  gives more skew: popular objects are way popular. Lower  gives a more heavy-tailed distribution. In the Web,  ranges from 0.6 to 0.8 [Breslau/Cao99]. With  =0.8, 0.3% of the objects get 40% of requests.

Zipf-like Reference Distributions p i 1/i   p i = 1 Probability of access to the object with popularity rank i: (This is equivalent to a power-law or Pareto distribution.) such that: head tail [Zipf 49, Duska et al. 97, Breslau et al. 98] Popularity rank heavy tail pipi

Importance of Traffic Models Analytical models like this help us to predict cache hit ratios (object hit ratio or byte hit ratio). E.g., get object hit ratio as a function of size by integrating under segments of the Zipf curve …assuming perfect LFU replacement Must consider update rate Do object update rates correlate with popularity? Must consider object size How does size correlate with popularity? Must consider proxy cache population What is the probability of object sharing? Enables construction of synthetic load generators SURGE [Barford and Crovella 99]

The “Trickle-Down Effect” clients cache to servers floodtrickle What is the effect on “downstream” traffic? What is the significance of this effect? How does it impact design choices for components “behind” the caches?

A Look at the Miss Stream synthetic trace SURGE-generated low locality:  = 0.6 log-log plot head: flattened midrange: tapers tail: intact Zipf-like

1998 ibm.com high locality fit Zipf  = 0.76 skewed: 77 % / 1% Effect on Server Trace (ibm.com)

What’s Happening? (LRU) Suppose the cache fills up in R references. (That’s a property of the trace and the cache size.) Then a cache miss on object with rank i occurs only if i is referenced…. probability p i …and i has not been referenced in the last R requests. probability (1 - p i ) R Stack distance P(a miss is to object i) is q i = p i (1 - p i ) R

Miss Stream Probability by Popularity q i : R = 10 4,  =0.7 IBM 1998 (32 MB) Moderately popular objects now dominate.

Object Hit Ratio by Popularity (1) synthetic  = 0.6

Object Hit Ratio by Popularity (2) IBM 1998

Limitations/Features of This Study static (cacheable) objects ignore misses caused by updates invalidation/expiration LRU replacement vary cache effectiveness by capacity cache intercepts all client traffic ignore effect on downstream traffic volume

Proxy Deployment and Use Where to put it? How to direct user Web traffic through the proxy? Request redirection Much more to come on this topic… Must the server consent? Protected content Client identity “Transparent” caching and the end-to-end principle Must the client consent?

Interception Switches ISP cache array The client doesn’t know. The server doesn’t know. Neither side told HTTP to disable it. Is it legal? Good thing? Bad thing?

Shouldn’t This Be Illegal? end middle RFC 1122: The Internet Architecture (IPv4) specifies that each packet has a unique destination “host” address. Problems middle boxes may be subversive IPsec and SSL dynamic routing

Cache Effectiveness Previous work has shown that hit rate increases with population size [Duska et al. 97, Breslau et al. 98] However, single proxy caches have practical limits Load, network topology, organizational constraints One technique to scale the client population is to have proxy caches cooperate