On the scale and performance of cooperative Web proxy caching 2/3/06.

Presentation transcript:

Related Work Static analysis – request rate – number of requests – diversity of population. Trace-driven cache simulation – temporal locality of proxy traces. Web page requests follow a Zipf-like distribution – the number of requests to the ith most popular document is proportional to $1/i^{\alpha}$ for some constant $\alpha$.
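To make the Zipf-like popularity rule concrete, here is a minimal sketch that turns the proportionality 1/i^alpha into request probabilities. The function name, the document count of 1000, and alpha = 0.8 are illustrative assumptions, not values from the slides.

```python
def zipf_probabilities(n_docs: int, alpha: float) -> list[float]:
    """Request probability of each document, ordered from rank 1 (most popular)."""
    # Unnormalized weight of the document at a given rank is 1 / rank**alpha.
    weights = [1.0 / (rank ** alpha) for rank in range(1, n_docs + 1)]
    total = sum(weights)
    # Normalize so the probabilities sum to 1.
    return [w / total for w in weights]


# Hypothetical parameters chosen only for illustration.
probs = zipf_probabilities(n_docs=1000, alpha=0.8)
print(f"rank 1: {probs[0]:.4f}, rank 10: {probs[9]:.4f}, rank 100: {probs[99]:.4f}")
```

The ratio between the probabilities of rank 1 and rank 10 is 10^alpha, which is the defining property of the distribution.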

Related Work Hit ratio for a Web proxy grows logarithmically with the client population of the proxy and with the number of requests seen by the proxy.
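A toy trace-driven simulation can show how the hit ratio of an unbounded cache rises as the proxy sees more requests. This is only a sketch under assumed parameters (10,000 documents, alpha = 0.8, synthetic Zipf-like requests); it is not the traces or the simulator used in the paper, and it is not expected to reproduce the measured curve.

```python
import random


def infinite_cache_hit_rate(n_requests, n_docs=10_000, alpha=0.8, seed=0):
    """Hit rate of a cache with no capacity limit over a synthetic request stream."""
    rng = random.Random(seed)
    weights = [1.0 / rank ** alpha for rank in range(1, n_docs + 1)]
    # Draw document ids with Zipf-like popularity.
    requests = rng.choices(range(n_docs), weights=weights, k=n_requests)
    seen = set()
    hits = 0
    for doc in requests:
        if doc in seen:
            hits += 1      # document was requested before: cache hit
        else:
            seen.add(doc)  # first access is always a miss
    return hits / n_requests


for n in (1_000, 10_000, 100_000):
    print(n, round(infinite_cache_hit_rate(n), 3))
```

Each tenfold increase in requests adds a smaller and smaller absolute gain once the popular documents are already cached, which is the diminishing-returns behavior the slide refers to.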

Coop Web Caching Work Goals: reduce access latency and bandwidth consumption. Approaches: hierarchical, directory-based, multicast-based. Upshot – cooperation requires the cache manager to determine when to cooperate.

WebDoc Sharing and Caching 4 questions 1. What is the best performance one could achieve with “perfect” cooperative caching? 2. For what range of client populations can cooperative caching work effectively? 3. Does the way in which clients are assigned to caches matter? 4. What cache hit rates are necessary to achieve worthwhile decreases in document access latency?

WebDoc Sharing and Caching Two sites analyzed: UW – 200 organizations. Microsoft campus – big corporation, large population, a number of proxies.

WebDoc Sharing and Caching Work specifically not done: did not investigate the effects of cooperative caching on server load – in general – under hot-spot conditions.

First results

Trace Collection Existing traces inadequate. New design. UW has few proxies. Traces are anonymized. Clients grouped based on organizational membership – track organizations across the campus. Table 1 shows that 7 days gave 108 million requests by 60,000 clients to 360,000 servers in the Microsoft trace.

Simulation methodology Real caches will incur misses due to capacity limitations that we do not model. Capacity misses are rarely the bottleneck for Web caches. For example, only 3% of the requests to the MS Web proxies missed due to the finite capacity of the proxies (which have 9GB of RAM and 180GB of disk capacity).

Simulation Methodology Practical cache –models the cacheability of documents according to the algorithms in the Squid V2 implementation Ideal cache –All documents are cacheable – i.e. equal cacheability for all –Upper bound of improvement on workloads due to improvements in internet protocols
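The difference between the two simulated caches can be sketched as follows. This is an illustration only: the real study applies Squid's cacheability rules to trace data, whereas here a hypothetical random 60% of documents is marked cacheable and the request stream is synthetic.

```python
import random


def hit_rate(requests, cacheable=None):
    """Infinite-cache hit rate. `cacheable` is the set of documents that may be
    stored; None means every document is cacheable (the ideal cache)."""
    cached, hits = set(), 0
    for doc in requests:
        if doc in cached:
            hits += 1
        elif cacheable is None or doc in cacheable:
            cached.add(doc)  # only cacheable documents are ever stored
    return hits / len(requests)


rng = random.Random(1)
n_docs = 5_000
weights = [1.0 / rank ** 0.8 for rank in range(1, n_docs + 1)]
requests = rng.choices(range(n_docs), weights=weights, k=50_000)
# Assumption for illustration: each document is cacheable with probability 0.6.
cacheable_docs = {doc for doc in range(n_docs) if rng.random() < 0.6}

print("ideal:    ", round(hit_rate(requests), 3))
print("practical:", round(hit_rate(requests, cacheable_docs), 3))
```

Because the practical cache stores a subset of what the ideal cache stores, its hit rate can never exceed the ideal one, which is why the ideal cache serves as an upper bound.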

Population size In a cooperative-caching scheme, a proxy forwards a missing request to other proxies to determine if: 1. Another proxy holds the requested document. 2. The document can be returned faster than a request to the server.
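The forwarding step described above can be sketched as a small class. All names here (`Proxy`, `peers`, `fetch_from_origin`) are hypothetical, and the sketch deliberately omits the second check (whether the peer can return the document faster than the origin server) as well as any real inter-proxy protocol.

```python
class Proxy:
    """Toy cooperative proxy: local cache first, then peers, then the origin."""

    def __init__(self, name):
        self.name = name
        self.cache = {}   # url -> document body
        self.peers = []   # other Proxy objects this proxy cooperates with

    def get(self, url, fetch_from_origin):
        if url in self.cache:
            return self.cache[url], "local hit"
        # Local miss: ask each cooperating proxy whether it holds the document.
        for peer in self.peers:
            if url in peer.cache:
                self.cache[url] = peer.cache[url]
                return self.cache[url], f"peer hit ({peer.name})"
        # No cooperating proxy has it: go to the origin server.
        body = fetch_from_origin(url)
        self.cache[url] = body
        return body, "miss"


def origin(url):
    # Stand-in for a real HTTP fetch.
    return f"<content of {url}>"


a, b = Proxy("A"), Proxy("B")
a.peers, b.peers = [b], [a]
print(a.get("http://example.com/", origin)[1])  # miss
print(b.get("http://example.com/", origin)[1])  # peer hit (A)
print(b.get("http://example.com/", origin)[1])  # local hit
```

Ignoring communication cost, the two proxies together miss only on the first access by either population, which is the same behavior as one proxy serving both populations.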

Population size At best, a collection of cooperating caches will achieve the hit rate of a single proxy acting over the combined population of all the proxies. In practice, proxies will also pay the overheads of inter-proxy communication latency. Examining a single, top-level proxy thus gives us an upper bound on cooperative-caching performance.

Population size

Hit rate vs. latency and bandwidth Latency, not hit rate, is crucial to clients. To ISPs: hit rate = bandwidth savings – relieving congestion.

Hit rate vs. latency and bandwidth

Proxies and organizations If high locality && small population – then: achieve max hit rate

Question What benefit would clients in real organizations see if their proxies were to cooperate with other real organizational proxies? The UW environment is an attempted answer – each org acts like a business, with its own proxy sitting on its connection to the Internet. – Each client is categorized into 1 of 200 UW organizations.

Proxies and organizations

Question: is a different grouping of clients to proxies, for example one based on each client’s document interests, better? Solution: – Clustering algorithm used to optimize intra-cluster sharing – Randomly assigned clusters have a consistently lower hit rate than the optimally clustered organizations.

Proxies and organizations

Impact of larger population size Recall: cooperative caching can increase hit rate. This indicates that there is little correlation between sharing and the cacheability of documents for the UW population. Cooperative caching among populations larger than 2.4 million does not increase the hit rate to cacheable documents.

Impact of larger population size Experiment: have MS and UW cooperate. Results: – When scaled by equal factors, MS gains more benefit by cooperating with the UW population than the UW population gains by cooperating with MS. – Unpopular documents are universally unpopular: a request that misses in UW or MS is unlikely to find the document in the other proxy. – 1/500 (first access) has hit rate increase ~ regardless of popularity

Proxies and organizations

Impact of larger population size

Docs and Proxy Sharing Summary 1. The behavior of cooperative caching is characterized by two different regions of the hit rate vs. population curve. For smaller populations, hit rate increases rapidly with population; it is in this region that cooperative caching can be used effectively. However, these population sizes can be handled by a single proxy. Cooperative caching is only necessary to adapt to proxy assignments made for political or geographical reasons.

Docs and Proxy Sharing Summary 2. For larger populations (beyond the knee of the population vs. hit rate curve), cooperative caching is unlikely to provide significant benefit. Simultaneous traces of the MS and UW populations show: – UW: a 4x increase of population via cooperative caching netted only a 2.7% increase in cacheable hit rate.

Docs and Proxy Sharing Summary 3. MS and others show clustering does occur but that cooperative caching specialized to interest groups is unlikely to be effective.

Docs and Proxy Sharing Summary 4. Previous work has hinted at the general trends, but end conclusions about cooperative caching (CC) have not been shown yet.

An analytic model of Web accesses Steady-state performance The model Model parameters Performance of large scale proxy caching Summary

Steady-state performance

The Model Population has N clients and n total documents. The important characteristic of a Zipf-like distribution is that it is heavy-tailed – a significant fraction of the probability mass is concentrated in the tail, which in this case means that a significant fraction of requests go to the relatively unpopular documents.

The Model Zipf distribution, cont’d – The popularity of the ith most popular document is proportional to $1/i^{\alpha}$. As $\alpha$ increases, the distribution becomes less heavy-tailed, and a larger fraction of the probability mass is concentrated on the most popular documents.
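The effect of the Zipf exponent on the tail can be checked numerically. The sketch below computes the share of requests that go to the most popular 1% of documents for a few exponents; the document count and the exponent values are assumptions picked for illustration.

```python
def top_fraction_mass(n_docs, alpha, top_fraction=0.01):
    """Share of request probability held by the most popular documents."""
    weights = [1.0 / rank ** alpha for rank in range(1, n_docs + 1)]
    top_k = max(1, int(n_docs * top_fraction))
    # Mass of the top_k most popular documents divided by the total mass.
    return sum(weights[:top_k]) / sum(weights)


for alpha in (0.6, 0.8, 1.0):
    print(alpha, round(top_fraction_mass(100_000, alpha), 3))
```

A larger exponent moves probability mass from the tail to the head, so the printed share grows with the exponent.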

The Model The probability that a requested document is cacheable is $p_c$. Average document size is $E(S)$. Document size is independent of document popularity, latency, and change rate. The last-byte latency to the server that houses that document has average value $E(L)$. Last-byte latency is independent of document popularity and document change rate.

The Model Performance Characteristics

The Model Performance characteristics continued –The expected last-byte latency to serve a request is given by: –average bandwidth savings per request due to proxy caching:
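The two formulas on this slide did not survive in the transcript. As a hedged sketch only, the usual way to express these quantities is as hit-rate-weighted averages: a request costs the hit latency on a hit and the server's last-byte latency E(L) on a miss, and each hit saves E(S) bytes on average. The parameter `hit_latency` and all numeric values below are assumptions, and the paper's exact expressions may differ.

```python
def expected_latency(hit_rate, origin_latency, hit_latency):
    """Expected last-byte latency per request (same time unit as the inputs)."""
    # Misses pay the origin latency E(L); hits pay the (assumed) proxy hit latency.
    return (1.0 - hit_rate) * origin_latency + hit_rate * hit_latency


def expected_bandwidth_savings(hit_rate, mean_doc_size):
    """Average bytes per request that do not have to cross the Internet link."""
    # Every hit avoids transferring a document of average size E(S).
    return hit_rate * mean_doc_size


# Hypothetical inputs for illustration only.
latency = expected_latency(hit_rate=0.4, origin_latency=2.0, hit_latency=0.1)
savings = expected_bandwidth_savings(hit_rate=0.4, mean_doc_size=8_000)
print(f"expected latency: {latency:.2f} s, bytes saved per request: {savings:.0f}")
```

Under this form, latency falls linearly with hit rate, so small hit-rate gains from wider cooperation translate into equally small latency gains.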

The Model Differences between new and previous work: –we consider the steady state behavior of caching systems rather than caching behavior based on a finite request sequence –Incorporate document change rate into the model rather than assuming that documents are static –Goal: use our model to understand the performance of large-scale, cooperative-caching schemes in terms of hit rate, latency, bandwidth savings, and storage consumed.

Model parameters UW Trace

Model parameters

Performance of large scale proxy caching Hit rate, latency, and bandwidth Document rate of change Client request rate Document popularity and size of the Web

Performance of large scale proxy caching

Document popularity and size of the Web Parameters: Zipf exponent $\alpha$ and number of documents n. Increasing $\alpha$ skews the distribution towards popular documents, significantly increasing hit rates for slower rates of change and slightly increasing hit rates for faster rates of change. Increasing the number of documents n shifts the curves for slow and fast rates of change to larger populations. This population shift is roughly in proportion to the increase in n – n = 3.2 billion → the slow curve reaches a 90% hit rate at a population of 250,000 – n = 32 billion → the slow curve reaches a 90% hit rate at a population of 25 million – n = 320 billion → the slow curve reaches a 90% hit rate at a population of 250 million.

Model Summary An analytic model is used to examine the steady-state performance of cooperative caching schemes. Small populations achieve most of the performance benefits of cooperative caching.

Wrap Up 1. Without client behavior changes: – little point in continuing design and evaluation of highly scalable, cooperative-caching schemes. Cooperative caching makes sense up to the level of a medium-sized city.

Wrap Up The largest benefit for cooperative caching is achieved for relatively small populations. This is shown by the analysis of cooperation among small organizations within the university environment. Traces of UW and MS confirmed marginal benefit of cooperative caching among large organizations with populations of 20K clients or more. Cooperation at a scale this big makes sense only in very high-bandwidth, low-latency environments.

Wrap Up Performance of cooperative caching limited by document cacheability. Increasing cacheability of documents is the main challenge for future Web cache behavior research

Wrap Up Cluster-based analysis of client access patterns indicates: – cooperative-caching organizations based on mutual interest offer no obvious advantages over randomly assigned or organization-based groupings.

Wrap Up Fundamentally, the usefulness of cooperative Web proxy caching depends upon the scale at which it is being applied. Whether or not they use cooperative caching locally, large organizations should use proxy caching for their user populations. Concern: cooperative caching only marginally helpful

Wrap Up Results shown are on static data. A shift in user workload will change them, e.g. streaming media – Average object size is orders of magnitude larger – Reveals: better utilization of network resources is necessary – Size and time of transfer for streaming objects show that multicast methods might be good

Assumptions Static objects like web pages and documents. User populations not too large. Network latency and performance medium.

Questions?