On the Scale and Performance of Cooperative Web Proxy Caching

Slides:



Advertisements
Similar presentations
Dynamic Replica Placement for Scalable Content Delivery Yan Chen, Randy H. Katz, John D. Kubiatowicz {yanchen, randy, EECS Department.
Advertisements

Cost-Based Cache Replacement and Server Selection for Multimedia Proxy Across Wireless Internet Qian Zhang Zhe Xiang Wenwu Zhu Lixin Gao IEEE Transactions.
A Survey of Web Cache Replacement Strategies Stefan Podlipnig, Laszlo Boszormenyl University Klagenfurt ACM Computing Surveys, December 2003 Presenter:
Latency-sensitive hashing for collaborative Web caching Presented by: Xin Qi Yong Yang 09/04/2002.
Web Caching Schemes1 A Survey of Web Caching Schemes for the Internet Jia Wang.
An Analysis of Internet Content Delivery Systems Stefan Saroiu, Krishna P. Gommadi, Richard J. Dunn, Steven D. Gribble, and Henry M. Levy Proceedings of.
Improving Proxy Cache Performance: Analysis of Three Replacement Policies Dilley, J.; Arlitt, M. A journal paper of IEEE Internet Computing, Volume: 3.
Cooperative Caching Middleware for Cluster-Based Servers Francisco Matias Cuenca-Acuna Thu D. Nguyen Panic Lab Department of Computer Science Rutgers University.
Prefix Caching assisted Periodic Broadcast for Streaming Popular Videos Yang Guo, Subhabrata Sen, and Don Towsley.
Efficient Content Location Using Interest-based Locality in Peer-to-Peer Systems Presented by: Lin Wing Kai.
Web Caching Robert Grimm New York University. Before We Get Started  Interoperability testing  Type theory 101.
Exploiting Content Localities for Efficient Search in P2P Systems Lei Guo 1 Song Jiang 2 Li Xiao 3 and Xiaodong Zhang 1 1 College of William and Mary,
1 CAPS: A Peer Data Sharing System for Load Mitigation in Cellular Data Networks Young-Bae Ko, Kang-Won Lee, Thyaga Nandagopal Presentation by Tony Sung,
Submitting: Barak Pinhas Gil Fiss Laurent Levy
All Organizations Need to Share and Communicate Information...
DEPARTMENT OF COMPUTER SCIENCE SOFTWARE ENGINEERING, GRAPHICS, AND VISUALIZATION RESEARCH GROUP 15th International Conference on Information Visualisation.
Internet Cache Pollution Attacks and Countermeasures Yan Gao, Leiwen Deng, Aleksandar Kuzmanovic, and Yan Chen Electrical Engineering and Computer Science.
Introspective Replica Management Yan Chen, Hakim Weatherspoon, and Dennis Geels Our project developed and evaluated a replica management algorithm suitable.
Web Caching Robert Grimm New York University. Before We Get Started  Illustrating Results  Type Theory 101.
Squirrel: A decentralized peer- to-peer web cache Paul Burstein 10/27/2003.
Adaptive Web Caching Lixia Zhang, Sally Floyd, and Van Jacob-son. In the 2nd Web Caching Workshop, Boulder, Colorado, April 25, System Laboratory,
Wide Web Load Balancing Algorithm Design Yingfang Zhang.
Web Caching Schemes For The Internet – cont. By Jia Wang.
The Medusa Proxy A Tool For Exploring User- Perceived Web Performance Mimika Koletsou and Geoffrey M. Voelker University of California, San Diego Proceeding.
ICTTA'04Arwa zabian1 On The Latency of BFS Interval Cooperation Web Caching Arwa Zabian Maurizio Bonuccelli Department of Computer Science University of.
Large-Scale Web Caching and Content Delivery Jeff Chase CPS 212: Distributed Information Systems Fall 2000.
Web Caching and Content Delivery. Caching for a Better Web Performance is a major concern in the Web Proxy caching is the most widely used method to improve.
1 Content Distribution Networks. 2 Replication Issues Request distribution: how to transparently distribute requests for content among replication servers.
Distributed Data Stores – Facebook Presented by Ben Gooding University of Arkansas – April 21, 2015.
Supporting Strong Cache Coherency for Active Caches in Multi-Tier Data-Centers over InfiniBand S. Narravula, P. Balaji, K. Vaidyanathan, S. Krishnamoorthy,
Achieving Load Balance and Effective Caching in Clustered Web Servers Richard B. Bunt Derek L. Eager Gregory M. Oster Carey L. Williamson Department of.
On the Scale and Performance of Cooperative Web Proxy Caching University of Washington Alec Wolman, Geoff Voelker, Nitin Sharma, Neal Cardwell, Anna Karlin,
Web Cache Replacement Policies: Properties, Limitations and Implications Fabrício Benevenuto, Fernando Duarte, Virgílio Almeida, Jussara Almeida Computer.
1 On the Placement of Web Server Replicas Lili Qiu, Microsoft Research Venkata N. Padmanabhan, Microsoft Research Geoffrey M. Voelker, UCSD IEEE INFOCOM’2001,
A novel approach of gateway selection and placement in cellular Wi-Fi system Presented By Rajesh Prasad.
Mar 1, 2004 Multi-path Routing CSE 525 Course Presentation Dhanashri Kelkar Department of Computer Science and Engineering OGI School of Science and Engineering.
Scalable Web Server on Heterogeneous Cluster CHEN Ge.
Web Caching and Content Distribution: A View From the Interior Syam Gadde Jeff Chase Duke University Michael Rabinovich AT&T Labs - Research.
Architecture for Caching Responses with Multiple Dynamic Dependencies in Multi-Tier Data- Centers over InfiniBand S. Narravula, P. Balaji, K. Vaidyanathan,
1 On the Placement of Web Server Replicas Lili Qiu, Microsoft Research Venkata N. Padmanabhan, Microsoft Research Geoffrey M. Voelker, UCSD IEEE INFOCOM’2001,
Adaptive Web Caching CS411 Dynamic Web-Based Systems Flying Pig Fei Teng/Long Zhao/Pallavi Shinde Computer Science Department.
DISTRIBUTED COMPUTING Introduction Dr. Yingwu Zhu.
An IP Address Based Caching Scheme for Peer-to-Peer Networks Ronaldo Alves Ferreira Joint work with Ananth Grama and Suresh Jagannathan Department of Computer.
1 Evaluation of Cooperative Web Caching with Web Polygraph Ping Du and Jaspal Subhlok Department of Computer Science University of Houston presented at.
Efficient P2P Search by Exploiting Localities in Peer Community and Individual Peers A DISC’04 paper Lei Guo 1 Song Jiang 2 Li Xiao 3 and Xiaodong Zhang.
Performance of Web Proxy Caching in Heterogeneous Bandwidth Environments IEEE Infocom, 1999 Anja Feldmann et.al. AT&T Research Lab 발표자 : 임 민 열, DB lab,
Network Protocols: Design and Analysis Polly Huang EE NTU
Hot Systems, Volkmar Uhlig
1 Hidra: History Based Dynamic Resource Allocation For Server Clusters Jayanth Gummaraju 1 and Yoshio Turner 2 1 Stanford University, CA, USA 2 Hewlett-Packard.
The Measured Access Characteristics of World-Wide-Web Client Proxy Caches Bradley M. Duska, David Marwood, and Michael J. Feeley Department of Computer.
An Overview of Proxy Caching Algorithms Haifeng Wang.
MiddleMan: A Video Caching Proxy Server NOSSDAV 2000 Brian Smith Department of Computer Science Cornell University Ithaca, NY Soam Acharya Inktomi Corporation.
On the Placement of Web Server Replicas Yu Cai. Paper On the Placement of Web Server Replicas Lili Qiu, Venkata N. Padmanabhan, Geoffrey M. Voelker Infocom.
An Analysis of Internet Content Delivery Systems 19 rd November, 2007 Youngsub CSE, SNU.
On the scale and performance of cooperative Web proxy caching 2/3/06.
© 2003, Carla Ellis Strong Inference J. Pratt Progress in science advances by excluding among alternate hypotheses. Experiments should be designed to disprove.
1 Evaluation of Cooperative Web Caching with Web Polygraph Ping Du and Jaspal Subhlok Department of Computer Science University of Houston presented at.
On Caching Search Engine Query Results Evangelos Markatos Evangelos Markatoshttp://archvlsi.ics.forth.gr/OS/os.html Computer Architecture and VLSI Systems.
Measurement-based Design
Presented by Tashana Landray
Web Caching? Web Caching:.
PA an Coordinated Memory Caching for Parallel Jobs
Internet Networking recitation #12
Networking Computer network A collection of computing devices that are connected in various ways in order to communicate and share resources Usually,
Cluster Resource Management: A Scalable Approach
CLUSTER COMPUTING.
Cooperative Caching, Simplified
Applying SVM to Data Bypass Prediction
EE 122: Lecture 22 (Overlay Networks)
Midterm.
Presentation transcript:

On the Scale and Performance of Cooperative Web Proxy Caching University of Washington Alec Wolman, Geoff Voelker, Nitin Sharma, Neal Cardwell, Anna Karlin, Henry Levy

Caching for a Better Web Performance is a major concern in the Web Proxy caching is the most commonly used method to improve Web performance Duplicate requests to the same document served from cache Hits reduce latency, network utilization, server load Misses increase latency (extra hops) Clients Proxy Cache Servers Hits Misses Internet

Cache Effectiveness Previous work has shown that hit rate increases with population size [ Duska et al. 97, Breslau et al. 98 ] A single proxy cache has practical limits Load, network topology, organizational constraints One technique to scale the client population is to have proxy caches cooperate

Cooperative Web Proxy Caching Sharing and/or coordination of cache state among multiple Web proxy cache nodes Effectiveness of proxy cooperation depends on: Inter-proxy communication distance Size of client population served Proxy utilization and load balance Clients Proxy Internet

Cooperative Web Caching How much benefit does cooperative caching provide in the Web environment?

Outline Introduction & related work Trace methodology Cooperative caching for small & medium scale populations Scaling to larger client populations Latency Conclusions

Previous Research Cooperative proxy caching is a popular research topic: [e.g. Chankhunthod et al. 96, Zhang et al. 97, Fan et al. 98, Krishnan et al. 98, Menaud et al. 98, Tewari et al. 98, Touch 98, Karger et al. 99 ...] Focus is on highly scalable algorithms Some seek to scale to the entire Web

Challenges No real understanding of document sharing across diverse organizations Little analytic or empirical evaluation of these algorithms using realistic workloads for large-scale client populations Problem: Evaluating cooperative proxy caching requires multiple simultaneous traces of Web proxies, across a diverse set of organizations

Our Contribution We use multi-organization traces to evaluate cooperative proxy caching at small and medium scales We use analytic modelling to evaluate cooperative caching at scales beyond those available in our traces

A Multi-Organization Trace University of Washington (UW) is a large and diverse client population Approximately 50K people UW client population contains 200 independent campus organizations Museums of Art and Natural History Schools of Medicine, Dentistry, Nursing Departments of Computer Science, History, and Music A trace of UW is effectively a simultaneous trace of 200 diverse client organizations

Cooperation Across Organizations By considering each UW organization as an independent “company” with its own clients and its own proxy, we can empirically evaluate cooperative caching across diverse client populations How much Web document reuse is there between these organizations? Place a proxy cache in front of each organization. What is the benefit of cooperative caching among these 200 proxies?

UW Trace Characteristics Trace collected at UW network border (May 1999) Filtered: requests from UW clients, responses from external Web servers Most of requests come directly from clients (0.5% come from proxies)

Question What is the benefit of cooperative caching among the 200 UW organizational proxies?

Ideal Hit Rates for UW proxies Ideal hit rate - infinite storage, ignore cacheability, expirations Average ideal local hit rate: 43%

Ideal Hit Rates for UW proxies Ideal hit rate - infinite storage, ignore cacheability, expirations Average ideal local hit rate: 43% Explore benefits of perfect cooperation rather than a particular algorithm Average ideal hit rate increases from 43% to 69% with cooperative caching

Cacheable Hit Rates for UW proxies Cacheable hit rate - same as ideal, but doesn’t ignore cacheability Cacheable hit rates are much lower than ideal (average is 20%) Average cacheable hit rate increases from 20% to 41% with (perfect) cooperative caching

Scaling Cooperative Caching Organizations of this size can benefit significantly from cooperative caching We don’t need cooperative caching to handle the entire UW population size A single proxy (or small cluster) can handle this entire population! No technical reason to use cooperative caching for this environment In the real world, decisions of proxy placement are often political or geographical How effective is cooperative caching at scales where a single cache will not work?

Hit Rate vs. Client Population Curves similar to other studies [e.g., Duska97, Breslau98] Small organizations Significant increase in hit rate as client population increases The reason why cooperative caching is effective for UW Large organizations Marginal increase in hit rate as client population increases

Extrapolation to Larger Client Populations Use least squares fit to create a linear extrapolation of hit rates Hit rate increases logarithmically with client population, e.g., to increase hit rate by 10%: Need 8 UWs (ideal) Need 11 UWs (cacheable) “Low ceiling”, though: 100% at 11.3M clients (UW ideal) 61% at 2.1M clients (UW cacheable) A city-wide cooperative cache would get all the benefit

Question What is the benefit of cooperative caching among large organizations?

UW & Microsoft Cooperation What if we ran a wire across Lake Washington, to connect UW & Microsoft? We collected a Microsoft proxy trace during same time period as the UW trace Combined population is ~80K clients Increases the UW population by a factor of 3.6 Increases the MS population by a factor of 1.4

UW & Microsoft Traces

UW & MS Cooperative Caching Is this worth it?

What about Latency? From the client’s perspective, latency matters far more than hitrate How does latency change with population? Median latencies improve only a few 100 ms with ideal caching compared to no caching. On average, a web page consists of 4.5 HTTP objects

Conclusions A negative result: without significant workload changes, designing highly-scalable cooperative proxy-cache schemes is unnecessary Largest benefit is achieved with small populations (up to 2K-5K clients) Limited benefit of cooperation when we combined the UW & Microsoft populations Document cacheability is a severe limitation with current workloads Analytic model results: Confirm that most of benefit is obtained once you reach populations the size of a large city Future workloads: large-scale cooperative caching could become more relevant with different rate-of-change characteristics

UW Cooperative Caching Results

Extrapolating UW & MS Hit Rates

UW Latency

Complete Hit Rate Graph

Blank Slide blankness here...