The Content and Access Dynamics of a Busy Web Server: Findings and Implications. Venkata N. Padmanabhan (Microsoft Research), Lili Qiu (Cornell University)

Presentation transcript:

1 The Content and Access Dynamics of a Busy Web Server: Findings and Implications Venkata N. Padmanabhan Microsoft Research Lili Qiu Cornell University SIGCOMM’2000, Stockholm, Sweden August 30, 2000

2 Outline Motivation Related Work Overview Content Dynamics Access Dynamics Summary & Implications Future Work

3 Motivation A solid understanding of Web workload is critical for designing robust and scalable systems Each of the Web components provides a unique perspective on the functioning of the Web [Figure: clients, proxies, the Internet, replicas, and servers]

4 Motivation (Cont.) Distinguishing features of our work Study the MSNBC web site, a large news server consistently ranked among the busiest sites on the Web Study content & access dynamics The dynamics of file modification and creation The dynamics of user accesses

5 Related Work Server-based study [ABC+96] observed File popularity follows Zipf’s distribution (α ≈ 1) Temporal locality in file accesses [AW96] found 10 invariants 10% files account for 90% accesses [MS97] Long latencies are not necessarily due to server overloading or CGI traffic [AJ99] studied 1998 World Cup traces Significant volume of cache consistency traffic

6 Related Work (Cont.) Proxy workload characterization Page popularity follows a Zipf-like distribution, i.e. request frequency ∝ 1/i^α (α < 1) [BCF+99] Hit rate of proxy caches no more than 50% [DMF97,GB97] A substantial fraction of misses arises from first-time accesses [VDA+99] Significance in organizational membership [WVS+99] Client-based study [CBC95] and [BBB+98] report Change in file popularity and temporal locality

7 Overview MSNBC server site a large news site server cluster with 40 nodes 25 million accesses a day (HTML content alone) Period studied: Aug. – Oct. 99 & Dec. 17, 98 flash crowd Server logs HTTP access logs Content Replication System (CRS) logs HTML content logs Data analysis Content dynamics Access dynamics

8 Major Findings Content dynamics Modification history is a rough predictor Frequent but minimal file modifications Access dynamics Set of popular files remains stable for days Domain membership has a significant bearing on client accesses except during a flash crowd of global interest Zipf-like distribution of file popularity but with a much larger α than at proxies Accesses to old documents account for most first-time misses → hard to anticipate such accesses, and eliminate these first-time misses

9 Content Dynamics Period studied: 10/1/99 – 10/28/99 CDF of modification intervals Distinct knees in the CDF at one hour and one day Predictive power of modification history Modification history is a rough predictor of future modification interval Extent of change upon file modification Most file modifications are minimal → delta encoding can be very useful

10 CDF of Modification Intervals Distinct knees in the CDF at one hour and one day

11 Predictive Power of Modification History Has significant bearing on cache consistency control algorithms, such as adaptive TTL Prediction algorithm studied Estimate the future modification interval as the mean of the past x samples Performance metrics Correlation coefficient between the predicted and actual values Error in prediction
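
The predictor and the two metrics named on this slide can be sketched as follows. This is a minimal illustration, not the authors' code: the function names are hypothetical, and the percentage error is assumed to be |predicted − actual| / actual, which the slides do not spell out.

```python
from statistics import mean


def predict_intervals(intervals, window):
    """Pair each interval with the mean of the `window` intervals before it.

    Returns a list of (predicted, actual) tuples.
    """
    pairs = []
    for k in range(window, len(intervals)):
        predicted = mean(intervals[k - window:k])
        pairs.append((predicted, intervals[k]))
    return pairs


def correlation(pairs):
    """Pearson correlation coefficient between predicted and actual values."""
    xs = [p for p, _ in pairs]
    ys = [a for _, a in pairs]
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in pairs)
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # undefined for constant series; report no correlation
    return cov / (var_x * var_y) ** 0.5


def percentage_errors(pairs):
    """Assumed error definition: |predicted - actual| / actual, in percent."""
    return [abs(p - a) / a * 100.0 for p, a in pairs]


# Hypothetical modification intervals (seconds) for a single file.
history = [3600, 3500, 3700, 3600, 7200, 3600]
pairs = predict_intervals(history, window=2)
print(correlation(pairs), percentage_errors(pairs))
```

Sweeping `window` over several values and comparing the resulting correlation coefficients is one way to reproduce the kind of window-size study the next slide reports.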

12 Correlation Coefficient A larger averaging window size helps to predict the future modification interval up to a certain point.

13 Error in Prediction Averaging window: 16 samples Mean error: 226% Median error: 45% Percentage error in predicting file modification interval Modification history yields a rough predictor → need alternative mechanism (e.g. call-back based invalidation) as backup

14 Extent of Change Upon File Modifications Compute delta using vdelta algorithm Metric: δ = |vdelta(v1, v2)| / ((|v1| + |v2|) / 2) Results: In 77% cases, δ ≤ 1% In 96% cases, δ ≤ 10% Modification between successive versions is small → Delta encoding can be very useful
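
The metric on this slide divides the size of the delta between two versions by the mean size of those versions. The sketch below illustrates that ratio in Python. It does not implement vdelta: as a loudly labeled stand-in, it compresses the new version with zlib using the old version as a preset dictionary, and the function names are hypothetical.

```python
import zlib


def delta_size(old: bytes, new: bytes) -> int:
    """Approximate the delta size between two versions.

    Stand-in for |vdelta(v1, v2)|, NOT the vdelta algorithm: `new` is
    deflated with `old` as a preset dictionary, so content shared with
    `old` costs very little. zlib only looks back 32 KB, so this is only
    meaningful for small files.
    """
    compressor = zlib.compressobj(level=9, zdict=old)
    return len(compressor.compress(new) + compressor.flush())


def delta_ratio(old: bytes, new: bytes) -> float:
    """Delta size divided by the mean size of the two versions."""
    mean_size = (len(old) + len(new)) / 2
    return delta_size(old, new) / mean_size


# Hypothetical page with a one-word edit between versions.
v1 = b"<html><body>" + b"Top story of the hour. " * 40 + b"</body></html>"
v2 = v1.replace(b"hour", b"day", 1)
print(f"delta ratio: {delta_ratio(v1, v2):.3f}")
```

A ratio near 0 means the new version is almost entirely reproducible from the old one, which is the situation where delta encoding pays off.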

15 Access Dynamics Correlation between content and access dynamics Impact of age on file popularity Causes of first-time misses Spatial locality in client accesses Domain membership is significant except when there is a “hot” event of global interest Temporal stability of file popularity The set of popular documents mostly remains stable over a timescale of days Distribution of file popularity Zipf-like distribution but with a much larger α than at proxies

16 Impact of Age on Popularity For most documents, accesses are concentrated soon after creation

17 Causes of First-time Misses Up to 40% of cache misses are due to first-time misses [VDA+99]
Table columns: Date, New files (%), Old files (%); rows: Oct. 8, Oct. 9, Oct. 10, Oct. 11 (values missing from this transcript)
Accesses to old documents account for most first-time misses → hard to anticipate such accesses & eliminate first-time misses

18 Temporal Stability of File Popularity Methodology Consider the traces from a pair of days Pick the top n popular documents from each day Compute the overlap Results One day apart: significant overlap (≥ 80%) Two months apart: smaller overlap (20-80%) Ten months apart: very small overlap (mostly below 20%) The set of popular documents remains stable for days
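
The three-step methodology on this slide (take a pair of days, pick the top n documents from each, compute the overlap) can be sketched as below. The function names are hypothetical, and expressing the overlap as a fraction of n is an assumption about how the slide's percentages were computed.

```python
from collections import Counter


def top_n(requests, n):
    """Return the set of the n most frequently requested documents."""
    return {doc for doc, _ in Counter(requests).most_common(n)}


def overlap(day_a, day_b, n):
    """Fraction of the top-n documents shared by two days' traces."""
    return len(top_n(day_a, n) & top_n(day_b, n)) / n


# Hypothetical traces: each entry is the URL of one request.
monday = ["/news"] * 5 + ["/sports"] * 4 + ["/weather"] * 3 + ["/tech"]
tuesday = ["/news"] * 6 + ["/weather"] * 5 + ["/health"] * 4 + ["/sports"]
print(overlap(monday, tuesday, n=3))
```

`Counter.most_common` breaks ties by first appearance, so for real traces with many equally popular documents the choice of tie-breaking rule can shift the overlap slightly.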

19 Spatial Locality in Client Accesses Domain membership is significant except when there is a “hot” event of global interest

20 The Applicability of Zipf-law to Web requests The Web requests follow Zipf-like distribution Request frequency ∝ 1/i^α, where i is a document’s ranking The value of α is much larger in MSNBC traces 1.4 – 1.8 in MSNBC traces smaller or close to 1 in the proxy traces close to 1 in the small departmental server logs [ABC+96] Highest when there is a hot event
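
One simple way to estimate the exponent α of a Zipf-like distribution from per-document request counts is a least-squares line fit of log(frequency) against log(rank), sketched below. This is an illustrative assumption, not necessarily the fitting procedure the authors used, and the function name is hypothetical.

```python
import math


def fit_zipf_alpha(counts):
    """Estimate alpha in frequency ∝ 1 / rank**alpha.

    Fits a straight line to (log rank, log count) by ordinary least
    squares and returns the negated slope. Needs at least two documents
    with a positive request count.
    """
    ranked = sorted((c for c in counts if c > 0), reverse=True)
    xs = [math.log(rank) for rank in range(1, len(ranked) + 1)]
    ys = [math.log(c) for c in ranked]
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return -slope


# Synthetic counts that follow an exact power law with alpha = 1.5.
synthetic = [1_000_000 / rank ** 1.5 for rank in range(1, 201)]
print(round(fit_zipf_alpha(synthetic), 3))
```

On real traces the tail of rarely requested documents tends to bend the log-log plot, so the estimate depends on how many ranks are included in the fit.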

21 Impact of larger α Accesses in MSNBC traces are much more concentrated 90% of the accesses are accounted for by Top 2-4% files in MSNBC traces Top 36% files in proxy traces (Microsoft proxies and the proxies studied in [BCF+99]) Top 10% files in small departmental server logs reported in [AW96] Popular news sites like MSNBC see much more concentrated accesses → Reverse caching and replication can be very effective!
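
The concentration figures on this slide answer one question: what fraction of files, taken in order of popularity, is needed to cover 90% of accesses? A minimal sketch of that computation follows; the function name and the sample counts are hypothetical.

```python
def fraction_of_files_for_coverage(counts, coverage=0.9):
    """Smallest fraction of files (most popular first) whose requests
    add up to at least `coverage` of all requests."""
    ranked = sorted(counts, reverse=True)
    target = coverage * sum(ranked)
    running = 0
    for files_used, count in enumerate(ranked, start=1):
        running += count
        if running >= target:
            return files_used / len(ranked)
    return 1.0


# Hypothetical per-file request counts with one dominant file.
counts = [950, 20, 10, 10, 10]
print(fraction_of_files_for_coverage(counts))
```

The smaller this fraction, the fewer files a reverse cache or replica needs to hold to absorb most of the load.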

22 Summary of Results & Implications
Fact: Past modification history, when averaged over a sufficiently large window, yields a rough predictor. Implication: Guide for setting TTL, but need alternative mechanism (e.g. callback-based invalidation) as backup.
Fact: Modification between successive versions is small. Implication: Delta encoding can be very useful.

23 Summary of Results & Implications (Cont.)
Fact: The set of popular documents remains stable over a timescale of days. Implication: Prefetch/push previously popular files that have undergone modification.
Fact: File popularity follows Zipf-like distribution, but with a much larger α than at proxies. Implication: Potential of reverse caching & replication.
Fact: Accesses to old documents account for most first-time accesses. Implication: Hard to anticipate such accesses, and eliminate first-time misses.

24 Future Work Study data sets from other large server sites Different types of Web servers may have very different workloads More studies such as ours will be needed Develop efficient cache consistency algorithms

25 Acknowledgement Jason Bender and Ian Marriott Erich Nahum Kiem-Phong Vo Damon Cole, Susan Dumais, Niccole Golden, Chris Haslam, Eric Horvitz, Geoff Voelker Anonymous reviewers