Application Measurements: Web Measurement. Motivation Web is the single most popular Internet application. Measurement can be very useful.

Slides:



Advertisements
Similar presentations
Inktomi Confidential and Proprietary The Inktomi Climate Lab: An Integrated Environment for Analyzing and Simulating Customer Network Traffic Stephane.
Advertisements

Internet Applications INTERNET APPLICATIONS. Internet Applications Domain Name Service Proxy Service Mail Service Web Service.
Ningning HuCarnegie Mellon University1 Optimizing Network Performance In Replicated Hosting Peter Steenkiste (CMU) with Ningning Hu (CMU), Oliver Spatscheck.
Web Measurement Chapter 7, Section 7.3 Hessam Mirsadeghi ECE Department, University of Tehran Fall 2009.
Web Cache Characterizing Roles of Front-end Servers in End-to-End Performance of Dynamic Content Distribution  Li ZHANG  Dakuo WANG.
Outline Web measurement motivation Challenges of web measurement Web measurement tools Current web measurements Web properties Web traffic data gathering.
Web Caching Schemes1 A Survey of Web Caching Schemes for the Internet Jia Wang.
Web Server Hardware and Software
An Analysis of Internet Content Delivery Systems Stefan Saroiu, Krishna P. Gommadi, Richard J. Dunn, Steven D. Gribble, and Henry M. Levy Proceedings of.
CDNs & Replication Prof. Vern Paxson EE122 Fall 2007 TAs: Lisa Fowler, Daniel Killebrew, Jorge Ortiz.
Design, Implementation and Evaluation of a Client Characterization Driven Web Server Balachander Krishnamurthy – AT&T Research Labs Balachander Krishnamurthy.
Flash Crowds And Denial of Service Attacks: Characterization and Implications for CDNs and Web Sites Aaron Beach Cs395 network security.
1 Drafting Behind Akamai (Travelocity-Based Detouring) AoJan Su, David R. Choffnes, Aleksandar Kuzmanovic, and Fabian E. Bustamante Department of Electrical.
SESSION 9 THE INTERNET AND THE NEW INFORMATION NEW INFORMATIONTECHNOLOGYINFRASTRUCTURE.
Evaluation of the Proximity between Web Clients and their Local DNS Servers Z. Morley Mao UC Berkeley C. Cranor, M. Rabinovich,
1 Web Content Delivery Reading: Section and COS 461: Computer Networks Spring 2007 (MW 1:30-2:50 in Friend 004) Ioannis Avramopoulos Instructor:
Unconstrained Endpoint Profiling (Googling the Internet)‏ Ionut Trestian Supranamaya Ranjan Aleksandar Kuzmanovic Antonio Nucci Northwestern University.
Network-Aware Clustering of Web Clients Advanced IP Topics Seminar, Fall 2000 Supervisor: Anat Bremler Speaker: Zotenko Elena.
Evaluation of the Proximity between Web Clients and their Local DNS Servers Z. Morley Mao Chuck Cranor, Fred Douglis, Misha Rabinovich, Oliver Spatscheck,
Caching and Content Distribution Networks. Web Caching r As an example, we use the web to illustrate caching and other related issues browser Web Proxy.
 Proxy Servers are software that act as intermediaries between client and servers on the Internet.  They help users on private networks get information.
Information-Centric Networks05a-1 Week 5 / Paper 1 On the use and performance of content distribution networks –Balachander Krishnamurthy, Craig Wills,
1 Content Distribution Networks. 2 Replication Issues Request distribution: how to transparently distribute requests for content among replication servers.
On the Use and Performance of Content Distribution Networks Balachander Krishnamurthy Craig Wills Yin Zhang Presenter: Wei Zhang CSE Department of Lehigh.
Content Distribution March 8, : Application Layer1.
Active Network Applications Tom Anderson University of Washington.
1 Proceeding the Second Exercises on Computer and Systems Engineering Professor OKAMURA Laboratory. Othman Othman M.M.
/dev/urandom Barry Britt, Systems Support Group Department of Computer Science Iowa State University.
{ Content Distribution Networks ECE544 Dhananjay Makwana Principal Software Engineer, Semandex Networks 5/2/14ECE544.
Krerk Piromsopa. Advance Net-Centric Computing Technology Krerk Piromsopa. Department of Computer Engineering. Chulalongkorn University.
Advanced Network Architecture Research Group 2001/11/149 th International Conference on Network Protocols Scalable Socket Buffer Tuning for High-Performance.
Lecturer: Ghadah Aldehim
Copyright © 2002 Pearson Education, Inc. Slide 3-1 CHAPTER 3 Created by, David Zolzer, Northwestern State University—Louisiana The Internet and World Wide.
© 2006 Cisco Systems, Inc. All rights reserved.Cisco Public 1 Version 4.0 Identifying Application Impacts on Network Design Designing and Supporting Computer.
Component 4: Introduction to Information and Computer Science Unit 2: Internet and the World Wide Web Lecture 2 This material was developed by Oregon Health.
Master Thesis Defense Jan Fiedler 04/17/98
2: Application Layer1 Chapter 2 outline r 2.1 Principles of app layer protocols r 2.2 Web and HTTP r 2.3 FTP r 2.4 Electronic Mail r 2.5 DNS r 2.6 Socket.
© 2006 Cisco Systems, Inc. All rights reserved.Cisco PublicITE I Chapter 6 1 Identifying Application Impacts on Network Design Designing and Supporting.
Hour 7 The Application Layer 1. What Is the Application Layer? The Application layer is the top layer in TCP/IP's protocol suite Some of the components.
Advanced Network Architecture Research Group 2001/11/74 th Asia-Pacific Symposium on Information and Telecommunication Technologies Design and Implementation.
Hands-On Microsoft Windows Server Implementing Microsoft Internet Information Services Microsoft Internet Information Services (IIS) –Software included.
1 UNIT 13 The World Wide Web Lecturer: Kholood Baselm.
Measuring and Mitigating Web Performance Bottlenecks in Broadband Access Networks Srikanth Sundaresan, Nick Feamster (Georgia Tech) Renata Teixeira (Inria)
Doc.: IEEE /1317r0 Submission December 2009 Vinko Erceg, BroadcomSlide 1 Internet Traffic Modeling Date: Authors: NameAffiliationsAddressPhone .
On the Impact of Clustering on Measurement Reduction May 14 th, D. Saucez, B. Donnet, O. Bonaventure Thanks to P. François.
27.1 Chapter 27 WWW and HTTP Copyright © The McGraw-Hill Companies, Inc. Permission required for reproduction or display.
Overlay Networks : An Akamai Perspective
On the Effect of Server Adaptation for Web Content Delivery IMW ’ 02, Marseille, Nov Joint work with Balachander Krishnamurthy (AT&T) Craig Wills.
Information-Centric Networks Section # 5.1: Content Distribution Instructor: George Xylomenos Department: Informatics.
Content Delivery Networks: Status and Trends Speaker: Shao-Fen Chou Advisor: Dr. Ho-Ting Wu 5/8/
09/13/04 CDA 6506 Network Architecture and Client/Server Computing Peer-to-Peer Computing and Content Distribution Networks by Zornitza Genova Prodanoff.
An Analysis of Internet Content Delivery Systems 19 rd November, 2007 Youngsub CSE, SNU.
#16 Application Measurement Presentation by Bobin John.
John S. Otto Mario A. Sánchez John P. Rula Fabián E. Bustamante Northwestern, EECS.
Performance Evaluation of Redirection Schemes in Content Distribution Networks Jussi Kangasharju, Keith W. Ross Institut Eurecom Jim W. Roberts France.
Whole Page Performance Leeann Bent and Geoffrey M. Voelker University of California, San Diego.
1 UNIT 13 The World Wide Web. Introduction 2 Agenda The World Wide Web Search Engines Video Streaming 3.
1 UNIT 13 The World Wide Web. Introduction 2 The World Wide Web: ▫ Commonly referred to as WWW or the Web. ▫ Is a service on the Internet. It consists.
NT1210 Introduction to Networking
Heat-seeking Honeypots: Design and Experience John P. John, Fang Yu, Yinglian Xie, Arvind Krishnamurthy and Martin Abadi WWW 2011 Presented by Elias P.
Presentation on Distributed Web Based Systems Submitted by WWW
Content Distribution Networks
The Internet.
Processes The most important processes used in Web-based systems and their internal organization.
CHAPTER 3 Architectures for Distributed Systems
Early Measurements of a Cluster-based Architecture for P2P Systems
Section 14.1 Section 14.2 Identify the technical needs of a Web server
Content Distribution Networks
EE 122: Lecture 22 (Overlay Networks)
Unconstrained Endpoint Profiling (Googling the Internet)‏
Presentation transcript:

Application Measurements: Web Measurement

Motivation Web is the single most popular Internet application. Measurement can be very useful.

Stanford versus MIT Web Stanford MIT StanfordMIT Users with non-empty WWW directories Percent who link to at least one other person14%33% Percent who are linked to by at least one other person 22%58% Percent with links in either direction29%69% Percent with links in both directions7%22%

Bow-tie of the WWW

Challenges to measurement Hidden Data – Much of the traffic is intra-net and inaccessible. – Access to remote server data, even old logs is often unavailable. – From the server end, information about the clients (e.g. connection bandwidth) is obscured. Hidden layers – Measuring the in flight packets is much harder than measuring the server response time the protocol and network layers are harder to measure. Hidden entities – The web involves proxies, HTTP and TCP redirectors

Tools: Sampling and DNS Sampling traffic (e.g. netflow) can help determine the fraction of HTTP traffic. Examine DNS records. – Well know sites are more likely to be looked up often.

Tools: Server logs From a web server perspective, you can examine the server logs. However, there are some challenges here: – Web crawlers – Clients hidden behind proxies

Tools: Surveys Estimating the number of web servers can be done via surveys. Users can download a tool bar and rank sites.

Tools: Locating servers We might assume that the servers for a site would be in a fixed geographical location. However: – Servers can be mirrored in different locations – Several businesses can use the same server farm to increase utilization.

Tools: Web crawling

Tools: Web performance Approaches: – Measuring a particular web site’s latency and availability form a number of client perspectives. – Examining different latency components such as DNS, TCP or HTTP differences, and CDNs – Global measurements of the web to examine protocol compliance, ensure reduction of outages and look at the dark site of the web. A variety of companies offer such services: – Keynote, Akamai, etc.

Tools: Role of Network aware clustering We can cluster groups of IP addresses using BGP routing table snapshots and longest prefix matching. This clustering allows for better analysis of server logs. Balachander Krishnamurthy and Jia Wang. On Network-Aware Clustering of Web Clients. In Proceedings of ACM Sigcomm, August 2000.

Tools: Handling mobile clients Jesse Steinberg and Joseph Pasquale. A Web Middleware Architecture for Dynamic Customization of Content for Wireless Clients. In Proceedings of the World Wide Web Conference, May 2002.

Tools: Handling mobile clients Figure 3. Document Browsing with Summarizer on WAP Christopher C. Yang and Fu Lee Wang. Fractal Summarization for Mobile Devices to Access Large Documents on the Web. In Proceedings of the World Wide Web Conference, May 2003.

Tools: Handling mobile clients Mobile web use continues to grow. Similar methods: – Server logs of mobile content providers – Lab experiments (e.g emulate mobile devices, induce packet loss) – Wide-area experiments

State of the Art Four main parts of Web Measurement: – High level characterization (properties) – Traffic gathering and analysis – Performance issues (CDNs, client connectivity, compliance) – Applications (searching, flash crowds, blogs)

Web properties: high level The number of Web sites numbers in the tens of millions. Popular search engines index billions of web pages, and exclude private Intranets. There has been a shift from Web, to P2P and now to CDN in the traffic patterns of the Internet. Monthly surveys by sites like Netcraft have shown around a million new sites a month. Estimates in the fall of 2014 showed 959 million web sites, – the vast majority have little or no traffic compared to the top 180 K – 39 million in March 2014

Web Properties: High level Netcraft survey. (news.netcraft.com)

Web Properties: High Level Netcraft survey. (news.netcraft.com)

Web properties: Location Steadily number of users are in Asian countries such as China and India. The fraction of web content from the US and Europe is falling.

Web properties: Configuration Popular sites use a variety of techniques to improve server performance: – Distribute servers geographically (e.g. 3 world cup servers in the U.S., 1 in France) – Use a reverse proxy to cache common requests. – CDNs – Cloud Figure 10-10: Cisco DistributedDirector

Web properties: User workload Models We measure user workload by looking at: – the duration of HTTP connections – request and response sizes, – unique number of IP addresses contacting a given Web site – number of distinct sites accessed by a client population, number – frequency of accesses of individual resources at a given Web site – distribution of request methods and response codes

Web properties: Traffic perspective Redirector devices at the edge of an ISP network can serve web pages from a cache These traditional caches are still sold. Reduction in cache hit rates have prompted companies (e.g. NetScaler, Redline) to integrate caching with other services.

Web Traffic: Software Aid In order to study the web traffic, a large number of geographically separate measurements need to be repeatedly done. httperf: – Sends HTTP requests and processes responses – Simulates workload – Gathers statistics

Web Traffic: Software Aid (2) wget – Fetches a large number of pages located at a root node. – Can fetch all the pages up to a certain “level” according to links Mercator (a personalized crawler) – Uses a seed page and then does breadth-first search on the links to find pages.

Web Performance: Intro User-perceived latency is a key factor because it affects the popularity of a site. In one study that passively gathered HTTP data for one day found that beyond a certain delay, user cancellations of the page increased sharply.

Web Performance: CDN’s Content distribution networks (CDNs) combine the workload of several sites into a single provider. The CDNs can be mirrored to be located near clients. – DNS can be used to redirect clients to mirror sites.

How CDN Works

Web Performance: CDNs Zhuoqing Morley Mao, Charles D. Cranor, Fred Douglis, Michael Rabinovich, Oliver Spatscheck, and Jia Wang. A precise and efcient evaluation of the proximity between web clients and their local DNS servers. In Proceedings of the USENIX Technical Conference, Monterey, CA, June 2002.

Web Performance: CDNs Balachander Krishnamurthy, Craig Wills, and Yin Zhang. On the use and performance of content distribution networks. In Proceedings of the ACM SIGCOMM Internet Measurement Workshop, San Francisco, November 2001.

Web performance: Client connectivity It is not practical to dynamically query a client’s connectivity type, however such data can be stored on a server. We can measure the inter-arrival time of requests. – Clients with higher bandwidth connections are more likely to request pages sooner. If we assume that client connectivity will be stationary (as one experiment showed), then we can adapt the server response based on the client connectivity

Web performance: Client connectivity Balachander Krishnamurthy, Craig E. Wills, Yin Zhang, and Kashi Vishwanath. Design, Implementation, and Evaluation of a Client Characterization Driven Web Server. In Proceedings of the World Wide Web Conference, May Server Action conclusions: - Compression - consistently good results for poorer but not well- connected clients. - Reducing the quality of objects only yielded benefits for a modem client. - Bundling was effective when there was good connectivity or poor connectivity with large latency. - Persistent connections with serialized requests did not show significant improvement - Pipelining was only significant for client with high throughput or RTT.

Web performance: protocol compliance A 16-month study used the httperf tool to test for HTTP protocol compliance. The popular Apache server was most compliant, then Microsoft’s IIS.