On Network-Aware Clustering of Web Clients Balachander Krishnamurthy AT&T Labs-Research, Florham Park, NJ, USA Jia Wang


On Network-Aware Clustering of Web Clients
Balachander Krishnamurthy, AT&T Labs-Research, Florham Park, NJ, USA
Jia Wang, Cornell University, Ithaca, NY, USA

ACM SIGCOMM 2000: On Network-Aware Clustering of Web Clients

Outline
- Introduction
- Simple approaches to clustering
- Network-aware approach
- Applications of client clustering
- Conclusion and future work

Introduction
- Original goal: identify the groups of clients that are responsible for a significant portion of a Web site's requests
- Cluster properties:
  - Non-overlapping
  - Topologically close
  - Under common administrative control
- But identifying clusters requires knowledge that is not available to anyone outside the administrative entities.
- Network-aware approach: BGP based

Simple approaches
Two approaches:
1. Use traditional Class A, Class B, and Class C networks
2. Assume the prefix length is 24 bits
They are simple but do not give good results (~50% accuracy).

Counterexample:
IP address   Name                            Prefix/netmask
             client bellatlantic.net         /
             mailsrv1.wakefern.com           /
             firewall.commonhealthusa.com    /28
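The two simple approaches can be sketched in a few lines. This is only an illustration, assuming IPv4 addresses and Python's standard `ipaddress` module; the function names `classful_prefix` and `fixed24_prefix` are hypothetical and do not come from the talk.

```python
import ipaddress


def classful_prefix(ip: str) -> str:
    """Simple approach 1: map an IPv4 address to its Class A/B/C network."""
    addr = ipaddress.IPv4Address(ip)
    first_octet = int(addr) >> 24
    if first_octet < 128:        # Class A: leading bit 0, 8-bit prefix
        length = 8
    elif first_octet < 192:      # Class B: leading bits 10, 16-bit prefix
        length = 16
    elif first_octet < 224:      # Class C: leading bits 110, 24-bit prefix
        length = 24
    else:
        raise ValueError(f"{ip} is Class D/E and has no classful network")
    return str(ipaddress.ip_network(f"{ip}/{length}", strict=False))


def fixed24_prefix(ip: str) -> str:
    """Simple approach 2: assume every client sits in a 24-bit prefix."""
    return str(ipaddress.ip_network(f"{ip}/24", strict=False))


if __name__ == "__main__":
    # Two hosts that a /28 allocation would separate land in one /24 cluster.
    print(fixed24_prefix("192.0.2.5"), fixed24_prefix("192.0.2.200"))
    print(classful_prefix("172.16.5.9"))
```

Neither function consults routing data, which is why an allocation such as the /28 in the counterexample is merged with its neighbours.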

Network-aware approach
- Use BGP routing and forwarding table snapshots
- Routing table entries -> clusters

Example snapshot of a BGP routing table:
Prefix  Prefix description                      Next hop            AS path      Peer AS description
/8      Army Information System Center          cs.ny-nap.vbns.net  (IGP)        AT&T Government Markets
/20     Harvard University                      cs.cht.vbns.net     1742 (IGP)   Harvard University
/8      Massachusetts Institute of Technology   cs.cht.vbns.net     3 (IGP)      Massachusetts Institute of Technology

Automated process
Clustering process:
- Source of IP addresses -> IP address extraction -> IP addresses
- BGP routing tables -> Prefix extraction, unification, merging -> Prefix table
- IP addresses + prefix table -> Client cluster identification -> Raw client clusters
- Raw client clusters -> Validation (optional); examining impact of network dynamics -> Client clusters
- Self-correction and adaptation

Network prefix extraction
- Prefix entry extraction: BGP tables from 14 places via automated scripts: AADS, MAE-EAST, MAE-WEST, PACBELL, PAIX, ARIN, AT&T-Forw, AT&T-BGP, CANET, CERFNET, NLANR, OREGON, SINGAREN, and VBNS.
- Prefix format unification and merging. Three formats:
  - x1.x2.x3.x4/k1.k2.k3.k4
  - x1.x2.x3.x4/m
  - x1.x2.x3.0
- Assembled a total of 391,497 unique prefix entries (412,109 entries by 7/24/2000)
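A minimal sketch of the unification and merging step, assuming IPv4 and Python's standard `ipaddress` module. The names `unify_prefix` and `merge_tables` are hypothetical. The slide does not say how the bare `x1.x2.x3.0` form is expanded; the sketch assumes it denotes a classful network whose length follows from the first octet.

```python
import ipaddress


def unify_prefix(entry: str) -> str:
    """Normalize one routing-table prefix entry to 'a.b.c.d/m' form."""
    entry = entry.strip()
    if "/" in entry:
        # Covers both x1.x2.x3.x4/k1.k2.k3.k4 (dotted netmask)
        # and x1.x2.x3.x4/m (prefix length).
        return str(ipaddress.IPv4Network(entry, strict=False))
    # Bare address: ASSUMPTION, treat it as a classful network.
    first_octet = int(ipaddress.IPv4Address(entry)) >> 24
    if first_octet < 128:
        length = 8
    elif first_octet < 192:
        length = 16
    elif first_octet < 224:
        length = 24
    else:
        raise ValueError(f"cannot infer a prefix length for {entry}")
    return str(ipaddress.IPv4Network(f"{entry}/{length}", strict=False))


def merge_tables(*tables):
    """Union the unified entries of several tables, dropping duplicates."""
    return {unify_prefix(entry) for table in tables for entry in table}


if __name__ == "__main__":
    table_a = ["192.0.2.0/255.255.255.0", "10.0.0.0"]
    table_b = ["192.0.2.0/24", "172.16.0.0/12"]
    print(sorted(merge_tables(table_a, table_b)))
```

Storing every entry in one canonical string form makes duplicates across the source tables collapse in the set.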

Client cluster identification
Methodology:
1. Extract the client IP addresses from the server log
2. Perform longest prefix matching on each client IP address
3. Classify all the client IP addresses that have the same longest matched prefix into a client cluster
Experiments:
- Experiments on a wide range of Web server logs
Results:
- > 99% of clients can be grouped into clusters
- ~ 90% of sampled clusters passed our validation tests
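The methodology can be sketched as follows. This is an illustrative implementation, not the authors' code: it assumes IPv4, uses Python's standard `ipaddress` module, and does the longest-prefix match with a per-length lookup table instead of a trie. The names `build_index`, `longest_match`, and `cluster_clients` are hypothetical.

```python
import ipaddress
from collections import defaultdict


def build_index(prefixes):
    """Index network addresses (as integers) by prefix length."""
    index = defaultdict(set)
    for prefix in prefixes:
        net = ipaddress.IPv4Network(prefix, strict=False)
        index[net.prefixlen].add(int(net.network_address))
    return index


def longest_match(ip, index):
    """Return the longest prefix in the index covering ip, or None."""
    addr = int(ipaddress.IPv4Address(ip))
    for length in sorted(index, reverse=True):   # try longest prefixes first
        mask = (0xFFFFFFFF << (32 - length)) & 0xFFFFFFFF
        if (addr & mask) in index[length]:
            return f"{ipaddress.IPv4Address(addr & mask)}/{length}"
    return None


def cluster_clients(client_ips, prefixes):
    """Group client IPs by their longest matched prefix.

    Clients that match no prefix are collected under the key None.
    """
    index = build_index(prefixes)
    clusters = defaultdict(set)
    for ip in client_ips:
        clusters[longest_match(ip, index)].add(ip)
    return dict(clusters)


if __name__ == "__main__":
    table = ["10.0.0.0/8", "10.1.0.0/16", "192.0.2.0/28"]
    clients = ["10.1.2.3", "10.1.9.9", "10.2.0.1", "192.0.2.5", "192.0.2.200"]
    for prefix, members in cluster_clients(clients, table).items():
        print(prefix, sorted(members))
```

A trie would scale better for a table of several hundred thousand prefixes; the per-length sets keep the sketch short.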

Server logs used in our experiments
Log     Description                 Date              Duration (days)  # requests  # clients  # clusters
Apache  Apache site                 10/1/99-11/18/99  49               3,461,361   51,536     35,563
Ew3     AT&T content hosting site   7/1/99-7/31/99    31               1,199,276   21,519     7,754
Nagano  1998 Winter Olympic Games   2/13/98           1                11,665,713  59,582     9,853
Sun     Sun Microsystems site       9/30/97-10/9/97   9                13,871,352  219,528    33,468

Example: Nagano server log

Example: Nagano server log (cont.)

Validation of clustering
- Validation is a fundamentally difficult problem.
- A client cluster may be mis-identified by being too large or too small.
- Two approaches:
  - nslookup-based test
  - Optimized traceroute-based test
- Results on a 1% sample of client clusters:
  - A client cluster counts as mis-identified if even one client in the cluster does not share the same suffix with the others.
  - Error rate of the network-aware approach: ~10%
  - Error rate of the simple approach: ~50%
- Possible reasons for mis-clustering: route aggregation, national gateway proxies
- Effect of BGP prefix changes: < 3% (over 2 weeks)
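The suffix criterion of the nslookup-based test can be illustrated with a small check. This sketch makes several assumptions not stated on the slide: host names have already been resolved (no DNS lookups are performed here), a suffix is the last `labels` dot-separated labels of a name, and clients without a resolvable name are ignored. The names `name_suffix`, `cluster_passes`, and `error_rate` are hypothetical.

```python
def name_suffix(hostname, labels=2):
    """Return the last `labels` labels of a host name, lower-cased."""
    parts = hostname.lower().rstrip(".").split(".")
    return ".".join(parts[-labels:])


def cluster_passes(hostnames, labels=2):
    """A cluster passes only if all resolved names share one suffix.

    Empty or None entries (unresolved clients) are skipped.
    """
    suffixes = {name_suffix(h, labels) for h in hostnames if h}
    return len(suffixes) <= 1


def error_rate(sampled_clusters, labels=2):
    """Fraction of sampled clusters that fail the suffix test."""
    if not sampled_clusters:
        return 0.0
    failed = sum(1 for names in sampled_clusters if not cluster_passes(names, labels))
    return failed / len(sampled_clusters)


if __name__ == "__main__":
    sample = [
        ["a.example.com", "b.cs.example.com"],   # one suffix: passes
        ["a.example.com", "b.example.net"],      # two suffixes: fails
    ]
    print(error_rate(sample))
```

A single odd client is enough to fail a cluster, which matches the strict criterion on the slide.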

Applications
- Web caching, content distribution, server replication, traffic management and load balancing, Internet map discovery, etc.
- Example: Web caching
  - Client classification: normal client, proxy, and spider
  - Identifying spiders/proxies based on access patterns

Detecting proxy/spider

Thresholding client clusters
- Metric: number of requests issued from within a client cluster
- Busy clusters: those accounting for 70% of the total requests in the server log
- Web caching simulation

Log     # requests  # clients  # clusters  # busy clusters  Accuracy
Apache  3,461,361   51,536     35,563      2,869            92%
Ew3     1,199,276   21,519     7,754       1,600            96%
Nagano  11,665,713  59,582     9,853       (missing)        (missing)
Sun     13,871,352  219,528    33,468      2,536            91%
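One way to read the thresholding step is sketched below. It is an interpretation, not the authors' procedure: it assumes the busy clusters are the smallest set of clusters, taken in decreasing order of request count, that together reach 70% of all requests. The name `busy_clusters` and the `share` parameter are hypothetical.

```python
def busy_clusters(requests_per_cluster, share=0.70):
    """Pick the busiest clusters until they cover `share` of all requests.

    requests_per_cluster maps a cluster identifier (e.g. a prefix string)
    to the number of requests issued from within that cluster.
    """
    total = sum(requests_per_cluster.values())
    ranked = sorted(requests_per_cluster.items(), key=lambda kv: kv[1], reverse=True)
    busy, covered = [], 0
    for cluster, count in ranked:
        if covered >= share * total:   # threshold already reached
            break
        busy.append(cluster)
        covered += count
    return busy


if __name__ == "__main__":
    counts = {"10.1.0.0/16": 50, "10.0.0.0/8": 25, "192.0.2.0/28": 15, "172.16.0.0/12": 10}
    print(busy_clusters(counts))
```

With the toy counts above, the two largest clusters already cover 75 of 100 requests, so only those two are returned.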

New dataset
- Altavista server log containing 60,011,458 requests issued by 2,503,974 clients all over the world
- # clusters: 100,091
- # busy clusters: 242
- Accuracy: 91%
- Clustering works on large, general portal site data.
- Thanks to Altavista for sharing data with us. The data included only client IP addresses with no personally identifiable information.

Conclusion and future work
- Network-aware client clustering
  - Based on BGP routing table snapshots
  - Ability to cluster > 99% of clients in the server logs
  - Error rate is 10% (~50% for the simple approach)
  - Immune to BGP dynamics
  - Variety of applications
- Ongoing work
  - Online algorithm
  - Super/sub clustering
  - Server clustering
  - Server replication application
- Future work
  - Better validation
  - Lower error rate
  - Other applications

Acknowledgement
Thanks to the following people for helping us in this project: Jennifer Rexford, Anja Feldmann, Tim Griffin, Bill Manning, Vern Paxson, Craig Labovitz, Thomas Narten, Steven Bellovin, Emden Gansner, Nick Duffield, S. Keshav, and Walter Willinger.