New Directions in Internet Topology Measurement and Modeling John Byers Topology Modeling Group, Boston University CS: Mark Crovella, Marwan Fayed, Anukool.

Presentation transcript:

New Directions in Internet Topology Measurement and Modeling John Byers Topology Modeling Group, Boston University CS: Mark Crovella, Marwan Fayed, Anukool Lakhina, Ibrahim Matta, Alberto Medina Physics: Paul Krapivsky, Sid Redner Statistics: Eric Kolaczyk

Some observations about the Internet Rapid, decentralized growth: –90% of Internet systems were added in the last four years –Connecting to the network can be a purely local operation This rapid, decentralized growth has opened significant questions about the physical structure of the network; e.g., –The number of hosts connected to the network –The properties of network links (delay, bandwidth) –The interconnection pattern of hosts and routers –The interconnection relationships of ISPs –The geographic locations of hosts, routers and links

Approaching the Internet Scientifically Engineering or Science? Engineering: study of things made Science: study of things found Although the Internet is an engineered artifact, it now presents us with questions that are better approached from a scientific posture. Worthwhile scientific goals –Understand what drives Internet growth –Basic investigations pay off in unexpected ways

Talk Organization A brief retrospective. Towards a scientific understanding of the Internet. Specific directions: –Geometry/geography-driven topology generation. Where are the nodes/links in the Internet (recap) What is the geographical extent of ASes? –Measuring and modelling the time-evolution of AS sizes.

Case Study: “Origins” paper, [MMB ‘00] Goal: Causal explanation for then unexplained power-laws in [FFF ‘99]. Our hypothesis: Simple Barabasi-Albert model of incremental growth, preferential attachment. Model led to topologies which fit known metrics… BUT –How to validate this explanation? –[FFF ‘99] snapshots inadequate for testing hypotheses about time-evolution of the system. –In fact, no adequate set of measurements available. Also, much easier to invalidate than validate.

Case Study: “Origins” paper, [MMB ‘00] Not an entirely wasted effort... Some positive outcomes: –BRITE/BRIANA topology generation framework / analysis engine for testing wide assortment of models. –Motivation to focus on modeling problems where validation was possible… –… or better yet, to start from measurements themselves. And some future considerations: –Do topology models need to be explanatory, or just descriptive? –How to place value on a model that cannot be validated?

Our current approach Measurement: –understand topological features from direct study –leverage measurements from others as possible –measurements of time-evolution of a system are especially helpful Characterization and Modeling Validation: provide empirical confirmation that model predictions fit measured data Tool-Building: build our models / other models into BRITE (open source).

Breathing life into topology generation Raw topology analogous to a skeleton –presents coarse structure, but incomplete, inanimate –inadequate for conducting most simulations Flesh out by building annotated graphs: –Label nodes with autonomous system (AS) ID’s. –Label edges with link bandwidths. –Label edges with latencies. –Do this in a representative manner. Animate the topology: –Generate representative traffic workloads across the annotated graph. –Consider other dynamic factors (churn, link failures) Now we’re ready to conduct a simulation.

One primary direction: Geography Long-term goal: annotated graph generation: –Label nodes with autonomous system (AS) ID’s. –Label edges with link bandwidths and latencies. How? –All of the problems seem more tractable when we consider the underlying geometry of the network. But next to nothing is known about the geometry/geography of today’s Internet. –Geographic extent of ASes? –Distribution of link lengths? Inter-AS link lengths? Our first step (now complete): measurements.

Where is the Internet?

Assumptions and Definitions We treat the Internet as an undirected graph embedded on the Earth’s surface –Nodes correspond to routers or interfaces –Edges correspond to physical router-router links –Routers associated with an administering AS –Not concerned with hosts (end systems) We will ignore many higher and lower level questions

Our Basic Approach –Obtain IP-level router maps → Mercator and Skitter –Find geographic location of each router → Ixia’s IxMapping; Akamai’s EdgeScape –Identify AS associated with each router → RouteViews

Mercator: Govindan et al., USC/ISI, ICSI –Based on active probing from a single site –Resolves aliases –Uses loose source routing to explore alternate paths
Skitter: Moore et al., CAIDA –Traceroutes from 19 monitors to large set of destinations –Does not resolve aliases –Destinations attempt to cover IP address space

Datasets Mercator Collected August ,263 routers 320,149 links Skitter Collected January ,107 interfaces 1,075,454 links

IxMapping: Moore et al., CAIDA Given an IP address, infers geographic location based on a variety of heuristics –Hostnames, DNS LOC, whois e.g., 190.ATM8-0-0.GW3.BOS1.ALTER.NET is in Boston Able to map over 98% of routers/interfaces Similar to GeoTrack [Padmanabhan] which exhibits reasonable accuracy –Median error of 64 mi –90% queries within 250 mi

EdgeScape: Akamai Given an IP address, returns lat./long. Methods employed not currently published; available as a commercial service. Claims mean error of < 50 miles Able to map over 99% of routers/interfaces

RouteViews Provides daily BGP table snapshots For each of the router/interface inventories, we pull a BGP snapshot from the same date. Then, for each interface, infer the associated AS by the AS advertising the containing block. For routers with multiple interfaces, use the majority vote; discard if there is no majority vote (2% of all routers).
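
The interface-to-AS and router-to-AS steps described on this slide can be sketched as follows. This is a minimal illustration, not the authors' tooling: the function names `origin_as` and `router_as` are hypothetical, the BGP snapshot is assumed to be already parsed into a prefix-to-AS dictionary, "the containing block" is assumed to mean the longest matching prefix, and "majority" is assumed to mean a strict majority of the interfaces that could be mapped.

```python
from collections import Counter
import ipaddress


def origin_as(ip, bgp_table):
    """Return the AS advertising the most specific prefix that contains ip.

    bgp_table maps prefix strings (e.g. "10.0.0.0/8") to AS numbers.
    Longest-prefix match is an assumption about what "containing block" means.
    """
    addr = ipaddress.ip_address(ip)
    best = None  # (prefix length, AS number) of the best match so far
    for prefix, asn in bgp_table.items():
        net = ipaddress.ip_network(prefix)
        if addr in net and (best is None or net.prefixlen > best[0]):
            best = (net.prefixlen, asn)
    return None if best is None else best[1]


def router_as(interface_ips, bgp_table):
    """Assign a router to an AS by majority vote over its interfaces.

    Returns None (router discarded) when no AS holds a strict majority
    of the interfaces that could be mapped.
    """
    votes = Counter(origin_as(ip, bgp_table) for ip in interface_ips)
    votes.pop(None, None)  # ignore interfaces with no covering prefix
    if not votes:
        return None
    asn, count = votes.most_common(1)[0]
    return asn if count > sum(votes.values()) / 2 else None
```

A linear scan over prefixes is fine for illustration; a real BGP table would call for a radix tree or similar structure.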

Where are the routers? USA

Europe

Interfaces and People: USA, Skitter Grid size: ~90 mi x 90 mi

Routers and People Upper, Mercator; Lower, Skitter USA Europe Japan

Router Location: Summary Router location is strongly driven by population density. Superlinear relationship between router and population density: $R \approx k P^{a}$ –k varies with economic development (users online) –a is greater than one ( ) More routers per person in more densely populated areas.
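
One common way to estimate k and a in a relationship of the form R = k P^a is ordinary least squares on the log-log scale, since log R = log k + a log P is a straight line. The sketch below assumes that approach; the talk does not say which fitting procedure was used, and `fit_power_law` is a hypothetical helper name. Inputs are per-grid-cell population and router counts, all strictly positive.

```python
import math


def fit_power_law(population, routers):
    """Least-squares fit of log(R) = log(k) + a * log(P).

    population, routers: equal-length sequences of positive numbers,
    one entry per grid cell. Returns (k, a).
    """
    xs = [math.log(p) for p in population]
    ys = [math.log(r) for r in routers]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    a = sxy / sxx                       # slope = exponent
    k = math.exp(mean_y - a * mean_x)   # intercept = log(k)
    return k, a
```

A fitted exponent above one would correspond to the superlinear regime described on the slide. Cells with zero routers or zero population must be dropped or handled separately before taking logarithms.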

Link Preference Function Interested in influence of distance on link formation: $f(d) = P[C \mid d]$, i.e., the probability that two nodes separated by distance d are directly connected by a link. Estimated as: $f(d) \approx \dfrac{\text{number of links of length } d}{\text{number of router pairs separated by } d}$
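
The estimator on this slide (links of length d divided by router pairs at distance d) can be sketched by binning distances. The sketch below makes several assumptions that are not in the talk: great-circle distance via the haversine formula with an Earth radius of 3958.8 miles, a 50-mile bin width, and the hypothetical names `haversine_miles` and `link_preference`.

```python
import math
from collections import Counter
from itertools import combinations

EARTH_RADIUS_MILES = 3958.8  # mean Earth radius; an assumption of this sketch


def haversine_miles(p, q):
    """Great-circle distance in miles between two (lat, lon) points in degrees."""
    lat1, lon1 = map(math.radians, p)
    lat2, lon2 = map(math.radians, q)
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(h))


def link_preference(coords, links, bin_miles=50.0):
    """Estimate f(d) per distance bin.

    coords: dict mapping node -> (lat, lon)
    links:  iterable of undirected (u, v) node pairs
    Returns {bin index: linked pairs / all pairs in that bin}.
    """
    link_set = {frozenset(edge) for edge in links}
    pairs = Counter()   # all node pairs per distance bin
    linked = Counter()  # directly connected node pairs per distance bin
    for u, v in combinations(coords, 2):
        b = int(haversine_miles(coords[u], coords[v]) // bin_miles)
        pairs[b] += 1
        if frozenset((u, v)) in link_set:
            linked[b] += 1
    return {b: linked[b] / pairs[b] for b in sorted(pairs)}
```

Enumerating all pairs is quadratic in the number of nodes, so data sets of the size used in the talk would need spatial bucketing or sampling of the denominator.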

f(d) for USA (Skitter) [Plot: f(d) versus distance, with two regimes labeled "Distance Sensitive" and "Distance Insensitive"]

Link Distance Preference for USA Skitter, d < 250, semi-log plot: $L \approx 140$ mi.

Large d: distance insensitivity $F(d) = \sum_{u=1}^{d} f(u)$ USA data, Skitter

Link Formation: Summary Link formation seems to be a mixture of distance-dependent and distance-independent processes. Waxman (exponential) model remarkably good for a large fraction of all links! –But the crucial difference is that we are using a very irregular spatial distribution of nodes. A small fraction of non-local links is very important (structural).

Where are the ASes? Two measures: –# of distinct locations (grid cells) –area of the convex hull of the set of distinct locations Computing convex hull –cut earth along Int’l Date Line and unroll –use Albers equal area projection to approximately preserve areas
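
The second measure, convex hull area, can be sketched with a standard planar hull and the shoelace formula. The sketch assumes the locations have already been projected to planar (x, y) coordinates by an equal-area projection such as the Albers projection named on the slide; the projection itself is not implemented here. Andrew's monotone chain is a choice of this sketch, since the talk does not name a hull algorithm, and `convex_hull` and `hull_area` are hypothetical names.

```python
def convex_hull(points):
    """Andrew's monotone chain. Returns hull vertices in counter-clockwise order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o, a, b):
        # z-component of (a - o) x (b - o); > 0 means a left turn
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower = []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    upper = []
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    # Drop the last point of each chain: it repeats the other chain's first point.
    return lower[:-1] + upper[:-1]


def hull_area(points):
    """Area of the convex hull of projected (x, y) points, via the shoelace formula."""
    hull = convex_hull(points)
    if len(hull) < 3:
        return 0.0  # zero, one, or two distinct (or all collinear) locations
    twice_area = 0.0
    for (x1, y1), (x2, y2) in zip(hull, hull[1:] + hull[:1]):
        twice_area += x1 * y2 - x2 * y1
    return abs(twice_area) / 2.0
```

Under this definition an AS with two or fewer distinct locations, or with all locations on one line, has zero area.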

AS Findings (1) [TDGJSW ‘01]: Size of an AS (in routers) and AS degree are well correlated. We find a 3-way correlation between size, degree and # of distinct locations. Distribution of number of locations is long-tailed, highly variable.

AS Findings (2) 80% of ASes have 0 area -- two or fewer distinct locations. The rest of the ASes fall in two regimes: –small ASes have considerable variability –largest ASes are fully dispersed –cutoff: degree > 100 or interfaces > …

Direction 2: AS Size Distribution Goal: Model the growth and evolution of ASes and their sizes. Bonus: RouteViews BGP logs may later help validate model predictions. Hosts enter the system and either: –Create a new AS or –Join an existing AS At each timestep, a pair of ASes may also merge.

The Simplest Plausible Model N(t) = number of ASes, M(t) = number of hosts. $dN/dt = (q - r)N$, $dM/dt = pM + qN$, where q is the rate of new AS creation, r is the rate of AS coalescence, and p is the rate of creation of new nodes. Relative values of p, q and r determine average AS size. When p > q - r, average AS size grows as $N^{(p-q+r)/(q-r)}$.
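
The growth exponent quoted on this slide follows from solving the two linear equations directly. The derivation below is a worked sketch; the initial values $N_0 = N(0)$ and $M_0 = M(0)$ are notation introduced here, and $q > r$ is assumed so that the number of ASes grows.

```latex
\begin{align*}
\frac{dN}{dt} &= (q-r)N
  &&\Rightarrow\quad N(t) = N_0\, e^{(q-r)t},\\
\frac{dM}{dt} &= pM + qN
  &&\Rightarrow\quad M(t) = \Bigl(M_0 + \tfrac{qN_0}{p-q+r}\Bigr) e^{pt}
      - \tfrac{qN_0}{p-q+r}\, e^{(q-r)t},\\
\frac{M}{N} &\propto e^{(p-q+r)t}
  = \Bigl(\tfrac{N}{N_0}\Bigr)^{(p-q+r)/(q-r)}
  &&\text{for } p > q-r > 0 \text{ and large } t.
\end{align*}
```

The last step uses $t = \ln(N/N_0)/(q-r)$ from the first line. When $p > q - r$ the $e^{pt}$ term dominates $M(t)$, which gives the power-law dependence of average AS size on $N$.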

Preliminary Findings Model behavior tractable to analyze. AS births, deaths and mergers can be identified with some degree of confidence from RouteViews logs. –But… differentiating BGP churn from bona fide events can be challenging –Statistical de-noising methods may apply (?) Simple model makes reasonable predictions –But… coalescence kernel needs fine-tuning, i.e., measurements indicate that r is not size-independent.

In Conclusion Generating test networks rather than test topologies is a natural next step. Geometry/geography provides leverage. Plenty of unexplored territory. Validation and measurement continue to be underappreciated. Measurements of time-evolving systems are in particularly short supply. Modeling problems can be a bonanza for statisticians and physicists.