Router-level Internet Topology Discovery Mehmet H. Gunes.


Internet Topology Discovery: Internet
- Web of interconnected networks
  - Grows with no central authority
  - Autonomous Systems optimize local communication efficiency
  - The building blocks are engineered and studied in depth
  - The global entity has not been characterized
- Most real-world complex networks have non-trivial properties
- Global properties cannot be inferred from local ones
  - Engineered with large technical diversity
  - Ranges from local campuses to transcontinental backbone providers

Internet Measurements
- Understand topological and functional characteristics of the Internet
  - Essential to design, implement, protect, and operate underlying network technologies, protocols, services, and applications
- The need for Internet measurements arises from commercial, social, and technical issues
  - Realistic simulation environments for developed products
  - Improved network management
  - Robustness with respect to failures/attacks
  - Understanding the spread of worms/viruses
  - Knowing social trends in Internet use
  - Scientific discovery: scale-free (power-law), small-world, rich-club, disassortativity, …

Internet Topology Measurement
- Types of Internet topology maps
  - Autonomous System (AS) level maps
  - Router level maps
- A router level Internet map consists of
  - Nodes: end hosts and routers
  - Links: point-to-point or multi-access links
- Router level Internet topology discovery: the process of identifying nodes and the links among them
[Figures: Internet maps from Lumeta (Jan 06), CAIDA (Jan 08), and CAIDA (Jan 00)]

Outline
- Introduction
  - Internet Topology Measurement
  - Topology Discovery Issues
- Impact of IP Alias Resolution
- Topology Discovery
  - Resolving Anonymous Routers: Graph-based Induction Technique
  - Resolving Alias IP Addresses: Analytical and Probe-based Alias Resolution
  - Resolving Genuine Subnets: Dynamic Subnet Inference
- Summary

Internet Topology Measurement: Background
- Internet topology measurement studies involve topology collection, construction, and analysis
- Current state of the research activities
  - Distributed topology data collection studies/platforms: iPlane, Skitter, Dimes, DipZoom, …
  - 20M path traces with over 20M nodes (daily)
- Topology discovery issues
  1. Sampling
  2. Anonymous routers
  3. Alias IP addresses
  4. Genuine subnets

Internet Topology Measurements: Probing
[Figure: a vantage point and nodes A, B, C, D. Direct probing: probes with TTL=64 addressed to IP B and to IP D. Indirect probing: probes addressed to IP D with TTL=1 and TTL=2; response labels include IP B and IP C.]

Internet Topology Measurement: Topology Collection (traceroute)
- Probe packets are carefully constructed to elicit an intended response from a probe destination
- traceroute probes all nodes on a path towards a given destination
  - TTL-scoped probes obtain ICMP error messages from routers on the path
  - Each ICMP message carries the IP address of the responding intermediate router as its source
- Merging end-to-end path traces yields the network map
[Figure: vantage point S probes the destination D through A, B, C. Probes with TTL=1, 2, 3, 4 return IP A, IP B, IP C, IP D.]
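The TTL-scoped probing described above can be illustrated with a small simulation. This is only a sketch: it models a path as a list of node names instead of sending packets, and the helper names `send_probe` and `traceroute` are hypothetical.

```python
def send_probe(path, ttl):
    """Forward one probe along `path` (nodes after the vantage point).

    Each node decrements the TTL; the node where it reaches zero answers,
    as does the destination (last node) once the probe arrives there.
    """
    for node in path:
        ttl -= 1
        if ttl == 0 or node == path[-1]:
            return node
    return None


def traceroute(path, max_ttl=30):
    """Collect the responder for TTL = 1, 2, ... until the destination answers."""
    if not path:
        return []
    hops = []
    for ttl in range(1, max_ttl + 1):
        responder = send_probe(path, ttl)
        hops.append(responder)
        if responder == path[-1]:
            break  # destination reached, stop probing
    return hops


# Path from the slide figure: S -> A -> B -> C -> D (S is the vantage point).
print(traceroute(["A", "B", "C", "D"]))
```

A real prober would also have to handle lost replies and routers that never answer, which is where the anonymous router problem discussed later comes from.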

Internet Topology Measurement: Background (Internet Topology Mapping)
[Figure: Internet2 backbone with routers S, L, U, K, C, H, A, W, N, their interface addresses (s.2, l.1, s.3, u.1, …, h.4), and end host d]
- Trace to Seattle: h.4, l.3, s.2
- Trace to NY: h.4, a.3, w.3, n.3

Internet Topology Measurement: Background (Internet Topology Mapping)
[Figure: Internet2 backbone with routers S, L, U, K, C, H, A, W, N, their interface addresses (s.1, s.2, s.3, l.1, …, n.2), and end hosts d, e, f]

Internet Topology Measurement: Topology Collection
[Figure: Internet2 backbone with routers S, L, U, K, C, H, A, W, N and end hosts d, e, f]
Traces:
- d - H - L - S - e
- d - H - A - W - N - f
- e - S - L - H - d
- e - S - U - K - C - N - f
- f - N - C - K - H - d
- f - N - C - K - U - S - e
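Merging the six traces above into a map can be sketched as follows. The function name `merge_traces` is hypothetical, and links are treated as undirected, which is an assumption of this sketch.

```python
def merge_traces(traces):
    """Union the nodes and consecutive-hop links of all path traces."""
    nodes, links = set(), set()
    for trace in traces:
        nodes.update(trace)
        for a, b in zip(trace, trace[1:]):
            links.add(frozenset((a, b)))  # undirected link between adjacent hops
    return nodes, links


TRACES = [t.split("-") for t in (
    "d-H-L-S-e",
    "d-H-A-W-N-f",
    "e-S-L-H-d",
    "e-S-U-K-C-N-f",
    "f-N-C-K-H-d",
    "f-N-C-K-U-S-e",
)]

nodes, links = merge_traces(TRACES)
print(len(nodes), "nodes,", len(links), "links")
```

With these traces the merged map has 12 nodes (9 routers and 3 end hosts) and 13 links.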

Topology Sampling Issues
- Sampling to discover networks: infer characteristics of the topology
- Different studies considered
  - Effect of sample size [Barford 01]
  - Sampling bias [Lakhina 03]
  - Path accuracy [Augustin 06]
  - Sampling approach [Gunes 07]
  - Utilized protocol [Gunes 08]: ICMP echo request, TCP SYN, UDP port unreachable

Topology Sampling Approaches
- Sampling techniques
  - Path sampling: diameter
  - Edge sampling: capacity
  - Node sampling: degree characteristics
- Sampling approach
  - (n,n) traceroute-based topology: returns the Internet map among n vantage points
  - (k,m) traceroute-based topology, where k << m (k=n): returns the Internet map between k sources and m destinations
[Figure panels: path sampling vs node sampling; (k,m)-sampling vs (n,n)-sampling]

Historical Perspective on Responsiveness: Data Set
- ICMP path traces from skitter
  - First collection cycle of each year (from 1999 to 2008)
  - Skitter had updates to destination IP addresses
  - Major update in the system in 2004
- Processing
  - Alias IP addresses: Analytical Alias Resolver (AAR) [Gunes-06], Analytical and Probe Based Alias Resolver (APAR) [Gunes-09]
  - Anonymous routers: Graph Based Induction (GBI) [Gunes-08]

Historical Perspective on Responsiveness: Anonymous node ratio
[Table columns: Year | Traces (million) | Reached (%) | Nodes (million) | Anonymous (%)]

Historical Perspective on Responsiveness: Anonymous node ratio after processing
[Table columns: Year | Initial: Nodes (million), Anonymous (million) | Final: Nodes (thousand), Anonymous (thousand), Anonymous (%)]

Historical Perspective on Responsiveness: Unique substrings
[Table columns: Year | Unique substrings]
, , , , ,13363,6626,3604, ,82959,1715,8284, ,26373,26314,0195, ,18263,94413,7335,772
Share row: 67% | 27% | 4% | 2%

Historical Perspective on Responsiveness: Summary
- End-system responsiveness is in considerable decline: 86% to 23%
- ICMP rate limiting has increased, especially since 2004: ~0% to 7%
- Overall router responsiveness has decreased: ~98% to ~90%
- Most anonymous regions are single hop (67%), then two hops (27%)

Current Practices in Responsiveness: Data Set
- 536,743 destination IP addresses from the skitter and iPlane projects
- Collected between 7 and 11 April 2008
- Probes: ICMP echo request, TCP SYN, UDP to random ports
- Direct probes: ping
- Indirect probes: traceroute

Current Practices in Responsiveness: Direct probes
Probe | Responsive (%)
ICMP | 81.9
TCP | 67.3
UDP | 59.9
[Further table columns: Anonymous (%), Router (%), End-host (%); sample sizes: K IPs, 320 K, 217 K]

Current Practices in Responsiveness: Direct probes (domain)
Probe | Anonymous (%)
ICMP | 18.1
TCP | 32.7
UDP | 40.1
[Further table columns: .net (%), .com (%), .edu (%), .org (%), .gov (%); sample sizes: K IPs, 5 K, 1.7 K, 25.5 K, 10.1 K, 0.5 K]

Current Practices in Responsiveness: Indirect probes (306 K traces)
[Table columns: Probe | Reached (%) | Initial: Nodes (thousand), Anonymous (%) | Final: Nodes (thousand), Anonymous (%)]
ICMP | 93.1 | 1,
TCP |
UDP | 45.0 | 1,

Current Practices in Responsiveness: Summary
- Nodes that respond to indirect probes might not respond to direct probes
- Nodes are most responsive to ICMP probes (82%) and least responsive to UDP probes (60%)
- End hosts are less responsive than routers
- Responsiveness is similar across domains

Anonymous Router Resolution: Problem
- Anonymous routers do not respond to traceroute probes and appear as a ∅ in path traces
- The same router may appear as a ∅ in multiple traces
- Anonymous nodes belonging to the same router should be resolved
- Anonymity types
  1. Ignore all ICMP packets
  2. ICMP rate-limiting
  3. Ignore ICMP when congested
  4. Filter ICMP at border
  5. Private IP address

Anonymous Router Resolution: Problem
[Figure: Internet2 backbone with routers S, L, U, K, C, H, A, W, N and end hosts d, e, f]
Traces:
- d - ∅ - L - S - e
- d - ∅ - A - W - ∅ - f
- e - S - L - ∅ - d
- e - S - U - ∅ - C - ∅ - f
- f - ∅ - C - ∅ - ∅ - d
- f - ∅ - C - ∅ - U - S - e

Anonymous Router Resolution: Problem
[Figure: sampled network (routers U, K, C, N, L, H, A, W, S and end hosts d, e, f) and the resulting network (known nodes d, e, f, S, U, L, C, A, W plus anonymous nodes)]
Traces:
- d - ∅ - L - S - e
- d - ∅ - A - W - ∅ - f
- e - S - L - ∅ - d
- e - S - U - ∅ - C - ∅ - f
- f - ∅ - C - ∅ - ∅ - d
- f - ∅ - C - ∅ - U - S - e

Alias Resolution
- Each interface of a router has an IP address
- A router may respond with different IP addresses to different queries
- Alias resolution is the process of grouping the interface IP addresses of each router into a single node
- Inaccuracies in alias resolution may result in a network map that
  - includes artificial links/nodes
  - misses existing links
[Figure: Denver]

IP Alias Resolution Problem
[Figure: Internet2 backbone with routers S, L, U, K, C, H, A, W, N, their interface addresses, and end hosts d, e, f]
Traces:
- d - h.4 - l.3 - s.2 - e
- d - h.4 - a.3 - w.3 - n.3 - f
- e - s.1 - l.1 - h.1 - d
- e - s.1 - u.1 - k.1 - c.1 - n.1 - f
- f - n.2 - c.2 - k.2 - h.2 - d
- f - n.2 - c.2 - k.2 - u.2 - s.3 - e

IP Alias Resolution Problem
[Figure: sampled network (routers U, K, C, N, L, H, A, W, S and end hosts d, e, f) next to the sample map without alias resolution, in which every interface address (s.1, s.2, s.3, l.1, l.3, u.1, u.2, k.1, k.2, c.1, c.2, n.1, n.2, n.3, w.3, a.3, h.1, h.2, h.4) is a separate node]
Traces:
- d - h.4 - l.3 - s.2 - e
- d - h.4 - a.3 - w.3 - n.3 - f
- e - s.1 - l.1 - h.1 - d
- e - s.1 - u.1 - k.1 - c.1 - n.1 - f
- f - n.2 - c.2 - k.2 - h.2 - d
- f - n.2 - c.2 - k.2 - u.2 - s.3 - e
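The inflation caused by unresolved aliases can be checked on the six traces above. The sketch below assumes, as the figure labels suggest, that the letter before the dot names the router (h.4 is an interface of H); `router_of` and `build_map` are hypothetical helper names.

```python
TRACES = [t.split() for t in (
    "d h.4 l.3 s.2 e",
    "d h.4 a.3 w.3 n.3 f",
    "e s.1 l.1 h.1 d",
    "e s.1 u.1 k.1 c.1 n.1 f",
    "f n.2 c.2 k.2 h.2 d",
    "f n.2 c.2 k.2 u.2 s.3 e",
)]


def router_of(address):
    """Map an interface label such as 'h.4' to its router 'H' (assumed ground truth)."""
    return address.split(".")[0].upper() if "." in address else address


def build_map(traces, resolve_aliases):
    """Return (node count, link count) of the map built from the traces."""
    nodes, links = set(), set()
    for trace in traces:
        hops = [router_of(h) for h in trace] if resolve_aliases else trace
        nodes.update(hops)
        links.update(frozenset(pair) for pair in zip(hops, hops[1:]))
    return len(nodes), len(links)


print("without alias resolution:", build_map(TRACES, resolve_aliases=False))
print("with perfect alias resolution:", build_map(TRACES, resolve_aliases=True))
```

Without resolution the sample map has 22 nodes and 25 links; with perfect resolution it collapses to the 12 nodes and 13 links of the sampled network.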

Genuine Subnet Resolution Problem
- Alias resolution: IP addresses that belong to the same router
- Subnet resolution: IP addresses that are connected over the same medium
[Figure: interface addresses IP1 to IP6 grouped by router and by shared medium]

Impact of IP Alias Resolution: Effects on Topological Characteristics
- Generate a synthetic network topology (Random, Power-law, Transit-Stub)
- Annotate it to add interface addresses
- Emulate traceroute to collect path traces
- Build sample topologies with different alias resolution success rates
Consider an example with routers s, r, t and interfaces s.1, s.2, r.1, r.2, r.3, t.1, t.2:
- A path from s to t: s.2 - r.2 - t.1
- A path from t to s: t.1 - r.3 - s.2
- Case 1: resolve r (r.2 and r.3 become one node r)
- Case 2: do not resolve r (r.2 and r.3 remain separate nodes)
[Figure: the example network and the two resulting sample topologies]

Alias Resolution: Experimental Procedure
- Apply alias resolution with different success rates: 0%, 20%, 40%, 60%, 80%, and 100%
- Generate various synthetic graphs to represent the Internet
  - Random: Waxman (WA)
  - Power-law: Barabasi-Albert (BA) and Inet
  - Hierarchical: Transit-Stub (TS)
- Analyze changes in topological characteristics
  - Topology size, characteristic path length
  - Node degree, hop distribution
  - Degree distribution, betweenness
  - Joint degree distribution, clustering
- Analyze a genuine Internet sample
  - Utilize state-of-the-art alias resolution tools

Impact of IP Alias Resolution: Graph Size
- With no alias resolution, average artificial nodes: 57%, artificial edges: 62%
- The impact of imperfect alias resolution increases with sample size

Effects on Topological Characteristics: Node Degree
- Observed degree: degrees with imperfect alias resolution
- True degree: degrees with perfect alias resolution
- Frequency distribution: number of nodes at each node degree
[Figure panels: WA with 0% success, WA with 40% success, WA with 80% success, each with its frequency distribution]
Figure annotations:
- The degrees of these nodes are underestimated since their aliases are not resolved
- The degrees of these nodes are overestimated due to non-resolved neighbors

Effects on Topological Characteristics: Node Degree
- Observed degree: degrees with imperfect alias resolution
- True degree: degrees with perfect alias resolution
- Frequency distribution: number of nodes at each node degree
[Figure panels: WA with 0% success, WA with 40% success, WA with 80% success, each with its frequency distribution]
Figure annotations:
- With improving alias resolution, some of the underestimation cases change to overestimation
- Alias resolution problems at a node may introduce a significantly large number of artificial nodes

Effects on Topological Characteristics: Degree Distribution
- Degree distribution: the probability P(k) that a randomly chosen node has degree k
- Imperfect alias resolution
  - distorts the power-law characteristic of BA- and Inet-based samples in particular
  - mainly impacts the low degree range (3-13) of TS-based samples
  - mainly impacts the high degree range (20 and up) of WA-based samples
- Degree-related characteristics do not always improve with an increasing success rate

Effects on Topological Characteristics: Joint Degree Distribution
- Joint degree distribution: the probability P(k1, k2) that a node of degree k1 and a node of degree k2 are connected
- Assortativity coefficient: the tendency of a network to connect nodes of the same or different degrees
  - Positive values indicate assortativity: most of the links are between nodes of similar degree
  - Negative values indicate disassortativity: most of the links are between nodes of dissimilar degree
  - 0 indicates non-assortativity
Figure annotation: seems to be assortative with 0% alias resolution, but is non-assortative
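One common way to compute an assortativity coefficient is the Pearson correlation of the degrees found at the two ends of each link. The slide does not state which formula the study used, so the sketch below is an assumption; the function name `assortativity` is hypothetical, and returning 0.0 when all degrees are equal is a convention of this sketch, since the coefficient is undefined there.

```python
from collections import defaultdict


def assortativity(edges):
    """Pearson correlation of end-point degrees over all undirected links."""
    degree = defaultdict(int)
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    # Count every link in both directions so the measure is symmetric.
    xs, ys = [], []
    for u, v in edges:
        xs += [degree[u], degree[v]]
        ys += [degree[v], degree[u]]
    n = len(xs)
    mean = sum(xs) / n  # identical for xs and ys by symmetry
    cov = sum((x - mean) * (y - mean) for x, y in zip(xs, ys)) / n
    var = sum((x - mean) ** 2 for x in xs) / n
    return cov / var if var else 0.0


# A star connects one high-degree node to degree-1 nodes: fully disassortative.
star = [("hub", "a"), ("hub", "b"), ("hub", "c")]
print(assortativity(star))
```

For the star the coefficient is -1, the most disassortative value possible.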

Effects on Topological Characteristics: Characteristic Path Length & Hop Distribution
- Characteristic path length: the average of the shortest path lengths between all node pairs
  - Decreases as the alias resolution success rate increases
  - On average by 30% for BA-, Inet-, and WA-based sample topologies
- Hop distribution: the average percentage of the nodes reached at each hop
  - As alias resolution improves, fewer hops are required to reach other nodes
  - 24%, 60%, 78%, and 83% of the nodes are reachable within 7 hops with 0%, 40%, 80%, and 100% alias resolution, respectively
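The characteristic path length defined above can be computed with one breadth-first search per node. This is a minimal sketch for unweighted, undirected graphs; the function name is hypothetical, and only reachable pairs are averaged, which is an assumption for disconnected samples.

```python
from collections import defaultdict, deque


def characteristic_path_length(edges):
    """Average shortest-path length (in hops) over all reachable node pairs."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    total = pairs = 0
    for src in adj:
        # Breadth-first search gives hop distances from src.
        dist = {src: 0}
        queue = deque([src])
        while queue:
            u = queue.popleft()
            for w in adj[u]:
                if w not in dist:
                    dist[w] = dist[u] + 1
                    queue.append(w)
        total += sum(dist.values())
        pairs += len(dist) - 1  # exclude the source itself
    return total / pairs if pairs else 0.0


# Three nodes in a line: distances 1, 1, and 2.
print(characteristic_path_length([("a", "b"), ("b", "c")]))
```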

Impact of IP Alias Resolution: Betweenness & Clustering
- Betweenness centrality: as the alias resolution success rate increases
  - the average betweenness decreases
  - the normalized betweenness increases
- Clustering coefficient
  - All samples yield a clustering coefficient of 0 with a 0% alias resolution success rate
  - It almost always increases as alias resolution improves

Impact of IP Alias Resolution: Impact on a Genuine Topology
- Effect of alias resolution on a genuine topology: changes in observed topological characteristics
- Ally is the current state-of-the-art probe-based approach
- APAR is our analytical approach
[Table columns: Initial | Ally | APAR | Ally & APAR]
[Table rows: Number of Nodes, Number of Edges, Average Degree, Assortativity Coefficient, Characteristic Path Length, Normalized Betweenness, Clustering Coefficient]

Impact of IP Alias Resolution: Impact on a Genuine Topology
- Inaccurate alias resolution may significantly distort almost all of the topological characteristics considered in this study
- Internet measurement studies should employ all possible means to increase the accuracy and completeness of the alias resolution process
- Path-length-related characteristics are closer to those of TS samples
- Degree-related characteristics are mostly similar to those of BA samples


Anonymous Router Resolution: Problem
- Anonymous routers do not respond to traceroute probes and appear as ∅ in traceroute output
- The same router may appear as ∅ in multiple traces
- Traces when every router responds: y: S - L - H - x and x: H - L - S - y
- Traces when L is anonymous: y: S - ∅ - H - x and x: H - ∅ - S - y
[Figure: the network S, L, H between x and y, and the sample map with two anonymous nodes ∅1 and ∅2 between S and H]
- Current daily raw topology data sets include ~20 million path traces with ~20 million occurrences of ∅s along with ~500K public IP addresses
- The raw topology data is far from representing the underlying sampled network topology

Anonymous Router Resolution: Previous Approaches
[Figure: Internet2 backbone with routers S, L, U, K, C, H, A, W, N and end hosts d, e, f]
Traces:
- d - ∅ - L - S - e
- d - ∅ - A - W - ∅ - f
- e - S - L - ∅ - d
- e - S - U - ∅ - C - ∅ - f
- f - ∅ - C - ∅ - ∅ - d
- f - ∅ - C - ∅ - U - S - e

Anonymous Router Resolution: Previous Approaches
- Basic heuristics
  - IP: combine anonymous nodes between the same known nodes [Bilir 05]. Limited resolution.
  - NM: combine all anonymous neighbors of a known node [Xin 06]. High false positives.
- More theoretical approaches
  - Graph minimization approach [Yao 03]
    - Combine ∅s as long as they do not violate two accuracy conditions: (1) the trace preservation condition and (2) the distance preservation condition
    - High complexity: O(n^5), where n is the number of ∅s
  - ISOMAP-based dimensionality reduction approach [Xin 06]
    - Build an n x n distance matrix, then use ISOMAP to reduce it to an n x 5 matrix
    - Distance: (1) hop count or (2) link delay
    - High complexity: O(n^3), where n is the number of nodes
[Figure: sampled network (routers U, K, C, N, L, H, A, W, S and end hosts x, y, z), the network after resolution, and the resulting network]

Anonymous Router Resolution: Problem Complexity
- Graph minimization: for an observed topology, accept the minimal underlying network as the underlying topology [Yao 03]
- Mergeability relation
- Reduction from Graph Coloring
  - Graph Coloring: given a graph, find the minimum number of colors such that no two neighboring vertices have the same color
  - Build a graph G = (V, E)
    - Add a vertex for each node in the network topology
    - Add edges between non-mergeable vertices: all pairs of vertices representing non-anonymous nodes, all pairs of vertices that appear in the same trace, etc.
  - Find a minimum set of colors for G and merge the nodes that have the same color in the network topology

Graph Based Induction (GBI) Approach
- Graph based induction
  - A graph data mining technique
  - Finds frequent substructures in graph data
  - Commonly used in mining biological and chemical graph data
- Use of GBI for anonymous router resolution
  - Observe common graph structures due to anonymous routers
  - Develop localized algorithms with manageable computational and storage overhead
- Trace preservation condition: merge anonymous nodes as long as they cause no loops in path traces

Anonymous Router Resolution: Anonymity Types
- Type 1: Do not send any ICMP responses
- Type 2: Filtered ICMP responses at border routers
- Type 3: Rate limit ICMP responses
- Type 4: Do not send ICMP responses when congested
- Type 5: ICMP responses with private source IP address

Graph Based Induction: Common Structures
- Parallel nodes
- Star
- Complete bipartite
- Clique
[Figures: for each structure, the underlying network around an anonymous router and the corresponding observed structure with multiple ∅ nodes]

Graph Based Induction: Parallel nodes
- Algorithm
  - Represent each ∅-substring (a, ∅_i, c) as a tuple (a||c, ∅_i), where a||c is the tuple identifier and a < c
  - Read path traces and build the sorted list L of these tuples
  - Subsequently read tuples are compared to the ones in the list based on tuple identifiers, and duplicates are excluded from L
- Handling anonymity due to ICMP rate limiting or congestion
  - A second scan of path traces looks for substrings of the form (a, b, c) corresponding to (a, ∅_i, c) in L
[Figure: parallel ∅ nodes between a and c merged into one node, and matched against a known node b between a and c]
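The first scan of the parallel-nodes step can be sketched as below. This is an illustration under assumptions, not the authors' implementation: anonymous hops are written as "*", a dictionary keyed by the ordered pair (a, c) stands in for the sorted list L, and the names `ANON` and `resolve_parallel` are hypothetical. The second scan for rate limiting is not shown.

```python
ANON = "*"  # placeholder for one anonymous hop (assumption of this sketch)


def resolve_parallel(traces):
    """Give all single-hop anonymous nodes between the same two known nodes one label."""
    merged = {}  # tuple identifier (a, c) with a < c -> shared label
    resolved = []
    for trace in traces:
        out = list(trace)
        for i in range(1, len(trace) - 1):
            a, x, c = trace[i - 1], trace[i], trace[i + 1]
            if x == ANON and a != ANON and c != ANON:
                key = tuple(sorted((a, c)))  # same identifier in both directions
                out[i] = merged.setdefault(key, f"anon{len(merged) + 1}")
        resolved.append(out)
    return resolved, merged


TRACES = [t.split() for t in (
    "d * L S e",
    "d * A W * f",
    "e S L * d",
    "e S U * C * f",
    "f * C * * d",
    "f * C * U S e",
)]

resolved, merged = resolve_parallel(TRACES)
for trace in resolved:
    print(" - ".join(trace))
```

On the Internet2 example traces, nine of the eleven anonymous occurrences collapse into five nodes. The two consecutive anonymous hops between C and d are left untouched, since this step only handles single-hop substrings.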

Graph Based Induction: Clique-like
- Generate a new graph G* = (V*, E*)
  - For each ∅-substring of type (a, ∅_e, b): V* ← V* ∪ {a, b} and E* ← E* ∪ {e(a,b)}
- First identify 4-cliques and grow them by adding nodes that are connected to at least 4 nodes of the structure
  - Helps in tolerating a few missing links in large cliques
- Then process all 3-cliques
[Figure: known nodes a, c, d, e around one anonymous router, observed as a clique of ∅ nodes]
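Building G* and listing its 3-cliques can be sketched as follows. The clique search here is a brute-force enumeration chosen for brevity, and it omits the 4-clique growing step; `build_gstar` and `three_cliques` are hypothetical names, and "*" again stands for an anonymous hop.

```python
from itertools import combinations

ANON = "*"  # placeholder for one anonymous hop (assumption of this sketch)


def build_gstar(traces):
    """Connect known nodes a and b in G* whenever a substring (a, *, b) is observed."""
    gstar = {}
    for trace in traces:
        for a, x, b in zip(trace, trace[1:], trace[2:]):
            if x == ANON and a != ANON and b != ANON:
                gstar.setdefault(a, set()).add(b)
                gstar.setdefault(b, set()).add(a)
    return gstar


def three_cliques(gstar):
    """Enumerate all triples of G* vertices that are pairwise adjacent."""
    return [trio for trio in combinations(sorted(gstar), 3)
            if all(v in gstar[u] for u, v in combinations(trio, 2))]


traces = [["A", "*", "C"], ["C", "*", "D"], ["D", "*", "A"], ["A", "*", "E"]]
print(three_cliques(build_gstar(traces)))
```

A clique among A, C, and D in G* suggests that the anonymous hops seen between them belong to one router, subject to the trace preservation condition.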

Graph Based Induction: Complete Bipartite
- First search for a small complete bipartite structure, i.e., K_{2,3}, in G* and then grow it to a larger one
  - Take each pair of nodes and check whether they are in a K_{2,3}
  - After identifying a K_{2,3}, look for larger complete bipartite graphs K_{2,m} and then K_{n,m} that contain the identified K_{2,3}
- Then process all K_{2,2}'s
[Figure: nodes A, C, D, E, F shown in G and in G*, with the ∅ nodes observed between them]

Graph Based Induction: Star
- Combine anonymous neighbors of a known node under the trace preservation condition
  - Start from the nodes with the smallest number of anonymous neighbors
- Note: operate on G and not on G*
[Figure: known nodes D, A, C, E with neighbors w, y, z and their ∅ neighbors before and after merging]

Evaluations: Effectiveness
- iPlane data set
  - 229,425 IP addresses and ~9M anonymous nodes
  - ~18M traces from 190 vantage points toward ~90K destinations
- The ISOMAP dimensionality reduction approach takes ~10^18 operations
- The graph minimization approach takes ~10^30 operations
Step | # Anonymous | # Resolved
Parallel nodes | 8,972,939 | 8,171,360
ICMP rate limiting | 801,579 | 585,887
Clique-like | 215,… | …
Complete bipartite | 215,159 | 61,968
Star | 153,191 | 54,581
Final | 98,610 | 8,874,…
… x 10^9 operations

Evaluations: Accuracy
- Graph edit distance
  - # of node splits (# of false positives in resolution)
  - # of node merges (# of false negatives in resolution)
- Experimental setup
  - Genuine AMP topology with 2376 routers and 3770 links
  - Synthetic transit-stub topology with 50K nodes and 138.5K links
- Samples from these topologies
  - From the AMP topology: (10,500) and (10,1000) path traces
  - From the TS topology: (10,1000), (10,2000), and (10,3000) path traces
Average Graph Edit Distance
| 2% | 4% | 6% | 8% | 10% | 12% | 14%
Initial | 3,798 | 4,576 | 9,093 | 10,519 | 11,045 | 16,383 | 19,079
IP | … ,252
NM | … ,190
GBI | …

Summary: Anonymous Router Resolution
- Responsiveness has decreased in the last decade
- NP-hard problem
- Graph Based Induction technique
  - Practical approach for anonymous router resolution
  - Takes ~6 hours to process data sets of ~20M path traces
  - Identifies common structures
  - Handles all anonymity types
  - Helpful in resolving multiple anonymous routers in a locality
[Figure panels on nodes D, A, C, E: Underlying, Collected, Neighbor Matching, GBI]


IP Alias Resolution Problem
- A router may appear with different IP addresses in different path traces
- Need to resolve IP addresses belonging to the same router
A set of collected traces:
- w, …, b1, a1, c1, …, x
- z, …, d1, a2, e1, …, y
- x, …, c2, a3, b2, …, w
- y, …, e2, a4, d2, …, z
[Figure: a sub-graph with routers a, b, c, d, e between end points w, x, y, z, and the sample map built from the collected path traces with no alias resolution, in which a1, a2, a3, a4, b1, b2, c1, c2, d1, d2, e1, e2 are separate nodes]

IP Alias Resolution Problem
[Figure: the sub-graph with routers a, b, c, d, e between w, x, y, z, compared with two partial alias resolution cases: only router a is resolved, and only router a is not resolved]

IP Alias Resolution: Previous Approaches
- Source IP Address Based Method [Pansiot 98]
  - Relies on a particular implementation of ICMP error generation
- IP Identification Based Method (ally) [Spring 03]
  - Relies on a particular implementation of the IP identifier field
  - Many routers ignore direct probes
- DNS Based Method [Spring 04]
  - Relies on similarities in the host name structures, e.g., sl-bb21-lon-14-0.sprintlink.net and sl-bb21-lon-8-0.sprintlink.net
  - Works when systematic naming is used
- Record Route Based Method [Sherwood 06]
  - Depends on router support for IP record route processing
[Figure: probes with Dest = A and Dest = B, and responses labeled A, ID=100; B, ID=99; B, ID=103]

Analytical Alias Resolution: Approach
- Leverage the IP address assignment convention to infer IP aliases
  - Identify symmetric path segments within the collected set of path traces
  - Infer IP aliases
- Use a number of checks to
  - Remove false positives
  - Increase confidence in the identified IP aliases

IP Address Assignment Practices: Point-to-point Links
- For a point-to-point link, use either a /30 subnet or a /31 subnet
- The interface IP addresses on the link are consecutive and are within a /30 subnet or a /31 subnet
- Use ↔ to represent the subnet relation between two IP addresses
- Use the subnet relation (↔) to infer IP aliases
[Figure: routers A and B connected over a /30 network and over a /31 network]
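The subnet relation ↔ for point-to-point links can be tested with Python's standard `ipaddress` module. In this sketch the function name `subnet_related` is hypothetical, the example addresses come from the 192.0.2.0/24 documentation range rather than from the slides, and excluding the network and broadcast addresses of a /30 is an assumption about how the convention is applied.

```python
import ipaddress


def subnet_related(ip_a: str, ip_b: str) -> bool:
    """True if the two addresses can be the two ends of one /31 or /30 link."""
    a = ipaddress.ip_address(ip_a)
    b = ipaddress.ip_address(ip_b)
    if a == b:
        return False
    # /31: both addresses of the block are usable interface addresses.
    net31_a = ipaddress.ip_network(f"{a}/31", strict=False)
    net31_b = ipaddress.ip_network(f"{b}/31", strict=False)
    if net31_a == net31_b:
        return True
    # /30: only the two middle addresses are usable interface addresses.
    net30 = ipaddress.ip_network(f"{a}/30", strict=False)
    reserved = {net30.network_address, net30.broadcast_address}
    return b in net30 and a not in reserved and b not in reserved


print(subnet_related("192.0.2.1", "192.0.2.2"))  # same /30
print(subnet_related("192.0.2.2", "192.0.2.3"))  # same /31
print(subnet_related("192.0.2.3", "192.0.2.4"))  # different blocks
```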

IP Address Assignment Practices: Multi-access Links
A similar relation holds between IP addresses belonging to the same multi-access link.
Example: consider two IP addresses A and B. A and B are not together in a /30 or a /31 subnet; however, they are together in a /29 subnet.
[Figure: interfaces A and B on the same /29 subnet.]
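For multi-access links, the smallest subnet that contains two addresses follows from the length of their common bit prefix. The sketch below uses made-up addresses and a hypothetical helper name, `common_prefix_len`.

```python
import ipaddress


def common_prefix_len(a, b):
    """Length of the longest prefix shared by two IPv4 addresses.

    A result of 29 means the addresses share a /29 but no smaller subnet.
    """
    diff = int(ipaddress.IPv4Address(a)) ^ int(ipaddress.IPv4Address(b))
    return 32 - diff.bit_length()


print(common_prefix_len("10.0.0.1", "10.0.0.6"))  # 29: together only in a /29
print(common_prefix_len("10.0.0.1", "10.0.0.2"))  # 30: together in a /30
```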

Analytical Alias Resolution: Sample traceroute pairs
[Figure: a pair of traceroute paths between MIT and UTD, one in each direction, with some hops marked "no response" and the inferred aliases indicated.]

APAR: Analytical and Probe-based Alias Resolution
Incorrect subnet assumptions are possible: two /30 subnets may be assumed to be a /29, or path traces may be aligned incorrectly. In the sample network, IP 4 and IP 8 are then thought of as aliases.
To prevent false positives, some conditions are defined: trace preservation, distance preservation (the probing component of APAR), completeness, and common neighbor.
[Figure: a sample network with routers a, b, c, d, e, f and interface addresses IP 1, IP 2, IP 3, IP 4, IP 7, IP 8 and IP 9.]

Analytical Alias Resolution: Main Idea
Use only the path traces collected with traceroute; no probing is required at this point.
Study the relations between IP addresses in different traces.
Infer subnets: use the IP address assignment convention to infer point-to-point (/30 or /31) subnets or multi-access (/x where x < 30) subnets from the path traces.
Infer IP aliases: align path segments to infer IP aliases from the detected subnets.
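The alignment step can be sketched as follows, assuming that each router answers traceroute with the address of its incoming interface. If address r in the reverse trace is point-to-point related to address cur in the forward trace, then r and the hop before cur are inferred to be on the same router. This is a simplified illustration, not the full APAR procedure; the function names and the addresses are hypothetical, and none of the false-positive checks are applied.

```python
import ipaddress


def p2p_related(a, b):
    """True if a and b share a /31, or are the two host addresses of a /30."""
    ia, ib = int(ipaddress.IPv4Address(a)), int(ipaddress.IPv4Address(b))
    if ia == ib:
        return False
    if ia >> 1 == ib >> 1:
        return True
    return ia >> 2 == ib >> 2 and {ia & 3, ib & 3} == {1, 2}


def infer_aliases(forward, reverse):
    """Infer alias pairs from a forward trace and a trace in the reverse direction."""
    pairs = set()
    for prev, cur in zip(forward, forward[1:]):
        for r in reverse:
            # r is on the same link as cur, so r belongs to the router
            # that precedes cur in the forward trace.
            if r != prev and p2p_related(cur, r):
                pairs.add(frozenset((prev, r)))
    return pairs


# Hypothetical path R1 - R2 - R3 over links 10.0.0.0/30 and 10.0.0.4/30.
forward = ["10.0.9.1", "10.0.0.2", "10.0.0.6"]  # incoming interfaces of R1, R2, R3
reverse = ["10.0.8.1", "10.0.0.5", "10.0.0.1"]  # incoming interfaces of R3, R2, R1

for pair in sorted(sorted(p) for p in infer_aliases(forward, reverse)):
    print(pair)
```

In this example the inferred pairs are (10.0.0.1, 10.0.9.1) on R1 and (10.0.0.2, 10.0.0.5) on R2.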

Analytical Alias Resolution: Potential Issues
Problems with inferring subnets accurately:
  False positive: two separate subnets with consecutive /30 subnet numbers may be inferred as one /29 subnet.
  False negative: a /29 subnet may be inferred as two separate /30 subnets.
Problems with inferring IP aliases accurately:
  False positives and false negatives are possible due to incorrectly formed subnets.
Both false positives and false negatives introduce inaccuracies into the resulting topology map.

Analytical Alias Resolution: Potential Solutions
How to verify the accuracy of formed subnets:
Accuracy condition: two or more IP addresses from the same subnet cannot appear in a loop-free trace unless they are consecutive. Check whether a newly formed subnet violates this condition for any pair of available IP addresses from this subnet in any other path trace.
Completeness condition: to infer a /x subnet among a set of IP addresses that belong to the address range, require that some fraction (e.g., 50%) of these addresses appear in our data set. This is needed to increase our confidence in the inferred subnet.
Processing order: start with the subnets that have a higher completeness ratio.
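The two subnet checks above can be sketched with the standard `ipaddress` module. The function names, the addresses and the choice to count only host addresses in the completeness ratio are assumptions made for this illustration; traces are assumed to contain only valid addresses (no unresponsive hops).

```python
import ipaddress


def completeness(observed, subnet):
    """Fraction of the subnet's usable addresses that appear in the data set."""
    net = ipaddress.ip_network(subnet)
    # Assumption: network and broadcast addresses are not counted, except for a /31.
    usable = net.num_addresses - 2 if net.prefixlen <= 30 else net.num_addresses
    seen = {ip for ip in observed if ipaddress.ip_address(ip) in net}
    return len(seen) / usable


def violates_accuracy(subnet, traces):
    """True if some trace contains two non-consecutive addresses of the subnet."""
    net = ipaddress.ip_network(subnet)
    for trace in traces:
        idx = [i for i, ip in enumerate(trace) if ipaddress.ip_address(ip) in net]
        if len(idx) >= 2 and idx[-1] - idx[0] > 1:
            return True
    return False


observed = ["10.0.0.1", "10.0.0.2", "10.0.0.5", "10.1.1.1"]
print(completeness(observed, "10.0.0.0/29"))  # 0.5 (3 of 6 host addresses)

ok_trace = ["10.0.0.1", "10.0.0.2", "10.2.2.2"]
bad_trace = ["10.0.0.1", "10.1.1.1", "10.0.0.5"]
print(violates_accuracy("10.0.0.0/29", [ok_trace]))   # False
print(violates_accuracy("10.0.0.0/29", [bad_trace]))  # True
```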

Analytical Alias Resolution: Potential Solutions
How to verify the accuracy of inferred IP aliases:
No loop condition: no inferred IP alias pair should introduce a routing loop in any of the path traces.
Example: consider two traces, (…, a, b, c, d, …) and (…, e, f, g, h, b, i, …), the second being a reverse trace. Assume a subnet relation (g ↔ c). The inferred alias pair (b, g) causes a loop.
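The no loop condition can be checked by looking for a trace in which both members of a candidate alias pair appear with at least one other hop between them. The sketch below uses the two traces from the example with the elided segments left out; the function name `causes_loop` is hypothetical, and treating adjacent occurrences as acceptable is an assumption.

```python
def causes_loop(alias_pair, traces):
    """True if merging the two addresses makes some trace visit a router twice."""
    s, t = alias_pair
    for trace in traces:
        positions = [i for i, ip in enumerate(trace) if ip in (s, t)]
        # The merged router appears twice with another hop in between.
        if len(positions) >= 2 and positions[-1] - positions[0] > 1:
            return True
    return False


traces = [
    ["a", "b", "c", "d"],
    ["e", "f", "g", "h", "b", "i"],  # reverse trace
]

print(causes_loop(("b", "g"), traces))  # True: g, h, b would revisit one router
print(causes_loop(("a", "e"), traces))  # False
```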

Analytical Alias Resolution: Potential Solutions
How to verify the accuracy of inferred IP aliases:
Common neighbor condition: given two IP addresses s and t that are candidate aliases belonging to a router R, one of the following cases should hold:
  1. s and t have a common neighbor in some path trace.
  2. There exists an alias pair (b, o) such that b is a successor (or predecessor) of s and o is a predecessor (or successor) of t.
  3. The involved traces are aligned such that they form two subnets, one at each side of router R.
Distance condition: given two IP addresses s and t that are candidate aliases for a router R, s and t should be at a similar distance from a vantage point. This adds an active probing component to the solution.
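Case 1 of the common neighbor condition and the distance condition can be sketched as two small checks. The function names, the sample traces and the tolerance of one hop are assumptions for illustration; the hop distances would come from active probing, which is not shown here.

```python
def neighbors(ip, traces):
    """All addresses that appear directly before or after ip in any trace."""
    found = set()
    for trace in traces:
        for i, hop in enumerate(trace):
            if hop == ip:
                if i > 0:
                    found.add(trace[i - 1])
                if i + 1 < len(trace):
                    found.add(trace[i + 1])
    return found


def has_common_neighbor(s, t, traces):
    """Case 1: s and t share at least one neighboring address."""
    return bool(neighbors(s, traces) & neighbors(t, traces))


def similar_distance(dist_s, dist_t, tolerance=1):
    """Distance condition: hop counts from one vantage point differ by at most tolerance."""
    return abs(dist_s - dist_t) <= tolerance


traces = [["x", "s", "n", "y"], ["z", "n", "t"]]
print(has_common_neighbor("s", "t", traces))  # True: both are next to n
print(similar_distance(7, 8))                 # True
print(similar_distance(7, 10))                # False
```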

Evaluations: Coverage Comparisons
AMP: ally (1,884 pairs) and APAR (2,034 pairs)
iPlane: ally (39,191 pairs) and APAR (50,206 pairs)
Complete ally requires (275K)² probes.
[Figure: diagrams comparing the alias pairs found by ally, APAR and the source IP based method for the AMP and iPlane data sets, with regions labelled "Ally disagree" and "Causing loop"; the values shown include 1,003, 11,070, 2,514, 8,206, 3,058, 6,179, 10,678 and 22,886.]

Summary: Analytical and Probe-based Alias Resolution
The IP alias resolution task has a considerable effect on most of the analyzed topological characteristics. In general, false negatives have more impact than false positives.
APAR
  benefits from the IP address assignment of links,
  focuses on structural connections between routers,
  is more effective on data sets that include symmetric path segments collected from a large number of vantage points,
  requires no or minimal probing overhead,
  complements probe-based approaches.

Outline
Introduction
  Internet Topology Measurement
Topology Discovery Issues
  Impact of IP Alias Resolution
Topology Discovery
  Resolving Anonymous Routers: Graph-based Induction Technique
  Resolving Alias IP Addresses: Analytical and Probe-based Alias Resolution
  Resolving Genuine Subnets: Dynamic Subnet Inference
Summary

Genuine Subnet Resolution: Problem
Subnet resolution identifies IP addresses that are connected over the same medium and improves the quality of the resulting topology map.
[Figure: routers A, B, C and D with interface addresses IP1, IP2 and IP3, shown as the underlying topology, the observed topology and the inferred topology.]

Subnet Resolution: Advantages
Improve the quality of the resulting topology map.
Increase the scope of the map.
[Figure: two examples with routers A, B, C and D, each shown as the genuine topology, the observed topology and the inferred topology.]

Subnet Resolution: Advantages
Improve the alias resolution process by reducing the number of probes in ally-based alias resolution.
The ally tool requires O(n²) probes to resolve aliases among n IP addresses.
We could instead choose ally probes based on subnets. This approach reduces the number of probes to O(n·s), where s is the average number of IP addresses in a subnet.
[Figure: a trace through IP a, IP b, IP c and IP d, with subnets containing IP e, IP f, IP g, IP h, IP i, IP k and IP l attached along the path.]
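The difference between the two growth rates can be made concrete with a rough count. The numbers below (1,000 addresses, 4 addresses per subnet on average) are made up for illustration, and the functions give order-of-magnitude estimates only, not ally's exact probe counts.

```python
def all_pairs_tests(n):
    """Number of address pairs when every address is tested against every other."""
    return n * (n - 1) // 2


def subnet_guided_tests(n, s):
    """Rough estimate when each address is tested only against about s candidates."""
    return n * s


n, s = 1000, 4  # hypothetical data set size and average subnet size
print(all_pairs_tests(n))         # 499500
print(subnet_guided_tests(n, s))  # 4000
```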

Subnet Resolution: Approach
[Figure: observed IP addresses grouped into candidate subnets with prefix lengths such as /16, /24, /28, /30 and /31.]

Genuine Subnet Resolution: Trace Preservation
[Figure: candidate subnets with prefix lengths such as /21, /22, /24, /28, /30 and /31, checked against the path traces.]

Genuine Subnet Resolution: Distance Preservation
[Figure: candidate subnets with prefix lengths such as /24, /28, /30 and /31, checked for their distance from a vantage point (V.P.).]

Genuine Subnet Resolution: Dynamic Subnet Inference Approach
Inferring subnets:
Cluster IP addresses into maximal subnets up to a given size (e.g., /24).
Perform accuracy and distance analysis on candidate subnets and break them down as necessary.
Completeness: ignore candidate subnets that have fewer than one quarter of their IP addresses present.
A /27 subnet can have up to 2^5 = 32 IP addresses.
[Figure: addresses IP1 through IP9 grouped into candidate subnets of sizes /24, /25, /26, /27, /29, /30 and /31.]
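The clustering and completeness steps can be sketched as a top-down split: start from the largest allowed subnet and halve it until at least one quarter of its addresses are observed. This is a simplified illustration under assumed names and addresses; the accuracy and distance analysis from the slide is not included, and the completeness ratio here counts every address in the range.

```python
import ipaddress


def infer_subnets(observed, max_net, min_fraction=0.25):
    """Return (subnet, members) for the largest subnets that pass completeness."""
    net = ipaddress.ip_network(max_net)
    members = [ip for ip in observed if ipaddress.ip_address(ip) in net]
    if len(members) < 2:
        return []  # a single address does not form a subnet
    if len(members) / net.num_addresses >= min_fraction:
        return [(str(net), members)]
    if net.prefixlen >= 31:
        return []  # nothing smaller to split into
    result = []
    for half in net.subnets(prefixlen_diff=1):  # break the candidate down
        result += infer_subnets(members, str(half), min_fraction)
    return result


observed = ["10.0.0.1", "10.0.0.2", "10.0.0.5", "10.0.0.129", "10.0.0.130"]
for subnet, members in infer_subnets(observed, "10.0.0.0/24"):
    print(subnet, members)
# 10.0.0.0/29 ['10.0.0.1', '10.0.0.2', '10.0.0.5']
# 10.0.0.128/29 ['10.0.0.129', '10.0.0.130']
```

The second candidate shows why the later analysis matters: two addresses pass the one-quarter threshold for a /29, although they might really belong to a /30.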

Evaluations: Internet2 backbone verification
Internet2 backbone topology on Apr 29, 2007
150 subnets, 547 routers, 793 IPs
Inferred 116 verifiable subnets:
  95 exact size
  12 smaller (observed IPs formed a smaller subnet)
  9 bigger (false positives)
[Figure: examples of inferred /28 and /29 subnets among routers such as R1, R2, R3, R4, R6 and R10 and host H1.]

Summary: Genuine Subnet Resolution
Identified a new step (i.e., subnet inference) to improve topology mapping studies.
Introduced a technique to infer subnets and demonstrated its effectiveness.
Subnets help detect connectivity between nodes: an inferred /24 subnet had only a single link between two of its 73 observed IP addresses.
Using subnets, we may reduce the number of ally probes for IP alias resolution, e.g., from 362K to 35.5K.

Summary
The Internet is man-made, so why do we need to measure it?
  Because we still don't really understand it.
  Sometimes things go wrong.
Measurement for network operations:
  Detecting and diagnosing problems
  What-if analysis of future changes
Measurement for scientific discovery:
  Creating accurate models that represent reality
  Identifying new features and phenomena
Researchers have been sampling and analyzing Internet topology.
Building a network graph from raw data is not easy. There are several issues due to sampling:
  Resolving anonymous routers, IP aliases, and genuine subnets
  Huge computational and probing overhead due to very large data size

Questions?