Comparative Analysis of Internet Topology Data Sets

Jay Thom

Outline
- Introduction
- Problem Statement
- Methodology
- Conclusion

Introduction
- What is Internet topology?
- Why measure the Internet?
- How is this done?

Topology Data Sets
- CAIDA Archipelago (Ark)
- Measurement Lab (M-Lab)
- RIPE NCC Atlas
- University of Washington iPlane
- ISI ANT Census
- Internet Research Lab (IRL)
- CIDR

The Problem…
- Big problem: What does the Internet look like right now?
- Smaller problem: Acquire data to infer this topology
  - Collect data
  - Recurring collection
  - Python vs. C/C++
  - Parse data
  - Collect statistical information
  - Make comparisons

Data Collection
Data is stored in numerous formats:
- RIPE: .json files at anchors
- Ark: .warts (scamper), compressed binary files
- iPlane: compressed binary files, iPlane.c
- M-Lab: Google Cloud Storage, nested compressed files
- ANT Census: released every 2 months
- UCSD CAIDA (BGP data): compressed text files
- CIDR: compressed text files
- IRL (BGP data): compressed text files
- Retrieve traceroute files as needed by date
- Python vs. C/C++

Data Cleaning and Parsing
- Remove all unnecessary information
- Parse data into a common format
- Store in a consistent manner
- 30-day set vs. 5-day set
  - 30-day set = 1 TB
  - 5-day set = 181 GB
  - Reduce size to save time
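As an illustration of the "common format" step, the sketch below defines one small record type that every platform-specific parser could emit. It is only an assumption about what such a format might look like: the `Trace` fields, the `*` marker for unresponsive hops, and the tab-separated line layout are hypothetical and are not taken from the slides.

```python
from dataclasses import dataclass

UNRESPONSIVE = "*"  # hypothetical marker for a hop that did not reply


@dataclass(frozen=True)
class Trace:
    """Hypothetical common record: one traceroute, reduced to what the analysis needs."""
    source: str   # data set name, e.g. "ripe" or "ark"
    src: str      # vantage point IP address
    dst: str      # target IP address
    hops: tuple   # hop IP addresses in order; None marks an unresponsive hop

    def to_line(self) -> str:
        """Serialize as: source<TAB>src<TAB>dst<TAB>hop1,hop2,..."""
        hop_text = ",".join(h if h is not None else UNRESPONSIVE for h in self.hops)
        return "\t".join([self.source, self.src, self.dst, hop_text])

    @staticmethod
    def from_line(line: str) -> "Trace":
        """Parse a line produced by to_line()."""
        source, src, dst, hop_text = line.rstrip("\n").split("\t")
        if hop_text:
            hops = tuple(None if h == UNRESPONSIVE else h for h in hop_text.split(","))
        else:
            hops = ()
        return Trace(source, src, dst, hops)
```

Keeping only these fields is one way to shrink the stored data, since timing, TTL, and per-probe details are dropped before anything is written to disk.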

Total Unique Source/Destination IP Addresses
- For each data source, how many unique source or destination IP addresses are found?
- This indicates the number of vantage points or targets the data source has access to.
- Question: Does the number of vantage points/targets affect how much of the Internet a source can see?
- Question: What is the relationship between the number of vantage points/targets and the number of unique traces, unique IP addresses, and unique edges found?
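Counting vantage points and targets per data source can be sketched with one set per source. The input layout, a stream of `(data_source, src_ip, dst_ip)` tuples, is an assumption made for this example.

```python
from collections import defaultdict


def endpoint_counts(traces):
    """Return {data_source: (unique_sources, unique_destinations)}.

    `traces` is an iterable of (data_source, src_ip, dst_ip) tuples
    (a hypothetical layout chosen for this sketch).
    """
    sources = defaultdict(set)
    targets = defaultdict(set)
    for name, src, dst in traces:
        sources[name].add(src)
        targets[name].add(dst)
    return {name: (len(sources[name]), len(targets[name])) for name in sources}
```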

Total Unique Traces
- How many unique traces is each data source able to find?
- Why would one source find more than another?
- What mechanisms are present that would affect these numbers?

Total Unique Edges Visited
- Question: What does an edge represent?
  - A connection between two routers
  - An ingress/egress point between two ASes
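At the IP level, one possible reading of "edge" is a pair of consecutive hops in a trace. The sketch below extracts such pairs; it assumes traces are lists of IP strings with `None` for unresponsive hops, and it simply skips pairs that touch an unresponsive or repeated hop, which is one policy among several.

```python
def trace_edges(hops):
    """Return the set of directed (a, b) edges between consecutive responsive hops."""
    edges = set()
    for a, b in zip(hops, hops[1:]):
        if a is None or b is None or a == b:
            continue  # skip gaps and immediate repeats (a policy choice, not a rule)
        edges.add((a, b))
    return edges


def unique_edges(traces):
    """Union of the edges seen across many traces."""
    all_edges = set()
    for hops in traces:
        all_edges |= trace_edges(hops)
    return all_edges
```

These are interface-level edges. Treating them as router-level links or AS-level ingress/egress points would need further steps such as alias resolution or IP-to-AS mapping, which this sketch does not attempt.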

IP/Trace Counts
- Number of unique IP addresses vs. all collected IPs
- Number of unique traces vs. all collected traces
- Question: Why is this important?
  - How many times is a data source repeating the same measurements?
  - How many duplicated efforts are seen?
  - Why would this be?
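The unique-versus-total comparison reduces to a ratio per data source. A minimal sketch, assuming each trace is a sequence of hop IPs with `None` for unresponsive hops and that two traces are "the same" when their hop sequences match exactly:

```python
def duplication_stats(traces):
    """Summarize how much of a data source's output is repeated measurement."""
    total_traces = len(traces)
    unique_traces = len({tuple(t) for t in traces})
    all_ips = [ip for t in traces for ip in t if ip is not None]
    total_ips = len(all_ips)
    unique_ips = len(set(all_ips))
    return {
        "total_traces": total_traces,
        "unique_traces": unique_traces,
        "trace_ratio": unique_traces / total_traces if total_traces else 0.0,
        "total_ips": total_ips,
        "unique_ips": unique_ips,
        "ip_ratio": unique_ips / total_ips if total_ips else 0.0,
    }
```

A ratio close to 1 means little repetition; a low ratio means the source keeps measuring the same paths or addresses.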

Problem - Unresponsive Routers
- Count as an edge?
- Keep, or disregard?
- If kept, how should they be noted?
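The two options on this slide can be compared side by side. In the sketch below, "keep" replaces each unresponsive hop with a placeholder label that is unique to the trace and hop position, so anonymous hops from different traces are never merged into one node; "disregard" drops every edge that touches an unresponsive hop. The label scheme `*<trace_id>.<index>` is hypothetical.

```python
def edges_with_unresponsive(hops, trace_id, keep=True):
    """Return the list of consecutive-hop edges under one of two policies.

    hops: hop IPs in order, None for an unresponsive hop.
    trace_id: any identifier that is unique per trace (assumed for labeling).
    """
    if keep:
        labeled = [h if h is not None else f"*{trace_id}.{i}" for i, h in enumerate(hops)]
        return list(zip(labeled, labeled[1:]))
    return [(a, b) for a, b in zip(hops, hops[1:]) if a is not None and b is not None]
```

Keeping placeholders preserves path length and adjacency at the cost of extra nodes; disregarding them yields a smaller graph but breaks paths at every gap.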

Distribution per Source/Destination IP
- Find the distribution of our data points per source and per destination
- Analyze this to understand the effectiveness of each platform’s approach to measurement
- Data points: IPs, traces, edges, sources, destinations
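A per-source or per-destination distribution can be summarized by counting data points per key and then describing the counts. The sketch assumes the input is simply one key (for example a source IP) per data point.

```python
from collections import Counter
import statistics


def per_key_distribution(keys):
    """Summarize how many data points each key (source or destination IP) contributes."""
    counts = sorted(Counter(keys).values())
    if not counts:
        return None
    return {
        "keys": len(counts),
        "min": counts[0],
        "median": statistics.median(counts),
        "max": counts[-1],
    }
```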

Firewalls, Loops, Repeated IP Addresses
- A-B-C-C
- A-B-C-D-C
- A-B-C-C-D
- A-B-B-C
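The four hop patterns above can be told apart mechanically. The sketch below labels a trace by whether an address repeats immediately, repeats as the final hop, or reappears after other hops (a loop). The label names are invented for this example, and the code does not try to decide the cause (firewall, routing loop, or something else).

```python
def classify_trace(hops):
    """Return a set of labels describing repeated addresses in one trace."""
    labels = set()
    last_seen = {}  # address -> index of its most recent occurrence
    for i, hop in enumerate(hops):
        if hop is None:
            continue  # unresponsive hops are ignored here
        if hop in last_seen:
            if last_seen[hop] == i - 1:
                # same address on two consecutive hops
                if i == len(hops) - 1:
                    labels.add("trailing_repeat")
                else:
                    labels.add("adjacent_repeat")
            else:
                # address reappears after at least one other hop
                labels.add("loop")
        last_seen[hop] = i
    return labels
```

For example, `classify_trace("A-B-C-D-C".split("-"))` yields the `loop` label, while `A-B-C-C` is labeled `trailing_repeat`.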

RIPE Atlas Hardware
- Rack-mounted anchor
- Small probe (connected anywhere)

RIPE Atlas: User-Defined Measurements

IP Addresses in Traces Not Seen in ANT Census
- Question: Will some IP addresses be discovered in traces that were not found in the ANT census?
- Some addresses return ICMP Time Exceeded messages but do not respond to ICMP Echo Request
- New IP addresses will be discovered that can then be used as active target IP addresses for future probes
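Finding the addresses that appear in traces but not in the census is a set difference. The sketch assumes both inputs are iterables of IPv4 address strings; sorting numerically with the standard `ipaddress` module is only for readable output.

```python
import ipaddress


def new_targets(trace_ips, census_ips):
    """Return trace addresses absent from the census, sorted numerically (IPv4 assumed)."""
    missing = set(trace_ips) - set(census_ips)
    return sorted(missing, key=ipaddress.ip_address)
```

The returned list is the candidate set of new active targets for future probing.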

Prefix Announcements vs. Mask Distribution
- Question: What is the distribution of subnets announced by each AS data source (CAIDA, IRL, CIDR)?
- Why do some perform better than others?
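The mask distribution for one data source is a histogram of prefix lengths. A minimal sketch using the standard `ipaddress` module, assuming the announcements are available as CIDR strings:

```python
from collections import Counter
import ipaddress


def mask_distribution(prefixes):
    """Count announced prefixes by mask length, e.g. {24: 2, 25: 1}."""
    # strict=False tolerates entries whose host bits are set
    return Counter(ipaddress.ip_network(p, strict=False).prefixlen for p in prefixes)
```

Running this once per source (CAIDA, IRL, CIDR) gives directly comparable histograms.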

Conflicts in Subnet Announcements
[Figure: subnets of size /30, /29, /29, /28, /28, /28, /28]
- Determine total number of subnets announced
- Combine all smaller subnets to see if they make up a complete larger subnet
- Compare larger subnets to see if they are announced by more than one AS in a data set
- Analyze to determine if sources clean up conflicts
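The combine-then-compare steps can be sketched with the standard `ipaddress` module: `collapse_addresses` merges adjacent smaller subnets into the larger subnet they fully cover, and `overlaps` flags address space claimed by two different ASes. The `(prefix, asn)` input layout is an assumption, the example assumes IPv4 only, and the pairwise comparison is quadratic, so it is meant for illustration rather than full BGP tables.

```python
import ipaddress
from collections import defaultdict


def aggregate_by_as(announcements):
    """Merge each AS's prefixes into the largest subnets they completely cover.

    announcements: iterable of (prefix_string, asn) pairs (hypothetical layout).
    """
    per_as = defaultdict(list)
    for prefix, asn in announcements:
        per_as[asn].append(ipaddress.ip_network(prefix))
    return {asn: list(ipaddress.collapse_addresses(nets)) for asn, nets in per_as.items()}


def find_conflicts(aggregated):
    """Return (net_a, asn_a, net_b, asn_b) for overlapping subnets from different ASes."""
    flat = [(net, asn) for asn, nets in aggregated.items() for net in nets]
    conflicts = []
    for i, (net_a, asn_a) in enumerate(flat):
        for net_b, asn_b in flat[i + 1:]:
            if asn_a != asn_b and net_a.overlaps(net_b):
                conflicts.append((net_a, asn_a, net_b, asn_b))
    return conflicts
```

An overlap found this way is only a candidate conflict. A more specific prefix announced by a different AS can be legitimate, so the result still needs analysis.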

Trace Data Coverage by BGP Data
- Not all IP addresses found in our trace data will be visible to our AS data sources
  - BGP data comes from the RouteViews project, Univ. of Oregon
  - May not see some addresses, say somewhere in Asia
- Track statistics on IP addresses not found by data sources
- Track AS coverage per data source
- Track total number of prefixes announced by data source
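Checking whether a trace address is covered by BGP data amounts to a longest-prefix match against the announced prefixes. The sketch below is a simple, unoptimized version: it assumes IPv4, a hypothetical `(prefix, asn)` input layout, and a dictionary per prefix length rather than a trie.

```python
import ipaddress


def build_table(announcements):
    """Index announcements as {prefix_length: {network: asn}}."""
    table = {}
    for prefix, asn in announcements:
        net = ipaddress.ip_network(prefix)
        table.setdefault(net.prefixlen, {})[net] = asn
    return table


def origin_as(ip, table):
    """Return the ASN of the most specific covering prefix, or None if uncovered."""
    for length in sorted(table, reverse=True):  # try the longest masks first
        net = ipaddress.ip_network(f"{ip}/{length}", strict=False)
        if net in table[length]:
            return table[length][net]
    return None


def coverage(ips, table):
    """Return (fraction of ips covered, list of uncovered ips)."""
    missing = [ip for ip in ips if origin_as(ip, table) is None]
    fraction = 1 - len(missing) / len(ips) if ips else 0.0
    return fraction, missing
```

The list of uncovered addresses is what the slide calls "IP addresses not found by data sources", and the fraction gives a per-source coverage figure.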

AS Rank by Origin, Destination, IP, Edge
- Rank ASes by the number of data points found in each, per data source
- Compare coverage of ASes by each trace data source
- Question: Why are some ASes more visible to sources than others?
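Ranking ASes by data points is a count followed by a sort. The sketch assumes each data point (an origin, destination, IP, or edge endpoint) has already been mapped to an AS number, so the input is one ASN per data point.

```python
from collections import Counter


def rank_ases(as_labels):
    """Return [(asn, count), ...] sorted by descending count, ties broken by ASN."""
    counts = Counter(as_labels)
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))
```

Computing one ranking per trace data source and per data point type makes the per-AS visibility differences directly comparable.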

AS Coverage by Origin, Destination, IP, Edge
- Track numbers of source IPs, destination IPs, total IPs, and total edges per AS by data source
- Rank sources based on these values (which source sees how many ASes per value)
- Create visual graphs of these ASes; collect and analyze graph data such as degree, centrality, etc. (use tool from CAIDA)
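Of the graph measures mentioned, degree is the simplest to compute by hand. The sketch below derives per-AS degree from a list of AS-level edges using only the standard library. It is a stand-in for illustration and not the CAIDA tool referred to on the slide; the edge list layout is assumed.

```python
from collections import defaultdict


def as_degrees(as_edges):
    """Return {asn: number of distinct neighbor ASes} for an undirected AS graph."""
    neighbors = defaultdict(set)
    for a, b in as_edges:
        if a == b:
            continue  # intra-AS edges do not add an AS-level neighbor
        neighbors[a].add(b)
        neighbors[b].add(a)
    return {asn: len(peers) for asn, peers in neighbors.items()}
```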

Conclusion
- Problem Statement
- Methodology
- Collection
- Parsing
- Statistics
- Analysis
- Problems

Questions?

Thanks
