Comparative Analysis of Internet Topology Data Sets


1 Comparative Analysis of Internet Topology Data Sets
Jay Thom

2 Outline
Introduction: Why is this important?
Background: History; Internet Topology Measurement
Related Works: Data Sources; Papers
Project Goal: Data Collection and Preparation; Analysis
Conclusion

3 Why Is This Important? My motivation: Internet Topology Measurement
Our project Challenges with data collection Dealing with large amounts of data What can be learned? Project goals

4 Some History… Circuit Switching
First commercial circuit-switched networks appear

5 Some History… Packet Switching
First packet-switched network (ARPANET) sends a message from UCLA to Stanford, 1969
The attempt to send the word “LOGIN” crashed the system after only “LO” had been transmitted

6 History of the Internet
1970 – First network protocol introduced: Network Control Protocol (NCP)
1970 – First network applications begin to appear
1972 – First ‘hot’ application introduced
1982 – TCP/IP protocol introduced
1986 – NSFNET interconnects computer centers at several universities
1990 – First commercial connections to the network appear
1991 – World Wide Web goes live, widespread access to the network begins
Why is all of this important?

7 The Network of Networks
Internet is made up of 55,483 autonomous systems (ASes)
Each system is managed independently
Cooperation within the network is voluntary
ASes seek to improve their own performance, maintain competitive relationships
Topological details of each AS are undisclosed, proprietary
How do we know what the Internet actually looks like?
What tools do we have to monitor it?

8 Measurement Tools – Ping

9 Measurement Tools - Traceroute

10 Measurement Platforms
CAIDA – Archipelago (Ark)
Measurement Lab (M-Lab)
University of Washington – Information Plane (iPlane)
RIPE NCC – Atlas
University of Southern California ISI – ANT Census
PlanetLab

11 Ark Statistics CAIDA – Center for Applied Internet Data Analysis
University of California San Diego
Measurement and data curation (archives)
165 monitors in 57 countries
Growing by 766 million traces per month
Stored data growing by 316 GB per month
41 billion traces performed (total)
Data stored in a binary format (.warts file)
Began running in 1998

12 Ark Raspberry Pi Monitor

13 Ark Monitor Locations

14 RIPE Atlas RIPE NCC – Réseaux IP Européens Network Coordination Centre
Regional Internet Registry (RIR) for Europe, Middle East, and Central Asia
Based in Amsterdam, Netherlands
13,554 probes (small network devices)
208 anchors (rack-mounted servers)
Traces are performed regularly from probes to anchors
Anchors can also perform user-defined traces to any IP address
Traces are stored at anchors, and can be downloaded in .json format
Established in 1992

15 RIPE Atlas Hardware Rack-mounted anchor
Small probe (connected anywhere)

16 RIPE Atlas Anchors

17 RIPE Atlas Probes

18 PlanetLab Nodes hosted by corporate/academic institutions
1090 nodes at 507 sites
Affiliated users are granted a “slice” to run experiments on most* available nodes
UNR has 2 PlanetLab nodes in operation
Used by M-Lab and iPlane as vantage points for measurements
Slowly dying: only about 176 nodes currently in operation
* Some nodes are reserved for exclusive use by M-Lab

19 PlanetLab

20 M-Lab Utilizes PlanetLab nodes to generate traces
Supports a number of tools and archives all data, making it accessible to the public. Tools include:
Glasnost – detects prioritization or censorship of network traffic
Network Diagnostic Tool (NDT) – measures TCP performance
Neubot – for studying broadband performance
NPAD – diagnoses issues in a network path to improve performance
OONI – detects censorship, surveillance, traffic manipulation
Paris Traceroute – network topology mapping
SideStream – TCP state information
Mlab-collectd – monitors M-Lab slices on PlanetLab
A collaborative effort involving Google, New America, Princeton University, and others.

21 iPlane Utilizes PlanetLab nodes to perform traceroutes to infer router-level topology.
Clusters interfaces into PoPs: iPlane clusters interfaces that are in the same Point of Presence (PoP). For this, every interface in the router atlas is probed using UDP and ICMP packets. Interfaces that respond with the same source address or have similar return TTLs to all the vantage points are clustered together.
Measures link attributes: loss rate and bandwidth capacity of all inter-cluster links.
Performs route prediction: iPlane composes segments of observed Internet paths to predict the end-to-end path between any pair of end-hosts, and uses this prediction to estimate end-to-end performance for overlay services.
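
The clustering rule on this slide (same response source address, or similar return TTLs at every vantage point) can be illustrated with a small sketch. This is not iPlane's code: the `ProbeReply` record, the `cluster_interfaces` function, and the `ttl_tolerance` parameter are hypothetical names chosen for the example, and the addresses come from documentation ranges. Union-find is used here, which makes the grouping transitive; that is an assumption of the sketch.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class ProbeReply:
    interface: str       # address of the probed interface
    reply_source: str    # source address seen in the response
    return_ttls: tuple   # return TTL observed at each vantage point, in a fixed order


def cluster_interfaces(replies, ttl_tolerance=0):
    """Group interfaces that share a reply source or have similar return TTLs."""
    parent = {r.interface: r.interface for r in replies}

    def find(x):
        # Walk up to the cluster representative, compressing the path on the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in combinations(replies, 2):
        same_source = a.reply_source == b.reply_source
        similar_ttls = (
            bool(a.return_ttls)
            and len(a.return_ttls) == len(b.return_ttls)
            and all(abs(x - y) <= ttl_tolerance
                    for x, y in zip(a.return_ttls, b.return_ttls))
        )
        if same_source or similar_ttls:
            parent[find(a.interface)] = find(b.interface)

    clusters = {}
    for r in replies:
        clusters.setdefault(find(r.interface), set()).add(r.interface)
    return sorted(sorted(members) for members in clusters.values())


replies = [
    ProbeReply("192.0.2.1", "192.0.2.254", (250, 245, 240)),
    ProbeReply("192.0.2.2", "192.0.2.254", (249, 244, 239)),    # same source as .1
    ProbeReply("198.51.100.1", "198.51.100.1", (249, 244, 239)),  # same TTLs as .2
    ProbeReply("203.0.113.1", "203.0.113.1", (230, 200, 210)),
]
pops = cluster_interfaces(replies)
```

With the sample data, the first three interfaces end up in one cluster and the fourth stays alone. A real system would need a tolerance and a minimum number of agreeing vantage points tuned to measured data.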

22 ANT Census Started in 2003; scans the IPv4 space every 42 days
4.3 billion addresses
Utilizes ICMP ping, looks for response
Census map shows ownership of all 256 /8 subnets; brighter areas indicate more replies
About 6% of addresses respond
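
The figures on this slide can be checked with a little arithmetic. The sketch below uses only the slide's numbers (4.3 billion addresses, 256 /8 blocks, about 6% responding, one pass every 42 days). The average probe rate assumes a single ping per address spread evenly over the cycle, which is a simplification made for this example and not a description of how the census actually schedules probes.

```python
TOTAL_IPV4 = 2 ** 32            # 4,294,967,296 addresses, the "4.3 billion" above
SLASH8_BLOCKS = 2 ** 8          # 256 /8 blocks
ADDRESSES_PER_SLASH8 = 2 ** 24  # 16,777,216 addresses in each /8


def census_estimates(response_rate=0.06, cycle_days=42):
    """Return (expected responders, average probes per second) for one pass."""
    responders = TOTAL_IPV4 * response_rate
    seconds_per_cycle = cycle_days * 24 * 60 * 60
    probes_per_second = TOTAL_IPV4 / seconds_per_cycle
    return responders, probes_per_second


responders, probes_per_second = census_estimates()
print(f"{responders / 1e6:.0f} million responders, "
      f"{probes_per_second:.0f} probes per second on average")
```

Under these assumptions, roughly a quarter of a billion addresses answer, and the scan has to sustain on the order of a thousand probes per second.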

23 Problems
Each platform stores data in a different format on a different file system:
RIPE – uncompressed .json format, data stored at each anchor
Ark – compressed .warts files (binary), nested file system, password protected
iPlane – compressed binary files, custom format (C++), password protected
M-Lab – Google Cloud storage, compressed files within files
ANT Census – released once every two months
Extract all files and convert to a common format
1 month = 1 TB of data
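
As an illustration of the "convert to a common format" step, here is a minimal sketch for one platform. It assumes a RIPE Atlas-style traceroute result with `src_addr`, `dst_addr`, `timestamp`, and a `result` list of hops whose replies carry a `from` address (timeouts appear as `{"x": "*"}`); those field names should be checked against the platform's current documentation. The output record layout and the function name `ripe_traceroute_to_common` are hypothetical, not the project's actual format.

```python
import json


def ripe_traceroute_to_common(record):
    """Flatten one RIPE Atlas-style traceroute result into a common record."""
    hops = []
    for hop in record.get("result", []):
        # A hop carries several replies; a timed-out probe has no "from" field.
        addresses = [r["from"] for r in hop.get("result", []) if "from" in r]
        # Keep the first responding address, or None if every probe timed out.
        hops.append(addresses[0] if addresses else None)
    return {
        "platform": "ripe-atlas",
        "source": record.get("src_addr"),
        "destination": record.get("dst_addr"),
        "timestamp": record.get("timestamp"),
        "hops": hops,
    }


# Hand-written sample in the assumed layout, using documentation addresses.
sample = json.loads("""
{"prb_id": 1, "src_addr": "192.0.2.10", "dst_addr": "198.51.100.7",
 "timestamp": 1500000000,
 "result": [
   {"hop": 1, "result": [{"from": "192.0.2.1", "rtt": 0.5}]},
   {"hop": 2, "result": [{"x": "*"}, {"x": "*"}]},
   {"hop": 3, "result": [{"from": "198.51.100.7", "rtt": 12.3}]}
 ]}
""")
common = ripe_traceroute_to_common(sample)
```

Each of the other platforms would need its own reader (for example, a .warts decoder for Ark) that emits the same record shape, so the analysis code only ever sees one format.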

24 Related Work Pietro Marchetta et al., “Topology discovery at the router level: a new hybrid tool targeting ISP Networks”, 2011
An attempt to find a new way to collect measurements without traceroute and alias resolution.
Introduces a new tool, Merlin, in which a central server controls distributed vantage points to probe a targeted AS.
Uses MRINFO, a ping-like tool to monitor active multicast groups, which utilizes the IGMP messages ASK_NEIGHBORS and NEIGHBORS_REPLY.
Deployed recursively to all routers in an AS, it gives a listing of each router’s multicast neighbors.
Performs worse than traceroute for inter-AS measurements, but much better for mapping the core of an AS.

25 Related Work Benoit Donnet and Timur Friedman, “Internet Topology Discovery: A Survey”, 2007
Discusses the four levels of Internet topology: IP interface level, router level, PoP level, and AS level
Seeks to build a formal graph from network measurement data
From this perspective, studies characteristics of the network in terms of average degree, degree distribution, clustering coefficient, and betweenness centrality
Would like to build a visualization of the network based on these factors
No actual data collection.
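
Several of the graph characteristics named on this slide are simple to compute once the topology is an undirected graph. The sketch below is a plain-Python illustration on a toy graph, not code from the survey; the function names are chosen for this example, and betweenness centrality is left out to keep it short.

```python
from collections import Counter


def build_graph(edges):
    """Build an undirected adjacency map from (u, v) pairs."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def average_degree(adj):
    return sum(len(neighbors) for neighbors in adj.values()) / len(adj)


def degree_distribution(adj):
    """Map each degree value to the number of nodes that have it."""
    return dict(Counter(len(neighbors) for neighbors in adj.values()))


def local_clustering(adj, node):
    """Fraction of a node's neighbor pairs that are themselves connected."""
    neighbors = adj[node]
    k = len(neighbors)
    if k < 2:
        return 0.0
    links = sum(1 for u in neighbors for v in neighbors if u < v and v in adj[u])
    return 2 * links / (k * (k - 1))


def mean_local_clustering(adj):
    return sum(local_clustering(adj, n) for n in adj) / len(adj)


# Toy topology: a triangle A-B-C with one extra node D hanging off C.
adj = build_graph([("A", "B"), ("B", "C"), ("A", "C"), ("C", "D")])
print(average_degree(adj), degree_distribution(adj), mean_local_clustering(adj))
```

The same functions apply at any of the four levels; only the meaning of a node changes (interface, router, PoP, or AS).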

26 Related Work Hakan Kardes, Mehmet Gunes, Talha Oz, “Cheleby: A subnet-level Internet Topology Mapping System”, 2012
Divide available PlanetLab nodes into 7 teams based on geographic location
Each node is assigned a block of IP addresses from other sources, plus the first address from each /24 subnet (to reach all subnets)
Each monitor probes 4 destination blocks at a time, and each block is only probed by 1 monitor
Each process is independent, and eventually all blocks are probed by all monitors
Amassed data set is analyzed to discover features in the network topology (e.g. number of nodes, edges, alias resolution statistics)
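
One way to picture the probing rule on this slide (4 blocks per monitor at a time, no block shared between monitors, every monitor eventually covering every block) is a simple rotation. This is only a sketch: Cheleby's processes run independently, so the synchronized rounds, the requirement that the number of blocks equals 4 times the number of monitors, and the name `rotation_schedule` are all assumptions made for the example.

```python
def rotation_schedule(monitors, blocks, blocks_per_monitor=4):
    """Return a list of rounds; each round maps a monitor to its blocks."""
    if len(blocks) != len(monitors) * blocks_per_monitor:
        raise ValueError("this sketch needs exactly blocks_per_monitor blocks per monitor")
    # Split the destination blocks into one group per monitor.
    groups = [blocks[i:i + blocks_per_monitor]
              for i in range(0, len(blocks), blocks_per_monitor)]
    rounds = []
    for shift in range(len(monitors)):
        # Shift the groups by one position each round so no two monitors
        # hold the same group at the same time.
        rounds.append({
            monitor: groups[(index + shift) % len(groups)]
            for index, monitor in enumerate(monitors)
        })
    return rounds


monitors = ["m0", "m1", "m2"]
blocks = list(range(12))   # 12 destination blocks, identified by index
schedule = rotation_schedule(monitors, blocks)
```

After as many rounds as there are monitors, each monitor has probed all blocks, while within any single round each block belongs to exactly one monitor.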

27 Related Work Kimberly Claffy et al., “Internet Mapping: from Art to Science”, 2009
Seeking to improve on previous tool (SKITTER)
This was the paper written to introduce the Ark project by CAIDA
Intend to amass largest data set
Plan to build repository for measurement data from their study as well as others
Will make data available to research community

28 Related Work Bradley Huffaker et al., “Internet Topology Data Comparison”, 2012
Studies topology at IP, router, and AS level
Notes that different topology studies are producing conflicting results
Discusses metrics: average node degree, graph size, number of edges, node degree distribution, clustering, mean local clustering
Discusses possible inaccuracies in previous works because of a lack of IP alias resolution, leading to an over-estimation of the number of routers
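
The over-estimation mentioned on this slide is easy to see in a toy example: without alias resolution every interface address counts as its own router. The sketch below merges interfaces that are known to be aliases and counts what is left. The alias pairs are made-up inputs, and `count_routers` is a hypothetical helper for this illustration, not a method from the paper.

```python
def count_routers(interfaces, alias_pairs):
    """Count routers after merging interfaces known to be aliases of each other."""
    parent = {i: i for i in interfaces}

    def find(x):
        # Follow parent links to the representative interface of the router.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in alias_pairs:
        # Tolerate alias pairs that mention an interface not listed above.
        parent.setdefault(a, a)
        parent.setdefault(b, b)
        parent[find(a)] = find(b)

    return len({find(i) for i in parent})


interfaces = ["192.0.2.1", "192.0.2.2", "192.0.2.3", "198.51.100.1", "203.0.113.1"]
alias_pairs = [("192.0.2.1", "192.0.2.2"), ("192.0.2.2", "192.0.2.3")]

naive_count = count_routers(interfaces, [])              # every interface is a "router"
resolved_count = count_routers(interfaces, alias_pairs)  # aliases collapsed
```

Here the naive count is 5 while the alias-resolved count is 3, so degree and size metrics computed on the unresolved graph would describe a network that does not exist.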

29 Related Work Vaibhav Bajpai and Jürgen Schönwälder, “A Survey on Internet Performance Measurement Platforms and Related Standardization Efforts”, 2015
Survey of network performance tools
Focus on performance metrics rather than topology (bandwidth, reachability, censorship, throttling, etc.)
Consider some of the same platforms, but from another perspective

30 Related Work John Heidemann et al., “A Survey of the Visible Internet”, 2008
Many hosts are hidden (firewalls, private IP space), but there is much to be learned from the visible address space
Census: walk the entire address space and look for responsive hosts
Survey: frequently sample a fraction of that space
Some results:
3.6% of the allocated space is actually occupied by visible hosts
¼ of responsive /24 subnet blocks are less than 5% filled
9% of responsive /24 subnet blocks are more than ½ filled
16% (34 million IPs) are responsive and stable
From this, an estimated 60 million Internet-accessible computers exist

31 Conclusion History Why is this important? Measurement Platforms
Problems Related Work Project Goals
