University of Nevada, Reno Resolving Anonymous Routers Hakan KARDES CS 790g Complex Networks
Outline
Introduction
Anonymous router resolution
– Problem
– Previous approaches
Anonymity types
Anonymity resolution via graph-based induction (GBI)
Conclusions
Internet Topology Measurement
Internet topology measurement studies
– Involve topology collection, construction, and analysis
Current state of the research activities
Distributed topology data collection studies/platforms
– Skitter, AMP, iPlane, Dimes, DipZoom, …
– 20M path traces with over 20M nodes
Issues in topology construction
1. Verifying accuracy of path traces
2. IP alias resolution
3. Subnet inference
4. Anonymous router resolution
Topology Collection (traceroute)
Probe packets are carefully constructed to elicit an intended response from a probe destination
traceroute probes all nodes on a path towards a given destination
– TTL-scoped probes obtain ICMP error messages from routers on the path
– Each ICMP message includes the IP address of the intermediate router as its source
Merging end-to-end path traces yields the network map
[Figure: vantage point S probing towards the destination through nodes A, B, C, D; probes with TTL=1, 2, 3, 4 return IP A, IP B, IP C, IP D]
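The two steps above (TTL-scoped probing, then merging traces) can be sketched in Python as a small simulation. This is only an illustration under stated assumptions: no packets are sent, and the helper names `simulate_traceroute` and `merge_traces` as well as the toy paths are hypothetical.

```python
def simulate_traceroute(path):
    """Simulate TTL-scoped probing along a known path (no real packets).

    A probe sent with TTL=k expires at the k-th hop, which answers with an
    ICMP error message whose source address identifies that hop.
    """
    hops = []
    for ttl in range(1, len(path) + 1):
        responder = path[ttl - 1]  # hop at which the TTL reaches zero
        hops.append((ttl, responder))
    return hops


def merge_traces(traces):
    """Merge end-to-end traces into an undirected edge set (the network map)."""
    edges = set()
    for trace in traces:
        for u, v in zip(trace, trace[1:]):
            edges.add(frozenset((u, v)))
    return edges


# Toy path from the slide: TTL=1..4 reveal A, B, C, D in order.
hops = simulate_traceroute(["A", "B", "C", "D"])
print(hops)

# Two toy traces from the same vantage point S share the prefix S-A-B.
network_map = merge_traces([["S", "A", "B", "C", "D"], ["S", "A", "B", "E"]])
print(sorted(tuple(sorted(edge)) for edge in network_map))
```

Shared hops collapse into a single node of the map only because they report the same address, which is exactly what fails when a router stays silent.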
Problem
Anonymous routers do not respond to traceroute probes and appear as * in traceroute output
– The same router may appear as a * in multiple traces
[Figure: when L responds, the traces are y: S – L – H – x and x: H – L – S – y; when L is anonymous, they become y: S – * – H – x and x: H – * – S – y, and the map shows two separate anonymous nodes between S and H]
Current daily raw topology data sets include ~20 million path traces with ~20 million occurrences of *s along with ~500K public IP addresses
The raw topology data is far from representing the underlying sampled network topology
Problem
Internet2 backbone
[Figure: backbone routers S, L, U, K, C, H, A, W, N with end hosts x, y, z]
Traces
x - H - L - S - y
x - H - A - W - N - z
y - S - L - H - x
y - S - U - K - C - N - z
z - N - C - K - H - x
z - N - C - K - U - S - y
Problem
Internet2 backbone
[Figure: backbone routers S, L, U, K, C, H, A, W, N with end hosts x, y, z]
Traces when H, K, and N are anonymous
x - * - L - S - y
x - * - A - W - * - z
y - S - L - * - x
y - S - U - * - C - * - z
z - * - C - * - * - x
z - * - C - * - U - S - y
Problem
[Figure: sampled network with routers U, K, C, N, L, H, A, W, S and end hosts d, e, f; resulting network with known routers S, U, L, C, A, W and end hosts d, e, f]
Traces
d - * - L - S - e
d - * - A - W - * - f
e - S - L - * - d
e - S - U - * - C - * - f
f - * - C - * - * - d
f - * - C - * - U - S - e
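To make the inflation concrete, the sketch below builds the raw graph from the six traces on this slide, treating every anonymous occurrence as a separate node. It assumes anonymous hops are written as `*`; the helper name `build_raw_graph` and the `*1`, `*2`, … labels are hypothetical.

```python
from itertools import count

ANON = "*"  # assumed marker for a hop that did not respond


def build_raw_graph(traces):
    """Build node and edge sets where each anonymous occurrence is its own node."""
    fresh = count(1)
    nodes, edges = set(), set()
    for trace in traces:
        # Give every "*" a unique label, since nothing ties two "*"s together.
        labelled = [f"*{next(fresh)}" if hop == ANON else hop for hop in trace]
        nodes.update(labelled)
        edges.update(frozenset(pair) for pair in zip(labelled, labelled[1:]))
    return nodes, edges


traces = [
    "d * L S e".split(),
    "d * A W * f".split(),
    "e S L * d".split(),
    "e S U * C * f".split(),
    "f * C * * d".split(),
    "f * C * U S e".split(),
]
nodes, edges = build_raw_graph(traces)
anonymous = {n for n in nodes if n.startswith("*")}
known = nodes - anonymous
print(len(known), "known nodes,", len(anonymous), "anonymous nodes")
# prints: 9 known nodes, 11 anonymous nodes
```

Eleven anonymous nodes appear in the raw graph even though only three routers of the sampled network are missing from the known set, which is the gap that resolution tries to close.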
Previous Approaches
Basic heuristics
– IP: Combine anonymous nodes between the same known nodes [Bilir 05]
  Limited resolution
– NM: Combine all anonymous neighbors of a known node [Jin 06]
  High false positives
More theoretical approaches
– Graph minimization approach [Yao 03]
  Combine *s as long as they do not violate two accuracy conditions: (1) trace preservation condition and (2) distance preservation condition
  High complexity: O(n^5), where n is the number of *s
– ISOMAP-based dimensionality reduction approach [Jin 06]
  Build an n x n distance matrix, then use ISOMAP to reduce it to an n x 5 matrix
  Distance: (1) hop count or (2) link delay
  High complexity: O(n^3), where n is the number of nodes
[Figure: three panels with end hosts x, y, z: the sampled network (routers U, K, C, N, L, H, A, W, S), the network after resolution, and the resulting network]
Anonymity Types
Type 1: Do not send any ICMP responses
Type 2: Rate limit ICMP responses
Type 3: Do not send ICMP responses when congested
Type 4: Filtered ICMP responses at border routers
Type 5: ICMP responses with private source IP address
Graph Based Induction (GBI) - Our Approach
Graph based induction
– A graph data mining technique
  Finds frequent substructures in graph data
  Commonly used in mining biological and chemical graph data
Use of GBI for anonymous router resolution
– Observe common graph structures due to anonymous routers
– Develop localized algorithms with manageable computational and storage overhead
– Trace preservation condition
  Merge anonymous nodes as long as they cause no loops in path traces
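The trace preservation condition can be checked mechanically. The following is a minimal sketch, assuming each anonymous occurrence already carries a unique label such as `*1`; the function name `preserves_traces` and the labels are hypothetical.

```python
def preserves_traces(traces, group):
    """Return True if merging the nodes in `group` creates no loop.

    A loop appears when two members of `group` occur in the same trace,
    because the merged node would then be visited twice on that path.
    """
    group = set(group)
    for trace in traces:
        if sum(1 for hop in trace if hop in group) > 1:
            return False
    return True


traces = [
    ["f", "*4", "C", "*5", "*6", "d"],
    ["d", "*1", "L", "S", "e"],
]
# *6 and *1 are both next to d but never share a trace: merging is allowed.
print(preserves_traces(traces, {"*6", "*1"}))
# *5 and *6 are consecutive hops of one trace: merging would create a loop.
print(preserves_traces(traces, {"*5", "*6"}))
```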
Common Structures
[Figure: four common structures caused by anonymous routers: parallel *-substring, star, complete bipartite, and clique]
Parallel *-substring Algorithm
For each *-substring (a, *i, c), represent it as a tuple (a||c, *i)
– a||c is the tuple identifier and a < c
Read path traces and build the sorted list L of such tuples
– Subsequently read tuples are compared to the ones in the list based on tuple identifiers, and duplicates are excluded from L
Handling anonymity due to ICMP rate limiting or congestion
– A second scan of the path traces looks for substrings of the form (a, b, c) corresponding to (a, *i, c) in L
[Figure: nodes a, b, c]
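The two scans can be sketched as follows. This is an illustration, not the authors' implementation: it swaps the sorted list L for a dictionary keyed by the tuple identifier, assumes anonymous occurrences carry unique labels starting with `*`, and uses the hypothetical names `is_anon` and `resolve_parallel`.

```python
def is_anon(hop):
    """Anonymous hops are assumed to be labelled "*1", "*2", ..."""
    return hop.startswith("*")


def resolve_parallel(traces):
    """Map each anonymous label to a representative or to a known node."""
    table = {}     # tuple identifier (a, c) with a < c -> representative
    resolved = {}  # anonymous label -> resolved label

    # First scan: keep one anonymous node per identifier; later tuples with
    # the same identifier are duplicates and map to the one already kept.
    for trace in traces:
        for a, i, c in zip(trace, trace[1:], trace[2:]):
            if is_anon(i) and not is_anon(a) and not is_anon(c):
                key = tuple(sorted((a, c)))
                resolved[i] = table.setdefault(key, i)

    # Second scan: a known b between the same a and c names the router that
    # stayed silent in other traces (rate limiting or congestion).
    for trace in traces:
        for a, b, c in zip(trace, trace[1:], trace[2:]):
            if is_anon(a) or is_anon(b) or is_anon(c):
                continue
            key = tuple(sorted((a, c)))
            rep = table.get(key)
            if rep is not None and is_anon(rep):
                for anon in list(resolved):
                    if resolved[anon] == rep:
                        resolved[anon] = b
                table[key] = b
    return resolved


silent = [["y", "S", "*1", "H", "x"], ["x", "H", "*2", "S", "y"]]
print(resolve_parallel(silent))
# A third trace in which the middle router answered resolves both to L.
print(resolve_parallel(silent + [["x", "H", "L", "S", "y"]]))
```

A dictionary gives the same duplicate detection as comparing against a sorted list; the sorted list may be preferable when the traces do not fit in memory and are processed as a stream.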
Clique
Generate a new graph G* = (V*, E*)
– For each *-substring of type (a, e, b): V* ← V* ∪ {a, b}, E* ← E* ∪ {e(a, b)}
First identify 4-cliques and grow them by adding nodes that are connected to at least 4 nodes of the structure
– Helps in tolerating a few missing links in large cliques
Then, process all 3-cliques
[Figure: nodes a, c, d, e]
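A rough sketch of this step: build G* from the *-substrings, find a 4-clique, and grow it with the "at least 4 links" rule. The brute-force clique enumeration is a stand-in chosen for brevity and is only suitable for small graphs; the helper names, the `min_links` parameter, and the toy traces are assumptions.

```python
from itertools import combinations


def build_g_star(traces):
    """Link the known end points a, b of every *-substring (a, *, b)."""
    adj = {}
    for trace in traces:
        for a, e, b in zip(trace, trace[1:], trace[2:]):
            if e.startswith("*") and not (a.startswith("*") or b.startswith("*")):
                adj.setdefault(a, set()).add(b)
                adj.setdefault(b, set()).add(a)
    return adj


def k_cliques(adj, k):
    """Enumerate k-cliques by brute force (small graphs only)."""
    return [set(c) for c in combinations(sorted(adj), k)
            if all(v in adj[u] for u, v in combinations(c, 2))]


def grow(adj, clique, min_links=4):
    """Add nodes linked to at least `min_links` members of the structure."""
    structure = set(clique)
    changed = True
    while changed:
        changed = False
        for node in sorted(set(adj) - structure):
            if len(adj[node] & structure) >= min_links:
                structure.add(node)
                changed = True
    return structure


# Toy data: one silent router seen between pairs of its known neighbors.
# The pair (g, e) was never observed, so G* misses that one link.
pairs = (list(combinations("acde", 2))
         + [("f", x) for x in "acde"]
         + [("g", x) for x in "acdf"])
traces = [[u, f"*{n}", v] for n, (u, v) in enumerate(pairs, 1)]

g_star = build_g_star(traces)
seed = k_cliques(g_star, 4)[0]
print(sorted(seed), "->", sorted(grow(g_star, seed)))
```

In this toy run, g joins the structure despite the missing g-e link, which is the tolerance the slide refers to.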
Complete Bipartite
First search for a small complete bipartite structure, i.e., K_{2,3}, in G* and then grow it to a larger one
– Take each pair of nodes and check whether they are in a K_{2,3}
– After identifying a K_{2,3}, look for larger complete bipartite graphs K_{2,m} and then K_{n,m} that contain the identified K_{2,3}
Then, process all K_{2,2}'s
[Figure: nodes A, C, D, E, F shown in G and in G*]
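One way to read the search above: a pair of nodes belongs to a K_{2,m} when the two share m common neighbors, and the structure grows to K_{n,m} by adding nodes adjacent to the whole opposite side. The sketch below follows that reading on a plain adjacency dictionary; the function names `find_k2m` and `grow_left` and the toy graph are hypothetical, and the sides are not required to be independent sets.

```python
from itertools import combinations


def find_k2m(adj, m_min=3):
    """Yield (pair, common) for node pairs sharing at least m_min neighbors."""
    for u, v in combinations(sorted(adj), 2):
        common = (adj[u] & adj[v]) - {u, v}
        if len(common) >= m_min:
            yield {u, v}, common


def grow_left(adj, left, right):
    """Extend K_{2,m} to K_{n,m}: add nodes adjacent to every right-side node."""
    left = set(left)
    for node in sorted(set(adj) - left - right):
        if right <= adj[node]:
            left.add(node)
    return left, right


# Toy graph: A, C, G on one side, D, E, F on the other, all cross links present.
adj = {}
for l in "ACG":
    for r in "DEF":
        adj.setdefault(l, set()).add(r)
        adj.setdefault(r, set()).add(l)

pair, common = next(find_k2m(adj))
left, right = grow_left(adj, pair, common)
print(sorted(pair), sorted(common))   # the K_{2,3} found first
print(sorted(left), sorted(right))    # grown to K_{3,3}
```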
Star
Combine anonymous neighbors of a known node under the trace preservation condition
– Starting from the ones with the smallest number of anonymous neighbors
Note: Operate on G and not on G*
[Figure: star structure with nodes D, A, w, C, y, E, z]
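A minimal sketch of the star step, assuming uniquely labelled anonymous hops (`*1`, `*2`, …). It visits known nodes in increasing order of their anonymous-neighbor count and merges only when no trace would visit the merged node twice. The name `resolve_star` is hypothetical, and for brevity each neighbor is only tried against the first neighbor of the group.

```python
def is_anon(hop):
    return hop.startswith("*")


def resolve_star(traces):
    """Return a mapping anonymous label -> representative label."""
    # Anonymous neighbors of each known node in G.
    neighbors = {}
    for trace in traces:
        for u, v in zip(trace, trace[1:]):
            for known, anon in ((u, v), (v, u)):
                if not is_anon(known) and is_anon(anon):
                    neighbors.setdefault(known, set()).add(anon)

    rep = {}

    def find(x):
        while rep.get(x, x) != x:
            x = rep[x]
        return x

    def loops(x, y):
        # Trace preservation: no trace may contain the merged node twice.
        for trace in traces:
            roots = [find(h) for h in trace if is_anon(h)]
            if sum(r in (x, y) for r in roots) > 1:
                return True
        return False

    # Known nodes with the fewest anonymous neighbors are handled first.
    for known in sorted(neighbors, key=lambda k: (len(neighbors[k]), k)):
        group = sorted(neighbors[known])
        for other in group[1:]:
            x, y = find(group[0]), find(other)
            if x != y and not loops(x, y):
                rep[y] = x
    return {a: find(a) for anons in neighbors.values() for a in anons}


traces = [
    ["d", "*1", "L", "S", "e"],
    ["d", "*2", "A", "W", "*3", "f"],
    ["f", "*4", "C", "*5", "*6", "d"],
]
resolved = resolve_star(traces)
print(resolved)
```

In this toy input the three anonymous neighbors of d collapse into one node and the two neighbors of f into another, while `*4` and `*5` stay apart because they lie on the same trace.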
Summary
Router responsiveness has decreased over the last decade
NP-hard problem
Graph Based Induction technique
– Practical approach for anonymous router resolution
– Identifies common structures
– Handles all anonymity types
– Helpful in resolving multiple anonymous routers in a locality
– Uses subnet info to reduce the false positives
[Figure: example with nodes D, A, C, E in four panels: Underlying, Collected, Neighbor Matching, GBI]
References
M. H. Gunes and K. Sarac. Resolving anonymous routers in internet topology measurement studies. In IEEE INFOCOM, Apr.
S. Bilir, K. Sarac, and T. Korkmaz. Intersection characteristics of end-to-end Internet paths and trees. IEEE International Conference on Network Protocols (ICNP), Boston, MA, USA, November.
A. Broido and K. Claffy. Internet topology: Connectivity of IP graphs. Proceedings of SPIE ITCom Conference, Denver, CO, USA, August.
B. Cheswick, H. Burch, and S. Branigan. Mapping and visualizing the Internet. ACM USENIX, San Diego, CA, USA, June.
B. Yao, R. Viswanathan, F. Chang, and D. Waddington. Topology inference in the presence of anonymous routers. IEEE INFOCOM, San Francisco, CA, USA, March.
P. Tan, M. Steinbach, and V. Kumar. Introduction to data mining. Addison-Wesley, Reading, MA, USA.
X. Jin, W.-P. K. Yiu, S.-H. G. Chan, and Y. Wang. Network topology inference based on end-to-end measurements. IEEE Journal on Selected Areas in Communications, special issue on Sampling the Internet, 24(12):2182-2195, Dec.
D. Cook and L. Holder. Mining graph data. John Wiley & Sons.
T. Matsuda, H. Motoda, and T. Washio. Graph-based induction and its applications. Advanced Engineering Informatics, 16(2):135-1434, April.
M. Kuramochi and G. Karypis. Frequent subgraph discovery. First IEEE International Conference on Data Mining (ICDM'01), pp. 313.
M. Kuramochi and G. Karypis. An efficient algorithm for discovering frequent subgraphs. IEEE Transactions on Knowledge and Data Engineering, vol. 16, no. 9, September.
A. Inokuchi, T. Washio, and H. Motoda. Complete mining of frequent patterns from graphs: Mining graph data. Mach. Learn. 50, 3 (Mar. 2003).
A. Inokuchi, T. Washio, and H. Motoda. A general framework for mining frequent subgraphs from labeled graphs. Fundam. Inf. 66, 1-2 (Nov. 2004).
QUESTIONS