Stable and Practical AS Relationship Inference with ProbLink


Yuchen Jin (University of Washington), Colin Scott (UC Berkeley), Amogh Dhamdhere (CAIDA), Vasileios Giotsas (Lancaster University), Arvind Krishnamurthy (University of Washington), Scott Shenker (UC Berkeley)

Autonomous System (AS) and BGP
Example ASes: Google, Comcast, UW

AS Relationships
Customer-to-provider (Cust-Prov): the customer AS pays the provider AS for transit.
Peer-to-peer (Peer-Peer): settlement-free.
ASes keep relationships confidential!
[Figure: AS 1 to AS 5 connected by provider, customer, and peering links]

Motivation
AS relationship inference application domains:
  Understanding Internet evolution
  Identifying malicious ASes
  Detecting network congestion
A research problem that has been studied for ~2 decades:
  Gao (2001)
  Subramanian et al. (2002)
  Di Battista et al. (2003)
  Dimitropoulos et al. (2007)
  AS-Rank (2013): 99% accuracy (1% error rate)
How does AS-Rank perform for actual applications?

Case Study: Route Leak Detection
On November 5, 2012, Google's services were inaccessible in Asia for about half an hour.
ASes involved: AS 15169 (Google), AS 23947 (Moratel), AS 3491 (PCCW).
Relationships: PCCW is Moratel's provider; Google has Peer-Peer links with both Moratel and PCCW.
Announcements for prefix 8.8.8.0/24 shown in the figure:
  AS Path: 15169
  AS Path: 23947, 15169
  AS Path: 3491, 15169
  AS Path: 3491, 23947, 15169 (the leaked route)

Case Study: Route Leak Detection
Valley-free assumption: a path consists of Cust-Prov links, followed by zero or one Peer-Peer link, followed by Prov-Cust links (uphill, cross, downhill).
Valley-free: A (Cust-Prov) B (Peer-Peer) C
Valley-free violation: PCCW (Prov-Cust) Moratel (Peer-Peer) Google
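The valley-free rule above can be written as a small check. The following is a minimal sketch, not the authors' code: the function names and the `example_rels` table are hypothetical, and the relationships are simply those drawn on the slides (PCCW as Moratel's provider, Google peering with both).

```python
from typing import Dict, List, Tuple

C2P = "Cust-Prov"  # left AS is a customer of the right AS (uphill)
P2C = "Prov-Cust"  # left AS is a provider of the right AS (downhill)
P2P = "Peer-Peer"  # the two ASes are peers (cross)

Relationships = Dict[Tuple[int, int], str]


def link_type(rels: Relationships, a: int, b: int) -> str:
    """Relationship of link a -> b, flipping the stored direction if needed."""
    if (a, b) in rels:
        return rels[(a, b)]
    flipped = rels[(b, a)]  # raises KeyError for unknown links
    return {C2P: P2C, P2C: C2P, P2P: P2P}[flipped]


def is_valley_free(path: List[int], rels: Relationships) -> bool:
    """True if the path is uphill links, at most one cross link, then downhill links."""
    past_peak = False  # set once a Peer-Peer or Prov-Cust link has been seen
    for a, b in zip(path, path[1:]):
        t = link_type(rels, a, b)
        if past_peak and t != P2C:
            return False  # climbing or crossing again after the peak is a valley
        if t in (P2P, P2C):
            past_peak = True
    return True


# Relationships as drawn on the slides (illustrative, not a real dataset).
example_rels: Relationships = {
    (23947, 3491): C2P,   # Moratel is a customer of PCCW
    (15169, 23947): P2P,  # Google peers with Moratel
    (15169, 3491): P2P,   # Google peers with PCCW
}

print(is_valley_free([3491, 15169], example_rels))         # True
print(is_valley_free([3491, 23947, 15169], example_rels))  # False
```

Because the valley-free pattern reads the same in both directions, the check gives the same answer whether the AS path is listed from the origin or toward it.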

Route Leak Detection
Detection method: check for valley-free violations.
AS relationship validation dataset: relationships encoded using the Community attribute in BGP paths.
  E.g., AS209 (CenturyLink) uses the Community 13570 to tag routes received from customers.

Route Leak Detection Using AS-Rank
Detect route leaks using AS-Rank:
  Low precision: only 20% of the route leaks detected using AS-Rank were real route leaks.
  Low recall: 78% of the real leaks were missed.
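For readers who want the two metrics spelled out: precision is the share of flagged events that are real leaks, and recall is the share of real leaks that were flagged. The sketch below assumes both are given as sets of event identifiers; the function name and the toy identifiers are hypothetical and are not data from the talk.

```python
from typing import Hashable, Set, Tuple


def precision_recall(detected: Set[Hashable], real: Set[Hashable]) -> Tuple[float, float]:
    """Precision and recall of a leak detector against ground-truth leaks."""
    true_positives = len(detected & real)
    precision = true_positives / len(detected) if detected else 0.0
    recall = true_positives / len(real) if real else 0.0
    return precision, recall


# Toy example: 5 events flagged, 4 real leaks, only 1 in common.
detected = {"e1", "e2", "e3", "e4", "e5"}
real = {"e1", "e6", "e7", "e8"}
print(precision_recall(detected, real))  # (0.2, 0.25)
```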

Outline
1. AS-Rank does not meet the demands of actual applications.
2. We develop a simple inference algorithm, CoreToLeaf, that achieves accuracy comparable to AS-Rank.
   Most links in the validation dataset are relatively easy to infer.
3. We identify different subsets of the validation dataset that are hard to infer.
4. We develop ProbLink, a probabilistic AS relationship inference algorithm.
5. Evaluation

A Simple Algorithm: CoreToLeaf
1. Infer the transit-free clique (i.e., the Tier-1 ASes).
2. For each path that traverses a Tier-1 AS: label the links before the Tier-1 as Cust-Prov and the links after it as Prov-Cust.
     BGP path 1: A B 3356 C D
     BGP path 2: E F 209 D C
   Conflict can occur! The two paths above label the link between C and D in opposite directions.
3. Label all remaining unclassified links as Peer-Peer.
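The CoreToLeaf steps can be sketched in a few lines. This is an illustrative reading of the slide, not the authors' implementation. Assumptions: the clique is supplied as input rather than inferred, a conflicting link keeps the first label it received (the slide does not say how conflicts are resolved), links between two Tier-1 ASes on the same path are left unclassified, and paths contain no repeated ASes. All names are hypothetical.

```python
from typing import Dict, FrozenSet, Iterable, List, Set


def core_to_leaf(paths: Iterable[List[str]], clique: Set[str]):
    """Return (provider_of, peers, conflicts) for the links seen in paths."""
    provider_of: Dict[FrozenSet[str], str] = {}  # link -> provider AS
    conflicts: Set[FrozenSet[str]] = set()       # links labeled both ways
    all_links: Set[FrozenSet[str]] = set()

    def label(provider: str, customer: str) -> None:
        key = frozenset((provider, customer))
        # Assumption: the first label wins; later disagreement is only recorded.
        if provider_of.setdefault(key, provider) != provider:
            conflicts.add(key)

    for path in paths:
        hops = list(zip(path, path[1:]))
        all_links.update(frozenset(hop) for hop in hops)
        tier1_at = [i for i, asn in enumerate(path) if asn in clique]
        if not tier1_at:
            continue  # path does not traverse a Tier-1 AS
        first, last = tier1_at[0], tier1_at[-1]
        for i, (a, b) in enumerate(hops):
            if i < first:      # before the Tier-1: a is a customer of b
                label(provider=b, customer=a)
            elif i >= last:    # after the Tier-1: a is a provider of b
                label(provider=a, customer=b)

    peers = all_links - set(provider_of)  # everything left is Peer-Peer
    return provider_of, peers, conflicts


paths = [
    ["A", "B", "3356", "C", "D"],  # BGP path 1 from the slide
    ["E", "F", "209", "D", "C"],   # BGP path 2 from the slide
    ["B", "F"],                    # extra toy path without a Tier-1
]
provider_of, peers, conflicts = core_to_leaf(paths, clique={"3356", "209"})
print(conflicts)  # the C-D link, labeled in both directions
```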

CoreToLeaf vs. AS-Rank (April 1st, 2017)

Algorithm    Validation dataset    Conflict (%)    Route leak detection
                                                   Precision    Recall
CoreToLeaf   97.0    97.3          0.12            8.1          5.6
AS-Rank      98.4    98.6                          19.8         22.1

Implications:
  Most links are easy to infer.
  Need to focus on hard links.

Extract Hard Links (April 1st, 2017)
Use a gradient boosting decision tree to calculate feature importance.

Category                  CoreToLeaf error (%)   AS-Rank error (%)
Max node degree < 100     13.7                   8.6
Observed by 50-100 VPs    4.7                    9.3
Non-VP & Non-Tier1        5.3                    9.0
Unlabeled Stub-clique     95.5                   33.4
Conflict                  100                    8.1

The fraction of hard links in the validation dataset is 3X smaller than among all links.
The validation dataset is skewed toward easy links.

Feature Design
The structure of paths that use the link:
  Triplet feature
The structure of paths that do not use the link:
  Non-path feature
Connectivity properties of the link:
  Distance to clique
  Number of VPs observing a link
  Co-located IXP/peering facility

Triplet Feature
Intuition: most triplets are valley-free compliant, but we should tolerate some valleys. The likelihood of valleys is derived from data.
Setup: a path A B C D has three consecutive links t1 (A-B), t2 (B-C), and t3 (C-D), with each t ∈ {Cust-Prov, Prov-Cust, Peer-Peer}.
Definition: assign probabilities to the relationships of the first and last links given the relationship of the middle link: P(t1, t3 | t2).
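One simple way to obtain P(t1, t3 | t2) from the current labels is to count link triplets along the observed paths and normalize per middle-link type. The sketch below does only that; the function name, the `link_type` callback, and the toy labels are hypothetical, and any special handling of the first and last links of a path is left out.

```python
from collections import Counter, defaultdict
from typing import Callable, Dict, Iterable, List, Tuple


def triplet_distribution(
    paths: Iterable[List[str]],
    link_type: Callable[[str, str], str],
) -> Dict[str, Dict[Tuple[str, str], float]]:
    """Estimate P(t1, t3 | t2) by counting consecutive link triplets."""
    counts: Dict[str, Counter] = defaultdict(Counter)  # t2 -> Counter of (t1, t3)
    for path in paths:
        types = [link_type(a, b) for a, b in zip(path, path[1:])]
        for t1, t2, t3 in zip(types, types[1:], types[2:]):
            counts[t2][(t1, t3)] += 1
    return {
        t2: {pair: n / sum(c.values()) for pair, n in c.items()}
        for t2, c in counts.items()
    }


# Toy labels keyed by the direction in which the link appears on the path.
toy_labels = {
    ("A", "B"): "Cust-Prov", ("E", "B"): "Cust-Prov",
    ("B", "C"): "Peer-Peer",
    ("C", "D"): "Prov-Cust", ("C", "F"): "Prov-Cust",
    ("C", "G"): "Cust-Prov",  # climbing after a peer link: a valley
}
toy_paths = [
    ["A", "B", "C", "D"],
    ["E", "B", "C", "F"],
    ["A", "B", "C", "F"],
    ["A", "B", "C", "G"],
]
dist = triplet_distribution(toy_paths, lambda a, b: toy_labels[(a, b)])
print(dist["Peer-Peer"])
```

In the toy data, three of the four triplets around the Peer-Peer link are valley-free and one is not, so the valley keeps a small but nonzero probability instead of being ruled out.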

Non-path Feature
Intuition: if none of the Peer-Peer/Prov-Cust links coming into an AS are followed by a specific link, then that link is likely not a Prov-Cust link.
Definition: assign probabilities to the number of previous Peer-Peer/Prov-Cust links given the relationship t of the next link: P(# Peer-Peer + # Prov-Cust | t).
Example: X, Y, and Z connect to B over Peer-Peer or Prov-Cust links, and the link in question is B-C with relationship t. No path contains X-B-C, Y-B-C, or Z-B-C.
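The count that the non-path feature conditions on can be computed directly from the observed paths. The sketch below reflects one reading of the slide: for a link B-C, count the Peer-Peer or Prov-Cust links that arrive at B but are never followed by B-C in any path. The function name, the `link_type` callback, and the toy data are hypothetical.

```python
from typing import Callable, Iterable, List


def non_path_count(
    paths: Iterable[List[str]],
    link_type: Callable[[str, str], str],
    b: str,
    c: str,
) -> int:
    """Number of Peer-Peer/Prov-Cust links into b that never precede link b-c."""
    before_b = set()      # X such that X-b appears in some path
    before_link = set()   # X such that X-b-c appears in some path
    for path in paths:
        for x, y in zip(path, path[1:]):
            if y == b:
                before_b.add(x)
        for x, y, z in zip(path, path[1:], path[2:]):
            if (y, z) == (b, c):
                before_link.add(x)
    return sum(
        1
        for x in before_b - before_link
        if link_type(x, b) in ("Peer-Peer", "Prov-Cust")
    )


# Toy data: X and Y are providers of B, Z is a peer of B, W is a customer of B.
toy_types = {
    ("X", "B"): "Prov-Cust",
    ("Y", "B"): "Prov-Cust",
    ("Z", "B"): "Peer-Peer",
    ("W", "B"): "Cust-Prov",
}
toy_paths = [["X", "B", "D"], ["Y", "B", "D"], ["Z", "B", "D"], ["W", "B", "C"]]
lookup = lambda a, b: toy_types[(a, b)]
print(non_path_count(toy_paths, lookup, "B", "C"))  # 3
print(non_path_count(toy_paths, lookup, "B", "D"))  # 0
```

In the toy data, none of B's providers or peers ever sees a route through B-C, which is the situation the slide describes as evidence that B-C is not a Prov-Cust link.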

Connectivity Features (for a link AS1 -- AS2)
Distance to clique feature: P(dist(AS1), dist(AS2) | t)
  Intuition: high-tier ASes are closer to clique ASes than low-tier ASes, and Peer-Peer links are typically between ASes in the same tier.
Vantage point feature: P(# VPs observing AS1 -- AS2 | t)
  Intuition: Prov-Cust links are more likely to be seen by more VPs.
Co-located IXP and co-located peering facility feature (extracted from PeeringDB): P(# IXPs/facilities | t)
  Intuition: the more IXPs or facilities two ASes are co-located in, the more likely they are to peer with each other.
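As an illustration of the distance-to-clique feature, the sketch below computes dist(AS) as the hop count to the nearest clique AS with a multi-source breadth-first search. Treating the distance as a hop count over the observed AS graph is an assumption, since the slide does not define dist; the function name and toy graph are hypothetical.

```python
from collections import defaultdict, deque
from typing import Dict, Iterable, Set, Tuple


def distance_to_clique(links: Iterable[Tuple[str, str]], clique: Set[str]) -> Dict[str, int]:
    """Hop count from every reachable AS to the nearest clique AS."""
    neighbors = defaultdict(set)
    for a, b in links:
        neighbors[a].add(b)
        neighbors[b].add(a)
    dist = {asn: 0 for asn in clique}  # clique members are at distance 0
    queue = deque(clique)
    while queue:
        current = queue.popleft()
        for nxt in neighbors[current]:
            if nxt not in dist:
                dist[nxt] = dist[current] + 1
                queue.append(nxt)
    return dist


toy_links = [("T1", "T2"), ("T1", "A"), ("A", "B"), ("T2", "C"), ("B", "C")]
dist = distance_to_clique(toy_links, clique={"T1", "T2"})
# Feature value for link A -- B is the pair (dist(A), dist(B)).
print((dist["A"], dist["B"]))  # (1, 2)
```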

ProbLink Algorithm
1. Use CoreToLeaf to come up with an initial labeling.
2. Repeat:
     Compute the conditional probability distribution of each feature based on the current labels, e.g., P(# VPs observing a link | t).
     Predict a new label for each link using Naïve Bayes and the conditional probability distributions.
3. Stop upon convergence.
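The loop above can be sketched with a small categorical Naïve Bayes. This is a simplified illustration, not the ProbLink implementation: links are opaque keys, `featurize` is a hypothetical callback returning discrete feature values for a link under the current labels, and the add-one smoothing (`alpha`) and iteration cap are assumptions added to keep the sketch well-defined.

```python
import math
from collections import Counter, defaultdict
from typing import Callable, Dict, Hashable, Tuple

TYPES = ("Cust-Prov", "Prov-Cust", "Peer-Peer")


def problink_sketch(
    initial: Dict[Hashable, str],
    featurize: Callable[[Hashable, Dict[Hashable, str]], Tuple],
    max_iters: int = 20,
    alpha: float = 1.0,
) -> Dict[Hashable, str]:
    """Relabel links with Naive Bayes until the labels stop changing."""
    labels = dict(initial)
    for _ in range(max_iters):
        feats = {link: featurize(link, labels) for link in labels}
        prior = Counter(labels.values())
        cond = defaultdict(Counter)  # (feature index, type) -> value counts
        values = defaultdict(set)    # feature index -> distinct values seen
        for link, t in labels.items():
            for i, v in enumerate(feats[link]):
                cond[(i, t)][v] += 1
                values[i].add(v)

        def score(t: str, fv: Tuple) -> float:
            # log P(t) + sum_i log P(f_i | t), with add-alpha smoothing
            s = math.log((prior[t] + alpha) / (len(labels) + alpha * len(TYPES)))
            for i, v in enumerate(fv):
                total = sum(cond[(i, t)].values())
                s += math.log((cond[(i, t)][v] + alpha) / (total + alpha * len(values[i])))
            return s

        new_labels = {
            link: max(TYPES, key=lambda t: score(t, fv)) for link, fv in feats.items()
        }
        if new_labels == labels:  # convergence: no label changed
            break
        labels = new_labels
    return labels


# Toy run: one feature (a VP-count bucket); link "l6" starts mislabeled.
vp_bucket = {"l1": "many", "l2": "many", "l3": "many",
             "l4": "few", "l5": "few", "l6": "many"}
initial = {"l1": "Prov-Cust", "l2": "Prov-Cust", "l3": "Prov-Cust",
           "l4": "Peer-Peer", "l5": "Peer-Peer", "l6": "Peer-Peer"}
final = problink_sketch(initial, lambda link, labels: (vp_bucket[link],))
print(final["l6"])  # Prov-Cust
```

In the toy run the feature does not depend on the labels, so the loop settles after one relabeling. Features such as the triplet feature depend on neighboring labels, which is why the procedure is iterated.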

Accuracy Comparison
How well can ProbLink perform across years?
Inference error rates compared with those of AS-Rank:
  The whole validation dataset: 1.7X improvement
  Hard links: 1.8X to 6.1X improvement

Category                  AS-Rank error (%)   ProbLink error (%)   Improvement
Observed by 50-100 VPs    8.8                 1.5                  5.9X
Non-VP & Non-Tier1        4.4                 1.7                  2.6X
Unlabeled Stub-clique     33.6                5.5                  6.1X
Conflict                  6.8                 3.8                  1.8X
Max node degree < 100     8.6                                      2.0X

Stability Analysis (30 consecutive 1-day snapshots)
AS-Rank is sensitive to snapshot selection.
ProbLink is consistently stable.

Route leak detection method: check for valley-free violations

Algorithm    Precision   Recall
ProbLink     81.1%       76.2%
AS-Rank      19.8%       22.1%
CoreToLeaf   8.1%        5.6%

Practical Applications
Improvements over AS-Rank:
  Route leak detection: precision 4.1X, recall 3.4X
  Complex relationship inference: reveals 27% more complex relationships
  Predicting the impact of selective advertisement: increases the precision by 34%
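The route leak improvement factors follow from the precision and recall percentages reported for ProbLink and AS-Rank in the route leak detection results. A short check of that arithmetic (the variable names are illustrative):

```python
# Precision and recall (%) for route leak detection, as reported in the talk.
results = {
    "precision": {"ProbLink": 81.1, "AS-Rank": 19.8},
    "recall": {"ProbLink": 76.2, "AS-Rank": 22.1},
}

# Improvement factor = ProbLink value divided by AS-Rank value.
improvement = {
    metric: round(values["ProbLink"] / values["AS-Rank"], 1)
    for metric, values in results.items()
}
print(improvement)  # {'precision': 4.1, 'recall': 3.4}
```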

Conclusion
High accuracy on the validation dataset does not translate to high application-level performance.
  Most links in the validation dataset are easy to infer.
Constructed sets of hard links and used them as benchmarks.
Developed ProbLink, which allows the integration of many noisy but useful attributes.
Demonstrated that ProbLink is more effective and stable for real applications than previous techniques.