BGP-lens: Patterns and Anomalies in Internet Routing Updates
B. Aditya Prakash (1), Nicholas Valler (2), David Andersen (1), Michalis Faloutsos (2), Christos Faloutsos (1)
(1) Carnegie Mellon University, (2) UC-Riverside
KDD 2009, Paris
Introduction
Border Gateway Protocol (BGP)
– The Internet's routing protocol
– Routers send messages (updates) to each other
– Keeps path information up to date
Ideal setting: no BGP updates
In reality: many updates, caused by link failures, router restarts, malicious behavior
Time      peerAS    originAS  prefix
:39:42    ATT       SPRINT    /
:39:43    VERIZON   AOL       /
:39:46    WASH      ATLA      /24
….
Each row is an update
Introduction contd.
Question: can we find patterns/anomalies?
Challenges:
– Millions of updates sent over the network
– Data has multiple dimensions
– Noisy measurements
– Impossible for a human to sift through the updates
Automated tool needed!
The Data
Time      peerAS    originAS  prefix
:39:42    ATT       SPRINT    /
:39:43    VERIZON   AOL       /
:39:46    WASH      ATLA      /24
….
Data from Datapository.net, Abilene Network
18 million update messages over two years!
Our Approach
Look at a simple time-series
– Focus on just the time
– # of updates received every b seconds (b = bin size)
Specific problem we are tackling
– Given such a time-series
– Report patterns and anomalies
Also find suspicious entities (paths, ASes, etc.)
[Figure: the Time column of the update table (:39:…, :39:…, :39:…, :40:01, ….) is cut into bins of b secs along the time axis]
Bin:    0  1  2  …
Count:  4  2  6  …
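The binning step above can be sketched in a few lines of Python. This is only an illustration: the function name `bin_counts` and the assumption that timestamps arrive as plain seconds are ours, not part of BGP-lens.

```python
from collections import Counter

def bin_counts(timestamps, bin_size):
    """Count updates per bin of `bin_size` seconds.

    `timestamps` are arrival times in seconds (an assumed input format);
    bin 0 starts at the earliest timestamp, and empty bins get a count of 0.
    """
    if not timestamps:
        return []
    start = min(timestamps)
    counts = Counter(int((t - start) // bin_size) for t in timestamps)
    n_bins = max(counts) + 1
    return [counts.get(i, 0) for i in range(n_bins)]

# Seven updates binned with b = 10 seconds.
series = bin_counts([0, 1, 4, 10, 11, 12, 25], bin_size=10)
```

Here `series` holds one count per bin; everything that follows (log-linear plots, marginals, scalograms) operates on such a list.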
Real data: Washington Router
Very bursty!
Traditional tools like FFT and auto-regression don't work
[Figure: # of updates vs. bin number ('time'); bin size = 600s]
Outline
Introduction and Problem Statement
Techniques
– Temporal Analysis
– Frequency Analysis
BGP-lens at work
Conclusions
Temporal Analysis
First cut: take a log-linear plot
– emphasizes small values over high values
[Figure: log-linear plot of the time-series; bin size: 10s]
But: bin size is important!
‘Clotheslines’
[Figure: log-linear plot of the time-series; bin size: 600s]
Clotheslines
Q1: Why clotheslines?
– Near-consecutive updates over a long time period
– Can be route flapping: the same path is advertised and withdrawn frequently
– Important to identify
Q2: How to automate this discovery?
Proposal: Marginals to the Rescue
PDF of the volume of updates
– Number of time-bins with a given volume
Extremes == height of the clotheslines!
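As a minimal sketch of the marginals idea, the snippet below counts how many time-bins share each volume. The function name `marginals` and the toy series are illustrative only.

```python
from collections import Counter

def marginals(series):
    """Map each observed volume to the number of time-bins with that volume."""
    return dict(sorted(Counter(series).items()))

# A toy series in which a volume of 50 keeps recurring, like a clothesline.
hist = marginals([0, 50, 50, 3, 50, 0, 50, 7])
```

In this toy input the volume 50 occurs in four bins, so it shows up as a peak in the marginals even though the bins are not adjacent in time.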
Algorithm - Clotheslines
– On the marginals plot, use the median filtering approach to determine ‘outliers’
– For each time interval found, report the most consistent IPs/ASes etc.
High-level idea only; details in the paper!
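The slide gives only the high-level idea, so the following is a rough sketch of one way a median-filter outlier test on the marginals could look, not the paper's algorithm. The names `clothesline_heights` and `bins_at`, and the knobs `window` and `factor`, are hypothetical.

```python
from statistics import median

def clothesline_heights(hist, window=2, factor=3.0):
    """Flag volumes whose bin-count stands out from neighbouring volumes.

    hist: dict mapping volume -> number of time-bins with that volume.
    window, factor: illustrative knobs (assumptions, not from the paper).
    """
    volumes = sorted(hist)
    counts = [hist[v] for v in volumes]
    flagged = []
    for i, v in enumerate(volumes):
        lo, hi = max(0, i - window), min(len(volumes), i + window + 1)
        neighbours = counts[lo:i] + counts[i + 1:hi]
        if not neighbours:
            continue
        baseline = max(median(neighbours), 1)
        if counts[i] > factor * baseline:
            flagged.append(v)
    return flagged

def bins_at(series, volume):
    """Return the indices of the time-bins whose count equals `volume`."""
    return [i for i, c in enumerate(series) if c == volume]

heights = clothesline_heights({48: 2, 49: 1, 50: 40, 51: 2, 52: 1})
```

Once a height is flagged, `bins_at` recovers the time-bins that form the clothesline, and the updates inside those bins can then be grouped by prefix or AS to find the most consistent entities.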
[Figure: a signal over time and its scalogram. Scale axis runs from Low Freq. to High Freq.; shading runs from High energy to Low energy. The ‘tornado’ does not touch down.]
In real data…
[Figure: real data with event E2 marked]
E2: ~20,000 updates over ~8 hrs!
Why Prolonged Spike?
– Bursts of short duration
– Can represent malicious behavior, or simple router restarts!
– Exact cause is hard to find, but important for system administrators
Algorithm – Prolonged Spikes
Basic idea: find tornados in the scalogram
– Find a suitable starting point at the higher levels
– Extend downward as much as possible
– The finest scale where the tornado stops gives the shortest time period to look for a prolonged spike
Again, details in the paper!
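To make the tornado idea concrete, here is a small sketch under our own assumptions: it uses a Haar wavelet (the slides do not name the wavelet), starts from the highest-energy coefficient at the coarsest level, and descends while the stronger child keeps at least `min_fraction` of the starting energy. All names and thresholds are hypothetical; the paper's actual procedure may differ.

```python
import math

def haar_details(signal):
    """Return Haar detail coefficients per level, finest level first.

    len(signal) must be a power of two.
    """
    approx = list(signal)
    levels = []
    while len(approx) > 1:
        pairs = list(zip(approx[0::2], approx[1::2]))
        levels.append([(a - b) / math.sqrt(2) for a, b in pairs])
        approx = [(a + b) / math.sqrt(2) for a, b in pairs]
    return levels

def trace_tornado(levels, start_level=None, min_fraction=0.1):
    """Follow high-energy coefficients from a coarse level toward finer ones.

    Returns (level, index) of the finest coefficient reached; level 0 is
    the finest scale, so reaching it means the tornado touched down.
    """
    energy = [[d * d for d in lvl] for lvl in levels]
    level = len(levels) - 1 if start_level is None else start_level
    index = max(range(len(energy[level])), key=lambda i: energy[level][i])
    threshold = min_fraction * energy[level][index]
    while level > 0:
        finer = energy[level - 1]
        best = max((2 * index, 2 * index + 1), key=lambda i: finer[i])
        if finer[best] < threshold:
            break  # the tornado stops here
        level, index = level - 1, best
    return level, index

def span(level, index):
    """Half-open range of time-bins covered by a coefficient."""
    width = 2 ** (level + 1)
    return index * width, (index + 1) * width

prolonged = [0] * 4 + [10] * 4 + [0] * 8   # a four-bin plateau
short = [0] * 5 + [10] + [0] * 10          # a one-bin spike
stop_prolonged = trace_tornado(haar_details(prolonged))
stop_short = trace_tornado(haar_details(short))
```

In this toy input the plateau's tornado stops above the finest scale, and `span` of the stopping coefficient gives the window of bins to inspect, while the one-bin spike's tornado descends all the way to level 0.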
Scalability
BGP-lens: User Interface
– # of suspicious events the sysadmin wants to check
– duration: length of the events to be checked (think daily vs. weekly vs. monthly); optional
BGP-lens at Work
We found real events too. Examples:
Event 1: 50-clothesline
– Prefix and origin AS pointed to Alabama Supercomputing Net
– When contacted, the sysadmins attributed the changes to route flapping:
  “the route for /24 was appearing and disappearing in [the] IGP routing table... [which] may have caused BGP to flap.”
– The anomaly went undetected and unresolved for 30 days!
Results from real data
Event 2: Prolonged Spike
– May 12th, 2006: 8hr spike
– Most persistent IPs/ASes: primary and middle schools in a large district in a country
– Two more spikes: Jan 18-19, 2006 and Aug 1
Conclusions
Studied huge real data (~18 million updates)
Developed two new techniques
– effective: they spot subtle phenomena like clotheslines and prolonged spikes
– scalable
BGP-lens: a user-friendly tool
– provides reasonable defaults
– provides easy-to-use knobs
– leads like IPs/ASes
Thank You!
Any questions?
– We thank NSF, USA for their support.
Author-Reel!
Extra - Frequency Analysis
Data is self-similar!
– we used the entropy-plot measure
– also called the b-model [26]
– Corresponds to b-model of
– Multi-resolution techniques needed!
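For readers unfamiliar with the entropy plot, the sketch below shows the general construction as we understand it: aggregate the series at dyadic resolutions, compute the entropy of the volume fractions at each resolution, and look at how entropy grows with the level. For a synthetic b-model series the growth is linear with slope H(b) = -b log2 b - (1-b) log2(1-b). The function names and the deterministic generator are our own illustrative choices, not taken from the talk.

```python
import math

def entropy_plot(series):
    """Return [(k, H_k)] where H_k is the entropy at 2**k aggregated bins.

    len(series) must be a power of two and sum(series) must be positive.
    """
    total = sum(series)
    n = len(series)
    points = []
    for k in range(n.bit_length()):
        width = n >> k
        entropy = 0.0
        for i in range(0, n, width):
            p = sum(series[i:i + width]) / total
            if p > 0:
                entropy -= p * math.log2(p)
        points.append((k, entropy))
    return points

def b_model(b, levels, total=1.0):
    """Deterministic b-model: split each interval into b and (1 - b) parts."""
    series = [total]
    for _ in range(levels):
        series = [part for x in series for part in (b * x, (1 - b) * x)]
    return series

points = entropy_plot(b_model(0.8, levels=4))
slope = points[-1][1] / points[-1][0]
```

A slope of 1 would indicate perfectly uniform traffic; a slope well below 1 indicates bursty, self-similar traffic, which is why single-resolution tools struggle with such data.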
Extra - FFT
Extra - Marginals for 10sec
Extra - Prolonged Spike Algorithm