BGP-lens: Patterns and Anomalies in Internet Routing Updates B. Aditya Prakash1, Nicholas Valler2, David Andersen1, Michalis Faloutsos2, Christos Faloutsos1 1Carnegie Mellon University 2UC-Riverside KDD 2009, Paris
Introduction Each Row is an update Border Gateway Protocol (BGP) Internet Routing Protocol Router sending messages to each other Keeps path information up-to-date Ideal Setting - no BGP updates Really – many updates link failures, router restarts, malicious behavior Time peerAS originAS prefix 2005-02-17 12:39:42 ATT SPRINT 204.29.119.0/24 2005-02-17 12:39:43 VERIZON AOL 204.29.80.0/24 2005-02-17 12:39:46 WASH ATLA 204.29.79.0/24 ….
Automated Tool needed! Introduction contd. Question: Find patterns/anomalies? Challenges: Millions of updates sent over network Data has multiple dimensions Noisy Measurements Impossible for human to sift through updates Automated Tool needed!
The Data 18 million update messages – over two years! Time peerAS originAS prefix 2005-02-17 12:39:42 ATT SPRINT 204.29.119.0/24 2005-02-17 12:39:43 VERIZON AOL 204.29.80.0/24 2005-02-17 12:39:46 WASH ATLA 204.29.79.0/24 …. Data from Datapository.net Abilene Network 18 million update messages – over two years!
Our Approach Look at a simple time-series Focus on just the time # of updates received every b seconds (bin size) Specific Problem we are tackling Given such time-series Report patterns and anomalies Also find suspicious entities (paths, ASes etc.) Time 2005-02-17 12:39:42 2005-02-17 12:39:43 2005-02-17 12:39:46 2005-02-17 12:40:01 …. Time peerAS originAS prefix 2005-02-17 12:39:42 ATT SPRINT 204.29.119.0/24 2005-02-17 12:39:43 VERIZON AOL 204.29.80.0/24 2005-02-17 12:39:46 WASH ATLA 204.29.79.0/24 …. time b secs Bin: 0 1 2 … Count: 4 2 6 …
Real data: Washington Router Bin Size = 600s Very Bursty! # of Updates Traditional Tools like FFT, auto-regression don’t work Bin number (‘Time’)
Outline Introduction and Problem Statement Techniques BGP-lens at work Temporal Analysis Frequency Analysis BGP-lens at work Conclusions
Temporal Analysis Bin size: 10s First Cut: Take log-linear plot emphasizes small values over high values
But: Bin size is important!
Bin size: 600s ‘Clotheslines’
Clotheslines Q1: Why Clotheslines? Q2: How to automate this discovery? Near consecutive updates over long time-period Can be Route Flapping advertise/withdraw same path frequently important to identify Q2: How to automate this discovery?
Proposal: Marginals to Rescue PDF of volume of updates Number of time-bins with volume Extremes == Height of the clotheslines!
Marginals to Rescue PDF of volume of updates Number of time-bins with volume
Algorithm - Clotheslines Details! For marginals plot use the median filtering approach to determine ‘outliers’; For each time interval found, report the most consistent IPs/ASes etc. High Level Idea only – details in paper!
Outline Introduction and Problem Statement Techniques BGP-lens at work Temporal Analysis Frequency Analysis BGP-lens at work Conclusions 1-2 sentences leading to wavelets (self-similar, same on many scales, used tests)
time -> High energy Low energy ‘Tornado’ does not touch down Low Freq. High energy Low energy ‘Tornado’ does not touch down High Freq. time -> Signal Basic tool – wavelet scalogram plots values of wavelet coefficients in the scale-space domain scale vs time lower frequencies on the top darker color -> high energy in the co-efficient at finer scale – fast short cycles at coarser scale – slow long cycles
In real data… E2
E2 ~ 8 hrs ~ 20,000 updates!
Why Prolonged Spike? Bursts of short duration Can represent malicious behavior Or simple router restarts! Exact cause hard to find – but important for system-administrators
Algorithm – Prolonged Spikes Details! Basic idea: find tornados from scalogram Find suitable starting point at higher levels Extend downward as much as possible The finest scale where tornado stops the shortest time period to look for a prolonged spike Again, details in paper!
Scalability
BGP-lens: User Interface optional # of suspicious events sysadmin wants to check Ready to be used as is scans multiple levels ranks deviations duration: length of events to be checked (think daily vs weekly vs monthly)
Outline Introduction and Problem Statement Techniques BGP-lens at work Temporal Analysis Frequency Analysis BGP-lens at work Conclusions
BGP-lens at Work We found real events too . examples- Event 1: 50-clothesline Prefix and Origin-AS pointed to Alabama Supercomputing Net When contacted sysadmins attributed changes to route flapping “the route for 207.157.115.0/24 was appearing and disappearing in [the] IGP routing table ... [which] may have caused BGP to flap.” Anomaly went undetected and unresolved for 30 days!
Results from real data Event 2 Prolonged Spike May 12th 2006 – 8hr spike Most persistent IPs/ASes Primary and middle schools in a large district in a country Two more spikes Jan18-19, 2006 and Aug 1
Conclusions Studied huge real data (~18 million updates) Developed two new techniques effective spots subtle phenomena like clotheslines and prolonged spikes scalable BGP-lens: a user-friendly tool provides reasonable defaults provides easy-to-use knobs leads like IPs/ASes
Thank You! Any questions? www.cs.cmu.edu/~badityap Author-Reel! We thank NSF, USA for their support. Author-Reel!
Extra - Frequency Analysis Data is self-similar! we used the entropy-plot measure also called the b-model [26] Corresponds to b-model of 75-25 Multi-resolution techniques needed!
Extra - FFT
Extra – Marginals for 10sec
Extra – Prolonged Spike Algorithm