Identifying problematic inter-domain routing issues
  Olaf Maennel, Anja Feldmann
  Saarland University, Saarbrücken, Germany
Motivation
  BGP scalability? BGP convergence times?
  Many open questions still need to be understood.
  What really happens in the Internet?
TOOL: “Character”
  Data munching: automatic processing of raw data, providing an intermediate level
  Characterizing BGP updates: identification of update events
TOOL: “Character”
  [Diagram: RAW-DATA, FileFinder package, “Character”, your function (or "Check" functions), results]
How “character” works
  Input: table dump 1, all updates, table dump 2
  Identification of routing updates (route change events): type of changes, flapping, session resets, …
  Processing of updates in the context of related (same prefix) and surrounding (near in time) updates
Output: route_btoa
  All updates like Merit’s "route_btoa -m"
  Labeled fields: Timestamp (first field), Updated Prefix (fifth field), AS Path (sixth field)
  1011363829|A|195.66.224.112|3549| 80.96.15.0/24|3549 3300 702 8708|
  1011387198|W|195.66.224.112|3549| 80.96.15.0/24| |
  1011387339|A|195.66.224.112|3549| 80.96.15.0/24|3549 701 702 8708|
  1011387369|A|195.66.224.112|3549| 80.96.15.0/24|3549 3300 702 8708|
  1010976980|W|195.66.224.112|3549|80.96.150.0/24| |
  1010977007|A|195.66.224.112|3549|80.96.150.0/24|3549 209 1755 15471|
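The pipe-separated records on this slide can be read with a few lines of Python. This is a minimal sketch, not part of “character”: the names `Update` and `parse_update` are hypothetical, and the meaning of the third and fourth fields (peer address and peer AS) is an assumption, since the slide labels only the timestamp, the updated prefix, and the AS path.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Update:
    timestamp: int            # seconds since the Unix epoch (first field)
    kind: str                 # "A" = announcement, "W" = withdrawal
    peer_ip: str              # assumed meaning of the third field
    peer_as: int              # assumed meaning of the fourth field
    prefix: str               # updated prefix (fifth field)
    as_path: Tuple[int, ...]  # AS path (sixth field); empty for withdrawals


def parse_update(line: str) -> Update:
    """Parse one pipe-separated update record as shown on the slide."""
    fields = [f.strip() for f in line.strip().split("|")]
    if len(fields) < 6:
        raise ValueError(f"expected at least 6 fields, got {len(fields)}")
    ts, kind, peer_ip, peer_as, prefix, path = fields[:6]
    if kind not in ("A", "W"):
        raise ValueError(f"unknown update type: {kind!r}")
    return Update(
        timestamp=int(ts),
        kind=kind,
        peer_ip=peer_ip,
        peer_as=int(peer_as),
        prefix=prefix,
        as_path=tuple(int(asn) for asn in path.split()),
    )
```

Whitespace around each field is stripped, so the padded prefix column and the blank AS path of a withdrawal both parse cleanly.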
Example data sets RIPE’s RRC00: Jan 14, 2002 01:00 – Jan 20, 2002 01:10
Output: route_btoa
  Classification of each update is appended:
  Labeled fields: Timestamp (first field), Updated Prefix (fifth field), AS Path (sixth field)
  1011363829|A|195.66.224.112|3549| 80.96.15.0/24|3549 3300 702 8708|
  1011387198|W|195.66.224.112|3549| 80.96.15.0/24| |
  1011387339|A|195.66.224.112|3549| 80.96.15.0/24|3549 701 702 8708|
  1011387369|A|195.66.224.112|3549| 80.96.15.0/24|3549 3300 702 8708|
  1010976980|W|195.66.224.112|3549|80.96.150.0/24| |
  1010977007|A|195.66.224.112|3549|80.96.150.0/24|3549 209 1755 15471|
Output: What has changed?
  Labeled columns: #update, time since last update (in seconds), change to last update
  |:|24.|199 |AA-DIFF|ASPath-way Community|3549|3320->3300|8708|origin |
  |:|25.|23369|AW-DIFF| | | | | |
  |:|26.|141 |WA-DIFF|ASPath-way Community|3549|3300->701 |702 |transit|
  |:|27.|30 |AA-DIFF|ASPath-way Community|3549|701->3300 |702 |transit|
  |:|1. |-1 |AW-DIFF| | | | | |
  |:|2. |27 |WA-DIFF|ASPath-way Community|3549|3300->209 |1755|transit|
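The "time since last update" and "change to last update" columns can be approximated per prefix as follows. This is a sketch under stated assumptions, not the tool's actual logic: `classify_series` is a hypothetical name, the prefix is assumed to have been announced before the first update (for example in the initial table dump), and the first update gets a time of -1 as in the output on this slide. The function returns the transition (AA, AW, WA, WW) and a separate flag for whether anything differs, on the assumption that the "-DIFF" suffix marks differing attributes.

```python
from typing import List, Optional, Sequence, Tuple

# One update of a single prefix: (timestamp, kind, as_path), kind is "A" or "W".
Event = Tuple[int, str, Tuple[int, ...]]


def classify_series(
    events: Sequence[Event], initial_kind: str = "A"
) -> List[Tuple[int, str, bool]]:
    """Return (seconds since last update, transition, differs) for each update."""
    results = []
    prev_ts: Optional[int] = None
    prev_kind = initial_kind          # assumption: prefix was announced before
    prev_path: Optional[Tuple[int, ...]] = None  # unknown before the first update
    for ts, kind, path in events:
        delta = -1 if prev_ts is None else ts - prev_ts
        transition = prev_kind + kind  # e.g. "AW": announcement, then withdrawal
        differs = kind != prev_kind or path != prev_path
        results.append((delta, transition, differs))
        prev_ts, prev_kind, prev_path = ts, kind, path
    return results
```

Fed with the four updates for 80.96.15.0/24 from the route_btoa slide, the last three results carry the intervals 23369, 141 and 30 seconds with the transitions AW, WA and AA, matching rows 25 to 27 of the output.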
Type of changes
Output: AS Path changes
  Labeled columns: last ‘stable’ AS, from where to where?, rejoining AS
  |:|24.|199 |AA-DIFF|ASPath-way Community|3549|3320->3300|8708|origin |
  |:|25.|23369|AW-DIFF| | | | | |
  |:|26.|141 |WA-DIFF|ASPath-way Community|3549|3300->701 |702 |transit|
  |:|27.|30 |AA-DIFF|ASPath-way Community|3549|701->3300 |702 |transit|
  |:|1. |-1 |AW-DIFF| | | | | |
  |:|2. |27 |WA-DIFF|ASPath-way Community|3549|3300->209 |1755|transit|
Output: Old AS Path
  Labeled columns: AS on the “old” path, percentage of prefixes still reachable
  3549__95%_ 3320__47%_ 5483_*15%* 8708__78%_| 2 |0. |22.|#8|flapping|
  3549__95%_ 3300__65%_ 702__61%_ 8708_**3%*| 5 |3. |20.|#6| |
  3549__95%_ 3300__65%_ 702__63%_ 8708__36%_| 5 |21.|21.|#1| |
  3549__95%_ 701__66%_ 702__64%_ 8708__53%_| 3 |0. |24.|#9| |
  3549__96%_ 3300__67%_ 1755__54%_ 15471_*21%*| * |* |* |* | |
  3549__96%_ 3300__67%_ 1755__54%_ 15471__33%_| * |* |* |* | |
Sets of updates for a prefix with same attributes
  new change
  1. duplicate
  2. flapping
  3. reconvergence
  4. n-way change >4
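One way to read this slide is that each update is categorized by its distance to the last update of the same prefix with equal attributes. The sketch below follows that reading. It is an assumption, not the tool's documented behavior, that the numbers on the slide are distances and that every distance of 4 or more counts as an n-way change; the names `distance_to_last_equal` and `label` are hypothetical.

```python
from typing import Hashable, List, Optional, Sequence

# Assumed mapping from distance to category, read off the slide's list.
LABELS = {1: "duplicate", 2: "flapping", 3: "reconvergence"}


def distance_to_last_equal(series: Sequence[Hashable]) -> List[Optional[int]]:
    """For each update, how many updates back the most recent equal one lies.

    `series` holds the attributes of successive updates for one prefix,
    e.g. (kind, as_path) tuples. None means no equal update was seen before.
    """
    last_seen = {}
    distances: List[Optional[int]] = []
    for i, attrs in enumerate(series):
        j = last_seen.get(attrs)
        distances.append(None if j is None else i - j)
        last_seen[attrs] = i
    return distances


def label(distance: Optional[int]) -> str:
    """Map a distance to a category name (assumed mapping, see lead-in)."""
    if distance is None:
        return "new change"
    return LABELS.get(distance, "n-way change")
```

For a series with attributes A, B, A, A, C, B the distances are None, None, 2, 1, None, 4, so the third update would be labeled as flapping and the fourth as a duplicate.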
Output: “n-way flapping”
  Labels: distance to last equal update; reconvergence; first and last occurrence in update series; flapping; time to last flap; percentage of other prefixes by the originating AS identified as flapping
  | 2 |0. |22.|#8|flapping|208326|85% |<- | | (8708)__72%_ 5483
  | 5 |3. |20.|#6| | |8% |-1 | | (8708)__79%_ 702
  | 5 |21.|21.|#1| | |8% |-2 | | (8708)__78%_ 702
  | 3 |0. |24.|#9| | |8% |flap-3|23540| (8708)__78%_ 702
  | * |* |* |* | | |100%| | |(15471)**95%* 1755
Categorization of changes
Probability distribution of distance between flaps
Time between equal updates
Session resets
  Peering connection breakdown: a whole table must be exchanged
  Update storms are propagated through the Internet…
  How big is the problem?
Output: possible session resets
  Labels: AS number; percentage of updated vs. all prefixes associated with an AS
  (8708)__72%_ 5483**66%* 3320**28%* 3549___0%_| 2 |3320 5483| |
  (8708)__79%_ 702___5%_ 3300___3%_ 3549___0%_| | | |
  (8708)__78%_ 702___5%_ 3300___3%_ 3549___0%_| | |peak|
  (8708)__78%_ 702___5%_ 701___1%_ 3549___0%_| | |peak|
  (15471)**95%* 1755___0%_ 3549___0%_ 3300___0%_| 1 |15471 | |
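The percentage of updated versus all prefixes associated with an AS can be computed as sketched below. This is only an illustration: the function names are hypothetical, the per-AS prefix sets are assumed to come from a table dump, and the 90% threshold for flagging a possible session reset is an arbitrary placeholder, since the slides do not state the criterion “character” uses.

```python
from typing import Dict, List, Set


def updated_share(
    prefixes_by_as: Dict[int, Set[str]], updated: Set[str]
) -> Dict[int, float]:
    """Percentage of each AS's associated prefixes updated in the observed window."""
    shares = {}
    for asn, prefixes in prefixes_by_as.items():
        if prefixes:  # skip ASs without any associated prefix
            shares[asn] = 100.0 * len(prefixes & updated) / len(prefixes)
    return shares


def reset_candidates(shares: Dict[int, float], threshold: float = 90.0) -> List[int]:
    """ASs whose updated share reaches the (hypothetical) threshold."""
    return sorted(asn for asn, share in shares.items() if share >= threshold)
```

An AS for which nearly all associated prefixes are updated within a short window, like the 95% shown for AS 15471, would then appear as a candidate.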
Identification of session resets All prefixes updated
Output: possible session resets
  Labeled columns: number of ASs involved, ASs involved
  (8708)__72%_ 5483**66%* 3320**28%* 3549___0%_| 2 |3320 5483| |
  (8708)__79%_ 702___5%_ 3300___3%_ 3549___0%_| | | |
  (8708)__78%_ 702___5%_ 3300___3%_ 3549___0%_| | |peak|
  (8708)__78%_ 702___5%_ 701___1%_ 3549___0%_| | |peak|
  (15471)**95%* 1755___0%_ 3549___0%_ 3300___0%_| 1 |15471 | |
Updates due to session resets
Duration of session resets
Output: Classification
  Labels: peak identification; update rate per second; further changes?; further suggestions?!
  |2|3320 5483| | 7.0|instable |...
  | | | | 5.9|instable |...
  | | |peak|16.2|instable |...
  | | |peak|16.2|re-stable change|...
  |1|15471 | | 1.3|instable |...
  |1|15471 | | 1.4|instable |...
Update burst
  Like packet flows
  A burst consists of several updates for the same prefix within a short time window
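Grouping updates into bursts can be sketched like flow aggregation with an inactivity timeout. The function name `group_bursts`, the 60-second gap, and the minimum of two updates per burst are assumptions chosen for illustration; the slides do not give the window size actually used.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def group_bursts(
    updates: Iterable[Tuple[int, str]], max_gap: int = 60, min_updates: int = 2
) -> Dict[str, List[List[int]]]:
    """Group (timestamp, prefix) updates into per-prefix bursts.

    Consecutive updates of one prefix belong to the same burst while the gap
    between them is at most `max_gap` seconds. Groups with fewer than
    `min_updates` updates are dropped.
    """
    by_prefix = defaultdict(list)
    for ts, prefix in sorted(updates):  # sort by timestamp first
        by_prefix[prefix].append(ts)

    bursts: Dict[str, List[List[int]]] = {}
    for prefix, stamps in by_prefix.items():
        groups = [[stamps[0]]]
        for ts in stamps[1:]:
            if ts - groups[-1][-1] <= max_gap:
                groups[-1].append(ts)   # still inside the current burst
            else:
                groups.append([ts])     # gap too large: start a new burst
        bursts[prefix] = [g for g in groups if len(g) >= min_updates]
    return bursts
```

Burst duration is then `burst[-1] - burst[0]` and the number of updates in a burst is `len(burst)`.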
Burst duration
Updates in burst
Output of “character”
  Classification of updates
  Statistical information
  Missing updates / verification
Ongoing work
  RTG: a realistic Routing Table (and update) Generator
  Generation of tables and updates with ‘real-world’ characteristics
  Use RTG to benchmark router performance
Conclusion
  If you are interested, please visit our website: http://www.net.uni-sb.de/~olafm
  Thank you!