Published by Maria Anthony. Modified over 9 years ago.
1
Magellan: A Tool for Unicast Fault Isolation
  Cengiz Alaettinoglu, Packet Design LLC
  Ramesh Govindan, Information Sciences Institute
  John Mehringer, Information Sciences Institute
2
Motivation
  Why can't I reach www.cnn.com?
  Why is the Internet soooo slow today? It was fine yesterday!
3
Goals
  User's perspective
    What is of interest to the user
  Internet-wide routing monitoring
    Not just an AS
  History of route changes
    Not just a snapshot
  Fault diagnosis
    Link/router failure/repair
4
Challenges
  Scaling
    Directed search by correlating destinations
    Shared learning
  Automated heuristics for fault isolation
    Route change
    Location of link/router failure/repair
    Oscillations
    Others?
5
Data Collection
  Select targets interesting to the user
    tcpdump/libpcap
    Weighting / aging (not implemented)
  Initial path to targets
    traceroute
  Monitoring paths
    Carefully constructed ICMP probes
6
Snapshot
8
Monitoring
  Construct a routing graph
    Nodes: routers
    Links: (to, from, source, destination, hop, statistics...)
  Probe each link
    Send two ICMP Echo Request packets to the destination
    One with ttl = hop - 1 and one with ttl = hop, to verify the link's incident routers (from and to)
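The link-probing step above can be sketched as follows. This is a minimal illustration, not Magellan's actual code: the `Link` class, `links_from_path`, `probe_plan`, and `link_ok` are hypothetical names, the `source` and `statistics` fields of the slide's link tuple are omitted for brevity, and no packets are sent. It only shows which (destination, TTL) pairs would be probed for a link and how the two replies would be checked against the routers recorded in the routing graph.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Link:
    """One edge of the routing graph, as seen on the path to one destination."""
    from_router: str   # upstream router, expected to answer at ttl = hop - 1
    to_router: str     # downstream router, expected to answer at ttl = hop
    destination: str   # target whose path crosses this link
    hop: int           # TTL at which to_router is reached


def links_from_path(path: List[str], destination: str) -> List[Link]:
    """Turn a traceroute-style router list (path[0] is hop 1) into links."""
    return [Link(path[i - 1], path[i], destination, hop=i + 1)
            for i in range(1, len(path))]


def probe_plan(link: Link) -> List[Tuple[str, int, str]]:
    """Two probes per link: (destination, ttl, router expected to reply)."""
    return [(link.destination, link.hop - 1, link.from_router),
            (link.destination, link.hop, link.to_router)]


def link_ok(link: Link, replies: Dict[int, str]) -> bool:
    """replies maps ttl -> router that answered (absent if no answer).

    In a real prober the answer would typically be the ICMP Time Exceeded
    message sent by the router where the TTL expired.
    """
    return all(replies.get(ttl) == expected
               for _dest, ttl, expected in probe_plan(link))


links = links_from_path(["r1", "r2", "r3"], "www.cnn.com")
plan = probe_plan(links[1])
```

A test is positive when both incident routers answer as recorded; any mismatch or missing answer makes it negative.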
9
Scheduling Probes
  Schedule a probe for each link using weighted round-robin (WRR)
    Limits the rate of probe packets
  Weights: some links are more important/interesting
    Distance to link
    No. of destinations using it
    History of volatility
      Exponentially averaged
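A small sketch of how such a schedule could look. The slides do not give the weight formula or the WRR variant, so everything here is an assumption: `link_weight` is a hypothetical way of combining the three factors, `alpha` is an arbitrary smoothing constant, and `wrr_schedule` uses the "smooth" weighted round-robin variant. Emitting one pick per fixed time slot would bound the probe rate.

```python
from typing import Dict, List


def ewma(previous: float, sample: float, alpha: float = 0.25) -> float:
    """Exponentially averaged volatility; alpha is an assumed constant."""
    return alpha * sample + (1.0 - alpha) * previous


def link_weight(distance: int, n_destinations: int, volatility: float) -> float:
    """Hypothetical weight: nearer, more shared, more volatile links score higher."""
    return (1.0 + n_destinations) * (1.0 + volatility) / max(distance, 1)


def wrr_schedule(weights: Dict[str, float], slots: int) -> List[str]:
    """Smooth weighted round-robin: one link is picked per probe slot."""
    total = sum(weights.values())
    current = {link: 0.0 for link in weights}
    order = []
    for _ in range(slots):
        # Every link accumulates credit in proportion to its weight.
        for link, weight in weights.items():
            current[link] += weight
        # The link with the most credit is probed and pays back the total.
        pick = max(current, key=current.get)
        current[pick] -= total
        order.append(pick)
    return order


order = wrr_schedule({"a-b": 2.0, "b-c": 1.0}, slots=6)
```

Over any window, each link is probed in proportion to its weight, and picks of a heavy link are spread out rather than bunched together.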
10
Test Result
  Positive
    Do nothing
  Negative
    Determine new path
      Incremental traceroute from the link, upstream and downstream
    Determine cause
      Based on automatic heuristics
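Once a new path is known, the changed portion can be located by comparing it with the old one. The helper below is only a sketch under that assumption: `changed_segment` is a hypothetical name, and it works on already-collected router lists rather than performing the incremental traceroute itself.

```python
from typing import List, Tuple


def changed_segment(old: List[str], new: List[str]) -> Tuple[List[str], List[str]]:
    """Return the hops that differ between two paths to the same destination.

    Hops shared at the start (upstream) and at the end (downstream) are
    stripped, leaving only the segment that was replaced.
    """
    shortest = min(len(old), len(new))
    # Length of the common prefix (unchanged upstream hops).
    prefix = 0
    while prefix < shortest and old[prefix] == new[prefix]:
        prefix += 1
    # Length of the common suffix (unchanged downstream hops),
    # not allowed to overlap the prefix.
    suffix = 0
    while suffix < shortest - prefix and old[-1 - suffix] == new[-1 - suffix]:
        suffix += 1
    return old[prefix:len(old) - suffix], new[prefix:len(new) - suffix]


old_path = ["r1", "r2", "r3", "r4", "dst"]
new_path = ["r1", "r5", "r6", "r4", "dst"]
removed, added = changed_segment(old_path, new_path)
```

Here `removed` holds the hops that disappeared and `added` the hops that replaced them; both are empty when the path is unchanged.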
11
Active Fault Isolation
  Link failure
    Probe the link using other destinations that use it
    Correlate results
  Router failure
    Generalize on link failure
  Oscillations
    History of old routes
    Back and forth between a set of routes
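The correlation idea can be illustrated with a toy classifier. The rules below are assumptions made for illustration, not the heuristics Magellan actually uses: a link counts as down only if probes through every destination using it fail, a router is blamed when all probed links incident to it are down, and a route history counts as oscillating when it returns to a previously seen route at least `min_returns` times.

```python
from typing import Dict, List, Tuple

LinkId = Tuple[str, str]  # (from_router, to_router)


def link_down(outcomes: List[bool]) -> bool:
    """True if the link was probed via some destinations and all probes failed."""
    return len(outcomes) > 0 and not any(outcomes)


def classify(suspect: LinkId, results: Dict[LinkId, List[bool]]) -> str:
    """results maps each probed link to per-destination probe outcomes."""
    if not link_down(results.get(suspect, [])):
        return "unknown"  # some destinations still cross the link
    router = suspect[1]
    incident = [link for link in results if router in link]
    # Generalize: if every probed link touching the router is down, blame it.
    if len(incident) > 1 and all(link_down(results[link]) for link in incident):
        return "router failure"
    return "link failure"


def is_oscillating(history: List[Tuple[str, ...]], min_returns: int = 2) -> bool:
    """Count route changes that go back to a route already seen earlier."""
    seen = set()
    returns = 0
    previous = None
    for route in history:
        if route != previous:
            if route in seen:
                returns += 1
            seen.add(route)
        previous = route
    return returns >= min_returns


results = {("a", "b"): [False, False], ("b", "c"): [False], ("a", "d"): [True]}
verdict = classify(("a", "b"), results)
```

With the sample `results`, both probed links touching router `b` are down, so the sketch blames the router rather than the single link.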
12
Magellan Components
  Magellan
  Nam
  Perl Script
  Visualization
    Offline or real-time
    Great for debugging/tuning
13
Snapshot
  Link or router failure
  I want the nam buttons, etc...
14
Effectiveness thru Measurement
  Picked 500 popular web sites
    Yahoo, msn, aol, cnn,...
    www.web100.com
  Monitored routes to these destinations for 7 days
15
Measurements
  Number of link probes: 839694
    Probes per second: 1.39
  Total failures: 2078
    Router failures: 334
    Link failures: 951
    Unknown cause: 793
      Transients
  Number of oscillations: 541
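The reported figures are internally consistent, which the following arithmetic check shows. It assumes the probes were spread over the full 7-day monitoring period mentioned on the previous slide; the variable names are illustrative only.

```python
# Figures taken from the slides.
link_probes = 839_694
monitoring_days = 7
failures = {"router": 334, "link": 951, "unknown": 793}

# Average probe rate over the whole monitoring period.
seconds = monitoring_days * 24 * 60 * 60
probes_per_second = link_probes / seconds

# The three failure categories should add up to the reported total.
total_failures = sum(failures.values())

# Share of each cause among all failures, in percent.
shares = {cause: 100.0 * count / total_failures
          for cause, count in failures.items()}

print(f"{probes_per_second:.2f} probes/s, {total_failures} failures")
```

The rate rounds to the 1.39 probes per second on the slide, and the three categories sum to the reported 2078 failures.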
16
No of Path Changes
17
Effect of Path Length
18
Dominant Path
19
Cumulative Dominant Path
20
Future work: Distributed Magellan
  Magellan 1, Magellan 2
  Weight to probe inversely proportional to ratio of distances
  Shared learning
21
Related Work
  Topology Maps
    Router/AS level interconnections
    Mercator, skitter, AT&T
    Not all links are usable (routing policy/metrics)
  Routing Topology
    Effect of policy/metrics
    Npd
      Vern Paxson's work
      Focus is on measurement
22
Conclusions
  Unicast fault isolation
    User's perspective
    Automated heuristics
    History of changes
  http://www.isi.edu/scan