Download presentation
Presentation is loading. Please wait.
1
Infrastructure-based Resilient Routing
Ben Y. Zhao, Ling Huang, Jeremy Stribling, Anthony Joseph and John Kubiatowicz University of California, Berkeley ICSI Lunch Seminar, January 2004
2
Motivation Network connectivity is not reliable
Disconnections frequent in the Internet (UMichTR98,IMC02) 50% of backbone links have MTBF < 10 days 20% of faults last longer than 10mins IP-level repair relatively slow Wide-area: BGP 3 mins Local-area: IS-IS 5 seconds Next generation wide-area network applications Streaming media, VoIP, B2B transactions Low tolerance of delay, jitter and faults Distinquish between wide-area and local area, but new net apps are mostly wide-area November 8, 2018
3
The Challenge Routing failures are diverse
Many causes Misconfigurations, cut fiber, planned downtime, software bugs Occur anywhere with local or global impact: Single fiber cut can disconnect AS pairs One event can lead to complex protocol interactions Isolating failures is difficult End user symptoms often dynamic or intermittent WAN measurement research is ongoing (Rocketfuel, etc) Observations: Fault detection from multiple distributed vantage points In-network decision making necessary for timely responses November 8, 2018
4
Talk Overview Motivation A structured overlay approach
Mechanisms and policy Evaluation Some questions November 8, 2018
5
An Infrastructure Approach
Our goals Resilient overlay to route around failures Respond in milliseconds (not seconds) Our approach (data & control plane) Nodes are observation points (similar to Plato’s NEWS service) Nodes are also points of traffic redirection (forwarding path determination and data forwarding) No edge node involvement Fast response time, security focused on infrastructure Fully transparent, no application awareness necessary November 8, 2018
6
Why Structured Overlays
Resilient Overlay Networks (MIT) Fully connected mesh Each node has full knowledge of network Fast, independent calculation of routes Nodes can construct any path, maximum flexibility Cost of flexibility Protocol needs to choose the “right” route/nodes Per node O(n) state Monitors n - 1 paths O(n2) total path monitoring is expensive D Say MIT’s resilient overlay networks S November 8, 2018
7
The Big Picture Locate nearby overlay proxy
Internet Locate nearby overlay proxy Establish overlay path to destination host Overlay traffic routes traffic resiliently November 8, 2018
8
Structured Peer to Peer Overlay
Traffic Tunneling Legacy Node B Legacy Node A B P’(B) A, B are IP addresses register register Proxy Proxy put (hash(B), P’(B)) P’(B) get (hash(B)) put (hash(A), P’(A)) Structured Peer to Peer Overlay Not a unique engineering approach, similar approach at i3 Store mapping from end host IP to its proxy’s overlay ID Similar to approach in Internet Indirection Infrastructure (I3) November 8, 2018
9
Pros and Cons Leverage small neighbor sets
Less neighbor paths to monitor: O(n) O(log(n)) Reduction in probing bandwidth Faster fault detection Actively maintain static route redundancy Manageable for “small” # of paths Redirect traffic immediately when a failure is detected Eliminate on-the-fly calculation of new routes Restore redundancy in background after failure Fast fault detection + precomputed paths = more responsiveness Cons: overlay imposes routing stretch (mostly < 2) Mention the cost is increased IP hops / latency November 8, 2018
10
In-network Resiliency Details
Active periodic probes for fault-detection Exponentially weighted moving average link quality estimation Avoid route flapping due to short term loss artifacts Loss rate Ln = (1 - ) Ln-1 + p Simple approach taken, much ongoing research Smart fault-detection / propagation (Zhuang04) Intelligent and cooperative path selection (Seshardri04) Maintaining backup paths Create and store backup routes at node insertion Query neighbors after failures to restore redundancy Ask any neighbor at or above routing level of faulty node e.g. ABCD sees ABDE failed, can ask any AB?? node for info Simple policies to choose among redundant paths November 8, 2018
11
First Reachable Link Selection (FRLS)
Use link quality estimation to choose shortest “usable” path Use shortest path with minimal quality > T Correlated failures Reduce with intelligent topology construction Goal: leverage redundancy available I haven’t drawn all the possible paths in the example This is not perfect because of possible correlated failures, but can leverage existing work on failure independent overlay construction If there’s just 1 link, nobody can win November 8, 2018
12
Evaluation Metrics for evaluation Experimental platforms
How much routing resiliency can we exploit? How fast can we adapt to faults (responsiveness)? Experimental platforms Event-based simulations on transit stub topologies Data collected over multiple 5000-node topologies PlanetLab measurements Microbenchmarks on responsiveness November 8, 2018
13
Exploiting Route Redundancy (Sim)
Simulation constructs shortest paths to emulate IP routes Now break links, then look at reachability Simulation of Tapestry, 2 backup paths per routing entry Transit-stub topology shown, results from TIER and AS graphs similar November 8, 2018
14
Responsiveness to Faults (PlanetLab)
300 660 = 0.2 = 0.4 Enlarge fonts, Highlight 660 and 300 and lines 20 runs for each point Two reasonable values for filter constant Response time scales linearly to probe period November 8, 2018
15
Link Probing Bandwidth (Planetlab)
Bandwidth increases logarithmically with overlay size Medium sized routing overlays incur low probing bandwidth November 8, 2018
16
Conclusion Trading flexibility for scalability and responsiveness
Structured routing has low path maintenance costs Allows “caching” of backup paths for quick failover Can no longer construct arbitrary paths But simple policy exploits available redundancy well Fast enough for most interactive applications 300ms beacon period response time < 700ms ~300 nodes, b/w cost = 7KB/s November 8, 2018
17
Ongoing Questions Is this the right approach?
Is there a lower bound on desired responsiveness? Is this responsive enough for VoIP? If not, is multipath routing the solution? What about deployment issues? How does inter-domain deployment happen? A third-party approach? (Akamai for routing) November 8, 2018
18
Thanks to Dennis Geels / Sean Rhea for their work on BMark
Related Work Redirection overlays Detour (IEEE Micro 99) Resilient Overlay Networks (SOSP 01) Internet Indirection Infrastructure (SIGCOMM 02) Secure Overlay Services (SIGCOMM 02) Topology estimation techniques Adaptive probing (IPTPS 03) Internet tomography (IMC 03) Routing underlay (SIGCOMM 03) Many, many other structured peer-to-peer overlays Thanks to Dennis Geels / Sean Rhea for their work on BMark I3 is more general, targeting generic redirection We have a very simple link quality estimation mechanism We designed these mechanisms with these overlays in mind, applicable to nearly all of these in mind - All have ability to choose from multiple routes, all have log(n) neighbors November 8, 2018
19
Backup Slides November 8, 2018
20
Another Perspective on Reachability
Portion of all pair-wise paths where no failure-free paths remain A path exists, but neither IP nor FRLS can locate the path Portion of all paths where IP and FRLS both route successfully FRLS finds path, where short-term IP routing fails November 8, 2018
21
Constrained Multicast
Used only when all paths are below quality threshold Send duplicate messages on multiple paths Leverage route convergence Assign unique message IDs Mark duplicates Keep moving window of IDs Recognize and drop duplicates Limitations Assumes loss not from congestion Ideal for local area routing 2225 2299 2274 2286 2046 2281 2530 ? ? ? 1111 November 8, 2018
22
Latency Overhead of Misrouting
November 8, 2018
23
Bandwidth Cost of Constrained Multicast
November 8, 2018
Similar presentations
© 2025 SlidePlayer.com. Inc.
All rights reserved.