Protection and Restoration Definitions A major application for MPLS.

1 Protection and Restoration Definitions
A major application for MPLS

2 The problem
Network resources will fail
  –Nodes and links
IGP will re-converge
  –But this may take some time
      10s of seconds
  –Fast convergence has a price
      May make IGP more sensitive/unstable
  –I may have sensitive traffic that cannot afford interruptions
      Voice, consumer TV
Do something for the time until IGP re-converges

3 Terminology
Restoration
  –Bring traffic back to normal
Backup
  –Alternative resources to be used when there is a failure
Protection
  –Determine and allocate the backup resources before the failure
  –When there is a failure, just activate them
  –Can be very fast
Repair
  –Determine, allocate and activate the backup resources after the failure
  –Will be slower

4 Failure Modes
Single vs. multiple link failures
  –If the duration of a link failure is short, can assume that there will be only a single link failure at a time
  –Much harder to deal with multiple link failures
Node vs. link failures
  –Can assume that links will fail more frequently than nodes
  –Node failures are harder to handle

5 Backup resources
Can be multiple types
  –Links
  –Paths
  –Trees
  –Cycles
  –Whole topologies
To avoid network overload after a failure, need to have some extra capacity for backup resources
The problem is how to engineer them so as not to make the network too expensive
  –Minimize the amount of backup capacity that is reserved

6 More jargon
1:1
  –1 working, 1 backup
  –Wastes a lot of bandwidth for the backups
1:N
  –N working and 1 backup
  –Assume that only 1 working will fail
  –Then 1 backup is enough, which saves bandwidth
Revertive
  –When the failure is fixed, revert to the primary
SRLG: Shared Risk Link Group
  –A set of network links that fail together
  –E.g. fibers that are in the same conduit
      A bulldozer will cut all of them together
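
As a rough illustration of the 1:1 versus 1:N trade-off, the sketch below computes how much extra bandwidth the backup adds relative to the working paths it protects. The function name and the bandwidth figures are made up for the example, and it assumes the single backup is sized for the largest working path.

```python
def backup_overhead(working_bw):
    """Backup bandwidth as a fraction of the working bandwidth it protects.

    working_bw: bandwidths of the working paths that share ONE backup
    (hypothetical numbers; a list of length 1 is the 1:1 case).
    """
    # Assumption: only one working path fails at a time, so the backup
    # only needs to be as large as the largest working path.
    backup = max(working_bw)
    return backup / sum(working_bw)


one_to_one = backup_overhead([10])               # 1:1
one_to_four = backup_overhead([10, 10, 10, 10])  # 1:4
print(one_to_one, one_to_four)  # 1.0 0.25
```

The saving only holds if the N working paths do not share an SRLG; otherwise a single cut can take several of them down at once and one backup is no longer enough.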

7 Other issues
How to detect the failure fast
  –BFD is one general solution
  –There are medium-specific solutions
      OAM for ATM
      Alarms for SONET
      Preferable if they exist
  –Protocol mechanisms (RSVP HELLOs, OSPF HELLOs, etc.)
How to activate the backup
  –I.e. how to make traffic use an alternate path, or a tree

8 Backbone failure analysis
Sprint backbone ca. March 2002
  –Link in class website
Monitor IS-IS traffic
Data only for link failures, not node failures
Failure duration
  –50% of failures last less than 1 min
  –40% of failures last between 1 and 20 min
Maintenance
  –50% of failures occur during maintenance windows
Mean time between failures (MTBF)
  –MTBF varies a lot across links
      “Good” and “bad” links
  –3 bad links account for 25% of the failures

9 More analysis
Unplanned failure breakdown
  –Shared link failures = 30%
      Router related = 16.5%
      Optical related = 11.5%
  –Individual link failures = 70%
Node failures are less common than single link failures
About 16.5% of failures affect more than 1 link

10 Handling failures with IP
Easy case
  –ECMP, no need to do anything extra during failure
  –But it may not repair all failures
  –Coverage: what percentage of the possible failures can be repaired
In general, activating backup resources is hard with IP
  –Packets will follow the IP route table/FIB
  –Forwarding is hop-by-hop
  –Even if I compute a backup link for a failure, I have no control over what will happen after the next hop
      May have routing loops

11 IP protection
Backup next-hop
  –Each node computes a backup next-hop for each destination so that I will not have routing loops
  –It may not have 100% coverage
For more general solutions I need tunneling
  –Must force packets to reach their destination
  –Without crossing the failed resource
      Tunnel to the node after the failed link
      Tunnel to an intermediate node
  –IP tunneling is an expensive operation
      It is packet encapsulation
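
One well-known rule for picking a loop-free backup next-hop is the inequality dist(N, D) < dist(N, S) + dist(S, D): neighbor N of router S is safe for destination D if N's own shortest path to D does not come back through S. The slides do not say which rule they have in mind, so the sketch below is only one possible reading. The topology, the function names and the definition of coverage (fraction of source/destination pairs that have at least one such alternate for a failure of the primary link) are assumptions made for the example.

```python
import heapq


def dijkstra(graph, src):
    """Shortest-path distances from src; graph is {node: {neighbor: cost}}."""
    dist = {src: 0}
    heap = [(0, src)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry
        for v, w in graph[u].items():
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist


def backup_nexthop_coverage(graph):
    """Fraction of (source, destination) pairs with a loop-free backup next-hop.

    Assumes a connected graph with symmetric link costs.
    """
    dist = {n: dijkstra(graph, n) for n in graph}
    protected = total = 0
    for s in graph:
        for d in graph:
            if s == d:
                continue
            total += 1
            # Primary next-hop: a neighbor on a shortest path from s to d.
            primary = min(graph[s], key=lambda n: graph[s][n] + dist[n][d])
            for n in graph[s]:
                # Loop-free condition: n does not route back through s.
                if n != primary and dist[n][d] < dist[n][s] + dist[s][d]:
                    protected += 1
                    break
    return protected / total


triangle = {"A": {"B": 1, "C": 1}, "B": {"A": 1, "C": 1}, "C": {"A": 1, "B": 1}}
square = {"A": {"B": 1, "D": 1}, "B": {"A": 1, "C": 1},
          "C": {"B": 1, "D": 1}, "D": {"C": 1, "A": 1}}
print(backup_nexthop_coverage(triangle))  # 1.0
print(backup_nexthop_coverage(square))
```

In the triangle every pair has an alternate. In the four-node ring only the diagonal pairs do (they are the ECMP case), which is the kind of gap that makes coverage less than 100%.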

12 Not-Via addresses
Consider router A, with interfaces A1, A2, A3
  –A1 connects to interface B1 of router B
  –A2 connects to interface C2 of router C
  –B1 has a second address, B1-not-via-A
  –All routers compute paths to B1-not-via-A by removing router A from the topology and running SPF
  –When router A fails, if C wants to reach B it sends packets to address B1-not-via-A
      Encapsulates the packets
100% coverage
Can handle node and link failures
Still needs encapsulation
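
The "remove router A and run SPF" step can be sketched as a shortest-path computation that simply refuses to enter the avoided node. The extra router D, the link costs and the function name are invented for the example; only routers A, B and C come from the slide.

```python
import heapq


def shortest_path(graph, src, dst, avoid=None):
    """Dijkstra from src to dst that never enters the node `avoid`.

    graph is {node: {neighbor: cost}}. Returns the node list, or None
    if dst is unreachable once `avoid` is removed.
    """
    dist = {src: 0}
    prev = {}
    heap = [(0, src)]
    while heap:
        d, u = heapq.heappop(heap)
        if u == dst:
            break
        if d > dist[u]:
            continue  # stale heap entry
        for v, w in graph[u].items():
            if v == avoid:
                continue  # behave as if the avoided router were gone
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                prev[v] = u
                heapq.heappush(heap, (nd, v))
    if dst not in dist:
        return None
    path = [dst]
    while path[-1] != src:
        path.append(prev[path[-1]])
    return path[::-1]


# Hypothetical topology: A sits between C and B, D offers a detour.
topology = {
    "A": {"B": 1, "C": 1},
    "B": {"A": 1, "D": 2},
    "C": {"A": 1, "D": 2},
    "D": {"B": 2, "C": 2},
}
normal = shortest_path(topology, "C", "B")                # ['C', 'A', 'B']
not_via_a = shortest_path(topology, "C", "B", avoid="A")  # ['C', 'D', 'B']
print(normal, not_via_a)
```

Every router would run the second computation for B1-not-via-A, so the encapsulated packet is forwarded hop by hop along the detour without any router sending it back towards A.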

13 Multi-topology protection
New approach
Have multiple subsets of the topology
  –IGPs already support multi-topology routing
  –Switch to a different topology when there is a failure
      By modifying the header of the packet
      Or even using an MPLS label
Allows for more flexible routing of traffic after a failure

14 Using MPLS
MPLS can conveniently direct traffic where I want
Ideal for setting up backup resources
  –Mostly backup paths
Can be used to repair both IP and MPLS failures (i.e. LSP failures)
LSP protection can be
  –Path
  –Local

15 Path protection
For each LSP (primary) have a backup LSP
  –It is already established (with RSVP) but it is not carrying any traffic
Primary and backup LSPs should be link and node disjoint
When there is a failure, the source of the LSP will start sending traffic to the backup
Source needs to be notified of the failure
  –May take some time for the repair of the traffic
Can work in both 1:1 and 1:N modes
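
A simple way to get a link- and node-disjoint backup is to route the primary first, then route the backup on what is left after removing the primary's links and intermediate nodes. The sketch below does that with a hop-count search; the topology and function names are made up, and this greedy two-step method is an assumption, not something the slides prescribe. It can fail on topologies where a disjoint pair exists but the shortest primary blocks it (Suurballe's algorithm avoids that).

```python
from collections import deque


def bfs_path(adj, src, dst, banned_nodes=frozenset(), banned_links=frozenset()):
    """Fewest-hop path from src to dst, skipping banned nodes and links."""
    prev = {src: None}
    queue = deque([src])
    while queue:
        u = queue.popleft()
        if u == dst:
            path = []
            while u is not None:
                path.append(u)
                u = prev[u]
            return path[::-1]
        for v in adj[u]:
            if v in prev or v in banned_nodes or frozenset((u, v)) in banned_links:
                continue
            prev[v] = u
            queue.append(v)
    return None


def primary_and_backup(adj, src, dst):
    """Primary path plus a backup sharing no link or intermediate node with it."""
    primary = bfs_path(adj, src, dst)
    if primary is None:
        return None, None
    inner_nodes = set(primary[1:-1])
    links = {frozenset(link) for link in zip(primary, primary[1:])}
    backup = bfs_path(adj, src, dst, inner_nodes, links)
    return primary, backup


# Hypothetical topology: two routes from S to T.
adj = {
    "S": ["A", "B"],
    "A": ["S", "T"],
    "B": ["S", "C"],
    "C": ["B", "T"],
    "T": ["A", "C"],
}
primary, backup = primary_and_backup(adj, "S", "T")
print(primary, backup)  # ['S', 'A', 'T'] ['S', 'B', 'C', 'T']
```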

16 Local protection
When a link or node fails, the node upstream from the failure repairs the traffic
  –Traffic is put into a backup LSP that does not go over the failed resource
  –Backup LSP merges with the primary LSP
Repairing router does not send a PathErr upstream
  –Instead it notifies upstream nodes that it is repairing the failure
It is very fast
Can work in 1:1 and 1:N modes
Can be
  –Node
      Bypass a failed node
  –Link
      Bypass a failed link

17 Link local protection
The node upstream of the failed link initiates the protection
  –Point of local repair (PLR)
Backup LSP will merge back to the primary one
  –At the next-hop (NHop) of the PLR
Can work in 1:1 and 1:N modes
  –Usually a single backup LSP protects multiple primary LSPs
  –Otherwise scalability is not good

18 Node local protection
When a node fails, assume its links have failed too
The node upstream of the failed node initiates the protection
  –Point of local repair (PLR)
Backup LSP will merge back to the primary one
  –At the next-next-hop (NNHop) of the PLR
What label does the NNHop use for the primary LSP?
  –Need RSVP’s help to find out
Will need multiple backup LSPs for each node
  –At least one for each NNHop
  –Can optionally configure more

19 Label stacking
Each time I send traffic into an LSP I push a label on the packets
Packets in the primary LSP already have a label
  –I create a label stack
  –Top label is popped by the router just before the merge point
A catch
  –At the merge point, the packet arrives from an interface different from the expected one
  –Must have global (platform) label space
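
The label operations can be followed on a toy packet, modeled here as a list with the top label first. All label values (17, 22, 31, 500) and function names are invented for the illustration. The sketch assumes the usual arrangement where the PLR first swaps the primary label to the value the merge point expects and then pushes the backup LSP's label on top.

```python
def plr_reroute(stack, merge_label, bypass_label):
    """At the PLR: swap the top (primary) label to the one the merge point
    expects, then push the backup LSP label on top of it."""
    return [bypass_label, merge_label] + stack[1:]


def pop_top(stack):
    """At the router just before the merge point: pop the backup LSP label."""
    return stack[1:]


# Merge point's label table, keyed by label only (platform label space),
# so the lookup works no matter which interface the packet arrived on.
merge_point_table = {22: ("swap", 31)}

packet = [17]                                                   # arrives at the PLR
packet = plr_reroute(packet, merge_label=22, bypass_label=500)  # [500, 22]
packet = pop_top(packet)                                        # [22]
action = merge_point_table[packet[0]]                           # ('swap', 31)
print(packet, action)
```

If the merge point kept a separate label table per interface, label 22 arriving on the backup's interface would not match the primary's entry, which is the catch the slide points out.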

20 Need some RSVP support
If the LSP is protected, do not send errors upstream/downstream when there is a failure
  –Instead notify upstream nodes that repair is in progress
During failure the PATH and RESV messages for the primary LSP must continue
  –Send them through the backup LSP
For node protection need to know the label the NNHop is using for the primary
  –Use the record label option for the LSP
  –All the labels used in all the hops are recorded in the RESV message
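
Once the labels of every hop are recorded, finding the merge point's label is a lookup. The sketch below models the recorded information as a plain list of (router, label that router expects on the primary LSP); the router names, label values and the list layout are assumptions for illustration and not the actual RSVP object encoding.

```python
def merge_point_label(recorded, plr, skip):
    """Return (router, label) for the merge point as seen from the PLR.

    recorded: hops of the primary LSP in order, each with the label that
              hop expects to receive (None at the ingress).
    skip:     1 for the NHop (link protection), 2 for the NNHop (node protection).
    """
    routers = [router for router, _ in recorded]
    return recorded[routers.index(plr) + skip]


# Hypothetical primary LSP R1 -> R2 -> R3 -> R4.
recorded = [("R1", None), ("R2", 17), ("R3", 22), ("R4", 31)]
print(merge_point_label(recorded, "R1", skip=1))  # ('R2', 17)
print(merge_point_label(recorded, "R1", skip=2))  # ('R3', 22)
```

With node protection of R2, the PLR R1 must deliver packets to R3 carrying label 22, a value it would never learn from its direct neighbor alone.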

21 LSP protecting IP
Can use the above techniques to also protect IP traffic
If a link fails, all the traffic that would go through the link is sent over the backup LSP
Similar for node failures
  –But in this case, do I know the NNHop for IP?
In general, if I have MPLS in my network, all my traffic will be inside MPLS tunnels anyway

22 Observations
If node degree is d and I have N nodes then
  –I need at least O(Nd) tunnels for link protection
  –And at least O(Nd^2) for node protection
Of course I cannot protect from failures of the ingress or egress node
The assumption is that failures will be short lived
  –Traffic may be unbalanced during the failure
  –Links can get overloaded
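
The O(Nd) and O(Nd^2) figures can be checked by counting on a small graph. The sketch assumes one bypass tunnel per direction of every link, and for node protection one tunnel per (PLR, NNHop) pair of distinct neighbors around each protected node. The function name and the two topologies are invented for the example.

```python
def count_bypass_tunnels(adj):
    """Return (link_tunnels, node_tunnels) for a topology {node: [neighbors]}."""
    # Link protection: one tunnel per directed link, i.e. N*d in a d-regular graph.
    link_tunnels = sum(len(neighbors) for neighbors in adj.values())
    # Node protection: around each node of degree d there are d*(d-1)
    # ordered (PLR, NNHop) pairs, i.e. N*d*(d-1) in a d-regular graph.
    node_tunnels = sum(len(neighbors) * (len(neighbors) - 1)
                       for neighbors in adj.values())
    return link_tunnels, node_tunnels


ring4 = {"A": ["B", "D"], "B": ["A", "C"], "C": ["B", "D"], "D": ["C", "A"]}
mesh4 = {n: [m for m in "ABCD" if m != n] for n in "ABCD"}
print(count_bypass_tunnels(ring4))  # (8, 8)
print(count_bypass_tunnels(mesh4))  # (12, 24)
```

Going from degree 2 to degree 3 on the same four nodes raises the link-protection count by half but triples the node-protection count, which is the quadratic dependence on d.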

23 The resource allocation problem
How do I set up the backup tunnels so that
  –I do not overload any link after a failure
  –I minimize the amount of extra bandwidth that will need to be reserved for the backups
It is a form of traffic engineering (TE)
  –We will see more on TE later on
Has been studied a lot
  –In optical and telephone networks
  –And recently in MPLS-type networks
Solutions can be
  –On-line (as the requests arrive)
  –Off-line

24 Example
Kodialam, Lakshman, 2001
  –Local link and node protection
  –Assume I know the b/w demands of all LSPs
  –Assume that only one link or node can fail at a time
Find a set of backup paths that minimizes the amount of bandwidth for both primary and backup LSPs
  –Backup LSPs can share bandwidth on some links
What do I know about the links?
  –How much bandwidth is used by each LSP
      Complete but expensive to maintain
  –How much bandwidth is available
      Almost zero information
  –How much bandwidth is used by backup LSPs
      A little bit better than zero
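
Why backup LSPs can share bandwidth follows from the single-failure assumption: backups that protect different resources are never active at the same time, so a link only needs the largest per-failure total, not the sum. The sketch below shows just that sharing rule with complete information about every backup. It is not the Kodialam and Lakshman routing algorithm, and the resource names, demands and links are invented.

```python
from collections import defaultdict


def backup_reservation(backups):
    """Bandwidth to reserve per link, with and without sharing.

    backups: list of (protected_resource, demand, links_used_by_backup).
    Returns (shared, dedicated), each a dict {link: bandwidth}.
    """
    # load[link][resource] = backup traffic on `link` if `resource` fails
    load = defaultdict(lambda: defaultdict(float))
    for resource, demand, links in backups:
        for link in links:
            load[link][resource] += demand
    # Only one resource fails at a time, so take the worst single failure.
    shared = {link: max(by_res.values()) for link, by_res in load.items()}
    # Without sharing, every backup gets its own reservation.
    dedicated = {link: sum(by_res.values()) for link, by_res in load.items()}
    return shared, dedicated


backups = [
    ("link X-Y", 10, [("A", "B")]),
    ("link P-Q", 6, [("A", "B"), ("B", "C")]),
]
shared, dedicated = backup_reservation(backups)
print(shared[("A", "B")], dedicated[("A", "B")])  # 10.0 16.0
```

Computing the shared value needs the per-LSP information that the slide calls complete but expensive to maintain; with only aggregate numbers per link the router has to fall back to something closer to the dedicated figure.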

