Enhancing fault management performance of two-step QoS routing algorithms in GMPLS networks
Eusebi Calle, Jose L Marzo, Anna Urra, L. Fabrega
International Conference on Communications (ICC 2004)
eusebi@eia.udg.es
Universitat de Girona
Contents
  Background (Fault Management)
  The failure probability and impact
  Two-step and one-step routing methods
  Experimental results
  Summary and conclusions
1. Fault Management
1.1 MPLS/GMPLS fault management
[Figure: nine-node network (nodes 1 to 9) with a working LSP and a backup LSP between the PSL node and the PML node. FIS: Fault Indication Signal.]
Protection Switch LSR (PSL): switches protected traffic from the working path to the corresponding backup path.
Protection Merge LSR (PML): merges the working and backup path traffic into a single outgoing LSP, or, if it is itself the destination, passes the traffic on to the higher layer protocols.
1.2 Classes of impairments (IETF RFC3469)
Path Failure (PF) ...
Path Degraded (PD) ...
Link Failure (LF): an indication from a lower layer that the link over which the path is carried has failed. If the lower layer supports detection and reporting of this fault (that is, any fault that indicates link failure, for example SONET Loss of Signal (LoS)), this may be used by the MPLS recovery mechanism.
Link Degraded (LD) ...
[Figure: single link failures on a nine-node network (nodes 1 to 9) with a working LSP and a backup LSP.]
1.3 The M:N model
M is the number of backup LSPs used to protect N working LSPs.
  1:1: one working LSP is protected/restored by one backup LSP.
  M:1: one working LSP is protected/restored by M backup LSPs.
  1:N: one backup LSP is used to protect/restore N working LSPs (shared backups).
  M:N: N working LSPs are restored by M backup LSPs.
  1:0: no protection (for instance, best-effort traffic).
  1+1: traffic is sent concurrently on both the working LSP and the backup LSP.
[Figure: working paths and backup paths for each model, labelled 1:1, M:1, 1:N, M:N, 0:1, 1+1.]
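The M:N labels above can be read as a simple classification on the number of backup and working LSPs. The following is a minimal sketch of that reading; the function name `classify_protection` and its parameters are hypothetical and do not come from the slides.

```python
def classify_protection(backups: int, working: int, concurrent: bool = False) -> str:
    """Return the M:N model label for `backups` backup LSPs protecting
    `working` working LSPs. `concurrent` marks traffic sent on both
    paths at the same time (the 1+1 case). Hypothetical helper."""
    if working < 1 or backups < 0:
        raise ValueError("need at least one working LSP and no negative backups")
    if backups == 0:
        return "no protection"
    if concurrent:
        if (backups, working) != (1, 1):
            raise ValueError("concurrent sending is only defined here for 1+1")
        return "1+1"
    if backups == 1 and working == 1:
        return "1:1"
    if working == 1:
        return "M:1"   # several backups for one working LSP
    if backups == 1:
        return "1:N"   # one shared backup for several working LSPs
    return "M:N"
```

For example, `classify_protection(1, 4)` yields the shared-backup label `1:N`, while `classify_protection(0, 1)` reports that the LSP is unprotected.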
1.4 a) Path provisioning classification
Path provisioning:
  Computed on demand / Pre-computed
  Established on demand / Pre-established
  Resource pre-allocated / Resource allocated on demand
1.4 b) Resource allocation classification
Resource allocation:
  Dedicated (1:1 or 1+1)
  Shared (1:N, M:N)
  No resources (1:0)
1.5.a) Global Backup Path
[Figure: nine-node network (nodes 1 to 9) with a working path from the ingress node (PSL) to the egress node (PML) and a global backup path between them.]
Advantages: path protection (1 PSL, 1 PML).
Disadvantages: slow failure recovery time; packet loss.
1.5.b) Reverse Backup Path
[Figure: nine-node network (nodes 1 to 9) with a working path from the ingress node to the egress node, a global backup path and a reverse backup path.]
Advantages: path protection; low packet loss.
Disadvantages: slow failure recovery time; packet reordering; high resource consumption.
1.5.c) Local Backup Path
[Figure: nine-node network (nodes 1 to 9) with a working path from the ingress node to the egress node and a local backup path.]
Advantages: fast failure recovery time; low packet loss.
Disadvantages: high resource consumption (for path protection).
1.5.d) Segment Backup Path
[Figure: nine-node network (nodes 1 to 9) with a working path from the ingress node to the egress node and a segment backup path.]
1.5.e) 1+1 Protection
[Figure: nine-node network (nodes 1 to 9) with two paths, Path 1 and Path 2, from the ingress node to the egress node.]
Advantages: path protection; very low packet loss; fast failure recovery time.
Disadvantages: high resource consumption.
2. Reducing failure probability and impact
2.1 Enhanced fault recovery methods for protected traffic services in GMPLS networks
Drawbacks and gaps:
  No protection considerations: protection is a secondary routing objective (no specific backup routing information)
  High complexity (in terms of computation time)
  High resource consumption (1+1)
  No traffic differentiation
  No physical network considerations (availability and reliability)
  Failure impact (fault recovery time, packet loss…)
Objectives:
  Protection as a main routing objective
  Low complexity
  Low resource consumption
  Traffic differentiation
  Failure probabilities
  Reducing failure impact
2.3 Minimization of the failure recovery time (Failure Impact)
Recovery phase | Features | Time reduction
Fault detection (TDET) | Depends on the technology | Cannot be reduced (except in the case of monitoring techniques)
Hold off time (THOF) | Depends on the lower layers | Setup (0-50 ms)
Notification time (TNOT) | Depends on the Failure Notification Distance and notification method | Minimizing the Failure Notification Distance and optimizing the process
New backup creation (TBR + TBS) | Depends on the routing and signaling method applied | Pre-establishing the backup
Backup activation (TBA) | Depends on the backup distance and signaling cross-connection process | Minimizing the backup distance and optimizing the process
Switchover (TSW) | Depends on the node technology | Cannot be reduced
Complete recovery (TCR) | Depends on the backup distance | Minimizing the backup distance
Source: IETF CCAMP (Common Control and Measurement Plane) Internet Drafts, Rabbat, Sharma...
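As a rough illustration of how the recovery phases combine, the sketch below adds them up, assuming the phases run strictly one after another and that pre-establishing the backup removes the creation term (TBR + TBS). The additive model, the class name `RecoveryTimes` and all numeric values are assumptions for illustration, not figures from the presentation.

```python
from dataclasses import dataclass


@dataclass
class RecoveryTimes:
    """Per-phase recovery times in milliseconds (hypothetical values)."""
    t_det: float  # fault detection
    t_hof: float  # hold-off time
    t_not: float  # notification time
    t_br: float   # first part of new backup creation (TBR)
    t_bs: float   # second part of new backup creation (TBS)
    t_ba: float   # backup activation
    t_sw: float   # switchover
    t_cr: float   # complete recovery

    def total(self, pre_established: bool = False) -> float:
        # Assumption: phases are sequential, so the total is their sum.
        # A pre-established backup skips the creation phase entirely.
        creation = 0.0 if pre_established else self.t_br + self.t_bs
        return (self.t_det + self.t_hof + self.t_not + creation
                + self.t_ba + self.t_sw + self.t_cr)


# Made-up numbers, only to show the effect of pre-establishing the backup.
example = RecoveryTimes(t_det=10.0, t_hof=0.0, t_not=5.0, t_br=20.0,
                        t_bs=30.0, t_ba=5.0, t_sw=1.0, t_cr=5.0)
on_demand_ms = example.total()
pre_established_ms = example.total(pre_established=True)
```

Under this model the only terms a routing algorithm can influence are the notification, activation and complete-recovery times (through shorter notification and backup distances) and the creation time (through pre-establishment), which matches the "Time reduction" column.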
2.4 Failure Probability
Label Switched Path failure probability
[Diagram: geographical conditions and network components determine the initial link failure probability, which leads to the current link failure probability (LFP). Failure probability models: MIL-HDBK-217, Bellcore/Telcordia. Statistical failure values: MTTR, MTBF, FR.]
For a Label Switched Path crossing N links with link failure probabilities LFP_1, LFP_2, LFP_3, …, LFP_N:
$$FP\_LSP = \sum_{i=1}^{N} LFP_i$$
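The LSP failure probability on this slide is the sum of the link failure probabilities along the path. A minimal sketch of that computation follows; the function names are hypothetical. For comparison it also computes the exact value under an added assumption of independent link failures, which is not stated in the slides; the sum is a close approximation of it when the link probabilities are small.

```python
import math
from typing import Iterable


def lsp_failure_probability(link_fps: Iterable[float]) -> float:
    """Sum of link failure probabilities along the LSP (the slide's formula)."""
    return sum(link_fps)


def lsp_failure_probability_exact(link_fps: Iterable[float]) -> float:
    """Probability that at least one link fails, assuming independent
    link failures (an assumption added here for comparison)."""
    return 1.0 - math.prod(1.0 - p for p in link_fps)


# Two links with the example values used later in the deck.
links = [1e-4, 4e-4]
approx = lsp_failure_probability(links)
exact = lsp_failure_probability_exact(links)
```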
2.5 Residual Failure Probability
Residual Label Switched Path failure probability (RFP)
[Figure: three panels on a seven-node network (nodes 1 to 7) with a working path containing one link with LFP = 1·10^-4 and one link with LFP = 4·10^-4.]
  No backup: RFP = (1 + 4)·10^-4 = 5·10^-4, equal to the LSP failure probability.
  One local backup: RFP = 1·10^-4.
  Local backups, segment backup, global backup or 1+1: RFP = 0.
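The three RFP values above follow from summing the failure probabilities of the working-path links that no backup covers. A minimal sketch, with hypothetical link names (`"low"`, `"high"`) and a hypothetical function name:

```python
import math
from typing import Dict, Set


def residual_failure_probability(link_fp: Dict[str, float],
                                 protected: Set[str]) -> float:
    """Sum the failure probabilities of working-path links that are
    not covered by any backup."""
    return sum(p for link, p in link_fp.items() if link not in protected)


# Working path with the two link values from the slide.
working_links = {"low": 1e-4, "high": 4e-4}

rfp_unprotected = residual_failure_probability(working_links, set())
rfp_one_local = residual_failure_probability(working_links, {"high"})
rfp_full = residual_failure_probability(working_links, {"low", "high"})
```

Protecting only the high-probability link already removes most of the residual failure probability, which is the motivation for combining backup placement with link failure probabilities.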
2.6 Case Study
[Figure: two working-path examples on a seven-node network (nodes 1 to 7), with links marked as high or low failure probability. One panel shows separated links to be protected, the other consecutive links to be protected. Backup options labelled in the figure: local backups, segment backup, global backup.]
2.7 GMPLS Protection with traffic differentiation
Protection is assigned to class types based on the network failure probability and the failure impact.
Protected traffic services:
  High-resilience requirement traffic services: traffic that is very sensitive to network faults (like EF DiffServ traffic). Residual failure probability and failure impact values should be set to zero. 1+1 or local backup paths can be used to achieve these values.
  Medium-resilience requirement traffic services: traffic that is sensitive to network faults (like AF1 or AF2 DiffServ traffic). However, resource consumption should be taken into account when routing the working and backup paths. Residual failure probabilities and failure impact values should be bounded in order to achieve the desired QoS with appropriate resource consumption. Segment and global backups can be used to protect these services.
Non-protected traffic services:
  No-resilience requirement traffic services: no protection is needed (BE traffic).
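The class-to-protection assignment above can be written as a small lookup. This is a sketch only: the table name, the function name and the string labels are hypothetical, and the mapping simply restates the three service levels from the slide.

```python
from typing import Dict, List, Tuple

# DiffServ class -> (resilience level, candidate backup types), per the slide.
PROTECTION_BY_CLASS: Dict[str, Tuple[str, List[str]]] = {
    "EF": ("high", ["1+1", "local"]),
    "AF1": ("medium", ["segment", "global"]),
    "AF2": ("medium", ["segment", "global"]),
    "BE": ("none", []),
}


def candidate_backups(class_type: str) -> List[str]:
    """Return the backup types a request of this class may use.
    Classes not named on the slide are rejected rather than guessed."""
    try:
        _level, backups = PROTECTION_BY_CLASS[class_type]
    except KeyError:
        raise ValueError(f"no protection policy defined for {class_type!r}")
    return list(backups)


def requires_zero_rfp(class_type: str) -> bool:
    """High-resilience traffic targets a residual failure probability of zero."""
    return PROTECTION_BY_CLASS[class_type][0] == "high"
```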
3. Two-step vs One-step routing algorithms
3.1 Two-step versus One-step routing
[Figure: three panels on an eight-node network (nodes 1 to 8), with links marked as high or low failure probability. Panel labels: two-step routing (shortest working path; trap topologies with MHA + global protection), one-step routing (working path and backup path), and smart protection (working path and backup path).]
Advantages: fast recovery time; low packet loss; low resource consumption; low failure probability.
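The trap-topology problem of two-step routing can be shown with a small sketch: the working path is computed first with a minimum-hop search, its links are then removed, and the backup is searched on what remains. The topology, node names and function names below are hypothetical and are not the network drawn on the slide; link-disjointness is assumed as the protection requirement.

```python
from collections import deque
from typing import Dict, FrozenSet, List, Optional, Set, Tuple

Adjacency = Dict[str, List[str]]


def build_adjacency(edges: List[Tuple[str, str]]) -> Adjacency:
    """Undirected adjacency lists from an edge list."""
    adj: Adjacency = {}
    for u, v in edges:
        adj.setdefault(u, []).append(v)
        adj.setdefault(v, []).append(u)
    return adj


def min_hop_path(adj: Adjacency, src: str, dst: str,
                 banned: Set[FrozenSet[str]] = frozenset()) -> Optional[List[str]]:
    """Breadth-first search for a minimum-hop path that avoids banned links."""
    prev: Dict[str, Optional[str]] = {src: None}
    queue = deque([src])
    while queue:
        u = queue.popleft()
        if u == dst:
            path = []
            node: Optional[str] = u
            while node is not None:
                path.append(node)
                node = prev[node]
            return path[::-1]
        for v in adj.get(u, []):
            if v not in prev and frozenset((u, v)) not in banned:
                prev[v] = u
                queue.append(v)
    return None


def two_step(adj: Adjacency, src: str, dst: str):
    """Step 1: minimum-hop working path. Step 2: backup on the remaining links."""
    working = min_hop_path(adj, src, dst)
    if working is None:
        return None, None
    used = {frozenset(link) for link in zip(working, working[1:])}
    return working, min_hop_path(adj, src, dst, banned=used)


# Hypothetical trap topology: the 3-hop path S-A-B-T blocks both 4-hop paths,
# although S-A-X-Y-T and S-U-V-B-T are link-disjoint from each other.
TRAP_EDGES = [("S", "A"), ("A", "B"), ("B", "T"),
              ("S", "U"), ("U", "V"), ("V", "B"),
              ("A", "X"), ("X", "Y"), ("Y", "T")]
```

Calling `two_step(build_adjacency(TRAP_EDGES), "S", "T")` finds a working path but no backup, even though a disjoint pair exists. A one-step method computes both paths jointly and so avoids this case, at the cost of a more complex computation.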
4. Experimental results
4.1 Failure Probability Analysis (*)
[Figure: failure probability evaluation, traffic differentiation, dynamic traffic. Axes: time vs. LSP failure probability (×10^-4). Series: non-protected traffic, protected traffic, no traffic differentiation.]
[Figure: failure probability evaluation, traffic differentiation, incremental traffic.]
[Figure: failure probability distribution (number of LSPs).]
[Figure: residual failure probability evaluation, request rejection ratio. Axes: trial number (1 to 10) vs. request rejection ratio (0.06 to 0.2). Series: no traffic differentiation, protected traffic, no protection.]
Setup: incremental and dynamic experiments, traffic differentiation, modified WSP.
4.2 Residual Failure Probability Analysis
[Figure: residual failure probability evaluation over time, segment backups and traffic differentiation.]
[Figure: residual failure probability evaluation over time, local backups and traffic differentiation.]
Setup: local and segment protection, dynamic traffic, traffic differentiation, modified WSP.
Result: local and segment backups give a similar RFP; local backups require more resources but have a lower failure impact.
5. Summary and conclusions
5.1 Summary and conclusions
  Failure impact
  Minimum failure notification
  Minimum resource consumption (segment + probabilities)
  Minimum residual failure probabilities
  Network availability and reliability
  Failure probability evaluation models
  Resource consumption
  Protected-traffic services
  Enhanced routing algorithms
  Two-step routing methods
  Quality of protection degree
Thank you!