Protocol implementation Next-hop resolution Reliability and graceful restart.


What is a next hop?
The destination of the packets I am sending
  – Not the same as the interface
  – An Ethernet interface will have many nodes behind it
  – A directly connected next hop is 1 hop away
E.g. RSVP sends a PATH message to the next downstream node
  – Next hop may be directly connected (strict ERO)
  – Or not (loose ERO)
OSPF sends an LS update to the other end of a link or to a neighbor on an Ethernet
  – Always directly connected
BGP has an iBGP next hop for each of its paths
  – Not directly connected

Next-hop
If the next hop is not directly connected, the way to reach it depends on the IGP
  – May change when IGP routing changes
  – Will have to use a different interface to reach it
  – Need to keep track of these changes
Next-hop resolution

Periodic resolution
  – May take a bit more time. But next hops will not be too many. Or will they? Tunnels, VLANs …
  – Quagga uses this approach, through the IPV4_LOOKUP_NEXTHOP command
Registration/notification
  – RSVP would tell zebra which next hops it is interested in
  – Zebra will notify RSVP when something changes in the IGP path to one of them
Better scaling for RSVP
Difficult to ensure good scaling inside zebra
  – Various protocols may register 1000s of next hops
More complex code in zebra
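
The registration/notification approach can be sketched as follows. This is only an illustration under stated assumptions: the names `NextHopTracker`, `register` and `update_routes` are hypothetical and are not zebra's actual interface, and the routing table is reduced to a list of (prefix, interface) pairs resolved by longest-prefix match.

```python
import ipaddress
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical names throughout; this is NOT the zebra/Quagga API.
Route = Tuple[ipaddress.IPv4Network, str]  # (prefix, outgoing interface)


class NextHopTracker:
    """Plays the role of zebra: resolves next hops and notifies clients."""

    def __init__(self) -> None:
        self.routes: List[Route] = []
        self.watchers: Dict[ipaddress.IPv4Address, List[Callable]] = {}
        self.resolved: Dict[ipaddress.IPv4Address, Optional[str]] = {}

    def resolve(self, nexthop: ipaddress.IPv4Address) -> Optional[str]:
        # Longest-prefix match of the next hop against the current routes.
        best: Optional[Route] = None
        for prefix, iface in self.routes:
            if nexthop in prefix and (
                best is None or prefix.prefixlen > best[0].prefixlen
            ):
                best = (prefix, iface)
        return best[1] if best else None

    def register(self, nexthop: str, callback: Callable) -> Optional[str]:
        # A client (e.g. RSVP) says which next hop it is interested in.
        addr = ipaddress.IPv4Address(nexthop)
        self.watchers.setdefault(addr, []).append(callback)
        self.resolved[addr] = self.resolve(addr)
        return self.resolved[addr]

    def update_routes(self, routes: List[Tuple[str, str]]) -> None:
        # Called when IGP routing changes; notify only affected clients.
        self.routes = [(ipaddress.IPv4Network(p), i) for p, i in routes]
        for addr, callbacks in self.watchers.items():
            new_iface = self.resolve(addr)
            if new_iface != self.resolved[addr]:
                self.resolved[addr] = new_iface
                for cb in callbacks:
                    cb(addr, new_iface)


tracker = NextHopTracker()
tracker.update_routes([("10.0.0.0/8", "eth0")])
events = []
tracker.register("10.1.2.3", lambda nh, iface: events.append((str(nh), iface)))
# A more specific route appears: the registered next hop moves to eth1.
tracker.update_routes([("10.0.0.0/8", "eth0"), ("10.1.0.0/16", "eth1")])
```

The scaling concern from the slide is visible in `update_routes`: every routing change walks all registered next hops, so thousands of registrations make each change more expensive inside the tracker. Periodic resolution would instead have each client call `resolve` on a timer.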

Network Reliability
Availability: how many nines?
  – 99.999% is 5.26 min of downtime/year
  – 99.9999% is 31.5 sec of downtime/year
Telephone networks are between 5 and 6 nines
  – Internet will have to get there
  – Currently at 4 nines? (vendors claim 5)
  – Very important with the new types of traffic: VoIP, IPTV
What can go wrong (% of failures for US telephone network ca. 1992):
  – Hardware failures (19%)
  – Software failures (14%)
  – Human errors (49%)
  – Vandalism/Terrorism
  – Acts of nature (11%)
  – Overload (6% but had the largest impact on customers)
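
The downtime figures follow directly from the availability percentage. A small sketch of the arithmetic, assuming a 365-day year (the function name is illustrative):

```python
SECONDS_PER_YEAR = 365 * 24 * 3600  # assumption: 365-day year


def downtime_seconds_per_year(nines: int) -> float:
    """Allowed downtime per year for an availability of `nines` nines."""
    unavailability = 10.0 ** (-nines)  # e.g. 5 nines -> 0.00001
    return SECONDS_PER_YEAR * unavailability


for n in (4, 5, 6):
    secs = downtime_seconds_per_year(n)
    print(f"{n} nines: {secs:.1f} s/year ({secs / 60:.2f} min/year)")
```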

Hardware failures
Link failures
  – Protocols can cope with that: re-route, may be slow
  – More aggressive repair methods: we will see them later
Router failures
  – Cannot do much, just add redundancy: power supplies, fans, disks, etc.
  – Line-card failure is similar to a link failure
  – Control processor failure is more serious: always have two of them, primary and backup

Modern router architectures
Dual controllers
  – For running the control plane
Multiple line cards
  – Can operate without the controllers
  – Router can forward traffic even when the control plane crashes
  – Called non-stop forwarding or head-less operation

Software failures
When the primary fails, start using the backup
  – Switchover
Must be as fast as possible
  – Things in the network change in the meantime
  – Need to minimize this window
What happens with the control software?
  – Need to keep the primary and backup instances in sync
  – How tight is this synchronization?

Tight synchronization
Both primary and backup are active; keep them in sync by one of:
Sending them both the same input (i.e. duplicate control packets)
  – Fastest possible switchover
  – Expensive, may need to duplicate packets
  – Does not work for TCP-based protocols
Having the primary keep sending state updates to the backup
  – May need to send too many messages
Being totally in sync is not easy
  – Needs transactional communication

Loose synchronization
Backup is idle
  – But we keep its configuration up to date
  – Each configuration change on the primary is mirrored on the backup
Backup instance is started when the primary fails
  – Switchover will take longer
Much, much simpler
  – Configuration changes are much less frequent
Variation:
  – Keep only the RIB process in sync in both primary and backup

Non-stop forwarding
Key concept
  – Forwarding happens in the line cards
  – Even if the control processor fails, forwarding can continue
  – Non-stop forwarding, head-less operation
Old common sense: when a router's s/w crashes, do not use the router
  – But with head-less operation it is OK to continue using routers whose s/w has crashed
  – Assuming their s/w will be operational again soon

Special case
Planned restart
  – For s/w upgrade: these are a significant percentage of downtime
  – For refresh: memory is leaking but the s/w is still operational; restart to get a clean start
I can use graceful restart

Graceful restart
Other routers in the network will keep using a neighbor router
  – Even if it looks like its control plane has failed
  – Assuming it will come back soon
Needs coordination
  – The failed router needs to do some special processing when it comes back
  – It has to tell its neighbors first that it supports graceful restart
Zero impact on the network
  – The failed router will have the chance to restart its s/w and come back
  – Nobody in the rest of the network will know that something happened

How does it work?
Used for all protocols by now
  – OSPF, BGP, RSVP-TE …
The neighbor will discover that the router is dead or has restarted
  – HELLO timeout, different information in the HELLOs, etc.
  – But will ignore it for a certain time period
If the failed router comes back within this period
  – It will re-sync its state (database exchange for OSPF, resend all the LSPs for RSVP, …)
  – And all is back to normal
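
The neighbor-side behavior described above can be sketched as a small state machine. This is a protocol-independent illustration, not the procedure of any specific RFC: the class, the state names and the timer values are all assumptions, and time is passed in explicitly to keep the example deterministic.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class State(Enum):
    UP = "up"
    GRACE = "grace"  # neighbor looks dead, but we keep forwarding through it
    DOWN = "down"    # grace period expired; stop using the neighbor


@dataclass
class Neighbor:
    supports_gr: bool             # advertised by the neighbor before it failed
    hello_timeout: float = 10.0   # hypothetical value, seconds
    grace_period: float = 120.0   # hypothetical value, seconds
    state: State = State.UP
    last_hello: float = 0.0
    grace_started: Optional[float] = None

    def on_tick(self, now: float) -> None:
        if self.state is State.UP and now - self.last_hello > self.hello_timeout:
            if self.supports_gr:
                # Ignore the failure for a while instead of re-routing.
                self.state, self.grace_started = State.GRACE, now
            else:
                self.state = State.DOWN
        elif self.state is State.GRACE and now - self.grace_started > self.grace_period:
            self.state = State.DOWN

    def on_hello(self, now: float) -> str:
        """Return the action to take when a HELLO arrives from the neighbor."""
        previous = self.state
        self.state, self.last_hello, self.grace_started = State.UP, now, None
        if previous is State.GRACE:
            return "resync"          # came back in time: re-sync state
        if previous is State.DOWN:
            return "new_adjacency"   # too late: treat as a brand-new neighbor
        return "refresh"
```

For example, a neighbor created with `supports_gr=True` moves to `GRACE` on `on_tick(11.0)`, and a later `on_hello(50.0)` returns `"resync"` and puts it back in `UP`.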

Example: RSVP
Use HELLOs
Special recovery label messages
Restarting router needs to remember the labels it allocated before the crash
  – Where? Shared memory, or recover them from the forwarding plane
  – Why? Must use the same labels again; must make sure it does not use an allocated label for some other LSP
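
The two "why" points can be illustrated with a label allocator that is re-seeded after a restart. This is a sketch only: `LabelAllocator`, its methods and the label range are assumptions, and the recovered entries stand in for whatever is read back from shared memory or the forwarding plane.

```python
from typing import Dict


class LabelAllocator:
    """Hypothetical allocator; not taken from any RSVP-TE implementation."""

    def __init__(self, first: int = 16, last: int = 1023) -> None:
        # MPLS label values 0-15 are reserved; the upper bound is arbitrary.
        self.first, self.last = first, last
        self.in_use: Dict[int, str] = {}  # label -> LSP identifier

    def recover(self, forwarding_entries: Dict[int, str]) -> None:
        # After a restart, mark labels still installed in the forwarding
        # plane as taken before handing out any new ones.
        self.in_use.update(forwarding_entries)

    def allocate(self, lsp: str) -> int:
        # An LSP that already owns a label must get the same label again.
        for label, owner in self.in_use.items():
            if owner == lsp:
                return label
        # Otherwise pick the lowest label not used by any other LSP.
        for label in range(self.first, self.last + 1):
            if label not in self.in_use:
                self.in_use[label] = lsp
                return label
        raise RuntimeError("label space exhausted")


alloc = LabelAllocator()
alloc.recover({16: "lsp-a", 18: "lsp-b"})  # state surviving the crash
same = alloc.allocate("lsp-a")   # existing LSP keeps its old label
fresh = alloc.allocate("lsp-c")  # new LSP avoids the recovered labels
```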

Example: OSPF
The trick is to re-establish the adjacencies after a failure
Remember the set of neighbors
  – Shared memory or in the backup controller
After restart do not originate any LSAs
Just re-establish adjacencies and re-sync the database

Graceful restart catches
All routers in the network should implement this for it to work
Mostly for planned restarts:
  – S/w upgrades
  – Refreshes (if a router runs low on memory)
  – But it is possible to use it for crashes too!
It cannot work if something changes in the network while the restart is going on
  – There may be routing loops

Router self-monitoring
Automatically restart failed or stuck processes
A separate monitor process
  – Keeps an eye on other processes
  – If there is a failure, the failed process is restarted (of course it may fail again)
  – Heartbeats to determine liveness
  – Failure may not necessarily be a crash: could be a software bug that causes an infinite loop or very, very slow processing
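
A heartbeat-based monitor along these lines might look like the sketch below. The `Monitor` class, the timeout value and the `restart` callback are assumptions made for illustration; a real monitor would spawn and kill operating-system processes rather than call a Python function.

```python
from typing import Callable, Dict, List


class Monitor:
    """Hypothetical watchdog: restarts processes whose heartbeats stop."""

    def __init__(self, timeout: float, restart: Callable[[str], None]) -> None:
        self.timeout = timeout
        self.restart = restart
        self.last_beat: Dict[str, float] = {}

    def heartbeat(self, name: str, now: float) -> None:
        # Called by each monitored process to show it is alive.
        self.last_beat[name] = now

    def check(self, now: float) -> List[str]:
        # A crashed process and one stuck in an infinite loop look the
        # same from here: both stop sending heartbeats.
        restarted = []
        for name, last in self.last_beat.items():
            if now - last > self.timeout:
                self.restart(name)
                self.last_beat[name] = now  # fresh window after the restart
                restarted.append(name)
        return restarted


log: List[str] = []
monitor = Monitor(timeout=5.0, restart=log.append)
monitor.heartbeat("ospfd", now=0.0)
monitor.heartbeat("rsvpd", now=0.0)
monitor.heartbeat("ospfd", now=4.0)  # ospfd is still alive
monitor.check(now=6.0)               # rsvpd has been silent for too long
```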

Why is it important?
Remember the PoP structure
  – Need dual routers for reliability
  – If I had a single router that was extra-reliable I could save a lot of money

Issues
Strict isolation
  – VMs
  – Other methods
Global resource coordination
  – For example memory