Lowest-Cost Routing EE 122: Intro to Communication Networks Fall 2010 (MW 4-5:30 in 101 Barker) Scott Shenker TAs: Sameer Agarwal, Sara Alspaugh, Igor Ganichev, Prayag Narula http://inst.eecs.berkeley.edu/~ee122/ Materials with thanks to Jennifer Rexford, Ion Stoica, Vern Paxson and other colleagues at Princeton and UC Berkeley
Announcements Revision to lecture schedule Advanced topics on routing Revision to homework schedule 3a: get midterm questions right 3b: new topics Changes to class structure 5 minute “technology break” Administrivia right after break Group problem solving (when possible) Sit next to smart people
Goals of Today’s Lecture Routing overview: Routing vs. forwarding Routing topics Link-state routing (Dijkstra’s algorithm) Distance-vector routing (Bellman-Ford)
Forwarding: “data plane” Forwarding vs. Routing Forwarding: “data plane” Directing a data packet to an outgoing link Individual router using a forwarding table Routing: “control plane” Computing paths the packets will follow Routers talking amongst themselves Jointly creating a forwarding table
Why Does Routing Matter? Routing provides connectivity! Without routing, the network doesn’t function Routing finds “good” paths Propagation delay, throughput, packet loss Routing allows network to tolerate failures Limits packet loss during disruptions Routing can also provide “Traffic Engineering” Balance traffic over the routers and links Avoid congestion by directing traffic to lightly-loaded links (Not covered today) Quality of the path affects user performance
Three Lectures on Routing Today: Lowest-cost routing Simple algorithms, basic issues Wednesday: Policy-based routing Interdomain routing Monday: Advanced topics Traffic engineering Improved resilience What future routing algorithms might look like….
Routing Requires Knowing Network Centralized global state Single entity knows the complete network structure Can calculate all routes centrally Problems with this approach? Distributed global state Every router knows the complete network structure Independently calculates routes Distributed global computation Every router knows only about its neighboring routers Participates in global joint calculation of routes Link State Routing E.g. Algorithm: Dijkstra E.g. Protocol: OSPF Distance Vector Routing E.g. Algorithm: Bellman-Ford E.g. Protocol: RIP Global centralized state - not scalable, not robust to failures of the central entity Distributed global state - how to ensure that each node has a consistent picture of the entire network Distributed no-global state - ensure consistent routes, looping.convergence time
Modeling a Network Modeled as a graph Goal of Routing Routers nodes Link edges Possible edge costs Hop Delay Congestion level …. Goal of Routing Determine “good” path from source to destination “Good” usually means the lowest “cost” path Where cost is usually hop-count or latency A E D C B F 2 1 3 5
From Model to Reality In reality, attach prefixes to nodes Calculate routing tables in terms of prefixes But ignore this for now….. Just calculate paths between routers
Why Isn’t All Routing Lowest-Cost? Lowest-cost routing assumes all nodes evaluate paths the same way i.e., use same “cost” metric Interdomain routing: Different domains care about different things Can exercise general “policy” goals Requires very different route computation Talk about on Wednesday….
Link State Routing Each router has complete network picture Topology Link costs How does each router get the global state? Each router reliably floods information about its neighbors to every other router (more later) Each router independently calculates the shortest path from itself to every other router Using, for example, Dijkstra’s Algorithm
Link State: Control Traffic Each node floods its local information Each node ends up knowing the entire network topology node Host A Host B Host E Host D Host C N1 N2 N3 N4 N5 N7 N6
Link State: Node State Host A Host B Host E Host D Host C N1 N2 N3 N4
Dijkstra’s Shortest Path Algorithm INPUT: Network topology (graph), with link costs OUTPUT: Least cost paths from one node to all other nodes Produces “tree” of routes (why?)
Notation c(i,j): link cost from node i to j; cost infinite if not direct neighbors; ≥ 0 D(v): current value of cost of path from source to destination v p(v): predecessor node along path from source to v, that is next to v S: set of nodes whose least cost path definitively known A E D C B F 2 1 3 5 Source
Dijsktra’s Algorithm c(i,j): link cost from node i to j D(v): current cost source v p(v): predecessor node along path from source to v, that is next to v S: set of nodes whose least cost path definitively known 1 Initialization: 2 S = {A}; 3 for all nodes v 4 if v adjacent to A 5 then D(v) = c(A,v); 6 else D(v) = ; 7 8 Loop 9 find w not in S such that D(w) is a minimum; 10 add w to S; 11 update D(v) for all v adjacent to w and not in S: 12 if D(w) + c(w,v) < D(v) then // w gives us a shorter path to v than we’ve found so far 13 D(v) = D(w) + c(w,v); p(v) = w; 14 until all nodes in S;
Example: Dijkstra’s Algorithm Step 1 2 3 4 5 start S A D(B),p(B) 2,A D(C),p(C) 5,A D(D),p(D) 1,A D(E),p(E) D(F),p(F) 1 Initialization: 2 S = {A}; 3 for all nodes v 4 if v adjacent to A 5 then D(v) = c(A,v); 6 else D(v) = ; … A E D C B F 2 1 3 5
Example: Dijkstra’s Algorithm Step 1 2 3 4 5 start S A D(B),p(B) 2,A D(C),p(C) 5,A D(D),p(D) 1,A D(E),p(E) D(F),p(F) … 8 Loop 9 find w not in S s.t. D(w) is a minimum; 10 add w to S; update D(v) for all v adjacent to w and not in S: If D(w) + c(w,v) < D(v) then D(v) = D(w) + c(w,v); p(v) = w; 14 until all nodes in S; A E D C B F 2 1 3 5
Example: Dijkstra’s Algorithm Step 1 2 3 4 5 start S A AD D(B),p(B) 2,A D(C),p(C) 5,A D(D),p(D) 1,A D(E),p(E) D(F),p(F) … 8 Loop 9 find w not in S s.t. D(w) is a minimum; 10 add w to S; update D(v) for all v adjacent to w and not in S: If D(w) + c(w,v) < D(v) then D(v) = D(w) + c(w,v); p(v) = w; 14 until all nodes in S; 5 3 B C 2 5 A 2 1 F 3 1 2 D E 1
Example: Dijkstra’s Algorithm Step 1 2 3 4 5 start S A AD D(B),p(B) 2,A D(C),p(C) 5,A 4,D D(D),p(D) 1,A D(E),p(E) 2,D D(F),p(F) … 8 Loop 9 find w not in S s.t. D(w) is a minimum; 10 add w to S; update D(v) for all v adjacent to w and not in S: If D(w) + c(w,v) < D(v) then D(v) = D(w) + c(w,v); p(v) = w; 14 until all nodes in S; 5 3 B C 2 5 A 2 1 F 3 1 2 D E 1
Example: Dijkstra’s Algorithm Step 1 2 3 4 5 start S A AD ADE D(B),p(B) 2,A D(C),p(C) 5,A 4,D 3,E D(D),p(D) 1,A D(E),p(E) 2,D D(F),p(F) 4,E … 8 Loop 9 find w not in S s.t. D(w) is a minimum; 10 add w to S; update D(v) for all v adjacent to w and not in S: If D(w) + c(w,v) < D(v) then D(v) = D(w) + c(w,v); p(v) = w; 14 until all nodes in S; 5 3 B C 2 5 A 2 1 F 3 1 2 D E 1
Example: Dijkstra’s Algorithm Step 1 2 3 4 5 start S A AD ADE ADEB D(B),p(B) 2,A D(C),p(C) 5,A 4,D 3,E D(D),p(D) 1,A D(E),p(E) 2,D D(F),p(F) 4,E … 8 Loop 9 find w not in S s.t. D(w) is a minimum; 10 add w to S; update D(v) for all v adjacent to w and not in S: If D(w) + c(w,v) < D(v) then D(v) = D(w) + c(w,v); p(v) = w; 14 until all nodes in S; 5 3 B C 2 5 A 2 1 F 3 1 2 D E 1
Example: Dijkstra’s Algorithm Step 1 2 3 4 5 start S A AD ADE ADEB ADEBC D(B),p(B) 2,A D(C),p(C) 5,A 4,D 3,E D(D),p(D) 1,A D(E),p(E) 2,D D(F),p(F) 4,E … 8 Loop 9 find w not in S s.t. D(w) is a minimum; 10 add w to S; update D(v) for all v adjacent to w and not in S: If D(w) + c(w,v) < D(v) then D(v) = D(w) + c(w,v); p(v) = w; 14 until all nodes in S; 5 3 B C 2 5 A 2 1 F 3 1 2 D E 1
Example: Dijkstra’s Algorithm Step 1 2 3 4 5 start S A AD ADE ADEB ADEBC ADEBCF D(B),p(B) 2,A D(C),p(C) 5,A 4,D 3,E D(D),p(D) 1,A D(E),p(E) 2,D D(F),p(F) 4,E … 8 Loop 9 find w not in S s.t. D(w) is a minimum; 10 add w to S; update D(v) for all v adjacent to w and not in S: If D(w) + c(w,v) < D(v) then D(v) = D(w) + c(w,v); p(v) = w; 14 until all nodes in S; A E D C B F 2 1 3 5
Example: Dijkstra’s Algorithm Step 1 2 3 4 5 start S A AD ADE ADEB ADEBC ADEBCF D(B),p(B) 2,A D(C),p(C) 5,A 4,D 3,E D(D),p(D) 1,A D(E),p(E) 2,D D(F),p(F) 4,E A E D C B F 2 1 3 5 To determine path A C (say), work backward from C via p(v)
The Forwarding Table Running Dijkstra at node A gives the shortest path from A to all destinations We then construct the forwarding table Destination Link B (A,B) C (A,D) D E F A E D C B F 2 1 3 5 Separate routing and and forwarding tables - not always the case. For efficiency, hardware implementation
Complexity How much processing does running the Dijkstra algorithm take? Assume a network consisting of N nodes Each iteration: check all nodes w not in S N(N+1)/2 comparisons: O(N2) More efficient implementations: O(N log(N))
Obtaining Global State Flooding Each router sends link-state information out its links The next node sends it out through all of its links except the one where the information arrived Note: need to remember previous msgs & suppress duplicates! X A C B D (a) X A C B D (b) X A C B D (c) X A C B D (d)
Flooding the Link State Reliable flooding Ensure all nodes receive link-state information Ensure all nodes use the latest version Challenges Packet loss Out-of-order arrival Solutions Acknowledgments and retransmissions Sequence numbers
When to Initiate Flooding Topology change Link or node failure Link or node recovery Configuration change Link cost change Potential problems with making cost dynamic! Periodically Refresh the link-state information Typically (say) 30 minutes Corrects for possible corruption of the data
Oscillating Load-Dependent Routing Assume link cost = amount of carried traffic All traffic sent to A A D C B 1 1+e e initially A D C B 2+e 1+e 1 … recompute routing A D C B 2+e 1+e 1 … recompute A D C B 2+e e 1+e 1 … recompute A D B Very Hard to Avoid Oscillations! C
Detecting Topology Changes Beaconing Periodic “hello” messages in both directions Detect a failure after a few missed “hellos” Performance trade-offs Detection speed Overhead on link bandwidth and CPU Likelihood of false detection “hello”
Convergence Getting consistent routing information to all nodes E.g., all nodes having the same link-state database Consistent forwarding after convergence All nodes have the same link-state database All nodes forward packets on shortest paths The next router on the path forwards to the next hop
Convergence Delay Time elapsed before every router has a consistent picture of the network Sources of convergence delay Detection latency Flooding of link-state information Recomputation of forwarding tables Performance during convergence period Lost packets due to blackholes and TTL expiry Looping packets consuming resources Out-of-order packets reaching the destination Very bad for VoIP, online gaming, and video
Reducing Convergence Delay Faster detection Smaller hello timers Link-layer technologies that can detect failures Faster flooding Flooding immediately Sending link-state packets with high-priority Faster computation Faster processors on the routers Incremental Dijkstra algorithm Faster forwarding-table update Data structures supporting incremental updates
Transient Disruptions Inconsistent link-state database Some routers know about failure before others The shortest paths are no longer consistent Can cause transient forwarding loops A E D C B F A E D C B F Loop! A and D think that this is the path to C E thinks that this is the path to C
Scaling Link-State Routing Overhead of link-state routing Flooding link-state packets throughout the network Running Dijkstra’s shortest-path algorithm Becomes unscalable when 100s of routers Introducing hierarchy through “areas” Area 2 Area 1 Area 0 area border router Area 3 Area 4
Link-State Routing Is Conceptually Simple Each router keeps track of its incident links Link cost, and whether the link is up or down Each router broadcasts the link state To give every router a complete view of the graph Each router runs Dijkstra’s algorithm Compute shortest paths, then construct forwarding table Example protocols Open Shortest Path First (OSPF) Intermediate System – Intermediate System (IS-IS) Challenges: scaling, transient disruptions Any ideas for improvement?
Question Why use different routing algorithms at L2 and L3? Is Link-State “plug-and-play”? Could we make it “plug-and-play?”
Questions Before We Proceed? 5 Minute Break Questions Before We Proceed?
Feedback on Course Course moving: Lectures should be: Too slowly: 13% Too quickly: 30% OK: 57% Lectures should be: Harder: 18% Easier: 26% Same: 56% Homework should be: Harder: 33% Easier: 13% Same: 54% Include History/Politics: Yes: 76% Include worked ex’s: Yes: 82% Project is: Too easy: 11% Too hard: 33% OK: 56% Support in Section Yes: 80% Don’t go: 12%
Selected Comments Use newsgroups Weekly homeworks Relate project to course Don't skip break Test too easy Bspace is evil “Cancel the project and final, buy us dinner” “Make lectures less boring”
More Scalable Routing Algorithms? Avoid need for global state consistency Just focus on computing routes Distribute the computation, not the state….
Distance Vector Routing Each router knows the links to its neighbors Does not flood this information to the whole network Each router has provisional “shortest path” E.g.: Router A: “I can get to router B with cost 11 via next hop router D” Routers exchange this information with their neighboring routers Again, no flooding the whole network Routers update their idea of the best path using info from neighbors This iterative process converges with set of shortest paths
Information Flow in Distance Vector Host A Host B Host E Host D Host C N1 N2 N3 N4 N5 N7 N6
Information Flow in Distance Vector Host A Host B Host E Host D Host C N1 N2 N3 N4 N5 N7 N6
Information Flow in Distance Vector Host A Host B Host E Host D Host C N1 N2 N3 N4 N5 N7 N6
Why Is This Different From Flooding?
Bellman-Ford Algorithm INPUT: Link costs to each neighbor Not full topology OUTPUT: Next hop to each destination and the corresponding cost Does not give the complete path to the destination
Bellman-Ford - Overview Each router maintains a table Row for each possible destination Column for each directly-attached neighbor to node Entry in row Y and column Z of node X best known distance from X to Y, via Z as next hop = DZ(X,Y) Each local iteration caused by: Local link cost change Message from neighbor Notify neighbors only if least cost path to any destination changes Neighbors then notify their neighbors if necessary wait for (change in local link cost or msg from neighbor) recompute distance table if least cost path to any dest has changed, notify neighbors Each node: Note: for simplicity in this lecture examples we show only the shortest distances to each destination
Bellman-Ford - Overview Each router maintains a table Row for each possible destination Column for each directly-attached neighbor to node Entry in row Y and column Z of node X best known distance from X to Y, via Z as next hop = DZ(X,Y) Neighbor (next-hop) Node A B C 2 8 3 7 D 4 Note: for simplicity in this lecture examples we show only the shortest distances to each destination 3 B D 2 1 1 A C 7 Destinations DC(A, D)
Bellman-Ford - Overview Each router maintains a table Row for each possible destination Column for each directly-attached neighbor to node Entry in row Y and column Z of node X best known distance from X to Y, via Z as next hop = DZ(X,Y) Node A B C 2 8 3 7 D 4 Note: for simplicity in this lecture examples we show only the shortest distances to each destination 3 B D 2 1 1 A C 7 Smallest distance in row Y = shortest Distance of A to Y, D(A, Y)
Distance Vector Algorithm (cont’d) 1 Initialization: 2 for all neighbors V do 3 if V adjacent to A 4 D(A, V) = c(A,V); else D(A, V) = ∞; send D(A, Y) to all neighbors loop: 8 wait (until A sees a link cost change to neighbor V /* case 1 */ 9 or until A receives update from neighbor V) /* case 2 */ 10 if (c(A,V) changes by ±d) /* case 1 */ 11 for all destinations Y that go through V do 12 DV(A,Y) = DV(A,Y) ± d 13 else if (update D(V, Y) received from V) /* case 2 */ /* shortest path from V to some Y has changed */ 14 DV(A,Y) = DV(A,V) + D(V, Y); /* may also change D(A,Y) */ 15 if (there is a new minimum for destination Y) 16 send D(A, Y) to all neighbors 17 forever c(i,j): link cost from node i to j DZ(A,V): cost from A to V via Z D(A,V): cost of A’s best path to V
Example:1st Iteration (C A) Node A Node B B C 2 8 ∞ 7 D A C D 2 ∞ 1 3 3 B D 2 1 1 A C 7 DC(A, B) = DC(A,C) + D(C, B) = 7 + 1 = 8 DC(A, D) = DC(A,C) + D(C, D) = 7 + 1 = 8 Node C Node D loop: … 13 else if (update D(A, Y) from C) 14 DC(A,Y) = DC(A,C) + D(C, Y); 15 if (new min. for destination Y) 16 send D(A, Y) to all neighbors 17 forever A B D 7 ∞ 1 B C A ∞ 3 1
Example: 1st Iteration (B A) Node A Node B B C 2 8 3 7 D 5 A C D 2 ∞ 1 3 3 B D 2 1 1 A C 7 DB(A, C) = DB(A,B) + D(B, C) = 2 + 1 = 3 DB(A, D) = DB(A,B) + D(B, D) = 2 + 3 = 5 Node C Node D loop: … 13 else if (update D(A, Y) from B) 14 DB(A,Y) = DB(A,B) + D(B, Y); 15 if (new min. for destination Y) 16 send D(A, Y) to all neighbors 17 forever A B D 7 ∞ 1 B C A ∞ 3 1
Example: End of 1st Iteration Node A Node B B C 2 8 3 7 D 5 A C D 2 3 ∞ 9 1 4 3 B D 2 1 1 A C 7 Node C Node D End of 1st Iteration All nodes knows the best two-hop paths A B D 7 3 ∞ 9 1 4 B C A 5 8 3 2 4 1
Example: 2nd Iteration (A B) Node A Node B B C 2 8 3 7 D 5 A C D 2 3 ∞ 5 1 4 7 3 B D 2 1 1 A C 7 DA(B, C) = DA(B,A) + D(A, C) = 2 + 3 = 5 DA(B, D) = DA(B,A) + D(A, D) = 2 + 5 = 7 Node C Node D loop: … 13 else if (update D(B, Y) from A) 14 DA(B,Y) = DA(B,A) + D(A, Y); 15 if (new min. for destination Y) 16 send D(B, Y) to all neighbors 17 forever A B D 7 3 ∞ 9 1 4 B C A 5 8 3 2 4 1
Example: End of 2nd Iteration Node A Node B B C 2 8 3 7 D 4 A C D 2 3 11 5 1 4 7 3 B D 2 1 1 A C 7 Node C Node D End of 2nd Iteration All nodes knows the best three-hop paths A B D 7 3 6 9 1 4 12 B C A 5 4 3 2 1
Example: End of 3rd Iteration Node A Node B B C 2 8 3 7 D 4 A C D 2 3 6 5 1 4 7 3 B D 2 1 1 A C 7 Node C Node D End of 2nd Iteration: Algorithm Converges! A B D 7 3 5 9 1 4 11 B C A 5 4 3 2 1
Distance Vector: Link Cost Changes loop: 8 wait (until A sees a link cost change to neighbor V 9 or until A receives update from neighbor V) / 10 if (c(A,V) changes by ±d) /* case 1 */ 11 for all destinations Y that go through V do 12 DV(A,Y) = DV(A,Y) ± d 13 else if (update D(V, Y) received from V) /* case 2 */ 14 DV(A,Y) = DV(A,V) + D(V, Y); 15 if (there is a new minimum for destination Y) 16 send D(A, Y) to all neighbors 17 forever 1 B 4 1 A C 50 Node B A C 4 6 9 1 A C 1 6 9 A C 1 6 9 A C 1 3 “good news travels fast” Node C A B 50 5 54 1 A B 50 5 54 1 A B 50 2 51 1 A B 50 2 51 1 Link cost changes here Algorithm terminates time
DV: Count to Infinity Problem loop: 8 wait (until A sees a link cost change to neighbor V 9 or until A receives update from neighbor V) / 10 if (c(A,V) changes by ±d) /* case 1 */ 11 for all destinations Y that go through V do 12 DV(A,Y) = DV(A,Y) ± d 13 else if (update D(V, Y) received from V) /* case 2 */ 14 DV(A,Y) = DV(A,V) + D(V, Y); 15 if (there is a new minimum for destination Y) 16 send D(A, Y) to all neighbors 17 forever 60 B 4 1 A C 50 A C 4 6 9 1 A C 60 6 9 1 A C 60 6 9 1 A C 60 8 9 1 Node B “bad news travels slowly” Node C A B 50 5 54 1 A B 50 5 54 1 A B 50 7 101 1 A B 50 7 101 1 … time Link cost changes here
Distance Vector: Poisoned Reverse If B routes through C to get to A: B tells C its (B’s) distance to A is infinite (so C won’t route to A via B) Will this completely solve count to infinity problem? A C 1 4 50 B 60 Node B A C 4 6 9 1 A C 60 6 9 1 A C 60 6 9 1 A C 60 51 9 1 A C 60 51 9 1 Node C A B 50 5 ∞ 1 A B 50 5 ∞ 1 A B 50 ∞ 1 A B 50 ∞ 1 A B 50 ∞ 1 Explain full table version of DV in section from the other set of slides. time Link cost changes here; C updates D(C, A) = 60 as B has advertised D(B, A) = ∞ Algorithm terminates
Routing Information Protocol (RIP) Simple distance-vector protocol Nodes send distance vectors every 30 seconds … or, when an update causes a change in routing Link costs in RIP All links have cost 1 Valid distances of 1 through 15 … with 16 representing infinity Small “infinity” smaller “counting to infinity” problem RIP is limited to fairly small networks E.g., campus
Question Can we solve the count-to-infinity problem? Do we need to solve the count-to-infinity problem?
Summary Routing is a distributed algorithm Different from forwarding React to changes in the topology Compute the shortest paths Two main shortest-path algorithms Dijkstra link-state routing (e.g., OSPF, IS-IS) Bellman-Ford distance-vector routing (e.g., RIP) Convergence process Changing from one topology to another Transient periods of inconsistency across routers Next time: BGP Reading: K&R 4.6.3