Chapter 4: Network Layer
- 4.1 Introduction
- 4.2 Virtual circuit and datagram networks
- 4.3 IP: Internet Protocol
    - Datagram format
    - IPv4 addressing
- 4.4 Routing: Concepts and Algorithms
    - Introduction
    - Math Detour: Graph Theory
    - Routing via Broadcast: PI, PIF
    - Connectivity Test: CT
    - Distributed Routing
        - Bellman-Ford (Distance Vector)
        - Link State
    - Optimal Routing
        - Network Model
        - Math Detour: Convexity
        - Flow Deviation
        - Math Detour: Inequality Constraints
        - Optimal Routing: Necessary and Sufficient Conditions
- 4.5 Routing in the Internet
    - Hierarchical Routing
    - RIP
    - OSPF
    - BGP
- 4.6 Broadcast and multicast routing

Routing Overview
- Recap
    - Forwarding: data plane; the router uses a forwarding table (“simple”)
        - Directs data packets from input interface to output interface
    - Routing: control plane; the router creates the forwarding table (“smart”)
        - Computes paths (and forwarding tables) in collaboration with other routers
    - Name (who), Address (where), Route (how)
- Concepts
    - Optimization goal / path selection criteria
        - Least cost / minimal hops / shortest delay / best network utilization
    - Network topology: nodes, links, characteristics
        - Topology learning, change discovery and update, failure detection and mitigation

Routing Overview (2)
- Optimization goals
    - User performance (cost, delay, throughput, loss, jitter, hop count, …)
    - Network performance (congestion, stability, overhead, load balancing, …)
    - Adaptability (mitigate failures, minimize loss, re-optimize)
- Challenges
    - Distributed (cooperation of many nodes, not only neighbors)
    - Robustness (correct, reliable, stable, convergent)
    - Performance (routing is a core function of the network)
    - Fairness
        - E.g., maximal utilization ≠ fair

Routing Algorithm Classification
- Global vs. decentralized
    - Global: routers have all information on all links
        - Complete topology
        - Complete link cost, delay, etc.
        - “Link state” algorithms
    - Decentralized: routers know only the links to their neighbors
        - Properties are known only for physically connected links
        - Iterative process of computation and exchange of information with neighbors
        - “Distance vector” algorithms
- Static vs. dynamic
    - Static: routes change slowly over time
    - Dynamic: routes change more quickly
        - Periodic update
        - In response to link cost changes

Example: Routing & Network Performance
- Throughput example: all links have a capacity of 10 units
- Low load: 5 units from 1 to 6 and 5 units from 2 to 6
    - Via 3 and 5 respectively: good routing
    - Both via 4: bad routing (more congestion, higher delay)
- High load: 5 units from 1 to 6, 15 units from 2 to 6
    - If no flow splitting is allowed, at least 5 units are rejected
    - If splitting is allowed, all demands can be carried, e.g., by sending the traffic from 1 via 3, half of the traffic from 2 via 4, and the other half via 5
- Maximum load
    - Both flows via 4: max. flow = 10 (bad choice)
    - No splitting: max. flow = 20 (better routing)
    - With splitting: max. flow = 30

Graph abstraction
[Figure: example graph with nodes u, v, w, x, y, z]
- Graph: G(N,E)
    - N = set of nodes (routers) = { u, v, w, x, y, z }
    - E = set of links with costs:
        cost(u,w) = 5, cost(u,v) = 2, cost(u,x) = 1, cost(v,w) = 3, cost(v,x) = 2,
        cost(w,y) = 1, cost(w,z) = 5, cost(w,x) = 3, cost(x,y) = 1, cost(y,z) = 2
- Cost
    - Could represent different things, e.g., always 1 (number of hops), proportional to delay or money, inversely related to bandwidth, or related to congestion
    - Path cost is defined as the sum of the costs of the links on the path
    - Possible routing goal: least-cost path
        - E.g., what is the least-cost path from u to z?
- Remark: the graph abstraction is useful in other network contexts
    - Example: P2P, where N is the set of peers and E is the set of TCP connections

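The graph on this slide can be written down directly as a data structure. The sketch below is only an illustration: the names `LINKS`, `build_graph`, and `path_cost` are assumptions made for this example, and the links are assumed to be undirected with the same cost in both directions.

```python
# Link costs copied from the slide; links are assumed to be undirected.
LINKS = {
    ("u", "w"): 5, ("u", "v"): 2, ("u", "x"): 1, ("v", "w"): 3, ("v", "x"): 2,
    ("w", "y"): 1, ("w", "z"): 5, ("w", "x"): 3, ("x", "y"): 1, ("y", "z"): 2,
}


def build_graph(links):
    """Return an adjacency map {node: {neighbor: cost}} for undirected links."""
    graph = {}
    for (a, b), cost in links.items():
        graph.setdefault(a, {})[b] = cost
        graph.setdefault(b, {})[a] = cost
    return graph


def path_cost(graph, path):
    """Sum the link costs along a node sequence (KeyError if a link is missing)."""
    return sum(graph[a][b] for a, b in zip(path, path[1:]))


GRAPH = build_graph(LINKS)
print(path_cost(GRAPH, ["u", "x", "y", "z"]))  # 1 + 1 + 2
print(path_cost(GRAPH, ["u", "w", "z"]))       # 5 + 5
```

Comparing candidate paths this way answers the question on the slide by hand; the shortest-path algorithms later in the chapter automate the search.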
Graphs: definitions & concepts
- Notation: G(N,E) / G(V,E) / G(N,A) / G(N,L)
    - N / V: collection of nodes or vertices
    - E / A / L: collection of edges, arcs, or links; a link is denoted by $(v_1, v_2)$ or by $e_i$
- Routes
    - Walk: an arbitrary sequence of successive links
    - Path: loop-less walk (no node appears more than once)
    - Cycle: walk that starts and ends at the same node
- Connected graph
    - There is a path between every pair of nodes; otherwise, the graph is non-connected
- Sub-graph
    - Part of a given graph (some nodes, some links): G'(N',E'), $N' \subseteq N$, $E' \subseteq E$
- Tree
    - Connected, loop-less graph
    - Spanning tree: a tree that is a sub-graph of G and contains all nodes of G

Shortest Path Routing
[Figure: example graph with nodes s, t, u, v, w, x, y, z]
- Input: graph G(V,E) and link costs, $\mathrm{cost}: E \to \mathbb{R}^+$
    - Costs are static (do not change with load)
    - Costs are positive
    - Path cost is the sum of link costs, $\mathrm{cost}(P) = \sum_{e \in P} \mathrm{cost}(e)$
        - No loops (a loop always adds cost anyway)
    - Other routing goals, such as bandwidth constraints, may be based on the bottleneck (min)
- Goal: least-cost path to each destination
    - Destination-based forwarding
    - We focus on a single destination from all sources
    - Single source to all destinations is similar (how?)

Optimal sub-structure
- If a shortest path $P_{uv}$ from u to v goes through node w, then the sub-paths $P_{uw}$ and $P_{wv}$ are also shortest paths
    - Proof: otherwise, replace them with shortest paths to obtain a better path from u to v
    - Key property for an efficient shortest-path algorithm (remark: also for greedy algorithms): combine shortest paths to create longer ones
- Special case: if a shortest path $P_{uv}$ from u to v starts with the link (u,w), then $P_{wv}$ is also a shortest path
    - If we know the shortest paths $\{P_{wv}\}_{(u,w) \in E}$ from all of u’s neighbors, then
      $\mathrm{cost}(P_{uv}) = \min_{(u,w) \in E} [\, \mathrm{cost}(u,w) + \mathrm{cost}(P_{wv}) \,]$,
      and the shortest path goes through the neighbor w that achieves the minimum

Bellman-Ford Algorithm
- Define the distance function to v for a source node $u \in V$
    - $D_u(v)$ = cost of the shortest path from u to v, i.e., $\mathrm{cost}(P_{uv})$
- Initialization, for each source node $u \in V$
    - $D^0_u(v) = 0$ if $v = u$; $D^0_u(v) = \infty$ if $v \ne u$
- Update (DV operation) for node $u \in V$
    - $D^{i+1}_u(v) = \min_{(u,w) \in E} [\, \mathrm{cost}(u,w) + D^i_w(v) \,]$, and set the minimizing w as the next hop towards v
    - Only if better (less) than $D^i_u(v)$
- Stop if $D^{i+1}_u(v) = D^i_u(v)$ for all $u \in V$
- Notes
    - i is the number of hops used
    - Dynamic program with entries per (i,u)
- Example (from the figure): $D_u(z) = \min\{\, \mathrm{cost}(u,v) + D_v(z),\ \mathrm{cost}(u,w) + D_w(z) \,\}$

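The update rule can be sketched as a small centralized, synchronous loop. This is an illustrative sketch, not a reference implementation: it assumes the undirected example graph from the graph-abstraction slide, a single destination, and the hypothetical names `ADJ` and `bellman_ford`.

```python
import math

# Example graph (costs from the graph-abstraction slide), assumed undirected.
LINKS = {
    ("u", "w"): 5, ("u", "v"): 2, ("u", "x"): 1, ("v", "w"): 3, ("v", "x"): 2,
    ("w", "y"): 1, ("w", "z"): 5, ("w", "x"): 3, ("x", "y"): 1, ("y", "z"): 2,
}
ADJ = {}
for (a, b), c in LINKS.items():
    ADJ.setdefault(a, {})[b] = c
    ADJ.setdefault(b, {})[a] = c


def bellman_ford(adj, dest):
    """Synchronous Bellman-Ford towards one destination.

    Returns (dist, next_hop, iterations), where iterations counts the
    rounds in which at least one distance improved.
    """
    dist = {u: (0 if u == dest else math.inf) for u in adj}
    next_hop = {u: None for u in adj}
    iterations = 0
    while True:
        new_dist = dict(dist)
        for u in adj:
            if u == dest:
                continue
            for w, cost in adj[u].items():
                candidate = cost + dist[w]      # uses D^i_w from the previous round
                if candidate < new_dist[u]:     # update only if strictly better
                    new_dist[u] = candidate
                    next_hop[u] = w
        if new_dist == dist:                    # D^{i+1} = D^i for all u: stop
            return dist, next_hop, iterations
        dist = new_dist
        iterations += 1


dist, next_hop, rounds = bellman_ford(ADJ, "z")
print(dist, next_hop, rounds)
```

On this graph the distances settle after three improving rounds, which matches the note that round i covers paths of at most i hops (the least-cost path u, x, y, z has three hops).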
Example of Bellman-Ford

Bellman-Ford (analysis)
- The distance vector can never increase with iterations
    - $D^{i+1}_u(v) \le D^i_u(v)$
- $D^i_u(v)$ is the cost of the shortest path from u to v with at most i hops
    - Can be proved by induction
    - At most |V|-1 iterations
    - At most H iterations, where H is the maximum hop count
- Computing $D^i_u(v)$ in each iteration
    - O(number of outgoing links from u) for node u
    - O(|E|) in aggregate per iteration, over all source nodes in V
- Overall: O(|V|·|E|), or O(H·|E|)

Bellman-Ford Algorithm (distributed)
- Each node u maintains distance vectors (DVs)
    - Its own distance vector $D_u = \{ D_u(v) \}_{v \in V}$ (distances to all other nodes)
    - A distance vector $D_w$ for each neighbor w (from w to all other nodes)
- Initialization: $D_u(u) = 0$; $D_u(v) = \infty$ for $v \ne u$
    - Send $D_u$ to all neighbors (assume updates are reliable and in-order)
- Upon an update of $D_w(v)$ ($v \ne u$) or of cost(u,w)
    - If $\mathrm{cost}(u,w) + D_w(v) < D_u(v)$: update $D_u(v)$, set w as the next hop, and send $D_u(v)$ to all neighbors
    - Otherwise, if the current next hop is not w: do nothing (no improvement)
    - Otherwise ($D_u(v)$ got worse, because w is the current next hop)
        - If all $D_w(v)$ were saved: find the new optimum among them, then update all neighbors
        - Otherwise: update all neighbors and hope they send back something better
- Periodically send $D_u$ to all neighbors

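A minimal simulation of the distributed variant, under stated assumptions: every node saves the last vector received from each neighbor and recomputes its own vector from them, link costs are static, and a single FIFO queue stands in for reliable in-order delivery. The class `DVNode` and the function `run_distance_vector` are hypothetical names for this sketch; periodic updates are not modeled.

```python
import math
from collections import deque

LINKS = {
    ("u", "w"): 5, ("u", "v"): 2, ("u", "x"): 1, ("v", "w"): 3, ("v", "x"): 2,
    ("w", "y"): 1, ("w", "z"): 5, ("w", "x"): 3, ("x", "y"): 1, ("y", "z"): 2,
}
ADJ = {}
for (a, b), c in LINKS.items():
    ADJ.setdefault(a, {})[b] = c
    ADJ.setdefault(b, {})[a] = c


class DVNode:
    def __init__(self, name, link_costs):
        self.name = name
        self.cost = dict(link_costs)                   # cost(u, w) per neighbor w
        self.neighbor_dv = {w: {} for w in self.cost}  # last D_w received from w
        self.dv = {name: 0}                            # own D_u; missing entry = infinity
        self.next_hop = {}

    def receive(self, w, vector):
        """Save D_w, recompute D_u from all saved vectors; True if D_u changed."""
        self.neighbor_dv[w] = dict(vector)
        new_dv, new_hop = {self.name: 0}, {}
        for nbr, vec in self.neighbor_dv.items():
            for v, d in vec.items():
                if v == self.name:
                    continue
                candidate = self.cost[nbr] + d
                if candidate < new_dv.get(v, math.inf):
                    new_dv[v], new_hop[v] = candidate, nbr
        changed = new_dv != self.dv
        self.dv, self.next_hop = new_dv, new_hop
        return changed


def run_distance_vector(adj):
    nodes = {u: DVNode(u, adj[u]) for u in adj}
    # Initialization: every node sends its vector to all neighbors.
    queue = deque((u, w, dict(nodes[u].dv)) for u in adj for w in adj[u])
    while queue:
        sender, receiver, vector = queue.popleft()
        if nodes[receiver].receive(sender, vector):
            for w in adj[receiver]:                    # advertise the changed vector
                queue.append((receiver, w, dict(nodes[receiver].dv)))
    return nodes


nodes = run_distance_vector(ADJ)
print(nodes["u"].dv, nodes["u"].next_hop)
```

Saving the neighbors' vectors corresponds to the "if all $D_w(v)$ were saved" branch on the slide: when a route gets worse, the node can pick the next-best neighbor locally instead of waiting for new advertisements.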
Bellman-Ford Algorithm (synchronized)
- Distributed, but simulates the centralized algorithm
- Uses some synchronization mechanism
    - Maintain an iteration number and make sure all nodes agree on it
    - The iteration does not change until all nodes finish what they need to do
    - The iteration does not change until updates reach all nodes
- Still maintains distance vectors (DVs), with the same initialization
    - Send $D_u$ to all neighbors once per iteration
- Update of $D_w(v)$
    - Wait for updates from all neighbors (sent in the previous iteration)
    - Find the best neighbor (same as centralized)
    - Send $D_u$ to all neighbors
- Update of cost(u,w)
    - Reset the iterations and start over
- Note: Bellman-Ford is correct without synchronization, but can take time to converge

Example of update step: table at node J

Stability
- The number of steps may be unbounded
    - In practice, ∞ means “a bounded large number”
- To overcome the problem: run in parallel a protocol with all weights = 1 and stop when its variables reach |N|

Loops
[Figure: nodes S, a, b]
- Problem: detecting situations in which old information is irrelevant and should not be used

Loops
[Figure: nodes S, a, b, c; two links labeled with cost 1]

Poisoned reverse
- If Z routes through Y to get to X, then:
    - Z sends to all neighbors except Y its true estimated distance to X
    - Z sends to Y an infinite distance (so Y will not route to X via Z)
- Obviously, this solves only 2-node loops
- Problem: entries for unreachable destinations

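The rule can be expressed as a per-neighbor filter on the vector a node advertises. The function name `advertisement` and the sample table for Z are assumptions made for illustration only.

```python
import math


def advertisement(dv, next_hop, neighbor):
    """Vector sent to `neighbor` under poisoned reverse.

    Any destination currently routed through `neighbor` is advertised
    to that neighbor with infinite distance; all others are sent as-is.
    """
    return {
        dest: (math.inf if next_hop.get(dest) == neighbor else dist)
        for dest, dist in dv.items()
    }


# Hypothetical table at Z: X is reached through Y, W is reached through V.
dv_z = {"X": 5, "W": 2}
hop_z = {"X": "Y", "W": "V"}

print(advertisement(dv_z, hop_z, "Y"))  # X is poisoned towards Y
print(advertisement(dv_z, hop_z, "V"))  # W is poisoned towards V
```

Because the filter looks only at the immediate next hop, it cannot break a loop that passes through three or more nodes, which is the limitation noted on the slide.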
Distance Vector Protocol
- The distributed asynchronous Bellman-Ford algorithm is also called the Distance Vector Protocol
- It forms the basis for the Routing Information Protocol (RIP), used in the Internet and other networks

Dijkstra’s shortest path algorithm
- Uses the same update process as Bellman-Ford
    - $D^{i+1}_u(v) = \min_{(u,w) \in E} [\, \mathrm{cost}(u,w) + D^i_w(v) \,]$
- But iterates over the nodes in a specific order
    - Maintains an estimated $D_u(v)$ for all nodes
    - Always selects the node with minimal $D_u(v)$
    - Searches for paths with increasing cost (rather than increasing hop count)
- Each iteration finalizes $D_u(v)$ for one node (the selected minimum)
    - That is, $D_u(v)$ will not change in later iterations
    - Implies at most |V| iterations

Dijkstra’s Algorithm
- Initialization
    - P = {v}
    - $D_u(v) = \mathrm{cost}(u,v)$ if $(u,v) \in E$, otherwise $\infty$
- Loop
    1. Find w not in P with the smallest $D_w(v)$
    2. Add w to P
    3. Update $D_u(v) = \min[\, D_u(v),\ \mathrm{cost}(u,w) + D_w(v) \,]$ for all u such that $(u,w) \in E$ and u is not in P
    4. Repeat until all nodes are in P

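A compact sketch of the loop above, computing distances towards a single destination v. One deliberate difference from the slide: a binary heap (Python's `heapq`) replaces the linear scan in step 1, so stale heap entries are skipped rather than updated in place. The graph and the names `ADJ` and `dijkstra` are assumptions for this example; links are assumed undirected.

```python
import heapq
import math

LINKS = {
    ("u", "w"): 5, ("u", "v"): 2, ("u", "x"): 1, ("v", "w"): 3, ("v", "x"): 2,
    ("w", "y"): 1, ("w", "z"): 5, ("w", "x"): 3, ("x", "y"): 1, ("y", "z"): 2,
}
ADJ = {}
for (a, b), c in LINKS.items():
    ADJ.setdefault(a, {})[b] = c
    ADJ.setdefault(b, {})[a] = c


def dijkstra(adj, dest):
    """Return (dist, next_hop) towards `dest` for every node."""
    dist = {u: math.inf for u in adj}
    dist[dest] = 0
    next_hop = {}
    finalized = set()                  # the set P on the slide
    heap = [(0, dest)]
    while heap:
        d, w = heapq.heappop(heap)     # step 1: node with the smallest D_w(v)
        if w in finalized:
            continue                   # stale entry, w was finalized earlier
        finalized.add(w)               # step 2: add w to P
        for u, cost in adj[w].items(): # step 3: relax every link (u, w)
            if u not in finalized and cost + d < dist[u]:
                dist[u] = cost + d
                next_hop[u] = w
                heapq.heappush(heap, (dist[u], u))
    return dist, next_hop


dist, next_hop = dijkstra(ADJ, "z")
print(dist, next_hop)
```

The result agrees with Bellman-Ford on the same graph; only the order in which distances become final differs.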
Dijkstra’s Algorithm (analysis)
- Correctness
    - $D_w(v) \le D_u(v)$ if $w \in P$ and u is not in P
        - Prove by induction
    - $D_w(v)$ = cost of the shortest path from w to v that goes only through nodes in P
        - Prove by induction; show that it holds when a node is added to P
- Complexity
    - Key: efficiently find w not in P with the smallest $D_w(v)$
    - Each update is O(|V|)
    - O(|V|) iterations, so $O(|V|^2)$ overall

Example of Dijkstra

Link State Routing Protocol
- Basis for the OSPF (Open Shortest Path First) protocol
    - Each node periodically updates all other nodes with its network information
        - Sequence number
        - Identity
        - Identity of neighbors
        - Cost to each neighbor
        - Time-to-live
    - Each node maintains an up-to-date picture of the network topology and costs
    - Each node runs Dijkstra’s algorithm to compute its forwarding table
- Questions
    - How to update? By flooding
    - What if nodes do not agree on the information?

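One way to picture the role of the sequence number during flooding is a small link-state database that accepts an update only if it is newer than the stored one. This is a simplified sketch, not OSPF: the record layout (`origin`, `seq`, `neighbors`) and the function names are hypothetical, and time-to-live handling is left out.

```python
def accept_update(lsdb, update):
    """Install `update` if it is newer than the stored record.

    Returns True when the update was installed, meaning the node
    should flood it onward to its neighbors.
    """
    current = lsdb.get(update["origin"])
    if current is not None and current["seq"] >= update["seq"]:
        return False                     # duplicate or stale: drop, do not re-flood
    lsdb[update["origin"]] = update
    return True


def topology(lsdb):
    """Build {node: {neighbor: cost}} from the stored records."""
    return {origin: dict(rec["neighbors"]) for origin, rec in lsdb.items()}


lsdb = {}
first = {"origin": "u", "seq": 1, "neighbors": {"v": 2, "x": 1}}
newer = {"origin": "u", "seq": 2, "neighbors": {"v": 2, "x": 4}}

print(accept_update(lsdb, first))   # new origin: installed
print(accept_update(lsdb, first))   # same sequence number: ignored
print(accept_update(lsdb, newer))   # higher sequence number: replaces the old record
print(topology(lsdb))
```

Dropping duplicates is what stops flooding from circulating forever, and once every node holds the same records, each can run Dijkstra’s algorithm on the resulting topology.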
Comparison of LS and DV algorithms
- LS
    - O(nE) msgs to learn the topology
        - Each node knows the whole network
        - Can be reduced to O(E) msgs
    - O(n log n + E) algorithm
        - The simplified version needs $O(n^2)$
        - May have oscillations
    - Node malfunctions?
        - Might advertise an incorrect link cost
        - Each node computes only its own table
- DV
    - O(nE) msgs to run
        - Each node talks only to its neighbors
        - O(h · degree) msgs per node
    - O(h) iterations
        - Might have routing loops
        - Count-to-infinity problem
    - Node malfunctions?
        - Might advertise an incorrect path cost
        - A node’s table is used by others, so errors propagate through the network

Spanning Trees
- Tree
    - Connected, loop-less graph
- Spanning tree
    - A tree
    - A sub-graph of G
    - Contains all nodes of G
- Minimal sub-graph that contains all nodes
    - Minimal in terms of links
    - What about path hop count?
    - Minimal in terms of cost? Total cost? Path cost?

Spanning Tree Algorithm
- Given a connected graph G(V,E)
    1. Select an arbitrary node v; set $T = (\{v\}, \emptyset)$
    2. If T contains all nodes of V, stop; T is the spanning tree
    3. Select a link (i,j) such that $i \in T$, $j \notin T$; add j and (i,j) to T
    4. Go to 2
- Proof
    - There always exists a link for selection in step 3 (G is connected)
    - T is always a tree (connected, loop-less)
    - The final T is a spanning tree
- Properties, for any connected graph G(V,E)
    - G has a spanning tree
    - $|E| \ge |V| - 1$; $|E| = |V| - 1$ iff G is a tree

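The four steps translate almost line by line into code. In this sketch the function name `spanning_tree` and the small sample graph are assumptions; which link is chosen in step 3 is arbitrary, so different runs of the idea can yield different, equally valid trees.

```python
def spanning_tree(adj, start):
    """Grow a spanning tree from `start`; adj maps node -> iterable of neighbors."""
    in_tree = {start}                                  # step 1: T = ({v}, {})
    tree_links = []
    while len(in_tree) < len(adj):                     # step 2: stop when T spans V
        # Step 3: any link (i, j) with i in T and j not in T.
        link = next(
            ((i, j) for i in in_tree for j in adj[i] if j not in in_tree),
            None,
        )
        if link is None:
            raise ValueError("graph is not connected")
        tree_links.append(link)
        in_tree.add(link[1])                           # step 4: go to 2
    return tree_links


SAMPLE = {
    "a": ["b", "c"],
    "b": ["a", "c"],
    "c": ["a", "b", "d"],
    "d": ["c"],
}
tree = spanning_tree(SAMPLE, "a")
print(tree)
```

The returned list always has |V| - 1 links for a connected graph, in line with the properties listed on the slide.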
Routing in Networks of Bridged LANs
- The network is made of LAN segments and bridges
    - LANs are the “nodes”
    - Bridges are the “links”
- The routing (forwarding by bridges) is done on a spanning tree, because:
    - A connected graph is needed
        - Reach each LAN from each LAN
    - No loops are allowed
        - A LAN segment is a broadcast medium: all traffic is forwarded to the entire segment, including any attached bridges

Minimum Spanning Tree (MST)
- A graph has many spanning trees
- Goal: find a spanning tree with minimal overall cost
    - Assume each link (u,v) has a known cost $c_{uv}$
    - Definition: “tree cost” = sum of the tree link costs, $\mathrm{cost}(T) = \sum_{e \in T} c_e$
- Formal problem definition: given a graph G(V,E) with link costs $\{c_{ij}\}$, select a tree T such that
    - T is a spanning tree of G
    - $\mathrm{cost}(T) \le \mathrm{cost}(T')$ for any other spanning tree T' of G
- Definitions
    - Segment = sub-graph of an MST
    - Outgoing link = link with exactly one end in the segment

Minimum Spanning Tree (continued)
[Figure: segment F inside MST M, with outgoing links e and a]
- Lemma: given a segment F of an MST, let e = (i,j) be the outgoing link from F with minimum cost. Then $F \cup \{e\}$ is also a segment of an MST
- Proof
    - By definition, F belongs to some MST M; if $e \in M$ we are done; otherwise:
    - Consider $M \cup \{e\}$
        - It is connected but not a tree, so it contains a loop
        - The loop includes the link e, an outgoing link from F, so it passes through the end node of e that is not in F
        - Traversing the loop from that node in the other direction, we must reach another outgoing link from F; denote that link by a
    - Consider $M' = M \cup \{e\} - \{a\}$
        - M' has no loops, so M' is a spanning tree (just from counting links)
        - F is a segment of M', and $F \cup \{e\}$ is a segment of M'
    - Examine cost(M')
        - By definition, $\mathrm{cost}(e) \le \mathrm{cost}(a)$; thus $\mathrm{cost}(M') \le \mathrm{cost}(M)$
    - Hence M' is an MST

MST algorithms
- Prim-Dijkstra
    - Select any node as the first segment
    - Keep enlarging the segment by selecting its minimum-cost outgoing link
- Kruskal
    - Start with |V| segments, each composed of one node
    - Repeatedly select a minimum-cost link that connects two different segments, and combine them
    - Convenient for distributed computation

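Both strategies can be sketched in a few lines each. The example reuses the link costs from the graph-abstraction slide as an assumed undirected input; the names `prim` and `kruskal`, the heap in Prim, and the union-find structure in Kruskal are implementation choices for this sketch rather than anything prescribed by the slides.

```python
import heapq

LINKS = {
    ("u", "w"): 5, ("u", "v"): 2, ("u", "x"): 1, ("v", "w"): 3, ("v", "x"): 2,
    ("w", "y"): 1, ("w", "z"): 5, ("w", "x"): 3, ("x", "y"): 1, ("y", "z"): 2,
}
ADJ = {}
for (a, b), c in LINKS.items():
    ADJ.setdefault(a, {})[b] = c
    ADJ.setdefault(b, {})[a] = c


def prim(adj, start):
    """Grow one segment from `start` by its minimum-cost outgoing link."""
    segment = {start}
    heap = [(cost, start, w) for w, cost in adj[start].items()]
    heapq.heapify(heap)
    tree = []
    while heap and len(segment) < len(adj):
        cost, a, b = heapq.heappop(heap)
        if b in segment:
            continue                        # no longer an outgoing link
        segment.add(b)
        tree.append((a, b, cost))
        for w, cw in adj[b].items():
            if w not in segment:
                heapq.heappush(heap, (cw, b, w))
    return tree


def kruskal(links):
    """Merge single-node segments using links in increasing cost order."""
    parent = {}

    def find(x):                            # representative of x's segment
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]   # path halving
            x = parent[x]
        return x

    tree = []
    for (a, b), cost in sorted(links.items(), key=lambda item: item[1]):
        ra, rb = find(a), find(b)
        if ra != rb:                        # link joins two different segments
            parent[ra] = rb
            tree.append((a, b, cost))
    return tree


print(sum(c for _, _, c in prim(ADJ, "u")))
print(sum(c for _, _, c in kruskal(LINKS)))
```

The two trees may contain different links when costs tie, but their total cost is the same, since both are minimum spanning trees.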
Example: MST algorithms
[Figure: panels “Topology”, “Prim-Dijkstra”, and “Kruskal”; nodes A and B are labeled]

Routing via Broadcast
- Problem: a source node wants to transmit the same information to all nodes in the network (broadcast)
    - Best performance: on a spanning tree, if one is available; if not, it is easiest to “flood”
- Flooding: each node sends the information to all its neighbors
- Advantages
    - No need to know the current network topology or costs
    - Reliable and fast: every connected node gets the information at the earliest possible time
- Assumptions and notation
    - Every node knows its adjacent topology (neighbors)
    - Every node assigns local ids to its adjacent links
    - MSG(info): message carrying the information info

Protocol PI (Propagation of Information)
- Every node i performs the following (the source receives START):
    initialization: m ← 0
    upon START or receipt of MSG(info):
        if m = 0 then
            m ← 1
            accept info
            send MSG(info) to all neighbors
- Notes
    - Completely distributed; every node works independently, on its own schedule
    - The variable m marks that the message was accepted
        - It makes sure that nodes transfer and accept the information no more than once
    - The last line can be changed to: send MSG(info) to all neighbors except the one from which MSG was received

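A small event-driven simulation makes the behavior of PI concrete. The sketch assumes a single FIFO queue as the message scheduler (real links may deliver in any order), models START as a message from `None`, and uses the hypothetical names `NEIGHBORS` and `run_pi`; the topology is the one from the graph-abstraction slide without costs.

```python
from collections import deque

NEIGHBORS = {
    "u": ["v", "w", "x"],
    "v": ["u", "w", "x"],
    "w": ["u", "v", "x", "y", "z"],
    "x": ["u", "v", "w", "y"],
    "y": ["w", "x", "z"],
    "z": ["w", "y"],
}


def run_pi(adj, source):
    """Simulate PI; return (first_sender, sent_messages)."""
    m = {i: 0 for i in adj}               # m = 1 once the info was accepted
    first_sender = {}                     # neighbor from which info first arrived
    sent = []                             # every (sender, receiver) message
    queue = deque([(None, source)])       # START at the source
    while queue:
        sender, i = queue.popleft()
        if m[i] == 0:
            m[i] = 1                      # accept info
            first_sender[i] = sender
            for j in adj[i]:              # send MSG(info) to all neighbors
                sent.append((i, j))
                queue.append((i, j))
    return first_sender, sent


first_sender, sent = run_pi(NEIGHBORS, "u")
print(first_sender)
print(len(sent))
```

Every node forwards exactly once, so the number of messages equals twice the number of links, and the `first_sender` pointers form a tree rooted at the source.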
Properties of PI
- During the operation of the protocol, exactly one message travels on each link in each direction
- All nodes connected to the source will accept the info, exactly once
- For each node i, let $p_i$ be the neighbor from which node i receives the first MSG
    - The collection $\{(i, p_i)\}$ forms a spanning tree
    - It is the shortest (delay) path tree from the source to all nodes
- If several nodes have the same info and start PI asynchronously, their PIs converge and all properties above hold
    - Except that the collection $\{(i, p_i)\}$ forms a spanning forest (a collection of non-overlapping trees whose union spans all nodes)
- The source does not know the termination time, i.e., a time when it can be sure that all nodes have received the information
    - It knows the information will be received and accepted by all nodes, but it does not know when

Broadcast with termination feedback (PIF)
- Idea
    - Use a PI-type protocol to broadcast the information and to form a spanning tree
    - Use the spanning tree to collect termination information
- Method
    - A node that accepts the info forwards it to all neighbors except $p_i$
    - Upon receipt of MSG from all neighbors, the node sends MSG to $p_i$ and is done
    - Feedback flows backwards on the spanning tree (towards the source)
    - Termination indication = the source is done
- Notation
    - $m_i$ = flag that indicates participation in the protocol
    - $N_i(j)$ = flag that indicates that the node has received MSG from neighbor j
    - $p_i$ = neighbor from which MSG was received first
    - Receiving MSG from nil = START

PIF algorithm
- Every node i performs the following (the source receives START):
    initialization: m_i ← 0, p_i ← nil, N_i(j) ← 0 for all neighbors j
    upon receipt of MSG from j:
        N_i(j) ← 1
        if m_i = 0 then (propagation)
            m_i ← 1; p_i ← j
            accept info and send MSG(info) to all neighbors except j
        if N_i(k) = 1 for all neighbors k of i then (feedback)
            send MSG(info) back to p_i
            m_i ← 0; N_i(k) ← 0 for all neighbors k of i (reset)
- Properties
    - All connected nodes will accept the message exactly once
    - One MSG travels on each link in each direction
    - The collection $\{(i, p_i)\}$ is a spanning tree over all connected nodes
    - All nodes complete the protocol and reset ($m_i = 0$)
    - Node i completes the protocol before $p_i$; the source completes last (termination indication)

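The PIF rules can be simulated in the same style as PI. As before, this is a sketch under assumptions: one FIFO queue schedules all messages, START is modeled as a message from `None` (nil), and `NEIGHBORS` and `run_pif` are hypothetical names.

```python
from collections import deque

NEIGHBORS = {
    "u": ["v", "w", "x"],
    "v": ["u", "w", "x"],
    "w": ["u", "v", "x", "y", "z"],
    "x": ["u", "v", "w", "y"],
    "y": ["w", "x", "z"],
    "z": ["w", "y"],
}


def run_pif(adj, source):
    """Simulate PIF; return (p, done_order, sent)."""
    m = {i: 0 for i in adj}
    p = {i: None for i in adj}                      # p_i, nil for the source
    N = {i: {k: 0 for k in adj[i]} for i in adj}    # N_i(k) flags
    done_order = []                                 # order in which nodes complete
    sent = []
    queue = deque([(None, source)])                 # START = MSG from nil
    while queue:
        j, i = queue.popleft()
        if j is not None:
            N[i][j] = 1
        if m[i] == 0:                               # propagation
            m[i], p[i] = 1, j
            for k in adj[i]:
                if k != j:
                    sent.append((i, k))
                    queue.append((i, k))
        if all(N[i].values()):                      # feedback
            done_order.append(i)
            if p[i] is not None:
                sent.append((i, p[i]))
                queue.append((i, p[i]))
            m[i] = 0                                # reset
            for k in N[i]:
                N[i][k] = 0
    return p, done_order, sent


p, done_order, sent = run_pif(NEIGHBORS, "u")
print(p)
print(done_order)
```

The source appears last in `done_order`, which is the termination indication: at that point every other node has already completed and reset.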
PIF Example
[Figure: example network with the source node marked]

Example
[Figure: example network with the source node marked]

Connectivity Test
- Goal: learn the network nodes
- Protocol CT1
    - Every node sends its id in a PI
    - Nodes wake up when they get the first message, at which time they start their own PI
- Properties
    - Can be started asynchronously at several nodes
    - Every node will receive the identity of every other connected node
    - Disconnected nodes will not know of each other
    - A node cannot determine a termination time, i.e., a time when it knows for sure the identities of all connected nodes

CT2
- Idea: use PIFs instead of PIs
- Protocol CT2: a node starts its own PIF when
    - It gets a START, or
    - It receives the first message (of some PIF started by another node)
    - Note: each node must track m, p, N separately per source PIF
- Properties
    - Can be started asynchronously at several nodes
    - Every node will receive the identity of every other connected node
    - Disconnected nodes will not know of each other
    - A node i can determine a termination time, i.e., a time when it knows for sure the identities of all connected nodes: when i completes its own PIF
        - This follows from the fact that the propagation phase of PIF j starts before j enters feedback for PIF i

CT3
- Idea: same as CT1, except that nodes broadcast their neighbors’ identities along with their own identity
    - Assume each node knows the identities of its neighbors
- Properties
    - All the “good” properties of CT1
    - Termination signal
        - When a PI from j arrives, i now knows all of j’s neighbors
        - i now expects a PI from each of these neighbors
        - When all expected identities have arrived, i can terminate
- Additional variations on CT1, CT2, CT3
    - Encoded information to reduce MSG contents
    - Reduction of the number of messages
    - Pruning of MSG contents

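The CT3 termination test reduces to a set comparison: a node may stop once every identity announced as somebody's neighbor has itself been heard from. The function name `ct3_can_terminate` and the `received` layout (node id mapped to the neighbor ids carried in its PI, including the local node's own entry) are assumptions for this sketch.

```python
def ct3_can_terminate(received):
    """Return True when no announced identity is still missing.

    `received` maps each node id heard so far (including the local node)
    to the list of neighbor ids announced in its PI.
    """
    expected = set(received)
    for neighbor_ids in received.values():
        expected |= set(neighbor_ids)       # each announced neighbor is expected
    return expected <= set(received)


# Node i on a chain i - j - k, hearing the PIs one at a time.
received = {"i": ["j"]}
print(ct3_can_terminate(received))          # j is expected but not yet heard

received["j"] = ["i", "k"]
print(ct3_can_terminate(received))          # now k is expected

received["k"] = ["j"]
print(ct3_can_terminate(received))          # every expected identity has arrived
```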