1 Tutorial on Network Layers 2 and 3 Radia Perlman Intel Labs

1 Tutorial on Network Layers 2 and 3 Radia Perlman Intel Labs

2 Why? Demystify this portion of networking, so people don’t drown in the alphabet soup Think about these things critically N-party protocols are “the most interesting” Lots of issues are common to other layers You can’t design layer n without understanding layers n-1 and n+1

3 What can we do in 1 ½ hours? Understand the concepts Understand various approaches, and tradeoffs, and where to go to learn more A little of the history: without this, it’s hard to really “grok” why things are the way they are

4 Outline layer 2 issues: addresses, multiplexing, bridges, spanning tree algorithm layer 3: addresses, neighbor discovery, connectionless vs connection-oriented –Routing protocols Distance vector Link state Path vector Layer 2 ½... as if 2 vs 3 weren’t confusing enough

9 Why this whole layer 2/3 thing? Myth: bridges/switches simpler devices, designed before routers OSI Layers –1: physical –2: data link (nbr-nbr, e.g., Ethernet) –3: network (create entire path, e.g., IP) –4: end-to-end (e.g., TCP, UDP) –5 and above: boring

14 Definitions Repeater: layer 1 relay Bridge: layer 2 relay Router: layer 3 relay OK: What is layer 2 vs layer 3? –The “right” definition: layer 2 is neighbor- neighbor. “Relays” should only be in layer 3!

15 Definitions Repeater: layer 1 relay Bridge: layer 2 relay Router: layer 3 relay OK: What is layer 2 vs layer 3? True definition of a layer n protocol: Anything designed by a committee whose charter is to design a layer n protocol

16 Layer 3 (e.g., IPv4, IPv6, DECnet, Appletalk, IPX, etc.) Put source, destination, hop count on packet Addresses are assigned so that a bunch of addresses can be summarized with a prefix Just like postal addresses: –Country –State –City

17 Layer 3 packet [Figure: Layer 3 header (source, dest, hops) followed by data]

18 Ethernet packet [Figure: Ethernet header (source, dest) followed by data]

19 Ethernet (802) addresses Assigned in blocks of 2^24 Given 23-bit constant (OUI) plus g/i bit All 1’s intended to mean “broadcast” [Figure: address layout with fields OUI, global/local admin, group/individual]

20 Ethernet addresses are “flat” Which means that Ethernet addresses have nothing to do with where a device is It looks like there is structure there, but the whole point of the OUI is to assign a fixed address at time of manufacture

21 Ethernet “religion” was autoconfiguration Assigning addresses is the manufacturer’s problem Then the customer just plugs things together and they work

22 It’s easy to confuse “Ethernet” with “network” Both are multiaccess clouds But Ethernet does not scale. It can’t replace IP as the Internet Protocol –Flat addresses –No hop count –Missing additional protocols (such as neighbor discovery) –Perhaps missing features (such as fragmentation, error messages, congestion feedback)

23 Original Ethernet Design CSMA/CD –CS: “carrier sense” listen before talking so you don’t interrupt –MA: “multiple access” shared medium –CD: “collision detect” listen even while talking in case someone else started talking at “the same time” –Do exponential random backoff if collision

24 CSMA/CD pretty much dead So what is Ethernet today?

25 So where did bridges come from?

26 So where did bridges come from? Early 1980’s…Ethernet new and highly hyped People thought it was “the new way to do networking” People built applications directly on Ethernet (leaving out layer 3)

27 Problem Statement Need something that will sit between two Ethernets, and let a station on one Ethernet talk to a station on the other [Figure: stations A and C]

28 Basic idea Listen promiscuously Learn location of source address based on source address in packet and port from which packet received Forward based on learned location of destination
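
The learn-and-forward idea on this slide can be sketched in a few lines. This is a minimal illustration, not any vendor's implementation; the class name `LearningBridge`, the port numbers, and the station names are assumptions made for the example.

```python
class LearningBridge:
    """Toy transparent bridge: learns where sources are, forwards on destination."""

    def __init__(self, ports):
        self.ports = set(ports)
        self.table = {}  # MAC address -> port on which it was last seen as a source

    def receive(self, src, dst, in_port):
        """Return the set of ports the frame should be transmitted on."""
        self.table[src] = in_port          # learn from the source address
        out_port = self.table.get(dst)
        if out_port is None:               # unknown destination: flood
            return self.ports - {in_port}
        if out_port == in_port:            # destination is on the arrival side: filter
            return set()
        return {out_port}                  # known destination: forward on one port


bridge = LearningBridge(ports=[1, 2, 3])
print(bridge.receive("A", "C", in_port=1))  # C not yet learned, so flood
print(bridge.receive("C", "A", in_port=2))  # A was learned on port 1
print(bridge.receive("A", "C", in_port=1))  # C was learned on port 2
```

Note that nothing here protects against loops: a flooded frame that comes back on another port would be learned and flooded again.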

29 What’s different between this and a repeater? No collisions. With learning, can use more aggregate bandwidth than on any one link. No artifacts of LAN technology (# of stations in ring, distance of CSMA/CD)

30 But loops are a disaster No hop count Exponential proliferation [Figure: station S and bridges B1, B2, B3]

35 What to do about loops? Just say “don’t do that” Or, spanning tree algorithm –Bridges gossip amongst themselves –Compute loop-free subset –Forward data on the spanning tree –Other links are backups

38 Algorhyme
I think that I shall never see
A graph more lovely than a tree.
A tree whose crucial property
Is loop-free connectivity.
A tree which must be sure to span
So packets can reach every LAN.
First the Root must be selected
By ID it is elected.
Least cost paths from Root are traced
In the tree these paths are placed.
A mesh is made by folks like me.
Then bridges find a spanning tree.
Radia Perlman

39 Bother with spanning tree? Maybe just tell customers “don’t do loops” First bridge sold...

40 First Bridge Sold [Figure: stations A and C]

41 Suboptimal routes [Figure: nodes A and X]

42 So Bridges were a kludge, digging out of a bad decision Why are they so popular? –plug and play –simplicity –high performance Will they go away? –because of idiosyncrasy of IP, need it for lower layer.

43 Note some things about bridges Certainly don’t get optimal source/destination paths Temporary loops are a disaster –No hop count –Exponential proliferation But they are wonderfully plug-and-play

44 Switches Ethernet used to be bus Easier to wire, more robust if star (one huge multiport repeater with pt-to-pt links) If store and forward rather than repeater, and with learning, more aggregate bandwidth Can cascade devices…do spanning tree We’ve reinvented the bridge!

45 Basic idea of a packet Destination address Source address data

46 Hdrs inside hdrs [Figure: S, R1, R2, R3, D] As transmitted by S? (L2 hdr, L3 hdr) As transmitted by R1? As received by D?

47 Hdrs inside hdrs [Figure: S, R1, R2, R3, D] S: Layer 2 hdr (Dest= Source= ), Layer 3 hdr (Dest=D Source=S)

48 Hdrs inside hdrs [Figure: S, R1, R2, R3, D] R1: Layer 2 hdr (Dest= Source= ), Layer 3 hdr (Dest=D Source=S)

49 Hdrs inside hdrs [Figure: S, R1, R2, R3, D] R2: Layer 2 hdr, Layer 3 hdr (Dest=D Source=S)

50 Hdrs inside hdrs [Figure: S, R1, R2, R3, D] R3: Layer 2 hdr (Dest= Source= ), Layer 3 hdr (Dest=D Source=S)

51 What designing “layer 3” meant Layer 3 addresses Layer 3 packet format (IP, DECnet) –Source, destination, hop count, … A routing algorithm –Exchange information with your neighbors –Collectively compute routes with all rtrs –Compute a forwarding table

52 Network Layer connectionless fans designed IPv4, IPv6, CLNP, IPX, AppleTalk, DECnet Connection-oriented reliable fans designed X.25 Connection-oriented datagram fans designed ATM, MPLS

53 Pieces of network layer interface to network: addressing, packet formats, fragmentation and reassembly, error reports routing protocols autoconfiguring addresses/nbr discovery/finding routers

54 Connection-oriented Nets [Figure: nodes S, A, R1, R2, R3, R4, R5, D with per-router VC table entries (3,51)=(7,21) (4,8)=(7,92) (4,17)=(7,12) (2,12)=(3,15) (2,92)=(4,8) (1,8)=(3,6) (2,15)=(1,7)] VC=8, 92, 8,

55 Connection-oriented networks X.25: also have sequence number and ack number in packets (like TCP), and layer 3 guarantees delivery ATM: datagram, but fixed size packets (48 bytes data, 5 bytes header)
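
The VC tables on the "Connection-oriented Nets" slide can be read as label swapping: each router maps (input port, input VC) to (output port, output VC). The sketch below assumes that reading. Three of the table entries come from the slide, but which router owns which entry, and the `links` wiring between routers, are hypothetical, chosen only so that the VC sequence 8, 92, 8 appears.

```python
# (in_port, in_vc) -> (out_port, out_vc). Entry values are from the slide;
# their assignment to R1, R2, R3 is an assumption for illustration.
vc_tables = {
    "R1": {(4, 8): (7, 92)},
    "R2": {(2, 92): (4, 8)},
    "R3": {(1, 8): (3, 6)},
}

# Hypothetical wiring: (router, out_port) -> (next router, its in_port).
links = {("R1", 7): ("R2", 2), ("R2", 4): ("R3", 1)}


def follow_circuit(router, in_port, vc):
    """Follow a packet hop by hop, swapping the VC number at each router."""
    trace = []
    while True:
        out_port, out_vc = vc_tables[router][(in_port, vc)]
        trace.append((router, vc, out_vc))
        next_hop = links.get((router, out_port))
        if next_hop is None:       # no further router in this toy topology
            return trace
        router, in_port = next_hop
        vc = out_vc


for router, vc_in, vc_out in follow_circuit("R1", in_port=4, vc=8):
    print(f"{router}: VC {vc_in} -> VC {vc_out}")
```

The point of the design is that a VC number only has to be unique per link, so the packet header can stay small.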

56 MPLS (multiprotocol label switching) Connection-oriented, like ATM, but arbitrary sized packets Add 32-bit hdr on top of IP pkt –20 bit “label” –Hop count (hooray!)

57 Hierarchical connections (stacks of MPLS labels) [Figure: sources S1 to S9 and destinations D1 to D9 connected across a backbone between R1 and R2] Routers in backbone only need to know about one flow: R1-R2

58 MPLS Originally for faster forwarding than parsing IP header. Later, “traffic engineering”: classify pkts based on more than destination address

59 Connectionless Network Layers Destination, source, hop count Maybe other stuff –fragmentation –options (e.g., source routing) –error reports –special service requests (priority, custom routes) –congestion indication Real diff: size of addresses

60 Addresses 802 address “flat”, though assigned with OUI/rest. No topological significance layer 3 addresses: locator/node : topologically hierarchical address interesting difference: –IPv4, IPv6, IPX, AppleTalk: locator specific to a link –CLNP, DECnet: locator “area”, whole campus

61 Hierarchy [Figure, two panels: “One prefix per link” and “One prefix per campus”; prefix labels 2*, 25*, 28*, 292*, 22*, 293*, 2*]

62 Hierarchy within Locator Assume addresses assigned so that within a circle everything shares a prefix Can summarize lots of circles with a shorter prefix [Figure: nested circles with prefix labels 27*, 23*, 2428*, 2*, 279*, 272*]

63 New topic: Routing Algorithms

64 Distributed Routing Protocols Rtrs exchange control info Use it to calculate forwarding table Two basic types –distance vector –link state

65 Distance Vector Know –your own ID –how many cables hanging off your box –cost, for each cable, of getting to nbr [Figure: router “4” with cables j, k, m, n; link costs 3, 2, 7 shown]

66 [Figure: router “4” with cables j, k, m, n. Inputs: distance vector rcv’d from cable j, from cable k, from cable m, from cable n. Outputs: your own calculated distance vector, your own calculated forwarding table]
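
The calculation pictured here can be sketched as follows: for each destination, add the cost of a cable to the cost the neighbor on that cable reports, and keep the minimum. The cable names j, k, m, n and the ID "4" come from the slide; the link costs and received vectors below are made-up values, since the figure's numbers did not survive, and the function name is an assumption.

```python
def compute_distance_vector(my_id, link_cost, received):
    """link_cost: cable -> cost of reaching the neighbor on that cable.
    received: cable -> {destination: cost reported by that neighbor}.
    Returns (own distance vector, forwarding table of destination -> cable)."""
    distance = {my_id: 0}
    forwarding = {}
    for cable, vector in received.items():
        for dest, reported in vector.items():
            if dest == my_id:
                continue
            total = link_cost[cable] + reported
            if dest not in distance or total < distance[dest]:
                distance[dest] = total
                forwarding[dest] = cable
    return distance, forwarding


# Hypothetical inputs, for illustration only.
link_cost = {"j": 3, "k": 2, "m": 2, "n": 7}
received = {
    "j": {1: 12, 2: 3},
    "k": {1: 3, 2: 9},
    "m": {1: 5, 2: 0},
    "n": {1: 2, 2: 6},
}
distance, forwarding = compute_distance_vector(4, link_cost, received)
print(distance)     # own distance vector, to be sent to every neighbor
print(forwarding)   # which cable to use for each destination
```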

72 Looping Problem [Figure: nodes A, B, C in a line]

73 Looping Problem Cost to C: A=2, B=1, C=0

74 Looping Problem Cost to C: A=2, B=1, C=0 (arrows mark the direction towards C)

75 Looping Problem Cost to C: A=2, B=1, C=0 What is B’s cost to C now?

76 Looping Problem Cost to C: A=2, B=3 (was 1), C=0

77 Looping Problem Cost to C: A=2, B=3, C=0 (arrows mark the direction towards C)

78 Looping Problem Cost to C: A=4 (was 2), B=3, C=0 (arrows mark the direction towards C)

79 Looping Problem Cost to C [Figure: A, B, C with arrows marking the direction towards C]

80 Looping Problem worse with high connectivity [Figure: nodes Q, Z, B, A, C, N, M, V, H]

81 Split Horizon: one of several optimizations Don’t tell neighbor N you can reach D if you’d forward to D through N [Figure: nodes A, B, C]

82 Split Horizon: …but it won’t work with loops of more than 2 nodes [Figure: nodes A, B, C, D]
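
The looping problem and the split horizon fix can be reproduced with a tiny simulation of the A, B, C line. This is a sketch under assumptions not in the slides: every link costs 1, B loses its link to C, B and A update in alternation, and 16 stands in for "infinity" (the value RIP uses). The function name is hypothetical.

```python
def count_to_infinity(split_horizon, infinity=16):
    """Costs to C on the line A - B - C after the B-C link fails.
    Returns the sequence of (A's cost, B's cost) after each update."""
    cost_a, cost_b = 2, 1            # before the failure
    history = [(cost_a, cost_b)]
    while cost_a < infinity or cost_b < infinity:
        # B has lost its direct route, so it relies on what A advertises.
        # With split horizon, A does not advertise C to B, because A
        # forwards to C through B.
        heard_from_a = infinity if split_horizon else cost_a
        cost_b = min(heard_from_a + 1, infinity)
        history.append((cost_a, cost_b))
        # A's only neighbor is B, so A adopts B's cost plus the link cost.
        cost_a = min(cost_b + 1, infinity)
        history.append((cost_a, cost_b))
    return history


print(count_to_infinity(split_horizon=False))  # costs creep upward step by step
print(count_to_infinity(split_horizon=True))   # both give up almost immediately
```

With only two nodes bouncing the route between them, split horizon is enough. In a loop of three or more nodes the stale route can travel around the loop without ever being advertised back to the node it came from, which is the limitation the last slide points out.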

83 Link State Routing meet nbrs Construct Link State Packet (LSP) –who you are –list of (nbr, cost) pairs Broadcast LSPs to all rtrs (“a miracle occurs”) Store latest LSP from each rtr Compute Routes (breadth first, i.e., “shortest path” first—well known and efficient algorithm)

84 [Figure: network of nodes A, B, C, D, E, F, G] LSP database (nbr/cost):
A: B/6 D/2
B: A/6 C/2 E/1
C: B/2 F/2 G/5
D: A/2 E/2
E: B/1 D/2 F/4
F: C/2 E/4 G/1
G: C/5 F/1

85 Computing Routes Edsger Dijkstra’s algorithm: –calculate tree of shortest paths from self to each –also calculate cost from self to each –Algorithm: step 0: put (SELF, 0) on tree step 1: look at LSP of node (N,c) just put on tree. If for any nbr K, this is best path so far to K, put (K, c+dist(N,K)) on tree, child of N, with dotted line step 2: make dotted line with smallest cost solid, go to step 1

86 Look at LSP of new tree node (LSP database as on slide 84) Tree: C(0) B(2) F(2) G(5)

87 Make shortest TENT solid Tree: C(0) B(2) F(2) G(5)

88 Look at LSP of newest tree node Tree: C(0) B(2) F(2) G(5) E(6) G(3)

89 Make shortest TENT solid Tree: C(0) B(2) F(2) E(6) G(3)

90 Look at LSP of newest tree node Tree: C(0) B(2) F(2) E(3) G(3) A(8)

91 Make shortest TENT solid Tree: C(0) B(2) F(2) E(3) G(3) A(8)

92 Look at LSP of newest tree node Tree: C(0) B(2) F(2) E(3) G(3) A(8) D(5)

93 Make shortest TENT solid Tree: C(0) B(2) F(2) E(3) G(3) A(8) D(5)

94 Look at newest tree node’s LSP Tree: C(0) B(2) F(2) E(3) G(3) A(8) D(5)

95 Make shortest TENT solid Tree: C(0) B(2) F(2) E(3) G(3) A(8) D(5)

96 Look at newest node’s LSP Tree: C(0) B(2) F(2) E(3) G(3) A(8) D(5) A(7)

97 Make shortest TENT solid Tree: C(0) B(2) F(2) E(3) G(3) D(5) A(7)

98 We’re done! Tree: C(0) B(2) F(2) E(3) G(3) D(5) A(7)
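
The walk-through above can be checked with a short program. This is a sketch using a priority queue for the tentative (TENT) entries rather than the dotted and solid lines of the slides; the LSP database is the one from the slides, rooted at C, and the function name is an assumption.

```python
import heapq

# LSP database from the slides: node -> {neighbor: cost}.
LSP_DATABASE = {
    "A": {"B": 6, "D": 2},
    "B": {"A": 6, "C": 2, "E": 1},
    "C": {"B": 2, "F": 2, "G": 5},
    "D": {"A": 2, "E": 2},
    "E": {"B": 1, "D": 2, "F": 4},
    "F": {"C": 2, "E": 4, "G": 1},
    "G": {"C": 5, "F": 1},
}


def shortest_paths(lsps, root):
    """Return (cost from root to each node, parent of each node in the tree)."""
    cost, parent = {}, {}
    tentative = [(0, root, None)]                 # step 0: put (SELF, 0) on tree
    while tentative:
        c, node, via = heapq.heappop(tentative)   # step 2: smallest tentative cost
        if node in cost:                          # already reached by a better path
            continue
        cost[node], parent[node] = c, via         # make it solid
        for nbr, link in lsps[node].items():      # step 1: look at this node's LSP
            if nbr not in cost:
                heapq.heappush(tentative, (c + link, nbr, node))
    return cost, parent


cost, parent = shortest_paths(LSP_DATABASE, "C")
for node in sorted(cost, key=cost.get):
    print(f"{node}({cost[node]}) via {parent[node]}")
```

The forwarding table follows from `parent`: to reach a destination, walk the parent chain back to the child of the root and forward to that neighbor.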

99 Another interesting detail of link state: Pseudonodes Since the cost of the routing algorithm is proportional to the number of links, if an Ethernet with 100s of nodes were considered fully connected, the link state database would be too large

100 Pseudonodes Instead of: [figure] Use pseudonode: [figure]

101 Designated Routers Elect a router to be the master of the link It names the pseudonode –In IS-IS, a node’s ID is 7 bytes: 6 bytes of system ID (usually the MAC address of one of its ports), plus an extra byte. E.g., R1 is DR, names link R1.25 All routers (including R1) claim a link to R1.25 R1 (pretending to be the pseudonode) claims connectivity to each of the routers on the link
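
A quick count shows why the pseudonode matters. The sketch below counts links in the topology the routing computation sees, once with the Ethernet modeled as fully connected and once with every router linked only to the pseudonode; the function names are assumptions.

```python
def links_full_mesh(n):
    """Links if each of n routers on the LAN claims every other as a neighbor."""
    return n * (n - 1) // 2


def links_with_pseudonode(n):
    """Links if each of n routers claims only a link to the pseudonode."""
    return n


for n in (10, 100, 500):
    print(n, links_full_mesh(n), links_with_pseudonode(n))
```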

102 Distance vector vs link state Memory: distance vector wins (but memory is cheap) Computation: debatable Simplicity of coding: simple distance vector wins. Complex new-fangled distance vector, no Convergence speed: link state Functionality: link state; custom routes, mapping the net, troubleshooting, sabotage-proof routing

103 Specific Routing Protocols Interdomain vs Intradomain Intradomain: –link state (OSPF, IS-IS) –distance vector (RIP) Interdomain –BGP

104 BGP (Border Gateway Protocol) “Policies”, not just minimize path “Path vector”: given reported paths to D from each nbr, and configured preferences, choose your path to D –don’t ever route through domain X, or not to D, or only as last resort Other policies: don’t tell nbr about D, or lie to nbr about D making path look worse

105 Interesting use of BGP Lifeguard: Locating Internet Failures Effectively and Generating Usable Alternative Routes Dynamically Work at University of Washington: –Ethan Katz-Bassett, David Choffnes, Colin Scott, Arvind Krishnamurthy, Tom Anderson If you want others to avoid ASx, claim ASx is already in the path!

106 Path vector/Distance vector Distance vector –Each router reports to its neighbors {(D,cost)} –Each router chooses best path based on min (reported cost to D+link cost to nbr) Path vector –Each rtr R reports {(D,list of AS’s in R’s chosen path to D)…} –Each rtr chooses best path based on configured policies
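
Path vector selection can be sketched as below. Real BGP route selection has many more steps; this example only shows the two ideas on the slides: a router rejects any reported path that already contains its own AS, and configured policy (here a simple "avoid" set, then shortest AS path) picks among the rest. The AS names, the `avoid` parameter, and the shortest-path tiebreak are assumptions for illustration.

```python
def choose_path(my_as, offers, avoid=frozenset()):
    """offers: neighbor -> AS path that neighbor reports to destination D
    (the neighbor's own AS first). Returns the path this AS would report,
    or None if no offer is acceptable."""
    candidates = []
    for path in offers.values():
        if my_as in path:            # our AS is already in the path: a loop
            continue
        if avoid & set(path):        # policy: never route through these ASes
            continue
        candidates.append([my_as] + path)
    return min(candidates, key=len) if candidates else None


offers = {
    "AS2": ["AS2", "AS9"],
    "AS3": ["AS3", "AS7", "AS9"],
    "AS4": ["AS4", "AS1", "AS9"],    # contains AS1, so AS1 must reject it
}
print(choose_path("AS1", offers))
print(choose_path("AS1", offers, avoid={"AS2"}))
```

The loop check is what makes the Lifeguard trick work: an announcement that already lists ASx in its path will be rejected by ASx itself, so traffic is steered around it.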

107 BGP Configuration path preference rules which nbr to tell about which destinations how to “edit” the path when telling nbr N about prefix P (add fake hops to discourage N from using you to get to P)

108 So, world is confusing, what with layer 2 and layer 3

109 So, world is confusing, what with layer 2 and layer 3 So let’s invent layer 2 ½!

110 What’s wrong with bridges? Suboptimal routing Traffic concentration Temporary loops real dangerous (no hop count, exponential proliferation) Fragile –If lose packets (congestion?), turn on port

111 Why not replace bridges with IP routers? Subtle reason: IP needs address per link. Layer 3 doesn’t have to work that way –CLNP / DECnet Bottom level of routing is a whole cloud with the same prefix Routing is to endnodes inside the cloud Enabled by “ES-IS” protocol, where endnodes periodically announce themselves to the routers Also in ES-IS: routers announce themselves to endnodes…

112 Hierarchy [Figure, two panels: “One prefix per link” and “One prefix per campus”; prefix labels 2*, 25*, 28*, 292*, 22*, 293*, 2*]

113 A bit of history 1992…Internet could have adopted CLNP Easier to move to a new layer 3 back then –Internet smaller –Not so mission critical –IP hadn’t yet (out of necessity) invented DHCP, NAT, so CLNP gave understandable advantages CLNP still has advantages over IPv6 (e.g., large multilink level 1 clouds)

114 TRILL working group in IETF TRILL= TRansparent Interconnection of Lots of Links Use layer 3 routing, and encapsulate with a civilized header But still look like a bridge from the outside

115 Goal Design so that change can be incremental With TRILL, replace any subset of bridges with RBridges –still looks to IP like one giant Ethernet –the more bridges you replace with RBridges, better bandwidth utilization, more stability

116 Run link state protocol So all the RBridges know how to reach all the other RBridges But don’t know anything about endnodes

117 Why link state? Since all switches know the complete topology, easy to compute lots of trees deterministically (we’ll get to that later) Easy to piggyback “nickname allocation protocol” (we’ll get to that later)

118 Routing inside campus First RB encapsulates to last RB –So header is “safe” (has hop count) –Inner RBridges only need to know how to reach destination RBridge Still need tree for unknown/multicast –But don’t need spanning tree protocol – compute tree(s) deterministically from the link state database

119 Rbridging [Figure: RBridges R1 through R7, with endnodes a and c]

120 Details What the encapsulated packet looks like How R1 knows that R2 is the correct “last RBridge”

121 Encapsulated Frame (Ethernet) [Figure: outer header (dest (nexthop), srce (Xmitter), Ethertype=TRILL), then TRILL header (first RBridge, last RBridge, TTL), then original frame] TRILL header specifies RBridges with 2-byte nicknames
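
The encapsulation can be modeled with a small record type. This is a conceptual sketch only: it uses the field names from the slide, not the actual TRILL wire format, and the function names, the default TTL, and the string addresses are assumptions.

```python
from dataclasses import dataclass, replace
from typing import Optional

MAX_NICKNAME = 0xFFFF  # nicknames are 2 bytes


@dataclass(frozen=True)
class TrillFrame:
    outer_dest: str        # next-hop RBridge; rewritten at every hop
    outer_src: str         # transmitting RBridge; rewritten at every hop
    first_rbridge: int     # ingress nickname; fixed end to end
    last_rbridge: int      # egress nickname; fixed end to end
    ttl: int               # hop count; makes temporary loops survivable
    original_frame: bytes  # the endnode's Ethernet frame, untouched


def encapsulate(original_frame, me, next_hop, ingress, egress, ttl=16):
    """Done by the first RBridge."""
    for nickname in (ingress, egress):
        if not 0 <= nickname <= MAX_NICKNAME:
            raise ValueError("nickname must fit in 2 bytes")
    return TrillFrame(next_hop, me, ingress, egress, ttl, original_frame)


def relay(frame, me, next_hop) -> Optional[TrillFrame]:
    """Done by each transit RBridge: new outer header, decremented hop count."""
    if frame.ttl == 0:
        return None  # hop count exhausted: drop instead of looping forever
    return replace(frame, outer_dest=next_hop, outer_src=me, ttl=frame.ttl - 1)


frame = encapsulate(b"original", me="R1", next_hop="R3", ingress=1, egress=2)
print(relay(frame, me="R3", next_hop="R2"))
```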

122 2-byte Nicknames Saves hdr room, faster fwd’ing Dynamically acquired Choose unused #, announce in LSP If collision, IDs and priorities break tie Loser chooses another nickname Configured nicknames higher priority

123 How does R1 know that R2 is the correct “last RBridge”? If R1 doesn’t, R1 sends packet through a tree When R2 decapsulates, it remembers (ingress RBridge, source MAC)

125 How does R1 know that R2 is the correct “last RBridge”? Original design –R1 responsible for learning its attached endnodes, and advertising to the other RBs People more familiar with layer 2 wanted –R2 (decapsulating RB) sees “first RB=R1, source MAC =a” and learns that “a” is attached to R1

126 Compromise Mandatory for last RB to learn from data Optional for an RB to advertise its endnodes Optional to learn from advertisements Advertised info “more definitive” than seeing source address in data packets Reasoning –In some cases learning local endnodes is more definitive

127 What if R1 doesn’t know that R2 is the correct “last RBridge”? If R1 doesn’t, R1 sends packet through a tree When R2 decapsulates, it remembers (ingress RBridge, source MAC)

128 Trees Calculate based on link state database (not by running the spanning tree protocol) Original design: one tree. WG wanted to multipath multidestination frames, so TRILL calculates some # of trees, and ingress RB selects which tree

129 Use of “first” and “last” RBridge in TRILL header For Unicast, obvious –Route towards “last” RBridge –Learn location of source from “first” RBridge For Multicast/unknown destination –Use of “first” to learn location of source endnode to do “RPF check” on multicast –Use of “last” To allow first RB to specify a tree Campus calculates some number of trees

130 Multiple trees R1 specifies which tree (yellow, red, or blue) [Figure: three colored trees, with R1 marked]

131 RPF check RPF=reverse path forwarding For safety on multidestination frames…do sanity check: Could this frame have arrived on this tree, on this port, from this ingress RB?

132 Filtering of Multidestination Frames Sometimes a frame need not be “spanning”, i.e., it need not be delivered everywhere Filtering is optional Two things that can help limit the spread (no RBs along a branch of the tree need to see the pkt) –VLAN –IP multicast group

133 Summary Tree distribution Each RB, say RB1, calculates which ports are in tree X (for each of the several trees) For tree X, for each port in tree X –RPF info: Which ingress RBs on that port –Filtering info: Which VLANs, which IP multicast addresses, that this port leads to If RPF=true, and that branch leads to receivers, transmit on that port
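
The per-tree forwarding decision summarized above can be sketched as a lookup. The table layout (a dict per port holding the expected ingress RBridges and the reachable VLANs) and all names and values are assumptions for illustration; IP multicast filtering is left out to keep the example short.

```python
def ports_to_transmit(tree, in_port, ingress_rb, vlan):
    """tree: port -> {"rpf": ingress RBs whose frames may arrive on that port,
                      "vlans": VLANs with receivers down that branch}.
    Returns the ports on which to transmit a multidestination frame."""
    arrival = tree.get(in_port)
    if arrival is None or ingress_rb not in arrival["rpf"]:
        return set()                       # RPF check failed: drop the frame
    return {
        port
        for port, info in tree.items()
        if port != in_port and vlan in info["vlans"]   # filter by VLAN
    }


# Hypothetical state of one RBridge for one tree.
tree_x = {
    1: {"rpf": {"RB1"}, "vlans": {10}},
    2: {"rpf": {"RB2", "RB3"}, "vlans": {10, 20}},
    3: {"rpf": set(), "vlans": {20}},
}
print(ports_to_transmit(tree_x, in_port=1, ingress_rb="RB1", vlan=10))
print(ports_to_transmit(tree_x, in_port=2, ingress_rb="RB1", vlan=10))
```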

134 Some of the Future Work Taming various types of broadcast traffic (ARP, NETBIOS) –Cache ARP replies and negative responses –Query a directory which stores (IP, MAC, switch nickname) Could be ingress switch, hypervisor, or end node Pseudonode nickname OAM Increasing the number of VLANs

135 Algorhyme v2
I hope that we shall one day see
A graph more lovely than a tree.
A graph to boost efficiency
While still configuration-free.
A network where RBridges can
Route packets to their target LAN.
The paths they find, to our elation,
Are least cost paths to destination.
With packet hop counts we now see,
The network need not be loop-free.
RBridges work transparently.
Without a common spanning tree.
Ray Perlner

136 Wrap-up folklore of protocol design things too obvious to say, but everyone gets them wrong

137 Forward Compatibility Reserved fields –spare bits –ignore them on receipt, set them to zero. Can maybe be used for something in the future TLV encoding –type, length, value –so can skip new TLVs –maybe have range of T’s to ignore if unknown, others to drop packet
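
The TLV idea can be made concrete with a small parser. This sketch assumes a 1-byte type and a 1-byte length, which is one common choice and not something the slide specifies; the type numbers and the function name are made up.

```python
def parse_tlvs(data, known_types):
    """Parse type-length-value fields: 1-byte type, 1-byte length, then value.
    Unknown types are skipped using their length, so an old node can still
    parse a packet that carries TLVs defined after it was written."""
    known, skipped = {}, []
    offset = 0
    while offset < len(data):
        if offset + 2 > len(data):
            raise ValueError("truncated TLV header")
        tlv_type, length = data[offset], data[offset + 1]
        value = data[offset + 2 : offset + 2 + length]
        if len(value) != length:
            raise ValueError("truncated TLV value")
        if tlv_type in known_types:
            known[tlv_type] = value
        else:
            skipped.append(tlv_type)       # ignore, but keep parsing
        offset += 2 + length
    return known, skipped


packet = bytes([1, 2, 0xAA, 0xBB,   # type 1, length 2
                99, 3, 1, 2, 3,     # type 99: not known to this node
                2, 1, 0x07])        # type 2, length 1
print(parse_tlvs(packet, known_types={1, 2}))
```

A range of types meaning "drop the packet if you don't understand this" would be one extra check before the skip.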

138 Forward Compatibility Make fields large enough –IP address, packet identifier, TCP sequence # Version number –what is “new version” vs “new protocol”? same lower layer multiplex info –therefore, must always be in same place! –drop if version # bigger

139 Fancy version # variants Might be security threat to trick two Vn nodes into talking V(n-1) So maybe have “highest version I support” in addition to “version of this packet” Or just a bit “I can support higher” (we did this for IKEv2) Maybe have “minor version #”, for compatible changes. Old node ignores it

140 Version # Nobody seems to do this right IP, IKEv1, SSL unspecified what to do if version # different. Most implementations ignore it. SSL v3 moved version field! –v2 sets it to 0.2. v3 sets (different field) to 3.0. –v2 node will ignore version number field, and happily parse the rest of the packet

141 Avoid “flag days” Want to be able to migrate a running network ARPANET routing: ran both routing algorithms (but they had to compute the same forwarding table) –initially forward based on old, compute both –one by one: forward based on new –one-by-one: delete old

142 Parameters Minimize these: –someone has to document it –customer has to read documentation and understand it How to avoid –architectural constants if possible –automatically configure if possible

143 Settable Parameters Make sure they can’t be set incompatibly across nodes, across layers, etc. (e.g., hello time and dead timer) Make sure they can be set at nodes one at a time and the net can stay running

144 Parameter tricks IS-IS –pairwise parameters reported in “hellos” –area-wide parameters reported in LSPs Bridges –Use Root’s values, sent in spanning tree msgs

145 Summary If things aren’t simple, they won’t work Good engineering requires understanding tradeoffs and previous approaches. It’s never a “waste of time” to answer “why is something that way” Don’t believe everything you hear Know the problem you’re solving before you try to solve it!