Instructor: Rob Nash Readings: Chapter 3, P&D
We have a limited number of hosts so far
Also, a limited geographical distance
  ◦ As broadcast can only take us so far
We can connect two distant nodes (or networks) via point-to-point connections
  ◦ But we don’t service any nodes in between
We’d like to build a global network, so we must consider hosts that aren’t directly connected.
“Nature seems […] to reach many of her ends by long circuitous routes.” – Rudolph Lotze “Packets are able to reach many different ends by (sometimes) long circuitous routes” ◦ But imagine this dilemma for a second: ◦ How are packets able to navigate an unknown topology? Ether is simple: send to everybody, but again doesn’t scale
Your phone isn’t directly connected to all other phone users Rather, you’re connected to a switch An operator will provide the “directly connected” illusion by configuring a (temporary) link for use in the call In the same vein, computer networks have packet switches ◦ For use in forwarding/switching packets Routing is the process of building a forwarding table (4)
Very broadly defined here as either: Connection-oriented: Like a telephone call, with temporary state stored at each switch ◦ X.25 ◦ ATM Connectionless: Like the postal service, with even less recourse for problems (no RTS, etc.) ◦ IP, UDP Also, we’ll focus on two specific examples of switching ◦ Ethernet & ATM
Forwarding is a table lookup ◦ Given the input port and ID, what is the output port and outgoing ID? Routing is the algorithm that builds the table ◦ A distributed algorithm by nature of the domain ◦ Should be fair ◦ Consider offering a QoS ◦ This has evolved over the history of networks LAN Switching is an evolution of Ethernet Bridging with performance augmentations
CSS432: Switching and Forwarding
Switch Function:
  ◦ Connects two or more network segments
  ◦ Forwards packets from input port to output port
  ◦ Selects a port based on address in packet header
[Figure: a switch with T3 and STS-1 links on its input ports and output ports]
Covers a large geographic area (> 2500m in Ethernet) Support large numbers of hosts (>1024 hosts in Ethernet) Maintain performance (>two packets through a switch) ◦ And for n input ports each with buffer b, we can provide n x b queuing simultaneously Contrast this to Ethernet, where two hosts will compete for the line
Point-to-Point Ethernet MAC Rings A switch adds the star topology to our set ◦ Also, the ability to interconnect any of the above networking technologies As switches may be connected to hosts, or other switches
Switched networks are more scalable than shared-media networks
  ◦ Directly due to their ability to support many hosts at full speed (limited by memory capacity)
And, we can use a switch to combine two disparate networks
  ◦ A SONET STS-3 link and a few T3s
  ◦ Each port runs the appropriate link layer protocol
Switching (or forwarding): receiving incoming packets on an input port and selecting the appropriate output port on which to forward the data
How does a switch make its decision? ◦ This depends on the approach {connectionless, etc} ◦ In general, look at the header of the packet for an identifier (could be a local id, could be an IP addr) Use this to make your decision by looking up the ID in a table, and forward accordingly We’ll start simply with the datagram approach
We can provide unique identifiers to each host on the network (e.g., an address) We also will be interested in providing identifiers to label each input and output port in a switch
Each packet contains enough information to enable any switch to forward it How? Just including the complete destination address in every packet. Each switch will use the destination address as the key in the lookup No connection state (thus no setup) All packets are forwarded independently Node failure and reroute is possible
[Figure: Hosts A through H attached to Switch 1, Switch 2, and Switch 3]
Table at Switch 2:
Dest  Port
A     3
B     0
C     3
D     3
E     2
F     1
G     0
H     0
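A minimal sketch of the datagram lookup at Switch 2, using the table values from the slide. The packet layout (a dict with a "dst" key) and the function name `forward` are assumptions made for illustration only.

```python
# Forwarding table for Switch 2, copied from the slide: destination -> output port.
SWITCH2_TABLE = {"A": 3, "B": 0, "C": 3, "D": 3, "E": 2, "F": 1, "G": 0, "H": 0}

def forward(table, packet):
    """Return the output port for a packet, or None if the destination is unknown.

    Every packet carries its full destination address, so the switch needs
    no per-connection state: the address alone is the lookup key.
    """
    return table.get(packet["dst"])

# Hypothetical packet layout, for illustration only.
packet = {"src": "A", "dst": "F", "payload": b"hello"}
print(forward(SWITCH2_TABLE, packet))
```

Each packet is looked up independently, which is why no setup phase is needed.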
In a simple and static environment, one network operator may know the topology ◦ And, manually install this in switches in the network In a distributed and dynamic environment, no one operator knows the complete topology ◦ Multiple pathways, failing nodes, etc. ◦ This harder problem is routing (Section 4.2) ◦ For now: routing is an assumed background process, and forwarding is a simple lookup
Hosts can send packets at any time (and to anywhere)
  ◦ No setup or teardown
  ◦ All switches can immediately forward this packet, assuming a correct routing table
Hosts don’t know (or care) about the health of the intermediary network or destination node
  ◦ You could send a packet to a machine that just lost power
  ◦ Or, you could send a packet through a network whose switches just lost power
Failures may not catastrophically affect communications if alternate routes exist around failed nodes (and the network updates its tables)
A connection-oriented approach ◦ With a setup, communicate, and teardown phase ◦ This may seem like TCP over IP, but we’ll see this is implemented on top of the connectionless approach Setup: establishing connection state and path through the network ◦ Each subsequent packet will follow this path Forwarding tables use VCIs – Virtual Circuit Identifiers – that help uniquely identify connections at a local switch
Each switch maintains a VC table
The Input Port & VCI uniquely determine a connection
[Figure: Host A, Switch 1, Switch 2, Switch 3, Host B; link labels VCI = 5, VCI = 11, VCI = 7, VCI = 4; one VC table per switch (Switch 1, Switch 2, Switch 3)]
Port (in)  VCI  Port (out)  VCI
0          7    3           4
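A small sketch of virtual-circuit forwarding at one switch. The single table entry is the row shown on the slide; the packet layout and the function name `switch_vc` are assumptions for illustration.

```python
# VC table: (input port, incoming VCI) -> (output port, outgoing VCI).
# The one entry below is the row that appears on the slide.
vc_table = {(0, 7): (3, 4)}

def switch_vc(table, in_port, packet):
    """Forward a packet along its circuit, rewriting the VCI for the next link."""
    key = (in_port, packet["vci"])
    if key not in table:
        return None  # no circuit was set up for this port/VCI pair
    out_port, out_vci = table[key]
    # A VCI only has meaning on one link, so it is replaced at every hop.
    return out_port, {**packet, "vci": out_vci}

print(switch_vc(vc_table, 0, {"vci": 7, "data": "x"}))
```

Note that the key is the pair (port, VCI): the same VCI value may be reused on a different input port for an unrelated connection.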
PVCs – “permanent Virtual Circuits”, which are long-lived (or network operator configured) table entries Signaled: a host may set up or delete a VC dynamically and autonomously
Oracle: How do switches know what outgoing VCI they should use?
  ◦ This data is literally downstream of the current switch!
Answer: We fill this data in “in reverse”, after we’ve built a path from A to B.
  ◦ Then, a setup/connection packet from B to A is sent informing each upstream hop of the VCI it should use
We signal to set up (reserve a VCI entry) and signal to reclaim these resources when done
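The "in reverse" fill can be sketched as below. This is only an illustration under stated assumptions: the forward pass is taken as already done (each switch has picked a VCI for its own incoming link), the switch names and port numbers are made up, and the VCI values 5, 11, 7, 4 are borrowed from the labels on the earlier figure.

```python
def setup_circuit(hops, dest_vci):
    """Fill in VC tables from the destination back toward the source.

    hops: switches in source-to-destination order, each a dict with
          "name", "in_port", "out_port", and "in_vci" (the VCI the switch
          chose for its own incoming link during the forward pass).
    dest_vci: the VCI the destination host chose for its incoming link.
    Returns (tables, vci_for_source).
    """
    tables = {}
    downstream_vci = dest_vci
    for hop in reversed(hops):
        key = (hop["in_port"], hop["in_vci"])
        # The outgoing VCI is whatever the next hop downstream announced.
        tables[hop["name"]] = {key: (hop["out_port"], downstream_vci)}
        # Announce our own incoming VCI to the next hop upstream.
        downstream_vci = hop["in_vci"]
    return tables, downstream_vci

# Hypothetical path A -> S1 -> S2 -> S3 -> B; port numbers are illustrative.
hops = [
    {"name": "S1", "in_port": 2, "out_port": 1, "in_vci": 5},
    {"name": "S2", "in_port": 3, "out_port": 2, "in_vci": 11},
    {"name": "S3", "in_port": 0, "out_port": 3, "in_vci": 7},
]
tables, source_vci = setup_circuit(hops, dest_vci=4)
print(tables, source_vci)
```

The value returned for the source is the VCI that host A stamps on every data packet it sends on this circuit.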
At least one RTT delay before any payload is communicated… ◦ Why? Setup packets differ from payload packets ◦ Since setup contains the full GuID for the destination ◦ So, per-packet overhead is reduced relative to the datagram approach When we do get to send data, much network topology is known in advance ◦ There is a receiver and route to that receiver, and the receiver is ready to accept data
Resources are reserved in advance to avoid contention SWP is used in between node pairs along the circuit Flow control is used to prevent congestion, and new circuits are declined if not enough resources at a switch
Popular with telephony companies in the 80s
Physical medium: POTS links or ISDN
  ◦ ISDN integrates speech and data on the same line
  ◦ Pre-DSL
From Wiki: “X.25 is today to a large extent replaced by less complex protocols, especially the Internet protocol (IP)”
We see the datagram approach is minimal and doesn’t reserve resources in advance ◦ But, it also cannot make the same guarantee that X.25 can We can implement a QoS concept using the connection model, as we set the service level per connection ◦ QoS here: a performance or resource guarantee My packets shouldn’t be delayed (queued) too long My packets will always be accommodated at each switch
Frame Relay is used for VPN construction (4.1.8) ATM is used to link telephony systems across wide areas in a point-to-point configuration
Consider a pair of Ethernets you’d like to connect We could just place a “repeater” terminal that collects all packets on one net and broadcasts them to the other ◦ Shout louder! ◦ This forms an extended LAN ◦ The simplest version does no optimization Note that a “bridge” here could be a host, but it meets our definition of a switch.
Consider a shared-media example Consider the star topology offered by switching ◦ Note that each host has its own dedicated link In the MAC example, link contention is an issue ◦ In the switching example, I can send as much as the switch can forward (or buffer) on my own link
Connecting two or more LANs
  ◦ Repeater: L1 – Physical Layer
      Limitations: <= 2500m and <= 1024 nodes
  ◦ Bridge (or LAN switch): L2 – link layer
      No physical limitations
      Forwarding frames using MAC address
      Static configuration + partial dynamic configuration (Spanning Tree Protocol)
  ◦ Router: L3 – Network Layer
      Routing IP packets using IP address
      Dynamic configuration
[Figure: a bridge with hosts A, B, C on Port 1 and hosts X, Y, Z on Port 2]
Learning Bridges
Do not forward when unnecessary
  ◦ Ex. A frame sent from A to B
Maintain forwarding table
Host  Port
A     1
B     1
C     1
X     2
Y     2
Z     2
Learn table entries based on source address
  ◦ Ex. An entry for A is registered upon receiving a frame from A
  ◦ Ex. When receiving a frame from B, don’t forward to Port 2
Table is an optimization; need not be complete
Entries are expired after a specific period of time
Based on datagram switching
[Figure: a bridge with hosts A, B, C on Port 1 and hosts X, Y, Z on Port 2]
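A rough sketch of the learning behavior for the two-port bridge on this slide. The class and method names are invented for illustration, and entry expiry is left out to keep the sketch short.

```python
class LearningBridge:
    """Toy learning bridge: learns host locations from source addresses."""

    def __init__(self, ports):
        self.ports = set(ports)
        self.table = {}  # host address -> port on which it was last seen

    def receive(self, src, dst, in_port):
        """Return the set of ports on which the frame should be forwarded."""
        self.table[src] = in_port            # learn from the source address
        out = self.table.get(dst)
        if out is None:
            return self.ports - {in_port}    # unknown destination: flood
        if out == in_port:
            return set()                     # same segment: do not forward
        return {out}

bridge = LearningBridge([1, 2])
print(bridge.receive("A", "B", 1))  # B not yet learned, so the frame is flooded
print(bridge.receive("B", "A", 1))  # A is known to be on port 1, so nothing is forwarded
```

Because an unknown destination is simply flooded, the table never has to be complete, which matches the "optimization" remark on the slide.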
How could a network come to have cycles in it? ◦ Perhaps it’s a multi-site distributed net where no one administrator knows the complete topology ◦ Introduced by accident? ◦ More likely: introduced for redundancy! However, Learning Bridges can fail if a cycle exists, so we need a strategy to address graph cycles.
Algorithm deactivates ports to remove cycles
  ◦ The spanning tree determines which bridges to use, and which bridges should “sit out”
Note that a bridge may forward on some ports, but not others
Formalized in the IEEE 802.1 specification
  ◦ Bridges adopt this distributed algorithm (as we’ll see)
Concept: remove edges from your graph until no cycles exist (the tree is a subset of the graph)
  ◦ Oddity: vertices in this graph are both hosts and switches
When the network has settled, certain bridges will be designated to forward packets over their IO ports based on their distance to the root (or ID number if a tie) Other bridges or ports will simply be disabled Each bridge decides the ports over which it will and will not forward frames
Elect the smallest ID as the root
  ◦ Roots always forward over all ports
Each bridge computes the distance between it and the root
  ◦ Usually a per-hop count
Each bridge trades this information with its neighbors, keeping track of “best” paths
  ◦ I.e., shortest hop count in this context
  ◦ Bridges that offer the best paths become designated
Finally, all bridges determine the root feeder, which is the only bridge that forwards to the root
  ◦ Chosen so it is closest to the root
STP Overview
Each bridge has a unique id (e.g., B1, B2, B3)
Select the bridge with the smallest id as root
Select the bridge on each LAN closest to the root as designated bridge (use id to break ties)
Each bridge forwards frames over each LAN for which it is the designated bridge
[Figure: extended LAN with bridges B1-B7 and LANs A-K; B1 is marked as root; annotations: 1 hop, B5 < B7; 1 hop, B4 < B6; 1 hop; 2 hops]
STP Details (use p. 191)
Initially, each bridge believes it is the root
When a bridge learns it is not the root, it stops generating configuration messages
  ◦ In steady state, only the root generates messages
When a bridge learns it is not a designated bridge, it stops forwarding configuration messages
  ◦ In steady state, only designated bridges forward configuration messages
If any bridge does not receive a configuration message after a period of time, it starts generating configuration messages claiming to be the root.
Bridges exchange configuration messages (Y, d, X)
  ◦ Y: the id of the bridge the sender believes to be the root
  ◦ d: #hops from X to Y
  ◦ X: the sending bridge id
[Figure: extended LAN with bridges B1-B7 and LANs A-K, with example messages (1, 0, 1), (1, 1, 2), (1, 1, 5), (1, 0, 1)]
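One way to sketch how a bridge compares (Y, d, X) messages: a message is "better" if it names a smaller root, or the same root at a shorter distance, or ties on both and comes from a smaller sender id. In Python that ordering is exactly tuple comparison. The function names are invented for illustration, and per-port bookkeeping (which ports become designated or disabled) is deliberately left out.

```python
def better(msg, best):
    """True if msg beats best: smaller root id, then fewer hops, then smaller sender id."""
    return msg < best  # lexicographic comparison of (Y, d, X) tuples

def on_receive(my_id, best, msg):
    """Process one configuration message at bridge my_id.

    Returns (new_best, message_to_forward). The forwarded message is None
    when the received message is not better than what the bridge already has.
    """
    if better(msg, best):
        root, dist, _sender = msg
        # Accept the new root and pass it on, one hop farther away.
        return msg, (root, dist + 1, my_id)
    return best, None

# Bridge 3 starts out believing it is the root.
best = (3, 0, 3)
best, out = on_receive(3, best, (2, 0, 2))   # hears from bridge 2
print(best, out)
best, out = on_receive(3, best, (1, 1, 2))   # bridge 2 has since learned of root 1
print(best, out)
best, out = on_receive(3, best, (1, 1, 5))   # same root and distance, larger sender id
print(best, out)
```

The last message is ignored because it ties on root and distance and loses on sender id, which is the tie-break the overview slide describes.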
STP:
  ◦ It won’t forward frames over alternative paths for the sake of:
      Routing around a congested bridge
      Routing along a shorter path, like one from a node on B to another node on K
  ◦ Scales linearly, and uses broadcast mechanism
Bridges in general:
  ◦ Not scalable (“tens of”)
      STP
      Broadcast (forwarding all broadcast/multicast frames in the current practice)
  ◦ Homogeneous networks only (uses network’s frame header)
      Ethernet to Ethernet
      Token ring to Token ring
      ATM to ATM
Idea: Partition LANs using coloring/tiling to limit the number of network segments that will broadcast
“It is never safe to design network software under the assumption that it will run over a single Ethernet segment.” “Bridges happen.” ◦ Drop frames if congested (rare on Ethernet alone) ◦ Frames could be reordered in an extended LAN Not in a singular Ethernet segment
Many ways to build economy & high-end switches ◦ More advanced fabrics are implemented in high-end (core) switches The high level concepts overlap, however One idea: Get a box and a few NICs (DMA) ◦ Not a bad experimentation setup for new protocols ◦ Or cross-protocol examination Not so hot for performance Another idea: Custom Hardware ◦ A shared-memory switch memory with dual ports Crossbar switch Switches that attempt to self-route ( , Batcher & Banyan)
Advantage: flexible because a workstation has a CPU.
Example
  ◦ 33MHz 32-bit I/O bus
      About 1Gbps one way from NIC to main memory
      About 500Mbps for a round trip between NIC and main memory
      Enough to support five 100Mbps Ethernets
  ◦ What if packets are very small, like 64 bytes?
      Suppose the workstation can process 500,000 packets per second (pps).
      Throughput: 500,000 x 64 x 8 = 256Mbps
[Figure: a workstation used as a switch; NICs for LAN A, LAN B, and LAN C, the I/O controller, CPU, and main memory share the I/O bus]
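The arithmetic on this slide can be checked with a few lines. The variable names are made up; the numbers are the ones on the slide, and the slide's 1Gbps and 500Mbps are rounded versions of the exact values computed here.

```python
bus_clock_hz = 33_000_000      # 33MHz I/O bus
bus_width_bits = 32            # 32-bit wide

# Each packet crosses the bus twice: NIC -> memory, then memory -> NIC.
one_way_bps = bus_clock_hz * bus_width_bits      # about 1Gbps
round_trip_bps = one_way_bps // 2                # about 500Mbps

ethernet_bps = 100_000_000
ethernets_supported = round_trip_bps // ethernet_bps

# Small packets: the per-packet processing rate becomes the limit instead of the bus.
packets_per_second = 500_000
packet_bytes = 64
small_packet_bps = packets_per_second * packet_bytes * 8

print(one_way_bps, round_trip_bps, ethernets_supported, small_packet_bps)
```

So with 64-byte packets the workstation tops out at 256Mbps, roughly half of what the bus alone would allow.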
A simple design
Shared bus or memory becomes a bottleneck. (Max. 16 bus masters)
[Figure: input ports and output ports attached to a shared bus, with shared memory and a control processor; DMA from port to port]
Without a collision, all inputs are delivered to each output
All inputs may go to the same output, which causes a collision in the output buffer.
Connection-oriented packet switching
  ◦ Uses signaling (Protocol Q.2931)
WAN, but more recently LANs
Runs on various physical media
  ◦ SONET
  ◦ Shared media such as wireless
  ◦ Shared media like Ethernet (with LANE)
Packets are called cells, which are fixed length (53 bytes)
LAN packets vs. ATM cells
  ◦ Consider also CISC vs. RISC
In this light, certain features of ATM shine
Observations for a short and simple approach:
  ◦ It’s easier to build HW to do simple (short) jobs
  ◦ The processing of data is simpler when fixed format
      RISC ISA commonly has only a few instruction formats
      Off topic: & Dec.Intel.Xerox Ethernet standard
      Meaning: Compatibility can be simpler with a common format
  ◦ Simple and short data {frames, instructions} can often be “trained” or “pipelined”
Observation: homogenous packet length lends to homogenous switching structures ◦ Short and uniform structures can make the task of exploiting parallelism easier Either at the hardware level See simultaneous multithreading Or along protocol stack (simultaneous packet processing, self-organizing streams, etc.) ◦ Uniformity at higher levels tends to promote uniform hardware designs Since this is not custom, often cheaper to build this fast, scalable hardware
Fixed length instructions help to align, fetch, prefetch, optimize, synchronize, reorder etc. ◦ See the original 360 and Robert Tomasulo Variable length instructions are more complex by design, ◦ possibly requiring multiple cycles to fetch a longer instruction And/or more trips across the bus to and from memory All said and done, Ethernet LANs are just as convincing due to their speed, cost, success & adoption rate
Error detection is implemented at endpoints ◦ End-to-end but not at each switch (i.e., at data link layer) Congestion control ◦ Admission control If switches are completely reserved, decline connections
Fixed-size cells can make this easier One Approach: use some SONET overhead to point to the start of the cell in the payload Another Approach: CRC every 5 bytes ◦ If you see no error, you’re likely at an ATM header Repeat this approach looking for the same results every 53 bytes See p.199 for the frame format
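The second approach can be sketched as a scan: slide along the byte stream, treat every 5 bytes as a candidate header, and accept an offset only when the check passes there and again at each following 53-byte step. The CRC details below (generator x^8 + x^2 + x + 1, result XORed with 0x55) are stated here as an assumption about the ATM header check, not taken from the slide; the function names and the `confirm` count are invented for illustration.

```python
CELL = 53  # bytes per ATM cell: 5-byte header + 48-byte payload

def hec(header4):
    """CRC-8 over four header bytes (assumed generator x^8 + x^2 + x + 1), XOR 0x55."""
    crc = 0
    for byte in header4:
        crc ^= byte
        for _ in range(8):
            if crc & 0x80:
                crc = ((crc << 1) ^ 0x07) & 0xFF
            else:
                crc = (crc << 1) & 0xFF
    return crc ^ 0x55

def find_cell_start(stream, confirm=3):
    """Return the first offset where `confirm` consecutive cells pass the header check."""
    for off in range(len(stream) - CELL * confirm + 1):
        if all(
            hec(stream[off + k * CELL: off + k * CELL + 4]) == stream[off + k * CELL + 4]
            for k in range(confirm)
        ):
            return off
    return None

# Build a toy stream: 3 stray bytes, then four cells with a made-up header.
header = bytes([0, 0, 0, 1])
cell = header + bytes([hec(header)]) + bytes(48)
stream = bytes(3) + cell * 4
print(find_cell_start(stream))
```

Requiring several consecutive matches is what makes a chance match in payload data unlikely to be mistaken for a cell boundary.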
Not exhaustive
ATM offers QoS features
ATM offers flow control, LANs are “best effort”
ATM is conservative resource-wise
  ◦ Connectionless protocols are minimalist
ATM can guarantee resources ahead of time
  ◦ Useful esp. for voice-grade guarantees
Fixed-length vs. variable-length packets
No broadcast (natively) vs. only broadcast
Layers were built on top of ATM to support other styles of networks and services
  ◦ AAL 1-2 is for voice-grade guaranteed bit rates
  ◦ AAL 3-4 is for packet data over ATM
      This requires segmentation and reassembly (S&R), since MTU for Ethernet >> 53B
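A bare-bones sketch of segmentation and reassembly, assuming 48 payload bytes per 53-byte cell (5 bytes go to the header). This is not any particular AAL: the function names are made up, and the original length is passed to the receiver directly instead of being carried in a header or trailer field.

```python
PAYLOAD = 48  # payload bytes per 53-byte cell (the other 5 are header)

def segment(packet):
    """Split a packet into 48-byte cell payloads, zero-padding the last one."""
    chunks = [packet[i:i + PAYLOAD] for i in range(0, len(packet), PAYLOAD)]
    if chunks and len(chunks[-1]) < PAYLOAD:
        chunks[-1] = chunks[-1] + bytes(PAYLOAD - len(chunks[-1]))
    return chunks

def reassemble(chunks, length):
    """Join cell payloads in order and strip the padding using the known length."""
    return b"".join(chunks)[:length]

# A 1500-byte packet, about the size of a full Ethernet payload.
packet = bytes(i % 256 for i in range(1500))
cells = segment(packet)
print(len(cells))
```

A 1500-byte packet needs 32 cells, the last of which is mostly padding.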
When packets are being discarded frequently due to lack of resources ◦ arrivalRate > sendRate + bufferSpace for some t
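A toy simulation of the condition above, under simple assumptions: time moves in discrete ticks, the switch sends up to `send_rate` packets per tick, and anything that does not fit in the buffer afterward is dropped. All names and numbers are invented for illustration.

```python
def simulate(arrivals, send_rate, buffer_size):
    """Return the number of packets dropped over the given arrival sequence."""
    queue = 0
    dropped = 0
    for arriving in arrivals:
        queue += arriving
        queue -= min(queue, send_rate)      # transmit what the link allows this tick
        if queue > buffer_size:             # the rest must fit in the buffer
            dropped += queue - buffer_size
            queue = buffer_size
    return dropped

print(simulate([5] * 10, send_rate=5, buffer_size=4))  # arrivals match the send rate
print(simulate([8] * 3, send_rate=5, buffer_size=4))   # sustained overload
```

In the second run the buffer absorbs the first tick of overload, and drops begin once arrivals exceed what can be sent plus what can still be buffered.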