INF5050 – Protocols and Routing in Internet (Friday ) Presented by Tor Skeie Subject: IP-router architecture
Nick McKeown2 This presentation is based on slides from Nick McKeown, with updates Nick McKeown Professor of Electrical Engineering and Computer Science, Stanford University Stanford High Performance Networking group:
Nick McKeown3 Outline Background What is a router? Why do we need faster routers? Why are they hard to build? Architectures and techniques The evolution of router architecture. IP address lookup. Packet buffering. Switching. The Future
Nick McKeown4 What is Routing? R3 A B C R1 R2 R4D E F R5 F R3E D Next HopDestination DD
Nick McKeown5 What is Routing? R3 A B C R1 R2 R4D E F R5 F R3E D Next HopDestination D D D D Data Options (if any) Destination Address Source Address Header ChecksumProtocolTTL Fragment Offset Flags Fragment ID Total Packet LengthT.ServiceHLenVer 20 bytes
Nick McKeown6 Points of Presence (POPs) A B C POP1 POP3 POP2 POP4 D E F POP5 POP6 POP7 POP8
Nick McKeown7 (140 Gb/s) Where High Performance Routers are Used R10 R11 R4 R13 R9 R5 R2 R1 R6 R3 R7 R12 R16 R15 R14 R8 (2.5 Gb/s) (140 Gb/s)
Nick McKeown8 What a Router Looks Like Cisco CRS-3 (CRS-1 16 slot single-shelf on picture) Juniper M320 (M160 on picture) 2.14m 0.60m 0.91m Capacity: 4.48Tb/s Power: 12.3kW Weight: 723kg 0.88m 0.65m 0.44m Capacity: 160Gb/s Power: 3.5kW Capacity is sum of rates of linecards
Nick McKeown9 A fully configured CRS-3 has the capacity of 322 Tb/s "The Cisco CRS-3 triples the capacity of its predecessor, the Cisco CRS-1 Carrier Routing System, with up to 322 Terabits per second, which enables the entire printed collection of the Library of Congress to be downloaded in just over one second; every man, woman and child in China to make a video call, simultaneously; and every motion picture ever created to be streamed in less than four minutes.”
Alcatel 7670 RSP Juniper TX8/T640 TX8 Chiaro Avici TSR Some Multi-rack Routers
Nick McKeown11 Generic Router Architecture Lookup IP Address Update Header Header Processing DataHdrDataHdr 1M prefixes Off-chip DRAM Address Table Address Table IP AddressNext Hop Queue Packet Buffer Memory Buffer Memory 1M packets Off-chip DRAM
Nick McKeown12 Generic Router Architecture Lookup IP Address Update Header Header Processing Address Table Address Table Lookup IP Address Update Header Header Processing Address Table Address Table Lookup IP Address Update Header Header Processing Address Table Address Table DataHdrDataHdrDataHdr Buffer Manager Buffer Memory Buffer Memory Buffer Manager Buffer Memory Buffer Memory Buffer Manager Buffer Memory Buffer Memory DataHdrDataHdrDataHdr
Nick McKeown13 Why do we Need Faster Routers? 1.To prevent routers becoming the bottleneck in the Internet. 2.To increase POP capacity, and to reduce cost, 1. size and 2. power.
Nick McKeown14 Why we Need Faster Routers 1: To prevent routers from being the bottleneck 0, Fiber Capacity (Gbit/s) TDMDWDM Packet processing PowerLink Speed 2x / 18 months2x / 7 months Source: SPEC95Int & David Miller, Stanford.
Nick McKeown15 POP with large routersPOP with smaller routers Why we Need Faster Routers 2: To reduce cost, power & complexity of POPs Ports: Price >$100k, Power > 400W. It is common for 50-60% of ports to be for interconnection.
Nick McKeown16 Why are Fast Routers Difficult to Make? 1.It’s hard to keep up with Moore’s Law: The bottleneck is memory speed. Memory speed is not keeping up with Moore’s Law.
Nick McKeown17 1.It’s hard to keep up with Moore’s Law: The bottleneck is memory speed. Memory speed is not keeping up with Moore’s Law. Why are Fast Routers Difficult to Make? Speed of Commercial DRAM Moore’s Law 2x / 18 months 1.1x / 18 months Moore’s Law 2x / 18 months
Nick McKeown18 Why are Fast Routers Difficult to Make? 1.It’s hard to keep up with Moore’s Law: The bottleneck is memory speed. Memory speed is not keeping up with Moore’s Law. 2.Moore’s Law is too slow: Routers need to improve faster than Moore’s Law.
Nick McKeown19 Router Performance Exceeds Moore’s Law Growth in capacity of commercial routers: Capacity 1992 ~ 2Gb/s Capacity 1995 ~ 10Gb/s Capacity 1998 ~ 40Gb/s Capacity 2001 ~ 160Gb/s Capacity 2003 ~ 640Gb/s Capacity 2008 ~ 100Tb/s Capacity 2013 ~ 920Tb/s Average growth rate: 2.2x / 18 months, but the last 5 years: 2.8x / months. 2013: The Cisco CRS-X multishelf router has a capacity of 921.6Tb/s (1152 ports)
Nick McKeown20 Outline Background What is a router? Why do we need faster routers? Why are they hard to build? Architectures and techniques The evolution of router architecture. IP address lookup. Packet buffering. Switching. The Future
Nick McKeown21 Route Table CPU Buffer Memory Line Interface MAC Line Interface MAC Line Interface MAC Typically <0.5Gb/s aggregate capacity First Generation Routers Shared Backplane Line Interface CPU Memory
Nick McKeown22 Second Generation Routers Route Table CPU Line Card Buffer Memory Line Card MAC Buffer Memory Line Card MAC Buffer Memory Fwding Cache Fwding Cache Fwding Cache MAC Buffer Memory Typically <5Gb/s aggregate capacity
Nick McKeown23 Third Generation Routers Line Card MAC Local Buffer Memory CPU Card Line Card MAC Local Buffer Memory Switched Backplane Line Interface CPU Memory Fwding Table Routing Table Fwding Table Typically <50Gb/s aggregate capacity
Nick McKeown24 Fourth Generation Routers/Switches Optics inside a router for the first time Switch Core Linecards Optical links 100s of metres Tb/s routers available/in development
Nick McKeown25 Outline Background What is a router? Why do we need faster routers? Why are they hard to build? Architectures and techniques The evolution of router architecture. IP address lookup. Packet buffering. Switching. The Future
Nick McKeown26 Generic Router Architecture Lookup IP Address Update Header Header Processing Address Table Address Table Lookup IP Address Update Header Header Processing Address Table Address Table Lookup IP Address Update Header Header Processing Address Table Address Table Buffer Manager Buffer Memory Buffer Memory Buffer Manager Buffer Memory Buffer Memory Buffer Manager Buffer Memory Buffer Memory Lookup IP Address Address Table Address Table Lookup IP Address Address Table Address Table Lookup IP Address Address Table Address Table
Nick McKeown27 IP Address Lookup Why it’s thought to be hard: 1.It’s not an exact match: it’s a longest prefix match. 2.The table is large: about 550,000 entries today, and growing. 3.The lookup must be fast: about 2ns for a 140Gb/s line.
Nick McKeown28 Longest Prefix Match is Harder than Exact Match The destination address of an arriving packet does not carry with it the information to determine the length of the longest matching prefix Hence, one needs to search among the space of all prefix lengths; as well as the space of all prefixes of a given length
Nick McKeown29 IP Lookups find Longest Prefixes / / / / / / Routing lookup: Find the longest matching prefix (aka the most specific route) among all prefixes that match the destination address.
Nick McKeown30 Address Tables are Large
Nick McKeown31 Lookups Must be Fast
Nick McKeown32 IP Address Lookup Binary tries Example Prefixes: a) b) c) d) 001 e) 0101 f) 011 g) 100 h) 1010 i) 1100 j) e f g h i j 01 a bc d
Nick McKeown33 Multi-bit Tries Depth = W Degree = 2 Stride = 1 bit Binary trie W Depth = W/k Degree = 2 k Stride = k bits Multi-ary trie W/k Time ~ W/k Storage ~ NW/k * 2 k-1 W = longest prefix N = #prefixes
Nick McKeown34 Prefix Length Distribution 99.5% prefixes are 24-bits or shorter Source: Geoff Huston, Oct 2001
Nick McKeown Direct Lookup Trie 0000…… …… bits 8 bits When pipelined, allows one lookup per memory access. Inefficient use of memory, though.
Nick McKeown36 Memory Technology (2010) TechnologySingle chip density $/chip ($/MByte) Access speed Watts/ chip Networking DRAM 64 MB$30-$50 ($0.50-$0.75) 30-80ns0.5-2W SRAM4 MB$20-$30 ($5-$8) 4-8ns1-3W TCAM – Ternary CAM 1 MB$200-$250 ($200-$250) 4-8ns15-30W Note: Price, speed and power are manufacturer and market dependent.
Nick McKeown37 Prefix Expansion with Multi-bit Tries If stride = k bits, prefix lengths that are not a multiple of k need to be expanded PrefixExpanded prefixes 0*00*, 01* 11* E.g., k = 2: Maximum number of expanded prefixes corresponding to one non-expanded prefix = 2 k-1
Nick McKeown38 Outline Background What is a router? Why do we need faster routers? Why are they hard to build? Architectures and techniques The evolution of router architecture. IP address lookup. Packet buffering. Switching. The Future
Routing and Switching Part II
Nick McKeown40 Conceptual architecture Line cards hosting one or more ports Non-blocking switching core(s) Arbitration/Control Bi-directional ports
Nick McKeown41 Conceptual Packet Buffering Non-blocking switching core(s) Arbitration/Control Bi-directional ports Input buffering
Nick McKeown42 Line Card Architecture Lookup IP Address Update Header Header Processing DataHdrDataHdr 1M prefixes Off-chip DRAM Address Table Address Table IP AddressNext Hop Queue Packet Buffer Memory Buffer Memory 1M packets Off-chip DRAM
Nick McKeown43 Arbitration Lookup IP Address Update Header Header Processing Address Table Address Table Lookup IP Address Update Header Header Processing Address Table Address Table Lookup IP Address Update Header Header Processing Address Table Address Table Queue Packet Buffer Memory Buffer Memory Queue Packet Buffer Memory Buffer Memory Queue Packet Buffer Memory Buffer Memory DataHdr DataHdr DataHdr 1 2 N 1 2 N DataHdr DataHdr DataHdr Arbitration
Nick McKeown44 Head of Line Blocking
Nick McKeown45 A Router with Input Queues Head of Line Blocking The best that any queueing system can achieve.
Nick McKeown46 The Best Performance The best that any queueing system can achieve.
Nick McKeown47 Conceptual Packet Buffering Non-blocking switching core(s) Arbitration/Control Bi-directional ports Central buffer
Nick McKeown48 Fast Packet Buffers ( Example: 40Gb/s packet buffer Size = RTT*BW = 10Gb; 40 byte packets Write Rate, R 1 packet every 8 ns Read Rate, R 1 packet every 8 ns Buffer Manager Buffer Memory Use SRAM? + fast enough random access time, but - too low density to store 10Gb of data. Use SRAM? + fast enough random access time, but - too low density to store 10Gb of data. Use DRAM? + high density means we can store data, but - too slow (40ns random access time). Use DRAM? + high density means we can store data, but - too slow (40ns random access time).
Nick McKeown49 DRAM Buffer Memory Packet Caches Buffer Manager SRAM Arriving Packets Departing Packets 12 Q Small ingress SRAM cache of FIFO headscache of FIFO tails Q 2 Small ingress SRAM Q 2 DRAM Buffer Memory b>>1 packets at a time
Nick McKeown50 Conceptual Packet Buffering Non-blocking switching core(s) Arbitration/Control Bi-directional ports Output buffering
Nick McKeown51 Output buffering Lookup IP Address Update Header Header Processing Address Table Address Table Lookup IP Address Update Header Header Processing Address Table Address Table Lookup IP Address Update Header Header Processing Address Table Address Table DataHdrDataHdrDataHdr Buffer Manager Buffer Memory Buffer Memory Buffer Manager Buffer Memory Buffer Memory Buffer Manager Buffer Memory Buffer Memory DataHdrDataHdrDataHdr
Nick McKeown52 Generic Router Architecture Lookup IP Address Update Header Header Processing Address Table Address Table Lookup IP Address Update Header Header Processing Address Table Address Table Lookup IP Address Update Header Header Processing Address Table Address Table Buffer Manager Buffer Memory Buffer Memory Buffer Manager Buffer Memory Buffer Memory Buffer Manager Buffer Memory Buffer Memory Lookup IP Address Address Table Address Table Lookup IP Address Address Table Address Table Lookup IP Address Address Table Address Table
Nick McKeown53 Generic Router Architecture Lookup IP Address Update Header Header Processing Address Table Address Table Lookup IP Address Update Header Header Processing Address Table Address Table Lookup IP Address Update Header Header Processing Address Table Address Table Queue Packet Buffer Memory Buffer Memory Queue Packet Buffer Memory Buffer Memory Queue Packet Buffer Memory Buffer Memory Buffer Manager Buffer Memory Buffer Memory Buffer Manager Buffer Memory Buffer Memory Buffer Manager Buffer Memory Buffer Memory
Nick McKeown54 Speed-up Lookup IP Address Update Header Header Processing Address Table Address Table Lookup IP Address Update Header Header Processing Address Table Address Table Lookup IP Address Update Header Header Processing Address Table Address Table Queue Packet Buffer Memory Buffer Memory Queue Packet Buffer Memory Buffer Memory Queue Packet Buffer Memory Buffer Memory DataHdr DataHdr DataHdr 1 2 N 1 2 N N times line rate
Nick McKeown55 Conceptual Packet Buffering Non-blocking switching core(s) Arbitration/Control Bi-directional ports Input buffering with a virtual output queue
Nick McKeown56 Virtual Output Queues
Nick McKeown57 Matching A matching on a graph is a subset of edges of the graph such that no two of them share a vertex in common. edge vertex
Nick McKeown58 Maximum Weight Matching A 1 (n) N N L NN (n) A 1N (n) A 11 (n) L 11 (n) 11 A N (n) A NN (n) A N1 (n) D 1 (n) D N (n) S*(n) Longest Queue First (LFQ) 100% Throughput Starvation possible Oldest Cell First (OCF)
Nick McKeown59 A Router with Virtual Output Queues The best that any queueing system can achieve.
Nick McKeown60 Outline of Proof
Nick McKeown61 There are now many ways to achieve 100% throughput…
Nick McKeown62 The Evolution of Switching Theory: Practice: Input Queueing (IQ) Input Queueing (IQ) Input Queueing (IQ) Input Queueing (IQ) 58% [Karol, 1987] IQ + VOQ, Maximum weight matching IQ + VOQ, Maximum weight matching IQ + VOQ, Sub-maximal size matching e.g. PIM, iSLIP. IQ + VOQ, Sub-maximal size matching e.g. PIM, iSLIP. 100% [M et al., 1995] Different weight functions, incomplete information, pipelining. Different weight functions, incomplete information, pipelining. Randomized algorithms 100% [Tassiulas, 1998] 100% [Various] Various heuristics, distributed algorithms, and amounts of speedup Various heuristics, distributed algorithms, and amounts of speedup IQ + VOQ, Maximal size matching, Speedup of two. IQ + VOQ, Maximal size matching, Speedup of two. 100% [Dai & Prabhakar, 2000]
Nick McKeown63 Maximal Matching with Speedup Fluid model Traffic must satisfy a strong law of large numbers: The fluid model “washes” out the packet structure, but can still prove stability results.
Nick McKeown64 Fluid Model
Nick McKeown65 Packet Buffers Summary Packet buffers limited by memory bandwidth. Packet buffer caches use hybrid SRAM+DRAM. (State of the art: 10Gb/s line rate.) (Scales to: 40Gb/s line rate.)
Nick McKeown66 Switching Summary Routers use virtual output queues, and a centralized scheduler/arbitrator. State of the art: 10Tb/s. Scales to: 40Tb/s.
Nick McKeown67 Current Internet Router Technology Summary There are three potential bottlenecks: Address lookup, Packet buffering, and Switching. Techniques exist today for: 100+Tb/s Internet routers, with 140Gb/s linecards. But what comes next…?
Nick McKeown68 Outline Background What is a router? Why do we need faster routers? Why are they hard to build? Architectures and techniques The evolution of router architecture. IP address lookup. Packet buffering. Switching. The Future More parallelism. Eliminating schedulers. Introducing optics into routers.
The Future
Nick McKeown70 Fast Links, Slow Routers Processing PowerLink Speed (Fiber) 2x / 2 years2x / 7 months Source: SPEC95Int; Prof. Miller, Stanford Univ.
Nick McKeown71 Fewer Instructions Instructions per packet since 1996
Nick McKeown72 Complex linecards Physical Layer Framing & Maintenance Packet Processing Buffer Mgmt & Scheduling Buffer Mgmt & Scheduling Buffer & State Memory Buffer & State Memory Typical IP Router Linecard 10Gb/s linecard: Number of gates: 30M Amount of memory: 2Gbits Cost: >$20k Power: 300W Lookup Tables Switch Fabric Arbitration Optics
Nick McKeown73 External Parallelism: Multiple Parallel Routers What we’d like: R RR R The building blocks we’d like to use: R R R R NxN IP Router capacity 100s of Tb/s
Nick McKeown74 Multiple parallel routers Load Balancing RR R 1 2 … … k R R R R/k R R R
Nick McKeown75 Intelligent Packet Load-balancing Parallel Packet Switching 1 2 k 1 N rate, R 1 N Router Bufferless R/k Demultiplexor Multiplexor
Nick McKeown76 Parallel Packet Switching Advantages Single-stage of buffering No excess link capacity k power per subsystem k memory bandwidth k lookup rate
Nick McKeown77 Parallel Packet Switching Advantages Load-balancing: output links are less congested Scalability: new router can be dynamically added Redundancy
Nick McKeown78 Parallel Packet Switch Theorem If Speed-up > 2k/(k+2) 2 then a parallel packet switch can precisely emulate a single big router.
Nick McKeown79 Outline Background What is a router? Why do we need faster routers? Why are they hard to build? Architectures and techniques The evolution of router architecture. IP address lookup. Packet buffering. Switching. The Future More parallelism. Eliminating schedulers. Introducing optics into routers. Natural evolution to circuit switching?
Nick McKeown80 Eliminating schedulers Two-Stage Switch [Chang et al., 2001] 1 N 1 N 1 N External Outputs Internal Inputs External Inputs First Round-RobinSecond Round-Robin Load Balancing
Nick McKeown81 Outline Background What is a router? Why do we need faster routers? Why are they hard to build? Architectures and techniques The evolution of router architecture. IP address lookup. Packet buffering. Switching. The Future More parallelism. Eliminating schedulers. Introducing optics into routers. Natural evolution to circuit switching?
Nick McKeown82 Do optics belong in routers? They are already there. Connecting linecards to switches. Optical processing doesn’t belong on the linecard. You can’t buffer light. Minimal processing capability. Optical switching can reduce power.
Nick McKeown83 Optics in routers Switch Core Linecards Optical links
Nick McKeown84 Replacing the switch fabric with optics Candidate technologies 1.MEMs. 2.Fast tunable lasers + passive optical couplers. 3.Diffraction waveguides. 4.Electroholographic materials.
Nick McKeown85 160Gb/s 40Gb/s Optical 2-stage Switch Line termination IP packet processing Packet buffering Line termination IP packet processing Packet buffering Gb/s Gb/s Linecard #1 Linecard # Tb/s IP Router, 625 linecards, each operating at 160Gb/s. The Stanford Phicticious Optical Router
Nick McKeown86 Outline Background What is a router? Why do we need faster routers? Why are they hard to build? Architectures and techniques The evolution of router architecture. IP address lookup. Packet buffering. Switching. The Future More parallelism. Eliminating schedulers. Introducing optics into routers. Natural evolution to circuit switching?
Nick McKeown87 Evolution to circuit switching Optics enables simple, low-power, very high capacity circuit switches. The Internet was packet switched for two reasons: Expensive links: statistical multiplexing. Resilience: soft-state routing. Neither reason holds today.
Nick McKeown88 Outline Background What is a router? Why do we need faster routers? Why are they hard to build? Architectures and techniques The evolution of router architecture. IP address lookup. Packet buffering. Switching. The Future More parallelism. Eliminating schedulers. Introducing optics into routers. Natural evolution to circuit switching?
References
Nick McKeown90 References General 1.J. S. Turner “Design of a Broadcast packet switching network”, IEEE Trans Comm, June 1988, pp C. Partridge et al. “A Fifty Gigabit per second IP Router”, IEEE Trans Networking, N. McKeown, M. Izzard, A. Mekkittikul, W. Ellersick, M. Horowitz, “The Tiny Tera: A Packet Switch Core”, IEEE Micro Magazine, Jan-Feb Fast Packet Buffers 1.Sundar Iyer, Ramana Rao, Nick McKeown “Design of a fast packet buffer”, IEEE HPSR 2001, Dallas.
Nick McKeown91 References IP Lookups 1.A. Brodnik, S. Carlsson, M. Degermark, S. Pink. “Small Forwarding Tables for Fast Routing Lookups”, Sigcomm 1997, pp B. Lampson, V. Srinivasan, G. Varghese. “ IP lookups using multiway and multicolumn search”, Infocom 1998, pp , vol M. Waldvogel, G. Varghese, J. Turner, B. Plattner. “Scalable high speed IP routing lookups”, Sigcomm 1997, pp P. Gupta, S. Lin, N. McKeown. “Routing lookups in hardware at memory access speeds”, Infocom 1998, pp , vol S. Nilsson, G. Karlsson. “Fast address lookup for Internet routers”, IFIP Intl Conf on Broadband Communications, Stuttgart, Germany, April 1-3, V. Srinivasan, G.Varghese. “Fast IP lookups using controlled prefix expansion”, Sigmetrics, June 1998.
Nick McKeown92 References Switching N. McKeown, A. Mekkittikul, V. Anantharam, and J. Walrand. Achieving 100% Throughput in an Input-Queued Switch. IEEE Transactions on Communications, 47(8), Aug A. Mekkittikul and N. W. McKeown, "A practical algorithm to achieve 100% throughput in input-queued switches," in Proceedings of IEEE INFOCOM '98, March L. Tassiulas, “Linear complexity algorithms for maximum throughput in radio networks and input queued switchs,” in Proc. IEEE INFOCOM ‘98, San Francisco CA, April D. Shah, P. Giaccone and B. Prabhakar, “An efficient randomized algorithm for input-queued switch scheduling,” in Proc. Hot Interconnects J. Dai and B. Prabhakar, "The throughput of data switches with and without speedup," in Proceedings of IEEE INFOCOM '00, Tel Aviv, Israel, March 2000, pp C.-S. Chang, D.-S. Lee, Y.-S. Jou, “Load balanced Birkhoff-von Neumann switches,” Proceedings of IEEE HPSR ‘01, May 2001, Dallas, Texas.
Nick McKeown93 References Future C.-S. Chang, D.-S. Lee, Y.-S. Jou, “Load balanced Birkhoff- von Neumann switches,” Proceedings of IEEE HPSR ‘01, May 2001, Dallas, Texas. Pablo Molinero-Fernndez, Nick McKeown "TCP Switching: Exposing circuits to IP" Hot Interconnects IX, Stanford University, August 2001 S. Iyer, N. McKeown, "Making parallel packet switches practical," in Proc. IEEE INFOCOM `01, April 2001, Alaska. I. Keslassy et al. ”Scaling Internet Routers Using Optics” in. Proc. ACM SIGCOMM `03, August 2003, Germany.