High Performance Routers. IEE, London, October 18th, 2001. Nick McKeown, Professor of Electrical Engineering and Computer Science, Stanford University. Stanford High Performance Networking group:
Outline. Background: What is a router? Why do we need faster routers? Why are they hard to build? Architectures and techniques: The evolution of router architecture. IP address lookup. Packet buffering. Switching. The Future.
What is Routing? [Figure: hosts A to F attached to a network of routers R1 to R5; a forwarding table with Destination and Next Hop columns for destinations D, E and F; a packet addressed to D.] Each packet carries a 20-byte IP header: Ver, HLen, T.Service, Total Packet Length; Fragment ID, Flags, Fragment Offset; TTL, Protocol, Header Checksum; Source Address; Destination Address; followed by Options (if any) and Data.
Points of Presence (POPs). [Figure: hosts A to F attached to a network of POP1 to POP8.]
Where High Performance Routers are Used. [Figure: a network of routers R1 to R16, with links labelled 2.5 Gb/s.]
What a Router Looks Like. Cisco GSR 12416: dimensions 6ft, 19”, 2ft; Capacity: 160Gb/s; Power: 4.2kW. Juniper M160: dimensions 3ft, 2.5ft, 19”; Capacity: 80Gb/s; Power: 2.6kW.
Generic Router Architecture. [Figure: each packet (Hdr, Data) passes through Header Processing (Lookup IP Address, Update Header), which consults an Address Table mapping IP Address to Next Hop (1M prefixes, off-chip DRAM), and is then queued in Buffer Memory (1M packets, off-chip DRAM). A second view repeats the Header Processing, Address Table, Buffer Manager and Buffer Memory stages once per port.]
Why do we Need Faster Routers? 1. To prevent routers becoming the bottleneck in the Internet. 2. To increase POP capacity, and to reduce cost, size and power.
Why we Need Faster Routers 1: To prevent routers from being the bottleneck. [Figure: Fiber Capacity (Gbit/s) over time, TDM then DWDM.] Packet processing power: 2x / 18 months. Link speed: 2x / 7 months. Source: SPEC95Int & David Miller, Stanford.
Why we Need Faster Routers 2: To reduce cost, power & complexity of POPs. [Figure: a POP with smaller routers compared with a POP with large routers.] Ports: Price >$100k, Power > 400W. It is common for 50-60% of ports to be for interconnection.
Why are Fast Routers Difficult to Make? 1. It’s hard to keep up with Moore’s Law: The bottleneck is memory speed. Memory speed is not keeping up with Moore’s Law. [Figure: Speed of Commercial DRAM, 1.1x / 18 months, against Moore’s Law, 2x / 18 months.] 2. Moore’s Law is too slow: Routers need to improve faster than Moore’s Law.
Router Performance Exceeds Moore’s Law. Growth in capacity of commercial routers: 1992 ~ 2Gb/s; 1995 ~ 10Gb/s; 1998 ~ 40Gb/s; 2001 ~ 160Gb/s; 2003 ~ 640Gb/s. Average growth rate: 2.2x / 18 months.
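The quoted average can be checked from the first and last capacity figures. This is a minimal sketch, assuming growth is modelled as a constant factor per 18-month period and that the 2003 figure is treated like the others; the function name is illustrative only.

```python
def growth_per_period(c_start, c_end, years, period_months=18):
    """Average multiplicative growth per period between two capacity figures."""
    periods = years * 12 / period_months
    return (c_end / c_start) ** (1 / periods)

# Capacities in Gb/s from the slide: ~2 in 1992 and ~640 in 2003.
rate = growth_per_period(2, 640, 2003 - 1992)
print(f"{rate:.2f}x per 18 months")  # close to the 2.2x quoted on the slide
```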
First Generation Routers. [Figure: a Shared Backplane connecting a CPU (with Route Table and Buffer Memory) to Line Interfaces (MAC).] Typically <0.5Gb/s aggregate capacity.
Second Generation Routers. [Figure: a CPU with Route Table and Buffer Memory, and Line Cards each with MAC, Buffer Memory and a Fwding Cache.] Typically <5Gb/s aggregate capacity.
Third Generation Routers. [Figure: a Switched Backplane connecting Line Cards (MAC, Local Buffer Memory, Fwding Table) and a CPU Card (Routing Table).] Typically <50Gb/s aggregate capacity.
Fourth Generation Routers/Switches. Optics inside a router for the first time. [Figure: Linecards connected to a Switch Core by Optical links, 100s of metres long.] Tb/s routers in development.
Generic Router Architecture. [Figure: the per-port architecture again, with the Lookup IP Address stage and Address Table picked out on each port.]
IP Address Lookup. Why it’s thought to be hard: 1. It’s not an exact match: it’s a longest prefix match. 2. The table is large: about 120,000 entries today, and growing. 3. The lookup must be fast: about 30ns for a 10Gb/s line.
IP Lookups find Longest Prefixes. [Figure: example prefixes; their values were lost in extraction.] Routing lookup: Find the longest matching prefix (aka the most specific route) among all prefixes that match the destination address.
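The definition above can be written down directly. This is a minimal sketch using Python's standard `ipaddress` module; the prefixes and next-hop names are hypothetical, since the slide's own example values were lost. It scans every entry, so it shows what a lookup must return, not how a router computes it at line rate.

```python
import ipaddress

def longest_prefix_match(table, address):
    """Return the next hop of the most specific prefix containing address."""
    addr = ipaddress.ip_address(address)
    best = None  # (network, next_hop) of the longest match seen so far
    for prefix, next_hop in table.items():
        net = ipaddress.ip_network(prefix)
        if addr in net and (best is None or net.prefixlen > best[0].prefixlen):
            best = (net, next_hop)
    return best[1] if best else None

# Hypothetical table: three nested prefixes with different next hops.
table = {"10.0.0.0/8": "R1", "10.1.0.0/16": "R2", "10.1.2.0/24": "R3"}
print(longest_prefix_match(table, "10.1.2.3"))
```

All three prefixes match 10.1.2.3, and the /24 is chosen because it is the most specific.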
Address Tables are Large. [Figure not recovered.] Source: Geoff Huston, Oct 2001.
Lookups Must be Fast. [Table: columns Year, Line, line rate (Mb/s to Gb/s) and packets per second (Mpkt/s); most values were lost in extraction. Surviving entry: 125 Mpkt/s at 40Gb/s.]
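The relation between line rate and lookup rate is simple arithmetic. This sketch assumes back-to-back 40-byte packets (the packet size used in the packet buffer example later in the talk) and ignores framing overhead; the line rates listed are ones mentioned elsewhere in the talk, not the lost table rows.

```python
def packets_per_second(line_rate_bps, packet_bytes=40):
    """Worst-case packet rate for back-to-back packets of the given size."""
    return line_rate_bps / (packet_bytes * 8)

def nanoseconds_per_packet(line_rate_bps, packet_bytes=40):
    """Time budget per packet, and so per lookup, in nanoseconds."""
    return 1e9 * packet_bytes * 8 / line_rate_bps

for rate in (2.5e9, 10e9, 40e9):
    mpps = packets_per_second(rate) / 1e6
    print(f"{rate / 1e9:g} Gb/s: {mpps:.2f} Mpkt/s, "
          f"{nanoseconds_per_packet(rate):.0f} ns per lookup")
```

At 10Gb/s this gives a budget of about 32ns, in line with the "about 30ns" quoted earlier, and at 40Gb/s it gives the 125 Mpkt/s that survives in the table.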
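IP Address Lookup: Binary tries. Example Prefixes: a), b), c) (values lost in extraction), d) 001, e) 0101, f) 011, g) 100, h) 1010, i) 1100, j) (value lost in extraction). [Figure: a binary trie with branches labelled 0 and 1 and the prefixes a to j marked at its nodes.]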
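A binary trie lookup walks the address one bit at a time and remembers the last labelled node it passed. The sketch below uses only the example prefixes whose bit patterns survive (d to i), plus one hypothetical prefix `x = 10` added so that a nested match can be shown; the class and function names are illustrative.

```python
class TrieNode:
    def __init__(self):
        self.children = [None, None]  # child for bit 0 and for bit 1
        self.label = None             # set if a prefix ends at this node

def build_trie(prefixes):
    root = TrieNode()
    for label, bits in prefixes.items():
        node = root
        for b in bits:
            i = int(b)
            if node.children[i] is None:
                node.children[i] = TrieNode()
            node = node.children[i]
        node.label = label
    return root

def lookup(root, address_bits):
    """Return the label of the longest prefix matching address_bits."""
    node, best = root, None
    for b in address_bits:
        node = node.children[int(b)]
        if node is None:
            break
        if node.label is not None:
            best = node.label  # a longer match replaces a shorter one
    return best

prefixes = {"d": "001", "e": "0101", "f": "011",
            "g": "100", "h": "1010", "i": "1100",
            "x": "10"}  # "x" is hypothetical, not from the slide
root = build_trie(prefixes)
print(lookup(root, "10100000"), lookup(root, "10110000"))
```

The first address matches both `x` and `h` and returns `h`; the second falls off the trie after `101` and returns the shorter match `x`. One memory access per bit is what makes this structure slow, which motivates the wider tries that follow.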
Prefix Length Distribution. 99.5% of prefixes are 24 bits or shorter. Source: Geoff Huston, Oct 2001.
Direct Lookup Trie. [Figure: a first-level table indexed directly by the leading address bits, from 0000… upward, with a second level covering the remaining 8 bits.] When pipelined, allows one lookup per memory access. Although inefficient use of memory, total memory cost < $20.
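The idea of a direct lookup table can be sketched as follows. The table widths are parameters: a router-scale table would presumably use a 24-bit first level and 8-bit second level for IPv4, which is an assumption based on the prefix-length observation above, while the demonstration uses 8 and 4 bits so that it runs instantly. Class and method names are illustrative, and prefixes are assumed to be no longer than the two levels combined.

```python
class DirectLookupTable:
    """Direct-indexed first level plus second-level blocks for long prefixes."""

    def __init__(self, first_bits, second_bits, prefixes):
        self.first_bits = first_bits
        self.second_bits = second_bits
        # Each first-level entry is a next hop, or a list (second-level block).
        self.level1 = [None] * (1 << first_bits)
        # Insert shortest prefixes first so longer ones overwrite them.
        for bits, hop in sorted(prefixes, key=lambda p: len(p[0])):
            self._insert(bits, hop)

    def _insert(self, bits, hop):
        f, s = self.first_bits, self.second_bits
        if len(bits) <= f:
            # Expand the prefix over every first-level entry it covers.
            span = 1 << (f - len(bits))
            start = (int(bits, 2) if bits else 0) * span
            for i in range(start, start + span):
                self.level1[i] = hop
        else:
            idx = int(bits[:f], 2)
            if not isinstance(self.level1[idx], list):
                # Create a block, pre-filled with the shorter match.
                self.level1[idx] = [self.level1[idx]] * (1 << s)
            rest = bits[f:]
            span = 1 << (s - len(rest))
            start = int(rest, 2) * span
            self.level1[idx][start:start + span] = [hop] * span

    def lookup(self, address):
        """At most two table reads, whatever the prefix length."""
        entry = self.level1[address >> self.second_bits]
        if isinstance(entry, list):
            return entry[address & ((1 << self.second_bits) - 1)]
        return entry

# Scaled-down demo on 12-bit addresses with hypothetical prefixes.
table = DirectLookupTable(8, 4, [("1010", "A"), ("10101111", "B"),
                                 ("1010111101", "C")])
print(table.lookup(0b101011110100))
```

The table spends memory on expanded entries in exchange for a fixed, small number of memory reads per lookup, which is the trade the slide describes.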
IP Address Lookup Summary. Lookup limited by memory bandwidth. Lookup uses high-degree trie. State of the art: 10Gb/s line rate. Scales to: 40Gb/s line rate. By 2008, entire IPv4 address space will fit on one $20 DRAM!
Generic Router Architecture. [Figure: the per-port architecture again, with the Queue, Buffer Manager and Buffer Memory stages picked out on each port.]
Fast Packet Buffers. Example: 40Gb/s packet buffer. Size = RTT*BW = 10Gb; 40 byte packets. Write Rate, R: 1 packet every 8 ns. Read Rate, R: 1 packet every 8 ns. [Figure: Buffer Manager and Buffer Memory.] Use SRAM? + fast enough random access time, but - too low density to store 10Gb of data. Use DRAM? + high density means we can store data, but - too slow (50ns random access time).
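The numbers in this example follow from a few lines of arithmetic. The sketch below uses only figures from the slide; the round-trip time is not stated there and is derived here from Size = RTT*BW, and the constant names are illustrative.

```python
LINE_RATE_BPS = 40e9    # 40Gb/s line
BUFFER_BITS = 10e9      # Size = RTT * BW = 10Gb
PACKET_BITS = 40 * 8    # 40-byte packets
DRAM_ACCESS_NS = 50     # DRAM random access time from the slide

# Time between back-to-back packets on the line.
packet_time_ns = 1e9 * PACKET_BITS / LINE_RATE_BPS

# Round-trip time implied by the buffer size (derived, not on the slide).
implied_rtt_s = BUFFER_BITS / LINE_RATE_BPS

# The memory must absorb one write and one read per packet time.
required_access_ns = packet_time_ns / 2
shortfall = DRAM_ACCESS_NS / required_access_ns

print(f"packet time: {packet_time_ns:.0f} ns, implied RTT: {implied_rtt_s} s")
print(f"DRAM is about {shortfall:.1f}x too slow for per-packet random access")
```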
Packet Caches. [Figure: a Buffer Manager with small SRAM caches holding the tails and heads of Q FIFOs, in front of a DRAM Buffer Memory. Arriving Packets enter the cache of FIFO tails, Departing Packets leave from the cache of FIFO heads, and data moves to and from DRAM b>>1 packets at a time.]
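The caching idea can be illustrated for a single FIFO. This is a simplified sketch, not the design from the talk: it models one queue rather than Q, ignores worst-case timing, and lets a read bypass DRAM when DRAM is empty, which is an assumption made here to keep the example short. The class name is hypothetical.

```python
from collections import deque

class HybridFifo:
    """One FIFO split across a tail cache, DRAM blocks, and a head cache."""

    def __init__(self, b):
        self.b = b                # packets moved per DRAM access
        self.tail = deque()       # SRAM: most recent arrivals
        self.dram = deque()       # DRAM: blocks of exactly b packets
        self.head = deque()       # SRAM: next packets to depart
        self.dram_accesses = 0

    def write(self, pkt):
        self.tail.append(pkt)
        if len(self.tail) == self.b:
            # One wide DRAM write carries b packets.
            self.dram.append([self.tail.popleft() for _ in range(self.b)])
            self.dram_accesses += 1

    def read(self):
        if not self.head:
            if self.dram:
                # One wide DRAM read refills the head cache.
                self.head.extend(self.dram.popleft())
                self.dram_accesses += 1
            else:
                # Nothing in DRAM: serve directly from the tail cache.
                self.head.extend(self.tail)
                self.tail.clear()
        return self.head.popleft() if self.head else None

q = HybridFifo(b=4)
for p in range(10):
    q.write(p)
out = [q.read() for _ in range(10)]
print(out, q.dram_accesses)
```

Twenty packet operations cost four DRAM accesses here, so the slow memory is touched far less often than once per packet while the packets still leave in FIFO order.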
Packet Buffers Summary. Packet buffers limited by memory bandwidth. Packet buffer caches use hybrid SRAM+DRAM. State of the art: 10Gb/s line rate. Scales to: 40Gb/s line rate.
Generic Router Architecture. [Figure: the per-port architecture with ports 1, 2, … N on each side. In the first view the Buffer Memory is labelled "N times line rate"; in the second view a Scheduler is added.]
A Router with Input Queues. [Figure, with the label: The best that any queueing system can achieve.]
A Router with Input Queues: Head of Line Blocking. [Figure, with the label: The best that any queueing system can achieve.]
Head of Line Blocking. [Figure not recovered.]
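Head of line blocking can be seen in a small simulation. This sketch assumes a saturated switch with one FIFO per input, uniformly random destinations, and a random winner among inputs contending for the same output; those modelling choices and the function name are assumptions, chosen to match the setting behind the 58% figure cited later in the talk [Karol, 1987].

```python
import random

def hol_throughput(n_ports, n_slots, seed=0):
    """Fraction of output capacity used by a saturated FIFO input-queued switch."""
    rng = random.Random(seed)
    # Destination of the head-of-line packet at each input.
    hol = [rng.randrange(n_ports) for _ in range(n_ports)]
    delivered = 0
    for _ in range(n_slots):
        contenders = {}
        for inp, out in enumerate(hol):
            contenders.setdefault(out, []).append(inp)
        for out, inputs in contenders.items():
            winner = rng.choice(inputs)          # one packet per output per slot
            hol[winner] = rng.randrange(n_ports)  # next packet moves to the head
            delivered += 1
        # Losing inputs keep the same head packet, blocking what is behind it.
    return delivered / (n_ports * n_slots)

throughput = hol_throughput(n_ports=32, n_slots=2000)
print(f"throughput: {throughput:.2f}")
```

With 32 ports the result settles a little under 0.6, well short of full utilisation even though every input always has packets to send.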
Virtual Output Queues. [Figure not recovered.]
A Router with Virtual Output Queues. [Figure, with the label: The best that any queueing system can achieve.]
Maximum Weight Matching. [Figure: arrivals A_ij(n), for inputs i = 1…N and outputs j = 1…N, enter virtual output queues with occupancies L_ij(n); departures D_1(n) … D_N(n) leave the outputs. The occupancies L_11(n) … L_N1(n) weight a “Request” Graph, from which a Bipartite Match S*(n) is chosen as the Maximum Weight Match.]
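A maximum weight match can be computed by brute force for a very small switch. The sketch below assumes the weight of an edge is the queue occupancy L_ij(n), as the figure suggests, and tries every input-to-output permutation; this is only practical for tiny N and is meant to show what the match is, not how a scheduler would find it. The occupancy values are made up.

```python
from itertools import permutations

def maximum_weight_match(L):
    """Return (match, weight), where match[i] is the output given to input i."""
    n = len(L)
    best_perm, best_weight = None, -1
    for perm in permutations(range(n)):
        weight = sum(L[i][perm[i]] for i in range(n))
        if weight > best_weight:
            best_perm, best_weight = perm, weight
    return list(best_perm), best_weight

# Hypothetical VOQ occupancies: L[i][j] packets queued at input i for output j.
L = [[1, 9, 0],
     [8, 7, 0],
     [0, 2, 3]]
match, weight = maximum_weight_match(L)
print(match, weight)
```

Here input 0 is matched to output 1, input 1 to output 0, and input 2 to output 2, for a total weight of 20.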
Outline of Proof. [Slide content not recovered.]
There are now many ways to achieve 100% throughput…
The Evolution of Switching. Theory: Input Queueing (IQ): 58% [Karol, 1987]; IQ + VOQ, Maximum weight matching: 100% [M et al., 1995]; Different weight functions, incomplete information, pipelining: 100% [Various]; Randomized algorithms: 100% [Tassiulas, 1998]; IQ + VOQ, Maximal size matching, Speedup of two: 100% [Dai & Prabhakar, 2000]. Practice: Input Queueing (IQ); IQ + VOQ, Sub-maximal size matching, e.g. PIM, iSLIP; Various heuristics, distributed algorithms, and amounts of speedup.
Maximal Matching with Speedup: Fluid model. Traffic must satisfy a strong law of large numbers: [equation not recovered]. The fluid model “washes” out the packet structure, but can still prove stability results.
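The condition itself did not survive extraction. In fluid-model analyses of switches such as the one cited later in the talk [Dai & Prabhakar, 2000], a condition of this kind is usually written as below. This is a reconstruction under assumptions, not the slide's own formula: $A_{ij}(n)$ is taken here to be the cumulative number of arrivals at input $i$ for output $j$ up to slot $n$, and $\lambda_{ij}$ the corresponding mean arrival rate.

```latex
\lim_{n \to \infty} \frac{A_{ij}(n)}{n} = \lambda_{ij}
\quad \text{with probability 1, for all } i, j .
```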
Fluid Model. [Slide content not recovered.]
Switching Summary. Routers use virtual output queues, and a centralized scheduler. State of the art: 2.5Tb/s. Scales to: ~10Tb/s. 100% throughput will be standard soon.
Current Internet Router Technology Summary. There are three potential bottlenecks: Address lookup, Packet buffering, and Switching. Techniques exist today for: 10+Tb/s Internet routers, with 40Gb/s linecards. But what comes next…?
Outline. Background: What is a router? Why do we need faster routers? Why are they hard to build? Architectures and techniques: The evolution of router architecture. IP address lookup. Packet buffering. Switching. The Future: More parallelism. Eliminating schedulers. Introducing optics into routers. Natural evolution to circuit switching?
External Parallelism: Multiple Parallel Routers. What we’d like: an NxN IP Router with capacity 100s of Tb/s. [Figure: ports labelled R.] The building blocks we’d like to use: [Figure: blocks labelled R.]
Multiple parallel routers: Load Balancing. [Figure: traffic at rate R spread over routers 1, 2, … k, with links labelled R/k.]
Intelligent Packet Load-balancing: Parallel Packet Switching. [Figure: ports 1 to N at rate R, a Bufferless load-balancing stage, and Routers 1, 2, … k reached over links of rate R/k.]
Parallel Packet Switching Advantages. Single stage of buffering. No excess link capacity. Power per subsystem, memory bandwidth and lookup rate are each reduced by a factor of k.
Parallel Packet Switch Theorem. If S > 2k/(k+2) ≈ 2 then a parallel packet switch can precisely emulate a single big router.
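The bound in the theorem is easy to tabulate. S is not defined on the slide; it is read here as the speedup of the internal links relative to R/k, which is an assumption, and the function name is illustrative.

```python
def required_speedup(k):
    """Threshold 2k/(k+2) that S must exceed for k parallel routers."""
    return 2 * k / (k + 2)

for k in (2, 4, 8, 16, 64, 1024):
    print(f"k = {k:4d}: S > {required_speedup(k):.3f}")
```

The threshold rises with k but never reaches 2, which is why the slide summarises it as approximately 2.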
Eliminating schedulers: Two-Stage Switch [Chang et al., 2001]. [Figure: External Inputs 1 to N, a First Round-Robin stage that performs Load Balancing, Internal Inputs 1 to N, a Second Round-Robin stage, and External Outputs 1 to N.]
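The two round-robin stages can be sketched as fixed, time-varying connection patterns with no scheduler. This is a simplified model under assumptions: stage one connects input i to intermediate port (i + t) mod N in slot t, each intermediate port keeps one queue per output, and stage two connects intermediate port j to output (j + t) mod N. The function name and the traffic pattern are hypothetical.

```python
from collections import deque

def two_stage_switch(arrivals, n):
    """arrivals[t] is a list of (input, output) pairs, at most one per input.

    Returns a list of (slot, (input, output, arrival_slot)) deliveries.
    """
    # voq[mid][out]: packets waiting at intermediate port mid for output out.
    voq = [[deque() for _ in range(n)] for _ in range(n)]
    delivered, pending, t = [], 0, 0
    while t < len(arrivals) or pending:
        # Stage 1: round-robin spreading, independent of destination.
        if t < len(arrivals):
            for inp, out in arrivals[t]:
                voq[(inp + t) % n][out].append((inp, out, t))
                pending += 1
        # Stage 2: round-robin service, one output per intermediate port.
        for mid in range(n):
            out = (mid + t) % n
            if voq[mid][out]:
                delivered.append((t, voq[mid][out].popleft()))
                pending -= 1
        t += 1
    return delivered

n = 4
arrivals = [[(i, (i + 1) % n) for i in range(n)] for _ in range(8)]
deliveries = two_stage_switch(arrivals, n)
print(len(deliveries), "packets delivered")
```

Both connection patterns depend only on the slot number, so no per-slot matching decision is needed.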
Do optics belong in routers? They are already there: connecting linecards to switches. Optical processing doesn’t belong on the linecard: you can’t buffer light, and there is minimal processing capability. Optical switching can reduce power.
Optics in routers. [Figure: Linecards connected to a Switch Core by Optical links.]
Complex linecards. Typical IP Router Linecard: Optics; Physical Layer; Framing & Maintenance; Packet Processing with Lookup Tables; Buffer Mgmt & Scheduling with Buffer & State Memory; Switch Fabric Arbitration. 10Gb/s linecard: Number of gates: 30M. Amount of memory: 2Gbits. Cost: >$20k. Power: 300W.
Replacing the switch fabric with optics. [Figure: typical IP router linecards connected through an electrical Switch Fabric with Arbitration, compared with the same linecards connected through an optical Switch Fabric with Req/Grant Arbitration.] Candidate technologies: 1. MEMs. 2. Fast tunable lasers + passive optical couplers. 3. Diffraction waveguides. 4. Electroholographic materials.
The Stanford Phicticious Optical Router. [Figure: linecards numbered from #1, each performing Line termination, IP packet processing and Packet buffering, attached to an Optical 2-stage Switch; link labels 160Gb/s and 40Gb/s.] An IP Router of 625 linecards, each operating at 160Gb/s (625 x 160Gb/s = 100Tb/s).
Evolution to circuit switching. Optics enables simple, low-power, very high capacity circuit switches. The Internet was packet switched for two reasons: Expensive links: statistical multiplexing. Resilience: soft-state routing. Neither reason holds today.
Fast Links, Slow Routers. Processing power: 2x / 2 years. Link speed (fiber): 2x / 7 months. Source: SPEC95Int; Prof. Miller, Stanford Univ.
Fewer Instructions. [Figure: Instructions per packet since 1996.]
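The trend behind this slide follows from the two doubling times on the previous one. This is a rough sketch under assumptions: instructions available per packet are taken to be proportional to processing power divided by link speed, with fixed packet size and both rates holding steadily; the function name is illustrative.

```python
def relative_instructions_per_packet(months, cpu_doubling=24, link_doubling=7):
    """Instructions available per packet, relative to month 0."""
    cpu_growth = 2 ** (months / cpu_doubling)
    link_growth = 2 ** (months / link_doubling)
    return cpu_growth / link_growth

for years in (1, 3, 5):
    ratio = relative_instructions_per_packet(12 * years)
    print(f"after {years} years: {ratio:.3f} of the starting budget")
```

Under these assumptions the per-packet instruction budget shrinks to a small fraction of its starting value within five years.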
References
References: General
1. J. S. Turner, “Design of a Broadcast packet switching network”, IEEE Trans Comm, June 1988.
2. C. Partridge et al., “A Fifty Gigabit per second IP Router”, IEEE Trans Networking.
3. N. McKeown, M. Izzard, A. Mekkittikul, W. Ellersick, M. Horowitz, “The Tiny Tera: A Packet Switch Core”, IEEE Micro Magazine, Jan-Feb.
References: Fast Packet Buffers
1. Sundar Iyer, Ramana Rao, Nick McKeown, “Design of a fast packet buffer”, IEEE HPSR 2001, Dallas.
References: IP Lookups
1. A. Brodnik, S. Carlsson, M. Degermark, S. Pink, “Small Forwarding Tables for Fast Routing Lookups”, Sigcomm 1997.
2. B. Lampson, V. Srinivasan, G. Varghese, “IP lookups using multiway and multicolumn search”, Infocom 1998.
3. M. Waldvogel, G. Varghese, J. Turner, B. Plattner, “Scalable high speed IP routing lookups”, Sigcomm 1997.
4. P. Gupta, S. Lin, N. McKeown, “Routing lookups in hardware at memory access speeds”, Infocom 1998.
5. S. Nilsson, G. Karlsson, “Fast address lookup for Internet routers”, IFIP Intl Conf on Broadband Communications, Stuttgart, Germany, April 1-3.
6. V. Srinivasan, G. Varghese, “Fast IP lookups using controlled prefix expansion”, Sigmetrics, June 1998.
References: Switching
1. N. McKeown, A. Mekkittikul, V. Anantharam, and J. Walrand, “Achieving 100% Throughput in an Input-Queued Switch”, IEEE Transactions on Communications, 47(8), Aug.
2. A. Mekkittikul and N. W. McKeown, “A practical algorithm to achieve 100% throughput in input-queued switches”, in Proceedings of IEEE INFOCOM '98, March.
3. L. Tassiulas, “Linear complexity algorithms for maximum throughput in radio networks and input queued switches”, in Proc. IEEE INFOCOM '98, San Francisco CA, April.
4. D. Shah, P. Giaccone and B. Prabhakar, “An efficient randomized algorithm for input-queued switch scheduling”, in Proc. Hot Interconnects.
5. J. Dai and B. Prabhakar, “The throughput of data switches with and without speedup”, in Proceedings of IEEE INFOCOM '00, Tel Aviv, Israel, March 2000.
6. C.-S. Chang, D.-S. Lee, Y.-S. Jou, “Load balanced Birkhoff-von Neumann switches”, Proceedings of IEEE HPSR '01, May 2001, Dallas, Texas.
References: Future
1. C.-S. Chang, D.-S. Lee, Y.-S. Jou, “Load balanced Birkhoff-von Neumann switches”, Proceedings of IEEE HPSR '01, May 2001, Dallas, Texas.
2. Pablo Molinero-Fernández, Nick McKeown, “TCP Switching: Exposing circuits to IP”, Hot Interconnects IX, Stanford University, August 2001.
3. S. Iyer, N. McKeown, “Making parallel packet switches practical”, in Proc. IEEE INFOCOM '01, April 2001, Alaska.