Designing Packet Buffers for Internet Routers
Friday, October 23, 2015
Nick McKeown, Professor of Electrical Engineering and Computer Science, Stanford University
2  Contents
1. Motivation: a 100 Tb/s router; a 160 Gb/s packet buffer
2. Theory: the generic packet-buffer problem; optimal memory management
3. Implementation
3  Motivating Design: 100 Tb/s Optical Router
[Figure: 625 electronic linecards (line termination, IP packet processing, packet buffering) connected to a 40 Gb/s-per-port optical switch with central arbitration (request/grant). Each linecard runs at 160 Gb/s; 100 Tb/s = 625 x 160 Gb/s.]
4  Load-Balanced Switch
Three stages on a linecard:
[Figure: 1st stage, segmentation/frame building; 2nd stage, main buffering; 3rd stage, reassembly. Each of the N ports spreads traffic at rate R/N across the N second-stage buffers.]
5  Advantages
Load-balanced switch:
- 100% throughput
- No switch scheduling
Hybrid optical-electrical switch fabric:
- Low (almost zero) power
- Can use an optical mesh
- No reconfiguration of the internal switch (MEMS)
6  160 Gb/s Linecard
[Figure: fixed-size packets flow at rate R through lookup/processing, the 1st-stage load balancer, VOQs, the 2nd-stage switch, and 3rd-stage reassembly/segmentation. Buffer sizes on the datapath: 0.4 Gbit at 3.2 ns access (twice) and 40 Gbit at 3.2 ns.]
7  Contents (section break; next: 2. Theory, the generic packet-buffer problem and optimal memory management)
8  Packet Buffering Problem
Packet buffer for a 160 Gb/s router linecard:
- Write rate R: one 128 B packet every 6.4 ns
- Read rate R: one 128 B packet every 6.4 ns, driven by scheduler requests
- Buffer size: 40 Gbit
The problem is solved if a single memory can perform a random access every 3.2 ns and store 40 Gbit of data.
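The rates on this slide follow directly from the line rate; a quick sanity check of the arithmetic (values taken from the slide):

```python
# Sanity check of the slide's line-rate arithmetic.
LINE_RATE = 160e9       # 160 Gb/s linecard
PACKET_BYTES = 128      # fixed-size 128 B packets

packet_time = PACKET_BYTES * 8 / LINE_RATE   # seconds per packet
print(packet_time)      # 6.4e-09 -> one packet every 6.4 ns

# Each packet time must fit one write (arrival) and one read (departure),
# so the memory needs a random access every 3.2 ns.
access_time = packet_time / 2
print(access_time)      # 3.2e-09
```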
9  Memory Technology
Use SRAM?
+ Fast enough random-access time, but
- Too low density to store 40 Gbit of data.
Use DRAM?
+ High enough density to store the data, but
- Can't meet the random-access time.
10  Can't we just use lots of DRAMs in parallel?
[Figure: the buffer manager stripes data across many DRAMs, so instead of one 128 B access every 6.4 ns per direction, it reads or writes a 1280 B block every 32 ns. Write rate R: one 128 B packet every 6.4 ns; read rate R: one 128 B packet every 6.4 ns, driven by scheduler requests.]
11  Works fine if there is only one FIFO
Aggregate 1280 B for the queue in fast SRAM, then read and write all DRAMs in parallel: arriving 128 B packets accumulate in the buffer manager's on-chip SRAM, and each parallel access moves a full 1280 B block (128 B to or from each DRAM).
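A minimal sketch of this single-FIFO trick (names and structure are mine, chosen for illustration): aggregate b = 1280 B in SRAM, then stripe 128 B to each of 10 DRAMs in one parallel access, so each individual DRAM sees only one access per 1280 B.

```python
# Sketch: aggregate packets in fast SRAM, then write one wide word
# across all DRAMs in parallel (one 128 B chunk per DRAM).
NUM_DRAMS = 10
CHUNK = 128                      # bytes per DRAM per access
B = NUM_DRAMS * CHUNK            # 1280 B aggregated per wide access

drams = [bytearray() for _ in range(NUM_DRAMS)]
sram_tail = bytearray()          # on-chip aggregation buffer

def write_packet(pkt: bytes) -> None:
    """Buffer arriving packets in SRAM; stripe across DRAMs once b bytes accumulate."""
    sram_tail.extend(pkt)
    if len(sram_tail) >= B:
        wide_word, rest = sram_tail[:B], sram_tail[B:]
        for i in range(NUM_DRAMS):   # one parallel access to all DRAMs
            drams[i].extend(wide_word[i * CHUNK:(i + 1) * CHUNK])
        sram_tail[:] = rest

for _ in range(10):              # ten 128 B packets fill one wide word
    write_packet(bytes(128))
print([len(d) for d in drams])   # each DRAM holds 128 B
```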
12  In practice, the buffer holds many FIFOs
e.g. in an IP router, Q might be 200; in an ATM switch, Q might be ...
With Q queues, data arrives for different queues at different rates (1280 B here, 320 B there), so a wide 1280 B block can no longer be filled from a single queue on every access. How can we write multiple packets into different queues?
13  Parallel Packet Buffer: Hybrid Memory Hierarchy
[Figure: an ASIC buffer manager with two small on-chip SRAM caches, one holding the tails of the Q FIFOs (arriving packets at rate R) and one holding the heads (departing packets at rate R, driven by scheduler requests). A large DRAM holds the bodies of the Q FIFOs and is written and read b bytes at a time, where b is the degree of parallelism.]
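A hedged sketch of this hierarchy for one FIFO (class name, tiny block size, and the simplistic refill policy are mine; the actual memory-management algorithm that decides which queue to service is the subject of the next slides):

```python
# Sketch of the hybrid SRAM/DRAM hierarchy: SRAM tail cache collects
# arrivals, DRAM is touched only in b-packet blocks, SRAM head cache
# serves departures.
from collections import deque

B = 4  # block size in packets per DRAM access (small for illustration)

class HybridFifo:
    def __init__(self):
        self.tail = deque()   # SRAM tail cache
        self.body = deque()   # DRAM: holds b-packet blocks
        self.head = deque()   # SRAM head cache

    def enqueue(self, pkt):
        self.tail.append(pkt)
        if len(self.tail) == B:               # one wide DRAM write
            self.body.append([self.tail.popleft() for _ in range(B)])

    def dequeue(self):
        if not self.head:
            if self.body:                     # one wide DRAM read refills head
                self.head.extend(self.body.popleft())
            else:                             # queue is short: serve from tail cache
                self.head.extend(self.tail)
                self.tail.clear()
        return self.head.popleft()

q = HybridFifo()
for i in range(6):
    q.enqueue(i)
print([q.dequeue() for _ in range(6)])   # [0, 1, 2, 3, 4, 5]
```

Packets always depart in FIFO order even though they may live in the tail cache, the DRAM body, or the head cache at different times.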
14  Problem
What is the minimum SRAM size such that every packet is available either immediately or within a fixed latency?
Solutions:
- Qb(2 + ln Q) bytes, for zero latency
- Q(b - 1) bytes, for a latency of Q(b - 1) + 1 time slots
Examples (160 Gb/s linecard, b = 1280, Q = 625):
1. Zero latency: SRAM = 52 Mbit
2. With a latency of 40 ms: SRAM = 6.1 Mbit
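Plugging the slide's parameters into the two formulas reproduces its example figures (interpreting Mbit as 2^20 bits, which appears to be how the 52 and 6.1 Mbit numbers were derived):

```python
import math

# Reproduce slide 14's SRAM-size examples.
Q, b = 625, 1280   # number of queues, bytes per DRAM block

zero_latency_bytes = Q * b * (2 + math.log(Q))   # Qb(2 + ln Q), zero latency
max_latency_bytes  = Q * (b - 1)                 # Q(b - 1), with pipeline latency

print(round(zero_latency_bytes * 8 / 2**20, 1))  # ~51.5 Mbit (slide rounds to 52)
print(round(max_latency_bytes  * 8 / 2**20, 1))  # 6.1 Mbit
```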
15  Discussion
[Plot: SRAM size vs. pipeline latency x, for Q = 1000 and b = 10, falling from the queue length required for zero latency to the queue length at maximum latency.]
16  Contents (section break; next: 3. Implementation)
17  Technology Assumptions in 2005
DRAM:
- Access time ~ 40 ns
- Size ~ 1 Gbit
- Memory bandwidth ~ 16 Gb/s (16 data pins)
On-chip SRAM:
- Access time ~ 2.5 ns
- Size ~ 64 Mbit
Serial links:
- Bandwidth ~ 10 Gb/s
- 100 serial links per chip
18  Packet Buffer Chip (x4): Details and Status
- Incoming: 4 x 10 Gb/s; outgoing: 4 x 10 Gb/s (R/4 per chip)
- 35 pins/DRAM x 10 DRAMs = 350 pins
- SRAM: 3.1 Mbit with 3.2 ns access
- Implementation starts Fall 2003
[Figure: buffer manager with on-chip SRAM fronting the DRAMs.]
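A back-of-envelope check that these per-chip numbers are self-consistent with the technology assumptions above (~16 Gb/s and ~35 pins per DRAM device; the variable names are mine):

```python
# Check slide 18's per-chip budget against slide 17's DRAM assumptions.
dram_bw = 16e9           # bandwidth per DRAM device (b/s)
num_drams = 10

line_in  = 4 * 10e9      # incoming: 4 x 10 Gb/s
line_out = 4 * 10e9      # outgoing: 4 x 10 Gb/s

# Aggregate DRAM bandwidth (160 Gb/s) comfortably covers the
# 80 Gb/s of combined write + read traffic per chip.
print(num_drams * dram_bw >= line_in + line_out)   # True
print(35 * num_drams)                              # 350 DRAM pins, as stated
```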