1 Architectural Results in the Optical Router Project Da Chuang, Isaac Keslassy, Nick McKeown High Performance Networking Group
2 Internet traffic: x2/yr. Router capacity: x2.2 per 18 months. (5x)
3 Big POPs need big routers. [Diagram: a POP built from many smaller routers vs. a POP built around large routers.] Interfaces: price >$200k, power >400W. About 50-60% of interfaces are used for interconnection within the POP. The industry trend is towards fast (large) routers: a single large router per POP.
4 100Tb/s optical router. Objective: to determine the best way to incorporate optics into routers, pushing technology hard to expose new issues in photonics, electronics, and system design. Motivating example: the design of a 100 Tb/s Internet router. Challenging but not impossible (~100x current commercial systems), and it identifies some interesting research problems.
5 100Tb/s optical router. [Block diagram: electronic linecards #1 through #625, each with line termination, IP packet processing, and packet buffering, connect to a central Optical Switch (160Gb/s per linecard; 40Gb/s links); linecards exchange Request/Grant messages with an Arbitration unit. 100Tb/s = 625 * 160Gb/s.]
6 Research Problems. Linecard: memory bottleneck (address lookup and packet buffering). Architecture: arbitration (computation complexity). Switch fabric: optics (fabric scalability and speed), electronics (switch control and link electronics), packaging (the three-surface problem).
7 The Packet Buffering Problem: packet buffers for a 40Gb/s router linecard. [Diagram: a Buffer Manager fronts a 10Gbit Buffer Memory; write rate R is one 40B packet every 8ns, and read rate R is one 40B packet every 8ns.]
8 Memory Technology Use SRAM? + Fast enough random access time, but - Too low density to store 10Gbits of data. Use DRAM? + High density means we can store data, but - Can’t meet random access time.
9 Can’t we just use lots of DRAMs in parallel? [Diagram: the Buffer Manager stripes each packet across eight parallel Buffer Memories, reading/writing 320B every 32ns; bytes 0-39 go to the first memory, bytes 40-79 to the next, and so on.]
10 Works fine if there is only one FIFO. [Diagram: write rate R, one 40B packet every 8ns; read rate R, one 40B packet every 8ns. The Buffer Manager aggregates arriving 40B packets into 320B blocks for wide writes to the Buffer Memory, and reads 320B blocks that are drained 40B at a time.]
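The single-FIFO case can be sketched in code. This is my own illustration, not the project's implementation: eight 40B packets are batched into one 320B block, so one wide DRAM access every 32ns sustains a line rate of one 40B packet every 8ns.

```python
# Illustrative sketch of wide-word batching for a single FIFO.
# Eight 40B packets are aggregated into one 320B DRAM access, so a DRAM
# with a ~32ns random-access time keeps up with a 40B-every-8ns line rate.

BLOCK = 320   # bytes per DRAM access (8 packets x 40B)
PKT = 40      # bytes per minimum-size packet

class SingleFifoBuffer:
    def __init__(self):
        self.tail = bytearray()   # partially filled write block (on-chip)
        self.dram = []            # full 320B blocks (off-chip DRAM)
        self.head = bytearray()   # partially drained read block (on-chip)

    def write(self, pkt: bytes):
        assert len(pkt) == PKT
        self.tail += pkt
        if len(self.tail) == BLOCK:        # one wide DRAM write per 8 packets
            self.dram.append(bytes(self.tail))
            self.tail = bytearray()

    def read(self) -> bytes:
        if len(self.head) < PKT:
            if self.dram:                  # one wide DRAM read per 8 packets
                self.head += self.dram.pop(0)
            else:                          # drain the unfinished tail block
                self.head += self.tail
                self.tail = bytearray()
        pkt, self.head = self.head[:PKT], self.head[PKT:]
        return bytes(pkt)
```

Because all traffic belongs to one FIFO, packets always leave in the order they arrived, regardless of how they are grouped into blocks.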
11 In practice, the buffer holds many FIFOs (queues 1, 2, ..., Q). E.g., in an IP router, Q might be 200. In an ATM switch, Q might be
[Diagram: write rate R, one 40B packet every 8ns; read rate R, one 40B packet every 8ns; each 320B-wide memory access may now mix packets from different queues.] How can we write multiple packets into different queues?
12 Hybrid Memory Hierarchy. [Diagram: arriving packets at rate R enter a small tail SRAM that caches the FIFO tails (queues 1, 2, ..., Q); blocks of b bytes are written to a large DRAM memory that holds the body of each FIFO, and blocks of b bytes are read into a small head SRAM that caches the FIFO heads, from which an arbiter or scheduler issues requests and departing packets leave at rate R.]
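The hierarchy on this slide can be sketched as follows. This is a simplified illustration with hypothetical names and a naive refill rule; the actual SRAM-minimizing replenishment algorithms are those of the cited HPSR 2001 paper and are not reproduced here.

```python
# Hedged sketch of the hybrid SRAM/DRAM hierarchy: each FIFO keeps a small
# head and tail in SRAM, while full b-byte blocks live in DRAM and are
# moved only in wide, DRAM-friendly transfers.
from collections import deque

B = 8  # packets per DRAM block (illustrative value for "b bytes")

class HybridQueue:
    def __init__(self):
        self.tail_sram = deque()   # newest packets, not yet written to DRAM
        self.dram = deque()        # B-packet blocks (FIFO body)
        self.head_sram = deque()   # oldest packets, prefetched for reading

    def enqueue(self, pkt):
        self.tail_sram.append(pkt)
        if len(self.tail_sram) == B:            # one wide write to DRAM
            self.dram.append([self.tail_sram.popleft() for _ in range(B)])

    def refill(self):
        # Prefetch the next DRAM block into head SRAM (naive rule: refill
        # only when empty; the real algorithms refill ahead of need).
        if self.dram:
            self.head_sram.extend(self.dram.popleft())

    def dequeue(self):
        if not self.head_sram:
            self.refill()
        if not self.head_sram:                   # DRAM empty: serve the tail
            return self.tail_sram.popleft() if self.tail_sram else None
        return self.head_sram.popleft()
```

With Q such queues, the design question the following slide answers is how small the combined head/tail SRAM can be made while still never stalling a read.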
13 160Gb/s Linecard: Packet Buffering Solution. The hybrid solution uses on-chip SRAM and off-chip DRAM. Identified optimal algorithms that minimize the size of the SRAM (12 Mbits). Precisely emulates the behavior of a 40 Gbit SRAM. [Diagram: a 160 Gb/s Queue Manager with on-chip SRAM, backed by DRAM.] klamath.stanford.edu/~nickm/papers/ieeehpsr2001.pdf
14 Research Problems. Linecard: memory bottleneck (address lookup and packet buffering). Architecture: arbitration (computation complexity). Switch fabric: optics (fabric scalability and speed), electronics (switch control and link electronics), packaging (the three-surface problem).
15 100Tb/s optical router. [Block diagram: electronic linecards #1 through #625, each with line termination, IP packet processing, and packet buffering, connect to a central Optical Switch (160Gb/s per linecard; 40Gb/s links); linecards exchange Request/Grant messages with an Arbitration unit. 100Tb/s = 625 * 160Gb/s.]
16 The Arbitration Problem A packet switch fabric is reconfigured for every packet transfer. At 160Gb/s, a new IP packet can arrive every 2ns. The configuration is picked to maximize throughput and not waste capacity. Known algorithms are too slow.
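To see why known algorithms are too slow: crossbar arbitration is a bipartite matching problem between inputs and outputs. The greedy maximal matching below is my own illustration, not the project's algorithm; even this simple scheme inspects up to N^2 request bits, and for N = 625 linecards that is roughly 390,000 checks inside a 2ns packet slot.

```python
# Greedy maximal bipartite matching over a request matrix, as a lower
# bound on arbitration work. Real schedulers (e.g. iterative maximal
# matching) are more sophisticated but face the same O(N^2) input size.

def greedy_match(requests):
    """requests[i][j] is True if input i has a packet for output j.
    Returns a dict {input: output} forming a maximal matching."""
    taken_out, match = set(), {}
    for i, row in enumerate(requests):
        for j, wants in enumerate(row):
            if wants and j not in taken_out:
                match[i] = j          # pair input i with free output j
                taken_out.add(j)
                break
    return match
```

A maximal matching need not be maximum: with requests [[True, True], [True, False]] the greedy pass pairs input 0 with output 0 and leaves input 1 unmatched, wasting capacity, which is why better (and slower) algorithms are needed.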
17 Cyclic Shift? [Diagram: inputs 1 ... N connected to outputs 1 ... N by a rotating cyclic-shift configuration.] Uniform Bernoulli i.i.d. traffic: 100% throughput. Problem: real traffic is non-uniform.
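A cyclic shift can be sketched as a time-varying permutation (illustrative code, assuming the shift connects input i to output (i + t) mod N): every input/output pair is connected exactly once every N slots, i.e. at rate R/N, which matches uniform traffic exactly but under- or over-serves non-uniform traffic.

```python
# Cyclic-shift fabric configuration for time slot t: no scheduling or
# per-packet computation is needed, since the permutation depends only
# on t.

def cyclic_shift(N, t):
    """Return the slot-t configuration as {input: output}."""
    return {i: (i + t) % N for i in range(N)}
```

Over N consecutive slots each of the N^2 input/output pairs appears exactly once, which is why uniform traffic gets 100% throughput with zero arbitration.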
18 Two-Stage Switch. [Diagram: external inputs 1 ... N pass through a load-balancing cyclic shift to internal inputs 1 ... N, then through a switching cyclic shift to external outputs 1 ... N.] 100% throughput for a broad range of traffic types (C.S. Chang et al., 2001).
19 [Diagram: the same two-stage switch, with external inputs, internal inputs, and external outputs 1 ... N connected by cyclic shifts.] Problem: mis-sequencing.
20 Preventing Mis-sequencing. The Full Frames First (FFF) algorithm keeps packets ordered and guarantees a delay bound within the optimum. [Diagram: the two-stage switch (stages 1 ... N) with small coordination buffers running the 'FFF' algorithm and large congestion buffers at the intermediate stage.] Infocom'02: klamath.stanford.edu/~nickm/papers/infocom02_two_stage.pdf
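The two-stage data path can be modeled with a toy simulation (my own heavy simplification; the FFF ordering logic from the Infocom'02 paper is not reproduced here). The first shift sends each arriving packet to an intermediate linecard regardless of its destination, which is what balances any admissible load; virtual output queues at the intermediate stage then feed the second shift.

```python
# Toy two-stage (load-balanced) switch: both stages are cyclic shifts,
# so neither stage performs any scheduling computation.
from collections import deque

class TwoStageSwitch:
    def __init__(self, N):
        self.N = N
        # voq[m][d]: packets held at intermediate input m, destined to d
        self.voq = [[deque() for _ in range(N)] for _ in range(N)]
        self.t = 0

    def step(self, arrivals):
        """arrivals: list of (input, dest) packets for this slot.
        Returns the packets delivered to their outputs this slot."""
        N, t = self.N, self.t
        # Stage 1: load-balancing shift; ignores destinations entirely.
        for i, dest in arrivals:
            mid = (i + t) % N
            self.voq[mid][dest].append((i, dest))
        # Stage 2: switching shift; intermediate m reaches output (m+t)%N.
        delivered = []
        for m in range(N):
            out = (m + t) % N
            if self.voq[m][out]:
                delivered.append(self.voq[m][out].popleft())
        self.t += 1
        return delivered
    ```

Because packets of one flow are spread over N independent intermediate queues with different occupancies, they can overtake each other; this is the mis-sequencing that the coordination buffers and the FFF algorithm exist to prevent.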
21 Conclusions. Packet buffering: emulation of SRAM speed with DRAM density; a packet buffer for a 160 Gb/s linecard is feasible. Arbitration: developed the Full Frames First algorithm; 100% throughput without scheduling.
22 Two-Stage Switch (analysis). [Diagram: external inputs 1 ... N, a first cyclic shift π1(t), internal inputs 1 ... N with arrivals a(t), queue occupancy q(t), and service b(t), a second cyclic shift π2(t), and external outputs 1 ... N.] Traffic rate: Λ = [λij], admissible if Σj λij < R and Σi λij < R. First cyclic shift: spreads each input's traffic evenly over the internal inputs, so the queue at an internal input for output j receives rate (1/N) Σi λij < R/N. Long-term service opportunities exceed arrivals: the second cyclic shift serves each such queue at rate R/N, so every queue's service rate exceeds its arrival rate.