Techniques and problems for

Slides:



Advertisements
Similar presentations
Router Internals CS 4251: Computer Networking II Nick Feamster Spring 2008.
Advertisements

Router Internals CS 4251: Computer Networking II Nick Feamster Fall 2008.
1 Maintaining Packet Order in Two-Stage Switches Isaac Keslassy, Nick McKeown Stanford University.
IP Router Architectures. Outline Basic IP Router Functionalities IP Router Architectures.
Sundar Iyer Winter 2012 Lecture 8a Packet Buffers with Latency EE384 Packet Switch Architectures.
Fast Buffer Memory with Deterministic Packet Departures Mayank Kabra, Siddhartha Saha, Bill Lin University of California, San Diego.
1 Statistical Analysis of Packet Buffer Architectures Gireesh Shrimali, Isaac Keslassy, Nick McKeown
Shivkumar Kalyanaraman Rensselaer Polytechnic Institute 1 High Speed Router Design Shivkumar Kalyanaraman Rensselaer Polytechnic Institute
Router Architecture : Building high-performance routers Ian Pratt
KARL NADEN – NETWORKS (18-744) FALL 2010 Overview of Research in Router Design.
Nick McKeown CS244 Lecture 6 Packet Switches. What you said The very premise of the paper was a bit of an eye- opener for me, for previously I had never.
Making Parallel Packet Switches Practical Sundar Iyer, Nick McKeown Departments of Electrical Engineering & Computer Science,
Analysis of a Statistics Counter Architecture Devavrat Shah, Sundar Iyer, Balaji Prabhakar & Nick McKeown (devavrat, sundaes, balaji,
1 Circuit Switching in the Core OpenArch April 5 th 2003 Nick McKeown Professor of Electrical Engineering and Computer Science, Stanford University
Analyzing Single Buffered Routers Sundar Iyer, Rui Zhang, Nick McKeown (sundaes, rzhang, High Performance Networking Group Departments.
Analysis of a Packet Switch with Memories Running Slower than the Line Rate Sundar Iyer, Amr Awadallah, Nick McKeown Departments.
1 Architectural Results in the Optical Router Project Da Chuang, Isaac Keslassy, Nick McKeown High Performance Networking Group
1 OR Project Group II: Packet Buffer Proposal Da Chuang, Isaac Keslassy, Sundar Iyer, Greg Watson, Nick McKeown, Mark Horowitz
High Performance Networking with Little or No Buffers Yashar Ganjali on behalf of Prof. Nick McKeown High Performance Networking Group Stanford University.
048866: Packet Switch Architectures Dr. Isaac Keslassy Electrical Engineering, Technion The.
IEE, October 2001Nick McKeown1 High Performance Routers Slides originally by Nick McKeown Professor of Electrical Engineering and Computer Science, Stanford.
048866: Packet Switch Architectures Dr. Isaac Keslassy Electrical Engineering, Technion Introduction.
Nick McKeown 1 Memory for High Performance Internet Routers Micron February 12 th 2003 Nick McKeown Professor of Electrical Engineering and Computer Science,
1 EE384Y: Packet Switch Architectures Part II Load-balanced Switches Nick McKeown Professor of Electrical Engineering and Computer Science, Stanford University.
1 Achieving 100% throughput Where we are in the course… 1. Switch model 2. Uniform traffic  Technique: Uniform schedule (easy) 3. Non-uniform traffic,
Analysis of a Memory Architecture for Fast Packet Buffers Sundar Iyer, Ramana Rao Kompella & Nick McKeown (sundaes,ramana, Departments.
1 Growth in Router Capacity IPAM, Lake Arrowhead October 2003 Nick McKeown Professor of Electrical Engineering and Computer Science, Stanford University.
1 IP routers with memory that runs slower than the line rate Nick McKeown Assistant Professor of Electrical Engineering and Computer Science, Stanford.
Nick McKeown CS244 Lecture 7 Valiant Load Balancing.
Professor Yashar Ganjali Department of Computer Science University of Toronto
Optics in Internet Routers Mark Horowitz, Nick McKeown, Olav Solgaard, David Miller Stanford University
Designing Packet Buffers for Internet Routers Friday, October 23, 2015 Nick McKeown Professor of Electrical Engineering and Computer Science, Stanford.
Designing Packet Buffers for Router Linecards Sundar Iyer, Ramana Kompella, Nick McKeown Reviewed by: Sarang Dharmapurikar.
Winter 2006EE384x1 EE384x: Packet Switch Architectures I Parallel Packet Buffers Nick McKeown Professor of Electrical Engineering and Computer Science,
Routers. These high-end, carrier-grade 7600 models process up to 30 million packets per second (pps).
Packet Forwarding. A router has several input/output lines. From an input line, it receives a packet. It will check the header of the packet to determine.
Nick McKeown1 Building Fast Packet Buffers From Slow Memory CIS Roundtable May 2002 Nick McKeown Professor of Electrical Engineering and Computer Science,
1 Performance Guarantees for Internet Routers ISL Affiliates Meeting April 4 th 2002 Nick McKeown Professor of Electrical Engineering and Computer Science,
1 Router Design Bruce Davie with help from Hari Balakrishnan & Nick McKeown.
An Introduction to Packet Switching Nick McKeown Assistant Professor of Electrical Engineering and Computer Science, Stanford University
Nick McKeown Spring 2012 Lecture 2,3 Output Queueing EE384x Packet Switch Architectures.
Winter 2006EE384x1 EE384x: Packet Switch Architectures I a) Delay Guarantees with Parallel Shared Memory b) Summary of Deterministic Analysis Nick McKeown.
Winter 2006EE384x Handout 11 EE384x: Packet Switch Architectures Handout 1: Logistics and Introduction Professor Balaji Prabhakar
Techniques for Fast Packet Buffers Sundar Iyer, Ramana Rao, Nick McKeown (sundaes,ramana, Departments of Electrical Engineering & Computer.
Opticomm 2001Nick McKeown1 Do Optics Belong in Internet Core Routers? Keynote, Opticomm 2001 Denver, Colorado Nick McKeown Professor of Electrical Engineering.
Buffered Crossbars With Performance Guarantees Shang-Tse (Da) Chuang Cisco Systems EE384Y Thursday, April 27, 2006.
SNRC Meeting June 7 th, Crossbar Switch Scheduling Nick McKeown Professor of Electrical Engineering and Computer Science, Stanford University
1 A quick tutorial on IP Router design Optics and Routing Seminar October 10 th, 2000 Nick McKeown
1 How scalable is the capacity of (electronic) IP routers? Nick McKeown Professor of Electrical Engineering and Computer Science, Stanford University
Techniques for Fast Packet Buffers Sundar Iyer, Ramana Rao, Nick McKeown (sundaes,ramana, Departments of Electrical Engineering & Computer.
Packet Switch Architectures The following are (sometimes modified and rearranged slides) from an ACM Sigcomm 99 Tutorial by Nick McKeown and Balaji Prabhakar,
Block-Based Packet Buffer with Deterministic Packet Departures Hao Wang and Bill Lin University of California, San Diego HSPR 2010, Dallas.
The Fork-Join Router Nick McKeown Assistant Professor of Electrical Engineering and Computer Science, Stanford University
Techniques for Fast Packet Buffers Sundar Iyer, Nick McKeown Departments of Electrical Engineering & Computer Science, Stanford.
Network layer (addendum) Slides adapted from material by Nick McKeown and Kevin Lai.
1 Building big router from lots of little routers Nick McKeown Assistant Professor of Electrical Engineering and Computer Science, Stanford University.
scheduling for local-area networks”
EE384Y: Packet Switch Architectures Scaling Crossbar Switches
Buffer Management and Arbiter in a Switch
INF5050 – Protocols and Routing in Internet (Friday )
Weren’t routers supposed
CS 268: Router Design Ion Stoica February 27, 2003.
Packet Forwarding.
Addressing: Router Design
Parallelism in Network Systems Joint work with Sundar Iyer
EE384x: Packet Switch Architectures
Project proposal: Questions to answer
Write about the funding Sundar Iyer, Amr Awadallah, Nick McKeown
Computer Networks: Switching and Queuing
Techniques for Fast Packet Buffers
Presentation transcript:

Techniques and problems for 100Tb/s routers Nick McKeown Professor of Electrical Engineering and Computer Science, Stanford University nickm@stanford.edu http://www.stanford.edu/~nickm

Topics System architecture: Specific problems Parallel Packet Switches: Building big routers from lots of little routers. TCP Switching: Exposing (optical) circuits to IP. Load-balancing and mis-sequencing. Specific problems Fast IP lookup and classification. Fast packet buffers: buffering packets at 40+Gb/s line-rates Crossbar scheduling: (other SNRC project). Today Today

Building a big router from lots of little routers What we’d like: R R NxN R R The building blocks we’d like to use: R

Why this might be a good idea Larger overall capacity Faster line rates Redundancy Familiarity “After all, this is how the Internet is built”

Parallel Packet Switch Demultiplexor OQ Switch Multiplexor (sR/k) (sR/k) R R 1 2 3 N=4 1 1 Demultiplexor Multiplexor R OQ Switch R 2 Demultiplexor 2 Multiplexor R R 3 OQ Switch Demultiplexor Multiplexor R R k=3 N=4 (sR/k) (sR/k)

Parallel Packet Switch Results A PPS can precisely emulate a FCFS shared memory switch: S > 2k/(k+2) @ 2 and a centralized algorithm, S = 1 with (N.k) buffers in the demultiplexor and multiplexor, and a distributed algorithm.

Topics System architecture: Specific problems Parallel Packet Switches: Building big routers from lots of little routers. TCP Switching: Exposing (optical) circuits to IP. Load-balancing and mis-sequencing. Specific problems Fast IP lookup and classification. Fast packet buffers: buffering packets at 40+Gb/s line-rates Crossbar scheduling: (other SNRC project). Today

Optics and Routing Seminar: A Motivating System Phictitious 100Tb/s IP Router Switch Linecard #1 Linecard #625 160- 320Gb/s 160- 320Gb/s 40Gb/s Line termination IP packet processing Packet buffering Line termination IP packet processing Packet buffering 40Gb/s 160Gb/s 40Gb/s Arbitration Request 40Gb/s Grant

Basic Flow Buffer Buffer 160Gb/s 160Gb/s IP Lookup Ingress Linecard Egress Linecard Buffer Buffer Forwarding Table 625x625 Switch 160- 320Gb/s 160- 320Gb/s 160Gb/s 160Gb/s Packet parse IP Lookup Buffer Mgmt Buffer Mgmt Request Arbitration Grant

Packet buffers External Line ~5ns for SRAM ~50ns for DRAM Buffer Memory External Line 64-byte wide bus 64-byte wide bus Rough Estimate: 5 or 50ns per memory operation. Two memory operations per packet. Therefore, maximum ~50 or 5 Gb/s. Aside: Buffers need to be large for TCP to work well, so DRAM is usually required.

Packet buffers OC768c = 40Gb/s; RTT*BW = 10Gb; 64 byte “cells” Example of ingress queues on an OC768c linecard: OC768c = 40Gb/s; RTT*BW = 10Gb; 64 byte “cells” One memory operation every 12.8ns Buffer Memory External Line 64-byte wide bus 64-byte wide bus Scheduler or Arbiter

This is not possible today SRAM: + fast enough random access time, but - too expensive, and - too low density to store 10Gb of data. DRAM: + high density means we can store data, but - too slow (typically 50ns random access time)

FIFO caches Memory Hierarchy Small egress SRAM cache of FIFO heads Large DRAM memory holds the middle of FIFOs 5 7 6 8 10 9 11 12 14 13 15 50 52 51 53 54 86 88 87 89 91 90 82 84 83 85 92 94 93 95 1 Q 2 Writing b cells Reading cache of FIFO tails 55 56 96 97 57 58 59 60 Small ingress SRAM Arriving Packets R Arbiter or Scheduler Requests Departing 1 2 Q 3 4 5 6 We’ll focus on the sizing of this SRAM

Earliest Critical Queue First (ECQF) In SRAM: A Dynamic Buffer of size Q(b-1) Q b - 1 Cells Lookahead: Arbiter requests for future Q(b-1) + 1 slots are known. Q(b-1) + 1 Requests in lookahead buffer Determine which queue runs into “trouble” first. green! Replenish “b” cells for the “troubled” queue Q b - 1 Cells

Results Single Address Bus Patient Arbiter: ECQF-MMA (earliest critical queue first), minimizes the size of SRAM buffer to Q(b-1) cells; and guarantees that requested cells are dispatched within Q(b-1)+1 cell slots. Proof: Pigeon-hole principle (same argument used for strictly non-blocking conditions for a Clos network).

Implementation Numbers (64byte cells, b = 8, DRAM T = 50ns) VOQ Switch - 32 ports Brute Force : Egress. SRAM = 10 Gb, no DRAM Patient Arbiter : Egress. SRAM = 115kb, Lat. = 2.9 us, DRAM = 10Gb Impatient Arbiter : Egress. SRAM = 787 kb, DRAM = 10Gb Patient Arbiter(MA) : No SRAM, Lat. = 3.2us, DRAM = 10Gb VOQ Switch – 512 ports Brute Force : Egress. SRAM = 10Gb, no DRAM Patient Arbiter : Egress. SRAM = 1.85Mb, Lat. = 45.9us, DRAM 10Gb Impatient Arbiter : Egress. SRAM = 18.9Mb, DRAM = 10Gb Patient Arbiter(MA) : No SRAM, Lat. = 51.2us, DRAM = 10Gb Add a point for building a buffer with only SRAM as a thing to compare with Check and re-calculate all numbers for b=8 Focus on the “WHY?”

Some Next Steps Generalizing results and design rules for: Selectable degrees of “impatience”, Deterministic vs. statistical bounds. Application to design of large shared memory switches