The Fork-Join Router Nick McKeown Assistant Professor of Electrical Engineering and Computer Science, Stanford University



Outline
– Quick Background on Packet Switches
– What’s the problem? “What if data rates exceed memory bandwidth?”
– The Fork-Join Router
– Parallel Packet Switches

First Generation Packet Switches
[Figure labels: Shared Backplane; CPU; Buffer Memory; three Line Interfaces, each with DMA and MAC.]
– Fixed length “DMA” blocks or cells. Reassembled on egress linecard.
– Fixed length cells or variable length packets.

Second Generation Packet Switches
[Figure labels: CPU; Buffer Memory; three Line Cards, each with DMA, MAC and Local Buffer Memory.]

Third Generation Packet Switches
[Figure labels: Switched Backplane; CPU Card; Line Cards, each with MAC and Local Buffer Memory.]

Fourth Generation Packet Switches

Two Basic Techniques
– Input-queued Crossbar: 1+1 = 2 operations per cell time
– Shared Memory: N+N = 2N operations per cell time
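The operation counts above translate directly into a memory access-time budget. The sketch below works that out under assumed values (64-byte cells, 10 Gb/s lines, N = 32 ports); these numbers and the function names are illustrative and not taken from the slides.

```python
def cell_time_ns(cell_bytes: int, line_rate_gbps: float) -> float:
    """Time to receive one cell at the line rate, in nanoseconds."""
    # bits divided by Gb/s gives nanoseconds
    return cell_bytes * 8 / line_rate_gbps


def access_budget_ns(ops_per_cell_time: int, cell_bytes: int,
                     line_rate_gbps: float) -> float:
    """Longest memory access time that still fits all operations in one cell time."""
    return cell_time_ns(cell_bytes, line_rate_gbps) / ops_per_cell_time


# Assumed example values (hypothetical, for illustration only).
N = 32
CELL_BYTES = 64
LINE_RATE_GBPS = 10.0

# Input-queued crossbar: 1 write + 1 read per cell time at each input memory.
input_queued_budget = access_budget_ns(2, CELL_BYTES, LINE_RATE_GBPS)

# Shared memory: N writes + N reads per cell time at the single memory.
shared_memory_budget = access_budget_ns(2 * N, CELL_BYTES, LINE_RATE_GBPS)
```

With these assumptions the shared memory has to be N times faster than each input-queued memory, which is the scaling problem the rest of the talk addresses.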

Shared Memory: The Ideal
Much prior work has proven and made possible:
– Fairness
– Delay Guarantees
– Delay Variation Control
– Loss Guarantees
– Statistical Guarantees

Precise Emulation of an Output Queued Switch
[Figure: N-port Output Queued Switch = ? N-port Combined Input-Output Queued Switch with a Scheduler.]

Result
Theorem: A speedup of 2 - 1/N is necessary and sufficient for a combined input- and output-queued switch to precisely emulate an output-queued switch for all traffic.
Joint work with Balaji Prabhakar at Stanford.

Outline
– Quick Background on Packet Switches
– What’s the problem? “What if data rates exceed memory bandwidth?”
– The Fork-Join Router
– Parallel Packet Switches

How Fast Can I Make a Packet Buffer?
[Figure: Buffer Memory built from 5ns SRAM on a 64-byte wide bus.]
Rough Estimate:
– 5ns per memory operation.
– Two memory operations per packet.
– Therefore, maximum 51.2Gb/s.
– In practice, closer to 40Gb/s.
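The 51.2Gb/s figure follows from the numbers on the slide: a 64-byte bus moves 512 bits per operation, and each packet costs one write plus one read at 5ns each. A minimal sketch of that arithmetic (the function name is an assumption made for illustration):

```python
def buffer_throughput_gbps(bus_bytes: int, access_ns: float,
                           ops_per_packet: int) -> float:
    """Peak packet-buffer throughput in Gb/s.

    Every packet's data crosses the bus once per operation, so each
    bus-width chunk occupies the memory for access_ns * ops_per_packet.
    """
    bits_per_operation = bus_bytes * 8
    return bits_per_operation / (access_ns * ops_per_packet)  # bits per ns = Gb/s


# Values from the slide: 64-byte bus, 5ns SRAM, write + read per packet.
peak_gbps = buffer_throughput_gbps(bus_bytes=64, access_ns=5.0, ops_per_packet=2)
```

The "closer to 40Gb/s" practical figure on the slide is not derived here; the sketch only reproduces the peak estimate.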

Is It Going to Get Better?
[Figure: two trends plotted against time: (1) Specmarks, memory size, gate density; (2) memory bandwidth (to core).]

Optical Physical Layers… …are Going to Make Things “Worse”
DWDM:
– More λ’s per fiber → more “ports” per switch.
– # ports: 16, …, 1000’s.
Data rate:
– More b/s per λ → higher capacity.
– Data rates: 2.5Gb/s, 10Gb/s, 40Gb/s, 160Gb/s, …

Approach #1: Ping-pong Buffering
[Figure: two Buffer Memories, each on its own 64-byte wide bus.]
Memory bandwidth doubled to ~80 Gb/s
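The slide does not spell out the mechanism, so the following is a sketch of one common reading of ping-pong buffering: in each time slot the read is served from whichever bank holds the next packet in FIFO order, and any arriving packet is written to the other bank, so no bank performs more than one operation per slot. The class, its methods and the sequence-number bookkeeping are assumptions made for illustration; bank capacity and load imbalance are not modeled.

```python
from collections import deque


class PingPongBuffer:
    """FIFO packet buffer spread over two memory banks (illustrative model)."""

    def __init__(self):
        self.banks = [deque(), deque()]  # each entry is (sequence number, packet)
        self.next_write_seq = 0
        self.next_read_seq = 0

    def _bank_holding_next_read(self):
        """Index of the bank whose head is the next packet to depart, if any."""
        for index, bank in enumerate(self.banks):
            if bank and bank[0][0] == self.next_read_seq:
                return index
        return None

    def slot(self, arriving=None, depart=False):
        """One time slot: at most one read and one write, on different banks."""
        departed = None
        read_bank = self._bank_holding_next_read() if depart else None
        if read_bank is not None:
            _, departed = self.banks[read_bank].popleft()
            self.next_read_seq += 1
        if arriving is not None:
            # Write to the bank that is not busy with this slot's read.
            write_bank = 0 if read_bank is None else 1 - read_bank
            self.banks[write_bank].append((self.next_write_seq, arriving))
            self.next_write_seq += 1
        return departed


buf = PingPongBuffer()
departures = [
    buf.slot(arriving="a"),
    buf.slot(arriving="b", depart=True),
    buf.slot(arriving="c", depart=True),
    buf.slot(depart=True),
]
```

Because each bank only ever sees one operation per slot, two memories of the speed estimated earlier can together sustain one read and one write per slot, which is where the doubled bandwidth comes from.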

Approach #2: Multiple Parallel Buffers
aka Banking, Interleaving
[Figure: four Buffer Memories in parallel.]
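As a rough illustration of banking, the sketch below stripes consecutive cells across the banks by index modulo the number of banks and counts how many cells each bank receives. The modulo mapping is an assumed, simplest-case interleaving; the slides do not say which mapping is used, and bank conflicts on the read side are ignored.

```python
from collections import Counter


def bank_for(cell_index: int, num_banks: int) -> int:
    """Assumed interleaving: consecutive cells go to consecutive banks."""
    return cell_index % num_banks


def per_bank_load(num_cells: int, num_banks: int) -> Counter:
    """Number of cells written to each bank for a run of consecutive cells."""
    return Counter(bank_for(i, num_banks) for i in range(num_cells))


# Four banks, as drawn on the slide; 1000 cells is an arbitrary example size.
load = per_bank_load(num_cells=1000, num_banks=4)
```

With an even spread, each bank handles a quarter of the cells, so each memory only needs a quarter of the aggregate bandwidth.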

Outline
– Quick Background on Packet Switches
– What’s the problem? “What if data rates exceed memory bandwidth?”
– The Fork-Join Router
– Parallel Packet Switches

The Fork-Join Router
[Figure labels: lines 1 … N at rate R on each side; k parallel Routers numbered 1, 2, …, k; a stage marked “Bufferless”.]

The Fork-Join Router
Advantages (a factor of k in each):
– memory bandwidth
– lookup/classification rate
– routing/classification table size
Problems:
– How to demultiplex prior to lookup/classification?
– How does the system perform/behave?
– Can we predict/guarantee performance?

Outline
– Quick Background on Packet Switches
– What’s the problem? “What if data rates exceed memory bandwidth?”
– The Fork-Join Router
– Parallel Packet Switches

A Parallel Packet Switch
[Figure labels: lines 1 … N at rate R on each side; k parallel Output Queued Switches numbered 1, 2, …, k.]

Parallel Packet Switch Questions
1. Can it be work-conserving?
2. Can it emulate a single big output queued switch?
3. Can it support delay guarantees, strict-priorities, WFQ, …?
4. What happens with multicast?

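Parallel Packet Switch Work Conservation
[Figure labels: line at rate R; layers 1 … k; internal links at rate R/k; Input Link Constraint; Output Link Constraint.]

Parallel Packet Switch Work Conservation
[Figure labels: line at rate R; layers 1 … k; internal links at rate R/k; Output Link Constraint.]

Parallel Packet Switch Work Conservation
[Figure labels: lines 1 … N at rate R on each side; k parallel Output Queued Switches numbered 1, 2, …, k; internal links at rate S(R/k).]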
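The internal link rate S(R/k) in the figure can be made concrete with a small calculation. The values below (R = 10 Gb/s, k = 8 layers, S = 2) are assumed for illustration and do not come from the slides.

```python
def internal_link_rate_gbps(line_rate_gbps: float, num_layers: int,
                            speedup: float) -> float:
    """Rate of each link between a line and one of the k layers: S * (R / k)."""
    return speedup * line_rate_gbps / num_layers


# Assumed example values (hypothetical).
R_GBPS = 10.0
K_LAYERS = 8
SPEEDUP = 2.0

link_rate = internal_link_rate_gbps(R_GBPS, K_LAYERS, SPEEDUP)

# Total capacity from one external line into all k layers is S * R.
aggregate = link_rate * K_LAYERS
```

Each layer's links run well below the external line rate even with speedup, which is what lets the layers use slower memories.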

Precise Emulation of an Output Queued Switch
[Figure: N-port Output Queued Switch = ? N-port Parallel Packet Switch.]

Parallel Packet Switch Theorems
1. If S > 2k/(k+2) ≈ 2 then a parallel packet switch can be work-conserving for all traffic.
2. If S > 2k/(k+2) ≈ 2 then a parallel packet switch can precisely emulate a FCFS output-queued switch for all traffic.

Parallel Packet Switch Theorems
3. If S > 3k/(k+3) ≈ 3 then a parallel packet switch can precisely emulate a switch with WFQ, strict priorities, and other types of QoS, for all traffic.
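To get a feel for the speedup thresholds in Theorems 1 to 3, the sketch below evaluates 2k/(k+2) and 3k/(k+3) exactly for a few values of k. The function names and the chosen k values are illustrative assumptions.

```python
from fractions import Fraction


def fcfs_speedup_bound(k: int) -> Fraction:
    """Threshold from Theorems 1 and 2: S must exceed 2k/(k+2)."""
    return Fraction(2 * k, k + 2)


def qos_speedup_bound(k: int) -> Fraction:
    """Threshold from Theorem 3: S must exceed 3k/(k+3)."""
    return Fraction(3 * k, k + 3)


# Exact thresholds for a few example layer counts.
bounds = {k: (fcfs_speedup_bound(k), qos_speedup_bound(k)) for k in (2, 10, 100)}

# 2k/(k+2) can also be written as 2 - 4/(k+2), so it approaches 2 from below.
identity_holds = all(
    fcfs_speedup_bound(k) == 2 - Fraction(4, k + 2) for k in range(1, 200)
)
```

The thresholds stay strictly below 2 and 3 respectively and approach them as k grows, so a speedup of 2 (or 3) is always enough regardless of the number of layers.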

An aside
Unbuffered Clos Circuit Switch
Expansion factor required = 2 - 1/N

Clos Network
[Figure: input switches I_1 … I_X and output switches O_1 … O_X, each with m links, joined by R middle stage switches (a, b, c, …). A connection matrix with rows I_1 … I_x and columns O_1 … O_x has <= min(R,m) entries in each row and <= min(R,m) entries in each column.]
Define:
– UIL(I_i) = used links at switch I_i to connect to middle stages.
– UOL(O_i) = used links at switch O_i to connect to middle stages.
If we wish to connect I_i to O_i:
– When adding connection: |UIL(I_i)| <= m-1 and |UOL(O_i)| <= m-1
– Worst-case: |UIL(I_i) ∪ UOL(O_i)| = 2m-2
– Therefore, if R > 2m-2 (that is, R >= 2m-1) there is always a free middle stage.
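The counting argument can be checked on a small case. The sketch below builds the worst case described above, in which the input switch and the output switch each already use m-1 middle stages and share none, and then asks which middle stages remain free. Numbering the middle stages from 0 and the helper names are assumptions made for illustration.

```python
def free_middle_switches(num_middle: int, used_by_input: set,
                         used_by_output: set) -> set:
    """Middle stages not yet used by either the input or the output switch."""
    busy = set(used_by_input) | set(used_by_output)
    return set(range(num_middle)) - busy


def worst_case_usage(m: int):
    """Input and output switch each use m-1 middle stages, with no overlap."""
    used_by_input = set(range(0, m - 1))
    used_by_output = set(range(m - 1, 2 * m - 2))
    return used_by_input, used_by_output


m = 4  # assumed example: 4 links per edge switch
uil, uol = worst_case_usage(m)

# With only 2m-2 middle stages the worst case leaves none free.
free_with_2m_minus_2 = free_middle_switches(2 * m - 2, uil, uol)

# One more middle stage (2m-1 in total) always leaves a free path.
free_with_2m_minus_1 = free_middle_switches(2 * m - 1, uil, uol)
```

The ratio (2m-1)/m equals 2 - 1/m, which matches the form of the expansion factor quoted for the unbuffered Clos circuit switch.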

An aside
Unbuffered Clos Circuit Switch
Expansion factor required = 2 - 1/N
Expansion: 2 - 4/(k+2)

Fork-Join Router Project
What’s next?
Theory:
– Extending results to distributed algorithms.
– Extending results to multicast.
Implementation/Prototyping:
– Under discussion...