Making Parallel Packet Switches Practical
Sundar Iyer, Nick McKeown
Departments of Electrical Engineering & Computer Science, Stanford University

Motivation
To design and analyze an architecture for a very-high-capacity packet switch in which the memories run slower than the line rate.
[Ref: S. Iyer, A. Awadallah, N. McKeown, “Analysis of a Packet Switch with Memories Running Slower than the Line Rate,” Proc. IEEE Infocom, Tel Aviv, Mar. 2000.]

What limits the capacity of packet switches today?
Memory bandwidth for packet buffers
 – Shared memory: B = 2NR
 – Input queued: B = 2R
Switch arbitration
 – At the line rate R
Packet processing
 – At the line rate R
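As a rough sketch of where these figures come from: every cell is written into memory once and read out once, so a shared memory serving N ports at rate R needs 2NR, while each input buffer of an input-queued switch needs only 2R. The third function is an assumption extrapolated from the PPS architecture described later (a layer built as a shared-memory OQ switch whose ports run at sR/k). All function names are hypothetical.

```python
def shared_memory_bandwidth(n_ports, line_rate):
    """Shared memory: each of the N ports writes and reads at rate R -> 2NR."""
    return 2 * n_ports * line_rate


def input_queued_bandwidth(line_rate):
    """Input-queued switch: each input buffer is written and read at rate R -> 2R."""
    return 2 * line_rate


def pps_layer_bandwidth(n_ports, line_rate, k, speedup):
    """Assumption: one PPS layer is a shared-memory OQ switch with ports at s*R/k."""
    return 2 * n_ports * speedup * line_rate / k


if __name__ == "__main__":
    # Hypothetical example: 32 ports at 10 Gb/s, 8 layers, speedup 2.
    print(shared_memory_bandwidth(32, 10))    # 640 (Gb/s)
    print(input_queued_bandwidth(10))         # 20 (Gb/s)
    print(pps_layer_bandwidth(32, 10, 8, 2))  # 160.0 (Gb/s)
```

The point of the last line is that each layer's memory only has to keep up with sR/k per port rather than R.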

How can we scale the capacity of switches?
What we’d like: a large NxN switch with every port running at the line rate R.
The building blocks we’d like to use: slower NxN switches.

Why this might be a good idea
Larger capacity
Slower than the line rate
 – Buffering
 – Arbitration
 – Packet processing
Redundancy

Observations and Questions
Random load-balancing:
 – It’s hard to predict system performance.
Flow-by-flow load-balancing:
 – Worst-case performance is very poor.
Can we do better?
 – What if we switch packet by packet?
 – Can we achieve 100% throughput?
 – Can we give delay guarantees?
[Figure: k parallel switches, numbered 1 to k, between ports running at rate R; each internal link runs at R/k.]
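A toy comparison of why flow-by-flow balancing has a poor worst case. Assumptions: each cell is tagged with a flow ID, flow-by-flow balancing pins a flow to layer `flow_id % k` (a stand-in for any hash), and packet-by-packet balancing simply cycles through the layers. The function names are hypothetical.

```python
from collections import Counter


def flow_by_flow(flow_ids, k):
    """Every cell of a flow goes to the same layer (flow_id % k stands in for a hash)."""
    return [f % k for f in flow_ids]


def packet_by_packet(flow_ids, k):
    """Consecutive cells go to layers 0, 1, ..., k-1, 0, ... regardless of flow."""
    return [i % k for i in range(len(flow_ids))]


def per_layer_load(layers, k):
    """Number of cells sent to each of the k layers."""
    counts = Counter(layers)
    return [counts[j] for j in range(k)]


if __name__ == "__main__":
    cells = [7] * 9  # worst case: a single flow carries all the traffic
    print(per_layer_load(flow_by_flow(cells, 3), 3))      # [0, 9, 0]
    print(per_layer_load(packet_by_packet(cells, 3), 3))  # [3, 3, 3]
```

With a single flow, flow-by-flow balancing pushes the full rate R onto one layer whose links run at only R/k, whereas spraying packet by packet spreads the load evenly. The cost of spraying is that cells of one flow can be reordered.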

Architecture of a PPS
Definition: A PPS consists of multiple identical lower-speed packet switches operating independently and in parallel. A demultiplexor spreads an incoming stream of packets, packet by packet, across the slower packet switches, and a multiplexor at the output recombines them. We call this “parallel packet switching”.

Architecture of a PPS
[Figure: a PPS with N = 4 ports at line rate R. A demultiplexor at each input feeds k = 3 center-stage OQ switches over links at rate sR/k, and a multiplexor at each output recombines the cells.]

We will compare it to an OQ switch
[Figure: an output-queued switch with inputs 1 to N and outputs 1 to N, every port at rate R.]
Internal bandwidth = 2NR
Why?
 – There is no internal contention
 – No queueing at the inputs
 – They give the minimum delay
 – They can give QoS guarantees

Definition
Relative queueing delay
 – The increase in queueing delay that a cell experiences in the PPS, relative to the delay it receives in a shadow output-queued switch.
 – It includes only the time difference attributable to queueing.
 – A switch is said to emulate an OQ switch if the relative queueing delay is zero.
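A minimal sketch of this definition for a single output. It assumes a shadow FIFO OQ switch that sends one cell per time slot in arrival order, and zero fixed latency so that any difference is due to queueing. The PPS departure times are supplied by the caller; all names are hypothetical.

```python
def shadow_fifo_oq_departures(arrivals):
    """Departure slot of each cell at one output of a shadow FIFO OQ switch.

    `arrivals` lists the arrival slots of the cells for this output, in
    arrival order. The output sends one cell per slot, and a cell may leave
    in the slot it arrives (zero fixed latency is assumed).
    """
    departures = []
    next_free = 0
    for t in arrivals:
        d = max(t, next_free)
        departures.append(d)
        next_free = d + 1
    return departures


def relative_queueing_delays(pps_departures, oq_departures):
    """Extra delay of each cell in the PPS compared with the shadow OQ switch."""
    return [p - o for p, o in zip(pps_departures, oq_departures)]


def emulates_oq(pps_departures, oq_departures):
    """A switch emulates the OQ switch if every relative queueing delay is zero."""
    return all(d == 0 for d in relative_queueing_delays(pps_departures, oq_departures))


if __name__ == "__main__":
    oq = shadow_fifo_oq_departures([0, 0, 1, 5])  # [0, 1, 2, 5]
    pps = [0, 2, 3, 5]                             # hypothetical PPS departures
    print(relative_queueing_delays(pps, oq))       # [0, 1, 1, 0]
    print(emulates_oq(pps, oq))                    # False
```

In these terms, a PPS has a bounded relative delay when the maximum of `relative_queueing_delays(...)` stays below a constant for every traffic pattern.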

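A PPS with a bounded relative delay
[Figure: the same cells C are sent to a shadow OQ switch and to the PPS, and the time each cell leaves the PPS is compared (=?) with the time it leaves the shadow OQ switch.]
 – The PPS has a bounded relative delay if the relative delay t’ satisfies t’ < constant.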

Problem Statement Redefined
Motivation: “To design and analyze an architecture for a very-high-capacity packet switch, in which the memories run slower than the line rate and which preserves the good properties of an OQ switch.”
This talk: expanding the capacity of a FIFO packet switch, with a bounded relative queueing delay, using the PPS architecture.

A Bad Scenario for the PPS
[Figure: N = 4 ports at rate R and three layers (Layer 1 to Layer 3) connected by internal links at rate R/3.]

Parallel Packet Switch Result
Theorem: If the speedup s ≥ 2 (internal links running at sR/k), then a PPS can emulate a FIFO OQ switch for all traffic patterns.

Is this Practical?
Load-balancing algorithm
 – Is centralized
 – Requires N² communication complexity
 – Ideally we want a distributed algorithm
Speedup
 – A speedup of 2 is required
 – Ideally we would like no speedup

Load Balancing in a PPS
[Figure: N = 4 ports at rate R and three layers connected by internal links at rate 2R/3.]

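Distribution of Cells from a Demultiplexor
[Figure: the demultiplexor at input 1 receives cells at rate R and holds one FIFO for each of the k = 3 layers; each FIFO drains at rate R/3.]
Cells from every input to every output are sent to the center-stage switches in round-robin order.
“No more than 4 consecutive cells can go to the same FIFO, i.e. the same center-stage switch.” Because each output’s round-robin pointer moves to the next layer after every cell, a run of consecutive cells into one FIFO contains at most one cell per output, i.e. at most N = 4 cells.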
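The round-robin rule can be sketched as follows, assuming one pointer per output at each demultiplexor (all starting at layer 0 unless a start vector is given). `longest_run` checks the bound quoted on the slide, and the random trials check it for arbitrary starting pointers. Names are hypothetical.

```python
import random
from itertools import groupby


def demultiplex(outputs, n_outputs, k, start=None):
    """Choose a layer for each arriving cell, given the output it is destined to.

    One round-robin pointer per output: successive cells for the same output
    go to layers p, p+1, ..., wrapping around modulo k.
    """
    pointer = list(start) if start is not None else [0] * n_outputs
    layers = []
    for out in outputs:
        layers.append(pointer[out])
        pointer[out] = (pointer[out] + 1) % k
    return layers


def longest_run(layers):
    """Length of the longest stretch of consecutive cells sent to the same layer."""
    return max((len(list(group)) for _, group in groupby(layers)), default=0)


if __name__ == "__main__":
    n, k = 4, 3
    # Worst case: all pointers agree, and one cell arrives for each output in turn.
    worst = demultiplex([0, 1, 2, 3, 0], n, k)
    print(worst, longest_run(worst))  # [0, 0, 0, 0, 1] 4

    rng = random.Random(1)
    for _ in range(1000):
        cells = [rng.randrange(n) for _ in range(50)]
        start = [rng.randrange(k) for _ in range(n)]
        assert longest_run(demultiplex(cells, n, k, start)) <= n
```

The worst case shows the bound is tight: when every pointer happens to sit on the same layer, N cells in a row can land in one FIFO, but the next cell for any of those outputs must go elsewhere.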

Modification to the PPS
Relax the relative queueing bound
 – Allows a distributed load-balancing arbiter
Run an independent load-balancing algorithm on each demultiplexor
 – Eliminates the N² communication complexity
Keep small, bounded delay buffers at the demultiplexor
 – Eliminates speedup in the links between the demultiplexor and the center-stage switches

Cells as seen by the Multiplexor
[Figure: N = 4 ports at rate R; cells reach each multiplexor from three layers over internal links at rate R/3.]

Solution
Read
 – cells from the corresponding queues in all center-stage switches, based on their arrival time, to maintain throughput (the cells read this way may be out of order).
Introduce
 – a small, bounded resequencing buffer at the multiplexor to reorder cells and send them in sequence.
Tolerate
 – a bounded delay relative to the shadow FIFO OQ switch.
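One way to picture the resequencing buffer, as a sketch: it assumes each cell carries a per-output sequence number recording its arrival order (a stand-in for the arrival time mentioned on the slide), and it releases cells only when they are next in sequence. `max_waiting` tracks the buffer occupancy that the talk argues stays small and bounded. The class and attribute names are hypothetical.

```python
import heapq


class Resequencer:
    """Per-output reorder buffer at a multiplexor (illustrative sketch)."""

    def __init__(self):
        self._next = 0        # sequence number the output is waiting for
        self._heap = []       # cells that arrived early, ordered by sequence number
        self.max_waiting = 0  # largest number of cells held after any release

    def push(self, seq):
        """Accept a cell from some layer; return the cells now sendable in order."""
        heapq.heappush(self._heap, seq)
        released = []
        while self._heap and self._heap[0] == self._next:
            released.append(heapq.heappop(self._heap))
            self._next += 1
        self.max_waiting = max(self.max_waiting, len(self._heap))
        return released


if __name__ == "__main__":
    r = Resequencer()
    print(r.push(2))      # [] : cell 2 waits for cells 0 and 1
    print(r.push(0))      # [0]
    print(r.push(1))      # [1, 2]
    print(r.max_waiting)  # 1
```

A min-heap keeps the earliest waiting cell at the front, so each push only has to inspect the head to decide whether anything can leave.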

Properties of the PPS Demultiplexor
Demultiplexors
 – Cells arrive at a combined rate R over all k FIFOs.
 – Each cell carries a property: its output.
 – Cells to the same output are inserted into the k FIFOs in round-robin order.
 – Cells are written into each FIFO under a leaky-bucket constraint of rate R/k and burst N: in any interval of T time slots, at most T/k + N cells enter a given FIFO.
 – Cells are read from each FIFO at a constant service rate R/k.
 – The maximum delay faced by a cell is N internal time slots.
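The leaky-bucket property can be checked by brute force on small traces. This sketch assumes one cell arrival per time slot and the per-output round-robin rule, and tests, for every FIFO and every window of T consecutive slots, that at most T/k + N cells were written (compared as count·k ≤ T + N·k to stay in integers). The bound holds because an output with c cells in the window sends at most ⌈c/k⌉ ≤ c/k + 1 of them to any one FIFO. Names are hypothetical.

```python
import random


def round_robin_layers(outputs, n_outputs, k):
    """Per-output round-robin layer choice (all pointers start at layer 0)."""
    pointer = [0] * n_outputs
    layers = []
    for out in outputs:
        layers.append(pointer[out])
        pointer[out] = (pointer[out] + 1) % k
    return layers


def within_leaky_bucket(layers, k, burst):
    """True if every window of T slots writes at most T/k + burst cells to each FIFO."""
    length = len(layers)
    for start in range(length):
        counts = [0] * k
        for end in range(start, length):
            counts[layers[end]] += 1
            window = end - start + 1
            if max(counts) * k > window + burst * k:
                return False
    return True


if __name__ == "__main__":
    n, k = 4, 3
    rng = random.Random(7)
    for _ in range(200):
        cells = [rng.randrange(n) for _ in range(40)]
        assert within_leaky_bucket(round_robin_layers(cells, n, k), k, burst=n)
    # Without per-output round robin, e.g. everything sent to layer 0, the bound fails.
    print(within_leaky_bucket([0] * 40, k, burst=n))  # False
```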

Relative Queueing Delay faced by a Cell
Demultiplexors
 – A cell encounters a maximum relative queueing delay of N internal time slots.
Multiplexors
 – A cell encounters a maximum relative queueing delay of N internal time slots.
Total relative queueing delay
 – 2N internal time slots

Buffered PPS Results
A PPS with a completely distributed algorithm, no speedup, and a buffer of size Nk can emulate a FIFO output-queued switch for all traffic patterns within a relative queueing delay bound of 2N internal time slots, i.e. 2Nk time slots (an internal time slot lasts k time slots, because with no speedup the internal links run at R/k).
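A quick worked example of the units in this result. It assumes fixed-size cells, so that one time slot is one cell time at rate R, that an internal time slot lasts k time slots, and that the buffer size Nk is counted in cells. The port count, layer count, line rate, and cell size below are hypothetical.

```python
def relative_delay_bound_slots(n_ports, k):
    """2N internal time slots, each lasting k time slots -> 2Nk time slots."""
    return 2 * n_ports * k


def buffer_size_cells(n_ports, k):
    """Buffer size stated in the result, assumed to be counted in cells: Nk."""
    return n_ports * k


def slots_to_seconds(slots, cell_bits, line_rate_bps):
    """Convert time slots (one cell time at rate R) to seconds."""
    return slots * cell_bits / line_rate_bps


if __name__ == "__main__":
    n, k = 32, 8  # hypothetical switch size and number of layers
    bound = relative_delay_bound_slots(n, k)
    print(bound, buffer_size_cells(n, k))  # 512 256
    # 64-byte cells at 10 Gb/s: one slot is 51.2 ns, so the bound is about 26.2 us.
    print(slots_to_seconds(bound, 64 * 8, 10e9))
```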

Conclusion
 – It’s possible to expand the capacity of a FIFO packet switch using multiple slower-speed packet switches.
 – A couple of open questions remain:
    Making QoS practical.
    Making multicasting practical.