048866: Packet Switch Architectures
Scaling
Dr. Isaac Keslassy, Electrical Engineering, Technion

Spring – Packet Switch Architectures

Slide 2: Achieving 100% throughput
1. Switch model
2. Uniform traffic
   - Technique: Uniform schedule (easy)
3. Non-uniform traffic, but known traffic matrix
   - Technique: Non-uniform schedule (Birkhoff-von Neumann)
4. Unknown traffic matrix
   - Technique: Lyapunov functions (MWM)
5. Faster scheduling algorithms
   - Technique: Speedup (maximal matchings)
   - Technique: Memory and randomization (Tassiulas)
   - Technique: Twist architecture (buffered crossbar)
6. Accelerate the scheduling algorithm
   - Technique: Pipelining
   - Technique: Envelopes
   - Technique: Slicing
7. No scheduling algorithm
   - Technique: Load-balanced router

Slide 3: Outline
Up until now, we have focused on high-performance packet switches with:
1. A crossbar switching fabric,
2. Input queues (and possibly output queues as well),
3. Virtual output queues, and
4. A centralized arbitration/scheduling algorithm.
Today we'll talk about the implementation of the crossbar switch fabric itself: how crossbars are built, how they scale, and what limits their capacity.

Slide 4: Crossbar switch: limiting factors
1. N^2 crosspoints per chip (equivalently, N N-to-1 multiplexors).
2. It's not obvious how to build a crossbar from multiple chips.
3. Capacity of the I/Os per chip.
   - State of the art: about 300 pins, each operating at 3.125 Gb/s, i.e. roughly 1 Tb/s per chip.
   - Only about 1/3 to 1/2 of this capacity is available in practice, because of overhead and speedup.
   - Crossbar chips today are limited by I/O capacity.
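As a quick sanity check on the figures above (a back-of-the-envelope sketch; the pin count, per-pin rate, and usable fraction are the slide's numbers, not a chip datasheet):

```python
# Back-of-the-envelope check of the per-chip I/O budget quoted above.
PINS = 300
GBPS_PER_PIN = 3.125e9            # bits per second per pin

raw = PINS * GBPS_PER_PIN         # raw aggregate pin bandwidth
print(f"raw I/O bandwidth: {raw / 1e12:.3f} Tb/s")   # ~0.94 Tb/s, i.e. ~1 Tb/s

# After overhead (serialization, framing) and internal speedup,
# only about a third to a half is usable as switching capacity:
usable_lo, usable_hi = raw / 3, raw / 2
print(f"usable capacity: {usable_lo / 1e12:.2f}-{usable_hi / 1e12:.2f} Tb/s")
```

So the "roughly 1 Tb/s" headline shrinks to a few hundred Gb/s of actual switching capacity per chip, which is why I/O, not crosspoint count, is the binding constraint.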

Slide 5: Scaling
1. Scaling the line rate
   - Bit-slicing
   - Time-slicing
2. Scaling time (scheduling speed)
   - Time-slicing
   - Envelopes
   - Frames
3. Scaling the number of ports
   - Naïve approach
   - Clos networks
   - Benes networks

Slide 6: Bit-sliced parallelism
[Figure: each cell from a linecard is "striped" across k identical switching planes, all driven by a single scheduler]
- Each cell is "striped" across k identical planes.
- The scheduler makes the same decision for all slices.
- However, this does not decrease the required scheduling speed.
- Other problem(s)?
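A minimal sketch of the striping idea (helper names are mine; real hardware stripes bits, not bytes): every plane carries a fixed fraction of each cell, so one scheduling decision covers all k planes at once.

```python
# Bit-slicing sketch: stripe a fixed-size cell across k planes, then
# reassemble at the output. Plane i carries every k-th byte of the cell.

def stripe(cell: bytes, k: int) -> list[bytes]:
    """Split a cell into k slices, one per switching plane."""
    return [cell[i::k] for i in range(k)]

def reassemble(slices: list[bytes]) -> bytes:
    """Interleave the slices back into the original cell."""
    k = len(slices)
    out = bytearray(sum(len(s) for s in slices))
    for i, s in enumerate(slices):
        out[i::k] = s
    return bytes(out)

cell = bytes(range(64))            # a 64-byte cell
slices = stripe(cell, k=8)         # 8 planes carry 8 bytes each
assert all(len(s) == 8 for s in slices)
assert reassemble(slices) == cell  # all planes switch in lockstep
```

Because every plane holds a piece of the same cell, the planes must be configured identically in each time slot, which is exactly why the scheduler's job does not get any easier.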

Slide 7: Time-sliced parallelism
[Figure: k planes and a single centralized scheduler; each cell is carried whole by one plane]
- Each cell is carried by one plane and takes k cell times.
- The centralized scheduler is unchanged: it works for each slice in turn.
- Problem: the scheduling speed stays the same.

Slide 8: Scaling
1. Scaling the line rate
   - Bit-slicing
   - Time-slicing
2. Scaling time (scheduling speed)
   - Time-slicing
   - Envelopes
   - Frames
3. Scaling the number of ports
   - Naïve approach
   - Clos networks
   - Benes networks

Slide 9: Time-sliced parallelism with parallel scheduling
[Figure: k planes, each with its own slow scheduler]
- Scheduling is now distributed to each slice.
- Each slow scheduler has k cell times to compute its schedule.
- Problem(s)?

Slide 10: Envelopes
[Figure: at each VOQ, the linecard accumulates cells into envelopes for a slow scheduler]
- Envelopes of k cells [Kar et al., 2000].
- Problem: "Should I stay or should I go now?"
  - Waiting leads to starvation ("Waiting for Godot").
  - Timeouts lead to loss of throughput.

Slide 11: Frames for scheduling
[Figure: a slow scheduler serving the linecard VOQs]
- The slow scheduler simply takes its decision every k cell times and holds it for k cell times.
- Often associated with pipelining.
- Note: pipelined MWM is still stable (intuitively: the weight doesn't change much).
- Possible problem(s)?
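The hold-for-k-slots pattern can be sketched as follows. The "scheduler" here is a toy stand-in that just rotates a permutation (a real design would use MWM or a maximal matching); the point is only the recompute-once-per-frame timing.

```python
# Frame-based scheduling sketch: recompute the crossbar matching only
# every K cell times and hold it for the rest of the frame.
K = 4          # frame length, in cell times
N = 3          # switch size

def slow_schedule(frame_no: int) -> list[int]:
    """Toy matching: output j is served by input (j + frame_no) mod N."""
    return [(j + frame_no) % N for j in range(N)]

matchings = []
current = None
for t in range(12):                 # 12 cell times = 3 frames
    if t % K == 0:                  # recompute once per frame...
        current = slow_schedule(t // K)
    matchings.append(current)       # ...and hold it for K slots

# Each matching is reused for K consecutive cell times:
assert matchings[0] == matchings[K - 1]
assert matchings[0] != matchings[K]
```

The scheduler thus gets K cell times to produce each decision, at the cost of extra per-packet delay: a cell arriving just after a frame boundary waits out the whole frame.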

Slide 12: Scaling a crossbar
Conclusions:
- Scaling the line rate is relatively straightforward (although the chip count and power may become a problem).
- Scaling the scheduling decision is more difficult, and often comes at the expense of packet delay.
What if we want to increase the number of ports?
- Can we build a crossbar equivalent from multiple stages of smaller crossbars?
- If so, what properties should it have?

Slide 13: Scaling
1. Scaling the line rate
   - Bit-slicing
   - Time-slicing
2. Scaling time (scheduling speed)
   - Time-slicing
   - Envelopes
   - Frames
3. Scaling the number of ports
   - Naïve approach
   - Clos networks
   - Benes networks

Slide 14: Scaling the number of outputs: naïve approach
[Figure: building a 16x16 crossbar switch from 4-input, 4-output building blocks; eight inputs and eight outputs per block are required!]

Slide 15: 3-stage Clos network
[Figure: m first-stage switches of size n x k, k middle-stage switches of size m x m, and m third-stage switches of size k x n; N = n x m ports in total, with k >= n]
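One reason to bother with three stages is the crosspoint count. The sketch below tallies crosspoints for the Clos parameters on this slide (k = n, the rearrangeably non-blocking case discussed later) and compares against a single crossbar; the function name is mine.

```python
# Crosspoint count of a 3-stage Clos network (m switches of n x k,
# k switches of m x m, m switches of k x n; N = n*m) vs. a flat crossbar.

def clos_crosspoints(N: int, n: int, k: int) -> int:
    m = N // n
    assert n * m == N, "n must divide N"
    return m * n * k + k * m * m + m * k * n   # first + middle + third stage

N = 1024
divisors = [n for n in range(1, N + 1) if N % n == 0]
# With k = n the cost is 2*N*n + N**2 / n, minimized near n = sqrt(N/2):
best_n = min(divisors, key=lambda n: clos_crosspoints(N, n, n))
print(f"flat crossbar: {N * N} crosspoints")
print(f"Clos, n = k = {best_n}: {clos_crosspoints(N, best_n, best_n)} crosspoints")
```

For N = 1024 the Clos network needs under a tenth of the crosspoints of a flat 1024 x 1024 crossbar, which is what makes multi-stage fabrics attractive despite the routing complications that follow.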

Slide 16: With k = n, is a Clos network non-blocking like a crossbar?
Consider the example: the scheduler chooses to match (1,1), (2,4), (3,3), (4,2).

Slide 17: With k = n, is a Clos network non-blocking like a crossbar?
Consider the example: the scheduler chooses to match (1,1), (2,2), (4,4), (5,3), ...
By rearranging the existing matches, the new connections could be added.
Q: Is this Clos network "rearrangeably non-blocking"?

Slide 18: With k = n, a Clos network is rearrangeably non-blocking
- Routing a set of matches, e.g. (1,1), (2,4), (3,3), (4,2), is equivalent to edge-coloring a bipartite multigraph: each vertex corresponds to an n x k or k x n switch, and colors correspond to middle-stage switches.
- No two edges at a vertex may be colored the same.
- Vizing '64: a D-degree bipartite graph can be colored with D colors (remember the Birkhoff-von Neumann decomposition theorem).
- Therefore, if k = n, a Clos network is rearrangeably non-blocking (and can therefore perform any permutation).

Slide 19: How complex is the rearrangement?
- Method 1: find a maximum-size bipartite matching for each of the D colors in turn: O(D * N^2.5). Why does it work?
- Method 2: partition the graph into Euler sets: O(N log D) [Cole et al. '00].

Slide 20: Euler partition of a graph
An Euler partition of a graph G:
1. Each odd-degree vertex is at the end of exactly one open path.
2. Each even-degree vertex is at the end of no open path.

Slide 21: Euler split of a graph
An Euler split of G into G1 and G2:
1. Scan each path in an Euler partition.
2. Place alternate edges into G1 and G2.
[Figure: a graph G split into G1 and G2]

Slide 22: Edge-coloring using Euler sets
- Assume for simplicity that the graph is regular (all vertices have the same degree D) and that D = 2^i.
- Perform i Euler splits and 1-color each resulting graph. This is log D split operations, each of O(E).
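The recursive split-and-color procedure above can be sketched in code. This is an illustrative implementation under the slide's simplifying assumptions (D-regular bipartite multigraph, D a power of two); the function names and the edge-list representation are my own, and every vertex has even degree, so the Euler partition consists of circuits only.

```python
from collections import defaultdict

def euler_split(edges):
    """Split a bipartite multigraph in which every vertex has even degree
    into two halves by walking Euler circuits and alternating edges."""
    adj = defaultdict(list)            # vertex -> stack of (edge index, neighbor)
    for idx, (u, v) in enumerate(edges):
        adj[('L', u)].append((idx, ('R', v)))
        adj[('R', v)].append((idx, ('L', u)))
    used = [False] * len(edges)
    g1, g2 = [], []
    halves = (g1, g2)
    for start in list(adj):
        while adj[start]:
            side, cur = 0, start       # trace one circuit from `start`
            while True:
                while adj[cur] and used[adj[cur][-1][0]]:
                    adj[cur].pop()     # discard edges consumed earlier
                if not adj[cur]:
                    break              # circuit closed back at `start`
                idx, nxt = adj[cur].pop()
                used[idx] = True
                halves[side].append(edges[idx])
                side ^= 1              # alternate edges between the halves
                cur = nxt
    return g1, g2

def edge_color(edges, D):
    """Color a D-regular bipartite multigraph (D a power of two) with D
    colors: log2(D) splits, then each remaining graph is one color class."""
    if D == 1:
        return [edges]
    g1, g2 = euler_split(edges)
    return edge_color(g1, D // 2) + edge_color(g2, D // 2)

# Demo on K_{4,4}, which is 4-regular (D = 4 = 2^2): its 16 edges split
# into 4 color classes, each a perfect matching (one middle switch each).
edges = [(u, v) for u in range(4) for v in range(4)]
colors = edge_color(edges, 4)
assert len(colors) == 4
for cls in colors:
    assert sorted(u for u, _ in cls) == [0, 1, 2, 3]
    assert sorted(v for _, v in cls) == [0, 1, 2, 3]
```

Because bipartite circuits have even length, alternating edges along a circuit gives each vertex exactly half its edges in each half, so every split preserves regularity; after i splits each subgraph is a perfect matching, i.e. the routing through one middle-stage switch.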

Slide 23: Implementation
[Figure: the scheduler turns the request graph into a permutation, and a route-connections block turns the permutation into paths through the Clos network]

Slide 24: Implementation
Pros:
- A rearrangeably non-blocking switch can perform any permutation.
- A cell switch is time-slotted, so all connections are rearranged every time slot anyway.
Cons:
- Rearrangement algorithms are complex (in addition to the scheduler).
Can we eliminate the need to rearrange?

Slide 25: Strictly non-blocking Clos network
Clos' theorem: if k >= 2n - 1, then a new connection can always be added without rearrangement.

Slide 26: Clos theorem
[Figure: m first-stage switches I1, ..., Im of size n x k, k middle-stage switches M1, ..., Mk of size m x m, and m third-stage switches O1, ..., Om of size k x n; N = n x m, with k >= 2n - 1]

Slide 27: Clos theorem
1. Consider adding the n-th connection between 1st-stage switch Ia and 3rd-stage switch Ob.
2. We need to ensure that there is always some center-stage switch M available.
3. At Ia, n - 1 connections are already in use, and likewise at Ob; in the worst case they occupy disjoint sets of middle-stage switches. So if k > (n - 1) + (n - 1), there is always an M available, i.e. we need k >= 2n - 1.
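The counting argument can be exercised with a toy online router (an illustrative sketch, not part of the original slides): connections arrive and depart at random, each is greedily assigned any free middle switch and never moved, and with k = 2n - 1 the greedy choice never fails.

```python
# Toy check of Clos' theorem: with k = 2n - 1 middle switches, greedy
# online routing (no rearrangement) never blocks.
import random

def run(n, m, k, steps=2000, seed=0):
    rng = random.Random(seed)
    in_used = [set() for _ in range(m)]   # middles busy at each input switch
    out_used = [set() for _ in range(m)]  # middles busy at each output switch
    active = []                           # (input switch, output switch, middle)
    for _ in range(steps):
        if active and rng.random() < 0.5:
            a, b, mid = active.pop(rng.randrange(len(active)))
            in_used[a].remove(mid); out_used[b].remove(mid)
            continue
        # pick an input and an output switch that still have a free port
        a = rng.choice([i for i in range(m) if len(in_used[i]) < n] or [None])
        b = rng.choice([j for j in range(m) if len(out_used[j]) < n] or [None])
        if a is None or b is None:
            continue
        free = set(range(k)) - in_used[a] - out_used[b]
        assert free, "blocked! (cannot happen when k >= 2n - 1)"
        mid = min(free)
        in_used[a].add(mid); out_used[b].add(mid)
        active.append((a, b, mid))

run(n=4, m=4, k=2 * 4 - 1)    # k = 7: the internal assert never fires
print("no blocking with k = 2n - 1")
```

The assertion inside `run` is exactly step 3 of the slide: at most n - 1 middles are busy at Ia and at most n - 1 at Ob, so with k = 2n - 1 the free set is never empty.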

Slide 28: Benes networks
[Figure: recursive construction]

Slide 29: Benes networks
[Figure: recursive construction, continued]
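The payoff of the recursion can be quantified with a short sketch (my own function name; it assumes the standard construction in which an N x N Benes network is two columns of N/2 two-by-two switches around two half-size Benes subnetworks, bottoming out at a single 2x2 switch):

```python
# Count the 2x2 switches in an N x N Benes network, N a power of two.

def benes_switches(N: int) -> int:
    assert N >= 2 and N & (N - 1) == 0, "N must be a power of two"
    if N == 2:
        return 1                                  # base case: one 2x2 switch
    return 2 * (N // 2) + 2 * benes_switches(N // 2)

# Closed form: (N/2) * (2*log2(N) - 1) switches.
for N in (2, 4, 8, 1024):
    print(N, benes_switches(N))

# At 4 crosspoints per 2x2 switch, a 1024-port Benes network needs
# 4 * benes_switches(1024) crosspoints, far fewer than 1024**2.
```

For N = 1024 this gives 9,728 two-by-two switches, i.e. about 39,000 crosspoints versus over a million for a flat crossbar, which is the sense in which the Benes network "scales with small components."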

Slide 30: Scaling crossbars: summary
- Scaling the bit rate through parallelism is easy.
- Scaling the scheduler is hard.
- Scaling the number of ports is harder.
- Clos network:
  - Rearrangeably non-blocking with k = n, but routing is complicated.
  - Strictly non-blocking with k >= 2n - 1, so routing is simple, but more bisection bandwidth is required.
- Benes network: scaling with small components.