
EE384Y: Packet Switch Architectures. Part II: Scaling Crossbar Switches. Nick McKeown, Professor of Electrical Engineering and Computer Science, Stanford University

Outline
Up until now, we have focused on high-performance packet switches with:
1. A crossbar switching fabric,
2. Input queues (and possibly output queues as well),
3. Virtual output queues, and
4. A centralized arbitration/scheduling algorithm.
Today we'll talk about the implementation of the crossbar switch fabric itself: how crossbars are built, how they scale, and what limits their capacity.

Crossbar switch: limiting factors
1. N^2 crosspoints per chip, or N N-to-1 multiplexors,
2. It's not obvious how to build a crossbar from multiple chips,
3. Capacity of I/Os per chip.
- State of the art: about 300 pins, each operating at 3.125 Gb/s, i.e. roughly 1 Tb/s per chip.
- About 1/3 to 1/2 of this capacity is available in practice because of overhead and speedup.
- Crossbar chips today are limited by I/O capacity.
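As a quick sanity check on those figures, here is a minimal back-of-the-envelope sketch (the pin count, per-pin rate, and usable fractions are simply the numbers quoted above, not measurements):

```python
# Rough I/O capacity of one crossbar chip, using the figures quoted above.
pins = 300                          # serial I/O pins per chip
gbps_per_pin = 3.125                # line rate per pin, in Gb/s
raw_capacity = pins * gbps_per_pin  # ~937.5 Gb/s, i.e. roughly 1 Tb/s of raw I/O

# Only a fraction of that is usable once overhead and speedup are paid for.
for usable_fraction in (1/3, 1/2):
    print(f"usable switching capacity ~ {raw_capacity * usable_fraction:.0f} Gb/s")
```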

Scaling the number of outputs: trying to build a crossbar from multiple chips.
Building block: a 4-input x 4-output crossbar chip.
[Figure: tiling these chips to build a 16x16 crossbar switch; because each chip must also pass inputs and outputs through to its neighbours, eight inputs and eight outputs per chip are required!]
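A small sketch of the resulting I/O penalty, under one plausible reading of the figure (an (N/n) x (N/n) grid of n x n chips in which each chip passes its row's inputs and its column's outputs through to its neighbours); the function name and the tiling details are my assumptions, not taken from the slide:

```python
def tiled_crossbar_io(N: int, n: int):
    """Naively tile n x n crossbar chips into an N x N crossbar.

    Assumed layout: chip (i, j) switches input group i onto output group j,
    forwards the n row inputs to the chip on its right, and forwards the n
    (partial) column outputs to the chip below it.
    """
    assert N % n == 0
    chips = (N // n) ** 2
    inputs_per_chip = 2 * n    # n row inputs + n partial outputs from above
    outputs_per_chip = 2 * n   # n row inputs passed right + n outputs passed down
    return chips, inputs_per_chip, outputs_per_chip

# The slide's example: 4x4 building blocks for a 16x16 switch.
print(tiled_crossbar_io(16, 4))   # (16, 8, 8): eight inputs and eight outputs per chip
```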

Scaling the line rate: bit-sliced parallelism
[Figure: a linecard splits each cell across k identical crossbar planes, all driven by one cell scheduler.]
- Each cell is "striped" across multiple identical planes.
- The result behaves like a crossbar-switched "bus".
- The scheduler makes the same decision for all slices.
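A minimal sketch of the striping idea (byte-granularity striping for readability; the 64-byte cell and k = 4 planes are illustrative values, not taken from the slide):

```python
def stripe(cell: bytes, k: int):
    """Stripe one cell across k identical switch planes: plane p carries
    bytes p, p+k, p+2k, ... of the cell.  All planes are configured
    identically by the single scheduler in the same cell time."""
    return [cell[p::k] for p in range(k)]

def unstripe(slices):
    """Reassemble the cell at the output linecard."""
    k = len(slices)
    out = bytearray(sum(len(s) for s in slices))
    for p, s in enumerate(slices):
        out[p::k] = s
    return bytes(out)

cell = bytes(range(64))                      # an illustrative 64-byte cell
assert unstripe(stripe(cell, k=4)) == cell   # striping is lossless
```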

Scaling the line rate: time-sliced parallelism
[Figure: a linecard sends each cell, in turn, to one of k crossbar planes.]
- Each cell is carried by one plane and takes k cell times.
- The scheduler is unchanged; it makes a decision for each slice in turn.
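And a matching sketch of time-slicing, where each arriving cell goes, whole, to one plane chosen round-robin; the class and its names are mine, purely for illustration:

```python
from itertools import count

class TimeSlicedFabric:
    """Send each cell, in its entirety, to one of k slower planes in turn.
    Each cell occupies its plane for k cell times, so k planes together
    keep up with the line rate; the scheduler itself is unchanged."""

    def __init__(self, k: int):
        self.k = k
        self._slot = count()

    def dispatch(self, cell: bytes) -> int:
        """Return the plane that carries this cell."""
        return next(self._slot) % self.k

fabric = TimeSlicedFabric(k=4)
print([fabric.dispatch(b"cell") for _ in range(8)])   # [0, 1, 2, 3, 0, 1, 2, 3]
```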

Scaling a crossbar
- Conclusion: scaling the capacity is relatively straightforward (although the chip count and power may become a problem).
- What if we want to increase the number of ports?
- Can we build a crossbar-equivalent from multiple stages of smaller crossbars?
- If so, what properties should it have?

3-stage Clos network
[Figure: m first-stage switches, each n x k; k middle-stage switches, each m x m; and m third-stage switches, each k x n. N = n x m external ports, with k >= n.]
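The parameters in the figure are easier to keep straight in code. Here is a minimal model of the wiring; the class, its names, and the port numbering (external port x attaches to first-stage switch x // n) are my assumptions:

```python
from dataclasses import dataclass

@dataclass
class Clos:
    """A 3-stage Clos network: m ingress switches (n x k), k middle switches
    (m x m), and m egress switches (k x n), giving N = n * m external ports."""
    n: int   # external ports per ingress (and egress) switch
    m: int   # number of ingress (and egress) switches
    k: int   # number of middle-stage switches

    @property
    def N(self) -> int:
        return self.n * self.m

    def path(self, src: int, dst: int, middle: int):
        """The switches visited by a connection from external input `src` to
        external output `dst`, if it is routed through middle switch `middle`."""
        return [("ingress", src // self.n), ("middle", middle), ("egress", dst // self.n)]

clos = Clos(n=4, m=4, k=4)   # a 16-port example with k = n
print(clos.N, clos.path(src=5, dst=14, middle=2))
```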

With k = n, is a Clos network non-blocking like a crossbar?
Consider the example: the scheduler chooses to match (1,1), (2,4), (3,3), (4,2).

With k = n, is a Clos network non-blocking like a crossbar?
Consider the example: the scheduler chooses to match (1,1), (2,2), (4,4), (5,3), … By rearranging the existing matches, the new connections could be added.
Q: Is this Clos network "rearrangeably non-blocking"?

With k = n, a Clos network is rearrangeably non-blocking
Routing the matches, e.g. (1,1), (2,4), (3,3), (4,2), is equivalent to edge-coloring a bipartite multigraph:
- Each vertex corresponds to an n x k or k x n switch.
- Colors correspond to middle-stage switches.
- No two edges at a vertex may be colored the same.
König's theorem: a bipartite multigraph of maximum degree D can be edge-colored with D colors. Therefore, if k = n, a 3-stage Clos network is rearrangeably non-blocking (and can therefore perform any permutation).
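Concretely, turning a set of matched (input port, output port) pairs into that bipartite multigraph just means mapping each port to its first- or third-stage switch. A minimal sketch, using the slide's permutation but assuming, purely for illustration, a 4-port network with n = 2 ports per switch:

```python
def request_multigraph(matches, n):
    """Map matched (input, output) port pairs (1-indexed, as on the slide) to
    edges between ingress and egress switches.  Parallel edges are expected:
    two matches may join the same pair of switches."""
    return [((i - 1) // n, (o - 1) // n) for i, o in matches]

edges = request_multigraph([(1, 1), (2, 4), (3, 3), (4, 2)], n=2)
print(edges)   # [(0, 0), (0, 1), (1, 1), (1, 0)] -- every switch has degree n = 2,
               # so n colors (i.e. n middle-stage switches) suffice.
```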

How complex is the rearrangement?
- Method 1: Find a maximum-size bipartite matching for each of the D colors in turn: O(D·N^2.5).
- Method 2: Partition the graph into Euler sets: O(N·log D) [Cole et al. '00].

Edge-coloring using Euler sets
- Make the graph regular: modify the graph so that every vertex has the same degree, D [combine vertices and add edges; O(E)].
- For D = 2^i, perform i "Euler splits" and 1-color each resulting graph. This is log D operations, each O(E).

Euler partition of a graph
An Euler partition of a graph G partitions the edges of G into open and closed paths such that:
1. Each odd-degree vertex is at the end of exactly one open path.
2. Each even-degree vertex is at the end of no open path.

Euler split of a graph
An Euler split of G into G1 and G2:
1. Scan each path in an Euler partition.
2. Place alternate edges into G1 and G2.
[Figure: an example graph G and the two resulting subgraphs G1 and G2.]
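A minimal sketch of one Euler split, assuming, as on the edge-coloring slides, that the graph has already been made regular (so every degree is even and the Euler partition contains only closed trails); the data layout and names are mine:

```python
from collections import defaultdict

def euler_split(edges):
    """One 'Euler split' of a bipartite multigraph in which every vertex has
    even degree: partition the edges into G1 and G2 so that every vertex keeps
    exactly half of its degree in each.

    `edges` is a list of (u, v, payload) triples, with u in the left vertex set
    and v in the right; `payload` just tags the original edge."""
    adj = defaultdict(list)               # vertex -> list of (edge index, neighbour)
    for e, (u, v, _) in enumerate(edges):
        adj[("L", u)].append((e, ("R", v)))
        adj[("R", v)].append((e, ("L", u)))
    used = [False] * len(edges)
    g1, g2 = [], []
    for start in list(adj):
        while adj[start]:                 # trace closed trails starting at `start`
            e, cur = adj[start].pop()
            if used[e]:
                continue
            used[e] = True
            trail = [e]
            while cur != start:           # with all degrees even, the trail can
                e, nxt = adj[cur].pop()   # only get stuck back at `start`
                if used[e]:
                    continue
                used[e] = True
                trail.append(e)
                cur = nxt
            # A closed trail in a bipartite graph has even length, so placing
            # alternate edges in G1 and G2 halves every vertex's degree.
            for i, eid in enumerate(trail):
                (g1 if i % 2 == 0 else g2).append(edges[eid])
    return g1, g2
```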

Edge-coloring using Euler sets (recap)
- Make the graph regular: modify the graph so that every vertex has the same degree, D [combine vertices and add edges; O(E)].
- For D = 2^i, perform i "Euler splits" and 1-color each resulting graph. This is log D operations, each O(E).
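Putting the two previous sketches together for the easy case described here, D = 2^i: recursively Euler-split the (already regular) request graph until each piece is a perfect matching, and give each matching its own color, i.e. its own middle-stage switch. This reuses `request_multigraph` and `euler_split` from the sketches above, and again assumes a tiny illustrative network with n = k = 2:

```python
def color_by_euler_splits(edges, depth):
    """Edge-color a 2**depth-regular bipartite multigraph with 2**depth colors
    by repeated Euler splits.  Returns {payload: color}."""
    if depth == 0:
        # Degree 1: the remaining edges form a matching; one color for all.
        return {payload: 0 for (_, _, payload) in edges}
    g1, g2 = euler_split(edges)
    colors = color_by_euler_splits(g1, depth - 1)
    for payload, c in color_by_euler_splits(g2, depth - 1).items():
        colors[payload] = c + 2 ** (depth - 1)   # offset the second half's colors
    return colors

# Route the slide's example permutation through a Clos network with n = k = 2:
matches = [(1, 1), (2, 4), (3, 3), (4, 2)]
edges = [(u, v, m) for (u, v), m in zip(request_multigraph(matches, n=2), matches)]
print(color_by_euler_splits(edges, depth=1))
# e.g. {(2, 4): 0, (4, 2): 0, (3, 3): 1, (1, 1): 1}: each matching gets its own
# middle-stage switch, so the permutation is routed without conflicts.
```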

Implementation
[Figure: the request graph goes to the scheduler, which produces a permutation; a separate "route connections" stage then turns the permutation into paths through the Clos network.]

Implementation
Pros:
- A rearrangeably non-blocking switch can perform any permutation.
- A cell switch is time-slotted, so all connections are rearranged every time slot anyway.
Cons:
- Rearrangement algorithms are complex (and run in addition to the scheduler).
Can we eliminate the need to rearrange?

Strictly non-blocking Clos network
Clos' theorem: If k >= 2n - 1, then a new connection can always be added without rearrangement.

[Figure: the same 3-stage Clos network, labeled: first-stage switches I1 … Im (each n x k), middle-stage switches M1 … Mk (each m x m), third-stage switches O1 … Om (each k x n); N = n x m, k >= n.]

Clos' theorem: proof sketch
[Figure: first-stage switch Ia and third-stage switch Ob, each with n external ports and k links to the middle stage.]
1. Consider adding the n-th connection between first-stage switch Ia and third-stage switch Ob.
2. We need to ensure that there is always some center-stage switch M available.
3. At most n - 1 middle-stage switches are already in use by Ia's other connections, and at most n - 1 by Ob's other connections. So if k > (n - 1) + (n - 1), i.e. k >= 2n - 1, there is always an M available.
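The counting argument can be checked with a short simulation: connections come and go at random, each new one is greedily assigned to the first middle-stage switch that is free at both ends, and nothing is ever rearranged. With k = 2n - 1 the greedy choice never blocks, while k = n typically does. The traffic pattern and helper names below are made up for illustration:

```python
import random

def greedy_clos_blocking(n, m, k, events=10_000, seed=0):
    """Randomly add/remove connections on an n*m-port Clos network, assigning
    each new connection to the first middle switch free at both its ingress
    and egress switch, with no rearrangement.  Returns how many requests blocked."""
    rng = random.Random(seed)
    busy_in = [set() for _ in range(m)]    # middle switches in use, per ingress switch
    busy_out = [set() for _ in range(m)]   # ... and per egress switch
    active = {}                            # (src, dst) -> middle switch
    free_src, free_dst = set(range(n * m)), set(range(n * m))
    blocked = 0
    for _ in range(events):
        if active and (not free_src or rng.random() < 0.5):
            src, dst = rng.choice(list(active))      # tear down a random connection
            mid = active.pop((src, dst))
            busy_in[src // n].discard(mid)
            busy_out[dst // n].discard(mid)
            free_src.add(src); free_dst.add(dst)
        else:
            src = rng.choice(sorted(free_src)); dst = rng.choice(sorted(free_dst))
            a, b = src // n, dst // n
            free_mid = [x for x in range(k) if x not in busy_in[a] and x not in busy_out[b]]
            if not free_mid:
                blocked += 1                         # would need rearrangement
                continue
            mid = free_mid[0]
            busy_in[a].add(mid); busy_out[b].add(mid)
            active[(src, dst)] = mid
            free_src.remove(src); free_dst.remove(dst)
    return blocked

n, m = 4, 4
print(greedy_clos_blocking(n, m, k=2 * n - 1))   # 0: never blocks, as the theorem promises
print(greedy_clos_blocking(n, m, k=n))           # usually > 0: k = n needs rearrangement
```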

Scaling crossbars: summary
- Scaling capacity through parallelism (bit-slicing and time-slicing) is straightforward.
- Scaling the number of ports is harder…
- Clos network:
  - Rearrangeably non-blocking with k = n, but routing is complicated.
  - Strictly non-blocking with k >= 2n - 1, so routing is simple, but it requires more bisection bandwidth.