Published by Janis Reeves. Modified over 9 years ago.
1
EE384y 2004
EE384Y: Packet Switch Architectures
Part II: Scaling Crossbar Switches
Nick McKeown
Professor of Electrical Engineering and Computer Science, Stanford University
nickm@stanford.edu
http://www.stanford.edu/~nickm
2
Outline
Up until now, we have focused on high-performance packet switches with:
1. A crossbar switching fabric,
2. Input queues (and possibly output queues as well),
3. Virtual output queues, and
4. A centralized arbitration/scheduling algorithm.
Today we’ll talk about the implementation of the crossbar switch fabric itself: how are crossbars built, how do they scale, and what limits their capacity?
3
Crossbar switch: limiting factors
1. N^2 crosspoints per chip, or N multiplexers, each N-to-1.
2. It’s not obvious how to build a crossbar from multiple chips.
3. Capacity of “I/O”s per chip.
   State of the art: about 300 pins, each operating at 3.125 Gb/s, ~= 1 Tb/s per chip.
   About 1/3 to 1/2 of this capacity is available in practice because of overhead and speedup.
Crossbar chips today are limited by “I/O” capacity.
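The I/O arithmetic on this slide can be checked with a few lines of Python. This is only a sketch of the slide's own figures; the variable names are illustrative and not from the course material.

```python
# Figures quoted on the slide: about 300 pins at 3.125 Gb/s each.
PINS = 300
RATE_GBPS = 3.125

# Aggregate raw I/O capacity per chip, in Gb/s (roughly 1 Tb/s).
raw_gbps = PINS * RATE_GBPS

# The slide says about 1/3 to 1/2 of the raw capacity is usable
# once overhead and speedup are accounted for.
usable_low_gbps = raw_gbps / 3
usable_high_gbps = raw_gbps / 2

print(f"raw capacity:    {raw_gbps:.1f} Gb/s")
print(f"usable capacity: {usable_low_gbps:.1f} to {usable_high_gbps:.1f} Gb/s")
```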
4
Scaling number of outputs: trying to build a crossbar from multiple chips
Building block: 4 inputs, 4 outputs.
16x16 crossbar switch: eight inputs and eight outputs required!
5
Scaling line-rate: bit-sliced parallelism
[Figure labels: Linecard, Cell, Scheduler; planes 1, 2, …, k]
Cell is “striped” across multiple identical planes.
Crossbar switched “bus”.
Scheduler makes same decision for all slices.
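A minimal sketch of striping, assuming byte-granularity slices for readability (a real bit-sliced fabric stripes at the bit or word level). The function names `stripe` and `reassemble` are hypothetical, not taken from the slides.

```python
def stripe(cell: bytes, k: int) -> list[bytes]:
    """Split one cell into k slices: slice i carries bytes i, i+k, i+2k, ..."""
    return [cell[i::k] for i in range(k)]


def reassemble(slices: list[bytes]) -> bytes:
    """Interleave the k slices back into the original cell at the egress linecard."""
    k = len(slices)
    out = bytearray(sum(len(s) for s in slices))
    for i, s in enumerate(slices):
        out[i::k] = s  # slice i fills positions i, i+k, i+2k, ...
    return bytes(out)


# Every plane applies the same crossbar configuration, so the k slices of a
# cell leave together and can be re-interleaved on the output side.
cell = bytes(range(64))
slices = stripe(cell, 8)
restored = reassemble(slices)
```

Because the scheduler's decision is shared by all planes, each plane only needs to carry 1/k of the line rate.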
6
Scaling line-rate: time-sliced parallelism
[Figure labels: Linecard, Cell, Scheduler; planes 1, 2, …, k]
Cell carried by one plane; takes k cell times.
Scheduler is unchanged.
Scheduler makes decision for each slice in turn.
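A small sketch of the timing, assuming one cell arrives per cell time and planes are used in round-robin order (one reading of "each slice in turn"). The function name and the tuple layout are illustrative.

```python
def time_slice_schedule(num_cells: int, k: int) -> list[tuple[int, int, int, int]]:
    """Return (cell, plane, start_slot, end_slot) for each cell.

    Each plane runs at 1/k of the line rate, so a transfer occupies its
    plane for k cell times.
    """
    free_at = [0] * k  # first slot at which each plane is idle again
    schedule = []
    for cell in range(num_cells):
        plane = cell % k          # planes are used in turn
        start = cell              # one new cell per cell time
        assert free_at[plane] <= start, "plane would still be busy"
        free_at[plane] = start + k
        schedule.append((cell, plane, start, start + k))
    return schedule


schedule = time_slice_schedule(10, 4)
```

With k planes used in turn, a plane becomes free exactly when its next cell arrives, which is why the scheduler itself does not need to change.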
7
Scaling a crossbar
Conclusion: scaling the capacity is relatively straightforward (although the chip count and power may become a problem).
What if we want to increase the number of ports?
Can we build a crossbar-equivalent from multiple stages of smaller crossbars?
If so, what properties should it have?
8
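3-stage Clos Network
[Figure: N inputs on the left, N outputs on the right, three stages of crossbars.]
First stage: m switches, each n x k.
Middle stage: k switches, each m x m.
Third stage: m switches, each k x n.
N = n x m, k >= n.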
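As a rough illustration of why multiple stages are attractive, the sketch below counts crosspoints for the three stages listed above and compares them with a single N x N crossbar. The counting formula follows directly from the switch sizes; the function names are illustrative.

```python
def clos_crosspoints(n: int, m: int, k: int) -> int:
    """Total crosspoints in a 3-stage Clos network with the slide's parameters."""
    first = m * (n * k)    # m switches, each n x k
    middle = k * (m * m)   # k switches, each m x m
    third = m * (k * n)    # m switches, each k x n
    return first + middle + third


def crossbar_crosspoints(n: int, m: int) -> int:
    """Crosspoints in a single crossbar with N = n * m ports."""
    ports = n * m
    return ports * ports


for n, m in [(4, 4), (32, 32)]:
    print(n, m, crossbar_crosspoints(n, m), clos_crosspoints(n, m, k=n))
```

For small N the savings are modest, and the balance also depends on the value of k that is chosen.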
9
With k = n, is a Clos network non-blocking like a crossbar?
Consider the example: scheduler chooses to match (1,1), (2,4), (3,3), (4,2).
10
With k = n, is a Clos network non-blocking like a crossbar?
Consider the example: scheduler chooses to match (1,1), (2,2), (4,4), (5,3), …
By rearranging matches, the connections could be added.
Q: Is this Clos network “rearrangeably non-blocking”?
11
With k = n, a Clos network is rearrangeably non-blocking
Routing matches is equivalent to edge-coloring in a bipartite multigraph.
Colors correspond to middle-stage switches.
Example matches: (1,1), (2,4), (3,3), (4,2).
Each vertex corresponds to an n x k or k x n switch.
No two edges at a vertex may be colored the same.
Vizing ‘64: a D-degree bipartite graph can be edge-colored in D colors (for bipartite multigraphs this is König's edge-coloring theorem).
Therefore, if k = n, a 3-stage Clos network is rearrangeably non-blocking (and can therefore perform any permutation).
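To make the coloring view concrete, here is a minimal sketch that routes a set of matches through a Clos network with k = n. It uses the classical alternating-path edge-coloring method, which is simpler but slower than the methods discussed later in the deck. Ports are 0-indexed, port p is assumed to sit on switch p // n, and the function names are hypothetical.

```python
def route_clos(matches, n, m):
    """Assign each (input, output) match a middle-stage switch in range(n).

    matches: list of (input_port, output_port), each port used at most once.
    Returns one middle-switch index (a "color") per match.
    """
    k = n
    left = [[None] * k for _ in range(m)]    # left[u][c]: edge colored c at first-stage switch u
    right = [[None] * k for _ in range(m)]   # right[v][c]: edge colored c at third-stage switch v
    ends = [(i // n, o // n) for i, o in matches]
    color = [None] * len(matches)

    for e, (u, v) in enumerate(ends):
        a = left[u].index(None)    # a color still free at u
        b = right[v].index(None)   # a color still free at v
        if right[v][a] is not None:
            # Color a is taken at v: collect the path from v whose edges
            # alternate a, b, a, ... It can never reach u.
            path, node, on_right, c = [], v, True, a
            while True:
                f = (right if on_right else left)[node][c]
                if f is None:
                    break
                path.append(f)
                node = ends[f][0] if on_right else ends[f][1]
                on_right = not on_right
                c = b if c == a else a
            # Swap a and b along the path: clear old slots, then fill new ones.
            for f in path:
                fu, fv = ends[f]
                left[fu][color[f]] = None
                right[fv][color[f]] = None
            for f in path:
                fu, fv = ends[f]
                color[f] = b if color[f] == a else a
                left[fu][color[f]] = f
                right[fv][color[f]] = f
        color[e] = a               # a is now free at both u and v
        left[u][a] = e
        right[v][a] = e
    return color


def is_valid_routing(matches, mids, n):
    """Check that no first- or third-stage switch uses a middle switch twice."""
    seen_in, seen_out = set(), set()
    for (i, o), c in zip(matches, mids):
        if (i // n, c) in seen_in or (o // n, c) in seen_out:
            return False
        seen_in.add((i // n, c))
        seen_out.add((o // n, c))
    return True


# The slide's example, shifted to 0-indexed ports, with n = 2 and m = 2.
example = [(0, 0), (1, 3), (2, 2), (3, 1)]
example_mids = route_clos(example, n=2, m=2)
```

Adding a match may recolor earlier ones, which is exactly the "rearrangement" that a rearrangeably non-blocking network relies on.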
12
How complex is the rearrangement?
Method 1: Find a maximum-size bipartite matching for each of D colors in turn, O(D N^2.5).
Method 2: Partition graph into Euler sets, O(N log D) [Cole et al. ‘00].
13
Edge-Coloring using Euler sets
Make the graph regular: modify the graph so that every vertex has the same degree, D [combine vertices and add edges; O(E)].
For D = 2^i, perform i “Euler splits” and 1-color each resulting graph. This is log D operations, each of O(E).
14
Euler partition of a graph
Euler partition of graph G: a partition of the edges of G into open and closed paths such that:
1. Each odd-degree vertex is at the end of exactly one open path.
2. Each even-degree vertex is at the end of no open path.
15
Euler split of a graph
Euler split of G into G1 and G2:
1. Scan each path in an Euler partition.
2. Place alternate edges into G1 and G2.
[Figure: graph G and the two resulting graphs G1 and G2.]
16
Edge-Coloring using Euler sets
Make the graph regular: modify the graph so that every vertex has the same degree, D [combine vertices and add edges; O(E)].
For D = 2^i, perform i “Euler splits” and 1-color each resulting graph. This is log D operations, each of O(E).
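The split-and-recurse recipe can be sketched as follows, assuming the graph has already been made regular with D a power of two. Edges are (left vertex, right vertex) pairs, parallel edges are allowed, and the function names are illustrative rather than taken from Cole et al.

```python
from collections import Counter, defaultdict


def euler_split(edges):
    """Split a bipartite multigraph into G1 and G2 by walking an Euler
    partition and placing alternate edges of each path into G1 and G2."""
    adj = defaultdict(list)  # vertex -> indices of incident edges
    for e, (u, v) in enumerate(edges):
        adj[("L", u)].append(e)
        adj[("R", v)].append(e)
    remaining = {x: len(es) for x, es in adj.items()}  # unused edges per vertex
    used = [False] * len(edges)
    g1, g2 = [], []

    def walk(node):
        target = g1
        while remaining[node] > 0:
            e = adj[node].pop()
            if used[e]:
                continue  # already taken from its other endpoint
            used[e] = True
            target.append(edges[e])
            target = g2 if target is g1 else g1
            u, v = edges[e]
            remaining[("L", u)] -= 1
            remaining[("R", v)] -= 1
            node = ("R", v) if node == ("L", u) else ("L", u)

    for x in list(adj):            # open paths start at odd-degree vertices
        if remaining[x] % 2 == 1:
            walk(x)
    for x in list(adj):            # what is left decomposes into closed paths
        while remaining[x] > 0:
            walk(x)
    return g1, g2


def color_classes(edges, degree):
    """For a regular bipartite multigraph whose degree is a power of two,
    return `degree` edge sets, each touching every vertex exactly once."""
    if degree == 1:
        return [edges]
    g1, g2 = euler_split(edges)
    return color_classes(g1, degree // 2) + color_classes(g2, degree // 2)


def degrees(edges):
    """Degree of every vertex, with left and right vertices kept distinct."""
    return Counter([("L", u) for u, _ in edges] + [("R", v) for _, v in edges])


# A 4-regular bipartite multigraph on two left and two right vertices.
graph = [(0, 0), (0, 0), (0, 1), (0, 1), (1, 0), (1, 0), (1, 1), (1, 1)]
classes = color_classes(graph, 4)
```

Each of the four resulting classes corresponds to the set of connections carried by one middle-stage switch.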
17
Implementation
[Figure: Request graph -> Scheduler -> Permutation -> Route connections -> Paths]
18
Implementation
Pros
  A rearrangeably non-blocking switch can perform any permutation.
  A cell switch is time-slotted, so all connections are rearranged every time slot anyway.
Cons
  Rearrangement algorithms are complex (in addition to the scheduler).
Can we eliminate the need to rearrange?
19
Strictly non-blocking Clos Network
Clos’ Theorem: If k >= 2n – 1, then a new connection can always be added without rearrangement.
20
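[Figure: 3-stage Clos network with N inputs and N outputs.
First-stage switches I1, I2, …, Im, each n x k.
Middle-stage switches M1, M2, …, Mk, each m x m.
Third-stage switches O1, O2, …, Om, each k x n.
N = n x m, k >= n.]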
21
Clos Theorem
[Figure: first-stage switch Ia and third-stage switch Ob, each with n external ports and k links to the middle stage.]
1. Consider adding the n-th connection between 1st-stage switch Ia and 3rd-stage switch Ob.
2. We need to ensure that there is always some center-stage switch M available.
3. At most n – 1 middle-stage switches are already in use at Ia, and at most n – 1 at Ob. If k > (n – 1) + (n – 1), then there is always an M available, i.e. we need k >= 2n – 1.
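The counting argument can be illustrated with a greedy routine that picks any middle-stage switch free at both ends. This is a sketch under the slide's worst-case assumption; `add_connection` and the busy-set representation are hypothetical.

```python
def add_connection(in_busy, out_busy, a, b, k):
    """Route a new connection from first-stage switch a to third-stage switch b.

    in_busy[a] / out_busy[b] are the sets of middle switches already used by
    a and b. Returns the chosen middle switch, or None if the call is blocked.
    """
    for mid in range(k):
        if mid not in in_busy[a] and mid not in out_busy[b]:
            in_busy[a].add(mid)
            out_busy[b].add(mid)
            return mid
    return None


# Worst case for n = 3: Ia and Ob each already use n - 1 = 2 middle
# switches, and the two sets are disjoint.
n = 3
in_busy, out_busy = {0: {0, 1}}, {0: {2, 3}}
with_2n_minus_1 = add_connection(in_busy, out_busy, 0, 0, k=2 * n - 1)

in_busy, out_busy = {0: {0, 1}}, {0: {2, 3}}
with_2n_minus_2 = add_connection(in_busy, out_busy, 0, 0, k=2 * n - 2)
```

With k = 2n – 1 a free middle switch remains even in the worst case, while with k = 2n – 2 the same request is blocked unless existing connections are rearranged.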
22
Scaling Crossbars: Summary
Scaling capacity through parallelism (bit-slicing and time-slicing) is straightforward.
Scaling number of ports is harder…
Clos network:
  Rearrangeably non-blocking with k = n, but routing is complicated.
  Strictly non-blocking with k >= 2n – 1, so routing is simple, but requires more bisection bandwidth.