Published by Arthur Clarke. Modified over 6 years ago.
1
EE384Y: Packet Switch Architectures, Part II: Scaling Crossbar Switches
Nick McKeown, Professor of Electrical Engineering and Computer Science, Stanford University
2
Outline
Up until now, we have focused on high-performance packet switches with a crossbar switching fabric, input queues (and possibly output queues as well), virtual output queues, and a centralized arbitration/scheduling algorithm. Today we'll talk about the implementation of the crossbar switch fabric itself: how are crossbars built, how do they scale, and what limits their capacity?
3
Crossbar switch: Limiting factors
N^2 crosspoints per chip, or N multiplexors, each N-to-1. It's not obvious how to build a crossbar from multiple chips. Capacity of "I/O"s per chip: the state of the art is about 300 pins, each operating at 3.125 Gb/s, or roughly 1 Tb/s per chip. Only about 1/3 to 1/2 of this capacity is available in practice because of overhead and speedup. Crossbar chips today are limited by "I/O" capacity.
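The I/O arithmetic above can be checked with a few lines of Python. This is only a back-of-the-envelope sketch: the pin count and per-pin rate are the slide's figures, while the function names are made up for illustration.

```python
# Back-of-the-envelope I/O capacity of one crossbar chip.
PINS = 300          # high-speed pins per chip (figure from the slide)
RATE_GBPS = 3.125   # per-pin rate in Gb/s (figure from the slide)


def raw_capacity_gbps(pins: int = PINS, rate_gbps: float = RATE_GBPS) -> float:
    """Aggregate raw I/O bandwidth of the chip in Gb/s."""
    return pins * rate_gbps


def usable_capacity_gbps(fraction: float) -> float:
    """Bandwidth left once overhead and speedup are paid for.

    `fraction` is the usable share; the slide puts it between 1/3 and 1/2.
    """
    return raw_capacity_gbps() * fraction


if __name__ == "__main__":
    print("raw Gb/s:", raw_capacity_gbps())
    print("usable Gb/s at 1/3:", usable_capacity_gbps(1 / 3))
    print("usable Gb/s at 1/2:", usable_capacity_gbps(1 / 2))
```

With these figures the raw capacity is 937.5 Gb/s, which is the "roughly 1 Tb/s" quoted above.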
4
Scaling number of outputs: Trying to build a crossbar from multiple chips
Building block: a crossbar chip with 4 inputs and 4 outputs. Target: a 16x16 crossbar switch. Eight inputs and eight outputs required!
5
Scaling line-rate: Bit-sliced parallelism
Cell is "striped" across k identical planes. The crossbar acts as a switched "bus". Scheduler makes same decision for all slices.
[Figure: a linecard stripes each cell across k crossbar planes, all controlled by a single scheduler.]
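A minimal sketch of striping, assuming byte granularity rather than true bit granularity to keep the code short. The names `stripe`, `reassemble` and `switch_bit_sliced` are hypothetical; the point illustrated is that one matching from the scheduler configures every plane identically.

```python
def stripe(cell: bytes, k: int) -> list[bytes]:
    """Split one cell into k slices; slice i carries bytes i, i+k, i+2k, ..."""
    return [cell[i::k] for i in range(k)]


def reassemble(slices: list[bytes]) -> bytes:
    """Inverse of stripe(): interleave the k slices back into one cell."""
    k = len(slices)
    out = bytearray(sum(len(s) for s in slices))
    for i, piece in enumerate(slices):
        out[i::k] = piece
    return bytes(out)


def switch_bit_sliced(cells: dict[int, bytes], matching: dict[int, int], k: int) -> dict[int, bytes]:
    """Carry each cell over k planes that all use the same input->output matching."""
    planes = [{} for _ in range(k)]            # planes[p][output] = slice delivered
    for inp, out in matching.items():
        for p, piece in enumerate(stripe(cells[inp], k)):
            planes[p][out] = piece             # same decision applied on every plane
    return {out: reassemble([planes[p][out] for p in range(k)])
            for out in matching.values()}


if __name__ == "__main__":
    cells = {0: b"abcdefgh", 1: b"ijklmnop"}
    print(switch_bit_sliced(cells, {0: 1, 1: 0}, k=4))
```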
6
Scaling line-rate: Time-sliced parallelism
Cell carried by one plane; takes k cell times. Scheduler is unchanged. Scheduler makes decision for each slice in turn.
[Figure: a linecard sends successive whole cells to successive crossbar planes, all controlled by a single scheduler.]
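The timing of time-slicing can be sketched as a round-robin assignment. This is an illustration under stated assumptions: one cell arrives per cell time, cell t goes to plane t mod k, and the function name is invented here.

```python
def assign_planes(num_cells: int, k: int) -> list[tuple[int, int, int]]:
    """Return (cell index, plane, finish time) for cells arriving one per cell time.

    Cell t is handed to plane t mod k at time t and occupies that plane
    for k cell times, so it finishes at time t + k.
    """
    return [(t, t % k, t + k) for t in range(num_cells)]


if __name__ == "__main__":
    for cell, plane, done in assign_planes(8, k=4):
        print(f"cell {cell}: plane {plane}, busy from t={cell} until t={done}")
```

Because plane p next receives a cell exactly k cell times after the previous one, each plane is free again just as its next cell arrives, which is why each plane can run at 1/k of the line rate.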
7
Scaling a crossbar Conclusion: scaling the capacity is relatively straightforward (although the chip count and power may become a problem). What if we want to increase the number of ports? Can we build a crossbar-equivalent from multiple stages of smaller crossbars? If so, what properties should it have?
8
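3-stage Clos Network
[Figure: N inputs feed m first-stage switches of size n x k; these connect to k middle-stage switches of size m x m, which connect to m third-stage switches of size k x n driving N outputs.]
N = n x m, k >= n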
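To get a feel for the sizes involved, the sketch below counts crosspoints for a single N x N crossbar and for a 3-stage Clos network with the stage dimensions given above. The function names and the example values n = m = k = 32 are illustrative choices, not figures from the lecture.

```python
def crossbar_crosspoints(num_ports: int) -> int:
    """Crosspoints in a single N x N crossbar."""
    return num_ports * num_ports


def clos_crosspoints(n: int, m: int, k: int) -> int:
    """Crosspoints in a 3-stage Clos network with N = n * m ports.

    m first-stage switches of size n x k, k middle-stage switches of
    size m x m, and m third-stage switches of size k x n.
    """
    first = m * (n * k)
    middle = k * (m * m)
    third = m * (k * n)
    return first + middle + third


if __name__ == "__main__":
    n = m = k = 32                       # illustrative: N = 1024 ports, k = n
    print("crossbar:", crossbar_crosspoints(n * m))
    print("Clos    :", clos_crosspoints(n, m, k))
```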
9
With k = n, is a Clos network non-blocking like a crossbar?
Consider the example: scheduler chooses to match (1,1), (2,4), (3,3), (4,2)
10
With k = n, is a Clos network non-blocking like a crossbar?
Consider the example: scheduler chooses to match (1,1), (2,2), (4,4), (5,3), … By rearranging matches, the connections could be added. Q: Is this Clos network “rearrangeably non-blocking”?
11
With k = n a Clos network is rearrangeably non-blocking
Routing matches is equivalent to edge-coloring in a bipartite multigraph. Each vertex corresponds to an n x k or k x n switch, each edge to a match such as (1,1), (2,4), (3,3), (4,2), and colors correspond to middle-stage switches. No two edges at a vertex may be colored the same. König's theorem: a bipartite multigraph of maximum degree D can be edge-colored with D colors (Vizing '64 gives the D + 1 bound for general simple graphs). Therefore, if k = n, a 3-stage Clos network is rearrangeably non-blocking (and can therefore perform any permutation).
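One way to see the edge-coloring view in action is the classical alternating-path argument: color edges one at a time and, when the two endpoints disagree on a free color, swap two colors along a path. The sketch below is that textbook procedure, not one of the methods discussed on the following slides. The names `edge_color` and `is_proper` are hypothetical, and mapping port p to switch (p - 1) // n is an assumption about port numbering.

```python
def edge_color(edges, num_colors):
    """Color a bipartite multigraph; needs max degree <= num_colors.

    edges: list of (input_switch, output_switch). Returns one color
    (middle-stage switch index) per edge.
    """
    left_at, right_at = {}, {}          # vertex -> {color: edge index}
    color = [None] * len(edges)

    def free(table, vertex):
        used = table.setdefault(vertex, {})
        return next(c for c in range(num_colors) if c not in used)

    for e, (u, v) in enumerate(edges):
        a, b = free(left_at, u), free(right_at, v)
        if a in right_at[v]:
            # a is busy at v: walk the a/b alternating path from v and swap colors.
            path, on_right, vertex, c = [], True, v, a
            while True:
                f = (right_at if on_right else left_at).get(vertex, {}).get(c)
                if f is None:
                    break
                path.append(f)
                vertex = edges[f][0] if on_right else edges[f][1]
                on_right, c = not on_right, (b if c == a else a)
            for f in path:              # remove old colors first ...
                del left_at[edges[f][0]][color[f]]
                del right_at[edges[f][1]][color[f]]
            for f in path:              # ... then install the swapped ones
                color[f] = b if color[f] == a else a
                left_at[edges[f][0]][color[f]] = f
                right_at[edges[f][1]][color[f]] = f
        color[e] = a
        left_at[u][a] = e
        right_at[v][a] = e
    return color


def is_proper(edges, colors):
    """True if no two edges sharing a vertex have the same color."""
    seen = set()
    for (u, v), c in zip(edges, colors):
        if ("L", u, c) in seen or ("R", v, c) in seen:
            return False
        seen.update({("L", u, c), ("R", v, c)})
    return True


if __name__ == "__main__":
    n = 2                                             # assumed switch size
    matches = [(1, 1), (2, 4), (3, 3), (4, 2)]        # port-level matches
    edges = [((i - 1) // n, (o - 1) // n) for i, o in matches]
    print(edge_color(edges, num_colors=n))
```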
12
How complex is the rearrangement?
Method 1: Find a maximum-size bipartite matching for each of the D colors in turn, O(D N^2.5). Method 2: Partition the graph into Euler sets, O(N log D) [Cole et al. '00].
13
Edge-Coloring using Euler sets
Make the graph regular: modify the graph so that every vertex has the same degree, D [combine vertices and add edges; O(E)]. For D = 2^i, perform i "Euler splits" and 1-color each resulting graph. This is log D operations, each of O(E).
14
Euler partition of a graph
Euler partition of graph G: a partition of the edges into open and closed paths such that each odd-degree vertex is at the end of exactly one open path, and each even-degree vertex is at the end of no open path.
15
Euler split of a graph
Euler split of G into G1 and G2: scan each path in an Euler partition and place alternate edges into G1 and G2.
[Figure: a graph G and the two graphs G1 and G2 produced by the split.]
16
Edge-Coloring using Euler sets
Make the graph regular: modify the graph so that every vertex has the same degree, D [combine vertices and add edges; O(E)]. For D = 2^i, perform i "Euler splits" and 1-color each resulting graph. This is log D operations, each of O(E).
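The recipe can be sketched in Python for the special case where the graph is already regular with D = 2^i, so the "make the graph regular" step is skipped. Because every degree is even, the Euler partition contains only closed paths, and each has even length in a bipartite graph. The function names are hypothetical and this is a plain illustration, not the algorithm of Cole et al.

```python
def euler_split(edges):
    """Split a bipartite multigraph with all degrees even into two halves.

    edges: list of (left, right). Returns two lists of edge indices; every
    vertex keeps exactly half of its edges in each list.
    """
    adj = {}
    for e, (u, v) in enumerate(edges):
        adj.setdefault(("L", u), []).append(e)
        adj.setdefault(("R", v), []).append(e)
    used = [False] * len(edges)
    halves = ([], [])
    for start in adj:
        while True:
            vertex, step = start, 0
            while True:                      # walk one closed path from start
                while adj[vertex] and used[adj[vertex][-1]]:
                    adj[vertex].pop()
                if not adj[vertex]:
                    break                    # stuck, which happens only at start
                e = adj[vertex].pop()
                used[e] = True
                halves[step % 2].append(e)   # alternate edges between G1 and G2
                u, v = edges[e]
                vertex = ("R", v) if vertex[0] == "L" else ("L", u)
                step += 1
            if step == 0:
                break                        # no unused edges left at start
    return halves


def color_power_of_two(edges, degree):
    """Edge-color a `degree`-regular bipartite multigraph, degree = 2**i."""
    colors = [0] * len(edges)

    def recurse(idx, d, base):
        if d == 1:                           # a perfect matching: one color
            for e in idx:
                colors[e] = base
            return
        g1, g2 = euler_split([edges[e] for e in idx])
        recurse([idx[j] for j in g1], d // 2, base)
        recurse([idx[j] for j in g2], d // 2, base + d // 2)

    recurse(list(range(len(edges))), degree, 0)
    return colors


if __name__ == "__main__":
    # 4-regular bipartite multigraph on 3 + 3 vertices (has parallel edges).
    edges = [(u, (u + s) % 3) for u in range(3) for s in range(4)]
    print(color_power_of_two(edges, 4))
```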
17
Implementation
[Figure: the scheduler turns the request graph into a permutation; a "route connections" step turns the permutation into paths.]
18
Implementation
Pros: A rearrangeably non-blocking switch can perform any permutation. A cell switch is time-slotted, so all connections are rearranged every time slot anyway.
Cons: Rearrangement algorithms are complex (in addition to the scheduler).
Can we eliminate the need to rearrange?
19
Strictly non-blocking Clos Network
Clos’ Theorem: If k >= 2n – 1, then a new connection can always be added without rearrangement.
20
[Figure: 3-stage Clos network with first-stage switches I1 ... Im (n x k), middle-stage switches M1 ... Mk (m x m), and third-stage switches O1 ... Om (k x n).]
N = n x m, k >= n
21
Clos Theorem
[Figure: first-stage switch Ia and third-stage switch Ob, each with n ports and k links to the middle stage.]
Consider adding the n-th connection between 1st stage Ia and 3rd stage Ob. At most n - 1 middle-stage links are already in use at Ia, and at most n - 1 at Ob. We need to ensure that there is always some center-stage switch M available. In the worst case these two sets of busy middle-stage switches are disjoint, so if k > (n - 1) + (n - 1), then there is always an M available, i.e. we need k >= 2n - 1.
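The counting argument can be checked directly. The sketch below builds the worst case, in which Ia and Ob each already use n - 1 middle-stage switches and the two sets are disjoint, and then looks for a free one. Function names are made up for illustration, and the code assumes k >= n as on the earlier slides.

```python
def free_middle(k, busy_at_input, busy_at_output):
    """Return a middle-stage switch used by neither Ia nor Ob, or None."""
    for mid in range(k):
        if mid not in busy_at_input and mid not in busy_at_output:
            return mid
    return None


def worst_case(n, k):
    """Try to add the n-th connection when the busy sets overlap as little as possible.

    Ia uses middle switches 0 .. n-2; Ob uses the top n-1 switches.
    Assumes k >= n.
    """
    busy_in = set(range(n - 1))
    busy_out = set(range(k - (n - 1), k))
    return free_middle(k, busy_in, busy_out)


if __name__ == "__main__":
    n = 4
    for k in range(n, 2 * n + 1):
        print(f"k = {k}: free middle-stage switch = {worst_case(n, k)}")
```

For n = 4 the search fails for every k up to 6 and first succeeds at k = 7 = 2n - 1, matching the theorem.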
22
Scaling Crossbars: Summary
Scaling capacity through parallelism (bit-slicing and time-slicing) is straightforward. Scaling the number of ports is harder. A Clos network is rearrangeably non-blocking with k = n, but routing is complicated. It is strictly non-blocking with k >= 2n - 1, so routing is simple, but it requires more bisection bandwidth.