EE384Y: Packet Switch Architectures Scaling Crossbar Switches


1 EE384Y: Packet Switch Architectures
Part II: Scaling Crossbar Switches
Nick McKeown, Professor of Electrical Engineering and Computer Science, Stanford University

2 Outline
Up until now, we have focused on high performance packet switches with: a crossbar switching fabric, input queues (and possibly output queues as well), virtual output queues, and a centralized arbitration/scheduling algorithm.
Today we’ll talk about the implementation of the crossbar switch fabric itself. How are they built, how do they scale, and what limits their capacity?

3 Crossbar switch: Limiting factors
N^2 crosspoints per chip, or N x (N-to-1) multiplexors.
It’s not obvious how to build a crossbar from multiple chips.
Capacity of “I/O”s per chip. State of the art: about 300 pins, each operating at 3.125 Gb/s, ~= 1 Tb/s per chip. About 1/3 to 1/2 of this capacity is available in practice because of overhead and speedup.
Crossbar chips today are limited by “I/O” capacity.
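The I/O arithmetic above can be checked with a few lines of Python. This is only a back-of-the-envelope sketch; the variable names are illustrative, and the pin count and rate are the approximate figures quoted on the slide.

```python
# Rough per-chip I/O capacity, using the approximate figures from the slide.
PINS = 300            # about 300 pins per chip
RATE_GBPS = 3.125     # each pin operating at 3.125 Gb/s

raw_gbps = PINS * RATE_GBPS      # aggregate raw I/O capacity in Gb/s
usable_low = raw_gbps / 3        # if about 1/3 is available in practice
usable_high = raw_gbps / 2       # if about 1/2 is available in practice

print(f"raw: {raw_gbps} Gb/s, usable: {usable_low} to {usable_high} Gb/s")
```

The raw figure comes to 937.5 Gb/s, which is the "~= 1 Tb/s" on the slide; overhead and speedup leave roughly 310 to 470 Gb/s.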

4 Scaling number of outputs: Trying to build a crossbar from multiple chips
16x16 crossbar switch. Building block: 4 inputs, 4 outputs. Eight inputs and eight outputs required!

5 Scaling line-rate: Bit-sliced parallelism
Cell is “striped” across k identical planes. Crossbar switched “bus”. Scheduler makes same decision for all slices.
[Figure: a linecard stripes each cell across k crossbar planes controlled by one scheduler.]
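A minimal sketch of striping, assuming byte-granular slices rather than true bit slices. The function names (stripe, reassemble, switch_bit_sliced) and the dictionary representation of cells and matches are hypothetical, chosen only for illustration.

```python
def stripe(cell: bytes, k: int) -> list[bytes]:
    """Split a cell into k slices; slice i carries bytes i, i+k, i+2k, ..."""
    return [cell[i::k] for i in range(k)]


def reassemble(slices: list[bytes]) -> bytes:
    """Interleave the k slices back into the original cell."""
    k = len(slices)
    out = bytearray(sum(len(s) for s in slices))
    for i, piece in enumerate(slices):
        out[i::k] = piece
    return bytes(out)


def switch_bit_sliced(cells: dict, match: dict, k: int) -> dict:
    """Carry each cell over k planes that all use the same match.

    cells maps input port -> cell; match maps input port -> output port.
    """
    planes = [dict() for _ in range(k)]
    for inp, cell in cells.items():
        out = match[inp]
        for p, piece in enumerate(stripe(cell, k)):
            planes[p][out] = piece        # same decision on every plane
    return {out: reassemble([plane[out] for plane in planes])
            for out in planes[0]}


if __name__ == "__main__":
    print(switch_bit_sliced({0: b"ABCDEFGH", 1: b"12345678"}, {0: 1, 1: 0}, 4))
```

Because every plane is configured identically, the scheduler runs once per cell time regardless of k.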

6 Scaling line-rate: Time-sliced parallelism
Cell carried by one plane; takes k cell times. Scheduler is unchanged. Scheduler makes decision for each slice in turn.
[Figure: a linecard sends successive cells to k crossbar planes in turn, controlled by one scheduler.]
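A small sketch of time slicing, assuming the planes are used in simple round-robin order (the slide only says "each slice in turn"). The function name and the tuple layout are hypothetical.

```python
def time_sliced_schedule(matches: list, k: int) -> list:
    """Assign the scheduler's decision for cell time t to plane t mod k.

    Each plane carries a whole cell and is busy for k cell times, so the
    entry for cell time t is (plane, start, finish, match) with finish = t + k.
    """
    timeline = []
    for t, match in enumerate(matches):
        plane = t % k
        timeline.append((plane, t, t + k, match))
    return timeline


if __name__ == "__main__":
    decisions = [{0: 1}, {0: 2}, {1: 0}, {2: 2}, {0: 0}, {1: 1}]
    for entry in time_sliced_schedule(decisions, 3):
        print(entry)
```

A plane is revisited exactly k cell times after it was last used, which is when its previous cell finishes, so the scheduler still makes one decision per cell time.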

7 Scaling a crossbar Conclusion: scaling the capacity is relatively straightforward (although the chip count and power may become a problem). What if we want to increase the number of ports? Can we build a crossbar-equivalent from multiple stages of smaller crossbars? If so, what properties should it have?

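8 3-stage Clos Network
[Figure: 3-stage Clos network with N = n x m ports: m first-stage switches of size n x k, k middle-stage switches of size m x m, and m third-stage switches of size k x n, with k >= n.]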

9 With k = n, is a Clos network non-blocking like a crossbar?
Consider the example: scheduler chooses to match (1,1), (2,4), (3,3), (4,2)

10 With k = n is a Clos network non-blocking like a crossbar?
Consider the example: scheduler chooses to match (1,1), (2,2), (4,4), (5,3), … By rearranging matches, the connections could be added. Q: Is this Clos network “rearrangeably non-blocking”?

11 With k = n a Clos network is rearrangeably non-blocking
Routing matches is equivalent to edge-coloring in a bipartite multigraph. Each vertex corresponds to an n x k or k x n switch, each edge to a match, e.g. (1,1), (2,4), (3,3), (4,2), and colors correspond to middle-stage switches. No two edges at a vertex may be colored the same. Vizing ‘64: a D-degree bipartite graph can be edge-colored in D colors. Therefore, if k = n, a 3-stage Clos network is rearrangeably non-blocking (and can therefore perform any permutation).

12 How complex is the rearrangement?
Method 1: Find a maximum size bipartite matching for each of D colors in turn, O(D N^2.5).
Method 2: Partition graph into Euler sets, O(N log D) [Cole et al. ‘00]
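Method 1 can be sketched as follows, assuming a D-regular bipartite multigraph between m first-stage and m third-stage switches. Note that this sketch uses a simple augmenting-path matching (Kuhn's algorithm), not the faster matching algorithm behind the O(D N^2.5) bound; all function names are hypothetical.

```python
def perfect_matching(mult, m):
    """Find a perfect matching with augmenting paths (Kuhn's algorithm).

    mult[u][v] is the number of parallel edges between left vertex u and
    right vertex v. Returns a list of (u, v) pairs, or None if none exists.
    """
    match_right = [None] * m

    def try_augment(u, seen):
        for v in range(m):
            if mult[u][v] > 0 and v not in seen:
                seen.add(v)
                if match_right[v] is None or try_augment(match_right[v], seen):
                    match_right[v] = u
                    return True
        return False

    for u in range(m):
        if not try_augment(u, set()):
            return None
    return [(match_right[v], v) for v in range(m)]


def color_by_matchings(edges, m, degree):
    """Peel off one perfect matching per color (one per middle-stage switch)."""
    mult = [[0] * m for _ in range(m)]
    for u, v in edges:
        mult[u][v] += 1
    classes = []
    for _ in range(degree):
        matching = perfect_matching(mult, m)   # exists in a regular graph
        for u, v in matching:
            mult[u][v] -= 1                    # remove the colored edges
        classes.append(matching)
    return classes
```

Removing a perfect matching from a D-regular graph leaves a (D-1)-regular graph, so the loop can always find the next color.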

13 Edge-Coloring using Euler sets
Make the graph regular: Modify the graph so that every vertex has the same degree, D. [combine vertices and add edges; O(E)]. For D = 2^i, perform i “Euler splits” and 1-color each resulting graph. This is log D operations, each of O(E).

14 Euler partition of a graph
Euler partition of graph G: a partition of the edges of G into open and closed paths, such that each odd degree vertex is at the end of exactly one open path, and each even degree vertex is at the end of no open path.

15 Euler split of a graph
Euler split of G into G1 and G2: Scan each path in an Euler partition. Place alternate edges into G1 and G2.
[Figure: a graph G and the two graphs G1 and G2 produced by the split.]

16 Edge-Coloring using Euler sets
Make the graph regular: Modify the graph so that every vertex has the same degree, D. [combine vertices and add edges; O(E)]. For D = 2^i, perform i “Euler splits” and 1-color each resulting graph. This is log D operations, each of O(E).
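The Euler-split coloring can be sketched in Python as below. This is a minimal illustration, assuming the graph is already regular with degree a power of two (the "make the graph regular" step is not shown); the function names and the (left, right) edge representation are hypothetical.

```python
from collections import defaultdict


def euler_split(edges):
    """Split a bipartite multigraph with all-even degrees into G1 and G2.

    edges is a list of (left, right) pairs. Each vertex keeps exactly half
    of its edges in each of the two returned edge lists.
    """
    adj = defaultdict(list)               # vertex -> ids of incident edges
    for eid, (u, v) in enumerate(edges):
        adj[("L", u)].append(eid)
        adj[("R", v)].append(eid)
    used = [False] * len(edges)
    halves = ([], [])
    for start in list(adj):
        node, side = start, 0
        while True:
            # Discard edges already consumed from the other endpoint.
            while adj[node] and used[adj[node][-1]]:
                adj[node].pop()
            if not adj[node]:
                break                     # the closed path is complete
            eid = adj[node].pop()
            used[eid] = True
            halves[side].append(edges[eid])   # alternate edges: G1, G2, ...
            side ^= 1
            u, v = edges[eid]
            node = ("R", v) if node[0] == "L" else ("L", u)
    return halves


def edge_color(edges, degree):
    """Color a degree-regular bipartite multigraph, degree a power of two.

    Returns one edge list per color; each is a perfect matching.
    """
    if degree == 1:
        return [edges]                    # a 1-regular graph is one color
    g1, g2 = euler_split(edges)
    return edge_color(g1, degree // 2) + edge_color(g2, degree // 2)
```

With every degree even, each path in the Euler partition is closed and, in a bipartite graph, has even length, so alternating edges gives each vertex half its degree in G1 and half in G2. Each color class then tells the router which matches to send through one middle-stage switch.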

17 Implementation
[Figure: Request graph -> Scheduler -> Permutation -> Route connections -> Paths]

18 Implementation
Pros: A rearrangeably non-blocking switch can perform any permutation. A cell switch is time-slotted, so all connections are rearranged every time slot anyway.
Cons: Rearrangement algorithms are complex (in addition to the scheduler).
Can we eliminate the need to rearrange?

19 Strictly non-blocking Clos Network
Clos’ Theorem: If k >= 2n – 1, then a new connection can always be added without rearrangement.

20 [Figure: the 3-stage Clos network with labeled switches: first-stage switches I1 … Im (n x k), middle-stage switches M1 … Mk (m x m), and third-stage switches O1 … Om (k x n), with N = n x m and k >= n.]

21 Clos Theorem
Consider adding the n-th connection between 1st stage Ia and 3rd stage Ob. In the worst case, n – 1 center-stage switches are already in use at Ia, and a different n – 1 are already in use at Ob. We need to ensure that there is always some center-stage M available. If k > (n – 1) + (n – 1), then there is always an M available, i.e. we need k >= 2n – 1.
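The worst case in the argument above can be checked directly. This is a small sketch with hypothetical function names; it models only the sets of center-stage switches already in use at Ia and at Ob.

```python
def free_middle(in_used: set, out_used: set, k: int):
    """Return a center-stage switch free at both Ia and Ob, or None."""
    for mid in range(k):
        if mid not in in_used and mid not in out_used:
            return mid
    return None


def worst_case(n: int, k: int):
    """Ia uses n - 1 center-stage switches and Ob uses n - 1 different ones."""
    in_used = set(range(n - 1))
    out_used = set(range(n - 1, 2 * (n - 1)))
    return free_middle(in_used, out_used, k)


if __name__ == "__main__":
    n = 4
    print(worst_case(n, 2 * n - 1))   # one switch is still free
    print(worst_case(n, 2 * n - 2))   # nothing free: the connection blocks
```

With k = 2n – 1 the worst case still leaves exactly one center-stage switch free, so no rearrangement is needed; with one fewer, the new connection is blocked.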

22 Scaling Crossbars: Summary
Scaling capacity through parallelism (bit-slicing and time-slicing) is straightforward.
Scaling number of ports is harder…
Clos network: Rearrangeably non-blocking with k = n, but routing is complicated. Strictly non-blocking with k >= 2n – 1, so routing is simple, but requires more bisection bandwidth.

