EECS 570: Fall 2003 -- rev1 1 Chapter 10: Scalable Interconnection Networks
EECS 570: Fall 2003 -- rev1 2 Goals Low latency –to neighbors –to average/furthest node High bandwidth –per-node and aggregate –bisection bandwidth Low cost –switch complexity, pin count –wiring cost (connectors!) Scalability
EECS 570: Fall 2003 -- rev1 3 Requirements from Above Communication-to-computation ratio implies bandwidth needs –local or global? –regular or irregular (arbitrary)? –bursty or uniform? –broadcasts? multicasts? Programming Model –protocol structure –transfer size –importance of latency vs. bandwidth
EECS 570: Fall 2003 -- rev1 4 Basic Definitions switch is a device capable of transferring data from input ports to output ports in an arbitrary pattern network is a graph of V = {switches/nodes} connected by communication channels (aka links) C ⊆ V × V direct – a node at each switch indirect – nodes only at the edges (like the Internet) route: sequence of links a message follows from node A to node B
EECS 570: Fall 2003 -- rev1 5 What characterizes a network? Topology –interconnection structure of the graph Switching Strategy –circuit vs. packet switching –store-and-forward vs. cut-through –virtual cut-through vs. wormhole, etc. Routing Algorithm –how is route determined Control Strategy –centralized vs. distributed Flow Control Mechanism –when does a packet (or a portion of it) move along its route –can't send two packets down same link at same time
EECS 570: Fall 2003 -- rev1 6 Topologies Topology determines many critical parameters: –degree: number of input (output) ports per switch –distance: number of links in route from A to B –average distance: average over all (A,B) pairs –diameter: max distance between two nodes (using shortest paths) –bisection: min number of links to separate into two halves
EECS 570: Fall 2003 -- rev1 7 Bus Fully connected network topology –bus plus interface logic is a form of switch Parameters: –diameter = average distance = 1 –degree = N –switch cost O(N) –wire cost constant –bisection bandwidth constant (or worse) Broadcast is free
EECS 570: Fall 2003 -- rev1 8 Crossbar Another fully connected network topology –one big switch of degree N connects all nodes Parameters: –diameter = average distance = O(1) –degree = N –switch cost O(N^2) –wire cost 2N –bisection bandwidth O(N) Most switches in other topologies are crossbars inside
EECS 570: Fall 2003 -- rev1 9 How to build a crossbar
EECS 570: Fall 2003 -- rev1 10 Switches [Figure: switch internals – input ports with receivers and input buffers, a crossbar, output buffers and transmitters at the output ports, and control logic for routing and scheduling]
EECS 570: Fall 2003 -- rev1 11 Linear Arrays and Rings Linear Array –Diameter? N-1 –Avg. Distance? N/3 –Bisection? 1 Torus (ring): links may be unidirectional or bidirectional Examples: FDDI, SCI, Fibre Channel, KSR1
EECS 570: Fall 2003 -- rev1 12 Multidimensional Meshes and Tori d-dimensional k-ary mesh: N = k^d, k = N^(1/d) –k nodes in each of d dimensions –torus has wraparound, array doesn't; "mesh" is ambiguous general d-dimensional mesh –N = k_(d-1) × … × k_0 nodes
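This node numbering is easy to make concrete: node IDs are just radix-k numbers, one digit per dimension. A minimal sketch (the function names are invented for illustration):

```python
# Minimal sketch: mapping node IDs to coordinates in a k-ary d-dimensional
# mesh/torus with N = k**d nodes.

def id_to_coords(node, k, d):
    """Radix-k digits of `node`, least-significant dimension first."""
    coords = []
    for _ in range(d):
        coords.append(node % k)
        node //= k
    return coords

def coords_to_id(coords, k):
    """Inverse of id_to_coords."""
    node = 0
    for c in reversed(coords):
        node = node * k + c
    return node

# Example: an 8-ary 3-cube (N = 512). Node 100 = 144 in base 8, so it sits
# at coordinates [4, 4, 1] (least-significant dimension first).
assert id_to_coords(100, k=8, d=3) == [4, 4, 1]
assert coords_to_id([4, 4, 1], k=8) == 100
```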
EECS 570: Fall 2003 -- rev1 13 Properties (N = k^d, k = N^(1/d)) Diameter? –d(k-1) for mesh Average Distance? –d × k/3 for mesh Degree? Switch Cost? Wire Cost? Bisection? –k^(d-1)
EECS 570: Fall 2003 -- rev1 14 Hypercubes
EECS 570: Fall 2003 -- rev1 15 Trees Usually indirect, occasionally direct Diameter and avg. distance are logarithmic –k-ary tree, height d = log_k N Fixed degree Route up to common ancestor and down –benefit for local traffic Bisection?
EECS 570: Fall 2003 -- rev1 16 Fat-Trees Fatter links (really more of them) as you go up, so bisection BW scales with N
EECS 570: Fall 2003 -- rev1 17 Butterflies Tree with lots of roots! N log N switches (actually (N/2) × log N) Exactly one route from any source to any dest R = A xor B; at level i use the "straight" edge if r_i = 0, otherwise the cross edge Bisection N/2 vs. N^((d-1)/d)
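The XOR route computation can be tried directly. A minimal sketch; which bit of R drives which level depends on how the levels are wired, so the MSB-first order here is an assumption for illustration:

```python
# Butterfly route selection: R = src xor dst; at level i take the "straight"
# edge if bit r_i is 0, the "cross" edge otherwise. Bit order (MSB-first)
# is an assumed wiring convention.

def butterfly_route(src, dst, levels):
    """Return the edge choice ('straight' or 'cross') at each level."""
    r = src ^ dst
    return ['straight' if ((r >> i) & 1) == 0 else 'cross'
            for i in reversed(range(levels))]

# Example: node 5 (101) to node 3 (011) in an 8-node butterfly
# (log2(8) = 3 levels): R = 110, so cross, cross, straight.
print(butterfly_route(5, 3, levels=3))  # ['cross', 'cross', 'straight']
```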
EECS 570: Fall 2003 -- rev1 18 Benes Networks and Fat Trees Back-to-back butterflies can route all permutations –offline What if you just pick a random midpoint?
EECS 570: Fall 2003 -- rev1 19 Relationship of Butterflies to Hypercubes Wiring is isomorphic Except that the butterfly always takes log N steps
EECS 570: Fall 2003 -- rev1 20 Summary of Topologies

Topology       Degree    Diameter         Ave Dist          Bisection   Diam/AveDist (N=1024)
1D Array       2         N-1              N/3               1           1023 / 341
1D Ring        2         N/2              N/4               2           512 / 256
2D Array       4         2(N^(1/2)-1)     (2/3)N^(1/2)      N^(1/2)     63 / 21
3D Array       6         3(N^(1/3)-1)     N^(1/3)           N^(2/3)     ~30 / ~10
2D Torus       4         N^(1/2)          (1/2)N^(1/2)      2N^(1/2)    32 / 16
k-ary n-cube   2n        nk/2             nk/4              nk/4        15 / 7.5 (n=3)
Hypercube      n=log N   n                n/2               N/2         10 / 5
2D Tree        3         2 log_2 N        ~2 log_2 N        1           20 / ~20
4D Tree        5         2 log_4 N        2 log_4 N - 2/3   1           10 / ~9
2D Fat Tree    4         2 log_2 N        ~2 log_2 N        N           20 / ~20
2D Butterfly   4         log_2 N          log_2 N           N/2         20 / 20
EECS 570: Fall 2003 -- rev1 21 Choosing a Topology Cost vs. performance For fixed cost, which topology provides the best performance? Best performance on what workload? –message size –traffic pattern Define cost Target machine size Simplify the tradeoff to dimension vs. radix –restrict to k-ary d-cubes –what is the best dimension?
EECS 570: Fall 2003 -- rev1 22 How Many Dimensions in a Network? d = 2 or d = 3 –Short wires, easy to build –Many hops, low bisection bandwidth –Benefits from traffic locality d >= 4 –Harder to build, more wires, longer average length –Higher switch degree –Fewer hops, better bisection bandwidth –Handles non-local traffic better Effect of # of hops on latency depends on switching strategy...
EECS 570: Fall 2003 -- rev1 23 Store&Forward vs. Cut-Through Routing Messages are typically fragmented into packets and pipelined; cut-through pipelines on flits
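To see why the distinction matters, here is the standard first-order latency model as a sketch; the parameter values are made up for illustration:

```python
# Store-and-forward pays the full packet time at every hop; cut-through
# pays it once and pipelines over the route:
#   store-and-forward: T = h * (L/b + delta)
#   cut-through:       T = L/b + h * delta
# L = packet length, b = bandwidth, h = hops, delta = per-hop routing delay.

def t_store_forward(L, b, h, delta):
    return h * (L / b + delta)

def t_cut_through(L, b, h, delta):
    return L / b + h * delta

L, b, delta = 128, 1.0, 4      # bytes, bytes/cycle, cycles (invented values)
for h in (2, 5, 10):
    print(h, t_store_forward(L, b, h, delta), t_cut_through(L, b, h, delta))
# At h=10: 1320 cycles store-and-forward vs. 168 cut-through. The hop count
# multiplies the whole packet time only in the store-and-forward case.
```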
EECS 570: Fall 2003 -- rev1 24 Handling Output Contention What if the output is blocked? virtual cut-through –switch with blocked output buffers the entire packet –degenerates to store-and-forward under contention –requires lots of buffering in switches wormhole –leave flits strung out over the network (in buffers) –minimal switch buffering –one blocked packet can tie up lots of channels
EECS 570: Fall 2003 -- rev1 25 Traditional Scaling: Latency(P) Assumes equal channel width, independent of node count or dimension; dominated by average distance
EECS 570: Fall 2003 -- rev1 26 Average Distance But equal channel width is not equal cost! Higher dimension => more channels; average distance = d(k-1)/2
EECS 570: Fall 2003 -- rev1 27 In the 3-D world For large n, bisection bandwidth is limited to O(n^(2/3)) Dally, IEEE TPDS, [Dal90a] For fixed bisection bandwidth, low-dimensional k-ary n-cubes are better (otherwise higher is better) i.e., a few short fat wires are better than many long thin wires What about many long fat wires? For n nodes, bisection area is O(n^(2/3))
EECS 570: Fall 2003 -- rev1 28 Equal cost in k-ary n-cubes Equal number of nodes? Equal number of pins/wires? Equal bisection bandwidth? Equal area? Equal wire length? What do we know? –switch degree: d –diameter = d(k-1) –total links = Nd –pins per node = 2wd –bisection = k^(d-1) = N/k links in each direction; 2Nw/k wires cross the middle
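These identities are easy to tabulate. A small sketch using exactly the formulas above; the helper name is invented:

```python
# Cost metrics for a k-ary d-cube with N nodes and channel width w,
# using the formulas from the slide.

def kary_dcube_metrics(N, d, w):
    k = round(N ** (1 / d))
    assert k ** d == N, "N must be a perfect d-th power"
    return {
        'k': k,
        'diameter': d * (k - 1),
        'total_links': N * d,
        'pins_per_node': 2 * w * d,
        'bisection_links': k ** (d - 1),      # = N/k in each direction
        'bisection_wires': 2 * N * w // k,    # wires crossing the middle
    }

# 1024 nodes as a 2-D network (k=32) vs. a 10-cube (k=2), both with w=16:
print(kary_dcube_metrics(1024, d=2, w=16))
print(kary_dcube_metrics(1024, d=10, w=16))
```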
EECS 570: Fall 2003 -- rev1 29 Latency(d) for P with Equal Width total links(N)=Nd
EECS 570: Fall 2003 -- rev1 30 Latency with Equal Pin Count Baseline d=2 has w = 32 (128 wires per node) Fix 2dw pins => w(d) = 64/d Distance goes down with increasing d, but channel time goes up
EECS 570: Fall 2003 -- rev1 31 Latency with Equal Bisection Width N-node hypercube has N bisection links 2-d torus has 2N^(1/2) Fixed bisection => w(d) = N^(1/d)/2 = k/2 1M nodes, d=2 has w = 512!
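A sketch comparing the two channel-width constraints from this slide and the previous one; the equal-pin baseline 2dw = 128 comes from the earlier slide, and k is rounded when N is not a perfect d-th power:

```python
# Channel width under the two cost models:
#   equal pins:      2*d*w fixed  =>  w(d) = 64/d for the d=2, w=32 baseline
#   equal bisection: 2*N*w/k fixed =>  w(d) = N**(1/d)/2 = k/2

def w_equal_pins(d, pins_per_node=128):
    return pins_per_node // (2 * d)

def w_equal_bisection(N, d):
    k = round(N ** (1 / d))     # rounded if N isn't a perfect d-th power
    return k // 2

for d in (2, 3, 5, 10):
    print(d, w_equal_pins(d), w_equal_bisection(2 ** 20, d))
# At N = 1M and d = 2, equal bisection gives w = 1024/2 = 512, matching the
# slide: low-dimensional networks get very wide channels under this model.
```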
EECS 570: Fall 2003 -- rev1 32 Larger Routing Delay (w/equal pin) ratio of routing to channel time is key
EECS 570: Fall 2003 -- rev1 33 Topology Summary Rich set of topological alternatives with deep relationships Design point depends heavily on cost model –nodes, pins, area, … Need for downward scalability tends to fix the dimension –high-degree switches are wasted in small configurations –grow the machine by increasing nodes per dimension Need a consistent framework and analysis to separate opinion from design Optimal point changes with technology –store-and-forward vs. cut-through –non-pipelined vs. pipelined signaling
EECS 570: Fall 2003 -- rev1 34 Real Machines Wide links, smaller routing delay Tremendous variation
EECS 570: Fall 2003 -- rev1 35 What characterizes a network? Topology –interconnection structure of the graph Switching Strategy –circuit vs. packet switching –store-and-forward vs. cut-through –virtual cut-through vs. wormhole, etc. Routing Algorithm –how is the route determined Control Strategy –centralized vs. distributed Flow Control Mechanism –when does a packet (or a portion of it) move along its route –can't send two packets down the same link at the same time
EECS 570: Fall 2003 -- rev1 36 Typical Packet Format Two basic mechanisms for abstraction: –encapsulation –fragmentation
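As a purely illustrative example of such a format: a toy packet with routing/control information up front, a payload, and an error check at the end. All field names and sizes are invented, not taken from any particular machine:

```python
# Toy packet: 6-byte header (route, sequence, length), payload, trailing CRC.
import struct
import zlib

def make_packet(route, seq, payload: bytes) -> bytes:
    header = struct.pack('>HHH', route, seq, len(payload))
    body = header + payload
    return body + struct.pack('>I', zlib.crc32(body))   # trailing CRC

def check_packet(pkt: bytes) -> bool:
    body, (crc,) = pkt[:-4], struct.unpack('>I', pkt[-4:])
    return zlib.crc32(body) == crc

pkt = make_packet(route=3, seq=7, payload=b'hello')
assert check_packet(pkt)
```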
EECS 570: Fall 2003 -- rev1 37 Routing Recall: the routing algorithm determines –which of the possible paths are used as routes –how the route is determined –R: N × N → C, which at each switch maps the destination node n_d to the next channel on the route Issues: Routing mechanism –arithmetic –source-based port select –table-driven –general computation Properties of the routes Deadlock free
EECS 570: Fall 2003 -- rev1 38 Routing Mechanism Need to select an output port for each input packet in a few cycles Simple arithmetic in regular topologies –ex: ∆x, ∆y routing in a grid –west (-x) ∆x < 0 –east (+x) ∆x > 0 –south (-y) ∆x = 0, ∆y < 0 –north (+y) ∆x = 0, ∆y > 0 –processor ∆x = 0, ∆y = 0 Reduce the relative address of each dimension in order –dimension-order routing in k-ary d-cubes –e-cube routing in n-cubes
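A minimal sketch of this ∆x, ∆y port-selection rule; the function names are invented:

```python
# Dimension-order port selection on a 2-D grid: reduce x first, then y.
# dx/dy are the remaining relative address of the destination.

def select_port(dx, dy):
    if dx < 0:
        return 'west'
    if dx > 0:
        return 'east'
    if dy < 0:       # only once dx == 0
        return 'south'
    if dy > 0:
        return 'north'
    return 'processor'   # arrived

def route(src, dst):
    dx, dy = dst[0] - src[0], dst[1] - src[1]
    hops = []
    while True:
        port = select_port(dx, dy)
        hops.append(port)
        if port == 'processor':
            return hops
        if port == 'east':
            dx -= 1
        elif port == 'west':
            dx += 1
        elif port == 'north':
            dy -= 1
        elif port == 'south':
            dy += 1

print(route((1, 1), (3, 0)))  # ['east', 'east', 'south', 'processor']
```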
EECS 570: Fall 2003 -- rev1 39 Routing Mechanism (cont) Source-based –message header carries a series of port selects –used and stripped en route –CRC? header length? –CS-2, Myrinet, MIT Arctic Table-driven –message header carries index for next port at next switch: o = R[i] –table also gives index for the following hop: o, i′ = R[i] –ATM, HIPPI [Figure: source-based header carrying port selects P3 P2 P1 P0]
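Both mechanisms are small enough to sketch; the function names and table shapes below are invented for illustration:

```python
# Source-based: the header is a list of port selects, consumed one per hop.
def source_route_hop(header):
    """Each switch uses and strips the leading port select."""
    return header[0], header[1:]

header = [3, 0, 2, 1]               # port selects chosen by the source
while header:
    port, header = source_route_hop(header)
    print('forward out port', port)

# Table-driven: the header carries an index i; the switch's table gives
# both the output port and the index to use at the next switch.
def table_hop(tables, switch, i):
    port, i_next = tables[switch][i]
    return port, i_next

# e.g. tables = {'s0': {5: (2, 9)}} means: at switch s0 with header index 5,
# leave on port 2 and rewrite the header index to 9.
```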
EECS 570: Fall 2003 -- rev1 40 Properties of Routing Algorithms Deterministic route determined by (source, dest), not intermediate state (i.e. traffic) Adaptive route influenced by traffic along the way Minimal only selects shortest paths Deadlock free no traffic pattern can lead to a situation where no packets move forward
EECS 570: Fall 2003 -- rev1 41 Deadlock Freedom How can it arise? necessary conditions: –shared resource –incrementally allocated –non-pre-emptible Think of a channel as a shared resource that is acquired incrementally –source buffer then destination buffer –channels along a route How do you avoid it? –constrain how channel resources are allocated –ex: dimension order How do you prove that a routing algorithm is deadlock free?
EECS 570: Fall 2003 -- rev1 42 Proof Technique Resources are logically associated with channels Messages introduce dependences between resources as they move forward Need to articulate the possible dependences between channels Show that there are no cycles in the Channel Dependence Graph –find a numbering of channel resources such that every legal route follows a monotonic sequence => no traffic pattern can lead to deadlock Network need not be acyclic, only the channel dependence graph
EECS 570: Fall 2003 -- rev1 43 Example: k-ary 2D array Theorem: dimension-order x,y routing is deadlock free Numbering –+x channel (i,y) → (i+1,y) gets i –similarly for -x with 0 as the most positive edge –+y channel (x,j) → (x,j+1) gets N+j –similarly for -y channels Any routing sequence: x direction, turn, y direction is increasing
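The claim can be checked exhaustively on a small array. A sketch that applies the slide's numbering to every x-then-y route; here the offset for y channels is the per-dimension radix, since any offset larger than the biggest x-channel number works:

```python
# Verify on a 4x4 array that channel numbers are strictly increasing along
# every dimension-order (x-then-y) route.
from itertools import product

N = 4
nodes = list(product(range(N), repeat=2))

def channel_number(frm, to):
    (x0, y0), (x1, y1) = frm, to
    if y0 == y1:                     # x channels: +x gets x0, -x reversed
        return x0 if x1 > x0 else N - 1 - x0
    # y channels offset past every x number: +y gets N + y0, -y reversed
    return N + (y0 if y1 > y0 else N - 1 - y0)

def xy_route(src, dst):
    """Dimension-order route: finish x, then y."""
    path, (x, y) = [], src
    while x != dst[0]:
        nxt = (x + (1 if dst[0] > x else -1), y)
        path.append(((x, y), nxt))
        x, y = nxt
    while y != dst[1]:
        nxt = (x, y + (1 if dst[1] > y else -1))
        path.append(((x, y), nxt))
        x, y = nxt
    return path

for src in nodes:
    for dst in nodes:
        nums = [channel_number(f, t) for f, t in xy_route(src, dst)]
        assert nums == sorted(nums) and len(set(nums)) == len(nums)
print('every x-then-y route follows strictly increasing channel numbers')
```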
EECS 570: Fall 2003 -- rev1 44 Channel Dependence Graph
EECS 570: Fall 2003 -- rev1 45 More examples Why is the obvious routing on X deadlock free? –butterfly? –tree? –fat tree? Any assumptions about the routing mechanism? amount of buffering? What about wormhole routing on a ring?
EECS 570: Fall 2003 -- rev1 46 Deadlock-free wormhole networks? Basic dimension-order routing doesn't work for k-ary d-cubes, only for k-ary d-arrays (bidirectional, no wraparound) Idea: add channels! –provide multiple "virtual channels" to break the dependence cycle –good for BW too! –don't need to add links or crossbar, only buffer resources This adds nodes to the CDG; does it remove edges?
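One common formulation of this idea on a ring is the dateline scheme; the sketch below is an assumption of that scheme, not something specified on the slides:

```python
# Dateline trick for a unidirectional k-node ring: packets start on virtual
# channel 0 and switch to VC 1 once they cross the wrap-around link, so the
# channel dependence graph has no cycle.

def ring_channels(src, dst, k):
    """List of (link, vc) pairs for a clockwise route from src to dst."""
    channels, node, vc = [], src, 0
    while node != dst:
        nxt = (node + 1) % k
        if nxt == 0:          # crossed the dateline at the wrap-around link
            vc = 1
        channels.append(((node, nxt), vc))
        node = nxt
    return channels

# Route 2 -> 1 on a 4-node ring crosses the wrap-around and switches VCs:
print(ring_channels(2, 1, k=4))
# [((2, 3), 0), ((3, 0), 1), ((0, 1), 1)]
```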
EECS 570: Fall 2003 -- rev1 47 Breaking deadlock with virtual channels
EECS 570: Fall 2003 -- rev1 48 Turn Restrictions in X,Y XY routing forbids 4 of 8 turns and leaves no room for adaptive routing Can you allow more turns and still be deadlock free?
EECS 570: Fall 2003 -- rev1 49 Minimal turn restrictions in 2D [Figure: north-last and negative-first turn models, axes -x, +x, -y, +y]
EECS 570: Fall 2003 -- rev1 50 Example legal west-first routes Can route around failures or congestion Can combine turn restrictions with virtual channels
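A sketch of the west-first restriction, which forbids turning back into -x after any other direction has been taken; the predicate name is invented:

```python
# West-first rule: all westward (-x) hops must come before any other
# direction; after the first non-west hop, turning into -x is forbidden.

def legal_west_first(hops):
    """True iff every '-x' hop precedes all other hops."""
    seen_non_west = False
    for h in hops:
        if h != '-x':
            seen_non_west = True
        elif seen_non_west:   # a turn back into -x: illegal
            return False
    return True

assert legal_west_first(['-x', '-x', '+y', '+x'])   # west first: OK
assert not legal_west_first(['+y', '-x'])           # turn into -x: illegal
print('west-first checks pass')
```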
EECS 570: Fall 2003 -- rev1 51 Adaptive Routing R: C × N × Σ → C Essential for fault tolerance –at least multipath Can improve utilization of the network Simple deterministic algorithms easily run into bad permutations Fully/partially adaptive, minimal/non-minimal Can introduce complexity or anomalies A little adaptation goes a long way!
EECS 570: Fall 2003 -- rev1 52 Contention Two packets trying to use the same link at the same time –limited buffering –drop? Most parallel machine networks block in place –link-level flow control –tree saturation Closed system: offered load depends on what is delivered
EECS 570: Fall 2003 -- rev1 53 Flow Control What do you do when push comes to shove? –Ethernet: collision detection and retry after delay –TCP/WAN: buffer, drop, adjust rate –any solution must adjust to the output rate Link-level flow control
EECS 570: Fall 2003 -- rev1 54 Example: T3D 3D bidirectional torus, dimension-order routing (NIC-selected), virtual cut-through, packet switched 16 bits × 150 MHz; short, wide, synchronous Rotating priority per output Logically separate request/response 3 independent, stacked switches 8 16-bit flits on each of 4 VCs in each direction
EECS 570: Fall 2003 -- rev1 55 Routing and Switch Design Summary Routing algorithms restrict the set of routes within the topology –simple mechanism selects the turn at each hop –arithmetic, selection, lookup Deadlock-free if the channel dependence graph is acyclic –limit turns to eliminate dependences –add separate channel resources to break dependences –combination of topology, algorithm, and switch design Deterministic vs. adaptive routing Switch design issues –input/output/pooled buffering, routing logic, selection logic –flow control Real networks are a "package" of design choices
EECS 570: Fall 2003 -- rev1 56 Example: SP 8-port switch, 40 MB/s per link, 8-bit phit, 16-bit flit, single 40 MHz clock Packet switched, cut-through, no virtual channels, source-based routing Variable packets <= 255 bytes, 31-byte FIFO per input, 7 bytes per output, 16-phit links 128 8-byte "chunks" in central queue, LRU per output