Interconnection Network Topology Design Trade-offs

Slides:



Advertisements
Similar presentations
Flattened Butterfly: A Cost-Efficient Topology for High-Radix Networks ______________________________ John Kim, William J. Dally &Dennis Abts Presented.
Advertisements

Parallel Architectures: Topologies Heiko Schröder, 2003.
Parallel Architectures: Topologies Heiko Schröder, 2003.
1 Lecture 12: Interconnection Networks Topics: dimension/arity, routing, deadlock, flow control.
1 Interconnection Networks Direct Indirect Shared Memory Distributed Memory (Message passing)
1 Lecture 23: Interconnection Networks Topics: communication latency, centralized and decentralized switches (Appendix E)
CS252 Graduate Computer Architecture Lecture 14 Multiprocessor Networks March 10 th, 2010 John Kubiatowicz Electrical Engineering and Computer Sciences.
EECC756 - Shaaban #1 lec # 9 Spring Network Definitions A network is a graph V = {switches and nodes} connected by communication channels.
CS 258 Parallel Computer Architecture Lecture 4 Network Topology and Routing February 4, 2008 Prof John D. Kubiatowicz
CS252 Graduate Computer Architecture Lecture 21 Multiprocessor Networks (con’t) John Kubiatowicz Electrical Engineering and Computer Sciences University.
CS 258 Parallel Computer Architecture Lecture 5 Routing February 6, 2008 Prof John D. Kubiatowicz
Generic Multiprocessor Architecture
CS252 Graduate Computer Architecture Lecture 15 Multiprocessor Networks (con’t) March 15 th, 2010 John Kubiatowicz Electrical Engineering and Computer.
ECE669 L25: Final Exam Review May 6, 2004 ECE 669 Parallel Computer Architecture Lecture 25 Final Exam Review.
EECC756 - Shaaban #1 lec # 8 Spring Generic Multiprocessor Architecture Generic Multiprocessor Architecture Node: processor(s), memory system,
1 Lecture 24: Interconnection Networks Topics: communication latency, centralized and decentralized switches (Sections 8.1 – 8.5)
Interconnection Network Topologies
CS 258 Parallel Computer Architecture Lecture 3 Introduction to Scalable Interconnection Network Design January 30, 2008 Prof John D. Kubiatowicz
EECS 570: Fall rev1 1 Chapter 10: Scalable Interconnection Networks.
EECC756 - Shaaban #1 lec # 9 Spring Generic Multiprocessor Architecture Generic Multiprocessor Architecture Node: processor(s), memory system,
1 Lecture 24: Interconnection Networks Topics: topologies, routing, deadlocks, flow control.
1 Lecture 25: Interconnection Networks Topics: communication latency, centralized and decentralized switches, routing, deadlocks (Appendix E) Review session,
CS252/Patterson Lec /28/01 CS162 Computer Architecture Lecture 16: Multiprocessor 2: Directory Protocol, Interconnection Networks.
1 Static Interconnection Networks CEG 4131 Computer Architecture III Miodrag Bolic.
EECC756 - Shaaban #1 lec # 10 Spring Generic Multiprocessor Architecture Generic Multiprocessor Architecture Node: processor(s), memory system,
ECE669 L16: Interconnection Topology March 30, 2004 ECE 669 Parallel Computer Architecture Lecture 16 Interconnection Topology.
CS252 Graduate Computer Architecture Lecture 14 Multiprocessor Networks March 9 th, 2011 John Kubiatowicz Electrical Engineering and Computer Sciences.
John Kubiatowicz Electrical Engineering and Computer Sciences
© Sudhakar Yalamanchili, Georgia Institute of Technology (except as indicated) Topologies.
Interconnect Network Topologies
CS252 Graduate Computer Architecture Lecture 15 Multiprocessor Networks March 14 th, 2011 John Kubiatowicz Electrical Engineering and Computer Sciences.
CS252 Graduate Computer Architecture Lecture 15 Multiprocessor Networks March 12 th, 2012 John Kubiatowicz Electrical Engineering and Computer Sciences.
Interconnection Networks. Applications of Interconnection Nets Interconnection networks are used everywhere! ◦ Supercomputers – connecting the processors.
1 Lecture 23: Interconnection Networks Topics: Router microarchitecture, topologies Final exam next Tuesday: same rules as the first midterm Next semester:
Interconnect Networks
CS252 Graduate Computer Architecture Lecture 14 Multiprocessor Networks March 7 th, 2012 John Kubiatowicz Electrical Engineering and Computer Sciences.
CS668- Lecture 2 - Sept. 30 Today’s topics Parallel Architectures (Chapter 2) Memory Hierarchy Busses and Switched Networks Interconnection Network Topologies.
Winter 2006 ENGR 9861 – High Performance Computer Architecture March 2006 Interconnection Networks.
CSE Advanced Computer Architecture Week-11 April 1, 2004 engr.smu.edu/~rewini/8383.
1 Scalable Interconnection Networks. 2 Scalable, High Performance Network At Core of Parallel Computer Architecture Requirements and trade-offs at many.
Multiprocessor Interconnection Networks Todd C. Mowry CS 740 November 3, 2000 Topics Network design issues Network Topology.
Anshul Kumar, CSE IITD CSL718 : Multiprocessors Interconnection Mechanisms Performance Models 20 th April, 2006.
Computer Science and Engineering Parallel and Distributed Processing CSE 8380 January Session 4.
1 Lecture 13: LRC & Interconnection Networks Topics: LRC implementation, interconnection characteristics.
InterConnection Network Topologies to Minimize graph diameter: Low Diameter Regular graphs and Physical Wire Length Constrained networks Nilesh Choudhury.
Anshul Kumar, CSE IITD ECE729 : Advanced Computer Architecture Lecture 27, 28: Interconnection Mechanisms In Multiprocessors 29 th, 31 st March, 2010.
Copyright © The McGraw-Hill Companies, Inc. Permission required for reproduction or display. Shared versus Switched Media.
Computer Science and Engineering Copyright by Hesham El-Rewini Advanced Computer Architecture.
Interconnect Networks Basics. Generic parallel/distributed system architecture On-chip interconnects (manycore processor) Off-chip interconnects (clusters.
Super computers Parallel Processing
Topology How the components are connected. Properties Diameter Nodal degree Bisection bandwidth A good topology: small diameter, small nodal degree, large.
1 Lecture 24: Interconnection Networks Topics: communication latency, centralized and decentralized switches, routing, deadlocks (Appendix F)
Spring EE 437 Lillevik 437s06-l22 University of Portland School of Engineering Advanced Computer Architecture Lecture 22 Distributed computer Interconnection.
1 Lecture 14: Interconnection Networks Topics: dimension vs. arity, deadlock.
COMP8330/7330/7336 Advanced Parallel and Distributed Computing Tree-Based Networks Cache Coherence Dr. Xiao Qin Auburn University
Interconnection Networks Communications Among Processors.
Interconnect Networks
Lecture 23: Interconnection Networks
Multiprocessor Interconnection Networks Todd C
Interconnection topologies
Prof John D. Kubiatowicz
Static and Dynamic Networks
Interconnection Network Routing, Topology Design Trade-offs
John Kubiatowicz Electrical Engineering and Computer Sciences
Introduction to Scalable Interconnection Network Design
Indirect Networks or Dynamic Networks
Interconnection Network Design Lecture 14
Static Interconnection Networks
Interconnection Networks Contd.
Static Interconnection Networks
Presentation transcript:

Interconnection Network Topology Design Trade-offs CS 258, Spring 99 David E. Culler Computer Science Division U.C. Berkeley CS258 S99

Real Machines Wide links, smaller routing delay Tremendous variation 3/19/99 CS258 S99

Interconnection Topologies Class networks scaling with N Logical Properties: distance, degree Physcial properties length, width Fully connected network diameter = 1 degree = N cost? bus => O(N), but BW is O(1) - actually worse crossbar => O(N2) for BW O(N) VLSI technology determines switch degree 3/19/99 CS258 S99

Linear Arrays and Rings Diameter? Average Distance? Bisection bandwidth? Route A -> B given by relative address R = B-A Torus? Examples: FDDI, SCI, FiberChannel Arbitrated Loop, KSR1 3/19/99 CS258 S99

Multidimensional Meshes and Tori 3D Cube 2D Grid d-dimensional array n = kd-1 X ...X kO nodes described by d-vector of coordinates (id-1, ..., iO) d-dimensional k-ary mesh: N = kd k = dÖN described by d-vector of radix k coordinate d-dimensional k-ary torus (or k-ary d-cube)? 3/19/99 CS258 S99

Properties Routing Average Distance Wire Length? Degree? relative distance: R = (b d-1 - a d-1, ... , b0 - a0 ) traverse ri = b i - a i hops in each dimension dimension-order routing Average Distance Wire Length? d x 2k/3 for mesh dk/2 for cube Degree? Bisection bandwidth? Partitioning? k d-1 bidirectional links Physical layout? 2D in O(N) space Short wires higher dimension? 3/19/99 CS258 S99

Real World 2D mesh 1824 node Paragon: 16 x 114 array 3/19/99 CS258 S99

Embeddings in two dimensions 6 x 3 x 2 Embed multiple logical dimension in one physical dimension using long wires 3/19/99 CS258 S99

Trees Diameter and ave distance logarithmic Fixed degree k-ary tree, height d = logk N address specified d-vector of radix k coordinates describing path down from root Fixed degree Route up to common ancestor and down R = B xor A let i be position of most significant 1 in R, route up i+1 levels down in direction given by low i+1 bits of B H-tree space is O(N) with O(ÖN) long wires Bisection BW? 3/19/99 CS258 S99

Fat-Trees Fatter links (really more of them) as you go up, so bisection BW scales with N 3/19/99 CS258 S99

Butterflies Tree with lots of roots! N log N (actually N/2 x logN) building block 16 node butterfly Tree with lots of roots! N log N (actually N/2 x logN) Exactly one route from any source to any dest R = A xor B, at level i use ‘straight’ edge if ri=0, otherwise cross edge Bisection N/2 vs n (d-1)/d 3/19/99 CS258 S99

k-ary d-cubes vs d-ary k-flies degree d N switches vs N log N switches diminishing BW per node vs constant requires locality vs little benefit to locality Can you route all permutations? 3/19/99 CS258 S99

Benes network and Fat Tree Back-to-back butterfly can route all permutations off line What if you just pick a random mid point? 3/19/99 CS258 S99

Hypercubes Also called binary n-cubes. # of nodes = N = 2n. O(logN) Hops Good bisection BW Complexity Out degree is n = logN correct dimensions in order with random comm. 2 ports per processor 0-D 1-D 2-D 3-D 4-D 5-D ! 3/19/99 CS258 S99

Relationship BttrFlies to Hypercubes Wiring is isomorphic Except that Butterfly always takes log n steps 3/19/99 CS258 S99

Toplology Summary All have some “bad permutations” Topology Degree Diameter Ave Dist Bisection D (D ave) @ P=1024 1D Array 2 N-1 N / 3 1 huge 1D Ring 2 N/2 N/4 2 2D Mesh 4 2 (N1/2 - 1) 2/3 N1/2 N1/2 63 (21) 2D Torus 4 N1/2 1/2 N1/2 2N1/2 32 (16) k-ary n-cube 2n nk/2 nk/4 nk/4 15 (7.5) @n=3 Hypercube n =log N n n/2 N/2 10 (5) All have some “bad permutations” many popular permutations are very bad for meshs (transpose) ramdomness in wiring or routing makes it hard to find a bad one! 3/19/99 CS258 S99

How Many Dimensions? n = 2 or n = 3 n >= 4 Short wires, easy to build Many hops, low bisection bandwidth Requires traffic locality n >= 4 Harder to build, more wires, longer average length Fewer hops, better bisection bandwidth Can handle non-local traffic k-ary d-cubes provide a consistent framework for comparison N = kd scale dimension (d) or nodes per dimension (k) assume cut-through 3/19/99 CS258 S99

Traditional Scaling: Latency(P) Assumes equal channel width independent of node count or dimension dominated by average distance 3/19/99 CS258 S99

Average Distance but, equal channel width is not equal cost! ave dist = d (k-1)/2 but, equal channel width is not equal cost! Higher dimension => more channels 3/19/99 CS258 S99

In the 3D world For n nodes, bisection area is O(n2/3 ) For large n, bisection bandwidth is limited to O(n2/3 ) Bill Dally, IEEE TPDS, [Dal90a] For fixed bisection bandwidth, low-dimensional k-ary n-cubes are better (otherwise higher is better) i.e., a few short fat wires are better than many long thin wires What about many long fat wires? 3/19/99 CS258 S99

Equal cost in k-ary n-cubes Equal number of nodes? Equal number of pins/wires? Equal bisection bandwidth? Equal area? Equal wire length? What do we know? switch degree: d diameter = d(k-1) total links = Nd pins per node = 2wd bisection = kd-1 = N/k links in each directions 2Nw/k wires cross the middle 3/19/99 CS258 S99

Latency(d) for P with Equal Width total links(N) = Nd 3/19/99 CS258 S99

Latency with Equal Pin Count Baseline d=2, has w = 32 (128 wires per node) fix 2dw pins => w(d) = 64/d distance up with d, but channel time down 3/19/99 CS258 S99

Latency with Equal Bisection Width N-node hypercube has N bisection links 2d torus has 2N 1/2 Fixed bisection => w(d) = N 1/d / 2 = k/2 1 M nodes, d=2 has w=512! 3/19/99 CS258 S99

Larger Routing Delay (w/ equal pin) Dally’s conclusions strongly influenced by assumption of small routing delay 3/19/99 CS258 S99

Latency under Contention Optimal packet size? Channel utilization? 3/19/99 CS258 S99

Saturation Fatter links shorten queuing delays 3/19/99 CS258 S99

Phits per cycle higher degree network has larger available bandwidth cost? 3/19/99 CS258 S99

Discussion Rich set of topological alternatives with deep relationships Design point depends heavily on cost model nodes, pins, area, ... Wire length or wire delay metrics favor small dimension Long (pipelined) links increase optimal dimension Need a consistent framework and analysis to separate opinion from design Optimal point changes with technology 3/19/99 CS258 S99