
Topology
How the components are connected.
Properties: diameter, nodal degree, bisection bandwidth.
A good topology has a small diameter, a small nodal degree, and a large bisection bandwidth.
Regular and irregular topologies
– Regular topology: more organized and more efficient; used when one organization has total control (supercomputers, data centers).
– Irregular topology: less efficient, but better extensibility (e.g., the Internet).

Topology representation
Modeled as a graph.
– Adjacency matrix: graph[N][N], where graph[i][j] = 1 if there is a link from node i to node j, and 0 otherwise.
– Adjacency list: graph[i] is a list containing all nodes that node i connects to.
– Practical topology data structure: graph[N][DEGREE], where graph[i][j] = k if the j-th link of node i goes to node k (see the sketch below).
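
A minimal C sketch (illustrative, not from the slides) of turning an adjacency matrix into the practical graph[N][DEGREE] structure, using a 4-node ring as the input:

    #include <stdio.h>

    #define N 4
    #define DEGREE 2

    /* Adjacency matrix of a 4-node ring: adj[i][j] = 1 means a link i -> j. */
    int adj[N][N] = {
        {0, 1, 0, 1},
        {1, 0, 1, 0},
        {0, 1, 0, 1},
        {1, 0, 1, 0},
    };

    int graph[N][DEGREE]; /* graph[i][j] = k: the j-th link of node i goes to k */

    int main(void) {
        for (int i = 0; i < N; i++) {
            int d = 0;
            for (int j = 0; j < N; j++)
                if (adj[i][j]) graph[i][d++] = j;
        }
        printf("node 2 connects to %d and %d\n", graph[2][0], graph[2][1]);
        return 0;
    }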

Linear Arrays and Rings
– Linear array
– Ring (torus)
– Short-wire torus layout
Diameter = ? Nodal degree = ? Bisection bandwidth = ?

Describing linear arrays and rings
Array: nodes are numbered 0, 1, …, N-1.
– Node i is connected to node i+1, for 0 <= i <= N-2.
Ring: nodes are numbered 0, 1, …, N-1.
– Node i is connected to node (i+1) mod N, for all 0 <= i <= N-1.
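
A small C sketch of the neighbor rules above (function names are illustrative):

    #include <stdio.h>

    #define N 8

    /* Array: node i links forward to i+1; node N-1 has no forward neighbor. */
    int array_next(int i) { return (i <= N - 2) ? i + 1 : -1; }

    /* Ring: the forward link wraps around modulo N. */
    int ring_next(int i) { return (i + 1) % N; }

    int main(void) {
        printf("array: 7 -> %d, ring: 7 -> %d\n", array_next(7), ring_next(7));
        return 0;
    }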

Multidimensional Meshes and Tori
d-dimensional array/torus: N = k_{d-1} x k_{d-2} x … x k_0
Each node is described by a d-vector of coordinates.
Node (i_{d-1}, i_{d-2}, …, i_0) is connected to ???

More about multidimensional meshes and tori
d-dimensional k-ary mesh (torus)
– Each node is described by a d-vector of coordinates; the value of each coordinate is between 0 and k-1 (see the sketch below).
– Diameter = ?
– Nodal degree = ?
– Bisection bandwidth = ?
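
A hedged C sketch of the neighbor rule in a d-dimensional k-ary torus; node IDs are radix-k numbers, and the helper names are illustrative:

    #include <stdio.h>

    #define D 3 /* dimensions */
    #define K 4 /* elements per dimension */

    /* Neighbor of `node` along dimension `dim`, stepping by +1 or -1
     * with wrap-around (a mesh would simply omit the wrap-around). */
    int torus_neighbor(int node, int dim, int step) {
        int scale = 1;
        for (int i = 0; i < dim; i++) scale *= K;
        int coord = (node / scale) % K;       /* extract the coordinate */
        int next = (coord + step + K) % K;    /* move with wrap-around */
        return node + (next - coord) * scale; /* splice it back in */
    }

    int main(void) {
        /* The 2d = 6 neighbors of node 0, i.e., coordinates (0,0,0). */
        for (int dim = 0; dim < D; dim++)
            printf("dim %d: %d and %d\n", dim,
                   torus_neighbor(0, dim, +1), torus_neighbor(0, dim, -1));
        return 0;
    }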

Hypercubes
Also called binary n-cubes. Number of nodes N = 2^n.
Each node is described by its binary representation:
– N=2, n=1: nodes 0 and 1
– N=4, n=2: nodes 0(00), 1(01), 2(10), 3(11)
– N=8, n=3: nodes 0(000), 1(001), 2(010), 3(011), 4(100), 5(101), 6(110), 7(111)
– N=16, n=4: 0(0000), 1(0001), 2(0010), 3(0011), 4(0100), 5(0101), 6(0110), 7(0111), 8(1000), 9(1001), 10(1010), 11(1011), 12(1100), 13(1101), 14(1110), 15(1111)
There is a link between two nodes whose binary representations differ by exactly one bit.
Which nodes have links to node 14(1110)? (See the sketch below.)
How do we map nodes onto a topology?
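
A minimal C sketch answering the question above: the neighbors of a hypercube node are obtained by flipping each bit of its binary label in turn:

    #include <stdio.h>

    int main(void) {
        int n = 4;     /* dimension: N = 2^n = 16 nodes */
        int node = 14; /* 1110 in binary */
        for (int bit = 0; bit < n; bit++)
            printf("flip bit %d: neighbor %d\n", bit, node ^ (1 << bit));
        return 0; /* prints neighbors 15, 12, 10, and 6 */
    }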

Hypercubes
Diameter = ? Nodal degree = ? Bisection bandwidth = ?

K-ary n-cubes (n-dimensional, k-ary mesh/torus)
Extended from binary (hypercube) to k-ary.
– Each dimension has k elements; there are n dimensions.
– Each node is identified by an n-digit radix-k number.
– Dimension-order routing (see the sketch below).
Figure: 4-ary 0-cube, 4-ary 1-cube, 4-ary 2-cube, 4-ary 3-cube.
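
A hedged C sketch of dimension-order routing in a 4-ary 3-cube mesh: the route corrects the address one digit (one dimension) at a time, highest dimension first by assumption; names are illustrative:

    #include <stdio.h>

    #define K 4    /* radix */
    #define NDIM 3 /* dimensions */

    void dor_route(int src, int dst) {
        int cur = src;
        printf("%d", cur);
        for (int dim = NDIM - 1; dim >= 0; dim--) {
            int scale = 1;
            for (int i = 0; i < dim; i++) scale *= K;
            /* Step one hop at a time until this digit matches. */
            while ((cur / scale) % K != (dst / scale) % K) {
                cur += ((dst / scale) % K > (cur / scale) % K) ? scale : -scale;
                printf(" -> %d", cur);
            }
        }
        printf("\n");
    }

    int main(void) {
        dor_route(0, 63); /* (0,0,0) to (3,3,3) in a 4-ary 3-cube */
        return 0;
    }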

Trees
Fixed degree, log(N) diameter, O(1) bisection bandwidth.
Routing: go up to the common ancestor, then go down (see the sketch below).
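
A minimal C sketch of common-ancestor routing, assuming heap-style numbering of a complete binary tree (root = 1, children of node i are 2i and 2i+1); the numbering scheme is an assumption, not from the slides:

    #include <stdio.h>

    /* Walk both endpoints up toward the root until they meet. */
    int common_ancestor(int u, int v) {
        while (u != v) {
            if (u > v) u /= 2; /* the larger index is never shallower */
            else       v /= 2;
        }
        return u;
    }

    int main(void) {
        /* 9 and 13 meet at the root (1); 8 and 9 meet at node 4. */
        printf("%d %d\n", common_ancestor(9, 13), common_ancestor(8, 9));
        return 0;
    }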

Irregular topology
An irregular topology does not have any special mathematical properties.
– It can be expanded in any way.
– There is no easy way to route: routes must be computed, as in the Internet. In a regular network, routes can usually be determined from the coordinates of the source and the destination.

Direct and indirect networks
All the previously discussed networks are direct networks, in that the compute nodes are directly attached to the nodes of the topology.
– Example: a mesh system in which each switch is a 5x5 switch (four ports for the mesh neighbors plus one for the local compute node).

Indirect networks
Compute nodes are not directly attached to individual switches, but rather to the network as a whole.
– A central interconnect connects all compute nodes.
– The network emulates the functionality of a crossbar switch.

Fully connected network
Different organizations:
– All nodes connected through a single crossbar switch.
All permutation communications (each node sends one message and receives one message) can be realized.

Multistage networks
Try to emulate a crossbar connection:
– realizing permutations without blocking,
– using smaller crossbar switches (2x2, 4x4) as the building block.
Usually O(N lg N) switches in lg N stages. For example, an 8-input network built from 2x2 switches has lg 8 = 3 stages of 4 switches each, 12 switches in total.

Multistage network examples
The butterfly network is blocking: there exist permutations that result in link contention.
The Benes network is non-blocking: if the permutation is known a priori, it can always be realized without link contention.
Figure: (a) an 8-input butterfly network; (b) an 8-input Benes network.
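
A hedged C sketch of destination-tag routing in an 8-input butterfly built from 2x2 switches: each stage inspects one bit of the destination address (most significant bit first, by assumption) to pick output port 0 or 1, so the path depends only on the destination:

    #include <stdio.h>

    #define STAGES 3 /* lg(8) stages for an 8-input butterfly */

    void route(int src, int dst) {
        printf("from %d to %d, port choices:", src, dst);
        for (int stage = 0; stage < STAGES; stage++)
            printf(" %d", (dst >> (STAGES - 1 - stage)) & 1);
        printf("\n");
    }

    int main(void) {
        route(3, 5); /* destination 5 = 101 -> ports 1, 0, 1 */
        return 0;
    }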

Clos Network
Three stages: ingress stage, middle stage, and egress stage.
– The ingress/egress stage has r switches, each of size n x m.
– The middle stage has m switches, each of size r x r.
– Each ingress/egress switch connects to all m middle switches (one port to each switch).

Clos Network
A Clos network is non-blocking when m >= 2n-1. For example, with n = 4 ports per ingress switch, m = 7 middle switches suffice.

Fat-Trees
Links get fatter (really, there are more of them) as you go up the tree, so bisection bandwidth scales with N.
– Not practical as-is: the root is an NxN switch.

Practical fat-trees
Use smaller switches to approximate the large switches.
– Connectivity is reduced, but the topology becomes implementable.
– Most large commodity clusters use this topology.
Also called a constant bisection bandwidth (CBB) network.

Slimmed fat-trees
Full bisection bandwidth fat-tree: the number of links going up equals the number of links going down.
Slimmed fat-tree: the number of links going up is smaller than the number of links going down.
– Uplinks at the upper levels of the tree are oversubscribed (see the sketch below).
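
A small C sketch (the 24/12 port split is a made-up example): the oversubscription factor at a switch level is the ratio of downlinks to uplinks; a full-bisection fat-tree has factor 1:1, a slimmed fat-tree has a factor greater than 1:

    #include <stdio.h>

    int gcd(int a, int b) { return b ? gcd(b, a % b) : a; }

    int main(void) {
        /* e.g., a 36-port switch with 24 ports down and 12 ports up */
        int down = 24, up = 12;
        int g = gcd(down, up);
        printf("oversubscription factor = %d:%d\n", down / g, up / g);
        return 0;
    }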

Clos network and fat-tree (folded Clos)
Figure: a generic 3-stage Clos network; a generic 2-level fat-tree (folded Clos).

Physical constraints on topologies
Number of dimensions:
– 2 or 3 dimensions: can be laid out physically; short wires, easy to build; but many hops and low bisection bandwidth.
– 4 or more dimensions: harder to build, longer wires; but fewer hops and better bisection bandwidth.
– K-ary n-cubes provide a good framework for comparing these trade-offs.

Cost factors
Most of the cost is in the NICs and links.
– Switch cost is usually not dominant.
With current technology, long-range links are 10x (or more) more expensive than short-range links.
– Long-range links: fiber + optical transceivers + electronic/optical converters.
– Short-range links: copper wire + electronic transceivers.
Topology designs therefore focus strongly on minimizing the number of long-range links.
– 2D and 3D tori can be built without long-range links.
– The central question is how to build a topology that achieves high throughput with a minimum number of long-range links.
In on-chip networks, long-range links are also much more expensive to implement.

Topologies used in practical systems
HPC systems (ranked in the June 2015 Top500 supercomputer list):
– Tianhe-2 (No. 1): slimmed fat-tree with a 2:1 oversubscription factor
– Titan (No. 2): Cray Gemini network, 3-D torus
– Sequoia (No. 3): BlueGene/Q, 5-D torus
– K computer (No. 4): 6-D torus
– Stampede (No. 8): slimmed fat-tree with a 5:4 oversubscription factor
Others: BlueGene/L (3-D torus); SGI ICE architecture (bristled hypercube); many full bisection bandwidth or slimmed fat-trees for commodity clusters.
Topology determines the hardware cost; the large variation in deployed topologies indicates there is no clear winner at this time.

Topologies used in practical systems
Data centers:
– Slimmed fat-trees with varying oversubscription factors.
– Also called multi-rooted trees.

Topologies for exascale platforms
Cost and performance constraints:
– Full bisection bandwidth fat-trees perform well, but at large scale they are prohibitively expensive: too many long links.
– Low-dimensional tori do not provide sufficient bisection bandwidth.
We need something that provides sufficient bandwidth without costing too much. Recent proposals:
– Slimmed fat-trees (reduce the number of switches at the higher levels of the tree)
– Dragonfly (directly connect switches in a regular manner)
– Jellyfish (directly and randomly connect switches)