Computer Architecture II: Network Topologies


Plan for today
Scalable interconnection networks:
- Basic concepts, definitions
- Topologies
- Switching
- Routing
- Performance

Outline
- Basic concepts, definitions
- Topologies
- Switching
- Routing
- Performance

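Formalism
- Graph $G = (V, E)$
  - $V$: switches and nodes
  - $E$: communication channels (edges), $E \subseteq V \times V$
- Route: $(v_0, \ldots, v_k)$, a path of length $k$ between nodes $v_0$ and $v_k$, where $(v_i, v_{i+1}) \in E$
- Routing distance: the length of the route between two nodes
- Diameter: the maximal routing distance between two nodes
- Average distance: the routing distance averaged over all pairs of nodes
- Degree: number of input (output) channels of a node
- Bisection width: the minimal number of channels that must be cut to split the network into two halves. It bounds how many transfers can cross between the halves in parallel.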
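The definitions above can be checked mechanically on small graphs. The sketch below assumes minimal routing, so the routing distance is the shortest-path hop count found by breadth-first search. It also assumes the graph is connected and given as an adjacency dictionary. The helper names `bfs_distances`, `metrics` and `bisection_width` are hypothetical. The bisection width is found by brute force over all balanced splits, so it is only practical for a handful of nodes.

```python
from collections import deque
from itertools import combinations


def bfs_distances(adj, src):
    """Shortest hop count from src to every reachable node."""
    dist = {src: 0}
    queue = deque([src])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def metrics(adj):
    """(diameter, average distance, maximal degree), assuming minimal routes."""
    dists = []
    for s in adj:
        d = bfs_distances(adj, s)
        dists.extend(d[t] for t in adj if t != s)
    degree = max(len(nbrs) for nbrs in adj.values())
    return max(dists), sum(dists) / len(dists), degree


def bisection_width(adj):
    """Fewest edges cut over all splits into two (nearly) equal halves.

    Brute force over every possible half, so only usable for small graphs.
    """
    nodes = list(adj)
    best = None
    for half in combinations(nodes, len(nodes) // 2):
        left = set(half)
        cut = sum(1 for u in left for v in adj[u] if v not in left)
        if best is None or cut < best:
            best = cut
    return best


# Example: a ring of 6 nodes, node i linked with i+1 and i-1 (mod 6).
n = 6
ring = {i: [(i + 1) % n, (i - 1) % n] for i in range(n)}
print(metrics(ring))          # (3, 1.8, 2)
print(bisection_width(ring))  # 2
```

The same functions applied to a complete graph or a hypercube reproduce the degree, diameter and bisection-width entries given on the following slides.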

What characterizes a network? (1)
- Bandwidth (offered bandwidth): $b = w f$, where $w$ is the channel width (in bytes) and $f = 1/t$ is the signaling rate (in Hz)
- Latency: the time a message needs to travel between two nodes
- Throughput (delivered bandwidth): how much of the offered bandwidth is effectively used

What characterizes a network? (2)
- Topology: the physical interconnection structure of the network graph
- Routing algorithm: restricts the set of paths that messages may follow; many algorithms with different properties exist
- Switching strategy: how the data in a message traverses a route (circuit switching vs. packet switching)
- Flow control mechanism: when a message, or portions of it, traverse a route, and what happens when traffic is encountered

Goals
- Latency as small as possible
- High throughput: as many concurrent transfers as possible (the bisection width gives the potential number of parallel connections)
- Cost as low as possible

Bus (e.g., Ethernet)
- Degree = 1
- Diameter = 1: no routing necessary
- Bisection width = 1
- CSMA/CD protocol
- Limited bus length
- Simplest and cheapest dynamic network

Complete graph
Static network: there is a direct connection between each pair of nodes.
- Degree = $n-1$: too expensive for big networks
- Diameter = 1
- Bisection width = $\lceil n/2 \rceil \lfloor n/2 \rfloor$: when the network is cut into two halves, each node of one half is connected to the $n/2$ nodes of the other half, and there are $n/2$ such nodes

Ring
Static network: node $i$ is linked with nodes $i+1$ and $i-1$ (modulo $n$).
- Degree = 2
- Diameter = $\lfloor n/2 \rfloor$: slow for big networks
- Bisection width = 2
- Examples: FDDI, SCI, Fibre Channel Arbitrated Loop, KSR1
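The ring diameter follows from always taking the shorter of the two directions around the ring. Here is a minimal sketch assuming a bidirectional ring numbered $0, \ldots, n-1$. The names `ring_route` and `ring_path` are hypothetical helpers, not part of any real network's interface.

```python
def ring_route(src, dst, n):
    """Shortest route on a bidirectional ring of n nodes.

    Returns (direction, hops): +1 means travel i -> i+1, -1 means i -> i-1.
    """
    forward = (dst - src) % n      # hops when moving to i+1 (mod n)
    backward = (src - dst) % n     # hops when moving to i-1 (mod n)
    return (+1, forward) if forward <= backward else (-1, backward)


def ring_path(src, dst, n):
    """List of nodes visited on the shortest route, endpoints included."""
    direction, hops = ring_route(src, dst, n)
    return [(src + direction * step) % n for step in range(hops + 1)]


n = 8
print(ring_path(6, 1, n))                               # [6, 7, 0, 1]
print(max(ring_route(0, j, n)[1] for j in range(n)))   # 4 == n // 2
```

Because every node of a ring looks the same, the longest shortest route from node 0 already equals the diameter $\lfloor n/2 \rfloor$.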

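d-dimensional grid
Static network with $n$ nodes in $d$ dimensions ($n^{1/d}$ nodes per dimension).
- Degree = $2d$ for interior nodes (boundary nodes have fewer neighbors)
- Diameter = $d\,(n^{1/d} - 1)$
- Bisection width = $(n^{1/d})^{d-1} = n^{(d-1)/d}$
- Examples: Cray T3D and T3E
(Figure: a 3x3 two-dimensional grid with nodes labeled (1,1) to (3,3).)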
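The grid formulas can be checked numerically. The sketch below assumes a mesh without wraparound links, with $k = n^{1/d}$ nodes per dimension. In such a mesh the minimal routing distance is the Manhattan distance between coordinates. A torus with wraparound links has different numbers. The function names are hypothetical.

```python
from itertools import product


def mesh_nodes(k, d):
    """All coordinates of a d-dimensional grid with k nodes per dimension."""
    return list(product(range(k), repeat=d))


def mesh_distance(a, b):
    """Minimal routing distance without wraparound: the Manhattan distance."""
    return sum(abs(x - y) for x, y in zip(a, b))


def mesh_neighbors(node, k):
    """Nodes one channel away: +1 or -1 in exactly one dimension."""
    result = []
    for dim in range(len(node)):
        for step in (-1, 1):
            c = node[dim] + step
            if 0 <= c < k:
                result.append(node[:dim] + (c,) + node[dim + 1:])
    return result


k, d = 4, 3                      # n = k**d = 64 nodes, n^(1/d) = k
nodes = mesh_nodes(k, d)
diameter = max(mesh_distance(a, b) for a in nodes for b in nodes)
degree = max(len(mesh_neighbors(v, k)) for v in nodes)
# Cut the grid in half across dimension 0 and count the severed links.
cut = sum(1 for v in nodes for w in mesh_neighbors(v, k) if v[0] < k // 2 <= w[0])
print(diameter, d * (k - 1))     # 9 9
print(degree, 2 * d)             # 6 6
print(cut, k ** (d - 1))         # 16 16
```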

Crossbar
Dynamic network.
- Fast and expensive: $n^2$ switches
- Mostly used to connect processors to memories
- Degree = 1
- Diameter = 2
- Bisection width = $n/2$
- Examples: 4x4, 8x8, 16x…
(Figure: crossbar with a switch at each crosspoint.)

Hypercube (1)
Static network.
- Hamming distance: the number of bits in which the binary representations of two numbers differ
- Two nodes are connected if their Hamming distance is 1
- Routing from x to y proceeds by decreasing the Hamming distance by one at each hop
(Figure: small hypercubes with binary node labels 0000, 0001, 0010, ....)
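Routing by decreasing the Hamming distance means flipping, one hop at a time, a bit in which the current node and the destination still differ. The sketch below fixes the differing bits from the lowest to the highest. That fixed order (often called e-cube routing) is an assumption made for illustration; any order of the differing bits gives a shortest route. The helper names are hypothetical.

```python
def hamming(x, y):
    """Number of bit positions in which x and y differ."""
    return bin(x ^ y).count("1")


def hypercube_route(src, dst):
    """Route by fixing differing bits from lowest to highest.

    Each hop flips one bit, i.e. moves to a neighbor at Hamming distance 1,
    so the remaining Hamming distance to dst drops by exactly one per hop.
    """
    path = [src]
    node = src
    diff = src ^ dst
    bit = 0
    while diff:
        if diff & 1:
            node ^= 1 << bit
            path.append(node)
        diff >>= 1
        bit += 1
    return path


path = hypercube_route(0b0000, 0b0111)
print([format(v, "04b") for v in path])  # ['0000', '0001', '0011', '0111']
print(hamming(0b0000, 0b0111))           # 3 hops
```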

Hypercube (2)
$k$ dimensions, $n = 2^k$ nodes.
- Degree = $k$
- Diameter = $k$
- Bisection width = $n/2$
- Two $(k-1)$-hypercubes are linked through $n/2$ edges to form a $k$-hypercube
- Examples: Intel iPSC/860, SGI Origin 2000
(Figure: two 2-dimensional hypercubes, 0000-0010 and 0100-0110, joined into a 3-dimensional hypercube.)
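The recursive construction on this slide can be coded directly. Copy A of the $(k-1)$-cube keeps its labels, and copy B gets the new top bit set. Each node of A is then linked to its twin in B, which adds exactly $n/2$ edges. This is a sketch with a hypothetical `hypercube` helper. It shows the degree and the number of linking edges, and the linking edges form a cut of size $n/2$ between the two halves.

```python
def hypercube(k):
    """Adjacency sets of a k-cube built recursively from two (k-1)-cubes."""
    if k == 0:
        return {0: set()}
    half = hypercube(k - 1)
    offset = 1 << (k - 1)                 # copy B gets the new top bit set
    adj = {}
    for v, nbrs in half.items():
        adj[v] = set(nbrs)                              # copy A
        adj[v + offset] = {u + offset for u in nbrs}    # copy B
    for v in half:                                      # the n/2 linking edges
        adj[v].add(v + offset)
        adj[v + offset].add(v)
    return adj


k = 4
cube = hypercube(k)
n = len(cube)
print(n)                                      # 16 = 2**k
print({len(nbrs) for nbrs in cube.values()})  # {4}: every node has degree k
links = sum(1 for v in cube if v < n // 2 for u in cube[v] if u >= n // 2)
print(links)                                  # 8 = n/2
```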

Omega network (1)
- Building block: 2x2 shuffle switch
- Perfect shuffle: the target address is the cyclic left shift of the source address

Omega network (2)
Dynamic network.
- $\log_2 n$ levels of 2x2 shuffle building blocks
- Level $i$ looks at bit $i$ of the destination address (counting from the most significant bit): if the bit is 0, the message goes up; if it is 1, it goes down
- See the example for 100 sending to …
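The level-by-level rule can be simulated by tracking the wire a message occupies. Each level first applies the perfect shuffle (a cyclic left shift of the wire number). The 2x2 switch then replaces the lowest bit with the examined destination bit, where 0 means the upper output and 1 the lower output. After $\log_2 n$ levels the wire number equals the destination. The slide's destination is cut off, so 011 is used below purely as a hypothetical example. The names `shuffle` and `omega_route` are also hypothetical.

```python
def shuffle(addr, m):
    """Perfect shuffle on m-bit addresses: cyclic left shift by one bit."""
    msb = (addr >> (m - 1)) & 1
    return ((addr << 1) & ((1 << m) - 1)) | msb


def omega_route(src, dst, m):
    """Destination-tag routing through an omega network with n = 2**m ports.

    Returns the wire leaving each of the m levels and the switch decision.
    """
    line = src
    hops = []
    for level in range(m):
        line = shuffle(line, m)                  # perfect-shuffle wiring
        bit = (dst >> (m - 1 - level)) & 1       # level i examines bit i (MSB first)
        line = (line & ~1) | bit                 # 0 -> upper output, 1 -> lower output
        hops.append((line, "down" if bit else "up"))
    return hops


m = 3                                            # n = 8, log2 n = 3 levels
for line, move in omega_route(0b100, 0b011, m):
    print(format(line, "03b"), move)
# 000 up
# 001 down
# 011 down
```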

Omega network (3)
- $n$ nodes, $(n/2)\log_2 n$ building blocks
- Degree = 2 for nodes, 4 for building blocks
- Diameter = $\log_2 n$
- Bisection width = $n/2$: for a random permutation, $n/2$ messages are expected to cross the network in parallel
- Extremes:
  - If all nodes want to send to node 0, only one message proceeds at a time
  - If each node sends a message to itself, $n$ messages proceed in parallel
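Both extremes can be reproduced by counting how many messages want the same wire at the same level. A load of 1 means the whole pattern is routed without blocking. A load of $n$ means all messages share one link. This sketch assumes the same destination-tag routing as the previous slide. The names `used_links` and `max_link_load` are hypothetical.

```python
from collections import Counter


def shuffle(addr, m):
    """Perfect shuffle on m-bit addresses: cyclic left shift by one bit."""
    msb = (addr >> (m - 1)) & 1
    return ((addr << 1) & ((1 << m) - 1)) | msb


def used_links(src, dst, m):
    """(level, wire) pairs occupied by a message from src to dst."""
    line, links = src, []
    for level in range(m):
        bit = (dst >> (m - 1 - level)) & 1
        line = (shuffle(line, m) & ~1) | bit
        links.append((level, line))
    return links


def max_link_load(traffic, m):
    """Largest number of messages that need the same link at the same level."""
    load = Counter(link for src, dst in traffic for link in used_links(src, dst, m))
    return max(load.values())


m, n = 3, 8
identity = [(i, i) for i in range(n)]   # each node sends to itself
hotspot = [(i, 0) for i in range(n)]    # all nodes send to node 0
print(max_link_load(identity, m))       # 1: all n messages in parallel
print(max_link_load(hotspot, m))        # 8: the last link is shared by all n
```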

Fat tree / Clos network (1)
- Nodes are the leaves of a tree
- The tree has diameter $2\log_2 n$: from the farthest-left leaf over the root to the farthest-right leaf
- A simple tree has bisection width 1: a bottleneck
- Fat tree:
  - Edges at level $i$ have double the capacity of edges at level $i-1$
  - At level $i$: expensive switches with $2^i$ inputs and $2^i$ outputs
  - Known as Clos networks

Fat tree / Clos network (2)
(Figure: fat tree.)
- Routing: take the direct way over the lowest common parent; when alternatives exist, choose randomly
- Tolerance to node failure
- Diameter $2\log_2 n$, bisection width $n$
- Example: CM-5
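Routing over the lowest common parent has a simple arithmetic form when the $n$ leaves of a binary tree are numbered $0, \ldots, n-1$. Two leaves meet at the level of the highest bit in which their numbers differ. The route climbs that many levels and descends the same number. The sketch counts hops only. It does not model the random choice among parallel up-links in a fat tree. The names are hypothetical.

```python
def lca_level(src, dst):
    """Level of the lowest common parent of two leaves of a binary tree.

    Leaves sit at level 0; they share a parent at level l exactly when their
    numbers agree in every bit from bit l upward.
    """
    return (src ^ dst).bit_length()


def fat_tree_hops(src, dst):
    """Up to the lowest common parent, then back down: 2 * level hops."""
    return 2 * lca_level(src, dst)


n = 16                                   # log2 n = 4 levels above the leaves
print(fat_tree_hops(4, 5))               # 2: siblings meet one level up
print(fat_tree_hops(0, n - 1))           # 8 = 2 log2 n, over the root
print(max(fat_tree_hops(a, b) for a in range(n) for b in range(n)))  # 8
```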

Switching
How a message traverses the network from one node to another:
- Circuit switching
  - One path from source to destination is established
  - All packets take that path
  - Like the telephone system
- Packet switching
  - A message is broken into a sequence of packets, which can be sent across different routes
  - Better utilization of network resources

Packet routing
There are two basic approaches to routing packets, based on what a switch does when a packet begins arriving:
1) Store-and-forward
2) Cut-through
   - Virtual cut-through
   - Wormhole

Packet routing: store-and-forward
- A packet is completely stored at a switch before being forwarded
- A packet occupies at most two nodes at a time: the one forwarding it and the one receiving it
- Problem: switches need lots of memory to store the incoming packets
- Switching takes place step by step, so the danger of blocking is small

Packet routing: cut-through
- A packet may partially enter a switch while its tail is still on other nodes, so it may reside on more than 2 switches
- The decision to forward the packet can be taken as soon as its head arrives
- What to do with the rest of the packet if the head blocks?
  - Virtual cut-through: gather the tail where the head is; this degenerates into store-and-forward under high contention
  - Wormhole: if the head blocks, the whole "worm" blocks

Store-and-forward vs. cut-through routing
- Store-and-forward: $T_{SF} = h\,(n/b + \Delta)$
- Cut-through: $T_{CT} = n/b + h\,\Delta$
where $h$: number of hops, $n$: message size, $b$: bandwidth, $\Delta$: routing delay per hop.
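Evaluating the two formulas for the same message shows why cut-through wins on multi-hop paths. Store-and-forward pays the full transfer time $n/b$ at every hop. Cut-through pays it once and adds only the per-hop routing delay. The numbers below are hypothetical and expressed in cycles.

```python
def store_and_forward(n, b, h, delta):
    """Each of the h hops receives the whole message before forwarding it."""
    return h * (n / b + delta)


def cut_through(n, b, h, delta):
    """Only the head pays the per-hop routing delay; the rest is pipelined."""
    return n / b + h * delta


# Hypothetical numbers: 1024-byte message, 1 byte/cycle links, 5 hops,
# 4 cycles of routing delay per hop (all times in cycles).
n, b, h, delta = 1024, 1, 5, 4
print(store_and_forward(n, b, h, delta))  # 5140.0
print(cut_through(n, b, h, delta))        # 1044.0
```

With a single hop ($h = 1$) the two expressions coincide. The gap grows linearly with the number of hops.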

Routing algorithm
How do I know where a packet should go? The topology does NOT determine routing.
Routing algorithms:
1) Arithmetic
2) Source-based
3) Table lookup
4) Adaptive: route based on network state (e.g., contention)

(1) Arithmetic routing
For a regular topology, use simple arithmetic to determine the route, e.g., dimension-order (xy) routing in a 3D torus:
- The packet header contains the signed offset to the destination (per dimension)
- At each hop, the switch adds or subtracts 1 to reduce the offset in one dimension
- When all offsets are 0 (x == 0 and y == 0, and z == 0 in 3D), the packet is at the correct processor
Drawbacks:
- Requires an ALU in each switch
- Must re-compute the CRC at each hop, because the header changes
(Figure: 2x2x2 torus with nodes labeled (0,0,0) to (1,1,1).)
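The header manipulation can be sketched as follows, under some stated assumptions. The offsets are computed once at the source, taking the shorter way around each ring of a $k$-ary torus. Each switch then moves one step in the first dimension whose offset is non-zero and rewrites that offset. The fact that the header changes at every hop is why the CRC must be recomputed. The names `torus_offset` and `route` are hypothetical.

```python
def torus_offset(src, dst, k):
    """Signed per-dimension offsets, taking the shorter way around each ring."""
    offsets = []
    for s, d in zip(src, dst):
        delta = (d - s) % k
        offsets.append(delta if delta <= k // 2 else delta - k)
    return offsets


def route(src, dst, k):
    """Dimension-order arithmetic routing on a k-ary torus.

    The header carries the offsets; each switch steps +1 or -1 in the first
    non-zero dimension and shrinks that offset by one.
    """
    offsets = torus_offset(src, dst, k)     # computed once, at the source
    node, path = list(src), [tuple(src)]
    while any(offsets):                     # all offsets 0 -> at the destination
        dim = next(i for i, o in enumerate(offsets) if o != 0)
        step = 1 if offsets[dim] > 0 else -1
        node[dim] = (node[dim] + step) % k  # may use a wraparound link
        offsets[dim] -= step                # switch rewrites the header
        path.append(tuple(node))
    return path


print(route((0, 0, 0), (1, 1, 1), 2))
# [(0, 0, 0), (1, 0, 0), (1, 1, 0), (1, 1, 1)]
print(route((0, 3), (2, 0), 4))
# [(0, 3), (1, 3), (2, 3), (2, 0)]: the last hop wraps from y=3 to y=0
```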

(2) Source-based and (3) table lookup routing
Source-based:
- The source specifies the output port for each switch on the route
- Very simple switches: no control state; each switch strips its output port off the header
- Used by Myrinet
- Cannot be made adaptive
Table lookup:
- Very small header: contains a field that is an index into a table of output ports
- Big tables, which must be kept up to date
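The difference between the two schemes lies in where the routing information lives. In source-based routing it sits in the packet header. In table lookup it sits in per-switch state. This sketch models packets as plain dictionaries. The field names `ports`, `dest` and `payload`, the port numbers and the table contents are all hypothetical, chosen only to illustrate the idea.

```python
def source_route_switch(packet):
    """Source-based routing: the header is a list of output ports.

    The switch keeps no routing state; it strips the first port number off
    the header and forwards the rest of the packet through that port.
    """
    port, *rest = packet["ports"]
    return port, {"ports": rest, "payload": packet["payload"]}


def table_lookup_switch(packet, table):
    """Table lookup routing: a small header field indexes the switch's table."""
    return table[packet["dest"]]


# Hypothetical route through three switches: leave by ports 2, 0, then 3.
packet = {"ports": [2, 0, 3], "payload": b"data"}
taken = []
while packet["ports"]:
    port, packet = source_route_switch(packet)
    taken.append(port)
print(taken)                                   # [2, 0, 3]

# Hypothetical table of one switch: destination -> output port.
table = {0: 1, 1: 1, 2: 0, 3: 2}
print(table_lookup_switch({"dest": 2, "payload": b"data"}, table))  # 0
```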

Deterministic vs. adaptive routing
- Deterministic: follows a pre-specified route
  - k-ary d-cube: dimension-order routing; for $(x_1, y_1) \to (x_2, y_2)$, first travel $\Delta x = x_2 - x_1$, then $\Delta y = y_2 - y_1$
  - Tree: route via the common ancestor
- Adaptive: route determined by contention for output ports

(4) Adaptive routing
- Essential for fault tolerance (requires at least multipath routing)
- Can improve utilization of the network
- Simple deterministic algorithms easily run into bad permutations

Contention
Two packets try to use the same link at the same time, and buffering is limited. Drop one?
- Most parallel machine networks block in place
  - Traffic may back up toward the source
  - Tree saturation: blocked traffic backs up along all paths leading to a congested destination, forming a tree of blocked links
- Alternative: discard packets and inform the source

Communication performance: latency
$\text{Time}_{s \to d}(n) = \text{overhead} + \text{routing delay} + \text{channel occupancy} + \text{contention delay}$
- Overhead: time needed to initiate the sending and the reception of a message
- Channel occupancy $= (n + n_e)/b$, where $n$: data (payload) size, $n_e$: packet envelope size
- Routing delay
- Contention delay
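The latency model is just a sum of four terms. Only the channel occupancy depends on the message size. The sketch below plugs in hypothetical figures (times in microseconds, sizes in bytes, bandwidth in bytes per microsecond) chosen so that the arithmetic is exact. The function name and its parameters are hypothetical.

```python
def message_latency(n, n_e, b, overhead, routing_delay, contention=0.0):
    """Time(n) from source to destination:
    overhead + routing delay + channel occupancy + contention delay,
    with channel occupancy = (n + n_e) / b.
    """
    occupancy = (n + n_e) / b
    return overhead + routing_delay + occupancy + contention


# Hypothetical: 960-byte payload, 64-byte envelope, 1024 bytes/us links,
# 2 us overhead, 0.5 us routing delay, no contention.
print(message_latency(n=960, n_e=64, b=1024, overhead=2.0, routing_delay=0.5))  # 3.5
```

For short messages, the fixed overhead and routing delay dominate. For long messages, the occupancy term $(n + n_e)/b$ takes over.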

Bandwidth
What affects local bandwidth?
- Packet density: $b \cdot n/(n + n_e)$
- Routing delay: $b \cdot n/(n + n_e + w\Delta)$, where $\Delta$: number of cycles spent waiting for a routing decision, $w$: width of the channel
- Contention: at the endpoints and within the network
Aggregate bandwidth:
- Bisection bandwidth: sum of the bandwidths of the smallest set of links that partitions the network into two halves; misleading if communication is not uniformly distributed
- Total bandwidth of all the channels
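Both local-bandwidth formulas can be read as one expression. The payload $n$ gets a share of the link bandwidth $b$, and the rest is spent on the envelope $n_e$ and on the $w\Delta$ bytes the channel could have carried while waiting for a routing decision. The sketch folds both cases into one hypothetical function, where $w\Delta = 0$ gives the packet-density case. The numbers are hypothetical, with sizes in bytes and $w$ in bytes per cycle.

```python
def delivered_bandwidth(b, n, n_e, w=0, delta=0):
    """Local bandwidth seen by the payload.

    Without routing delay: b * n / (n + n_e) (packet density).
    With routing delay: b * n / (n + n_e + w * delta), where delta is the
    number of cycles spent waiting for a routing decision and w the channel
    width, so w * delta is the data the channel could have moved meanwhile.
    """
    return b * n / (n + n_e + w * delta)


b = 1000.0                                                   # hypothetical link bandwidth, MB/s
print(delivered_bandwidth(b, n=192, n_e=64))                 # 750.0
print(delivered_bandwidth(b, n=192, n_e=64, w=2, delta=32))  # 600.0
```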

Interconnects

| Name | Latency | Bandwidth | Topology | Comments |
| Gigabit Ethernet | … µs | 1 Gb/s | Star or fat tree | Cheap for small systems |
| InfiniBand 4x | 3.5-7 µs | 10-20 Gb/s | Fat tree | Not as mature as Myrinet; smaller switches (128 ports); cost ~$500 per card + port |
| Myrinet | 3.5-7 µs | 2-8 Gb/s | Clos | Mature, de facto standard; …-port switches; cost ~$500 per card + port |
| NUMAlink4 | 1-2 µs | 8-16 Gb/s | Fat tree | SGI proprietary; special µproc for I/O; shmem |
| Quadrics | 1-2 µs | 9 Gb/s | Fat tree | Expensive; used in turn-key machines |
| SCI/Dolphin | 1-2 µs | 4 Gb/s | 2D/3D torus | Cabling nightmare; costs more than Myrinet |

Myrinet
- Offered bandwidth: 2+2 Gbit/s, full duplex
- 5-7 µs latency
- Arbitrary topology; fat tree/Clos network preferable
- Routing: wormhole, source routing
- Cable (8+1 bits parallel) or fiber optics
- Flow control on each link
- Adapter:
  - Programmable RISC processor, 333 MHz
  - PCI/PCI-X connection, up to 133 MHz, 64-bit
  - 8 Gb/s over the PCI-X bus, unidirectional
  - 2 MB SRAM

Myrinet fat tree (128 nodes)
(Figure: 128-node fat tree built from 16x16 crossbars.)

Myrinet PCI bus adapter
(Figure: network interface block diagram with host DMA, PCI bridge, net DMA, LANai CPU, 2 MB SRAM and cable connector.)
- PCI(-X) bridge, 64-bit, up to 133 MHz
- LANai RISC, 333 MHz
- 2 fiber-optic connectors, both duplex
- 2 MB SRAM

Myrinet 16x16 crossbar
- 8 computers connected on the front side (2 channels)
- On the back side, 8 outputs (2 channels) toward the next level of the Clos network
- 32x32, two …

Clos network with … nodes
(Figure: Clos network built from the building block shown earlier.)

Myrinet Clos network
- Routing network with bisection width 256
- Front side: 256 computer connections
- Back side: 256 connections to the next level of routing units

Clos networks with full bisection width: 64 nodes and 32 nodes