Interconnection Networks Contd.


Interconnection Networks Contd. L.N. Bhuyan Partly from Berkeley Notes CS258 S99

More Static Networks: Linear Arrays and Rings
Diameter? Average distance? Bisection bandwidth? For a linear array of N nodes: diameter N-1, average distance about N/3, bisection bandwidth of one link. The route A -> B is given by the relative address R = B - A. Torus (ring)? The wraparound link halves the diameter to floor(N/2). Examples: FDDI, SCI, Fibre Channel Arbitrated Loop, KSR1. 2/5/2019 CS258 S99
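As a quick check of the slide's questions, here is a minimal brute-force sketch (the helper names are hypothetical, not from the original notes) that computes diameter and average distance for an 8-node linear array and ring:

```python
from itertools import combinations

def linear_distance(a, b):
    # Hops along a linear array: just the difference of positions.
    return abs(a - b)

def ring_distance(a, b, n):
    # The wraparound link may offer a shorter path.
    d = abs(a - b)
    return min(d, n - d)

def metrics(n, dist):
    # Diameter = max over all pairs; average distance over all pairs.
    pairs = list(combinations(range(n), 2))
    dists = [dist(a, b) for a, b in pairs]
    return max(dists), sum(dists) / len(pairs)

n = 8
print(metrics(n, linear_distance))                      # linear array
print(metrics(n, lambda a, b: ring_distance(a, b, n)))  # ring
```

For N = 8 this reports diameter 7 for the array and 4 for the ring, matching the N-1 and floor(N/2) formulas above.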

Multidimensional Meshes and Tori
3D cube, 2D grid. A d-dimensional array has N = k_{d-1} x ... x k_0 nodes; each node is described by a d-vector of coordinates (i_{d-1}, ..., i_0). A d-dimensional k-ary mesh has N = k^d nodes (k = N^{1/d}), each described by a d-vector of radix-k coordinates. d-dimensional k-ary torus (or k-ary d-cube)? Examples: Intel Paragon (2D mesh), SGI Origin (hypercube), Cray T3E (3D torus).
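A small sketch of the node-count and diameter arithmetic for k-ary d-cubes (function name is my own, chosen for illustration):

```python
def kary_dcube(k, d, torus=False):
    # N = k^d nodes; diameter is d hops-per-dimension summed over dimensions.
    n = k ** d
    per_dim = (k // 2) if torus else (k - 1)  # wraparound halves each dimension's diameter
    return n, d * per_dim

print(kary_dcube(4, 3))        # 4-ary 3-mesh: 64 nodes, diameter 9
print(kary_dcube(4, 3, True))  # 4-ary 3-cube (torus): 64 nodes, diameter 6
```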

Hypercubes
Also called binary n-cubes; number of nodes N = 2^n. O(log N) hops between any pair of nodes and good bisection bandwidth (N/2 links). Complexity: out-degree is n = log2 N. Routing corrects the differing dimensions in order; with random communication, about 2 ports per processor are in use. 0-D, 1-D, 2-D, 3-D, 4-D, 5-D, ...

Routing in Hypercube
N = 2^6 = 64 nodes. Source S = (s_{n-1} s_{n-2} ... s_i ... s_2 s_1 s_0), destination D = (d_{n-1} d_{n-2} ... d_i ... d_2 d_1 d_0).
E-cube routing: for i = 0 to n-1, compare s_i and d_i; route along dimension i if they differ.
Distance = Hamming distance between S and D = the number of dimensions in which S and D differ.
Diameter = maximum distance = n = log2 N = dimension of the hypercube.
Number of alternate paths = n; fault tolerance = n - 1 = O(log2 N).
Example paths from 000 to 111: 000 => 001 => 011 => 111; 000 => 010 => 110 => 111; 000 => 100 => 101 => 111.
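The e-cube algorithm above can be sketched in a few lines (a minimal illustration, not production router code):

```python
def ecube_route(src, dst, n):
    """Nodes visited by e-cube routing in an n-cube:
    fix differing address bits from dimension 0 upward."""
    path = [src]
    cur = src
    for i in range(n):
        if (cur ^ dst) & (1 << i):  # s_i and d_i differ
            cur ^= (1 << i)         # flip that bit = traverse one link in dimension i
            path.append(cur)
    return path

# 000 -> 111 in a 3-cube follows 000 => 001 => 011 => 111
print([format(x, '03b') for x in ecube_route(0b000, 0b111, 3)])
```

Note this produces the first of the three alternate paths listed on the slide; starting from a different dimension order yields the others.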

Origin Network
Each router has six pairs of 1.56 GB/s unidirectional links: two to nodes, four to other routers. Latency: 41 ns pin-to-pin across a router. Flexible cables up to 3 ft long. Four "virtual channels": request, reply, and two for priority or I/O.

Case Study: Cray T3D
Builds up communication info in a "shell" around the processor; remote memory operations are encoded in the address.

Trees
Diameter and average distance are logarithmic; degree is fixed. A k-ary tree of height d = log_k N: an address is specified as a d-vector of radix-k coordinates describing the path down from the root. Routing: route up to the common ancestor, then down. With R = B xor A, let i be the position of the most significant 1 in R; route up i+1 levels, then down in the direction given by the low i+1 bits of B. An H-tree layout uses O(N) space with O(sqrt(N))-long wires. Bisection BW? A single link through the root.
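The XOR-based tree routing rule can be sketched as follows for a binary tree, with leaves labeled by their radix-2 path from the root (helper name and labeling are illustrative assumptions):

```python
def tree_route_levels(a, b):
    """How far to climb from leaf a before descending toward leaf b."""
    r = a ^ b
    i = r.bit_length() - 1           # position of the most significant 1 in R
    up = i + 1                       # climb i+1 levels to the common ancestor
    down_bits = b & ((1 << up) - 1)  # low i+1 bits of b steer the descent
    return up, down_bits

# Leaves 0010 and 0111 first differ at bit 2, so climb 3 levels, then
# descend following the low 3 bits of the destination (111).
print(tree_route_levels(0b0010, 0b0111))
```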

Real Machines
Wide links, smaller routing delay. Tremendous variation. (Some table entries were lost in transcription and are marked "-".)

Machine        Topology    Cycle Time (ns)  Channel Width (bits)  Routing Delay (cycles)  Flit (data bits)
nCUBE/2        Hypercube   25               1                     40                      32
TMC CM-5       Fat tree    -                4                     10                      -
IBM SP-2       Banyan      -                8                     5                       16
Intel Paragon  2D mesh     11.5             -                     2                       -
Meiko CS-2     -           20               -                     7                       -
Cray T3D       3D torus    6.67             -                     -                       -
DASH           Torus       30               -                     -                       -
J-Machine      3D mesh     31               -                     -                       -
Monsoon        Butterfly   -                -                     -                       -
SGI Origin     -           2.5              -                     -                       160
Myricom        Arbitrary   6.25             -                     50                      -

What is a Dynamic Network?
A dynamic network can connect any input to any output by enabling or disabling switches in the network. Examples:
- Shared bus: the bus arbiter connects a processor to a memory
- Crossbar: a grid of switching elements that can be enabled to connect many inputs to many outputs simultaneously
- Multistage network: several stages of switches that are enabled to establish connections
Note that the routers in static networks (like a mesh) also contain dynamic crossbars.

Dynamic Network Consists of Switches
Switch components:
- Output ports: transmitter (typically drives clock and data)
- Input ports: synchronizer aligns the data signal with the local clock domain; essentially a FIFO buffer
- Crossbar: connects each input to any output; degree limited by area or pinout
- Buffering
- Control logic: complexity depends on the routing logic and scheduling algorithm; determines the output port for each incoming packet and arbitrates among inputs directed at the same output

Crossbar Switch Design
Complexity is O(N^2) for an NxN crossbar. Why? See the next slide.

How Do You Build a Crossbar?
N^2 switch points => cost O(N^2). Time taken by the arbiter is also O(N^2). The multiplexors are driven by a central controller.

Crossbar Contd.
An NxN crossbar allows all N inputs to be connected simultaneously to all N outputs. It supports all one-to-one mappings, called permutations; the number of permutations is N!. When two or more inputs request the same output, only one of them is connected and the others are either dropped or buffered. When processors access memories through a crossbar, this situation is called a memory access conflict.
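A toy arbitration sketch for the conflict behavior just described (first requester wins, losers are dropped; a real crossbar would buffer or retry, and the function name is my own):

```python
def crossbar_connect(requests, n):
    """requests: list of (input, output) pairs for an n x n crossbar.
    Grants each input/output to at most one connection; conflicting
    later requests are dropped (a memory access conflict)."""
    granted, busy_in, busy_out = [], set(), set()
    for inp, out in requests:
        if inp not in busy_in and out not in busy_out:
            granted.append((inp, out))
            busy_in.add(inp)
            busy_out.add(out)
    return granted

# Inputs 0 and 2 both request output 1: only one connection is granted.
print(crossbar_connect([(0, 1), (2, 1), (3, 0)], 4))
```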

Multistage Interconnection Network
A network consisting of multiple stages of crossbar switches has the following properties:
- NxN network for N = 2^n
- Consists of log2 N stages of 2x2 switches
- Has N/2 2x2 switches per stage
- Cost O(N log N) instead of O(N^2) for a crossbar
For N = a^n, a MIN can be similarly designed with axa switches.
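The stage and switch counts follow directly from the bullet points; a small sketch (assuming N is an exact power of the switch radix a):

```python
import math

def min_size(n_ports, a=2):
    """Stages and total switch count for an N-port MIN built
    from a x a crossbar switches."""
    stages = round(math.log(n_ports, a))   # log_a N stages
    per_stage = n_ports // a               # N/a switches per stage
    return stages, stages * per_stage

print(min_size(8))      # 8x8 from 2x2 switches: 3 stages, 12 switches
print(min_size(64, 4))  # 64x64 from 4x4 switches: 3 stages, 48 switches
```

Compare 12 switches with the 64 crosspoints of an 8x8 crossbar: the O(N log N) vs O(N^2) gap on the slide.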

Multistage Interconnection Networks: Omega Network and Self-Routing
[Figure: an 8x8 Omega network; inputs and outputs labeled 000 through 111, connected by three stages of 2x2 switches with a perfect-shuffle wiring between stages.]
Note: complexity O(N log2 N). Conflicts give it less bandwidth than a crossbar, but it is cost-effective.
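Self-routing in an Omega network can be sketched as below: at each stage the line address is perfect-shuffled (rotated left), then the 2x2 switch delivers the message to the output given by the next destination bit, most significant first. This is an illustrative model of the wiring, not code from the notes:

```python
def omega_route(src, dst, n_bits):
    """Line addresses visited after each stage of an Omega network."""
    mask = (1 << n_bits) - 1
    node = src
    trace = []
    for i in range(n_bits):
        # Perfect shuffle between stages: rotate the address left by one bit.
        node = ((node << 1) | (node >> (n_bits - 1))) & mask
        # The exchange switch sets the low bit from the destination bit (MSB first).
        want = (dst >> (n_bits - 1 - i)) & 1
        node = (node & ~1) | want
        trace.append(node)
    return trace

# Route input 010 to output 110 in the 8x8 network of the figure.
print([format(x, '03b') for x in omega_route(0b010, 0b110, 3)])
```

After the last stage the message always sits on the destination line, which is exactly why destination-bit self-routing works here.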

Example: IBM SP
8-port switch, 40 MB/s per link, 8-bit phit, 16-bit flit, single 40 MHz clock. Packet switching with cut-through, no virtual channels, source-based routing. Variable-length packets <= 255 bytes; 31-byte FIFO per input, 7 bytes per output, 16-phit links. 128 8-byte "chunks" in the central queue, LRU per output. Runs in shadow mode.

Switching Techniques
- Circuit switching: a control message is sent from source to destination and a path is reserved; communication then starts, and the path is released when communication is complete.
- Store-and-forward (packet switching): each switch waits for the full packet to arrive before sending it to the next switch (good for WANs).
- Cut-through or wormhole routing: the switch examines the header, decides where to send the message, and starts forwarding it immediately.
In wormhole routing, when the head of the message is blocked, the message stays strung out over the network, potentially blocking other messages (each switch need only buffer the piece of the packet sent between switches). The CM-5 uses it, with each switch buffer being 4 bits per port. Virtual cut-through routing lets the tail continue when the head is blocked, storing the whole message in an intermediate switch (this requires a buffer large enough to hold the largest packet).

Store and Forward vs. Cut-Through: Advantage
Latency drops from a function of (number of intermediate switches x packet size) to (time for the first part of the packet to negotiate the switches) + (packet size / interconnect bandwidth).

Store-and-Forward vs. Cut-Through Routing
With h hops, message size n, link bandwidth b, and routing delay D per switch: store-and-forward takes h(n/b + D), while cut-through takes n/b + hD. What if the message is fragmented? Wormhole vs. virtual cut-through.
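The two formulas translate directly into code; plugging in the numbers from the worked example on the next slide (20-byte packet, 20 MB/s links so a 1 us link transfer, 0.25 us per node, 7 hops) reproduces its results:

```python
def store_and_forward(h, n, b, delta):
    # Each of the h switches waits for the whole packet: h * (n/b + delta).
    return h * (n / b + delta)

def cut_through(h, n, b, delta):
    # Only the header pays the per-switch delay: n/b + h * delta.
    return n / b + h * delta

# n and b in consistent units: 20 bytes over 20 bytes/us = 1 us per link.
print(store_and_forward(7, 20, 20, 0.25))  # microseconds
print(cut_through(7, 20, 20, 0.25))        # microseconds
```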

Wormhole: (number of nodes x node delay) + transfer time.
Example. Q: Compare the efficiency of store-and-forward (packet switching) vs. wormhole routing for transmission of a 20-byte packet between a source and destination that are d nodes apart. Each node takes 0.25 microseconds and the link transfer rate is 20 MB/s.
A: Time to transfer 20 bytes over a link = 20 bytes / 20 MB/s = 1 microsecond.
Packet switching: d x (node delay + transfer time) = d x (0.25 + 1) = 1.25d microseconds.
Wormhole: (d x node delay) + transfer time = 0.25d + 1 microseconds.
For d = 7, packet switching takes 8.75 microseconds vs. 2.75 microseconds for wormhole routing.