Dynamic Interconnect Lecture 5 (COEN 312)


Multistage Network--Omega Network
– Motivation: emulate a crossbar network, but with fewer links
– Components:
  – N processors to connect, N log(N) links
  – log(N) stages; consecutive stages are connected by perfect-shuffle wiring
  – Each stage has N/2 2x2 switch boxes
[Figure: 8 x 8 Omega network connecting processors P0 to P7]
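The structure above can be sketched in a few lines of Python. This is a minimal sketch that assumes the usual definition of the perfect shuffle as a one-bit left rotation of the log2(N)-bit address, which the slide does not spell out. The helper names `perfect_shuffle` and `omega_size` are illustrative and do not come from the lecture.

```python
def perfect_shuffle(i: int, n_bits: int) -> int:
    """Perfect shuffle (assumed definition): rotate the n_bits-bit address i left by one bit."""
    msb = (i >> (n_bits - 1)) & 1
    return ((i << 1) & ((1 << n_bits) - 1)) | msb


def omega_size(n: int) -> dict:
    """Stage, switch, and link counts for an n x n Omega network (n a power of two)."""
    stages = n.bit_length() - 1  # log2(n) for a power of two
    return {
        "stages": stages,
        "switches_per_stage": n // 2,
        "total_switches": (n // 2) * stages,
        "links": n * stages,  # n links feed each of the log2(n) stages
    }


if __name__ == "__main__":
    # Output position of each input after one shuffle, for N = 8
    print([perfect_shuffle(i, 3) for i in range(8)])  # [0, 2, 4, 6, 1, 3, 5, 7]
    print(omega_size(8))
```

For N = 8 this gives 3 stages of 4 switch boxes and 24 links, which matches the N log(N) link count on the slide.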

Omega Network -- Routing
– Distributed control: at each stage, the switch checks the destination-address bit for that stage; if it is 0, the message goes to the upper output port, otherwise to the lower output port
– Not all permutations are possible
  – What if 010 connects to 110, 110 to 100, and 000 to 101 at the same time?
[Figure: 8 x 8 Omega network connecting processors P0 to P7]
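Running the slide's question through a small model makes the blocking concrete. The sketch below assumes the standard construction: a perfect shuffle before every stage, and destination bits consumed most-significant first. It counts two messages as conflicting when they need the same switch output in the same stage. The function names are hypothetical.

```python
def perfect_shuffle(pos: int, n_bits: int) -> int:
    """Rotate the n_bits-bit position left by one bit."""
    msb = (pos >> (n_bits - 1)) & 1
    return ((pos << 1) & ((1 << n_bits) - 1)) | msb


def omega_route(src: int, dst: int, n_bits: int) -> list:
    """Destination-tag routing: the switch-output position used after each stage."""
    pos, path = src, []
    for stage in range(n_bits):
        pos = perfect_shuffle(pos, n_bits)        # shuffle wiring into this stage
        bit = (dst >> (n_bits - 1 - stage)) & 1   # destination bit for this stage, MSB first
        pos = (pos & ~1) | bit                    # 0 -> upper output, 1 -> lower output
        path.append(pos)
    return path


def find_conflicts(requests, n_bits):
    """Return (stage, earlier_request, later_request) for each switch-output clash."""
    owner, clashes = {}, []
    for req in requests:
        for stage, pos in enumerate(omega_route(req[0], req[1], n_bits)):
            if (stage, pos) in owner:
                clashes.append((stage, owner[(stage, pos)], req))
            else:
                owner[(stage, pos)] = req
    return clashes


if __name__ == "__main__":
    # The slide's example: 010 -> 110, 110 -> 100, 000 -> 101 (stages numbered from 0)
    requests = [(0b010, 0b110), (0b110, 0b100), (0b000, 0b101)]
    print(find_conflicts(requests, 3))
    # [(0, (2, 6), (6, 4)), (1, (6, 4), (0, 5))]
```

Under these assumptions, 010 -> 110 and 110 -> 100 both need the lower output of the same first-stage switch. In the second stage, 110 -> 100 also collides with 000 -> 101. A counting argument shows that some permutations must block in general. An 8 x 8 Omega network has 12 two-state switches, so it has at most 2^12 = 4096 settings. That is fewer than the 8! = 40320 permutations of 8 inputs.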

Comparisons between Dynamic Networks: Bus System
– Assume n processors on the bus; bus width is w bits
– Data transfer latency: constant
– Bandwidth per processor: O(w/n) to O(w)
– Wire complexity: O(w)
– Switching complexity: O(n)
– Routing capability: only one-to-one, one transfer at a time
– Advantage: cheap to build
– Disadvantages: low bandwidth available to each processor; prone to failure

Comparisons between Dynamic Networks: Crossbar Switch
– Assume an n x n crossbar with line width of w bits
– Data transfer latency: constant
– Bandwidth per processor: O(w) to O(nw)
– Wire complexity: O(n^2 w)
– Switching complexity: O(n^2)
– Routing capability: all permutations, one at a time
– Advantages: highest bandwidth; highest routing capability
– Disadvantage: high hardware cost

Comparisons between Dynamic Networks: Multistage Network
– Assume an n x n multistage network built from 2 x 2 switches, with line width of w bits
– Data transfer latency: O(log n)
– Bandwidth per processor: O(w) to O(nw)
– Wire complexity: O(nw log n)
– Switching complexity: O(n log n)
– Routing capability: some permutations, and broadcast
– Advantages: scalable, with modular construction; medium cost
– Disadvantage: long latency
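To make the crossbar and multistage entries concrete, the sketch below plugs numbers into their complexity formulas with every constant factor set to 1. That is an illustrative simplification. It counts n^2 crosspoints for the crossbar and (n/2) log2 n 2x2 switch boxes for the multistage network, and it assumes n is a power of two. The function name and the dictionary keys are hypothetical.

```python
def crossbar_vs_multistage(n: int, w: int) -> dict:
    """Element counts for an n x n crossbar and an n x n multistage network of
    2x2 switches (n a power of two), with all constant factors set to 1."""
    log_n = n.bit_length() - 1  # log2(n)
    return {
        "crossbar": {
            "switch_elements": n * n,        # O(n^2) crosspoints
            "wires": n * n * w,              # O(n^2 w)
            "latency_stages": 1,             # constant latency
        },
        "multistage": {
            "switch_elements": (n // 2) * log_n,  # O(n log n) 2x2 boxes
            "wires": n * w * log_n,               # O(n w log n)
            "latency_stages": log_n,              # O(log n) latency
        },
    }


if __name__ == "__main__":
    for name, cost in crossbar_vs_multistage(64, 32).items():
        print(name, cost)
```

For n = 64 and w = 32, the model gives 4096 crosspoints for the crossbar and 192 switch boxes for the multistage network. The price is 6 switching stages on every path instead of 1, which is the "medium cost, long latency" trade-off listed above.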

Message Transfer Mechanisms
A message typically consists of:
– A header, which contains information about the destination
– The data that needs to be transmitted
– A trailer, which signals the end of the message
The switching strategy determines how message data is actually transferred across the network links in the chosen message route.
There are three components to message transfer cost:
– Startup time (ts): the cost of handling the message at the sending processor
– Per-hop time (tp): the time taken by the header to traverse a link
– Per-word transfer time (tw): the time taken for one word to traverse a link

Dynamic Network -- Switching Strategy: Circuit Switching
– A circuit path is established from the source to the destination before data is sent
– Like the telephone system
– Requires setup time and yields poor bandwidth, but has short latency
– Latency for routing an m-word message over l hops: $t = t_s + l t_p + m t_w \approx t_s + m t_w$ (the per-hop term is small)
[Figure: 8 x 8 Omega network connecting processors P0 to P7]

Dynamic Network -- Switching Strategy: Store-and-Forward (Packet Switching)
– The message travels one link at a time, moving on when the next link is free
– The message is buffered when the next link is not free
– Like the postal system
– No setup time and better bandwidth, but longer latency
– Only one link on the path is active at a time
– Latency for an m-word message over l hops: $t = l(t_s + m t_w)$
[Figure: 8 x 8 Omega network connecting P0 to P7; the whole packet is buffered at each intermediate switch]

Dynamic Network -- Switching Strategy: Cut-Through
– Similar to store-and-forward, but the message is broken into parcels
– All the links on the path can be active at the same time
– Also called wormhole routing
– Small setup time
– Latency: $t = t_s + l t_p + m t_w \approx l t_p + m t_w$
[Figure: 8 x 8 Omega network connecting P0 to P7; parcels are buffered at the intermediate switches]
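The two latency models can be compared numerically. This sketch reads the slides as store-and-forward t = l(ts + m tw) and cut-through t = ts + l tp + m tw. Circuit switching has the same form as cut-through in this model, only with a large setup cost ts, so it gets no separate function. The parameter values are made up for illustration; they are not measurements.

```python
def store_and_forward_latency(ts: float, tw: float, m: int, l: int) -> float:
    """The whole m-word message is received and re-sent at each of the l hops."""
    return l * (ts + m * tw)


def cut_through_latency(ts: float, tp: float, tw: float, m: int, l: int) -> float:
    """The header pays tp per hop; the m words are pipelined behind it, so m*tw is paid once."""
    return ts + l * tp + m * tw


if __name__ == "__main__":
    # Illustrative parameters only: ts=100, tp=2, tw=1, m=64 words
    for hops in (1, 3, 6):
        print(hops,
              store_and_forward_latency(100, 1, 64, hops),
              cut_through_latency(100, 2, 1, 64, hops))
```

The store-and-forward cost grows with l times the full message cost. The cut-through cost grows only by tp per extra hop, which is why the gap widens with path length.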

Static Network vs. Dynamic Network
Static network:
– Point-to-point links between processors
– Parallel system expansion is easy
– Some processors may be "closer" than others
– Generally used for message-passing machine interconnects
Dynamic network:
– Paths are established between processors as needed
– System expansion is difficult
– Processors are usually equidistant
– Usually used for shared-memory machine interconnects

One-to-All Broadcast
Algorithms often require a processor to send identical data to all other processors or to a subset of them. This operation is called one-to-all broadcast, or single-node broadcast.
– At the start of a single-node broadcast, the source processor has m words of data to send. At the end, there are p copies of this data, one on each processor.
The dual of a broadcast operation is all-to-one reduction, or single-node reduction.
– At the start of a single-node reduction, each processor has m words of data. The reduction combines the data from all processors with an associative operator to produce m words at the receiving processor.
A naive single-node broadcast or reduction takes p-1 steps.
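The naive p-1-step reduction can be sketched as below. It assumes that processor 0 is the receiver and that it takes one message per step from processors 1, ..., p-1 in index order. With that ordering the operator only needs to be associative, not commutative. The function name `naive_reduce` is hypothetical.

```python
import operator
from typing import Callable, List, Tuple


def naive_reduce(data: List[List[int]],
                 op: Callable[[int, int], int]) -> Tuple[List[int], int]:
    """Naive all-to-one reduction onto processor 0.

    data[k] holds the m words of processor k. Processor 0 receives from
    processors 1, ..., p-1 one at a time (p-1 steps) and combines each
    message element by element with op, which must be associative.
    """
    result = list(data[0])
    steps = 0
    for words in data[1:]:
        result = [op(a, b) for a, b in zip(result, words)]
        steps += 1
    return result, steps


if __name__ == "__main__":
    data = [[1, 2], [3, 4], [5, 6], [7, 8]]   # p = 4 processors, m = 2 words each
    print(naive_reduce(data, operator.add))    # ([16, 20], 3)
    print(naive_reduce(data, max))             # ([7, 8], 3)
```

Each of the p-1 steps moves m words, so under the store-and-forward cost model used later this costs (p-1)(ts + m tw). The naive broadcast is the same schedule run in reverse.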

One-to-All Broadcast
[Figure: processors 0, 1, ..., p-1 and message M; labels: Broadcast, Reduction (Accumulation)]

Store-and-Forward Routing on a Ring
– The source sends the message on both of its outgoing links in the first two steps
– Every other processor receives the message on one link and transmits it on the other link
– It takes p/2 steps
– Cost: $(t_s + m t_w)\,p/2$
– What if we use circuit-switching routing?
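The p/2 step count can be checked with a small model of the schedule. As an assumption matching "first two steps", node 0 sends clockwise in step 1 and counterclockwise in step 2. Every other node forwards on its other link in the following step. The function name is hypothetical.

```python
def ring_sf_broadcast(p: int, ts: float, tw: float, m: int):
    """Store-and-forward one-to-all broadcast from node 0 on a p-node ring.

    Assumed schedule: node 0 sends clockwise in step 1 and counterclockwise
    in step 2; every other node forwards on its other link one step later.
    Node i is reached at step i going clockwise, or at step (p - i) + 1 going
    counterclockwise, whichever is earlier.
    """
    arrival = [0] + [min(i, (p - i) + 1) for i in range(1, p)]
    steps = max(arrival)
    return steps, steps * (ts + m * tw)   # each step costs ts + m*tw


if __name__ == "__main__":
    for p in (4, 8, 16):
        print(p, ring_sf_broadcast(p, ts=10, tw=1, m=4))   # steps == p // 2
```

With both links used in the very first step, the count would also be p/2 for even p. The one-step delay on the second direction is absorbed because that direction has one fewer node to cover.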

Store-and-Forward Routing on a Hypercube
– Takes log(p) steps for a p-processor hypercube
– In the ith step, every processor that has the message transmits it to the neighboring processor whose label differs in the ith most significant bit
– Cost: $(t_s + m t_w)\log p$
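The dimension-by-dimension schedule can be simulated directly, as sketched below. Processors are labeled 0 to 2^d - 1, so p = 2^d, and the number of holders doubles each step. The function name is hypothetical. Because there are d = log p steps of cost ts + m tw each, the total matches the cost on the slide.

```python
def hypercube_broadcast(d: int, source: int = 0):
    """One-to-all broadcast on a 2**d-processor hypercube.

    In step i (0-based), every processor holding the message sends it to the
    neighbour whose label differs in the (i+1)-th most significant bit.
    Returns the final set of holders and the number of holders after each step.
    """
    have = {source}
    sizes = []
    for i in range(d):
        bit = 1 << (d - 1 - i)                      # most significant bit first
        have = have | {node ^ bit for node in have}
        sizes.append(len(have))
    return have, sizes


if __name__ == "__main__":
    have, sizes = hypercube_broadcast(3, source=5)
    print(sorted(have), sizes)   # [0, 1, 2, 3, 4, 5, 6, 7] [2, 4, 8]
```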

Homework (due next lecture)
Assume a mesh interconnection network with p = N x N nodes that uses store-and-forward routing.
(a) Find the source node for which one-to-all broadcast has the highest cost.
(b) Describe your routing algorithm (using pseudocode).
(c) What is the broadcast cost?

Cut-Through Routing on a Ring
– The algorithm takes log(p) steps
– In step i, each processor that has the message sends it to the processor at distance p/2^i
– All messages flow in the same direction
– Cost: $(t_s + m t_w)\log p + t_p(p-1)$
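The sketch below simulates the recursive-doubling schedule, assuming p is a power of two and node 0 is the source. It also checks that no link is used twice within a step, which is why all messages can flow in the same direction without contention. The hop distances p/2 + p/4 + ... + 1 sum to p - 1, which gives the tp(p-1) term. The function names are hypothetical.

```python
def ring_cut_through_broadcast(p: int) -> list:
    """Recursive-doubling broadcast on a p-node ring (p a power of two) from node 0.

    Returns the hop distance used in each step and asserts that no ring link
    (link k joins node k to node k+1) is used twice in the same step.
    """
    have, distances = {0}, []
    dist = p // 2
    while dist >= 1:
        used_links = set()
        for j in sorted(have):
            for k in range(dist):            # links j, j+1, ..., j+dist-1
                link = (j + k) % p
                assert link not in used_links, "link contention"
                used_links.add(link)
        have |= {(j + dist) % p for j in have}
        distances.append(dist)
        dist //= 2
    assert have == set(range(p))
    return distances


def cut_through_ring_cost(p: int, ts: float, tp: float, tw: float, m: int) -> float:
    """log(p) steps of ts + m*tw, plus tp for every hop along the way."""
    distances = ring_cut_through_broadcast(p)
    return len(distances) * (ts + m * tw) + tp * sum(distances)


if __name__ == "__main__":
    print(ring_cut_through_broadcast(8))          # [4, 2, 1]
    print(cut_through_ring_cost(8, 10, 1, 1, 4))  # 3*(10+4) + 1*7 = 49
```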

Cut-Through Routing on a 2D Torus
– Apply the ring algorithm to the sender's processor row
– Then apply the ring algorithm to all processor columns in parallel
– $2\log\sqrt{p} = \log p$ steps
– Cost: $(t_s + m t_w)\log p + 2 t_p(\sqrt{p} - 1)$
– This algorithm also works for a 2D mesh
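Here is a minimal simulation of the row-then-columns scheme. It assumes a square side x side torus with side a power of two, so p = side^2, with the source at (0, 0). It counts the steps and the hops along the critical path. The function names are hypothetical.

```python
def ring_doubling_distances(n: int) -> list:
    """Hop distances n/2, n/4, ..., 1 used by recursive doubling on an n-node ring."""
    out, dist = [], n // 2
    while dist >= 1:
        out.append(dist)
        dist //= 2
    return out


def torus_broadcast(side: int):
    """Broadcast from (0, 0) on a side x side torus: the ring algorithm along
    row 0, then the same ring algorithm down every column in parallel."""
    have = {(0, 0)}
    steps, hop_total = 0, 0
    for dist in ring_doubling_distances(side):      # phase 1: the source's row
        have |= {(r, (c + dist) % side) for r, c in have}
        steps += 1
        hop_total += dist
    for dist in ring_doubling_distances(side):      # phase 2: all columns at once
        have |= {((r + dist) % side, c) for r, c in have}
        steps += 1
        hop_total += dist
    return have, steps, hop_total


if __name__ == "__main__":
    have, steps, hops = torus_broadcast(4)          # p = 16
    print(len(have), steps, hops)                   # 16 4 6
```

For p = 16 this gives log p = 4 steps and 2(sqrt(p) - 1) = 6 hops, which matches the two terms of the cost formula above.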

Cut-Through Routing on a Hypercube
– Takes log(p) steps for a p-processor hypercube
– In the ith step, every processor that has the message transmits it to the neighboring processor whose label differs in the ith most significant bit
– Cost: $(t_s + m t_w)\log p$
– Cut-through routing provides no benefit here, because each transfer uses only a single link (between direct neighbors)

Summary
– Switching strategies
  – Circuit switching
  – Store-and-forward
  – Cut-through (wormhole)
– One-to-all broadcasting on
  – Ring: using store-and-forward; using cut-through
  – Hypercube: using store-and-forward; using cut-through