
Dynamic Interconnect Lecture 5

COEN 312

Multistage Network -- Omega Network
Motivation: simulate a crossbar network but with fewer links
Components:
–N processors to connect, N log(N) links
–log(N) stages, each stage connected to the next by a shuffle
–Each stage has N/2 2x2 switch boxes
[Figure: 8-processor Omega network connecting P0 to P7]
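The component counts and the shuffle wiring listed above can be sketched in a few lines. This is only an illustration: it assumes N is a power of two, logs are base 2, and the shuffle is the usual perfect shuffle (a one-bit left rotation of the port address). The function names are made up for this example.

```python
def perfect_shuffle(i, n_bits):
    """Rotate the n_bits-wide address i left by one bit (assumed shuffle wiring)."""
    mask = (1 << n_bits) - 1
    return ((i << 1) | (i >> (n_bits - 1))) & mask


def omega_components(n):
    """Component counts for an n-input Omega network; n assumed a power of two."""
    stages = n.bit_length() - 1          # log2(n)
    return {
        "stages": stages,
        "switches_per_stage": n // 2,    # 2x2 switch boxes
        "links": n * stages,             # N log(N), as on the slide
    }


if __name__ == "__main__":
    print([perfect_shuffle(i, 3) for i in range(8)])  # [0, 2, 4, 6, 1, 3, 5, 7]
    print(omega_components(8))
```

For N = 8 this gives 3 stages of 4 switch boxes each.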

Omega Network -- Routing
Distributed control
–At each stage, the switch checks that stage's bit of the destination address: if it is 0, connect to the upper port; otherwise connect to the lower port
Not all permutations are possible
–What if 010 connects to 110, 110 to 100, and 000 to 101?
[Figure: 8-processor Omega network connecting P0 to P7]
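The question on this slide can be checked with a small simulation. The sketch below assumes a perfect shuffle in front of every stage and that stage k uses the k-th most significant destination bit (0 = upper port, 1 = lower port). The slide does not state the bit order, so treat that as an assumption. Two messages conflict when they need the same switch output port in the same stage. `omega_route` and `find_conflicts` are names invented for this example.

```python
def omega_route(src, dst, n_bits):
    """Return the output-port address a message occupies after each stage."""
    mask = (1 << n_bits) - 1
    pos, path = src, []
    for stage in range(n_bits):
        # Perfect shuffle: rotate the current address left by one bit.
        pos = ((pos << 1) | (pos >> (n_bits - 1))) & mask
        # Destination bit for this stage, most significant bit first (assumed).
        bit = (dst >> (n_bits - 1 - stage)) & 1
        # 0 selects the upper output of the 2x2 switch, 1 the lower output.
        pos = (pos & ~1) | bit
        path.append(pos)
    return path


def find_conflicts(pairs, n_bits):
    """List (stage, port, sources) wherever two messages need the same port."""
    paths = {src: omega_route(src, dst, n_bits) for src, dst in pairs}
    conflicts = []
    for stage in range(n_bits):
        users = {}
        for src, path in paths.items():
            users.setdefault(path[stage], []).append(src)
        for port, srcs in users.items():
            if len(srcs) > 1:
                conflicts.append((stage, port, srcs))
    return conflicts


if __name__ == "__main__":
    pairs = [(0b010, 0b110), (0b110, 0b100), (0b000, 0b101)]
    for stage, port, srcs in find_conflicts(pairs, 3):
        print(f"stage {stage + 1}: port {port:03b} wanted by {srcs}")
```

Under these assumptions, 010 and 110 both need output 101 of the first stage, so the three connections cannot be set up at the same time.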

Comparisons between Dynamic Networks
Bus System
–Assume n processors on the bus; bus width is w bits
–Data transfer latency: constant
–Bandwidth per processor: O(w/n) to O(w)
–Wire complexity: O(w)
–Switching complexity: O(n)
–Routing capability: only one-to-one, one transfer at a time
–Advantage: cheap to build
–Disadvantages: low bandwidth available to each processor; prone to failure

Comparisons between Dynamic Networks
Crossbar Switch
–Assume an n x n crossbar with line width of w bits
–Data transfer latency: constant
–Bandwidth per processor: O(w) to O(nw)
–Wire complexity: O(n^2 w)
–Switching complexity: O(n^2)
–Routing capability: all permutations, one at a time
–Advantages: highest bandwidth; highest routing capability
–Disadvantage: high hardware cost

Comparisons between Dynamic Networks
Multistage network
–Assume n processors connected by an n x n network of 2 x 2 switches, with line width of w bits
–Data transfer latency: O(log n)
–Bandwidth per processor: O(w) to O(nw)
–Wire complexity: O(nw log n)
–Switching complexity: O(n log n)
–Routing capability: some permutations and broadcast
–Advantages: scalability with modular construction; medium cost
–Disadvantage: long latency
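To get a feel for the switching-complexity figures on the three comparison slides, the sketch below plugs concrete n into representative counts: n bus taps, n^2 crossbar crosspoints, and (n/2) log2(n) two-by-two switches for a multistage network. The exact constants are assumptions chosen to match the stated orders of growth, and n is assumed to be a power of two.

```python
def switch_counts(n):
    """Representative switch counts per network type; n assumed a power of two."""
    stages = n.bit_length() - 1              # log2(n)
    return {
        "bus": n,                            # O(n)
        "crossbar": n * n,                   # O(n^2)
        "multistage": (n // 2) * stages,     # O(n log n)
    }


if __name__ == "__main__":
    for n in (8, 64, 1024):
        print(n, switch_counts(n))
```

At n = 1024 the crossbar needs about a million crosspoints while the multistage network needs 5120 switches, which is the cost gap the comparison points at.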

Message Transfer Mechanisms
A message typically consists of:
–A header, which contains information about the destination
–The data that needs to be transmitted
–A trailer, which signals the end of the message
The switching strategy determines how message data is actually transferred across the network links of the chosen route
Three components of message transfer cost:
–Startup time (ts): cost of handling the message at the sending processor
–Per-hop time (tp): time taken by the header to traverse a link
–Per-word transfer time (tw): time taken for a word to traverse a link

Dynamic Network -- Switching Strategy
Circuit switching:
–A circuit path is established from the source to the destination
–Like the telephone system
–Requires setup time and gives poor bandwidth, but has short latency
–Latency for routing an m-word message over l hops: t = ts + tp + m tw ≈ ts + m tw
[Figure: 8-processor Omega network connecting P0 to P7]

Dynamic Network -- Switching Strategy
Store-and-forward (packet switching)
–The message travels one link at a time, when the next link is free
–The message is buffered when the link is not free
–Like the postal system
–No pre-setup time and better bandwidth, but longer latency
–Only one link on the path is active at a time
–Latency for an m-word message over n hops: n(ts + m tw)
[Figure: 8-processor Omega network connecting P0 to P7; the whole packet is buffered at an intermediate switch]

Dynamic Network -- Switching Strategy
Cut-through
–Similar to store-and-forward, but the message is broken into parcels
–All the links on the path can be active at the same time
–Also called wormhole routing
–Small setup time
–Latency for an m-word message over l hops: l(ts + tp) + m tw ≈ l tp + m tw
[Figure: 8-processor Omega network connecting P0 to P7; parcels are buffered at intermediate switches]
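The latency expressions given for circuit switching, store-and-forward, and cut-through can be compared numerically. The sketch below transcribes the three formulas as they appear on the slides. The parameter values in the example are arbitrary placeholders, not measurements.

```python
def circuit_latency(m, ts, tp, tw):
    """Circuit switching, as given on the slide: ts + tp + m*tw."""
    return ts + tp + m * tw


def store_and_forward_latency(m, hops, ts, tw):
    """Store-and-forward: the whole message is retransmitted on every hop."""
    return hops * (ts + m * tw)


def cut_through_latency(m, hops, ts, tp, tw):
    """Cut-through (wormhole), as given on the slide: l*(ts + tp) + m*tw."""
    return hops * (ts + tp) + m * tw


if __name__ == "__main__":
    # Placeholder values: 100-word message, 3 hops, ts=10, tp=1, tw=2.
    m, hops, ts, tp, tw = 100, 3, 10, 1, 2
    print("circuit          ", circuit_latency(m, ts, tp, tw))
    print("store-and-forward", store_and_forward_latency(m, hops, ts, tw))
    print("cut-through      ", cut_through_latency(m, hops, ts, tp, tw))
```

With these placeholder numbers, store-and-forward pays the m tw term once per hop, while the other two pay it only once.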

Static Network vs. Dynamic Network
Static Network
–There are point-to-point links between processors
–Parallel system expansion is easy
–Some processors may be "closer" than others
–Generally used for message-passing machine interconnects
Dynamic Network
–Paths are established as needed between processors
–System expansion is difficult
–Processors are usually equidistant
–Usually used for shared-memory machine interconnects

One-to-all Broadcast
Algorithms often require a processor to send identical data to all other processors or to a subset of processors. This operation is called one-to-all broadcast or single node broadcast.
At the start of a single node broadcast, the source processor has m words of data that need to be sent. At the end there are p copies of this data, one on each processor.
The dual of a broadcast operation is an all-to-one reduction or single node reduction.
All-to-one reduction
–At the start of a single node reduction, each processor has m words of data; the reduction combines the data from all processors using an associative operator to produce m words at the receiver
A naive single node broadcast or reduction uses p-1 steps.
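A naive all-to-one reduction can be sketched as follows, assuming the receiver is processor 0 and takes one contribution per step, so p processors need p-1 steps. Addition stands in for the associative operator. The function name and data layout are illustrative only.

```python
import operator


def naive_reduce(data, op=operator.add):
    """data[i] holds the m words of processor i; returns (m result words, steps)."""
    result = list(data[0])            # the receiver starts with its own m words
    steps = 0
    for contribution in data[1:]:     # one incoming message per step
        result = [op(a, b) for a, b in zip(result, contribution)]
        steps += 1
    return result, steps


if __name__ == "__main__":
    # p = 3 processors, m = 2 words each.
    print(naive_reduce([[1, 2], [3, 4], [5, 6]]))  # ([9, 12], 2)
```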

One-to-all Broadcast
[Figure: one-to-all broadcast and its dual, reduction (accumulation), among processors 0, 1, ..., p-1; after the broadcast each processor holds a copy of message M]

Store-and-forward Routing on Ring
The source sends the message on both of its outgoing links in the first two steps
All other processors receive on one link and transmit on the other link
It takes p/2 steps
Cost: (ts + m tw) p/2
–What if we use circuit switching routing?
[Figure: 8-processor ring (0 to 7) with links labeled by the step in which they carry the message]
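The p/2 step count can be checked by simulating the schedule. The sketch assumes each processor sends on at most one link per step: the source sends clockwise in step 1 and counterclockwise in step 2, and every other processor forwards in the direction of travel one step after receiving. The function names are hypothetical.

```python
def ring_sf_broadcast(p, source=0):
    """Return {node: step in which it receives the message} on a p-node ring."""
    received = {source: 0}
    cw = ccw = source                 # current frontier in each direction
    step = 0
    while len(received) < p:
        step += 1
        nxt = (cw + 1) % p            # clockwise chain advances every step
        if nxt not in received:
            received[nxt] = step
            cw = nxt
        if step >= 2:                 # counterclockwise chain starts in step 2
            nxt = (ccw - 1) % p
            if nxt not in received:
                received[nxt] = step
                ccw = nxt
    return received


def ring_sf_cost(p, m, ts, tw):
    """Steps times the per-step store-and-forward cost (ts + m*tw)."""
    steps = max(ring_sf_broadcast(p).values())
    return steps * (ts + m * tw)


if __name__ == "__main__":
    print(ring_sf_broadcast(8))
    print(ring_sf_cost(8, m=100, ts=10, tw=2))
```

For p = 8 the last processor receives the message in step 4, which matches p/2.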

Store-and-forward Routing on Hypercube
Takes log(p) steps for a p-processor hypercube
In the ith step, all processors that have the message transmit it to the neighboring processor that differs in the ith most significant bit
Cost: (ts + m tw) log(p)
[Figure: 3-dimensional hypercube (nodes 000 to 111) with links labeled by the step, 1 to 3, in which they carry the message]
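The step rule above translates directly into a short simulation. This sketch assumes the source is node 0 of a d-dimensional hypercube (p = 2^d). `hypercube_broadcast` is a name chosen for the example.

```python
def hypercube_broadcast(d, source=0):
    """Return one list of (sender, receiver) pairs per step for p = 2**d nodes."""
    have = {source}
    schedule = []
    for i in range(d):
        bit = 1 << (d - 1 - i)        # i-th most significant address bit
        sends = [(node, node ^ bit) for node in sorted(have)]
        have.update(dst for _, dst in sends)
        schedule.append(sends)
    return schedule


if __name__ == "__main__":
    for step, sends in enumerate(hypercube_broadcast(3), start=1):
        print(step, [(f"{s:03b}", f"{r:03b}") for s, r in sends])
```

Each step doubles the number of processors holding the message, so all p processors are reached after log2(p) steps.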

Homework
Due next lecture
Assume there is a mesh interconnect network with p = N x N nodes, using store-and-forward routing.
(a) Find the node with the highest complexity for the one-to-all broadcast operation.
(b) Describe your routing algorithm (using pseudocode).
(c) What is the broadcast cost?

Cut-through Routing on Ring
The algorithm takes log(p) steps
In step i, the message is sent to the processor at distance p/2^i
All messages flow in the same direction
Cost: log(p) (ts + m tw) + tp(p-1)
[Figure: 8-processor ring (0 to 7) with transfers labeled by step, 1 to 3]
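The tp(p-1) term comes from the hop distances p/2 + p/4 + ... + 1 = p-1 summed over the log(p) steps. The sketch below builds the schedule and checks the total against the closed form. It assumes p is a power of two, the source is processor 0, and each step costs ts + m tw plus tp per hop travelled. All names are illustrative.

```python
def ring_ct_schedule(p, source=0):
    """Per step, list (sender, receiver, hops); p assumed a power of two."""
    have = [source]
    schedule = []
    for i in range(1, p.bit_length()):        # steps 1 .. log2(p)
        dist = p >> i                         # distance p / 2**i
        sends = [(n, (n + dist) % p, dist) for n in have]
        have += [dst for _, dst, _ in sends]
        schedule.append(sends)
    return schedule


def ring_ct_cost(p, m, ts, tw, tp):
    """Sum over steps of startup + transfer time + per-hop time for that step."""
    return sum(ts + m * tw + tp * sends[0][2] for sends in ring_ct_schedule(p))


if __name__ == "__main__":
    p, m, ts, tw, tp = 8, 100, 10, 2, 1
    closed_form = (p.bit_length() - 1) * (ts + m * tw) + tp * (p - 1)
    print(ring_ct_cost(p, m, ts, tw, tp), closed_form)
```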

Cut-through Routing on 2D Torus
Apply the ring algorithm to the processor row of the sender
Then apply the ring algorithm to all processor columns
2 log(√p) = log(p) steps
Cost:
–(ts + m tw) log(p) + 2 tp(√p - 1)
This algorithm works for a 2D mesh too
[Figure: 4 x 4 torus (processors 0 to 15) with transfers labeled by step, 1 to 4]
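The torus cost is two ring broadcasts on rings of √p processors, one along the sender's row and one along the columns. The check below composes the ring cost log(q)(ts + m tw) + tp(q-1) twice with q = √p and compares it with the closed form on this slide. It assumes p is an even power of two, and the function names are hypothetical.

```python
import math


def ring_ct_cost(q, m, ts, tw, tp):
    """Cut-through broadcast cost on a q-processor ring (q a power of two)."""
    steps = q.bit_length() - 1                # log2(q)
    return steps * (ts + m * tw) + tp * (q - 1)


def torus_ct_cost(p, m, ts, tw, tp):
    """Row phase plus column phase on a sqrt(p) x sqrt(p) torus."""
    side = math.isqrt(p)
    return 2 * ring_ct_cost(side, m, ts, tw, tp)


if __name__ == "__main__":
    p, m, ts, tw, tp = 16, 100, 10, 2, 1
    closed_form = (p.bit_length() - 1) * (ts + m * tw) + 2 * tp * (math.isqrt(p) - 1)
    print(torus_ct_cost(p, m, ts, tw, tp), closed_form)
```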

Cut-through Routing on Hypercube
Takes log(p) steps for a p-processor hypercube
In the ith step, all processors that have the message transmit it to the neighboring processor that differs in the ith most significant bit
Cost: (ts + m tw) log(p)
Cut-through does not provide any benefit here, because each transfer uses only a single communication link
[Figure: 3-dimensional hypercube (nodes 000 to 111) with links labeled by the step, 1 to 3, in which they carry the message]

Summary
Switching Strategies
–Circuit switching
–Store-and-forward
–Cut-through (wormhole)
One-to-all broadcasting on
–Ring
  Using store-and-forward
  Using cut-through
–Hypercube
  Using store-and-forward
  Using cut-through
