MESSAGE ROUTING SCHEMES IN A HYPERCUBE MACHINE


MESSAGE ROUTING SCHEMES IN A HYPERCUBE MACHINE S. Raghupathy, M. R. Leuze, and S. R. Schach Presented by: Syed Md. Shakir

What are Interconnection Networks and why do we need them? One way for processors to communicate data is to use a shared memory and shared variables. However, this is unrealistic for large numbers of processors. A more realistic assumption is that each processor has its own private memory and that data communication takes place by message passing over an interconnection network. The interconnection network plays a central role in determining the overall performance of a multicomputer system. If the network cannot provide adequate performance for a particular application, nodes will frequently be forced to wait for data to arrive.

Parallel Computers Large-scale parallel computers are potential candidates for providing very high computational power. These systems are usually organized as an ensemble of nodes, each with its own processor, local memory, and other supporting devices. The nodes are interconnected using a variety of topologies that can be classified into two broad categories: direct and indirect.

Direct Networks In direct networks, each node has a point-to-point or direct connection to some of the other nodes, called neighboring nodes; examples of direct network topologies include hypercube, mesh, and tree.

Indirect Networks In indirect networks, the nodes are connected to other nodes or a shared memory through one or more switching elements. Examples of indirect networks include crossbar, bus, and multistage interconnection networks. [Figure: multistage interconnection network]

Indirect Network [Figure: crossbar]

Communication Latency The communication latency of direct networks depends on several factors, including switching, routing, flow control, and topology. Several switching techniques have been proposed for direct networks. Wormhole switching has emerged as a popular technique and has been used in both commercial and experimental systems. Wormhole switching can be employed in both direct and indirect networks. It is widely used in contemporary multicomputers because of its low latency and because it requires only small buffers at the nodes.

cont... The mesh is an asymmetrical topology in which the node degree depends on its location. Interprocessor communication performance depends on the location of source and destination. The torus and hypercube are symmetrical topologies in which the degree of a node is the same irrespective of its location in the network. Thus, unlike the mesh, all the nodes in tori and hypercubes are identical in connectivity.

Routing in Parallel Computers Parallel computers are modeled by directed graphs. All communication between processors (nodes) occurs in synchronous steps. Each link can carry at most one unit message (packet) in one step. During a step, a node can send at most one packet to each of its neighbors. Each node is uniquely identified by a number between 1 and N.

Switching Techniques In most multicomputer systems, a message enters the network from a source node and is switched or routed towards its destination through a series of intermediate nodes. Four types of switching techniques are usually used for this purpose: circuit switching, packet switching, virtual cut-through switching, and wormhole switching.

Circuit Switching In circuit switching, a dedicated path is established between the source and the destination before data transfer begins. Once the data transfer has started, the message is never blocked. Because the channels making up the path are reserved exclusively, buffering of data is not required. On the other hand, establishing the path requires significant overhead, and during the data-transmission phase all channels are reserved for the entire duration of the message transfer. Circuit switching thus degrades performance and is no longer used in commercial multicomputer systems.

Packet Switching In packet switching, a message is divided into packets that are independently routed towards their destination. The destination address is encoded in the header of each packet. The entire packet is stored at every intermediate node and then forwarded to the next node in its path. The main advantage of packet switching is that the channel resource is occupied only when a packet is actually transferred.

Packet Switching cont... Each packet contains the routing information, and alternative paths can be selected upon encountering network congestion or faulty nodes. The major drawback of packet switching: since the packet is stored entirely at each intermediate node, the time to transmit a packet from source to destination is directly proportional to the number of hops in the path. In addition, each intermediate node needs buffer space to hold at least one packet.

Virtual Cut Through In order to reduce the time spent storing packets at each node, Kermani and Kleinrock introduced a technique called virtual cut-through. With this technique, a message routed toward its destination is stored at an intermediate node only if the next channel required is occupied by another packet. As a result, the distance between the source and destination has little effect on communication latency.

cont... In an extreme case, when a message encounters blocking at all the intermediate nodes, the virtual cut-through technique reduces to packet switching. The disadvantage of the virtual cut-through technique is its implementation cost: each node must provide sufficient buffer space for all the messages passing through it, and because multiple messages may be blocked at any node, a very large buffer space is required at each node. This implementation constraint limits the use of the virtual cut-through technique.

Wormhole Switching Wormhole switching is a variant of the virtual cut-through technique that avoids the need for large buffer spaces. In wormhole switching, a packet is transmitted between the nodes in units of flits, the smallest units of a message on which flow control can be performed. The header flit(s) of a message contain all the necessary routing information, and all the other flits contain the data elements. The flits of the message are transmitted through the network in a pipelined fashion.

cont... Since only the header flit(s) has the routing information, all the trailing flits follow the header flit(s) contiguously. Flits of two different messages cannot be interleaved at any intermediate node. Successive flits in a packet are pipelined asynchronously in hardware using a handshaking protocol. When the header flit is blocked, then all the trailing flits occupy the buffers at the intermediate nodes.

Wormhole Switching [Figure, panels (a) and (b): message format and routing in wormhole switching. Labels: Messages, Packets, Flits. D: data flit; H: header flit.]

Advantages of Wormhole Switching The main advantage of wormhole switching derives from the pipelined message flow since transmission latency is insensitive to the distance between the source and destination. Moreover, since the message moves flit by flit across the network, each node needs to store only one flit. Some implementations, however, require storage of multiple flits at each node to improve routing performance. The reduction of buffer requirements at each node has a major effect on the cost and size of multicomputer systems.
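
The contrast between hop-proportional latency in packet switching and distance-insensitive latency in wormhole switching can be illustrated with a first-order, contention-free model. The formulas below are a simplifying assumption and are not taken from the slides: store-and-forward pays the full packet time on every hop, while wormhole pays only one flit time per hop plus one message time. The function and parameter names are hypothetical.

```python
def store_and_forward_latency(length_bits, bandwidth_bps, hops):
    """Packet switching: the whole packet is received before it is forwarded,
    so the full transfer time is paid once per hop."""
    return hops * (length_bits / bandwidth_bps)


def wormhole_latency(length_bits, flit_bits, bandwidth_bps, hops):
    """Wormhole switching: only the header flit pays a per-hop cost; the
    remaining flits are pipelined behind it."""
    return hops * (flit_bits / bandwidth_bps) + length_bits / bandwidth_bps


if __name__ == "__main__":
    # Assumed example values: 1024-bit message, 8-bit flits, 1024 bit/s links.
    for hops in (1, 4, 8):
        saf = store_and_forward_latency(1024, 1024, hops)
        worm = wormhole_latency(1024, 8, 1024, hops)
        print(hops, saf, worm)
```

Under this model, when the flit is much shorter than the message, the per-hop term is small, which is why the hop count has little influence on wormhole latency.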

Disadvantages of Wormhole Switching The main disadvantage of wormhole switching comes from the fact that only the header flit has the routing information. If the header flit cannot advance in the network due to resource contention, all the trailing flits are also blocked along the path and these blocked messages can block other messages. This chained blocking can also lead to deadlock where messages wait for each other in a cycle and hence no message can advance any further.

cont... Prevention of deadlock is one of the main issues in wormhole switching, and is usually accomplished by a suitable choice of routing function that selectively prohibits messages from taking all the available paths, thus preventing cycles in the network. Selection of a routing algorithm is thus a major issue in wormhole-switched networks.
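
One way to check that a routing function prevents cycles is to build a channel dependency graph and test it for cycles. This sketch is an illustration and not something the slides describe: it assumes a hypercube in which differing address bits are corrected in a fixed order (highest dimension first), and all function names are hypothetical.

```python
from itertools import product


def ordered_path(src, dst, n):
    """Channels (u, v) used when differing bits are fixed from the highest
    dimension down to the lowest."""
    channels, current = [], src
    for dim in range(n - 1, -1, -1):
        if (current ^ dst) & (1 << dim):
            nxt = current ^ (1 << dim)
            channels.append((current, nxt))
            current = nxt
    return channels


def channel_dependencies(n):
    """Map each channel to the channels a message may request while holding it."""
    deps = {}
    for src, dst in product(range(2 ** n), repeat=2):
        path = ordered_path(src, dst, n)
        for held, wanted in zip(path, path[1:]):
            deps.setdefault(held, set()).add(wanted)
    return deps


def has_cycle(deps):
    """Depth-first search for a cycle in the dependency graph."""
    WHITE, GREY, BLACK = 0, 1, 2
    color = {}

    def visit(channel):
        color[channel] = GREY
        for nxt in deps.get(channel, ()):
            state = color.get(nxt, WHITE)
            if state == GREY or (state == WHITE and visit(nxt)):
                return True
        color[channel] = BLACK
        return False

    return any(color.get(ch, WHITE) == WHITE and visit(ch) for ch in list(deps))
```

With a fixed dimension order, a message holding a channel in one dimension only ever requests a channel in a lower dimension, so the graph has no cycle.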

Hypercube Network An n-dimensional hypercube network has $N = 2^n$ nodes, each of degree $n$. The node $i$ with address $(i_1, i_2, \ldots, i_n) \in \{0, 1\}^n$ and the node $j$ with address $(j_1, j_2, \ldots, j_n) \in \{0, 1\}^n$ are connected if the Hamming distance between $(i_1, i_2, \ldots, i_n)$ and $(j_1, j_2, \ldots, j_n)$ is 1.
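
The adjacency rule can be sketched in a few lines. This is a minimal illustration that assumes node addresses are stored as n-bit integers; the function names are hypothetical.

```python
def hamming_distance(a, b):
    """Number of bit positions in which two addresses differ."""
    return bin(a ^ b).count("1")


def neighbors(node, n):
    """All nodes at Hamming distance 1: flip each of the n address bits once."""
    return [node ^ (1 << k) for k in range(n)]


if __name__ == "__main__":
    print(neighbors(0b000, 3))
```

Each node has exactly n neighbors, which matches the stated degree.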

Hypercube Topology

4D Hypercube A k-dimensional hypercube is formed by combining two (k-1)-dimensional hypercubes and connecting corresponding nodes; that is, hypercubes are recursive. Each node is connected to k other nodes, so each node has degree k.
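
The recursive construction can be written directly. This is a sketch under the assumption that nodes are numbered 0 to 2^k - 1 and that the second copy of the smaller cube receives the new top address bit; the function name is hypothetical.

```python
def hypercube_edges(k):
    """Undirected edges of a k-cube, built from two (k-1)-cubes."""
    if k == 0:
        return set()  # a 0-cube is a single node with no links
    smaller = hypercube_edges(k - 1)
    offset = 1 << (k - 1)
    # Second copy of the (k-1)-cube: same edges with the new top bit set.
    copy = {(u + offset, v + offset) for (u, v) in smaller}
    # Connect each node to its corresponding node in the other copy.
    links = {(u, u + offset) for u in range(offset)}
    return smaller | copy | links
```

For k = 4 this yields 32 links, and every one of the 16 nodes appears in exactly 4 of them.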

Static routing in Hypercube Given a source node Ns and a destination node Nd, the addresses of the $2^n$ processors can be represented using n bits. The next node on the route from Ns to Nd is then the node represented by the bit pattern $(c_{n-1}, \ldots, c_1, c_0)$ with bit i flipped; that is to say, the message is routed in dimension i. The algorithm continues in this way until the message arrives at node Nd.

Static routing Algorithm: Given a destination address d(i) and an intermediate node (i): compare the bits of d(i) with (i) from left to right; identify the first bit position at which these two addresses differ; route the packet to the neighbor n(i) such that (i) and n(i) differ only in this bit position.

Static Routing Algorithm Example: Source: (0, 0, 0, 0, 0, 0) Destination: (1, 0, 1, 0, 1, 1) Route: (0, 0, 0, 0, 0, 0) → (1, 0, 0, 0, 0, 0) → (1, 0, 1, 0, 0, 0) → (1, 0, 1, 0, 1, 0) → (1, 0, 1, 0, 1, 1)
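
The static algorithm and the example route can be reproduced with a short sketch. It assumes addresses are given as tuples of bits read left to right; the function name is hypothetical.

```python
def static_route(source, destination):
    """Return the list of nodes visited when differing bits are corrected
    one at a time, scanning the address from left to right."""
    current = list(source)
    path = [tuple(current)]
    for position, wanted in enumerate(destination):
        if current[position] != wanted:
            current[position] = wanted  # route along this dimension
            path.append(tuple(current))
    return path


if __name__ == "__main__":
    for node in static_route((0, 0, 0, 0, 0, 0), (1, 0, 1, 0, 1, 1)):
        print(node)
```

The number of hops equals the Hamming distance between source and destination, here 4.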

Advantages and Disadvantages Advantages: there is no overhead for calculating new routes, so those CPU cycles can be used for other computation. Disadvantage: blocking is a common consequence.

Dynamic routing Dynamic routing allows every message to select the (locally) optimal route under the current circumstances. If a link is blocked, an attempt is made to pass the message through another link. This leads to better utilization of the network. It uses only local knowledge.

Dynamic routing Dynamic routing allows the message to be routed from Ns to Nd depending on circumstances, taking the optimal route under the current conditions. The cost is the overhead of implementing dynamic routing: at each node, calculations have to be performed to determine the next node to which the message should be routed, and links have to be tested to see which ones are free.
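
The per-node decision can be sketched as follows. This is an illustration under stated assumptions: addresses are n-bit integers, busy links are given as a set of (from, to) pairs, and candidate dimensions are scanned from the highest bit down. The scan order and all names are hypothetical and are not taken from the slides.

```python
def dynamic_next_hop(current, destination, n, busy_links):
    """Return the neighbor reached over the first free link that moves the
    message closer to its destination, or None if no such link is free
    (None is also returned when the message is already at its destination)."""
    differing = current ^ destination
    for dim in range(n - 1, -1, -1):
        if differing & (1 << dim):          # flipping this bit reduces distance
            neighbor = current ^ (1 << dim)
            if (current, neighbor) not in busy_links:
                return neighbor
    return None
```

The loop makes the overhead visible: each differing dimension may have to be tested before a free link is found.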

Advantages And Disadvantages Advantage: blocking is not a major problem. Disadvantage: the overhead of implementing dynamic routing. At each node, calculations have to be performed to determine the next node to which the message should be routed, and links have to be tested to see which ones are free. The size of the overhead will vary from hypercube to hypercube. In some machines, the additional work can be done in hardware in parallel with other operations; in other machines, it must be done in software, using machine cycles that could otherwise be used for productive computing.

PRIORITIZATION If a number of messages are waiting to use a link, one method of choosing which message to transmit is first-in, first-out (FIFO), the method used in commercial hypercubes. The paper also considers alternative prioritization schemes, such as LIFO and giving priority to the message with the maximum number of remaining hops.

Other Prioritization Schema When the processes form a DAG, each process can be assigned a sequence number such that every message is sent to a process with a higher sequence number than that of the process that generated the message. The sequence number of the generating process can then be used to prioritize messages.

Message Format

The Prioritization Schema

The Simulator The simulator was constructed to investigate routing strategies. The header contains information such as source and destination node, as well as information needed when the order of transmission of messages is done on the basis of prioritization, such as sequence number, time generated, arrival time at the current node, and number of hops that still have to be traversed.

Execution Cycle Of The Simulator The simulator has three phases: message generation, message ordering, and message routing.

Message Generation Phase In this phase each active process is checked to see if it has received all the messages it requires. If so, the messages it is to transmit are generated, and placed in the message buffer. The process then terminates. After all possible messages have been generated, the simulator enters the message ordering phase.

Message Ordering Phase After entering the message ordering phase the messages in each buffer are ordered according to the prioritization scheme currently being evaluated. In the case of equal priorities, ties are broken randomly. Finally, the message routing cycle commences.
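
The ordering phase can be sketched as a sort with random tie-breaking. The field names below (`arrival_time`, `hops_remaining`, `sequence_number`) are hypothetical stand-ins for the header fields the slides mention, and the scheme names are labels chosen for this illustration rather than the paper's own.

```python
import random

# Smaller key value means higher priority.
PRIORITY_KEYS = {
    "fifo": lambda m: m["arrival_time"],        # earliest arrival first
    "lifo": lambda m: -m["arrival_time"],       # latest arrival first
    "max_hops": lambda m: -m["hops_remaining"], # most remaining hops first
    "min_hops": lambda m: m["hops_remaining"],  # fewest remaining hops first
    "min_seq": lambda m: m["sequence_number"],  # lowest sequence number first
}


def order_buffer(messages, scheme, rng=random):
    """Return the messages of one buffer in transmission order."""
    shuffled = list(messages)
    # Shuffling first and then using a stable sort breaks ties randomly.
    rng.shuffle(shuffled)
    return sorted(shuffled, key=PRIORITY_KEYS[scheme])
```

Passing a seeded `random.Random` instance as `rng` makes a simulation run repeatable.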

Message Routing Phase Each message is fetched from the message buffer and an attempt is made to transmit it to a neighboring node. If static routing is being used and the predetermined link is in use, then that particular message is blocked. When dynamic routing is used, an attempt is made to transmit the message over the first unused link that will move it closer to its destination.
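
One routing step at a single node might look like the sketch below. It assumes integer node addresses, a buffer already sorted by priority that holds only undelivered messages, at most one message per outgoing link per step, and a highest-bit-first choice of the predetermined link. None of these details are given in the slides, and the names are hypothetical.

```python
def route_step(node, buffer, n, dynamic):
    """Try to forward each buffered message by one hop.

    Returns (sent, blocked), where sent is a list of (message, next_node)
    pairs and blocked is the list of messages that could not move.
    """
    used_dims = set()  # outgoing links of this node already taken in this step
    sent, blocked = [], []
    for message in buffer:
        differing = node ^ message["destination"]
        # Dimensions that bring the message closer to its destination.
        candidates = [d for d in range(n - 1, -1, -1) if differing & (1 << d)]
        if not dynamic:
            candidates = candidates[:1]  # static: only the predetermined link
        free = [d for d in candidates if d not in used_dims]
        if free:
            used_dims.add(free[0])
            sent.append((message, node ^ (1 << free[0])))
        else:
            blocked.append(message)
    return sent, blocked
```

With two messages that both prefer the same link, static routing blocks the second one, while dynamic routing sends it over another useful link if one exists.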

Results Dynamic routing performs better than static routing, but the improvement factor varies depending on the prioritization scheme. At best, the improvement is by a factor of two. The best results occur when priority is given to messages with the lowest sequence number. Results almost as good are obtained when priority is given to messages with the fewest hops, either in the original message or remaining to be traversed.

Results Continued... Messages of lowest sequence number are essentially those transmitted earliest in the computation sequence. Giving priority to such messages speeds up the rate at which processes can begin transmitting, and hence speeds up the computation as a whole. The traffic congestion in the hypercube is decreased by giving priority to messages with the fewest hops, which allows the longer messages to proceed with less blocking than would otherwise be the case. By giving priority to messages with fewer choices, the overall amount of blocking is decreased.

One Bidirectional Link Between Nodes

Two Unidirectional Links Between Nodes

Percentage Improvement When Two Unidirectional Lines Are Used

Observations From The Graphs Above Having two unidirectional links improves throughput over one bidirectional link. Note: the improvement depends on the prioritization scheme. The percentage improvement is rarely more than fifteen per cent, and is usually much smaller. This effect may be caused by the fact that the problem graph is a DAG, which imposes a directionality on the flow of messages.

Conclusions Throughput of a certain class of problems on a hypercube can be increased by up to a factor of two through the use of dynamic rather than static routing algorithms, and also by prioritizing the messages. It is likely that different prioritization schemes would yield improved throughput for other classes of problems.

Questions ?