Distributed Routing Algorithms
In a message-passing distributed system, message passing is the only means of interprocessor communication. The basic types of communication are unicast, multicast, and broadcast.
Communication latency in a distributed system depends on four factors: topology, routing, flow control, and switching.
Topology
Network topology can be classified as general purpose or special purpose. A general-purpose network does not have a uniform and structured formation, while a special-purpose network follows a predefined structure.
Switching
Switching techniques fall into two classes: store-and-forward, which includes packet switching, and cut-through, which includes circuit switching, virtual cut-through, and wormhole.
Store-and-forward switching: a message is divided into packets that can be sent to a destination via different paths. When a packet reaches an intermediate node, the entire packet is stored and then forwarded to the next node.
Circuit switching: a physical circuit is constructed before the transmission. After the packet reaches the destination, the path is destroyed.
Virtual cut-through switching: the packet is stored at the intermediate node only if the required channel is busy; otherwise, it is forwarded immediately without buffering.
Wormhole switching differs from virtual cut-through in two aspects: (1) Each packet is further divided into a number of flits. (2) When the required channel is busy, instead of buffering the remaining flits by removing them from the network channels, the flow control blocks the trailing flits and they stay in flit buffers along the established route.
At the system level, the main difference between store-and-forward and cut-through is that the former is sensitive to the length of the selected path while the latter, especially in wormhole routing with pipelined flits, is almost insensitive to path length in the absence of network congestion. That is, under cut-through a unicast to any destination is counted as one step. The objective when using the store-and-forward model is to minimize the path length. The objective when using the cut-through model is to reduce network congestion.
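The sensitivity to path length can be illustrated with a simplified latency model that is not taken from the slides. It assumes no contention and negligible per-hop routing delay; the function and parameter names (message length, flit length, bandwidth, hop count) are illustrative only.

```python
def store_and_forward_latency(message_len, bandwidth, hops):
    """Assumed model: the whole packet is received and retransmitted at every hop."""
    return hops * (message_len / bandwidth)


def wormhole_latency(message_len, flit_len, bandwidth, hops):
    """Assumed model: the header flit pays the per-hop cost, the rest is pipelined."""
    return hops * (flit_len / bandwidth) + message_len / bandwidth


if __name__ == "__main__":
    # Same 1024-unit message with 8-unit flits, over 1 hop and over 10 hops.
    for hops in (1, 10):
        saf = store_and_forward_latency(1024, 1.0, hops)
        worm = wormhole_latency(1024, 8, 1.0, hops)
        print(hops, saf, worm)
```

Under these assumptions the store-and-forward latency grows in proportion to the hop count, while the wormhole latency is dominated by the message length and changes only slightly.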
Types of communication
Unicast (one-to-one), multicast (one-to-many), and broadcast (one-to-all). Personalized: a source sends different messages to different destinations.
Routing
Routing algorithms can be classified as:
Special purpose vs. general purpose
Minimal vs. nonminimal
Deterministic vs. adaptive
Source routing vs. destination routing
Fault-tolerant vs. non-fault-tolerant
Redundant vs. nonredundant
Deadlock-free vs. non-deadlock-free
General vs. Special Purpose
General-purpose algorithms are suitable for all types of networks but may not be efficient for a particular network. Special-purpose algorithms are usually efficient because they take advantage of the topological properties of specific networks.
Minimal vs. Nonminimal
Minimal-path algorithms provide a least-cost path between source and destination. This scheme can lead to congestion in parts of a network. A nonminimal routing scheme may route the message along a longer path to avoid network congestion.
Deterministic vs. Adaptive
In a deterministic algorithm, the routing path changes only in response to topological changes in the underlying network and does not use any information regarding the state of the network. In an adaptive algorithm, the routing path changes based on the traffic in the network.
Fault-tolerant vs. Non-fault-tolerant
In fault-tolerant routing, a routing message is guaranteed to be delivered in the presence of faults. In non-fault-tolerant routing, it is assumed that no fault may occur, and hence there is no need for the routing algorithm to dynamically adjust its activities.
Redundant vs. Nonredundant
A typical routing algorithm is nonredundant, i.e., for each destination one copy of the message is forwarded. In certain cases a shared path is used to forward the routing message to several destinations. For the purpose of fault tolerance, multiple copies are sent to a destination via multiple edge-disjoint paths. As long as one of these paths remains healthy, at least one copy will successfully reach its destination. Each destination should make sure only one copy is accepted.
Deadlock-free vs. Non-deadlock-free
Deadlock-free routing ensures freedom from deadlock through carefully designed routing algorithms. In non-deadlock-free routing, no special provision is made to prevent or avoid the occurrence of a deadlock.
Routing functions
The routing function defines how a message is routed from the source node to the destination node.
Destination-dependent: the routing function depends on the current and destination nodes only.
Input-dependent: the routing function depends on the current and destination nodes and the adjacent link (or node) from which a message is received.
Source-dependent: the routing function depends on the source, current, and destination nodes.
Path-dependent: the routing function depends on the destination node and the routing path from the source node to the current node.
Dijkstra’s centralized algorithm
Let D(v) be the distance (sum of link weights along a given path) from source s to node v. Let l(v,w) be the given cost between nodes v and w. There are two parts to the algorithm: an initialization step and a step to be repeated until the algorithm terminates.
1 Initialization. Set N={s}. For each node v not in N, set D(v)=l(s,v). We use ∞ for nodes not connected to s; any number larger than the maximum cost or distance in the network will suffice.
2 At each subsequent step. Find a node w not in N for which D(w) is a minimum and add w to N. Then update D(v) for all remaining nodes that are not in N by computing: D(v)=min[D(v), D(w)+l(w,v)]
Step 2 is repeated until all nodes are in N.
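The two steps above can be sketched in Python as follows. This is a minimal sketch that follows the set-based formulation directly rather than a priority-queue implementation. The graph representation (a dictionary of neighbor costs) and the link costs in the usage example are hypothetical, chosen only for illustration.

```python
import math


def dijkstra(cost, s):
    """cost[v][w] is the link cost l(v,w); missing entries mean no link."""
    nodes = set(cost)
    N = {s}
    # Step 1: initialization, using infinity for nodes not connected to s.
    D = {v: cost[s].get(v, math.inf) for v in nodes if v != s}
    # Step 2: repeat until all nodes are in N.
    while len(N) < len(nodes):
        w = min((v for v in nodes if v not in N), key=lambda v: D[v])
        N.add(w)
        for v in nodes - N:
            D[v] = min(D[v], D[w] + cost[w].get(v, math.inf))
    return D


if __name__ == "__main__":
    # Hypothetical five-node network with symmetric link costs.
    cost = {
        1: {2: 4, 3: 21},
        2: {1: 4, 4: 1, 5: 20},
        3: {1: 21, 4: 2},
        4: {2: 1, 3: 2, 5: 2},
        5: {2: 20, 4: 2},
    }
    print(dijkstra(cost, 5))
```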
Ford’s distributed algorithm
Each node v has the label (n,D(v)), where D(v) represents the current value of the shortest distance from the node to the destination and n is the next node along the currently computed shortest path.
1 Initialization. With node d being the destination node, set D(d)=0 and label all other nodes (., ∞).
2 Shortest-distance labeling of all nodes. For each node v≠d do the following: using the current value D(w) of each neighboring node w, calculate D(w)+l(w,v) and perform the update D(v)=min{D(v), D(w)+l(w,v)}, setting n to the neighbor w that achieves the minimum.
Step 2 is repeated until no label changes.
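The labeling procedure above can be simulated on one machine as a sequence of synchronous rounds. This is only a sketch: a real distributed execution may update nodes in a different order, so intermediate labels can differ even though the final labels agree. The graph representation and link costs are hypothetical.

```python
import math


def ford(cost, d):
    """Return {v: (next_node, distance_to_d)} after labels stop changing."""
    label = {v: (None, math.inf) for v in cost}
    label[d] = (None, 0)
    changed = True
    while changed:
        changed = False
        snapshot = dict(label)  # every node reads its neighbors' previous labels
        for v in cost:
            if v == d:
                continue
            for w, l_wv in cost[v].items():
                candidate = snapshot[w][1] + l_wv
                if candidate < label[v][1]:
                    label[v] = (w, candidate)
                    changed = True
    return label


if __name__ == "__main__":
    # Hypothetical five-node network with symmetric link costs.
    cost = {
        1: {2: 4, 3: 21},
        2: {1: 4, 4: 1, 5: 20},
        3: {1: 21, 4: 2},
        4: {2: 1, 3: 2, 5: 2},
        5: {2: 20, 4: 2},
    }
    print(ford(cost, 5))
```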
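An example (figure: a five-node network with nodes P1 to P5; P5 is the source for Dijkstra’s algorithm and the destination for Ford’s algorithm)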
Dijkstra’s centralized algorithm (source P5)
Round    N                   D(1)  D(2)  D(3)  D(4)
Initial  {P5}                ∞     20    ∞     2
1        {P5,P4}             ∞     3     4     2
2        {P5,P4,P2}          7     3     4     2
3        {P5,P4,P2,P3}       7     3     4     2
4        {P5,P4,P2,P3,P1}    7     3     4     2
Ford’s distributed algorithm (destination P5)
Round    P1        P2        P3        P4
Initial  (., ∞)    (., ∞)    (., ∞)    (., ∞)
1        (., ∞)    (P5,20)   (., ∞)    (P5,2)
2        (P3,25)   (P4,3)    (P4,4)    (P5,2)
3        (P2,7)    (P4,3)    (P4,4)    (P5,2)
Unicasting in Special-Purpose Networks The routing algorithms in the previous section are general and are suitable for all types of network topologies. However, they may not be efficient for special-purpose networks such as rings, meshes, and hypercubes.
Bidirectional rings
Deterministic unicasting on a bidirectional ring is simple: a message is forwarded along one direction (clockwise or counterclockwise) depending on the position of the destination. In multiple-path routing, two paths can be used: one in the clockwise direction and the other in the counterclockwise direction. Either two copies of the routing message are sent, one in each direction, or the message is halved and each half is forwarded in a different direction.
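A minimal sketch of the deterministic choice of direction follows. It assumes that nodes are numbered 0 to n-1, that clockwise means increasing addresses, and that ties are broken in favor of clockwise; none of these conventions comes from the slides.

```python
def ring_direction(src, dst, n):
    """Pick the shorter direction on a bidirectional ring of n nodes."""
    clockwise = (dst - src) % n
    counterclockwise = (src - dst) % n
    if clockwise <= counterclockwise:
        return ("clockwise", clockwise)
    return ("counterclockwise", counterclockwise)


if __name__ == "__main__":
    print(ring_direction(0, 3, 8))
    print(ring_direction(1, 6, 8))
```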
Meshes Adaptive routing and XY routing in 2-d mesh
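As a point of reference, XY routing is commonly described as routing a message fully along the X dimension first and then along the Y dimension. The sketch below assumes nodes are addressed as (x, y) tuples; the function name is illustrative.

```python
def xy_route(src, dst):
    """Deterministic XY routing: correct the X coordinate first, then Y."""
    x, y = src
    path = [src]
    while x != dst[0]:
        x += 1 if dst[0] > x else -1
        path.append((x, y))
    while y != dst[1]:
        y += 1 if dst[1] > y else -1
        path.append((x, y))
    return path


if __name__ == "__main__":
    print(xy_route((0, 0), (2, 1)))
```

Because the Y phase never returns to the X dimension, a route makes at most one turn, which is what makes the scheme deterministic.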
Hypercubes
The length of the shortest path between two nodes u and w is the Hamming distance between u and w, denoted H(u,w). The number of shortest node-disjoint paths equals the Hamming distance between the source and destination nodes. A shortest path is obtained by correcting the differing address bits one dimension at a time. If the dimensions are selected in a predefined order, the routing is deterministic and is called e-cube routing.
The multiple-path routing in hypercubes is based on the following property: if two nodes s and d are at Hamming distance k in an n-cube, there are n node-disjoint paths between nodes s and d. Out of these n paths, k have a length of k and the remaining n-k have a length of k+2.
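A short sketch of the Hamming distance and of e-cube routing follows. Node addresses are treated as integers, and the predefined order is assumed here to be lowest dimension first; the slides do not state which order is used.

```python
def hamming(u, w):
    """Number of bit positions in which addresses u and w differ."""
    return bin(u ^ w).count("1")


def e_cube_route(s, d, n):
    """Correct differing bits in a fixed order (dimension 0 first, by assumption)."""
    path = [s]
    current = s
    for i in range(n):
        if ((current ^ d) >> i) & 1:
            current ^= 1 << i
            path.append(current)
    return path


if __name__ == "__main__":
    route = e_cube_route(0b000, 0b110, 3)
    print([format(v, "03b") for v in route])
```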
An example
Node-disjoint paths between 000 and 110:
Path 1: 000->100->110
Path 2: 000->010->110
Path 3: 000->001->011->111->110
Node-disjoint paths between 000 and 100:
Path 1: 000->100
Path 2: 000->001->101->100
Path 3: 000->010->110->100
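One way to construct such a set of n node-disjoint paths is sketched below. It is an illustrative construction and not necessarily the one used to produce the example: the k shortest paths correct the differing dimensions in rotated orders, and each remaining dimension contributes a detour path of length k+2. The longer paths it produces may therefore differ from those listed while still being node-disjoint. It assumes s and d are distinct.

```python
def walk(start, dims):
    """Follow the given sequence of dimensions from start, flipping one bit per hop."""
    path = [start]
    for i in dims:
        path.append(path[-1] ^ (1 << i))
    return path


def node_disjoint_paths(s, d, n):
    """Return n node-disjoint paths from s to d in an n-cube (assumes s != d)."""
    diff = [i for i in reversed(range(n)) if ((s ^ d) >> i) & 1]
    same = [i for i in reversed(range(n)) if not ((s ^ d) >> i) & 1]
    paths = []
    for r in range(len(diff)):
        # Rotating the correction order gives k shortest paths of length k.
        paths.append(walk(s, diff[r:] + diff[:r]))
    for j in same:
        # Leave along a non-differing dimension j, correct, then return along j.
        paths.append(walk(s, [j] + diff + [j]))
    return paths


if __name__ == "__main__":
    for p in node_disjoint_paths(0b000, 0b110, 3):
        print("->".join(format(v, "03b") for v in p))
```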
Broadcasting in Special-Purpose Networks - Rings
Broadcasting in a ring works as follows: two copies of a message are sent from the source in opposite directions, and they terminate at the two furthermost nodes, respectively. The total number of steps is half the number of nodes.
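The step count can be checked with a small simulation. The sketch assumes nodes numbered 0 to n-1 and that each copy advances one hop per step; the function name is illustrative.

```python
def ring_broadcast_steps(n, s=0):
    """Send one copy in each direction; return (steps, arrival step per node)."""
    arrival = {s: 0}
    clockwise = counterclockwise = s
    step = 0
    while len(arrival) < n:
        step += 1
        clockwise = (clockwise + 1) % n
        counterclockwise = (counterclockwise - 1) % n
        arrival.setdefault(clockwise, step)
        arrival.setdefault(counterclockwise, step)
    return step, arrival


if __name__ == "__main__":
    for n in (7, 8):
        print(n, ring_broadcast_steps(n)[0])
```

For odd n, the result is half the number of nodes rounded down.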
One-port model: a node can only forward a copy of the message to one of its neighbors in one step. All-port model: a node can forward a copy of the message to all its neighbors in one step.
Contention-free broadcasting in a wormhole-routed ring: one-port
For the one-port model, the best strategy is as follows: the source s sends the message to the furthermost node in the first step. The ring is then partitioned into two equal halves, each containing one node that has a copy of the message. The above process is repeated within each half until all the nodes have a copy. The total number of steps is log2 n (rounded up when n is not a power of two).
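The halving strategy can be sketched as a send schedule. This sketch makes simplifying assumptions that are not in the slides: n is a power of two, all sends travel clockwise, and in each step a holder sends to the node halfway across its own segment, so the routes used in one step do not overlap.

```python
def one_port_ring_broadcast(n, s=0):
    """Return a list of steps; each step is a list of (sender, receiver) pairs."""
    holders = [s]
    schedule = []
    distance = n // 2
    while distance >= 1:
        sends = [(h, (h + distance) % n) for h in holders]
        schedule.append(sends)
        holders += [receiver for _, receiver in sends]
        distance //= 2
    return schedule


if __name__ == "__main__":
    for step, sends in enumerate(one_port_ring_broadcast(8), start=1):
        print(step, sends)
```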
Contention-free broadcasting in a wormhole-routed ring: all-port
For the all-port model, using the cut-through model, the source can send the message to two nodes that are n/3 distance away, where n is the total number of nodes. In the next step, each of the three nodes sends the message to two nodes that are n/9 distance away. In general, after k steps 3^k nodes have a copy and each sends the message to two nodes that are n/3^(k+1) distance away. Basically, this approach cuts a path into three subpaths of equal length with the center node of each subpath as the only node with a copy of the routing message. The total number of steps is log3 n.
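A sketch of the resulting schedule follows. It assumes n is a power of three so that every distance n/3^(k+1) is an integer; the function name is illustrative.

```python
def all_port_ring_broadcast(n, s=0):
    """Return (schedule, holders); each step lists (sender, receiver) pairs."""
    holders = {s}
    schedule = []
    distance = n // 3
    while distance >= 1:
        sends = []
        for h in sorted(holders):
            sends.append((h, (h + distance) % n))
            sends.append((h, (h - distance) % n))
        schedule.append(sends)
        holders |= {receiver for _, receiver in sends}
        distance //= 3
    return schedule, holders


if __name__ == "__main__":
    schedule, holders = all_port_ring_broadcast(9)
    for step, sends in enumerate(schedule, start=1):
        print(step, sends)
```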
Broadcasting in a wormhole-routed mesh: one-port (figure: broadcast from source S, with sends labeled by step number)
A broadcast with message partition in 2-d meshes (from source S)
1 Personalized broadcast of ¼ message in one row.
2 Broadcast of ¼ message in columns.
3 Collecting four ¼ messages in each row.
Hypercubes
Figures: a broadcast initiated from 000; a Hamiltonian cycle in a 3-cube.
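Both constructions have standard textbook forms, sketched below under assumptions since the figures are not reproduced here. The broadcast uses a binomial-tree schedule in which every holder forwards along one new dimension per step (dimension 0 first, by assumption). The Hamiltonian cycle uses the reflected Gray code. The figures may use a different dimension order or a different cycle.

```python
def hypercube_broadcast(n, source=0):
    """Return a list of steps; each step is a list of (sender, receiver) pairs."""
    holders = [source]
    schedule = []
    for i in range(n):
        sends = [(h, h ^ (1 << i)) for h in holders]
        schedule.append(sends)
        holders += [receiver for _, receiver in sends]
    return schedule


def gray_code_cycle(n):
    """Visit all 2**n nodes so that consecutive nodes differ in exactly one bit."""
    return [i ^ (i >> 1) for i in range(2 ** n)]


if __name__ == "__main__":
    print(hypercube_broadcast(3))
    print([format(v, "03b") for v in gray_code_cycle(3)])
```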
Path-based Approach
Figure: a multicast in a 4x4 mesh, using a low-channel network and a high-channel network.
U-mesh algorithm
Source: (0,0)
Destinations: (1,0), (1,1), (1,2), (1,3), (2,0), (2,1), and (3,2)
The lexicographical order of the source and destinations is: (0,0), (1,0), (1,1), (1,2), (1,3), (2,0), (2,1), (3,2)
This sequence is split into two halves: {(0,0), (1,0), (1,1), (1,2)} and {(1,3), (2,0), (2,1), (3,2)}
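The recursive halving suggested by this example can be sketched as follows. This is a simplified illustration and not a full statement of the U-mesh algorithm: it assumes the source is first in lexicographical order, as in the example, and that the holder of each sublist sends to the first node of the upper half.

```python
def u_mesh_schedule(source, destinations):
    """Return a list of steps; each step is a list of (sender, receiver) pairs."""
    chain = sorted([source] + list(destinations))  # tuples sort lexicographically
    if chain[0] != source:
        raise ValueError("this sketch assumes the source is first in the order")
    schedule = []
    segments = [chain]
    while any(len(seg) > 1 for seg in segments):
        sends, next_segments = [], []
        for seg in segments:
            if len(seg) == 1:
                next_segments.append(seg)
                continue
            mid = (len(seg) + 1) // 2
            lower, upper = seg[:mid], seg[mid:]
            sends.append((lower[0], upper[0]))  # holder sends to head of upper half
            next_segments += [lower, upper]
        schedule.append(sends)
        segments = next_segments
    return schedule


if __name__ == "__main__":
    dests = [(1, 0), (1, 1), (1, 2), (1, 3), (2, 0), (2, 1), (3, 2)]
    for step, sends in enumerate(u_mesh_schedule((0, 0), dests), start=1):
        print(step, sends)
```

With seven destinations, the sketch delivers the message in three steps, the first being a send from (0,0) to (1,3).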
Virtual Channels
Positive network Negative network
Unidirectional ring (figure: two drawings of a four-node ring P0, P1, P2, P3 with high virtual channels Ch0 to Ch3 and low virtual channels Cl0 to Cl3)
Unidirectional ring algorithm
If the source address is larger than the destination address, any channel can be used to start with; however, once a high (or low) channel is selected, the remaining steps should use high (or low) channels exclusively. If the source address is smaller than the destination address, high channels are used first, and the message switches from high virtual channels to low virtual channels after crossing node P3.
Turn model
Deadlock
Four turns allowed in XY-routing
Six turns allowed in positive-first routing
Six turns allowed in negative-first routing
Adaptivity of positive-first routing (figure: two placements of source s and destination d on X-Y axes, one where the routing is fully adaptive and one where it is deterministic)