Some Unsolved Mathematical Problems in Systems Area Networking
Mark Stewart
Melbourne Operations Research
Abstract
Cluster computing represents the only feasible way of addressing many significant and computationally challenging problems today. This in turn has created a demand for networking technologies with higher bandwidths and lower latencies than contemporary Local Area Networks. The Systems Area Network (SAN) is the answer. This talk presents a number of open problems which I encountered whilst working at a start-up company developing InfiniBand switches for the SAN market. The problems are mathematical (and/or computer scientific) in nature and, in my opinion, of enduring interest.
Routing Algorithms: Deadlock Freedom and Load Balancing
Most SAN technologies use a class of flow control algorithms known as “credit based flow control,” in which packet loss is avoided by an upstream node holding onto a packet until its downstream neighbor has indicated that it has adequate resources to accept more packets.
This can lead to a phenomenon known as deadlock, in which each node in a cycle is waiting on the others to free up resources before it can make progress.
Think of gridlock in a road network and you have the same concept.
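The credit mechanism described above can be sketched as follows. This is a minimal illustration rather than any particular SAN's implementation: the class `CreditLink` and its method names are hypothetical, and one credit is assumed to correspond to one packet-sized buffer slot at the receiving end of the link.

```python
from collections import deque


class CreditLink:
    """One direction of a link. The sender may transmit only while it holds credits.

    Assumption (not from the talk): one credit equals one packet-sized
    buffer slot at the receiver.
    """

    def __init__(self, receiver_buffer_slots):
        self.credits = receiver_buffer_slots  # credits advertised by the receiver
        self.receiver_buffer = deque()

    def try_send(self, packet):
        """Send if a credit is available; otherwise the sender keeps holding the packet."""
        if self.credits == 0:
            return False  # no loss: the packet simply waits at the sender
        self.credits -= 1
        self.receiver_buffer.append(packet)
        return True

    def receiver_forwards_one(self):
        """The receiver frees a buffer slot and returns one credit to the sender."""
        packet = self.receiver_buffer.popleft()
        self.credits += 1
        return packet


link = CreditLink(receiver_buffer_slots=2)
sent = [link.try_send(p) for p in ("a", "b", "c")]  # third send is refused
link.receiver_forwards_one()                         # a slot frees up
sent_after_credit = link.try_send("c")               # now it goes through
```

Deadlock arises when every link in a cycle is in the refused state at once, so that no receiver can ever forward a packet and return a credit.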
Digression: the dynamics of how a network enters a deadlocked state are interesting. Most (all?) of the literature focuses on how to detect the potential for deadlock, but does not deal with the expected time before a network enters such a state. The instances of deadlock I have witnessed give an expected time until deadlock well below human thresholds for perception, far faster than any simplistic modeling would predict. A better understanding of why the time to deadlock can be so short might lead to far better deadlock avoidance and recovery schemes than are currently used.
Current Solutions:
Timeouts: if a switch has held a packet too long, drop the packet to free up resources. Only really effective if the expected time to deadlock is large compared with the timeout values (see the previous digression).
Deadlock Free Routing: use a routing algorithm which guarantees that the associated buffer dependency graph is acyclic.
Adaptive Routing: mitigates the problem but doesn’t solve it. Also a series of talks in its own right.
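The acyclicity condition for deadlock free routing can be checked mechanically. The sketch below assumes one buffer per directed link (no virtual lanes) and represents each route as a list of node ids; both are simplifying assumptions, and the function names are illustrative only. A dependency exists from one link to the next whenever some route holds a buffer on the first while requesting the second.

```python
def channel_dependencies(routes):
    """Map each directed link (u, v) to the set of links that follow it on some route."""
    deps = {}
    for path in routes:
        channels = list(zip(path, path[1:]))
        for channel in channels:
            deps.setdefault(channel, set())
        for held, wanted in zip(channels, channels[1:]):
            deps[held].add(wanted)
    return deps


def has_cycle(deps):
    """Iterative depth-first search for a cycle in the dependency graph."""
    WHITE, GREY, BLACK = 0, 1, 2
    colour = {channel: WHITE for channel in deps}
    for start in deps:
        if colour[start] != WHITE:
            continue
        colour[start] = GREY
        stack = [(start, iter(deps[start]))]
        while stack:
            node, successors = stack[-1]
            nxt = next(successors, None)
            if nxt is None:
                colour[node] = BLACK
                stack.pop()
            elif colour[nxt] == GREY:
                return True  # back edge: a cycle of buffers waiting on each other
            elif colour[nxt] == WHITE:
                colour[nxt] = GREY
                stack.append((nxt, iter(deps[nxt])))
    return False


# Four switches in a ring, every route going clockwise for two hops.
clockwise_routes = [[0, 1, 2], [1, 2, 3], [2, 3, 0], [3, 0, 1]]
# The same routes with one removed, which breaks the cycle.
broken_ring_routes = [[0, 1, 2], [1, 2, 3], [2, 3, 0]]

deadlock_possible = has_cycle(channel_dependencies(clockwise_routes))
deadlock_free = not has_cycle(channel_dependencies(broken_ring_routes))
```

Note that this detects only the potential for deadlock; it says nothing about the expected time to reach it.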
Deadlock Free Routing: Spanning Tree algorithm
–Simplest of the known algorithms
–Choose a spanning tree
–Don’t use links that aren’t in the spanning tree
–Makes poor use of network resources
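To make the cost of the spanning tree approach concrete, the following sketch builds a spanning tree and counts the links left idle. Breadth-first search is used here as one arbitrary way of choosing the tree (the talk does not say how the tree is chosen), and the function names are illustrative.

```python
from collections import deque


def bfs_spanning_tree(adjacency, root):
    """Return the set of undirected tree links (as frozensets) found by BFS from root."""
    visited = {root}
    tree_links = set()
    queue = deque([root])
    while queue:
        node = queue.popleft()
        for neighbour in adjacency[node]:
            if neighbour not in visited:
                visited.add(neighbour)
                tree_links.add(frozenset((node, neighbour)))
                queue.append(neighbour)
    return tree_links


def unused_links(adjacency, tree_links):
    """Links present in the network but excluded from routing."""
    all_links = {frozenset((u, v)) for u in adjacency for v in adjacency[u]}
    return all_links - tree_links


# Four fully meshed switches: 6 links in total.
full_mesh = {i: [j for j in range(4) if j != i] for i in range(4)}
tree = bfs_spanning_tree(full_mesh, root=0)
idle = unused_links(full_mesh, tree)
```

In any connected network with n switches and m links the tree keeps n - 1 links, so m - (n - 1) links carry no traffic. In the four-switch mesh above that is half of them.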
Deadlock Free Routing: Up*/Down*
–Select a node to be the Hub
–Order the nodes by “distance” from the Hub (any tie-breaking rule is fine)
–Label a directed arc as an up arc if it goes to a node “closer” to the Hub
–Label a directed arc as a down arc if it goes to a node “further” from the Hub
–Choose the shortest paths which do not route from a down arc to an up arc
–In principle better than the spanning tree algorithm, but in practice …
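The Up*/Down* steps above can be sketched as a breadth-first search over (node, "has a down arc been taken yet?") states. The assumptions here are mine: distance is measured in hops, ties are broken by node id (so node ids must be comparable), and the function names are illustrative.

```python
from collections import deque


def bfs_distances(adjacency, hub):
    """Hop distance from the hub to every node."""
    dist = {hub: 0}
    queue = deque([hub])
    while queue:
        node = queue.popleft()
        for neighbour in adjacency[node]:
            if neighbour not in dist:
                dist[neighbour] = dist[node] + 1
                queue.append(neighbour)
    return dist


def is_up_arc(dist, u, v):
    """Arc u->v is 'up' if v precedes u in the (distance, node id) ordering."""
    return (dist[v], v) < (dist[u], u)


def up_down_path(adjacency, hub, src, dst):
    """Shortest path from src to dst that never follows a down arc with an up arc."""
    dist = bfs_distances(adjacency, hub)
    start = (src, False)  # state: (node, has a down arc been taken yet?)
    parent = {start: None}
    queue = deque([start])
    while queue:
        state = queue.popleft()
        node, gone_down = state
        if node == dst:
            path = []
            while state is not None:
                path.append(state[0])
                state = parent[state]
            return path[::-1]
        for neighbour in adjacency[node]:
            up = is_up_arc(dist, node, neighbour)
            if up and gone_down:
                continue  # a down arc followed by an up arc is forbidden
            nxt = (neighbour, gone_down or not up)
            if nxt not in parent:
                parent[nxt] = state
                queue.append(nxt)
    return None


# Five switches in a ring with switch 0 as the hub.
ring = {i: [(i - 1) % 5, (i + 1) % 5] for i in range(5)}
legal_path = up_down_path(ring, hub=0, src=2, dst=4)
```

In this ring the two-hop route 2, 3, 4 is illegal (a down arc followed by an up arc), so traffic from 2 to 4 takes three hops through the hub. That is a small instance of the detours and hub congestion that hurt the algorithm in practice.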
Deadlock Free Routing: A common network topology
[Figure: a common network topology, with the hub switch labeled]
The orange switches carry no traffic.
Degrades sustainable network throughput by an order of magnitude.
The Challenge
Develop a generic deadlock free routing algorithm that makes better use of existing network resources.
Note: a related problem would be to design networks for which the existing algorithms are more appropriate. I’d lack the courage to try to sell that idea to a customer again.
Inverse Multiplexing: The Impact of Cut Through
There are essentially two ways to build faster links:
–Use higher frequencies
–Use more “wires” and inverse multiplexing (multiple frequencies is effectively more wires, and very hard in this context)
–Typically a combination of both is used
Cut through is a mechanism by which switching latency is reduced by forwarding a packet before the switch has finished receiving it.
–Any SAN switch must do this.
If we have multiple “wires” between two switches, should we:
–Use inverse multiplexing to form a faster link?
–Use some form of load balancing across the slower wires?
Conventional queuing analysis says we should inverse multiplex.
–And that this is always better
–A server of rate M is better than M servers of rate 1
Not so (at least not always).
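The conventional queuing argument can be reproduced with the textbook M/M/1 and M/M/c (Erlang C) formulas for mean time in system. This sketch assumes Poisson arrivals, exponential service times and store-and-forward operation, i.e. exactly the model that ignores cut through; the function names and numbers are illustrative.

```python
from math import factorial


def mm1_mean_time(arrival_rate, service_rate):
    """Mean time in system for an M/M/1 queue."""
    if arrival_rate >= service_rate:
        raise ValueError("queue is unstable")
    return 1.0 / (service_rate - arrival_rate)


def mmc_mean_time(arrival_rate, servers, rate_per_server):
    """Mean time in system for an M/M/c queue, using the Erlang C formula."""
    offered = arrival_rate / rate_per_server
    utilisation = offered / servers
    if utilisation >= 1.0:
        raise ValueError("queue is unstable")
    tail = offered ** servers / factorial(servers) / (1.0 - utilisation)
    head = sum(offered ** k / factorial(k) for k in range(servers))
    probability_of_waiting = tail / (head + tail)
    mean_wait = probability_of_waiting / (servers * rate_per_server - arrival_rate)
    return mean_wait + 1.0 / rate_per_server


# M = 4 wires, total arrival rate 2 packets per unit time.
one_fast_link = mm1_mean_time(arrival_rate=2.0, service_rate=4.0)
four_slow_links = mmc_mean_time(arrival_rate=2.0, servers=4, rate_per_server=1.0)
```

Under these assumptions the single fast link always has the lower mean time in system, which is the "always better" claim. The model has no term for the delay a cut through switch must add on a rate mismatch, which is where the claim can fail.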
Cut through has a problem when the outgoing link of a switch is faster than the incoming link.
–If the packet is forwarded too soon, the switch will run out of data and the packet will be corrupted.
A switch must delay packets when there is a rate mismatch.
In lightly loaded networks this is the dominant source of network latency!
For some special applications this is the dominant source of network latency even under heavy load.
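The size of the required delay follows from a short calculation. If a packet of S bits arrives at rate r_in and leaves at rate r_out > r_in, and forwarding starts a time t after the first bit arrives, then bit b leaves at t + b/r_out and must already have arrived at b/r_in. The binding case is the last bit, giving t >= S(1/r_in - 1/r_out). The sketch below assumes constant bit rates and ignores header lookup time; the function name and the example numbers are illustrative, not taken from the talk.

```python
def min_cut_through_delay(packet_bits, rate_in, rate_out):
    """Smallest delay after the first bit arrives before forwarding can start
    without the outgoing link running out of data (underrun).

    Assumes constant bit rates on both links and ignores header processing.
    """
    if rate_out <= rate_in:
        return 0.0  # no mismatch penalty: data arrives at least as fast as it leaves
    return packet_bits * (1.0 / rate_in - 1.0 / rate_out)


# Illustrative numbers: a 2048 byte packet from a 2.5 Gb/s link onto a 10 Gb/s link.
delay_seconds = min_cut_through_delay(
    packet_bits=2048 * 8, rate_in=2.5e9, rate_out=10e9
)
```

With these numbers the switch must hold the packet for about 4.9 microseconds, three quarters of the time it takes to receive the whole packet on the slow link, so most of the benefit of cut through is lost on that hop.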
Question:
–Does there exist a traffic pattern for which adaptive routing would not outperform the faster link?
–What can we say about average performance, etc.?
Network Design and Network Load
Many SANs are designed around Non-Blocking Topologies.
–But they lack the signaling infrastructure to make use of the network’s non-blocking potential.
The principal advantages derived from such topologies are a modest maximum hop count and an expected offered load on any link of less than one.
–Assuming good load balancing, and a few other things
Given that it is rare to exploit the non-blocking potential of the networks, is there a better choice of network topology?
Absolutely!
–As an existence proof, there are other Non-Blocking Topologies of identical cost and lower average hop count.
–They are not suitable for a self-routing network, but that has not been a design consideration for years (decades?).
What if we drop the Non-Blocking bit altogether?
Question: What is the topology with the fewest switches that has
–N single ported end stations
–Switches of valency V
–A maximum offered load L on any link, under the assumption that each end station talks to every other end station at rate 1/(N-1), shortest path routing is used, and ties are resolved by some load balancing heuristic
–A maximum hop count H between any two end stations?
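Whether a candidate topology meets the load and hop count constraints can be checked numerically. The sketch below makes several assumptions that the question leaves open: the load balancing heuristic is taken to be an even split over all shortest paths, hop count is measured as switch-to-switch hops, only switch-to-switch links are reported (each end station link carries load 1 by construction), and the switch graph is connected and undirected. Function names are illustrative.

```python
from collections import deque


def shortest_path_counts(adjacency, source):
    """BFS from source: hop distance and number of shortest paths to every switch."""
    dist = {source: 0}
    count = {source: 1}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbour in adjacency[node]:
            if neighbour not in dist:
                dist[neighbour] = dist[node] + 1
                count[neighbour] = 0
                queue.append(neighbour)
            if dist[neighbour] == dist[node] + 1:
                count[neighbour] += count[node]
    return dist, count


def directed_link_loads(adjacency, stations):
    """Offered load on every directed switch-to-switch link, plus the maximum
    switch hop count. stations[s] is the number of end stations on switch s.

    Assumption: traffic between two switches is split evenly over all
    shortest paths between them.
    """
    total_stations = sum(stations.values())
    info = {s: shortest_path_counts(adjacency, s) for s in adjacency}
    loads = {(u, v): 0.0 for u in adjacency for v in adjacency[u]}
    for s in adjacency:
        dist_s, count_s = info[s]
        for t in adjacency:
            if t == s:
                continue
            dist_t, count_t = info[t]
            # Each station sends at rate 1/(N-1) to every other station.
            demand = stations[s] * stations[t] / (total_stations - 1)
            for (u, v) in loads:
                # Link u->v lies on a shortest s-t path iff the distances add up.
                if dist_s[u] + 1 + dist_t[v] == dist_s[t]:
                    share = count_s[u] * count_t[v] / count_s[t]
                    loads[(u, v)] += demand * share
    max_hops = max(max(dist.values()) for dist, _ in info.values())
    return loads, max_hops


# Five switches in a ring, one end station each (N = 5).
ring = {i: [(i - 1) % 5, (i + 1) % 5] for i in range(5)}
ring_loads, ring_hops = directed_link_loads(ring, {i: 1 for i in range(5)})
worst_load = max(ring_loads.values())
```

A search for the smallest topology could then enumerate candidate switch graphs of valency at most V and keep those whose worst load is at most L and whose hop count is at most H.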
There is a lot of work in the literature on finding networks of fixed maximum valency and of prescribed maximum diameter. These networks may be used to construct good but sub-optimal solutions to the above question. Note: this sub-optimality is evidenced as pockets of sub-optimality.
For N=30, V=6, L=1, H=2
K3*C5 yields the following graph.
[Figure]
But it doesn’t extend to larger N (unless we increase V, H or L).
For N=42, V=6, L=1, H=2
The following graph is viable.
[Figure]
The family works for any even V.
Is it optimal?
For N=84, V=6, L=1, H=3
The following graph is viable.
[Figure]
Is it optimal?