Routing and Network Design: Algorithmic Issues Kamesh Munagala Duke University
Graph Model for the Links: Model sensor nodes as vertices in a graph, with a distinguished gateway vertex. Each link has a "length," e.g. d(7,8) for the link between nodes 7 and 8 in the figure. "Length" models the communication cost per bit. "Length" should really be a function of the number of bits being sent (Why?)
Specialized Nature “Geometric Random” graph Nodes on a 2D plane Each node has a fixed communication radius Correlation Structures: Spatial Gaussian models Simple AR(1) temporal models Assumptions do not always hold!
Unique Features Distributed algorithms: Reconfigure routes around failures Learning network topology Learning correlation structures Query processing Light-weight implementations: Low compute power and memory Limited communication and battery life Noisy sensing and transmission
Goals in this Lecture General algorithmic ideas capturing: Simplicity and efficiency Some performance guarantees Distributed implementations Low reliance on specific assumptions Caveats: Ideas need to be tailored to context Specialized algorithms might work better
Topics What constitutes good routing? Measures of quality Algorithm design framework Basic problem statements Spanning, shortest path, and Steiner trees Aggregation networks Location and subset selection problems Solution techniques Types of guarantees on solution quality Models of information in a sensor network Tailoring generic algorithms to specific models
Problem 1: Information Aggregation
Routing Tree. Problem statement: route information from the nodes to the gateway by choosing a subset of edges to route data over. The chosen edges must "connect" all nodes to the gateway (the tree property). Minimize: long-term average "cost" of routing. The answer will depend on: what constitutes "cost," and the correlations in the data being collected.
Toy Example Gateway Each node has 100 bits of information to send to gateway Value on link (edge) is the cost of transmitting one bit How should we route the bits? “Star” network
Depends on Correlations. Suppose information is perfectly correlated: information from all sources together is also 100 bits! A spanning tree is optimal. Cost = 100 × (total length of spanning-tree edges) = 900 units, ignoring the cost of compression.
Other Extreme: No Correlation. Suppose information is not correlated at all: information from all sources together is now 400 bits. A shortest path tree is optimal. Cost = 100 × (sum of shortest-path distances to the gateway) = 2400 units.
Had we used a Spanning Tree: with no correlation at all (400 bits in total), routing over the spanning tree costs 100 × (sum of path lengths through the spanning tree) = 3000 units > 2400 units! The shortest path tree really is better here.
In summary… Moral of the story: choosing good routes is important, and the choice depends on the correlation structure. Issues to address: How do we specify correlations? (Simple yet faithful specifications are desirable.) Algorithms for finding (near-)optimal routes, efficient and simple to implement. Reliability and "backup" routes.
There can be as many as n^(n−2) spanning trees in general (Cayley's formula), so exhaustive enumeration is out of the question. The Minimum Spanning Tree (MST) is the spanning tree of smallest total edge cost (figure: the MST and its cost).
Spanning Tree Algorithm “Greedy” schemes add edges one at a time in clever fashion No backtracking Kruskal's algorithm: Consider edges in ascending order of cost. Insert an edge unless doing so would create a cycle. Prim's algorithm: Start with gateway and greedily grow a tree from the gateway outward. At each step, add the cheapest edge that has exactly one endpoint in current tree.
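The greedy scheme can be sketched in a few lines. Here is a minimal Python version of Prim's algorithm; the graph representation (an adjacency dict of (cost, neighbor) pairs) is an assumption for illustration:

```python
import heapq

def prim_mst(adj, root):
    """Prim's algorithm: greedily grow a tree from `root` outward,
    always adding the cheapest edge with exactly one endpoint in the tree.
    adj: dict node -> list of (cost, neighbor) pairs."""
    in_tree = {root}
    tree_edges, total = [], 0
    heap = [(c, root, v) for c, v in adj[root]]
    heapq.heapify(heap)
    while heap:
        c, u, v = heapq.heappop(heap)
        if v in in_tree:
            continue              # both endpoints in tree: would create a cycle
        in_tree.add(v)
        tree_edges.append((u, v, c))
        total += c
        for c2, w in adj[v]:
            if w not in in_tree:
                heapq.heappush(heap, (c2, v, w))
    return tree_edges, total
```

On the toy "star" network (gateway linked to every node by a cheap edge, expensive edges elsewhere), this selects exactly the star edges.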
Prim’s Algorithm: Execution
“Distributed” Algorithm? Nodes connect in arbitrary order; each node simply connects to the "closest" node already connected. (Figure: resulting cost = 25.)
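This online scheme can be sketched as follows; the 1-D coordinates and distance function are assumed simplifications (any metric works):

```python
def online_tree(coords, order, dist):
    """Nodes join in an arbitrary `order` (gateway first); each new
    node connects to the closest node already in the tree."""
    in_tree = [order[0]]               # the gateway joins first
    parent, cost = {}, 0.0
    for v in order[1:]:
        u = min(in_tree, key=lambda u: dist(coords[u], coords[v]))
        parent[v] = u                  # connect v to its nearest existing node
        cost += dist(coords[u], coords[v])
        in_tree.append(v)
    return parent, cost
```

Note the arrival order matters: a bad order can pay more than the MST, which is what the log n guarantee on the next slide bounds.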
Guarantee on the “Online” Scheme: with n nodes in the graph, the cost of the "online" tree is within a log n factor of the cost of the MST, irrespective of the order in which nodes join the system! Intuition: in the "star" network, the "online" scheme produces the MST. Natural implementation: greedy, starting from the gateway. Such a guarantee is called an "approximation guarantee."
Shortest Paths: OSPF Key algorithmic idea: Greedy local updates Each node v maintains “tentative” distance d(v) to gateway Initially, all these distances are infinity Each node v does a greedy check: If for some neighbor u, d(v) > d(u) + Length(u,v), then: Route v through u Set d(v) = d(u) + Length(u,v) Run this till it stabilizes
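A sketch of this relaxation in Python, run as synchronous rounds over an assumed adjacency-dict representation:

```python
def ospf_distances(adj, gateway):
    """Each node keeps a tentative distance d(v) to the gateway and
    greedily relaxes it via its neighbors until nothing changes.
    adj: dict node -> list of (length, neighbor) pairs."""
    d = {v: float("inf") for v in adj}
    d[gateway] = 0
    next_hop, rounds = {}, 0
    changed = True
    while changed:
        changed, rounds = False, rounds + 1
        for v in adj:
            for length, u in adj[v]:
                if d[v] > d[u] + length:
                    d[v] = d[u] + length    # route v through u
                    next_hop[v] = u
                    changed = True
    return d, next_hop, rounds
```

The `next_hop` map is the shortest path tree; the loop stabilizes after at most roughly n rounds, matching the convergence claim below.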
OSPF Execution (figure: every tentative distance starts at ∞, except the gateway's, which starts at 0)
Rate of Convergence: with n nodes in the graph, (1) the protocol converges to the shortest path tree, and (2) the number of rounds until convergence is roughly n.
Intermediate Correlations: is there one tree for all correlation values, i.e., both a spanning and a shortest path tree at once? Doable if we settle for "nearly" optimal trees. In other words, there exists a tree with: cost at most three times the cost of the MST, and distances to the gateway at most twice the shortest-path distances.
Example: MST (figure: n² nodes; edges of length n and length 1). Cost of MST = n² + n. Path length to gateway = n² + n.
Example: Shortest Path Tree (same figure, n² nodes). Cost = n³. Path length = n².
Example: Balanced Tree (same figure). Cost = 2n². Path length = 2n.
Walk on a Tree (figure: a walk around the spanning tree, starting and ending at the gateway)
Balancing Algorithm: walk along the spanning tree, adding occasional shortcuts to the gateway. At node v: suppose the previous shortcut was at u. If SP(u) + Walk(u,v) > 2·SP(v), the walk has grown too long: add a "shortcut" from v to the gateway along v's shortest path.
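The balancing rule can be sketched as follows, assuming the walk is given as a list of (node, cost-of-step) pairs (an Euler tour of the spanning tree starting at the gateway) and sp maps each node to its shortest-path distance:

```python
def add_shortcuts(walk, sp):
    """Walk along the spanning tree; whenever the route via the previous
    shortcut node u would exceed twice v's shortest-path distance,
    shortcut v straight to the gateway and reset the walk counter."""
    shortcuts = []
    u, _ = walk[0]                 # the gateway; sp[gateway] == 0
    walked = 0                     # walk length since the last shortcut
    for v, step in walk[1:]:
        walked += step
        if sp[u] + walked > 2 * sp[v]:
            if sp[v] > 0:          # returning to the gateway needs no shortcut
                shortcuts.append(v)
            u, walked = v, 0
    return shortcuts
```

Nodes between consecutive shortcuts route along the walk to the nearest earlier shortcut, which is what keeps every path within twice its shortest-path length.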
Example Revisited (same figure). Walk length = 2n.
Proof Idea. Final path lengths < 2 × S.P. lengths: follows directly from the description. Final cost < 3 × MST cost: final cost = MST cost + total length of the shortest paths added. Suppose shortcuts are added at …, u, v, … along the walk; each satisfies SP(u) + Walk(u,v) > 2·SP(v). Adding these up: total walk length > total length of the added paths. But total walk length = 2 × MST cost.
Problem 2: Sensor Location
“Most Informative” Placement: close-by locations are not very "informative."
Abstraction. Parameters: each node v has communication cost c_v to the gateway (depends on location). A subset S of nodes has "information" f(S): information is a property of a set of nodes, and depends on whether "close by" nodes are also in the set. Problem statement: choose a set S so that the sum of the costs of the nodes in S is at most C, and Information = f(S) is maximized.
Algorithmic Issues. The number of subsets of n locations is 2^n: inefficient to enumerate over them. Given a subset S, how do we compute f(S)? This needs a correlation model among the locations. Communication costs are not additive; they also depend on the locations of the nodes!
Information Functions: f(S) = entropy of S. Correlations are multidimensional Gaussian with covariance matrix Σ between locations. Entropy ∝ log det(Σ). Covariance(j,k) ∝ exp(−dist(j,k)² / h²).
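As a concrete (assumed) instance: for points on a line with the Gaussian kernel above, f(S) ∝ log det(Σ), and nearby points drive the determinant, hence the information, toward zero. A minimal pure-Python sketch; the log-det uses pivot-free elimination, which is safe for these positive-definite kernel matrices:

```python
import math

def covariance(points, h):
    """Sigma[j][k] = exp(-dist(j,k)^2 / h^2), for 1-D points."""
    return [[math.exp(-((p - q) ** 2) / h ** 2) for q in points]
            for p in points]

def log_det(m):
    """log det(M) via Gaussian elimination (no pivoting: the kernel
    matrix is positive definite, so pivots stay positive)."""
    a = [row[:] for row in m]
    acc = 0.0
    for i in range(len(a)):
        acc += math.log(a[i][i])
        for r in range(i + 1, len(a)):
            f = a[r][i] / a[i][i]
            for c in range(i, len(a)):
                a[r][c] -= f * a[i][c]
    return acc

def info(points, h):
    """f(S) proportional to log det(Sigma_S)."""
    return log_det(covariance(points, h))
```

Two far-apart points give a near-identity covariance (log det ≈ 0); two nearby points give a near-singular one (log det very negative), matching the "close by locations are not informative" intuition.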
Properties of f(S) (figure: A ⊆ B, with a new location v): location v is more informative w.r.t. A than w.r.t. B. Property 1 (monotonicity): f(A+v) ≥ f(A). Property 2 (submodularity): f(A+v) − f(A) ≥ f(B+v) − f(B) whenever A ⊆ B.
Greedy Algorithm: start with S = ∅. Repeat until the cost of S exceeds C: choose the v maximizing (f(S+v) − f(S)) / c_v, the "information gain per unit cost," and add v to S.
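A sketch of this greedy rule in Python; here it stops just before the budget C would be exceeded, a slight variant of the slide's stopping rule, and f can be any information function with the two properties above:

```python
def greedy_select(nodes, cost, f, C):
    """Repeatedly add the node with the best information gain per unit
    cost until no affordable node remains."""
    S, spent = [], 0
    while True:
        affordable = [v for v in nodes
                      if v not in S and spent + cost[v] <= C]
        if not affordable:
            return S
        base = f(S)
        # "Information gain per unit cost" rule from the slide.
        v = max(affordable, key=lambda v: (f(S + [v]) - base) / cost[v])
        S.append(v)
        spent += cost[v]
```

With an additive f this is just best-ratio selection; the analysis on the next slides shows the same rule stays within a constant factor for any monotone submodular f.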
Analysis. Suppose all costs c_v = 1, and let O = the most informative set of size at most C. At any stage, if adding v is the best greedy decision, then adding all of O cannot give more information per unit cost: f(S+v) − f(S) ≥ (f(S+O) − f(S)) / C ≥ (f(O) − f(S)) / C. Let d(S) = f(O) − f(S), the deficit w.r.t. the optimal solution. This implies: d(S) − d(S+v) ≥ d(S) / C.
Analysis. So d(S+v) ≤ d(S)·(1 − 1/C). With d(Initial) = f(O) and d(Final) = f(O) − f(Final): f(O) − f(Final) = d(Final) ≤ d(Initial)·(1 − 1/C)^C ≤ f(O)/e ≤ f(O)/2. This implies f(Final) ≥ f(O)/2: the greedy set has at least 1/2 the information of the optimal set.
Two-Level Routing (figure: nodes route to aggregation hubs, which route on to the gateway)
Clustering: optimally place cluster-heads to minimize routing cost.
K-Means Algorithm: start with k arbitrary leaders, then repeat Steps 1 and 2 until convergence. Step 1: assign each node to its closest leader, yielding k "clusters" of nodes. Step 2: for each cluster, choose the "best" leader, the one minimizing the total routing cost within the cluster.
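A sketch of this alternation in Python, using nodes themselves as candidate leaders (a k-medoids flavor); the 1-D coordinates and distance function are assumptions for illustration:

```python
def kmeans_leaders(points, leaders, dist):
    """Alternate assignment and leader re-selection until the leader
    set stops changing (convergence to a local optimum)."""
    leaders = list(leaders)
    while True:
        # Step 1: assign each node to its closest leader.
        clusters = {l: [] for l in leaders}
        for p in points:
            clusters[min(leaders, key=lambda l: dist(p, l))].append(p)
        # Step 2: per cluster, pick the member minimizing total routing cost.
        new = [min(ms, key=lambda c: sum(dist(p, c) for p in ms))
               for ms in clusters.values() if ms]
        if set(new) == set(leaders):
            return sorted(new), clusters
        leaders = new
```

Both steps only ever reduce the total routing cost, which is exactly the convergence argument on the next slide.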
Analysis. Convergence is guaranteed: each step reduces the total cost. Step 1: each node's distance to its leader can only shrink. Step 2: each cluster's routing cost can only shrink. Rate of convergence: fast in practice. Quality of solution: a "local" optimum depending on the initial k leaders; need not be the best possible solution, but works very well in practice.