Efficient and Robust Query Processing in Dynamic Environments Using Random Walk Techniques
Chen Avin, Carlos Brito
IPSN04 - Berkeley - 04/27/
Outline: Motivation; Random Walk and Partial Cover Time; Efficiency; Robustness; Quality; Load Balancing, Scalability and Latency; Discussion
Motivation: Sensor networks are large, dense and dynamic. The task is to query the network. Common systems depend on state information stored in the nodes (e.g. spanning trees, cluster heads) for proper operation and control. The resulting critical points of failure make recovery mechanisms necessary. We explore the properties of an uncontrolled scheme, the random walk: a simple process with no critical point of failure, in which all nodes are equally unimportant at all times.
Random Walk: A token visits the nodes of the graph in a random order. At each step, the token moves to a neighbor chosen according to some distribution (simple walk = uniform over the neighbors).
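Sketch (not from the original slides): a minimal simple random walk, assuming the graph is given as a dictionary of adjacency lists. The function name `simple_random_walk` and the 4-node ring are hypothetical and serve only as an illustration.

```python
import random

def simple_random_walk(adj, start, steps, rng):
    """Return the sequence of nodes visited by a simple random walk."""
    path = [start]
    node = start
    for _ in range(steps):
        # Simple walk: the next node is uniform over the current neighbors.
        node = rng.choice(adj[node])
        path.append(node)
    return path

# Hypothetical 4-node ring, used only for illustration.
ring = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}
path = simple_random_walk(ring, start=0, steps=10, rng=random.Random(1))
```

The token only needs the list of nodes it can currently reach; no routing state is kept in the nodes.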
Random Walk for Sensor Nets: Easily implemented in sensor networks: the base station issues a token carrying the query. The method is (almost) assumption-free; the protocol does not require knowledge of location, neighbors, transmission range, or symmetric connections. High density and redundancy are an advantage.
Cover Time: How efficient is the process? The cover time C is the expected time for a random walk to visit all the nodes, starting at the worst-case node. h_ij is the expected time to go from node i to node j, and h_max = max{h_ij : i, j nodes in the graph}. Matthews' bound: C ≤ h_max·log(n).
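Sketch (not from the original slides): one way to evaluate the bound on this slide for small graphs. It assumes a simple random walk, a natural logarithm, and computes the hitting times by iterating the standard recurrence h[target] = 0, h[i] = 1 + average of h over the neighbors of i. The helper names and the two 6-node graphs are hypothetical.

```python
import math

def hitting_times_to(adj, target, sweeps=2000):
    """Expected steps for a simple random walk from each node to reach `target`.

    Iterates h[i] = 1 + mean(h[j] for j in adj[i]) in place, with
    h[target] fixed at 0, until the values have settled.
    """
    h = {v: 0.0 for v in adj}
    for _ in range(sweeps):
        for v in adj:
            if v != target:
                h[v] = 1.0 + sum(h[u] for u in adj[v]) / len(adj[v])
    return h

def h_max(adj):
    """Largest hitting time over all (start, target) pairs."""
    return max(max(hitting_times_to(adj, t).values()) for t in adj)

n = 6
complete = {i: [j for j in range(n) if j != i] for i in range(n)}
line = {i: [j for j in (i - 1, i + 1) if 0 <= j < n] for i in range(n)}

# Bound from the slide, with log taken as the natural logarithm (assumption).
bound_complete = h_max(complete) * math.log(n)  # h_max is n - 1 = 5 here
bound_line = h_max(line) * math.log(n)          # h_max is (n - 1)^2 = 25 here
```

The gap between the two values of h_max already shows why the line is a bad case and the complete graph a good one.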
Cover Time, known results: Worst cases: lollipop graph, O(n³); line, O(n²). Best cases, O(n·log(n)): star, complete graph, hypercube. Grid: O(n·log²(n)). Random sensor networks: ?
Partial Cover Time (PCT): In a sensor network we do not need to consult every node. How efficient is it to visit 80% of the nodes? Lemma: PCT(c) = O(h_max), which gives O(n) in the hypercube and O(n·log(n)) in the grid.
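Sketch (not from the original slides): a small simulation that counts the steps a simple random walk needs to visit a given fraction of the nodes. The 10 x 10 grid, the start node and the function names are assumptions for illustration; this is not the authors' simulator and no particular step count is claimed.

```python
import math
import random

def grid_graph(side):
    """Adjacency lists of a side x side grid with 4-neighbor connectivity."""
    adj = {}
    for r in range(side):
        for c in range(side):
            nbrs = [(r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)]
            adj[(r, c)] = [(a, b) for a, b in nbrs
                           if 0 <= a < side and 0 <= b < side]
    return adj

def steps_to_cover(adj, start, fraction, rng):
    """Walk until `fraction` of the nodes are visited; return (steps, visited)."""
    target = math.ceil(fraction * len(adj))
    visited = {start}
    node, steps = start, 0
    while len(visited) < target:
        node = rng.choice(adj[node])
        visited.add(node)
        steps += 1
    return steps, visited

adj = grid_graph(10)
rng = random.Random(0)
steps_80, seen_80 = steps_to_cover(adj, (0, 0), 0.8, rng)    # partial cover
steps_100, seen_100 = steps_to_cover(adj, (0, 0), 1.0, rng)  # full cover
```

Averaging `steps_80 / len(adj)` over many seeds gives an estimate of the partial cover time normalized to n.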
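Lemma Proof Sketch: Let α_v be the time at which node v is first visited, γ the time at which more than half of the nodes have been visited, and c = E[γ] the expected time to visit more than half of the nodes. Take 2k+1 nodes. At time γ exactly k+1 nodes have been visited, so at least k+1 nodes (the one first visited at time γ and the k still unvisited) satisfy α_v ≥ γ. Hence (k+1)·γ ≤ ∑_v α_v. Taking expectations, E[γ] ≤ (1/(k+1))·∑_v E[α_v] ≤ ((2k+1)/(k+1))·h_max, so c < 2·h_max.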
Efficiency - Simple Walk: [Figure: number of steps (normalized to n) versus % of cover, with curves for Grid, Random 15, Random 19 and Hypercube; the value 3.12 is annotated on the plot.]
Biased Random Walk: Can we improve these results? Give priority to unvisited nodes. Define a bias parameter, 0 ≤ bias ≤ 1. A visited neighbor is selected with probability (1 − bias)/d and an unvisited neighbor with probability (1 − bias)/d + bias/d_u, where d is the number of neighbors and d_u the number of unvisited neighbors, so that the probabilities sum to 1. The protocol remains (almost) the same.
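Sketch (not from the original slides): the selection rule above written out as code, reading d as the number of neighbors and d_u as the number of unvisited neighbors. The slide does not say what happens when every neighbor has already been visited; falling back to the uniform walk in that case is an assumption, as are the function names.

```python
import random

def biased_step_probs(neighbors, visited, bias):
    """Selection probability for each neighbor under the bias rule."""
    d = len(neighbors)
    d_u = sum(1 for v in neighbors if v not in visited)
    if d_u == 0:
        # Assumption: no unvisited neighbor left, so use the simple walk.
        return {v: 1.0 / d for v in neighbors}
    base = (1.0 - bias) / d
    return {v: base + (bias / d_u if v not in visited else 0.0)
            for v in neighbors}

def biased_step(neighbors, visited, bias, rng):
    """Draw the next node according to biased_step_probs."""
    probs = biased_step_probs(neighbors, visited, bias)
    return rng.choices(list(probs), weights=list(probs.values()), k=1)[0]

# Four neighbors, two already visited, bias = 0.6:
# visited ones get 0.4 / 4 = 0.1 each, unvisited ones 0.1 + 0.6 / 2 = 0.4 each.
probs = biased_step_probs(["a", "b", "c", "d"], visited={"a", "b"}, bias=0.6)
```

With bias = 0 this reduces to the simple walk; with bias = 1 the token never moves to a visited neighbor while an unvisited one exists.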
Biased Random Walk: [Figure: number of steps (normalized to n) versus % of cover, with curves for bias = 0, 0.1, 0.2, 0.4, 0.6, 0.8 and one further curve whose bias value is missing.]
Comparison with Clustering: An analytical result for the cluster-head scheme shows that an optimal protocol on the grid requires ≈ 0.945·n^(7/6) messages. The efficiency of the two systems is similar.
Robustness to Dynamics: The probability that a node fails while it holds the token is negligible. There is no critical point of failure (but reliable token passing is needed). All we require is connectivity in the token's neighborhood. The walk is robust to both independent and dependent failures (disaster areas).
Spanning Tree in Dynamic Environments: Nodes close to the root are more important. When a node fails, all nodes in its subtree are disconnected from the root and must take part in the recovery mechanism. Assuming an independent failure (or duty-cycle) probability p, with q = 1 − p, the expected number of nodes to report is O(q^h). Since R << network area, h is large. With p = 0.1 and h = 10, 65% will not report to the root.
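Worked arithmetic (not from the original slides) behind the 65% figure, assuming h counts the nodes on the path to the root that must all be alive and that they fail independently with probability p. The function name is hypothetical.

```python
def fraction_not_reporting(p, h):
    """Probability that at least one of h independent nodes on the path has failed."""
    q = 1.0 - p          # probability that a single node is alive
    return 1.0 - q ** h  # complement of "all h nodes alive"

loss = fraction_not_reporting(p=0.1, h=10)
print(f"{loss:.1%}")  # prints 65.1%
```

Here q^h = 0.9^10 ≈ 0.349, so only about 35% of the reports reach the root.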
Quality of Partial Cover - 1: How far are the unvisited nodes from visited ones? 90% are at most 2 hops away. A random walk is not expected to leave a large area uncovered.
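Sketch (not from the original slides): one way to measure how far unvisited nodes are from visited ones, using a breadth-first search started from all visited nodes at once. The function name and the 6-node line are hypothetical; the slides do not say how the 90% figure was measured.

```python
from collections import deque

def hops_to_nearest_visited(adj, visited):
    """Multi-source BFS: hop distance from each node to its closest visited node."""
    dist = {v: 0 for v in visited}
    queue = deque(visited)
    while queue:
        v = queue.popleft()
        for u in adj[v]:
            if u not in dist:
                dist[u] = dist[v] + 1
                queue.append(u)
    return dist

# Hypothetical 6-node line 0-1-2-3-4-5 in which a walk has visited nodes 0 and 1.
line = {i: [j for j in (i - 1, i + 1) if 0 <= j < 6] for i in range(6)}
dist = hops_to_nearest_visited(line, visited={0, 1})

unvisited = [v for v, d in dist.items() if d > 0]
within_2 = sum(1 for v in unvisited if dist[v] <= 2) / len(unvisited)
```

On this toy line, nodes 2 and 3 are within 2 hops and nodes 4 and 5 are not, so `within_2` is 0.5.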
Quality of Partial Cover - 2: How long must a node wait before a walk visits its neighborhood? 85% are visited at least every other run. At most, a node needs to wait 4 runs.
Application Example: Find the histogram of the data in the network. Assume a non-uniform distribution. The token reports after seeing 80% of the nodes.
Load Balancing: The stationary distribution of the Markov chain, π = (π_1, …, π_n), is π_i = d_i/2m, where d_i is the degree of node i and m the number of edges. In regular graphs π is uniform, but this holds only after long walks. Here we issue many “short” walks.
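Sketch (not from the original slides): computing π_i = d_i/2m for a small graph and checking that one step of the simple walk leaves it unchanged. The 4-node star and the function names are hypothetical.

```python
def stationary_distribution(adj):
    """pi_i = d_i / 2m, where the degrees d_i sum to 2m."""
    two_m = sum(len(nbrs) for nbrs in adj.values())
    return {v: len(nbrs) / two_m for v, nbrs in adj.items()}

def one_step(adj, dist):
    """Push a probability distribution through one step of the simple walk."""
    nxt = {v: 0.0 for v in adj}
    for v, mass in dist.items():
        for u in adj[v]:
            nxt[u] += mass / len(adj[v])
    return nxt

# Hypothetical star: hub 0 with three leaves, so m = 3 and 2m = 6.
star = {0: [1, 2, 3], 1: [0], 2: [0], 3: [0]}
pi = stationary_distribution(star)  # hub: 3/6, each leaf: 1/6
pi_next = one_step(star, pi)        # equals pi, up to rounding
```

In a non-regular graph such as this star, high-degree nodes carry more of the load in the long run, which is why the load of many short walks has to be examined separately.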
Scalability: [Figure: plot annotated with the values 2.92n and 3.37n and the factor ×16.]
Latency: A random walk is a sequential process. The latency is proportional to the number of steps needed to accomplish the task, which reduces the range of applicability. Future work: combine results from a few parallel random walks in the network.
Discussion: Achieving control in highly dynamic environments is problematic, and in many cases not energy efficient due to recovery mechanisms. How do we do with an uncontrolled process such as a random walk? Not bad! It is not applicable in all cases, but when applicable it provides an elegant, simple and efficient solution.