
Slide 1: Power Conserving Computation of Order-Statistics over Sensor Networks
Michael B. Greenwald & Sanjeev Khanna, Dept. of Computer & Information Science, University of Pennsylvania
PODS 04, June 2004

Slide 2: Sensor Networks
- Cheap => plentiful, large-scale networks.
- Low power => long life.
- Easily deployed, wireless => usable in hazardous/inaccessible locations.
- Base station with a connection to the outside world => information flows to the station, where we can extract data.
- Self-configuring => just deploy, and the network sets itself up.

Slide 3: Management & Use of Sensor Nets
Power consumption and network lifetime:
- Battery replacement is difficult; network lifetime may be a function of worst-case battery life.
- The dominant cost is the per-byte cost of transmission.
- Goal: minimize worst-case power consumption at any node, and thereby maximize network lifetime.

Slide 4: Management & Use of Sensor Nets
- The primary operation is data extraction through a declarative, DB-like interface: issue a query, and answers stream towards the base station.
- This is not necessarily consistent with power concerns.
- COUNT, MIN, and MAX are easy to optimize: aggregate in the network.

Slide 5: Example of a Simple Aggregate Query: MAX
- Aggregate in the network: each node transmits only the maximum observation over itself and all of its children.
- O(1) transmission per node.
- But what about complicated, yet natural, queries such as MEDIAN?
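A minimal single-machine sketch of the in-network MAX aggregation described above; the tree shape and node names are illustrative, not from the talk:

```python
def max_report(tree, readings, node):
    """Return the value a node would transmit upward: the max of its own
    reading and the reports of its children -- O(1) values per node."""
    children = tree.get(node, [])
    return max([readings[node]] + [max_report(tree, readings, c) for c in children])

# A small tree rooted at the base station.
tree = {"base": ["a", "b"], "a": ["c", "d"]}
readings = {"base": 3, "a": 7, "b": 2, "c": 9, "d": 1}
print(max_report(tree, readings, "base"))  # -> 9
```

Each recursive call corresponds to one upward transmission, so the per-node load is constant regardless of subtree size.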

Slide 6: Richer Queries: Order-Statistics
- Exact order-statistic query: quantile(S, φ) returns the element with rank r = φ|S|.
- Approximate order-statistic query: approx-quantile(S, φ, ε) returns an element with rank r satisfying (φ − ε)|S| <= r <= (φ + ε)|S|.
- Quantile-summary query: quantile-summary(S, ε) returns a summary Q such that quantile(Q, φ) returns an element with rank r satisfying (φ − ε)|S| <= r <= (φ + ε)|S|.
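To make the query types concrete, here is a small single-machine sketch; the function names are illustrative, with `phi` and `eps` playing the roles of φ and ε:

```python
def phi_quantile(S, phi):
    """Exact phi-quantile: the element of rank r = phi * |S| (1-indexed)."""
    S = sorted(S)
    r = max(1, int(phi * len(S)))
    return S[r - 1]

def valid_approx(S, phi, eps, answer):
    """True if `answer` has some rank r with (phi-eps)|S| <= r <= (phi+eps)|S|."""
    S, n = sorted(S), len(S)
    ranks = [i + 1 for i, v in enumerate(S) if v == answer]
    return any((phi - eps) * n <= r <= (phi + eps) * n for r in ranks)

S = list(range(1, 101))                  # value i has rank i
print(phi_quantile(S, 0.5))              # -> 50, the exact median
print(valid_approx(S, 0.5, 0.1, 44))     # -> True: rank 44 lies in [40, 60]
```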

Slide 7: Goals
- Respond accurately to quantile queries.
- Balance power consumption over the network (the load on any two nodes differs by at most a poly-log factor).
- Topology independence.
- Structure the solution in a manner amenable to standard optimizations (e.g., non-holistic/decomposable aggregates).

Slide 8: Naïve Approaches
1. Send all |S| values to the base station at the root. If the root has few children, some node incurs Ω(|S|) cost.
2. Sample each value with probability 1/(ε²|S|). Uniform sampling is hard, samples can be lost, and the guarantee is only probabilistic.
Is there a way to build summaries that can be aggregated within the network?

Slide 9: Preliminaries: Representation
Record the minimum possible rank (rmin) and maximum possible rank (rmax) for each stored value.
Proposition: If for all 1 <= j < |Q|, rmax(j+1) − rmin(j) <= 2ε|S|, then Q is an ε-approximate summary.
(Figure: a dataset of |S| = 100 values summarized by tuples such as (value 40, rmin 37, rmax 37) and (value 44, rmin 51, rmax 54).)
Example: ε = .1, |S| = 100. The median query Q(S, .5) asks for the entry with rank r = .5 × 100 = 50; any rank in [(.5 − ε) × 100, (.5 + ε) × 100] = [40, 60] is acceptable. The entry with value 44 qualifies: 40 <= 51 <= rank(44) <= 54 <= 60.
Answer rule: return a tuple whose rank interval [rmin, rmax] lies entirely within [r − ε|S|, r + ε|S|].
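The answer rule can be sketched as follows; this reconstruction returns any stored tuple whose whole rank interval fits inside the allowed window, which matches the worked example above:

```python
def answer(Q, r, eps, n):
    """Q: list of (value, rmin, rmax) tuples sorted by value.
    Return a value whose rank interval [rmin, rmax] lies entirely
    within the allowed window [r - eps*n, r + eps*n]."""
    lo, hi = r - eps * n, r + eps * n
    for v, rmin, rmax in Q:
        if lo <= rmin and rmax <= hi:
            return v
    raise ValueError("summary is not precise enough for this query")

# The slide's example: eps = .1, n = 100, median query r = 50.
Q = [(40, 37, 37), (44, 51, 54)]
print(answer(Q, 50, 0.1, 100))  # -> 44, since [51, 54] fits in [40, 60]
```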

Slide 10: Preliminaries: COMBINE and PRUNE Operations
COMBINE: merge n quantile summaries into one. Q = COMBINE({Q_i}):
- Sort the entries, and compute a new rmin and rmax for each entry.
- Let the predecessor set for a given entry (v, rmin, rmax) consist of the maximum entry v' <= v from each input summary, and the successor set consist of the minimum entry v' >= v from each input summary.
- rmin = Σ rmin' over v' in the predecessor set.
- rmax = 1 + Σ (rmax' − 1) over v' in the successor set.
PRUNE: reduce the number of entries in Q to B + 1. Q' = PRUNE(Q, B):
- Query Q for each of the [0, 1/B, 2/B, …, 1] quantiles, and return that set of entries as Q'.
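A single-machine sketch of the two operations, assuming each summary stores its minimum and maximum exactly (as GK-style summaries do); the quantile-selection rule inside PRUNE (nearest rank-interval midpoint) is one reasonable reading of "query for each quantile":

```python
def combine(summaries):
    """COMBINE per the slide: for each stored value v, the new rmin sums
    the rmin of v's predecessor (largest entry <= v) in each input, and
    the new rmax is 1 + the sum of (rmax' - 1) over v's successors
    (smallest entry >= v; a missing successor counts as a sentinel with
    rmax' = n_j + 1, since v then outranks everything in that summary)."""
    sizes = [s[-1][2] for s in summaries]   # max stored exactly => rmax = n_j
    values = sorted({v for s in summaries for (v, _, _) in s})
    out = []
    for v in values:
        rmin, rmax = 0, 1
        for s, n in zip(summaries, sizes):
            preds = [t for t in s if t[0] <= v]
            succs = [t for t in s if t[0] >= v]
            rmin += preds[-1][1] if preds else 0
            rmax += (succs[0][2] if succs else n + 1) - 1
        out.append((v, rmin, rmax))
    return out

def prune(Q, B):
    """PRUNE per the slide: keep the entries answering the 0, 1/B, ..., 1
    quantiles (here: the entry whose rank-interval midpoint is nearest)."""
    n = Q[-1][2]
    kept = []
    for i in range(B + 1):
        r = (i / B) * n
        best = min(Q, key=lambda t: abs((t[1] + t[2]) / 2 - r))
        if best not in kept:
            kept.append(best)
    return kept

# Two exact (every-element) summaries merge into one with exact ranks.
A = [(1, 1, 1), (3, 2, 2)]
B_ = [(2, 1, 1)]
print(combine([A, B_]))  # -> [(1, 1, 1), (2, 2, 2), (3, 3, 3)]
```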

Slide 11: Combining Summaries
COMBINE({Q_i}): merge n quantile summaries into one; sort the entries and compute the merged rmin and rmax.
Theorem: Let Q = COMBINE({Q_i}); then |S| = Σ|S_i|, |Q| = Σ|Q_i|, and ε <= max(ε_i).
(Figure: merging three summaries with |S| = 300, ε = .01; |S| = 400, ε = .01; and |S| = 1000, ε = .02 yields a summary with |S| = 1700, ε = .02.)

Slide 12: Pruning a Summary
PRUNE(Q, B): reduce the number of entries in Q to B + 1.
Theorem: Let Q' = PRUNE(Q, B); then |Q'| = B + 1 and ε' <= ε + 1/(2B).
Derivation: each kept quantile's rank uncertainty grows to at most 2ε'|S| = (2ε + 1/B)|S|, hence ε' = ε + 1/(2B).
(Figure: the i-th kept quantile has rank in [((i/B) − ε)|S|, ((i/B) + ε)|S|]; adjacent targets are 1/B apart in quantile space.)

Slide 13: Naïve Approaches (continued)
1. Send all |S| values to the base station at the root. If the root has few children, some node incurs Ω(|S|) cost.
2. Sample each value with probability 1/(ε²|S|). Uniform sampling is hard, samples can be lost, and the guarantee is only probabilistic.
3. COMBINE all children's summaries, and PRUNE before sending to the parent. But how do you choose B? If the height h is known and small (say, log |S|), then B = h/ε and the cost is O(h/ε). But if the topology is deep (linear?), the cost can be O(|S|).
We need an algorithm that is impervious to pathological topologies.

Slide 14: Topology Independence
Intuition: PRUNE based on the number of observations, not on the topology.
- Call PRUNE only when |S| doubles; then the number of PRUNE operations is bounded by log(|S|).
- If the number of PRUNE operations is known, we can choose a value of B that is guaranteed never to violate the precision guarantee: with log(|S|) PRUNEs, B = log(|S|)/(2ε).
- If there are restrictions on when we can PRUNE, then the number/size of the summaries we transmit to our parent will be larger than B.

Slide 15: Algorithm
The class of a quantile summary Q' over observations S' is floor(log(|S'|)).
Algorithm (at each node):
- Receive all summaries from your children.
- COMBINE and PRUNE all summaries of each class with each other.
- Transmit the resulting summaries to your parent.
Analysis:
- There are at most log(|S|) classes, each pruned to B = log(|S|)/(2ε) entries.
- Total communication cost to the parent = log(|S|) × log(|S|)/(2ε).
Theorem: There is a distributed algorithm to compute an ε-approximate summary with a maximum transmission load of O(log²(|S|)/ε) at each node.
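Back-of-the-envelope arithmetic for the bound above, as a sketch with log base 2 assumed:

```python
import math

def per_node_load(n, eps):
    """At most log(n) classes in flight, each pruned to B = log(n)/(2*eps)
    entries, so a node transmits at most log(n) * B = log^2(n)/(2*eps)
    values to its parent."""
    B = math.log2(n) / (2 * eps)
    return math.log2(n) * B

print(per_node_load(2 ** 20, 0.01))  # about 20 * 1000 = 20000 values
```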

Slide 16: An Improved Algorithm
New algorithm: if we know h in advance, keep only the log(8h/ε) largest classes. Look only at the min and max of each deleted summary, and merge them into the largest class, introducing an error of at most ε/(2h).
Maximum transmission load is O((log(|S|) log(h/ε))/ε).
Details in the paper.

Slide 17: Computing Exact Order-Statistics
Initially (any ε can be chosen): lbound = −∞, hbound = +∞, offset = 0, rank = φ|S|, p = 1.
Each round:
- Query for (1) Q, an ε-approximate summary of the values in [lbound, hbound], and (2) offset, the exact rank of lbound.
- Set r = rank − offset; lbound = Q(r − ε^p|S|/4); hbound = Q(r + ε^p|S|/4); p = p + 1.
After at most p = log_{1/ε}(|S|) passes, Q is exact, and we can return the element with the target rank.
Theorem: There is a distributed algorithm to compute exact order-statistics with a maximum transmission load of O(log³(|S|)) values at each node.
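A quick sanity check of the pass count: the candidate interval shrinks by roughly a factor of ε per pass, so about log_{1/ε}(|S|) passes suffice. A sketch with illustrative numbers:

```python
def passes_needed(n, eps):
    """Count how many eps-fold shrinks reduce n candidates to a single one."""
    p, remaining = 0, n
    while remaining > 1:
        remaining = max(1, int(eps * remaining))
        p += 1
    return p

print(passes_needed(10 ** 6, 0.1))  # -> 6 (interval shrinks 10x per pass)
print(passes_needed(2 ** 20, 0.5))  # -> 20 (halving per pass)
```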

Slide 18: Practical Concerns
- The algorithm is amenable to general optimization techniques.
- Exact queries: a more precise ε increases message size, but reduces the number of passes.
- Evaluated in the simulator of Madden & Stanek, under worst-case assumptions for our algorithm.
(Figure: simulation results for summary and median queries.)

Slide 19: Related Work
- Power management: e.g. [Madden et al., SIGMOD '03]; QUERY LIFETIME, with the sample rate adjusted to meet a lifetime requirement.
- Multi-resolution histograms: [Hellerstein et al., IPSN '03]; wavelet histograms, no guarantees, multiple passes for finer resolution.
- Streaming data: [GK, SIGMOD '01]; [MRL: Manku et al., SIGMOD '98], average-case cost; [CM: Cormode & Muthukrishnan, LATIN '04], average-case cost, probabilistic, space blowup.
- Multi-pass algorithms for exact order-statistics, based on [Munro & Paterson, Theoretical Computer Science '80].

Slide 20: Conclusions
- Substantial improvement in worst-case per-node transmission cost over existing algorithms.
- Topology independent.
- In simulation, nodes consume less power than our worst-case analysis predicts.
- Some observations on streaming data vs. sensor networks follow.

Slide 21: Sensor Networks vs. Streaming Data
- If sensor values change slowly, multiple passes are possible.
- Uniform sampling of sensor networks is hard: density is unknown, and a single loss can eliminate an entire subtree.
- Sensor nets are inherently parallel; no single processor sees all values. To compare two disjoint subtrees, either (a) pay a cost in communication and memory, or (b) aggregate the values in one subtree before seeing the other, discarding information.
- The sensor-net model is strictly harder than streaming: the input at any node is a summary of prior data plus a new datum.

Slide 22: Questions?

Slide 23: Power Consumption of Nodes in a Sensor Net
- Sending a single bit costs 4000-4500 nJ, with no noticeable startup penalty.
- A 5 mW processor running at 4 MHz uses 4-5 nJ per instruction.
- Transmitting a single bit therefore costs as much as executing 800-1000 instructions.
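The slide's energy arithmetic, checked directly with the figures quoted above:

```python
def instructions_per_bit(bit_nj, instr_nj):
    """How many instructions one transmitted bit is worth in energy."""
    return bit_nj / instr_nj

print(instructions_per_bit(4000, 5))    # -> 800.0
print(instructions_per_bit(4500, 4.5))  # -> 1000.0, matching the 800-1000 range

# Sanity check on the per-instruction figure: a 5 mW CPU at 4 MHz spends
# 5e-3 / 4e6 J = 1.25 nJ per cycle, so 4-5 nJ per instruction corresponds
# to a few cycles per instruction.
print(5e-3 / 4e6 * 1e9)  # -> 1.25 nJ per cycle
```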

Slide 24: Optimizations
- Conservative pruning: discard a superfluous i/B quantile entry if rmax(i+1) − rmin(i−1) < 2ε|S|.
- Early combining: COMBINE and PRUNE summaries of different classes, as long as the sum of the |S_i| increases the class of the largest input summary.
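The conservative-pruning test can be sketched as follows (summary entries are (value, rmin, rmax) tuples; the function only decides discardability, it does not rebuild the summary):

```python
def superfluous(Q, i, eps, n):
    """Entry i can be discarded when its neighbours' ranks already bracket
    it within the precision budget: rmax(i+1) - rmin(i-1) < 2*eps*n.
    Endpoint entries are never discarded."""
    return 0 < i < len(Q) - 1 and Q[i + 1][2] - Q[i - 1][1] < 2 * eps * n

# Three tightly clustered entries out of n = 100 observations, eps = .1:
Q = [(10, 40, 42), (20, 45, 47), (30, 50, 52)]
print(superfluous(Q, 1, 0.1, 100))  # -> True, since 52 - 40 = 12 < 20
```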

