Tributaries and Deltas: Efficient and Robust Aggregation in Sensor Network Streams
Amit Manjhi, Suman Nath, Phillip B. Gibbons
Carnegie Mellon University, Intel Research Pittsburgh
@ Carnegie Mellon Databases Amit Manjhi, SIGMOD '05

Background: Sensors
– Battery-powered tiny devices
  – Used in ecosystem monitoring at James Reserve, habitat monitoring at Great Duck Island, etc.
– Constraints:
  – Conserving battery power is important
  – Communication consumes orders of magnitude more energy than local computation
  – Operate in dynamic, harsh environments

Background: Sensor networks
– An important type of query is computing aggregates, e.g., the total number of live sensors
– In-network aggregation is performed to save communication
[Figure: in-network computation of Count, with partial counts of 3]

Existing energy-efficient in-network approaches: Tree and Multi-path
– Tree [TinyDB, Cougar]
  + Exact answer
  - Non-robust topology
– Multi-path [Considine et al. ICDE '04]
  + Robust topology
  - Approximate answer

Tree and Multi-path Tradeoffs
– Tradeoff: robust topology vs. exact answer
– Loss rate varies with change in conditions
– Can we get the best of both by adapting to changing loss rates?

Our solution: Tributary-Delta
– Simultaneously run Tree and Multi-path in different parts of the network
– As energy-efficient as tree or multi-path
– Multi-path region adapts to loss rate
[Figure: Delta (multi-path region) and Tributary (tree region)]

Outline
– Background and motivation
– Tributary-Delta
– Simple aggregates in TD framework
– Frequent Items in TD framework
– Evaluation
– Related work and conclusion

How does Tributary-Delta work?
– Correctness: a tree node should not receive aggregates from a multi-path node
– This gives rise to a delta at the centre (multi-path aggregation is used in the nodes at the centre)
[Figure: Delta (multi-path region) at the centre, surrounded by Tributary (tree region) nodes marked T]

How does Tributary-Delta adapt?
– Expand or shrink the delta region
  – Expanding the delta increases robustness
  – Shrinking the delta lowers approximation error
– TD-Coarse: uniform expansion
– TD: focused expansion
[Figure: Delta and Tributary regions]

Computing Aggregates in the Tributary-Delta Framework
1. Each tree node runs the Tree Algorithm: generate tree partial results
2. Each multi-path node runs the Multi-path Algorithm: generate multi-path partial results
3. Nodes at the boundary apply the Conversion Function: convert tree results to multi-path results

Example Aggregates
– Many useful aggregates can be readily computed within the Tributary-Delta framework
  – Missing piece: a suitable conversion function
– We provide conversion functions for several aggregates
  – Count
  – Sum, Average
  – Top-k
  – Uniform sample

Computation of "Count"
1. Tree Algorithm is simple
2. Multi-path Algorithm [AMS STOC '96]
   a) Each node tosses a coin, e.g., T T T H: report 3
   b) Probability of obtaining i is proportional to 2^-i
   c) To combine multi-path partial values, take the maximum
   d) If the maximum value is i, the estimate is 2^i
3. Conversion function: on receiving count 3, repeat the "coin toss" 3 times and take the maximum
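
The three pieces of the Count computation above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' implementation: the function names (`coin_toss`, `tree_count`, `multipath_merge`, `convert`, `estimate`) are hypothetical, and the coin toss is modelled as "count tails before the first head", which matches the slide's "T T T H: report 3" example.

```python
import random


def coin_toss(rng):
    """Count tails before the first head.

    Returns i with probability 2**-(i+1), i.e. proportional to 2**-i.
    (Assumed reading of the slide's "T T T H: report 3".)
    """
    i = 0
    while rng.random() < 0.5:  # treat values below 0.5 as "tails"
        i += 1
    return i


def tree_count(child_counts):
    """Tree partial result: this node plus the exact counts of its children."""
    return 1 + sum(child_counts)


def multipath_merge(values):
    """Multi-path combine: the maximum, so duplicate copies change nothing."""
    return max(values)


def convert(tree_count_value, rng):
    """Conversion at the boundary: one coin toss per counted node, keep the max.

    Assumes tree_count_value >= 1 (a tree partial result always counts
    at least the sending node itself).
    """
    return max(coin_toss(rng) for _ in range(tree_count_value))


def estimate(max_value):
    """If the maximum value is i, the estimate is 2**i."""
    return 2 ** max_value
```

A boundary node that receives the tree count 3 would call `convert(3, rng)` and forward the result into the delta, where it is merged with other multi-path values by `multipath_merge`. The single-sketch estimate is coarse; this block only shows how the pieces connect.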

Outline
– Background and motivation
– Tributary-Delta
– Simple aggregates in TD framework
– Frequent Items in TD framework
– Evaluation
– Related work and conclusion

Finding Frequent Items
– Tree Algorithm:
  – Previous work [Greenwald, Khanna PODS '04; Manjhi et al. ICDE '05]
  – Our tree algorithm achieves optimal bound for total communication
– Multi-path Algorithm:
  – Previous work [Nath et al. SenSys '04]
  – Our multi-path algorithm is more accurate than previous work
– Conversion Function

Formal Problem Statement … Approximate Answers
– Formulate the problem as in [Manku, Motwani VLDB '02; Manjhi et al. ICDE '05]
– Example: find items that are more frequent than 1% with error 0.1%
[Figure: frequency counts with levels marked at 1% and 0.9%]

Framework for finding Freq. Items
1. Add frequency counts from children
2. Decrement frequency counts
3. Drop counters that are below zero
These steps are repeated at each internal node; decrements depend on height in the tree
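
The add / decrement / drop steps above can be sketched as a single per-node function. This is a minimal sketch, not the paper's algorithm: the name `combine_at_node` is hypothetical, the decrement is given as an absolute count rather than a percentage, and counters that reach exactly zero are dropped along with negative ones (an assumption, since a zero counter carries no information).

```python
from collections import Counter


def combine_at_node(child_summaries, own_items, decrement):
    """Apply the three framework steps at one node.

    child_summaries: list of {item: count} dicts received from children
    own_items:       iterable of items observed locally at this node
    decrement:       amount subtracted from every counter at this height
                     (assumed to be an absolute count for simplicity)
    """
    # Step 1: add frequency counts from children (and local readings).
    total = Counter(own_items)
    for summary in child_summaries:
        total.update(summary)  # Counter.update adds counts

    # Steps 2 and 3: decrement every counter, then drop the ones that
    # are no longer positive (assumption: zero is dropped too).
    result = {}
    for item, count in total.items():
        remaining = count - decrement
        if remaining > 0:
            result[item] = remaining
    return result
```

For example, two leaves that observe `["a", "a", "a", "b"]` and `["a", "c", "c"]` with a decrement of 1 send `{"a": 2}` and `{"c": 1}` upward; a parent that observes one more `"a"` and also decrements by 1 ends with `{"a": 2}`. The reported counts never exceed the true counts, and infrequent items disappear early.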

How much to decrement at different levels?
Need to balance two competing pressures:
1. Early reduction of data (near leaf)
2. Informed reduction of data (near root)
– Geometric decrease in decrement, e.g.: 0.5%, 0.25%, 0.125%, … (cumulative: 0.5%, 0.75%, 0.875%, …)
[Figure: Error (Exact to Max possible error) vs. Height (Leaf to Root); labels "Late Drop", "Early Drop", "Minimizes communication on any link", "Minimizes total communication", "=0.1%"]
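
The arithmetic behind the geometric example can be checked with a short sketch. It assumes a halving ratio and simply reproduces the slide's two series (per-level decrements and their running totals); the function names are hypothetical, and the values are percentages expressed as exact fractions.

```python
from fractions import Fraction


def geometric_decrements(first, levels, ratio=Fraction(1, 2)):
    """Per-level decrements: first, first*ratio, first*ratio**2, ..."""
    return [first * ratio ** h for h in range(levels)]


def cumulative(decrements):
    """Running total of the decrements applied so far."""
    totals, running = [], Fraction(0)
    for d in decrements:
        running += d
        totals.append(running)
    return totals


# Values in percent, matching the slide's example series.
steps = geometric_decrements(Fraction(1, 2), 3)  # 0.5, 0.25, 0.125
totals = cumulative(steps)                       # 0.5, 0.75, 0.875
```

Because the decrements halve at each level, the running total never reaches twice the first decrement, however tall the tree is. That is the property a geometric schedule offers: most of the reduction happens at the first levels while the accumulated error stays bounded.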

Multi-path Algorithm for Freq. Items
– Tree framework steps: 1. Add; 2. Decrement; 3. Drop counters below zero
– Multi-path counterparts: "Add" corresponds to duplicate-insensitive addition; "Decrement" corresponds to duplicate-insensitive subtraction
– Multi-path step 2: drop counters below a (rising) threshold
– Threshold is maintained based on careful analysis
– Paper has details on lowering communication

Outline
– Background and motivation
– Tributary-Delta
– Simple aggregates in TD framework
– Frequent Items in TD framework
– Evaluation
– Related work and conclusion

Evaluation Methodology
– The TAG Simulator [Madden et al. OSDI '02]
– Topology: 600 randomly placed sensors in a 20 x 20 area
  – Base station is at the center
– Approaches:
  – Tree-based scheme: TAG
  – Multi-path scheme: Synopsis Diffusion [Nath et al. SenSys '04]
  – TD-Coarse: uniform expansion
  – TD: focused expansion

Effects of regional loss rate
– All four approaches use the same energy
[Figures: panels "Loss rate = 0.05" and "Varying loss rate"]

Effects of global loss rate
1. Our methods effectively combine the benefits: they perform better than either existing approach
2. All four approaches use the same energy
[Figure: panel "Varying loss rate"]

Computation of frequent items
– Data from real sensor deployment
– False positives < 3%
[Figures: panels "Loss rate = 0.05" and "Varying loss rate"]

Other results in paper
– Adaptation details
– Tree construction algorithm that reduces communication
– 2-approximation for total and maximum load, and extension to quantiles
– More extensive evaluation

Related Work
– Existing in-network aggregation algorithms
  – Tree: TinyDB [Madden et al. SIGMOD '03]
  – Multi-path: Considine et al. ICDE '04, Bawa et al. SIGMOD '04, Nath et al. SenSys '04
– Adapting to changes in the environment
  – Directed Diffusion [Intanagonwiwat et al. MobiCom '00], TAG [Madden et al. OSDI '02]
– Frequent items and quantiles
  – Manku, Motwani VLDB '02, Greenwald, Khanna PODS '04, Manjhi et al. ICDE '05

Conclusion
– Tributary-Delta: an energy-efficient and robust solution
  – Combines the benefits of existing tree-based and multi-path-based approaches
  – Adapts to changing network conditions
– Algorithms for finding frequent items
– Results confirm the advantages
  – Error reduction is up to a factor of 3

Future Work
– Deployment in a real scenario: incorporate into TinyDB
– Add other aggregates to the suite of aggregates

Back-up slides!

Adaptation Details
– Ask the application for a threshold on the percentage of nodes contributing
– Base station gets overall numbers on % contributing
  – % contributing < threshold: increase the delta region
  – % contributing > threshold: decrease the delta region
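
The threshold rule can be written as a small decision function. This is a sketch under assumptions, not the paper's adaptation protocol: it takes the reading that a low percentage contributing calls for a larger delta (since expanding the delta increases robustness), and the `slack` parameter and the function name `adapt_delta` are hypothetical additions to avoid oscillating around the threshold.

```python
def adapt_delta(percent_contributing, threshold, slack=0.0):
    """Decide how the base station should resize the delta region.

    percent_contributing: observed % of nodes contributing to the answer
    threshold:            application-supplied minimum acceptable %
    slack:                hypothetical margin above the threshold before
                          shrinking (not taken from the slides)
    """
    if percent_contributing < threshold:
        return "expand"  # too many readings lost: favour robustness
    if percent_contributing > threshold + slack:
        return "shrink"  # comfortably above target: favour exactness
    return "hold"
```

With a threshold of 90, an observed 80% yields `"expand"` and an observed 95% yields `"shrink"`; with `slack=5`, an observed 92% yields `"hold"`.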

Tree Construction Algorithm – (1/2)
– Tree links are a subset of ring links
– Avoids expensive synchronization
[Figure: rings around the base station, with "Ring 2" labelled]

Tree Construction Algorithm – (2/2)
Opportunistic parent switching:
– Each node of height i+1 should have at least 2 children of height i
– Each node of height i+1 pins any two of its height-i children, and then flags itself
– Any non-pinned node can switch its parent to a non-flagged node
[Figure: rings around the base station, with "Ring 2" labelled]

Multi-path over Rings
– A node is in ring i if it is i hops away from the base station
– Broadcasts by nodes in ring i are received by nodes in ring i-1
– Each node transmits once = optimal energy cost (same as Tree)
[Figure: rings around the base station, with "Ring 2" labelled]
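
The ring definition above amounts to a breadth-first search from the base station. The sketch below assumes the network is given as an adjacency dictionary of radio neighbours; `assign_rings` and `receivers` are hypothetical names used only for illustration.

```python
from collections import deque


def assign_rings(neighbors, base_station):
    """Return {node: ring}, where ring is the hop count to the base station.

    neighbors: dict mapping each node to an iterable of nodes in radio range
    """
    ring = {base_station: 0}
    queue = deque([base_station])
    while queue:
        u = queue.popleft()
        for v in neighbors.get(u, ()):
            if v not in ring:  # first visit gives the shortest hop count
                ring[v] = ring[u] + 1
                queue.append(v)
    return ring


def receivers(neighbors, ring, node):
    """Neighbours one ring closer to the base station.

    These are the nodes that use a broadcast sent by `node`.
    """
    return [v for v in neighbors.get(node, ()) if ring.get(v) == ring[node] - 1]
```

A node in ring 2 with two neighbours in ring 1 reaches both with a single broadcast, which is why multi-path over rings costs one transmission per node, the same as a tree.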

A 2-approximation Solution
– 2-approximation on both objectives
[Figure: Error (Exact to Max possible error) vs. Height (Leaf to Root); labels "Late Drop", "Early Drop", "Minimizes communication on any link", "Minimizes total communication"]

Minimizing total communication for quantiles
– Original algorithm by Greenwald, Khanna PODS '04
– Vary the size of quantiles in a geometric pattern, and the total communication is linear in the number of sensor nodes

Extensive Evaluation
– Evaluation of our frequent items tree algorithm
– Evaluation of our frequent items multi-path algorithm
– How quickly do TD and TD-Coarse respond to changes in loss rates?