Medians and Beyond: New Aggregation Techniques for Sensor Networks
CS851 Seminar Presentation
By: Gang Zhou, Computer Science Department, University of Virginia
Outline
- Motivations, State of Art, Contributions
- The Q-Digest Scheme
- Queries on Q-Digest
- Experimental Evaluation
- Conclusions
Be prepared! I have questions for you!
Motivations
- Trade computation for communication: transmitting one bit over the radio costs at least three orders of magnitude more energy than executing a single instruction.
- Support aggregation queries, which need an aggregated answer rather than a single raw reading:
  - Quantile query: given a rank N, find the N-th value.
  - Reverse quantile query: given a value, find its rank N.
  - Consensus query: which value is most frequent?
  - Histogram
State of Art
- TinyDB project at Berkeley and Cougar project at Cornell
  - Pros: energy-efficient in-network data aggregation; work very well for singleton sensor values (MIN, MAX, AVERAGE, SUM, COUNT).
  - Cons: do not deal with complex aggregate measures (median, quantile, reverse quantile, consensus).
- [Zhao et al. 2003]: algorithms for constructing summaries like MAX and AVG; focus more on network monitoring and maintenance.
- [Przydatek et al. 2003]: secure aggregation.
Contributions
- Propose Q-Digest for approximate aggregation.
- Provide strict theoretical guarantees on the approximation quality of queries in terms of the message size.
- Evaluate the performance of Q-Digest in simulation.
Roadmap
- Motivations, State of Art, Contributions
- The Q-Digest Scheme
- Queries on Q-Digest
- Experimental Evaluation
- Conclusions and Discussions
Properties of Q-Digest
- Each node v in the tree T is a bucket:
  - its range [v.min, v.max] defines the position and width of the bucket;
  - it has a counter count(v).
- Given the compression parameter K, a node v is in the q-digest iff it satisfies:
  (1) if it is not a leaf, it does not have a high count;
  (2) if it is not the root, the node and its children do not together have a low count.
- A q-digest is a set of buckets of different sizes and their associated counts.
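The two digest properties can be sketched as a small check. This is a minimal sketch under stated assumptions, not the presenter's code: the threshold floor(n/K), the heap-style node numbering (root = 1, children 2i and 2i+1), and the function name `satisfies_digest_property` are all assumptions made for illustration. Property (2) is written from the child's point of view (node + parent + sibling), which is the same triple as "a parent and its two children".

```python
def satisfies_digest_property(counts, node, n, k, sigma):
    """Check properties (1) and (2) for one node.

    Assumptions (not from the slides): nodes are numbered heap-style
    (root = 1, children of i are 2i and 2i+1), sigma is a power of two,
    leaves are ids sigma .. 2*sigma-1, and the threshold is floor(n/k).
    counts maps node id -> count; missing nodes count as 0.
    """
    threshold = n // k
    is_leaf = node >= sigma
    is_root = node == 1
    c = counts.get(node, 0)

    # Property (1): a non-leaf node must not have a high count.
    if not is_leaf and c > threshold:
        return False

    # Property (2): for a non-root node, the node together with its
    # parent and sibling must not have a low combined count.
    if not is_root:
        parent = node // 2
        sibling = node ^ 1  # flips the last bit: 2i <-> 2i+1
        combined = c + counts.get(parent, 0) + counts.get(sibling, 0)
        if combined <= threshold:
            return False
    return True


# Hypothetical example: n = 15 readings, K = 5, values in 0..7 (sigma = 8).
example = {2: 5, 8: 4, 13: 1}
for v in (1, 2, 8, 13):
    print(v, satisfies_digest_property(example, v, n=15, k=5, sigma=8))
```

The leaf exemption from (1) and the root exemption from (2) are what the later space-complexity slides ask about.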
Building a Q-Digest
- Go bottom-up through the tree, checking whether any node violates digest property (2).
- If a node does, delete it and its sibling, and merge their counts into the parent.
- Key feature of q-digest: detailed information about frequently occurring data values is preserved in the digest, while less frequently occurring values are lumped into larger buckets, resulting in information loss.
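The bottom-up pass described above might look like the following. It is a sketch only: the heap-style numbering (root = 1, children 2i and 2i+1), the threshold floor(n/K), the 0-based value range, and the name `compress` are assumptions for illustration, and the input data is made up.

```python
def compress(counts, n, k, sigma):
    """Bottom-up compression of a q-digest (illustrative sketch).

    Assumptions: heap-style node ids (root = 1), sigma is a power of two,
    leaf for value x has id sigma + x, threshold is floor(n/k).
    Returns a new dict node id -> count; the input is not modified.
    """
    threshold = n // k
    q = {v: c for v, c in counts.items() if c > 0}

    # In heap numbering deeper nodes have larger ids, so walking the ids
    # downwards visits the tree bottom-up. The root (id 1) is skipped
    # because property (2) does not apply to it.
    for node in range(2 * sigma - 1, 1, -1):
        if node not in q:
            continue
        parent, sibling = node // 2, node ^ 1
        combined = q[node] + q.get(sibling, 0) + q.get(parent, 0)
        if combined <= threshold:
            # Property (2) is violated: fold node and sibling into the parent.
            q[parent] = combined
            q.pop(node, None)
            q.pop(sibling, None)
    return q


# Hypothetical input: 15 readings over values 0..7, K = 5.
leaf_counts = [1, 1, 4, 6, 1, 1, 1, 0]
raw = {8 + value: c for value, c in enumerate(leaf_counts)}
digest = compress(raw, n=15, k=5, sigma=8)
print(sorted(digest.items()))
```

In this made-up input the two frequent values (2 and 3) keep their own leaf buckets, while the rare values end up in two wide buckets, which is the "key feature" stated above.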
Merging Q-Digest
- A parent node merges Q1(n1, K) and Q2(n2, K) received from its children.
- How about merging Q1(n1, K1) and Q2(n2, K2)?
  - Each node has a different communication ability.
  - Each node has a different power level.
  - A powerful node can have a bigger K, while a less powerful node can have a smaller K.
  - Can we still get the same accuracy? Is that feasible?
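For the equal-K case, one plausible way to merge is to add the counts of matching buckets and then re-run the compression with n = n1 + n2. The sketch below assumes exactly that; the heap-style numbering, the threshold floor(n/K), and the names `compress` and `merge` are illustrative assumptions. It does not address the open question of merging digests built with different K values.

```python
def compress(counts, n, k, sigma):
    """Bottom-up compression (sketch; heap ids, threshold floor(n/k))."""
    threshold = n // k
    q = {v: c for v, c in counts.items() if c > 0}
    for node in range(2 * sigma - 1, 1, -1):  # larger id = deeper node
        if node not in q:
            continue
        parent, sibling = node // 2, node ^ 1
        combined = q[node] + q.get(sibling, 0) + q.get(parent, 0)
        if combined <= threshold:
            q[parent] = combined
            q.pop(node, None)
            q.pop(sibling, None)
    return q


def merge(q1, n1, q2, n2, k, sigma):
    """Merge two digests built with the same K (assumed procedure).

    Step 1: take the union of buckets, adding counts of identical buckets.
    Step 2: compress the union using the combined total n1 + n2.
    Returns (merged digest, n1 + n2).
    """
    union = dict(q1)
    for node, count in q2.items():
        union[node] = union.get(node, 0) + count
    n = n1 + n2
    return compress(union, n, k, sigma), n


# Hypothetical digests from two children, both with K = 5 and sigma = 8.
left = {2: 2, 3: 3, 10: 4, 11: 6}   # summarizes 15 readings
right = {3: 2, 10: 5}               # summarizes 7 readings
merged, total = merge(left, 15, right, 7, k=5, sigma=8)
print(total, sorted(merged.items()))
```

Because the threshold grows with n1 + n2, buckets that were fine in a child's digest can become mergeable at the parent, so the compression step after the union is not optional.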
Space Complexity and Error Bound (1/4)
- What does 3K mean? 3K bits?
- The root node does not satisfy property (2)?
- 3K means 3K pairs.
Space Complexity and Error Bound (2/4)
- What about the leaf node, which does not satisfy property (1)?
- It doesn’t matter, because a leaf node is not the ancestor of any node.
Space Complexity and Error Bound (3/4)
Space Complexity and Error Bound (4/4)
Representation of a Q-Digest
- To transmit the q-digest, we send a set of tuples, one per bucket; each tuple requires a fixed number of bits.
Roadmap
- Motivations, State of Art, Contributions
- The Q-Digest Scheme
- Queries on Q-Digest
- Experimental Evaluation
- Conclusions and Discussions
Quantile Query (1/3)
- Quantile query: given a fraction 0 < q < 1, find the value whose rank in the sorted sequence of the n values is qn.
- Answering the query:
  - Sort the nodes in the q-digest by increasing v.max, breaking ties by putting smaller ranges first.
  - Scan the sorted list, adding up the counts of the nodes.
  - At some node v the running sum exceeds qn; report v.max as the estimate of the quantile.
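The sort-and-scan procedure above can be sketched as follows. The heap-style node numbering (root = 1, children 2i and 2i+1), the 0-based value range 0..sigma-1, and the helper names `node_range` and `quantile` are assumptions for illustration; the digest used is hypothetical.

```python
def node_range(node, sigma):
    """Return (v.min, v.max) for a heap-numbered node.

    Assumptions: root = 1 covers 0..sigma-1, sigma is a power of two.
    """
    depth = node.bit_length() - 1          # root is at depth 0
    height = sigma.bit_length() - 1        # log2(sigma)
    width = 1 << (height - depth)          # number of values in the bucket
    lo = (node - (1 << depth)) * width
    return lo, lo + width - 1


def quantile(digest, n, q, sigma):
    """Estimate the value of rank q*n from a q-digest (dict id -> count)."""
    def sort_key(node):
        lo, hi = node_range(node, sigma)
        return (hi, hi - lo)               # by v.max, smaller ranges first

    running = 0
    ordered = sorted(digest, key=sort_key)
    for node in ordered:
        running += digest[node]
        if running > q * n:                # the sum becomes more than qn
            return node_range(node, sigma)[1]
    # Only reached if the counts do not add up to n; fall back to the last bucket.
    return node_range(ordered[-1], sigma)[1]


# Hypothetical digest over values 0..7 summarizing n = 15 readings.
digest = {2: 2, 3: 3, 10: 4, 11: 6}
print(quantile(digest, n=15, q=0.5, sigma=8))   # median estimate
```

Putting smaller ranges first on ties means the narrow, precise buckets are counted before the wide ancestor buckets that share the same upper end.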
Quantile Query (2/3)
The confidence factor
- Why is it needed? The theoretical error bound is a worst-case estimate, which is reached only for very pathological inputs.
- What is it? The confidence factor is defined as (maximum weight of any path from root to leaf in Q)/n.
Confidence Factor Example
- n = 15, K = 5
- Confidence factor = (maximum weight of any path from root to leaf in Q)/n = 7/15
- Worst-case error bound = 3 * log2(8) / (3K) = (3*3)/(3*5) = 9/15
- So 7/15 < 9/15: the confidence factor is tighter than the worst-case bound.
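The comparison above can be reproduced mechanically. The sketch below assumes heap-style node numbering (root = 1, parent of i is i // 2) and uses a hypothetical digest, not the one from the slide's figure, so its confidence factor differs from 7/15; the function names are illustrative. The worst-case bound uses the same expression as the slide, 3 * log2(sigma) / m with message size m = 3K.

```python
from fractions import Fraction


def confidence_factor(digest, n):
    """(maximum weight of any root-to-leaf path in the digest) / n.

    Assumption: heap-style ids, so the ancestors of a node are found by
    repeated integer halving. The heaviest root-to-leaf path always ends
    at some bucket stored in the digest, so it is enough to walk up from
    each stored bucket.
    """
    def path_weight(node):
        weight = 0
        while node >= 1:
            weight += digest.get(node, 0)
            node //= 2
        return weight

    return Fraction(max(path_weight(v) for v in digest), n)


def worst_case_bound(sigma, m):
    """Worst-case error bound 3 * log2(sigma) / m (sigma a power of two)."""
    log_sigma = sigma.bit_length() - 1
    return Fraction(3 * log_sigma, m)


# Hypothetical digest: n = 15, K = 5, sigma = 8, message size m = 3K = 15.
digest = {2: 2, 3: 3, 10: 4, 11: 6}
theta = confidence_factor(digest, n=15)
bound = worst_case_bound(sigma=8, m=15)
print(theta, bound, theta < bound)
```

Exact fractions are used so the two ratios can be compared without floating-point rounding.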
Roadmap
- Motivations, State of Art, Contributions
- The Q-Digest Scheme
- Queries on Q-Digest
- Experimental Evaluation
- Conclusions and Discussions
Performance Evaluation Settings
- Routing tree: breadth-first search tree
- Sensor field:
  - 1000 x 1000 area with 1000 sensor nodes
  - 2000 x 2000 area with 4000 sensor nodes
- Sensor values:
  - Random
  - Correlated: United States Geological Survey
- Compared with the List scheme: report all (value, count) pairs back to the base station, with no in-network aggregation.
Error and Message Size
- A 160-byte message size achieves 5% error.
- A 400-byte message size achieves 2% error.
Total Data Transmission
- Q-digest transmits less data than List.
- Random input needs more transmission than correlated input.
Residual Power
- For every byte transmitted, one unit of power is depleted. (How about reception?)
- In List, 0.02% of nodes have a residual power fraction less than ½. (???)
Conclusions
- Propose Q-Digest for approximate aggregation.
- Provide strict theoretical guarantees on the approximation quality of queries in terms of the message size.
- Evaluate the performance of Q-Digest in simulation.
Thank you!