

MBG 1 PODS 04, June 2004 Power Conserving Computation of Order-Statistics over Sensor Networks Michael B. Greenwald & Sanjeev Khanna Dept. of Computer & Information Science University of Pennsylvania

MBG 2 PODS 04, June 2004 Sensor Networks Cheap => plentiful, large scale networks. Low power => long life. Easily deployed, wireless => hazardous/inaccessible locations. Base station w/connection to outside world => info flows to station & we can extract data. Self-configuring => just deploy, and the network sets itself up.

MBG 3 PODS 04, June 2004 Management & Use of Sensor Nets Power Consumption and Network Lifetime Battery replacement is difficult. Network lifetime may be a function of worst-case battery life. Dominant cost is the per-byte cost of transmission. Minimize worst-case power consumption at any node, maximize network lifetime.

MBG 4 PODS 04, June 2004 Management & Use of Sensor Nets Primary operation is data extraction Declarative DB-like interface Issue query, answers stream towards basestation Not necessarily consistent with power concerns COUNT, MIN, MAX easy to optimize: aggregate in network.

MBG 5 PODS 04, June 2004 Example of simple aggregate query: MAX Primary operation is data extraction. Aggregate in network: transmit only the max observation of all your children, O(1) per node. But what about complicated, but natural, queries, e.g. MEDIAN?

MBG 6 PODS 04, June 2004 Richer queries: Order-statistics
Exact order-statistic query: φ-quantile(S, φ) returns the element with rank r = φ|S|.
Approximate order-statistic query: approx-φ-quantile(S, φ, ε) returns an element with rank r, (φ − ε)|S| <= r <= (φ + ε)|S|.
Quantile summaries query: quantile-summary(S, ε) returns a summary Q, such that φ-quantile(Q, φ) returns an element with rank r, (φ − ε)|S| <= r <= (φ + ε)|S|.
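The three query types above can be sketched directly. This is a minimal illustration; `exact_quantile`, `rank_of`, and `is_valid_approx_answer` are hypothetical helper names, not from the paper.

```python
def exact_quantile(S, phi):
    """phi-quantile(S, phi): the element with rank r = phi * |S|."""
    r = max(1, round(phi * len(S)))
    return sorted(S)[r - 1]

def rank_of(S, v):
    """1-indexed rank of value v in S (assumes distinct values)."""
    return sorted(S).index(v) + 1

def is_valid_approx_answer(S, phi, eps, v):
    """True if v's rank r satisfies (phi - eps)|S| <= r <= (phi + eps)|S|."""
    r = rank_of(S, v)
    return (phi - eps) * len(S) <= r <= (phi + eps) * len(S)
```

For S = {1, ..., 100}, the median query (φ = 0.5) with ε = 0.1 must return an element whose rank lies in [40, 60], so 44 is an acceptable answer while 30 is not.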

MBG 7 PODS 04, June 2004 Goals Respond accurately to quantile queries Balance power consumption over network (load on any two nodes differs by at most a poly-log factor) Topology independent Structure solution in a manner amenable to standard optimizations: e.g. non-holistic/decomposable

MBG 8 PODS 04, June 2004 Naïve approaches
1. Send all |S| to basestation at root – If root has few children, then Ω(|S|) cost
2. Sample w/probability 1/(|S|ε²) – Uniform sampling – Lost samples – Only probabilistic “guarantee”
Some way to handle summaries in such a way that they can be aggregated in the network?

MBG 9 PODS 04, June 2004 Preliminaries: Representation
Record min possible rank (rmin) and max possible rank (rmax) for each stored value.
Proposition: If for all 1 <= j < |Q|, rmax(j+1) − rmin(j) <= 2ε|S|, then Q is an ε-approximate summary.
Example (from the slide figure): ε = .1, |S| = 100. Median = Q(S, .5): return an entry with rank r = 50, acceptable range (.5 − ε)·100 = 40 to (.5 + ε)·100 = 60. The summary returns the tuple with value 44 and rank bounds rmin = 51, rmax = 54, so 40 <= 51 <= rank(44) <= 54 <= 60. In general, return a tuple with r − ε|S| <= rmin and rmax <= r + ε|S|; the proposition guarantees one exists.

MBG 10 PODS 04, June 2004 Preliminaries: COMBINE and PRUNE Operations
COMBINE: merge n quantile summaries into one. PRUNE: reduce the number of entries in Q to B+1.
Q = COMBINE({Q i }):
– Sort entries, compute new rmin and rmax for each entry.
– Let the predecessor set for a given entry (v, rmin, rmax) consist of the max entry v’ from each input summary such that v’ <= v; the successor set consists of the min entry v’ from each input summary such that v’ >= v.
– rmin = Σ rmin’ for v’ in the predecessor set
– rmax = 1 + Σ (rmax’ − 1) for v’ in the successor set
Q’ = PRUNE(Q, B):
– Query for each of the [0, 1/B, 2/B, …, 1] quantiles from Q, and return that set as Q’.
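The two operations can be sketched as follows, assuming each input is a pair (Q, n): a value-sorted list of (value, rmin, rmax) triples plus its observation count. The edge-case conventions (an absent predecessor contributes 0; an absent successor contributes all n_i observations) and the midpoint-based entry selection in `prune` are my simplifications, not the paper's exact formulation.

```python
import bisect

def combine(inputs):
    """COMBINE a list of quantile summaries into one.

    For each value v appearing in any input:
      rmin = sum of rmin' over the predecessor set (largest entry <= v
             in each input), and
      rmax = 1 + sum of (rmax' - 1) over the successor set (smallest
             entry >= v in each input)."""
    values = sorted({v for Q, _ in inputs for v, _, _ in Q})
    merged = []
    for v in values:
        rmin, rmax = 0, 1
        for Q, n in inputs:
            keys = [e[0] for e in Q]
            i = bisect.bisect_right(keys, v)            # entries <= v
            rmin += Q[i - 1][1] if i > 0 else 0
            j = bisect.bisect_left(keys, v)             # entries >= v
            rmax += (Q[j][2] - 1) if j < len(Q) else n
        merged.append((v, rmin, rmax))
    return merged

def prune(Q, n, B):
    """PRUNE to at most B+1 entries: keep one entry for each of the
    quantiles 0, 1/B, ..., 1 (chosen here by closest rank midpoint)."""
    out = []
    for k in range(B + 1):
        r = k / B * n
        best = min(Q, key=lambda e: abs((e[1] + e[2]) / 2 - r))
        if not out or out[-1] != best:
            out.append(best)
    return out
```

Combining two exact summaries of the interleaved sets {1, 3, 5, 7} and {2, 4, 6, 8} yields exact rank bounds (v, v, v) for every value, which matches the intuition that COMBINE loses no precision.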

MBG 11 PODS 04, June 2004 Combining summaries
COMBINE({Q i }): merge n quantile summaries into one – Sort entries, and compute “merged” rmin and rmax.
Theorem: Let Q = COMBINE({Q i }); then |S| = Σ|S i |, |Q| = Σ|Q i |, and ε <= max(ε i ).
(Slide figure: summaries with |S| = 300, ε = .01; |S| = 400, ε = .01; and |S| = 1000, ε = .02 are merged into a single summary with |S| = 1700, ε = .02.)

MBG 12 PODS 04, June 2004 Pruning Summary
PRUNE(Q, B): reduce the number of entries in Q to B+1.
Theorem: Let Q’ = PRUNE(Q, B); then |S| = Σ|S i |, |Q’| = B+1, and ε’ <= ε + 1/(2B).
– 2ε’|S| = (2ε + 1/B)|S|
– ε’ = ε + 1/(2B)
(Slide figure: pruning a summary with |S| = 1700, ε = .02; each kept entry answers the k/B quantile to within [(k/B) − ε’, (k/B) + ε’].)

MBG 13 PODS 04, June 2004 Naïve approaches
1. Send all |S| to basestation at root – If root has few children, then Ω(|S|) cost
2. Sample w/probability 1/(|S|ε²) – Uniform sampling – Lost samples – Only probabilistic “guarantee”
3. COMBINE all children, and PRUNE before sending to parent. – How do you choose B? – If h is known and small (say, log |S|), then B = h/ε and this has cost O(h/ε) – But if the topology is deep (linear?), this can be O(|S|).
Need an algorithm that is impervious to pathological topologies.

MBG 14 PODS 04, June 2004 Topology Independence
Intuition: PRUNE based on the number of observations, not on the topology. Only call PRUNE when |S| doubles; then the number of PRUNE operations is bounded by log(|S|).
If the number of PRUNE operations is known, then we can choose a value of B that is guaranteed never to violate the precision guarantee. If there are log(|S|) PRUNEs, then B = log(|S|)/(2ε).
If there are restrictions on when we can perform PRUNE, then the number/size of the summaries we transmit to our parents will be larger than B.

MBG 15 PODS 04, June 2004 Algorithm
The class of a quantile summary Q’ is floor(log(|S’|)).
Algorithm: – Receive all summaries from children. – COMBINE and PRUNE all summaries of each class with each other. – Transmit the resulting summaries to your parent.
Analysis: – At most log(|S|) classes – B = log(|S|)/(2ε) – Total communication cost to parent = log(|S|) · log(|S|)/(2ε).
Theorem: There is a distributed algorithm to compute an ε-approximate summary with a maximum transmission load of O(log²(|S|)/(2ε)) at each node.
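The class bookkeeping above can be sketched in isolation. Summaries are abstracted here to just their observation counts (COMBINE adds counts, and a merge can promote the result to a higher class), so `forward` is an illustrative model of the forwarding rule, not the full algorithm; each forwarded summary would carry B + 1 entries with B = log2(|S|)/(2ε).

```python
import math
from collections import defaultdict

def forward(child_counts):
    """Model one node's round: bucket incoming summaries by
    class = floor(log2(n)), repeatedly merge all summaries sharing a
    class (sizes add, possibly promoting the result to a higher class),
    and return the counts of the summaries transmitted to the parent.
    At most one summary per class survives, i.e. at most log2(|S|)."""
    by_class = defaultdict(list)
    for n in child_counts:
        by_class[int(math.log2(n))].append(n)
    changed = True
    while changed:
        changed = False
        for c in sorted(by_class):
            group = by_class[c]
            if len(group) > 1:
                merged = sum(group)                      # COMBINE: sizes add
                by_class[c] = []
                by_class[int(math.log2(merged))].append(merged)
                changed = True
                break
    return sorted(n for g in by_class.values() for n in g)
```

For example, four singleton summaries (class 0) merge into one summary of 4 observations (class 2), while summaries of classes 1 and 4 pass through untouched because no class ever holds two summaries.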

MBG 16 PODS 04, June 2004 An Improved Algorithm
New Algorithm: If we know h in advance, then keep only the log(8h/ε) largest classes. Look only at the min and max of each deleted summary, and merge them into the largest class, introducing an error of at most ε/(2h).
Maximum transmission load is O((log(|S|) · log(h/ε))/ε).
Details in paper…

MBG 17 PODS 04, June 2004 Computing Exact Order-Statistics
Initially (can choose any ε): lbound = −∞, hbound = +∞, offset = 0, rank = φ|S|, p = 1.
Each round: – query for 1. Q = an ε-approximate summary of [lbound, hbound] 2. offset = exact rank of lbound. – r = rank − offset; lbound = Q(r − ε^p·|S|/4); hbound = Q(r + ε^p·|S|/4); p = p + 1.
After at most p = log 1/ε (|S|) passes, Q is exact, and we can return the element with rank rank.
Theorem: There is a distributed algorithm to compute exact order statistics with a maximum transmission load of O(log³(|S|)) values at each node.
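The round structure can be sketched with a centralized stand-in (assuming distinct values): the distributed ε-approximate summary of the window is replaced by direct access to S, and the slack constant is illustrative, so this shows only how the bounds tighten around the target rank each pass.

```python
def exact_order_statistic(S, phi, eps=0.25):
    """Multi-round narrowing to an exact phi-quantile (distinct values).

    Each pass plays one round: restrict attention to [lbound, hbound],
    recompute the exact rank offset of lbound, and tighten the bounds
    to a small slack around the target rank, until the window is small
    enough to resolve exactly."""
    target = max(1, round(phi * len(S)))
    lbound, hbound = float("-inf"), float("inf")
    while True:
        window = sorted(v for v in S if lbound <= v <= hbound)
        offset = sum(1 for v in S if v < lbound)   # elements below the window
        r = target - offset                        # target rank inside window
        if len(window) <= 1 / eps:                 # window small: answer exactly
            return window[r - 1]
        slack = max(1, int(eps * len(window) / 4))
        lbound = window[max(0, r - 1 - slack)]
        hbound = window[min(len(window) - 1, r - 1 + slack)]
```

Because each pass shrinks the window by roughly a factor of ε, the number of passes is logarithmic in |S|, mirroring the log 1/ε (|S|) bound on the slide.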

MBG 18 PODS 04, June 2004 Practical Concerns
Amenable to general optimization techniques.
Exact: a more precise ε increases message size, but reduces the number of passes.
Simulator of Madden & Stanek; worst-case assumptions for our algorithm.
(Slide figure: simulation results for summary and median queries.)

MBG 19 PODS 04, June 2004 Related Work
Power mgmt: e.g. [Madden et al, SIGMOD ‘03], QUERY LIFETIME, sample rate adjusted to meet lifetime requirement.
Multi-resolution histograms: [Hellerstein et al, IPSN ‘03], wavelet histograms, no guarantees, multiple passes for finer resolution.
Streaming data: [GK, SIGMOD ‘01], [MRL: Manku et al, SIGMOD ‘98] avg-case cost, [CM: Cormode, Muthukrishnan, LATIN ‘04] avg-case cost, probabilistic, space blowup.
Multi-pass algorithms for exact order-statistics, based on [Munro & Paterson, Theoretical Computer Science, ‘80].

MBG 20 PODS 04, June 2004 Conclusions
Substantial improvement in worst-case per-node transmission cost over existing algorithms.
Topology independent.
In simulation, nodes consume less power than our worst-case analysis predicts.
Some observations on streaming data vs. sensor networks follow…

MBG 21 PODS 04, June 2004 Sensor networks vs. Streaming data
If sensor values change slowly => multiple passes are feasible.
Uniform sampling of sensor networks is hard: unknown density, and loss can eliminate an entire subtree.
Sensor nets are inherently parallel; no single processor sees all values. To compare 2 disjoint subtrees, either (a) pay cost in communication and memory, or (b) aggregate values in one subtree before seeing the other --- discarding info.
The sensor net model is strictly harder than streaming: input at any node is a summary of prior data plus a new datum.

MBG 22 PODS 04, June 2004 Questions?

MBG 23 PODS 04, June 2004 Power Consumption of Nodes in Sensor Net
nJ to send a single bit. No noticeable startup penalty.
A 5 mW processor running at 4 MHz uses 4-5 nJ per instruction.
Transmitting a single bit costs as much as instructions.

MBG 24 PODS 04, June 2004 Optimizations
Conservative Pruning: Discard the superfluous i/B quantiles, if rmax(i+1) − rmin(i−1) < 2ε|S|.
Early Combining: COMBINE and PRUNE different classes, as long as the sum of the |S i | increases the class of the largest input summary.