Copyright ©2004 Carlos Guestrin VLDB 2004 Efficient Data Acquisition in Sensor Networks Presented By Kedar Bellare (Slides adapted from Carlos Guestrin)

Copyright ©2004 Carlos Guestrin VLDB 2004 Papers
A. Deshpande, C. Guestrin, S. Madden, J. Hellerstein, W. Hong. "Model-Driven Data Acquisition in Sensor Networks." In the 30th International Conference on Very Large Data Bases (VLDB 2004), Toronto, Canada, August 2004.
A. Deshpande, C. Guestrin, W. Hong, S. Madden. "Exploiting Correlated Attributes in Acquisitional Query Processing." In the 21st International Conference on Data Engineering (ICDE 2005), Tokyo, Japan, April 2005.

Copyright ©2004 Carlos Guestrin VLDB 2004 Model-driven Data Acquisition in Sensor Networks Amol Deshpande 1,4 Carlos Guestrin 4,2 Sam Madden 4,3 Joe Hellerstein 1,4 Wei Hong 4 1 UC Berkeley 2 Carnegie Mellon University 3 MIT 4 Intel Research - Berkeley

Copyright ©2004 Carlos Guestrin VLDB 2004 Analogy: Sensor net as a database
TinyDB: distribute an SQL-style query, then collect the query answer or data at every time step.
Declarative interface: sensor nets are not just for PhDs; decreases deployment time.
Data aggregation: can reduce communication.

Copyright ©2004 Carlos Guestrin VLDB 2004 Limitations of existing approach
TinyDB redoes the distribute-query / collect-data process every time the query changes.
Query distribution: every node must receive the query (even when an approximate answer would suffice).
Data collection: every node must wake up at every time step; data loss is ignored; no quality guarantees; data-inefficient, since correlations are ignored.

Copyright ©2004 Carlos Guestrin VLDB 2004 Sensor net data is correlated
Spatio-temporal correlation; inter-attribute correlation.
Data is not i.i.d., so we shouldn't ignore missing data.
Observing one sensor gives information about other sensors (and about future values).
Observing one attribute gives information about other attributes.

Copyright ©2004 Carlos Guestrin VLDB 2004 Model-driven data acquisition: overview
An SQL-style query with a desired confidence goes to a probabilistic model; the model produces a data-gathering plan, conditions on the new observations, and maintains a posterior belief that can be reused by new queries.
Strengths of model-based data acquisition: observe fewer attributes; exploit correlations; reuse information between queries; directly deal with missing data; answer more complex (probabilistic) queries.

Copyright ©2004 Carlos Guestrin VLDB 2004 Benefits of Statistical Models
More robust interpretation of sensor net readings: account for biases in spatial sampling; identify faulty sensors; extrapolate missing sensor values.
More efficient data acquisition: fewer attributes to observe; reuse of information between queries; exploit correlations, acquiring data only when the model cannot answer the query with acceptable confidence.
More complex queries: probabilistic queries.

Copyright ©2004 Carlos Guestrin VLDB 2004 Issues introduced by Models
Optimization problem: given a query and a model, choose the data-acquisition plan that best refines the answer.
Two dependencies: the statistical benefit of acquiring a reading AND the system cost of doing so.
Any non-trivial statistical model captures the first dependency, e.g. by improving model-driven estimates for nearby nodes; the connectivity of the wireless sensor net determines the second.

Copyright ©2004 Carlos Guestrin VLDB 2004 Probabilistic models and queries
User's perspective. Query: SELECT nodeId, temp ± 0.5°C, conf(.95) FROM sensors WHERE nodeId in {1..8}
The system selects and observes a subset of nodes. Observed nodes: {3, 6, 8}.
Query result: a table of estimated temperatures for nodes 1-8 with per-node confidences (98%, 95%, 100%, 99%, 95%, 100%, 98%, 100%).

Copyright ©2004 Carlos Guestrin VLDB 2004 Probabilistic models - Illustration Node 0 – Interface between user and sensor net No need to query entire network Model chooses to observe voltage even though query is temperature

Copyright ©2004 Carlos Guestrin VLDB 2004 Probabilistic models (Contd.)
Why did the model choose to observe voltage instead of temperature? Two reasons: correlation in value (voltage is strongly correlated with temperature) and the cost differential (voltage is much cheaper to acquire).

Copyright ©2004 Carlos Guestrin VLDB 2004 Probabilistic models and queries
Joint distribution P(X1, …, Xn), learned from historical data.
Probabilistic query. Example: value of X2 ± ε with probability > 1 − δ. Is the probability below 1 − δ? Then observe attributes.
Example: observe X1 = 18 and compute P(X2 | X1 = 18); with the higher resulting probability, the query can be answered.
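To make the conditioning step concrete, here is a minimal sketch (not the paper's code; all names and numbers are illustrative) of answering a value query by conditioning a joint Gaussian, learned from historical data, on one observed attribute:

```python
import numpy as np

def condition_gaussian(mu, Sigma, obs_idx, obs_vals):
    """Posterior mean/covariance of the unobserved attributes given the
    observed ones, for a joint Gaussian N(mu, Sigma).
    Standard formula: mu_{u|o} = mu_u + S_uo S_oo^-1 (x_o - mu_o)."""
    idx = np.arange(len(mu))
    u = np.setdiff1d(idx, obs_idx)                 # unobserved indices
    S_uu = Sigma[np.ix_(u, u)]
    S_uo = Sigma[np.ix_(u, obs_idx)]
    S_oo = Sigma[np.ix_(obs_idx, obs_idx)]
    K = S_uo @ np.linalg.solve(S_oo, np.eye(len(obs_idx)))   # S_uo S_oo^-1
    mu_post = mu[u] + K @ (obs_vals - mu[obs_idx])
    Sigma_post = S_uu - K @ S_uo.T
    return u, mu_post, Sigma_post

# Illustrative example: observe X1 = 18, query the posterior over X2 and X3.
mu = np.array([17.0, 19.0, 21.0])
Sigma = np.array([[4.0, 3.0, 1.0],
                  [3.0, 4.0, 1.5],
                  [1.0, 1.5, 3.0]])
u, mu_post, Sigma_post = condition_gaussian(mu, Sigma, np.array([0]), np.array([18.0]))
print(dict(zip(u, mu_post)), np.diag(Sigma_post))
```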

Copyright ©2004 Carlos Guestrin VLDB 2004 Dynamic models: filtering
Joint distribution at time t, learned from historical data. Observe attributes (example: observe X1 = 18) and condition on the observations; the belief carried forward means fewer observations are needed in future queries. Example: Kalman filter.

Copyright ©2004 Carlos Guestrin VLDB 2004 Kalman Filtering
A transition model is maintained for each hour of the day, indexed by mod(t, 24).
Evolution of the system over time, from p(X^t | o^{1..t}) to p(X^{t+1} | o^{1..t}): computed using simple marginalization over x^t.
Next obtain the posterior distribution p(X^{t+1} | o^{1..t+1}) given all observations, including those at (t+1).

Copyright ©2004 Carlos Guestrin VLDB 2004 Kalman Filtering (Contd.)
The transition model is learned by first computing the joint density p(X^t, X^{t+1}) from historical data, then using the conditioning rule to obtain the transition model p(X^{t+1} | X^t).
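As a hedged illustration of that learning step (my own sketch, not the BBQ implementation), one could fit a joint Gaussian over pairs of consecutive readings and read the linear-Gaussian transition model off it by conditioning; a marginalization step then pushes the belief forward one tick:

```python
import numpy as np

def learn_transition(history_t, history_tp1):
    """Fit a joint Gaussian over (X^t, X^{t+1}) from historical pairs, then
    read off the linear-Gaussian transition p(X^{t+1} | X^t = x) = N(A x + b, Q)
    by conditioning. Rows of the inputs are paired samples."""
    Z = np.hstack([history_t, history_tp1])
    mu = Z.mean(axis=0)
    Sigma = np.cov(Z, rowvar=False)
    n = history_t.shape[1]
    S_tt = Sigma[:n, :n]
    S_pt = Sigma[n:, :n]                  # Cov(X^{t+1}, X^t)
    S_pp = Sigma[n:, n:]
    A = S_pt @ np.linalg.inv(S_tt)
    b = mu[n:] - A @ mu[:n]
    Q = S_pp - A @ S_pt.T
    return A, b, Q

def predict(mu_t, Sigma_t, A, b, Q):
    """Marginalization step: push the current belief forward from t to t+1."""
    return A @ mu_t + b, A @ Sigma_t @ A.T + Q
```

In this sketch one such (A, b, Q) would be learned per hour of the day, matching the mod(t, 24) indexing described above.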

Copyright ©2004 Carlos Guestrin VLDB 2004 Supported queries
Value query: Xi ± ε with probability at least 1 − δ.
SELECT and Range query: Xi ∈ [a, b] with probability at least 1 − δ, e.g. which sensors have temperature greater than 25°C?
Aggregation: average ± ε of a subset of attributes with probability > 1 − δ; aggregation and selection can be combined, e.g. what is the probability that more than 10 sensors have temperature greater than 25°C?
Queries require solving integrals: many are computed in closed form, some require numerical integration/sampling.

Copyright ©2004 Carlos Guestrin VLDB 2004 Probabilistic queries
Range queries: first marginalize the multivariate Gaussian down to the queried attribute – for a Gaussian this simply means dropping the entries being marginalized out of the mean vector and covariance matrix.
Compute the confidence of the query using the error function; if the confidence is too low, make observations to improve it.
Conditioning a Gaussian on the values of some attributes yields another Gaussian.
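For instance, a range-query confidence for a single Gaussian attribute could be computed from the error function along these lines (a sketch under my own naming; the numbers and the 95% threshold are illustrative):

```python
from math import erf, sqrt

def range_confidence(mu, var, a, b):
    """P(a <= X <= b) for X ~ N(mu, var), via the Gaussian CDF
    Phi(z) = 0.5 * (1 + erf(z / sqrt(2)))."""
    sd = sqrt(var)
    Phi = lambda z: 0.5 * (1.0 + erf(z / sqrt(2.0)))
    return Phi((b - mu) / sd) - Phi((a - mu) / sd)

p = range_confidence(mu=24.0, var=4.0, a=25.0, b=30.0)
conf = max(p, 1 - p)      # confidence in the in-range / out-of-range decision
if conf < 0.95:
    pass                  # acquire more readings and re-condition the model
print(p, conf)
```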

Copyright ©2004 Carlos Guestrin VLDB 2004 Probabilistic queries (Contd.)
Value query: easy to estimate – the posterior mean is obtained directly from the mean vector conditioned on the observations o.
We also determine the confidence for the given error bound and observe additional attributes if needed.

Copyright ©2004 Carlos Guestrin VLDB 2004 Probabilistic queries (Contd.)
Average aggregates: if we are interested in the average over a set of attributes A, define the random variable Y = (1/|A|) Σ_{i∈A} Xi.
The pdf of Y is p(Y = y | o) = ∫ p(x | o) 1[y = (1/|A|) Σ_{i∈A} xi] dx, where 1[·] is the indicator function.
Once p(Y = y | o) is defined, simply pose a value query for the random variable Y. Other complex aggregate queries can be answered similarly by constructing new random variables.
The pdf of a sum (or average) of jointly Gaussian variables is again Gaussian: its expected value is the corresponding combination of the expected values of the Xi, and its variance is the weighted sum of the variances of the Xi plus the covariances between Xi and Xj.
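Because the average of jointly Gaussian attributes is itself Gaussian, the aggregate reduces to a one-dimensional value query; a possible sketch (illustrative names and numbers, not the paper's code):

```python
import numpy as np

def average_query(mu, Sigma, attrs):
    """Posterior distribution of Y = (1/|A|) * sum_{i in A} X_i when the
    posterior over X is N(mu, Sigma): Y is Gaussian with mean w^T mu and
    variance w^T Sigma w, where w puts weight 1/|A| on the attributes in A."""
    w = np.zeros(len(mu))
    w[attrs] = 1.0 / len(attrs)
    return w @ mu, w @ Sigma @ w          # (mean, variance) of Y

mu = np.array([22.0, 24.0, 25.5])
Sigma = np.array([[2.0, 1.0, 0.5],
                  [1.0, 3.0, 1.2],
                  [0.5, 1.2, 2.5]])
m, v = average_query(mu, Sigma, attrs=[0, 1, 2])
print(m, v)   # feed into the same value/range machinery as a single attribute
```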

Copyright ©2004 Carlos Guestrin VLDB 2004 Model-driven data acquisition: overview
An SQL-style query with a desired confidence goes to the probabilistic model, which produces a data-gathering plan, conditions on the new observations, and updates its posterior belief.
Remaining questions: what sensors do we observe? How do we collect the observations?

Copyright ©2004 Carlos Guestrin VLDB 2004 Acquisition costs
Attributes have different acquisition costs; exploit correlation through the probabilistic model.
Must also consider networking cost – which set of observations is cheaper to collect?

Copyright ©2004 Carlos Guestrin VLDB 2004 Network model and plan format
Assume a known (quasi-static) network topology. Define the traversal using a (1.5-approximate) TSP tour; Ct(S) is the expected cost of this traversal under lossy communication.
Cost of collecting the subset S of sensor values: C(S) = Ca(S) + Ct(S), acquisition cost plus traversal cost.
Goal: find a subset S that is sufficient to answer the query at minimum cost C(S).
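A rough sketch of that cost model follows (my own simplification: a nearest-neighbour tour stands in for the 1.5-approximate TSP, node 0 is assumed to be the base station, and `dist` / `acq_cost` are hypothetical inputs):

```python
def traversal_cost(dist, S):
    """Rough Ct(S): length of a nearest-neighbour tour from the base station
    (node 0) through the nodes in S and back -- a cheap stand-in for the
    1.5-approximate TSP traversal described on the slide."""
    todo, cost, cur = set(S), 0.0, 0
    while todo:
        nxt = min(todo, key=lambda j: dist[cur][j])
        cost += dist[cur][nxt]
        cur = nxt
        todo.remove(nxt)
    return cost + dist[cur][0]            # return to the base station

def plan_cost(S, acq_cost, dist):
    """C(S) = Ca(S) + Ct(S): per-attribute acquisition cost plus traversal cost."""
    return sum(acq_cost[i] for i in S) + traversal_cost(dist, S)
```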

Copyright ©2004 Carlos Guestrin VLDB 2004 Choosing observation plan
Is a subset S sufficient to answer the query Xi ∈ [a, b] with probability > 1 − δ?
If we observe S = s: Ri(s) = max{ P(Xi ∈ [a, b] | s), 1 − P(Xi ∈ [a, b] | s) }.
The value of S is unknown before we observe it, so take the expectation: Ri(S) = ∫ p(s) Ri(s) ds.
Optimization problem: minimize C(S) subject to Ri(S) ≥ 1 − δ for every attribute i in the query.

Copyright ©2004 Carlos Guestrin VLDB 2004 Observation Plan
The general optimization problem is NP-hard. Two algorithms:
Exhaustive search – exponential.
Greedy search – begin with an empty observation plan; compute the benefit R and cost C of each candidate attribute; if some attribute by itself reaches the desired confidence, choose the cheapest such attribute, else add the attribute with the maximum benefit/cost ratio; repeat until the desired confidence is reached.
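One way to read that greedy loop as code (a sketch; `cost`, `benefit`, and `confidence` are placeholders for the model's C and R computations, not functions from the paper):

```python
def greedy_plan(candidates, cost, benefit, confidence, target):
    """Greedy observation-plan selection sketch.
    cost(S, i): marginal cost of adding attribute i to plan S.
    benefit(S, i): expected gain in query confidence from adding i.
    confidence(S): expected confidence if the attributes in S are observed."""
    S, remaining = [], set(candidates)
    while remaining and confidence(S) < target:
        # attributes that would already push confidence past the target
        enough = [i for i in remaining if confidence(S + [i]) >= target]
        if enough:
            best = min(enough, key=lambda i: cost(S, i))   # cheapest sufficient attribute
        else:
            best = max(remaining, key=lambda i: benefit(S, i) / cost(S, i))
        S.append(best)
        remaining.remove(best)
    return S
```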

Copyright ©2004 Carlos Guestrin VLDB 2004 BBQ system
The same pipeline instantiated: SQL-style queries (value, range, average) with a desired confidence; a probabilistic model based on multivariate Gaussians, learned from historical data, equivalent to a Kalman filter and updated with simple matrix operations; a data-gathering plan chosen by exhaustive or greedy search with a factor-1.5 TSP approximation for the traversal.

Copyright ©2004 Carlos Guestrin VLDB 2004 Exploiting correlated attributes
Extension of the single plan to a conditional plan. Useful when the cost of acquisition is non-negligible, correlations exist between attributes, and queries take the form of multi-predicate range queries.
Query evaluation can become cheaper by observing additional low-cost attributes: a tuple can be rejected with high confidence without an expensive acquisition – substantial performance gains.

Copyright ©2004 Carlos Guestrin VLDB 2004 Conditional Plans

Copyright ©2004 Carlos Guestrin VLDB 2004 Conditional Plans (Contd.)
Conditional plans are simple binary decision trees: each interior node n_j specifies a binary conditioning predicate (which depends on only a single attribute value). Choose the conditional plan with minimum expected cost.

Copyright ©2004 Carlos Guestrin VLDB 2004 Cost of Conditional Plans
[Slide shows the formulas for the expected cost of a conditional plan, its traversal cost, and the resulting optimal plan.]
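As a hedged illustration of what "expected plan cost" means for such a binary tree (my own sketch, counting only acquisition costs and using branch probabilities that would in practice be estimated from historical data, cf. P(Tj | t) on the next slide):

```python
class PlanNode:
    """Binary conditional-plan node: acquire `attr`, evaluate a predicate on it,
    then follow the true or false branch (None means the plan ends there)."""
    def __init__(self, attr, prob_true, true_branch=None, false_branch=None):
        self.attr = attr
        self.prob_true = prob_true        # P(predicate holds), e.g. from history
        self.true_branch = true_branch
        self.false_branch = false_branch

def expected_cost(node, acq_cost):
    """Expected acquisition cost of a conditional plan: pay for the attribute at
    this node, then the probability-weighted cost of each branch."""
    if node is None:
        return 0.0
    return (acq_cost[node.attr]
            + node.prob_true * expected_cost(node.true_branch, acq_cost)
            + (1 - node.prob_true) * expected_cost(node.false_branch, acq_cost))

# Hypothetical example: check cheap voltage first; only acquire the expensive
# temperature reading when the voltage predicate fails.
plan = PlanNode("voltage", 0.7, true_branch=None,
                false_branch=PlanNode("temperature", 0.5))
print(expected_cost(plan, {"voltage": 1.0, "temperature": 10.0}))
```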

Copyright ©2004 Carlos Guestrin VLDB 2004 Issues in Conditional Plans
Need to estimate the branch probabilities P(Tj | t); the naïve method – scanning the historical data for each computation – is expensive.
Cost model: only acquisition cost is taken into account. Transmission cost and the requirement that the plan fit in RAM could also be added to the cost model; the authors only focus on limiting plan sizes.

Copyright ©2004 Carlos Guestrin VLDB 2004 Architecture

Copyright ©2004 Carlos Guestrin VLDB 2004 Optimal Conditional Plan
The problem is hard: even if the conditional probabilities are given by an oracle, it is #P-hard (reduction from 3-SAT); even if we optimize the plan only with respect to a set D of d tuples, it is NP-complete (reduction from the complexity of building binary decision trees).
Exhaustive search: depth-first search with caching and pruning. There are also greedy heuristic solutions.

Copyright ©2004 Carlos Guestrin VLDB 2004 Example: Intel Berkeley Lab deployment

Copyright ©2004 Carlos Guestrin VLDB 2004 Experimental results Redwood trees and Intel Lab datasets Learned models from data Static model Dynamic model – Kalman filter, time-indexed transition probabilities Evaluated on a wide range of queries

Copyright ©2004 Carlos Guestrin VLDB 2004 Cost versus Confidence level

Copyright ©2004 Carlos Guestrin VLDB 2004 Obtaining approximate values Query: True temperature value ± epsilon with confidence 95%

Copyright ©2004 Carlos Guestrin VLDB 2004 Approximate range queries Query: Temperature in [T 1,T 2 ] with confidence 95%

Copyright ©2004 Carlos Guestrin VLDB 2004 Comparison to other methods

Copyright ©2004 Carlos Guestrin VLDB 2004 Intel Lab traversals

Copyright ©2004 Carlos Guestrin VLDB 2004 BBQ system
Recap: value, range, and average queries with a desired confidence over a multivariate-Gaussian model learned from historical data (equivalent to a Kalman filter, simple matrix operations), with exhaustive or greedy plan search and a factor-1.5 TSP approximation.
Extensions: more complex queries, other probabilistic models, more advanced planning, outlier detection, dynamic networks, continuous queries, …

Copyright ©2004 Carlos Guestrin VLDB 2004 Conclusions Model-driven data acquisition Observe fewer attributes Exploit correlations Reuse information between queries Directly deal with missing data Answer more complex (probabilistic) queries Basis for future sensor network systems

Copyright ©2004 Carlos Guestrin VLDB 2004 Discussion Questions
What other models, apart from the multivariate Gaussian, can be used? If other models are used, will their solutions still be available in closed form?
Model-driven techniques are suitable only if the test data is distributed like the training data. Will the solution adapt if the test region differs from the training region?
The optimization problem is hard and expensive to compute even with heuristics. Will it work for real-time data analysis?
Outlier detection is not supported by model-driven acquisition. Is there any way to do it for model-based sensor networks?
If the required confidence on a query is low, might some nodes never be queried at all?