ICS280 Presentation by Suraj Nagasrinivasa (1) Evaluating Probabilistic Queries over Imprecise Data (SIGMOD 2003) by R Cheng, D Kalashnikov, S Prabhakar.

Presentation transcript:

ICS280 Presentation by Suraj Nagasrinivasa
(1) Evaluating Probabilistic Queries over Imprecise Data (SIGMOD 2003) by R Cheng, D Kalashnikov, S Prabhakar
(2) Model-Driven Data Acquisition in Sensor Networks (VLDB 2004) by A Deshpande, C Guestrin, J Hellerstein, W Hong, S Madden
Acknowledgements: Dmitri Kalashnikov and Michal Kapalka

In typical sensor applications...
- Sensors monitor external environment continuously
- Sensor readings are sent back to the application
- Decisions are often made based on these readings

However, we face uncertainty…
- Typically, a DB/server collects the sensor readings
- The DB cannot store the “true” sensor value at all points in time
    - Scarce battery power
    - Limited network bandwidth
- So, readings are recorded at discrete time points
- The value of the phenomenon changes continuously
- As a result, the reading stored in the DB is mostly obsolete

Scenario: Answering Minimum Query with discrete DB stored readings
- Recorded temperature: x0 < y0, so x is minimum
- Current temperature: y1 < x1, so y is minimum
- Wrong query result
[Figure: recorded temperatures (x0, y0) and current temperatures (x1, y1) of sensors x and y]

Scenario: Answering Minimum Query with error-bound readings I
- x certainly gives the minimum temperature reading
[Figure: recorded temperatures x0 and y0, each with a bound for the current temperature, for sensors x and y]

Scenario: Answering Minimum Query with error-bound readings II
- Both x and y have a chance of yielding the minimum value
- Which one has a higher probability?
[Figure: recorded temperatures x0 and y0, each with a bound for the current temperature, for sensors x and y]
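
Note: the question on this slide can be made concrete with a small numerical sketch. It assumes, purely for illustration, that each sensor's current value is uniformly distributed over its bound; the function names and the example intervals are hypothetical and not taken from the paper.

```python
def uniform_cdf(t, lo, hi):
    """CDF of a uniform distribution on [lo, hi] (assumed pdf shape)."""
    if t <= lo:
        return 0.0
    if t >= hi:
        return 1.0
    return (t - lo) / (hi - lo)


def prob_x_is_min(x_bounds, y_bounds, steps=10_000):
    """P(X < Y) = integral over x's bound of f_x(t) * (1 - F_y(t)) dt,
    approximated with the midpoint rule."""
    x_lo, x_hi = x_bounds
    y_lo, y_hi = y_bounds
    width = (x_hi - x_lo) / steps
    density = 1.0 / (x_hi - x_lo)  # uniform pdf of x inside its bound
    total = 0.0
    for k in range(steps):
        t = x_lo + (k + 0.5) * width
        total += density * (1.0 - uniform_cdf(t, y_lo, y_hi)) * width
    return total


# Overlapping bounds: x in [0, 2], y in [1, 3] (made-up numbers).
p_x = prob_x_is_min((0.0, 2.0), (1.0, 3.0))
print(f"P(x is minimum) = {p_x:.3f}, P(y is minimum) = {1 - p_x:.3f}")
```

With these example bounds x is the more likely minimum, but y still has a non-zero chance, which is exactly the situation a probabilistic answer is meant to report.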

Probabilistic Queries
- Based on variation characteristics of the sensor value over time:
    - Bounds can be estimated for possible values
    - A probability distribution of values is defined within the bounds
- Evaluate probability for query answers
- Probabilistic queries give a correct answer, instead of a potentially incorrect answer

Rest of the paper…
- Notation & Uncertainty Model
- Classification of Probabilistic Queries
- Evaluating Probabilistic Queries
- Quality of Probabilistic Queries
- Object Refreshment Policies
- Experimental Results

Notation
- T: a set of DB objects (e.g. sensors)
- a: dynamic attribute (e.g. pressure)
- T_i: i-th object of T
- T_i.a(t): value of ‘a’ in ‘T_i’ at time ‘t’

Uncertainty Model
- Uncertainty interval U_i(t) = [l_i(t), u_i(t)], which contains T_i.a(t)
- Uncertainty pdf f_i(x, t), defined over U_i(t)
- Can be extended to ‘n’ dimensions

Classification of Probabilistic Queries
- Type of Result
    - Value-based: returns a single value. E.g. Minimum query ([l,u], pdf)
    - Entity-based: returns a set of objects. E.g. Range query ({(T_i, p_i), p_i > 0})
- Aggregation
    - Non-Aggregate: query result for an object is independent of other objects. E.g. Range query
    - Aggregate: query result is computed from a set of objects. E.g. Nearest Neighbor query

Classification of Probabilistic Queries
- Non-aggregate, value-based answer: VSingleQ. "What is the temperature of sensor x?"
- Non-aggregate, entity-based answer: ERQ. "Which sensor has temperature between 10F and 30F?"
- Aggregate, value-based answer: VAvgQ, VSumQ, VMinQ, VMaxQ. "What is the average temperature of the sensors?"
- Aggregate, entity-based answer: ENNQ, EMinQ, EMaxQ. "Which sensor gives the highest temperature?"
Query evaluation algorithms and quality metrics are developed for each class
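
Note: as an illustration of the entity-based, non-aggregate class (ERQ), the sketch below returns the set {(T_i, p_i), p_i > 0} for a range query. It assumes a uniform pdf over each uncertainty interval, so p_i is just the fraction of the interval that overlaps the query range; the sensor names and numbers are made up.

```python
def range_query(intervals, lo, hi):
    """Probabilistic range query (ERQ) under an assumed uniform pdf.

    intervals maps an object name to its uncertainty interval (l_i, u_i).
    Returns [(name, p_i)] for every object with p_i > 0.
    """
    result = []
    for name, (l_i, u_i) in intervals.items():
        # Length of the part of [l_i, u_i] that lies inside [lo, hi].
        overlap = max(0.0, min(hi, u_i) - max(lo, l_i))
        p_i = overlap / (u_i - l_i)
        if p_i > 0:
            result.append((name, p_i))
    return result


sensors = {"T1": (5.0, 15.0), "T2": (20.0, 25.0), "T3": (40.0, 50.0)}
print(range_query(sensors, 10.0, 30.0))  # T3 cannot be in range and is omitted
```

Each p_i depends only on the object's own interval, which is what makes this query non-aggregate.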

ENNQ algorithm… Projection, Pruning, Bounding & Evaluation
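
Note: the slide only names the four phases. As a rough sketch of the pruning and evaluation ideas, the code below handles the simpler one-dimensional minimum query (EMinQ), to which a nearest-neighbor query reduces once each object has been projected to a distance from the query point. It assumes uniform pdfs and uses plain midpoint integration; it is not the paper's algorithm as published, and all names are hypothetical.

```python
def uniform_cdf(t, lo, hi):
    """CDF of a uniform distribution on [lo, hi] (assumed pdf shape)."""
    if t <= lo:
        return 0.0
    if t >= hi:
        return 1.0
    return (t - lo) / (hi - lo)


def prune(intervals):
    """Drop objects whose lower bound is not below the smallest upper bound f:
    some other object is then certainly smaller."""
    f = min(hi for _, hi in intervals.values())
    return {n: (lo, hi) for n, (lo, hi) in intervals.items() if lo < f}


def min_probabilities(intervals, steps=10_000):
    """p_i = integral of f_i(t) * prod_{j != i} (1 - F_j(t)) dt."""
    candidates = prune(intervals)
    f = min(hi for _, hi in candidates.values())
    probs = {}
    for name, (lo, hi) in candidates.items():
        upper = min(hi, f)  # beyond f the product below is zero
        width = (upper - lo) / steps
        total = 0.0
        for k in range(steps):
            t = lo + (k + 0.5) * width
            others_larger = 1.0
            for other, (o_lo, o_hi) in candidates.items():
                if other != name:
                    others_larger *= 1.0 - uniform_cdf(t, o_lo, o_hi)
            total += others_larger * width / (hi - lo)
        probs[name] = total
    return probs


print(min_probabilities({"A": (0.0, 2.0), "B": (1.0, 3.0), "C": (5.0, 6.0)}))
```

Object C is pruned before any integration happens, and the probabilities of the remaining candidates sum to 1.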

ENNQ algorithm

Quality of Probabilistic Result
- Introduce a notion of “quality of answer”
- Proposed metrics for different classes of queries
- Example: "Is the reading of sensor i in range [l,u]?"
    - Regular range query: "yes" or "no" with 100% certainty
    - Probabilistic query (ERQ):
        - yes with pi = 95%: OK
        - yes with pi = 5%: OK (95% it is not in [l, u])
        - yes with pi = 50%: NOT OK (not certain!)

Quality for Entity-Aggregate Queries
"Which sensor, among n, has the minimum reading?"
- Recall
    - Result set R = {(Ti, pi)}, e.g. {(T1, 30%), (T2, 40%), (T3, 30%)}
    - B is an interval bounding all possible values, e.g. the minimum is somewhere in B = [10,20]
- Our metrics for aggregate queries (Min, Max, NN)
    - objects cannot be treated independently as in the ERQ metric
    - a uniform distribution (in the result set) is the worst case
    - metrics are based on entropy

Quality for Entity-Aggregate Queries
H(X): entropy of random variable X (X1,…,Xn with p(X1),…, p(Xn))
- entropy is smallest (i.e., 0) iff ∃ i : p(Xi) = 1
- entropy is largest (i.e., log2(n)) iff all Xi's are equally likely
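
Note: the two extreme cases above can be checked with the standard Shannon entropy, H(X) = -sum_i p(Xi) log2 p(Xi). The short sketch below only computes that quantity for a result set; how the paper turns it into its final quality score is not reproduced here.

```python
import math


def entropy(probs):
    """Shannon entropy in bits; terms with p = 0 contribute nothing."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


print(entropy([1.0, 0.0, 0.0]))   # one certain answer: lowest entropy
print(entropy([0.25] * 4))        # four equally likely answers: log2(4)
print(entropy([0.3, 0.4, 0.3]))   # result-set example from the previous slide
```

The lower the entropy of the result set, the less ambiguous the answer.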

Improving Answer Quality
It is important to pick the right update policies to help improve answer quality
- Global Choice
    - Glb_RR (pick random)
- Local Choice
    - Loc_RR (pick random)
    - MaxUnc (heuristic: choose the object with the maximum uncertainty interval)
    - MinExpEntropy (heuristic: choose the object with the minimum expected entropy)
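
Note: of the policies listed, MaxUnc is the simplest to illustrate. The sketch below picks the object with the widest uncertainty interval as the next one to refresh; the candidate set and interval values are hypothetical, and tie-breaking is left to Python's max().

```python
def max_unc(intervals):
    """Return the name of the object with the widest uncertainty interval."""
    return max(intervals, key=lambda name: intervals[name][1] - intervals[name][0])


candidates = {"T1": (10.0, 12.0), "T2": (9.0, 15.0), "T3": (11.0, 11.5)}
print(max_unc(candidates))  # the sensor whose reading is least certain
```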

Experiments: Simulation Set-up
- 1 server, 1000 sensors, limited network bandwidth, “Min” queries tested
- Query arrivals follow a Poisson distribution
- Each query is over a random set of 100 sensors

Results

Conclusions
- Probabilistic querying handles the inherent uncertainty in sensor DBs
- Classification, algorithms and quality-of-answer metrics for various query types
- The uncertainty model is very general, which makes the algorithms not directly implementable in any sensor network
- Besides, to achieve any reasonable energy efficiency in sensor networks, the application and network requirements that dictate when sensor nodes must be awake have to be tightly coordinated, especially in the case of multi-hop routing

Outline for ‘Model Driven Data Acquisition for Sensor Networks’
- Introduction
    - Motivation for Model-Based Queries
    - Framework Concept
    - Model Example – Multivariate Gaussian
- Algorithm
    - Resolving Model-Based Queries
    - Incorporating Dynamicity
    - Observation Plan / Cost model
- Experiments
    - BBQ System
    - Results
- Conclusions

Motivation for Model-Based Queries
- Declarative queries adopted as key programming paradigm for large sensor nets
- However, interpreting sensor nets as databases results in two major problems:
    - Misinterpretation of data
        - The physically observable world is a set of continuous phenomena in both time and space
        - Sensor readings are UNLIKELY to be random samples
    - Inefficient approximate queries
        - If sensor readings are not “true” values, uncertainty must be quantified to provide reliable answers

Motivation for Model-Based Queries
Paper contribution: to incorporate statistical models of real-world processes into the sensor net query processing architecture
Models help in:
- Accounting for biases in spatial sampling
- Identifying sensors providing faulty data
- Extrapolating values for missing sensors

Framework Concept
Goal: given a query and a model, devise an efficient data acquisition plan that provides the “best” possible answer
Major dependencies:
- Correlations between sensors captured by the statistical model
    - Correlation between attributes for a given sensor
    - Correlation between sensors for a given attribute
- Specific connectivity of the wireless network

Framework Concept
Observation Plan parameters
- Correlations in Value
- Cost Differential

Framework Concept

Model Example – Multivariate Gaussian

Resolving Model-Based Queries (Range Queries)

Resolving Model-Based Queries (Value Queries)
To compute the value of X_i with maximum error ‘e’ and confidence ‘1-delta’:
- Compute the mean of X_i given the observations ‘o’
- As in range queries, find probability :
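
Note: the formula on this slide did not survive in the transcript. As a hedged illustration of the two steps, the sketch below uses a two-variable Gaussian: it conditions X1 on an observed X2 with the standard bivariate conditioning formulas, then checks whether the probability mass within ±e of the posterior mean reaches 1 - delta. The function names and numbers are illustrative assumptions, not the paper's notation.

```python
import math


def condition_on_observation(mean, cov, observed):
    """Posterior mean and variance of X1 given X2 = observed (bivariate Gaussian)."""
    mu1, mu2 = mean
    (s11, s12), (_, s22) = cov
    post_mean = mu1 + s12 / s22 * (observed - mu2)
    post_var = s11 - s12 * s12 / s22
    return post_mean, post_var


def confidence_within(error, variance):
    """P(|X - mean| <= error) for a Gaussian with the given variance."""
    return math.erf(error / math.sqrt(2.0 * variance))


def value_query(mean, cov, observed, error, delta):
    """Return (estimate, confidence, whether confidence >= 1 - delta)."""
    post_mean, post_var = condition_on_observation(mean, cov, observed)
    confidence = confidence_within(error, post_var)
    return post_mean, confidence, confidence >= 1.0 - delta


# Made-up prior: X1 ~ 20, X2 ~ 22, positively correlated; X2 observed at 24.
print(value_query((20.0, 22.0), ((4.0, 3.0), (3.0, 4.0)), 24.0, error=3.0, delta=0.1))
```

If the confidence falls short of 1 - delta, more observations would be needed before answering, which is where the observation plan comes in.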

Range Queries for Gaussian
- Projection for a Gaussian is simple: just drop the unnecessary values from the mean vector and covariance matrix
- The integral has to be computed.
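
Note: a minimal sketch of both bullets, using only the standard library. Marginalizing a multivariate Gaussian keeps the selected entries of the mean and the matching rows and columns of the covariance; for a single attribute the range integral then reduces to a difference of normal CDFs, computed here with math.erf. The three-sensor model is made up for illustration.

```python
import math


def marginal(mean, cov, keep):
    """Project a Gaussian onto the indices in keep by dropping the other entries."""
    sub_mean = [mean[i] for i in keep]
    sub_cov = [[cov[i][j] for j in keep] for i in keep]
    return sub_mean, sub_cov


def normal_cdf(x, mu, var):
    """CDF of a one-dimensional Gaussian with mean mu and variance var."""
    return 0.5 * (1.0 + math.erf((x - mu) / math.sqrt(2.0 * var)))


def range_probability(mean, cov, i, lo, hi):
    """P(lo <= X_i <= hi) under the model's marginal for attribute i."""
    (mu,), ((var,),) = marginal(mean, cov, [i])
    return normal_cdf(hi, mu, var) - normal_cdf(lo, mu, var)


mean = [20.0, 22.0, 25.0]
cov = [[4.0, 3.0, 1.0],
       [3.0, 4.0, 1.5],
       [1.0, 1.5, 9.0]]
print(range_probability(mean, cov, 0, 18.0, 22.0))  # mass within one std. dev.
```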

Incorporating Dynamicity
- Use historical measurements to improve confidence of answers
- Given the pdf at time ‘t’, compute the pdf at time ‘t+1’

Incorporating Dynamicity
- Assumption: Markovian model
- Dynamicity summarized by a “transition model”
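
Note: the slide does not give the form of the transition model. One common choice that keeps a Gaussian pdf Gaussian is a linear-Gaussian step, x(t+1) = a*x(t) + b + noise with variance q; the sketch below assumes that form for a single attribute. The parameters a, b and q are hypothetical and would in practice be learned from historical data.

```python
def predict(mean, variance, a=1.0, b=0.0, q=0.0):
    """Push a 1-D Gaussian pdf from time t to t+1 through an assumed
    linear-Gaussian transition model x' = a*x + b + N(0, q)."""
    return a * mean + b, a * a * variance + q


# Belief at time t: mean 20.0, variance 1.0 (made-up numbers).
mean, var = 20.0, 1.0
for step in range(3):
    mean, var = predict(mean, var, a=1.0, b=0.5, q=0.25)
    print(f"t+{step + 1}: mean={mean:.2f}, variance={var:.2f}")
```

Without new observations the variance grows at every step, so older readings support less confident answers.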

Observation Plan / Cost Model
What is the cost of making ‘o’ observations?
C(o) = acquisition cost + transmission cost
- Acquisition cost: constant for each attribute
- Transmission cost:
    - Network graph
    - Edge weights (link quality)
    - Paths taken could be sub-optimal
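
Note: a small sketch of C(o) under simplifying assumptions: each attribute has a fixed acquisition cost, and the transmission cost is the sum of edge weights along a tour that starts at the base station, visits the observed nodes in the order given, and returns. The node names, attribute names and weights are invented; the tour order is taken as given rather than optimized.

```python
def edge_weight(edge_cost, a, b):
    """Look up an undirected edge weight; raises KeyError if the link is missing."""
    if (a, b) in edge_cost:
        return edge_cost[(a, b)]
    return edge_cost[(b, a)]


def observation_cost(plan, acquisition_cost, edge_cost, base="base"):
    """C(o) = acquisition cost + transmission cost for a list of
    (node, attribute) observations visited in the given order."""
    acquire = sum(acquisition_cost[attr] for _, attr in plan)
    # Visit each node once, in order of first appearance, then return to base.
    nodes = list(dict.fromkeys(node for node, _ in plan))
    route = [base] + nodes + [base]
    transmit = sum(edge_weight(edge_cost, a, b) for a, b in zip(route, route[1:]))
    return acquire + transmit


plan = [("s1", "temp"), ("s1", "voltage"), ("s2", "temp")]
acquisition = {"temp": 1.0, "voltage": 0.5}
edges = {("base", "s1"): 2.0, ("s1", "s2"): 1.0, ("s2", "base"): 3.0}
print(observation_cost(plan, acquisition, edges))
```

Reading a second attribute on a node already being visited adds only its acquisition cost, which is one reason correlated, cheap attributes can be attractive to observe.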

Observation Plan / Cost Model
- A set of attributes (‘theta’) to observe is determined by computing the expected benefit
- And finding…
- This, being similar to the traveling salesman problem, is best dealt with by heuristic algorithms

BBQ System
- BBQ: A Tiny-Model Query System
- Uses multivariate Gaussians
- Has 24 transition models, one for each hour of the day

Results
Experiment: 11 sensors on a tree, measurements, 2/3 used for training and 1/3 for tests
Methodology
- BBQ builds a model based on training data
- One random query is taken per hour; possible observations are made and the model is updated
- The answer is compared to the measured value
Compare with two other methods
- TinyDB: each query is broadcast over the sensor network using an overlay tree
- Approximate-Caching: the base station maintains a view of the sensor readings

Results

Conclusion
- Approximate queries can be well optimized, but a model of the physical phenomenon is needed
- Defining an appropriate model is a challenge
- The framework works well for “fairly steady” sensor data values
- The statistical model is largely static, with refinements based on incoming queries and the observations made as a result