Online Filtering, Smoothing & Probabilistic Modeling of Streaming Data. In short: applying probabilistic models to streams. Bhargav Kanagal & Amol Deshpande.


Online Filtering, Smoothing & Probabilistic Modeling of Streaming Data
In short: applying probabilistic models to streams
Bhargav Kanagal & Amol Deshpande, University of Maryland

Motivation

Enormous amounts of streaming data are generated by sensor networks and other measurement infrastructure.
Example sources (figure labels): sensor networks, network monitoring, RFID devices, industrial monitoring, GPS.

There is a large number of data stream processing systems.
- They focus on efficient query processing.
- They do not support many essential stream processing tasks:

1. Inferring hidden variables
  - Working status of a wireless sensor
  - GPS: infer transportation mode based on observations of position
2. Eliminating measurement noise
  - Noise is common in all measurement devices
  - Devices are mass-produced and low cost
3. Probabilistically modeling high-level events from low-level sensor readings
  - Was there a "shoplifting"?
  - "She went for a meeting"

Essential Stream Processing Tasks

- These tasks cannot be expressed as SQL queries.
- As a result, the majority of the processing is done outside the database.
- All of the above tasks can be seen as filtering and smoothing operations on dynamic probabilistic models (DPMs).
- We show how to extend a DBMS to support the application of DPMs.
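For reference, filtering and smoothing can be written in standard textbook notation. This is a sketch rather than the authors' formulation: the symbols X_t (hidden state at time t) and o_{1:t} (observations up to time t) are assumptions, and a first-order Markov model with discrete states is assumed (sums become integrals for continuous states).

```latex
% Filtering: posterior over the current hidden state given all observations so far
p(X_t \mid o_{1:t}) \;\propto\; p(o_t \mid X_t) \sum_{x_{t-1}} p(X_t \mid x_{t-1})\, p(x_{t-1} \mid o_{1:t-1})

% Smoothing: revised posterior over an earlier state (k < t) once later observations arrive
p(X_k \mid o_{1:t}) \;\propto\; p(X_k \mid o_{1:k})\; p(o_{k+1:t} \mid X_k), \qquad k < t
```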

Dynamic Probabilistic Models

- A generative model that describes how observations would have been generated.
- Specified using probability distributions, e.g. O_t = T_t + small noise, or O_t is random, depending on Status.
- Used to infer values of unobserved variables.

Model (figure): at each time step (TIME = 1, 2, 3) there are three variables: working status S_t, real temperature T_t, and observed temperature O_t.
- Prior: P(S1 = 'W') = 0.8, P(S1 = 'F') = 0.2; P(T1)
- Transition: P(S2 | S1), P(T2 | T1); the W/F transition table survives only as the entry 0.9
- Observation: P(O1 | S1, T1), P(O2 | S2, T2)
  - S1 = 'W': O1 = T1 + sensor noise
  - S1 = 'F': O1 = some random value

Filtering: compute the new probability distribution p(S3, T3) (Bayesian inference).
Smoothing: update the old probability distributions p(S1, T1), p(S2, T2) (Bayesian inference).

Output: DPM-based view with columns TIME, STATUS (INFERRED), TEMP (INFERRED).
Surviving entries: TEMP 23; STATUS "Working: 0.8, Failed: …", "Working: 0.9, Failed: …", "Working: 0.4, Failed: 0.6".
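The filtering step on this status/temperature model can be sketched with a bootstrap particle filter. This is an illustrative sketch, not the authors' implementation. Only the prior P(S1 = 'W') = 0.8 comes from the slide; reading the surviving 0.9 as P(W -> W), the temperature prior, the noise levels, the "failed stays failed" rule, and the flat reading range [0, 50] are all assumptions, as are the function names.

```python
import math
import random

def transition(status, temp, rng):
    """Sample (S_t, T_t) from (S_{t-1}, T_{t-1}); all parameters are assumed."""
    if status == "W" and rng.random() >= 0.9:  # assumed P(W -> W) = 0.9
        status = "F"                           # assumed: a failed sensor stays failed
    return status, temp + rng.gauss(0.0, 0.5)  # assumed random-walk temperature

def likelihood(obs, status, temp):
    """p(O_t | S_t, T_t): unit-variance Gaussian around T_t if working,
    flat over an assumed reading range [0, 50] if failed."""
    if status == "W":
        return math.exp(-0.5 * (obs - temp) ** 2) / math.sqrt(2.0 * math.pi)
    return 1.0 / 50.0

def reweight_and_resample(particles, obs, rng):
    """Weight each particle by the observation likelihood, then resample."""
    weights = [likelihood(obs, s, t) for s, t in particles]
    return rng.choices(particles, weights=weights, k=len(particles))

def run_filter(observations, n=2000, seed=0):
    """Return the filtered estimate of P(S_t = 'W' | o_1..o_t) for each t."""
    rng = random.Random(seed)
    # Prior from the slide: P(S1 = 'W') = 0.8; the temperature prior is assumed.
    particles = [("W" if rng.random() < 0.8 else "F", rng.gauss(20.0, 2.0))
                 for _ in range(n)]
    p_working = []
    for i, obs in enumerate(observations):
        if i > 0:  # the prior already describes time step 1
            particles = [transition(s, t, rng) for s, t in particles]
        particles = reweight_and_resample(particles, obs, rng)
        p_working.append(sum(1 for s, _ in particles if s == "W") / n)
    return p_working

if __name__ == "__main__":
    print(run_filter([20.1, 19.8, 20.3]))   # plausible readings
    print(run_filter([20.0, 20.0, 45.0]))   # last reading is an outlier
```

With plausible readings the estimated probability of "working" stays high, while a reading far from the tracked temperature shifts nearly all weight to the "failed" particles.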

DPM-Based Views & Particle-Based Representation

DPM-based view
- Probabilistic view
- Key attribute -> time
- Correlations: inter-tuple and intra-tuple

Particle-based representation
- A probability distribution is stored as a set of samples with weights attached.
- Particle filtering + particle smoothing: the algorithms operate directly on particles.

Advantages of particle filtering
- Approximation error (variance) shrinks in proportion to 1/N for N particles, independent of the number of attributes
- Efficient query processing without altering the query processing machinery
- Particles fit neatly in database tables

Tables (figure):
- DPM-based view: columns TIME, STATUS (INFERRED), TEMP (INFERRED). Surviving entries: TEMP 23; STATUS "Working: 0.8, Failed: …", "Working: 0.9, Failed: 0.1".
- Particle table: columns id, TIME, STATUS, TEMP, weight. Surviving STATUS entries: W, W, W, F, F.
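To make the particle-based representation concrete, the sketch below recovers one row of a DPM-based view from weighted particles. The rows in `PARTICLES` are invented for illustration (the slide's actual values did not survive), and the function names are hypothetical.

```python
from collections import defaultdict

# Hypothetical particle table rows: (id, time, status, temp, weight).
# Weights for a given time step sum to 1.
PARTICLES = [
    (1, 1, "W", 22.0, 0.3),
    (2, 1, "W", 23.0, 0.3),
    (3, 1, "W", 24.0, 0.2),
    (4, 1, "F", 10.0, 0.1),
    (5, 1, "F", 35.0, 0.1),
]

def status_distribution(particles, time):
    """Marginal P(status) at one time step: sum the weights per status value."""
    dist = defaultdict(float)
    for _id, t, status, _temp, weight in particles:
        if t == time:
            dist[status] += weight
    return dict(dist)

def expected_temp(particles, time):
    """Weighted mean temperature at one time step."""
    rows = [(temp, w) for _id, t, _status, temp, w in particles if t == time]
    total = sum(w for _temp, w in rows)
    return sum(temp * w for temp, w in rows) / total

if __name__ == "__main__":
    print(status_distribution(PARTICLES, 1))
    print(expected_temp(PARTICLES, 1))
```

Because every quantity is a weighted sum over rows, the same computations map directly onto SQL aggregates over the particle table.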

System Architecture

Components (figure): user, query transformer, view manager, update manager, DPM-based view, particle table; input stream "sensors".

View definition:
  CREATE VIEW dpmview DPM hmm.dpm STREAM sensors

Query Processing

We currently support single-table select, project, and aggregate queries.

User query on the DPM-based view:
  SELECT temp, time FROM dpmview WHERE status = 'W' WITH CONFIDENCE 0.95

Query transformer output, executed on the particle table:
  SELECT time, SUM(temp*weight) FROM particles p1 GROUP BY time HAVING 0.95 < (SELECT SUM(weight) FROM particles p2 WHERE p1.time = p2.time AND status = 'W')
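The rewritten query can be tried on a toy particle table. The sketch below uses Python's built-in `sqlite3` module as a stand-in for Apache Derby (an assumption made for convenience), with the subquery wrapped in parentheses as standard SQL requires. The table contents and the `run_query` helper are invented for illustration.

```python
import sqlite3

# Hypothetical particles: (id, time, status, temp, weight); all values are made up.
ROWS = [
    (1, 1, "W", 22.0, 0.5),
    (2, 1, "W", 24.0, 0.5),
    (3, 2, "W", 23.0, 0.6),
    (4, 2, "F", 5.0, 0.4),
]

# The transformed query from the slide, with the subquery parenthesized.
REWRITTEN = """
SELECT time, SUM(temp * weight)
FROM particles p1
GROUP BY time
HAVING 0.95 < (SELECT SUM(weight) FROM particles p2
               WHERE p1.time = p2.time AND status = 'W')
"""

def run_query(rows=ROWS):
    """Load the particles into an in-memory table and run the rewritten query."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE particles "
        "(id INTEGER, time INTEGER, status TEXT, temp REAL, weight REAL)"
    )
    conn.executemany("INSERT INTO particles VALUES (?, ?, ?, ?, ?)", rows)
    result = conn.execute(REWRITTEN).fetchall()
    conn.close()
    return result

if __name__ == "__main__":
    print(run_query())
```

In this toy data only time step 1 has more than 0.95 of its weight on status 'W', so it is the only time step returned, together with its weighted temperature.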

Implementation & Experimental Results

- Applying DPMs to data is critical
- Inference using our system is accurate (< 2% error, N = 10) and efficient (20 ms/tuple, N = 20)
- Implemented in Java using Apache Derby
- Data sets:
  1. Intel Lab Sensor Dataset
  2. Simulated GPS Moving Objects Dataset

Conclusions & Future Work

Conclusions
- We extend a database system so that it can apply DPMs to data in real time
- We have developed algorithms to execute SQL-style queries on DPMs

Ongoing Work
- Representing inter-tuple correlations using Markov chains
- Query processing algorithms that are aware of such correlations

Related Work / References

Probabilistic Databases
- Nilesh N. Dalvi and Dan Suciu. Efficient query evaluation on probabilistic databases. In VLDB.
- R. Cheng, D. V. Kalashnikov, and S. Prabhakar. Evaluating probabilistic queries over imprecise data. In SIGMOD.

Data Stream Processing
- D. Carney et al. Monitoring streams - a new class of data management applications. In VLDB.
- Sirish Chandrasekaran et al. TelegraphCQ: Continuous dataflow processing for an uncertain world. In CIDR.

Bayesian Networks & Inference Algorithms
- Judea Pearl. Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference. Morgan Kaufmann.
- Arnaud Doucet, Nando de Freitas, and Neil Gordon. Sequential Monte Carlo methods in practice. Springer.
- Kevin Murphy. Dynamic Bayesian Networks: Representation, Inference and Learning. PhD thesis, UC Berkeley, 2002.