Probabilistic Databases Amol Deshpande, University of Maryland.


Overview
- V.S. Subrahmanian: ProbView, PXML, temporal probabilistic databases, probabilistic aggregates
- Lise Getoor: statistical relational learning, probabilistic relational models, entity resolution
- Amol Deshpande: MauveDB (statistical modeling in databases), correlated tuples in probabilistic databases

Overview of Today’s Presentation
- Model-based Views/MauveDB [Amol]
  - Goal: making it easy to continuously apply statistical models to streaming data
  - Current focus on designing declarative interfaces and on efficient maintenance algorithms
  - Less on the “probabilistic databases” issues
- Statistical Relational Learning [Lise]
- Representing arbitrarily correlated data and processing queries over it [Prithviraj]

Motivation
- Unprecedented, and rapidly increasing, instrumentation of our everyday world
- Huge data volumes generated continuously that must be processed in real time
- Typically imprecise, unreliable, and incomplete data
  - Measurement noise, low success rates, failures, etc.
- Examples: wireless sensor networks, RFID, distributed measurement networks (e.g. GPS), industrial monitoring

Data Processing Step 1
- Process data using a statistical/probabilistic model
  - Regression and interpolation models: to eliminate spatial or temporal biases, handle missing data, and make predictions
  - Filtering techniques (e.g. Kalman filters), Bayesian networks: to eliminate measurement noise, infer hidden variables, etc.
- Examples: regression/interpolation models (temperature monitoring); Kalman filters etc. (GPS data)
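As a rough illustration of the filtering step named above, here is a minimal sketch of a one-dimensional Kalman filter. It assumes a scalar state that follows a random walk and is observed with Gaussian noise; the function name `kalman_1d` and all parameters are hypothetical and do not come from the talk.

```python
def kalman_1d(measurements, process_var, measurement_var, x0=0.0, p0=1.0):
    """Filter a scalar random-walk state observed with Gaussian noise.

    x0 and p0 are the assumed initial state estimate and its variance.
    Returns the filtered estimate after each measurement.
    """
    x, p = x0, p0
    estimates = []
    for z in measurements:
        # Predict: a random walk keeps the mean, but uncertainty grows.
        p = p + process_var
        # Update: blend prediction and measurement using the Kalman gain.
        k = p / (p + measurement_var)
        x = x + k * (z - x)
        p = (1.0 - k) * p
        estimates.append(x)
    return estimates


if __name__ == "__main__":
    noisy = [10.2, 9.7, 10.4, 9.9, 10.1]
    print(kalman_1d(noisy, process_var=0.01, measurement_var=0.5, x0=10.0))
```

A GPS track would need a multi-dimensional state (position and velocity) and matrix-valued gains; the scalar version only shows the predict/update cycle.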

A Motivating Example
- Inferring “transportation mode” / “activities” [Henry Kautz et al.]
  - Using easily obtainable sensor data, e.g. GPS, RFID proximity data
  - Can do much if we can infer these automatically
- [Figure: path between home and office]
- Have access to noisy GPS data
- Infer the transportation mode: walking, running, in a car, in a bus
- Preferred end result: clean path annotated with transportation mode

Dynamic Bayesian Network
- Use a “generative model” to describe how the observations were generated
- Variables at each time step t (repeated at t+1):
  - M_t: transportation mode (walking, running, car, bus)
  - X_t: true velocity and location
  - O_t: observed location
- Need conditional probability distributions, e.g. a distribution on (velocity, location) given the transportation mode
  - From prior knowledge or learned from data
- Given a sequence of observations (O_t), find the most likely M_t’s that explain it
  - Or provide a probability distribution over the possible M_t’s
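The inference task on this slide (a probability distribution over the modes given the observations so far) can be sketched with forward filtering on a simplified model. The sketch below collapses the network to a hidden Markov model with a discrete mode and a discretized speed observation; this simplification, the function name `forward_filter`, and every probability value are illustrative assumptions, not the model from the talk.

```python
def forward_filter(observations, prior, transition, emission):
    """Return P(mode_t | observations up to t) for each time step.

    prior[m]          : belief over modes before the first observation
    transition[a][b]  : P(mode_t = b | mode_{t-1} = a)
    emission[m][o]    : P(observation = o | mode = m)
    Assumes every observation has nonzero probability under some mode.
    """
    belief = dict(prior)
    history = []
    for obs in observations:
        # Predict: push the current belief through the transition model.
        predicted = {
            m: sum(belief[prev] * transition[prev][m] for prev in belief)
            for m in belief
        }
        # Update: weight by the likelihood of the observation, then normalize.
        unnorm = {m: predicted[m] * emission[m][obs] for m in predicted}
        total = sum(unnorm.values())
        belief = {m: v / total for m, v in unnorm.items()}
        history.append(belief)
    return history


if __name__ == "__main__":
    # All numbers below are made up for illustration.
    prior = {"Walking": 0.5, "Car": 0.5}
    transition = {
        "Walking": {"Walking": 0.9, "Car": 0.1},
        "Car": {"Walking": 0.1, "Car": 0.9},
    }
    emission = {
        "Walking": {"slow": 0.8, "fast": 0.2},
        "Car": {"slow": 0.3, "fast": 0.7},
    }
    for step in forward_filter(["slow", "slow", "fast", "fast"], prior, transition, emission):
        print(step)
```

Finding the single most likely mode sequence would use the Viterbi algorithm instead; filtering is shown here because it yields the per-time-step distribution.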

Statistical Modeling of Sensor Data
- No support in database systems, so the database ends up being used as a backing store
  - With much replication of functionality
  - Very inefficient, not declarative
- How can we push statistical modeling inside a database system?

Abstraction: Model-based Views
- An abstraction analogous to traditional database views
- Present the output of applying the model as a database view
  - That the user can query as with normal database views
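To make the abstraction concrete, here is a small sketch of a view whose contents come from a fitted model rather than from stored rows. It fits a straight line to raw (time, value) readings and answers queries at arbitrary times. The class `RegressionView` and its methods are hypothetical and are not MauveDB's interface.

```python
class RegressionView:
    """A toy model-based view: queries read the fitted line, not the raw rows."""

    def __init__(self):
        self._rows = []       # raw (time, value) readings
        self._coeffs = None   # cached (slope, intercept), None when stale

    def insert(self, t, value):
        self._rows.append((t, value))
        self._coeffs = None   # new data invalidates the cached fit

    def _fit(self):
        # Ordinary least squares for one predictor.
        # Assumes at least two readings with distinct times.
        n = len(self._rows)
        mean_t = sum(t for t, _ in self._rows) / n
        mean_v = sum(v for _, v in self._rows) / n
        var_t = sum((t - mean_t) ** 2 for t, _ in self._rows)
        cov = sum((t - mean_t) * (v - mean_v) for t, v in self._rows)
        slope = cov / var_t
        return slope, mean_v - slope * mean_t

    def query(self, t):
        # Refit only when a query arrives after new inserts.
        if self._coeffs is None:
            self._coeffs = self._fit()
        slope, intercept = self._coeffs
        return slope * t + intercept


if __name__ == "__main__":
    view = RegressionView()
    for t, temp in [(0, 20.1), (1, 20.9), (2, 22.0)]:
        view.insert(t, temp)
    print(view.query(1.5))  # interpolated value at a time with no reading
```

Deferring the fit until a query arrives is one simple maintenance policy; an incremental update on each insert is the obvious alternative when queries are frequent.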

Example DBN View

Original noisy GPS data:

User  Time    Location
John  5pm     (x1, y1)
John  5:05pm  (x2, y2)

User view of the data (smoothed locations, inferred variables):

User  Time    Location    Mode     prob
John  5pm     (x’1, y’1)  Walking  0.9
John  5pm     (x’1, y’1)  Car      0.1
John  5:05pm  (x’2, y’2)  Walking  0
John  5:05pm  (x’2, y’2)  Car      1

- The user queries the view, e.g. select count(*) group by mode sliding window 5 minutes
- Application of the model/inference is pushed inside the database
- Opens up many optimization opportunities, e.g. can do inference lazily when queried
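One way to read the example query over this probabilistic view is as an expected count per mode. The slide does not say which semantics the system uses, so treating the answer as an expectation is an assumption here. The sketch below uses the rows of the example view (locations omitted); the function name `expected_count_by_mode` is hypothetical.

```python
from collections import defaultdict

# Rows of the example view: (user, time, mode, prob).
view = [
    ("John", "5pm", "Walking", 0.9),
    ("John", "5pm", "Car", 0.1),
    ("John", "5:05pm", "Walking", 0.0),
    ("John", "5:05pm", "Car", 1.0),
]


def expected_count_by_mode(rows):
    """Expected number of tuples per mode.

    By linearity of expectation this is the sum of the tuple probabilities,
    and it holds even when the tuples are correlated.
    """
    counts = defaultdict(float)
    for _user, _time, mode, prob in rows:
        counts[mode] += prob
    return dict(counts)


if __name__ == "__main__":
    print(expected_count_by_mode(view))
```

Other answers, such as the full distribution of the count, do depend on how the tuples are correlated and cannot be computed from the marginal probabilities alone.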

Correlations

User  Time    Location    Mode     prob
John  5pm     (x’1, y’1)  Walking  0.9
John  5pm     (x’1, y’1)  Car      0.1
John  5:05pm  (x’2, y’2)  Walking  0
John  5:05pm  (x’2, y’2)  Car      1

- Strong and complex correlations across tuples
  - Mutual exclusivity
  - Temporal correlations
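The effect of mutual exclusivity can be seen by enumerating possible worlds. In the sketch below, each time step is a block of mutually exclusive alternatives, so exactly one mode is chosen per block. For simplicity the blocks are treated as independent of each other, which deliberately ignores the temporal correlations the slide also mentions. The function name `possible_worlds` is hypothetical.

```python
from itertools import product


def possible_worlds(blocks):
    """Enumerate worlds that pick exactly one alternative per block.

    blocks: list of blocks, each a list of (mode, prob) alternatives
            whose probabilities sum to 1.
    Assumes blocks are independent of one another (a simplification).
    Returns a list of (modes_tuple, probability) with nonzero probability.
    """
    worlds = []
    for choice in product(*blocks):
        prob = 1.0
        for _mode, p in choice:
            prob *= p
        if prob > 0:
            worlds.append((tuple(mode for mode, _ in choice), prob))
    return worlds


if __name__ == "__main__":
    blocks = [
        [("Walking", 0.9), ("Car", 0.1)],   # 5pm
        [("Walking", 0.0), ("Car", 1.0)],   # 5:05pm
    ]
    for modes, prob in possible_worlds(blocks):
        print(modes, prob)
```

No world contains both the Walking and the Car tuple for 5pm. Treating the four tuples as independent would instead give that impossible combination probability 0.9 × 0.1 = 0.09.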

MauveDB: Status
- Written in the Apache Derby Java open-source database system
- Support for regression- and interpolation-based views
  - Neither produces probabilistic data
  - SIGMOD 2006 (w/ Sam Madden)
- Currently building support for views based on dynamic Bayesian networks [Bhargav]
  - Kalman filters, HMMs, etc.
  - Initial focus on the user interfaces and efficient inference
  - Will generate probabilistic data; may not be able to do anything too sophisticated with it

Research Challenges/Future Work
- Generalizing to arbitrary models?
  - Develop APIs for adding arbitrary models
  - Try to minimize the work of the model developer
- Probabilistic databases
  - Uncertain data with complex correlation patterns
  - Query processing, query optimization
- View maintenance in the presence of high-rate measurement streams

Thanks !! Mauve == Model-based User Views