Sensor Data Management: Challenges and (some) Solutions
Amol Deshpande, University of Maryland

Motivation
- Unprecedented, and rapidly increasing, instrumentation of our everyday world
- Wireless sensor networks
- RFID
- Distributed measurement networks (e.g. GPS)
- Industrial monitoring

Sensor Data Processing: Now

Diagram components: Sensor Network, Database (table raw-data), User

Table raw-data:
time   id   temp
10am   1    20
10am   2    21
...    ...  ...
10am   7    29

Steps the user repeats:
1. Extract all readings into a file
2. Run MATLAB/R/other data processing tools
3. Write output to a file/back to the database
4. Write data processing tools to process/aggregate the output (maybe using DB)
5. Decide new data to acquire

Sensor Data Processing: What we want

Diagram components: Sensor Network, Database (tables raw-data and processed-data), User

Tables raw-data and processed-data (same sample rows shown for both):
time   id   temp
10am   1    20
10am   2    21
...    ...  ...
10am   7    29

Diagram labels:
- Models to be applied to data in real-time (at least simple ones)
- Tasks
- Data
- Continuous (standing) queries, e.g. alert monitoring
- Results to continuous queries
- Ad hoc queries (possibly against processed, modeled data)
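A continuous (standing) query such as alert monitoring can be pictured as a filter that stays registered while readings arrive. The Python sketch below is only an illustration under stated assumptions: the Reading type, the standing_query function, and the 25-degree threshold are hypothetical, and the sample rows are the ones shown in the raw-data table.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Iterator


@dataclass(frozen=True)
class Reading:
    """One row of the raw-data table (hypothetical type for illustration)."""
    time: str
    sensor_id: int
    temp: float


def standing_query(
    stream: Iterable[Reading],
    predicate: Callable[[Reading], bool],
) -> Iterator[Reading]:
    """Yield each reading that satisfies the predicate as soon as it arrives."""
    for reading in stream:
        if predicate(reading):
            yield reading


# Sample rows from the raw-data table on the slide.
readings = [
    Reading("10am", 1, 20.0),
    Reading("10am", 2, 21.0),
    Reading("10am", 7, 29.0),
]

# Hypothetical alert rule: report any reading above 25 degrees.
alerts = list(standing_query(readings, lambda r: r.temp > 25.0))
```

Because standing_query is a generator, it can consume an unbounded stream and emit results incrementally, which is the property that separates a continuous query from a one-off ad hoc query.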

Data Management Challenges
- Very, very large scale
  - Spatio-temporal querying essential
  - Need new indexing techniques, data description formats, techniques for “data ingest” (cleaning the data etc.)
  - Much work in scientific data management, e.g. SkyServer
- Data is typically imprecise, unreliable, or incomplete (data quality)
  - Measurement noise, failures in sensor/GPS data
  - High message loss rate in wireless/RFID
Balazinska et al.; Data Management in the Worldwide Sensor Web; IEEE Pervasive, 2007.

Data Management Challenges
- Data is generated continuously and must be processed in real-time (distributed data streams)
  - Need different query processing paradigms
  - Typically very high data rates
  - Must be able to handle a large number of continuous queries efficiently
- Much recent work on “Data Streams”
  - Research systems: TelegraphCQ [Berkeley], STREAM [Stanford], Aurora [Brown/MIT/Brandeis] etc.
  - Commercial systems: Streambase, TruViso, …
Balazinska et al.; Data Management in the Worldwide Sensor Web; IEEE Pervasive, 2007.

Data Management Challenges
- Need for real-time statistical modeling of data
  - Eliminate spatial/temporal biases, handle missing data through extrapolation (e.g. regression, interpolation models)
  - Filter measurement noise (e.g. Kalman Filters)
  - Infer hidden variables, pattern recognition (e.g. HMMs)
  - Fault or anomaly detection
  - Forecasting/prediction (e.g. ARIMA)
Figure panels: Regression/interpolation models (temperature monitoring); Kalman Filters (GPS data)
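To make the noise-filtering item concrete, here is a minimal one-dimensional Kalman filter sketch in Python. It assumes a random-walk state model (the true value drifts slowly between readings), which is a simplification chosen for illustration and not a model taken from the talk. The function name kalman_1d, its parameters, and the sample readings are all hypothetical.

```python
from typing import List, Sequence


def kalman_1d(
    measurements: Sequence[float],
    process_var: float,
    meas_var: float,
    x0: float,
    p0: float,
) -> List[float]:
    """Filter a scalar series with a random-walk Kalman filter.

    process_var: assumed variance of the drift between consecutive steps.
    meas_var:    assumed variance of the measurement noise.
    x0, p0:      initial state estimate and its variance.
    """
    x, p = x0, p0
    estimates = []
    for z in measurements:
        # Predict: the state is assumed unchanged, but uncertainty grows.
        p = p + process_var
        # Update: blend prediction and measurement using the Kalman gain.
        k = p / (p + meas_var)
        x = x + k * (z - x)
        p = (1.0 - k) * p
        estimates.append(x)
    return estimates


# Hypothetical noisy temperature readings around 20 degrees.
noisy = [20.4, 19.7, 20.9, 20.1]
smoothed = kalman_1d(noisy, process_var=0.01, meas_var=0.5, x0=20.0, p0=1.0)
```

A larger meas_var makes the filter trust each new reading less and smooth more; a larger process_var makes it follow the readings more closely.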

Data Management Challenges
- The applications have strong acquisitional aspects
  - Data has to be actively acquired as needed
  - Typically high data acquisition costs (e.g. energy consumption in battery-powered devices)
- Data provenance
  - Being able to trace something back to its origins
- Data exploration and visualization
- Data interoperability
- Data security and privacy
- …
Balazinska et al.; Data Management in the Worldwide Sensor Web; IEEE Pervasive, 2007.

My Research Interests
- Managing imprecise and incomplete data
  - Support statistical modeling and querying of sensor data in relational databases
  - Clean, declarative abstractions
  - Real-time processing of streaming data
- Probabilistic databases
  - Store and query data annotated with probabilities
- Energy-efficient algorithms for wireless sensornets
  - Data acquisition, target monitoring, data compression, ...
  - In-network query processing

MauveDB
- Written using Apache Derby, a Java open source DBMS
- Supports an abstraction called model-based views
  - Declarative specification of models to be applied
  - Can query the output of the models using SQL
  - Models kept updated as new data/measurements arrive
A. Deshpande, S. Madden; MauveDB: Supporting Model-based User Views in Database Systems; SIGMOD 2006
B. Kanagal, A. Deshpande; Online Filtering, Smoothing and Probabilistic Modeling of Streaming data; ICDE 2008

MauveDB (continued)
- Status:
  - Support for regression- and interpolation-based views
  - Currently building support for views based on Dynamic Bayesian networks (Kalman Filters, HMMs etc.)
- Ongoing work:
  - Query processing and optimization, continuous queries
  - APIs for arbitrary models
  - …
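As a rough illustration of the idea behind a regression-based model view (this is not MauveDB's syntax, API, or implementation), the Python sketch below keeps a least-squares line updated as measurements arrive and answers queries from the fitted model instead of from the raw readings. The RegressionView class, its insert and query methods, and the sample points are hypothetical.

```python
from typing import List, Optional, Tuple


def fit_line(xs: List[float], ys: List[float]) -> Tuple[float, float]:
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


class RegressionView:
    """Hypothetical model-based view: raw points in, model predictions out."""

    def __init__(self) -> None:
        self.xs: List[float] = []
        self.ys: List[float] = []
        self.params: Optional[Tuple[float, float]] = None

    def insert(self, x: float, y: float) -> None:
        """Add a measurement and refit so the view stays up to date."""
        self.xs.append(x)
        self.ys.append(y)
        # A line needs at least two distinct x values.
        if len(set(self.xs)) >= 2:
            self.params = fit_line(self.xs, self.ys)

    def query(self, x: float) -> float:
        """Return the modeled value at x, even where nothing was measured."""
        if self.params is None:
            raise ValueError("not enough data to fit the model")
        intercept, slope = self.params
        return intercept + slope * x


view = RegressionView()
for x, y in [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0)]:
    view.insert(x, y)
predicted = view.query(3.0)  # value at an unmeasured location
```

Refitting from scratch on every insert keeps the sketch short; a real system would more plausibly update the model incrementally.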

Probabilistic Databases
- Motivation: Increasing amounts of uncertain data
  - From sensor networks
  - Imprecise data, data with confidence/accuracy bounds
  - Human-observed data
  - Statistical modeling/machine learning: many models provide a distribution over a set of labels (e.g. HMMs)
  - Information extraction from text
  - Social networks
- How to manage and query such data in relational databases?
  - Different types of uncertainties
  - Complex correlation patterns
- Much work in database community over last few years
P. Sen, A. Deshpande; Representing and Querying Correlated Tuples in Probabilistic Databases; ICDE 2007
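A small Python sketch can show what querying probability-annotated tuples might look like in the simplest case. It assumes every tuple is independent of the others, which is exactly the assumption that the correlation patterns mentioned above break, so treat it as a baseline illustration only. The detections table, the helper names, and the probabilities are hypothetical.

```python
from math import prod
from typing import Callable, Dict, List

Row = Dict[str, object]

# Hypothetical uncertain table: "prob" is the probability that the tuple holds.
detections: List[Row] = [
    {"sensor_id": 1, "label": "occupied", "prob": 0.5},
    {"sensor_id": 2, "label": "occupied", "prob": 0.5},
    {"sensor_id": 3, "label": "empty", "prob": 0.9},
]


def prob_exists(rows: List[Row], predicate: Callable[[Row], bool]) -> float:
    """P(at least one matching tuple is true), assuming independent tuples."""
    return 1.0 - prod(1.0 - float(r["prob"]) for r in rows if predicate(r))


def expected_count(rows: List[Row], predicate: Callable[[Row], bool]) -> float:
    """Expected number of matching tuples (linearity of expectation)."""
    return sum(float(r["prob"]) for r in rows if predicate(r))


def is_occupied(row: Row) -> bool:
    return row["label"] == "occupied"


p_any = prob_exists(detections, is_occupied)          # 1 - 0.5 * 0.5 = 0.75
n_expected = expected_count(detections, is_occupied)  # 0.5 + 0.5 = 1.0
```

The expected count stays valid even when tuples are correlated, because expectation is linear; the existence probability does not, which is one reason correlated tuples need a richer representation.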

Thanks! Questions?