Published by Antony Hancock. Modified over 9 years ago.
1
Sensor Data Management: Challenges and (some) Solutions
Amol Deshpande, University of Maryland
2
Motivation
Unprecedented, and rapidly increasing, instrumentation of our everyday world:
- Wireless sensor networks
- RFID
- Distributed measurement networks (e.g., GPS)
- Industrial monitoring
3
Sensor Data Processing: Now
Sensor Network → Database → User

Table raw-data:
time   id   temp
10am   1    20
10am   2    21
...
10am   7    29

The user repeats this loop:
1. Extract all readings into a file.
2. Run MATLAB/R/other data processing tools.
3. Write the output to a file or back to the database.
4. Write data processing tools to process/aggregate the output (possibly using the DB).
5. Decide what new data to acquire.
Repeat.
4
Sensor Data Processing: What we want
Sensor Network → Database (table raw-data) → models applied to the data in real time (at least simple ones) → table processed-data

Table raw-data:
time   id   temp
10am   1    20
10am   2    21
...
10am   7    29

Table processed-data: same layout (time, id, temp).

The user sends tasks to the database and gets data back:
- Continuous (standing) queries, e.g., alert monitoring; the database returns results to these continuous queries
- Ad hoc queries (possibly against processed, modeled data)
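One of the simplest models that could turn raw-data into processed-data is linear interpolation in time: for every sensor and every time on a regular grid, estimate the temperature from the sensor's nearest readings before and after. The sketch below assumes raw-data rows are (time in minutes, sensor id, temperature) tuples; the function names are hypothetical and it refuses to extrapolate outside the observed range.

```python
from bisect import bisect_left


def interpolate(rows: list[tuple[int, int, float]], sensor_id: int, t: int) -> float:
    """Estimate sensor_id's temperature at time t by linear interpolation
    between its nearest readings before and after t (hypothetical helper)."""
    pts = sorted((time, temp) for time, sid, temp in rows if sid == sensor_id)
    if not pts:
        raise ValueError(f"no readings for sensor {sensor_id}")
    times = [time for time, _ in pts]
    i = bisect_left(times, t)
    if i < len(times) and times[i] == t:
        return pts[i][1]  # an actual reading exists at t
    if i == 0 or i == len(times):
        raise ValueError("t is outside the observed range; no extrapolation")
    (t0, v0), (t1, v1) = pts[i - 1], pts[i]
    return v0 + (v1 - v0) * (t - t0) / (t1 - t0)


def processed_view(rows, sensor_ids, times):
    """Materialize a regular (time, id, temp) grid, like table processed-data."""
    return [(t, sid, interpolate(rows, sid, t)) for t in times for sid in sensor_ids]


# Times in minutes since midnight (600 = 10am); sensor 1 has no reading at 610.
raw = [(600, 1, 20.0), (620, 1, 22.0), (600, 2, 21.0), (610, 2, 21.5), (620, 2, 23.0)]
print(processed_view(raw, [1, 2], [600, 610, 620]))
```

Running the model in the database, rather than in an external MATLAB/R step, is what lets processed-data stay current as new raw readings arrive.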
5
Data Management Challenges
- Very, very large scale
  - Spatio-temporal querying is essential
  - Need new indexing techniques, data description formats, and techniques for "data ingest" (cleaning the data, etc.)
  - Much work in scientific data management, e.g., SkyServer
- Data is typically imprecise, unreliable, or incomplete (data quality)
  - Measurement noise, failures in sensor/GPS data
  - High message loss rate in wireless/RFID
Balazinska et al.; Data Management in the Worldwide Sensor Web; IEEE Pervasive Computing, 2007.
6
Data Management Challenges
- Data is generated continuously and must be processed in real time (distributed data streams)
  - Need different query processing paradigms
  - Typically very high data rates
  - Must be able to handle a large number of continuous queries efficiently
- Much recent work on "data streams"
  - Research systems: TelegraphCQ [Berkeley], STREAM [Stanford], Aurora [Brown/MIT/Brandeis], etc.
  - Commercial systems: StreamBase, Truviso, ...
Balazinska et al.; Data Management in the Worldwide Sensor Web; IEEE Pervasive Computing, 2007.
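A continuous query differs from an ad hoc one in that it is registered once and then evaluated incrementally as each reading arrives. The sketch below is a minimal, hypothetical example of an alert-monitoring query: it keeps a time-based sliding window and a running sum, so each new reading costs amortized O(1) work instead of a rescan. The class name and parameters are illustrative, not the API of any of the stream systems named above.

```python
from collections import deque


class SlidingAvgAlert:
    """Hypothetical standing query: over readings with timestamps in
    (now - width, now], report the window average whenever it exceeds threshold."""

    def __init__(self, width: float, threshold: float):
        self.width = width
        self.threshold = threshold
        self.window = deque()  # (timestamp, value) pairs, oldest first
        self.total = 0.0       # running sum of values in the window

    def push(self, ts: float, value: float):
        """Process one reading; return the average if it triggers an alert, else None."""
        self.window.append((ts, value))
        self.total += value
        # Evict readings that have slid out of the window.
        while self.window and self.window[0][0] <= ts - self.width:
            _, old = self.window.popleft()
            self.total -= old
        avg = self.total / len(self.window)
        return avg if avg > self.threshold else None


q = SlidingAvgAlert(width=60, threshold=25.0)
alerts = [q.push(ts, v) for ts, v in [(0, 20), (30, 24), (50, 30), (90, 31)]]
print(alerts)
```

Supporting many such queries efficiently usually means sharing windows and state across queries, which this single-query sketch does not attempt.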
7
Data Management Challenges
Need for real-time statistical modeling of data:
- Eliminate spatial/temporal biases; handle missing data through extrapolation (e.g., regression, interpolation models)
- Filter measurement noise (e.g., Kalman filters)
- Infer hidden variables; pattern recognition (e.g., HMMs)
- Fault or anomaly detection
- Forecasting/prediction (e.g., ARIMA)
[Figure examples: regression/interpolation models for temperature monitoring; Kalman filters for GPS data]
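To make "filter measurement noise" concrete, here is a minimal scalar Kalman filter, assuming the simplest possible model: the true value follows a random walk x_t = x_{t-1} + w_t with w ~ N(0, q), and each reading is z_t = x_t + v_t with v ~ N(0, r). The noise variances q and r are assumed known here; real deployments (e.g., the GPS case above) use multi-dimensional state and must estimate these parameters. Missing readings, passed as None, are handled by running only the prediction step.

```python
def kalman_1d(measurements, q: float, r: float, x0: float, p0: float):
    """Scalar Kalman filter for a random-walk state observed with noise.
    Returns (filtered mean, filtered variance) after each time step."""
    x, p = x0, p0
    estimates = []
    for z in measurements:
        # Predict: the random-walk mean carries over; uncertainty grows by q.
        p = p + q
        if z is not None:
            # Update: blend prediction and reading using the Kalman gain k.
            k = p / (p + r)
            x = x + k * (z - x)
            p = (1 - k) * p
        # A missing reading (None) leaves only the prediction step.
        estimates.append((x, p))
    return estimates


# Noisy temperature readings with one lost message.
print(kalman_1d([20.3, 19.8, None, 20.6, 20.1], q=0.01, r=0.25, x0=20.0, p0=1.0))
```

The filtered variance p is itself useful downstream: it says how much to trust each estimate, which matters once gaps from message loss appear.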
8
Data Management Challenges
- The applications have strong acquisitional aspects
  - Data has to be actively acquired as needed
  - Typically high data acquisition costs (e.g., energy consumption in battery-powered devices)
- Data provenance: being able to trace something back to its origins
- Data exploration and visualization
- Data interoperability
- Data security and privacy
- ...
Balazinska et al.; Data Management in the Worldwide Sensor Web; IEEE Pervasive Computing, 2007.
9
My Research Interests
- Managing imprecise and incomplete data
  - Support statistical modeling and querying of sensor data in relational databases
    - Clean, declarative abstractions
    - Real-time processing of streaming data
  - Probabilistic databases
    - Store and query data annotated with probabilities
- Energy-efficient algorithms for wireless sensornets
  - Data acquisition, target monitoring, data compression, ...
  - In-network query processing
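As one illustration of energy-efficient data compression in a sensornet, a node can suppress readings that differ from the last value it transmitted by at most epsilon, and the base station reconstructs the stream by holding the last received value. Every reconstructed value is then within epsilon of the true reading, while radio messages (the dominant energy cost) drop. This is a generic temporal-suppression sketch chosen for illustration, not a specific algorithm from this talk; the function names are hypothetical.

```python
def suppress(readings, epsilon: float):
    """Node side: transmit (index, value) only when the reading differs from
    the last *transmitted* value by more than epsilon."""
    sent = []
    last = None
    for i, v in enumerate(readings):
        if last is None or abs(v - last) > epsilon:
            sent.append((i, v))
            last = v
    return sent


def reconstruct(sent, n: int):
    """Base-station side: hold the last received value until the next message."""
    out, j, cur = [], 0, None
    for i in range(n):
        if j < len(sent) and sent[j][0] == i:
            cur = sent[j][1]
            j += 1
        out.append(cur)
    return out


readings = [20.0, 20.3, 20.4, 21.0, 21.2, 19.0]
sent = suppress(readings, epsilon=0.5)
print(sent)                               # 3 messages instead of 6
print(reconstruct(sent, len(readings)))
```

Comparing against the last transmitted value (not the last sensed value) is what bounds the reconstruction error; comparing against the previous reading would let slow drift accumulate without ever triggering a message.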
10
MauveDB
- Written using Apache Derby, a Java open-source DBMS
- Supports an abstraction called model-based views
  - Declarative specification of the models to be applied
  - The output of the models can be queried using SQL
  - Models are kept up to date as new data/measurements arrive
A. Deshpande, S. Madden; MauveDB: Supporting Model-based User Views in Database Systems; SIGMOD 2006.
B. Kanagal, A. Deshpande; Online Filtering, Smoothing and Probabilistic Modeling of Streaming Data; ICDE 2008.
11
MauveDB
A. Deshpande, S. Madden; MauveDB: Supporting Model-based User Views in Database Systems; SIGMOD 2006.
B. Kanagal, A. Deshpande; Online Filtering, Smoothing and Probabilistic Modeling of Streaming Data; ICDE 2008.
12
MauveDB
- Written using Apache Derby, a Java open-source DBMS
- Supports an abstraction called model-based views
  - Declarative specification of the models to be applied
  - The output of the models can be queried using SQL
  - Models are kept up to date as new data/measurements arrive
- Status:
  - Support for regression- and interpolation-based views
  - Currently building support for views based on dynamic Bayesian networks (Kalman filters, HMMs, etc.)
- Ongoing work:
  - Query processing and optimization, continuous queries
  - APIs for arbitrary models
  - ...
A. Deshpande, S. Madden; MauveDB: Supporting Model-based User Views in Database Systems; SIGMOD 2006.
B. Kanagal, A. Deshpande; Online Filtering, Smoothing and Probabilistic Modeling of Streaming Data; ICDE 2008.
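The idea behind a regression-based view can be sketched outside the database: fit a model to the raw readings, then answer queries from the model, including at locations that have no sensor. The sketch below assumes temperature is linear in sensor position, temp ≈ a + b·x + c·y, and uses NumPy least squares. It is not MauveDB's syntax or implementation: the function names are hypothetical, and it simply refits from scratch whenever the view must reflect new data.

```python
import numpy as np


def fit_regression_view(readings):
    """readings: rows of (x, y, temp). Fit temp ≈ a + b*x + c*y by least squares
    and return the coefficients (a, b, c)."""
    data = np.asarray(readings, dtype=float)
    A = np.column_stack([np.ones(len(data)), data[:, 0], data[:, 1]])
    coeffs, *_ = np.linalg.lstsq(A, data[:, 2], rcond=None)
    return coeffs


def query_view(coeffs, grid):
    """Read the view: the modeled temperature at each (x, y) grid point."""
    a, b, c = coeffs
    return [(x, y, float(a + b * x + c * y)) for x, y in grid]


# Four sensors at the corners of a unit square.
raw = [(0, 0, 10.0), (1, 0, 12.0), (0, 1, 13.0), (1, 1, 15.0)]
coeffs = fit_regression_view(raw)
print(query_view(coeffs, [(0.5, 0.5), (2, 2)]))
```

Because the user queries the view rather than raw-data, the choice of model (regression here, interpolation in the other supported view type) stays behind the SQL interface.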
13
Probabilistic Databases
Motivation: increasing amounts of uncertain data
- From sensor networks
  - Imprecise data, data with confidence/accuracy bounds
- Human-observed data
- Statistical modeling/machine learning
  - Many models provide a distribution over a set of labels (e.g., HMMs)
- Information extraction from text
- Social networks
How can such data be managed and queried in relational databases?
- Different types of uncertainties
- Complex correlation patterns
Much work in the database community over the last few years.
P. Sen, A. Deshpande; Representing and Querying Correlated Tuples in Probabilistic Databases; ICDE 2007.
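A query over probabilistic data is defined over possible worlds: each tuple is present or absent, each combination of presences is a world with some probability, and a query's answer probability is the total probability of the worlds where it holds. The sketch below assumes the simplest case, tuple-independent data, where "does any tuple satisfy the predicate?" reduces to 1 - Π(1 - p_i). It checks the closed form against brute-force enumeration of worlds. The independence assumption is exactly what breaks down under the complex correlation patterns mentioned above; the function names are hypothetical.

```python
from itertools import product
from math import prod


def prob_exists(tuples, pred):
    """P(at least one tuple satisfying pred is present), assuming independent tuples."""
    return 1 - prod(1 - p for row, p in tuples if pred(row))


def prob_exists_possible_worlds(tuples, pred):
    """Same query by enumerating all 2^n possible worlds (exponential; for checking)."""
    total = 0.0
    for world in product([False, True], repeat=len(tuples)):
        weight = prod(p if present else 1 - p for (_, p), present in zip(tuples, world))
        if any(present and pred(row) for (row, _), present in zip(tuples, world)):
            total += weight
    return total


# ((sensor id, temp), probability that the reading is correct/present)
readings = [((1, 31.0), 0.5), ((2, 29.0), 0.9), ((3, 33.0), 0.2)]
hot = lambda row: row[1] > 30
print(prob_exists(readings, hot), prob_exists_possible_worlds(readings, hot))
```

With correlated tuples, the per-world weight is no longer a simple product of per-tuple probabilities, so the closed form no longer applies and a richer representation of the joint distribution is needed.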
14
Thanks! Questions?