Slide 1: Model-driven Data Acquisition in Sensor Networks
Amol Deshpande (1,4), Carlos Guestrin (2,4), Sam Madden (3,4), Joe Hellerstein (1,4), Wei Hong (4)
(1) UC Berkeley, (2) Carnegie Mellon University, (3) MIT, (4) Intel Research - Berkeley
Slide 2: Sensor networks and distributed systems
- A collection of devices that can sense, actuate, and communicate over a wireless network
- Available resources: 4 MHz 8-bit CPU, 40 Kbps wireless, 3V battery (lasts days or months)
- Sensors for temperature, humidity, pressure, sound, magnetic fields, acceleration, visible and ultraviolet light, etc.
- Analogous issues arise in other distributed systems, including data streams and the Internet
Slide 3: Real deployments
- Great Duck Island (monitoring Leach's Storm Petrel)
- Redwoods
- Precision agriculture
- Fabrication monitoring
Slide 4: Example: Intel Berkeley Lab deployment
Slide 5: Analogy: Sensor net as a database
- TinyDB: a declarative, SQL-style query interface to the network
- Distribute the query; collect the query answer or data at every time step
- Declarative interface: sensor nets are not just for PhDs, and deployment time decreases
- Data aggregation can reduce communication
Slide 6: Limitations of the existing approach
- The whole process (distribute the SQL-style query, collect data at every time step) is redone every time the query changes
- Query distribution: every node must receive the query
- Data collection: every node must wake up at every time step
- Data loss is ignored: no quality guarantees
- Data inefficient: correlations are ignored
Slide 7: Sensor net data is correlated
- Spatial-temporal correlation and inter-attribute correlation
- Data is not i.i.d., so missing data shouldn't be ignored
- Observing one sensor gives information about other sensors (and about future values)
- Observing one attribute gives information about other attributes
(A sketch of fitting a model that captures these correlations follows.)
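As a concrete illustration (not from the talk itself): a minimal sketch that fits a joint Gaussian to historical readings, where the off-diagonal covariance entries capture exactly the inter-sensor correlations described above. The array history, a hypothetical (epochs x sensors) matrix of past readings, is an assumption for illustration.

    import numpy as np

    def learn_gaussian(history):
        """Estimate a joint Gaussian P(X1, ..., Xn) from historical readings."""
        mu = history.mean(axis=0)              # per-sensor mean
        sigma = np.cov(history, rowvar=False)  # off-diagonals: inter-sensor correlation
        return mu, sigma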
Slide 8: Model-driven data acquisition: overview
Given an SQL-style query with a desired confidence, a probabilistic model produces a data-gathering plan; conditioning the model on the new observations yields a posterior belief, which is reused when the next query arrives.
Strengths of model-based data acquisition:
- Observe fewer attributes
- Exploit correlations
- Reuse information between queries
- Directly deal with missing data
- Answer more complex (probabilistic) queries
Slide 9: Probabilistic models and queries
User's perspective:

    SELECT nodeId, temp ± 0.5°C, conf(.95)
    FROM sensors
    WHERE nodeId in {1..8}

The system selects and observes a subset of nodes, e.g. {3, 6, 8}.

Query result:

    Node   1      2      3      4      5      6      7      8
    Temp.  17.3   18.1   17.4   16.1   19.2   21.3   17.5   16.3
    Conf.  98%    95%    100%   99%    95%    100%   98%    100%
Slide 10: Probabilistic models and queries
- Joint distribution P(X1, ..., Xn), learned from historical data
- Probabilistic query, e.g.: value of X2 ± ε with prob. > 1 − δ; if the prior probability is below 1 − δ, the query cannot yet be answered
- Observe attributes, e.g. observe X1 = 18 and compute P(X2 | X1 = 18): with the higher posterior probability, the query can be answered (a conditioning sketch follows)
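A minimal sketch of the conditioning step, assuming the Gaussian model fitted in the earlier sketch: computing a posterior such as P(X2 | X1 = 18) reduces to standard Gaussian conditioning. The function name and index conventions are illustrative, not from the talk.

    import numpy as np

    def condition(mu, sigma, obs, values, hidden):
        """Posterior over `hidden` indices given readings `values` at `obs` indices."""
        s_oo = sigma[np.ix_(obs, obs)]
        s_ho = sigma[np.ix_(hidden, obs)]
        gain = s_ho @ np.linalg.inv(s_oo)           # regression of hidden on observed
        post_mu = mu[hidden] + gain @ (values - mu[obs])
        post_sigma = sigma[np.ix_(hidden, hidden)] - gain @ s_ho.T
        return post_mu, post_sigma

    # e.g. belief over sensor 2 after observing sensor 1 at 18°C:
    # post_mu, post_sigma = condition(mu, sigma, obs=[0], values=np.array([18.0]), hidden=[1])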
Slide 11: Dynamic models: filtering
- Joint distribution over the attributes at each time t (e.g., a Kalman filter), learned from historical data
- Observe attributes (e.g., observe X1 = 18) and condition on the observations at time t
- Carrying the belief forward in time means fewer observations are needed for future queries
(A sketch of the prediction step follows.)
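A minimal sketch of the prediction step of such a dynamic model, assuming a learned linear transition matrix A and transition noise covariance Q (both assumptions for illustration); the measurement update is just the condition step from the previous sketch, applied to whichever sensors were actually observed at the new time step.

    import numpy as np

    def kf_predict(mu, sigma, A, Q):
        """Roll the belief over X^t forward to a prior over X^{t+1}."""
        return A @ mu, A @ sigma @ A.T + Q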
Slide 12: Supported queries
- Value query: Xi ± ε with prob. at least 1 − δ
- SELECT and range query: Xi ∈ [a, b] with prob. at least 1 − δ (e.g., which sensors have temperature greater than 25°C?)
- Aggregation: average ± ε of a subset of attributes with prob. > 1 − δ; aggregation and selection can be combined (e.g., what is the probability that more than 10 sensors have temperature greater than 25°C?)
- Queries require solving integrals; many are computed in closed form, some require numerical integration or sampling (a closed-form range example follows)
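For the range queries above, a minimal closed-form sketch: under a Gaussian posterior, P(Xi ∈ [a, b]) is a difference of normal CDFs, so no sampling is needed for this query class. Here post_mu and post_var are the posterior mean and variance from the earlier conditioning sketch.

    from scipy.stats import norm

    def range_prob(post_mu, post_var, a, b):
        """P(a <= Xi <= b) under a Gaussian posterior N(post_mu, post_var)."""
        sd = post_var ** 0.5
        return norm.cdf(b, loc=post_mu, scale=sd) - norm.cdf(a, loc=post_mu, scale=sd)

    # e.g. report "temperature > 25°C" when
    # range_prob(post_mu, post_var, 25.0, float("inf")) >= 1 - delta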
Slide 13: Model-driven data acquisition: overview (continued)
The same pipeline (SQL-style query with desired confidence, probabilistic model, data-gathering plan, condition on new observations, posterior belief), now asking:
- What sensors do we observe?
- How do we collect the observations?
Slide 14: Acquisition costs
- Attributes have different acquisition costs
- Exploit correlation through the probabilistic model: observing a cheaper, correlated sensor may suffice
- Must also consider the networking cost of reaching the chosen sensors
Slide 15: Network model and plan format
- Assume a known (quasi-static) network topology
- Cost of collecting a subset S of sensor values: C(S) = Ca(S) + Ct(S)
- Ct(S) is the expected cost of a traversal of S, defined by a (1.5-approximate) TSP tour under lossy communication
- Goal: find a subset S that is sufficient to answer the query at minimum cost C(S) (a cost sketch follows)
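A minimal sketch of the cost model C(S) = Ca(S) + Ct(S). For brevity it approximates the tour with a nearest-neighbor heuristic, a stand-in for the 1.5-approximate TSP the talk uses; acq_cost (per-node acquisition costs) and dist (expected pairwise transmission costs) are hypothetical inputs.

    def plan_cost(S, acq_cost, dist, root=0):
        """C(S) = Ca(S) + Ct(S): acquisition cost plus a tour visiting S from the root."""
        ca = sum(acq_cost[i] for i in S)
        tour, rest = [root], set(S) - {root}
        while rest:                                  # greedy nearest-neighbor tour
            nxt = min(rest, key=lambda j: dist[tour[-1]][j])
            tour.append(nxt)
            rest.remove(nxt)
        ct = sum(dist[u][v] for u, v in zip(tour, tour[1:])) + dist[tour[-1]][root]
        return ca + ct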
Slide 16: Choosing observation plan
- When is a subset S sufficient? Consider a range query: Xi ∈ [a, b] with prob. > 1 − δ
- If we observe S = s: Ri(s) = max{ P(Xi ∈ [a, b] | s), 1 − P(Xi ∈ [a, b] | s) }
- The value of S is unknown before observing, so take the expectation: Ri(S) = ∫ p(s) Ri(s) ds
- Optimization problem: minimize C(S) subject to Ri(S) ≥ 1 − δ (a greedy search sketch follows)
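A minimal sketch of a greedy variant of this search: grow S with the attribute that buys the most expected confidence per unit cost until the query's threshold is met. Here expected_conf(S) stands for the Ri(S) integral above and cost(S) for C(S) (e.g., the previous sketch with its other arguments fixed); both callables are assumptions for illustration.

    def greedy_plan(candidates, expected_conf, cost, delta):
        """Grow S until the expected confidence R_i(S) reaches 1 - delta."""
        S = set()
        while expected_conf(S) < 1 - delta:
            remaining = candidates - S
            if not remaining:
                break                        # nothing left to observe; best effort
            def gain(x):                     # confidence bought per unit of extra cost
                return ((expected_conf(S | {x}) - expected_conf(S))
                        / max(cost(S | {x}) - cost(S), 1e-9))
            S.add(max(remaining, key=gain))
        return S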
Slide 17: BBQ system
The pipeline instantiated:
- Queries: value, range, average
- Model: multivariate Gaussians, learned from historical data; conditioning is equivalent to a Kalman filter and reduces to simple matrix operations
- Planning: exhaustive or greedy search over subsets, with a factor-1.5 TSP approximation for the traversal
Slide 18: Experimental results
- Redwood trees and Intel Lab datasets
- Models learned from data: a static model and a dynamic model (Kalman filter with time-indexed transition probabilities)
- Evaluated on a wide range of queries
Slide 19: Cost versus confidence level
Slide 20: Obtaining approximate values
Query: true temperature value ± ε with confidence 95%
Slide 21: Approximate range queries
Query: temperature in [T1, T2] with confidence 95%
Slide 22: Comparison to other methods
Slide 23: Intel Lab traversals
Slide 24: BBQ system: extensions
Recap of the BBQ pipeline (value/range/average queries, multivariate Gaussians learned from historical data, Kalman-filter-equivalent conditioning via simple matrix operations, exhaustive or greedy planning with a factor-1.5 TSP approximation), plus extensions:
- More complex queries
- Other probabilistic models
- More advanced planning
- Outlier detection
- Dynamic networks
- Continuous queries
- ...
Slide 25: Conclusions
Model-driven data acquisition:
- Observe fewer attributes
- Exploit correlations
- Reuse information between queries
- Directly deal with missing data
- Answer more complex (probabilistic) queries
A basis for future sensor network systems.