1 Data Quality and Query Cost in Wireless Sensor Networks. David Yates, Erich Nahum, Jim Kurose, and Prashant Shenoy. IEEE PerCom 2008.
2 Papers: "Data Quality and Query Cost in Wireless Sensor Networks", IEEE PerSeNS 2007; "Data Quality and Query Cost in Wireless Sensor Networks", IEEE PerCom 2008 (with analysis of performance trends).
3 Outline: Introduction; Caching and Lookup Policies; Data Quality and Query Cost; Discussion of Results; Performance Trends (when value deviation is most important; when end-to-end delay is most important); Conclusion.
4 Introduction (1/4): Data-centric WSNs: environmental and infrastructure monitoring; commercial and industrial sensing. Performance metrics: accuracy; total system end-to-end delay; the quality of the data provided to sensor network applications.
5 Introduction (2/4): Sensor Network Deployment Example. [Figure: Data Acquisition and Caching; components: sensor field, routers and switches, data server / gateway (and cache), monitoring and control center.] What if the gateway is augmented with storage?
6 Introduction (3/4) Data Server or Gateway with a Cache: cache hit vs. cache miss
7 Introduction (4/4): System delay: the time between a query arriving and the corresponding reply departing (zero for a cache hit). Value deviation: the unsigned difference between the data value in the reply and the true value at location i.
8 Caching and Lookup Policies: Precise policies and approximate policies. Cache utilization: Full, Not Available. All hits; All misses; age threshold parameter. Greedy policies (cache entries are never deleted, updated, or replaced): greedy age lookups, greedy distance lookups, median-of-3 lookups. Precise policies: simple lookups, piggybacked queries. Spatial locality.
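The role of an age threshold in a gateway cache can be illustrated with a minimal sketch. This is not the authors' implementation: the class name `AgeThresholdCache`, the `read_sensor` callback, and the rule "serve an entry only while its age is strictly below the threshold" are assumptions made for illustration. Under that rule a threshold of zero makes every lookup a miss, and an infinite threshold makes every lookup after the first one per location a hit.

```python
from dataclasses import dataclass


@dataclass
class CacheEntry:
    value: float      # last sensor reading stored for this location
    timestamp: float  # time at which that reading was cached


class AgeThresholdCache:
    """Hypothetical gateway cache that serves entries younger than a threshold."""

    def __init__(self, age_threshold, read_sensor):
        self.age_threshold = age_threshold  # assumed age limit, in seconds
        self.read_sensor = read_sensor      # assumed callback: location -> value
        self.entries = {}
        self.hits = 0
        self.misses = 0

    def lookup(self, location, now):
        """Return (value, was_hit) for a query arriving at time `now`."""
        entry = self.entries.get(location)
        if entry is not None and now - entry.timestamp < self.age_threshold:
            self.hits += 1
            return entry.value, True
        # Cache miss: query the sensor field and refresh the cached entry.
        self.misses += 1
        value = self.read_sensor(location)
        self.entries[location] = CacheEntry(value, now)
        return value, False
```

A hit returns a possibly stale value with no trip to the sensor field, while a miss pays the query cost and returns a fresh value; this is the tension that the later slides measure.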
9 Data Quality and Query Cost: Quality Measurement. Data quality: a linear combination of normalized system delay and normalized value deviation; A: relative importance. Normalization: softmax normalization; Z-score normalization. Small values indicate better data quality!
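A minimal sketch of such a quality score follows. The exact formula is not recoverable from this slide, so the code makes three assumptions: each metric is Z-scored and then squashed into (0, 1) with a logistic curve (one common form of softmax normalization), the weight A multiplies the delay term, and 1 - A multiplies the value-deviation term. The weighting direction is inferred from the later slides, where A = 0.9 is described as delay being more important. The function names are hypothetical.

```python
import math
from statistics import mean, pstdev


def softmax_normalize(samples):
    """Z-score each sample, then map it into (0, 1) with a logistic curve."""
    mu = mean(samples)
    sigma = pstdev(samples)
    if sigma == 0:
        # All samples are identical, so none is better or worse than another.
        return [0.5 for _ in samples]
    return [1.0 / (1.0 + math.exp(-(x - mu) / sigma)) for x in samples]


def data_quality(delays, deviations, a):
    """Weighted sum of normalized delay and normalized value deviation.

    `a` is the assumed weight on delay; smaller scores mean better quality.
    """
    if not 0.0 <= a <= 1.0:
        raise ValueError("a must lie in [0, 1]")
    norm_delay = softmax_normalize(delays)
    norm_dev = softmax_normalize(deviations)
    return [a * d + (1.0 - a) * v for d, v in zip(norm_delay, norm_dev)]
```

For example, compare a cache hit (zero delay, some value deviation) with a miss (two seconds of delay, no deviation): `data_quality([0.0, 2.0], [1.0, 0.0], a=0.9)` scores the hit lower (better), while `a=0.1` scores the miss lower.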
10 Data Quality and Query Cost: Simulated Changes to the Environment (1/2). 3-dimensional sensor field (axes X, Y, Z; figure labels: 6 unit, 8 unit, 4 unit). Rectangular planes on six faces; sensors. Four base stations are placed on the X-Y plane. These base stations are connected to the gateway server that has the common cache. The sensors always communicate with their closest base station.
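The closest-base-station rule can be sketched as a nearest-neighbor search under Euclidean distance. The base-station coordinates below are hypothetical (the slide only says that four stations lie on the X-Y plane), and the function name is an assumption.

```python
import math

# Hypothetical positions for the four base stations, all on the X-Y plane (z = 0).
BASE_STATIONS = [
    (1.5, 2.0, 0.0),
    (4.5, 2.0, 0.0),
    (1.5, 6.0, 0.0),
    (4.5, 6.0, 0.0),
]


def closest_base_station(sensor, base_stations=BASE_STATIONS):
    """Return the index of the base station nearest to `sensor` (x, y, z)."""
    return min(
        range(len(base_stations)),
        key=lambda i: math.dist(sensor, base_stations[i]),
    )
```

With these coordinates, a sensor at `(1.0, 1.0, 3.0)` maps to station 0 and a sensor at `(5.0, 7.0, 1.0)` maps to station 3.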
11 Data Quality and Query Cost: Simulated Changes to the Environment (2/2). One-way communication to and from. Minimum cost to query a location: 2 units (query and reply). Maximum delay to query a location: 2 seconds. Formula labels: normalization constant, distance; normalization constant, distance.
12 Data Quality and Query Cost: Trace-driven Changes to the Environment. Intel Lab Dataset (sensor field: Intel Berkeley Research Lab): 2-dimensional field; 54 Mica2Dot sensors; light intensity, the most dynamically changing of the sensor values. Assume the sensors always communicate with their closest base station.
13 Data Quality and Query Cost: Query Workload Model (1/2). The query workload is the superposition of two query processes: a periodic arrival process and a random arrival process. Polling component: slowly scans the sensor field at a fixed rate; characterized by the period of the polling component of the query workload. Random component: queries to different locations in the sensor field; characterized by the average query arrival rate of the random component.
14 Data Quality and Query Cost: Query Workload Model (2/2). Simulated changes to the environment: exponentially distributed inter-arrival times, with a mean arrival rate of 90 queries/second. Trace-driven changes to the environment: 0.9 queries/second, 9 queries/second, 0.09 queries/second.
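A workload that superposes a periodic polling process on a random process with exponentially distributed inter-arrival times can be sketched as below. The function names, the round-robin scan order, and the uniform choice of location for random queries are assumptions made for illustration; the slides do not specify them.

```python
import heapq
import random


def polling_queries(period, num_locations, duration):
    """Periodic component: scan locations round-robin, one query per period."""
    k = 0
    while k * period < duration:
        yield (k * period, k % num_locations, "poll")
        k += 1


def random_queries(rate, num_locations, duration, rng):
    """Random component: exponential inter-arrival times with mean 1 / rate."""
    t = rng.expovariate(rate)
    while t < duration:
        # Assumption: the queried location is chosen uniformly at random.
        yield (t, rng.randrange(num_locations), "random")
        t += rng.expovariate(rate)


def query_workload(period, rate, num_locations, duration, seed=0):
    """Merge both components into one list of (time, location, kind) tuples."""
    rng = random.Random(seed)
    return list(
        heapq.merge(
            polling_queries(period, num_locations, duration),
            random_queries(rate, num_locations, duration, rng),
            key=lambda query: query[0],  # order the merged stream by time
        )
    )
```

For instance, `query_workload(period=1.0, rate=90.0, num_locations=54, duration=10.0)` yields ten polling queries interleaved with roughly 900 random ones, sorted by arrival time.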
15 Discussion of Results: Simulated Testing Dataset. A. Jindal and K. Psounis. References: "Modeling Spatially-correlated Sensor Network Data", SECON 2004; "Modeling Spatially Correlated Data in Sensor Networks", TOSN 2006.
16 Discussion of Results: Query Cost vs. Data Quality Trade-off. [Figure: Query Cost vs. Data Quality; panels: correlated changes over 1000 locations, trace-driven changes over 54 locations; parameter A; annotations: cache-hit percentages (including 0% cache hit), linear trade-off.]
17 Discussion of Results: Query Cost vs. End-to-End Delay. [Figure: Query Cost vs. End-to-End Delay; panels: correlated changes over 1000 locations, trace-driven changes over 54 locations; A = 0.1; annotation: an increase in the normalized delay term!]
18 Discussion of Results: Query Cost vs. Data Quality Trade-off. [Figure: Query Cost vs. Data Quality; panels: correlated changes over 1000 locations, trace-driven changes over 54 locations; A = 0.9; annotations: no trade-off, the best performance.]
19 Discussion of Results: Hit Ratios, Query Costs, and End-to-End Delays. [Figure: plotted quantities: hit ratio, query cost, end-to-end delay; panels: correlated changes over 1000 locations, trace-driven changes over 54 locations; labels: 90 queries/second; T = 90, 0.9 queries/second.]
20 Discussion of Results: Query Cost vs. Value Deviation. [Figure: Query Cost vs. Value Deviation; panels: correlated changes over 1000 locations, trace-driven changes over 54 locations; A = 0.1; annotation: increase the dispersion.]
21 Discussion of Results: Whether Delay or Value Deviation? [Figure: Query Cost vs. Data Quality; panels: correlated changes over 1000 locations, trace-driven changes over 54 locations; A = 0.1 (value deviation is more important than delay); annotations: Quality is more important. Cost is at a premium.]
22 Discussion of Results: Whether Delay or Value Deviation? [Figure: Query Cost vs. Data Quality; panels: correlated changes over 1000 locations, trace-driven changes over 54 locations; A = 0.9 (delay is more important than value deviation); annotation: Getting the fast response time of a cache "hit" is worthwhile!]
23 Performance Trends When Value Deviation is Most Important. [Figure: Query Cost vs. Data Quality; panels labeled by correlated changes / sec (of 1000), including 90; A = 0.1 (value deviation is more important than delay); annotations: linear trade-off; the results are robust!]
24 Performance Trends When Value Deviation is Most Important. [Figure: Value Deviation vs. Data Quality; panels labeled by correlated changes / sec (of 1000), including 90; A = 0.1 (value deviation is more important than delay); labels: environment changes, value deviation; annotation: strong positive correlation!]
25 Performance Trends When Value Deviation is Most Important. [Figure: Query Cost vs. Data Quality; panels: trace-driven changes, labeled by queries/second (including 9 queries/second); parameter A (value deviation is more important than delay); annotation: linear trade-off.]
26 Performance Trends When Value Deviation is Most Important. [Figure: Value Deviation vs. Data Quality; panels: trace-driven changes, labeled by queries/second (including 9 queries/second); parameter A (value deviation is more important than delay); annotation: strong positive correlation!]
27 Performance Trends When System Delay is Most Important. [Figure: Query Cost vs. Data Quality; panels labeled by correlated changes / sec (of 1000), including 90; A = 0.9 (delay is more important than value deviation); annotations: no trade-off; the best performance; the results are robust!]
28 Performance Trends When System Delay is Most Important. [Figure: End-to-End Delay vs. Data Quality; panels labeled by correlated changes / sec (of 1000), including 90; A = 0.9 (delay is more important than value deviation); annotation: strong positive correlation!]
29 Performance Trends When System Delay is Most Important. [Figure: Query Cost vs. Data Quality; panels: trace-driven changes at 90 queries/second, 9 queries/second, and 0.9 queries/second; A = 0.9 (delay is more important than value deviation); annotation: the best performance.]
30 Conclusion: We measure the benefit and cost of seven different caching and lookup policies, both when delay drives data quality and when value deviation drives data quality. Query cost vs. data quality: linear trade-off; cost vs. accuracy and/or cost vs. delay are also linear. The performance trends for query cost and data quality generally remain the same as the environment changes.