The Design of an Acquisitional Query Processor for Sensor Networks
CS851 Presentation, 2005
Presented by: Gang Zhou, University of Virginia

Outline
- Application Structure & Design Goals
- Acquisitional Query Language
- Power-Aware Optimization
- Power Sensitive Dissemination and Routing
- Processing Queries
- Conclusions and Future Work
- Discussion

Application Structure
- Queries are submitted on a PC
- Queries are parsed and optimized on the PC
- Queries are disseminated and processed in the network
- Results flow back through the routing tree

Design Goals
- Provide a query processor-like interface to sensor networks
- Use acquisitional techniques to reduce power consumption compared to traditional passive systems

How?
- What is meant by acquisitional techniques?
  - Where, when, and how often data is acquired and delivered to query processing operators
- Four related questions
  - When should samples be taken?
  - Which sensors have relevant data?
  - In what order should samples be taken?
  - Is it worth processing and relaying samples?

What’s the big deal?
- Radio is expensive
- Sensing takes significant energy
- Four energy levels:
  - Snoozing
  - Processing
  - Processing and receiving
  - Transmitting

An Acquisitional Query Language
- SQL-like queries in the form of SELECT-FROM-WHERE
    SELECT nodeid, light, temp
    FROM sensors
    SAMPLE INTERVAL 1s FOR 10s
- Sensors viewed as a single table
  - Columns are sensor data
  - Rows are individual sensors
  - Unbounded, continuous data stream of values

Why Windows?
- The sensors table is an unbounded, continuous data stream
- Operations such as sort and symmetric join are not allowed on streams
- They are allowed on bounded subsets of the stream (windows)

Windows
- Windows in TinyDB are fixed-size materialization points over sensor streams.
- Materialization points can be used in queries
- Example:
    CREATE STORAGE POINT recentlight SIZE 8
    AS (SELECT nodeid, light FROM sensors
        SAMPLE INTERVAL 10s)

    SELECT COUNT(*)
    FROM sensors AS s, recentlight AS r1
    WHERE r1.nodeid = s.nodeid
    AND s.light < r1.light
    SAMPLE INTERVAL 10s

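The storage point in the example can be pictured as a small ring buffer that is joined against each new reading. The Python sketch below is illustrative only: `StoragePoint`, `insert`, and `count_brighter` are hypothetical names, not TinyDB's API, and the buffer is assumed to evict its oldest row when full.

```python
from collections import deque


class StoragePoint:
    """Hypothetical fixed-size buffer of recent (nodeid, light) rows."""

    def __init__(self, size):
        # deque(maxlen=size) silently drops the oldest row once it is full.
        self.rows = deque(maxlen=size)

    def insert(self, nodeid, light):
        self.rows.append((nodeid, light))

    def count_brighter(self, nodeid, light):
        """Mimic COUNT(*) with r1.nodeid = s.nodeid AND s.light < r1.light."""
        return sum(1 for n, buffered in self.rows if n == nodeid and light < buffered)


recentlight = StoragePoint(size=8)
for step in range(10):
    recentlight.insert(nodeid=1, light=step * 10)  # only the last 8 rows survive
print(recentlight.count_brighter(nodeid=1, light=55))
```

The bound on the buffer is what makes the join legal: the stream itself is unbounded, but the storage point never holds more than SIZE rows.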
Temporal Aggregation
- Why aggregation?
  - Reduce the quantity of data that must be transmitted through the network
- Example:
    SELECT WINAVG(volume, 30s, 5s)
    FROM sensors
    SAMPLE INTERVAL 1s
  - Report the average volume over the last 30 seconds once every 5 seconds, sampling once per second
- How about spatial aggregation or spatial-temporal aggregation?

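The WINAVG semantics above (30 s window, 5 s slide, 1 s sampling) can be sketched in a few lines of Python. The function name `winavg` and the choice to report only once the window is full are assumptions of this sketch, not TinyDB behavior taken from the slides.

```python
from collections import deque


def winavg(samples, window=30, slide=5):
    """Return window means, one per `slide` samples, over the last `window` samples.

    Assumes one sample per SAMPLE INTERVAL (1 s in the example query), so a
    30 s window holds 30 readings and a 5 s slide reports on every 5th sample.
    """
    buf = deque(maxlen=window)
    reports = []
    for i, value in enumerate(samples, start=1):
        buf.append(value)
        # Assumption: stay silent until a full window has been collected.
        if len(buf) == window and i % slide == 0:
            reports.append(sum(buf) / window)
    return reports


print(winavg(range(1, 41)))  # 40 seconds of readings 1, 2, ..., 40
```

Forty raw readings shrink to three reports, which is the reduction in transmitted data that motivates aggregation.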
Event-Based Queries
- An alternative to continuous polling for data
- Example:
    ON EVENT bird-detector(loc):
    SELECT AVG(light), AVG(temp), event.loc
    FROM sensors AS s
    WHERE dist(s.loc, event.loc) < 10m
    SAMPLE INTERVAL 2s FOR 30s
- Currently, events are only signaled on the local node. How about a fully distributed event propagation system? What is the gain? What is the cost? Is it worth doing?

Lifetime-Based Queries
- Example:
    SELECT nodeid, accel
    FROM sensors
    LIFETIME 30 days
- The query specifies that the network should
  - Run for at least 30 days
  - Sample the light and acceleration sensors as quickly as possible while still meeting the lifetime goal

Lifetime-Based Queries
- Nodes perform cost-based analysis in order to determine the data rate for each node

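As a rough illustration of that cost-based analysis, the Python sketch below spreads a node's remaining energy evenly over the requested lifetime and divides by the energy needed per sample. This is a deliberate simplification: it ignores idle power, relaying of children's data, and battery non-linearity. The function name and all numbers are hypothetical.

```python
def max_sample_rate(remaining_energy_j, lifetime_hours, energy_per_sample_j):
    """Return the highest sustainable rate, in samples per hour.

    Simplifying assumption: energy is spent only on acquiring and sending
    samples, and the budget is spread evenly across the lifetime.
    """
    if lifetime_hours <= 0 or energy_per_sample_j <= 0:
        raise ValueError("lifetime and per-sample energy must be positive")
    energy_per_hour = remaining_energy_j / lifetime_hours
    return energy_per_hour / energy_per_sample_j


# Hypothetical node: 7200 J left, LIFETIME 30 days (720 hours), 0.05 J per sample.
rate = max_sample_rate(7200.0, 30 * 24, 0.05)
print(f"{rate:.0f} samples/hour, one every {3600 / rate:.0f} s")
```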
Lifetime-Based Queries
- Tested a mote with a 24-week query
- Sampling rate was one sample every 15.2 seconds
- Took 9 voltage readings over 12 days
- Is it reasonable to drop the first two data points? Is it reasonable to use data from the first 12 days to fit a line that covers 168 days?

Power-Aware Optimization
- Where?
  - Queries are optimized by the base station before dissemination
- Why?
  - Cost-based optimization to yield the lowest overall power consumption
  - Cost is dominated by sampling and transmitting
- How?
  - The optimizer focuses on ordering joins, selections, and sampling on individual nodes

Reordering Sampling and Predicates
- Consider the query
    SELECT accel, mag
    FROM sensors
    WHERE accel > c1
    AND mag > c2
    SAMPLE INTERVAL 1s
- Three options
  1. Measure accel and mag; then process the selection
  2. Measure mag; filter; then measure accel
  3. Measure accel; filter; then measure mag
- The first option is always more expensive than the other two.
- The second option can be an order of magnitude more expensive than the third, because sampling mag costs far more than sampling accel
- The second option can still be cheaper if its predicate (mag > c2) is highly selective

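The trade-off between the three options can be made concrete with a simple expected-cost model: the first sensor is always sampled, and the second is sampled only when the first predicate passes. The Python sketch below uses that model. The costs and selectivities are made-up illustrative numbers, and the function names are hypothetical.

```python
def expected_cost(cost_first, selectivity_first, cost_second):
    """Expected energy per epoch when the second sensor is sampled only if
    the first predicate passes (selectivity = fraction of readings passing)."""
    return cost_first + selectivity_first * cost_second


def cheapest_plan(cost_accel, sel_accel, cost_mag, sel_mag):
    plans = {
        "sample both, then filter": cost_accel + cost_mag,
        "mag first": expected_cost(cost_mag, sel_mag, cost_accel),
        "accel first": expected_cost(cost_accel, sel_accel, cost_mag),
    }
    return min(plans, key=plans.get), plans


# Hypothetical units: mag is 10x as expensive to sample as accel.
print(cheapest_plan(cost_accel=1, sel_accel=0.5, cost_mag=10, sel_mag=0.5)[0])
# The mag predicate rejects almost everything while accel filters nothing.
print(cheapest_plan(cost_accel=1, sel_accel=1.0, cost_mag=10, sel_mag=0.1)[0])
```

Under this model, sampling both sensors up front is never cheaper than the two filtered orderings, because every selectivity is at most 1.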
Example 2
- Another example
    SELECT WINMAX(light, 8s, 8s)
    FROM sensors
    WHERE mag > x
    SAMPLE INTERVAL 1s
- Unless mag > x is very selective, it is cheaper to first check whether the current light reading is greater than the current max, and to sample mag only if it is
- This reordering is called exemplary aggregate pushdown

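A minimal Python sketch of this pushdown for one window follows. It assumes that light is cheap to sample and mag is expensive, and it counts how many mag acquisitions are actually needed. The function name and the readings are hypothetical.

```python
def winmax_with_pushdown(readings, x):
    """Compute MAX(light) over readings whose mag > x, sampling mag lazily.

    `readings` is a list of (light, mag) pairs for one window; reading the
    second element stands in for acquiring the expensive mag sample.
    Returns (max_light, number_of_mag_samples_taken).
    """
    best = None
    mag_samples = 0
    for light, mag in readings:
        if best is not None and light <= best:
            continue  # cannot raise the max, so skip the mag acquisition
        mag_samples += 1
        if mag > x:
            best = light
    return best, mag_samples


window = [(5, 9), (3, 9), (7, 1), (6, 9), (8, 9)]
print(winmax_with_pushdown(window, x=5))
```

In this window one of the five mag acquisitions is avoided, and the result equals the max over the readings that satisfy the predicate.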
Event Query Batching
- Consider the query
    ON EVENT e(nodeid)
    SELECT a1
    FROM sensors AS s
    WHERE s.nodeid = e.nodeid
    SAMPLE INTERVAL d FOR k
- Every time e occurs, an instance of the internal query is started.
- Multiple independent instances run at the same time, each with its own sampling and data delivery

Solution:
- Convert event e into an event stream
- Rewrite the internal query as a sliding window join between the event stream and sensors
- Original query:
    ON EVENT e(nodeid)
    SELECT a1
    FROM sensors AS s
    WHERE s.nodeid = e.nodeid
    SAMPLE INTERVAL d FOR k
- Rewritten query:
    SELECT s.a1
    FROM sensors AS s, events AS e
    WHERE s.nodeid = e.nodeid
    AND e.type = e
    AND s.time - e.time <= k
    AND s.time > e.time
    SAMPLE INTERVAL d

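The rewrite can be sketched as a small in-memory join in Python. The sketch assumes the window condition is that a sample belongs to an event when it is taken after the event and no more than k time units later. The tuple layouts and the function name are hypothetical.

```python
def batched_event_join(samples, events, event_type, k):
    """Sliding window join between one shared sample stream and an event stream.

    samples: list of (time, nodeid, a1), produced once per SAMPLE INTERVAL
    events:  list of (time, nodeid, type)
    Returns one (sample_time, a1) row per matching (sample, event) pair.
    """
    rows = []
    for s_time, s_node, a1 in samples:
        for e_time, e_node, e_type in events:
            same_node = s_node == e_node
            in_window = 0 < s_time - e_time <= k
            if same_node and e_type == event_type and in_window:
                rows.append((s_time, a1))
    return rows


samples = [(t, 1, t * 10) for t in range(1, 7)]  # node 1 sampled at t = 1..6
events = [(2, 1, "e"), (3, 1, "e"), (3, 2, "e"), (1, 1, "other")]
print(batched_event_join(samples, events, event_type="e", k=2))
```

Each sample is acquired once and shared by all overlapping events, instead of every event instance running its own sampling loop.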
Semantic Routing Trees
- Why SRT?
  - It is a routing tree designed to allow each node to efficiently determine if any of the nodes below it will need to participate in a given query over some constant attributes.
  - Used to prune the routing tree.
- What is SRT?
  - An SRT is an index over a constant attribute A that can be used to locate nodes that have data relevant to the query.
  - It is an overlay on the network.

How to use SRT?
- When a query q with a predicate over A arrives at node n, n checks whether any child’s range of A values overlaps the query range of A in q:
  - If yes, prepare to receive results and forward the query
  - If no, do not forward q
- Does query q apply locally?
  - If yes, execute the query
  - If not, ignore it

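The forwarding decision above amounts to an interval-overlap test per child plus a local range check. A minimal Python sketch, assuming each node stores one closed interval [lo, hi] of A per child; the function names and the dictionary layout are hypothetical.

```python
def children_to_forward(query_range, child_ranges):
    """Return ids of children whose stored [lo, hi] range overlaps the query range."""
    q_lo, q_hi = query_range
    return [child for child, (lo, hi) in child_ranges.items()
            if lo <= q_hi and q_lo <= hi]


def handle_query(node_value, query_range, child_ranges):
    """Decide whether node n executes q locally and which children receive q."""
    q_lo, q_hi = query_range
    return {
        "execute_locally": q_lo <= node_value <= q_hi,
        "forward_to": children_to_forward(query_range, child_ranges),
    }


# Node with A = 50 and three children; the query asks for 40 <= A <= 60.
decision = handle_query(50, (40, 60), {"a": (0, 30), "b": (55, 90), "c": (60, 70)})
print(decision)
```

Child "a" is pruned: neither it nor anything below it hears the query, which is where the energy saving comes from.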
How to build SRT?
- Flood the SRT build request down the network
  - Re-transmitted by every mote until every mote hears it
- If a node has no children
  - Choose a parent p; report the value of A to p
- If a node has children
  - Forward the request, and wait for replies
  - Upon reply from children, choose a parent p; report to p the range of values of A that it and its descendants cover
- Since each constant attribute A may have a separate SRT, is the scheme scalable?

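The reply phase of the build is a bottom-up aggregation of ranges. The Python sketch below assumes parents have already been chosen, so the tree is given as a child list, and computes the range each node would report to its parent. The flooding and parent-selection steps are omitted, and the names are hypothetical.

```python
def build_srt(children, values, root):
    """Return {node: (lo, hi)}, the range of A covered by each node's subtree.

    children: dict mapping a node to its list of children (tree already chosen)
    values:   dict mapping a node to its own constant attribute value A
    """
    ranges = {}

    def visit(node):
        lo = hi = values[node]
        for child in children.get(node, []):
            c_lo, c_hi = visit(child)  # the child's reply to its parent
            lo, hi = min(lo, c_lo), max(hi, c_hi)
        ranges[node] = (lo, hi)
        return lo, hi

    visit(root)
    return ranges


tree = {0: [1, 2], 1: [3]}
attr = {0: 5, 1: 1, 2: 9, 3: 4}
print(build_srt(tree, attr, root=0))
```

A leaf reports a single value as a degenerate range, and an interior node reports the union of its own value and its children's ranges, matching the two cases on the slide.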
Evaluation of SRT
- SRTs are limited to constant attributes
  - Even so, maintenance is required
- Possible to use for non-constant attributes, but the cost can be prohibitive

Evaluation of SRT
- Compared three different strategies for building the tree: random, closest, and cluster
  - Random: pick a random parent from the nodes with reliable communication
  - Closest: pick the parent whose attribute value (index attribute) is closest
  - Cluster: by snooping on siblings’ parent selection, each node tries to pick the parent that minimizes the spread of attribute values underneath all of its available parents
- Report results for two different sensor value distributions: random and geographic
  - Random: each attribute value is randomly selected from the interval [0,1000]
  - Geographic: values among neighbors are highly correlated

SRT Results
The cluster scheme is superior to the random scheme and the closest scheme. With the geographic distribution, the performance of the cluster scheme is close to optimal. Where is the data on the SRT’s overhead?

Processing Queries
- Queries have been optimized and distributed; what more can we do?
- Aggregate data that is sent back to the root
- Prioritize data that needs to be sent
  - Naïve: FIFO
  - Winavg: average the two results at the queue’s head to make room for the new data
  - Delta: send the result that has changed the most
- Adapt data rates and power consumption

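The three prioritization schemes can be compared on a small bounded queue. The Python sketch below fills in details the slide leaves open, all of which are assumptions: a full naive queue drops the new reading, and delta scores a reading by its absolute difference from the last value sent, evicting the lowest score and sending the highest. Function names are hypothetical.

```python
def enqueue(queue, value, capacity, scheme, last_sent=0.0):
    """Add `value` to a bounded list `queue` under one scheme (capacity >= 2)."""
    if len(queue) < capacity:
        queue.append(value)
    elif scheme == "naive":
        pass  # assumption: a full FIFO queue simply drops the new reading
    elif scheme == "winavg":
        queue[:2] = [(queue[0] + queue[1]) / 2]  # merge the two head results
        queue.append(value)
    elif scheme == "delta":
        queue.append(value)
        # assumption: evict the reading closest to the last value sent
        queue.remove(min(queue, key=lambda v: abs(v - last_sent)))
    else:
        raise ValueError(f"unknown scheme: {scheme}")
    return queue


def next_to_send(queue, scheme, last_sent=0.0):
    """Remove and return the reading to transmit next."""
    if scheme == "delta":
        choice = max(queue, key=lambda v: abs(v - last_sent))
        queue.remove(choice)
        return choice
    return queue.pop(0)  # naive and winavg drain in FIFO order


full = [10.0, 20.0, 30.0]
for scheme in ("naive", "winavg", "delta"):
    print(scheme, enqueue(list(full), 40.0, 3, scheme, last_sent=28.0))
```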
Prioritization Comparison
- Sample rate was K times faster than delivery rate.
- Readings generated by shaking the sensor
- In this example, K = 4
- Delta seems to be better

Adaptation
- Not safe to assume that the network channel is uncontested
- TinyDB reduces packets sent as channel contention rises

Conclusions & Future Work
- Conclusions:
  - Design of an acquisitional query processor for data collection in sensor networks
  - Evaluation in the context of TinyDB
- Future Work:
  - Selectivity of operators based upon range of sensor
  - Exemplary aggregate pushdown
  - More sophisticated prioritization schemes
  - Better re-optimization of sample rate based upon acquired data

Discussion
- Is this the best way (right way?) to look at a sensor network?
- Is their approximation of battery lifetime sufficient?
- Was their evaluation of SRT good enough?