The Design of an Acquisitional Query Processor for Sensor Networks
Samuel Madden, Michael J. Franklin, Joseph M. Hellerstein, and Wei Hong
Motes
– Mica Mote
  » 4 MHz, 8-bit Atmel RISC microprocessor
  » 40 kbit/s radio
  » 4 KB RAM, 128 KB program flash, 512 KB data flash
  » AA battery pack
  » Based on TinyOS*
*Hill, Szewczyk, Woo, Culler, & Pister. “Systems Architecture Directions for Networked Sensors.” ASPLOS
Sensor Net Sample Apps
– Traditional monitoring apparatus.
– Earthquake monitoring in shake-test sites.
– Vehicle detection: sensors along a road collect data about passing vehicles.
– Habitat monitoring: storm petrels on Great Duck Island, microclimates on James Reserve.
Programming Sensor Nets Is Hard
– Months of lifetime required from small batteries
  » 3-5 days naively; can’t recharge often
  » Interleave sleep with processing
– Lossy, low-bandwidth, short-range communication
  » Nodes coming and going
  » ~20% loss at 5 m
  » Multi-hop
– Remote, zero-administration deployments
– Highly distributed environment
– Limited development tools
  » Embedded, LEDs for debugging!
– instructions per bit transmitted!
– High-level abstraction is needed!
A Solution: Declarative Queries
– Users specify the data they want
  » Simple, SQL-like queries
  » Using predicates, not specific addresses
  » Same spirit as Cougar. Our system: TinyDB
– Challenge is to provide:
  » Expressive and easy-to-use interface
  » High-level operators
  » Well-defined interactions
  » “Transparent optimizations” that many programmers would miss
  » Sensor-net specific techniques
  » Power-efficient execution framework
– Question: do sensor networks change query processing? Yes!
Overview
– Goals
– Acquisitional Query Language
– Optimizations
– Future Work
– Conclusions
Goals
– Provide a query processor-like interface to sensor networks
– Use acquisitional techniques to reduce power consumption compared to traditional passive systems
How?
– What is meant by acquisitional techniques?
  » Where, when, and how often.
– Four related questions
  » When should samples be taken?
  » What sensors have relevant data?
  » In what order should samples be taken?
  » Is it worth it?
What’s the big deal?
– Radio consumes as much power as the CPU
– Transmitting one bit of data consumes as much energy as 1000 CPU instructions!
– Message sizes in TinyDB are by default 48 bytes
– Sensing takes significant energy
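The two figures on this slide can be combined into a rough per-message cost. The sketch below only multiplies them out; the constant and function names are illustrative and are not part of TinyDB.

```python
# Figures taken from the slide; the names are illustrative only.
INSTRUCTIONS_PER_BIT = 1000  # energy of one transmitted bit, in CPU-instruction equivalents
MESSAGE_BYTES = 48           # default TinyDB message size


def message_cost_in_instructions(message_bytes: int = MESSAGE_BYTES) -> int:
    """Energy to transmit one message, expressed in CPU-instruction equivalents."""
    return message_bytes * 8 * INSTRUCTIONS_PER_BIT


if __name__ == "__main__":
    print(message_cost_in_instructions())
```

By this estimate, skipping a single default-sized message buys a node several hundred thousand instructions of local processing, which is why the later slides trade computation for fewer transmissions.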
An Acquisitional Query Language
– SQL-like queries in the form of SELECT-FROM-WHERE
– Support for selection, join, projection, and aggregation
– Also support for sampling, windowing, and sub-queries
– Not mentioned is the ability to log data and actuate physical hardware
An Acquisitional Query Language
– Example:
    SELECT nodeid, light, temp
    FROM sensors
    SAMPLE INTERVAL 1s FOR 10s
– Sensors viewed as a single table
  » Columns are sensor data
  » Rows are individual sensors
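One way to picture the `sensors` table is as a generator that yields a batch of rows, one per node, at every sample interval. The sketch below is a toy model under that assumption; `run_query`, `read_sensors`, and the `epoch` column are hypothetical names, not TinyDB's interface.

```python
from typing import Callable, Dict, Iterator, List


def run_query(
    node_ids: List[int],
    interval_s: int,
    duration_s: int,
    read_sensors: Callable[[int], Dict[str, float]],
) -> Iterator[List[Dict[str, float]]]:
    """Yield one batch of rows per sample interval, with one row per node."""
    for epoch in range(duration_s // interval_s):
        # Each row carries the node's id plus the columns it sampled this interval.
        yield [{"nodeid": n, "epoch": epoch, **read_sensors(n)} for n in node_ids]


if __name__ == "__main__":
    # Stand-in for real sensor hardware.
    fake = lambda n: {"light": 100.0 + n, "temp": 20.0}
    for rows in run_query([1, 2], interval_s=1, duration_s=3, read_sensors=fake):
        print(rows)
```

In this model, `SAMPLE INTERVAL 1s FOR 10s` corresponds to ten batches, each holding one row per sensor.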
Queries as a Stream
– Sensors table is an unbounded, continuous data stream
– Operations such as sort and symmetric join are not allowed on streams
– They are allowed on bounded subsets of the stream (windows)
Windows
– Windows in TinyDB are fixed-size materialization points
– Materialization points can be used in queries
– Example:
    CREATE STORAGE POINT recentlight SIZE 8
    AS (SELECT nodeid, light FROM sensors
        SAMPLE INTERVAL 10s)

    SELECT COUNT(*)
    FROM sensors AS s, recentlight AS r1
    WHERE r1.nodeid = s.nodeid
    AND s.light < r1.light
    SAMPLE INTERVAL 10s
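A storage point behaves much like a bounded buffer that keeps only the most recent rows. The sketch below models that with `collections.deque` and evaluates the join from the example as a nested loop; the class and function names are hypothetical, and a real mote would not run Python.

```python
from collections import deque
from typing import Deque, Dict, List

Row = Dict[str, float]


class StoragePoint:
    """Fixed-size buffer: appending to a full buffer discards the oldest row."""

    def __init__(self, size: int) -> None:
        self.rows: Deque[Row] = deque(maxlen=size)

    def append(self, row: Row) -> None:
        self.rows.append(row)


def count_dimmer_than_recent(current: List[Row], recent: StoragePoint) -> int:
    """COUNT(*) of (current, stored) pairs on the same node where the current light is lower."""
    return sum(
        1
        for s in current
        for r in recent.rows
        if r["nodeid"] == s["nodeid"] and s["light"] < r["light"]
    )


if __name__ == "__main__":
    recent = StoragePoint(size=8)
    for light in (50.0, 70.0, 90.0):
        recent.append({"nodeid": 1, "light": light})
    print(count_dimmer_than_recent([{"nodeid": 1, "light": 60.0}], recent))
```

Because the buffer is bounded, the join always runs over a finite set of rows, which is what makes it legal on an otherwise unbounded stream.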
Temporal Aggregation
– Example:
    SELECT WINAVG(volume, 30s, 5s)
    FROM sensors
    SAMPLE INTERVAL 1s
– One averaged result every 5 s: each sensor sends only 6 results per 30 s instead of 30
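The saving can be checked with a small sliding-window average. The sketch assumes one sample per second, so the window and slide are given in samples, and it assumes early windows are simply shorter until 30 samples have accumulated; `winavg` here is an illustrative function, not TinyDB's operator.

```python
from typing import List, Sequence


def winavg(samples: Sequence[float], window: int, slide: int) -> List[float]:
    """Emit the average of the last `window` samples once every `slide` samples."""
    results = []
    for end in range(slide, len(samples) + 1, slide):
        recent = samples[max(0, end - window):end]
        results.append(sum(recent) / len(recent))
    return results


if __name__ == "__main__":
    thirty_seconds = [float(v) for v in range(30)]  # one reading per second
    print(len(winavg(thirty_seconds, window=30, slide=5)))
```

For 30 one-second samples with a 5-sample slide, the function emits 6 values, matching the count on the slide.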
Event-Based Queries
– An alternative to continuous polling for data
– Example:
    ON EVENT bird-detector(loc):
    SELECT AVG(light), AVG(temp), event.loc
    FROM sensors AS s
    WHERE dist(s.loc, event.loc) < 10m
    SAMPLE INTERVAL 2s FOR 30s
Lifetime-Based Queries
– Example:
    SELECT nodeid, accel
    FROM sensors
    LIFETIME 30 days
– Nodes perform cost-based analysis in order to determine data rate
– Nodes must transmit at the root’s rate or at an integral divisor of it
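The two bullets can be illustrated with a deliberately simple energy model. The sketch assumes a node knows its remaining energy, a fixed energy cost per sample (sensing plus transmitting), and a constant idle power draw. This is not the cost model TinyDB uses, and every name and number below is hypothetical.

```python
import math


def affordable_rate_hz(
    energy_j: float, lifetime_s: float, joules_per_sample: float, idle_watts: float = 0.0
) -> float:
    """Highest sample rate the energy budget can sustain for the requested lifetime."""
    spare_watts = energy_j / lifetime_s - idle_watts
    if spare_watts <= 0:
        raise ValueError("lifetime cannot be met even without sampling")
    return spare_watts / joules_per_sample


def node_rate_hz(root_rate_hz: float, affordable_hz: float) -> float:
    """Fastest rate of the form root_rate / k (k a positive integer) the node can afford."""
    divisor = max(1, math.ceil(root_rate_hz / affordable_hz))
    return root_rate_hz / divisor


if __name__ == "__main__":
    rate = affordable_rate_hz(energy_j=100.0, lifetime_s=50.0, joules_per_sample=4.0)
    print(rate, node_rate_hz(root_rate_hz=1.0, affordable_hz=rate))
```

Rounding the divisor up keeps each node within its own budget while its transmissions still line up with the root's schedule.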
Lifetime-Based Queries
– Tested a mote with a 24-week query
– Sample interval was 15.2 seconds per sample
– Took 9 voltage readings over 12 days
Optimization
– Three phases to queries
  » Creation of query
  » Dissemination of query
  » Execution of query
– TinyDB makes optimizations at each step
Power-Based Optimization
– Queries optimized by base station before dissemination
– Cost-based optimization to yield lowest overall power consumption
– Cost dominated by sampling and transmitting
– Optimizer focuses on ordering joins, selections, and sampling on individual nodes
Metadata
– Each node contains metadata about its attributes
– Nodes periodically send metadata to root
– Metadata also contains information about aggregate functions
– Information about cost, time to fetch, and range is used in query optimization
Using Metadata
– Consider the query:
    SELECT accel, mag
    FROM sensors
    WHERE accel > c1
    AND mag > c2
    SAMPLE INTERVAL 1s
– Order of magnitude difference between sample costs
– Three options
  » (1) Measure accel and mag, then process select
  » (2) Measure mag, filter, then measure accel
  » (3) Measure accel, filter, then measure mag
– First option is always more expensive. Second option can be an order of magnitude more expensive than third
– Second option can be cheaper if the mag predicate is highly selective
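The comparison between the three options comes down to expected acquisition cost per epoch: the second attribute is only sampled when the first predicate passes. The sketch below works this out with made-up costs (mag ten times as expensive as accel) and made-up selectivities; none of these numbers come from the source.

```python
from typing import Dict


def plan_costs(
    cost_accel: float, cost_mag: float, pass_accel: float, pass_mag: float
) -> Dict[str, float]:
    """Expected sampling cost per epoch; pass_* is the fraction of readings a predicate lets through."""
    return {
        "sample both, then filter": cost_accel + cost_mag,
        "mag first": cost_mag + pass_mag * cost_accel,
        "accel first": cost_accel + pass_accel * cost_mag,
    }


if __name__ == "__main__":
    # Hypothetical units: accel costs 1, mag costs 10.
    print(plan_costs(1.0, 10.0, pass_accel=0.5, pass_mag=0.5))
    # mag predicate highly selective, accel predicate lets everything through.
    print(plan_costs(1.0, 10.0, pass_accel=1.0, pass_mag=0.01))
```

With equal selectivities, sampling the cheap attribute first wins. Sampling mag first only pays off when its predicate rejects almost everything and the accel predicate rejects almost nothing.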
Using Metadata
– Another example:
    SELECT WINMAX(light, 8s, 8s)
    FROM sensors
    WHERE mag > x
    SAMPLE INTERVAL 1s
– Unless mag > x is very selective, it is cheaper to first check whether the current light reading exceeds the running max, and to sample mag only if it does
– This reordering is called exemplary aggregate pushdown
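The reordering can be sketched for a single window as follows: the light reading is compared against the running maximum first, and the more expensive mag sample is acquired only when the light reading could change the result. The function and its lazy `read_mag` callables are illustrative assumptions, not TinyDB code.

```python
from typing import Callable, Iterable, Optional, Tuple


def winmax_with_pushdown(
    readings: Iterable[Tuple[float, Callable[[], float]]], threshold: float
) -> Tuple[Optional[float], int]:
    """Return (max light among rows with mag > threshold, number of mag samples taken)."""
    best: Optional[float] = None
    mag_samples = 0
    for light, read_mag in readings:
        if best is not None and light <= best:
            continue  # cannot raise the max, so mag is never sampled
        mag_samples += 1
        if read_mag() > threshold:
            best = light
    return best, mag_samples


if __name__ == "__main__":
    lights = [5.0, 3.0, 8.0, 7.0, 9.0]
    mags = [10.0, 10.0, 0.0, 10.0, 10.0]
    window = [(l, (lambda m=m: m)) for l, m in zip(lights, mags)]
    print(winmax_with_pushdown(window, threshold=5.0))
```

In this run the answer is the same as evaluating `mag > x` on every row, but mag is sampled four times rather than five.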
Event Query Batching
– Multiple instances of an event-based query can be running at the same time
– Optimization based on rewriting as a sliding window join between events and sensors
Dissemination Optimization
– Build semantic routing tree (SRT)
– SRT nodes choose parents based on semantic properties as well as link quality
– Parent nodes keep track of the ranges of values for children
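The benefit of tracking child ranges is that a query over a value range need not be forwarded into subtrees that cannot match. The sketch below is a simplified model: it recomputes each subtree's range on demand rather than storing it at the parent, and it ignores link quality and parent selection. All names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class SRTNode:
    node_id: int
    value: float  # a constant attribute, e.g. a coordinate
    children: List["SRTNode"] = field(default_factory=list)

    def subtree_range(self) -> Tuple[float, float]:
        """Min and max attribute value over this node and all its descendants."""
        lo = hi = self.value
        for child in self.children:
            child_lo, child_hi = child.subtree_range()
            lo, hi = min(lo, child_lo), max(hi, child_hi)
        return lo, hi


def disseminate(node: SRTNode, q_lo: float, q_hi: float) -> List[int]:
    """Ids of nodes the query reaches; subtrees whose range misses [q_lo, q_hi] are pruned."""
    lo, hi = node.subtree_range()
    if hi < q_lo or lo > q_hi:
        return []
    reached = [node.node_id]
    for child in node.children:
        reached.extend(disseminate(child, q_lo, q_hi))
    return reached


if __name__ == "__main__":
    tree = SRTNode(0, 5.0, [SRTNode(1, 2.0, [SRTNode(3, 1.0)]), SRTNode(2, 9.0)])
    print(disseminate(tree, 8.0, 10.0))
```

In the example tree, a query for values between 8 and 10 reaches nodes 0 and 2 only; the subtree rooted at node 1 never receives the query.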
Evaluation of SRT
– SRTs are limited to constant attributes
– Even so, maintenance is required
– Possible to use for non-constant attributes, but the cost can be prohibitive
Evaluation of SRT
– Compared three different strategies for building the tree: random, closest, and cluster
– Report results for two different sensor value distributions: random and geographic
SRT Results
Query Execution
– Queries have been optimized and distributed; what more can we do?
– Aggregate data that is sent back to the root
– Prioritize data that needs to be sent
  » Naïve: FIFO
  » Winavg: average top queue entries
  » Delta: send result with most change
– Adapt data rates and power consumption
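The three prioritization schemes differ in what happens when results arrive faster than the radio can deliver them. The sketch below is one simplified reading of each bullet on a bounded queue of numeric readings. The overflow rule for the naive scheme and the scoring rule for delta are assumptions, and all function names are hypothetical.

```python
from collections import deque
from typing import Deque


def enqueue_naive(queue: Deque[float], capacity: int, value: float) -> None:
    """FIFO: a new reading is dropped when the queue is already full."""
    if len(queue) < capacity:
        queue.append(value)


def enqueue_winavg(queue: Deque[float], capacity: int, value: float) -> None:
    """When full, average the two entries at the head to make room for the new one."""
    if capacity < 2:
        raise ValueError("winavg needs room for at least two entries")
    if len(queue) >= capacity:
        first, second = queue.popleft(), queue.popleft()
        queue.appendleft((first + second) / 2)
    queue.append(value)


def dequeue_delta(queue: Deque[float], last_sent: float) -> float:
    """Send the queued reading that differs most from the last value transmitted."""
    best = max(queue, key=lambda v: abs(v - last_sent))
    queue.remove(best)
    return best


if __name__ == "__main__":
    q = deque([1.0, 3.0, 5.0])
    enqueue_winavg(q, capacity=3, value=9.0)
    print(list(q))
    print(dequeue_delta(deque([10.0, 50.0, 12.0]), last_sent=11.0))
```

Under these assumptions, winavg keeps a smoothed summary of old readings instead of losing new ones, while delta spends scarce transmissions on the readings that carry the most change.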
Prioritization Comparison
– Sample rate was K times faster than delivery rate.
– Readings generated by shaking the sensor
– In this example, K = 4
Adaptation
– Not safe to assume that network channel is uncontested
– TinyDB reduces packets sent as channel contention rises
Future Work
– More sophisticated prioritization schemes
– Better re-optimization of sample rate based upon acquired data
Contributions & Summary
– Declarative Queries via TinyDB
  » Simple, data-centric programming abstraction
  » Known to work for monitoring, tracking, mapping
– Sensor network contributions
  » Network as a single queryable entity
  » Power-aware, in-network query processing
  » Taxonomy: Extensible aggregate optimizations
– Query processing contributions
  » Acquisitional Query Processing
  » Framework for new issues in acquisitional systems, e.g.:
    Sampling as an operator
    Languages, indices, approximations to control when, where, and what data is acquired + processed by the system
– Consideration of database, network, and device issues