Query Processing for Sensor Networks Yong Yao and Johannes Gehrke (Presentation: Anne Denton March 8, 2003)
Outline What sensor networks are we talking about? What are the issues? What are the choices? Network issues Routing Database issues Query plans Related work
What Sensor Networks are we talking about? Commercially available: Size: a few cubic inches Projected according to Moore’s law: ¼ inch available soon (not sure sure if Moore talked about batteries …) Operating system Embedded version of Linux (redhat) or Windows ce.net Wireless multi-hop RF radio Powered by batteries (LAN-attached with permanent power sources exist also)
Berkeley MICA Mote Note related work to Gehrke’s is done at Berkeley (TinyDB)
Issues Wireless Limited QoS Latency with high variance Limited bandwith Frequently drops packets Power consumption 1 year idle 1 week under full load Computation Limited memory and computing power Uncertainty in sensor readings
Supported Sensors Temperature Light Magnetometers Accelerometers Microphones
Example Uses Buildings “Is Yong in his office” “Is there an empty seat in the meeting room” Biology Find out about existence of specific species of bird Map bird’s trail MICA Mote developed under DARPA grant …
Choices Query layer should be declarative Abstract user from physical details (Why are database people interested …) In-Network processing Preservation of energy and bandwidth Ratio of sending 1 bit vs. executing one instruction 220 to 2900 depending on architecture Different trade-offs => job of query layer Long-term, e.g., monitoring environment Short-term, e.g., battlefield Query Proxy between network and application layer (bypasses routing layer to some extent) Must be closely linked with network layer
More Choices Special nodes to access network Gateway nodes Noise requires “fusing” of data Aggregation important Queries need DURATION and EVERY Event-oriented model (triggers) desirable but not implemented
In-Network Aggregation Why? Energy to transmit is heaviest burden Partial aggregation Possible for algebraic aggregate operators (MAX, MIN, SUM, AVG) Impossible for holistic operator (MEDIAN) Otherwise: packet merging
Synchronization Necessary for partial aggregation and packet merging AVG and SUM are duplicate sensitive aggregate operators: Spanning tree MIN and MAX are not duplicate sensitive DAG may be sufficient Pragmatic approach to synchronization Problem: Predictions may fail due to network reorganization or query results bi-directional prediction
Routing Differences to wired network Everybody has to share the routing job Network is unstable Many ad-hoc routing algorithms exist Routing layer in protocol stack Database approach requires changes to routing protocol Gehrke points out that that’s not unusual: Database file-access also bypasses operating system to some extent
Changes to Routing Protocol Intercepting of packets to achieve Packet merging Partial aggregation Differences in communication pattern Communication with leader rather than point-to- point Knowledge about neighbors Route initialization and maintainance …
Query Plans Example query “What is the quietest open classroom in Upson Hall” 2 levels of aggregation Compute average value for each qualified class room Select minimum average over all class rooms Query plan has Flow blocks Leader nodes Differences to traditional optimizers Focus on communication cost Flow block instead of relational operator
Flow blocks Task Collect data Perform computations Parameters Set of source nodes Leader selection policy Routing structure, e.g., DAG, tree Computation
Query Optimization Example SELECTD.gid, AVG(D.value) FROMSensorData D GROUP BYD.gid HAVINGAVG(D.value)>Threshold Flow block for each group Good if nodes in group physically close In-Network Aggregation Single flow block for all Better if nodes in group are interspersed No In-Network Aggregation possible Packet merging more efficient
Experiments Using a simulator IEEE as MAC layer Prove energy decrease from in-Network aggregation and packet merging Extra delay overcompensated by reduced collisions … prove that the rest works too
Summary Interesting database as well as network issues No data mining issues in this paper (although I could think of some …)