Data Management for Sensor Networks. Zachary G. Ives, University of Pennsylvania. CIS 650 – Database & Information Systems. April 4, 2005
2 Administrivia. Please send me an update on your project status. Next readings: Wednesday – read and summarize the Brin and Page paper.
3 Today’s Trivia Question
4 Sensors and Sensor Networks. Trends: cameras and other sensors are very cheap; microprocessors and microcontrollers can be very small; wireless networks are easy to build. Why not instrument the physical world with tiny wireless sensors and networks? Vision: “smart dust” (Berkeley motes, RF tags, cameras, camera phones, temperature sensors, etc.). Today we already see pieces of this: Penn buildings and the SCADA system; 250+ surveillance cameras throughout campus.
5 What Can We Do with Sensor Networks? Many “passive” monitoring applications. Environmental monitoring: temperature in different parts of a building, air quality, etc. Law enforcement: video feeds and anomalous behavior. Research studies: study ocean temperature and currents; monitor the status of eggs in endangered birds’ nests; ZebraNet. Fun: record sporting events or performances from every angle (video & audio). Ultimately, build reactive systems as well: robotics, Mars landers, …
6 Some Challenges. Highly distributed! There may be thousands of nodes; each node knows about a few nodes within proximity and may not know its own location; nodes’ transmissions may interfere with one another. Power and resource constraints: most of these devices are wireless, tiny, and battery-powered; they can only transmit data every so often; limited CPU and memory mean they can’t run sophisticated code. High rate of failure: collisions, battery failures, sensor calibration, …
7 The Target Platform. Most sensor network research argues for the Berkeley mote as a target platform. Mote: 4 MHz, 8-bit CPU; 4 KB RAM; 128 KB program flash; 512 KB data flash; 40 kbps radio, 100 ft range. Sensors: light, temperature, microphone, accelerometer, magnetometer.
8 Sensor Net Data Acquisition. First: build a routing tree. Second: begin sensing and aggregation.
9 Sensor Net Data Acquisition (Sum). First: build a routing tree. Second: begin sensing and aggregation (e.g., sum).
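The two phases can be sketched in a few lines of Python, assuming an idealized, loss-free radio where a hypothetical `neighbors` map says who can hear whom. The tree is built by a breadth-first flood from the base station, and SUM is computed bottom-up so that each node forwards one partial sum instead of every raw reading. The function names and the flooding rule are illustrative assumptions, not TinyOS or TinyDB code.

```python
from collections import deque

def build_routing_tree(neighbors, root):
    """Flood a tree-build request outward from the root, breadth-first.
    Each node adopts as its parent the first neighbor it hears the request
    from, so it ends up at its minimum hop count from the root."""
    parent, depth = {root: None}, {root: 0}
    frontier = deque([root])
    while frontier:
        node = frontier.popleft()
        for nbr in neighbors[node]:
            if nbr not in parent:  # first time this neighbor hears the request
                parent[nbr] = node
                depth[nbr] = depth[node] + 1
                frontier.append(nbr)
    return parent, depth

def sum_up_tree(parent, depth, readings):
    """In-network SUM: deepest nodes report first; each parent adds its
    children's partial sums to its own reading and forwards a single value."""
    partial = dict(readings)  # every node starts with its own reading
    for node in sorted(parent, key=lambda n: depth[n], reverse=True):
        if parent[node] is not None:
            partial[parent[node]] += partial[node]
    return partial  # partial[root] now holds the network-wide sum

# Hypothetical connectivity: who can hear whom.
neighbors = {
    "base": ["a", "b"],
    "a": ["base", "b", "c"],
    "b": ["base", "a", "d"],
    "c": ["a"],
    "d": ["b"],
}
parent, depth = build_routing_tree(neighbors, "base")
readings = {"base": 0, "a": 3, "b": 5, "c": 2, "d": 7}
total = sum_up_tree(parent, depth, readings)["base"]  # 0 + 3 + 5 + 2 + 7 = 17
```

Processing nodes deepest-first guarantees that a node's partial sum is complete before it is passed to its parent, which is the property real epoch-based schedules also need.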
11 Sensor Network Research. Routing: need to aggregate and consolidate data in a power-efficient way; ad hoc routing generates a routing tree to the base station; we generally need to merge computation with routing. Robustness: need to combine info from many sensors to account for individual errors. What aggregation functions make sense? Languages: how do we express what we want to do with sensor networks? Many proposals here.
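One way to see why the choice of aggregate matters for robustness: with the hypothetical readings below, a single miscalibrated sensor drags the mean far off, while the median barely notices. The trade-off is that an exact median cannot be merged from small, fixed-size partial results the way SUM or AVG can, so the more robust aggregate is also more expensive to compute in-network.

```python
from statistics import mean, median

# Hypothetical temperature readings (deg C) from five nearby sensors;
# the last one is miscalibrated and reports a wildly wrong value.
readings = [21.0, 21.5, 20.5, 21.0, 85.0]

avg = mean(readings)    # pulled far off by the single bad sensor
med = median(readings)  # unaffected by one outlier out of five

print(f"mean={avg:.1f}  median={med:.1f}")
```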
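12 A First Try: TinyOS and nesC. TinyOS: a custom OS for sensor nets, written in nesC. Assumes a low-power CPU. Very limited concurrency support: events (signaled asynchronously) and tasks (cooperatively scheduled). Applications are built from “components”: basically, small objects that encapsulate fixed-size, statically allocated state. Various features live in libraries that may or may not be included. For example, the Timer interface:
interface Timer {
  // command: called by the user to start a one-shot or repeating timer
  command result_t start(char type, uint32_t interval);
  // command: called by the user to cancel the timer
  command result_t stop();
  // event: signaled back to the user when the timer expires
  event result_t fired();
}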
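To make the event/task split concrete, here is a hypothetical Python model (not nesC, and not TinyOS’s real scheduler API): an event handler such as a timer’s `fired()` does a little work and posts a task; the scheduler later runs posted tasks one at a time, in FIFO order, each to completion. In real TinyOS, events come from hardware interrupts and can preempt a running task, which this sketch does not model.

```python
from collections import deque

class Scheduler:
    """Hypothetical model of TinyOS-style tasks: posted to a FIFO queue and
    each run to completion; one task never preempts another."""

    def __init__(self):
        self.tasks = deque()

    def post(self, task):
        self.tasks.append(task)

    def run(self):
        while self.tasks:
            self.tasks.popleft()()  # run the next task to completion

log = []
sched = Scheduler()

def process_reading():
    # Deferred, longer-running work executes later as a task.
    log.append("task: process reading")

def timer_fired():
    # Event handler: do the minimum, then defer the rest by posting a task.
    log.append("event: timer fired")
    sched.post(process_reading)

# Simulate the timer firing twice before the scheduler gets to run.
timer_fired()
timer_fired()
sched.run()
print(log)
```

Keeping event handlers short and pushing work into tasks is what lets a single-stack, no-threads design stay responsive on such a small CPU.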
13 Drawbacks of this Approach. Need to write very low-level code for sensor net behavior. Only simple routing policies are built into TinyOS; some routing algorithms may have to be implemented by hand. This has required many follow-up papers to fill in the missing pieces, e.g., Hood (object tracking and state sharing), …
14 An Alternative. “Much” of the computation being done in sensor nets looks like what we were discussing with STREAM. Today’s sensor networks look a lot like databases pre-Codd: custom “access paths” to get to data, one-off custom code. So why not look at mapping sensor network computation to SQL? Not very many joins here, but significant aggregation. Now the challenge is in picking a distribution and routing strategy that provides appropriate guarantees and minimizes power usage.
15 TinyDB and TinySQL. Treat the entire sensor network as a universal relation: each type of sensor data is a column in a global table, and tuples are created according to a sample interval (separated by epochs). (Implications of this model?)
SELECT nodeid, light, temp FROM sensors SAMPLE INTERVAL 1s FOR 10s
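A minimal Python model of what the query above produces under the universal-relation view: every node contributes one tuple per epoch, so a 1 s interval for 10 s gives 10 epochs of tuples. The explicit `epoch` column, the node ids, and the random readings are placeholders assumed for illustration; they are not real sensor data or TinyDB’s actual output format.

```python
import random

def sample_sensors(node_ids, interval_s, duration_s, seed=0):
    """Model of SELECT nodeid, light, temp FROM sensors
    SAMPLE INTERVAL interval_s FOR duration_s: one tuple per node per epoch.
    Readings are random placeholders standing in for real sensor values."""
    rng = random.Random(seed)
    rows = []
    for epoch in range(duration_s // interval_s):
        for nodeid in node_ids:
            rows.append({
                "epoch": epoch,
                "nodeid": nodeid,
                "light": rng.randint(0, 1023),  # hypothetical 10-bit range
                "temp": round(rng.uniform(15.0, 30.0), 1),
            })
    return rows

rows = sample_sensors(node_ids=[1, 2, 3], interval_s=1, duration_s=10)
print(len(rows))  # 10 epochs x 3 nodes = 30 tuples
```

One implication is visible right away: the “table” is never stored anywhere; it grows with nodes times epochs and exists only as readings are acquired.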
16 Storage Points and Windows. Like Aurora and STREAM, we can materialize portions of the data:
CREATE STORAGE POINT recentlight SIZE 8 AS (SELECT nodeid, light FROM sensors SAMPLE INTERVAL 10s)
and we can use windowed aggregates:
SELECT WINAVG(volume, 30s, 5s) FROM sensors SAMPLE INTERVAL 1s
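Reading `WINAVG(volume, 30s, 5s)` with a 1 s sample interval as “the average of the last 30 samples, reported once every 5 samples”, a sliding window is just a bounded buffer. This is a sketch under that reading; the function name and the synthetic `volume` values are hypothetical, and emitting averages before the window has filled is an assumption about start-up behavior.

```python
from collections import deque

def winavg(samples, window, slide):
    """Sliding-window average: keep the last `window` samples and emit
    their average once every `slide` samples."""
    buf = deque(maxlen=window)  # the oldest sample falls out automatically
    out = []
    for i, x in enumerate(samples, start=1):
        buf.append(x)
        if i % slide == 0:
            out.append(sum(buf) / len(buf))
    return out

volume = list(range(60))  # hypothetical: one reading per second for 60 s
averages = winavg(volume, window=30, slide=5)
```

The same bounded-buffer idea is one plausible way to picture a storage point such as `recentlight`, if `SIZE 8` is read as eight tuples (an assumption; the slide does not give the unit).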
17 Events.
ON EVENT bird-detect(loc):
  SELECT AVG(light), AVG(temp), event.loc
  FROM sensors AS s
  WHERE dist(s.loc, event.loc) < 10m
  SAMPLE INTERVAL 2s FOR 30s
How do we know about events? Contrast to UDFs? Triggers?
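What one epoch of the event-triggered query computes, as a small Python sketch: filter the sensors to those within 10 m of the event location, then average their light and temperature. Positions as (x, y) pairs in meters, the `on_bird_detect` name, and the sample readings are assumptions for illustration; the real query would repeat this every 2 s for 30 s (15 epochs).

```python
import math
from statistics import mean

def dist(a, b):
    """Euclidean distance between two (x, y) positions, in meters."""
    return math.hypot(a[0] - b[0], a[1] - b[1])

def on_bird_detect(event_loc, sensors, radius_m=10.0):
    """One epoch of the ON EVENT query: average light and temp over the
    sensors strictly within radius_m of the event location."""
    nearby = [s for s in sensors if dist(s["loc"], event_loc) < radius_m]
    if not nearby:
        return None  # no sensor close enough to report
    return {
        "avg_light": mean(s["light"] for s in nearby),
        "avg_temp": mean(s["temp"] for s in nearby),
        "loc": event_loc,
    }

sensors = [
    {"loc": (0.0, 0.0), "light": 400, "temp": 20.0},
    {"loc": (3.0, 4.0), "light": 600, "temp": 22.0},   # 5 m away: included
    {"loc": (30.0, 0.0), "light": 900, "temp": 30.0},  # 30 m away: excluded
]
result = on_bird_detect((0.0, 0.0), sensors)
```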
18 Power and TinyDB. A cost-based optimizer tries to find the query plan that yields the lowest overall power consumption. Different sensors have different power usage, so try to order sampling according to selectivity (sound familiar?), assuming a uniform distribution of values over each sensor’s range. Batching of queries (multi-query optimization): convert a series of events into a stream join (does this resemble anything we’ve seen recently?). Also need to consider where the query is processed…
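A sketch of selectivity-driven sampling order: estimate each predicate’s selectivity from a uniform distribution over the sensor’s range, then sample cheap, selective sensors first so that expensive ones are often never acquired. The ranking rule used here, sorting by (1 - selectivity) / cost, is the classic heuristic for ordering expensive predicates and is swapped in for illustration; the slide does not say TinyDB uses exactly this formula, and the per-sample costs are hypothetical.

```python
def uniform_selectivity(lo, hi, threshold):
    """Fraction of readings expected to satisfy `value > threshold`,
    assuming values are uniformly distributed over [lo, hi]."""
    if threshold <= lo:
        return 1.0
    if threshold >= hi:
        return 0.0
    return (hi - threshold) / (hi - lo)

def order_predicates(preds):
    """Sort by rank = (1 - selectivity) / cost, highest first.
    Each pred is a (name, cost_per_sample, selectivity) tuple."""
    return sorted(preds, key=lambda p: (1 - p[2]) / p[1], reverse=True)

preds = [
    # (predicate, hypothetical energy cost per sample, estimated selectivity)
    ("accel > 500", 10.0, uniform_selectivity(0, 1000, 500)),  # sel 0.5
    ("light > 900", 1.0, uniform_selectivity(0, 1000, 900)),   # sel 0.1
    ("temp > 25", 1.0, uniform_selectivity(0, 50, 25)),        # sel 0.5
]
plan = [name for name, _, _ in order_predicates(preds)]
```

The expensive accelerometer ends up last: it is only sampled for the few tuples that survive the cheap light and temperature filters.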
19 Dissemination of Queries. Based on the semantic routing tree (SRT) idea. The SRT build request is flooded first; node n gets to choose its parent p, based on radio range from the root. The parent knows its children and maintains an interval of values for each child, forwarding a query only to the children whose intervals are relevant to it. Maintenance: if a child’s interval changes, the child notifies its parent; if a node disappears, its parent learns of this when it fails to get a response to a query.
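A minimal sketch of SRT-style dissemination, assuming each node has a fixed value for the indexed attribute (e.g., its x-coordinate) and the tree is built bottom-up. Each parent caches the interval covered by each child’s subtree and forwards a range query only into overlapping subtrees. The `SRTNode` class and its methods are hypothetical, and interval maintenance is left out.

```python
class SRTNode:
    """Hypothetical semantic-routing-tree node. Each parent caches, per child,
    the [lo, hi] range of the indexed attribute over that child's subtree."""

    def __init__(self, name, value):
        self.name = name
        self.value = value          # this node's attribute value (e.g., x-coordinate)
        self.child_intervals = {}   # child SRTNode -> (lo, hi) over its subtree

    def add_child(self, child):
        # Called once the child's own subtree is built; caches its interval.
        self.child_intervals[child] = child.subtree_interval()

    def subtree_interval(self):
        lo = hi = self.value
        for c_lo, c_hi in self.child_intervals.values():
            lo, hi = min(lo, c_lo), max(hi, c_hi)
        return lo, hi

    def disseminate(self, q_lo, q_hi, reached):
        """Deliver a range query [q_lo, q_hi] here, then forward it only to
        children whose cached interval overlaps the query range."""
        reached.append(self.name)
        for child, (c_lo, c_hi) in self.child_intervals.items():
            if c_hi >= q_lo and c_lo <= q_hi:
                child.disseminate(q_lo, q_hi, reached)
        return reached

# Build bottom-up: root -> A -> C and root -> B -> D.
c, d = SRTNode("C", 12), SRTNode("D", 55)
a, b = SRTNode("A", 10), SRTNode("B", 50)
a.add_child(c)
b.add_child(d)
root = SRTNode("root", 0)
root.add_child(a)
root.add_child(b)

# Query for x in [11, 20]: B's subtree spans [50, 55], so it is never contacted.
reached = root.disseminate(11, 20, [])
```

Node A receives the query even though its own value (10) is outside the range, because its child C matches; pruning happens only at subtrees that cannot contribute.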
20 Query Processing. Mostly consists of sleeping! Wake briefly, sample, compute operators, then route onwards. Nodes are time-synchronized. Awake time is proportional to the neighborhood size (why?). Computation is based on partial state records: each node merges its own sensor reading with the partial aggregate values received from its children and forwards a single record.
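Why a partial state record rather than a partial answer: averaging averages is wrong unless each is weighted, so for AVG a node forwards a (sum, count) pair. The sketch below is a hypothetical Python rendering of that idea; the `AvgPSR` and `node_epoch` names are illustrative, not TinyDB’s.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class AvgPSR:
    """Partial state record for AVG: a (sum, count) pair. Records merge
    associatively, so each node forwards one fixed-size record no matter
    how many descendants it has."""
    total: float
    count: int

    @staticmethod
    def from_reading(x):
        return AvgPSR(x, 1)

    def merge(self, other):
        return AvgPSR(self.total + other.total, self.count + other.count)

    def evaluate(self):
        return self.total / self.count

def node_epoch(own_reading, child_psrs):
    """One epoch at one node: fold the local sample into the children's records."""
    psr = AvgPSR.from_reading(own_reading)
    for child in child_psrs:
        psr = psr.merge(child)
    return psr

# Two leaves report one reading each; the root combines them with its own.
leaf1 = node_epoch(20.0, [])
leaf2 = node_epoch(24.0, [])
root = node_epoch(22.0, [leaf1, leaf2])
print(root, root.evaluate())  # AvgPSR(total=66.0, count=3) 22.0
```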
21 Load Shedding & Approximation. What if the router queue is overflowing? Need to prioritize tuples and drop the ones we don’t want: FIFO vs. averaging the head of the queue vs. delta-proportional weighting. Later work considers using approximation for better power efficiency. If sensors in one region change less frequently, we can sample less frequently (or fewer times) in that region. If sensors change less frequently, we can instead sample readings that take less power to acquire but are correlated (e.g., battery voltage vs. temperature). Thursday, 4:30PM, DB Group Meeting: I’ll discuss some of this work.
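A minimal sketch of the three overflow policies, under our reading of them: FIFO simply drops a new tuple when the queue is full; averaging merges the two tuples at the head of the queue to make room; delta-proportional weighting scores each tuple by how far it differs from the last value actually transmitted, evicts the lowest-scoring tuple on overflow, and sends the highest-scoring one first. The `SheddingQueue` class, its scoring rule, and the sample readings are hypothetical.

```python
class SheddingQueue:
    """Bounded outgoing queue with three hypothetical overflow policies:
    'fifo', 'winavg' (average the head), and 'delta' (delta-proportional)."""

    def __init__(self, capacity, policy, last_sent=0.0):
        assert capacity >= 2 and policy in ("fifo", "winavg", "delta")
        self.capacity = capacity
        self.policy = policy
        self.last_sent = last_sent  # most recent value actually transmitted
        self.items = []             # [score, value] pairs, oldest first

    def push(self, value):
        score = abs(value - self.last_sent)  # only the 'delta' policy uses it
        self.items.append([score, value])
        if len(self.items) <= self.capacity:
            return
        if self.policy == "fifo":
            self.items.pop()  # no priorities: the new tuple does not fit, drop it
        elif self.policy == "winavg":
            (_, a), (_, b) = self.items[0], self.items[1]
            avg = (a + b) / 2
            self.items[:2] = [[abs(avg - self.last_sent), avg]]  # merge the head pair
        else:
            # delta: evict the tuple least different from what was last sent
            victim = min(range(len(self.items)), key=lambda k: self.items[k][0])
            del self.items[victim]

    def pop(self):
        """Transmit one tuple: highest score first for 'delta', else the head."""
        if self.policy == "delta":
            i = max(range(len(self.items)), key=lambda k: self.items[k][0])
        else:
            i = 0
        _, value = self.items.pop(i)
        self.last_sent = value
        return value

    def values(self):
        return [v for _, v in self.items]

readings = [10, 10.5, 30, 11, 50]
results = {}
for policy in ("fifo", "winavg", "delta"):
    q = SheddingQueue(capacity=3, policy=policy, last_sent=10)
    for r in readings:
        q.push(r)
    results[policy] = q.values()
```

With a queue of three, FIFO keeps the stale early readings and loses the jump to 50, while the delta policy keeps exactly the readings that changed the most.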
22 The Future of Sensor Nets? TinySQL is a nice way of formulating the problem of query processing with motes: view the sensor net as a universal relation; define views to abstract some concepts, e.g., an object being monitored. But: What about when we have multiple instances of an object to be tracked? Correlations between objects? What if we have more complex data? More CPU power? What if we want to reason about accuracy?