1 Supporting Aggregate Queries Over Ad-Hoc Wireless Sensor Networks Samuel Madden UC Berkeley With Robert Szewczyk, Michael Franklin, and David Culler WMCSA June 21, 2002

2 Motivation: Sensor Nets and In-Network Query Processing
Many sensor network applications are data oriented
Queries are a natural and efficient data processing mechanism
  – Easy (unlike embedded C code)
  – Enable optimizations through abstraction
Aggregates are the common case
  – E.g., which rooms are in use?
In-network processing is a must
  – Sensor networks are power and bandwidth constrained
  – Communication dominates power cost
  – Not subject to Moore’s law!

3 Overview
Background
  – Sensor Networks
Our Approach: Tiny Aggregation (TAG)
  – Overview
  – Expressiveness
  – Illustration
  – Optimizations
  – Grouping
Current Status & Future Work

5 Background: Sensor Networks
A collection of small, radio-equipped, battery-powered, networked microprocessors
  – Typically ad-hoc & multihop networks
  – Single devices unreliable
  – Very low power; tiny batteries must last for months
Apps: Environment Monitoring, Personal Nets, Object Tracking
Data processing plays a key role!

6 Berkeley Mica Motes & TinyOS
TinyOS operating system (services)
4 MHz processor
4K RAM, 512K EEPROM, 128K code space
Single-channel CSMA half-duplex radio @ 40 kbit/s
  – Lossy: 20% loss @ 5 ft in Ganesan et al.
  – Communication very expensive: 800 instrs/bit

8 The Tiny Aggregation (TAG) Approach
Push declarative queries into the network
  – Impose a hierarchical routing tree onto the network
Divide time into epochs
Every epoch, sensors evaluate the query over local sensor data and data from children
  – Aggregate local and child data
  – Each node transmits just once per epoch
  – Pipelined approach increases throughput
Depending on the aggregate function, various optimizations can be applied

9 SQL Primer
SQL is an established declarative language; we are not wedded to it
  – Some extensions clearly necessary, e.g., for sample rates
We adopt a basic subset: the ‘sensors’ relation (table) has
  – One column for each reading type, or attribute
  – One row for each externalized value
    May represent an aggregation of several individual readings
Syntax:
    SELECT {agg_n(attr_n), attrs} FROM sensors
    WHERE {selPreds}
    GROUP BY {attrs}
    HAVING {havingPreds}
    EPOCH DURATION s
Example:
    SELECT AVG(light) FROM sensors
    WHERE sound < 100
    GROUP BY roomNo
    HAVING AVG(light) < 50

10 Aggregation Functions
Standard SQL supports “the basic 5”:
  – MIN, MAX, SUM, AVERAGE, and COUNT
We support any function conforming to:
    Agg_n = {f_merge, f_init, f_evaluate}
    f_merge{<a1>, <a2>} → <a12>
    f_init{a0} → <a0>
    f_evaluate{<a1>} → aggregate value
    (Merge must be associative and commutative!)
  Each <a> is a partial aggregate
Example: Average
    AVG_merge{<S1, C1>, <S2, C2>} → <S1 + S2, C1 + C2>
    AVG_init{v} → <v, 1>
    AVG_evaluate{<S1, C1>} → S1 / C1
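
The three-function interface for AVERAGE can be sketched in a few lines of Python. This is only an illustration under assumptions: the function names and the sample readings are hypothetical, and the partial aggregate is modeled as a (sum, count) pair, matching the S/C division in the slide's evaluate step.

```python
from functools import reduce

# Partial aggregate for AVERAGE, modeled here as a (sum, count) pair.
def avg_init(v):
    """Turn a single sensor reading into a partial aggregate."""
    return (v, 1)

def avg_merge(a, b):
    """Combine two partial aggregates; associative and commutative."""
    return (a[0] + b[0], a[1] + b[1])

def avg_evaluate(a):
    """Turn the final partial aggregate into the aggregate value."""
    return a[0] / a[1]

# Hypothetical readings from three sensors.
readings = [10, 20, 60]
state = reduce(avg_merge, (avg_init(v) for v in readings))
result = avg_evaluate(state)
```

Because merge is associative and commutative, a parent can fold in child records in whatever order they arrive and still reach the same final state.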

11 Query Propagation
TAG is propagation agnostic
  – Any algorithm that can:
    Deliver the query to all sensors
    Provide all sensors with one or more duplicate-free routes to some root
Paper describes a simple flooding approach
  – Query introduced at a root; rebroadcast by all sensors until it reaches leaves
  – Sensors pick parent and level when they hear query
  – Reselect parent after k silent epochs
[Figure: query flooding from the root through sensors 1 to 6; nodes are labeled with parent (P) and level (L): P:0, L:1; P:1, L:2; P:3, L:3; P:2, L:3; P:4, L:4]
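
A minimal sketch of the flooding step, assuming a lossless network in which each sensor adopts the first sender it hears as its parent. The connectivity map below is hypothetical and is not taken from the slide's figure; the root gets parent 0 and level 1, following the slide's "P:0, L:1" label. Parent reselection after k silent epochs is not modeled.

```python
from collections import deque

def build_tree(neighbors, root):
    """Return {sensor: (parent, level)}; the root has parent 0 and level 1."""
    assignment = {root: (0, 1)}
    queue = deque([root])
    while queue:
        sender = queue.popleft()
        level = assignment[sender][1]
        for node in neighbors[sender]:
            if node not in assignment:  # first time this node hears the query
                assignment[node] = (sender, level + 1)
                queue.append(node)      # the node rebroadcasts in turn
    return assignment

# Hypothetical symmetric radio connectivity among five sensors.
neighbors = {1: [2, 3], 2: [1, 4], 3: [1, 4], 4: [2, 3, 5], 5: [4]}
tree = build_tree(neighbors, root=1)
```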

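12 Illustration: Pipelined Aggregation
SELECT COUNT(*) FROM sensors
[Figure: routing tree over sensors 1 to 5, depth = d]

13 Illustration: Pipelined Aggregation (Epoch 1)
SELECT COUNT(*) FROM sensors
Epoch #   Sensor 1   Sensor 2   Sensor 3   Sensor 4   Sensor 5
1         1          1          1          1          1
[Figure: routing tree over sensors 1 to 5, annotated with current counts]

14 Illustration: Pipelined Aggregation (Epoch 2)
SELECT COUNT(*) FROM sensors
Epoch #   Sensor 1   Sensor 2   Sensor 3   Sensor 4   Sensor 5
1         1          1          1          1          1
2         3          1          2          2          1
[Figure: routing tree over sensors 1 to 5, annotated with current counts]

15 Illustration: Pipelined Aggregation (Epoch 3)
SELECT COUNT(*) FROM sensors
Epoch #   Sensor 1   Sensor 2   Sensor 3   Sensor 4   Sensor 5
1         1          1          1          1          1
2         3          1          2          2          1
3         4          1          3          2          1
[Figure: routing tree over sensors 1 to 5, annotated with current counts]

16 Illustration: Pipelined Aggregation (Epoch 4)
SELECT COUNT(*) FROM sensors
Epoch #   Sensor 1   Sensor 2   Sensor 3   Sensor 4   Sensor 5
1         1          1          1          1          1
2         3          1          2          2          1
3         4          1          3          2          1
4         5          1          3          2          1
[Figure: routing tree over sensors 1 to 5, annotated with current counts]

17 Illustration: Pipelined Aggregation (Epoch 5)
SELECT COUNT(*) FROM sensors
Epoch #   Sensor 1   Sensor 2   Sensor 3   Sensor 4   Sensor 5
1         1          1          1          1          1
2         3          1          2          2          1
3         4          1          3          2          1
4         5          1          3          2          1
5         5          1          3          2          1
[Figure: routing tree over sensors 1 to 5, annotated with current counts]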
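
The per-epoch counts in this illustration can be reproduced with a short simulation. The tree shape used here (sensor 1 is the root with children 2 and 3, 4 is the child of 3, and 5 is the child of 4) is an assumption inferred from the counts, since the figure itself did not survive. In each epoch a sensor reports 1 for itself plus whatever its children reported in the previous epoch.

```python
def simulate_count(children, epochs):
    """Return one row per epoch holding each sensor's pipelined COUNT output."""
    nodes = sorted(children)
    previous = {n: 0 for n in nodes}  # nothing has been heard before epoch 1
    rows = []
    for _ in range(epochs):
        # Each sensor counts itself plus its children's previous-epoch outputs.
        current = {n: 1 + sum(previous[c] for c in children[n]) for n in nodes}
        rows.append([current[n] for n in nodes])
        previous = current
    return rows

# Assumed routing tree: 1 -> {2, 3}, 3 -> {4}, 4 -> {5}.
children = {1: [2, 3], 2: [], 3: [4], 4: [5], 5: []}
rows = simulate_count(children, epochs=5)
```

Under this assumed tree the root's output reaches the full count of 5 in epoch 4 and stays there, which is the steady state of the pipeline.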

18 Discussion
Result is a stream of values
  – Ideal for monitoring scenarios
One communication / node / epoch
  – Symmetric power consumption, even at root
New value on every epoch
  – After d-1 epochs, complete aggregation
Given a single loss, network will recover after at most d-1 epochs
With time synchronization, nodes can sleep between epochs, except during a small communication window
Note: Values from different epochs are combined
  – Can be fixed via a small cache of past values at each node
  – Cache size at most one reading per child x depth of tree

19 Simulation Results
2500 nodes, 50x50 grid
Depth = ~10, Neighbors = ~20
Some aggregates require dramatically more state!

20 Optimization: Channel Sharing
Insight: Shared channel enables optimizations
Suppress messages that won’t affect the aggregate
  – E.g., in a MAX query, a sensor with value v that hears a neighbor report a value ≥ v doesn’t report
  – Applies to all such exemplary aggregates
Learn about query advertisements a sensor missed
  – If a sensor shows up in a new environment, it can learn about queries by listening to its neighbors’ messages
  – Root doesn’t have to explicitly rebroadcast the query!
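
A rough sketch of message suppression for a MAX query, assuming all listed sensors share one radio cell, transmit one at a time in the given order, and hear every earlier message. The function name and the readings are hypothetical.

```python
def snoop_max(transmission_order):
    """Return the (sensor, value) reports actually sent in one shared radio cell."""
    heard_max = None
    sent = []
    for sensor, value in transmission_order:
        if heard_max is not None and heard_max >= value:
            continue  # a neighbor already reported a value >= ours; stay silent
        sent.append((sensor, value))
        heard_max = value
    return sent

# Hypothetical readings, listed in transmission order.
reports = snoop_max([(3, 40), (1, 25), (4, 40), (2, 71), (5, 60)])
```

Only two of the five sensors transmit here, and the largest reported value is still the true maximum.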

21 Optimization: Hypothesis Testing
Insight: Root can provide information that will suppress readings that cannot affect the final aggregate value
  – E.g., tell all the nodes that the MIN is definitely < 50; nodes with value ≥ 50 need not participate
  – Works for any linear aggregate function
How is the hypothesis computed?
  – Blind guess
  – Statistically informed guess
  – Observation over first few levels of tree / rounds of aggregate
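
The MIN example can be sketched as a simple filter. The readings and function name below are hypothetical; the threshold of 50 comes from the slide. The sketch assumes the hypothesis is correct, that is, at least one sensor really does read below the threshold.

```python
def participants(readings, hypothesis):
    """Keep only sensors whose reading could still be the MIN under the hypothesis."""
    return {sensor: v for sensor, v in readings.items() if v < hypothesis}

# Hypothetical readings keyed by sensor ID.
readings = {1: 72, 2: 48, 3: 55, 4: 31, 5: 90}
reporting = participants(readings, hypothesis=50)

full_min = min(readings.values())     # answer if every sensor reported
pruned_min = min(reporting.values())  # answer from the reduced set of reports
```

If the guess were wrong and no sensor read below the threshold, nobody would report, so a real system would need a fallback such as retrying with a looser hypothesis.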

22 Optimization: Use Multiple Parents
For duplicate-insensitive (e.g., MAX) or partitionable (e.g., COUNT) aggregates:
  – Send (part of) aggregate to all parents
  – Decreases variance, dramatically when there are lots of parents
No extra cost, since all messages are broadcast
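
For a partitionable aggregate such as COUNT, one way to read "send part of the aggregate to all parents" is an even split. This is a sketch under that assumption; the function name and numbers are hypothetical.

```python
def split_count(count, parents):
    """Divide a partial COUNT evenly among the parents that will relay it."""
    share = count / len(parents)
    return {parent: share for parent in parents}

# A node holding a partial count of 6 that can reach parents 2 and 3.
shares = split_count(6, parents=[2, 3])

# If the link to parent 3 drops its message, only that share is lost.
received_if_one_link_fails = shares[2]
```

With a single parent, one lost message loses the whole subtree's count; with the split, a single loss costs only a fraction, which is the variance reduction the slide refers to.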

23 Grouping
Value-based, complete partitioning of records
If query is grouped, sensors apply predicate to local readings on each epoch
Aggregate records are tagged with group
When a child record (with group) is received:
  – If it belongs to a stored group, merge with existing record for that group
  – If not, just store it
At the end of each epoch, transmit one record per group
Number of groups may exceed available storage
  – Can evict groups for aggregation at root!

25 Status & Future Work
Status
  – Simple simulator
    Complete set of experiments, including behavior of algorithms in the face of loss
  – Generalization of algorithms beyond complete pipelining
  – Taxonomy of aggregates to allow optimizations on functional properties
  – Basic implementation (shown in demo)
Future work
  – Expressiveness issues
    Aggregates over temporal data
    Nested queries, e.g., MAX(AVG(1000 readings) @ each node)
  – Correctness issues in the face of loss
    How does the user know which nodes are and are not included in an aggregate?

26 Summary
Declarative queries for aggregates
  – Straightforward, familiar interface
  – Enables optimizations
    Snooping techniques for exemplary aggregates
    Multiple parents for partitionable aggregates
Pipelined, epoch-based algorithm
  – Streaming results
  – Symmetric communication
  – Low-power friendly

27 Questions?

28 Grouping
GROUP BY expr
  – expr is an expression over one or more attributes
    Evaluation of expr yields a group number
    Each reading is a member of exactly one group
Example: SELECT max(light) FROM sensors GROUP BY TRUNC(temp/10)
Sensor ID   Light   Temp   Group
1           45      25     2
2           27      28     2
3           66      34     3
4           68      37     3
Result:
Group   max(light)
2       45
3       68
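
The grouped MAX example can be checked with a few lines of Python. The readings are the four rows from the slide's table; the function name is hypothetical, and `math.trunc` stands in for the query's TRUNC.

```python
import math

# (sensor ID, light, temp) rows from the slide's example table.
rows = [(1, 45, 25), (2, 27, 28), (3, 66, 34), (4, 68, 37)]

def group_max_light(rows):
    """Evaluate SELECT max(light) ... GROUP BY TRUNC(temp/10) over the rows."""
    result = {}
    for _sensor, light, temp in rows:
        group = math.trunc(temp / 10)  # group number for this reading
        result[group] = max(result.get(group, light), light)
    return result

result = group_max_light(rows)
```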

29 Having
HAVING preds
  – preds filters out groups that do not satisfy the predicate
  – Versus WHERE, which filters out tuples that do not satisfy the predicate
  – Example: SELECT max(temp) FROM sensors GROUP BY light HAVING max(temp) < 100
    Yields all groups whose maximum temperature is under 100

30 Group Eviction
Problem: Number of groups in any one iteration may exceed available storage on sensor
Solution: Evict!
  – Choose one or more groups to forward up the tree
  – Rely on nodes further up the tree, or the root, to recombine groups properly
  – What policy to choose?
    Intuitively: evict the least popular group, since we don’t want to evict a group that will receive more values this epoch
    Experiments suggest:
      – Policy matters very little
      – Evicting as many groups as will fit into a single message is good
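
A minimal sketch of the "least popular group" policy for a grouped MAX query, assuming popularity is measured by how many records have been merged into a group so far this epoch. The function name, the capacity of two groups, and the sample records are all hypothetical.

```python
def add_record(table, capacity, group, value, forwarded):
    """Merge a (group, value) MAX record into table, evicting if storage is full."""
    if group in table:
        best, hits = table[group]
        table[group] = (max(best, value), hits + 1)
        return
    if len(table) >= capacity:
        # Evict the least popular group and forward its record up the tree.
        victim = min(table, key=lambda g: table[g][1])
        forwarded.append((victim, table.pop(victim)[0]))
    table[group] = (value, 1)

table = {}       # group -> (current max, number of records merged)
forwarded = []   # records evicted toward the parent
for group, value in [(2, 45), (2, 27), (3, 66), (4, 50)]:
    add_record(table, capacity=2, group=group, value=value, forwarded=forwarded)
```

Group 3 is evicted when group 4 arrives because it has seen only one record, while group 2 has seen two.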

31 Simulation Environment
Java-based simulation & visualization for validating algorithms and collecting data
Coarse-grained, event-based simulation
  – Sensors arranged on a grid, radio connectivity by Euclidean distance
  – Communication model
    Lossless: All neighbors hear all messages
    Lossy: Messages lost with probability that increases with distance
    Symmetric links
    No collisions, hidden terminals, etc.

32 Simulation Screenshot

33 Experimental Results
Experiments with simulator
  – Performance of basic TAG
  – Benefits of hypothesis testing
  – Effect of loss
Most experiments are in terms of bytes or messages sent, since message transmission is the dominant cost
  – Depends on radio being turned off between epochs and aggregation functions being cheap

34 Experiment: Basic TAG Dense Packing, Ideal Communication

35 Experiment: Hypothesis Testing Uniform Value Distribution, Dense Packing, Ideal Communication

36 Experiment: Effects of Loss

37 Experiment: Benefit of Cache

38 Pipelined Aggregates
After query propagates, during each epoch:
  – Each sensor samples local sensors once
  – Combines them with PSRs (partial state records) from children
  – Outputs PSR representing aggregate state in the previous epoch
After (d-1) epochs, PSR for the whole tree is output at root
  – d = depth of the routing tree
  – If desired, partial state from top k levels could be output in the kth epoch
To avoid combining PSRs from different epochs, sensors must cache values from children
[Figure: routing tree over sensors 1 to 5]
  – Value from 5 produced at time t arrives at 1 at time (t+3)
  – Value from 2 produced at time t arrives at 1 at time (t+1)

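39 Pipelining Example
[Figure: routing tree over sensors 1 to 5, with three empty tables, each with columns SID, Epoch, Agg.]

40 Pipelining Example (Epoch 0)
Table entries are (SID, Epoch, Agg.):
  – First table: (2, 0, 1), (4, 0, 1)
  – Second table: (1, 0, 1)
  – Third table: (3, 0, 1), (5, 0, 1)

41 Pipelining Example (Epoch 1)
Table entries are (SID, Epoch, Agg.):
  – First table: (2, 0, 1), (4, 0, 1), (2, 1, 1), (4, 1, 1), (3, 0, 2)
  – Second table: (1, 0, 1), (1, 1, 1), (2, 0, 2)
  – Third table: (3, 0, 1), (5, 0, 1), (3, 1, 1), (5, 1, 1)

42 Pipelining Example (Epoch 2)
Table entries are (SID, Epoch, Agg.):
  – First table: (2, 0, 1), (4, 0, 1), (2, 1, 1), (4, 1, 1), (3, 0, 2), (2, 2, 1), (4, 2, 1), (3, 1, 2)
  – Second table: (1, 0, 1), (1, 1, 1), (2, 0, 2), (1, 2, 1), (2, 0, 4)
  – Third table: (3, 0, 1), (5, 0, 1), (3, 1, 1), (5, 1, 1), (3, 2, 1), (5, 2, 1)

43 Pipelining Example (Epoch 3)
Table entries are (SID, Epoch, Agg.):
  – First table: (2, 0, 1), (4, 0, 1), (2, 1, 1), (4, 1, 1), (3, 0, 2), (2, 2, 1), (4, 2, 1), (3, 1, 2)
  – Second table: (1, 0, 1), (1, 1, 1), (2, 0, 2), (1, 2, 1), (2, 0, 4)
  – Third table: (3, 0, 1), (5, 0, 1), (3, 1, 1), (5, 1, 1), (3, 2, 1), (5, 2, 1)

44 Pipelining Example (Epoch 4)
[Figure: routing tree over sensors 1 to 5]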

45 Optimization: Delta Compression
If a sensor’s reading is unchanged from the previous epoch, it need not transmit
  – Parents assume value is unchanged
  – Leverage child value cache
  – Periodic heartbeats to handle disconnection
Extension: if a sensor’s reading has changed by less than some threshold, it need not transmit
  – Similar to hypothesis testing with AVERAGE
  – Really future work: see C. Olston, “Best-Effort Cache Synchronization”, SIGMOD 2002
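
A small sketch of the basic (exact-match) form of delta compression, using SUM as the aggregate. It assumes a lossless link, so the parent's cache always equals the last value each child sent; heartbeats and the threshold extension are not modeled. The function name and readings are hypothetical.

```python
def run_epochs(readings_per_epoch):
    """Return (messages_sent, parent_sums) for a parent with a child value cache."""
    cache = {}     # parent's last known value per child
    messages = 0
    sums = []
    for readings in readings_per_epoch:
        for child, value in readings.items():
            if cache.get(child) != value:  # child transmits only when it changed
                cache[child] = value
                messages += 1
        # Parent aggregates over cached values, including the silent children.
        sums.append(sum(cache.values()))
    return messages, sums

# Hypothetical readings for children 2 and 3 over three epochs.
epochs = [{2: 10, 3: 20}, {2: 10, 3: 20}, {2: 12, 3: 20}]
messages, sums = run_epochs(epochs)
```

Three messages are sent instead of the six that would be needed if both children reported every epoch, and the parent's sums are still exact.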

46 Taxonomy of Aggregates
TAG insight: classifying aggregates according to various functional properties
  – Yields a general set of optimizations that can automatically be applied
Property | Examples | Affects
Partial State | MEDIAN: unbounded, MAX: 1 record | Effectiveness of TAG
Duplicate Sensitivity | MIN: dup. insensitive, AVG: dup. sensitive | Routing Redundancy
Exemplary vs. Summary | MAX: exemplary, COUNT: summary | Applicability of Sampling, Effect of Loss
Monotonic | COUNT: monotonic, AVG: non-monotonic | Hypothesis Testing, Snooping
