1
1 Supporting Aggregate Queries Over Ad-Hoc Wireless Sensor Networks Samuel Madden UC Berkeley With Robert Szewczyk, Michael Franklin, and David Culler WMCSA June 21, 2002
2
2 Motivation: Sensor Nets and In-Network Query Processing Many Sensor Network Applications are Data Oriented Queries Natural and Efficient Data Processing Mechanism – Easy (unlike embedded C code) – Enable optimizations through abstraction Aggregates Common Case – E.g. Which rooms are in use? In-network processing a must – Sensor networks power and bandwidth constrained – Communication dominates power cost – Not subject to Moore’s law!
3
3 Overview Background – Sensor Networks Our Approach: Tiny Aggregation (TAG) – Overview – Expressiveness – Illustration – Optimizations – Grouping Current Status & Future Work
4
4 Overview Background – Sensor Networks Our Approach: Tiny Aggregation (TAG) – Overview – Expressiveness – Illustration – Optimizations – Grouping Current Status & Future Work
5
5 Background: Sensor Networks A collection of small, radio-equipped, battery powered, networked microprocessors – Typically Ad-hoc & Multihop Networks – Single devices unreliable – Very low power; tiny batteries power for months Apps: Environment Monitoring, Personal Nets, Object Tracking Data processing plays a key role!
6
6 Berkeley Mica Motes & TinyOS TinyOS operating system (services) 4 MHz processor; 4 KB RAM, 512 KB EEPROM, 128 KB code space Single-channel CSMA half-duplex radio @ 40 kbit/s – Lossy: 20% loss @ 5 ft in Ganesan et al. – Communication very expensive: 800 instrs/bit
7
7 Overview Background – Sensor Networks Our Approach: Tiny Aggregation (TAG) – Overview – Expressiveness – Illustration – Optimizations – Grouping Current Status & Future Work
8
8 The Tiny Aggregation (TAG) Approach Push declarative queries into network – Impose a hierarchical routing tree onto the network Divide time into epochs Every epoch, sensors evaluate query over local sensor data and data from children – Aggregate local and child data – Each node transmits just once per epoch – Pipelined approach increases throughput Depending on aggregate function, various optimizations can be applied
9
9 SQL Primer
SQL is an established declarative language; not wedded to it
– Some extensions clearly necessary, e.g. for sample rates
We adopt a basic subset: ‘sensors’ relation (table) has
– One column for each reading-type, or attribute
– One row for each externalized value (may represent an aggregation of several individual readings)
Query template:
SELECT {agg_n(attr_n), attrs} FROM sensors
WHERE {selPreds}
GROUP BY {attrs}
HAVING {havingPreds}
EPOCH DURATION s
Example:
SELECT AVG(light) FROM sensors
WHERE sound < 100
GROUP BY roomNo
HAVING AVG(light) < 50
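To make the semantics of the example query concrete, here is a minimal centralized sketch in Python. It only illustrates what the query computes, not how TAG evaluates it in the network. The function name `avg_light_by_room` and the sample rows are hypothetical.

```python
from collections import defaultdict

def avg_light_by_room(rows, max_sound=100, max_avg=50):
    """SELECT AVG(light) FROM sensors WHERE sound < 100
       GROUP BY roomNo HAVING AVG(light) < 50 (evaluated centrally)."""
    sums = defaultdict(lambda: [0, 0])            # roomNo -> [sum of light, count]
    for r in rows:
        if r["sound"] < max_sound:                # WHERE filters individual tuples
            acc = sums[r["roomNo"]]
            acc[0] += r["light"]
            acc[1] += 1
    averages = {room: s / c for room, (s, c) in sums.items()}
    # HAVING filters whole groups after aggregation
    return {room: a for room, a in averages.items() if a < max_avg}

# Hypothetical readings, invented for illustration only.
rows = [
    {"roomNo": 1, "light": 40, "sound": 50},
    {"roomNo": 1, "light": 50, "sound": 80},
    {"roomNo": 1, "light": 900, "sound": 150},    # dropped by WHERE
    {"roomNo": 2, "light": 60, "sound": 10},
    {"roomNo": 2, "light": 70, "sound": 20},
]
result = avg_light_by_room(rows)
```

With these rows, room 1 averages 45 and is kept, while room 2 averages 65 and is removed by the HAVING clause.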
10
10 Aggregation Functions
Standard SQL supports “the basic 5”:
– MIN, MAX, SUM, AVERAGE, and COUNT
We support any function conforming to:
Agg_n = {f_merge, f_init, f_evaluate}
f_merge{<a1>, <a2>} → <a12>
f_init{a0} → <a0>
f_evaluate{<a1>} → aggregate value
(Merge associative, commutative!)
<a1>, <a2>, <a12> are partial aggregates.
Example: Average
AVG_merge{<S1, C1>, <S2, C2>} → <S1 + S2, C1 + C2>
AVG_init{v} → <v, 1>
AVG_evaluate{<S1, C1>} → S1/C1
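The init/merge/evaluate decomposition can be sketched in Python as follows. This is an illustration only: the `Aggregate` tuple and the `aggregate` helper are hypothetical names, and the partial state for AVG is assumed to be a (sum, count) pair, as the S1/C1 evaluator suggests.

```python
from typing import Callable, NamedTuple

class Aggregate(NamedTuple):
    init: Callable      # reading -> partial state
    merge: Callable     # (state, state) -> state; must be associative and commutative
    evaluate: Callable  # partial state -> final aggregate value

# AVG carries a (sum, count) pair as its partial state (assumed representation).
AVG = Aggregate(
    init=lambda v: (v, 1),
    merge=lambda a, b: (a[0] + b[0], a[1] + b[1]),
    evaluate=lambda s: s[0] / s[1],
)

def aggregate(agg, readings):
    """Fold a non-empty list of readings through init/merge, then evaluate."""
    state = agg.init(readings[0])
    for v in readings[1:]:
        state = agg.merge(state, agg.init(v))
    return agg.evaluate(state)

average = aggregate(AVG, [10, 20, 30])
```

Because merge is associative and commutative, partial states can be combined in whatever order the routing tree delivers them.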
11
11 Query Propagation TAG propagation agnostic – Any algorithm that can: Deliver the query to all sensors Provide all sensors with one or more duplicate-free routes to some root Paper describes simple flooding approach – Query introduced at a root; rebroadcast by all sensors until it reaches leaves – Sensors pick parent and level when they hear query – Reselect parent after k silent epochs [Figure: query flooding over sensors 1 to 6, each labeled with its parent P and level L. Labels shown: P:0, L:1; P:1, L:2; P:3, L:3; P:2, L:3; P:4, L:4]
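The flooding step can be sketched as a breadth-first traversal, assuming a lossless network in which every sensor adopts the first sender it hears as its parent. The `flood` function and the `links` topology are hypothetical, and parent reselection after k silent epochs is not modeled.

```python
from collections import deque

def flood(neighbors, root):
    """Return {node: (parent, level)}; the root has no parent and level 1."""
    assignment = {root: (None, 1)}
    queue = deque([root])
    while queue:
        sender = queue.popleft()
        level = assignment[sender][1]
        for n in neighbors[sender]:
            if n not in assignment:              # first time n hears the query
                assignment[n] = (sender, level + 1)
                queue.append(n)                  # n rebroadcasts in turn
    return assignment

# Hypothetical radio connectivity (symmetric links), invented for illustration.
links = {1: [2, 3], 2: [1, 4], 3: [1, 4, 5], 4: [2, 3, 6], 5: [3], 6: [4]}
tree = flood(links, root=1)
```

Every node ends up with exactly one parent, so each has a single duplicate-free route to the root.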
12
12 Illustration: Pipelined Aggregation SELECT COUNT(*) FROM sensors [Figure: routing tree over sensors 1 to 5, depth = d]
13
13 Illustration: Pipelined Aggregation
SELECT COUNT(*) FROM sensors
Epoch 1
          Sensor #
Epoch #   1  2  3  4  5
1         1  1  1  1  1
[Figure: routing tree over sensors 1 to 5 annotated with each sensor's current count]
14
14 Illustration: Pipelined Aggregation
SELECT COUNT(*) FROM sensors
Epoch 2
          Sensor #
Epoch #   1  2  3  4  5
1         1  1  1  1  1
2         3  1  2  2  1
[Figure: routing tree over sensors 1 to 5 annotated with each sensor's current count]
15
15 Illustration: Pipelined Aggregation
SELECT COUNT(*) FROM sensors
Epoch 3
          Sensor #
Epoch #   1  2  3  4  5
1         1  1  1  1  1
2         3  1  2  2  1
3         4  1  3  2  1
[Figure: routing tree over sensors 1 to 5 annotated with each sensor's current count]
16
16 Illustration: Pipelined Aggregation
SELECT COUNT(*) FROM sensors
Epoch 4
          Sensor #
Epoch #   1  2  3  4  5
1         1  1  1  1  1
2         3  1  2  2  1
3         4  1  3  2  1
4         5  1  3  2  1
[Figure: routing tree over sensors 1 to 5 annotated with each sensor's current count]
17
17 Illustration: Pipelined Aggregation
SELECT COUNT(*) FROM sensors
Epoch 5
          Sensor #
Epoch #   1  2  3  4  5
1         1  1  1  1  1
2         3  1  2  2  1
3         4  1  3  2  1
4         5  1  3  2  1
5         5  1  3  2  1
[Figure: routing tree over sensors 1 to 5 annotated with each sensor's current count]
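The epoch-by-epoch counts can be reproduced with a small simulation. This is a sketch under assumptions: the tree shape (2 and 3 under 1, 4 under 3, 5 under 4) is inferred from the values on the slides, and `pipelined_count` is a hypothetical name.

```python
def pipelined_count(parent, epochs):
    """Simulate pipelined COUNT(*): one row of reported values per epoch."""
    nodes = sorted(set(parent) | set(parent.values()))
    heard = {n: {} for n in nodes}     # latest partial count heard from each child
    history = []
    for _ in range(epochs):
        # Each node reports itself plus what its children reported last epoch.
        sent = {n: 1 + sum(heard[n].values()) for n in nodes}
        for child, p in parent.items():
            heard[p][child] = sent[child]
        history.append([sent[n] for n in nodes])
    return history

# Assumed routing tree: child -> parent.
parent = {2: 1, 3: 1, 4: 3, 5: 4}
rows = pipelined_count(parent, 5)
```

The root (sensor 1) first reports the full count of 5 in epoch 4, and the values are stable from then on.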
18
18 Discussion Result is a stream of values – Ideal for monitoring scenarios One communication / node / epoch – Symmetric power consumption, even at root New value on every epoch – After d-1 epochs, complete aggregation Given a single loss, network will recover after at most d-1 epochs With time synchronization, nodes can sleep between epochs, except during small communication window Note: Values from different epochs combined – Can be fixed via small cache of past values at each node – Cache size at most one reading per child × depth of tree
19
19 Simulation Results 2500 nodes, 50x50 grid, depth ≈ 10, neighbors ≈ 20 Some aggregates require dramatically more state!
20
20 Optimization: Channel Sharing Insight: Shared channel enables optimizations Suppress messages that won’t affect aggregate – E.g., in a MAX query, a sensor with value v that hears a neighbor with value ≥ v doesn’t report – Applies to all such exemplary aggregates Learn about missed query advertisements – If a sensor shows up in a new environment, it can learn about queries by listening to its neighbors’ messages. Root doesn’t have to explicitly rebroadcast query!
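The MAX suppression rule can be sketched as below. The sketch assumes all listed sensors share one broadcast neighborhood and transmit in list order. The `snooping_max` name and the readings are hypothetical.

```python
def snooping_max(readings):
    """readings: (node_id, value) pairs in transmission order, all within earshot.
    Returns the messages that are actually transmitted."""
    loudest = None                      # largest value overheard so far
    transmitted = []
    for node, value in readings:
        if loudest is not None and loudest >= value:
            continue                    # overheard a value >= own value: stay silent
        transmitted.append((node, value))
        loudest = value
    return transmitted

# Hypothetical readings, invented for illustration.
readings = [(1, 40), (2, 35), (3, 52), (4, 52), (5, 10)]
sent = snooping_max(readings)
```

Only two of the five sensors transmit here, yet the maximum over the transmitted messages equals the maximum over all readings.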
21
21 Optimization: Hypothesis Testing Insight: Root can provide information that will suppress readings that cannot affect the final aggregate value. – E.g. Tell all the nodes that the MIN is definitely < 50; nodes with value ≥ 50 need not participate. – Works for any linear aggregate function How is hypothesis computed? – Blind guess – Statistically informed guess – Observation over first few levels of tree / rounds of aggregate
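The MIN example can be sketched as follows. The message count here ignores multihop relaying, and the fallback to a full query when the hypothesis turns out to be wrong is an assumed policy, not one stated on the slide. `min_with_hypothesis` and the readings are hypothetical.

```python
def min_with_hypothesis(readings, hypothesis):
    """Return (minimum, reports_sent) for a MIN query with a root-supplied bound."""
    active = [v for v in readings.values() if v < hypothesis]
    if active:
        return min(active), len(active)
    # Hypothesis was wrong: nobody reported, so rerun without it (assumed policy).
    return min(readings.values()), len(readings)

# Hypothetical readings keyed by sensor id.
readings = {1: 72, 2: 48, 3: 50, 4: 31, 5: 90}
good_guess = min_with_hypothesis(readings, hypothesis=50)
bad_guess = min_with_hypothesis(readings, hypothesis=20)
```

A good hypothesis cuts the number of reporting sensors from five to two without changing the answer. A hypothesis that is too aggressive saves nothing.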
22
22 Optimization: Use Multiple Parents For duplicate-insensitive (e.g. MAX) or partitionable (e.g. COUNT) aggregates, – Send (part of) aggregate to all parents – Decreases variance dramatically when there are lots of parents No extra cost, since all messages broadcast
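The variance claim can be checked numerically for a partitionable aggregate such as COUNT. The sketch assumes each child-to-parent link is lost independently with the same probability and that the count is split evenly across parents. These modeling choices and the name `delivered_stats` are assumptions for illustration.

```python
from itertools import product

def delivered_stats(count, parents, loss):
    """Exact mean and variance of the count that reaches the next level."""
    share = count / parents                       # equal slice sent to each parent
    mean = second_moment = 0.0
    for outcome in product([0, 1], repeat=parents):   # 1 = that link delivered
        prob = 1.0
        for ok in outcome:
            prob *= (1 - loss) if ok else loss
        total = share * sum(outcome)
        mean += prob * total
        second_moment += prob * total ** 2
    return mean, second_moment - mean ** 2

one_parent = delivered_stats(4, parents=1, loss=0.5)
two_parents = delivered_stats(4, parents=2, loss=0.5)
```

Under this model the expected delivered count is unchanged, while the variance falls in proportion to the number of parents.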
23
23 Grouping Value-based, complete partitioning of records If query is grouped, sensors apply predicate to local readings on each epoch Aggregate records tagged with group When a child record (with group) is received: – If it belongs to a stored group, merge with existing record for that group – If not, just store it At the end of each epoch, transmit one record per group Number of groups may exceed available storage – Can evict groups for aggregation at root!
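The per-group merge rule for incoming child records can be sketched as below. The name `merge_group_records` is hypothetical, and the example uses a (sum, count) partial state for AVG as an assumed representation.

```python
def merge_group_records(stored, incoming, merge):
    """Merge a child's group-tagged records into this node's stored records."""
    for group, state in incoming.items():
        if group in stored:
            stored[group] = merge(stored[group], state)   # known group: merge
        else:
            stored[group] = state                          # new group: just store it
    return stored

def avg_merge(a, b):
    # (sum, count) partial states, assumed representation for AVG
    return (a[0] + b[0], a[1] + b[1])

stored = {2: (45, 1)}
incoming = {2: (27, 1), 3: (66, 1)}
merged = merge_group_records(stored, incoming, avg_merge)
```

At the end of the epoch the node would transmit one record per key in `merged`.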
24
24 Overview Background – Sensor Networks Our Approach: Tiny Aggregation (TAG) – Overview – Expressiveness – Illustration – Optimizations – Grouping Current Status & Future Work
25
25 Status & Future Work Status – Simple simulator Complete set of experiments, including behavior of algorithms in the face of loss – Generalization of algorithms beyond complete pipelining – Taxonomy of aggregates to allow optimizations on functional properties – Basic implementation (shown in demo) Future work – Expressiveness issues Aggregates over temporal data Nested queries, e.g. MAX(AVG(1000 readings) @ each node) – Correctness issues in the face of loss How does the user know which nodes are and are not included in an aggregate?
26
26 Summary Declarative queries for aggregates – Straightforward, familiar interface – Enables optimizations Snooping techniques for exemplary aggregates Multiple parents for partitionable aggregates Pipelined, epoch based algorithm – Streaming Results – Symmetric communication – Low-power friendly
27
27 Questions?
28
28 Grouping
GROUP BY expr
– expr is an expression over one or more attributes
Evaluation of expr yields a group number
Each reading is a member of exactly one group
Example: SELECT max(light) FROM sensors GROUP BY TRUNC(temp/10)
Sensor ID   Light   Temp   Group
1           45      25     2
2           27      28     2
3           66      34     3
4           68      37     3
Result:
Group   max(light)
2       45
3       68
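The grouping example can be reproduced in a few lines of Python, using the four readings from the slide's table. The function name `group_max_light` is hypothetical.

```python
import math

def group_max_light(rows):
    """SELECT max(light) FROM sensors GROUP BY TRUNC(temp/10)."""
    result = {}
    for sensor_id, light, temp in rows:
        group = math.trunc(temp / 10)            # expr yields the group number
        result[group] = max(light, result.get(group, light))
    return result

# (sensor id, light, temp) as read from the slide's table
rows = [(1, 45, 25), (2, 27, 28), (3, 66, 34), (4, 68, 37)]
groups = group_max_light(rows)
```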
29
29 Having HAVING preds – preds filters out groups that do not satisfy predicate – versus WHERE, which filters out tuples that do not satisfy predicate – Example: SELECT max(temp) FROM sensors GROUP BY light HAVING max(temp) < 100 Yields all groups whose maximum temperature is under 100
30
30 Group Eviction Problem: Number of groups in any one iteration may exceed available storage on sensor Solution: Evict! – Choose one or more groups to forward up tree – Rely on nodes further up tree, or root, to recombine groups properly – What policy to choose? Intuitively: least popular group, since don’t want to evict a group that will receive more values this epoch. Experiments suggest: – Policy matters very little – Evicting as many groups as will fit into a single message is good
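One possible eviction policy can be sketched as follows. It evicts the least popular groups first and fills the outgoing message when possible. The `evict` function, the tuple layout, and the rule for how many groups to evict are assumptions for illustration, since the slide reports that the policy matters little.

```python
def evict(groups, capacity, per_message):
    """groups maps group id -> (partial_state, members_seen_this_epoch).
    Returns (kept, evicted); evicted groups are forwarded up the tree."""
    overflow = len(groups) - capacity
    if overflow <= 0:
        return dict(groups), {}
    # Evict at least enough to fit, and fill the outgoing message if possible.
    n = min(len(groups), max(overflow, per_message))
    victims = sorted(groups, key=lambda g: groups[g][1])[:n]   # least popular first
    evicted = {g: groups[g] for g in victims}
    kept = {g: s for g, s in groups.items() if g not in evicted}
    return kept, evicted

# Hypothetical stored groups: (max light so far, readings merged this epoch).
groups = {1: (68, 5), 2: (45, 1), 3: (66, 3), 4: (52, 2)}
kept, evicted = evict(groups, capacity=3, per_message=2)
```

Nodes further up the tree, or the root, would then merge the evicted records with any other records for the same groups.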
31
31 Simulation Environment Java-based simulation & visualization for validating algorithms, collecting data. Coarse-grained, event-based simulation – Sensors arranged on a grid, radio connectivity by Euclidean distance – Communication model Lossless: All neighbors hear all messages Lossy: Messages lost with probability that increases with distance Symmetric links No collisions, hidden terminals, etc.
32
32 Simulation Screenshot
33
33 Experimental Results Experiments with simulator – Performance of basic TAG – Benefits of hypothesis testing – Effect of loss Most experiments in terms of bytes or messages sent, since message transmission is the dominant cost – Depends on radio being turned off between epochs and aggregation functions being cheap
34
34 Experiment: Basic TAG Dense Packing, Ideal Communication
35
35 Experiment: Hypothesis Testing Uniform Value Distribution, Dense Packing, Ideal Communication
36
36 Experiment: Effects of Loss
37
37 Experiment: Benefit of Cache
38
38 Pipelined Aggregates After query propagates, during each epoch: – Each sensor samples local sensors once – Combines them with PSRs (partial state records) from children – Outputs PSR representing aggregate state in the previous epoch. After (d-1) epochs, PSR for the whole tree output at root – d = Depth of the routing tree – If desired, partial state from top k levels could be output in kth epoch To avoid combining PSRs from different epochs, sensors must cache values from children [Figure: routing tree over sensors 1 to 5] – Value from 5 produced at time t arrives at 1 at time (t+3) – Value from 2 produced at time t arrives at 1 at time (t+1)
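A per-node cache keyed by epoch is one way to keep PSRs from different epochs apart. The sketch below uses SUM as the aggregate and releases an epoch only once the node's own reading and every child's PSR for that epoch have arrived. The `EpochCache` class and its method names are hypothetical.

```python
from collections import defaultdict

class EpochCache:
    """Buffers partial sums per epoch so that epochs are never mixed."""

    def __init__(self, children):
        self.expected = set(children) | {"self"}
        self.pending = defaultdict(dict)          # epoch -> {source: partial sum}

    def add(self, epoch, source, value):
        self.pending[epoch][source] = value

    def pop_complete(self, epoch):
        """Merged sum for `epoch` if all inputs are cached, else None."""
        got = self.pending.get(epoch, {})
        if set(got) != self.expected:
            return None
        return sum(self.pending.pop(epoch).values())

# A node with a single child (id 4); readings are invented for illustration.
cache = EpochCache(children=[4])
cache.add(0, "self", 10)
cache.add(1, "self", 11)
early = cache.pop_complete(0)       # child's epoch-0 PSR has not arrived yet
cache.add(0, 4, 7)
ready = cache.pop_complete(0)
still_waiting = cache.pop_complete(1)
```

The node's epoch-1 reading stays in the cache until the child's epoch-1 PSR arrives, so it is never combined with epoch-0 data.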
39
39 Pipelining Example [Figure: routing tree over sensors 1 to 5, with three empty record tables; columns are SID, Epoch, Agg.]
40
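40 Pipelining Example
Epoch 0
[Figure: routing tree over sensors 1 to 5, with three record tables; columns are SID, Epoch, Agg.]
Table 1: (2, 0, 1), (4, 0, 1)
Table 2: (1, 0, 1)
Table 3: (3, 0, 1), (5, 0, 1)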
41
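41 Pipelining Example
Epoch 1
[Figure: routing tree over sensors 1 to 5, with three record tables; columns are SID, Epoch, Agg.]
Table 1: (2, 0, 1), (4, 0, 1), (2, 1, 1), (4, 1, 1), (3, 0, 2)
Table 2: (1, 0, 1), (1, 1, 1), (2, 0, 2)
Table 3: (3, 0, 1), (5, 0, 1), (3, 1, 1), (5, 1, 1)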
42
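42 Pipelining Example
Epoch 2
[Figure: routing tree over sensors 1 to 5, with three record tables; columns are SID, Epoch, Agg.]
Table 1: (2, 0, 1), (4, 0, 1), (2, 1, 1), (4, 1, 1), (3, 0, 2), (2, 2, 1), (4, 2, 1), (3, 1, 2)
Table 2: (1, 0, 1), (1, 1, 1), (2, 0, 2), (1, 2, 1), (2, 0, 4)
Table 3: (3, 0, 1), (5, 0, 1), (3, 1, 1), (5, 1, 1), (3, 2, 1), (5, 2, 1)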
43
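43 Pipelining Example
Epoch 3
[Figure: routing tree over sensors 1 to 5, with three record tables; columns are SID, Epoch, Agg.]
Table 1: (2, 0, 1), (4, 0, 1), (2, 1, 1), (4, 1, 1), (3, 0, 2), (2, 2, 1), (4, 2, 1), (3, 1, 2)
Table 2: (1, 0, 1), (1, 1, 1), (2, 0, 2), (1, 2, 1), (2, 0, 4)
Table 3: (3, 0, 1), (5, 0, 1), (3, 1, 1), (5, 1, 1), (3, 2, 1), (5, 2, 1)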
44
44 Pipelining Example Epoch 4 [Figure: routing tree over sensors 1 to 5]
45
45 Optimization: Delta Compression If a sensor’s reading is unchanged from previous epoch, it need not transmit. – Parents assume value is unchanged – Leverage child value cache – Periodic heartbeats to handle disconnection Extension: if a sensor’s reading has changed by less than some threshold, it need not transmit – Similar to hypothesis testing with AVERAGE – Really future work: See C. Olston, “Best-Effort Cache Synchronization”, SIGMOD 2002.
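Delta compression and the parent-side reconstruction can be sketched as below. Periodic heartbeats are not modeled, the threshold comparison is against the last transmitted value (an assumption), and the function names and readings are hypothetical.

```python
def delta_filter(readings, threshold=0):
    """Return (epoch, value) pairs that the sensor actually transmits."""
    sent, last = [], None
    for epoch, value in enumerate(readings):
        # Transmit the first reading, then only changes larger than the threshold.
        if last is None or abs(value - last) > threshold:
            sent.append((epoch, value))
            last = value
    return sent

def reconstruct(sent, epochs):
    """Parent's view: assume the value is unchanged when nothing is heard."""
    updates, current, values = dict(sent), None, []
    for e in range(epochs):
        current = updates.get(e, current)
        values.append(current)
    return values

readings = [20, 20, 21, 21, 25, 25]
exact = delta_filter(readings)                   # transmit on any change
coarse = delta_filter(readings, threshold=2)     # ignore changes of 2 or less
```

With a threshold of 0 the parent recovers the readings exactly. With a positive threshold it sees a value that can be off by at most the threshold.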
46
46 Taxonomy of Aggregates
TAG insight: classifying aggregates according to various functional properties
– Yields a general set of optimizations that can automatically be applied
Property                Examples                                       Affects
Partial State           MEDIAN: unbounded, MAX: 1 record               Effectiveness of TAG
Duplicate Sensitivity   MIN: dup. insensitive, AVG: dup. sensitive     Routing Redundancy
Exemplary vs. Summary   MAX: exemplary, COUNT: summary                 Applicability of Sampling, Effect of Loss
Monotonic               COUNT: monotonic, AVG: non-monotonic           Hypothesis Testing, Snooping