Querying Sensor Networks

Sam Madden, UC Berkeley
October 2, UCLA

Introduction
Programming sensor networks is hard, especially if you want to build a “real” application.
Declarative queries are easy, and they can be faster and more robust than most hand-written applications!

Overview
Overview of Declarative Systems
TinyDB: Features, Demo
Challenges + Research Issues: Language, Optimizations
The Next Step


Declarative Queries: SQL
SQL is the traditional declarative language used in databases:
SELECT {sel-list} FROM {tables} WHERE {pred} GROUP BY {expr} HAVING {pred}
Example:
SELECT dept.name, AVG(emp.salary)
FROM emp, dept
WHERE emp.dno = dept.dno AND (dept.name = 'Accounting' OR dept.name = 'Marketing')
GROUP BY dept.name
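To make the declarative reading concrete, here is a minimal Python sketch of what the example query computes, evaluated naively as a nested-loop join over in-memory lists. The sample rows are hypothetical; a real DBMS is free to pick any join order or algorithm, which is exactly the point of writing the query declaratively.

```python
from collections import defaultdict

# Hypothetical sample tables; column names follow the example query.
emp = [
    {"name": "ann", "dno": 1, "salary": 50},
    {"name": "bob", "dno": 1, "salary": 70},
    {"name": "cat", "dno": 2, "salary": 90},
    {"name": "dan", "dno": 3, "salary": 40},
]
dept = [
    {"dno": 1, "name": "Accounting"},
    {"dno": 2, "name": "Marketing"},
    {"dno": 3, "name": "Shipping"},
]


def avg_salary_by_dept(emp, dept):
    groups = defaultdict(list)
    for e in emp:                      # FROM emp, dept (nested-loop join)
        for d in dept:
            # WHERE emp.dno = dept.dno AND dept.name IN ('Accounting', 'Marketing')
            if e["dno"] == d["dno"] and d["name"] in ("Accounting", "Marketing"):
                groups[d["name"]].append(e["salary"])  # GROUP BY dept.name
    # SELECT dept.name, AVG(emp.salary)
    return {name: sum(s) / len(s) for name, s in groups.items()}


print(avg_salary_by_dept(emp, dept))  # {'Accounting': 60.0, 'Marketing': 90.0}
```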

Declarative Queries for Sensor Networks
Examples:
1. SELECT nodeid, light FROM sensors WHERE light > 400 SAMPLE PERIOD 1s
2. SELECT AVG(volume) FROM sensors WHERE light > 400 GROUP BY roomNo HAVING AVG(volume) > 200
   (Rooms with volume > 200)
3. [Coming soon!] ON EVENT bird_detect(loc) AS bd SELECT AVG(s.light), AVG(s.temp) FROM sensors AS s WHERE dist(bd.loc, s.loc) < 10m SAMPLE PERIOD 1s FOR 10

General Declarative Advantages
Data Independence
  Not required to specify how or where, just what.
  Of course, can specify specific addresses when needed.
Transparent Optimization
  System is free to explore different algorithms, locations, and orders for operations.

Data Independence In Sensor Networks
Vastly simplifies execution for large networks, since locations are described by predicates and operations are over groups.
Enables tolerance to faults, since the system is free to choose where and when operations happen.

Optimization In Sensor Networks
Optimization goal: power!
Where to process data: in network, outside network, or hybrid
How to process data: predicate & join ordering, index selection
How to route data: semantically driven routing


TinyDB
A distributed query processor for networks of Mica motes. Available today!
Goal: eliminate the need to write C code for most TinyOS users.
Features: declarative queries; temporal + spatial operations; multihop routing; in-network storage.

TinyDB @ 10,000 Ft
(Almost) all queries are continuous and periodic.
Written in SQL, with extensions for: sample rate, offline delivery, temporal aggregation.
[Figure: a query spreading through nodes A to F, annotated with the node sets {A,B,C,D,E,F}, {B,D,E,F}, and {D,E,F}]

TinyDB Demo

Applications + Early Adopters
Some demo apps: network monitoring, vehicle tracking
“Real” future deployments: environmental monitoring at GDI (and James Reserve?), generic sensor kit, parking lot monitor
Demo!

TinyDB Architecture (Per node)
TupleRouter: fetches readings (for ready queries), builds tuples, applies operators, delivers results (up tree)
AggOperator: combines local & neighbor readings
SelOperator: filters readings
Schema: “catalog” of commands & attributes (more later)
TinyAlloc: reusable memory allocator!
[Figure: per-node stack with SelOperator and AggOperator over the TupleRouter, which sits on the network/radio stack, alongside Schema and TinyAlloc]

TinyAlloc
Handle-based compacting memory allocator, used for the catalog and queries:
  Handle h;
  call MemAlloc.alloc(&h, 10);
  …
  (*h)[0] = "Sam";
  call MemAlloc.lock(h);
  tweakString(*h);
  call MemAlloc.unlock(h);
  call MemAlloc.free(h);
[Figure: user program, master pointer table, heap, and free bitmap, before and after compaction]
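The handle-plus-master-pointer-table idea can be sketched in Python as below. This is a toy model, not TinyAlloc's code: the class and method names are hypothetical, free space is derived from the master table instead of a free bitmap, and allocation is bump-style with compaction only when the heap end is reached. Locked blocks are never moved, mirroring the lock/unlock calls above.

```python
class CompactingHeap:
    """Toy handle-based compacting allocator (a sketch, not TinyAlloc).

    Callers hold integer handles that index a master pointer table; only the
    table records where each block lives, so compaction can move blocks
    without invalidating anything the caller holds.
    """

    def __init__(self, size):
        self.heap = bytearray(size)
        self.table = {}          # handle -> [offset, length, locked]
        self.next_handle = 0

    def _end(self):
        # First byte past the highest allocated block.
        return max((off + n for off, n, _ in self.table.values()), default=0)

    def alloc(self, n):
        if self._end() + n > len(self.heap):
            self.compact()       # try to reclaim holes left by free()
        start = self._end()
        if start + n > len(self.heap):
            raise MemoryError("heap exhausted")
        h = self.next_handle
        self.next_handle += 1
        self.table[h] = [start, n, False]
        return h

    def free(self, h):
        del self.table[h]

    def lock(self, h):
        self.table[h][2] = True  # a locked block must not move

    def unlock(self, h):
        self.table[h][2] = False

    def write(self, h, data):
        off, n, _ = self.table[h]
        assert len(data) <= n, "write larger than block"
        self.heap[off:off + len(data)] = data

    def read(self, h):
        off, n, _ = self.table[h]
        return bytes(self.heap[off:off + n])

    def compact(self):
        # Slide unlocked blocks toward offset 0, in address order.
        cursor = 0
        for h in sorted(self.table, key=lambda k: self.table[k][0]):
            off, n, locked = self.table[h]
            if locked:
                cursor = off + n  # pinned block: continue packing after it
                continue
            self.heap[cursor:cursor + n] = self.heap[off:off + n]
            self.table[h][0] = cursor
            cursor += n


heap = CompactingHeap(12)
a, b, c = heap.alloc(4), heap.alloc(4), heap.alloc(4)
heap.write(c, b"Sam!")
heap.free(b)            # leaves a 4-byte hole at offset 4
d = heap.alloc(4)       # heap end reached, so compaction moves c into the hole
print(heap.read(c), heap.table[c][0], heap.table[d][0])  # b'Sam!' 4 8
```

Because callers only ever hold handles, moving `c` is invisible to them; the cost is one extra indirection on every access, and the need to lock a block while a raw pointer into it is in use.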

Schema: Attribute & Command Interface
At INIT(), components register the attributes and commands they support.
Commands are implemented via wiring; attributes are fetched via an accessor command.
The catalog API allows local and remote queries over known attributes / commands.
Demo of adding an attribute, executing a command.
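A registry of this kind can be sketched as a pair of name-to-function maps. The class and method names below are hypothetical and are not TinyDB's catalog API; in TinyDB the binding happens through nesC wiring rather than function objects.

```python
class Catalog:
    """Toy attribute/command catalog (hypothetical names, not TinyDB's API)."""

    def __init__(self):
        self.attrs = {}     # attribute name -> accessor function
        self.commands = {}  # command name -> handler function

    def register_attr(self, name, accessor):
        self.attrs[name] = accessor

    def register_command(self, name, handler):
        self.commands[name] = handler

    def get_attr(self, name):
        if name not in self.attrs:
            raise KeyError(f"unknown attribute: {name}")
        return self.attrs[name]()          # fetch via the accessor

    def run_command(self, name, *args):
        if name not in self.commands:
            raise KeyError(f"unknown command: {name}")
        return self.commands[name](*args)

    def describe(self):
        # What a local or remote catalog query could return.
        return {"attributes": sorted(self.attrs), "commands": sorted(self.commands)}


# Components register what they support at init time.
catalog = Catalog()
catalog.register_attr("light", lambda: 412)   # hypothetical sensor reading
catalog.register_attr("nodeid", lambda: 7)
leds = []
catalog.register_command("setLeds", lambda v: leds.append(v))

print(catalog.get_attr("light"))  # 412
catalog.run_command("setLeds", 3)
print(catalog.describe())
```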


3 Questions
Is this approach expressive enough?
Can this approach be efficient enough?
Are the answers this approach gives good enough?

Q1: Expressiveness
Simple data collection satisfies most users.
How much of what people want to do is just simple aggregates? Anecdotally, most of it: EE people want filters + simple statistics (unless they can have signal processing).
However, we’d like to satisfy everyone!

Query Language
New features:
Joins
Event-based triggers, via extensible catalog
In-network & nested queries
Split-phase (offline) delivery, via buffers

Sample Query 1
Bird counter:
CREATE BUFFER birds(uint16 cnt) SIZE 1
ON EVENT bird-enter(…) SELECT b.cnt+1 FROM birds AS b OUTPUT INTO b ONCE

Sample Query 2
Birds that entered and left within time t of each other:
ON EVENT bird-leave AND bird-enter WITHIN t
SELECT bird-leave.time, bird-leave.nest
WHERE bird-leave.nest = bird-enter.nest
ONCE

Sample Query 3
Delta compression:
SELECT s.light FROM buf, sensors AS s
WHERE |s.light - buf.light| > t OUTPUT INTO buf SAMPLE PERIOD 1s
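The effect of this query on a single node can be sketched as a loop that keeps the last reported value in a one-tuple buffer. One assumption is not in the slide: the buffer starts empty, and the first reading is always reported to seed it (with an empty buffer the join in the query would produce nothing).

```python
def delta_compress(readings, t):
    """Report a light reading only when it differs from the last reported
    value (the one-tuple buffer) by more than t.

    Assumption: the buffer starts empty and the first reading is always
    reported to seed it.
    """
    buf = None
    reported = []
    for epoch, light in enumerate(readings):      # one reading per SAMPLE PERIOD
        if buf is None or abs(light - buf) > t:   # WHERE |s.light - buf.light| > t
            reported.append((epoch, light))
            buf = light                           # OUTPUT INTO buf
    return reported


print(delta_compress([100, 102, 110, 111, 95], t=5))  # [(0, 100), (2, 110), (4, 95)]
```

Comparing against the last *reported* value rather than the previous sample keeps slow drifts from going unreported forever.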

Sample Query 4
Offline Delivery + Event Chaining
CREATE BUFFER equake_data(uint16 loc, uint16 xAccel, uint16 yAccel) SIZE 1000 PARTITION BY NODE
SELECT xAccel, yAccel FROM SENSORS WHERE xAccel > t OR yAccel > t SIGNAL shake_start(…) SAMPLE PERIOD 1s
ON EVENT shake_start(…) SELECT loc, xAccel, yAccel FROM sensors OUTPUT INTO BUFFER equake_data(loc, xAccel, yAccel) SAMPLE PERIOD 10ms

Event Based Processing
Enables internal and chained actions.
Language semantics: events are inter-node; buffers can be global.
Implementation plan: events and buffers must be local, since n-to-n communication is not (well) supported.
Next: operator expressiveness

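Operator Expressiveness: Aggregate Framework
Standard SQL supports “the basic 5”: MIN, MAX, SUM, AVERAGE, and COUNT.
We support any function conforming to:
$Agg_n = \{f_{merge}, f_{init}, f_{evaluate}\}$
$f_{merge}(\langle a_1 \rangle, \langle a_2 \rangle) \rightarrow \langle a_{12} \rangle$
$f_{init}(a_0) \rightarrow \langle a_0 \rangle$
$f_{evaluate}(\langle a_1 \rangle) \rightarrow \text{aggregate value}$
(Merge must be associative and commutative!)
Partial aggregate example: Average
$AVG_{merge}(\langle S_1, C_1 \rangle, \langle S_2, C_2 \rangle) \rightarrow \langle S_1 + S_2, C_1 + C_2 \rangle$
$AVG_{init}(v) \rightarrow \langle v, 1 \rangle$
$AVG_{evaluate}(\langle S_1, C_1 \rangle) \rightarrow S_1 / C_1$
From Tiny AGgregation (TAG), Madden, Franklin, Hellerstein, Hong. OSDI 2002 (to appear).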
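The {f_init, f_merge, f_evaluate} contract can be sketched in Python as below, with AVG carrying a ⟨sum, count⟩ partial state as in the slide and MAX added for contrast. The class names and the `aggregate` helper are illustrative, not TinyDB code.

```python
from functools import reduce


class Avg:
    """AVG as {f_init, f_merge, f_evaluate}; partial state is <sum, count>."""

    @staticmethod
    def init(v):
        return (v, 1)

    @staticmethod
    def merge(a, b):
        return (a[0] + b[0], a[1] + b[1])

    @staticmethod
    def evaluate(state):
        return state[0] / state[1]


class Max:
    """MAX: the partial state is just the largest value seen so far."""

    @staticmethod
    def init(v):
        return v

    @staticmethod
    def merge(a, b):
        return max(a, b)

    @staticmethod
    def evaluate(state):
        return state


def aggregate(agg, readings):
    # merge is associative and commutative, so partial states can be combined
    # in any grouping, e.g. subtree by subtree as they flow up a routing tree.
    return agg.evaluate(reduce(agg.merge, map(agg.init, readings)))


# Two subtrees aggregated separately, then merged at their common parent.
left = reduce(Avg.merge, map(Avg.init, [10, 20]))   # (30, 2)
right = reduce(Avg.merge, map(Avg.init, [30, 40]))  # (70, 2)
print(Avg.evaluate(Avg.merge(left, right)))         # 25.0
print(aggregate(Avg, [10, 20, 30, 40]))             # 25.0
print(aggregate(Max, [10, 20, 30, 40]))             # 40
```

Shipping ⟨S, C⟩ instead of a running average is what makes in-network merging correct: two averages alone cannot be combined without their counts.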

Isobar Finding

Temporal Aggregates
TAG was about “spatial” aggregates: inter-node, at the same time.
Want to be able to aggregate across time as well.
Two types:
Windowed: AGG(size, slide, attr)
Decaying: AGG(comb_func, attr)
Demo!
[Figure: a window with size = 4 and slide = 2 moving over readings … R1 R2 R3 R4 R5 R6 …]
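Both kinds of temporal aggregate can be sketched over one node's reading history. Assumptions, not from the slides: a windowed aggregate emits its first value only once a full window of `size` readings exists, and a decaying aggregate seeds its state with the first reading. The exponential-average `comb_func` in the usage is a hypothetical example.

```python
def windowed(agg, size, slide, readings):
    """AGG(size, slide, attr): aggregate each window of `size` consecutive
    readings, advancing the window by `slide` readings between outputs."""
    out = []
    for end in range(size, len(readings) + 1, slide):
        out.append(agg(readings[end - size:end]))
    return out


def decaying(comb_func, readings):
    """AGG(comb_func, attr): fold each new reading into a running state."""
    state = None
    out = []
    for r in readings:
        state = r if state is None else comb_func(state, r)
        out.append(state)
    return out


# size = 4, slide = 2 over R1..R6: windows R1-R4 and R3-R6.
print(windowed(sum, 4, 2, [1, 5, 2, 8, 3, 4]))                  # [16, 17]
# Hypothetical comb_func: exponentially weighted average, weight 1/2.
print(decaying(lambda s, r: 0.5 * s + 0.5 * r, [8, 4, 6]))      # [8, 6.0, 6.0]
```

A decaying aggregate needs only O(1) state per node, whereas a windowed one must buffer `size` readings, which matters on a mote.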

Expressiveness Review
Internal & nested queries, with logging of results for offline delivery
Event-based processing
Extensible aggregates, spatial & temporal
On to Question 2: what about efficiency?

Q2: Efficiency
Metric: power consumption
Goal: reduce communication, which dominates cost: transmitting one bit costs about as much energy as executing 800 instructions!
Standard approach: in-network processing, sleeping whenever you can…

But that’s not good enough…
What else can we do to bring down costs?
Sleep even more? Events are key.
Apply automatic optimization!
Semantically driven routing …and topology construction
Operator placement + ordering
Adaptive data delivery

TAG
In-network processing: reduces costs depending on the type of aggregate
Exploitation of operator semantics
Tiny AGgregation (TAG), Madden, Franklin, Hellerstein, Hong. OSDI 2002 (to appear).

Illustration: Pipelined Aggregation
SELECT COUNT(*) FROM sensors
[Figure: routing tree of sensors 1 to 5 with depth d]
[Figure: epoch 1 of the COUNT(*) query, with a table of partial counts by sensor # and epoch #]

Simulation Results
2500 nodes, 50x50 grid, depth ≈ 10, neighbors ≈ 20
Some aggregates require dramatically more state!

Taxonomy of Aggregates
TAG insight: classify aggregates according to various functional properties.
Yields a general set of optimizations that can automatically be applied.
Property | Examples | Affects
Partial State | MEDIAN: unbounded, MAX: 1 record | Effectiveness of TAG
Duplicate Sensitivity | MIN: dup. insensitive, AVG: dup. sensitive | Routing Redundancy
Exemplary vs. Summary | MAX: exemplary, COUNT: summary | Applicability of Sampling, Effect of Loss
Monotonic | COUNT: monotonic, AVG: non-monotonic | Hypothesis Testing, Snooping

Optimization: Channel Sharing
Insight: a shared channel enables optimizations.
Suppress messages that won’t affect the aggregate.
E.g., in a MAX query, a sensor with value v that hears a neighbor with value ≥ v doesn’t report.
Applies to all exemplary, monotonic aggregates.
Learn about query advertisements it missed.
If a sensor shows up in a new environment, it can learn about queries by looking at its neighbors’ messages.
Root doesn’t have to explicitly rebroadcast the query!
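The MAX suppression rule can be sketched as follows. Assumptions not in the slides: the listed sensors share a parent, they transmit one at a time in a known order, and a sensor stays silent only if a neighbor that has *already* transmitted reported a value ≥ its own. Under that ordering the largest value is always sent, even when two sensors tie.

```python
def snoop_max(order, values, neighbors):
    """Return the sensors that actually transmit during one MAX epoch.

    order:     transmission order (hypothetical schedule)
    values:    sensor -> local reading
    neighbors: sensor -> set of sensors it can overhear
    """
    sent = []
    for node in order:
        overheard = [values[n] for n in neighbors[node] if n in sent]
        if any(v >= values[node] for v in overheard):
            continue  # a neighbor already reported something at least as large
        sent.append(node)
    return sent


values = {"a": 5, "b": 7, "c": 3}
line = {"a": {"b"}, "b": {"a", "c"}, "c": {"b"}}   # a - b - c
print(snoop_max(["a", "b", "c"], values, line))     # ['a', 'b']
```

Here `c` is suppressed because it overheard `b`; `a` still transmits because it went first and heard nothing, which shows that the savings depend on the transmission schedule.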

Optimization: Hypothesis Testing
Insight: the root can provide information that will suppress readings that cannot affect the final aggregate value.
E.g., tell all the nodes that the MIN is definitely < 50; nodes with value ≥ 50 need not participate.
Depends on monotonicity.
How is the hypothesis computed?
Blind guess
Statistically informed guess
Observation over first few levels of tree / rounds of aggregate
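A MIN query with a root-supplied hypothesis can be sketched as below. The fallback is an assumption not stated on the slide: if no sensor reports, the guess was wrong, and the root reruns the round with every sensor participating. The function name is hypothetical.

```python
def min_with_hypothesis(readings, bound):
    """Root broadcasts the hypothesis 'MIN < bound'; only sensors whose
    reading is below bound report, since the rest cannot be the MIN.

    Returns (min, number of reports in the round that produced the answer).
    Hypothetical fallback: if nobody reports, rerun with every sensor.
    """
    reports = [r for r in readings if r < bound]
    if not reports:
        reports = list(readings)   # hypothesis failed: full round
    return min(reports), len(reports)


print(min_with_hypothesis([70, 42, 55, 48, 90], 50))  # (42, 2)
print(min_with_hypothesis([70, 60], 50))              # (60, 2)
```

A tighter bound saves more messages but risks the extra round, which is why the slide suggests estimating it statistically or from the first few levels of the tree.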

Optimization: Use Multiple Parents
For duplicate-insensitive aggregates, or aggregates that can be expressed as a linear combination of parts:
Send (part of) the aggregate to all parents.
Decreases variance, dramatically when there are lots of parents.
[Figure: nodes A, B, C; A sending its whole value (1) to one parent versus 1/2 to each of two parents]
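The variance claim can be checked exactly for COUNT under a simple loss model, which is an assumption: a node's count of 1 is split evenly as 1/k to each of k parents, and each link independently drops its message with probability p. The expected delivered count stays 1 − p, but the variance falls from p(1 − p) to p(1 − p)/k.

```python
from itertools import product


def split_count_stats(k, p_loss):
    """Exact mean and variance of how much of one node's COUNT (value 1)
    reaches its parents when it sends 1/k to each of k parents and each
    link independently loses its message with probability p_loss."""
    mean = 0.0
    second_moment = 0.0
    for delivered in product([False, True], repeat=k):  # every loss pattern
        prob = 1.0
        for ok in delivered:
            prob *= (1 - p_loss) if ok else p_loss
        value = sum(delivered) / k
        mean += prob * value
        second_moment += prob * value * value
    return mean, second_moment - mean * mean


for k in (1, 2, 4):
    m, v = split_count_stats(k, 0.2)
    print(k, round(m, 6), round(v, 6))
```

With p = 0.2, the printed variances are 0.16, 0.08, and 0.04 for one, two, and four parents, while the mean stays 0.8.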

TAG Summary
In-query processing is a win for many aggregate functions.
By exploiting general functional properties of operators, many optimizations are possible.
Requires new aggregates to be tagged with their properties.
Up next: non-aggregate query processing optimizations, a flavor of things to come!

Attribute Driven Topology Selection
Observation: internal queries are often over a local area*, or some other subset of the network, e.g. regions with light value in [10,20].
Idea: build topology for those queries based on values of range-selected attributes.
Requires range attributes and connectivity to be relatively static.
* Heidemann et al., Building Efficient Wireless Sensor Networks With Low-Level Naming. SOSP, 2001.

Attribute Driven Query Propagation
SELECT … WHERE a > 5 AND a < 12
Precomputed intervals = “Query Dissemination Index”
[Figure: node 4 above nodes 1, 2, 3, annotated with intervals [1,10], [7,15], [20,40]]
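A dissemination index can be sketched as a map from each child to the [min, max] range of attribute values in its subtree; a node forwards the query only to children whose range could satisfy the predicate. The assignment of intervals to children 1, 2, 3 below is an assumption read off the figure.

```python
def may_contain(interval, lo, hi):
    """Can a subtree whose values lie in the closed interval [a, b] hold
    some value satisfying lo < value < hi?"""
    a, b = interval
    return a < hi and b > lo


def children_to_forward(index, lo, hi):
    """index maps child id -> precomputed [min, max] of values in its subtree."""
    return sorted(c for c, iv in index.items() if may_contain(iv, lo, hi))


# Hypothetical index using the intervals from the figure.
index = {1: (1, 10), 2: (7, 15), 3: (20, 40)}
print(children_to_forward(index, 5, 12))   # WHERE a > 5 AND a < 12  ->  [1, 2]
```

Subtree 3 never hears the query, which saves both the dissemination message and any replies. The index is only as good as its intervals, which is why the slide requires range attributes to be relatively static.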

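Attribute Driven Parent Selection
Even without intervals, expect that sending to the parent with the closest value will help.
[Figure: node 4, with value interval [3,6], choosing among parents 1, 2, 3 with intervals [1,10], [7,15], [20,40]]
[3,6] ∩ [1,10] = [3,6]; [3,6] ∩ [7,15] = ∅; [3,6] ∩ [20,40] = ∅, so node 4 picks parent 1.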
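A parent-selection rule in this spirit can be sketched as: prefer the candidate parent whose value interval overlaps the node's own interval the most, and if none overlap, the one whose interval is closest. The scoring is an illustrative assumption, not TinyDB's actual policy.

```python
def choose_parent(mine, candidates):
    """mine: (lo, hi) interval of this node's values.
    candidates: parent id -> (lo, hi) interval of that parent's values.
    Prefer the largest overlap; break ties (e.g. no overlap) by smallest gap."""
    def overlap(a, b):
        return max(0, min(a[1], b[1]) - max(a[0], b[0]))

    def gap(a, b):
        return max(0, b[0] - a[1], a[0] - b[1])

    return min(candidates,
               key=lambda p: (-overlap(mine, candidates[p]), gap(mine, candidates[p])))


parents = {1: (1, 10), 2: (7, 15), 3: (20, 40)}
print(choose_parent((3, 6), parents))    # 1: [3,6] lies inside [1,10]
print(choose_parent((16, 18), parents))  # 2: no overlap, [7,15] is closest
```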

Hot off the press…

Operator Placement & Ordering
Observation: nested queries, triggers, and joins can often be reordered.
Ordering can dramatically affect the amount of work you do.
Lots of standard database tricks here.

Operator Ordering Example 1
SELECT light, mag FROM sensors WHERE pred1(mag) AND pred2(light) SAMPLE INTERVAL 1s
Cost (in J) of sampling mag >> cost of sampling light.
Correct ordering (unless pred1 is very selective):
1. Sample light
2. Apply pred2
3. Sample mag
4. Apply pred1
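The trade-off behind “unless pred1 is very selective” can be made concrete with expected per-epoch sampling cost. Assumptions: predicate evaluation is free, and selectivity means the fraction of samples that *pass* a predicate. The cost numbers are hypothetical.

```python
def expected_cost(first, second):
    """Expected per-epoch cost of: sample `first`, test its predicate, and
    sample `second` only if that predicate passed.
    Each argument is (sample_cost, selectivity)."""
    cost_first, pass_rate_first = first
    cost_second, _ = second
    return cost_first + pass_rate_first * cost_second


def best_order(light, mag):
    light_first = expected_cost(light, mag)
    mag_first = expected_cost(mag, light)
    if light_first <= mag_first:
        return "light first", light_first
    return "mag first", mag_first


# Hypothetical costs: mag is 100x more expensive to sample than light.
print(best_order(light=(1.0, 0.5), mag=(100.0, 0.5)))    # ('light first', 51.0)
# pred1 on mag very selective (0.1% pass), pred2 passes everything:
print(best_order(light=(1.0, 1.0), mag=(100.0, 0.001)))  # mag first, about 100.001
```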

Operator Ordering Example 2
ON EVENT bird-enter(…) WHERE pred1(event) SELECT light FROM sensors WHERE pred2(light) SAMPLE INTERVAL 5s FOR 30s
“Every time an event occurs that satisfies pred1, sample lights once every 5 seconds for 30 seconds and report the samples that satisfy pred2”
Note: makes all samples in phase in sample window
Alternative ordering:
“Sample light once every 5 seconds. For every sample that satisfies pred2, check and see if any events that satisfy pred1 have occurred in the last 30 seconds.”
SELECT s.light FROM bird-enter-events[30s] AS e, sensors AS s WHERE e.time < s.time AND pred1(e) AND pred2(s.light) SAMPLE INTERVAL 5s

Adaptivity For Contention
Observation: under high contention, radios deliver fewer total packets than under low contention.
Insight: don’t allow radios to be highly contested; drop or aggregate instead.
Higher throughput
Choice over what gets lost, based on semantics!

Adaptivity for Power Conservation
For many applications, the exact sample rate doesn’t matter, but network lifetime does!
Idea: adaptively adjust the sample rate & extent of aggregation based on the lifetime goal and observed power consumption.
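One way to turn a lifetime goal into a sample period is sketched below under a hypothetical power model: the node draws a constant idle power and spends a fixed energy per sample (sensing plus radio). Re-running the calculation every so often with freshly *observed* idle power and per-sample energy is what would make it adaptive; all names and numbers are illustrative.

```python
def adapt_sample_period(energy_left_j, lifetime_left_s, idle_power_w,
                        energy_per_sample_j, min_period_s=0.1):
    """Shortest sample period (s) that still meets the lifetime goal, or None
    if idling alone would exhaust the battery first (hypothetical model)."""
    budget_w = energy_left_j / lifetime_left_s   # average power we may spend
    spare_w = budget_w - idle_power_w            # power left over for sampling
    if spare_w <= 0:
        return None
    period_s = energy_per_sample_j / spare_w     # seconds between samples
    return max(period_s, min_period_s)


# 1000 J left, 10000 s still to go, 0.02 W idle, 0.04 J per sample.
print(adapt_sample_period(1000, 10000, 0.02, 0.04))   # about 0.5 s
# Observed idle draw rises to 0.09 W, so the node slows down.
print(adapt_sample_period(1000, 10000, 0.09, 0.04))   # about 4 s
```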

Efficiency Summary
Power is the important metric.
TAG: in-network processing; exploit semantics of network and operators (channel sharing, hypothesis testing, using multiple parents)
Indexing for dissemination + collection of data
Placement and operator ordering
Adaptive sampling

Q3: Answer Quality
Lots of possibilities for improving quality:
Multi-path routing, when applicable
Transactional delivery, a.k.a. custody transfer
Link-layer retransmission
Caching
Failure still possible in all modes.
Open question: what’s the right quality metric?

Diffusion as TinyDB Foundation?
Claim: diffusion is an infrastructure upon which TinyDB could be built.
Via its declarative language, TinyDB is able to provide semantic guarantees and transparent optimization:
Operators can be reordered
Any tuple can be routed to any operator
No (important) duplicates will be produced
At what cost?
Diffusion can:
Adjust better to loss
Exploit well-connected networks
Provide n-m routing, instead of n-1 routing
Might allow global buffers, events, etc.

Summary
Declarative queries are the right interface for data collection in sensor nets!
In-network processing and optimization make the approach viable.
Big query language improvements coming soon… event-driven & internal queries.
Adaptive sampling + query indexes for performance!
TinyDB Available Today –

Questions?

Grouping
GROUP BY expr
expr is an expression over one or more attributes.
Evaluation of expr yields a group number.
Each reading is a member of exactly one group.
Example: SELECT max(light) FROM sensors GROUP BY TRUNC(temp/10)
Sensor ID | Light | Temp | Group
1 | 45 | 25 | 2
2 | 27 | 28 | 2
3 | 66 | 34 | 3
4 | 68 | 37 | 3
Result:
Group | max(light)
2 | 45
3 | 68

Having
HAVING preds
preds filters out groups that do not satisfy the predicate, versus WHERE, which filters out tuples that do not satisfy the predicate.
Example: SELECT max(temp) FROM sensors GROUP BY light HAVING max(temp) < 100
Yields all groups whose maximum temperature is under 100.
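GROUP BY and HAVING can be sketched together as one helper: group the readings by an expression, aggregate within each group, then drop whole groups that fail the HAVING predicate. The readings reproduce the grouping example; the HAVING max(light) > 50 variant is a hypothetical query added to show group-level filtering.

```python
import math
from collections import defaultdict


def group_aggregate(readings, key, attr, agg, having=None):
    """GROUP BY key(reading); apply agg to attr within each group; then keep
    only the groups whose aggregate satisfies the HAVING predicate, if any."""
    groups = defaultdict(list)
    for r in readings:
        groups[key(r)].append(r[attr])   # each reading lands in exactly one group
    result = {g: agg(vals) for g, vals in groups.items()}
    if having is not None:
        # HAVING filters whole groups, unlike WHERE, which filters tuples.
        result = {g: v for g, v in result.items() if having(v)}
    return result


readings = [
    {"id": 1, "light": 45, "temp": 25},
    {"id": 2, "light": 27, "temp": 28},
    {"id": 3, "light": 66, "temp": 34},
    {"id": 4, "light": 68, "temp": 37},
]
by_decade = lambda r: math.trunc(r["temp"] / 10)   # TRUNC(temp/10)

# SELECT max(light) FROM sensors GROUP BY TRUNC(temp/10)
print(group_aggregate(readings, by_decade, "light", max))   # {2: 45, 3: 68}
# Hypothetical variant: ... HAVING max(light) > 50
print(group_aggregate(readings, by_decade, "light", max, having=lambda v: v > 50))  # {3: 68}
```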

Group Eviction
Problem: the number of groups in any one iteration may exceed available storage on a sensor.
Solution: evict! Choose one or more groups to forward up the tree; rely on nodes further up the tree, or the root, to recombine groups properly.
What policy to choose? Intuitively: the least popular group, since we don’t want to evict a group that will receive more values this epoch.
Experiments suggest: policy matters very little; evicting as many groups as will fit into a single message is good.
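Eviction and recombination can be sketched as below, assuming the aggregate's merge function is available on every node (so a parent can fold an evicted partial state into its own). The least-popular policy follows the slide; the class name and the `batch` parameter (how many groups fit into one message) are hypothetical.

```python
class GroupTable:
    """Per-node group table with bounded storage (hypothetical names).
    When a reading arrives for a new group and the table is full, the
    `batch` least popular groups are evicted and returned for forwarding."""

    def __init__(self, capacity, batch, merge):
        self.capacity = capacity
        self.batch = batch
        self.merge = merge
        self.state = {}   # group -> partial aggregate
        self.hits = {}    # group -> readings seen this epoch

    def add(self, group, value):
        evicted = []
        if group not in self.state and len(self.state) >= self.capacity:
            victims = sorted(self.state, key=lambda g: self.hits[g])[:self.batch]
            for g in victims:
                evicted.append((g, self.state.pop(g)))
                del self.hits[g]
        if group in self.state:
            self.state[group] = self.merge(self.state[group], value)
            self.hits[group] += 1
        else:
            self.state[group] = value
            self.hits[group] = 1
        return evicted


def recombine(partials, merge):
    """What a parent (or the root) does: merge partial states per group."""
    out = {}
    for g, v in partials:
        out[g] = merge(out[g], v) if g in out else v
    return out


node = GroupTable(capacity=2, batch=1, merge=max)   # MAX per group
forwarded = []
for g, v in [("g1", 5), ("g1", 7), ("g2", 3), ("g3", 9), ("g2", 4)]:
    forwarded += node.add(g, v)
forwarded += list(node.state.items())   # end of epoch: flush what remains
print(recombine(forwarded, max))        # g1 -> 7, g2 -> 4, g3 -> 9
```

Group `g2` is evicted and later re-created on the same node, yet the parent still ends up with the correct MAX because merge is associative and commutative.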

Simulation Environment
Java-based simulation & visualization for validating algorithms and collecting data.
Coarse-grained, event-based simulation.
Sensors arranged on a grid; radio connectivity by Euclidean distance.
Communication model:
Lossless: all neighbors hear all messages
Lossy: messages lost with probability that increases with distance
Symmetric links
No collisions, hidden terminals, etc.

Simulation Screenshot

Experiment: Basic TAG
Dense Packing, Ideal Communication

Experiment: Hypothesis Testing
Uniform Value Distribution, Dense Packing, Ideal Communication

Experiment: Effects of Loss

Experiment: Benefit of Cache

Pipelined Aggregates
After the query propagates, during each epoch:
Each sensor samples its local sensors once.
Combines them with PSRs (partial state records) from its children.
Outputs a PSR representing the aggregate state in the previous epoch.
After (d-1) epochs, the PSR for the whole tree is output at the root, where d = depth of the routing tree.
If desired, partial state from the top k levels could be output in the kth epoch.
To avoid combining PSRs from different epochs, sensors must cache values from children.
[Figure: routing tree of sensors 1 to 5; the value from 2 produced at time t arrives at 1 at time (t+1), and the value from 5 produced at time t arrives at 1 at time (t+3)]
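Pipelined COUNT(*) can be simulated in a few lines. Assumptions: every sensor is awake every epoch, there is no loss, and each node combines its own reading with the PSRs its children sent in the previous epoch (the cached child values). The tree shape is hypothetical; it has d = 3 levels, so the root's output should reach the full count at epoch d - 1 = 2, counting from 0.

```python
def pipelined_count(parent, root, epochs):
    """Simulate pipelined COUNT(*): in every epoch each sensor outputs a PSR
    equal to 1 (itself) plus the PSRs its children sent in the previous epoch.

    parent: child -> parent map describing the routing tree.
    Returns the value the root outputs in each epoch.
    """
    children = {}
    for child, par in parent.items():
        children.setdefault(par, []).append(child)
    nodes = set(parent) | {root}
    last = {n: 0 for n in nodes}   # cached PSR each node sent last epoch
    root_outputs = []
    for _ in range(epochs):
        # Right-hand side reads the previous epoch's PSRs before rebinding.
        last = {n: 1 + sum(last[c] for c in children.get(n, [])) for n in nodes}
        root_outputs.append(last[root])
    return root_outputs


# Hypothetical tree: 1 is the root, 2 and 3 its children, 4 under 2, 5 under 3.
tree = {2: 1, 3: 1, 4: 2, 5: 3}
print(pipelined_count(tree, root=1, epochs=4))  # [1, 3, 5, 5]
```

Each epoch the root's answer includes one more level of the tree, so early epochs give partial answers and the full answer arrives after d - 1 epochs.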

Pipelining Example
[Figure: pipelining over sensors 1 to 5 for epochs 0 to 4; each sensor keeps a table of <SID, Epoch, Agg.> partial state records cached from its children, e.g. <4,0,1> and <5,0,1> in epoch 0, and <2,0,2>, <3,0,2>, <4,1,1>, <5,1,1> in epoch 1]

Our Stream Semantics
One stream, ‘sensors’; we control data rates.
Joins between that stream and buffers are allowed.
Joins are always landmark, forward in time, one tuple at a time.
The result of a query over ‘sensors’ is either a single tuple (at time of query) or a stream.
Easy to interface to more sophisticated systems.
Temporal aggregates enable fancy window operations.

Formal Spec.
ON EVENT <event> [<boolop> <event>... WITHIN <window>]
[SELECT {<expr>|agg(<expr>)|temporalagg(<expr>)}
 FROM [sensors | <buffer> | events]]
[WHERE {<pred>}]
[GROUP BY {<expr>}]
[HAVING {<pred>}]
[ACTION [<command> [WHERE <pred>] |
  BUFFER <bufname> SIGNAL <event>({<params>}) |
  (SELECT ... ) [INTO BUFFER <bufname>]]]
[SAMPLE PERIOD <seconds> [FOR <nrounds>] [INTERPOLATE <expr>] [COMBINE {temporal_agg(<expr>)}] |
 ONCE]

Buffer Commands
[AT <pred>:]
CREATE [<type>] BUFFER <name> ({<type>})
PARTITION BY [<expr>]
SIZE [<ntuples>,<nseconds>]
[AS SELECT ... [SAMPLE PERIOD <seconds>]]
DROP BUFFER <name>
