1 Querying Sensor Networks Sam Madden UC Berkeley December 13 th, New England Database Seminar.

Presentation transcript:

1 Querying Sensor Networks Sam Madden UC Berkeley December 13 th, New England Database Seminar

2 TinyDB Introduction What is a sensor network? Programming sensor nets is hard! Declarative queries are easy –TinyDB : In-network processing via declarative queries Example: »Vehicle tracking application: 2 weeks for 2 students »Vehicle tracking query: took 2 minutes to write, worked just as well! SELECT nodeid FROM sensors WHERE mag > thresh EPOCH DURATION 64ms

3 Overview Sensor Networks Why Queries in Sensor Nets TinyDB –Features –Demo Focus: Acquisitional Query Processing

4 Overview Sensor Networks Why Queries in Sensor Nets TinyDB –Features –Demo Focus: Acquisitional Query Processing

5 Device Capabilities “Mica Motes” –8-bit, 4 MHz processor »Roughly a PC AT –40 kbit/s CSMA radio –4 KB RAM, 128 KB flash, 512 KB EEPROM –Sensor board expansion slot »Standard board has light & temperature sensors, accelerometer, magnetometer, microphone, & buzzer Other more powerful platforms exist –E.g. UCLA WINS, Medusa, MIT Cricket, Princeton Zebranet Trend towards smaller devices –“Smart Dust” – Kris Pister, et al.

6 Sensor Net Sample Apps Habitat monitoring: storm petrels on Great Duck Island, microclimates on James Reserve. Traditional monitoring apparatus. Earthquake monitoring in shake-test sites. Vehicle detection: sensors dropped from UAV along a road, collect data about passing vehicles, relay data back to UAV.

7 Metric: Communication Lifetime from one pair of AA batteries –2-3 days at full power –6 months at 2% duty cycle Communication dominates cost –< few ms to compute –30 ms to send message Our metric: communication!
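The lifetime figures on this slide can be sanity-checked with a back-of-the-envelope calculation. The sketch below assumes, as a simplification not stated on the slide, that a sleeping node draws negligible power, so lifetime scales inversely with duty cycle. The function name `lifetime_days` is made up for illustration.

```python
def lifetime_days(full_power_days: float, duty_cycle: float) -> float:
    """Estimated lifetime when the node is awake only a fraction of the time.

    Assumption (not from the slides): sleep-mode draw is negligible, so
    energy use scales linearly with the duty cycle.
    """
    if not 0 < duty_cycle <= 1:
        raise ValueError("duty_cycle must be in (0, 1]")
    return full_power_days / duty_cycle


if __name__ == "__main__":
    # 2-3 days at full power, 2% duty cycle (numbers from the slide).
    for days in (2, 3):
        print(days, "days at full power ->",
              round(lifetime_days(days, 0.02)), "days at 2% duty cycle")
```

Under this simplified model the estimate comes out at roughly 100 to 150 days, the same order of magnitude as the six months quoted on the slide.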

8 TinyOS Operating system from David Culler’s group at Berkeley C-like programming environment Provides messaging layer, abstractions for major hardware components –Split phase, highly asynchronous, interrupt-driven programming model Hill, Szewczyk, Woo, Culler, & Pister. “System Architecture Directions for Networked Sensors.” ASPLOS See

9 Communication In Sensor Nets Radio communication has high link-level losses –typically about 5m Ad-hoc neighbor discovery Tree-based routing A B C D F E

10 Overview Sensor Networks Why Queries in Sensor Nets TinyDB –Features –Demo Acquisitional Query Processing

11 Declarative Queries for Sensor Networks Examples: (1) SELECT nodeid, light FROM sensors WHERE light > 400 EPOCH DURATION 1s
Sensors
Epoch | Nodeid | Light | Temp | Accel | Sound
0 | 1 | 455 | x | x | x
0 | 2 | 389 | x | x | x
1 | 1 | 422 | x | x | x
1 | 2 | 405 | x | x | x

12 Aggregation Queries
SELECT roomNo, AVG(sound) FROM sensors GROUP BY roomNo HAVING AVG(sound) > 200 EPOCH DURATION 10s (result columns: Epoch | roomNo | AVG(sound)) Rooms w/ sound >
SELECT AVG(sound) FROM sensors EPOCH DURATION 10s (result columns: Epoch | AVG(sound))

13 Declarative Benefits In Sensor Networks Reduces Complexity –Locations as predicates –Operations are over groups Fault Tolerance –Control of when & where Data independence –Control of representation & storage »Indices, join location, materialization points, RAM vs EEPROM, etc.

14 Computing In Sensor Nets Is Hard –Limited power »Power-based query optimization »Routing indices »In-network computation »Exploitation of operator semantics (TAG, OSDI 2002) –Lossy, low-bandwidth communication »In-network computation »Caching, retransmission, etc. »Data prioritization –Remote, zero administration, long lived deployments »Ad-hoc (fault-tolerant) networking »Lifetime estimation »Semantically aware routing (TAG) –Limited processing capabilities, storage

15 Overview Sensor Networks Why Queries in Sensor Nets TinyDB –Features –Demo Focus: Tiny Aggregation The Next Step

16 TinyDB A distributed query processor for networks of Mica motes –Available today! Goal: Eliminate the need to write C code for most TinyOS users Features –Declarative queries –Temporal + spatial operations –Multihop routing –In-network storage

17 A B C D F E Ft Query {D,E,F} {B,D,E,F} {A,B,C,D,E,F} Written in SQL-Like Language With Extensions For : Sample rate Offline delivery Temporal Aggregation (Almost) All Queries are Continuous and Periodic

18 TinyDB Demo

19 Applications + Early Adopters Some demo apps: –Network monitoring –Vehicle tracking “Real” future deployments: –Environmental GDI (and James Reserve?) –Generic Sensor Kit –Building Monitoring Demo!

20 Benefit of TinyDB SELECT COUNT(light) SAMPLE PERIOD 4s Cost metric = #msgs 16 nodes 150 Epochs In-net loss rates: 5% Centralized loss: 15% Network depth: 4

21 TinyDB Architecture (Per node) Radio Stack Schema TinyAlloc TupleRouter AggOperator SelOperator Network TupleRouter: Fetches readings (for ready queries) Builds tuples Applies operators Delivers results (up tree) AggOperator: Combines local & neighbor readings SelOperator: Filters readings Schema: “Catalog” of commands & attributes (more later) TinyAlloc: Reusable memory allocator! ~10,000 Lines C Code ~5,000 Lines Java ~3200 Bytes RAM (w/ 768 byte heap) ~58 kB compiled code (3x larger than 2nd largest TinyOS Program)

22 Catalog & Schema Manager Attribute & Command IF –Components register attributes and commands they support »Commands implemented via wiring »Attributes fetched via accessor command –Catalog API allows local and remote queries over known attributes / commands. Sensor specific metadata –Power to access attributes –Time to access attribute

23 Overview Sensor Networks Why Queries in Sensor Nets TinyDB –Features –Demo Acquisitional Query Processing

24 Acquisitional Query Processing Cynical question: what’s really different about sensor networks? –Low Power? –Lots of Nodes? –Limited Processing Capabilities? Laptops! Distributed DBs! Moore’s Law! Being a little bit facetious, but…

25 Answer Long running queries on physically embedded devices that control when and with what frequency data is collected! Versus traditional systems where data is provided a priori

26 ACQP: What’s Different? How does the user control acquisition? –Rates or lifetimes. –Event-based triggers How should the query be processed? –Sampling as an operator! –Events as joins Which nodes have relevant data? –Semantic Routing Tree »Nodes that are queried together route together Which samples should be transmitted? –Pick most “valuable”?

27 ACQP How does the user control acquisition? –Rates or lifetimes. –Event-based triggers How should the query be processed? –Sampling as an operator! –Events as joins Which nodes have relevant data? –Semantic Routing Tree »Nodes that are queried together route together Which samples should be transmitted? –Pick most “valuable”?

28 Lifetime Queries Lifetime vs. sample rate SELECT … LIFETIME 30 days SELECT … LIFETIME 10 days MIN SAMPLE INTERVAL 1s Implies not all data is xmitted

29 Processing Lifetimes At root –Compute SAMPLE PERIOD that satisfies lifetime –If it exceeds MIN SAMPLE PERIOD (MSP), use MSP and compute transmission rate At other nodes –Use root’s values or slower Root = bottleneck –Multiple roots? –Adaptive roots?
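The root-side logic on this slide can be sketched as follows. The energy model here is entirely hypothetical (a fixed battery budget, a fixed cost per sample, a fixed cost per transmission, no idle draw) and the function and parameter names are invented for illustration. It is not TinyDB's actual estimator. The sketch only shows the shape of the decision: derive a sample period from the lifetime, and if that period is longer than the minimum sample interval the user asked for, sample at that interval and transmit only a fraction of the readings.

```python
def plan_lifetime_query(battery_j, lifetime_s, e_sample_j, e_tx_j,
                        min_sample_interval_s=None):
    """Return (sample_period_s, transmissions_per_second).

    Hypothetical model: the node may spend battery_j / lifetime_s joules
    per second on average, and each reading costs e_sample_j to acquire
    plus e_tx_j to transmit.
    """
    budget_w = battery_j / lifetime_s              # average power budget
    period_s = (e_sample_j + e_tx_j) / budget_w    # sample + send every reading
    if min_sample_interval_s is None or period_s <= min_sample_interval_s:
        return period_s, 1.0 / period_s
    # The lifetime alone would force slower sampling than the user allows:
    # sample at the requested interval and spend what is left on the radio.
    tx_budget_w = budget_w - e_sample_j / min_sample_interval_s
    tx_rate = max(tx_budget_w, 0.0) / e_tx_j
    return min_sample_interval_s, tx_rate


if __name__ == "__main__":
    # Made-up numbers: 1000 J battery, 1000 s lifetime, 0.5 J/sample, 1.5 J/msg.
    print(plan_lifetime_query(1000.0, 1000.0, 0.5, 1.5))
    print(plan_lifetime_query(1000.0, 1000.0, 0.5, 1.5, min_sample_interval_s=1.0))
```

In the second call the node samples every second but can afford to transmit only about one reading in three, which is the "not all data is xmitted" case from the lifetime query slide.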

30 Lifetime Based Queries

31 Event Based Processing ACQP – want to initiate queries in response to events
CREATE BUFFER birds(uint16 cnt) SIZE 1
ON EVENT bird-enter(…) SELECT b.cnt+1 FROM birds AS b OUTPUT INTO b ONCE
Annotations: In-network storage; Subject to optimization

32 More Events ON EVENT bird_detect(loc) AS bd SELECT AVG(s.light), AVG(s.temp) FROM sensors AS s WHERE dist(bd.loc,s.loc) < 10m SAMPLE PERIOD 1s for 10 [Coming soon!]

33 ACQP How does the user control acquisition? –Rates or lifetimes. –Event-based triggers How should the query be processed? –Sampling as an operator! –Events as joins Which nodes have relevant data? –Semantic Routing Tree »Nodes that are queried together route together Which samples should be transmitted? –Pick most “valuable”?

34 Operator Ordering: Interleave Sampling + Selection SELECT light, mag FROM sensors WHERE pred1(mag) AND pred2(light) SAMPLE INTERVAL 1s E(mag) >> E(light): 1500 uJ vs. 90 uJ
Possible orderings:
1. Sample light, Sample mag, Apply pred1, Apply pred2
2. Sample light, Apply pred2, Sample mag, Apply pred1
3. Sample mag, Apply pred1, Sample light, Apply pred2
At 1 sample / sec, total power savings could be as much as 4mW, same as the processor!
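To see why the ordering matters, the expected energy per epoch of each of the three orderings can be compared directly. The sketch below uses the per-sample energies from the slide (1500 uJ for mag, 90 uJ for light). The predicate selectivities are assumed inputs, and the cost of evaluating a predicate on the CPU is ignored as negligible next to sampling.

```python
E_MAG_UJ = 1500.0   # energy to sample the magnetometer (from the slide)
E_LIGHT_UJ = 90.0   # energy to sample the light sensor (from the slide)


def expected_costs(sel_pred1: float, sel_pred2: float) -> dict:
    """Expected energy (uJ) per epoch for each ordering.

    sel_pred1 / sel_pred2 are the assumed fractions of readings that pass
    pred1(mag) / pred2(light). A later sensor is sampled only when the
    earlier predicate passed.
    """
    return {
        "1: sample both, then filter": E_LIGHT_UJ + E_MAG_UJ,
        "2: light, pred2, mag, pred1": E_LIGHT_UJ + sel_pred2 * E_MAG_UJ,
        "3: mag, pred1, light, pred2": E_MAG_UJ + sel_pred1 * E_LIGHT_UJ,
    }


if __name__ == "__main__":
    costs = expected_costs(sel_pred1=0.5, sel_pred2=0.5)
    for plan, uj in sorted(costs.items(), key=lambda kv: kv[1]):
        print(f"{plan}: {uj:.0f} uJ")
```

With both selectivities at 0.5, sampling the cheap light sensor first costs about half as much as either alternative, because the expensive mag sample is skipped whenever pred2 fails.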

35 Optimizing in ACQP Sampling = “expensive predicate” Some subtleties: –Which predicate to “charge”? –Can’t operate without samples Solution: –Treat sampling as a separate task –Build a partial order –Solve for cheapest schedule using series-parallel scheduling algorithm »Monma & Sidney, 1979, as in Ibaraki & Kameda, TODS, 1984, or Hellerstein, TODS, 1998.

36 Exemplary Aggregate Pushdown SELECT WINMAX(light,8s,8s) FROM sensors WHERE mag > x SAMPLE INTERVAL 1s Unless mag > x is very selective, correct ordering is: Sample light; Check if it’s the maximum; If it is: Sample mag, Check predicate, If satisfied, update maximum

37 Event-Join Duality
ON EVENT E(nodeid) SELECT a FROM sensors AS s WHERE s.nodeid = e.nodeid SAMPLE INTERVAL d FOR k
Problem: multiple outstanding queries (lots of samples)
High event frequency → Use Rewrite:
SELECT s.a FROM sensors AS s, events AS e WHERE s.nodeid = e.nodeid AND e.type = E AND s.time – e.time < k AND s.time > e.time SAMPLE INTERVAL d
Rewrite problem: phase alignment! Solution: subsample [Figure: timeline t with sample intervals d, d, d/2]

38 ACQP How does the user control acquisition? –Rates or lifetimes. –Event-based triggers How should the query be processed? –Sampling as an operator! –Events as joins Which nodes have relevant data? –Semantic Routing Tree »Nodes that are queried together route together Which samples should be transmitted? –Pick most “valuable”?

39 Attribute Driven Topology Selection Observation: internal queries often over local area –Or some other subset of the network »E.g. regions with light value in [10,20] Idea: build topology for those queries based on values of range-selected attributes –For range queries –Relatively static trees »Maintenance Cost

40 Attribute Driven Query Propagation [1,10] [7,15] [20,40] SELECT … WHERE a > 5 AND a < 12 Precomputed intervals = Semantic Routing Tree (SRT)
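The pruning step behind the Semantic Routing Tree can be illustrated with a simple interval test. In this sketch each child subtree advertises the closed interval of attribute values it contains, and the query is forwarded only to children whose interval can intersect the query's open range. The child names and the helper functions are hypothetical; the intervals and the predicate are the ones on the slide.

```python
def subtree_may_match(interval, q_low, q_high):
    """True if a closed interval [lo, hi] can contain a value v with
    q_low < v < q_high (the open range from the query's WHERE clause)."""
    lo, hi = interval
    return lo < q_high and hi > q_low


def children_to_forward(child_intervals, q_low, q_high):
    """Names of the child subtrees that must receive the query."""
    return [name for name, interval in child_intervals.items()
            if subtree_may_match(interval, q_low, q_high)]


if __name__ == "__main__":
    # Intervals from the slide; "left", "middle", "right" are made-up labels.
    children = {"left": (1, 10), "middle": (7, 15), "right": (20, 40)}
    # SELECT ... WHERE a > 5 AND a < 12
    print(children_to_forward(children, 5, 12))
```

For the slide's example the subtree covering [20,40] is skipped entirely, so none of its nodes need to receive or process the query.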

41 Attribute Driven Parent Selection [1,10] [7,15] [20,40] [3,6] [3,6] ∩ [1,10] = [3,6] [3,6] ∩ [7,15] = ø [3,6] ∩ [20,40] = ø Even without intervals, expect that sending to parent with closest value will help

42 Simulation Result ~14% Reduction

43 ACQP How does the user control acquisition? –Rates or lifetimes. –Event-based triggers How should the query be processed? –Sampling as an operator! –Events as joins Which nodes have relevant data? –Semantic Routing Tree »Nodes that are queried together route together Which samples should be transmitted? –Pick most “valuable”?

44 Adaptive Rate Control Adaptive = 2x Successful Xmissions

45 Delta Encoding Must pick most valuable data How? –Domain Dependent »E.g., largest, average, shape preserving, frequency preserving, most samples, etc. Simple idea for time-series: order biggest-change-first

46 Choosing Data To Send Score each item Send largest score –Out of order -> Priority Queue Discard / aggregate when full [Figure: (Time, Value) samples [1,2] [2,6] [3,15] [4,1] [5,4] [5,2.5], shown at t=4 and t=5]

47 Choosing Data To Send t=1 [1,2]

48 Choosing Data To Send t=5 [2,6] [3,15] [4,1] [1,2] |2-6| = 4 |2-15| = 13 |2-4| = 2

49 Choosing Data To Send t=5 [2,6] [3,15] [4,1] [1,2] |2-6| = 4 |15-4| = 11

50 Choosing Data To Send t=5 [2,6] [3,15] [4,1] [1,2]

51 Choosing Data To Send t=5 [2,6] [3,15] [4,1] [1,2]
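One way to read the scoring in the preceding "Choosing Data To Send" slides is that each pending sample is scored by how far its value lies from the most recent already-sent sample that precedes it in time, and the highest-scoring sample goes out next. The sketch below implements that reading. It is an interpretation of the slide figures rather than TinyDB's code, and it uses a simple rescan where a real implementation would keep a priority queue.

```python
def send_order(samples):
    """Order in which (time, value) samples would be transmitted.

    Assumes samples are sorted by time and the first one is sent first.
    Score of a pending sample = |value - value of the latest sent sample
    that precedes it in time| (biggest change first).
    """
    sent = [samples[0]]
    pending = list(samples[1:])

    def score(sample):
        prev = max((p for p in sent if p[0] < sample[0]), key=lambda p: p[0])
        return abs(sample[1] - prev[1])

    while pending:
        best = max(pending, key=score)   # rescore after every transmission
        pending.remove(best)
        sent.append(best)
    return sent


if __name__ == "__main__":
    readings = [(1, 2), (2, 6), (3, 15), (4, 1), (5, 4)]
    print(send_order(readings))
```

On the slide's data this reproduces the scores shown there: with only [1,2] sent, [2,6] scores 4, [3,15] scores 13 and [5,4] scores 2, so [3,15] is sent; afterwards [5,4] rescores to |15-4| = 11 while [2,6] stays at 4.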

52 Delta + Adaptivity 8 element queue 4 motes transmitting different signals 8 samples /sec / mote

53 Aggregate Prioritization Insight: Shared channel enables nodes to hear neighbor values Suppress values that won’t affect aggregate –E.g., MAX –Applies to all exemplary, monotonic aggregates e.g. top/bottom N, MIN, MAX, etc.

54 Hypothesis Testing Insight: Guess from root can be used for suppression –E.g. ‘MIN < 50’ –Works for monotonic & exemplary aggregates »Also summary, if imprecision allowed How is hypothesis computed? –Blind or statistically informed guess –Observation over network subset

55 Simulation: Aggregate Prioritization Uniform Value Distribution Dense Packing Ideal Communication

56 ACQP Summary Lifetime & event based queries –User preferences for when data is acquired Optimizations for –Order of sampling –Events vs. joins Semantic Routing Tree –Query dissemination Runtime prioritization –Adaptive rate control –Which samples to send

57 Fun Stuff Temporal aggregates Sophisticated or sensor network specific aggregates –Mapping –Tracking –Wavelets

58 Temporal Aggregates TAG was about “spatial” aggregates –Inter-node, at the same time Want to be able to aggregate across time as well Two types: –Windowed: AGG(size,slide,attr) –Decaying: AGG(comb_func, attr) –Demo! … R1 R2 R3 R4 R5 R6 … slide =2 size =4
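The windowed form can be illustrated with a few lines of code. This is a minimal sketch of the size/slide semantics pictured on the slide (size = 4, slide = 2 over readings R1 to R6), not TinyDB's implementation; the function name `windowed` is made up, and it only emits full windows.

```python
def windowed(agg, readings, size, slide):
    """Apply agg to each full window of `size` readings, advancing the
    window start by `slide` readings each time."""
    return [agg(readings[start:start + size])
            for start in range(0, len(readings) - size + 1, slide)]


if __name__ == "__main__":
    r = [1, 2, 3, 4, 5, 6]           # stand-ins for R1..R6
    print(windowed(max, r, size=4, slide=2))   # windows R1..R4 and R3..R6
```

With size = 4 and slide = 2, consecutive windows overlap by two readings, so each reading after the first two contributes to two results.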

59 Isobar Finding

60 Summary Declarative queries are the right interface for data collection in sensor nets! –Easier, faster, & more robust Acquisitional Query Processing –Framework that addresses many new issues that arise in sensor networks, e.g. »Order of sampling and selection »Languages, indices, approximations that give user control over which data enters the system TinyDB Release Available -

61 Questions?

62 Event Based Processing

63 Count vs. Time

64 Simulation Environment Chose to simulate to allow 1000’s of nodes and control of topology, connectivity, loss Java-based simulation & visualization for validating algorithms, collecting data. Coarse grained event based simulation –Sensors arranged on a grid, radio connectivity by Euclidean distance –Communication model »Lossless: All neighbors hear all messages »Lossy: Messages lost with probability that increases with distance »Symmetric links »No collisions, hidden terminals, etc.

65 Simulation Result Simulation Results 2500 Nodes 50x50 Grid Depth = ~10 Neighbors = ~20 Some aggregates require dramatically more state!

66 Taxonomy of Aggregates TAG insight: classify aggregates according to various functional properties –Yields a general set of optimizations that can automatically be applied
Property | Examples | Affects
Partial State | MEDIAN: unbounded, MAX: 1 record | Effectiveness of TAG
Duplicate Sensitivity | MIN: dup. insensitive, AVG: dup. sensitive | Routing Redundancy
Exemplary vs. Summary | MAX: exemplary, COUNT: summary | Applicability of Sampling, Effect of Loss
Monotonic | COUNT: monotonic, AVG: non-monotonic | Hypothesis Testing, Snooping

67 Optimization: Channel Sharing (“Snooping”) Insight: Shared channel enables optimizations Suppress messages that won’t affect aggregate –E.g., in a MAX query, sensor with value v hears a neighbor with value ≥ v, so it doesn’t report –Applies to all exemplary, monotonic aggregates Learn about query advertisements it missed –If a sensor shows up in a new environment, it can learn about queries by looking at neighbors’ messages. »Root doesn’t have to explicitly rebroadcast query!

68 Optimization: Hypothesis Testing Insight: Root can provide information that will suppress readings that cannot affect the final aggregate value. –E.g. Tell all the nodes that the MIN is definitely < 50; nodes with value ≥ 50 need not participate. –Works for monotonic & exemplary –Can be applied to summary aggregates also if imprecision is allowed How is hypothesis computed? –Blind guess –Statistically informed guess –Observation over first few levels of tree / rounds of aggregate
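Both suppression rules, snooping for MAX and hypothesis testing for MIN, reduce to a local decision each sensor makes before transmitting. The sketch below states the two decisions as predicates. The function names are made up for illustration, and the radio and routing layers are left out.

```python
def report_under_snooping_max(own_value, overheard_values):
    """MAX query with snooping: stay silent if any overheard neighbor
    value is >= our own, since ours cannot change the maximum."""
    return all(own_value > v for v in overheard_values)


def report_under_hypothesis_min(own_value, hypothesis_bound):
    """MIN query with a root hypothesis 'MIN < hypothesis_bound':
    values >= the bound cannot be the minimum, so they need not report."""
    return own_value < hypothesis_bound


if __name__ == "__main__":
    print(report_under_snooping_max(40, [35, 42]))   # a neighbor already has 42
    print(report_under_hypothesis_min(63, 50))       # root said MIN < 50
    print(report_under_hypothesis_min(47, 50))
```

In both cases the saving comes from messages that are never sent, which is why these optimizations apply only to aggregates where a single dominated value provably cannot affect the result.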

69 Experiment: Hypothesis Testing Uniform Value Distribution, Dense Packing, Ideal Communication

70 Optimization: Use Multiple Parents For duplicate insensitive aggregates Or aggregates that can be expressed as a linear combination of parts –Send (part of) aggregate to all parents –Decreases variance »Dramatically, when there are lots of parents [Figure: nodes A, B, C; edge weight 1 without splitting, 1/2 per edge with splitting]
No splitting: $E(\mathrm{count}) = c \cdot p$, $\mathrm{Var}(\mathrm{count}) = c^2 \cdot p \cdot (1-p)$
With splitting: $E(\mathrm{count}) = 2 \cdot (c/2) \cdot p$, $\mathrm{Var}(\mathrm{count}) = 2 \cdot (c/2)^2 \cdot p \cdot (1-p)$
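The variance claim on this slide can be checked numerically. The sketch below evaluates the slide's formulas for a partial count c sent over links that each succeed independently with probability p. Extending the two-parent case to k parents is an assumption made here for illustration; the slide only works through k = 2.

```python
def no_split(c: float, p: float):
    """Whole count c sent to one parent over a link with success prob. p."""
    return c * p, c ** 2 * p * (1 - p)


def split(c: float, p: float, parents: int = 2):
    """Count divided evenly over `parents` independent links.

    parents=2 matches the slide; other values are an extrapolation."""
    share = c / parents
    return parents * share * p, parents * share ** 2 * p * (1 - p)


if __name__ == "__main__":
    c, p = 10, 0.9
    print("no split:", no_split(c, p))
    print("2 parents:", split(c, p, 2))
    print("5 parents:", split(c, p, 5))
```

The expected value is unchanged, while the variance shrinks by a factor equal to the number of parents, which matches the slide's remark that the effect is dramatic when there are many parents.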

71 Multiple Parents Results Interestingly, this technique is much better than previous analysis predicted! Losses aren’t independent! Instead of focusing data on a few critical links, spreads data over many links Critical Link! No Splitting With Splitting

72 TAG Summary In-network query processing a big win for many aggregate functions By exploiting general functional properties of operators, optimizations are possible –Requires new aggregates to be tagged with their properties Up next: non-aggregate query processing optimizations – a flavor of things to come!

73 TAG In-network processing of aggregates –Aggregates are common operation –Reduces costs depending on type of aggregates –Focus on “spatial aggregation” (Versus “temporal aggregation”) Exploitation of operator, functional semantics Tiny AGgregation (TAG), Madden, Franklin, Hellerstein, Hong. OSDI 2002.

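74 Aggregation Framework As in extensible databases, we support any aggregation function conforming to: $\mathrm{Agg}_n = \{f_{merge}, f_{init}, f_{evaluate}\}$
$f_{merge}\{\langle a_1 \rangle, \langle a_2 \rangle\} \rightarrow \langle a_{12} \rangle$
$f_{init}\{a_0\} \rightarrow \langle a_0 \rangle$
$f_{evaluate}\{\langle a_1 \rangle\} \rightarrow$ aggregate value
(Merge associative, commutative!)
Example: Average
$\mathrm{AVG}_{merge}\{\langle S_1, C_1 \rangle, \langle S_2, C_2 \rangle\} \rightarrow \langle S_1 + S_2, C_1 + C_2 \rangle$
$\mathrm{AVG}_{init}\{v\} \rightarrow \langle v, 1 \rangle$
$\mathrm{AVG}_{evaluate}\{\langle S_1, C_1 \rangle\} \rightarrow S_1 / C_1$
The angle-bracketed tuple is the Partial State Record (PSR). Just like parallel database systems – e.g. Bubba!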
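The init / merge / evaluate decomposition maps directly onto three small functions. The sketch below writes out AVERAGE with a (sum, count) pair as its partial state record. The Python names are illustrative, not TinyDB identifiers.

```python
from functools import reduce


def avg_init(v):
    """Turn one sensor reading into a partial state record (sum, count)."""
    return (v, 1)


def avg_merge(a, b):
    """Combine two PSRs; associative and commutative, as the slide requires."""
    return (a[0] + b[0], a[1] + b[1])


def avg_evaluate(psr):
    """Compute the final aggregate value from a PSR."""
    total, count = psr
    return total / count


if __name__ == "__main__":
    readings = [10, 20, 30]
    psr = reduce(avg_merge, (avg_init(v) for v in readings))
    print(psr, avg_evaluate(psr))
```

Because merge is associative and commutative, PSRs can be combined in whatever order they arrive up the routing tree, and only the root needs to call evaluate.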

75 Query Propagation Review A B C D F E SELECT AVG(light)…

76 Pipelined Aggregates After query propagates, during each epoch: –Each sensor samples local sensors once –Combines them with PSRs from children –Outputs PSR representing aggregate state in the previous epoch. After (d-1) epochs, PSR for the whole tree output at root –d = Depth of the routing tree –If desired, partial state from top k levels could be output in kth epoch To avoid combining PSRs from different epochs, sensors must cache values from children [Figure annotations: Value from 5 produced at time t arrives at 1 at time (t+3); Value from 2 produced at time t arrives at 1 at time (t+1)]
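The pipelining behaviour can be reproduced with a small simulation of SELECT COUNT(*). The routing tree below is hypothetical, chosen so that node 5 is three hops and node 2 is one hop from the root, consistent with the delays quoted on the slide. Each epoch, every node transmits its own count of 1 plus whatever its children sent in the previous epoch.

```python
def pipelined_count(parent, root, epochs):
    """Root's COUNT(*) output per epoch under pipelined aggregation.

    parent maps each non-root node to its parent in the routing tree.
    """
    children = {}
    for node, par in parent.items():
        children.setdefault(par, []).append(node)
    nodes = set(parent) | {root}
    heard = {n: 0 for n in nodes}      # child PSRs cached from last epoch
    outputs = []
    for _ in range(epochs):
        sent = {n: 1 + heard[n] for n in nodes}   # own reading + cached state
        outputs.append(sent[root])
        heard = {n: sum(sent[c] for c in children.get(n, [])) for n in nodes}
    return outputs


if __name__ == "__main__":
    # Hypothetical tree: 1 is the root; 2 and 3 report to 1; 4 to 3; 5 to 4.
    tree = {2: 1, 3: 1, 4: 3, 5: 4}
    print(pipelined_count(tree, root=1, epochs=5))
```

The root's output grows as deeper levels of the tree are folded in and reaches the full count of 5 once three epochs have elapsed, matching the d-1 epoch delay for a tree of depth d = 4.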

77 Illustration: Pipelined Aggregation SELECT COUNT(*) FROM sensors Depth = d

78 Illustration: Pipelined Aggregation Sensor # Epoch # Epoch 1 SELECT COUNT(*) FROM sensors

79 Illustration: Pipelined Aggregation Sensor # Epoch # Epoch 2 SELECT COUNT(*) FROM sensors

80 Illustration: Pipelined Aggregation Sensor # Epoch # Epoch 3 SELECT COUNT(*) FROM sensors

81 Illustration: Pipelined Aggregation Sensor # Epoch # Epoch 4 SELECT COUNT(*) FROM sensors

82 Illustration: Pipelined Aggregation Sensor # Epoch # Epoch 5 SELECT COUNT(*) FROM sensors

83 Grouping If query is grouped, sensors apply predicate on each epoch PSRs tagged with group When a PSR (with group) is received: –If it belongs to a stored group, merge with existing PSR –If not, just store it At the end of each epoch, transmit one PSR per group
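The per-group bookkeeping described on this slide amounts to a dictionary keyed by group number. The sketch below merges incoming group-tagged PSRs into a node's stored ones, using a (sum, count) PSR as the example aggregate. The names and the sample data are illustrative only.

```python
def merge_sum_count(a, b):
    """Merge two (sum, count) partial state records."""
    return (a[0] + b[0], a[1] + b[1])


def merge_group_psrs(stored, incoming, merge=merge_sum_count):
    """Fold PSRs received from a child into the locally stored ones.

    Both arguments map group number -> PSR. A PSR for a group already
    stored is merged; a PSR for a new group is simply stored.
    """
    result = dict(stored)
    for group, psr in incoming.items():
        result[group] = merge(result[group], psr) if group in result else psr
    return result


if __name__ == "__main__":
    local = {1: (40, 2)}                   # group 1: sum 40 over 2 readings
    from_child = {1: (25, 1), 3: (90, 3)}  # child reports groups 1 and 3
    print(merge_group_psrs(local, from_child))
```

At the end of the epoch the node would transmit one PSR per key in the resulting dictionary, which is what makes storage a concern when the number of groups is large.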

84 Group Eviction Problem: Number of groups in any one iteration may exceed available storage on sensor Solution: Evict! (Partial Preaggregation*) –Choose one or more groups to forward up tree –Rely on nodes further up tree, or root, to recombine groups properly –What policy to choose? »Intuitively: least popular group, since don’t want to evict a group that will receive more values this epoch. »Experiments suggest: Policy matters very little Evicting as many groups as will fit into a single message is good * Per-Åke Larson. Data Reduction by Partial Preaggregation. ICDE 2002.

85 TAG Advantages In network processing reduces communication –Important for power and contention Continuous stream of results –In the absence of faults, will converge to right answer Lots of optimizations –Based on shared radio channel –Semantics of operators

86 Simulation Screenshot

87 Hypothesis Testing For Average AVERAGE: each node suppresses readings within some ∆ of an approximate average µ*. –Parents assume children who don’t report have value µ* Computed average cannot be off by more than ∆.
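The error bound follows because every suppressed reading is replaced by a value at most ∆ away from it. The sketch below makes that concrete. It assumes, as a simplification, that the number of silent children is known, so the substitution of µ* can be done in one place.

```python
def approximate_average(readings, guess, delta):
    """Average as the root would compute it when readings within `delta`
    of `guess` (the slide's µ*) are suppressed and assumed equal to it."""
    total = 0.0
    for v in readings:
        suppressed = abs(v - guess) <= delta
        total += guess if suppressed else v
    return total / len(readings)


if __name__ == "__main__":
    readings = [18, 19, 21, 30, 9]   # made-up sensor values
    guess, delta = 20, 2
    true_avg = sum(readings) / len(readings)
    approx = approximate_average(readings, guess, delta)
    print(true_avg, approx, abs(approx - true_avg) <= delta)
```

Here three of the five readings are never transmitted, and the computed average still lands within ∆ of the true one.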

88 TinyAlloc Handle Based Compacting Memory Allocator For Catalog, Queries
User Program:
  Handle h;
  call MemAlloc.alloc(&h,10);
  …
  (*h)[0] = “Sam”;
  call MemAlloc.lock(h);
  tweakString(*h);
  call MemAlloc.unlock(h);
  call MemAlloc.free(h);
[Figure: Free Bitmap, Heap, and Master Pointer Table, shown before and after Compaction]

89 Schema Attribute & Command IF –At INIT(), components register attributes and commands they support »Commands implemented via wiring »Attributes fetched via accessor command –Catalog API allows local and remote queries over known attributes / commands. Demo of adding an attribute, executing a command.

90 Q1: Expressiveness Simple data collection satisfies most users How much of what people want to do is just simple aggregates? –Anecdotally, most of it –EE people want filters + simple statistics (unless they can have signal processing) However, we’d like to satisfy everyone!

91 Query Language New Features: –Joins –Event-based triggers »Via extensible catalog –In network & nested queries –Split-phase (offline) delivery »Via buffers

92 Sample Query 1 Bird counter: CREATE BUFFER birds(uint16 cnt) SIZE 1 ON EVENT bird-enter(…) SELECT b.cnt+1 FROM birds AS b OUTPUT INTO b ONCE

93 Sample Query 2 Birds that entered and left within time t of each other: ON EVENT bird-leave AND bird-enter WITHIN t SELECT bird-leave.time, bird-leave.nest WHERE bird-leave.nest = bird-enter.nest ONCE

94 Sample Query 3 Delta compression: SELECT s.light FROM buf, sensors AS s WHERE | s.light – buf.light | > t OUTPUT INTO buf SAMPLE PERIOD 1s

95 Sample Query 4 Offline Delivery + Event Chaining
CREATE BUFFER equake_data(uint16 loc, uint16 xAccel, uint16 yAccel) SIZE 1000 PARTITION BY NODE
SELECT xAccel, yAccel FROM SENSORS WHERE xAccel > t OR yAccel > t SIGNAL shake_start(…) SAMPLE PERIOD 1s
ON EVENT shake_start(…) SELECT loc, xAccel, yAccel FROM sensors OUTPUT INTO BUFFER equake_data(loc, xAccel, yAccel) SAMPLE PERIOD 10ms

96 Event Based Processing Enables internal and chained actions Language Semantics –Events are inter-node –Buffers can be global Implementation plan –Events and buffers must be local –Since n-to-n communication not (well) supported Next: operator expressiveness

97 Attribute Driven Topology Selection Observation: internal queries often over local area* –Or some other subset of the network »E.g. regions with light value in [10,20] Idea: build topology for those queries based on values of range-selected attributes –Requires range attributes, connectivity to be relatively static * Heidemann et al., Building Efficient Wireless Sensor Networks With Low-Level Naming. SOSP, 2001.

98 Attribute Driven Query Propagation [1,10] [7,15] [20,40] SELECT … WHERE a > 5 AND a < 12 Precomputed intervals == “Query Dissemination Index”

99 Attribute Driven Parent Selection [1,10] [7,15] [20,40] [3,6] [3,6] ∩ [1,10] = [3,6] [3,6] ∩ [7,15] = ø [3,6] ∩ [20,40] = ø Even without intervals, expect that sending to parent with closest value will help

100 Hot off the press…

101 Grouping GROUP BY expr –expr is an expression over one or more attributes »Evaluation of expr yields a group number »Each reading is a member of exactly one group Example: SELECT max(light) FROM sensors GROUP BY TRUNC(temp/10) [Input table columns: Sensor ID | Light | Temp | Group] [Result table columns: Group | max(light)]

102 Having HAVING preds –preds filters out groups that do not satisfy predicate –versus WHERE, which filters out tuples that do not satisfy predicate –Example: SELECT max(temp) FROM sensors GROUP BY light HAVING max(temp) < 100 Yields all groups with temperature under 100

103 Group Eviction Problem: Number of groups in any one iteration may exceed available storage on sensor Solution: Evict! –Choose one or more groups to forward up tree –Rely on nodes further up tree, or root, to recombine groups properly –What policy to choose? »Intuitively: least popular group, since don’t want to evict a group that will receive more values this epoch. »Experiments suggest: Policy matters very little Evicting as many groups as will fit into a single message is good

104 Experiment: Basic TAG Dense Packing, Ideal Communication

105 Experiment: Hypothesis Testing Uniform Value Distribution, Dense Packing, Ideal Communication

106 Experiment: Effects of Loss

107 Experiment: Benefit of Cache

108 Pipelined Aggregates After query propagates, during each epoch: –Each sensor samples local sensors once –Combines them with PSRs from children –Outputs PSR representing aggregate state in the previous epoch. After (d-1) epochs, PSR for the whole tree output at root –d = Depth of the routing tree –If desired, partial state from top k levels could be output in kth epoch To avoid combining PSRs from different epochs, sensors must cache values from children [Figure annotations: Value from 5 produced at time t arrives at 1 at time (t+3); Value from 2 produced at time t arrives at 1 at time (t+1)]

109 Pipelining Example [Three per-node tables with columns SID | Epoch | Agg.]

110 Pipelining Example, Epoch 0 [Three per-node tables with columns SID | Epoch | Agg.; surviving fragment "101"]

111 Pipelining Example, Epoch 1 [Three per-node tables with columns SID | Epoch | Agg.]

112 Pipelining Example, Epoch 2 [Three per-node tables with columns SID | Epoch | Agg.]

113 Pipelining Example, Epoch 3 [Three per-node tables with columns SID | Epoch | Agg.]

114 Pipelining Example Epoch 4

115 Our Stream Semantics One stream, ‘sensors’ We control data rates Joins between that stream and buffers are allowed Joins are always landmark, forward in time, one tuple at a time –Result of queries over ‘sensors’ either a single tuple (at time of query) or a stream Easy to interface to more sophisticated systems Temporal aggregates enable fancy window operations

116 Formal Spec. ON EVENT [... WITHIN ] [SELECT { |agg( )|temporalagg( )} FROM [sensors | | events]] [WHERE { }] [GROUP BY { }] [HAVING { }] [ACTION [ [WHERE ] | BUFFER SIGNAL ({ }) | (SELECT... ) [INTO BUFFER ]]] [SAMPLE PERIOD [FOR ] [INTERPOLATE ] [COMBINE {temporal_agg( )}] | ONCE]

117 Buffer Commands [AT :] CREATE [ ] BUFFER ({ }) PARTITION BY [ ] SIZE [, ] [AS SELECT... [SAMPLE PERIOD ]] DROP BUFFER