1 Implementation and Research Issues in Query Processing for Wireless Sensor Networks Wei Hong Intel Research, Berkeley Sam Madden.

W. Hong & S. Madden – Implementation and Research Issues in Query Processing for Wireless Sensor Networks, ICDE 2004.
Presentation transcript:

1 Implementation and Research Issues in Query Processing for Wireless Sensor Networks
Wei Hong, Intel Research, Berkeley
Sam Madden, MIT
Adapted by L.B.

2 Declarative Queries
Programming apps is hard:
– Limited power budget
– Lossy, low-bandwidth communication
– Require long-lived, zero-admin deployments
– Distributed algorithms
– Limited tools, debugging interfaces
Queries abstract away much of the complexity:
– Burden on the database developers
– Users get: safe, optimizable programs; freedom to think about apps instead of details

3 TinyDB: Prototype declarative query processor
Platform: Berkeley Motes + TinyOS
Continuous variant of SQL: TinySQL
Power- and data-acquisition-based in-network optimization framework
Extensible interface for aggregates, new types of sensors

4 TinyDB Revisited
SELECT MAX(mag) FROM sensors WHERE mag > thresh SAMPLE PERIOD 64ms
High-level abstraction:
– Data-centric programming
– Interact with sensor network as a whole
– Extensible framework
Under the hood:
– Intelligent query processing: query optimization, power-efficient execution
– Fault mitigation: automatically introduce redundancy, avoid problem areas
[Diagram: the App sends Query, Trigger to TinyDB and receives Data; TinyDB runs over the Sensor Network]

5 Feature Overview
Declarative SQL-like query interface
Metadata catalog management
Multiple concurrent queries
Network monitoring (via queries)
In-network, distributed query processing
Extensible framework for attributes, commands and aggregates
In-network, persistent storage

6 Architecture
[Diagram: PC side: TinyDB GUI, TinyDB Client API, JDBC, DBMS. Mote side: TinyDB query processor on the sensor network.]

7 Data Model
Entire sensor network as one single, infinitely long logical table: sensors
Columns consist of all the attributes defined in the network
Typical attributes:
– Sensor readings
– Meta-data: node id, location, etc.
– Internal states: routing tree parent, timestamp, queue length, etc.
Nodes return NULL for unknown attributes
On server, all attributes are defined in catalog.xml
Discussion: other alternative data models?

8 Query Language (TinySQL)
SELECT,
[FROM {sensors | }]
[WHERE ]
[GROUP BY ]
[SAMPLE PERIOD | ONCE]
[INTO ]
[TRIGGER ACTION ]

9 Comparison with SQL
Single table in FROM clause
Only conjunctive comparison predicates in WHERE and HAVING
No subqueries
No column alias in SELECT clause
Arithmetic expressions limited to column op constant
Only fundamental difference: SAMPLE PERIOD clause

10 TinySQL Examples
“Find the sensors in bright nests.”
(1) SELECT nodeid, nestNo, light FROM sensors WHERE light > 400 EPOCH DURATION 1s
[Result table over Sensors, columns: Epoch | Nodeid | nestNo | Light]

11 TinySQL Examples (cont.)
(2) SELECT AVG(sound) FROM sensors EPOCH DURATION 10s
“Count the number of occupied nests in each loud region of the island.”
(3) SELECT region, CNT(occupied), AVG(sound) FROM sensors GROUP BY region HAVING AVG(sound) > 200 EPOCH DURATION 10s
Result (regions w/ AVG(sound) > 200):
Epoch | region | CNT(…) | AVG(…)
0 | North | 3 | 360
0 | South | 3 | 520
1 | North | 3 | 370
1 | South | 3 | 520
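The GROUP BY / HAVING semantics of query (3) can be sketched in ordinary Python for a single epoch. This is only an illustration of what the query computes, not how TinyDB evaluates it in the network; the readings, the `East` region, and the helper name `group_by_region` are made up for the example.

```python
from collections import defaultdict

# Hypothetical readings for one epoch: (region, occupied, sound).
readings = [
    ("North", 1, 350), ("North", 1, 360), ("North", 1, 370),
    ("South", 1, 500), ("South", 1, 520), ("South", 1, 540),
    ("East", 1, 100), ("East", 1, 120),
]

def group_by_region(rows, threshold):
    """Return {region: (CNT(occupied), AVG(sound))} for regions whose
    average sound exceeds the threshold (the HAVING clause)."""
    counts = defaultdict(int)    # rows with a non-NULL occupied value
    sounds = defaultdict(list)   # sound readings per region
    for region, occupied, sound in rows:
        if occupied is not None:
            counts[region] += 1
        sounds[region].append(sound)
    result = {}
    for region, values in sounds.items():
        avg = sum(values) / len(values)
        if avg > threshold:
            result[region] = (counts[region], avg)
    return result

result = group_by_region(readings, 200)
```

With these made-up readings the quiet `East` group is dropped by the HAVING filter, while `North` and `South` survive with a count of 3 each.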

12 Event-based Queries
ON event SELECT …
Run query only when interesting events happen
Event examples:
– Button pushed
– Message arrival
– Bird enters nest
Analogous to triggers, but events are user-defined

13 Query over Stored Data
Named buffers in Flash memory
Store query results in buffers
Query over named buffers
Analogous to materialized views
Example:
– CREATE BUFFER name SIZE x (field1 type1, field2 type2, …)
– SELECT a1, a2 FROM sensors SAMPLE PERIOD d INTO name
– SELECT field1, field2, … FROM name SAMPLE PERIOD d

14 Inside TinyDB
[Diagram: the TinyDB query processor sits on top of TinyOS, the schema, and the multihop network]
Query: SELECT AVG(temp) WHERE light > 400
Operators: Filter light > 400; Agg avg(temp); sampling via get(‘temp’) / got(‘temp’)
Results: T:1, AVG: 225; T:2, AVG: 250
Schema entry:
– Name: temp
– Time to sample: 50 uS
– Cost to sample: 90 uJ
– Calibration table: 3
– Units: Deg. F
– Error: ± 5 Deg F
– Get f: getTempFunc()
Footprint:
– ~10,000 lines embedded C code
– ~5,000 lines (PC-side) Java
– ~3200 bytes RAM (w/ 768 byte heap)
– ~58 kB compiled code (3x larger than 2nd largest TinyOS program)

15 Tree-based Routing
Tree-based routing
– Used in: query delivery, data collection, in-network aggregation
– Relationship to indexing?
[Diagram: nodes A to F form a routing tree; the query Q: SELECT … is flooded down the tree and results R: {…} flow back up]

16 Sensor Network Research
Very active research area
– Can’t summarize it all
Focus: database-relevant research topics
– Some outside of Berkeley
– Other topics that are itching to be scratched
– But, some bias towards work that we find compelling

17 Topics
In-network aggregation
Acquisitional Query Processing
Heterogeneity
Intermittent Connectivity
In-network Storage
Statistics-based summarization and sampling
In-network Joins
Adaptivity and Sensor Networks
Multiple Queries


19 Tiny Aggregation (TAG)
In-network processing of aggregates
– Common data analysis operation
  Aka gather operation or reduction in parallel programming
– Communication reducing
  Operator-dependent benefit
– Across nodes during same epoch
Exploit query semantics to improve efficiency!
Madden, Franklin, Hellerstein, Hong. Tiny AGgregation (TAG), OSDI 2002.

20 Basic Aggregation
In each epoch:
– Each node samples local sensors once
– Generates a partial state record (PSR) from its local readings and the readings from its children
– Outputs PSR during assigned comm. interval
At end of epoch, PSR for whole network is output at root
New result on each successive epoch
Extras:
– Predicate-based partitioning via GROUP BY
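The per-epoch procedure above can be sketched as a small simulation of SELECT COUNT(*) over a routing tree. This is a simplified model under stated assumptions: the topology, the `run_epoch` helper, and the rule that nodes at depth d transmit in interval d (deepest first) are illustrative, and radio loss is ignored.

```python
from collections import defaultdict

def run_epoch(parent, root):
    """Simulate one epoch of SELECT COUNT(*) on a routing tree.

    parent maps each non-root node to its parent. Returns the count seen
    at the root and a log of (interval, sender, psr_sent) tuples.
    """
    def depth(node):
        d = 0
        while node != root:
            node = parent[node]
            d += 1
        return d

    # Each node starts with a PSR covering only its own local sample.
    psr = {node: 1 for node in list(parent) + [root]}
    by_depth = defaultdict(list)
    for node in parent:
        by_depth[depth(node)].append(node)

    log = []
    # Deepest nodes transmit first, so parents can merge before sending.
    for interval in sorted(by_depth, reverse=True):
        for node in sorted(by_depth[interval]):
            psr[parent[node]] += psr[node]
            log.append((interval, node, psr[node]))
    return psr[root], log

# Hypothetical five-node network rooted at R.
parent = {"A": "R", "B": "R", "C": "A", "D": "A", "E": "B"}
count, log = run_epoch(parent, "R")
```

Each node sends exactly one message per epoch regardless of subtree size, which is where the communication savings of in-network aggregation come from.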

21–25 Illustration: Aggregation
SELECT COUNT(*) FROM sensors
[Animation over one epoch, axes Sensor # vs. Interval #: the slides step through Interval 4, Interval 3, Interval 2, Interval 1, and Interval 4 of the next epoch]

26 Aggregation Framework
As in extensible databases, TinyDB supports any aggregation function conforming to:
Agg_n = {f_init, f_merge, f_evaluate}
f_init{a_0} → <a_0>
f_merge{<a_1>, <a_2>} → <a_12>
f_evaluate{<a_1>} → aggregate value
Here <a> denotes a partial state record (PSR).
Example: Average
AVG_init{v} → <v, 1>
AVG_merge{<S_1, C_1>, <S_2, C_2>} → <S_1 + S_2, C_1 + C_2>
AVG_evaluate{<S, C>} → S/C
Restriction: merge must be associative and commutative
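The init / merge / evaluate decomposition for AVG can be written out directly. This is a minimal sketch in Python rather than TinyDB's embedded C interface; the function names and sample readings are chosen for the example.

```python
from functools import reduce

def avg_init(v):
    """Turn one reading into a partial state record <sum, count>."""
    return (v, 1)

def avg_merge(a, b):
    """Combine two PSRs; associative and commutative, as required."""
    return (a[0] + b[0], a[1] + b[1])

def avg_evaluate(psr):
    """Produce the final aggregate value S/C from a PSR."""
    s, c = psr
    return s / c

# Hypothetical readings from four nodes.
readings = [10, 20, 30, 40]
psr = reduce(avg_merge, (avg_init(v) for v in readings))
average = avg_evaluate(psr)

# Merging in a different grouping yields the same PSR.
left = avg_merge(avg_init(10), avg_init(20))
right = avg_merge(avg_init(40), avg_init(30))
regrouped = avg_merge(right, left)
```

Because the merge step is associative and commutative, the routing tree can combine PSRs in whatever order messages arrive without changing the result.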

27 Taxonomy of Aggregates
TAG insight: classify aggregates according to various functional properties
– Yields a general set of optimizations that can automatically be applied
Drives an API!
Property | Examples | Affects
Partial State | MEDIAN: unbounded, MAX: 1 record | Effectiveness of TAG
Monotonicity | COUNT: monotonic, AVG: non-monotonic | Hypothesis Testing, Snooping
Exemplary vs. Summary | MAX: exemplary, COUNT: summary | Applicability of Sampling, Effect of Loss
Duplicate Sensitivity | MIN: dup. insensitive, AVG: dup. sensitive | Routing Redundancy

28 Use Multiple Parents
Use graph structure
– Increase delivery probability with no communication overhead
For duplicate-insensitive aggregates, or aggregates expressible as a sum of parts
– Send (part of) aggregate to all parents
  In just one message, via multicast
– Assuming independence, decreases variance
Example: SELECT COUNT(*), where node A holds count c and reaches root R via parents B and C
Single parent:
P(link xmit successful) = p
P(success from A->R) = p^2
E(cnt) = c * p^2
Var(cnt) = c^2 * p^2 * (1 - p^2) ≡ V
# of parents = n, each receiving c/n (the figure shows n = 2):
E(cnt) = n * (c/n * p^2)
Var(cnt) = n * (c/n)^2 * p^2 * (1 - p^2) = V/n
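The expectation and variance formulas on this slide can be checked numerically. The sketch below assumes independent link losses, as the slide does; the values c = 10 and p = 0.9 and the function names are chosen only for illustration.

```python
def single_parent(c, p):
    """Expected count and variance when A sends all of c through one parent.

    The two-hop path A -> parent -> R succeeds with probability p**2.
    """
    q = p ** 2
    return c * q, c ** 2 * q * (1 - q)

def split_parents(c, p, n):
    """Expected count and variance when c is split evenly over n parents."""
    q = p ** 2
    part = c / n
    return n * part * q, n * part ** 2 * q * (1 - q)

c, p = 10, 0.9
e1, v1 = single_parent(c, p)
e2, v2 = split_parents(c, p, 2)
```

The expected count is unchanged by splitting, while the variance drops from V to V/n.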

29 Multiple Parents Results
Better than previous analysis expected!
Losses aren’t independent!
Insight: spreads data over many links
[Figure: No Splitting (data funnels through a critical link) vs. With Splitting]

30 Acquisitional Query Processing (ACQP)
TinyDB acquires AND processes data
– Could generate an infinite number of samples
An acquisitional query processor controls
– when,
– where,
– and with what frequency data is collected!
Versus traditional systems where data is provided a priori
Madden, Franklin, Hellerstein, and Hong. The Design of An Acquisitional Query Processor. SIGMOD, 2003.

31 ACQP: What’s Different?
How should the query be processed?
– Sampling as a first-class operation
How does the user control acquisition?
– Rates or lifetimes
– Event-based triggers
Which nodes have relevant data?
– Index-like data structures
Which samples should be transmitted?
– Prioritization, summary, and rate control

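32 Operator Ordering: Interleave Sampling + Selection
SELECT light, mag FROM sensors WHERE pred1(mag) AND pred2(light) EPOCH DURATION 1s
E(sampling mag) >> E(sampling light): 1500 uJ vs. 90 uJ
Traditional DBMS: sample mag and light, then apply σ(pred1) and σ(pred2)
ACQP: interleave sampling and selection, so the second attribute is sampled only if the first predicate passes
– Costly: sample mag, apply σ(pred1), then sample light, apply σ(pred2)
– Cheap: sample light, apply σ(pred2), then sample mag, apply σ(pred1)
Correct ordering (unless pred1 is very selective and pred2 is not): the cheap plan, which samples light first
At 1 sample / sec, total power savings could be as much as 3.5mW, comparable to the processor!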
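The ordering argument can be made concrete with an expected-energy calculation. The per-sample energies come from the slide; the selectivities, plan names, and helper functions are assumptions for this sketch, and predicate evaluation itself is treated as free.

```python
E_MAG_UJ = 1500.0   # energy to sample the magnetometer (from the slide)
E_LIGHT_UJ = 90.0   # energy to sample the light sensor (from the slide)

def expected_energy(first_cost, first_selectivity, second_cost):
    """Energy per epoch when the second sensor is sampled only if the
    predicate on the first sensor passes."""
    return first_cost + first_selectivity * second_cost

def compare_plans(sel_mag, sel_light):
    """Return (name of cheapest plan, dict of expected energies in uJ)."""
    plans = {
        "sample both": E_MAG_UJ + E_LIGHT_UJ,
        "mag first": expected_energy(E_MAG_UJ, sel_mag, E_LIGHT_UJ),
        "light first": expected_energy(E_LIGHT_UJ, sel_light, E_MAG_UJ),
    }
    return min(plans, key=plans.get), plans

# Hypothetical selectivities: both predicates pass half the time.
best_even, plans_even = compare_plans(sel_mag=0.5, sel_light=0.5)

# The exception noted on the slide: pred1 very selective, pred2 not.
best_skewed, plans_skewed = compare_plans(sel_mag=0.01, sel_light=0.99)
```

With equal selectivities the light-first plan costs 840 uJ per epoch against 1590 uJ for sampling both; only when pred1 filters almost everything and pred2 almost nothing does sampling mag first win.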

33 Exemplary Aggregate Pushdown
SELECT WINMAX(light,8s,8s) FROM sensors WHERE mag > x EPOCH DURATION 1s
Novel, general pushdown technique
Mag sampling is the most expensive operation!
Traditional DBMS: sample mag and light, apply σ(mag>x), then WINMAX
ACQP: sample light, apply σ(light > MAX), then sample mag, apply σ(mag>x), then WINMAX
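The pushdown can be sketched for a single window: a light reading that cannot beat the current window maximum is discarded before the expensive mag sample is taken. The data, the threshold, and the function name `winmax_pushdown` are hypothetical; each tuple stands for one epoch, and "reading" the second element stands in for powering up the magnetometer.

```python
def winmax_pushdown(samples, x):
    """Compute MAX(light) over samples with mag > x, sampling mag lazily.

    samples is a list of (light, mag) pairs, one per epoch. Returns the
    window maximum and the number of mag samples actually taken.
    """
    current_max = None
    mag_samples = 0
    for light, mag in samples:
        # Pushed-down filter: skip readings that cannot raise the maximum.
        if current_max is not None and light <= current_max:
            continue
        mag_samples += 1          # the expensive acquisition happens here
        if mag > x:
            current_max = light
    return current_max, mag_samples

# Hypothetical 8-epoch window (8s window at a 1s epoch).
window = [(5, 10), (3, 10), (4, 10), (9, 1), (2, 10), (8, 10), (1, 10), (6, 10)]
result, taken = winmax_pushdown(window, x=5)

# Reference answer that samples mag in every epoch.
naive = max(light for light, mag in window if mag > 5)
```

On this window the lazy plan takes 3 mag samples instead of 8 and returns the same maximum as the naive plan.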


35 Heterogeneous Sensor Networks
Leverage small numbers of high-end nodes to benefit large numbers of inexpensive nodes
Still must be transparent and ad-hoc
Key to scalability of sensor networks
Interesting heterogeneities:
– Energy: battery vs. outlet power
– Link bandwidth: Chipcon vs x
– Computing and storage: ATMega128 vs. Xscale
– Pre-computed results
– Sensing nodes vs. QP nodes

36 Computing Heterogeneity with TinyDB
Separate query processing from sensing
– Provide query processing on a small number of nodes
– Attract packets to query processors based on “service value”
Compare the total energy consumption of the network:
– No aggregation
– All aggregation
– Opportunistic aggregation
– HSN proactive aggregation
Mark Yarvis and York Liu, Intel’s Heterogeneous Sensor Network Project, ftp://download.intel.com/research/people/HSN_IR_Day_Poster_03.pdf.

37 5x7 TinyDB/HSN Mica2 Testbed

38 Data Packet Saving
How many aggregators are desired? Does placement matter?
– 11% aggregators achieve 72% of max data reduction
– Optimal placement: 2/3 distance from sink

39 Occasionally Connected Sensornets
[Diagram: sensornets running the TinyDB QP attach to gateways (GTWY); fixed gateways and mobile gateways carry data over the internet to the TinyDB Server]

40 Occasionally Connected Sensornets: Challenges
Networking support
– Tradeoff between reliability, power consumption and delay
– Data custody transfer: duplicates?
– Load shedding
– Routing of mobile gateways
Query processing
– Operation placement: in-network vs. on mobile gateways
– Proactive pre-computation and data movement
Tight interaction between networking and QP
Fall, Hong and Madden, Custody Transfer for Reliable Delivery in Delay Tolerant Networks,

41 Distributed In-network Storage
Collectively, sensornets have large amounts of in-network storage
Good for in-network consumption or caching
Challenges:
– Distributed indexing for fast query dissemination
– Resilience to node or link failures
– Graceful adaptation to data skews
– Minimizing index insertion/maintenance cost

42 Example: DIM
Functionality
– Efficient range queries over multidimensional data
Approaches
– Divide sensor field into bins
– Locality-preserving mapping from multidimensional space to geographic locations
– Use geographic routing such as GPSR
Assumptions
– Nodes know their locations and the network boundary
– No node mobility
[Figure: example events E1, E2 and query Q1]
Xin Li, Young Jin Kim, Ramesh Govindan and Wei Hong, Distributed Index for Multi-dimensional Data (DIM) in Sensor Networks, SenSys 2003.
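One way to picture a locality-preserving mapping is a bit code built by repeatedly halving each attribute's range in turn, so that events with similar values share a code prefix and land in the same bin. The sketch below is a simplified illustration under that assumption, not the exact DIM encoding; attribute values are assumed normalized to [0, 1), and `zone_code` is a hypothetical helper.

```python
def zone_code(values, bits):
    """Map a tuple of normalized attribute values to a bit-string code.

    Bit i is decided by attribute i mod m: the attribute's current range
    is halved, and the bit records which half the value falls in.
    """
    m = len(values)
    lo = [0.0] * m
    hi = [1.0] * m
    code = ""
    for i in range(bits):
        d = i % m
        mid = (lo[d] + hi[d]) / 2
        if values[d] < mid:
            code += "0"
            hi[d] = mid
        else:
            code += "1"
            lo[d] = mid
    return code

# Two nearby events and one distant event (hypothetical values).
near_a = zone_code((0.30, 0.80), 4)
near_b = zone_code((0.31, 0.79), 4)
far = zone_code((0.90, 0.10), 4)
```

Nearby events receive the same code, so a range query only needs to visit the bins whose codes overlap the queried ranges.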

43 Statistical Techniques
Approximations, summaries, and sampling based on statistics and statistical models
Applications:
– Limited bandwidth and large number of nodes -> data reduction
– Lossiness -> predictive modeling
– Uncertainty -> tracking correlations and changes over time
– Physical models -> improved query answering

44 Correlated Attributes
Data in sensor networks is correlated; e.g.,
– Temperature and voltage
– Temperature and light
– Temperature and humidity
– Temperature and time of day
– etc.

45 IDSQ
Idea: task sensors in order of best improvement to the estimate of some value:
– Choose leader(s)
  Suppress subordinates
  Task subordinates, one at a time
– Until some measure of goodness (error bound) is met
  E.g. “Mahalanobis Distance”: accounts for correlations in axes, tends to favor minimizing the principal axis
See “Scalable Information-Driven Sensor Querying and Routing for ad hoc Heterogeneous Sensor Networks.” Chu, Haussecker and Zhao. Xerox TR P May, 2001.

46 Graphical Representation
Model location estimate as a point with 2-dimensional Gaussian uncertainty.
[Figure: uncertainty ellipse with its principal axis and candidate sensors S1 and S2. Residual 1 (after tasking S1) and Residual 2 (after tasking S2) have equal area; S1 is preferred because it reduces error along the principal axis.]
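The Mahalanobis-distance criterion can be sketched for the 2-dimensional case. This is a minimal illustration, assuming the next sensor is the one with the smallest Mahalanobis distance to the current estimate; the covariance, the sensor positions, and the helper names are made up for the example.

```python
import math

def mahalanobis_2d(x, mean, cov):
    """Mahalanobis distance of point x from a 2-D Gaussian (mean, cov)."""
    (a, b), (c, d) = cov
    det = a * d - b * c
    # Inverse of the 2x2 covariance matrix.
    inv = ((d / det, -b / det), (-c / det, a / det))
    dx = (x[0] - mean[0], x[1] - mean[1])
    q = (dx[0] * (inv[0][0] * dx[0] + inv[0][1] * dx[1])
         + dx[1] * (inv[1][0] * dx[0] + inv[1][1] * dx[1]))
    return math.sqrt(q)

def pick_sensor(sensors, mean, cov):
    """Return the sensor name with the smallest Mahalanobis distance."""
    return min(sensors, key=lambda name: mahalanobis_2d(sensors[name], mean, cov))

# Hypothetical estimate whose uncertainty is elongated along the x axis.
mean = (0.0, 0.0)
cov = ((4.0, 0.0), (0.0, 1.0))
sensors = {"S1": (3.0, 0.0), "S2": (0.0, 3.0)}

d1 = mahalanobis_2d(sensors["S1"], mean, cov)
d2 = mahalanobis_2d(sensors["S2"], mean, cov)
best = pick_sensor(sensors, mean, cov)
```

Both sensors are equally far away in Euclidean terms, but S1 lies along the principal axis and therefore scores a smaller Mahalanobis distance.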

47 MQSN: Model-based Probabilistic Querying over Sensor Networks
[Diagram: Query Processor and Model]
Joint work with Amol Deshpande, Carlos Guestrin, and Joe Hellerstein

48 MQSN: Model-based Probabilistic Querying over Sensor Networks
Probabilistic Query: select NodeID, Temp ± 0.1C where NodeID in [1..9] with conf(0.95)
The Query Processor consults the Model
Observation Plan: [Temp, 3], [Temp, 9]

49 MQSN: Model-based Probabilistic Querying over Sensor Networks
Probabilistic Query: select NodeID, Temp ± 0.1C where NodeID in [1..9] with conf(0.95)
The Query Processor consults the Model
Observation Plan: [Temp, 3], [Temp, 9]

50 MQSN: Model-based Probabilistic Querying over Sensor Networks
Data: [Temp, 3] = …, [Temp, 9] = …
The Query Processor returns Query Results and updates the Model

51 Challenges
What kind of models to use?
Optimization problem:
– Given a model and a query, find the best set of attributes to observe
– Cost not easy to measure
  Non-uniform network communication costs
  Changing network topologies
– Large plan space
  Might be cheaper to observe attributes not in query
  – e.g. Voltage instead of Temperature
  Conditional plans:
  – Change the observation plan based on observed values

52 MQSN: Current Prototype
Multi-variate Gaussian models
– Kalman filters to capture correlations across time
Handles:
– Range predicate queries
  sensor value within [x,y], w/ confidence
– Value queries
  sensor value = x, w/in epsilon, w/ confidence
– Simple aggregate queries
  AVG(sensor value)  n, w/in epsilon, w/ confidence
Uses a greedy algorithm to choose the observation plan
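How a Gaussian model lets a cheap observation stand in for an expensive one can be sketched with a two-variable example: observing voltage narrows the distribution over temperature, which raises the confidence that temperature lies within epsilon of the predicted value. This is a textbook bivariate-Gaussian conditioning sketch, not the MQSN prototype; all means, variances, and the covariance are hypothetical.

```python
import math

def condition_on_voltage(mu_t, mu_v, var_t, var_v, cov_tv, v_obs):
    """Mean and variance of temperature given an observed voltage,
    for a jointly Gaussian (temperature, voltage) pair."""
    mean = mu_t + cov_tv / var_v * (v_obs - mu_v)
    var = var_t - cov_tv ** 2 / var_v
    return mean, var

def confidence_within(var, eps):
    """P(|T - mean| <= eps) for a Gaussian with the given variance."""
    return math.erf(eps / math.sqrt(2 * var))

# Hypothetical model parameters.
mu_t, mu_v = 20.0, 2.8
var_t, var_v, cov_tv = 4.0, 0.01, 0.19

prior_conf = confidence_within(var_t, eps=1.0)
post_mean, post_var = condition_on_voltage(mu_t, mu_v, var_t, var_v, cov_tv, v_obs=2.9)
post_conf = confidence_within(post_var, eps=1.0)
```

A query processor in this style would keep adding observations to the plan until the computed confidence reaches the level requested in the query.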

53 In-Net Regression
Linear regression: simple way to predict future values, identify outliers
Regression can be across local or remote values, multiple dimensions, or with high-degree polynomials
– E.g., node A’s readings vs. node B’s
– Or, location (X,Y) versus temperature
  E.g., over many nodes
Guestrin, Thibaux, Bodik, Paskin, Madden. “Distributed Regression: an Efficient Framework for Modeling Sensor Network Data.” Under submission.

54 In-Net Regression (Continued)
Problem: may require data from all sensors to build model
Solution: partition sensors into overlapping “kernels” that influence each other
– Run regression in each kernel
  Requiring just local communication
– Blend data between kernels
– Requires some clever matrix manipulation
End result: regressed model at every node
– Useful in failure detection, missing value estimation
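The basic building block, regressing one node's readings against another's, can be sketched with ordinary least squares. This shows only a single centralized fit, not the kernel-based distributed scheme described above; the readings and function names are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of ys = slope * xs + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def residual(model, x, y):
    """Difference between an observed value and the model's prediction."""
    slope, intercept = model
    return y - (slope * x + intercept)

# Hypothetical paired readings from node A and node B.
node_a = [10, 12, 14, 16]
node_b = [21, 25, 29, 33]
model = fit_line(node_a, node_b)

# Estimate a missing node-B reading and score a suspicious one.
predicted_b = model[0] * 18 + model[1]
outlier_score = residual(model, 18, 50)
```

A large residual flags a possible outlier or failed sensor, and the prediction fills in a missing value, which are the two uses named on the slide.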

55 Exploiting Correlations in Query Processing
Simple idea:
– Given predicate P(A) over expensive attribute A
– Replace it with P’ over cheap attribute A’ such that P’ evaluates to P
– Problem: unless A and A’ are perfectly correlated, P’ ≠ P for all time
  So we could incorrectly accept or reject some readings
Alternative: use correlations to improve selectivity estimates in query optimization
– Construct conditional plans that vary predicate order based on prior observations

56 Exploiting Correlations (Cont.)
Insight: by observing a (cheap and correlated) variable not involved in the query, it may be possible to improve query performance
– Improves estimates of selectivities
Use conditional plans
Example:
– Unconditional plan (either order): Light > 100 Lux (Cost = 100, Selectivity = .5) and Temp < 20° C (Cost = 100, Selectivity = .5); Expected Cost = 150
– Conditional plan, branching on Time in [6pm, 6am] (T / F): in each branch the two predicates have Cost = 100 each and Selectivities .1 and .9; Expected Cost = 110
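The expected costs on this slide follow from a simple rule: a predicate's cost is paid only if every earlier predicate passed. The sketch below reproduces the 150 and 110 figures; it assumes that within each branch of the conditional plan the predicate with selectivity .1 is evaluated first, and the helper name is chosen for the example.

```python
def expected_cost(plan):
    """Expected cost of evaluating predicates in order.

    plan is an ordered list of (cost, selectivity) pairs; evaluation stops
    at the first predicate that fails.
    """
    total = 0.0
    reach_probability = 1.0
    for cost, selectivity in plan:
        total += reach_probability * cost
        reach_probability *= selectivity
    return total

# Unconditional plan: both predicates have selectivity .5.
unconditional = expected_cost([(100, 0.5), (100, 0.5)])

# Conditional plan: after observing the time of day, one predicate has
# selectivity .1 and the other .9, and the selective one runs first.
conditional = expected_cost([(100, 0.1), (100, 0.9)])

# Same branch with the predicates in the wrong order.
wrong_order = expected_cost([(100, 0.9), (100, 0.1)])
```

The gain comes entirely from knowing which predicate is selective in each branch; running them in the wrong order would cost 190 instead of 110.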

57 In-Network Join Strategies
Types of joins:
– non-sensor -> sensor
– sensor -> sensor
Optimization questions:
– Should the join be pushed down?
– If so, where should it be placed?
– What if a join table exceeds the memory available on one node?

58 Choosing Where to Place Operators
Idea: choose a “join node” to run the operator
Over time, explore other candidate placements
– Nodes advertise data rates to their neighbors
– Neighbors compute expected cost of running the join based on these rates
– Neighbors advertise costs
– Current join node selects a new, lower-cost node
Bonfils + Bonnet, Adaptive and Decentralized Operator Placement for In-Network Query Processing, IPSN 2003.
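The placement loop above can be sketched as a local search in which the current join node hands the operator to a neighbor whenever that neighbor reports a lower cost. The cost model used here (data rate times hop count, summed over both inputs and the output) and the five-node line topology are assumptions made for illustration, not the model from the cited paper.

```python
def join_cost(node, source_rates, out_rate, sink):
    """Hypothetical cost of running the join at `node` on a line topology,
    where the hop count between nodes i and j is |i - j|."""
    inbound = sum(rate * abs(src - node) for src, rate in source_rates.items())
    outbound = out_rate * abs(sink - node)
    return inbound + outbound

def migrate(start, nodes, source_rates, out_rate, sink):
    """Move the join to a lower-cost neighbor until no neighbor is cheaper.

    Returns the list of nodes the operator visited.
    """
    path = [start]
    current = start
    while True:
        neighbors = [n for n in (current - 1, current + 1) if n in nodes]
        # Ties keep the operator where it is, since current is listed first.
        best = min([current] + neighbors,
                   key=lambda n: join_cost(n, source_rates, out_rate, sink))
        if best == current:
            return path
        current = best
        path.append(current)

nodes = {0, 1, 2, 3, 4}
source_rates = {0: 10.0, 4: 1.0}   # node 0 produces far more data
path = migrate(start=2, nodes=nodes, source_rates=source_rates,
               out_rate=0.5, sink=4)
```

In this example the operator drifts toward the high-rate source, since moving the join is cheaper than shipping that source's data across the network.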


60 Adaptivity In Sensor Networks
Queries are long running
Selectivities change
– E.g. night vs. day
Network load and available energy vary
All suggest that some adaptivity is needed
– Of data rates or granularity of aggregation when optimizing for lifetimes
– Of operator orderings or placements when selectivities change (cf. conditional plans for correlations)
As far as we know, this is an open problem!

61 Multiple Queries and Work Sharing
As sensornets evolve, users will run many queries simultaneously
– E.g., traffic monitoring
Likely that queries will be similar
– But have different end points, parameters, etc.
Would like to share processing, routing as much as possible
But how? Again, an open problem.

62 Concluding Remarks
Sensor networks are an exciting emerging technology, with a wide variety of applications
Many research challenges in all areas of computer science
– Database community included
– Some agreement that a declarative interface is right
TinyDB and other early work are an important first step
But there’s lots more to be done!