Queries Over Streaming Sensor Data

Presentation transcript:

Queries Over Streaming Sensor Data. Samuel Madden. Qualifying Exam, University of California, Berkeley, May 14th, 2002.

Introduction
Sensor networks are here: Berkeley is on the cutting edge, and data collection and monitoring are a driving application.
My research: query processing for sensor networks, covering server (DBMS) side issues and in-network issues.
Goal: Understand how to pose, distribute, and process queries over streaming, lossy, wireless, and power-constrained data sources such as sensor networks.
(Speaker notes: mention power. Really fundamentally about data? Data is very important, but not fundamentally about it! Sentence too long? Too many ands!)

Overview
Introduction
Sensor Networks & TinyOS
Research Goals
Completed Research: Sensor Network QP (Central Query Processor; In-Network, on Sensors)
Research Plan: Future Implementation & Research Efforts
Timeline
Related Work

Sensor Networks & TinyOS
A collection of small, radio-equipped, battery-powered, networked microprocessors.
Typically ad-hoc & multihop networks; single devices unreliable.
Very low power: tiny batteries or solar cells provide power for months.
Berkeley's version: 'Mica motes' running the TinyOS operating system (services); 4K RAM, 512K EEPROM, 128K code space.
Lossy: 20% loss @ 5m in Ganesan et al.'s experiments.
Communication very expensive: 800 instructions per bit transmitted.
Apps: environment monitoring, personal nets, object tracking.
Data processing plays a key role!

Overview
Introduction
Sensor Networks & TinyOS
Research Goals
Completed Research: Sensor Network QP (Central Query Processor; In-Network, on Sensors; Visualizations)
Research Plan: Future Implementation & Research Efforts
Timeline
Related Work

Motivation
Why apply the database approach to sensor network data processing?
Declarative queries: data independence, optimization opportunities, hide low-level complexities, familiar interface.
Work sharing. Adaptivity.
Proper interfaces can leverage existing database systems.
The TeleTiny architecture offers all of these, and is suitable for a variety of lossy, streaming environments (not just TinyOS!).
Sharing & adaptivity are themes.
(Speaker notes: citations; TeleTiny -> title (Megan); too confusing/muddled; combine a way-pared-down version with the next slide?)

Architecture (diagram)
User workstation sends queries to and receives answers from the query processor.
Telegraph + CACQ: long-running queries that share work.
Fjords: handle push-based data.
Sensor proxy: mediates between sensors & QP.
TAG: in-network aggregation.
Catalog + sensor schema; visualizations + interfaces; disk.
Status key: completed research, partially complete, future work (TeleTiny implementation: real-world deployment @ Intel Berkeley).
Lots of help! Fjords: ICDE 2002, with Franklin. CACQ: SIGMOD 2002, with Shah, Hellerstein, Raman, Franklin. TAG: WMCSA 2002, with Szewczyk, Culler, Franklin, Hellerstein, Hong. Catalog: with Hong.

Sensor Network Query Processing Challenges
The query processor must be able to:
Tolerate lossy data delivery.
Handle failure of individual data sources.
Conserve power on devices whenever possible, perhaps by using on-board processing (e.g. applying selection predicates in network), or by sharing work wherever possible.
Handle push-based data.
Handle streaming data: individual readings may be of little interest (except for emergency/outlier detection), and it may not be practical to fetch all readings.
(Speaker note: say something about 'themes'.)

Server-Side Sensor QP Mechanisms (diagram)
Continuous queries (Telegraph + CACQ: long-running queries that share work).
Sensor proxies: mediate between sensors & QP.
Fjord query plan architecture: handles push-based data.
Stream-sensitive operators.
(Diagram also shows: query processor, user workstation, disk, queries/answers, visualizations + simulations.)

Continuous Queries (CQ)
Long-running queries: the user installs them and continuously receives answers until deinstallation.
Common in the streaming domain: instantaneous snapshots don't tell you much, and you may not be interested in history.
Monitoring queries: examine light levels and locate rooms that are in use; monitor the temperature in my workspace and adjust it to be in the range (x, y).
(Speaker notes: 'monitoring queries' too high level; couldn't these run on the sensor too? Relate examples to sensors? 'The right thing' too assertive.)

Continuously Adaptive Continuous Queries (CACQ)
Given user queries over current sensor data, expect that many queries will be over the same data sources (e.g. traffic sensors).
Queries over current data are always looking at the same tuples, so those queries can share current tuples and work (e.g. selections); sharing reduces computation and communication.
Continuously adaptive: when sharing work, queries come and go; over long periods of time, selectivities change, so assumptions that were valid at the start of a query are no longer valid.
(Speaker notes: example of why you need adaptivity? At fine grain, it's about changing queries.)

CACQ Overview a b R1 S1 S3 R2 R1 R2 S5 R2 R2 R1 S2 R2 R1 R2 S4 R1 S6 SELECT * FROM R WHERE s1(a),s2(b) S1 a S3 R2 R2 R1 S5 R2 SELECT * FROM R WHERE s3(a),s4(b) b R2 R1 SELECT * FROM R WHERE s5(a),s6(b) S2 R R2 R1 R2 S4 R1 S6

Work Sharing via Tuple Lineage (diagram)
Q1: SELECT * FROM S WHERE A, B, C
Q2: SELECT * FROM S WHERE A, B, D
Conventional queries run separate plans over data stream S; NiagaraCQ shares work by grouping common subplans; CACQ routes each tuple adaptively through the predicates A, B, C, D, sharing all common work.
(Speaker note: not very clear, sez Phil.)

CACQ Contributions
Continuous adaptivity (operator reordering) via eddies: all queries within the same eddy, with routing policies to enable that reordering.
Explicit tuple lineage: within each tuple, store where it has been and where it must go; maximizes sharing of tuples between queries.
Grouped filter: a predicate index that applies range & equality selections for multiple queries at the same time (a sketch follows below).
(Speaker notes: windows are difficult! What's the related work?)
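
The grouped filter merits a concrete illustration. Below is a minimal sketch of the idea, assuming integer attributes and hypothetical class/method names (this is our rendering, not the CACQ implementation): equality predicates hash on the constant, and 'greater-than' predicates live in a sorted index, so one traversal finds every query whose predicate a value satisfies.

import java.util.*;

// Hypothetical sketch of a CACQ-style grouped filter: one index applies
// many queries' predicates over the same attribute in a single pass.
class GroupedFilter {
    // Equality predicates: constant -> ids of queries requiring that value
    private Map<Integer, Set<Integer>> eqIndex = new HashMap<>();
    // "attr > bound" predicates: bound -> query ids, kept sorted by bound
    private TreeMap<Integer, Set<Integer>> gtIndex = new TreeMap<>();

    void addEquality(int queryId, int value) {
        eqIndex.computeIfAbsent(value, v -> new HashSet<>()).add(queryId);
    }

    void addGreaterThan(int queryId, int bound) {
        gtIndex.computeIfAbsent(bound, b -> new HashSet<>()).add(queryId);
    }

    // Returns the ids of all queries satisfied by this value, without
    // evaluating each query's predicate individually.
    Set<Integer> matches(int value) {
        Set<Integer> result = new HashSet<>();
        Set<Integer> eq = eqIndex.get(value);
        if (eq != null) result.addAll(eq);
        // Every bound strictly below the value satisfies its ">" predicate
        for (Set<Integer> ids : gtIndex.headMap(value).values())
            result.addAll(ids);
        return result;
    }
}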

CACQ vs. NiagaraCQ (diagram)
Performance is comparable for one experiment in the NCQ paper.
Example where CACQ destroys NCQ:
SELECT stocks.sym, articles.text FROM stocks, articles WHERE stocks.sym = articles.sym AND UDF(stocks)
Three variants of this query join Stocks and Articles on stocks.sym = articles.sym, with a selection σ UDF(stocks) or σ UDF(stocks.sym) placed below or above the join; |result| > |stocks| and the UDF is expensive.

CACQ vs. NiagaraCQ #2 UDF1 UDF2 UDF3 UDF3 UDF2 UDF1 UDF1 UDF2 Stocks.sym = Articles.sym Articles Stocks |result| > |stocks| Expensive UDF1 Query 1 Query 2 Query 3 Niagara Option #1 UDF2 UDF3 UDF3 UDF2 UDF1 CACQ S S1|2|3 UDF1 Query 1 UDF2 Query 2 UDF3 Query 3 Niagara Option #2 S1 S1A S2A S2 S3 S3A SA SA SA S A

CACQ vs. NiagaraCQ (graph)

CACQ Review
Many queries, one eddy. Fine-grained adaptivity. Grouped filter predicate index. Tuple lineage.

Sensor Proxy
CQ is a query processing mechanism; we still need to get data from the sensors.
The proxy mediates between sensors and the query processor: it pushes operators out to sensors, hides query processing and knowledge of multiple queries from the sensors, hides details of the sensors from the query processor, and enables power-sensitivity.
(Diagram: query registration delivers parsed queries [sources, ops] to the proxy; the proxy sends each query [fields, filters, aggregates, rates] to the sensors and returns tuples to the QP.)
(Speaker notes: where is the proxy physically located? Handles partial results…)

Fjording the Stream
Sensors, even through the proxy, deliver data in unusual ways.
Fjords are a query plan implementation useful for streams and distributed environments: they combine push (streaming) data and pull (static) data, e.g. traffic sensors with CHP accident reports.
Databases aren't all pull-based (SQL is pull-based; asynchronous alerters are push-based).
(Speaker note: exchange/iterator distinction confusing.)

Summary of Server-Side QP
CACQ (SIGMOD): enables sharing of work between long-running queries, and adaptivity for long-running queries.
Sensor proxy: hides QP complexity from the sensors, and power issues from the QP.
Fjords (ICDE): enable combination of push and pull data; non-blocking processing integral to the query processor.

Sensor-Side Sensor QP (diagram: TAG in-network aggregation; Catalog + sensor schema; TeleTiny implementation; real-world deployment @ Intel Berkeley)
Research thus far allows the central QP to play nice with sensors, but doesn't address how sensors can help with QP:
Use their processors to process queries.
Advertise their capabilities and data sources.
Control data delivery rates.
Detect, report, and mitigate errors and failures.
Two pieces thus far: Tiny Aggregation (TAG) (WMCSA paper, resubmission in progress) and the Catalog. Lots of work in progress!

Catalog
Problem: given a heterogeneous environment full of motes, how do I know what data they can provide or process?
Solution: store a small catalog on each device describing its capabilities, and mirror that catalog centrally to avoid overloading the sensors. Enables data independence.
Catalog content, for each attribute: name, type, size; units (e.g. Fahrenheit); resolution (e.g. 10 bits); calibration information; accessor functions; cost information (power, time, maximum sample rate). A sketch of one possible record layout follows.
(Speaker notes: even with a homogeneous network, it's good to abstract the sensor network as a schema (not that heterogeneous!). How is sensor software deployed?)
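
To make the catalog content concrete, here is a hypothetical per-attribute record; the field names and types are assumptions based on the list above, not the actual TeleTiny layout.

// Hypothetical per-attribute catalog record, mirroring the content listed
// above; stored compactly on each mote and mirrored centrally.
class CatalogAttribute {
    String name;           // e.g. "temperature"
    byte   type;           // type code
    short  sizeBytes;      // size of one reading
    String units;          // e.g. Fahrenheit
    byte   resolutionBits; // e.g. 10 bits
    float  calibrationOffset, calibrationScale; // calibration information
    // Cost information
    int    powerUWPerSample; // power cost to sample
    int    timeUsPerSample;  // time cost to sample
    int    maxSampleRateHz;  // maximum sample rate
}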

Tiny Aggregation (TAG)
How can sensors be leveraged in query processing?
Insight: aggregate queries are the common case! Users want summaries of information across hundreds or thousands of nodes; information from individual nodes is often uninteresting, and could be expensive to retrieve at fine granularity.
Take advantage of tree-based multihop routing, a common way to collect data at a centralized location: combine data at each level to compute aggregates in the network.
(Speaker notes: talk about techniques? (Summarize briefly.) Too verbose!)

Advantages of TAG
Order-of-magnitude decrease in communication for some aggregates.
Streaming results: converge after transient errors; successive results in half the messages of the initial result.
Reduces the burden on the upper levels of the routing tree.
Declarative queries enable optimizations based on a classification of aggregate properties.
Very simple to deploy and use.

TAG Example (diagram, four frames): SELECT COUNT(*) FROM sensors, over a routing tree of six nodes. Each node reports a (sensor id, epoch, count) tuple.
Epoch 0: leaves report (4, 0, 1), (5, 0, 1), (6, 0, 1).
Epoch 1: interior nodes report (2, 0, 2), (3, 0, 2); leaves report (4, 1, 1), (5, 1, 1), (6, 1, 1).
Epoch 2: root outputs (1, 0, 6); interior nodes report (2, 1, 3), (3, 1, 2); leaves report (4, 2, 1), (5, 2, 1), (6, 2, 1).
Epoch 3: root outputs (1, 1, 6). The value at the root is (d-1) epochs old; a new value arrives every epoch; nodes must cache old values.

TAG: Optimizations + Loss Tolerance
Optimizations to decrease message overhead: when computing a MAX, nodes can suppress their own transmissions if they hear neighbors with greater values (or the root can propagate down a 'hypothesis'); suppress values that don't change between epochs. A sketch of the MAX suppression idea follows.
Techniques to handle lossiness of the network: cache child results; send results up multiple paths in the routing tree.
Grouping: techniques for handling too many groups (aka group eviction).
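
A minimal sketch of the MAX suppression idea, with assumed class and method names (the slides give no code): a node snoops on neighbors' transmissions and stays silent when it cannot change the result.

// Hypothetical sketch: a node suppresses its own MAX report if it has
// already overheard a neighbor report a value at least as large.
class MaxSuppression {
    int localValue;
    int bestHeard = Integer.MIN_VALUE;

    // Called when we snoop a neighbor's transmission this epoch
    void onOverheard(int neighborValue) {
        bestHeard = Math.max(bestHeard, neighborValue);
    }

    // At the end of the epoch: transmit only if our value could still matter
    boolean shouldTransmit() {
        return localValue > bestHeard;
    }
}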

Experiment: Basic TAG (graph)
Dense packing, ideal communication.
(Speaker note from Joe: why bytes? Are bytes really the right metric?)

Sensor QP Summary
In-sensor query processing consists of:
TAG, for in-network aggregation: order-of-magnitude reduction in communication costs for simple aggregates; techniques for grouping, loss tolerance, and further reduction in costs.
Catalog, for tracking queryable attributes of sensors.
In the upcoming implementation: selection predicates; multiplexing multiple queries over the network.

What's Left?
Development tasks: TeleTiny implementation; sensor proxy policies & implementation; Telegraph (or some adaptive QP) interface.
Research tasks: publish / follow-on to TAG; query semantics; real-world deployment study; techniques for reporting & managing resources + loss.
(Speaker note: what's Telegraph? (Amol))

TeleTiny Implementation
In progress (goal: ready for SIGMOD '02 demo), in TinyOS, for Mica motes, with Wei Hong & JMH.
Features: SELECT and aggregate queries processed in-network; ability to query arbitrary attributes, including power, signal strength, etc.; flexible architecture that can be extended with additional operators; multiple simultaneous queries; UDFs/UDAs via a VM.
Status: aggregation & selection engine built; no UDFs; primitive routing; no optimizations; catalog interface designed, stub implementation; 20KB of code space!

Sensor Proxy Issues
How to choose what runs centrally and what runs on the motes?
Some operators are obvious (e.g. join?): storage or computation demands preclude running them in-network.
For other operators there is a choice: limited resources mean motes will not have capacity for all pushable operators, so which subset of operators should be pushed?
(Speaker note: make 2 slides.)

Sensor Proxy (cont.)
This is a cost-based query optimization problem; what to optimize? Power load on the network, or central CPU costs.
Basic approach: push down as much as possible. Push high-update-rate, low-state aggregate queries first, since they benefit most from TAG; satisfy other queries by sampling at the minimum rate that can satisfy all queries, processing centrally.

Research: Real-World Study
Goal: characterize the performance of TeleTiny on a building monitoring network running in the Intel Research lab in the PowerBar™ building, to:
Demonstrate the effectiveness of our approach.
Derive a number of important workload and real-world parameters that we can currently only speculate about.
Be cool.
Also, Telegraph integration, which should offer: CACQ over real sensors; a historical data interface; queries that combine historical data and streaming sensor data; fancy adaptive/interactive features (e.g. adjust sample rates on user demand).

Real-World Study (cont.)
Measurements to obtain:
Types of queries: snapshot vs. continuous; lifetime of queries.
Loss + failure characteristics: % lost messages, frequency of disconnection.
Power characteristics. Amount of storage. Server load.
Variability in data rates: is adaptivity really needed?
(Speaker notes: adaptivity is about the query workload, not the data workload! Ask: what else?)

Research: Reporting & Mitigating Resource Consumption + Loss
Resource scarcity & loss are endemic to the domain.
Problem: what techniques can be used to accommodate the desired workload despite limited resources, and to mitigate losses and inform users of them?
This is a key issue because it dramatically affects the usability of the system (otherwise users will roll their own) and the quality of the system (results are poor without some additional techniques), and it is within the themes of my research: sharing of resources, adaptivity to losses.
(Speaker notes: educate: really about lowering expectations. Too verbose?)

Some Resource + Loss Tolerance Techniques
Identify locations of loss: e.g. annotate reported values with information about lost children.
Provide the user with tradeoffs for smoothing loss in TAG: cache results (temporal smearing); send to multiple parents (more messages, less variance); or, as in the STREAM project, compute lossy summaries of streams.
Offer the user alternatives to unanswerable queries: e.g. ask if a lower sample rate would be OK, or if a nearby set of sensors would suffice.
Educate. (Lower expectations!)
Employ admission control and leases.

Timeline
May - June 2002: complete sensor-side software (schema API, catalog server, UDFs); SIGMOD demo; ICDE paper on stream semantics; resubmit TAG (to OSDI, hopefully).
June - August 2002: Telegraph integration; sensor proxy implementation; instrument + deploy lab monitoring, begin data collection.

Timeline (cont.)
August - November 2002: Telegraph historical results integration/implementation; SIGMOD paper on lab monitoring deployment.
August - January 2003: explore and implement mechanisms for handling resource constraints + faults.
February 2003: VLDB paper on resource constraints.
February - June 2003: complete dissertation.

Related Work
Database research:
Cougar (Cornell).
Sequences + streams: SEQ (Wisconsin) + temporal database systems.
Stanford STREAM: architecture similar to CACQ; state management; query semantics.
Continuous queries: NiagaraCQ (Wisconsin); PSoup (Chandrasekaran & Franklin); X/YFilter (Altinel & Franklin, Diao & Franklin).
Adaptive/interactive query processing: CONTROL (Hellerstein et al.); eddies (Avnur & Hellerstein); XJoin/Volcano (Urhan & Franklin, Graefe).

Related Work (cont.)
Sensor/networking research:
UCLA / ISI / USC (Estrin, Heidemann, et al.): diffusion (sensor fusion + routing); low-level naming (mechanisms for data collection, joins?); application-specific aggregation; impact of network density on data aggregation (aka greedy aggregation, or how to choose a good topology); network measurements (Ganesan et al.).
MIT (Balakrishnan, Morris, et al.): fancy routing protocols (LEACH / Span); insights into data delivery scheduling for power efficiency; Intentional Naming System (INS).
Berkeley / Intel: TinyOS (Hill et al.), lots of discussion & ideas.

Summary
Query processing is a key feature for improving the usability of sensor networks.
The TeleTiny solution brings, on the query processor: the ability to combine and query data as it streams in; adaptivity and performance. In the sensor network: power efficiency via in-network evaluation; the catalog.
Upcoming research work: real-world deployment + study; evaluation of techniques for resource usage + loss mitigation; TAG resubmission.
Graduation, summer 2003!

That’s all, folks! Questions?

Sensor Networks
A collection of small, radio-equipped, battery-powered, networked microprocessors.
Typically ad-hoc: no predefined network routes.
Multihop: routes span at least one intermediate node.
Deployed in hundreds or thousands, with little concern for the reliability of a single device.
Very low power, such that tiny batteries or solar cells can keep them powered for months.
Popular in the research community: Berkeley motes; USC / UCLA / Sensoria WINS platform; MIT Cricket.
(Speaker notes: requires goofy list; citations; really need the 'smart dust vision'?)

TinyOS Motes
TinyOS project goal: build an operating system for sensor networks; prototype on simple devices built from off-the-shelf components (2cm x 3cm).
Current generation devices ('Mica motes'): 50kbit radios, 100ft range; 4K RAM, 512K EEPROM; sensors for light, temperature, acceleration, sound, humidity, magnetic field.
Radio loss rate: 20% @ 5m range.
Communication dominates power cost: 800 instructions per bit transmitted.
(Speaker notes: communication really dominant? Is the hardware representative? Techniques apply to more than just TinyOS, more than just sensor networks: any power-constrained, wireless, lossy environment. Citations? Define loss rate?)
See: Jason Hill, Robert Szewczyk, Alec Woo, Seth Hollar, David Culler, Kristofer Pister. System architecture directions for network sensors. ASPLOS 2000.

TinyOS
Lightweight OS for sensors: event-driven; software-based radio stack; linked into user programs written in C.
Features: network reprogramming; time synchronization; localization; simple VM; simulator.
(Speaker note: too compact?)

Sensor Network Applications
Environmental monitoring: power, light, temperature, movement in buildings; activity and weather outside; structural monitoring.
Moving object tracking. Personal networks.
Data processing plays a key role!
(Speaker note: really all data-centric?)

Fjords
Operators (e.g. select, join) are data-direction agnostic; different modes of data delivery & consumption are implemented via queues (connectors).
Synchronous/asynchronous result production; blocking/non-blocking result consumption.
Sensors: asynchronous production, non-blocking consumption.
Contrast with the iterator model (synchronous production, blocking consumption) and the exchange operator (Graefe; asynchronous production, blocking consumption).

Pull Example (diagram: a selection operator s pulls from a Scan through a pull connection; the connection exposes get(), the operator exposes process())

// Generic operator: drains one tuple from its input queue per call.
class Operator {
    Operator parent, child;
    Queue q; // ...
    Tuple process() {
        Tuple t = q.get(), outt = null;
        if (t != null) {
            // <process t>
        } else {
            // ... do something else ...
        }
        return outt;
    }
}

// Pull queue: get() blocks, driving the child until it produces a tuple.
class PullQueue {
    Operator parent, child;
    Tuple get() {
        Tuple t = null;
        while (t == null) {
            t = child.process();
        }
        return t;
    }
}

Notice: iterator semantics are obtained by making get() blocking; get() can return null; process() can return null.

Push Example (diagram: a Scan pushes into a selection operator s through a push connection; a scheduler thread drives the operator)

// The operator is the same as in the pull example; a thread drives it:
while (true) {
    Tuple t = op.process();
    if (t != null)
        op.outq.enqueue(t);
}

// Push queue: non-blocking; get() returns null when nothing is queued.
class PushQueue {
    Operator parent, child;
    Vector v = new Vector();
    Tuple get() {
        if (v.size() > 0)
            return (Tuple) v.remove(0); // 'removeFirst' on the slide
        else
            return null;
    }
    void enqueue(Tuple t) {
        v.add(t); // 'put' on the slide
    }
}

Relational Operators and Streams
In addition to the query plan mechanism, we need new operators.
Selection and projection apply naturally.
Operators that block on the entire stream (sorts and aggregates over the whole stream; nested-loops and sort-merge joins) need non-blocking, windowed replacements (windowed sorts, aggregates, etc.).
Online, interactive QP techniques: the in-memory symmetric hash join (Wilschut & Apers; sketched below); alternatives such as ripple join (Haas & Hellerstein), XJoin (Urhan & Franklin), etc.; partial results (Raman & Hellerstein).
(Speaker notes: references? In-memory symmetric hash join? Transition from previous slide… Too much database terminology?)
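
A generic textbook sketch of the in-memory symmetric hash join (Wilschut & Apers) mentioned above; this is our illustration, not the Fjords code. Each input maintains its own hash table; an arriving tuple is inserted into its side and immediately probed against the other side, so neither input blocks.

import java.util.*;

// Sketch of a symmetric hash join over two tuple streams. Tuples are
// modeled as int[] and keyed on an integer join attribute.
class SymmetricHashJoin {
    private Map<Integer, List<int[]>> leftTable = new HashMap<>();
    private Map<Integer, List<int[]>> rightTable = new HashMap<>();

    // Insert a left tuple and emit all joins with right tuples seen so far.
    List<int[][]> onLeft(int key, int[] tuple) {
        leftTable.computeIfAbsent(key, k -> new ArrayList<>()).add(tuple);
        return probe(rightTable, key, tuple, true);
    }

    // Symmetric case for the right input.
    List<int[][]> onRight(int key, int[] tuple) {
        rightTable.computeIfAbsent(key, k -> new ArrayList<>()).add(tuple);
        return probe(leftTable, key, tuple, false);
    }

    private List<int[][]> probe(Map<Integer, List<int[]>> other, int key,
                                int[] tuple, boolean tupleIsLeft) {
        List<int[][]> out = new ArrayList<>();
        for (int[] match : other.getOrDefault(key, Collections.emptyList()))
            out.add(tupleIsLeft ? new int[][]{tuple, match}
                                : new int[][]{match, tuple});
        return out; // results produced incrementally, without blocking
    }
}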

CACQ Architecture

Grouped Filter

Per Tuple State

Tuple Lineage (animation, several frames): tuples from streams R and S carry lineage as they flow through STeMs on R.a, S.b, and T.c; for each of Query 1, Query 2, and Query 3, the lineage records which operators the tuple has visited and which it must still visit.

Continuous Adaptivity Result (graph): fix for a single query; delete values at y = 0; attributes uniformly distributed over (0, 100).

Database-Style Aggregation
SELECT {agg_n(attr_n), attrs} FROM sensors WHERE {selPreds} GROUP BY {expr} HAVING {havingPreds} EPOCH DURATION i
An aggregate Agg_n = {f_merge, f_init, f_evaluate}:
f_merge(<a1>, <a2>) -> <a12>
f_init(v) -> <a0>
f_evaluate(<a1>) -> aggregate value
Example, AVERAGE:
AVG_merge(<S1, C1>, <S2, C2>) -> <S1 + S2, C1 + C2>
AVG_init(v) -> <v, 1>
AVG_evaluate(<S1, C1>) -> S1 / C1
Each <a> tuple is a Partial State Record (PSR), representing the combination of local values and child aggregates at a particular node. A code rendering of the AVERAGE example follows.
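
A direct transcription of the AVERAGE decomposition above into code (a sketch; the class name is ours, not TAG's):

// Partial State Record for AVERAGE: <sum, count>, exactly as above.
class AvgPSR {
    long sum;
    long count;

    // f_init: lift a single sensor reading into a PSR
    static AvgPSR init(long v) {
        AvgPSR p = new AvgPSR();
        p.sum = v;
        p.count = 1;
        return p;
    }

    // f_merge: combine two PSRs (e.g. a node's local PSR with a child's)
    static AvgPSR merge(AvgPSR a, AvgPSR b) {
        AvgPSR p = new AvgPSR();
        p.sum = a.sum + b.sum;
        p.count = a.count + b.count;
        return p;
    }

    // f_evaluate: turn the final PSR at the root into the aggregate value
    double evaluate() {
        return (double) sum / count;
    }
}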

Query Propagation
TAG is propagation-agnostic: any algorithm works that can deliver the query to all sensors and provide all sensors with one or more duplicate-free routes to some root.
One approach: flood. The query is introduced at a root and rebroadcast by all sensors until it reaches the leaves; sensors pick a parent and level when they hear the query, and reselect a parent after k silent epochs. A sketch of this rule appears below.
(Diagram: node 1 at P:0, L:1; nodes 2, 3 at P:1, L:2; node 4 at P:2, L:3; node 6 at P:3, L:3; node 5 at P:4, L:4.)
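
The parent/level selection rule, sketched in code (message fields and names are assumptions; radio details are elided):

// Hypothetical sketch of flood-based query propagation: on first hearing
// the query, adopt the sender as parent, set level, and rebroadcast.
class QueryFlood {
    int parent = -1;        // no parent yet
    int level = -1;
    int silentEpochs = 0;
    static final int K = 5; // reselect parent after K silent epochs

    void onQueryHeard(int senderId, int senderLevel) {
        if (parent == -1) {                // first time we hear the query
            parent = senderId;
            level = senderLevel + 1;
            rebroadcast(level);            // push the query down the tree
        }
        if (senderId == parent) silentEpochs = 0; // parent still alive
    }

    void onEpochEnd() {
        // Drop the parent after K epochs of silence; the next query
        // broadcast we hear will assign a new one.
        if (parent != -1 && ++silentEpochs >= K) parent = -1;
    }

    void rebroadcast(int myLevel) { /* radio send, elided */ }
}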

Pipelined Aggregates
After the query propagates, during each epoch: each sensor samples its local sensors once, combines the readings with PSRs from its children, and outputs a PSR representing the aggregate state in the previous epoch. A sketch of a node's per-epoch work appears below.
After (d-1) epochs, the PSR for the whole tree is output at the root, where d is the depth of the routing tree (a value from node 2 produced at time t arrives at node 1 at time t+1; a value from node 5 produced at time t arrives at node 1 at time t+3). If desired, partial state from the top k levels could be output in the kth epoch.
Complication: we may need to avoid combining PSRs from different epochs; conceptually, 'stall the pipeline'. Solutions: introduce delays or adjust delivery rates (requires a schedule); in the paper, we use a cache.
(Speaker notes: Alan: if we had a perfect schedule, we wouldn't need to buffer the pipeline. Somehow related to data-parallel systems/architectures, since we're introducing delay stages into the pipeline. Should try to separate these routing/topology issues from the algorithmic issues. Too much stuff (Amol).)
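
A minimal sketch of the per-epoch loop, specialized to COUNT(*) (class and method names are hypothetical; epoch-matching via the cache is simplified away):

import java.util.*;

// Hypothetical sketch of a node's per-epoch work in pipelined TAG,
// specialized to COUNT(*): cache the latest PSR from each child, and
// each epoch combine the cached PSRs with the local contribution.
class PipelinedCountNode {
    private Map<Integer, Long> childCounts = new HashMap<>();

    // Called when a child's PSR arrives over the radio
    void onChildPSR(int childId, long count) {
        childCounts.put(childId, count); // cache child results
    }

    // Called once per epoch; the returned PSR is sent to the parent
    long onEpoch() {
        long psr = 1; // this node's own contribution to the count
        for (long c : childCounts.values())
            psr += c; // combine with (cached) child PSRs
        return psr;   // at the root, this value is (d-1) epochs old
    }
}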

Pipelining Example (animation, epochs 0-4): SELECT COUNT(*) FROM sensors over five nodes, with delay stages between levels. Each epoch, (sensor id, epoch, count) PSRs advance one delay stage toward the root: at epoch 3 the root outputs (1, 0, 5), at epoch 4 it outputs (1, 1, 5), and a new count follows every epoch thereafter.

Discussion
The result of the query is a stream of values; after a transient error, it converges in at most d epochs.
The value at the root is (d-1) epochs old, with a new value every epoch, versus d epochs for the first complete value or to collect a snapshot.
Delay stages conceptually represent local caches or adjusted delivery rates.

Visualizations
Not published. Used to motivate ideas (e.g. the need to combine streaming and static data), illustrate algorithms (e.g. TAG), and debug algorithms (e.g. TAG).

Traffic Visualization

Sensor Network Visualization

TeleTiny Overview
Major components: TINY_ALLOC (memory allocator); TUPLE_ROUTER (query processor); AGG_OPERATOR (aggregator); TINYDB_NETWORK (network interface); SCHEMA (catalog, a.k.a. introspection, interface).
(Diagram: SELECT_OPERATOR and AGG_OPERATOR sit above TUPLE_ROUTER, which uses TINYDB_NETWORK over the radio stack, plus SCHEMA and TinyAlloc.)