HiFi Systems: Network-Centric Query Processing for the Physical World
Michael Franklin, UC Berkeley
February 13, 2004
Introduction
– Continuing improvements in sensor devices: wireless motes, RFID, cellular-based telemetry.
– Cheap devices can monitor the environment at a high rate.
– Connectivity enables remote monitoring at many different scales.
– Widely different concerns arise at each of these levels and scales.
Plan of Attack
– Motivation/Applications/Examples
– Characteristics of HiFi Systems
– Foundational Components: TelegraphCQ and TinyDB
– Research Issues
– Conclusions
The Canonical HiFi System
RFID - Retail Scenario
– “Smart shelves” continuously monitor item addition and removal.
– Info is sent back through the supply chain.
“Extranet” Information Flow
Diagram: manufacturers (C, D) and retailers (A, B) exchange information through an aggregation/distribution service.
M2M - Telemetry/Remote Monitoring
– Energy monitoring and demand response
– Traffic
– Power generation
– Remote equipment
Time-Shift Trend Prediction
National companies can exploit East Coast/West Coast time differentials to optimize West Coast operations.
Virtual Sensors
– Sensors don’t have to be physical sensors.
– Network monitoring algorithms for detecting viruses, spam, DoS attacks, etc.
– Disease outbreak detection
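To make the “virtual sensor” idea concrete, here is a minimal sketch of a continuous query that turns a network monitor into a scan detector. Everything in it is an assumption for illustration: the stream name flows, its attributes, and the threshold; no particular system’s syntax is implied, and the window specification is elided.

SELECT src_addr, COUNT(DISTINCT dst_addr) AS fanout
FROM flows
GROUP BY src_addr
HAVING COUNT(DISTINCT dst_addr) > 500

Evaluated continuously over a recent window, a source that suddenly contacts hundreds of distinct destinations behaves like a “scan sensor” firing.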
Properties
– High fan-in, globally-distributed architecture.
– Large data volumes generated at the edges; filtering and cleaning must be done there.
– Successive aggregation as you move inwards: summaries/anomalies continually, details later.
– Strong temporal focus.
– Strong spatial/geographic focus.
– Streaming data and stored data.
– Integration within and across enterprises.
One View of the Design Space
Along the time-scale axis, from seconds to years: filtering, cleaning, and alerts (on-the-fly processing); monitoring and time-series; data mining over recent history; and archiving with provenance and schema evolution (disk-based processing), with combined stream/disk processing in between.
Another View of the Design Space
Along the geographic-scope axis, from local to global: filtering, cleaning, and alerts at the individual readers; monitoring and time-series at regional centers; data mining over recent history and archiving (provenance and schema evolution) at the central office.
One More View of the Design Space
Along the degree-of-detail axis (aggregate data volume): duplicate elimination over hours of history; interesting events over days; trends and archives over years.
Building Blocks
– TelegraphCQ
– TinyDB
TelegraphCQ: Monitoring Data Streams
– Streaming data: network monitors, sensor networks, news feeds, stock tickers.
– B2B and enterprise apps: supply chain, CRM, RFID; trade reconciliation, order processing, etc.
– (Quasi) real-time flow of events and data.
– Must manage these flows to drive business (and other) processes.
– Can mine flows to create/adjust business rules or to perform on-line analysis.
TelegraphCQ (Continuous Queries)
An adaptive system for large-scale shared dataflow processing, based on an extensible set of operators:
1) Ingress (data access) operators: wrappers, file readers, sensor proxies.
2) Non-blocking data processing operators: selections (filters), XJoins, …
3) Adaptive routing operators: Eddies, STeMs, FLuX, etc.
Operators are connected through “Fjords”:
– a queue-based framework unifying push and pull;
– Fjords also allow us to easily mix and match streaming and stored data sources.
Extreme Adaptivity
Diagram: a spectrum of adaptivity running from static plans (current DBMSs), through late binding (dynamic, parametric, competitive, …), inter-operator adaptivity (Query Scrambling, mid-query re-optimization), and intra-operator adaptivity (XJoin, DPHJ, Convergent QP), to per-tuple adaptivity (Eddies, CACQ, PSoup), with some regions still marked “???”. The per-tuple end is the region we are exploring in the Telegraph project.
Traditional query optimization depends on statistical knowledge of the data and a stable environment. The streaming world has neither.
Adaptivity Overview [Avnur & Hellerstein 2000]
How to order and reorder operators over time?
– Traditionally, use performance and economic/administrative feedback.
– That won’t work for never-ending queries over volatile streams.
Instead, use adaptive record routing: reoptimization = a change in routing policy.
Diagram: a static dataflow through operators A–D versus an eddy that routes tuples adaptively among A, B, C, and D.
The TelegraphCQ Architecture
Diagram: a wrapper clearinghouse (wrappers and proxies) feeds the TelegraphCQ front end (listener, parser, planner, mini-executor, catalog); shared memory holds the query plan queue, eddy control queue, query result queues, and the buffer pool backed by disk; the TelegraphCQ back end runs the scans, modules, split, and CQEddy operators. A single CQEddy can encode multiple queries.
The StreaQuel Query Language
SELECT projection_list
FROM from_list
WHERE selection_and_join_predicates
ORDERED BY …
TRANSFORM … TO …
WINDOW … BY …
– Target language for TelegraphCQ.
– Windows can be applied to individual streams.
– Window movement is expressed using a “for loop” construct in the “transform” clause.
– We’re not completely happy with our syntax at this point.
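As an illustration only (the slide itself notes the syntax is still in flux), a sliding-window query in the spirit of this skeleton might look like the sketch below. The stream name packets, its attributes, and the exact contents of the TRANSFORM and WINDOW clauses are assumptions, not actual StreaQuel syntax.

SELECT src, dst, bytes
FROM packets
WHERE dst_port = 80
ORDERED BY time
TRANSFORM t TO t + 1
WINDOW packets BY 5 minutes

The intent is that the TRANSFORM clause advances the window one time step at a time while the WINDOW clause keeps the most recent five minutes of the stream in scope.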
Example Window Query: Landmark
Current Status - TelegraphCQ
– System developed by modifying PostgreSQL.
– Initial version released Aug 03: open source (PostgreSQL license); shared joins with windows and aggregates; archived/unarchived streams. Next major release planned this summer.
– Initial users include: a network monitoring project at LBL (Netlogger); an intrusion detection project at Eurecom (France); our own project on sensor data processing; class projects at Berkeley, CMU, and ???
– Visit http://telegraph.cs.berkeley.edu for more information.
TinyDB
– Query-based interface to sensor networks.
– Developed on TinyOS/motes.
– Benefits: ease of programming and retasking; extensible aggregation framework; power-sensitive optimization and adaptivity.
– Sam Madden (Ph.D. thesis) in collaboration with Wei Hong (Intel). http://telegraph.cs.berkeley.edu/tinydb
Example query:
SELECT MAX(mag)
FROM sensors
WHERE mag > thresh
SAMPLE PERIOD 64ms
Diagram: an application sends queries and triggers to TinyDB, which tasks the sensor network and returns data.
Declarative Queries in Sensor Nets
“Report the light intensities of the bright nests.”
SELECT nestNo, light
FROM sensors
WHERE light > 400
EPOCH DURATION 1s
Sensor readings (Epoch, nestNo, Light, Temp, Accel, Sound):
0, 1, 455, x, x, x
0, 2, 389, x, x, x
1, 1, 422, x, x, x
1, 2, 405, x, x, x
The query returns the rows whose light reading exceeds 400.
Many sensor network applications can be described using query language primitives, with the potential for tremendous reductions in development and debugging effort.
Aggregation Query Example
“Count the number of occupied nests in each loud region of the island.”
SELECT region, CNT(occupied), AVG(sound)
FROM sensors
GROUP BY region
HAVING AVG(sound) > 200
EPOCH DURATION 10s
Results for regions with AVG(sound) > 200 (Epoch, region, CNT(…), AVG(…)):
0, North, 3, 360
0, South, 3, 520
1, North, 3, 370
1, South, 3, 520
Query Language (TinySQL)
SELECT <aggregates>, <attributes>
[FROM {sensors | <buffer>}]
[WHERE <predicates>]
[GROUP BY <exprs>]
[SAMPLE PERIOD <const> | ONCE]
[INTO <buffer>]
[TRIGGER ACTION <command>]
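A sketch of a trigger-style query in this grammar, in the spirit of examples from the TinyDB literature; treat the threshold constant thresh and the actuation command SetSnd(512) as illustrative placeholders rather than guaranteed built-ins.

SELECT temp
FROM sensors
WHERE temp > thresh
TRIGGER ACTION SetSnd(512)
SAMPLE PERIOD 512

Every 512 ms each node samples its temperature and, when the predicate is satisfied, fires the trigger action locally.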
Sensor Queries @ 10000 Ft
Diagram: a query is injected at the root of a routing tree of nodes A–F and results flow back up, with the node sets {D,E,F}, {B,D,E,F}, and {A,B,C,D,E,F} marking successive levels of the tree.
Queries are written in SQL, with extensions for: sample rate; offline delivery; temporal aggregation.
(Almost) all queries are continuous and periodic.
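As a sketch of the temporal-aggregation extension, a windowed average in TinySQL might look like the query below; winavg and its three-argument form follow the style of the TinyDB papers, but the details here should be read as an assumption.

SELECT winavg(volt, 30s, 5s)
FROM sensors
SAMPLE PERIOD 1s

This would sample the voltage once per second and report a 30-second sliding average every 5 seconds.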
In-Network Processing: Aggregation
SELECT COUNT(*) FROM sensors
Animation over a five-node routing tree: the epoch is divided into intervals, and in each interval one level of the tree transmits; parents combine the partial counts received from their children with their own readings, so by the final interval the root can report the total count (5) for the epoch.
In-Network Aggregation: Example Benefits
Simulation setup: 2500 nodes in a 50x50 grid, tree depth ≈ 10, roughly 20 neighbors per node.
Taxonomy of Aggregates
TinyDB insight: classify aggregates according to various functional properties, yielding a general set of optimizations that can be applied automatically.
– Partial state (MEDIAN: unbounded; MAX: one record): affects the effectiveness of TAG.
– Duplicate sensitivity (MIN: duplicate-insensitive; AVG: duplicate-sensitive): affects routing redundancy.
– Exemplary vs. summary (MAX: exemplary; COUNT: summary): affects the applicability of sampling and the effect of loss.
– Monotonicity (COUNT: monotonic; AVG: non-monotonic): affects hypothesis testing and snooping.
Current Status - TinyDB
– System built on top of TinyOS (~10K lines of embedded C code).
– Latest release 9/2003.
– Several deployments, including the redwoods at the UC Botanical Garden: motes at heights from 10 m to 36 m (nodes 101–111).
– Visit http://telegraph.cs.berkeley.edu/tinydb for more information.
Putting It All Together?
TelegraphCQ + TinyDB
Ursa - A HiFi Implementation
Current effort towards building an integrated infrastructure that spans the large scale in time, geography, and resources.
– Ursa-Minor (TinyDB-based)
– Ursa-Major (TelegraphCQ with archiving)
– Mid-tier (???)
TelegraphCQ/TinyDB Integration
– Fjords [Madden & Franklin 02] provide the dataflow plumbing necessary to use TinyDB as a data stream.
– The main issues revolve around what to run where: TCQ is a query processor, and TinyDB is also a query processor. Optimization criteria include total cost, response time, answer quality, answer likelihood, power conservation on the motes, …
– Project on-going; should work by summer.
– Related work: Gigascope work at AT&T.
TCQ-based Overlay Network
– TCQ is primarily a single-node system; Flux operators [Shah et al 03] support cluster-based processing.
– Want to run TCQ at each internal node.
– The primary issue is support for wide-area temporal and geographic aggregation, in an adaptive manner, of course.
– Currently under design.
– Related work: Astrolabe, IRISNet, DBIS, …
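A minimal sketch of what successive geographic aggregation could look like if each internal node runs TCQ; the stream names readings and region_counts, and the plain GROUP BY form, are assumptions for illustration rather than the system’s actual interface.

At a regional center, aggregate the per-reader streams into a per-region count:
SELECT region, COUNT(*) AS cnt
FROM readings
GROUP BY region

At the central office, roll the regional results up into a global summary:
SELECT SUM(cnt)
FROM region_counts

Each level consumes the stream produced by the level below it, so detail shrinks as data moves toward the center, matching the “successive aggregation” property of HiFi systems.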
Querying the Past, Present, and Future
– Need to handle archived data: adaptive compression can reduce processing time; historical queries; joins of live and historical data; dealing with late-arriving detail info.
– Archiving Storage Manager: a split-stream SM for stream- and disk-based processing. Initial version of the new SM running.
– Related work: temporal and time-travel DBs.
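A hedged sketch of the “join of live and historical data” idea: compare each arriving reading against its archived long-run average to flag anomalies. The stream readings, the archive table readings_archive, and the bracketed window notation are all assumptions, not actual TelegraphCQ syntax.

SELECT live.sensor_id, live.value, hist.avg_value
FROM readings AS live [RANGE '5 minutes'],
     (SELECT sensor_id, AVG(value) AS avg_value
      FROM readings_archive
      GROUP BY sensor_id) AS hist
WHERE live.sensor_id = hist.sensor_id
  AND live.value > 2 * hist.avg_value

This is the kind of query the split-stream storage manager is meant to support, since one input is streaming and the other is disk-resident.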
XML, Integration, and Other Realities
– Eventually need to support XML.
– Must integrate with existing enterprise apps.
– In many areas, standardization is well underway.
– Augmenting moving data (high fan-in to high fan-out).
– Related work: YFilter [Diao & Franklin 03], Mutant Queries [Papadimos et al., OGI], 30+ years of data integration research, 10+ years of XML research, …
Conclusions
– Sensors, RFIDs, and other data collection devices enable real-time enterprises.
– These will create high fan-in systems.
– We can exploit recent advances in streaming and sensor data management.
– Lots to do!