HiFi: Network-centric Query Processing in the Physical World SAP Research Forum February 2005 Mike Franklin UC Berkeley.

Presentation transcript:


Mike Franklin UC Berkeley EECS

Introduction
Receptors everywhere! Wireless sensor networks, RFID technologies, digital homes, network monitors, ...
Large-scale deployments will be structured as high fan-in systems.

High Fan-in Systems
Large numbers of receptors mean large data volumes.
Hierarchical, successive aggregation.
The "bowtie".

High Fan-in Example (SCM, supply chain management)
Receptors at dock doors and shelves feed warehouses and stores, which feed regional centers, which feed headquarters.

Properties
High fan-in, globally distributed architecture.
Large data volumes are generated at the edges, so filtering and cleaning must be done there.
Successive aggregation as you move inwards: summaries and anomalies continually, details later.
Strong temporal focus.
Strong spatial/geographic focus.
Both streaming data and stored data.
Integration within and across enterprises.

Design Space: Time
Processing tiers: filtering, cleaning, alerts; monitoring, time-series; data mining (recent history); archiving (provenance and schema evolution).
Time scale: seconds to years, moving from on-the-fly processing through combined stream/disk processing to disk-based processing.

Design Space: Geography
Processing tiers: filtering, cleaning, alerts; monitoring, time-series; data mining (recent history); archiving (provenance and schema evolution).
Geographic scope: local to global, from several readers through regional centers to the central office.

Design Space: Resources
Processing tiers: filtering, cleaning, alerts; monitoring, time-series; data mining (recent history); archiving (provenance and schema evolution).
Individual resources: tiny to huge, from devices through Stargates/desktops to clusters/grids.

Design Space: Data
Processing tiers: filtering, cleaning, alerts; monitoring, time-series; data mining (recent history); archiving (provenance and schema evolution).
Axes: degree of detail vs. aggregate data volume.
Duplicate elimination (history: hours); interesting events (history: days); trends/archive (history: years).
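The data slide places duplicate elimination at the edge with only hours of history. A minimal Python sketch of that idea, assuming readings arrive in timestamp order as hypothetical (timestamp_seconds, tag_id) pairs and that a tag should be re-reported at most once per horizon. The function name and the one-hour default are illustrative assumptions, not part of HiFi.

```python
from typing import Dict, Iterable, Iterator, Tuple

# Hypothetical reading format: (timestamp_seconds, tag_id).
Reading = Tuple[float, str]


def dedup(readings: Iterable[Reading], horizon_s: float = 3600.0) -> Iterator[Reading]:
    """Suppress repeat reads of a tag within horizon_s seconds of its last emitted read.

    Assumes readings arrive in timestamp order. State is one timestamp per
    distinct tag seen; a real deployment would also expire stale entries.
    """
    last_emitted: Dict[str, float] = {}
    for ts, tag in readings:
        prev = last_emitted.get(tag)
        if prev is None or ts - prev >= horizon_s:
            last_emitted[tag] = ts
            yield ts, tag


reads = [(0, "a"), (10, "a"), (20, "b"), (3600, "a"), (3610, "b")]
print(list(dedup(reads)))   # [(0, 'a'), (20, 'b'), (3600, 'a')]
```

A tag that sits in front of a reader is therefore reported roughly once per horizon instead of on every read, which is what keeps the edge's data volume small.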

State of the Art
Current approaches are hand-coded and script-based: expensive, one-off, brittle, hard to deploy and keep running.
Piecemeal/stovepipe systems: each type of receptor (RFID, sensors, etc.) is handled separately.
Standards efforts are not addressing this: they have a protocol-design bent, with different "data models" at each level and "query languages" reinvented at each level.
Result: no end-to-end, integrated middleware for managing distributed receptor data.

HiFi
A data management infrastructure for high fan-in environments.
Uniform declarative framework: every node is a data stream processor that speaks "SQL-ese", so stream-oriented queries run at all levels.
Hierarchical, stream-based views as an organizing principle.

Why Declarative? (database dogma)
Independence: data, location, platform. This allows the system to adapt over time.
Many optimization opportunities: in a complex system, automatic optimization is key, including optimization across multiple applications.
Simplifies programming ???

Building HiFi

Integrating RFID & Sensors (the "loudmouth" query)

A Tale of Two Systems
TinyDB: declarative query processing for wireless sensor networks; in-network aggregation; released as part of the TinyOS open source distribution.
TelegraphCQ: data stream processor; continuous, adaptive query processing with aggressive sharing; built by modifying PostgreSQL; open source "beta" release out now, new release soon.

The Network is the Database
Basic idea: treat the sensor net as a "virtual table".
The system hides the details and complexities of devices, changing topologies, failures, ..., and is responsible for efficient execution.
Developed on TinyOS/Motes. Example query:

    SELECT MAX(mag)
    FROM sensors
    WHERE mag > thresh
    SAMPLE PERIOD 64ms

(Figure: the application sends queries and triggers to TinyDB, which runs on the sensor network and returns data.)
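In-network aggregation is what lets TinyDB answer a MAX query without shipping every reading to the base station. A minimal Python sketch, assuming a static routing tree given as a hypothetical parent-to-children dict and one mag reading per node for a single sample period: each node applies the WHERE predicate locally and forwards only its partial MAX. Real TinyDB repeats this every epoch over a lossy radio, which the sketch ignores.

```python
from typing import Dict, List, Optional


def partial_max(tree: Dict[str, List[str]], mag: Dict[str, float],
                node: str, thresh: float) -> Optional[float]:
    """Partial MAX(mag) for the subtree rooted at `node`, counting only mag > thresh.

    Each node combines its own qualifying reading with its children's partial
    results, so at most one value travels up each link.
    """
    partials = [partial_max(tree, mag, child, thresh) for child in tree.get(node, [])]
    if mag[node] > thresh:          # WHERE mag > thresh, evaluated at the node itself
        partials.append(mag[node])
    present = [p for p in partials if p is not None]
    return max(present) if present else None


# Hypothetical topology: c reports to a; a and b report to root.
tree = {"root": ["a", "b"], "a": ["c"]}
mag = {"root": 1.0, "a": 5.0, "b": 2.0, "c": 9.0}
print(partial_max(tree, mag, "root", thresh=3.0))   # 9.0
```

MAX works this way because it is decomposable: the maximum of the children's maxima equals the maximum over the whole subtree.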

TelegraphCQ: Data Stream Monitoring
Streaming data: network monitors; sensor networks, RFID; news feeds, stock tickers, ...; B2B and enterprise apps (trade reconciliation, order processing, etc.).
(Quasi) real-time flow of events and data.
Manage these flows to drive business processes.
Mine flows to create and adjust business rules.
"Tap into" flows for on-line analysis.

Data Stream Processing
(Figure: in a traditional database, queries run over stored data; in a data stream processor, data streams through standing queries, which emit result tuples.)
Data streams are unending.
Continuous, long-running queries.
Real-time processing.

Windowed Queries
A typical streaming query:

    SELECT S.city, AVG(temp)
    FROM SOME_STREAM S [range by '5 seconds' slide by '5 seconds']
    WHERE S.state = 'California'
    GROUP BY S.city

In the window clause, range by '5 seconds' means "I want to look at 5 seconds' worth of data" and slide by '5 seconds' means "I want a result tuple every 5 seconds". Each window over the data stream produces result tuple(s).
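The range/slide pair fully determines which tuples each result covers. A minimal Python sketch of those semantics over a finite list, assuming a hypothetical (timestamp_s, city, state, temp) row layout and windows of the form [close - range, close) that close at slide, 2·slide, and so on. It recomputes each window from scratch for clarity; it is not how TelegraphCQ evaluates windows internally.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Hypothetical tuple layout for SOME_STREAM: (timestamp_s, city, state, temp).
Row = Tuple[float, str, str, float]


def windowed_avg(rows: Iterable[Row], range_s: float, slide_s: float,
                 state: str) -> List[Tuple[float, Dict[str, float]]]:
    """AVG(temp) GROUP BY city over windows [close - range_s, close),
    with one window closing every slide_s seconds (close = slide_s, 2*slide_s, ...)."""
    matching = [r for r in rows if r[2] == state]          # WHERE S.state = ...
    if not matching:
        return []
    last_ts = max(r[0] for r in matching)
    results = []
    close = slide_s
    while close - range_s <= last_ts:                       # stop once windows pass the data
        sums: Dict[str, float] = defaultdict(float)
        counts: Dict[str, int] = defaultdict(int)
        for ts, city, _, temp in matching:
            if close - range_s <= ts < close:
                sums[city] += temp
                counts[city] += 1
        results.append((close, {city: sums[city] / counts[city] for city in sums}))
        close += slide_s
    return results


rows = [(0, "Berkeley", "California", 10.0), (1, "Berkeley", "California", 20.0),
        (2, "Austin", "Texas", 30.0), (6, "Oakland", "California", 12.0)]
print(windowed_avg(rows, range_s=5, slide_s=5, state="California"))
# [(5, {'Berkeley': 15.0}), (10, {'Oakland': 12.0})]
```

With range equal to slide, as in the query above, the windows tumble and each tuple lands in exactly one result; a slide smaller than the range makes windows overlap.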

TelegraphCQ Architecture
(Figure: a TelegraphCQ Front End (Listener, Parser, Planner, Mini-Executor, Catalog, Proxy) and TelegraphCQ Back Ends (CQEddy, Split, Scans, Modules) communicate through shared memory: a Query Plan Queue, an Eddy Control Queue, Query Result Queues, and a Buffer Pool backed by disk. A TelegraphCQ Wrapper ClearingHouse hosts the Wrappers.)

The HiFi System
(Figure: prototype components: TelegraphCQ on a PC, Stargates, TinyDB, RFID wrappers, and sensor networks & RFID readers.)

Basic HiFi Architecture
Hierarchical federation of nodes.
Each node runs a Data Stream Query Processor (DSQP) plus HiFi Glue.
Views drive system functionality.
A Metadata Repository (MDR) serves the federation.
HiFi Glue covers management, query planning, archiving, and internode coordination and communication.

HiFi Processing Pipelines: The CSAVA Framework
Clean, Smooth, Arbitrate, Validate, Analyze.
Clean operates on a single tuple; Smooth over a window; Arbitrate across multiple receptors; Validate joins with stored data; Analyze performs on-line data mining.

CSAVA Processing: Clean

    CREATE VIEW cleaned_rfid_stream AS
    (SELECT receptor_id, tag_id
     FROM rfid_stream rs
     WHERE read_strength >= strength_T)
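Clean is the one CSAVA stage that needs no state: each tuple is kept or dropped on its own. A minimal Python sketch, assuming a hypothetical RfidRead record with a timestamp and a numeric read_strength. Unlike the view, it keeps every field rather than projecting to (receptor_id, tag_id), so later windowed stages still have the timestamp.

```python
from typing import Iterable, Iterator, NamedTuple


class RfidRead(NamedTuple):
    ts: float               # arrival time in seconds (hypothetical; kept for windowing)
    receptor_id: str
    tag_id: str
    read_strength: float


def clean(stream: Iterable[RfidRead], strength_t: float) -> Iterator[RfidRead]:
    """Stateless per-tuple filter: WHERE read_strength >= strength_T."""
    for r in stream:
        if r.read_strength >= strength_t:
            yield r


reads = [RfidRead(0.0, "r1", "t1", 0.9), RfidRead(0.5, "r1", "t2", 0.2)]
print([r.tag_id for r in clean(reads, strength_t=0.5)])   # ['t1']
```

Because it is a generator over single tuples, this stage can run on the smallest edge devices with constant memory.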

CSAVA Processing: Smooth (pipeline so far: Clean → Smooth)

    CREATE VIEW smoothed_rfid_stream AS
    (SELECT receptor_id, tag_id
     FROM cleaned_rfid_stream [range by '5 sec', slide by '5 sec']
     GROUP BY receptor_id, tag_id
     HAVING count(*) >= count_T)
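Smooth trades a little latency for robustness: a tag counts as present at a reader only if it was seen at least count_T times in the window, which suppresses one-off spurious reads. A minimal Python sketch, assuming a hypothetical (timestamp_s, receptor_id, tag_id) layout and tumbling windows indexed by floor(ts / window_s), which matches range and slide both being 5 sec.

```python
from collections import Counter
from typing import Dict, Iterable, Set, Tuple

# Hypothetical cleaned-read layout: (timestamp_s, receptor_id, tag_id).
Read = Tuple[float, str, str]


def smooth(reads: Iterable[Read], window_s: float,
           count_t: int) -> Dict[int, Set[Tuple[str, str]]]:
    """Tumbling windows (range == slide == window_s).

    For each window index, return the (receptor_id, tag_id) pairs read at least
    count_t times: GROUP BY receptor_id, tag_id HAVING count(*) >= count_T.
    """
    counts: Dict[int, Counter] = {}
    for ts, receptor, tag in reads:
        window = int(ts // window_s)
        counts.setdefault(window, Counter())[(receptor, tag)] += 1
    return {w: {pair for pair, n in c.items() if n >= count_t}
            for w, c in counts.items()}


reads = [(0, "r1", "t1"), (1, "r1", "t1"), (2, "r2", "t1"), (6, "r1", "t1")]
print(smooth(reads, window_s=5, count_t=2))   # {0: {('r1', 't1')}, 1: set()}
```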

CSAVA Processing: Arbitrate (pipeline so far: Clean → Smooth → Arbitrate)

    CREATE VIEW arbitrated_rfid_stream AS
    (SELECT receptor_id, tag_id
     FROM smoothed_rfid_stream rs [range by '5 sec', slide by '5 sec']
     GROUP BY receptor_id, tag_id
     HAVING count(*) >= ALL
       (SELECT count(*)
        FROM smoothed_rfid_stream [range by '5 sec', slide by '5 sec']
        WHERE tag_id = rs.tag_id
        GROUP BY receptor_id))
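Arbitration handles the case where several receptors, such as antennas on adjacent shelves, report the same tag in one window: the tag is attributed to whichever receptor(s) saw it most often. A minimal Python sketch, assuming the input is simply the (receptor_id, tag_id) rows that fall in one window; the function name is hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, Set, Tuple


def arbitrate(window_rows: Iterable[Tuple[str, str]]) -> Set[Tuple[str, str]]:
    """window_rows: (receptor_id, tag_id) rows falling in one window.

    For each tag, keep every receptor whose count is >= the count at all other
    receptors for that tag, as in HAVING count(*) >= ALL (...). Tied receptors
    are all kept, just as the >= ALL predicate keeps them.
    """
    counts = Counter(window_rows)
    best: Dict[str, int] = {}
    for (_, tag), n in counts.items():
        best[tag] = max(best.get(tag, 0), n)
    return {(receptor, tag) for (receptor, tag), n in counts.items() if n == best[tag]}


rows = [("r1", "t1"), ("r1", "t1"), ("r2", "t1"), ("r2", "t2")]
print(sorted(arbitrate(rows)))   # [('r1', 't1'), ('r2', 't2')]
```

Keeping ties mirrors the SQL exactly; a deployment that needs a single location per tag would add a tie-breaking rule on top.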

CSAVA Processing: Validate (pipeline so far: Clean → Smooth → Arbitrate → Validate)

    CREATE VIEW validated_tags AS
    (SELECT tag_name
     FROM arbitrated_rfid_stream rs [range by '5 sec', slide by '5 sec'],
          known_tag_list tl
     WHERE tl.tag_id = rs.tag_id)
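Validate is where streaming data meets stored data: the join with known_tag_list attaches a readable tag_name and discards reads of tags that are not expected at all. A minimal Python sketch, assuming an in-memory dict (tag_id to tag_name) as a hypothetical stand-in for known_tag_list, which makes the join a simple hash lookup.

```python
from typing import Dict, Iterable, List, Tuple


def validate(arbitrated: Iterable[Tuple[str, str]],
             known_tags: Dict[str, str]) -> List[str]:
    """Inner join of (receptor_id, tag_id) rows with a stored tag_id -> tag_name table.

    Rows whose tag_id has no match in known_tags are dropped, as in an inner join.
    """
    return [known_tags[tag] for _, tag in arbitrated if tag in known_tags]


known_tag_list = {"t1": "pallet-42"}          # hypothetical stored table
print(validate([("r1", "t1"), ("r2", "bogus")], known_tag_list))   # ['pallet-42']
```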

CSAVA Processing: Analyze (full pipeline: Clean → Smooth → Arbitrate → Validate → Analyze)

    CREATE VIEW tag_count AS
    (SELECT tag_name, count(*)
     FROM validated_tags vt [range by '5 min', slide by '1 min']
     GROUP BY tag_name)
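Unlike the earlier stages, this view uses overlapping windows: a 5-minute range with a 1-minute slide. A minimal Python sketch of that count, assuming hypothetical (timestamp_s, tag_name) rows and windows [close - range, close) closing every minute.

```python
from collections import Counter
from typing import Dict, List, Tuple


def sliding_counts(rows: List[Tuple[float, str]], range_s: float = 300.0,
                   slide_s: float = 60.0) -> List[Tuple[float, Dict[str, int]]]:
    """count(*) GROUP BY tag_name over windows [close - range_s, close),
    with one result every slide_s seconds (close = slide_s, 2*slide_s, ...)."""
    if not rows:
        return []
    last_ts = max(ts for ts, _ in rows)
    results = []
    close = slide_s
    while close - range_s <= last_ts:       # emit every window that starts at or before the last row
        counts = Counter(name for ts, name in rows if close - range_s <= ts < close)
        results.append((close, dict(counts)))
        close += slide_s
    return results


rows = [(0, "a"), (30, "a"), (90, "b")]     # (timestamp_s, tag_name)
for close, counts in sliding_counts(rows):
    print(close, counts)
```

Since the range is five times the slide, each reading contributes to five consecutive results; that overlap is what turns the count into a smoothed rate suitable for on-line analysis.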

Ongoing Work
Bridging the physical-digital divide: VICE, a "Virtual Device" interface.
Hierarchical query processing: automatic query planning and dissemination.
Complex event processing: unifying event and data processing.

Virtual Device (VICE) Layer
"Metaphysical* Data Independence" (name due to Shawn Jeffery)
*Metaphysics: the branch of philosophy that deals with the ultimate nature of reality and existence.

The Virtues of VICE
A simple RFID experiment: two adjacent shelves, 8 ft each, with 10 EPC-tagged items on each plus 5 items moved between them, and an RFID antenna on each shelf.

Ground Truth

Raw RFID Readings

After VICE Processing
Under the covers (in this case): cleaning, smoothing, and arbitration.

Other VICE Uses
Once you have the right abstractions: "soft sensors"; quality and lineage streams; pushdown of external validation information; power management and other optimizations; data archiving; model-based sensing; "non-declarative" code; ...

Hierarchical Query Processing
Example nodes, from leaf to root: "I provide raw readings for Soda Hall"; "I provide avg daily values for Berkeley"; "I provide avg weekly values for California"; "I provide national monthly values for the US".
Continuous and streaming.
Automatic placement and optimization.
Hierarchical: temporal granularity vs. geographic scope.
Sharing of lower-level streams.
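Sharing lower-level streams up a hierarchy works only if each level forwards state its parent can merge. For an average, that means forwarding (sum, count) rather than the average itself, because averaging the children's averages weights a sparsely instrumented building the same as a dense one. A minimal Python sketch, assuming a single AVG aggregate, a static hierarchy, and hypothetical names and readings.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class AvgPartial:
    """Mergeable partial state for AVG: the (sum, count) pair, not the average itself."""
    total: float
    count: int

    @staticmethod
    def of(values: Iterable[float]) -> "AvgPartial":
        vals = list(values)
        return AvgPartial(float(sum(vals)), len(vals))

    def merge(self, other: "AvgPartial") -> "AvgPartial":
        return AvgPartial(self.total + other.total, self.count + other.count)

    def value(self) -> float:
        return self.total / self.count


def roll_up(children: Iterable[AvgPartial]) -> AvgPartial:
    """What a parent node computes from the partials its children stream to it."""
    result = AvgPartial(0.0, 0)
    for child in children:
        result = result.merge(child)
    return result


soda_hall = AvgPartial.of([10.0, 20.0])        # hypothetical raw readings
other_building = AvgPartial.of([40.0])
berkeley = roll_up([soda_hall, other_building])
print(berkeley.value())                        # 70 / 3, not the average of averages (15 + 40) / 2
```

The same merge applies along the time axis, for example combining daily partials into weekly ones, so one partial type serves both the geographic and the temporal dimension.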

Complex Event Processing
Needed for monitoring and actuation.
Key to prioritization (e.g., of detail data).
Exploit the duality of data and events.
Shared processing.
"Semantic windows".
Challenge: a single system that simultaneously handles events spanning seconds to years.

Next Steps
Archiving and detail data: dealing with transient overloads; rate matching between stored and streaming data; scheduling large archive transfers.
System design & deployment: tools for provisioning and evaluating receptor networks.
System monitoring & management: leverage the monitoring infrastructure for introspection.

Conclusions
Receptors everywhere → high fan-in systems.
Current middleware solutions are complex and brittle.
A uniform declarative framework is the key.
The HiFi project is exploring this approach. Our initial prototype leveraged TelegraphCQ and TinyDB, demonstrated RFID/multiple-sensor integration, and validated the HiFi approach.
We have an ambitious ongoing research agenda. See for more info.

Acknowledgements
Team HiFi: Shawn Jeffery, Sailesh Krishnamurthy, Frederick Reiss, Shariq Rizvi, Eugene Wu, Nathan Burkhart, Owen Cooper, Anil Edakkunni.
Experts in VICE: Gustavo Alonso, Wei Hong, Jennifer Widom.
Funding and/or reduced-price gizmos from NSF, Intel, the UC MICRO program, and Alien Technologies.