Streaming Data, Continuous Queries, and Adaptive Dataflow
Michael Franklin, UC Berkeley
NRC, June 2002


2 Data Stream Processing
Networked data streams are central to current and future computing.
Existing data management and query processing infrastructure is lacking:
– Adaptability
– Continuous and incremental processing
– Work sharing for large scale
– Resource scalability: from “smart dust” up to clusters to grids
XML provides additional opportunities.

3 Example 1: “Transactional Flows”
E-Commerce, clickstream, swipestream, logs…
Network Monitoring
B2B and Enterprise apps
– Supply-Chain, CRM, ERP
(Quasi) real-time flow of events and data
Must manage these flows to drive business processes.
Mine flows to create and adjust business rules.
Can also “tap into” flows for on-line analysis.

4 Example 2: Information Dissemination
[Diagram labels: User Profiles, Users, Filtered Data, Data Sources]
Doc creation or crawler initiates flow of data towards users.
Profiles are aggregated back towards data.

5 Example 3: Sensor Nets
Tiny (or not so tiny) devices measure the physical world.
– Berkeley “motes”, Smart Dust, Smart Tags, …
Many monitoring applications
– Transportation, Seismic, Energy, Military…
Form dynamic ad hoc networks.
Aggregate and communicate streams of values.
Not one way: can actuate to affect or actively monitor the environment.

6 Common Features
Centrality of Dataflow and Data Routing
– Architecture is focused on data movement
– Moving streams of data through code in a network
Volatility of the environment
– Dynamic resources & topology, partial failures
– Long-running (never-ending?) tasks
– Potential for user interaction during the flow
– Large Scale: users, data, resources, …
Resource Constraints
– Bandwidth, memory, processing, battery, …
– Time and human attention

7 In The Beginning
[Diagram labels: Data, Query, Index, Result]

8 Pub Sub/CQ/Filtering
[Diagram labels: Queries, Data, Index, Result]
Effectively processes all queries simultaneously.
Shares work for common sub-expressions.
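The work-sharing idea on this slide can be illustrated with a minimal sketch. It assumes each standing query is a conjunction of named predicates and that identical predicate names denote the same sub-expression; the class `SharedFilter`, its methods, and the sample queries are hypothetical and are not taken from the talk or from any Telegraph code.

```python
class SharedFilter:
    """Evaluates many standing queries per tuple, testing each distinct predicate once."""

    def __init__(self):
        self.predicates = {}   # predicate key -> callable(tuple) -> bool
        self.queries = {}      # query id -> set of predicate keys (a conjunction)
        self.eval_count = 0    # total predicate evaluations performed so far

    def register(self, query_id, preds):
        """preds maps a predicate key to a function; equal keys share one predicate."""
        for key, fn in preds.items():
            self.predicates.setdefault(key, fn)
        self.queries[query_id] = set(preds)

    def process(self, tup):
        """Return the sorted ids of all queries whose predicates all hold for tup."""
        results = {}
        for key, fn in self.predicates.items():
            results[key] = fn(tup)       # each shared sub-expression runs once
            self.eval_count += 1
        return sorted(
            qid for qid, keys in self.queries.items()
            if all(results[k] for k in keys)
        )


# Hypothetical standing queries over sensor-style tuples.
is_hot = lambda t: t["temp"] > 30
in_room_a = lambda t: t["room"] == "A"

f = SharedFilter()
f.register("q1", {"temp>30": is_hot})
f.register("q2", {"temp>30": is_hot, "room=A": in_room_a})
f.register("q3", {"room=A": in_room_a})

matches = f.process({"temp": 35, "room": "B"})
```

Here the three queries contain four predicate occurrences, but only two distinct predicates are evaluated for the incoming tuple. A real system would also index the predicates instead of scanning them all.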

9 Telegraph/PSoup: Query & Data Duality
[Diagram labels: Queries, Index, Result, Data, Index]

10 Telegraph/PSoup: Query & Data Duality
[Diagram labels: Queries, Index, Result, Data, Index, Query]

11 PSoup: Query Invocation
PSoup continuously maintains materialized views over streaming data and queries.
Data is returned to user when query is invoked.
– Invocation requires applying “windows” to precomputed results.
Adaptive approach allows system to continuously absorb new data and new queries without recompilation.
Lots of issues to study:
– Query indexing, spilling to disk, bulk processing
– Other semantics and interaction models (e.g., alerts)
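The query/data duality and window-based invocation described above might be sketched as follows. This is an illustration under stated assumptions, not the PSoup implementation: queries are plain predicates, timestamps are integers, and the stores are scanned linearly where the slides show indexes. The name `PSoupSketch` and its methods are hypothetical.

```python
class PSoupSketch:
    """Toy model: new data probes stored queries, new queries probe stored data."""

    def __init__(self):
        self.data = []      # stored (timestamp, tuple) pairs
        self.queries = {}   # query id -> predicate
        self.matches = {}   # query id -> list of matching (timestamp, tuple) pairs

    def add_query(self, qid, predicate):
        """A new query is handled like arriving data: it probes the stored tuples."""
        self.queries[qid] = predicate
        self.matches[qid] = [(ts, t) for ts, t in self.data if predicate(t)]

    def add_data(self, ts, tup):
        """A new tuple probes the stored queries and extends their results."""
        self.data.append((ts, tup))
        for qid, predicate in self.queries.items():
            if predicate(tup):
                self.matches[qid].append((ts, tup))

    def invoke(self, qid, now, window):
        """Apply a time window to precomputed matches; no predicate is re-run."""
        return [t for ts, t in self.matches[qid] if now - window < ts <= now]


ps = PSoupSketch()
ps.add_data(1, {"temp": 20})
ps.add_data(2, {"temp": 35})
ps.add_query("hot", lambda t: t["temp"] > 30)   # also sees data that arrived earlier
ps.add_data(3, {"temp": 40})
ps.add_data(4, {"temp": 10})

recent = ps.invoke("hot", now=4, window=2)      # timestamps in (2, 4]
all_hot = ps.invoke("hot", now=4, window=4)     # timestamps in (0, 4]
```

Because results are maintained as data and queries arrive, invocation reduces to filtering stored matches by timestamp, which is the sense in which windows are applied to precomputed results.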

12 Stream Processing Research Agenda
Need continuously-adaptive processing.
Need appropriate data model & query lang.
– Window semantics: input and output
– Notification semantics & thresholds
Approximation, satisficing, and QoS
– must be driven by user needs and context
– adapt to available resources & time constraints
Integration & interaction with “pooled” data.
– time travel, archiving, “normal” databases
Structured, semi-structured, and unstructured data; XML etc.
Sensor-sensitive processing.
Metrics and Benchmarks (challenge problems).

13 Conclusions
Dataflow and streaming are central to many emerging application areas.
– Solutions require a mixture of database and networking approaches: adaptivity and tolerance of partial failure; exploitation of user, app, and data semantics
A new infrastructure is needed for solving these problems.
– Duality of Data and Queries
Currently a topic of major interest in the research community.