TelegraphCQ: Continuous Dataflow Processing for an Uncertain World


TelegraphCQ: Continuous Dataflow Processing for an Uncertain World
Sirish Chandrasekaran, Owen Cooper, Amol Deshpande, Michael J. Franklin, Joseph M. Hellerstein, Wei Hong*, Sailesh Krishnamurthy, Sam Madden, Vijayshankar Raman**, Fred Reiss, and Mehul Shah
University of California, Berkeley
*Intel Berkeley Laboratory
**IBM Almaden Research Center
http://telegraph.cs.berkeley.edu/

Contents
- Background and Motivation
- Telegraph – Architecture
- Window Semantics in TelegraphCQ
- TelegraphCQ – Design Overview
- TelegraphCQ – Architecture
- Conclusion
All diagrams and contents are directly adapted/taken from the paper itself!
5/7/2019

TelegraphCQ – Background and Motivation
- Adaptive Dataflow Architecture: systems that can adjust their processing on the fly in response to
  - Changes in user needs [HACO+99]
  - Intermittent delays in accessing data across WANs [UFA98]
- Shared Processing
  - CACQ [MSHR02]
  - PSoup [CF02]
- Limitations of these earlier systems
  - Processing restricted to in-memory data
  - No scheduling and resource management for queries with little or no overlap
  - No Quality of Service (QoS) for adapting to resource limitations
  - No control over the tradeoff between flexibility and overhead

Telegraph – Architecture
- Extensible set of composable dataflow modules/operators
- Producer-consumer design with the Fjords API
  - Push as well as pull queues
- Three kinds of modules
  - Ingress and Caching
  - Query Processing
  - Adaptive Routing

Adaptive Processing – Eddies & SteMs
- Eddy: continuously routes tuples among operators according to a routing policy
  - Routing is decided on a per-tuple basis, which requires state associated with each tuple
- SteMs (State Modules): temporary repositories of tuples
  - Each SteM stores homogeneous tuples
  - Supports build (insert), probe (search), and eviction (deletion) operations
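The Eddy and SteM roles described above can be illustrated with a minimal sketch. It is not TelegraphCQ's implementation: the class and method names are hypothetical, and the routing policy (try the operator with the lowest observed pass rate first) is a simplified stand-in chosen for the example.

```python
from collections import defaultdict


class SteM:
    """Hypothetical state module: stores tuples of one source, indexed on one key."""

    def __init__(self, key):
        self.key = key
        self.index = defaultdict(list)

    def build(self, tup):
        # Insert a tuple into the index.
        self.index[tup[self.key]].append(tup)

    def probe(self, value):
        # Return every stored tuple whose key equals `value`.
        return list(self.index.get(value, []))

    def evict(self, predicate):
        # Delete every stored tuple for which `predicate` is true.
        for k in list(self.index):
            self.index[k] = [t for t in self.index[k] if not predicate(t)]
            if not self.index[k]:
                del self.index[k]


class Eddy:
    """Hypothetical eddy over selection operators (name -> predicate)."""

    def __init__(self, operators):
        self.operators = operators
        self.seen = {name: 0 for name in operators}
        self.passed = {name: 0 for name in operators}

    def _next_operator(self, done):
        # Assumed policy: prefer the pending operator with the lowest pass rate.
        pending = [n for n in self.operators if n not in done]
        return min(pending, key=lambda n: (self.passed[n] + 1) / (self.seen[n] + 1))

    def route(self, tup):
        done = set()  # per-tuple routing state: operators already visited
        while len(done) < len(self.operators):
            name = self._next_operator(done)
            self.seen[name] += 1
            if not self.operators[name](tup):
                return None  # the tuple was dropped by this operator
            self.passed[name] += 1
            done.add(name)
        return tup
```

Because the order is chosen again for every tuple, an operator that starts rejecting more tuples moves to the front without any query re-planning.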

Fjords – Inter-Module Communication
- Allow a mixture of push and pull connections between modules
  - A pull-queue is implemented using a blocking dequeue on the consumer side and a blocking enqueue on the producer side
  - A push-queue is implemented using non-blocking enqueue and dequeue; control is returned to the consumer when the queue is empty
- Execute queries over any combination of streaming and static data sources

Flux – Scaling Up Dataflow Processing
- Fault-tolerant, Load-balancing eXchange
- Interposed between a producer-consumer operator pair in a pipelined, partitioned dataflow
- Load balancing via online repartitioning of the input stream and the corresponding operator state
- Fault tolerance by leveraging these state-movement mechanisms to replicate an operator's internal state and in-flight data
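The difference between the Fjords pull-queue and push-queue can be sketched with Python's standard `queue` module. The class names are hypothetical, and what a producer does when a non-blocking enqueue fails (here it just gets `False` back) is an assumption of this sketch, not something the slides specify.

```python
import queue


class PullQueue:
    """Blocking on both sides: the consumer waits for data, the producer waits for space."""

    def __init__(self, capacity):
        self._q = queue.Queue(maxsize=capacity)

    def enqueue(self, item):
        self._q.put(item)  # blocks while the queue is full

    def dequeue(self):
        return self._q.get()  # blocks while the queue is empty


class PushQueue:
    """Non-blocking on both sides: control returns immediately to the caller."""

    def __init__(self, capacity):
        self._q = queue.Queue(maxsize=capacity)

    def enqueue(self, item):
        try:
            self._q.put_nowait(item)
            return True
        except queue.Full:
            return False  # assumption: the caller decides what to do with the item

    def dequeue(self):
        try:
            return self._q.get_nowait()
        except queue.Empty:
            return None  # empty queue: hand control back to the consumer
```

A consumer reading a push-queue can therefore do other work when `dequeue()` returns `None`, instead of stalling on a slow streaming source.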

Initial CQ Approaches: CACQ and PSoup
CACQ
- First CQ engine to exploit the adaptive query processing framework
- Modifies Eddies to execute multiple queries as a single "super-query", the disjunction of all the queries
- Tuple lineage: state carried with each tuple to determine which clients should receive it
- Grouped filters: an index for single-variable Boolean factors over the same attribute, used to optimize selections in the shared execution
PSoup
- Extends CACQ
- Allows queries to access historical data by treating data and queries symmetrically
- Adds support for disconnected operation: users can register queries and return later to retrieve the results
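As a rough illustration of the grouped-filter idea, the sketch below indexes many predicates of the form `attribute > constant` from different queries, so that one lookup per tuple finds every query whose predicate holds. The class name, the restriction to `>` predicates, and the query identifiers are assumptions made for the example; CACQ's actual data structure is not reproduced here.

```python
import bisect


class GreaterThanGroupedFilter:
    """Hypothetical index over predicates `attr > c` registered by many queries."""

    def __init__(self):
        self._consts = []     # constants, kept sorted
        self._query_ids = []  # query id stored at the same position as its constant

    def add(self, query_id, constant):
        i = bisect.bisect_right(self._consts, constant)
        self._consts.insert(i, constant)
        self._query_ids.insert(i, query_id)

    def matching_queries(self, value):
        # `value > c` holds exactly for the constants strictly below `value`,
        # which are the entries before the bisect_left position.
        cut = bisect.bisect_left(self._consts, value)
        return set(self._query_ids[:cut])
```

The returned set plays a role similar to tuple lineage in this toy setting: it records which queries the tuple still satisfies after the shared selection.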

Window Semantics in TelegraphCQ
- Rich windowing schemes over both already-arrived and incoming data
- The supported window semantics are:
  - Snapshot query: executes exactly once over one window, e.g. "Select the closing prices for MSFT on the first five days of trading"
  - Landmark query: fixed beginning point and a forward-moving endpoint, e.g. "Select all the days after the hundredth trading day, on which the closing price of MSFT has been greater than $50. Keep this query standing in the system for a thousand trading days"
  - Sliding query: forward-moving beginning and end, e.g. "On every fifth trading day starting today, calculate the average closing price of MSFT for the five most recent trading days. Keep the query standing for fifty trading days"
  - Temporal band-join: joins tuples in one stream with those in another based on timestamp, e.g. "For the five most recent trading days starting today, select all stocks that closed higher than MSFT on a given day. Keep the query standing for twenty trading days"
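The snapshot, landmark, and sliding examples above differ only in how the window's left and right ends move over time. The sketch below enumerates those bounds as inclusive `(left, right)` pairs of trading-day numbers. The function names and parameters are hypothetical, and it is not TelegraphCQ's query syntax.

```python
def snapshot(start, end):
    """One window, evaluated exactly once."""
    yield (start, end)


def landmark(start, first_end, last_end):
    """Fixed left end; the right end advances one day at a time."""
    for t in range(first_end, last_end + 1):
        yield (start, t)


def sliding(first_t, stop_t, hop, width):
    """Both ends advance: every `hop` days, cover the `width` most recent days."""
    for t in range(first_t, stop_t, hop):
        yield (t - width + 1, t)
```

With day 100 standing in for "today" (an assumed value), `sliding(100, 150, 5, 5)` corresponds to the sliding example: ten windows, each covering five days. `landmark(101, 101, 1100)` corresponds to the landmark example, whose window grows from day 101 for a thousand trading days.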

TelegraphCQ – Design Overview
- Adapted the architecture of PostgreSQL
- Implemented the new system in C/C++ to leverage the open source PostgreSQL code base
- Reused PostgreSQL components with different levels of changes

TelegraphCQ – Architecture
Three processes comprise the TelegraphCQ server:
- FrontEnd
- Wrapper
  - Provides an abstraction of external sources
  - Runs as a separate process (non-blocking)
- Executor
  - Execution Object: provides the execution context for multiple queries
  - Dispatch Unit: performs the actual work

TelegraphCQ: Rebuttal
- Query grouping and sharing: degree of overlap; no prioritizing of queries.
- Adaptivity schemes: "per tuple", "per operator", or batch; no experimental evaluation of the efficiency of the schemes.

TelegraphCQ: Rebuttal
- Ingress module: allows input from various sources. Does that bring the efficiency down?
- Lack of an Egress module.
- It does not support value-based windows.

TelegraphCQ: Rebuttal
- It does not have special arrangements for supporting ad-hoc queries, as Aurora does.
- It does not support distributed operation (proposed later).
- No support for crash recovery or for imprecise or missing data.

Conclusion
- TelegraphCQ provides an adaptive dataflow and shared processing architecture
- Eddies and SteMs form the building blocks for adaptive processing
- Features include
  - Fjords inter-module communication (push and pull connections)
  - Flux: Fault-tolerant, Load-balancing eXchange
  - CACQ (tuple lineage and grouped filters)
  - PSoup (symmetric treatment of data and queries)
- Built on the PostgreSQL code base
- The rebuttal compared TelegraphCQ with other stream engines and with the concept of relational databases
Thank you