An Example Data Stream Management System: TelegraphCQ INF5100, Autumn 2009 Jarle Søberg.

Slides:



Advertisements
Similar presentations
Construction process lasts until coding and testing is completed consists of design and implementation reasons for this phase –analysis model is not sufficiently.
Advertisements

Copyright © 2007 Ramez Elmasri and Shamkant B. Navathe Slide
MapReduce Online Tyson Condie UC Berkeley Slides by Kaixiang MO
Data Structures.
Analysis of : Operator Scheduling in a Data Stream Manager CS561 – Advanced Database Systems By Eric Bloom.
MapReduce Online Veli Hasanov Fatih University.
Query Execution, Concluded Zachary G. Ives University of Pennsylvania CIS 550 – Database & Information Systems November 18, 2003 Some slide content may.
Query Evaluation. An SQL query and its RA equiv. Employees (sin INT, ename VARCHAR(20), rating INT, age REAL) Maintenances (sin INT, planeId INT, day.
CS 245Notes 71 CS 245: Database System Principles Notes 7: Query Optimization Hector Garcia-Molina.
CS263 Lecture 19 Query Optimisation.  Motivation for Query Optimisation  Phases of Query Processing  Query Trees  RA Transformation Rules  Heuristic.
1 Continuously Adaptive Continuous Queries (CACQ) over Streams Samuel Madden, Mehul Shah, Joseph Hellerstein, and Vijayshankar Raman Presented by: Bhuvan.
Database Systems: Design, Implementation, and Management Eighth Edition Chapter 11 Database Performance Tuning and Query Optimization.
WXES2106 Network Technology Semester /2005 Chapter 8 Intermediate TCP CCNA2: Module 10.
Performance Issues in Adaptive Query Processing Fred Reiss U.C. Berkeley Database Group.
Query Processing Presented by Aung S. Win.
Research on cloud computing application in the peer-to-peer based video-on-demand systems Speaker : 吳靖緯 MA0G rd International Workshop.
Database Systems: Design, Implementation, and Management Tenth Edition Chapter 11 Database Performance Tuning and Query Optimization.
Database Systems Design, Implementation, and Management Coronel | Morris 11e ©2015 Cengage Learning. All Rights Reserved. May not be scanned, copied or.
Database Systems: Design, Implementation, and Management Eighth Edition Chapter 10 Database Performance Tuning and Query Optimization.
施賀傑 何承恩 TelegraphCQ. Outline Introduction Data Movement Implies Adaptivity Telegraph - an Ancestor of TelegraphCQ Adaptive Building.
By Lecturer / Aisha Dawood 1.  You can control the number of dispatcher processes in the instance. Unlike the number of shared servers, the number of.
Physical Database Design & Performance. Optimizing for Query Performance For DBs with high retrieval traffic as compared to maintenance traffic, optimizing.
Query Processing, Resource Management, and Approximation in a Data Stream Management System.
Database Management 9. course. Execution of queries.
Query optimization in relational DBs Leveraging the mathematical formal underpinnings of the relational model.
Master’s Thesis (30 credits) By: Morten Lindeberg Supervisors: Vera Goebel and Jarle Søberg Design, Implementation, and Evaluation of Network Monitoring.
Digital Multimedia, 2nd edition Nigel Chapman & Jenny Chapman Chapter 17 This presentation © 2004, MacAvon Media Productions Multimedia and Networks.
Massively Distributed Database Systems - Distributed DBS Spring 2014 Ki-Joune Li Pusan National University.
DBMS Implementation Chapter 6.4 V3.0 Napier University Dr Gordon Russell.
Design, Implementation, and Evaluation of Network Monitoring Tasks with the TelegraphCQ Data Stream Management System INF5100, Autumn 2006 Jarle Søberg.
Database Systems Design, Implementation, and Management Coronel | Morris 11e ©2015 Cengage Learning. All Rights Reserved. May not be scanned, copied or.
DDBMS Distributed Database Management Systems Fragmentation
Streaming Queries over Streaming Data Sirish Chandrasekaran (UC Berkeley) Michael J. Franklin (UC Berkeley) Presented by Andy Williamson.
CE Operating Systems Lecture 13 Linux/Unix interprocess communication.
Lecture 15- Parallel Databases (continued) Advanced Databases Masood Niazi Torshiz Islamic Azad University- Mashhad Branch
Processes CSCI 4534 Chapter 4. Introduction Early computer systems allowed one program to be executed at a time –The program had complete control of the.
1 Continuously Adaptive Continuous Queries (CACQ) over Streams Samuel Madden SIGMOD 2002 June 4, 2002 With Mehul Shah, Joseph Hellerstein, and Vijayshankar.
Eddies: Continuously Adaptive Query Processing Ross Rosemark.
Silberschatz, Galvin and Gagne ©2009 Operating System Concepts – 8 th Edition, Chapter 3: Process-Concept.
Digital Multimedia, 2nd edition Nigel Chapman & Jenny Chapman Chapter 17 This presentation © 2004, MacAvon Media Productions Multimedia and Networks.
Introduction.  Administration  Simple DBMS  CMPT 454 Topics John Edgar2.
File Systems cs550 Operating Systems David Monismith.
Lec 7 Practical Database Design and Tuning Copyright © 2004 Pearson Education, Inc.
M.Kersten MonetDB, Cracking and recycling Martin Kersten CWI Amsterdam.
Query Processing CS 405G Introduction to Database Systems.
REED : Robust, Efficient Filtering and Event Detection in Sensor Network Daniel J. Abadi, Samuel Madden, Wolfgang Lindner Proceedings of the 31st VLDB.
Query Optimization for Stream Databases Presented by: Guillermo Cabrera Fall 2008.
Practical Database Design and Tuning
S. Sudarshan CS632 Course, Mar 2004 IIT Bombay
Tuning Transact-SQL Queries
Module 11: File Structure
Containers and Lists CIS 40 – Introduction to Programming in Python
SOFTWARE DESIGN AND ARCHITECTURE
CSC 4250 Computer Architectures
Database Performance Tuning and Query Optimization
MapReduce Computing Paradigm Basics Fall 2013 Elke A. Rundensteiner
Examples of Physical Query Plan Alternatives
Streaming Sensor Data Fjord / Sensor Proxy Multiquery Eddy
Practical Database Design and Tuning
Selected Topics: External Sorting, Join Algorithms, …
B+Trees The slides for this text are organized into chapters. This lecture covers Chapter 9. Chapter 1: Introduction to Database Systems Chapter 2: The.
Chapter 8 Advanced SQL.
Chapter 3 Part 3 Switching and Bridging
Chapter 11 Database Performance Tuning and Query Optimization
Lecture 3: Main Memory.
TelegraphCQ: Continuous Dataflow Processing for an Uncertain World
PSoup: A System for streaming queries over streaming data
Eddies for Continuous Queries
Chapter 13: I/O Systems.
Presentation transcript:

An Example Data Stream Management System: TelegraphCQ INF5100, Autumn 2009 Jarle Søberg

INF5100, Autumn 2009 © Jarle Søberg TelegraphCQ Introduction and overview Description of concepts – Wrappers – Fjords – Eddies – SteMs – CACQ – Other features A practical overview Limitations

INF5100, Autumn 2009 © Jarle Søberg TelegraphCQ: Introduction Developed at Berkeley Written in C – Open source GNU license Based on the PostgreSQL DBMS Current version: 2.1 on PostgreSQL code base – Each group has a running copy on dmms-lab107 Project closed down Summer 2006 – Still, many interesting and important features to discuss

INF5100, Autumn 2009 © Jarle Søberg Client TelegraphCQ: Overview Shared memory buffer pool Wrapper clearing house Disk Shared memory queues Back end Front end Planner Parser Listener Fjords Eddies SteMs CACQ Client PostmasterServer

INF5100, Autumn 2009 © Jarle Søberg TelegraphCQ: Overview Based on modules – Query processing – Adaptive routing – Ingress and caching Communicate via Fjords – Push and pull data in pipeline fashion – Reduce overhead by non-blocking behavior

INF5100, Autumn 2009 © Jarle Søberg Wrappers Transform data to Datum items Push or pull Several formats – Comma separated format (CSV) is used by TelegraphCQ Contacted via TCP Wrapper clearing house (WCH) – Many connections possible Store streams to database if needed

INF5100, Autumn 2009 © Jarle Søberg Wrappers Shedded tuples, Data Triage – Support for dropping tuples Look at Morten’s presentation about methods – Periodically summarize tuple information – Runs “shadow” queries on shedded tuples The queries run in parallel with the real queries Shared Memory Buffer Pool shed

INF5100, Autumn 2009 © Jarle Søberg Eddies DBMSs – Query plan created once – E.g. joined (we use “ " ” to show a join) on some attributes may give this plan: – Ok, as long as data set is finite and pulled

INF5100, Autumn 2009 © Jarle Søberg Eddies How about pushed data? Blocking or throwing away tuples is unavoidable!

INF5100, Autumn 2009 © Jarle Søberg Eddies A reconfiguration is necessary Might be much changes in the different streams Reconfiguration may take long time Not dynamic enough

INF5100, Autumn 2009 © Jarle Søberg Eddies An alternative is to use an eddy: eddy Dynamic on a tuple-per-tuple basis Adaptive to changes in the stream

Eddies In fluid dynamics, an eddy is the swirling of a fluid and the reverse current created when the fluid flows past an obstacle (Wikipedia). INF5100, Autumn 2009 © Jarle Søberg

Eddies: Details Bitmap per tuple represents each operator – ready and done bits – The ready bits specifies the operators the tuple should visit – Tuple is ready for output when all done bits are set – Manipulate bits to set a route for a tuple – On creation of new tuples due to e.g. joins: OR the bitmaps 10tuple

INF5100, Autumn 2009 © Jarle Søberg Eddies: Routing policy Priority scheme – Tuples coming from an operator = high priority Prevents starvation Originally: Back-pressure – Self regulating due to queuing – Naïve, hence not optimal Extended to lottery scheduling

INF5100, Autumn 2009 © Jarle Søberg Eddies: Lottery scheduling Each operator has ticket account – Credited for each arriving tuple – Debited for each leaving tuple Lottery among available operators – Empty in-queue: Fast operators – High number of tickets: Low selectivity operators

INF5100, Autumn 2009 © Jarle Søberg Eddies: Lottery scheduling Low selectivity operators – Win even if the operator is slowing down – Expand with a window scheme Banked tickets Escrow tickets window operator

INF5100, Autumn 2009 © Jarle Søberg Eddies Works for single query environments Simple and adaptive May still not be optimal with respect to dynamic changes over e.g. a single join Extend the eddy’s strength by introducing state modules (SteMs)

INF5100, Autumn 2009 © Jarle Søberg SteMs Split joins in two – Dynamic – Send build tuples Build hash tables – Send probe tuples Look for matches RS T eddy R S T

INF5100, Autumn 2009 © Jarle Søberg SteMs RS Any possible problems? – Two equal intermediate tuples! Solved by globally unique sequence number – Only youngest tuples allowed to match

INF5100, Autumn 2009 © Jarle Søberg SteMs: Issues SteMs are implemented using hash tables – Only equi-joins work properly Alternatively, use B-trees – Can correctly express more: “<>”, “>>”, “<=”, … – Is this consistent with the data stream concept?

INF5100, Autumn 2009 © Jarle Søberg Eddies and SteMs Still single-query environment DSMSs aim to support many concurrent queries This feature needs to be adaptive and manage creation and deletion of queries in real-time Optimization is proven NP-hard

INF5100, Autumn 2009 © Jarle Søberg Introducing CACQ Continuously Adaptive Continuous Queries Heuristics – Adding more information to the tuples – Creating even more meta information Avoid sending same singleton and intermediate tuples to same operators First of all: Use grouped filters!

INF5100, Autumn 2009 © Jarle Søberg CACQ: Grouped Filters Module for early filtering of selection predicates – For example: SELECT * FROM stream WHERE stream.a = 7 – All tuples without stream.a = 7 are not sent to the eddy – Includes “>”, “<”, and “ ”, as well

INF5100, Autumn 2009 © Jarle Søberg The CACQ Tuple Extended the eddy tuple to include bitmaps for queriesCompleted and sourceId The queriesCompleted bitmap – Represents the queries – Shows a lineage of the tuple The sourceId bitmap – Source when queries do not share tuples

INF5100, Autumn 2009 © Jarle Søberg Eddies, SteMs, and CACQ: Issues Bitmap statically configured – Faster, but not dynamic Much overhead experienced by the developers – Tuple-by-tuple processing takes time – Batching tuples are suggested Static for shorter periods

INF5100, Autumn 2009 © Jarle Søberg Continuous Queries in TelegraphCQ Windowing supports sliding, hopping, and jumping behavior – Aggregations are important for correct results – Output does not start until window is reached when aggregations are used SELECT stream.color, COUNT(*) FROM stream [RANGE BY ‘9’ SLIDE BY ‘1’] GROUP BY stream.color window START OUPUT!

INF5100, Autumn 2009 © Jarle Søberg Other Information Pros: – Introspective streams – Sub-queries, to some extent – Shadow queries for Data Triage tuples Cons: – OR is not understood – Only istreams, and not dstreams – Only six ANDs between SteMs – TelegraphCQ is very unstable at high pressure