Data Streams & Continuous Queries The Stanford STREAM Project stanfordstreamdatamanager.



2 Data Streams
Continuous streams of data elements (may be unbounded, rapid, time-varying)
Occur in a variety of modern applications
  – Network monitoring and traffic engineering
  – Sensor networks, RFID tags
  – Telecom call records
  – Financial applications
  – Web logs and click-streams
  – Manufacturing processes
DSMS = Data Stream Management System

3 DBMS versus DSMS
DBMS:
  – Persistent relations
  – One-time queries
  – Random access
  – Access plan determined by query processor and physical DB design
DSMS:
  – Transient streams (and persistent relations)
  – Continuous queries
  – Sequential access
  – Unpredictable data characteristics and arrival patterns

4 The (Simplified) Big Picture
[Figure: a DSMS with a Scratch Store, an Archive, and Stored Relations; labels: Input streams, Register Query, Streamed Result, Stored Result]

5 (Simplified) Network Monitoring
[Figure: a DSMS with a Scratch Store, an Archive, and Lookup Tables; labels: Network measurements, Packet traces, Register Monitoring Queries, Intrusion Warnings, Online Performance Metrics]

6 The STREAM System
Data streams and stored relations
SQL-based language for registering continuous queries
Variety of query execution strategies
Textual, graphical, and application interfaces
Relational, centralized

7 Rest of This Lecture
Query language
System issues and overview (brief)
Live system demonstration

8 Goals in Language Design
1) Support continuous queries over multiple streams and updateable relations
2) Exploit existing relational semantics to the extent possible
3) Easy queries should be easy to write
4) Simple queries should do what you expect

9 Example Query 1
Two streams, contrived for ease of examples:
  Orders (orderID, customer, cost)
  Fulfillments (orderID, clerk)
Total cost of orders fulfilled over the last day by clerk “Sue” for customer “Joe”:
  Select Sum(O.cost)
  From Orders O, Fulfillments F [Range 1 Day]
  Where O.orderID = F.orderID And F.clerk = “Sue” And O.customer = “Joe”

10 Example Query 2
Using a 10% sample of the Fulfillments stream, take the 5 most recent fulfillments for each clerk and return the maximum cost:
  Select F.clerk, Max(O.cost)
  From Orders O, Fulfillments F [Partition By clerk Rows 5] 10% Sample
  Where O.orderID = F.orderID
  Group By F.clerk

11 Next
Formal definitions for relations and streams
Formal conversions between them
Abstract semantics
Concrete language: CQL
Syntactic defaults and shortcuts
Equivalence-based transformations

12 Relations and Streams
Assume global, discrete, ordered time domain
Relation
  – Maps time T to set-of-tuples R
Stream
  – Set of (tuple, timestamp) elements

13 Conversions
[Figure: conversions between Streams and Relations; Streams to Relations: window specification; Relations to Streams: special operators Istream, Dstream, Rstream; Relations to Relations: any relational query language]

14 Conversion Definitions
Stream-to-relation
  – S [W] is a relation: at time T it contains all tuples in window W applied to stream S up to T
  – When W = ∞, it contains all tuples in stream S up to T
Relation-to-stream
  – Istream(R) contains all (r, T) where r ∈ R at time T but r ∉ R at time T–1
  – Dstream(R) contains all (r, T) where r ∈ R at time T–1 but r ∉ R at time T
  – Rstream(R) contains all (r, T) where r ∈ R at time T
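
The conversion definitions above can be sketched in a few lines of Python. This is an illustration only, not STREAM code: the function names are hypothetical, a stream is assumed to be a list of (tuple, timestamp) pairs in timestamp order, and a relation is assumed to be a multiset (a `Counter`) rather than a plain set so that duplicate tuples survive.

```python
from collections import Counter

# Assumption: a stream is a list of (tuple, timestamp) pairs in timestamp order,
# and a relation is a function from a time t to a multiset of tuples.

def window_unbounded(stream, t):
    """S [∞] at time t: every tuple of S with timestamp <= t."""
    return Counter(tup for tup, ts in stream if ts <= t)

def window_rows(stream, t, n):
    """S [Rows n] at time t: the n most recent tuples with timestamp <= t (n > 0)."""
    arrived = [tup for tup, ts in stream if ts <= t]
    return Counter(arrived[-n:])

def istream(rel, t):
    """Tuples in R at time t that were not in R at time t-1."""
    return rel(t) - rel(t - 1)

def dstream(rel, t):
    """Tuples in R at time t-1 that are no longer in R at time t."""
    return rel(t - 1) - rel(t)

def rstream(rel, t):
    """All tuples in R at time t."""
    return rel(t)

S = [(("a",), 1), (("b",), 2), (("c",), 3)]

def last_two(t):
    return window_rows(S, t, 2)

inserted = istream(last_two, 3)   # ("c",) entered the window
deleted = dstream(last_two, 3)    # ("a",) slid out of the window
current = rstream(last_two, 3)    # ("b",) and ("c",)
```

At time 3 the two-row window slides from {a, b} to {b, c}, so Istream reports the insertion of c, Dstream the deletion of a, and Rstream the whole window.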

15 Abstract Semantics
Take any relational query language
Can reference streams in place of relations
  – But must convert to relations using any window specification language (default window = [∞])
Can convert relations to streams
  – For streamed results
  – For windows over relations (note: converts back to relation)

16 Query Result at Time T
Use all relations at time T
Use all streams up to T, converted to relations
Compute relational result
Convert result to streams if desired

17 Abstract Semantics – Example 1
  Select F.clerk, Max(O.cost)
  From O [∞], F [Rows 1000]
  Where O.orderID = F.orderID
  Group By F.clerk
Maximum-cost order fulfilled by each clerk in last 1000 fulfillments

18 Abstract Semantics – Example 1
  Select F.clerk, Max(O.cost)
  From O [∞], F [Rows 1000]
  Where O.orderID = F.orderID
  Group By F.clerk
At time T: entire stream O and last 1000 tuples of F as relations
Evaluate query, update result relation at T

19 Abstract Semantics – Example 1
  Select Istream(F.clerk, Max(O.cost))
  From O [∞], F [Rows 1000]
  Where O.orderID = F.orderID
  Group By F.clerk
At time T: entire stream O and last 1000 tuples of F as relations
Evaluate query, update result relation at T
Streamed result: new element ((clerk, max), T) whenever (clerk, max) changes from T–1
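
As a rough illustration of how Example 1 is evaluated at each time T, the sketch below recomputes the result relation from scratch and then applies Istream. The function names and sample data are hypothetical, the naive recomputation is chosen for clarity rather than efficiency, and the result relation is modelled as a Python set of (clerk, max) pairs.

```python
def result_at(orders, fulfillments, t, rows=1000):
    """Result relation at time t: O [∞] joined with F [Rows rows], grouped by clerk."""
    o_rel = [tup for tup, ts in orders if ts <= t]                # O [∞]
    f_rel = [tup for tup, ts in fulfillments if ts <= t][-rows:]  # F [Rows rows], rows > 0
    best = {}
    for order_id, _customer, cost in o_rel:
        for f_order_id, clerk in f_rel:
            if order_id == f_order_id:
                best[clerk] = max(cost, best.get(clerk, cost))
    return set(best.items())

def istream_result(orders, fulfillments, t, rows=1000):
    """Streamed result: (clerk, max) pairs present at t but not at t-1."""
    return result_at(orders, fulfillments, t, rows) - result_at(orders, fulfillments, t - 1, rows)

# Hypothetical data: Orders (orderID, customer, cost), Fulfillments (orderID, clerk).
orders = [((1, "Joe", 10), 1), ((2, "Ann", 30), 2)]
fulfillments = [((1, "Sue"), 2), ((2, "Sue"), 4)]

emitted = {t: istream_result(orders, fulfillments, t) for t in range(1, 5)}
```

With this data nothing is emitted at times 1 and 3; ("Sue", 10) is emitted at time 2 and ("Sue", 30) at time 4, when Sue's maximum changes.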

20 Abstract Semantics – Example 2
Relation CurPrice(stock, price)
  Select stock, Avg(price)
  From Istream(CurPrice) [Range 1 Day]
  Group By stock
Average price over last day for each stock

21 Abstract Semantics – Example 2
Relation CurPrice(stock, price)
  Select stock, Avg(price)
  From Istream(CurPrice) [Range 1 Day]
  Group By stock
Istream provides history of CurPrice
Window on history, back to relation, group and aggregate
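
The chain in Example 2 (relation to stream via Istream, stream back to relation via a time-based window, then group and aggregate) might look as follows in Python. Everything here is an assumption made for illustration: timestamps are integer seconds, the relation is given as snapshots at the times it changes, and the Range window is taken to include both endpoints.

```python
from collections import defaultdict

DAY = 24 * 60 * 60  # assumption: timestamps are integer seconds

def istream_of(snapshots):
    """Istream over a relation given as {time: set of (stock, price)} at its change points."""
    stream, previous = [], set()
    for t in sorted(snapshots):
        for tup in sorted(snapshots[t] - previous):
            stream.append((tup, t))  # tuple newly present at time t
        previous = snapshots[t]
    return stream

def avg_price_last_day(history, t):
    """Select stock, Avg(price) From history [Range 1 Day] Group By stock, at time t."""
    prices = defaultdict(list)
    for (stock, price), ts in history:
        if t - DAY <= ts <= t:  # assumed inclusive window bounds
            prices[stock].append(price)
    return {stock: sum(p) / len(p) for stock, p in prices.items()}

# Hypothetical CurPrice history: the price of ACME changes three times.
cur_price = {0: {("ACME", 10.0)}, 3600: {("ACME", 14.0)}, 2 * DAY: {("ACME", 20.0)}}
history = istream_of(cur_price)

after_one_hour = avg_price_last_day(history, 3600)
after_two_days = avg_price_last_day(history, 2 * DAY)
```

After one hour both early prices are in the window, so the average is 12.0; after two days only the latest price remains.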

22 Concrete Language – CQL
Relational query language: SQL
Window specification language derived from SQL-99
  – Tuple-based windows
  – Time-based windows
  – Partitioned windows
Simple “X% Sample” construct
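
Of the three window kinds listed above, the partitioned window is the least obvious, so here is a small Python sketch of `[Partition By clerk Rows n]`. The helper name, the `key` parameter, and the sample data are hypothetical; the idea shown is simply that a separate tuple-based window is kept for each partition value.

```python
from collections import defaultdict, deque

def partitioned_rows_window(stream, t, key, n):
    """S [Partition By key Rows n] at time t: the n most recent tuples per key value."""
    parts = defaultdict(lambda: deque(maxlen=n))  # one bounded window per partition
    for tup, ts in stream:
        if ts <= t:
            parts[key(tup)].append(tup)  # the oldest tuple in a full partition is dropped
    return [tup for part in parts.values() for tup in part]

# Hypothetical Fulfillments (orderID, clerk) stream.
F = [((1, "Sue"), 1), ((2, "Bob"), 2), ((3, "Sue"), 3), ((4, "Sue"), 4)]

window = sorted(partitioned_rows_window(F, 4, key=lambda f: f[1], n=2))
```

At time 4 the window holds Bob's only fulfillment and Sue's two most recent ones; Sue's first fulfillment has been pushed out.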

23 CQL (cont’d)
Syntactic shortcuts and defaults
  – So easy queries are easy to write and simple queries do what you expect
Equivalences
  – Basis for query-rewrite optimizations
  – Includes all relational equivalences, plus new stream-based ones
Examples: already seen some, more coming up

24 Shortcuts and Defaults
Prevalent stream-relation conversions can make some queries cumbersome
  – Easy queries should be easy to write
Two defaults:
  – Omitted window: default [∞]
  – Omitted relation-to-stream operator: default Istream operator on:
      Monotonic outermost queries
      Monotonic subqueries with windows

25 The Simplest CQL Query
  Select * From Strm
Had better return Strm (it does)
  – Default [∞] window for Strm
  – Default Istream for result

26 Simple Join Query
  Select * From Strm, Rel Where Strm.A = Rel.B
Default [∞] window on Strm, but often want Now window for stream-relation joins
  Select Istream(O.orderID, A.City)
  From Orders O, AddressRel A
  Where O.custID = A.custID

27 Simple Join Query
  Select * From Strm, Rel Where Strm.A = Rel.B
Default [∞] window on Strm, but often want Now window for stream-relation joins
  Select Istream(O.orderID, A.City)
  From Orders O [∞], AddressRel A
  Where O.custID = A.custID

28 Simple Join Query
  Select * From Strm, Rel Where Strm.A = Rel.B
Default [∞] window on Strm, but often want Now window for stream-relation joins
  Select Istream(O.orderID, A.City)
  From Orders O [Now], AddressRel A
  Where O.custID = A.custID
We decided against a separate default
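
One way to see why a Now window is often the better fit for a stream-relation join is to let the relation change after an order has arrived. The Python sketch below uses hypothetical data (customer 7 moves from Oslo to Rome at time 2) and hypothetical helper names; the result relation is modelled as a set.

```python
def address_rel(t):
    """Hypothetical AddressRel (custID, City): customer 7 moves at time 2."""
    return {(7, "Oslo")} if t < 2 else {(7, "Rome")}

orders = [((1, 7), 1)]  # Orders (orderID, custID): order 1 arrives at time 1

def unbounded(t):
    """Orders [∞] at time t."""
    return [o for o, ts in orders if ts <= t]

def now(t):
    """Orders [Now] at time t: only the tuples that arrive exactly at t."""
    return [o for o, ts in orders if ts == t]

def join(window, t):
    """Select O.orderID, A.City where O.custID = A.custID, evaluated at time t."""
    return {(oid, city) for oid, cid in window(t)
            for a_cid, city in address_rel(t) if cid == a_cid}

def istream(window, t):
    """Join results present at time t but not at time t-1."""
    return join(window, t) - join(window, t - 1)

unbounded_out = {t: istream(unbounded, t) for t in (1, 2)}
now_out = {t: istream(now, t) for t in (1, 2)}
```

Both variants emit (1, "Oslo") at time 1. With the unbounded window the address change at time 2 re-joins the old order and emits (1, "Rome"); with the Now window the old order is no longer in the window, so nothing more is emitted.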

29 Equivalences and Transformations
All relational equivalences apply to all relational constructs directly
  – Queries are highly relational
Two new transformations
  – Window reduction
  – Filter-window commutativity

30 Window Reduction
  Select Istream(L) From S [∞] Where C
is equivalent to
  Select Rstream(L) From S [Now] Where C
Question for class: Why Rstream and not Istream in second query?
Answer: Consider a stream in which the same tuple arrives at consecutive timestamps

31 Window Reduction (cont’d)
  Select Istream(L) From S [∞] Where C
is equivalent to
  Select Rstream(L) From S [Now] Where C
First query form is very common due to defaults
In a naïve implementation second form is much more efficient
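
The window-reduction equivalence can be checked on a small example. The Python sketch below assumes multiset (bag) relations, modelled with `Counter`, and uses hypothetical helper names. It compares the two query forms on a stream that delivers the same tuple at three consecutive timestamps, and also shows what goes wrong if Istream is kept with the Now window.

```python
from collections import Counter

def unbounded(stream, t):
    """S [∞] at time t, as a multiset."""
    return Counter(tup for tup, ts in stream if ts <= t)

def now(stream, t):
    """S [Now] at time t, as a multiset."""
    return Counter(tup for tup, ts in stream if ts == t)

def select_where(window, stream, cond):
    """Select * From S [window] Where cond, as a function of time."""
    return lambda t: Counter({tup: n for tup, n in window(stream, t).items() if cond(tup)})

def istream(rel, t):
    return rel(t) - rel(t - 1)

def rstream(rel, t):
    return rel(t)

S = [(("x",), 1), (("x",), 2), (("x",), 3)]  # the same tuple at consecutive times

def always(tup):
    return True

first_form = [istream(select_where(unbounded, S, always), t) for t in (1, 2, 3)]
second_form = [rstream(select_where(now, S, always), t) for t in (1, 2, 3)]
istream_over_now = [istream(select_where(now, S, always), t) for t in (1, 2, 3)]
```

Both equivalent forms emit one ("x",) at every timestamp. Istream over the Now window emits it only at time 1, because the window contents at times 2 and 3 are identical to the previous instant. The second form also needs only the current tuples as state, which is why a naive implementation evaluates it more cheaply.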

32 Filter-Window Commutativity
Another question for class: When is
  Select L From S [window] Where C
equivalent to
  Select L From (Select L From S Where C) [window]
Is this transformation always advantageous?
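
A small experiment can help with the question above. The Python sketch below, with hypothetical helper names and data, applies the filter before and after a tuple-based window and a time-based window. It is offered as one illustrative case, not as the full answer the lecture had in mind.

```python
def rows_window(stream, t, n):
    """S [Rows n] at time t (n > 0): the n most recent values."""
    return [v for v, ts in stream if ts <= t][-n:]

def range_window(stream, t, width):
    """S [Range width] at time t, with assumed inclusive bounds."""
    return [v for v, ts in stream if t - width <= ts <= t]

def keep(v):
    """The filter condition C (hypothetical): keep values of at least 10."""
    return v >= 10

S = [(5, 1), (20, 2), (7, 3), (30, 4)]  # (value, timestamp) pairs
filtered_S = [(v, ts) for v, ts in S if keep(v)]

# Tuple-based window: filtering first changes which tuples count as the last two.
rows_filter_after = [v for v in rows_window(S, 4, 2) if keep(v)]
rows_filter_before = rows_window(filtered_S, 4, 2)

# Time-based window: membership depends only on timestamps, so the order is irrelevant.
range_filter_after = [v for v in range_window(S, 4, 2) if keep(v)]
range_filter_before = range_window(filtered_S, 4, 2)
```

In this example the two orders agree for the time-based window but disagree for the tuple-based one, since removing tuples before a Rows window lets older tuples back in.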

33 Constraint-Based Transformations
Recall first example query (simplified)
  Select Sum(O.cost)
  From Orders O, Fulfillments F [Range 1 Day]
  Where O.orderID = F.orderID
If orders always fulfilled within one week
  Select Sum(O.cost)
  From Orders O [Range 8 Days], Fulfillments F [Range 1 Day]
  Where O.orderID = F.orderID
Useful constraints: keys, stream referential integrity, clustering, ordering

34 STREAM System
First challenge: basic functionality from scratch
Next steps: cope with
  – Stream rates that may be high, variable, bursty
  – Stream data that may be unpredictable, variable
  – Continuous query loads that may be high, variable
→ Overload
→ Changing conditions

35 System Features
Aggressive sharing of state and computation among registered queries
Careful resource allocation and use
Continuous self-monitoring and reoptimization
Graceful approximation as necessary

36 Query Execution
When a continuous query is registered, generate a query plan
  – New plan merged with existing plans
  – Users can also create & manipulate plans directly
Plans composed of three main components:
  – Operators
  – Queues (input and inter-operator)
  – State (windows, operators requiring history)
Global scheduler for plan execution
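
To make the three plan components concrete, here is a toy Python sketch of operators connected by queues, one operator holding window state, and a round-robin scheduler. All class and function names are hypothetical, and the scheduling policy is an assumption chosen for brevity; it is not a description of how STREAM itself is implemented.

```python
from collections import deque

class Select:
    """Stateless filter operator: reads from one queue, writes to another."""
    def __init__(self, predicate, inp, out):
        self.predicate, self.inp, self.out = predicate, inp, out

    def run(self):
        """Process at most one queued element; return True if work was done."""
        if not self.inp:
            return False
        value, ts = self.inp.popleft()
        if self.predicate(value):
            self.out.append((value, ts))
        return True

class RowsMax:
    """Stateful operator: keeps the last n values and emits their maximum."""
    def __init__(self, n, inp, out):
        self.window = deque(maxlen=n)  # the operator's state
        self.inp, self.out = inp, out

    def run(self):
        if not self.inp:
            return False
        value, ts = self.inp.popleft()
        self.window.append(value)
        self.out.append((max(self.window), ts))
        return True

def schedule(operators):
    """Global scheduler (round-robin): run until no operator has queued input."""
    progressed = True
    while progressed:
        progressed = False
        for op in operators:
            progressed = op.run() or progressed

source = deque([(3, 1), (12, 2), (8, 3), (15, 4)])  # input queue
middle, sink = deque(), deque()                      # inter-operator and output queues
plan = [Select(lambda v: v > 5, source, middle), RowsMax(2, middle, sink)]
schedule(plan)
```

After the scheduler drains the queues, `sink` holds the running maximum over the last two values that passed the filter.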

37 Simple Query Plan
[Figure: a plan serving queries Q1 and Q2 over Stream 1, Stream 2, and Stream 3, with two join (⋈) operators, State 1 through State 4, and the Scheduler]

38 System Status
System is “complete”
  – 30,000 lines of C++ and Java
  – Multiple Ph.D. theses, undergrad and MS projects
Source is available and system is being used
Can also use system over internet

39 Stream System Benchmark: “Linear Road”
Linear City: 10 Expressways, 100 segments of 1 mile each
Main Input Stream: Car Locations (CarLocStr), reports every 30 seconds
CarLocStr columns: car_id, speed, exp_way, lane, x_pos
Surviving sample values: (Right), (Ramp), 4539, …

40 STREAM System Demo
Incoming data streams
Continuous queries executing over streams
Query plan visualizer
System monitoring via “introspection” queries
Benchmark execution