Triggers and Streams Zachary G. Ives University of Pennsylvania CIS 650 – Database & Information Systems March 28, 2005.

Presentation transcript:


2 Administrivia
- Midterms returned: overall, pretty good!
  - Low 78, high 96
  - Remember you can revise & resubmit for up to 20% back
- Tomorrow, L101, 3PM:
  - Krishna Gummadi, U. Wash., Measurement-driven Modeling and Design of Internet-scale Systems
- For Wednesday:
  - Retrospective on Aurora
  - Compare it to STREAM
- Thursday, L101, 3PM:
  - Muthian Sivathanu, U. Wisc., Semantically Smart Disk Systems

3 Today’s Trivia Question

4 Making Databases Active
- Thus far we’ve seen databases treated as static data and direct updates
- But…
  - … What if we want to “trap” an update and cause it to perform other updates?
    - More generally than incremental view maintenance
    - Example: on deletion of a department, delete all entries saying that a student is within the department
    - Or the insertion of a new item should result in an entry in a log
  - … What if we want to respond to some notion of events in the system?

5 “Active” Databases: Triggers
- Basic idea: define rules that
  1. Trap events
  2. Have access to the data associated with the event, plus the “old” state of the database
  3. Can test certain conditions
  4. Can apply operations to the system
- What’s an event?
  - In relational databases: insert, delete, update
  - In general: any operation that can be trapped
    - I/O events, interrupts, signals, …

6 Triggers in SQL (based on Starburst)
- Can trap updates before or after they occur:

    CREATE TRIGGER t BEFORE UPDATE ON mytable
    REFERENCING OLD AS oldrow NEW AS newrow
    REFERENCING OLD_TABLE AS oldtable
    REFERENCING NEW_TABLE AS currenttable
    FOR EACH STATEMENT
    WHEN (oldrow.salary < newrow.salary)
    BEGIN ATOMIC
      SET increasedRecently = true
    END

- Row variables work just like you’d expect; table variables can be used for querying, but not updating
- Can have recursive triggers!
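The salary trigger above can be tried out in a runnable form. The following is a minimal sketch using SQLite through Python's standard sqlite3 module, not the Starburst-style syntax on the slide: SQLite only supports row-level triggers and names the row variables OLD and NEW. The table emp, its columns, and the function name are assumptions made up for the illustration.

```python
import sqlite3

def run_salary_trigger_demo():
    """Create a row-level trigger that flags rows whose salary was raised."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        -- Hypothetical table; not part of the original slides.
        CREATE TABLE emp (
            id INTEGER PRIMARY KEY,
            salary INTEGER,
            increasedRecently INTEGER DEFAULT 0
        );
        -- Fires once per updated row, only when the salary went up.
        CREATE TRIGGER mark_raise AFTER UPDATE OF salary ON emp
        FOR EACH ROW WHEN OLD.salary < NEW.salary
        BEGIN
            UPDATE emp SET increasedRecently = 1 WHERE id = NEW.id;
        END;
        INSERT INTO emp (id, salary) VALUES (1, 100), (2, 100);
    """)
    conn.execute("UPDATE emp SET salary = 120 WHERE id = 1")  # a raise
    conn.execute("UPDATE emp SET salary = 90 WHERE id = 2")   # a cut
    rows = conn.execute(
        "SELECT id, increasedRecently FROM emp ORDER BY id"
    ).fetchall()
    conn.close()
    return rows
```

Only the row whose salary increased ends up flagged; the WHEN clause plays the role of the "test certain conditions" step.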

7 Why Triggers Are Useful
1. Error validation
   - Can define a SIGNAL that induces an SQL error and aborts the operation
2. View updates
   - Can cause updates to a view to be propagated back to base relations (in some DBMSs)
3. Cascading updates, logging
   - SQL DDL’s ON DELETE CASCADE is basically a trigger
   - In some systems, can also trigger arbitrary updates
     - e.g., the TriggerMan system
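Two of these uses can be sketched in SQLite via Python's sqlite3 module. This is an illustration under assumptions: SQLite has no SIGNAL statement, so the sketch substitutes its RAISE(ABORT, ...) function, and the tables dept, enrolled, and emp are hypothetical, loosely following the department/student example from the earlier slide.

```python
import sqlite3

def run_cascade_and_validation_demo():
    """Show ON DELETE CASCADE and a validation trigger that aborts an insert."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
    conn.executescript("""
        CREATE TABLE dept (name TEXT PRIMARY KEY);
        CREATE TABLE enrolled (
            student TEXT,
            dept TEXT REFERENCES dept(name) ON DELETE CASCADE
        );
        CREATE TABLE emp (id INTEGER PRIMARY KEY, salary INTEGER);
        -- Validation: reject the insert by raising an error (stand-in for SIGNAL).
        CREATE TRIGGER no_negative_salary BEFORE INSERT ON emp
        WHEN NEW.salary < 0
        BEGIN
            SELECT RAISE(ABORT, 'salary must be non-negative');
        END;
        INSERT INTO dept VALUES ('CIS'), ('EE');
        INSERT INTO enrolled VALUES ('ann', 'CIS'), ('bob', 'EE');
    """)
    # Deleting the department removes the students enrolled in it.
    conn.execute("DELETE FROM dept WHERE name = 'CIS'")
    remaining = conn.execute("SELECT student FROM enrolled").fetchall()
    try:
        conn.execute("INSERT INTO emp VALUES (1, -5)")
        rejected = False
    except sqlite3.DatabaseError:
        rejected = True
    conn.close()
    return remaining, rejected
```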

8 TriggerMan
- Goal: support large-scale use of triggers that can do arbitrary SQL when events occur

    while (process time remains and work to do) {
      Get a task from queue and execute:
        Process a token against rules, OR
        Run a rule action, OR
        Process a token against conditions, OR
        Process a token to run set of actions
      Yield to other tasks
    }
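The driver loop above might be sketched as follows. This is only an illustration of the "run tasks while time remains" idea, not TriggerMan's actual code: tasks are modeled as plain Python callables, and the names run_tasks, time_budget_s, and clock are assumptions. Returning from the function stands in for yielding to other work.

```python
import time
from collections import deque

def run_tasks(task_queue, time_budget_s, clock=time.monotonic):
    """Run queued tasks until the time budget is used up or the queue is empty.

    task_queue is a deque of zero-argument callables (hypothetical task model).
    Returns the results of the tasks that ran; unfinished tasks stay queued.
    """
    start = clock()
    results = []
    while task_queue and clock() - start < time_budget_s:
        task = task_queue.popleft()   # get a task from the queue
        results.append(task())        # execute it (rule test, rule action, ...)
    return results                    # returning lets the caller do other work
```

Passing the clock as a parameter is a design choice made here so the loop can be exercised with a fake clock instead of real elapsed time.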

9 Making It Scale
- Define an expression signature corresponding to the ON/WHEN conditions, normalized to CNF
  - The signatures shouldn’t have constants, only tree structures
- Define a trigger cache to keep most recently used triggers around
- Each update gets an update descriptor with op type, data source, old & new tuples
- Define a predicate index to quickly find all predicates matching an update descriptor
  - For each trigger, only index its most selective predicate
  - A constant table is used to list the different constants that need to be tested against, for equality tests
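To make the signature / constant-table split concrete, here is a small sketch restricted to equality predicates of the form attribute = constant. It is a simplification under assumptions, not TriggerMan's data structure: the signature is reduced to a (source, op, attribute) triple, and the class and method names are invented for the example.

```python
from collections import defaultdict

class PredicateIndex:
    """Toy predicate index: signature -> constant table -> trigger ids."""

    def __init__(self):
        # The signature holds no constants; constants live in a per-signature table.
        self._by_signature = defaultdict(lambda: defaultdict(set))

    def add_trigger(self, trigger_id, source, op, attribute, constant):
        """Index a trigger by its (assumed most selective) equality predicate."""
        signature = (source, op, attribute)
        self._by_signature[signature][constant].add(trigger_id)

    def match(self, source, op, new_tuple):
        """Return ids of triggers whose predicate matches an update descriptor."""
        matched = set()
        for (sig_source, sig_op, attribute), constants in self._by_signature.items():
            if (sig_source, sig_op) != (source, op):
                continue
            value = new_tuple.get(attribute)
            if value in constants:        # one hash lookup per signature
                matched |= constants[value]
        return matched
```

The point of the design is that many triggers differing only in their constant share one signature, so matching costs one lookup per signature instead of one test per trigger.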

10 Data Structures in Detail

11 Active Databases
- An interesting way to program fairly complex operations and embed them into a DBMS
  - … Especially true if we can run user-defined functions!
- Why don’t we see full-fledged apps written this way? It is Turing-complete…

12 A Variation on the Model: Streams
- An interesting class of applications exists where data is constantly changing, and we want to update our output accordingly
  - Publish-subscribe systems
  - Stock tickers, news headlines
  - Data acquisition, e.g., from sensors, traffic monitoring, …
- In general, we want “live” output based on changing input
- This has been called many things: pub/sub, continuous queries, …
- In general, these have been eclipsed by the term “stream processing”

13 What’s a Stream, and What Do We Do with It?
- A stream is a time-varying series of values of a particular data type
  - In STREAM, they consider instead a set of values with timestamps: how does this differ?
- What kinds of operations might we perform over changing data?
  - Aggregation:
    - Over a time window, or a series of values
    - Last value for each key
    - Some combination thereof
  - Joins
    - … But over what?
- What about approximation? Why might that be useful?
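Two of the aggregation styles listed above can be sketched in a few lines. This is an illustrative sketch only: it assumes readings arrive as (timestamp, value) pairs in nondecreasing timestamp order and that range_s is positive, and the function names are made up for the example.

```python
from collections import deque

def windowed_averages(readings, range_s):
    """Yield (timestamp, average of values seen in the last range_s seconds)."""
    window = deque()
    total = 0.0
    for ts, value in readings:
        window.append((ts, value))
        total += value
        # Evict readings that have fallen out of the time window.
        while window[0][0] <= ts - range_s:
            _, old_value = window.popleft()
            total -= old_value
        yield ts, total / len(window)

def last_value_per_key(events):
    """Return the most recent value seen for each key."""
    latest = {}
    for key, value in events:
        latest[key] = value
    return latest
```

The state kept by windowed_averages is bounded by the window contents, which is what makes the aggregate computable over an unbounded input.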

14 STREAM’s Model: the CQL Language
- An attempt to extend SQL to handle streams, not to invent a language from the ground up
  - Thus it’s a bit quirky
- In CQL, everything is built around instantaneous relations, which are time-varying bags of tuples
  - Relation-relation operators (normal SQL)
  - Stream-relation operators (convert to relations)
  - Relation-stream operators (convert instantaneous to streams)
  - No stream-stream operators!

15 Converting between Streams & Relations
- Stream-to-relation operators:
  - Sliding window: tuple-based (last N rows) or time-based (within time range)
  - Partitioned sliding window: does grouping by keys, then does sliding window over that
  - Is this necessary or minimal?
- Relation-to-stream operators:
  - Istream: stream-ifies any insertions over a relation
  - Dstream: stream-ifies the deletes
  - Rstream: stream contains the set of tuples in the relation
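The three relation-to-stream operators can be illustrated by diffing successive snapshots of a relation. A minimal sketch, assuming the time-varying relation is given as a list of (time, rows) snapshots with bag semantics; the function name and input layout are assumptions for the example.

```python
from collections import Counter

def relation_to_streams(snapshots):
    """Derive Istream, Dstream and Rstream from snapshots of a relation.

    snapshots: list of (time, list_of_tuples), one instantaneous relation per time.
    Returns three lists of (time, tuple) stream elements.
    """
    istream, dstream, rstream = [], [], []
    previous = Counter()
    for t, rows in snapshots:
        current = Counter(rows)  # a bag: tuple -> multiplicity
        # Istream: tuples present now that were not present before.
        istream += [(t, r) for r in sorted((current - previous).elements())]
        # Dstream: tuples present before that are gone now.
        dstream += [(t, r) for r in sorted((previous - current).elements())]
        # Rstream: the entire current relation, re-emitted at every instant.
        rstream += [(t, r) for r in sorted(current.elements())]
        previous = current
    return istream, dstream, rstream
```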

16 Some Examples
- Select * From S1 [Rows 1000], S2 [Range 2 minutes] Where S1.A = S2.A And S1.A > 10
- Select Rstream(S.A, R.B) From S [Now], R Where S.A = R.A
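The second query can be read as "for each arriving S element, look up its matches in R and emit them". The sketch below follows that reading under a simplifying assumption that R does not change while the stream is processed; the input layouts and the name now_join are invented for the example.

```python
from collections import defaultdict

def now_join(stream, relation):
    """Sketch of: Select Rstream(S.A, R.B) From S [Now], R Where S.A = R.A

    stream: iterable of (timestamp, {"A": ...}) elements of S.
    relation: list of {"A": ..., "B": ...} rows of R (assumed static here).
    Returns output stream elements as (timestamp, S.A, R.B).
    """
    # Hash R on the join attribute once, since it is assumed not to change.
    b_values_by_a = defaultdict(list)
    for row in relation:
        b_values_by_a[row["A"]].append(row["B"])
    output = []
    for ts, s in stream:
        # S [Now] holds only the elements with the current timestamp, so each
        # arriving element is joined and emitted at its own timestamp.
        for b in b_values_by_a.get(s["A"], []):
            output.append((ts, s["A"], b))
    return output
```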

17 Building a Stream System
- Basic data item is the element, which carries an op ∈ {+, -}
- Query plans need a few new (?) items:
  - Queues
    - Used for hooking together operators, esp. over windows
    - (Assumption is that pipelining is generally not possible, and we may need to drop some tuples from the queue)
  - Synopses
    - The intermediate state an operator needs to carry around
    - Note that this is usually bounded by windows
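A tuple-based window operator shows how elements and synopses fit together. This is a sketch under assumptions: the element layout (op, timestamp, row) is a guess made for the illustration, and the class name and interface are hypothetical rather than STREAM's.

```python
from collections import deque, namedtuple

# Assumed element layout: an op of "+" (insert) or "-" (delete), a timestamp, a row.
Element = namedtuple("Element", ["op", "timestamp", "row"])

class RowWindowOperator:
    """Keeps the last `size` rows as its synopsis and emits +/- elements."""

    def __init__(self, size):
        self.size = size
        self.synopsis = deque()  # intermediate state, bounded by the window size

    def process(self, timestamp, row):
        """Consume one stream row; return the elements to place on the output queue."""
        out = [Element("+", timestamp, row)]
        self.synopsis.append(row)
        if len(self.synopsis) > self.size:
            # The oldest row leaves the window, so downstream sees a deletion.
            out.append(Element("-", timestamp, self.synopsis.popleft()))
        return out
```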

18 Example Query Plan
- What’s different here?

19 Some Tricks for Performance
- Sharing synopses across multiple operators
  - In a few cases, more than one operator may join with the same synopsis
- Can exploit punctuations or “k-constraints”
  - Analogous to interesting orders
  - Referential integrity k-constraint: bound of k between arrival of “many” element and its corresponding “one” element
  - Ordered-arrival k-constraint: need window of at most k to sort
  - Clustered-arrival k-constraint: bound on distance between items with same grouping attributes
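The ordered-arrival case can be sketched with a bounded heap. The sketch assumes one particular reading of the constraint: any element arriving at least k+1 positions after an element s has a timestamp no smaller than s's. Under that assumption a buffer of k+1 elements is enough to emit the stream in sorted order; the function name is made up for the example.

```python
import heapq

def sort_with_k_constraint(timestamps, k):
    """Yield timestamps in sorted order using a buffer of at most k+1 items.

    Assumes an ordered-arrival k-constraint: anything arriving k+1 or more
    positions after an element is not smaller than that element.
    """
    heap = []
    for ts in timestamps:
        heapq.heappush(heap, ts)
        if len(heap) > k:
            # With k+1 items buffered, nothing still to arrive can be
            # smaller than the current minimum, so it is safe to emit.
            yield heapq.heappop(heap)
    while heap:  # flush the buffer at end of input
        yield heapq.heappop(heap)
```

Without the constraint, sorting an unbounded stream would need unbounded state; the constraint is what bounds the synopsis.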

20 Query Processing: “Chain Scheduling”
- Similar in many ways to eddies
- May decide to apply operators as follows:
  - Assume we know how many tuples can be processed in a time unit
  - Cluster groups of operators into “chains” that maximize reduction in queue size per unit time
  - Greedily forward tuples into the most selective chain
  - Within a chain, process in FIFO order
- They also do a form of join reordering
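The clustering step can be sketched for a single operator path. The sketch assumes each operator is described by a per-tuple processing time (positive) and a selectivity, and it groups operators greedily by picking, from the current position, the run of operators that gives the steepest drop in data size per unit time. This is a simplified reading of the idea, and the function name and input layout are assumptions.

```python
def build_chains(operators):
    """Group the operators of one path into chains.

    operators: list of (time_per_tuple, selectivity), in pipeline order.
    Returns a list of chains, each a list of operator indices.
    """
    # Progress points: (cumulative time, fraction of data remaining).
    points = [(0.0, 1.0)]
    for cost, selectivity in operators:
        t, s = points[-1]
        points.append((t + cost, s * selectivity))

    chains = []
    i = 0
    while i < len(operators):
        t_i, s_i = points[i]
        # Choose the end point giving the largest size reduction per unit time.
        best_j = max(
            range(i + 1, len(points)),
            key=lambda j: (s_i - points[j][1]) / (points[j][0] - t_i),
        )
        chains.append(list(range(i, best_j)))  # operators i .. best_j-1
        i = best_j
    return chains
```

For example, a mildly selective operator followed by a highly selective one ends up in the same chain, because running both together shrinks the queue faster per unit time than running the first alone.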

21 Scratching the Surface: Approximation
- They point out two areas where we might need to approximate output:
  - CPU is limited, and we need to drop some stream elements according to some probabilistic metric
    - Collect statistics via a profiler
    - Use Hoeffding inequality to derive a sampling rate in order to maintain a confidence interval
  - May need to do similar things if memory usage is a constraint
- Are there other options? When might they be useful?
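One way the Hoeffding step could work for an average over bounded values: the inequality gives P(|sample mean - true mean| >= epsilon) <= 2 exp(-2 n epsilon^2 / range^2), so requiring that bound to be at most delta yields n >= range^2 ln(2/delta) / (2 epsilon^2). The sketch below turns that into a sampling rate. It is an illustration under these assumptions, not necessarily the procedure used in STREAM, and the function names are invented.

```python
import math

def hoeffding_sample_size(value_range, epsilon, delta):
    """Samples needed so the sample mean is within epsilon of the true mean
    with probability at least 1 - delta, for values spanning value_range."""
    return math.ceil(value_range ** 2 * math.log(2 / delta) / (2 * epsilon ** 2))

def sampling_rate(tuples_per_window, value_range, epsilon, delta):
    """Fraction of a window's tuples to keep; 1.0 means no tuple can be dropped."""
    needed = hoeffding_sample_size(value_range, epsilon, delta)
    return min(1.0, needed / tuples_per_window)
```

With values in a range of width 1, epsilon = 0.1 and delta = 0.05, the bound asks for 185 samples, so a window of 1000 tuples could be sampled at a rate of 0.185.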

22 Next Time
- We’ll see the Aurora project from MIT, Brown, and Brandeis
- It takes a different approach to the query processing aspects of stream processing