Triggers and Streams Zachary G. Ives University of Pennsylvania CIS 650 – Database & Information Systems March 28, 2005.


1 Triggers and Streams Zachary G. Ives University of Pennsylvania CIS 650 – Database & Information Systems March 28, 2005

2 Administrivia
- Midterms returned – overall, pretty good!
  - Low 78, high 96
  - Remember you can revise & resubmit for up to 20% back
- Tomorrow, L101, 3PM:
  - Krishna Gummadi, U. Wash., Measurement-driven Modeling and Design of Internet-scale Systems
- For Wednesday:
  - Retrospective on Aurora
  - Compare it to STREAM
- Thursday, L101, 3PM:
  - Muthian Sivathanu, U. Wisc., Semantically Smart Disk Systems

3 Today’s Trivia Question

4 Making Databases Active
- Thus far we’ve seen databases treated as static data and direct updates
- But…
  - … What if we want to “trap” an update and cause it to perform other updates?
    - More generally than incremental view maintenance
    - Example: on deletion of a department, delete all entries saying that a student is within the department
    - Or the insertion of a new item should result in an entry in a log
  - … What if we want to respond to some notion of events in the system?

5 “Active” Databases: Triggers
- Basic idea: define rules that
  1. Trap events
  2. Have access to the data associated with the event, plus the “old” state of the database
  3. Can test certain conditions
  4. Can apply operations to the system
- What’s an event?
  - In relational databases: insert, delete, update
  - In general: any operation that can be trapped
    - I/O events, interrupts, signals, …

6 Triggers in SQL (based on Starburst)
- Can trap updates before or after they occur:

    CREATE TRIGGER t
    BEFORE UPDATE ON mytable
    REFERENCING OLD AS oldrow NEW AS newrow
    REFERENCING OLD_TABLE AS oldtable
    REFERENCING NEW_TABLE AS currenttable
    FOR EACH STATEMENT
    WHEN (oldrow.salary < newrow.salary)
    BEGIN ATOMIC
      SET increasedRecently = true
    END

- Row variables work just like you’d expect; table variables can be used for querying, but not updating
- Can have recursive triggers!
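The trigger on this slide can be tried out with a rough equivalent in SQLite, driven from Python's standard `sqlite3` module. This is only a sketch under stated assumptions: SQLite's trigger dialect differs from the Starburst-style syntax above (it has implicit `OLD`/`NEW` row variables, no `REFERENCING` clause, and no transition tables), and the table `emp`, its columns, and the trigger name `mark_raise` are hypothetical names chosen for illustration.

```python
import sqlite3


def demo_salary_trigger():
    """Flag rows whose salary went up, using a row-level SQLite trigger."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        -- Hypothetical schema, not taken from the slides.
        CREATE TABLE emp (
            id INTEGER PRIMARY KEY,
            salary INTEGER NOT NULL,
            increased_recently INTEGER NOT NULL DEFAULT 0
        );

        -- Fires once per updated row, only when the salary actually rose.
        -- "UPDATE OF salary" keeps the trigger's own UPDATE from re-firing it.
        CREATE TRIGGER mark_raise
        AFTER UPDATE OF salary ON emp
        FOR EACH ROW
        WHEN OLD.salary < NEW.salary
        BEGIN
            UPDATE emp SET increased_recently = 1 WHERE id = NEW.id;
        END;

        INSERT INTO emp (id, salary) VALUES (1, 100);
        INSERT INTO emp (id, salary) VALUES (2, 100);
        """
    )
    conn.execute("UPDATE emp SET salary = 120 WHERE id = 1")  # a raise
    conn.execute("UPDATE emp SET salary = 90 WHERE id = 2")   # a cut
    rows = conn.execute(
        "SELECT id, salary, increased_recently FROM emp ORDER BY id"
    ).fetchall()
    conn.close()
    return rows
```

Calling `demo_salary_trigger()` should show the flag set only on the employee whose salary increased.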

7 Why Triggers Are Useful
1. Error validation
   - Can define a SIGNAL that induces an SQL error and aborts the operation
2. View updates
   - Can cause updates to a view to be propagated back to base relations (in some DBMSs)
3. Cascading updates, logging
   - SQL DDL’s ON DELETE CASCADE is basically a trigger
   - In some systems, can also trigger arbitrary updates
     - e.g., the TriggerMan system
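Two of these uses, error validation and logging, can be illustrated with a small SQLite sketch run through Python's `sqlite3` module. SQLite has no `SIGNAL` statement; its `RAISE(ABORT, ...)` function plays a similar role here, which is a substitution rather than the mechanism named on the slide. The tables `item` and `item_log` and both trigger names are hypothetical.

```python
import sqlite3


def demo_validation_and_logging():
    """Reject invalid inserts and log valid ones with two triggers."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        -- Hypothetical schema for illustration only.
        CREATE TABLE item (id INTEGER PRIMARY KEY, price INTEGER NOT NULL);
        CREATE TABLE item_log (item_id INTEGER, note TEXT);

        -- Validation: abort the statement when the condition is violated.
        CREATE TRIGGER reject_negative_price
        BEFORE INSERT ON item
        FOR EACH ROW
        WHEN NEW.price < 0
        BEGIN
            SELECT RAISE(ABORT, 'price must be non-negative');
        END;

        -- Logging: every successful insert leaves an entry in the log table.
        CREATE TRIGGER log_insert
        AFTER INSERT ON item
        FOR EACH ROW
        BEGIN
            INSERT INTO item_log VALUES (NEW.id, 'inserted');
        END;
        """
    )
    conn.execute("INSERT INTO item VALUES (1, 10)")
    rejected = False
    try:
        conn.execute("INSERT INTO item VALUES (2, -5)")
    except sqlite3.DatabaseError:
        # The validation trigger aborted the insert.
        rejected = True
    log = conn.execute("SELECT item_id, note FROM item_log").fetchall()
    conn.close()
    return rejected, log
```

The aborted insert leaves no row in `item` and no entry in `item_log`, since the statement is rolled back as a whole.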

8 TriggerMan
- Goal: support large-scale use of triggers that can do arbitrary SQL when events occur

    while (process time remains and work to do) {
      Get a task from queue and execute:
        Process a token against rules, OR
        Run a rule action, OR
        Process a token against conditions, OR
        Process a token to run set of actions
      Yield to other tasks
    }

9 Making It Scale
- Define an expression signature corresponding to the ON/WHEN conditions, normalized to CNF
  - The signatures shouldn’t have constants – only tree structures
- Define a trigger cache to keep most recently used triggers around
- Each update gets an update descriptor with op type, data source, old & new tuples
- Define a predicate index to quickly find all predicates matching an update descriptor
  - For each trigger, only index its most selective predicate
  - A constant table is used to list the different constants that need to be tested against – for equality tests
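The interplay of signatures, constant tables, and update descriptors can be sketched in a few lines of Python. This is a heavily simplified illustration, not TriggerMan's actual data structures: it assumes every indexed predicate has the shape `source.attribute = CONSTANT`, represents a signature as the tuple `(source, op, attribute)`, and represents an update descriptor as a plain dict. All class, method, and key names are hypothetical.

```python
from collections import defaultdict


class PredicateIndex:
    """Maps constant-free signatures to constant tables of trigger ids."""

    def __init__(self):
        # signature -> constant -> set of trigger ids
        self._constant_tables = defaultdict(lambda: defaultdict(set))

    def add(self, trigger_id, source, op, attribute, constant):
        """Index one equality predicate (ideally the trigger's most selective)."""
        signature = (source, op, attribute)  # note: no constant in the signature
        self._constant_tables[signature][constant].add(trigger_id)

    def match(self, update):
        """Return ids of triggers whose indexed predicate matches the update."""
        # Deletes carry no new tuple, so fall back to the old one.
        row = update["new"] if update["new"] is not None else update["old"]
        matches = set()
        for (source, op, attribute), table in self._constant_tables.items():
            if source == update["source"] and op == update["op"]:
                # One hash lookup replaces a scan over all triggers
                # that share this signature.
                matches |= table.get(row.get(attribute), set())
        return matches


index = PredicateIndex()
index.add("t1", "emp", "update", "dept", "sales")
index.add("t2", "emp", "update", "dept", "toys")
index.add("t3", "emp", "insert", "dept", "sales")

descriptor = {
    "op": "update",
    "source": "emp",
    "old": {"dept": "sales", "salary": 100},
    "new": {"dept": "sales", "salary": 120},
}
matched = index.match(descriptor)
```

The point of the design is that many triggers differing only in their constants share one signature, so the cost of matching grows with the number of distinct signatures rather than the number of triggers.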

10 Data Structures in Detail

11 Active Databases
- An interesting way to program fairly complex operations and embed them into a DBMS
  - … Especially true if we can run user-defined functions!
- Why don’t we see full-fledged apps written this way? It is Turing-complete…

12 A Variation on the Model: Streams
- An interesting class of applications exists where data is constantly changing, and we want to update our output accordingly
  - Publish-subscribe systems
  - Stock tickers, news headlines
  - Data acquisition, e.g., from sensors, traffic monitoring, …
- In general, we want “live” output based on changing input
  - This has been called many things: pub/sub, continuous queries, …
  - In general, these have been eclipsed by the term “stream processing”

13 What’s a Stream, and What Do We Do with It?
- A stream is a time-varying series of values of a particular data type
  - In STREAM, they consider instead a set of values with timestamps – how does this differ?
- What kinds of operations might we perform over changing data?
  - Aggregation:
    - Over a time window, or a series of values
    - Last value for each key
    - Some combination thereof
  - Joins
    - … But over what?
- What about approximation? Why might that be useful?

14 STREAM’s Model: the CQL Language
- An attempt to extend SQL to handle streams – not to invent a language from the ground up
  - Thus it’s a bit quirky
- In CQL, everything is built around instantaneous relations, which are time-varying bags of tuples
  - Relation-relation operators (normal SQL)
  - Stream-relation operators (convert to relations)
  - Relation-stream operators (convert instantaneous to streams)
  - No stream-stream operators!

15 Converting between Streams & Relations
- Stream-to-relation operators:
  - Sliding window: tuple-based (last N rows) or time-based (within time range)
  - Partitioned sliding window: does grouping by keys, then does sliding window over that
  - Is this necessary or minimal?
- Relation-to-stream operators:
  - Istream: stream-ifies any insertions over a relation
  - Dstream: stream-ifies the deletes
  - Rstream: stream contains the set of tuples in the relation
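The two directions of conversion can be made concrete with a small Python sketch. It is a simplification under stated assumptions: a stream is a list of `(timestamp, value)` pairs in timestamp order, an instantaneous relation is a `Counter` used as a bag, and a new relation state is reported once per arriving tuple. The function names are hypothetical, and the partitioned window is omitted.

```python
from collections import Counter, deque


def rows_window(stream, n):
    """Tuple-based sliding window: the last n rows after each arrival."""
    window = deque(maxlen=n)  # the deque evicts the oldest row automatically
    for ts, value in stream:
        window.append(value)
        yield ts, Counter(window)


def range_window(stream, width):
    """Time-based sliding window: rows with timestamp in [ts - width, ts]."""
    window = deque()
    for ts, value in stream:
        window.append((ts, value))
        while window[0][0] < ts - width:
            window.popleft()
        yield ts, Counter(v for _, v in window)


def istream(states):
    """Emit tuples that were inserted relative to the previous state."""
    prev = Counter()
    for ts, bag in states:
        for value in (bag - prev).elements():
            yield ts, value
        prev = bag


def dstream(states):
    """Emit tuples that were deleted relative to the previous state."""
    prev = Counter()
    for ts, bag in states:
        for value in (prev - bag).elements():
            yield ts, value
        prev = bag


def rstream(states):
    """Emit the full contents of the relation at every state."""
    for ts, bag in states:
        for value in bag.elements():
            yield ts, value


stream = [(1, "a"), (2, "b"), (3, "c")]
states = list(rows_window(stream, 2))  # {a}, then {a, b}, then {b, c}
inserted = list(istream(states))
deleted = list(dstream(states))
everything = list(rstream(states))
```

With a two-row window, `a` falls out of the relation when `c` arrives, so it is the only tuple Dstream reports, while Rstream repeats the whole window at every step.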

16 Some Examples

    Select * From S1 [Rows 1000], S2 [Range 2 minutes]
    Where S1.A = S2.A And S1.A > 10

    Select Rstream(S.A, R.B)
    From S [Now], R
    Where S.A = R.A

17 Building a Stream System
- Basic data item is the element: where op ∈ {+, -}
- Query plans need a few new (?) items:
  - Queues
    - Used for hooking together operators, esp. over windows
    - (Assumption is that pipelining is generally not possible, and we may need to drop some tuples from the queue)
  - Synopses
    - The intermediate state an operator needs to carry around
    - Note that this is usually bounded by windows
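A short Python sketch shows how signed elements and a synopsis might fit together. The element's field layout is not legible in this transcript, so the `Element` class below (an `op` of "+" or "-", a timestamp, and a data tuple) is an assumption, as are the names `rows_window_elements` and `Synopsis`. The window operator turns arrivals into "+" elements and evictions into "-" elements; the synopsis applies them to keep the operator's current state.

```python
from collections import Counter, deque
from dataclasses import dataclass


@dataclass(frozen=True)
class Element:
    """Assumed element layout: a signed, timestamped tuple."""
    op: str          # "+" for insertion, "-" for deletion
    timestamp: int
    data: tuple


def rows_window_elements(stream, n):
    """Emit +/- elements describing a last-n-rows window (n >= 1)."""
    window = deque()
    for ts, data in stream:
        if len(window) == n:
            # The oldest row leaves the window before the new one enters.
            yield Element("-", ts, window.popleft())
        window.append(data)
        yield Element("+", ts, data)


class Synopsis:
    """Bag of tuples currently live; bounded by the window size."""

    def __init__(self):
        self.bag = Counter()

    def apply(self, element):
        if element.op == "+":
            self.bag[element.data] += 1
        else:
            self.bag[element.data] -= 1
            if self.bag[element.data] <= 0:
                del self.bag[element.data]


arrivals = [(1, ("a",)), (2, ("b",)), (3, ("c",))]
elements = list(rows_window_elements(arrivals, 2))
synopsis = Synopsis()
for element in elements:
    synopsis.apply(element)
```

Because every "+" is eventually matched by a "-" once the row leaves the window, the synopsis never holds more than `n` tuples, which is the sense in which windows bound the state.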

18 Example Query Plan
- What’s different here?

19 Some Tricks for Performance
- Sharing synopses across multiple operators
  - In a few cases, more than one operator may join with the same synopsis
- Can exploit punctuations or “k-constraints”
  - Analogous to interesting orders
  - Referential integrity k-constraint: bound of k between arrival of “many” element and its corresponding “one” element
  - Ordered-arrival k-constraint: need window of at most k to sort
  - Clustered-arrival k-constraint: bound on distance between items with same grouping attributes

20 Query Processing – “Chain Scheduling”
- Similar in many ways to eddies
- May decide to apply operators as follows:
  - Assume we know how many tuples can be processed in a time unit
  - Cluster groups of operators into “chains” that maximize reduction in queue size per unit time
  - Greedily forward tuples into the most selective chain
  - Within a chain, process in FIFO order
- They also do a form of join reordering
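The clustering step can be sketched as follows, as one reading of the bullets above rather than the STREAM implementation. The sketch assumes a single linear pipeline in which each operator has a known per-tuple cost and selectivity, tracks the fraction of an input tuple that survives after each operator, and repeatedly groups operators up to the point that gives the steepest drop in size per unit of processing time. The function name `build_chains` and the input format are hypothetical.

```python
def build_chains(operators):
    """Partition a pipeline into chains by steepest size reduction.

    operators: list of (cost, selectivity) pairs, with cost > 0, where
    selectivity is the fraction of tuples an operator passes on.
    Returns a list of (operator_indices, slope) pairs, where slope is the
    reduction in tuple size per unit time achieved by running that chain.
    """
    # Cumulative processing time and surviving tuple fraction after each operator.
    times, sizes = [0.0], [1.0]
    for cost, selectivity in operators:
        times.append(times[-1] + cost)
        sizes.append(sizes[-1] * selectivity)

    def slope(i, j):
        return (sizes[i] - sizes[j]) / (times[j] - times[i])

    chains = []
    start = 0
    while start < len(operators):
        # Pick the endpoint with the steepest average drop from `start`.
        end = max(range(start + 1, len(operators) + 1),
                  key=lambda j: slope(start, j))
        chains.append((list(range(start, end)), slope(start, end)))
        start = end
    return chains


# A weak filter followed by a strong one, then a pass-through operator.
chains = build_chains([(1.0, 0.9), (1.0, 0.1), (1.0, 1.0)])
```

In this example the first two operators end up in one chain, because running the weak filter is worthwhile only as a step toward the strong one; a scheduler would then favor queued tuples waiting at the chain with the largest slope.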

21 Scratching the Surface: Approximation
- They point out two areas where we might need to approximate output:
  - CPU is limited, and we need to drop some stream elements according to some probabilistic metric
    - Collect statistics via a profiler
    - Use the Hoeffding inequality to derive a sampling rate in order to maintain a confidence interval
  - May need to do similar things if memory usage is a constraint
- Are there other options? When might they be useful?
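As a worked illustration of the Hoeffding step: for n independent samples of values confined to an interval of width R, the inequality bounds the probability that the sample mean is off by at least ε by 2·exp(−2nε²/R²). Requiring that bound to be at most δ gives n ≥ R²·ln(2/δ) / (2ε²). The Python sketch below turns this into a sampling rate for a window of known size. It assumes the aggregate is a mean over bounded values and is not STREAM's actual derivation; the function names are hypothetical.

```python
import math


def hoeffding_sample_size(epsilon, delta, value_range=1.0):
    """Samples needed so the sample mean is within epsilon of the true
    mean with probability at least 1 - delta (values span value_range)."""
    if epsilon <= 0 or not 0 < delta < 1:
        raise ValueError("need epsilon > 0 and 0 < delta < 1")
    return math.ceil(value_range ** 2 * math.log(2 / delta) / (2 * epsilon ** 2))


def sampling_rate(window_size, epsilon, delta, value_range=1.0):
    """Fraction of a window's tuples to keep; 1.0 means no dropping is safe."""
    needed = hoeffding_sample_size(epsilon, delta, value_range)
    return min(1.0, needed / window_size)


n = hoeffding_sample_size(epsilon=0.1, delta=0.05)   # 185 samples
rate = sampling_rate(window_size=1000, epsilon=0.1, delta=0.05)
```

Note that the required sample count does not depend on the window size, so larger windows tolerate proportionally more load shedding for the same error bound.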

22 Next Time
- We’ll see the Aurora project from MIT, Brown, and Brandeis
  - It takes a different approach to the query processing aspects of stream processing

