Triggers and Streams
Zachary G. Ives
University of Pennsylvania
CIS 650 – Database & Information Systems
March 28, 2005

Administrivia
Midterms returned – overall, pretty good! Low 78, high 96
  Remember you can revise & resubmit for up to 20% back
Tomorrow, L101, 3PM: Krishna Gummadi, U. Wash., Measurement-driven Modeling and Design of Internet-scale Systems
For Wednesday: Retrospective on Aurora
  Compare it to STREAM
Thursday, L101, 3PM: Muthian Sivathanu, U. Wisc., Semantically Smart Disk Systems

Today’s Trivia Question

Making Databases Active
Thus far we’ve treated databases as static data plus direct updates
But…
  … what if we want to “trap” an update and cause it to perform other updates?
    More general than incremental view maintenance
    Example: on deletion of a department, delete all entries saying that a student is in that department
    Or: the insertion of a new item should result in an entry in a log
  … what if we want to respond to some notion of events in the system?

“Active” Databases: Triggers
Basic idea: define rules that
  1. Trap events
  2. Have access to the data associated with the event, plus the “old” state of the database
  3. Can test certain conditions
  4. Can apply operations to the system
What’s an event?
  In relational databases: insert, delete, update
  In general: any operation that can be trapped, such as I/O events, interrupts, signals, …
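To make the event/condition/action structure concrete, here is a minimal Python sketch of a toy active database. It assumes an in-memory store of dict rows, and the names ActiveDB and Rule, along with the dept/member/item/log tables, are hypothetical. Real DBMSs implement triggers very differently. The sketch encodes the two examples from the "Making Databases Active" slide: a cascading delete and a logging insert.

```python
from __future__ import annotations
from dataclasses import dataclass
from typing import Callable, Optional

Row = dict  # a tuple is modeled as a plain dict of column -> value

@dataclass
class Rule:
    event: tuple         # (operation, table), e.g. ("delete", "dept")
    condition: Callable  # condition(old_row, new_row) -> bool
    action: Callable     # action(db, old_row, new_row) -> None

class ActiveDB:
    """Toy active database: every insert/delete is trapped and matched against rules."""

    def __init__(self) -> None:
        self.tables: dict[str, list[Row]] = {}
        self.rules: list[Rule] = []

    def _fire(self, op: str, table: str, old: Optional[Row], new: Optional[Row]) -> None:
        for rule in self.rules:
            if rule.event == (op, table) and rule.condition(old, new):
                # Actions receive the db itself, so they can issue further
                # updates that fire other rules (cascading/recursive triggers).
                rule.action(self, old, new)

    def insert(self, table: str, row: Row) -> None:
        self.tables.setdefault(table, []).append(row)
        self._fire("insert", table, None, row)

    def delete(self, table: str, pred: Callable[[Row], bool]) -> None:
        rows = self.tables.get(table, [])
        victims = [r for r in rows if pred(r)]
        self.tables[table] = [r for r in rows if not pred(r)]
        for old in victims:
            self._fire("delete", table, old, None)

db = ActiveDB()
# Cascade: deleting a department deletes the rows placing students in it.
db.rules.append(Rule(("delete", "dept"), lambda old, new: True,
                     lambda db, old, new: db.delete("member", lambda r: r["dept"] == old["name"])))
# Logging: inserting a new item appends an entry to a log table.
db.rules.append(Rule(("insert", "item"), lambda old, new: True,
                     lambda db, old, new: db.insert("log", {"inserted": new["id"]})))
```

The actions run synchronously inside the triggering update. This is the simplest choice, but it is only one of several possible coupling modes.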

Triggers in SQL (based on Starburst)
Can trap updates before or after they occur:
  CREATE TRIGGER t
    BEFORE UPDATE ON mytable
    REFERENCING OLD AS oldrow NEW AS newrow
    FOR EACH ROW
    WHEN (oldrow.salary < newrow.salary)
    BEGIN ATOMIC
      SET newrow.increasedRecently = true;
    END
Row variables (OLD/NEW) work just like you’d expect, but only in FOR EACH ROW triggers
FOR EACH STATEMENT triggers instead get table variables, e.g., REFERENCING OLD_TABLE AS oldtable NEW_TABLE AS newtable; standard SQL and DB2 allow these only in AFTER triggers
  Table variables can be used for querying, but not updating
Can have recursive triggers!
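A row-level BEFORE UPDATE trigger can be mimicked in a few lines. This is a minimal Python sketch that assumes rows are plain dicts. The names Table, before_update and flag_raise are hypothetical. The trigger sees both the old row and the proposed new row, tests its WHEN condition, and can change the new row before it is written, as in the salary example on the slide.

```python
from typing import Callable

Row = dict

class Table:
    """Toy table that runs BEFORE UPDATE ... FOR EACH ROW triggers."""

    def __init__(self, rows: list) -> None:
        self.rows = rows
        # Each trigger is a (when, body) pair: when(oldrow, newrow) -> bool and
        # body(oldrow, newrow) -> None; body may modify newrow in place.
        self.before_update: list = []

    def update(self, pred: Callable[[Row], bool], changes: dict) -> None:
        for i, old in enumerate(self.rows):
            if not pred(old):
                continue
            oldrow = dict(old)             # snapshot of the "old" state
            newrow = {**old, **changes}    # proposed "new" state
            for when, body in self.before_update:
                if when(oldrow, newrow):
                    body(oldrow, newrow)   # runs before the row is written
            self.rows[i] = newrow

# Mirrors the slide: when the salary goes up, set the flag on the new row.
def flag_raise(oldrow: Row, newrow: Row) -> None:
    newrow["increasedRecently"] = True

emp = Table([{"id": 1, "salary": 100, "increasedRecently": False}])
emp.before_update.append((lambda o, n: o["salary"] < n["salary"], flag_raise))
emp.update(lambda r: r["id"] == 1, {"salary": 150})
print(emp.rows)  # [{'id': 1, 'salary': 150, 'increasedRecently': True}]
```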

Why Triggers Are Useful
  1. Error validation
     Can define a SIGNAL that induces an SQL error and aborts the operation
  2. View updates
     Can cause updates to a view to be propagated back to base relations (in some DBMSs)
  3. Cascading updates, logging
     SQL DDL’s ON DELETE CASCADE is basically a trigger
     In some systems, can also trigger arbitrary updates, e.g., the TriggerMan system
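The error-validation use case depends on the whole statement aborting when a trigger signals an error. The minimal Python sketch below models this. TriggerError stands in for the SQL error raised by SIGNAL. ValidatedTable and no_pay_cuts are hypothetical names. Atomicity comes from building the new table state off to the side and installing it only after every check passes.

```python
class TriggerError(Exception):
    """Stands in for the SQL error that SIGNAL raises inside a trigger."""

class ValidatedTable:
    def __init__(self, rows):
        self.rows = rows
        self.checks = []  # check(oldrow, newrow); raising TriggerError rejects the update

    def update(self, pred, changes):
        # Compute the whole new table state first and run the checks on every
        # changed row. If any check raises, self.rows is never reassigned, so
        # the statement aborts with no partial effects.
        new_rows = []
        for old in self.rows:
            if pred(old):
                new = {**old, **changes}
                for check in self.checks:
                    check(old, new)
                new_rows.append(new)
            else:
                new_rows.append(old)
        self.rows = new_rows

def no_pay_cuts(oldrow, newrow):
    if newrow["salary"] < oldrow["salary"]:
        raise TriggerError("salary may not decrease")
```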

TriggerMan
Goal: support large-scale use of triggers that can run arbitrary SQL when events occur
  while (process time remains and work to do) {
    Get a task from queue and execute:
      Process a token against rules, OR
      Run a rule action, OR
      Process a token against conditions, OR
      Process a token to run set of actions
    Yield to other tasks
  }
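The driver loop above can be sketched as a Python generator, where each `yield` plays the role of "Yield to other tasks". This is a single-threaded sketch under stated assumptions. Tasks are modeled as zero-argument callables that return follow-up tasks. The helper names and the two-rule token are hypothetical. TriggerMan's real task types and scheduling are richer.

```python
import time
from collections import deque

def triggerman_driver(queue: deque, budget_s: float):
    """Run queued tasks while time remains in the slice and work is left.
    Each task is a zero-argument callable returning a (possibly empty) list
    of follow-up tasks; each yield hands control back to other tasks."""
    deadline = time.monotonic() + budget_s
    while queue and time.monotonic() < deadline:
        task = queue.popleft()
        queue.extend(task())
        yield

# Hypothetical constructors for two of the task kinds on the slide.
def run_rule_action(name, log):
    def task():
        log.append(name)  # stands in for executing the rule's SQL action
        return []
    return task

def process_token_against_rules(token, matching_rules, log):
    def task():
        # Matching the token produces one "run a rule action" task per rule.
        return [run_rule_action(f"{token}:{r}", log) for r in matching_rules]
    return task
```

Because work stays in the queue when the time slice runs out, a later call to the driver simply picks up where the previous one stopped.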

Making It Scale
Define an expression signature corresponding to the ON/WHEN conditions, normalized to CNF
  Signatures contain no constants, only the tree structure of the expression
Define a trigger cache to keep the most recently used triggers in memory
Each update gets an update descriptor with op type, data source, and old & new tuples
Define a predicate index to quickly find all predicates matching an update descriptor
  For each trigger, index only its most selective predicate
  For equality tests, a constant table lists the different constants that need to be tested against
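The Python sketch below shows how a predicate index and its constant tables fit together. It is heavily simplified: conditions are conjunctions of attribute = constant tests only. A predicate's signature is taken to be (source, attribute), meaning the expression with its constant stripped out. The constant table for that signature maps each constant to the triggers that test for it. UpdateDescriptor, PredicateIndex and the selectivity map are hypothetical names, and the selectivity estimates would normally come from statistics.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional

@dataclass
class UpdateDescriptor:
    op: str              # "insert", "delete" or "update"
    source: str          # data source (table) name
    old: Optional[dict]  # old tuple (None for inserts)
    new: Optional[dict]  # new tuple (None for deletes)

class PredicateIndex:
    def __init__(self):
        # source -> attribute (the signature) -> constant table {constant: trigger ids}
        self.index = defaultdict(lambda: defaultdict(lambda: defaultdict(set)))
        self.residual = {}  # trigger id -> predicates still to check after an index hit

    def add_trigger(self, tid, source, eq_preds, selectivity):
        # eq_preds: list of (attribute, constant). Index only the most
        # selective one (lowest estimated selectivity); keep the rest as residual.
        best = min(eq_preds, key=lambda p: selectivity[(source, p[0])])
        attr, const = best
        self.index[source][attr][const].add(tid)
        self.residual[tid] = [p for p in eq_preds if p != best]

    def match(self, upd):
        row = upd.new if upd.new is not None else upd.old
        hits = set()
        for attr, const_table in self.index.get(upd.source, {}).items():
            if attr not in row:
                continue
            # One hash probe per signature finds every trigger testing this value.
            for tid in const_table.get(row[attr], ()):
                if all(row.get(a) == c for a, c in self.residual[tid]):
                    hits.add(tid)
        return hits
```

Many triggers that share a signature cost a single probe per update, rather than one test per trigger.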

Data Structures in Detail

Active Databases
An interesting way to program fairly complex operations and embed them into a DBMS
  … especially true if we can run user-defined functions!
Why don’t we see full-fledged apps written this way?
  It is Turing-complete…

A Variation on the Model: Streams
An interesting class of applications exists where data is constantly changing, and we want to update our output accordingly
  Publish-subscribe systems
  Stock tickers, news headlines
  Data acquisition, e.g., from sensors, traffic monitoring, …
In general, we want “live” output based on changing input
This has been called many things: pub/sub, continuous queries, …
  These names have largely been eclipsed by the term “stream processing”

What’s a Stream, and What Do We Do with It?
A stream is a time-varying series of values of a particular data type
  In STREAM, they instead consider a set of values with timestamps – how does this differ?
What kinds of operations might we perform over changing data?
  Aggregation:
    Over a time window, or a series of values
    Last value for each key
    Some combination thereof
  Joins … but over what?
What about approximation? Why might that be useful?
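The two aggregation styles above can be maintained incrementally as values arrive. This minimal Python sketch assumes values arrive in timestamp order and that a time window of width w contains timestamps in [now - w, now]. The class names are hypothetical, and this is not STREAM's implementation.

```python
from collections import deque

class WindowedAverage:
    """AVG over the values whose timestamps lie in [now - width, now].
    Assumes values arrive in timestamp order."""

    def __init__(self, width):
        self.width = width
        self.items = deque()  # (timestamp, value) pairs still inside the window
        self.total = 0.0

    def add(self, ts, value):
        self.items.append((ts, value))
        self.total += value
        # Evict values that have slid out of the window.
        while self.items and self.items[0][0] < ts - self.width:
            _, old = self.items.popleft()
            self.total -= old

    def average(self):
        return self.total / len(self.items) if self.items else None

class LastValuePerKey:
    """Keeps only the most recent (timestamp, value) seen for each key."""

    def __init__(self):
        self.latest = {}

    def add(self, key, ts, value):
        self.latest[key] = (ts, value)
```

The windowed average keeps a running total, so each arrival costs amortized O(1) time. Its memory is bounded by the number of values in the window, while the last-value table grows only with the number of distinct keys.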

STREAM’s Model: the CQL Language
An attempt to extend SQL to handle streams, not to invent a language from the ground up
  Thus it’s a bit quirky
In CQL, everything is built around instantaneous relations, which are time-varying bags of tuples
  Relation-to-relation operators (normal SQL)
  Stream-to-relation operators (convert streams to relations)
  Relation-to-stream operators (convert instantaneous relations to streams)
  No stream-to-stream operators!

Converting between Streams & Relations
Stream-to-relation operators:
  Sliding window: tuple-based (last N rows) or time-based (within a time range)
  Partitioned sliding window: groups by keys, then applies a sliding window within each group
    Is this necessary or minimal?
Relation-to-stream operators:
  Istream: stream-ifies any insertions into a relation
  Dstream: stream-ifies the deletions
  Rstream: at every instant, the stream contains the entire set of tuples in the relation
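These operators can be written as functions of the time instant tau. This Python sketch follows our reading of CQL's semantics and makes several assumptions. Timestamps are discrete integers. A stream is a timestamp-ordered list of (tuple, timestamp) pairs. An instantaneous relation is a bag, modeled as a Counter. The function names are hypothetical. Each relation is recomputed from scratch, whereas a real engine maintains it incrementally.

```python
from collections import Counter

def rows_window(stream, n, tau):
    """[Rows n]: the last n tuples with timestamp <= tau."""
    upto = [t for t, ts in stream if ts <= tau]
    return Counter(upto[-n:]) if n > 0 else Counter()

def range_window(stream, width, tau):
    """[Range width]: tuples with timestamp in [tau - width, tau]; [Now] is width 0."""
    return Counter(t for t, ts in stream if tau - width <= ts <= tau)

def partitioned_window(stream, key, n, tau):
    """[Partition By key Rows n]: the last n tuples for each key value, unioned."""
    groups = {}
    for t, ts in stream:
        if ts <= tau:
            groups.setdefault(key(t), []).append(t)
    out = Counter()
    for rows in groups.values():
        out.update(rows[-n:])
    return out

# rel is any function tau -> bag; Counter subtraction is bag difference.
def istream(rel, tau):
    return rel(tau) - rel(tau - 1)   # tuples inserted at tau

def dstream(rel, tau):
    return rel(tau - 1) - rel(tau)   # tuples deleted at tau

def rstream(rel, tau):
    return rel(tau)                  # the whole relation at tau
```

Istream and Dstream emit only what changed between consecutive instants. Rstream re-emits the entire relation at every instant.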

Some Examples
  Select *
  From S1 [Rows 1000], S2 [Range 2 minutes]
  Where S1.A = S2.A And S1.A > 10

  Select Rstream(S.A, R.B)
  From S [Now], R
  Where S.A = R.A
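Under the same window semantics, the two example queries can be evaluated at a single instant tau with a Python sketch. It assumes tuples are dicts, timestamps are in seconds (so 2 minutes is 120), and R is a static relation held as a list. query1 and query2 are hypothetical names. The sketch recomputes each window rather than maintaining it.

```python
def rows_window(stream, n, tau):
    upto = [t for t, ts in stream if ts <= tau]
    return upto[-n:] if n > 0 else []

def range_window(stream, width, tau):
    return [t for t, ts in stream if tau - width <= ts <= tau]

def query1(s1, s2, tau, n=1000, width=120):
    """Instantaneous answer at tau to:
         Select * From S1 [Rows 1000], S2 [Range 2 minutes]
         Where S1.A = S2.A And S1.A > 10"""
    w1 = rows_window(s1, n, tau)
    w2 = range_window(s2, width, tau)
    return [(x, y) for x in w1 if x["A"] > 10 for y in w2 if x["A"] == y["A"]]

def query2(s, r, tau):
    """Select Rstream(S.A, R.B) From S [Now], R Where S.A = R.A:
    at each instant, join only the S tuples stamped exactly tau with R."""
    now = [x for x, ts in s if ts == tau]
    return [(x["A"], y["B"]) for x in now for y in r if x["A"] == y["A"]]
```

Because the second query applies Rstream to S [Now], each S tuple contributes output only at its own timestamp.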

Building a Stream System
Basic data item is the element ⟨tuple, timestamp, op⟩, where op ∈ {+, -}
Query plans need a few new (?) items:
  Queues
    Used for hooking together operators, esp. over windows
    (Assumption is that pipelining is generally not possible, and we may need to drop some tuples from the queue)
  Synopses
    The intermediate state an operator needs to carry around
    Note that this is usually bounded by windows
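The operational view (elements flowing through queues, with operators keeping synopses) fits in a short Python sketch. A window operator turns arriving tuples into "+" elements and emits "-" elements as tuples expire. A downstream grouped COUNT consumes both kinds from its input queue. One simplifying assumption is labeled in the code: expirations are detected only when a later tuple arrives. The class names are hypothetical.

```python
from collections import Counter, deque
from typing import NamedTuple

class Element(NamedTuple):
    tup: tuple
    ts: int
    op: str  # "+" (insertion) or "-" (deletion)

class RangeWindowOp:
    """Emits +/- elements for a [Range width] window.
    Its synopsis is the set of tuples currently inside the window."""

    def __init__(self, width, out: deque):
        self.width, self.out = width, out
        self.synopsis = deque()  # (tup, ts) in timestamp order

    def push(self, tup, ts):
        # Assumption: expiry is checked only when a new tuple arrives;
        # a real system would also need heartbeats to advance time.
        while self.synopsis and self.synopsis[0][1] < ts - self.width:
            old, _ = self.synopsis.popleft()
            self.out.append(Element(old, ts, "-"))
        self.synopsis.append((tup, ts))
        self.out.append(Element(tup, ts, "+"))

class GroupCountOp:
    """Consumes +/- elements from its input queue; synopsis = count per group."""

    def __init__(self, inq: deque, key):
        self.inq, self.key = inq, key
        self.synopsis = Counter()

    def run(self):
        while self.inq:
            e = self.inq.popleft()
            self.synopsis[self.key(e.tup)] += 1 if e.op == "+" else -1
```

Both synopses stay bounded by the window size, which is the point the slide makes about windows.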

Example Query Plan
What’s different here?

Some Tricks for Performance
Sharing synopses across multiple operators
  In a few cases, more than one operator may join with the same synopsis
Can exploit punctuations or “k-constraints”
  Analogous to interesting orders
  Referential integrity k-constraint: bound of k between the arrival of a “many” element and its corresponding “one” element
  Ordered-arrival k-constraint: need a window of at most k to sort
  Clustered-arrival k-constraint: bound on the distance between items with the same grouping attributes
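The Python sketch below shows how an ordered-arrival k-constraint bounds the memory needed to sort. We assume the following definition: any element that arrives k+1 or more positions after an element s is at least s. Under that assumption, a min-heap holding at most k+1 elements emits a correctly sorted stream. The function name is hypothetical.

```python
import heapq

def sort_with_k_constraint(stream, k):
    """Emit values in sorted order, assuming every element arriving k+1 or
    more positions after an element s is >= s. At most k+1 elements are
    buffered, instead of the whole stream."""
    heap = []
    for x in stream:
        heapq.heappush(heap, x)
        if len(heap) > k:
            # The heap holds k+1 distinct arrivals, so one of them is at least
            # k+1 positions older than any future element; every future
            # element is >= that one, and hence >= the heap minimum.
            yield heapq.heappop(heap)
    while heap:
        yield heapq.heappop(heap)
```

If the stream violates the constraint, the output is simply not sorted. For example, [2, 1] with k = 0 comes out unchanged.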

Query Processing – “Chain Scheduling”
Similar in many ways to eddies
May decide to apply operators as follows:
  Assume we know how many tuples can be processed in a time unit
  Cluster groups of operators into “chains” that maximize reduction in queue size per unit time
  Greedily forward tuples into the most selective chain
  Within a chain, process in FIFO order
They also do a form of join reordering
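This Python sketch shows one way to form chains and pick the next operator to run. It makes several assumptions. There is a single path of operators, and each has a known per-tuple time (positive) and selectivity. The chart starts at the point (time 0, size 1), and each chain greedily extends to whichever later point gives the steepest drop in remaining queue size per unit time. That steepness becomes the chain's priority. At run time, the scheduler serves the highest-priority chain that has waiting tuples, and within it the oldest tuple (FIFO). All names are hypothetical. This is a sketch of the idea, not the paper's exact algorithm.

```python
def chains(ops):
    """ops: list of (time_per_tuple, selectivity) along one operator path.
    Returns [(operator indices, priority)], where priority is the size
    reduction per unit time achieved by the chain."""
    # Progress-chart points: (cumulative time, remaining size fraction).
    pts = [(0.0, 1.0)]
    for t, s in ops:
        pts.append((pts[-1][0] + t, pts[-1][1] * s))
    result, i = [], 0
    while i < len(ops):
        best_j, best_slope = None, None
        for j in range(i + 1, len(pts)):
            slope = (pts[j][1] - pts[i][1]) / (pts[j][0] - pts[i][0])
            if best_slope is None or slope <= best_slope:  # ties: longer chain
                best_j, best_slope = j, slope
        result.append((list(range(i, best_j)), -best_slope))
        i = best_j
    return result

def operator_priorities(chain_list, n_ops):
    pr = [0.0] * n_ops
    for op_ids, p in chain_list:
        for i in op_ids:
            pr[i] = p
    return pr

def pick_operator(op_priority, queues):
    """queues[i] holds (arrival_ts, tuple) pairs waiting for operator i."""
    ready = [i for i, q in enumerate(queues) if q]
    if not ready:
        return None
    # Highest-priority chain first; within it, the oldest waiting tuple (FIFO).
    return max(ready, key=lambda i: (op_priority[i], -queues[i][0][0]))
```

For example, with three unit-cost operators of selectivity 0.5, 0.9 and 0.1, the first operator forms a chain by itself. The last two form a second chain, because the cheap but unselective middle operator is worth running only together with the highly selective one after it.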

Scratching the Surface: Approximation
They point out two areas where we might need to approximate output:
  CPU is limited, and we need to drop some stream elements according to some probabilistic metric
    Collect statistics via a profiler
    Use the Hoeffding inequality to derive a sampling rate that maintains a given confidence interval
  May need to do similar things if memory usage is a constraint
Are there other options? When might they be useful?
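The Hoeffding inequality can be turned into a sampling rate, as this Python sketch shows for an average over values that lie in a known range. It assumes the sampled values behave like independent draws. The function names are hypothetical. STREAM's actual load shedder may derive its rates differently.

```python
import math

def hoeffding_sample_size(value_range, epsilon, delta):
    """Smallest n such that the mean of n independent samples, drawn from
    values lying in an interval of width value_range, is within +/- epsilon
    of the true mean with probability >= 1 - delta. Hoeffding gives
        P(|mean_n - mu| >= epsilon) <= 2 * exp(-2 * n * epsilon**2 / value_range**2),
    so it suffices that n >= value_range**2 * ln(2 / delta) / (2 * epsilon**2)."""
    return math.ceil(value_range ** 2 * math.log(2 / delta) / (2 * epsilon ** 2))

def sampling_rate(window_size, value_range, epsilon, delta):
    """Fraction of a window's tuples to keep (capped at 1)."""
    return min(1.0, hoeffding_sample_size(value_range, epsilon, delta) / window_size)
```

For values in [0, 1], epsilon = 0.05 and delta = 0.05, this gives n = 738. A window of 10,000 tuples can therefore be sampled at about 7.4%. The required n does not depend on the window size, so larger windows allow proportionally more aggressive shedding.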

Next Time
We’ll see the Aurora project from MIT, Brown, and Brandeis
  It takes a different approach to the query processing aspects of stream processing