Transactions, Concluded, and the Future of Data Management Zachary G. Ives University of Pennsylvania CIS 550 – Database & Information Systems December.


1 Transactions, Concluded, and the Future of Data Management Zachary G. Ives University of Pennsylvania CIS 550 – Database & Information Systems December 4, 2003 Slide content courtesy of Susan Davidson, Raghu Ramakrishnan & Johannes Gehrke

2 Final Administrivia
- Project demos today and tomorrow
- Final exam handed out at the end of today’s class
- Finals plus project reports due by 1PM, 12/18/2003
- Project reports should be ballpark 10-15 pages
  - Remember, quality and clarity of presentation matter!
- Also, email me a brief message detailing:
  - Your contributions to the project
  - Your group members’ contributions and your assessment of “group dynamics”
- Turn in at my office, 576 Levine Hall, or to my assistant, Kathy Venit, in 308 Levine Hall

3 Last Time…
- We were discussing isolation levels
  - How to keep transactions from interfering with one another
  - Or at least, how to minimize this
- Recall the strongest version of isolation was serializability

4 Theory of Serializability
- A schedule of a set of transactions is a linear ordering of their actions
  - e.g. for the simultaneous deposits example: R1(X.bal) R2(X.bal) W1(X.bal) W2(X.bal)
- A serial schedule is one in which all the steps of each transaction occur consecutively
- A serializable schedule is one which is equivalent to some serial schedule (i.e. given any initial state, the final state is the same as one produced by some serial schedule)
- The example above is neither serial nor serializable
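
The deposits example can be replayed in a few lines. This is only a sketch: the starting balance of 100 and the deposit amounts 10 and 20 are invented for illustration and do not come from the slides. A read copies X.bal into a transaction-local variable, and a write stores that local value plus the transaction's deposit.

```python
def run(schedule, deposits, initial=100):
    """Replay a schedule of (op, txn) steps against a single balance X.bal.

    "R" copies the shared balance into the transaction's local variable.
    "W" writes (local value + that transaction's deposit) back to the balance.
    """
    balance = initial
    local = {}
    for op, txn in schedule:
        if op == "R":
            local[txn] = balance
        elif op == "W":
            balance = local[txn] + deposits[txn]
        else:
            raise ValueError(f"unknown op {op!r}")
    return balance


deposits = {1: 10, 2: 20}  # hypothetical amounts

interleaved = [("R", 1), ("R", 2), ("W", 1), ("W", 2)]  # the schedule on the slide
serial_12 = [("R", 1), ("W", 1), ("R", 2), ("W", 2)]    # T1 then T2
serial_21 = [("R", 2), ("W", 2), ("R", 1), ("W", 1)]    # T2 then T1

print(run(interleaved, deposits))  # 120: T1's deposit is lost
print(run(serial_12, deposits))    # 130
print(run(serial_21, deposits))    # 130
```

Both serial orders end at 130, while the interleaved schedule ends at 120, so under these assumed amounts it matches no serial schedule.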

5 Questions of Concern
- Given a schedule S, is it serializable?
- How can we "restrict" transactions in progress to guarantee that only serializable schedules are produced?

6 Conflicting Actions
- Consider a schedule S in which there are two consecutive actions Ii and Ij of transactions Ti and Tj respectively
- If Ii and Ij refer to different data items, then swapping Ii and Ij does not matter
- If Ii and Ij refer to the same data item Q, then swapping Ii and Ij matters if and only if at least one of the actions is a write
  - Ri(Q) Wj(Q) can produce a different result than Wj(Q) Ri(Q), because Ti reads a different value of Q
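
A small sketch of what "swapping matters" means for actions on one item Q. The `replay` helper and the written values 1 and 2 are hypothetical, chosen only so the effects are visible: it reports both the final value of Q and what each transaction read.

```python
def replay(actions, q=0):
    """Apply actions to one data item Q, starting from value q.

    ("R", txn) records the value that txn reads.
    ("W", txn, value) overwrites Q with value.
    Returns (final value of Q, {txn: value read}).
    """
    reads = {}
    for action in actions:
        if action[0] == "R":
            reads[action[1]] = q
        else:
            q = action[2]
    return q, reads


r_i, r_j = ("R", "i"), ("R", "j")
w_i, w_j = ("W", "i", 1), ("W", "j", 2)  # hypothetical written values

# Two reads: the order is irrelevant.
print(replay([r_i, r_j]) == replay([r_j, r_i]))  # True

# A read and a write: Ti observes a different value of Q.
print(replay([r_i, w_j]), replay([w_j, r_i]))    # (2, {'i': 0}) (2, {'i': 2})

# Two writes: the final value of Q differs.
print(replay([w_i, w_j]), replay([w_j, w_i]))    # (2, {}) (1, {})
```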

7 Testing for Serializability
- Given a schedule S, we can construct a di-graph G=(V,E) called a precedence graph
  - V: all transactions in S
  - E: Ti → Tj whenever an action of Ti precedes and conflicts with an action of Tj in S
- Theorem: A schedule S is conflict serializable if and only if its precedence graph contains no cycles
- Note that testing for a cycle in a digraph can be done in time O(|V|^2)
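
The test can be sketched as follows. The `(txn, op, item)` tuple encoding of a schedule and the function names are assumptions made for this example. Cycle detection here uses a depth-first search, which runs in O(|V| + |E|) and therefore stays within the O(|V|^2) bound quoted above.

```python
def precedence_graph(schedule):
    """Build {txn: set of successor txns} from a list of (txn, op, item) steps.

    An edge Ti -> Tj is added when an action of Ti precedes an action of Tj,
    both touch the same item, and at least one of the two is a write.
    """
    graph = {txn: set() for txn, _, _ in schedule}
    for i, (ti, op_i, item_i) in enumerate(schedule):
        for tj, op_j, item_j in schedule[i + 1:]:
            if ti != tj and item_i == item_j and "W" in (op_i, op_j):
                graph[ti].add(tj)
    return graph


def has_cycle(graph):
    """Depth-first search with three colours; a grey successor means a back edge."""
    WHITE, GREY, BLACK = 0, 1, 2
    color = {v: WHITE for v in graph}

    def visit(v):
        color[v] = GREY
        for w in graph[v]:
            if color[w] == GREY or (color[w] == WHITE and visit(w)):
                return True
        color[v] = BLACK
        return False

    return any(color[v] == WHITE and visit(v) for v in graph)


def conflict_serializable(schedule):
    return not has_cycle(precedence_graph(schedule))


# The simultaneous deposits schedule: R1(X) R2(X) W1(X) W2(X)
deposits = [(1, "R", "X"), (2, "R", "X"), (1, "W", "X"), (2, "W", "X")]
print(precedence_graph(deposits))       # {1: {2}, 2: {1}}
print(conflict_serializable(deposits))  # False
```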

8 An Example
[Schedule over T1, T2, T3: R(X,Y,Z) R(X) W(X) R(Y) W(Y) R(Y) R(X) W(Z)]
[Precedence graph over T1, T2, T3]
Cyclic: Not serializable.

9 Another Example
[Schedule over T1, T2, T3: R(X) W(X) R(X) W(X) R(Y) W(Y) R(Y) W(Y)]
[Precedence graph over T1, T2, T3]
Acyclic: serializable

10 Producing the Equivalent Serial Schedule
- If the precedence graph for a schedule is acyclic, then an equivalent serial schedule can be found by a topological sort of the graph
- For the second example, the equivalent serial schedule is:
  - R1(Y)W1(Y) R2(X)W2(X) R2(Y)W2(Y) R3(X)W3(X)
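
A sketch of the topological-sort step using Python's standard `graphlib` module (Python 3.9+). The interleaved schedule below is an assumption: the transcript does not preserve which transaction performed each action in the second example, so the assignment here was chosen to be consistent with the serial schedule quoted above.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Assumed interleaving for the second example, as (txn, op, item) steps.
schedule = [
    (2, "R", "X"), (2, "W", "X"),
    (3, "R", "X"), (3, "W", "X"),
    (1, "R", "Y"), (1, "W", "Y"),
    (2, "R", "Y"), (2, "W", "Y"),
]


def equivalent_serial(schedule):
    """Return a serial schedule equivalent to a conflict-serializable one."""
    # TopologicalSorter expects {node: set of predecessors}.
    preds = {txn: set() for txn, _, _ in schedule}
    for i, (ti, op_i, item_i) in enumerate(schedule):
        for tj, op_j, item_j in schedule[i + 1:]:
            if ti != tj and item_i == item_j and "W" in (op_i, op_j):
                preds[tj].add(ti)  # Ti must come before Tj
    # static_order() raises graphlib.CycleError if the graph has a cycle.
    order = list(TopologicalSorter(preds).static_order())
    # Emit each transaction's actions consecutively, in their original order.
    return [step for txn in order for step in schedule if step[0] == txn]


serial = equivalent_serial(schedule)
print(" ".join(f"{op}{txn}({item})" for txn, op, item in serial))
# R1(Y) W1(Y) R2(X) W2(X) R2(Y) W2(Y) R3(X) W3(X)
```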

11 Locking and Serializability
- We said that for a serializable schedule, a transaction must hold all locks until it terminates (a condition called strict locking)
- It turns out that this is crucial to guarantee serializability
- Note that the first (bad) example could have been produced if transactions acquired and immediately released locks.

12 Well-Formed, Two-Phased Transactions
- A transaction is well-formed if it acquires at least a shared lock on Q before reading Q or an exclusive lock on Q before writing Q and doesn’t release the lock until the action is performed
  - Locks are also released by the end of the transaction
- A transaction is two-phased if it never acquires a lock after unlocking one
  - i.e., there are two phases: a growing phase in which the transaction acquires locks, and a shrinking phase in which locks are released
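
Both definitions can be checked mechanically on one transaction's sequence of operations. The encoding below is an assumption made for this sketch: `"S"` and `"X"` acquire a shared or exclusive lock, `"U"` releases a lock, and `"R"` and `"W"` read or write an item.

```python
def is_well_formed(ops):
    """Every read holds at least a shared lock, every write holds an exclusive
    lock, and no lock is still held when the transaction ends."""
    held = {}  # item -> "S" or "X"
    for op, item in ops:
        if op in ("S", "X"):
            # Keep the stronger mode if the item is already locked exclusively.
            if op == "X" or item not in held:
                held[item] = op
        elif op == "U":
            held.pop(item, None)
        elif op == "R":
            if item not in held:
                return False
        elif op == "W":
            if held.get(item) != "X":
                return False
    return not held


def is_two_phase(ops):
    """No lock is acquired after any lock has been released."""
    unlocked = False
    for op, _ in ops:
        if op == "U":
            unlocked = True
        elif op in ("S", "X") and unlocked:
            return False
    return True


# Growing phase first, then shrinking phase.
two_phase = [("S", "A"), ("R", "A"), ("X", "B"), ("W", "B"), ("U", "A"), ("U", "B")]
# Acquire and immediately release: well-formed, but not two-phase.
eager_release = [("X", "A"), ("W", "A"), ("U", "A"), ("X", "B"), ("W", "B"), ("U", "B")]

print(is_well_formed(two_phase), is_two_phase(two_phase))          # True True
print(is_well_formed(eager_release), is_two_phase(eager_release))  # True False
```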

13 Two-Phased Locking Theorem
- If all transactions are well-formed and two-phase, then any schedule in which conflicting locks are never granted at the same time is serializable
  - i.e., there is a very simple scheduler!
- However, if some transaction is not well-formed or two-phase, then there is some schedule in which conflicting locks are never granted at the same time but which fails to be serializable
  - i.e., one bad apple spoils the bunch.

14 Summary of Transactions
- Transactions are all-or-nothing units of work guaranteed despite concurrency or failures in the system
- Theoretically, the “correct” execution of transactions is serializable (i.e. equivalent to some serial execution)
- Practically, this may adversely affect throughput → isolation levels
- With isolation levels, users can specify the level of “incorrectness” they are willing to tolerate

15 What to Look for Down the Road
- … well, no one really knows the answer to this…
- … But here are some hints, ideas, and hot directions
  - Sensors and streaming data
  - Peer-to-peer meets databases
  - “The Semantic Web”
  - Collaborative data sharing

16 Sensors and Streaming Data
- No databases at all…
- … Instead we have networks of simple sensors
  - Madden, starting at MIT
  - Gehrke, Cornell
  - Widom, Stanford
- Queries are in SQL
- Data is live and “streaming”
- We compute aggregates over “windows”
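
As a rough illustration of an aggregate over a window, here is a sliding average over the most recent readings. This is a plain Python sketch, not the query language of any of the systems named above; the count-based window and the sample readings are assumptions.

```python
from collections import deque


def sliding_avg(readings, size):
    """Yield the average of the last `size` readings after each new reading."""
    if size < 1:
        raise ValueError("window size must be at least 1")
    window = deque(maxlen=size)
    total = 0.0
    for r in readings:
        if len(window) == size:
            total -= window[0]  # this reading is about to be evicted
        window.append(r)
        total += r
        yield total / len(window)


# Readings arrive one at a time; only the current window is ever kept.
print(list(sliding_avg([10, 20, 30, 40], size=2)))  # [10.0, 15.0, 25.0, 35.0]
```

Because the function is a generator, it also works on an unbounded stream: memory use depends on the window size, not on how many readings have been seen.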

17 What’s Interesting Here
- We’re not talking about data on disk – we’re talking about queries over “current readings”
- Sensors are generally “stupid” and may be battery-operated
- A lot of challenges are networking-related: how to aggregate data before it gets sent, etc.
- The next step (e.g., work initiated here @ Penn): including sensors that capture images – a very different problem!
- This has many more compelling applications – security, monitoring, correlating multiple sensors, rescue operations, military logistics and coordination, etc.
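
One way to picture "aggregate data before it gets sent" is to have every sensor forward a single partial result, covering itself and everything below it, instead of relaying each raw reading. The routing tree, node names, and readings below are hypothetical, and the sketch does not describe any specific system's protocol.

```python
def partial_avg(node, children, reading):
    """Return (sum, count) for the subtree rooted at `node`.

    Each node merges its own reading with the partial results of its children,
    so only one (sum, count) pair travels over each link toward the root.
    """
    total, count = reading[node], 1
    for child in children.get(node, ()):
        child_sum, child_count = partial_avg(child, children, reading)
        total += child_sum
        count += child_count
    return total, count


# Hypothetical routing tree and temperature readings.
children = {"root": ["a", "b"], "a": ["c", "d"]}
reading = {"root": 20.0, "a": 22.0, "b": 18.0, "c": 21.0, "d": 19.0}

total, count = partial_avg("root", children, reading)
print(total / count)  # 20.0
```

An average is carried as a (sum, count) pair rather than as an average because partial averages over subtrees of different sizes cannot be merged correctly on their own.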

18 Peer-to-Peer Computing
- Fundamentally, our model of DBMSs tends to be centralized
  - Even for data integration: there’s a single mediator
  - This has many implications: central administration, central coordination, etc.
- What can be gained from borrowing a page from peer-to-peer systems like Napster, Kazaa, etc.?
  - A better architecture?
  - Solutions to many problems unsolved by distributed DBMSs?
    - Replication, object location, distributed optimization, resiliency to failure, …
  - New types of applications, e.g., in integration?

19 P2P Work
- As a new architecture for storage and querying
  - PIER (Berkeley), P-Grid (EPFL), Medusa (MIT)
- A better way of thinking about translating and exchanging data
  - Piazza (Washington), Orchestra (Penn), Hyperion (Toronto), work at Trento

20 The Semantic Web
- In some ways, a very “pie-in-the-sky” vision
  - But some real and concrete problems might be partly solvable
- Goal is really very similar to data integration, where somehow we have mappings between the schemas
- Currently, most people in the SW community are from the knowledge representation community and use RDF
- Focus: very rich ways of describing schemas – “ontologies” – that blend querying with class definitions
  - “Teachers are people who teach students”; “Tenure-track professors are teachers at universities who can get tenure”; etc.
- Implicit take on the problem: if we create better languages for describing ontologies, it’s easier to mediate between schemas

21 Holes in the Semantic Web
- What issues and concerns came up in the data integration assignment you had?
  - Do you think a richer schema language would help for these?
  - Do you think “better normalization” would help?
- Fundamentally, we need:
  - Languages for describing not only relationships, but also transformations between formats (e.g., XML schemas)
  - Automatic or partly automated ways of discovering mappings and correspondences
- These are all database problems, and the solution likely must come from the DB community
  - This is part of what P2P systems like Piazza and Hyperion try to address

22 My Take on the Future
- We’ve evolved from a world where data management is about controlling the data
- Instead, data management is about translating and transforming data using declarative languages
  - It should ultimately become much like TCP or SOAP – a set of standard services for “getting stuff” from one point to another, or from one form to another
  - It’s the plumbing that connects different applications using different formats
- Orchestra project at Penn: focuses on how to build a system for supporting collaborative science
  - People publish and map data in different schemas
  - What happens if people start updating it?
  - How do you propagate, manage, trace, reconcile changes?

