Download presentation
Presentation is loading. Please wait.
1
Chapter 13 An Overview of Transaction Processing
2
Copyright © 2005 Pearson Addison-Wesley. All rights reserved. 13-2 Transactions Many enterprises use databases to store information about their state –E.g., balances of all depositors When an event occurs in the real world that changes the state of the enterprise, a program is executed to change the database state in a corresponding way –E.g., balance must be updated when you deposit Such a program is called a transaction
3
Copyright © 2005 Pearson Addison-Wesley. All rights reserved. 13-3 ACID Properties of Transactions Transaction execution must maintain the correctness of the database model Therefore additional requirements are placed on the execution of transactions beyond those placed on ordinary programs –Atomicity –Consistency –Isolation –Durability
4
Copyright © 2005 Pearson Addison-Wesley. All rights reserved. 13-4 ACID Properties Atomic - Transaction should either complete or have no effect at all –Responsibility of transaction processing system Consistent - Transaction should correctly transform the database state to reflect the effect of a real world event –Responsibility of transaction designer
5
Copyright © 2005 Pearson Addison-Wesley. All rights reserved. 13-5 ACID Properties (con’t) Isolation - The effect of concurrently executing a set of transactions is the same as if they had executed serially (serializable) –Responsibility of transaction processing system Durable - The effect of a transaction on the database state should not be lost once the transaction has committed –Responsibility of transaction processing system
6
Copyright © 2005 Pearson Addison-Wesley. All rights reserved. 13-6 Isolation Serial execution of a set of (consistent) transactions is correct, but performance might be inadequate Concurrent (interleaved) execution of a set of transactions offers performance benefits, but might not be correct
7
Copyright © 2005 Pearson Addison-Wesley. All rights reserved. 13-7 Interleaved Execution
8
Copyright © 2005 Pearson Addison-Wesley. All rights reserved. 13-8 Serializable Schedules The concurrent schedule S: r 1 (x) w 2 (z) w 1 (y) is equivalent to the serial schedules of T1 and T2 in either order: –T1, T2: r 1 (x) w 1 (y) w 2 (z) and –T2, T1: w 2 (z) r 1 (x) w 1 (y) since operations of distinct transactions on different data items commute. Hence, S is a serializable schedule
9
Copyright © 2005 Pearson Addison-Wesley. All rights reserved. 13-9 Serializable Schedules The concurrent schedule S: r 1 (z) r 2 (q) w 2 (z) r 1 (q) w 1 (y) is equivalent to the serial schedule T1, T2: r 1 (z) r 1 (q) w 1 (y) r 2 (q) w 2 (z) since read operations of distinct transactions on the same data item commute. Hence, S is a serializable schedule S is not equivalent to T2, T1 since read and write operations (or two write operations) of distinct transactions on the same data item do not commute
10
Copyright © 2005 Pearson Addison-Wesley. All rights reserved. 13-10 Non-Serializable Schedule Example: course registration; cur_reg is the number of current registrants T1: r(cur_reg : 29) w(cur_reg : 30) T2: r(cur_reg : 29) w(cur_reg : 30) Schedule not equivalent to T1, T2 or T2, T1 Database state no longer corresponds to real-world state, integrity constraint violated
11
Copyright © 2005 Pearson Addison-Wesley. All rights reserved. 13-11 Commutativity Two operations commute if, when executed in either order: –The values returned by both are the same and –The database is left in the same final state Two schedules are equivalent if one can be derived from the other by a series of simple interchanges of commutative operations A schedule is serializable if it is equivalent to a serial schedule
12
Copyright © 2005 Pearson Addison-Wesley. All rights reserved. 13-12 Concurrency Control Performance requirements might not be achievable if schedules are serializable In addition to serializable, DBMSs implement less stringent isolation levels –Serializable schedules correct for all applications –Less stringent levels do not guarantee correctness for all applications, but are correct for some The concurrency control of a DBMS is responsible for implementing isolation levels Application programmer is responsible for choosing appropriate level
13
Copyright © 2005 Pearson Addison-Wesley. All rights reserved. 13-13 Implementing Serializability: Two-Phase Locking Locks are associated with each data item A transaction must acquire a read (shared) or write (exclusive) lock on an item in order to read or write it A write lock on an item conflicts with all other locks on the item; a read lock conflicts with a write lock If T1 requests a lock on x and T2 holds a conflicting lock on x, T1 must wait
14
Copyright © 2005 Pearson Addison-Wesley. All rights reserved. 13-14 Lock Release Two-Phase locking: All locks are acquired before any lock is released Phase 1 Phase 2 Strict: Transaction holds all locks until completion t locks t
15
Copyright © 2005 Pearson Addison-Wesley. All rights reserved. 13-15 Correctness of Strict Two-Phase Locking Intuition: Active transactions cannot have executed operations that do not commute (since locks required for non-commutative operations conflict) Hence, a schedule produced by a two-phase locking concurrency control is serializable since operations of concurrent transactions can always be reordered to produce a serial schedule
16
Copyright © 2005 Pearson Addison-Wesley. All rights reserved. 13-16 Non-Strict Concurrency Controls Non-strict controls: locks can be released before completion Problem: (Bank account example) w 1 (Bal) u 1 (Bal) r 2 (Bal) w 2 (Cred-Lim) commit 2 abort 1 –Although abort 1 rolls Bal back, the new value of Cred- Lim might have been affected The new credit limit might have been based on a deposit that never happened –T1 has an effect even though it is aborted –Hence, atomicity is violated
17
Copyright © 2005 Pearson Addison-Wesley. All rights reserved. 13-17 Anomalies in Non-Serializable Schedules Dirty read (previous example – write lock given up early) w 1 (x) r 2 (x) abort 1 Non-Repeatable Read (read lock given up early) r 1 (x) w 2 (x) commit 2 r 1 (x) Lost Update (result of non-repeatable read – read lock given up early) –Two transactions trying to deposit in the same bank account – the deposit of transaction 2 is lost r 1 (Bal) r 2 (Bal) w 2 (Bal) commit 2 w 1 (Bal) commit 1
18
Copyright © 2005 Pearson Addison-Wesley. All rights reserved. 13-18 Deadlock When a transaction can hold locks and request another lock (e.g., in two-phase locking), a cycle of waiting transactions can result: –Suppose two transactions are both trying to update the value of x (for example to deposit in the same bank account) r 1 (x) r 2 (x) request_w 1 (x) request_w 2 (x) A transaction in the cycle must be aborted by DBMS (since transactions will wait forever) DBMS uses deadlock detection algorithms or timeout to deal with this
19
Copyright © 2005 Pearson Addison-Wesley. All rights reserved. 13-19 Locking in Relational Databases Locking entire table restricts concurrency Locking only rows returned yields new anomaly SELECT * FROM Transcript T WHERE T.CrsCode = ‘CS305’ AND T.Semester = ‘F2000’ T1: execute SELECT T2: insert a new row satisfying WHERE clause T1: execute SELECT again Inserted row is called a phantom Strict two-phase row locking does not prevent phantoms
20
Copyright © 2005 Pearson Addison-Wesley. All rights reserved. 13-20 ANSI Standard Isolation Levels Defined in terms of anomalies –Anomaly prohibited at one level is also prohibited at all higher levels –READ UNCOMMITTED : all anomalies possible –READ COMMITTED : dirty read prohibited –REPEATABLE READ : reads of individual tuples are repeatable (but phantoms are possible) –SERIALIZABLE : phantoms prohibited; transaction execution is serializable
21
Copyright © 2005 Pearson Addison-Wesley. All rights reserved. 13-21 Locks in Relational Databases DBMS guarantees that each SQL statement is isolated Early (non-strict) lock release used to implement levels –Short-term locks - held for duration of single statement –Long-term locks - held until transaction completes (strict) At all levels, transactions obtain long-term write locks
22
Copyright © 2005 Pearson Addison-Wesley. All rights reserved. 13-22 Locking Implementation of Isolation Levels READ UNCOMMITTED - no read locks (dirty reads possible since transaction can read a write- locked item) READ COMMITTED - short-term read locks on rows (non-repeatable reads possible since transaction releases read lock after reading) REPEATABLE READ - long-term read locks on rows (phantoms possible) SERIALIZABLE - combination of table, row, and index locks
23
Copyright © 2005 Pearson Addison-Wesley. All rights reserved. 13-23 Snapshot Isolation Not an ANSI standard isolation level, but used in a major DBMS (Oracle) Multiversion database: The old value of an item is not overwritten when it is updated. Instead, a new version is created –DBMS can construct, for any i, the state of an item as a result of the execution of the first i transactions to commit –Snapshot: The database state produced by the execution of the first i transactions to commit (snapshots never change)
24
Copyright © 2005 Pearson Addison-Wesley. All rights reserved. 13-24 Definition of Snapshot Isolation No read locks necessary: a transaction reads all values from latest snapshot at time it started. Thus, read/only transactions do not wait. A (read/write) transaction T that has updated x can commit if no other transaction that updated x while T was executing has committed –First-committer-wins
25
Copyright © 2005 Pearson Addison-Wesley. All rights reserved. 13-25 Snapshot Isolation Evaluation Good performance Serializability not guaranteed: Implementation complicated by need to maintain multiversion database. Eventually old versions must be discarded (creates problems for long-running transactions) r 1 (x) r 1 (y) r 2 (x) r 2 (y) w 1 (x) w 2 (y)
26
Copyright © 2005 Pearson Addison-Wesley. All rights reserved. 13-26 Lock Granularity Cost of implementing locks: –Space: data structure in DBMS for each lock –Time: handling of lock request and release Locks can be associated with different size items: row (fine granularity), page, table (coarse granularity) Tradeoff: Coarse granularity locks have lower cost than fine granularity locks, but they reduce concurrency
27
Copyright © 2005 Pearson Addison-Wesley. All rights reserved. 13-27 Multiple Granularity Locking Some DBMSs implement a lock granularity hierarchy based on containment –E.g.,a read lock on a table implies read locks on all the table’s pages (or rows) Problem: A read lock on a table should conflict with a write lock on a page (or row) in the table –How does DBMS detect this since different items are involved? Solution: Intentions locks
28
Copyright © 2005 Pearson Addison-Wesley. All rights reserved. 13-28 Intentions Locks Before being granted a read (S) or write (X) lock on an item, a transaction must acquire an intention read (IS) or intention write (IX) lock on the containing item IS and IX locks do not conflict with each other Conflicts between locks at different levels in hierarchy detected at the intentions level –E.g., To update a row, T1 gets an IX lock on the table (and an X lock on the page containing the row). To read all rows in the table, T2 gets an S lock on the table. R and IX locks conflict.
29
Copyright © 2005 Pearson Addison-Wesley. All rights reserved. 13-29 Implementing Serializability Phantom: T1 reads table R using predicate P. T2 inserts row t that satisfies P –T2 needs IX lock on R and X lock on page containing t –If no index available, T1 holds S lock on R in order to scan entire table. Thus no problem –If an index is available, T1 gets IS lock on R and, using the index, gets S locks only on pages containing rows satisfying P. Thus locking does not prevent phantoms since IS and IX do not conflict and t might be inserted in a different page
30
Copyright © 2005 Pearson Addison-Wesley. All rights reserved. 13-30 Implementing Serializability Solution: Lock the index. In addition: –T1 gets IS lock on index, and S locks on index pages it scans –Since t satisfies P, the index entry for t must be inserted in one of those pages. T2 gets IX lock on index (this is a slight simplification) and X lock on that page. Hence, conflict is detected
31
Copyright © 2005 Pearson Addison-Wesley. All rights reserved. 13-31 Atomicity and Durability Atomicity deals with failure: –User aborts transaction (e.g., cancel button) –System aborts transaction (e.g., deadlock) –Transaction aborts itself (e.g., unexpected db state) –System crashes Durability deals with failure: –Media failure Mechanism for dealing with failures is the log
32
Copyright © 2005 Pearson Addison-Wesley. All rights reserved. 13-32 Log Log: –Append-only sequence of records used to restore database to a consistent state after a failure. –Stored on non-volatile device distinct from mass storage device that contains database Survives processor crash and media failure Update record: –Appended to log when a transaction updates an item –Contains before image: value of item prior to update –Used to restore item when transaction is aborted
33
Copyright © 2005 Pearson Addison-Wesley. All rights reserved. 13-33 Aborting a Transaction Transaction abort: –Scan log backward; apply before image in each of the transaction’s update records to database items to restore them to their original state. Begin Record terminates scan
34
Copyright © 2005 Pearson Addison-Wesley. All rights reserved. 13-34 Log B T1 U T1 B T2 U T1 U T2 U T1 U T2 End of scan when T2 is aborted B - begin U - update End of log when T2 is aborted
35
Copyright © 2005 Pearson Addison-Wesley. All rights reserved. 13-35 Recovery From Crash Crash: –Active transactions must be identified and aborted when system recovers –Commit and Abort Records identify completed transactions. If, during a backward log scan, the first record encountered for T is an update record, T was active at time of crash and must be rolled back
36
Copyright © 2005 Pearson Addison-Wesley. All rights reserved. 13-36 Log B T1 U T1 B T2 U T1 U T2 U T1 B T3 A T2 U T3 C T3 B T4 End of scan for crash recovery U T4 End of log when crash occurs; roll back T1 and T4 on recovery B - begin U - update C - commit A - abort
37
Copyright © 2005 Pearson Addison-Wesley. All rights reserved. 13-37 Commit Transaction is not committed until its commit record is in the log –A crash at any time before that causes transaction to be rolled back
38
Copyright © 2005 Pearson Addison-Wesley. All rights reserved. 13-38 Write-Ahead Log Both the log and database must be updated when a transaction modifies an item. If a crash occurs between updates, abort the transaction –Database updated first - On recovery, item is in the new state but there is no before image to roll it back. Transaction cannot be aborted. –Log updated first - On recovery, item in old state and before image in log. Use of before image has no effect, but transaction can be aborted Update record in log must be written-ahead of update to item in database
39
Copyright © 2005 Pearson Addison-Wesley. All rights reserved. 13-39 Practical Database Systems Two writes to mass store for each database update implies intolerable performance Real world: –DBMS maintains cache of recently accessed pages in memory. Most accesses are to cache. Pages which have been updated eventually written to disk –DBMS maintains log buffer in memory. Records appended to buffer until it fills; then buffer written to log –Maintaining write-ahead feature more complex when buffers taken into account
40
Copyright © 2005 Pearson Addison-Wesley. All rights reserved. 13-40 DBMS Organization
41
Copyright © 2005 Pearson Addison-Wesley. All rights reserved. 13-41 Media Failure Durability requires that database be stored redundantly Log can be used as second copy if : –Update records contain after image (as well as before image): new value of item –A snapshot, or dump, of the database is periodically stored in non-volatile memory Recovery: Starting with most recent dump, play the log forward, update database using after images appended subsequent to dump
42
Copyright © 2005 Pearson Addison-Wesley. All rights reserved. 13-42 Distributed Transactions Information supporting a large enterprise is generally stored on multiple computer systems scattered through the enterprise –E.g., Bank has a database at each branch recording local branch data and a database at the main office recording aggregate data Each DBMS is independent, supporting local transactions at a site With increased enterprise integration and automation, global, or distributed, transactions, involving multiple sites must also be supported
43
Copyright © 2005 Pearson Addison-Wesley. All rights reserved. 13-43 Distributed Transaction A transaction that invokes local transactions –E.g., Bank transfer: invoke withdraw at one site and deposit at another A system that supports distributed transactions is often referred to as a multidatabase system In addition to the local integrity constraints that apply at each site, global integrity constraints might also exist –E.g., Aggregate bank assets at central site = sum of assets at each branch
44
Copyright © 2005 Pearson Addison-Wesley. All rights reserved. 13-44 ACIDity of Distributed Transaction Although a distributed transaction is consistent, maintaining isolation in a multi- database is an important issue Even if local sites are serializable, subtransactions of two distributed transactions might be serialized in different orders at different sites –At site A, T 1A is serialized before T 2A –At site B, T 2B is serialized before T 1B
45
Copyright © 2005 Pearson Addison-Wesley. All rights reserved. 13-45 ACIDity of Distributed Transaction Although a distributed transaction is consistent, maintaining atomicity in a multidatabase is an important issue Guaranteeing that subtransactions of a distributed transaction either all commit or all abort in spite of failures (e.g., message loss, site crash) requires the use of a two-phase commit protocol
46
Copyright © 2005 Pearson Addison-Wesley. All rights reserved. 13-46 Two-Phase Commit Protocol Implemented as an exchange of messages between the coordinator and the cohorts –The cohorts are the individual subtransactions that participated in the transaction –The coordinator polls the cohorts to see if they want to commit
47
Copyright © 2005 Pearson Addison-Wesley. All rights reserved. 13-47 Two-Phase Commit Protocol Prepare message (coordinator to cohort) : –If cohort wants to abort, it aborts –If cohort wants to commit, it moves all update log records to non-volatile store and forces a prepared record to its log –Cohort sends a (ready or aborting) vote message to coordinator
48
Copyright © 2005 Pearson Addison-Wesley. All rights reserved. 13-48 Two-Phase Commit Protocol Vote message (cohort to coordinator): Cohort indicates ready to commit or aborting. –If any are aborting, coordinator decides abort –If all are ready, coordinator decides commit and forces commit record to its log –Coordinator sends commit/abort message to all cohorts that voted ready
49
Copyright © 2005 Pearson Addison-Wesley. All rights reserved. 13-49 Two-Phase Commit Protocol Commit/abort message (coordinator to cohort): –Cohort commits locally by forcing a commit record to its log. Or, if abort message, it aborts –Cohort sends done message to coordinator Done message (cohort to coordinator): –When coordinator receives done message from all cohorts, it writes a complete record to its log
50
Copyright © 2005 Pearson Addison-Wesley. All rights reserved. 13-50 Global Serializability Theorem: If all sites use a two-phase locking protocol and a two-phase commit protocol is used, transactions are globally serializable –Transactions are serialized in the same order at every site – the order in which the transactions committed
51
Copyright © 2005 Pearson Addison-Wesley. All rights reserved. 13-51 ACIDity of Distributed Transactions Global deadlock can be another result of implementing two-phase locking and two- phase commit protocols –At site A, T 1A is waiting for a lock held by T 2A –At site B, T 2B is waiting for a lock held by T 1B System uses deadlock detection algorithms or timeout to deal with this
52
Copyright © 2005 Pearson Addison-Wesley. All rights reserved. 13-52 Replication Information is often replicated in a distributed system –Performance enhancement possible: access to a local replica replaces network communication –Availability improved: if a site containing a data item is unavailable, access a replica at a different site Major implementation problem: how do you keep the replicas synchronized when a replicated data item is updated?
53
Copyright © 2005 Pearson Addison-Wesley. All rights reserved. 13-53 Implementation of Replication DBMSs provide replica control modules to make replication invisible to the application Typical implementation: read one/write all –When application requests to read an item, replica control fetches the nearest copy –When application requests to write an item, replica control updates all copies As compared with non-replicated systems, performance of read better, of write worse
54
Copyright © 2005 Pearson Addison-Wesley. All rights reserved. 13-54 Implementation of Replication Synchronous update systems: all replicas updated as part of transaction. Supports serializability, but performance bad, deadlocks frequent, and cannot handle disconnected sites Asynchronous update systems: one replica updated as part of transaction. Others updated after transaction commits. Performance better, deadlocks less frequent, and disconnected sites can be supported, but serializability sacrificed Practical systems are generally asynchronous
55
Copyright © 2005 Pearson Addison-Wesley. All rights reserved. 13-55 Correctness Application designers must be aware of the fact that real-world systems do not always support ACID executions even if all transactions are consistent –Isolation levels lower than SERIALIZABLE might be used –Two-phase commit protocol might not be used –Replication might use asynchronous update
Similar presentations
© 2025 SlidePlayer.com. Inc.
All rights reserved.