Recovery 10/18/05
Implementing atomicity
  When a transaction commits, the portion of the system implementing durability ensures that the transaction’s effects are recorded in persistent storage.
  While a transaction is active (not yet committed), however, a failure is a real problem for atomicity: the DB can be left in an inconsistent state. NOT GOOD! The transaction’s partial effects must be rolled back.
Reasons for rollback
  Any sort of system SW or HW crash.
  Transaction abort, initiated by:
    the user;
    the transaction, T, itself (e.g., error handling);
    the system (e.g., T is involved in a deadlock, or letting T complete may lead to inconsistency, i.e., violate the consistency property).
How to roll back in immediate update systems
  Immediate update system: if T’s request to write x is granted, x is immediately updated in the DB. If T’s request to read x is granted, the value of x in the DB is returned.
  Concurrency control is relied upon to prevent reads of data that were written by uncommitted transactions.
  Immediate update systems maintain a log of records.
Log in immediate update system
  Log records are only appended, never changed or deleted.
  The system uses the log to maintain atomicity and durability. For atomicity, the log is used to undo the effects of transactions that do not commit. For durability, the log is used to restore the effects of committed transactions.
  The log is a sequential file on disk. Often, multiple copies are kept on separate non-volatile storage devices.
Log records (first assume all log records are only on disk)
  Update record (for writes), containing:
    the before image (aka “undo record”) of the item written;
    the transaction id of the transaction executing the write.
  To roll back T, scan the log backward starting from the last record, writing the before image from each of T’s update records to the DB.
  To improve performance (avoid a scan of the complete log), have each T write a begin record when it starts; the backward scan can stop there.
  Also to improve performance, link the log records of each particular T together (stack-wise), so the scan can skip other transactions’ records.
  If a transaction commits, write a commit record. If a transaction aborts, do the rollback, then write an abort record.
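The rollback procedure on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the DB is a dict, the log is an in-memory list rather than a disk file, and the names LogRecord, Log, write, and rollback are hypothetical, not taken from the lecture.

```python
from dataclasses import dataclass
from typing import Any, Optional

@dataclass
class LogRecord:
    kind: str                   # "begin", "update", "commit", or "abort"
    tid: int                    # id of the transaction that wrote the record
    item: Optional[str] = None  # data item written (update records only)
    before: Any = None          # before image (update records only)
    prev: Optional[int] = None  # index of this transaction's previous record

class Log:
    """Append-only log; records are never changed or deleted."""
    def __init__(self):
        self.records = []
        self.last = {}          # tid -> index of that transaction's latest record

    def append(self, kind, tid, item=None, before=None):
        rec = LogRecord(kind, tid, item, before, self.last.get(tid))
        self.records.append(rec)
        self.last[tid] = len(self.records) - 1

def write(db, log, tid, item, value):
    """Immediate update: log the before image, then change the DB."""
    log.append("update", tid, item, db[item])
    db[item] = value

def rollback(db, log, tid):
    """Follow T's record chain backward, restoring before images."""
    i = log.last.get(tid)
    while i is not None:
        rec = log.records[i]
        if rec.kind == "begin":      # begin record: nothing older belongs to T
            break
        if rec.kind == "update":
            db[rec.item] = rec.before
        i = rec.prev                 # stack-wise link to T's previous record
    log.append("abort", tid)         # abort record is written after the rollback

db = {"x": 1, "y": 2}
log = Log()
log.append("begin", 1)
write(db, log, 1, "x", 10)
log.append("begin", 2)
write(db, log, 2, "y", 20)
write(db, log, 1, "x", 11)
rollback(db, log, 1)
print(db)  # {'x': 1, 'y': 20}
```

The prev links let the rollback visit only T1's records, so T2's update to y is never examined.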
Savepoint record
  To increase flexibility in doing rollbacks, a transaction can specify a savepoint during its execution. It can then do a partial rollback to a specified savepoint (especially useful for transaction error handling).
  A savepoint record contains the transaction id, the savepoint id, and any other useful information.
  To roll back to a specified savepoint, scan the log backward to that savepoint record, applying the before images of the transaction’s update records to the DB.
Example of use of savepoint
  begin_transaction();
  stmt1;
  sp1 := create_savepoint();
  stmt2;
  sp2 := create_savepoint();
  if (cond1) rollback(sp1);
  else if (cond2) rollback(sp2);
  else …
  commit();
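A partial rollback to a savepoint might look like the following Python sketch. It assumes a dict for the DB and a list of tuples for the log; the function names and the tuple layout are illustrative choices, not part of the lecture.

```python
def begin(log, tid):
    log.append(("begin", tid))

def write(db, log, tid, item, value):
    # Update record carries the before image of the item.
    log.append(("update", tid, item, db[item]))
    db[item] = value

def create_savepoint(log, tid, sp_id):
    # Savepoint record: transaction id plus savepoint id.
    log.append(("savepoint", tid, sp_id))
    return sp_id

def rollback_to(db, log, tid, sp_id):
    """Scan backward to the savepoint record, applying T's before images."""
    for rec in reversed(log):
        if rec[1] != tid:
            continue                      # another transaction's record
        if rec[0] == "savepoint" and rec[2] == sp_id:
            return                        # reached the requested savepoint
        if rec[0] == "update":
            _, _, item, before = rec
            db[item] = before
    raise ValueError("savepoint not found")

db = {"x": 1, "y": 2}
log = []
begin(log, 1)
write(db, log, 1, "x", 10)
sp1 = create_savepoint(log, 1, "sp1")
write(db, log, 1, "y", 20)
sp2 = create_savepoint(log, 1, "sp2")
write(db, log, 1, "x", 30)
rollback_to(db, log, 1, sp2)
print(db)  # {'x': 10, 'y': 20}
rollback_to(db, log, 1, sp1)
print(db)  # {'x': 10, 'y': 2}
```

Work done before the savepoint (here, x = 10) survives the partial rollback, which is what makes savepoints useful for error handling inside a transaction.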
Concurrent transactions
  To use the log after a crash, the system must determine which transactions have completed (committed or aborted) and which were still active.
  All active transactions need to be aborted (rolled back).
What does commit mean here?
  If the commit record has not been written to the log when the system fails, the transaction will be rolled back.
  SO! Commit means the commit record has been written to the log.
Checkpoints
  A checkpoint record lists all transactions active at the time it is written (e.g., written to the log by the transaction manager).
  To use the checkpoint record, scan backward to the most recent checkpoint record. If T is listed there and no completion record for T (abort or commit) has been seen so far, then T was active at the crash, and the backward scan continues until T’s begin record is reached.
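Log example
  Notation: Bi = Ti begin, Ui = Ti update, Ci = Ti commit, Ai = Ti abort, CK: = checkpoint record listing the active transactions; > marks the end of the log, where the backward scan starts.
  Log (oldest record first):
    B1        T1 begin
    B2
    U1        T1 update
    C1        T1 commit
    CK: T2    checkpoint
    U2        (a)
    U2        (b)
    >
  Recovery: scan back, undo (b), undo (a), discover only T2 is active, ignore C1, ignore U1, stop at B2.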
Another log example...
  Log (oldest record first); the backward scan reads the table bottom to top:
    Record            Meaning      Backward scan
    B2
    B3                             ok, T3 scan complete
    B1                             ok, T1 scan complete
    C2                T2 commit    ignore
    B5                             ignore
    U3                             undo
    U5                             ignore
    A5                T5 abort     ignore
    CK: T4, T1, T3                 only T1, T3 matter
    U1                             undo
    U4                             can ignore
    B6                             done with T6
    C4                T4 commit    T4 completed!
    U6                             T6 active - undo
    U1                             T1 active - undo
    >
Yet another log example...
  Log (oldest record first); the backward scan reads the table bottom to top:
    Record              Backward scan
    (earlier records)   continue for T1
    B6                  T6 scan done
    U5                  ignore
    U4                  ignore
    CK: 1, 4, 5, 6      only T1, T6 matter
    A5                  T5 done
    U4                  ignore
    C4                  T4 done
    U6                  T6 active - undo
    >
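The backward scan traced in these log examples can be sketched as a small Python function. The tuple encoding of log records and the name plan_rollback are assumptions made for illustration; the function only decides which update records to undo and where the scan stops, it does not touch a DB.

```python
def plan_rollback(log):
    """Backward scan: return (indexes of updates to undo, index where scan stops)."""
    completed = set()        # transactions whose commit/abort record was seen
    pending = set()          # active at crash, begin record not yet reached
    seen_checkpoint = False
    undo = []
    stop = len(log)
    for i in range(len(log) - 1, -1, -1):
        kind, arg = log[i]
        stop = i
        if kind in ("C", "A"):
            completed.add(arg)
        elif kind == "CK":
            # Listed transactions with no completion seen so far were active.
            pending |= set(arg) - completed
            seen_checkpoint = True
        elif kind == "U":
            if arg in pending:
                undo.append(i)
            elif not seen_checkpoint and arg not in completed:
                pending.add(arg)          # first sign that this one was active
                undo.append(i)
        elif kind == "B":
            pending.discard(arg)          # scan for this transaction is done
        if seen_checkpoint and not pending:
            break
    return undo, stop

# First log example: B1 B2 U1 C1 CK:T2 U2(a) U2(b)
log1 = [("B", 1), ("B", 2), ("U", 1), ("C", 1), ("CK", [2]), ("U", 2), ("U", 2)]
print(plan_rollback(log1))  # ([6, 5], 1): undo (b), undo (a), stop at B2

# Second log example: B2 B3 B1 C2 B5 U3 U5 A5 CK:T4,T1,T3 U1 U4 B6 C4 U6 U1
log2 = [("B", 2), ("B", 3), ("B", 1), ("C", 2), ("B", 5), ("U", 3), ("U", 5),
        ("A", 5), ("CK", [4, 1, 3]), ("U", 1), ("U", 4), ("B", 6), ("C", 4),
        ("U", 6), ("U", 1)]
print(plan_rollback(log2))  # ([14, 13, 9, 5], 1): scan ends at B3
```

Once the checkpoint has been passed, no new transaction can join the pending set, which is why the scan can stop as soon as every pending begin record has been reached.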
Write-ahead log
  MUST always write the log record before the DB is updated.
  Suppose we don’t do write-ahead: T executes an update by first changing the DB, then writing the log. If a crash occurs between changing the DB and writing the log, there is no before image to undo the change, so there is no way to recover the DB to a consistent state.
  Suppose we do write-ahead: T executes an update by first writing the log, then changing the DB. If a crash occurs between writing the log and changing the DB, recovery will write the before image, which is the same as the value currently stored in the DB, so no harm is done.
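The two orderings can be compared with a toy crash simulation in Python. Here db and log stand in for the DB and the log on disk, the updating transaction is assumed to be uncommitted at the crash, and the crash_between flag is a hypothetical device for stopping between the two writes.

```python
def update(db, log, item, value, write_ahead, crash_between=False):
    """Perform one update; optionally 'crash' between the two disk writes."""
    before = db[item]
    if write_ahead:
        log.append((item, before))   # 1. write log record (before image)
        if crash_between:
            return
        db[item] = value             # 2. change DB
    else:
        db[item] = value             # 1. change DB
        if crash_between:
            return
        log.append((item, before))   # 2. write log record

def recover(db, log):
    """Roll back the uncommitted transaction using logged before images."""
    for item, before in reversed(log):
        db[item] = before

# With write-ahead: the log record exists, the DB is unchanged, undo is harmless.
db_wal, log_wal = {"x": 1}, []
update(db_wal, log_wal, "x", 5, write_ahead=True, crash_between=True)
recover(db_wal, log_wal)
print(db_wal)  # {'x': 1}

# Without write-ahead: the DB was changed but nothing was logged.
db_bad, log_bad = {"x": 1}, []
update(db_bad, log_bad, "x", 5, write_ahead=False, crash_between=True)
recover(db_bad, log_bad)
print(db_bad)  # {'x': 5}: the uncommitted value cannot be undone
```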
Performance stinks because each DB write requires two I/O writes (one to the log, one to the DB)!
  Use volatile storage for the last part of the log: the log buffer. The log buffer is periodically flushed to the log on disk.
  When the system crashes, the contents of the log buffer are lost.
  Using a cache is analogous: we want the cache to improve performance, but on a crash the cached data (DB pages and maybe the log buffer) are lost.
Modify previous scheme for log buffer and cache
  Recall: must write the record to the log before writing to the DB. So, a dirty page in the cache is not written to the DB until after the log buffer containing the update record for the corresponding data item has been appended to the log on disk.
  Either:
    append the record to the log buffer; eventually the buffer is flushed, and then the dirty cache page can be written; or
    append the record to the log buffer, then immediately write the log buffer (AKA a forced write).
  For a normal (unforced) write, DMA can proceed concurrently with transaction execution. BUT! For a forced write, the disk write system call cannot return until the write is complete.
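As a rough operating-system analogy (not the DBMS mechanism itself), the difference between an unforced and a forced write resembles calling os.write alone versus following it with os.fsync, which blocks until the data has been handed to the storage device. The helper name append_log and the file layout below are assumptions for illustration.

```python
import os
import tempfile

def append_log(fd, data, force):
    """Append bytes to the log file; if force is set, block until flushed."""
    written = 0
    while written < len(data):
        # os.write may write fewer bytes than requested, so loop.
        written += os.write(fd, data[written:])
    if force:
        os.fsync(fd)   # does not return until the write is complete

path = os.path.join(tempfile.mkdtemp(), "log.bin")
fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o644)
append_log(fd, b"U1(x)\n", force=False)  # unforced: caller continues at once
append_log(fd, b"C1\n", force=True)      # forced: caller waits for the flush
os.close(fd)
```

Forcing only where durability is required (for example, at a commit record) keeps most log appends cheap.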
Alternative implementation for log buffer and cache
  Add (overhead) data:
    a log sequence number (LSN) in each log record;
    in each DB page, the LSN of the log record for the most recent change to that page.
Continuing alternative implementation
  When space is needed in the cache, choose a dirty page, P, to write out.
  Determine whether the log buffer contains the update record whose LSN is the LSN stored in P.
    If so, must force-write the log buffer before P is written to the DB.
    If not, the log on mass storage is already up to date wrt P.
Example:
  DISK
    DB:    Page#   LSN
           O       3
           P       95
           Q       3
    Log:   LSN     record
           1       …
           …
           99      U5(m)
  VOLATILE
    Cache:        Page#          LSN
                  P’ (x, y, z)   95, 101, 102
                  Q’ (a, b, c)   99
                  O (l, m, n)    3
    Log buffer:   LSN    record
                  101    U1(x)
                  102    U2(y)
                  103    U2(x)
  To remove clean O: no change to DB.
  To remove dirty Q: cache->Q->LSN <= log’s maximum LSN. Therefore can just write out Q.
  To remove dirty P: cache->P->LSN > log’s maximum LSN. Therefore must force the log buffer (from the beginning to P->LSN), then can write out P.
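The eviction rule in this example can be sketched in Python. The class LogManager and the function evict are hypothetical names, and the sketch assumes that P’s page LSN is 102 and that the largest LSN in the log on disk is 99.

```python
class LogManager:
    def __init__(self, max_lsn_on_disk):
        self.max_lsn_on_disk = max_lsn_on_disk
        self.buffer = []     # (lsn, record) pairs in LSN order, not yet on disk

    def append(self, lsn, record):
        self.buffer.append((lsn, record))

    def force_up_to(self, lsn):
        """Write buffered records with LSN <= lsn to the log on disk."""
        flushed = [r for r in self.buffer if r[0] <= lsn]
        self.buffer = [r for r in self.buffer if r[0] > lsn]
        if flushed:
            self.max_lsn_on_disk = flushed[-1][0]
        return flushed

def evict(page_lsn, dirty, log):
    """Return the actions needed to remove a page from the cache."""
    if not dirty:
        return ["drop page"]                 # clean page: no change to DB
    actions = []
    if page_lsn > log.max_lsn_on_disk:
        log.force_up_to(page_lsn)            # log must reach disk first
        actions.append("force log buffer")
    actions.append("write page")
    return actions

log = LogManager(max_lsn_on_disk=99)
log.append(101, "U1(x)")
log.append(102, "U2(y)")
log.append(103, "U2(x)")

print(evict(3, False, log))    # ['drop page']                      (clean O)
print(evict(99, True, log))    # ['write page']                     (dirty Q)
print(evict(102, True, log))   # ['force log buffer', 'write page'] (dirty P)
print(log.buffer)              # [(103, 'U2(x)')]
```

Only the prefix of the log buffer up to the page’s LSN has to be forced; later records can stay buffered.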
Force policy (on commits)
  Force policy: T wants to commit, but first!
    If T’s last update record is still in the log buffer, force the log buffer (the before images are now durable).
    The dirty pages in the cache that were updated by T are forced (the new values are now durable).
  Then, append T’s commit record to the log buffer. When that part of the log buffer is written to disk, T is durable.
Example on board.
For no-force commit policy:
  New log record type: after-image. The after-image (aka redo record) is a copy of the new value of the item.
  The motivation for having the after-image in the log is to improve disk access performance: new data is durable once the log buffer has been written out, even though the page in the cache has not.
  So, there is no required order between writing the commit record (to disk) and writing the dirty page.
Example on board.
Three pass recovery: Do, Undo, Redo
  Pass 1: Scan the log backward to the most recent checkpoint, determining which transactions to roll back (i.e., those active at the crash).
  Pass 2: Replay the log forward from the checkpoint. For all update records (of committed, aborted, and active transactions), update the corresponding items in the DB using the after-image. Now the DB is up to date wrt all changes made prior to the crash.
  Pass 3: Scan backward to roll back all transactions active at the time of the crash, using the before-image to reverse each DB value. This pass ends when the begin records of all rolled-back transactions have been reached.
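The three passes can be sketched in Python as follows. The sketch makes several assumptions beyond the slide: the log is a list of tuples, the DB is a dict, every update logged before the checkpoint has already reached the DB, and no transaction aborted after the checkpoint. The function name recover is illustrative.

```python
def recover(db, log):
    """Three-pass recovery over an in-memory log; returns the rolled-back tids."""
    # Pass 1: backward to the most recent checkpoint, find active transactions.
    completed, active = set(), set()
    ckpt = 0
    for i in range(len(log) - 1, -1, -1):
        rec = log[i]
        if rec[0] in ("commit", "abort"):
            completed.add(rec[1])
        elif rec[0] in ("begin", "update"):
            if rec[1] not in completed:
                active.add(rec[1])
        elif rec[0] == "checkpoint":
            active |= set(rec[1]) - completed
            ckpt = i
            break

    # Pass 2: replay forward from the checkpoint using after-images.
    for rec in log[ckpt:]:
        if rec[0] == "update":
            _, tid, item, before, after = rec
            db[item] = after

    # Pass 3: backward again, undoing active transactions with before-images.
    pending = set(active)
    for rec in reversed(log):
        if not pending:
            break
        if rec[0] == "update" and rec[1] in pending:
            db[rec[2]] = rec[3]
        elif rec[0] == "begin":
            pending.discard(rec[1])
    return active

log = [
    ("begin", 1),
    ("update", 1, "x", 1, 10),
    ("begin", 2),
    ("checkpoint", [1, 2]),
    ("update", 2, "y", 2, 20),
    ("commit", 2),
    ("update", 1, "x", 10, 11),
    ("begin", 3),
    ("update", 3, "z", 3, 30),
]
# DB state found on disk after the crash: T2's committed y = 20 never reached
# the DB, while T1's uncommitted x = 11 did.
db = {"x": 11, "y": 2, "z": 3}
rolled_back = recover(db, log)
print(sorted(rolled_back))  # [1, 3]
print(db)                   # {'x': 1, 'y': 20, 'z': 3}
```

Pass 2 is what makes the no-force policy safe (T2’s committed value is restored), and pass 3 is what makes writing uncommitted pages safe (T1’s value is reversed).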
Caveat to Do, Undo, Redo
  Checkpoint followed by a T update, then a T abort: the update was rolled back in the DB before the abort was logged. During recovery, pass 2 restores the update, but pass 3 does not roll it back, because T was not active at the crash.
  To fix, an aborted update of x needs TWO records in the log: update (xold, xnew), followed by compensation (xnew, xold).
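A small Python sketch shows why the compensation record matters. The tuple layout, in which a compensation record is simply an update record with the old and new values swapped, and the function names are assumptions for illustration.

```python
def replay(db, log):
    """Pass 2 only: apply the after-image of every update record, in order."""
    for rec in log:
        if rec[0] == "update":
            _, tid, item, old, new = rec
            db[item] = new

def abort_with_compensation(db, log, tid):
    """Roll T back, logging a compensation record for each undone update."""
    updates = [r for r in log if r[0] == "update" and r[1] == tid]
    for _, _, item, old, new in reversed(updates):
        db[item] = old
        log.append(("update", tid, item, new, old))   # compensation (xnew, xold)
    log.append(("abort", tid))

# Without compensation: replay restores the aborted update and nothing undoes it.
plain_log = [("checkpoint", []), ("begin", 1), ("update", 1, "x", 1, 5), ("abort", 1)]
db_plain = {"x": 1}          # x was rolled back before the crash
replay(db_plain, plain_log)
print(db_plain)              # {'x': 5}: the aborted value reappears

# With compensation: replay applies the update and then its reversal.
db_comp = {"x": 1}
comp_log = [("checkpoint", []), ("begin", 1)]
comp_log.append(("update", 1, "x", 1, 5))
db_comp["x"] = 5
abort_with_compensation(db_comp, comp_log, 1)
replay(db_comp, comp_log)    # recovery after a later crash
print(db_comp)               # {'x': 1}
```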
Example on board
Class discussion of ARIES.