CSCI Recovery Control Techniques 1 RECOVERY CONTROL TECHNIQUES Dr. Awad Khalil Computer Science Department AUC
CSCI Recovery Control Techniques 2 Content Types of Failures What is Recovery? System Concepts for Recovery Recovery Techniques Deferred Update Immediate Update Shadow Paging Recovery from Catastrophic Failures
CSCI Recovery Control Techniques 3 Types of Failures 1. A hardware or software error occurs in the computer system during transaction execution. 1. A computer failure (system crash): A hardware or software error occurs in the computer system during transaction execution. 2. Some operation in the transaction may cause it to fail, such as integer overflow or division by zero. 2. A transaction or system error: Some operation in the transaction may cause it to fail, such as integer overflow or division by zero. 3. During transaction execution, certain conditions may occur that necessitate cancellation of the transaction. For example, data for the transaction may not be found, or insufficient account balance in a banking database. 3. Local errors or exception conditions: During transaction execution, certain conditions may occur that necessitate cancellation of the transaction. For example, data for the transaction may not be found, or insufficient account balance in a banking database. 4. The concurrency control method may decide to abort the transaction, to be restarted later, because it violates serializability or because several transactions are in a state of deadlock. 4. Concurrency control enforcement: The concurrency control method may decide to abort the transaction, to be restarted later, because it violates serializability or because several transactions are in a state of deadlock. 5. Some disk blocks may lose their data because of a read or write malfunction or because of a disk read/write head crash. 5. Disk failure: Some disk blocks may lose their data because of a read or write malfunction or because of a disk read/write head crash. 6. Such as power or air- conditioning failure, fire, theft, sabotage, overwriting disks by mistake, … etc. 6. Physical problems and catastrophes: Such as power or air- conditioning failure, fire, theft, sabotage, overwriting disks by mistake, … etc.
CSCI Recovery Control Techniques 4 What is Recovery? from transaction failures usually means that the database is restored to some state from the past so that a correct state - close to the time of failure - can be reconstructed from that past state. Recovery from transaction failures usually means that the database is restored to some state from the past so that a correct state - close to the time of failure - can be reconstructed from that past state. A typical strategy for recovery may be summarized informally as follows: If there is extensive damage to a wide portion of the database due to catastrophic failure, such as a disk crash, the recovery method restores a past copy of the database that was dumped to archival storage and reconstruct a more current state by reapplying or redoing committed transaction operations from the log up to the time of failure. When the database is not physically damaged but has become inconsistent due to noncatastrophic failures of types 1 through 4, the strategy is to reverse the changes that caused the inconsistency by undoing some operations.
CSCI Recovery Control Techniques 5 System Concepts for Recovery The recovery process is often closely interwined with operating system functions. Typically, one or more disk pages that include the data item to be updated are cached into a main memory buffer and then updated in memory before being written back to disk, A directory for the cache is used to keep track of which database items are in the buffers. Some page-replacement strategy from operating systems, such as Least Recently Used (LRU) or First-In-First-Out (FIFO), can be used to select the buffers for flushing. Associated with each item in the cache is a dirty bit to indicate whether or not the item has been modified. When an item is flushed, it is written back to disk only if its dirty bit is 1.
CSCI Recovery Control Techniques 6 Flushing Techniques Two main strategies are employed when flushing a modified data item back to disk: Shadowing, writes a new item at a different disk location, so multiple copies of a data item can be maintained. The old value of the data item before updating is called before image (BFIM), and the new value after updating is called the after image (AFIM). In-place updating, writes the data item in the same disk location. Hence, a single copy of each data item is maintained on disk.
CSCI Recovery Control Techniques 7 Recovery Techniques (from non-catastrophic failures, i.e., failures that do not affect secondary storage or involve destruction of the DBMS) There are two major techniques for recovery from non- catastrophic transaction failures: Deferred updates Immediate updates In both schemes, failed or aborted transactions may be restarted later either automatically by recovery process or manually by the user by being resubmitted as brand new transactions.
CSCI Recovery Control Techniques 8 Deferred Updates The idea behind deferred update techniques is to defer or postpone any actual updates to the database itself until the transaction completes its execution successfully and reaches its commit point. During transaction execution, the updates are recorded only in the log and in the transaction workspace (buffer). After the transaction reaches its commit point and the log is force-written to disk, the updates are recorded in the database itself. If a transaction fails before reaching its commit point, there is no need to undo any operations, because the transaction has not affected the database in any way.
CSCI Recovery Control Techniques 9 Deferred Updates Protocol typical deferred update protocol can be stated as follows: 1. A transaction cannot change the database until it reaches its commit point. 2. A transaction does not reach its commit point until all its operations are recorded in the log and the log is force-written to disk. Because the database is never updated until after the transaction commits, there is never a need to UNDO any operations. Hence, this technique is known as NO- UNDO/REDO algorithm. The REDO is needed in case the system fails after the transaction commits but before all its changes are recorded in the database. In this case, the transaction operations are redone from the log entries.
CSCI Recovery Control Techniques 10 Deferred Updates Steps 1. When a transaction starts, write an entry, start_transaction(T) to the log. 2. When any operation is performed that will change values in the database, write a log entry write_item(T, X, old_value, new_value). 3. When a transaction is about to commit, write a log record of the form Commit(T), write all log records to disk. 4. Commit the transaction, using the log to write the updates to the database, the writing of data to disk need not occur immediately.
CSCI Recovery Control Techniques 11 Deferred Updates Rules ActivityLog write to disk Typical entry to the System log Changes written to Database buffer Changes written to Database Disk Start a transaction T NoStart_transaction(T)N/A Read data item X NoRead_item(T,X)N/A Write Value X NoWrite_item(T,X, old_value, new_value) No Commit transaction T YesCommit(T)YesYes (although writing back to disk may occur not immediately) Create a Checkpoint YesCheckpoint Yes (of committed transactions)
CSCI Recovery Control Techniques 12 Immediate Updates Techniques In these techniques, when a transaction issues an update command, the database can be updated immediately, without any need to wait for a transaction to reach its commit point. In many of these techniques, however, an update operation must still be recorded in the log (on disk) before it is applied to the database so that we can recover, in case of failure. When immediate update is allowed, provisions must be made for undoing the effect of update operations on the database, because a transaction can fail after it has applied some updates to the database itself. Hence, recovery schemes based on immediate updates must include the capability to roll back a transaction by undoing the effect of its write_item operations.
CSCI Recovery Control Techniques 13 Immediate Updates Protocol 1. When a transaction starts, write an entry, start_transaction(T) to the log. 2. When any operation is performed that will change values in the database, write a log entry write_item(T, X, old_value, new_value). 3. Write the log to disk. 4. Once the log record is written, write the update to the database buffers. 5. When convenient, write the database buffers to the disk. 6. When a transaction is about to commit, write a log record of the form Commit(T). 7. Write the log to disk.
CSCI Recovery Control Techniques 14 Immediate Updates Rules ActivityLog write to disk Typical entry to the System log Changes written to Database buffer Changes written to Database Disk Start a transaction T NoStart_transaction(T)N/A Read data item X NoRead_item(T,X)N/A Write Value X YesWrite_item(T,X, old_value, new_value) YesYes (although writing back to disk may occur not immediately) Commit transaction T YesCommit(T) Create a Checkpoint YesCheckpoint Yes (of committed transactions)
CSCI Recovery Control Techniques 15 Indirect Updates (Shadow Paging) In the Shadow Page scheme, the database is not directly modified but a copy, stored on permanent storage (e.g. disk), is made of the portion of the database to be modified and all modifications are made to this copy. Once a transaction commits, the modified copy replaces the original in an atomic manner, i.e., the replacement is carried out in its entirety or not at all. If the system crashes at this point, the old version is still available for recovery.
CSCI Recovery Control Techniques 16 Shadow Paging Page Management: Virtual memory is divided into pages that are all of a certain size (commonly 2K or 4K bytes). The virtual or logical pages are mapped onto physical disk blocks of the same size as the pages. The mapping is achieved by consulting a page table (or directory). The page table lists each logical page identifier and the address of the physical blocks that are actually stores the logical page. Shadow paging considers the database to be made up of a number of fixed-size disk pages (or disk blocks) - say, n - for recovery purposes. A page table (or directory) with n entries is constructed, where the ith page table entry points to the ith database page (block) on disk.
CSCI Recovery Control Techniques 17 Shadow Paging The page table is kept in main memory if it is not too large, and all references - reads or writes - to database pages on disk go through the page table. When a transaction begins executing, the current page table - whose entries point to the most recent or current database pages on disk - is copied into a shadow page table. The shadow page table is then saved on disk while the current page table is used by the transaction. During transaction execution, the shadow page table is never modified. When a write_item operation is performed, a new copy of the modified database page is created, but the old copy of that page is not overwritten. The current page table entry is modified to point to the new disk block. For pages updated by the transaction, two versions are kept. The old version is referenced by the shadow page table, and the new version by the current page table. To recover from failure during transaction execution, it is sufficient to free the modified database pages and to discard the current page table. The state of the database before transaction execution is available through the shadow page table, and that state is recovered by reinstating the shadow page table so that it becomes the current page table once more.
CSCI Recovery Control Techniques 18 Shadow Paging The advantage of shadow paging is that it makes undoing the effect of the executing transaction very simple. One disadvantage of shadow paging is that the updated database pages changes location on disk, this makes it difficult to keep related database pages close together on disk without complex storage management strategies.
CSCI Recovery Control Techniques 19 Recovery from Catastrophic Failures The recovery manager of a DBMS must be equipped to handle catastrophic failures such as disk crashes. The main technique used to handle such crashes is that of database backup. The whole database and the log are periodically copied onto a cheap storage medium such as magnetic tapes. In case of a catastrophic system failure, the latest backup copy can be reloaded from the tape to the disk, and the system can be restarted.
CSCI Recovery Control Techniques 20 Thank you