Lecture#23: Crash Recovery

Slides:



Advertisements
Similar presentations
Recovery Amol Deshpande CMSC424.
Advertisements

Chapter 16: Recovery System
Crash Recovery John Ortiz. Lecture 22Crash Recovery2 Review: The ACID properties  Atomicity: All actions in the transaction happen, or none happens 
1 CSIS 7102 Spring 2004 Lecture 9: Recovery (approaches) Dr. King-Ip Lin.
CMU SCS Carnegie Mellon Univ. Dept. of Computer Science /615 - DB Applications C. Faloutsos – A. Pavlo Lecture#23: Crash Recovery – Part 1 (R&G ch.
CS 440 Database Management Systems Lecture 10: Transaction Management - Recovery 1.
Jianlin Feng School of Software SUN YAT-SEN UNIVERSITY
Crash Recovery R&G - Chapter 18.
CMU SCS Carnegie Mellon Univ. Dept. of Computer Science /615 - DB Applications C. Faloutsos – A. Pavlo Lecture#23: Crash Recovery – Part 2 (R&G ch.
1 Crash Recovery Chapter Review: The ACID properties  A  A tomicity: All actions of the Xact happen, or none happen.  C  C onsistency: If each.
1 CSIS 7102 Spring 2004 Lecture 8: Recovery (overview) Dr. King-Ip Lin.
Introduction to Database Systems1 Logging and Recovery CC Lecture 2.
Chapter 20: Recovery. 421B: Database Systems - Recovery 2 Failure Types q Transaction Failures: local recovery q System Failure: Global recovery I Main.
CSCI 3140 Module 8 – Database Recovery Theodore Chiasson Dalhousie University.
Chapter 19 Database Recovery Techniques
Crash Recovery.
Recovery 10/18/05. Implementing atomicity Note, when a transaction commits, the portion of the system implementing durability ensures the transaction’s.
ICS (072)Database Recovery1 Database Recovery Concepts and Techniques Dr. Muhammad Shafique.
Quick Review of May 1 material Concurrent Execution and Serializability –inconsistent concurrent schedules –transaction conflicts serializable == conflict.
Chapter 19 Database Recovery Techniques. Slide Chapter 19 Outline Databases Recovery 1. Purpose of Database Recovery 2. Types of Failure 3. Transaction.
©Silberschatz, Korth and Sudarshan17.1Database System Concepts 3 rd Edition Chapter 17: Recovery System Failure Classification Storage Structure Recovery.
Database System Concepts ©Silberschatz, Korth and Sudarshan See for conditions on re-usewww.db-book.com Remote Backup Systems.
TRANSACTIONS A sequence of SQL statements to be executed "together“ as a unit: A money transfer transaction: Reasons for Transactions : Concurrency control.
CMU SCS Carnegie Mellon Univ. Dept. of Computer Science /615 - DB Applications C. Faloutsos – A. Pavlo Lecture#23: Crash Recovery – Part 1 (R&G ch.
Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke1 Transaction Management Overview Chapter 16.
Recovery System By Dr.S.Sridhar, Ph.D.(JNUD), RACI(Paris, NICE), RMR(USA), RZFM(Germany) DIRECTOR ARUNAI ENGINEERING COLLEGE TIRUVANNAMALAI.
Recovery system By Kotoua Selira. Failure classification Transaction failure : Logical errors: transaction cannot complete due to some internal error.
Chapter 16 Recovery Yonsei University 1 st Semester, 2015 Sanghyun Park.
Chapter 10 Recovery System. ACID Properties  Atomicity. Either all operations of the transaction are properly reflected in the database or none are.
Carnegie Mellon Carnegie Mellon Univ. Dept. of Computer Science Database Applications C. Faloutsos Recovery.
CMU SCS Carnegie Mellon Univ. Dept. of Computer Science /615 - DB Applications C. Faloutsos – A. Pavlo Lecture#23: Crash Recovery (R&G ch. 18)
Chapter 17: Recovery System
Database System Concepts ©Silberschatz, Korth and Sudarshan See for conditions on re-usewww.db-book.com Chapter 17: Recovery System.
Transactional Recovery and Checkpoints Chap
Motivation for Recovery Atomicity: –Transactions may abort (“Rollback”). Durability: –What if DBMS stops running? (Causes?) crash! v Desired Behavior after.
Transactional Recovery and Checkpoints. Difference How is this different from schedule recovery? It is the details to implementing schedule recovery –It.

Database Recovery Techniques
Remote Backup Systems.
Database Recovery Techniques
DURABILITY OF TRANSACTIONS AND CRASH RECOVERY
Transactional Recovery and Checkpoints
CS 440 Database Management Systems
Database Applications (15-415) DBMS Internals- Part XIII Lecture 22, November 15, 2016 Mohammad Hammoud.
Crash Recovery Chapter 18
Transaction Management Overview
Backup and Recovery Techniques
File Processing : Recovery
Transaction Management Overview
Database Systems (資料庫系統)
Crash Recovery Chapter 18
Database Recovery Techniques
Recovery I: The Log and Write-Ahead Logging
Crash Recovery Chapter 18
CS 632 Lecture 6 Recovery Principles of Transaction-Oriented Database Recovery Theo Haerder, Andreas Reuter, 1983 ARIES: A Transaction Recovery Method.
Transaction Management Overview
Database Applications (15-415) DBMS Internals- Part XIII Lecture 25, April 15, 2018 Mohammad Hammoud.
Outline Introduction Background Distributed DBMS Architecture
Module 17: Recovery System
Recovery System.
Lecture 20: Intro to Transactions & Logging II
Backup and Recovery Techniques
Database Recovery 1 Purpose of Database Recovery
Crash Recovery Chapter 18
Transaction Management Overview
Database Applications (15-415) DBMS Internals- Part XIII Lecture 24, April 14, 2016 Mohammad Hammoud.
Remote Backup Systems.
Crash Recovery Chapter 18
Recovery Unit 4.4 Dr Gordon Russell, Napier University
Transaction Management Overview
Presentation transcript:

Lecture#23: Crash Recovery Faloutsos/Pavlo CMU 15-415/615 Carnegie Mellon Univ. Dept. of Computer Science 15-415/615 - DB Applications C. Faloutsos – A. Pavlo Lecture#23: Crash Recovery (R&G ch. 18)

Last Class Basic Timestamp Ordering Optimistic Concurrency Control Multi-Version Concurrency Control Faloutsos/Pavlo CMU SCS 15-415/615

Today’s Class Overview Write-Ahead Log Checkpoints Logging Schemes Recovery Protocol Shadow Paging Faloutsos/Pavlo CMU SCS 15-415/615

Motivation T1 Page Disk Memory BEGIN R(A) W(A) ⋮ COMMIT Buffer Pool Faloutsos/Pavlo CMU - 15-415/615 Motivation BEGIN R(A) W(A) ⋮ COMMIT T1 Buffer Pool A=1 A=2 A=1 Page Disk Memory Faloutsos/Pavlo CMU SCS 15-415/615

Crash Recovery Recovery algorithms are techniques to ensure database consistency, transaction atomicity and durability despite failures. Recovery algorithms have two parts: Actions during normal txn processing to ensure that the DBMS can recover from a failure. Actions after a failure to recover the database to a state that ensures atomicity, consistency, and durability. Faloutsos/Pavlo CMU SCS 15-415/615

Crash Recovery DBMS is divided into different components based on the underlying storage device. Need to also classify the different types of failures that the DBMS needs to handle. Faloutsos/Pavlo CMU SCS 15-415/615

Use multiple storage devices to approximate. Storage Types Volatile Storage: Data does not persist after power is cut. Examples: DRAM, SRAM Non-volatile Storage: Data persists after losing power. Examples: HDD, SDD Stable Storage: A non-existent form of non-volatile storage that survives all possible failures scenarios. Use multiple storage devices to approximate. Faloutsos/Pavlo CMU SCS 15-415/615

Failure Classification Transaction Failures System Failures Storage Media Failures Faloutsos/Pavlo CMU SCS 15-415/615

Transaction Failures Logical Errors: Internal State Errors: Transaction cannot complete due to some internal error condition (e.g., integrity constraint violation). Internal State Errors: DBMS must terminate an active transaction due to an error condition (e.g., deadlock) Faloutsos/Pavlo CMU SCS 15-415/615

System Failures Software Failure: Hardware Failure: Problem with the DBMS implementation (e.g., uncaught divide-by-zero exception). Hardware Failure: The computer hosting the DBMS crashes (e.g., power plug gets pulled). Fail-stop Assumption: Non-volatile storage contents are assumed to not be corrupted by system crash. Faloutsos/Pavlo CMU SCS 15-415/615

Storage Media Failure Non-Repairable Hardware Failure: A head crash or similar disk failure destroys all or part of non-volatile storage. Destruction is assumed to be detectable (e.g., disk controller use checksums to detect failures). No DBMS can recover from this. Database must be restored from archived version. Faloutsos/Pavlo CMU SCS 15-415/615

Problem Definition Primary storage location of records is on non-volatile storage, but this is much slower than volatile storage. Use volatile memory for faster access: First copy target record into memory. Perform the writes in memory. Write dirty records back to disk. Faloutsos/Pavlo CMU SCS 15-415/615

Problem Definition Need to ensure: The changes for any txn are durable once the DBMS has told somebody that it committed. No changes are durable if the txn aborted. Faloutsos/Pavlo CMU SCS 15-415/615

Undo vs. Redo Undo: The process of removing the effects of an incomplete or aborted txn. Redo: The process of re-instating the effects of a committed txn for durability. How the DBMS supports this functionality depends on how it manages the buffer pool… Faloutsos/Pavlo CMU SCS 15-415/615

Buffer Pool Management Faloutsos/Pavlo CMU - 15-415/615 Buffer Pool Management Is T1 allowed to overwrite A even though it hasn’t committed? Schedule Do we force T2’s changes to be written to disk? BEGIN R(A) W(A) ⋮ ABORT T1 T2 R(B) W(B) COMMIT Buffer Pool A=3 A=1 B=99 C=7 B=88 A=3 A=1 B=99 C=7 B=88 Page Disk Memory What happens when we need to rollback T1? Faloutsos/Pavlo CMU SCS 15-415/615

Buffer Pool – Steal Policy Whether the DBMS allows an uncommitted txn to overwrite the most recent committed value of an object in non-volatile storage. STEAL: Is allowed. NO-STEAL: Is not allowed. Faloutsos/Pavlo CMU SCS 15-415/615

Buffer Pool – Force Policy Whether the DBMS ensures that all updates made by a txn are reflected on non-volatile storage before the txn is allowed to commit: FORCE: Is enforced. NO-FORCE: Is not enforced. Force writes makes it easier to recover but results in poor runtime performance. Faloutsos/Pavlo CMU SCS 15-415/615

Faloutsos/Pavlo CMU - 15-415/615 NO-STEAL + FORCE NO-STEAL means that T1 changes cannot be written to disk yet. Schedule BEGIN R(A) W(A) ⋮ ABORT T1 T2 R(B) W(B) COMMIT Buffer Pool A=3 A=1 B=99 C=7 B=88 A=1 B=99 C=7 B=88 Page Disk FORCE means that T2 changes must be written to disk at this point. Memory Now it’s trivial to rollback T1. Faloutsos/Pavlo CMU SCS 15-415/615

NO-STEAL + FORCE This approach is the easiest to implement: Never have to undo changes of an aborted txn because the changes were not written to disk. Never have to redo changes of a committed txn because all the changes are guaranteed to be written to disk at commit time. But this will be slow… What if txn modifies the entire database? Faloutsos/Pavlo CMU SCS 15-415/615

Today’s Class Overview Write-Ahead Log Checkpoints Logging Schemes Recovery Protocol Shadow Paging Faloutsos/Pavlo CMU SCS 15-415/615

Write-Ahead Log Record the changes made to the database in a log before the change is made. Assume that the log is on stable storage. Log contains sufficient information to perform the necessary undo and redo actions to restore the database after a crash. Buffer Pool: STEAL + NO-FORCE Faloutsos/Pavlo CMU SCS 15-415/615

Write-Ahead Log Protocol All log records pertaining to an updated page are written to non-volatile storage before the page itself is allowed to be over-written in non-volatile storage. A txn is not considered committed until all its log records have been written to stable storage. Faloutsos/Pavlo CMU SCS 15-415/615

Write-Ahead Log Protocol Write a <BEGIN> record to the log for each txn to mark its starting point. Log record format: <txnId, objectId, beforeValue, afterValue> When a txn finishes, the DBMS will: Write a <COMMIT> record on the log Make sure that all log records are flushed before it returns an acknowledgement to application. Faloutsos/Pavlo CMU SCS 15-415/615

Write-Ahead Log – Example ObjectId Before Value WAL TxnId After Value BEGIN W(A) W(B) ⋮ COMMIT T1 <T1 begin> <T1, A, 99, 88> <T1, B, 5, 10> <T1 commit> ⋮ CRASH! Buffer Pool A=99 B=5 We have to redo T1! The result is deemed safe to return to app. Non-Volatile Storage A=99 A=88 B=10 B=5 Volatile Storage Faloutsos/Pavlo

WAL – Implementation Details When should we write log entries to disk? When the transaction commits. Can use group commit to batch multiple log flushes together to amortize overhead. When should we write dirty records to disk? Every time the txn executes an update? Once when the txn commits? Faloutsos/Pavlo CMU SCS 15-415/615

WAL – Deferred Updates Observation: If we prevent the DBMS from writing dirty records to disk until the txn commits, then we don’t need to store their original values. WAL <T1 begin> <T1, A, 99, 88> <T1, B, 5, 10> <T1 commit> X X Faloutsos/Pavlo CMU SCS 15-415/615

WAL – Deferred Updates Observation: If we prevent the DBMS from writing dirty records to disk until the txn commits, then we don’t need to store their original values. Replay the log and redo each update. Simply ignore all of T1’s updates. WAL <T1 begin> <T1, A, 88> <T1, B, 10> <T1 commit> CRASH! WAL <T1 begin> <T1, A, 88> <T1, B, 10> CRASH! Faloutsos/Pavlo CMU SCS 15-415/615

WAL – Deferred Updates This won’t work if the change set of a txn is larger than the amount of memory available. The DBMS cannot undo changes for an aborted txn if it doesn’t have the original values in the log. We need to use the STEAL policy. Faloutsos/Pavlo CMU SCS 15-415/615

WAL – Buffer Pool Policies Undo + Redo NO-STEAL STEAL NO-FORCE – Fastest FORCE Slowest NO-STEAL STEAL NO-FORCE – Slowest FORCE Fastest Runtime Performance No Undo + No Redo Recovery Performance Almost every DBMS uses NO-FORCE + STEAL Faloutsos/Pavlo CMU SCS 15-415/615

Today’s Class Overview Write-Ahead Log Checkpoints Logging Schemes Recovery Protocol Shadow Paging Faloutsos/Pavlo CMU SCS 15-415/615

Checkpoints The WAL will grow forever. After a crash, the DBMS has to replay the entire log which will take a long time. The DBMS periodically takes a checkpoint where it flushes all buffers out to disk. Faloutsos/Pavlo CMU SCS 15-415/615

Checkpoints Output onto stable storage all log records currently residing in main memory. Output to the disk all modified blocks. Write a <CHECKPOINT> entry to the log and flush to stable storage. Faloutsos/Pavlo CMU SCS 15-415/615

Checkpoints WAL <T1 begin> <T1, A, 1, 2> <T1 commit> <T2 begin> <T2, A, 2, 3> <T3 begin> <CHECKPOINT> <T2 commit> <T3, A, 3, 4> ⋮ CRASH! Any txn that committed before the checkpoint is ignored (T1). T2 + T3 did not commit before the last checkpoint. Need to redo T2 because it committed after checkpoint. Need to undo T3 because it did not commit before the crash. Faloutsos/Pavlo CMU SCS 15-415/615

Checkpoints – Challenges We have to stall all txns when take a checkpoint to ensure a consistent snapshot. Scanning the log to find uncommitted txns can take a long time. Not obvious how often the DBMS should take a checkpoint… Faloutsos/Pavlo CMU SCS 15-415/615

Checkpoints – Frequency Checkpointing too often causes the runtime performance to degrade. System spends too much time flushing buffers. But waiting a long time is just as bad: The checkpoint will be large and slow. Makes recovery time much longer. Faloutsos/Pavlo CMU SCS 15-415/615

Today’s Class Overview Write-Ahead Log Checkpoints Logging Schemes Recovery Protocol Shadow Paging Faloutsos/Pavlo CMU SCS 15-415/615

Logging Schemes Physical Logging: Record the changes made to a specific location in the database. Example: Position of a record in a page. Logical Logging: Record the high-level operations executed by txns. Example: The UPDATE, DELETE, and INSERT queries invoked by a txn. Faloutsos/Pavlo CMU SCS 15-415/615

Physical vs. Logical Logging Logical logging requires less data written in each log record than physical logging. Difficult to implement recovery with logical logging if you have concurrent txns. Hard to determine which parts of the database may have been modified by a query before crash. Also takes longer to recover because you must re-execute every txn all over again. Faloutsos/Pavlo CMU SCS 15-415/615

Physiological Logging Hybrid approach where log records target a single page but do not specify data organization of the page. This is the most popular approach. Faloutsos/Pavlo CMU SCS 15-415/615

Logging Schemes Physical Logical Physiological INSERT INTO X VALUES(1,2,3); Physical <T1, Table=X, Page=99, Offset=4, Record=(1,2,3)> Index=X_PKEY, Page=45, Offset=9, Key=(1,Record1)> Logical <T1, “INSERT INTO X VALUES(1,2,3)”> Physiological <T1, Table=X, Page=99, Record=(1,2,3)> Index=X_PKEY, IndexPage=45, Key=(1,Record1)> Faloutsos/Pavlo CMU SCS 15-415/615

Today’s Class Overview Write-Ahead Log Checkpoints Logging Schemes Recovery Protocol Shadow Paging Faloutsos/Pavlo CMU SCS 15-415/615

Crash Recovery Recovery algorithms are techniques to ensure database consistency, transaction atomicity and durability despite failures. Recovery algorithms have two parts: Actions during normal txn processing to ensure that the DBMS can recover from a failure. Actions after a failure to recover the database to a state that ensures atomicity, consistency, and durability. Faloutsos/Pavlo CMU SCS 15-415/615

Today's Class – ARIES Algorithms for Recovery and Isolation Exploiting Semantics Write-ahead Logging Repeating History during Redo Logging Changes during Undo Faloutsos/Pavlo CMU SCS 15-415/615

ARIES Developed at IBM during the early 1990s. Considered the “gold standard” in database crash recovery. Implemented in DB2. Everybody else more or less implements a variant of it. C. Mohan IBM Fellow Faloutsos/Pavlo CMU SCS 15-415/615

ARIES – Main Ideas Write-Ahead Logging: Repeating History During Redo: Any change is recorded in log on stable storage before the database change is written to disk. Repeating History During Redo: On restart, retrace actions and restore database to exact state before crash. Logging Changes During Undo: Record undo actions to log to ensure action is not repeated in the event of repeated failures. Faloutsos/Pavlo CMU SCS 15-415/615

ARIES – Main Ideas Write Ahead Logging Fast (fuzzy) checkpoints Fast, during normal operation Least interference with OS (i.e., STEAL, NO FORCE) Fast (fuzzy) checkpoints On Recovery: Redo everything. Undo uncommitted txns. Faloutsos/Pavlo CMU SCS 15-415/615

ARIES – Recovery Phases Analysis: Read the WAL to identify dirty pages in the buffer pool and active txns at the time of the crash. Redo: Repeat all actions starting from an appropriate point in the log. Undo: Reverse the actions of txns that did not commit before the crash. Faloutsos/Pavlo CMU SCS 15-415/615

ARIES - Overview Start from last checkpoint found via Master Record. Three phases. Analysis - Figure out which txns committed or failed since checkpoint. Redo all actions (repeat history) Undo effects of failed txns. Oldest log record of txn active at crash Oldest log record in dirty page table after Analysis Last checkpoint CRASH! A R U

Additional Crash Issues What happens if system crashes during the Analysis Phase? During the Redo Phase? How do you limit the amount of work in the Redo Phase? Flush asynchronously in the background. How do you limit the amount of work in the Undo Phase? Avoid long-running txns. Faloutsos/Pavlo CMU SCS 15-415/615

Today’s Class Overview Write-Ahead Log Checkpoints Logging Schemes Recovery Protocol Shadow Paging Faloutsos/Pavlo CMU SCS 15-415/615

Shadow Paging Maintain two separate copies of the database (master, shadow) Updates are only made in the shadow copy. When a txn commits, atomically switch the shadow to become the new master. Buffer Pool: NO-STEAL + FORCE Faloutsos/Pavlo CMU SCS 15-415/615

Portions courtesy of the great Phil Bernstein Shadow Paging Database is a tree whose root is a single disk block. There are two copies of the tree, the master and shadow The root points to the master copy. Updates are applied to the shadow copy. Portions courtesy of the great Phil Bernstein Faloutsos/Pavlo CMU SCS 15-415/615

Shadow Paging – Example Memory Non-Volatile Storage Pages on Disk 1 2 3 4 Master Page Table DB Root Faloutsos/Pavlo CMU SCS 15-415/615

Portions courtesy of the great Phil Bernstein Shadow Paging To install the updates, overwrite the root so it points to the shadow, thereby swapping the master and shadow: Before overwriting the root, none of the transaction’s updates are part of the disk-resident database After overwriting the root, all of the transaction’s updates are part of the disk-resident database. Portions courtesy of the great Phil Bernstein Faloutsos/Pavlo CMU SCS 15-415/615

Shadow Paging – Example Read-only txns access the current master. Memory Non-Volatile Storage Pages on Disk 1 2 3 4 X X X Master Page Table DB Root X Shadow Page Table 1 2 3 4 ✔ Active modifying txn updates shadow pages. Faloutsos/Pavlo CMU SCS 15-415/615

Shadow Paging – Undo/Redo Supporting rollbacks and recovery is easy. Undo: Simply remove the shadow pages. Leave the master and the DB root pointer alone. Redo: Not needed at all. Faloutsos/Pavlo CMU SCS 15-415/615

Shadow Paging – Advantages No overhead of writing log records. Recovery is trivial. Faloutsos/Pavlo CMU SCS 15-415/615

Shadow Paging – Disadvantages Copying the entire page table is expensive: Use a page table structured like a B+tree No need to copy entire tree, only need to copy paths in the tree that lead to updated leaf nodes Commit overhead is high: Flush every updated page, page table, & root. Data gets fragmented. Need garbage collection. Faloutsos/Pavlo CMU SCS 15-415/615

Summary Write-Ahead Log to handle loss of volatile storage. Use incremental updates (i.e., STEAL, NO-FORCE) with checkpoints. On recovery: undo uncommitted txns + redo committed txns. Faloutsos/Pavlo CMU SCS 15-415/615

Conclusion Recovery is really hard. Be thankful that you don’t have to write it yourself. Faloutsos/Pavlo CMU SCS 15-415/615