CMU SCS Carnegie Mellon Univ. Dept. of Computer Science 15-415/615 - DB Applications C. Faloutsos – A. Pavlo Lecture#23: Crash Recovery – Part 1 (R&G ch.

Slides:



Advertisements
Similar presentations
Chapter 16: Recovery System
Advertisements

Crash Recovery John Ortiz. Lecture 22Crash Recovery2 Review: The ACID properties  Atomicity: All actions in the transaction happen, or none happens 
1 CSIS 7102 Spring 2004 Lecture 9: Recovery (approaches) Dr. King-Ip Lin.
CMU SCS Carnegie Mellon Univ. Dept. of Computer Science /615 - DB Applications C. Faloutsos – A. Pavlo Lecture#23: Crash Recovery – Part 1 (R&G ch.
CMU SCS Carnegie Mellon Univ. Dept. of Computer Science /615 - DB Applications C. Faloutsos – A. Pavlo Lecture#23: Crash Recovery – Part 2 (R&G ch.
Crash Recovery, Part 1 If you are going to be in the logging business, one of the things that you have to do is to learn about heavy equipment. Robert.
1 Supplemental Notes: Practical Aspects of Transactions THIS MATERIAL IS OPTIONAL.
1 Crash Recovery Chapter Review: The ACID properties  A  A tomicity: All actions of the Xact happen, or none happen.  C  C onsistency: If each.
© Dennis Shasha, Philippe Bonnet 2001 Log Tuning.
1 CSIS 7102 Spring 2004 Lecture 8: Recovery (overview) Dr. King-Ip Lin.
Chapter 20: Recovery. 421B: Database Systems - Recovery 2 Failure Types q Transaction Failures: local recovery q System Failure: Global recovery I Main.
Recovery CPSC 356 Database Ellen Walker Hiram College (Includes figures from Database Systems by Connolly & Begg, © Addison Wesley 2002)
CSCI 3140 Module 8 – Database Recovery Theodore Chiasson Dalhousie University.
Chapter 19 Database Recovery Techniques
Transaction Management: Crash Recovery CS634 Class 20, Apr 16, 2014 Slides based on “Database Management Systems” 3 rd ed, Ramakrishnan and Gehrke.
Transaction.
Transaction Management Overview R & G Chapter 16 There are three side effects of acid. Enhanced long term memory, decreased short term memory, and I forget.
Crash Recovery.
Crash Recovery. Review: The ACID properties A A tomicity: All actions in the Xaction happen, or none happen. C C onsistency: If each Xaction is consistent,
Recovery 10/18/05. Implementing atomicity Note, when a transaction commits, the portion of the system implementing durability ensures the transaction’s.
ICS (072)Database Recovery1 Database Recovery Concepts and Techniques Dr. Muhammad Shafique.
Quick Review of May 1 material Concurrent Execution and Serializability –inconsistent concurrent schedules –transaction conflicts serializable == conflict.
Chapter 19 Database Recovery Techniques. Slide Chapter 19 Outline Databases Recovery 1. Purpose of Database Recovery 2. Types of Failure 3. Transaction.
©Silberschatz, Korth and Sudarshan17.1Database System Concepts Chapter 17: Recovery System Failure Classification Storage Structure Recovery and Atomicity.
1 Implementing Atomicity and Durability Chapter 25.
©Silberschatz, Korth and Sudarshan17.1Database System Concepts 3 rd Edition Chapter 17: Recovery System Failure Classification Storage Structure Recovery.
Database System Concepts ©Silberschatz, Korth and Sudarshan See for conditions on re-usewww.db-book.com Remote Backup Systems.
Recovery Basics. Types of Recovery Catastrophic – disk crash –Backup from tape; redo from log Non-catastrophic: inconsistent state –Undo some operations.
TRANSACTIONS A sequence of SQL statements to be executed "together“ as a unit: A money transfer transaction: Reasons for Transactions : Concurrency control.
Academic Year 2014 Spring. MODULE CC3005NI: Advanced Database Systems “DATABASE RECOVERY” (PART – 1) Academic Year 2014 Spring.
Distributed DBMSPage © 1998 M. Tamer Özsu & Patrick Valduriez Outline Introduction Background Distributed DBMS Architecture Distributed Database.
Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke1 Transaction Management Overview Chapter 16.
Database System Concepts ©Silberschatz, Korth and Sudarshan See for conditions on re-usewww.db-book.com Chapter 17: Recovery System.
Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke1 Transaction Management Overview Chapter 16.
1 Transaction Management Overview Chapter Transactions  A transaction is the DBMS’s abstract view of a user program: a sequence of reads and writes.
Switch off your Mobiles Phones or Change Profile to Silent Mode.
Database Management Systems, 2 nd Edition. R. Ramakrishnan and J. Gehrke1 Transaction Management Overview Lecture 21 Ramakrishnan - Chapter 18.
Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke1 Transaction Management Overview Chapter 16.
C-Store: Concurrency Control and Recovery Jianlin Feng School of Software SUN YAT-SEN UNIVERSITY Jun. 5, 2009.
Lecture 12 Recoverability and failure. 2 Optimistic Techniques Based on assumption that conflict is rare and more efficient to let transactions proceed.
Recovery System By Dr.S.Sridhar, Ph.D.(JNUD), RACI(Paris, NICE), RMR(USA), RZFM(Germany) DIRECTOR ARUNAI ENGINEERING COLLEGE TIRUVANNAMALAI.
Database Systems/COMP4910/Spring05/Melikyan1 Transaction Management Overview Unit 2 Chapter 16.
1 Transaction Management Overview Chapter Transactions  Concurrent execution of user programs is essential for good DBMS performance.  Because.
Recovery system By Kotoua Selira. Failure classification Transaction failure : Logical errors: transaction cannot complete due to some internal error.
PMIT-6102 Advanced Database Systems By- Jesmin Akhter Assistant Professor, IIT, Jahangirnagar University.
Chapter 16 Recovery Yonsei University 1 st Semester, 2015 Sanghyun Park.
Chapter 10 Recovery System. ACID Properties  Atomicity. Either all operations of the transaction are properly reflected in the database or none are.
Carnegie Mellon Carnegie Mellon Univ. Dept. of Computer Science Database Applications C. Faloutsos Recovery.
CMU SCS Carnegie Mellon Univ. Dept. of Computer Science /615 - DB Applications C. Faloutsos – A. Pavlo Lecture#23: Crash Recovery (R&G ch. 18)
Chapter 17: Recovery System
Database System Concepts ©Silberschatz, Korth and Sudarshan See for conditions on re-usewww.db-book.com Chapter 17: Recovery System.
Database Recovery Zheng (Godric) Gu. Transaction Concept Storage Structure Failure Classification Log-Based Recovery Deferred Database Modification Immediate.
1 Database Systems ( 資料庫系統 ) January 3, 2005 Chapter 18 By Hao-hua Chu ( 朱浩華 )
Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke1 Transaction Management Overview Chapter 16.

Database Recovery Techniques
Remote Backup Systems.
Database Recovery Techniques
DURABILITY OF TRANSACTIONS AND CRASH RECOVERY
Database Recovery Techniques
File Processing : Recovery
CRASH RECOVERY (CHAPTERS 14, 16) (Joint Collaboration with Prof
CS 632 Lecture 6 Recovery Principles of Transaction-Oriented Database Recovery Theo Haerder, Andreas Reuter, 1983 ARIES: A Transaction Recovery Method.
Lecture#23: Crash Recovery
Outline Introduction Background Distributed DBMS Architecture
Module 17: Recovery System
Recovery System.
Backup and Recovery Techniques
Database Recovery 1 Purpose of Database Recovery
Remote Backup Systems.
Presentation transcript:

CMU SCS Carnegie Mellon Univ. Dept. of Computer Science /615 - DB Applications C. Faloutsos – A. Pavlo Lecture#23: Crash Recovery – Part 1 (R&G ch. 18)

CMU SCS Last Class Basic Timestamp Ordering Optimistic Concurrency Control Multi-Version Concurrency Control Multi-Version+2PL Partition-based T/O Faloutsos/PavloCMU SCS /6152

CMU SCS Today’s Class Overview Write-Ahead Log Checkpoints Logging Schemes Shadow Paging Examples Faloutsos/PavloCMU SCS /6153

CMU SCS Motivation Faloutsos/PavloCMU SCS /6154 BEGIN R(A) W(A) ⋮ COMMIT T1 Buffer Pool Disk A=1 Page A=1 Memory A=2

CMU SCS Crash Recovery Recovery algorithms are techniques to ensure database consistency, transaction atomicity and durability despite failures. Recovery algorithms have two parts: –Actions during normal txn processing to ensure that the DBMS can recover from a failure. –Actions after a failure to recover the database to a state that ensures atomicity, consistency, and durability. Faloutsos/PavloCMU SCS /6155

CMU SCS Crash Recovery DBMS is divided into different components based on the underlying storage device. Need to also classify the different types of failures that the DBMS needs to handle. Faloutsos/PavloCMU SCS /6156

CMU SCS Storage Types Volatile Storage: –Data does not persist after power is cut. –Examples: DRAM, SRAM Non-volatile Storage: –Data persists after losing power. –Examples: HDD, SDD Stable Storage: –A non-existent form of non-volatile storage that survives all possible failures scenarios. Faloutsos/PavloCMU SCS /6157 Use multiple storage devices to approximate.

CMU SCS Failure Classification Transaction Failures System Failures Storage Media Failures Faloutsos/PavloCMU SCS /6158

CMU SCS Transaction Failures Logical Errors: –Transaction cannot complete due to some internal error condition (e.g., integrity constraint violation). Internal State Errors: –DBMS must terminate an active transaction due to an error condition (e.g., deadlock) Faloutsos/PavloCMU SCS /6159

CMU SCS System Failures Software Failure: –Problem with the DBMS implementation (e.g., uncaught divide-by-zero exception). Hardware Failure: –The computer hosting the DBMS crashes (e.g., power plug gets pulled). –Fail-stop Assumption: Non-volatile storage contents are assumed to not be corrupted by system crash. Faloutsos/PavloCMU SCS /61510

CMU SCS Storage Media Failure Non-Repairable Hardware Failure: –A head crash or similar disk failure destroys all or part of non-volatile storage. –Destruction is assumed to be detectable (e.g., disk controller use checksums to detect failures). No DBMS can recover from this. Database must be restored from archived version. Faloutsos/PavloCMU SCS /61511

CMU SCS Problem Definition Primary storage location of records is on non-volatile storage, but this is much slower than volatile storage. Use volatile memory for faster access: –First copy target record into memory. –Perform the writes in memory. –Write dirty records back to disk. Faloutsos/PavloCMU SCS /61512

CMU SCS Problem Definition Need to ensure: –The changes for any txn are durable once the DBMS has told somebody that it committed. –No changes are durable if the txn aborted. Faloutsos/PavloCMU SCS /61513

CMU SCS Undo vs. Redo Undo: The process of removing the effects of an incomplete or aborted txn. Redo: The process of re-instating the effects of a committed txn for durability. How the DBMS supports this functionality depends on how it manages the buffer pool… Faloutsos/PavloCMU SCS /61514

CMU SCS Buffer Pool Management Faloutsos/PavloCMU SCS /61515 Buffer Pool Disk A=1B=99C=7 Page A=1B=99C=7 Memory B=88 BEGIN R(A) W(A) ⋮ ABORT T1 T2 BEGIN R(B) W(B) COMMIT Schedule A=3 Do we force T2’s changes to be written to disk? Is T1 allowed to overwrite A even though it hasn’t committed? What happens when we need to rollback T1? B=88A=3

CMU SCS Buffer Pool – Steal Policy Whether the DBMS allows an uncommitted txn to overwrite the most recent committed value of an object in non-volatile storage. –STEAL: Is allowed. –NO-STEAL: Is not allowed. Faloutsos/PavloCMU SCS /61516

CMU SCS Buffer Pool – Force Policy Whether the DBMS ensures that all updates made by a txn are reflected on non-volatile storage before the txn is allowed to commit: –FORCE: Is enforced. –NO-FORCE: Is not enforced. Force writes makes it easier to recover but results in poor runtime performance. Faloutsos/PavloCMU SCS /61517

CMU SCS NO-STEAL + FORCE Faloutsos/PavloCMU SCS /61518 Buffer Pool Disk A=1B=99C=7 Page A=1B=99C=7 Memory B=88 BEGIN R(A) W(A) ⋮ ABORT T1 T2 BEGIN R(B) W(B) COMMIT Schedule A=3 B=88 FORCE means that T2 changes must be written to disk at this point. NO-STEAL means that T1 changes cannot be written to disk yet. Now it’s trivial to rollback T1.

CMU SCS NO-STEAL + FORCE This approach is the easiest to implement: –Never have to undo changes of an aborted txn because the changes were not written to disk. –Never have to redo changes of a committed txn because all the changes are guaranteed to be written to disk at commit time. But this will be slow… Faloutsos/PavloCMU SCS /61519

CMU SCS Today’s Class Overview Write-Ahead Log Checkpoints Logging Schemes Shadow Paging Examples Faloutsos/PavloCMU SCS /61520

CMU SCS Write-Ahead Log Record the changes made to the database in a log before the change is made. –Assume that the log is on stable storage. –Log contains sufficient information to perform the necessary undo and redo actions to restore the database after a crash. Buffer Pool: STEAL + NO-FORCE Faloutsos/PavloCMU SCS /61521

CMU SCS Write-Ahead Log Protocol All log records pertaining to an updated page are written to non-volatile storage before the page itself is allowed to be over- written in non-volatile storage. A txn is not considered committed until all its log records have been written to stable storage. Faloutsos/PavloCMU SCS /61522

CMU SCS Write-Ahead Log Protocol Log record format: – –Each transaction writes a log record first, before doing the change. –Write a record to mark txn starting point. When a txn finishes, the DBMS will: –Write a record on the log –Make sure that all log records are flushed before it returns an acknowledgement to application. Faloutsos/PavloCMU SCS /61523

CMU SCS Non-Volatile Storage WAL BEGIN W(A) W(B) ⋮ COMMIT T1 Write-Ahead Log – Example Faloutsos/Pavlo24 ⋮ CRASH! ObjectId TxnId The result is deemed safe to return to app. Buffer Pool A=99B=5 A=99B=5 A=88B=10 Before Value After Value Volatile Storage

CMU SCS WAL – Implementation Details When should we write log entries to disk? –When the transaction commits. –Can use group commit to batch multiple log flushes together to amortize overhead. When should we write dirty records to disk? –Every time the txn executes an update? –Once when the txn commits? Faloutsos/PavloCMU SCS /61525

CMU SCS WAL – Deferred Updates Observation: If we prevent the DBMS from writing dirty records to disk until the txn commits, then we don’t need to store their original values. Faloutsos/PavloCMU SCS /61526 WAL X X

CMU SCS WAL – Deferred Updates Observation: If we prevent the DBMS from writing dirty records to disk until the txn commits, then we don’t need to store their original values. Faloutsos/PavloCMU SCS /61527 WAL CRASH! WAL CRASH! Replay the log and redo each update. Simply ignore all of T1’s updates.

CMU SCS WAL – Deferred Updates This won’t work if the change set of a txn is larger than the amount of memory available. –Example: Update all salaries by 5% The DBMS cannot undo changes for an aborted txn if it doesn’t have the original values in the log. We need to use the STEAL policy. Faloutsos/PavloCMU SCS /61528

CMU SCS WAL – Buffer Pool Policies Faloutsos/PavloCMU SCS /61529 NO-STEALSTEAL NO-FORCE – Fastest FORCE Slowest – Runtime Performance NO-STEALSTEAL NO-FORCE – Slowest FORCE Fastest – Recovery Performance Undo + Redo No Undo + No Redo Almost every DBMS uses NO-FORCE + STEAL

CMU SCS Today’s Class Overview Write-Ahead Log Checkpoints Logging Schemes Shadow Paging Examples Faloutsos/PavloCMU SCS /61530

CMU SCS Checkpoints The WAL will grow forever. After a crash, the DBMS has to replay the entire log which will take a long time. The DBMS periodically takes a checkpoint where it flushes all buffers out to disk. Faloutsos/PavloCMU SCS /61531

CMU SCS Checkpoints Output onto stable storage all log records currently residing in main memory. Output to the disk all modified blocks. Write a entry to the log and flush to stable storage. Faloutsos/PavloCMU SCS /61532

CMU SCS Checkpoints Any txn that committed before the checkpoint is ignored (T1). T2 + T3 did not commit before the last checkpoint. –Need to redo T2 because it committed after checkpoint. –Need to undo T3 because it did not commit before the crash. Faloutsos/PavloCMU SCS /61533 WAL ⋮ CRASH!

CMU SCS Checkpoints – Challenges We have to stall all txns when take a checkpoint to ensure a consistent snapshot. Scanning the log to find uncommitted can take a long time. Not obvious how often the DBMS should take a checkpoint. Faloutsos/PavloCMU SCS /61534

CMU SCS Checkpoints – Frequency Checkpointing too often causes the runtime performance to degrade. –System spends too much time flushing buffers. But waiting a long time is just as bad: –The checkpoint will be large and slow. –Makes recovery time much longer. Faloutsos/PavloCMU SCS /61535

CMU SCS Today’s Class Overview Write-Ahead Log Checkpoints Logging Schemes Shadow Paging Examples Faloutsos/PavloCMU SCS /61536

CMU SCS Logging Schemes Physical Logging: Record the changes made to a specific location in the database. –Example: Position of a record in a page. Logical Logging: Record the high-level operations executed by txns. –Example: The UPDATE, DELETE, and INSERT queries invoked by a txn. Faloutsos/PavloCMU SCS /61537

CMU SCS Physical vs. Logical Logging Logical logging requires less data written in each log record than physical logging. Difficult to implement recovery with logical logging if you have concurrent txns. –Hard to determine which parts of the database may have been modified by a query before crash. –Also takes longer to recover because you must re-execute every txn all over again. Faloutsos/PavloCMU SCS /61538

CMU SCS Physiological Logging Hybrid approach where log records target a single page but do not specify data organization of the page. This is the most popular approach. Faloutsos/PavloCMU SCS /61539

CMU SCS Logging Schemes Faloutsos/PavloCMU SCS /61540 PhysicalLogicalPhysiological INSERT INTO X VALUES(1,2,3); <T1, Table=X, Page=99, Offset=4, Record=(1,2,3)> <T1, Index=X_PKEY, Page=45, Offset=9, Key=(1,Record1)> <T1, “INSERT INTO X VALUES(1,2,3)”> <T1, Table=X, Page=99, Record=(1,2,3)> <T1, Index=X_PKEY, IndexPage=45, Key=(1,Record1)>

CMU SCS Today’s Class Overview Write-Ahead Log Checkpoints Logging Schemes Shadow Paging Examples Faloutsos/PavloCMU SCS /61541

CMU SCS Shadow Paging Maintain two separate copies of the database (master, shadow) Updates are only made in the shadow copy. When a txn commits, atomically switch the shadow to become the new master. Buffer Pool: NO-STEAL + FORCE Faloutsos/PavloCMU SCS /61542

CMU SCS Shadow Paging Database is a tree whose root is a single disk block. There are two copies of the tree, the master and shadow –The root points to the master copy. –Updates are applied to the shadow copy. Faloutsos/PavloCMU SCS /61543 Portions courtesy of the great Phil Bernstein

CMU SCS Non-Volatile Storage Pages on Disk Memory Shadow Paging – Example Faloutsos/PavloCMU SCS /61544 Master Page Table DB Root

CMU SCS Shadow Paging To install the updates, overwrite the root so it points to the shadow, thereby swapping the master and shadow: –Before overwriting the root, none of the transaction’s updates are part of the disk- resident database –After overwriting the root, all of the transaction’s updates are part of the disk- resident database. Faloutsos/PavloCMU SCS /61545 Portions courtesy of the great Phil Bernstein

CMU SCS Non-Volatile Storage Pages on Disk Memory Shadow Paging – Example Faloutsos/PavloCMU SCS /61546 Shadow Page Table Master Page Table DB Root Read-only txns access the current master. Active modifying txn updates shadow pages. X X X X ✔

CMU SCS Shadow Paging – Undo/Redo Supporting rollbacks and recovery is easy. Undo: –Simply remove the shadow pages. Leave the master and the DB root pointer alone. Redo: –Not needed at all. Faloutsos/PavloCMU SCS /61547

CMU SCS Shadow Paging – Advantages No overhead of writing log records. Recovery is trivial. Faloutsos/PavloCMU SCS /61548

CMU SCS Shadow Paging – Disadvantages Copying the entire page table is expensive: –Use a page table structured like a B+tree –No need to copy entire tree, only need to copy paths in the tree that lead to updated leaf nodes Commit overhead is high: –Flush every updated page, page table, & root. –Data gets fragmented. –Need garbage collection. Faloutsos/PavloCMU SCS /61549

CMU SCS Today’s Class Overview Write-Ahead Log Checkpoints Logging Schemes Shadow Paging Examples Faloutsos/PavloCMU SCS /61550

CMU SCS Observation #1 You can only safely write a single page to non-volatile storage at a time. –Linux Default: 4KB How does a DBMS make sure that large updates are safely written? Faloutsos/PavloCMU SCS /61551

CMU SCS MySQL – Doublewrite Buffer When MySQL flushes dirty records from its buffer, it first writes them out sequentially to a doublewrite buffer and then fsyncs. If this is successful, then it can safely write records at their real location. On recovery, check whether the doublewrite buffer matches the record’s real location. –If not, then restore from doublewrite buffer. Faloutsos/PavloCMU SCS /61552

CMU SCS Observation #2 With a WAL, the DBMS has to write each update to stable storage at least twice: –Once in the log. –And again in the primary storage. The total amount of data per update depends on implementation (e.g., physical vs. logical) Faloutsos/PavloCMU SCS /61553

CMU SCS Storing BLOBs in the Database Every time you change a BLOB field you have to store the before/after image in WAL. Don’t store large files in your database! Put the file on the filesystem and store a URI in the database. More information: –To BLOB or Not To BLOB: Large Object Storage in a Database or a Filesystem?To BLOB or Not To BLOB: Large Object Storage in a Database or a Filesystem? Faloutsos/PavloCMU SCS /61554

CMU SCS Log-Structured Merge Trees No primary storage. The log is the database. –All writes create just one log entry (fast!). –All reads must search the log backwards to find the last value written for the target key. DBMS still must periodically take a checkpoint: –Log compaction instead of flushing buffers. Faloutsos/PavloCMU SCS /61555

CMU SCS Google’s fast storage library that provides an ordered mapping of key/value pairs. –MemTable: In-memory index of log entries –SSTable: Immutable MemTables on disk. LevelDB – LSM Faloutsos/PavloCMU SCS /61556 MemTable ⋮ SSTable KeyOffset …… KeyOffset …… SSTable Index

CMU SCS Observation #3 All of this has assumed that the database is stored on slow disks (HDD, SDD). What kind of logging should we use if the database is stored entirely in main memory? Faloutsos/PavloCMU SCS /61557

CMU SCS VoltDB – Command Logging Even more lightweight version of logical logging based on stored procedures. – Faloutsos/PavloCMU SCS /61558 Command <T1, Proc=UpdateAcct, Params=(123)>

CMU SCS Command vs. Physiological Faloutsos/PavloCMU SCS /61559 Runtime Performance (Higher is Better) Recovery Time (Lower is Better) Storage Speed (Relative to DRAM)

CMU SCS Summary Write-Ahead Log to handle loss of volatile storage. Use incremental updates (i.e., STEAL, NO- FORCE) with checkpoints. On recovery: undo uncommitted txns + redo committed txns. Faloutsos/PavloCMU SCS /61560

CMU SCS Next Class – ARIES Algorithms for Recovery and Isolation Exploiting Semantics –Write-ahead Logging –Repeating History during Redo –Logging Changes during Undo Faloutsos/PavloCMU SCS /61561