6/18/2015Transactional Information Systems15-1 Transactional Information Systems: Theory, Algorithms, and the Practice of Concurrency Control and Recovery.

Slides:



Advertisements
Similar presentations
Transaction Management: Crash Recovery, part 2 CS634 Class 21, Apr 23, 2014 Slides based on “Database Management Systems” 3 rd ed, Ramakrishnan and Gehrke.
Advertisements

CS 440 Database Management Systems Lecture 10: Transaction Management - Recovery 1.
Crash Recovery R&G - Chapter 18.
Crash Recovery, Part 1 If you are going to be in the logging business, one of the things that you have to do is to learn about heavy equipment. Robert.
1 Crash Recovery Chapter Review: The ACID properties  A  A tomicity: All actions of the Xact happen, or none happen.  C  C onsistency: If each.
Copyright © 2007 Ramez Elmasri and Shamkant B. Navathe Slide
1 CSIS 7102 Spring 2004 Lecture 8: Recovery (overview) Dr. King-Ip Lin.
Introduction to Database Systems1 Logging and Recovery CC Lecture 2.
Copyright © 2011 Pearson Education, Inc. Publishing as Pearson Addison-Wesley Chapter 23 Database Recovery Techniques.
1 Crash Recovery Chapter Review: The ACID properties  A  A tomicity: All actions in the Xact happen, or none happen.  C  C onsistency: If each.
COMP9315: Database System Implementation 1 Crash Recovery Chapter 18 (3 rd Edition)
Chapter 20: Recovery. 421B: Database Systems - Recovery 2 Failure Types q Transaction Failures: local recovery q System Failure: Global recovery I Main.
Chapter 19 Database Recovery Techniques
Chapter 19 Database Recovery Techniques Copyright © 2004 Pearson Education, Inc.
Crash Recovery.
Crash Recovery. Review: The ACID properties A A tomicity: All actions in the Xaction happen, or none happen. C C onsistency: If each Xaction is consistent,
Recovery 10/18/05. Implementing atomicity Note, when a transaction commits, the portion of the system implementing durability ensures the transaction’s.
ICS (072)Database Recovery1 Database Recovery Concepts and Techniques Dr. Muhammad Shafique.
Copyright © 2011 Pearson Education, Inc. Publishing as Pearson Addison-Wesley Chapter 23 Database Recovery Techniques.
Chapter 19 Database Recovery Techniques. Slide Chapter 19 Outline Databases Recovery 1. Purpose of Database Recovery 2. Types of Failure 3. Transaction.
6/25/2015Transactional Information Systems16-1 Transactional Information Systems: Theory, Algorithms, and the Practice of Concurrency Control and Recovery.
©Silberschatz, Korth and Sudarshan17.1Database System Concepts Chapter 17: Recovery System Failure Classification Storage Structure Recovery and Atomicity.
1 Implementing Atomicity and Durability Chapter 25.
©Silberschatz, Korth and Sudarshan17.1Database System Concepts 3 rd Edition Chapter 17: Recovery System Failure Classification Storage Structure Recovery.
1 CS 541 Database Systems Implementation of Undo- Redo.
Distributed DBMSPage © 1998 M. Tamer Özsu & Patrick Valduriez Outline Introduction Background Distributed DBMS Architecture Distributed Database.
DURABILITY OF TRANSACTIONS AND CRASH RECOVERY These are mostly the slides of your textbook !
Data Concurrency Control And Data Recovery
Chapter 15 Recovery. Topics in this Chapter Transactions Transaction Recovery System Recovery Media Recovery Two-Phase Commit SQL Facilities.
Lecture 12 Recoverability and failure. 2 Optimistic Techniques Based on assumption that conflict is rare and more efficient to let transactions proceed.
Recovery system By Kotoua Selira. Failure classification Transaction failure : Logical errors: transaction cannot complete due to some internal error.
Chapter 16 Recovery Yonsei University 1 st Semester, 2015 Sanghyun Park.
Chapter 15 Recovery. Copyright © 2004 Pearson Addison-Wesley. All rights reserved.15-2 Topics in this Chapter Transactions Transaction Recovery System.
Chapter 10 Recovery System. ACID Properties  Atomicity. Either all operations of the transaction are properly reflected in the database or none are.
Carnegie Mellon Carnegie Mellon Univ. Dept. of Computer Science Database Applications C. Faloutsos Recovery.
Chapter 17: Recovery System
1 Chapter 6 Database Recovery Techniques Adapted from the slides of “Fundamentals of Database Systems” (Elmasri et al., 2003)
Database System Concepts ©Silberschatz, Korth and Sudarshan See for conditions on re-usewww.db-book.com Chapter 17: Recovery System.
Recovery technique. Recovery concept Recovery from transactions failure mean data restored to the most recent consistent state just before the time of.
1 A Theory of Redo Recovery David Lomet Microsoft Research, Redmond Mark Tuttle HP Research, Cambridge.
Motivation for Recovery Atomicity: –Transactions may abort (“Rollback”). Durability: –What if DBMS stops running? (Causes?) crash! v Desired Behavior after.
1 Logging and Recovery. 2 Review: The ACID properties v A v A tomicity: All actions in the Xact happen, or none happen. v C v C onsistency: If each Xact.
Database Applications (15-415) DBMS Internals- Part XIV Lecture 25, April 17, 2016 Mohammad Hammoud.
Jun-Ki Min. Slide Purpose of Database Recovery ◦ To bring the database into the last consistent stat e, which existed prior to the failure. ◦

Database Recovery Techniques
Database Recovery Techniques
DURABILITY OF TRANSACTIONS AND CRASH RECOVERY
ARIES: Algorithm for Recovery and Isolation Exploiting Semantics
CS 440 Database Management Systems
Crash Recovery Chapter 18
Chapter 10 Recover System
Transactional Information Systems:
Chapter 16: Recovery System
Crash Recovery Chapter 18
CSIS 7102 Spring 2004 Lecture 10: ARIES
Chapter 13: Page-Model Crash Recovery Algorithms
Database Recovery Techniques
Transactional Information Systems:
Recovery II: Surviving Aborts and System Crashes
Crash Recovery Chapter 18
CS 632 Lecture 6 Recovery Principles of Transaction-Oriented Database Recovery Theo Haerder, Andreas Reuter, 1983 ARIES: A Transaction Recovery Method.
Transactional Information Systems:
Outline Introduction Background Distributed DBMS Architecture
Recovery System.
Chapter 19: Recovery System
Database Recovery 1 Purpose of Database Recovery
Transactional Information Systems:
Crash Recovery Chapter 18
Crash Recovery Chapter 18
Presentation transcript:

6/18/2015Transactional Information Systems15-1 Transactional Information Systems: Theory, Algorithms, and the Practice of Concurrency Control and Recovery Gerhard Weikum and Gottfried Vossen “Teamwork is essential. It allows you to blame someone else.”(Anonymous) © 2002 Morgan Kaufmann ISBN

6/18/2015Transactional Information Systems15-2 Part III: Recovery 11 Transaction Recovery 12 Crash Recovery: Notion of Correctness 13 Page-Model Crash Recovery Algorithms 14 Object-Model Crash Recovery Algorithms 15 Special Issues of Recovery 16 Media Recovery 17 Application Recovery

6/18/2015Transactional Information Systems15-3 Chapter 15: Special Issues of Recovery 15.2 Logical Logging for Indexes and Large Objects 15.3 Intra-transaction Savepoints 15.4 Exploiting Parallelism During Restart 15.5 Main-Memory Data Servers 15.6 Data-Sharing Clusters 15.7 Lessons Learned “Success is a lousy teacher. ” (Bill Gates)

6/18/2015Transactional Information Systems Level Logging for Index Operations log entries for insert ij on B-tree path along pages r, n, l, with split of l into l and m: write ij1 (l) write ij2 (m) write ij3 (n) insert -1 ij  writes the original contents of l twice on the log (undo/redo info for l and m)

6/18/2015Transactional Information Systems15-5 Logical Logging for Redo of Index Splits log only L 1 operation for transaction redo (to save log space) and rely on careful flush ordering for subtransaction atomicity possible cases after a crash (because of arbitrary page flushing): 1) l, m, and n are in old state (none were flushed) 2) l is new, m and n are old 3) m is new, l and n are old 4) n is new, l and m are old 5) l and m are new, n is old 6) l and n are new, m is old 7) m and n are new, l is old 8) l, m, and n are in new state (all were flushed) must avoid cases 2 and 6 (all other cases are recoverable) by enforcing flush order m  l  n in addition, posting (n) could be detached from half-split (l and m) by link technique, so that m  l is sufficient

6/18/2015Transactional Information Systems15-6 The Need for Redo and Flush-Order Dependencies time LSN 100 copy (a, b) readset: {a} writeset: {b} LSN 200 copy (c, a) readset: {c} writeset: {a} redo dependency page b written by: 100 page a written by: 200 flush-order dependency Problem: if a were flushed before b and the system crashed in between, the copy operation with LSN 100 could not be redone

6/18/2015Transactional Information Systems15-7 Redo and Flush-Order Dependencies Opportunity: operations on large objects (BLOBs, stored procedure execution state) can achieve significant savings on log space by logical logging Difficulty:redo of partially surviving multi-page operations Definition: There is a redo dependency from logged operation f(...) to logged operation g(...) if f precedes g on the log and there exists page x such that x  readset(f) and x  writeset(g) Definition: There is a flush order dependency from page y to page z (i.e., page y must be flushed before page z) if there are logged operations f and g with y  writeset(f) and z  writeset(g) and a redo dependency from f to g.

6/18/2015Transactional Information Systems15-8 Cyclic Flush-Order Dependencies time LSN 100 copy (a, b) readset: {a} writeset: {b} LSN 300 merge (b, c, a) readset: {b, c} writeset: {a} redo dependencies page b written by: 100, 400 page a written by: 200, 300 flush-order dependencies LSN 400 merge (a, c, b) readset: {a, c} writeset: {b} LSN 200 copy (c, a) readset: {c} writeset: {a} Need to flush all pages on the cycle atomically or force physical, full-write, log entries (i.e., after-images) atomically

6/18/2015Transactional Information Systems15-9 Intra-Operation Flush-Order Dependencies time LSN 500 swap (a, b) readset: {a, b} writeset: {a, b} redo dependencies (read-write dependencies) page a written by: 500 page b written by: 500 flush-order dependencies LSN 1000 half-split (l) readset: {l} writeset: {l, m} page m written by: 1000 page l written by: 1000

6/18/2015Transactional Information Systems15-10 Chapter 15: Special Issues of Recovery 15.2 Logical Logging for Indexes and Large Objects 15.3 Intra-transaction Savepoints 15.4 Exploiting Parallelism During Restart 15.5 Main-Memory Data Servers 15.6 Data-Sharing Clusters 15.7 Lessons Learned

6/18/2015Transactional Information Systems15-11 The Case for Partial Rollbacks Additional calls during normal operation (for partial rollbacks to resolve deadlocks or application-defined intra-transaction consistency points): save (trans)  s restore (trans, s) Approach: savepoints are recorded on the log, and restore creates CLEs Problem with nested rollbacks: l 1 (x) w 1 (x) l 1 (y) w 1 (y) w 1 -1 (y) u 1 (y) l 2 (y) w 2 (y) c 2 l 1 (y) (w 1 -1 (y) -1 w -1 (y) w -1 (x)  not prefix reducible Problem eliminated with NextUndoSeqNo backward chaining: l 1 (x) w 1 (x) l 1 (y) w 1 (y) w 1 -1 (y) u 1 (y) l 2 (y) w 2 (y) c 2 w -1 (x)  prefix reducible

6/18/2015Transactional Information Systems15-12 NextUndoSeqNo Backward Chain for Nested Rollbacks log during normal operation 10: write (t i,a) NextUndoSeqNo backward chain begin (t i ) 30: save- point (t i ) 20: write (t i,b) 45: save- point (t i ) 40: write (t i,c) 65: restore (t i, 45) 50: write (t i,d) 63: CLE (t i,e, 60) 70: write (t i,f) 73: CLE (t i,f, 70) 74: CLE (t i,c, 40) 75: restore (t i, 30) 83: CLE (t i,b, 20) crash first restore initiated second restore initiated abort initiated 60: write (t i,e) 64: CLE (t i,d, 50) log continued during restart 84: CLE (t i,a, 10) 85: roll- back (t i )...

6/18/2015Transactional Information Systems15-13 Savepoint Algorithm savepoint (transid): newlogentry.LogSeqNo := new sequence number; newlogentry.ActionType := savepoint; newlogentry.PreviousSeqNo := ActiveTrans[transid].LastSeqNo; newlogentry.NextUndoSeqNo := ActiveTrans[transid].LastSeqNo; ActiveTrans[transid].LastSeqNo := newlogentry.LogSeqNo; LogBuffer += newlogentry;

6/18/2015Transactional Information Systems15-14 Restore Algorithm restore (transid, s): logentry := ActiveTrans[transid].LastSeqNo; while logentry is not equal to s do if logentry.ActionType = write or full-write then newlogentry.LogSeqNo := new sequence number; newlogentry.ActionType := compensation; newlogentry.PreviousSeqNo:=ActiveTrans[transid].LastSeqNo; newlogentry.RedoInfo := inverse action of the action in logentry; newlogentry.NextUndoSeqNo := logentry.PreviousSeqNo; ActiveTrans[transid].LastSeqNo := newlogentry.LogSeqNo; LogBuffer += newlogentry; write (logentry.PageNo) according to logentry.UndoInfo; logentry := logentry.PreviousSeqNo; end /*if*/; if logentry.ActionType = restore then logentry := logentry.NextUndoSeqNo; end /*if*/ end /*while*/ newlogentry.LogSeqNo := new sequence number; newlogentry.ActionType := restore; newlogentry.TransId := transid; newlogentry.PreviousSeqNo := ActiveTrans[transid].LastSeqNo; newlogentry.NextUndoSeqNo := s.NextUndoSeqNo; LogBuffer += newlogentry;

6/18/2015Transactional Information Systems15-15 Savepoints in Nested Transactions t1t1 t 11 t 12 t 111 t 112 t 1121 t 1122 t 121 t 122 t 1221 t 1222 w(a) w(b) w(c) w(d) w(e)w(j) w(f) w(g) w(h) 0savepoints: beginnings of active subtransactions are feasible savepoints

6/18/2015Transactional Information Systems15-16 Chapter 15: Special Issues of Recovery 15.2 Logical Logging for Indexes and Large Objects 15.3 Intra-transaction Savepoints 15.4 Exploiting Parallelism During Restart 15.5 Main-Memory Data Servers 15.6 Data-Sharing Clusters 15.7 Lessons Learned

6/18/2015Transactional Information Systems15-17 Exploiting Parallelism During Restart Parallelize redo by spawning multiple threads for different page subsets (driven by DirtyPages list), assuming physical or physiological log entries Parallelize log scans by partitioning the stable log across multiple disks based on hash values of page numbers Parallelize undo by spawning multiple threads for different loser transactions Incremental restart with early admission of new transactions right after redo by re-acquiring locks of loser transactions (or coarser locks) during redo of history, or right after log analysis by allowing access, already during redo, to all non-dirty pages p with p.PageSeqNo < OldestUndoLSN (p)

6/18/2015Transactional Information Systems15-18 Chapter 15: Special Issues of Recovery 15.2 Logical Logging for Indexes and Large Objects 15.3 Intra-transaction Savepoints 15.4 Exploiting Parallelism During Restart 15.5 Main-Memory Data Servers 15.6 Data-Sharing Clusters 15.7 Lessons Learned

6/18/2015Transactional Information Systems15-19 Considerations for Main-Memory Data Servers Specific opportunities: crash recovery amounts to reloading the database  physical (after-image) logging attractive eager page flushing in the background amounts to “fuzzy checkpoint” in-memory versioning (with no-steal caching) can eliminate writing undo information to stable log log buffer forcing can be avoided by “safe RAM” incremental, page-wise, redo (and undo) on demand may deviate from chronological order Main-memory databases are particularly attractive for telecom or financial apps with < 50 GB of data, fairly uniform workload of short transactions, and very stringent response time requirements

6/18/2015Transactional Information Systems15-20 Chapter 15: Special Issues of Recovery 15.2 Logical Logging for Indexes and Large Objects 15.3 Intra-transaction Savepoints 15.4 Exploiting Parallelism During Restart 15.5 Main-Memory Data Servers 15.6 Data-Sharing Clusters 15.7 Lessons Learned

6/18/2015Transactional Information Systems15-21 Architecture of Data-Sharing Clusters Data-sharing cluster: multiple computers (as data servers) with local memory and shared disks via high-speed interconnect for load sharing, failure isolation, and very high availability During normal operation: transactions initiated and executed locally pages transferred to local caches on demand (data shipping) coherency control eliminates stale page copies: multiple caches can hold up-to-date copies read-only upon update in one cache, all other caches drop their copies can be combined with page-model or object-model CC logging to global log on shared disk or partitioned log with static assignment of server responsibilities or private logs for each server for perfect scalability Upon failure of a single server: failover to surviving servers

6/18/2015Transactional Information Systems15-22 Illustration of Data-Sharing Cluster Stable Database Stable Log page x page q page y page p 4011 Cache page x page q page p 4215 Server 1 Cache page q 3155 page p 4215 Server 2 Cache page q page y Server n... Stable Log Stable Log Interconnect write(p,...) write(x,...) write(y,...) write(x,...) write(x,...)

6/18/2015Transactional Information Systems15-23 Recovery with “Private” Logs needs page-wise globally monotonic sequence numbers, e.g., upon update to page p (without any extra messages): p.PageSeqNo := max{p.PageSeqNo, largest local seq no} + 1 surviving server performs crash recovery on behalf of the failed one, with analyis pass on private log of failed seerver to identify losers, scanning and “merging” all private logs for redo, possibly with DirtyPages info from the failed server, (merging can be avoided by flushing before each page transfer across servers), scanning private log of failed server for undo recovery from failure of entire cluster needs analysis passes, merged redo passes, and undo passes over all private logs

6/18/2015Transactional Information Systems15-24 Chapter 15: Special Issues of Recovery 15.2 Logical Logging for Indexes and Large Objects 15.3 Intra-transaction Savepoints 15.4 Exploiting Parallelism During Restart 15.5 Main-Memory Data Servers 15.6 Data-Sharing Clusters 15.7 Lessons Learned

6/18/2015Transactional Information Systems15-25 Lessons Learned The redo-history algorithms from Chapter 13 and 14 can be extended in a fairly localized and incremental manner. Practically important extensions are:  logical log entries for multi-page operations  as an additional option  intra-transaction savepoints and partial rollbacks  parallelized and incremental restart for higher availability  special architectures like - main-memory data servers - for sub-second responsiveness and - data-sharing clusters - for very high availability