Transactional Information Systems: Theory, Algorithms, and the Practice of Concurrency Control and Recovery


6/25/2015 Transactional Information Systems 16-1
Transactional Information Systems: Theory, Algorithms, and the Practice of Concurrency Control and Recovery
Gerhard Weikum and Gottfried Vossen
"Teamwork is essential. It allows you to blame someone else." (Anonymous)
© 2002 Morgan Kaufmann ISBN

16-2 Part III: Recovery
  11 Transaction Recovery
  12 Crash Recovery: Notion of Correctness
  13 Page-Model Crash Recovery Algorithms
  14 Object-Model Crash Recovery Algorithms
  15 Special Issues of Recovery
  16 Media Recovery
  17 Application Recovery

16-3 Chapter 16: Media Recovery
  16.2 Log-based Method
    Database Backup and Archive Logging
    Database Restore
    Analysis of MTTDL
  16.3 Storage Redundancy
    Techniques Based on Mirroring
    Techniques Based on Error-Correcting Codes
  16.4 Disaster Recovery
  16.5 Lessons Learned
"More than any time in history mankind faces a crossroads. One path leads to despair and utter hopelessness, the other to total extinction. Let us pray that we have the wisdom to choose correctly." (Woody Allen)

16-4 Failure Model and Assessment Criteria
Failures whose repair requires media recovery:
  - disk failures (damaged media)
  - corrupted pages on disk (single-block read error)
  - environmental failures: fire, water damage, disasters
  - serious bugs in operational server software
  - erroneous user input
Assessment criteria:
  - availability: MTTF / (MTTF + MTTR)
  - survivability level: number of simultaneous failures that can be repaired
  - mean time to data loss (MTTDL)
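The availability criterion can be made concrete with a small calculation. This is a minimal sketch; the function name and the sample figures are illustrative assumptions, not values from the slides.

```python
def availability(mttf: float, mttr: float) -> float:
    """Fraction of time the system is operational: MTTF / (MTTF + MTTR).

    Both arguments must use the same time unit (e.g. hours).
    """
    return mttf / (mttf + mttr)


# Illustrative numbers only: a failure every 999 hours on average,
# each taking 1 hour to repair.
print(f"availability = {availability(999.0, 1.0):.4f}")
```

Shortening the repair time raises availability even when the failure rate itself stays the same, which is the motivation for the fast-restore and redundancy techniques in the following slides.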

16-5 Log-based Media Recovery
2-step recovery:
  1. Replace the failed disk (or remap corrupted disk blocks) and reload the data from the backup copy.
  2. Redo history using the archive log, starting from the beginning of the last completed complete backup. Using compensation log entries (CLEs), this copes with rollbacks and crashes/restarts that occurred between the time of the backup and media recovery. Then undo losers (as in crash recovery).
The method also supports a limited, pragmatic form of environmental recovery by selectively skipping log entries.

16-6 Components for Log-based Media Recovery
[Figure: the stable database, the files for the stable log, the archive log, the database backups (June 20, June 27, July 4), and the shadow database. The log timeline shows begin-backup and end-backup entries, begin (t_i), begin (t_k), end (t_i), and write (x, ...) entries, the MediaRecoveryLSN, a disk failure and a soft crash, and the extent of the redo pass and the undo pass.]

16-7 Database Backup and Archive Logging
Backup:
  - complete or incremental (modified pages only)
  - online backup of selected tablespaces: creates a "fuzzy" copy on backup disk(s) or tape that contains updates of active transactions
  - scans the page-mapping table and resets modified flags
  - scan position is saved in checkpoint log entries
  - may copy (stale) pages directly from disk (bypassing the cache)
Archive log:
  - copies (replicates) all log entries from the stable log since the beginning of the last completed complete backup
  - can garbage-collect log entries older than
      MediaRecoveryLSN := min {begin-backup log entry of most recent completed backup, SystemRedoLSN as of begin-backup, current OldestUndoLSN}
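The truncation rule for the archive log can be sketched as follows. LSNs are modeled as plain integers, and the function and parameter names are hypothetical; only the min rule itself comes from the slide.

```python
def media_recovery_lsn(begin_backup_lsn: int,
                       system_redo_lsn_at_begin_backup: int,
                       oldest_undo_lsn: int) -> int:
    """MediaRecoveryLSN as the minimum of the three LSNs named on the slide."""
    return min(begin_backup_lsn,
               system_redo_lsn_at_begin_backup,
               oldest_undo_lsn)


def collectable(log_lsns: list[int], mr_lsn: int) -> list[int]:
    """Log entries strictly older than MediaRecoveryLSN may be discarded."""
    return [lsn for lsn in log_lsns if lsn < mr_lsn]


mr = media_recovery_lsn(begin_backup_lsn=500,
                        system_redo_lsn_at_begin_backup=470,
                        oldest_undo_lsn=485)
print(mr, collectable([100, 469, 470, 600], mr))
```

Taking the minimum keeps every entry that redo (from the backup onward) or undo (for the oldest active transaction) might still need.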

16-8 Database Restore
restore (pageset):
  for each page in pageset do
    identify the most recent (incremental or complete) backup that contains a copy of the page;
    copy the page onto the replaced disk;
  end /*for*/;
  perform redo pass on the archive log using the redo-history algorithm, starting from MediaRecoveryLSN and ignoring all log entries not referring to pageset;
  perform analysis pass on the log, starting from most recent checkpoint, to identify loser transactions;
  perform undo pass on the log for loser transactions;
Restore can be accelerated by
  - parallelizing redo,
  - merging multiple incremental backups offline into a complete backup, and/or
  - applying redo offline to a backup copy ("shadow database").
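The copy and redo steps of the restore procedure can be sketched in a few lines. This is a simplified illustration under stated assumptions: pages carry a page LSN, each log entry fully overwrites a page value, and the analysis and undo passes are omitted. The `Page` and `LogEntry` types and all names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Page:
    page_lsn: int  # LSN of the last update reflected in this copy
    value: str


@dataclass
class LogEntry:
    lsn: int
    page_id: str
    new_value: str


def restore(pageset, backups, archive_log, media_recovery_lsn):
    """backups: list of {page_id: Page} dicts, oldest first.
    archive_log: list of LogEntry in ascending LSN order."""
    disk = {}
    for pid in pageset:
        # Most recent backup (incremental or complete) containing the page.
        for backup in reversed(backups):
            if pid in backup:
                src = backup[pid]
                disk[pid] = Page(src.page_lsn, src.value)
                break
    # Redo pass: repeat history for the restored pages only.
    for entry in archive_log:
        if entry.lsn < media_recovery_lsn or entry.page_id not in disk:
            continue
        page = disk[entry.page_id]
        if entry.lsn > page.page_lsn:  # update not yet in the page copy
            page.value = entry.new_value
            page.page_lsn = entry.lsn
    return disk
```

The page-LSN test makes the redo pass idempotent: an update that is already contained in the fuzzy backup copy is skipped rather than applied twice.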

16-9 Correctness and Quality of Log-based Media Recovery
Theorem 16.1: The backup/log-based media recovery algorithm provides correct recovery after media failures by reconstructing the data such that it captures exactly all winner transactions in the original serialization order.

16-10 Analysis of MTTDL (1)
Markov chain model:
[Figure: four states (db ok, backup and log ok; db failed, backup and log ok; db ok, backup or log failed; db failed, backup or log failed) connected by transitions with rates 1/MTTF, 2/MTTF, 1/MTTR_backup, and 1/MTTR_recovery.]

16-11 Analysis of MTTDL (2)
r_{ij}: transition rate from state i to state j
E_{ij} = E[time from entering state i until entering state j]
H_i = E[time between entering and leaving state i] = 1 / \sum_j r_{ij}
p_{ik} = P[transition from i to k | state i is left] = r_{ik} / \sum_j r_{ij}
Solve for the given Markov chain:
  E_{12} = H_1 + p_{13} E_{32}
  E_{13} = H_1 + p_{12} E_{23}
  E_{14} = H_1 + p_{12} E_{24} + p_{13} E_{34}
  E_{21} = H_2
  E_{23} = H_2 + p_{21} E_{13}
  E_{24} = H_2 + p_{21} E_{14}
  E_{31} = H_3
  E_{32} = H_3 + p_{31} E_{12}
  E_{34} = H_3 + p_{31} E_{14}
yielding, by substituting E_{24} and E_{34} into E_{14},
  E_{14} = (H_1 + p_{12} H_2 + p_{13} H_3) / (1 - p_{12} p_{21} - p_{13} p_{31})
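A numeric sketch of the expected time to reach state 4, obtained by substituting the equations for E_24 and E_34 into the one for E_14. The assignment of rates to edges is an assumption made for illustration, because the transcript does not preserve which transition carries which rate: state 2 is taken to be "database failed", state 3 "backup or log failed", and state 4 "both failed". Function and parameter names are hypothetical.

```python
def mttdl(mttf: float, mttr_recovery: float, mttr_backup: float) -> float:
    """E_14 for the four-state chain, under the assumed rate assignment."""
    # Assumed transition rates r_ij (same time unit for all arguments).
    r12 = 1.0 / mttf            # database disk fails
    r13 = 2.0 / mttf            # backup disk or log disk fails
    r21 = 1.0 / mttr_recovery   # database restored from backup and log
    r24 = 2.0 / mttf            # backup or log fails during restore
    r31 = 1.0 / mttr_backup     # backup/log re-created
    r34 = 1.0 / mttf            # database fails while backup/log is missing

    # H_i = 1 / sum_j r_ij and p_ik = r_ik / sum_j r_ij
    h1 = 1.0 / (r12 + r13)
    h2 = 1.0 / (r21 + r24)
    h3 = 1.0 / (r31 + r34)
    p12, p13 = r12 * h1, r13 * h1
    p21 = r21 * h2
    p31 = r31 * h3

    # E_14 = H_1 + p_12 (H_2 + p_21 E_14) + p_13 (H_3 + p_31 E_14)
    return (h1 + p12 * h2 + p13 * h3) / (1.0 - p12 * p21 - p13 * p31)


# Illustrative inputs in hours; not figures from the slides.
print(f"{mttdl(100_000.0, 1.0, 1.0):.3e}")
```

Under these assumptions, cutting both repair times by a factor of ten raises the result by roughly the same factor, because data loss requires a second failure inside the repair window.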

16-13 Mirrored Disk Pairs
Storage redundancy techniques provide protection against disk failure with continuous availability; recovery rebuilds the contents of the failed disk on a hot spare.
Mirrored disk pair (disks 1 and 2):
  disk 1: block 1.1 = 2.1', block 1.2 = 2.2', block 1.3 = 2.3', block 1.4 = 2.4', ...
  disk 2: block 2.1 = 1.1', block 2.2 = 1.2', block 2.3 = 1.3', block 2.4 = 1.4', ...
Mirrored disk pair (disks 3 and 4):
  disk 3: block 3.1 = 4.1', block 3.2 = 4.2', block 3.3 = 4.3', block 3.4 = 4.4', ...
  disk 4: block 4.1 = 3.1', block 4.2 = 3.2', block 4.3 = 3.3', block 4.4 = 3.4', ...
Writes are routed to both disks of a pair; reads are optimized for seek time or load balance.

16-14 Declustered Mirroring
  disk 1: 1.1 = 2.m+1', 1.2 = 3.m+2', 1.3 = 4.m+3', 1.4 = 2.m+4', ...
          1.m+1 = 4.1', 1.m+2 = 3.2', 1.m+3 = 2.3', 1.m+4 = 4.4', ...
  disk 2: 2.1 = 3.m+1', 2.2 = 4.m+2', 2.3 = 1.m+3', 2.4 = 3.m+4', ...
          2.m+1 = 1.1', 2.m+2 = 4.2', 2.m+3 = 3.3', 2.m+4 = 1.4', ...
  disk 3: 3.1 = 4.m+1', 3.2 = 1.m+2', 3.3 = 2.m+3', 3.4 = 4.m+4', ...
          3.m+1 = 2.1', 3.m+2 = 1.2', 3.m+3 = 4.3', 3.m+4 = 2.4', ...
  disk 4: 4.1 = 1.m+1', 4.2 = 2.m+2', 4.3 = 3.m+3', 4.4 = 1.m+4', ...
          4.m+1 = 3.1', 4.m+2 = 2.2', 4.m+3 = 1.3', 4.m+4 = 3.4', ...
For group size G, replicas of the blocks on disk j are placed round-robin on disks j+1, ..., G, 1, ..., j-1
⇒ with disks and blocks numbered from 1, the copy of block j.k of disk j is on disk ((j + ((k-1) mod (G-1))) mod G) + 1
⇒ less performance degradation during rebuild, because the contents of the failed disk are read from G-1 disks
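The round-robin replica placement can be checked with a short function. This sketch assumes 1-based disk and block numbers and writes the mapping so that it reproduces the example layout for G = 4 (for instance, the copy of block 1.1 on disk 2 and the copy of block 2.3 on disk 1); the function name is hypothetical.

```python
from collections import Counter


def replica_disk(j: int, k: int, group_size: int) -> int:
    """Disk holding the copy of block k of disk j (both numbered from 1).

    Replicas of disk j cycle over disks j+1, ..., G, 1, ..., j-1.
    """
    offset = (k - 1) % (group_size - 1)   # 0 .. G-2
    return (j + offset) % group_size + 1


# Copies of the first six blocks of disk 1 in a group of four disks.
targets = [replica_disk(1, k, 4) for k in range(1, 7)]
print(targets, Counter(targets))
```

Because the copies of one disk are spread evenly over the other G-1 disks, a rebuild reads only a 1/(G-1) share from each survivor instead of loading a single mirror partner.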

16-15 RAID-4: Parity Groups
RAID (redundant arrays of independent disks): lower storage overhead than mirroring, but higher write cost.
  - For each block k of disks 1, ..., G, maintain a parity block on a dedicated parity disk G+1.
  - Upon a write to block k of disk j:
      new parity (1.k, ..., G.k) on parity disk G+1 := old parity (1.k, ..., G.k) ⊕ old contents (j.k) ⊕ new contents (j.k)
  - Upon failure of disk j, block j.k can be reconstructed from blocks 1.k, ..., (j-1).k, (j+1).k, ..., G.k and the parity block (G+1).k.
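The parity update and the reconstruction rule are both plain XOR over blocks. The following is a minimal sketch with blocks modeled as equal-length `bytes` objects; the helper names are assumptions for illustration.

```python
from functools import reduce


def xor_blocks(*blocks: bytes) -> bytes:
    """Bytewise XOR of equal-length blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def update_parity(old_parity: bytes, old_contents: bytes, new_contents: bytes) -> bytes:
    """new parity := old parity XOR old contents XOR new contents."""
    return xor_blocks(old_parity, old_contents, new_contents)


def reconstruct(surviving_blocks: list[bytes], parity: bytes) -> bytes:
    """Recover the block of the failed disk from the survivors and the parity block."""
    return xor_blocks(*surviving_blocks, parity)


data = [b"\x01\x02", b"\x10\x20", b"\xff\x00"]     # blocks 1.k, 2.k, 3.k
parity = xor_blocks(*data)
new_block = b"\xaa\x55"
parity = update_parity(parity, data[1], new_block)  # small write to disk 2
data[1] = new_block
print(reconstruct([data[0], data[2]], parity) == new_block)
```

The small write touches two disks (data and parity) and needs the old contents of both, which is the higher write cost the slide refers to.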

16-16 Illustration of RAID-4 (Parity Groups)
[Figure, during normal operation: data disks 1, 2, ..., N holding blocks j.1, j.2, j.3, j.4, ...; a parity disk holding ⊕(1.k, ..., N.k) for each block number k; an unused spare disk.]
[Figure, during repair: disk 2 has failed; its blocks 2.1, 2.2, 2.3, ... are rebuilt on the spare disk from the remaining data disks and the parity disk.]

16-17 RAID-5: Parity Striping
Eliminates the bottleneck of a single parity disk by placing the parity blocks of a group round-robin across the group's disks (striping): the parity block for the N blocks with number k resides on disk ((k+N-1) mod (N+1)) + 1.
[Figure: N+1 disks; the parity block for block number 1 is on disk N+1, for block number 2 on disk 1, for block number 3 on disk 2, for block number 4 on disk 3, ...]
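The placement formula can be evaluated directly to see the rotation. This is a small sketch with 1-based block and disk numbers as on the slide; the function name is an assumption.

```python
def parity_disk(k: int, n: int) -> int:
    """Disk (1 .. n+1) holding the parity block for block number k
    in a group with n data blocks per stripe."""
    return (k + n - 1) % (n + 1) + 1


# With n = 3 data blocks per stripe there are four disks.
for k in range(1, 6):
    print(f"block number {k}: parity on disk {parity_disk(k, 3)}")
```

Every n+1 consecutive block numbers place exactly one parity block on each disk, so parity writes no longer queue up on a single device.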

16-18 Extended RAID Systems
Reducing the small-write penalty:
  - parity logging (possibly in safe RAM) to defer and batch parity writes
  - floating parity blocks written to convenient tracks (with dynamically adjusted block-mapping table)
  - parity block declustering (clustered RAID): construct parity blocks for groups of G blocks and spread them uniformly across C > G+1 disks
    ⇒ shorter rebuild because of lower per-disk extra load in degraded mode
Coping with multiple disk failures:
  - use an appropriate error-correcting code (e.g., Reed-Solomon code) (RAID-6) to mask two disk failures within a disk group

16-19 Parity-Block Declustering (Clustered RAID)
[Figure: example with C=5 disks and G=3; five parity groups, each with three data blocks and one parity block, are spread across disks 1 to 5, e.g. parity 1 on disk 4 and parity 2 on disk 5.]
Requirements for the placement of n parity-block groups:
  - for each group of G+1 blocks, the blocks must be on different disks
  - each disk holds n/C parity blocks
  - for the m = n(G+1)/C groups represented by the blocks of a given disk, the mG blocks that belong to these groups are evenly distributed across all other C-1 disks
  ⇒ combinatorial block design
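The three placement requirements can be checked mechanically for a candidate layout. The checker below is a sketch; the cyclic layout used as input is a hypothetical example for C = 5 and G = 3, chosen so that parity 1 lands on disk 4 and parity 2 on disk 5, and is not claimed to be the exact layout of the figure.

```python
from collections import Counter
from itertools import combinations


def check_declustering(groups, num_disks: int) -> bool:
    """groups: list of (data_disks, parity_disk), disks numbered from 1."""
    parity_count = Counter()
    pair_count = Counter()
    for data_disks, parity_disk in groups:
        members = list(data_disks) + [parity_disk]
        if len(set(members)) != len(members):
            return False                       # two blocks of a group share a disk
        parity_count[parity_disk] += 1
        for a, b in combinations(sorted(members), 2):
            pair_count[(a, b)] += 1            # groups in which disks a and b co-occur
    disks = range(1, num_disks + 1)
    even_parity = len({parity_count[d] for d in disks}) == 1
    # Equal co-occurrence for every disk pair means that the groups of any
    # one disk are spread evenly over all other disks.
    even_spread = len({pair_count[p] for p in combinations(disks, 2)}) == 1
    return even_parity and even_spread


# Hypothetical cyclic layout: group g uses four consecutive disks (mod 5).
layout = [([(g + i) % 5 + 1 for i in range(3)], (g + 3) % 5 + 1) for g in range(5)]
print(layout[0], check_declustering(layout, 5))
```

In this layout every pair of disks shares three groups, so a failed disk draws the same reconstruction load from each of the four survivors.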

16-20 Rebuild Algorithms
Rebuild the failed disk online without interrupting accesses to the data that resided on the failed disk
⇒ reconstruct blocks of the failed disk on demand
Optimizations:
  - redirect disk-reads to the new disk for blocks that are already rebuilt
  - maintain parity as during normal operation for blocks that are already rebuilt
  - cache blocks that are reconstructed for regular accesses and write them to the new disk when convenient (piggyback rebuilding work on regular disk-reads, thus rebuilding popular blocks early)

16-21 Disk-Read Optimization in Degraded Mode
disk-read (block (N+1).k):
  if block (N+1).k has already been rebuilt then
    fetch (block (N+1).k);
  else
    fetch (block 1.k); ...; fetch (block N.k) using the algorithm as during normal operation;
    contents of block (N+1).k := 1.k XOR 2.k XOR ... XOR N.k;
    return the contents of block (N+1).k;
    flush (block (N+1).k) at the discretion of the disk scheduling for disk N+1;
    mark block (N+1).k as rebuilt;
  end /*if*/;
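The degraded-mode read can be sketched in memory as follows. Blocks are modeled as integers, the surviving disks (including the one that holds the parity block) as lists, and the spare disk as a list that is filled on demand; the deferred flush is simplified to an immediate write. All names are hypothetical.

```python
from functools import reduce


def degraded_read(k: int, survivors: list[list[int]],
                  spare: list, rebuilt: set[int]) -> int:
    """Read block k of the failed disk while the rebuild is in progress."""
    if k in rebuilt:
        return spare[k]                       # already on the new disk
    # XOR of block k on all surviving disks (data and parity).
    contents = reduce(lambda a, b: a ^ b, (disk[k] for disk in survivors))
    spare[k] = contents                       # piggyback the rebuild on this read
    rebuilt.add(k)
    return contents


disk1, disk2, lost = [5, 6], [9, 1], [3, 7]
parity = [a ^ b ^ c for a, b, c in zip(disk1, disk2, lost)]
spare, rebuilt = [None, None], set()
print(degraded_read(0, [disk1, disk2, parity], spare, rebuilt), rebuilt)
```

Blocks that are read often are therefore rebuilt early, without waiting for the background rebuild to reach them.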

16-22 Disk-Write Optimization in Degraded Mode
disk-write (block (N+1).k):
  if block (N+1).k has already been rebuilt then
    fetch (block (N+1).k) unless the block is still available in RAM;
    fetch (parity block j.k of the parity group to which (N+1).k belongs);
  else
    fetch (block 1.k); ...; fetch (block N.k);
    old contents of block (N+1).k := 1.k XOR 2.k XOR ... XOR N.k;
    let j.k be the parity block of this parity group;
  end /*if*/;
  compute new parity block j.k := old contents of block j.k XOR old contents of block (N+1).k XOR new contents of block (N+1).k;
  flush (block (N+1).k) using the block's new contents;
  flush (block j.k) using new parity as block contents;
  mark block (N+1).k as rebuilt;
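The degraded-mode write can be sketched in the same in-memory style. As a simplification, the parity blocks are kept in a separate list rather than at a rotating position, blocks are integers, and fetch and flush are plain list accesses; all names are assumptions for illustration.

```python
def degraded_write(k: int, new_contents: int,
                   data_survivors: list[list[int]], parity: list[int],
                   spare: list, rebuilt: set[int]) -> None:
    """Write block k of the failed disk and keep the parity consistent."""
    if k in rebuilt:
        old_contents = spare[k]               # block already on the new disk
    else:
        # Reconstruct the old contents from all surviving blocks of the group.
        old_contents = parity[k]
        for disk in data_survivors:
            old_contents ^= disk[k]
    # new parity := old parity XOR old contents XOR new contents
    parity[k] = parity[k] ^ old_contents ^ new_contents
    spare[k] = new_contents
    rebuilt.add(k)                            # the write also rebuilds the block


disk1, disk2, lost = [5, 6], [9, 1], [3, 7]
parity = [a ^ b ^ c for a, b, c in zip(disk1, disk2, lost)]
spare, rebuilt = [None, None], set()
degraded_write(0, 42, [disk1, disk2], parity, spare, rebuilt)
print(spare, parity, rebuilt)
```

After the call, the parity of block 0 again equals the XOR of all data blocks of the group, now including the new contents on the spare disk.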

16-23 Optimized Online Rebuild Algorithm
rebuild (disk N+1) on spare disk:
  for each block k of the failed disk N+1 do
    if the block has not yet been rebuilt then
      disk-write (block (N+1).k) using the algorithm for disk-writes in degraded mode, with low priority for the resulting fetch and flush I/O requests;
    end /*if*/;
  end /*for*/;

16-25 Specific Considerations for Disaster Recovery
  - Backup resides at a remote site
  - Maintain the archive log at the remote site by log shipping:
      - within distributed transactions (or even replicate the database remotely)
      - without transactional control, but preserving the serialization order of log entries (with the risk of losing the tail of the log)
  - Backup server could even be a "hot standby" (with failover similar to a data-sharing cluster architecture)

16-27 Lessons Learned
  - The redo-history recovery algorithm is also appropriate for media recovery, based on a backup database and an archive log: MediaRecoveryLSN marks the log-truncation and redo starting point
  - Log-based media recovery is the most versatile method; storage-redundancy techniques are attractive for continuous availability
  - Mirroring (with declustering) and RAID-5 are commodities; clustered RAID is the best technique in terms of MTTDL and MTTR, but complex to implement (needs block design)
  - Disaster recovery can adopt media recovery techniques with a remote backup/replication site