DB-15: Inside The Recovery Subsystem Plan to commit; Be prepared to rollback. Richard Banville Fellow, Technology and Product Architecture Progress OpenEdge.


1 DB-15: Inside The Recovery Subsystem Plan to commit; Be prepared to rollback. Richard Banville Fellow, Technology and Product Architecture Progress OpenEdge

2 © 2007 Progress Software Corporation DB-15: Inside the Recovery Subsystem
Recovery Types
• Transaction recovery*
  - Before-image rollback/undo and crash recovery
• Hard failure recovery
  - Roll forward of after images
  - Point in time, transaction, retry
• Coordinated distributed transaction consistency
  - OpenEdge® 2PC: prepare phase, commit phase
• Heterogeneous distributed transaction consistency (JTA)
  - External distributed transaction coordinator
  - Requires application changes
  - Available for OpenEdge SQL only
* Before imaging is the focus of this presentation.

3 Agenda
• The BI Units of Measure
• Some Simple Rules
• General Processing (the fun stuff)
• Reliability Switches
• Summary

4 BI Layout: Notes and Blocks
• Notes are the basis for recording change in the database
• The BI is made up of many notes
• Notes are variable sized
• Notes are organized in order of operation
• Notes are stored in BI blocks
• BI block size can be customized (1-16 KB)
• I/O is performed in units of the BI block size

5 BI Layout: Clusters
• Notes are stored in BI blocks
• BI block size can be customized (1-16 KB)
• I/O is performed in units of the BI block size
• Blocks are grouped to form a cluster
• BI cluster size can be customized (16 KB - 256 MB)
• Cluster size affects checkpoint frequency (among other things)

6 BI Layout: Clusters
• Clusters are allocated as needed
• Clusters are logically joined and ordered into a ring
• Only one cluster accepts BI writes at any time

7 BI Layout: Storage
• The BI file is the Primary Recovery Area: BI data is stored in the extents of area #2 of the database
• It grows as needed
• Space is reused when possible

8 What's in a note?
Example note:
  Trid: 81180 code = RL_RMCR version = 2
  Trid: 81180 area = 8 dbkey = 14528 update counter = 4770
• Header
  - Length and note version
  - Note code/identifier (associates action)
  - Note type
  - Transaction ID
• Note-specific info
  - Block pointer and area
  - Block update counter
  - Record #
  - Table number
  - Size of record
  - Split information
• Data portion (if needed)
  - Block change data, i.e., the record data itself
  - Only if needed

9 AI / BI Relationship
• File I/O
  - BI is written first
  - AI and BI note headers are the same (OE 10.0A)
  - Slightly less data is written to AI
• Roll forward
  - Reads BI for rollback
  - Does NOT record AI data
  - DOES record "some" BI data (uses -i)
  - Why is -i OK?

10 Agenda: The BI Units of Measure • Some Simple Rules • General Processing (the fun stuff) • Reliability Switches • Summary

11 Rules to live by
• #1 - Write-ahead logging (WAL)
  - Recovery log notes are written BEFORE data
    - Assures atomic and durable transactions
    - BI, AI: reliable write I/O
    - Can relax data write I/O
      - Write prior to BI reuse
      - Cluster close
      - Missing data applied by redo
      - Deferring writes allows multiple updates to occur with a single I/O
• #2 - Write ordering rule (file system and hardware)
  - AI and BI writes get to disk in the order requested
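Rule #1 can be illustrated with a toy model. The Python sketch below is not OpenEdge code: the class `WalStore`, its fields and the list-based "disks" are hypothetical names chosen for illustration. It shows only the invariant stated on the slide: a data block is written after the notes describing its changes have reached the log, and deferring the data write lets two updates share a single data I/O.

```python
class WalStore:
    """Toy write-ahead log: hypothetical model, not the OpenEdge implementation."""

    def __init__(self):
        self.log_buffer = []   # notes recorded but not yet on disk
        self.log_disk = []     # notes that have reached the log on disk
        self.cache = {}        # block id -> (value, number of the last note for it)
        self.data_disk = {}    # block id -> value written to the data file
        self.data_writes = 0   # count of data-file I/Os

    def update(self, block, value):
        # Record the note first, then change the in-memory block.
        self.log_buffer.append((block, value))
        note_no = len(self.log_disk) + len(self.log_buffer)
        self.cache[block] = (value, note_no)

    def flush_log(self, upto):
        # Write buffered notes, in order, until note number `upto` is on disk.
        while len(self.log_disk) < upto and self.log_buffer:
            self.log_disk.append(self.log_buffer.pop(0))

    def write_block(self, block):
        value, note_no = self.cache[block]
        self.flush_log(note_no)          # WAL: the log goes to disk before the data
        self.data_disk[block] = value
        self.data_writes += 1


store = WalStore()
store.update("blk-1", "v1")
store.update("blk-1", "v2")   # second change to the same block, still in memory
store.update("blk-2", "x1")
store.write_block("blk-1")    # forces notes 1 and 2 to the log, then one data write
```

After the last call, the two notes for `blk-1` are in the log, the note for `blk-2` is still buffered, and both updates to `blk-1` cost one data write.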

12 Rules to follow
• #3 - BI space reuse
  - Only when the cluster is closed
  - A cluster closes when its last transaction ends
    - Checkpoint DOES NOT close a cluster
    - Checkpoint occurs when the cluster fills up
• #4 - Exclusive block access
  - When changing data in the database
• #5 - Atomic physical changes
  - Such as block chain manipulations
  - Enforced by the internal TXE mechanism
  - "SYSTEM ERROR: User 5 died during micro txn."

13 Rule
• #6 - Without exception: all DB changes are recorded in the recovery log.

14 Rules were meant to be broken
• #6 - Without exception: all DB changes are recorded in the recovery log.
• Exception: Control Area (area #1) changes are not logged.
  - Why should I care?
  - Allows structural changes without affecting recovery
    - Such as adding space while in roll forward
  - Recovery mechanism: builddb

15 Agenda: The BI Units of Measure • Some Simple Rules • General Processing (the fun stuff) • Reliability Switches • Summary

16 Forward Processing
So you want to perform a database action:
• Locate/lock the data block to change
  - Not all notes require a block
    - Transaction begin, end
  - Not all DB changes require a block!
    - Acquiring additional space
    - Certain index sub-operations
• Ensure the begin transaction is recorded
• Record the change in the BI log (via the BI buffer pool)

17 BI Buffer Pool - Recording a change
[Diagram: forward processing adds new notes (actions) to the BI current output buffer in a pool of -bibufs 10 buffers. Also shown: modified queue, free list, current input buffer, two backout buffers and the BI file.]

18 BI Buffer Pool - Recording a change
[Diagram: forward processing adds new notes (actions) to the BI current output buffer in a pool of -bibufs 10 buffers. Also shown: modified queue, free list and the BI file.]
PROMON counters: Total BI Writes, Records (notes) written, Busy buffer waits, Empty buffer waits, Partial Writes
• Is it OK to buffer dirty BI blocks? YES
• Is it OK to buffer committed BI data? Delayed commit is up to you!

19 Forward Processing (continued)
The BI note has been written...
• Finally, perform the DB action (make the change)
  - Logical, physical or a mix
• The data block's update counter is incremented
  - Identifies whether a noted change has made it to disk yet
  - Ensures changes are re-applied in order
• Dependency counter maintained in ctlr struct
  - Ensures the associated BI is flushed on -B eviction
• User may be forced to do (expensive) BI I/O
  - On -B eviction or when no BI buffers are available
  - Avoid with APWs, BIW and -bibufs
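The role of the update counter during redo can be sketched as follows. This is a simplified model under stated assumptions, not the OpenEdge on-disk format: the `Block` and `Note` classes are hypothetical, and the sketch assumes each note stores the counter value the block holds after the change is applied. A note is re-applied only when the block on disk has not yet seen it, which keeps redo ordered and safe to repeat.

```python
from dataclasses import dataclass


@dataclass
class Block:
    update_counter: int = 0   # bumped on every change to the block
    value: str = ""


@dataclass
class Note:
    update_counter: int       # assumed: the block's counter AFTER this change
    new_value: str


def redo(block, notes):
    """Re-apply, in log order, only the notes the block has not seen yet."""
    applied = 0
    for note in notes:
        if note.update_counter <= block.update_counter:
            continue          # this change already made it to disk
        block.value = note.new_value
        block.update_counter = note.update_counter
        applied += 1
    return applied


# The block reached disk after change 2; changes 3 and 4 exist only in the log.
block = Block(update_counter=2, value="b")
notes = [Note(1, "a"), Note(2, "b"), Note(3, "c"), Note(4, "d")]
first_pass = redo(block, notes)
second_pass = redo(block, notes)   # running redo again changes nothing
```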

20 Helping Avoid OLTP BI I/O

21 Broker Processing (Helping Avoid OLTP BI I/O)
[Diagram: broker and the BI buffer pool (-bibufs 10) with modified queue, current output buffer, free list and the BI file; new notes (actions) arrive.]
• Delayed commit (durability): based on the -Mf value, the broker may flush BI buffers to disk for aged transaction ends
PROMON counters: Total BI Writes, Records (notes) written, Partial Writes

22 BIW Processing (Helping Avoid OLTP BI I/O)
[Diagram: BIW and the BI buffer pool (-bibufs 10) with modified queue, current output buffer, free list and the BI file; new notes (actions) arrive.]
PROMON counters: Total BI Writes, Records (notes) written, BIW Writes, Partial Writes

23 APW Processing (Helping Avoid OLTP BI I/O)
[Diagram: APW writing data blocks from the checkpoint queue to the db; each data block carries an associated BI note (dependency ctr), so WAL applies. Also shown: BI buffer pool (-bibufs 10) with modified queue, current output buffer, free list and the BI file; new notes (actions) arrive.]

24 BI Clusters And Checkpointing

25 The Precious Ring
[Diagram: database, -B buffer pool, -bibufs (modified queue, current output buffer) and the BI files with the BI cluster layout.]
BI blocks are grouped together to form a cluster of blocks. The clusters are logically joined together in a ring.

26 Checkpoint - Synchronization point
[Diagram: database, -B buffer pool, -bibufs (modified queue, current output buffer), file system cache and the BI files with the BI cluster layout.]
• All database changes halted!
• BI buffer pool flushed
• Db buffer pool scanned
• Db buffers previously marked for checkpoint are written out (OUCH!)
• Dirty buffers are marked for checkpoint and put on the checkpoint queue
  - Fuzzy checkpointing avoids I/O
• File system cache is synchronized
  - No more sync delay

27 Checkpoint (with -directio)
[Diagram: database, -B buffer pool (unbuffered I/O) and the BI files with the BI cluster layout.]
• All database changes halted!
• Db buffer pool scanned
• Db buffers marked for checkpoint are written out
• Dirty buffers are marked for checkpoint and put on the checkpoint queue
  - Fuzzy checkpointing avoids I/O
• BI buffer pool flushed

28 The APW
[Diagram: APW writing to the db from the APW queue and the checkpoint queue, both fed from the -B buffer pool.]
PROMON counters: Buffers Flushed at checkpoint, BIW Writes
• The APWs help with checkpoints too

29 Checkpoint - Size Does Matter
• Larger cluster sizes
  - Fewer checkpoints (sync points)
    - Will a crash result in additional lost data?
  - Longer recovery time
    - Recovery starts at last cluster - 1
  - Longer BI format time (runtime)
  - Longer BI format time after truncate
    - Use at least one fixed-length extent
      - Also use a variable-length extent
    - Use bigrow

30 Checkpoints and Promon
Seeing is believing…

Ckpt                        ------ Database Writes ------
No.  Time      Len  Freq  Dirty  CPT Q  Scan  APW Q  Flushes
27   10:23:12    4     0    384     52     0      0        0
26   10:22:46   25    26    381    381     0      0        0
25   10:22:18   27    28    380    380     0      0        0
24   10:21:50   27    28    346    158   201      0        0
23   10:21:21   28    29    372    360   115      0        0

Ooops!!

31 Checkpoints and Promon
Seeing is believing… (Same promon checkpoint display as on slide 30.)
• Len: begin to end time; the time the cluster was actively available for writes
• Freq: begin time to begin time; the time between checkpoints
• Dirty: number of data blocks newly updated; not incremented when a block is "made dirtier"
• Time spent performing the checkpoint operation: Freq - Len
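The "Freq - Len" arithmetic can be worked through with the rows from the promon display. The short Python sketch below is illustrative only: the tuple layout and the function name `checkpoint_overhead` are assumptions, and rows with Freq 0 are skipped on the assumption that the most recent checkpoint has no following checkpoint to measure against yet.

```python
# (checkpoint no., time, Len, Freq) copied from the promon display
rows = [
    (27, "10:23:12", 4, 0),
    (26, "10:22:46", 25, 26),
    (25, "10:22:18", 27, 28),
    (24, "10:21:50", 27, 28),
    (23, "10:21:21", 28, 29),
]


def checkpoint_overhead(rows):
    """Return {checkpoint no.: Freq - Len} for rows that have a Freq value."""
    return {
        no: freq - length
        for no, _time, length, freq in rows
        if freq > 0  # assumption: Freq 0 means the interval is not complete yet
    }


overhead = checkpoint_overhead(rows)
print(overhead)  # {26: 1, 25: 1, 24: 1, 23: 1}
```

On these numbers each completed checkpoint took about one time unit out of an interval of 26 to 29.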

32 Checkpoints and Promon
APW-specific activity… (Same promon checkpoint display as on slide 30.)
• CPT Q: number of data buffers the APW wrote from the checkpoint queue (from the previous checkpoint)
• Scan: number of data buffers the APW wrote while scanning -B
• APW Q: number of data buffers the APW wrote from the APW queue
  - Dirty buffers are added to the APW queue by -B LRU eviction

33 Checkpoints and Promon
To be avoided… (Same promon checkpoint display as on slide 30.)
• Flushes: number of blocks written during the checkpoint (marked from the previous checkpoint)
• Len: checkpointing too often should be avoided

34 Reusing space in the BI file

35 BI Space Reuse
[Diagram: ring of BI clusters in the BI files.]

36 BI Space Reuse
[Diagram: ring of BI clusters in the BI files, with cluster 5 added.]

37 BI Space Reuse
[Diagram: ring of BI clusters in the BI files, with cluster 6 added.]
When can BI space be reused?
• No open transactions in the cluster. Why??
  - If an outstanding transaction were to roll back, where would the undo action come from?
  - Checkpoint DOES NOT close a cluster!!
• Changes have been written to the data files
• No need to "age" the cluster anymore
  - -G 0 vs -G 60
  - Thanks, fdatasync()
• BI files grow to some working set size
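The reuse rule can be sketched as a ring that either recycles its oldest cluster or grows. This Python model is a simplification under stated assumptions: the `BiRing` class and its methods are hypothetical, a transaction is tracked only in the cluster where it began, and the second condition from the slide (changes already written to the data files) is taken as given.

```python
from collections import deque


class BiRing:
    """Toy ring of BI clusters: oldest on the left, current cluster on the right."""

    def __init__(self):
        self.ring = deque()
        self.next_id = 1
        self._add_cluster()

    def _add_cluster(self):
        self.ring.append({"id": self.next_id, "open_txns": set()})
        self.next_id += 1

    def begin(self, txn):
        self.ring[-1]["open_txns"].add(txn)   # begin note lands in the current cluster

    def end(self, txn):
        for cluster in self.ring:
            cluster["open_txns"].discard(txn)

    def cluster_full(self):
        """Pick the next cluster to write to when the current one fills up."""
        oldest = self.ring[0]
        if len(self.ring) > 1 and not oldest["open_txns"]:
            self.ring.rotate(-1)              # reuse: oldest becomes the current cluster
        else:
            self._add_cluster()               # an open txn still needs it: BI grows
        return self.ring[-1]["id"]


bi = BiRing()
bi.begin("t1")
second = bi.cluster_full()    # only one cluster exists, so the ring grows
bi.begin("t2")
third = bi.cluster_full()     # t1 is still open in cluster 1, so the ring grows
bi.end("t1")
reused = bi.cluster_full()    # cluster 1 has no open txns now, so it is reused
order = [c["id"] for c in bi.ring]
```

Once every transaction in the oldest cluster has ended, the ring stops growing, which matches the slide's remark that BI files grow to some working set size.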

38 Rollback

39 Rollback Processing
[Diagram: BI buffer pool (-bibufs 10) with modified queue, current output buffer, free list, current input buffer and two backout buffers; BI file and .lbi file.]
PROMON counters: Input buffer hits, Output buffer hits, Mod buffer hits, Busy buffer waits, Total BI Reads, Notes read
• ABL sub-transaction rollback: ABL requests a compensating action
• Read backwards and UNDO until the transaction begin

40 What about BOB?
[Diagram: BI buffer pool (-bibufs 10) with modified queue, free list, current output buffer, current input buffer and two backout buffers; BI file.]
PROMON counters: Input buffer hits, Output buffer hits, Mod buffer hits, BO Buffer hits

41 Crash Recovery

42 BI Note Types
• Physical (purely physical)
  - Database extend and raising the HWM
  - Block chain manipulations
  - Do not participate in rollback
  - Participate in physical crash recovery
• Logical (purely logical)
  - Changes not relying on physical state (dynamic)
  - Index sub-operations
  - Rollback and the logical part of crash recovery
• Physiological (most popular)

43 Crash Recovery
• Performed on each database startup
  - Only the needed phases are performed
• Brings the DB up to the last known consistent state
  - Physically sound
  - In-flight transactions rolled back
  - Missing committed transactions re-applied

44 Physical Redo
[Diagram: before-image log from the oldest active txn to the last recorded note; the redo phase is a forward scan.]
Bring the DB up to the point of the crash
• Find the last active cluster and back up one
• Apply notes based on updctr
• No BI notes are generated during redo
Log messages:
*** Begin Physical Redo Phase, 4 at 0.
*** Physical Redo Phase Completed at block, off, upd…
*** At end of Physical Redo, txn table is 128

45 Physical Undo
[Diagram: before-image log from the oldest active txn to the last note; redo phase is a forward scan, physical undo runs back from the last note.]
Back out physical DB changes (if needed)
• Starts at the crash point
• Undoes physical and physiological notes
• Causes new BI notes to be generated
• Ends when the 1st transaction end is encountered
Log messages:
*** Begin Physical Undo 10 txns at block 128 offset 1608
*** Physical Undo Completed at 128 (block #)

46 Logical Undo
[Diagram: before-image log from the oldest active txn to the last note; redo phase is a forward scan, physical undo runs back from the last note, logical undo continues the backward scan.]
Back out all uncommitted transactions
• Starts where physical undo left off
• Undoes logical and physiological notes
Log messages:
*** Begin Logical Undo Phase, 10 incomplete txns are being backed out.
*** Logical Undo Phase begin at Block 1136 offset 1608.
*** Logical Undo Phase Completed at Block 1135 offset 7743.
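The backward scan of logical undo can be sketched in a few lines. This Python example is a simplified illustration, not the OpenEdge algorithm: the tuple-based log format and the function `logical_undo` are hypothetical, each update note is assumed to carry the old value, and the compensating notes a real engine writes during undo are left out.

```python
def logical_undo(log, db):
    """Scan the log backwards and restore old values for uncommitted txns."""
    committed = {txn for kind, txn, *_ in log if kind == "commit"}
    for kind, txn, *rest in reversed(log):
        if kind == "update" and txn not in committed:
            key, old_value = rest[0], rest[1]
            db[key] = old_value       # back out the change
    return db


# Note layout assumed here: ("begin", txn), ("commit", txn),
# or ("update", txn, key, old_value, new_value)
log = [
    ("begin", 1),
    ("update", 1, "x", 0, 5),
    ("begin", 2),
    ("update", 2, "y", 0, 7),
    ("commit", 1),
    ("update", 2, "x", 5, 9),         # txn 2 never committed before the crash
]

db_after_redo = {"x": 9, "y": 7}      # state once redo has replayed every note
db = logical_undo(log, dict(db_after_redo))
```

Transaction 1 committed, so its value for `x` survives; both changes made by transaction 2 are backed out in reverse order.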

47 Agenda: The BI Units of Measure • Some Simple Rules • General Processing • Reliability Switches • Summary

48 Switches: Reliability and Integrity
• -I: No longer a valid parameter
  - Never had anything to do with crash recovery
• -R: Default, reliable BI I/O
  - Writes bypass the FS cache
  - Use for OLTP
Log messages:
*** Before-Image File I/O (-r -R): Reliable.
*** Crash Recovery (-i): Enabled.

49 Switches: Reliability and Integrity
• -r: BI writes are buffered (unreliable) to the FS
  - A well-tuned system overshadows any gain from -r
  - All notes are recorded
  - Rollback will work
  - Crash recovery is likely to work
  - Recovery from an OS crash will most likely fail
Log messages:
*** This session is running with the non-raw (-r) parameter.
*** Before-Image File I/O (-r -R): Not Reliable.
*** Crash Recovery (-i): Enabled.
*** An earlier -r session crashed, the database may be damaged.

50 Switches: Reliability and Integrity
• -i: Does not record purely physical notes
  - BI I/O is buffered (unreliable) to the FS
  - No FS sync at checkpoint
  - Rollback will work
  - OS or DB crash, abnormal termination
    - Must restore from backup
  - Why provide it then?
Log messages:
*** This session is being run with the no-integrity (-i) option.
*** Crash Recovery (-i): Not Enabled.
*** Before-Image File I/O (-r -R): Not Reliable.

51 Switches: Last Resort
• -F (dash Foolish)
  - Enter the DB without recovery
  - Use as a last resort
  - Integrity is NOT maintained
  - Usually need to
    - Validate data integrity
    - Dump and load

52 Agenda: The BI Units of Measure • Some Simple Rules • General Processing • Reliability Switches • Summary

53 Summary
• Recovery is a complex thing
• You can do things to improve the process
• We make it simple for you

54 Questions?
[Diagram: BI buffer pool (-bibufs 10) with modified queue, current output buffer, free list and BI file; APW writing to the db from the checkpoint queue, with the associated BI note.]

55 Thank you for your time!


57 Other recovery-related switches
• -bi
• -biblocksize
• -directio
  - No need for sync at checkpoint time
• -bwdelay
• -bibufs, -aibufs
• -bistall, -bithold

58 Switches: Transactions
• -Mf: Delayed commit
  - Number of seconds a commit note can reside in -bibufs
  - Some commits lost / integrity maintained
• Group commit technique
  - -groupdelay only runs with -Mf 0
  - Only in multi-user mode
  - Number of milliseconds to sleep at commit time
• -G: Number of seconds to age a cluster (use and re-use)
  - No longer needed with fdatasync()

