ARIES: Algorithm for Recovery and Isolation Exploiting Semantics
Outline Durability Atomicity ARIES Recovery in Postgres
Durability
Schedule: Begin T1; Write(P1, x+1); Commit
[Figure: buffer (volatile) and non-volatile storage holding P1: x=42; writing the page out at commit is marked "Expensive"]
Make commits cheap again
Schedule: Begin T1; Write(P1, x+1); Commit
Log tail:
LSN  TransID  Type    Pageid  RedoInfo
1    T1       update  P1      x + 1
2    T1       commit
LSN: log sequence number
Long live the commits
Stable storage: abstract storage for the log that survives any crash
In practice, approximated by keeping multiple log copies
Durability revisited
Schedule: Begin T1; Write(P1, x+1); Commit
Volatile: Buffer holds P1: x=43; Log tail holds LSN 1,2
Non-volatile: Disk holds P1: ?; Stable storage holds LSN 1,2
Write-ahead logging aka WAL
Persist an action instead of the page it affects
Accumulate log entries in memory
Log entry: the information necessary to redo an action
Write the log tail ahead to stable storage in batches
Write ahead: write a log entry before the corresponding page
Log tail: collection of log entries in memory
Stable storage: storage that survives any crash
Committed transaction: one whose "commit" log record is on stable storage
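The definitions above can be sketched in a few lines of Python. This is a minimal in-memory model, not an actual storage engine: the class `WriteAheadLog`, its methods, and the `stable` list standing in for stable storage are all hypothetical names chosen for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class LogRecord:
    lsn: int
    trans_id: str
    kind: str                        # "update" or "commit"
    page_id: Optional[str] = None
    redo_info: Optional[int] = None  # assumption: a delta to add to x


class WriteAheadLog:
    def __init__(self):
        self.stable = []   # stands in for stable storage
        self.tail = []     # log tail: records that exist only in memory
        self.next_lsn = 1

    def append(self, trans_id, kind, page_id=None, redo_info=None):
        """Add a record to the log tail and return its LSN."""
        rec = LogRecord(self.next_lsn, trans_id, kind, page_id, redo_info)
        self.next_lsn += 1
        self.tail.append(rec)
        return rec.lsn

    def flush(self, up_to_lsn):
        """Move every tail record with lsn <= up_to_lsn to stable storage."""
        keep = []
        for rec in self.tail:
            if rec.lsn <= up_to_lsn:
                self.stable.append(rec)
            else:
                keep.append(rec)
        self.tail = keep

    def commit(self, trans_id):
        """Append the commit record, then flush the log up to and including it."""
        lsn = self.append(trans_id, "commit")
        self.flush(lsn)
        return lsn

    def is_committed(self, trans_id):
        """Committed means the commit record is on stable storage."""
        return any(r.trans_id == trans_id and r.kind == "commit"
                   for r in self.stable)


wal = WriteAheadLog()
wal.append("T1", "update", "P1", 1)
committed_before = wal.is_committed("T1")
wal.commit("T1")
```

In this model the commit is cheap because only two small log records move to stable storage; the page P1 itself can be written out later.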
REDOing things
LSN  TransID  Type    Pageid  RedoInfo
1    T1       update  P1      x + 1
Page P1: x=43, pageLSN: 1
pageLSN: LSN of the last log record that modified this page
REDO
[Figure: timeline of LSNs up to the crash, with the REDO pass over the log]
REDO a log record if pageLSN < record LSN
REDO logs
Provide durable and efficient commits
Normal operation:
  Employ WAL: flush all log records up to and including the commit record of the transaction being committed
  Update pageLSN
Recovery:
  Replay a log entry if its LSN > pageLSN
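The recovery rule can be illustrated with a short sketch. It assumes a hypothetical layout in which a page is a dict with a value `x` and a `page_lsn`, and RedoInfo is a delta added to `x`, as in the "x + 1" example; none of these names come from a real system.

```python
def redo(pages, log):
    """Replay update records whose LSN is greater than the page's pageLSN.

    pages: dict mapping page id -> {"x": int, "page_lsn": int}
    log:   list of record dicts in LSN order
    """
    for rec in log:
        if rec["type"] != "update":
            continue
        page = pages[rec["page_id"]]
        if page["page_lsn"] < rec["lsn"]:
            page["x"] += rec["redo_info"]   # apply the logged action
            page["page_lsn"] = rec["lsn"]   # remember that it was applied
    return pages


log = [
    {"lsn": 1, "trans_id": "T1", "type": "update", "page_id": "P1", "redo_info": 1},
    {"lsn": 2, "trans_id": "T1", "type": "commit"},
]

# The crash happened before P1 reached disk: the old value is replayed forward.
stale = redo({"P1": {"x": 42, "page_lsn": 0}}, log)

# P1 was already written with pageLSN 1: the record is skipped, not applied twice.
fresh = redo({"P1": {"x": 43, "page_lsn": 1}}, log)
```

The pageLSN comparison is what keeps a non-idempotent action such as "x + 1" from being applied twice when recovery is repeated.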
Atomicity
Schedule: Begin T2; Write(P1, x=43); Abort
[Figure: buffer (volatile) and non-volatile storage]
REDO recap
[Figure: timeline of LSNs up to the crash; the REDO pass is marked "We are here"]
Transaction Table (TT)
What to undo: any transaction without a commit record
TransID  lastLSN  Status
T2       2        Aborted
lastLSN: LSN of the transaction's last log record on stable storage
How to undo
LSN  TransID  Type    Pageid  RedoInfo  UndoInfo  prevLSN
1    T2       update  P1      43        42        null
prevLSN: LSN of the record to undo next
UNDO logs
Analyze the log to build the Transaction Table
Undo the record with the largest lastLSN by applying its UndoInfo
For this transaction, insert prevLSN into the TT
Log UNDOs in a special compensation log record (CLR)
CLRs make UNDOs idempotent
The same procedure applies for aborting a single transaction
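The procedure above might look as follows in Python. This is a simplified sketch under stated assumptions: records are dicts, UndoInfo is the old page value, the only record types are "update", "commit" and "clr", and the CLR field `undo_next_lsn` (the LSN to undo next, copied from prevLSN) is a name chosen here for illustration.

```python
def build_transaction_table(log):
    """Map each transaction without a commit record to its lastLSN."""
    tt = {}
    for rec in log:
        if rec["type"] == "commit":
            tt.pop(rec["trans_id"], None)
        else:
            tt[rec["trans_id"]] = rec["lsn"]
    return tt


def undo(pages, log):
    """Roll back all uncommitted transactions, logging a CLR per undone update."""
    tt = build_transaction_table(log)
    by_lsn = {rec["lsn"]: rec for rec in log}
    next_lsn = max(by_lsn, default=0) + 1
    while tt:
        trans_id = max(tt, key=tt.get)          # largest lastLSN first
        rec = by_lsn[tt[trans_id]]
        if rec["type"] == "update":
            pages[rec["page_id"]] = rec["undo_info"]   # restore the old value
            clr = {"lsn": next_lsn, "trans_id": trans_id, "type": "clr",
                   "page_id": rec["page_id"], "undo_next_lsn": rec["prev_lsn"]}
            log.append(clr)
            by_lsn[next_lsn] = clr
            next_lsn += 1
            follow = rec["prev_lsn"]
        else:
            # A CLR: this work was undone before a previous crash, skip past it.
            follow = rec["undo_next_lsn"]
        if follow is None:
            del tt[trans_id]                    # nothing left to undo
        else:
            tt[trans_id] = follow               # insert prevLSN into the TT
    return pages


pages = {"P1": 43}
log = [{"lsn": 1, "trans_id": "T2", "type": "update", "page_id": "P1",
        "redo_info": 43, "undo_info": 42, "prev_lsn": None}]
undo(pages, log)
undo(pages, log)   # a second run finds only the CLR and undoes nothing twice
```

Running `undo` a second time models a crash during recovery: the CLR written on the first run points past the already undone update, so no update is undone twice.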
UNDO after REDO
[Figure: timeline of LSNs up to the crash; the REDO pass, then the UNDO pass driven by the TT]
ARIES: Algorithm for Recovery and Isolation Exploiting Semantics
[Figure: timeline of LSNs up to the crash; Analysis builds the Transaction and Dirty Page tables, followed by REDO and UNDO]
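The Analysis pass can be sketched as a single scan that builds both tables. The sketch assumes the scan starts at the beginning of the log (no checkpoints) and uses the standard ARIES notion of recLSN, the LSN of the first record that dirtied a page, which is not spelled out on the slides; the function name and record layout are hypothetical.

```python
def analysis(log):
    """Scan the log once and build the Transaction and Dirty Page tables.

    Returns (tt, dpt, redo_start):
      tt:  trans id -> lastLSN for transactions without a commit record
      dpt: page id  -> recLSN, the first LSN that may have dirtied the page
      redo_start: smallest recLSN, where the REDO pass would begin
    """
    tt, dpt = {}, {}
    for rec in log:
        if rec["type"] == "commit":
            tt.pop(rec["trans_id"], None)
            continue
        tt[rec["trans_id"]] = rec["lsn"]
        if rec["type"] in ("update", "clr"):
            # Keep only the first LSN seen for each page.
            dpt.setdefault(rec["page_id"], rec["lsn"])
    redo_start = min(dpt.values(), default=None)
    return tt, dpt, redo_start


log = [
    {"lsn": 1, "trans_id": "T1", "type": "update", "page_id": "P1"},
    {"lsn": 2, "trans_id": "T2", "type": "update", "page_id": "P2"},
    {"lsn": 3, "trans_id": "T1", "type": "commit"},
    {"lsn": 4, "trans_id": "T2", "type": "update", "page_id": "P1"},
]
tt, dpt, redo_start = analysis(log)
```

Here T2 is the only transaction left in the TT, so it is the one UNDO has to roll back, while REDO repeats history for both transactions starting at `redo_start`.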
Distinctive features
Uses write-ahead logging
Repeats entire history, including uncommitted transactions
Logs UNDOs
OS and hardware support
Assumption               Support
Atomic page write        Configure file system
Correct memory content   DRAM with ECC; CRC for log entries
Stable storage for logs  Log copies in distant places
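As an illustration of "CRC for log entries", the sketch below frames each entry with its length and a CRC-32 checksum using Python's standard `struct` and `zlib` modules. The on-disk layout and the function names are assumptions made for this example, not the format of any particular database.

```python
import struct
import zlib

# Assumed header layout: payload length and CRC-32 of the payload,
# both little-endian unsigned 32-bit integers.
HEADER = struct.Struct("<II")


def encode_entry(payload: bytes) -> bytes:
    """Prefix a log entry with its length and checksum."""
    return HEADER.pack(len(payload), zlib.crc32(payload)) + payload


def decode_entries(buf: bytes):
    """Yield payloads until the buffer ends or an entry fails its check."""
    offset = 0
    while offset + HEADER.size <= len(buf):
        length, crc = HEADER.unpack_from(buf, offset)
        start = offset + HEADER.size
        payload = buf[start:start + length]
        if len(payload) < length or zlib.crc32(payload) != crc:
            break   # torn or corrupted entry: treat it as the end of the log
        yield payload
        offset = start + length


good = encode_entry(b"T1 update P1") + encode_entry(b"T1 commit")
damaged = bytearray(good)
damaged[-1] ^= 0xFF   # flip bits in the last byte of the second entry
```

Stopping at the first entry that fails its check means a half-written commit record is never mistaken for a valid one.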
OS and hardware support
Assumption: Logs actually reach stable storage
Support:
1. Disable write-back caches
2. Synchronous commits
3. Use UPS units
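At the application level, a synchronous commit roughly means not reporting success until the operating system has been asked to force the log bytes to the device. The sketch below uses the standard `os.write` and `os.fsync` calls; the function name `append_and_sync` is hypothetical, and the code cannot by itself guarantee durability if a drive's write-back cache acknowledges writes it has not persisted, which is why the list above also names cache settings and UPS units.

```python
import os


def append_and_sync(path, payload: bytes):
    """Append payload to the log file and return only after fsync succeeds."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o600)
    try:
        view = memoryview(payload)
        while len(view) > 0:
            written = os.write(fd, view)   # may write fewer bytes than asked
            view = view[written:]
        os.fsync(fd)                       # ask the OS to flush to the device
    finally:
        os.close(fd)
```

A commit would call `append_and_sync` with the encoded commit record and acknowledge the transaction only after the call returns.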
Recovery in Postgres
REDO log (Xlog) only
  Since version 7.1 (2001)
  No UNDO log, presumably due to MVCC
Transactional DDL
  WAL for file operations
Different format of log for different purposes
  Hot standby, point-in-time recovery etc.
Requires complex setup
  Hard to ensure durability
  Non-trivial performance-durability trade-offs