C-Store: Concurrency Control and Recovery Jianlin Feng School of Software SUN YAT-SEN UNIVERSITY Jun. 5, 2009
Concurrency Control vs. Recovery
Concurrency Control: Provides correct control over the concurrent execution of multiple transactions while maximizing system throughput, i.e., the average number of transactions completed in a given time.
Recovery: Ensures the database is fault tolerant and is not corrupted by software, system, or media failure.
Concurrency Control in C-Store
Uses strict two-phase locking to control concurrently running read-write transactions: each node (a site in the shared-nothing system architecture) sets locks on the data objects that the runtime system reads or writes.
Resolves deadlocks via timeouts, aborting one of the deadlocked transactions.
Does not use strict two-phase distributed commit, thereby avoiding the PREPARE phase.
Strict Two-Phase Locking (Strict 2PL)
It is the most widely used locking protocol. Two rules:
(1) If a transaction T wants to read (respectively, modify) a database object, it first requests a shared (respectively, exclusive) lock on the object.
(2) All locks held by a transaction are released when the transaction is completed.
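The two rules, together with timeout-based deadlock resolution, can be sketched as a small lock table. This is a minimal illustration and not C-Store's actual lock manager: the names `LockManager`, `LockTimeout`, `acquire` and `release_all` are hypothetical, and the modes "S" and "X" stand for shared and exclusive locks.

```python
import threading
import time


class LockTimeout(Exception):
    """Raised when a lock request waits longer than its timeout."""


class LockManager:
    """Toy lock table: shared ("S") and exclusive ("X") locks per object."""

    def __init__(self):
        self._cond = threading.Condition()
        self._holders = {}  # object name -> {transaction id: "S" or "X"}

    def _compatible(self, obj, txn, mode):
        # Only locks held by OTHER transactions can conflict.
        others = [m for t, m in self._holders.get(obj, {}).items() if t != txn]
        if mode == "S":
            return all(m == "S" for m in others)
        return not others  # "X" requires that nobody else holds a lock

    def acquire(self, txn, obj, mode, timeout=1.0):
        # Rule (1): request a shared or exclusive lock before reading or writing.
        deadline = time.monotonic() + timeout
        with self._cond:
            while not self._compatible(obj, txn, mode):
                remaining = deadline - time.monotonic()
                if remaining <= 0:
                    # A timeout is treated as a suspected deadlock.
                    raise LockTimeout(f"{txn} timed out waiting for {obj}")
                self._cond.wait(remaining)
            held = self._holders.setdefault(obj, {})
            if held.get(txn) != "X":  # never downgrade an exclusive lock
                held[txn] = mode

    def release_all(self, txn):
        # Rule (2): all locks are released together when the transaction completes.
        with self._cond:
            for obj in list(self._holders):
                self._holders[obj].pop(txn, None)
                if not self._holders[obj]:
                    del self._holders[obj]
            self._cond.notify_all()
```

A caller that catches `LockTimeout` would abort the transaction and call `release_all`, which corresponds to aborting one of the deadlocked transactions.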
Distributed COMMIT Processing in C-Store (1): Master and Worker
Each transaction T has a master that is responsible for assigning T's sub-transactions to appropriate nodes (workers) and for determining the ultimate commit state of T.
Distributed COMMIT Processing in C-Store (2): The Protocol
1st Phase: When the master receives a COMMIT statement for the transaction T, it waits until all workers have completed all outstanding actions, and then issues a commit (or abort) message to each worker.
2nd Phase: Once a worker has received a commit message, it can release all locks related to the transaction T and delete the UNDO log for T. T is completed, and hence has no need for UNDO in recovery.
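The message flow can be sketched in a single process as follows. This is an illustrative model only: the classes `Master` and `Worker`, their methods, and the `all_actions_ok` flag are hypothetical names, and the handling of an abort message (rolling back from the UNDO log) is an assumption, since the slide only describes what a worker does on commit.

```python
class Worker:
    """Toy worker node holding data, locks and an UNDO log per transaction."""

    def __init__(self):
        self.data = {}
        self.locks = {}     # transaction id -> set of locked keys
        self.undo_log = {}  # transaction id -> list of (key, old value)

    def write(self, txn, key, value):
        self.locks.setdefault(txn, set()).add(key)
        # Remember the old value; None means "key did not exist" in this sketch.
        self.undo_log.setdefault(txn, []).append((key, self.data.get(key)))
        self.data[key] = value

    def on_decision(self, txn, decision):
        if decision == "abort":
            # Assumed behaviour: roll back using the UNDO records, newest first.
            for key, old in reversed(self.undo_log.get(txn, [])):
                if old is None:
                    self.data.pop(key, None)
                else:
                    self.data[key] = old
        # Second phase: release the locks and delete the UNDO log for txn.
        self.locks.pop(txn, None)
        self.undo_log.pop(txn, None)


class Master:
    """Toy master: decides the outcome and informs every worker, no PREPARE."""

    def __init__(self, workers):
        self.workers = workers

    def commit(self, txn, all_actions_ok=True):
        # First phase: in this single-process sketch every worker action has
        # already returned, so there is nothing left to wait for.
        decision = "commit" if all_actions_ok else "abort"
        for worker in self.workers:
            worker.on_decision(txn, decision)
        return decision
```

Note that there is no vote from the workers: the master sends its decision directly, which is what skipping the PREPARE phase means here.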
Distributed COMMIT Processing in C-Store (3): The Implications
In C-Store, the master does not PREPARE the workers. So a worker that the master has told to commit may crash before writing any updates or log records related to the transaction to stable storage. The failed worker will recover its state from other projections on other nodes during recovery.
Overview of Recovery in C-Store
Uses the standard write-ahead logging protocol for recovery.
Uses a STEAL, NO-FORCE policy for writing database objects, which may require both UNDO and REDO.
Only logs UNDO records.
Performs REDO by executing updates that have been queued on other nodes.
Write-Ahead Logging (WAL) Property
The Protocol: Each write must be recorded in the log (on disk) before the corresponding change is reflected in the database itself.
To ensure this protocol, the DBMS must be able to selectively force a page in memory to disk, i.e., the log page containing information on the write.
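A minimal sketch of the WAL rule follows, assuming an in-memory model in which plain Python containers stand in for disk. The class `WALBuffer` and its methods are hypothetical names chosen for illustration. The key point is that `flush_page` forces the log up to the page's last LSN before the page itself is written.

```python
class WALBuffer:
    """Toy buffer manager that enforces the write-ahead logging rule."""

    def __init__(self):
        self.log_tail = []    # log records still in memory
        self.stable_log = []  # log records already on "disk"
        self.pages = {}       # in-memory pages: page id -> (value, page LSN)
        self.disk = {}        # pages already on "disk"
        self.next_lsn = 1

    def write(self, page_id, value):
        lsn = self.next_lsn
        self.next_lsn += 1
        old = self.pages.get(page_id, (None, 0))[0]
        # The log record is created before the in-memory page is changed.
        self.log_tail.append((lsn, page_id, old, value))
        self.pages[page_id] = (value, lsn)
        return lsn

    def flush_log(self, up_to_lsn):
        # Selectively force only the log records up to the given LSN.
        keep = []
        for record in self.log_tail:
            if record[0] <= up_to_lsn:
                self.stable_log.append(record)
            else:
                keep.append(record)
        self.log_tail = keep

    def flush_page(self, page_id):
        value, page_lsn = self.pages[page_id]
        self.flush_log(page_lsn)    # WAL rule: the log reaches disk first
        self.disk[page_id] = value  # only then the data page
```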
Contents of an Update Log Record
The first 3 fields are common to all log records. The other fields are for updates.
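The figure for this slide is not reproduced here. As a sketch, assuming the record layout is the ARIES-style one described in the Ramakrishnan and Gehrke textbook listed in the references (prevLSN, transID and type shared by all records; pageID, length, offset, before-image and after-image for updates), it could be modelled as below. The Python class and method names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class LogRecord:
    # Assumed common fields, present in every log record.
    prev_lsn: Optional[int]  # LSN of this transaction's previous record
    trans_id: str
    type: str                # e.g. "update", "commit", "abort"


@dataclass
class UpdateLogRecord(LogRecord):
    # Assumed additional fields, present only in update records.
    page_id: int
    length: int
    offset: int
    before_image: bytes      # old bytes, used for UNDO
    after_image: bytes       # new bytes, used for REDO

    def undo(self, page: bytearray) -> None:
        page[self.offset:self.offset + self.length] = self.before_image

    def redo(self, page: bytearray) -> None:
        page[self.offset:self.offset + self.length] = self.after_image
```

Since C-Store only logs UNDO records, a C-Store record would presumably need the before-image but not the after-image; that reading is an inference from the overview slide rather than something stated here.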
STEAL / NO-FORCE
STEAL: An updated page P of an uncommitted transaction T may be swapped from memory to disk. T can abort later, so the DBMS must remember the old value of P to support UNDO.
NO-FORCE: When a transaction T commits, the pages in the buffer that were modified by T are not forced to disk. The system can crash before all of those pages are written to disk, so the DBMS must remember the updates of T to support REDO.
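The relationship between the buffer policy and the recovery work it implies can be written as a tiny function. The name `recovery_needs` is hypothetical; the logic only restates the two arguments above.

```python
def recovery_needs(steal: bool, force: bool) -> set:
    """Return which recovery actions a buffer policy may require."""
    needs = set()
    if steal:
        # Uncommitted changes may already be on disk, so they may need undoing.
        needs.add("UNDO")
    if not force:
        # Committed changes may still be missing from disk, so they may need redoing.
        needs.add("REDO")
    return needs
```

For the STEAL, NO-FORCE policy, `recovery_needs(True, False)` contains both "UNDO" and "REDO".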
The Recovery Algorithm ARIES: Three Phases
1. Analysis: Identifies dirty pages in the buffer (i.e., changes that have not been written to disk) and the transactions active at the time of the crash.
2. REDO: Repeats all actions and restores the database state to what it was at the time of the crash.
3. UNDO: Undoes the actions of transactions that did not commit.
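The three phases can be illustrated with a heavily simplified sketch. It is not ARIES itself: there is no dirty page table, no page LSN check, no checkpoints and no compensation log records, and the log is assumed to contain only "update" and "commit" records. The function name `recover` and the dictionary-based record format are hypothetical.

```python
def recover(log, disk):
    """Toy three-phase recovery.

    log:  list of dicts in LSN order, each with "type" ("update" or "commit"),
          "txn", and for updates "page", "before" and "after".
    disk: dict mapping page id to the value found on disk after the crash.
    """
    pages = dict(disk)

    # 1. Analysis: find transactions that were still active at the crash.
    losers = set()
    for rec in log:
        if rec["type"] == "update":
            losers.add(rec["txn"])
        elif rec["type"] == "commit":
            losers.discard(rec["txn"])

    # 2. REDO: repeat every logged update in order, committed or not.
    for rec in log:
        if rec["type"] == "update":
            pages[rec["page"]] = rec["after"]

    # 3. UNDO: roll back the updates of the losers, newest first.
    for rec in reversed(log):
        if rec["type"] == "update" and rec["txn"] in losers:
            pages[rec["page"]] = rec["before"]

    return pages, losers
```

In the example below, T1 committed but its page never reached disk (NO-FORCE), while T2 did not commit but one of its pages was already written (STEAL):

```python
log = [
    {"type": "update", "txn": "T1", "page": "P1", "before": 0, "after": 1},
    {"type": "update", "txn": "T2", "page": "P2", "before": 0, "after": 5},
    {"type": "commit", "txn": "T1"},
    {"type": "update", "txn": "T2", "page": "P1", "before": 1, "after": 7},
]
pages, losers = recover(log, {"P1": 0, "P2": 5})
# pages is {"P1": 1, "P2": 0} and losers is {"T2"}
```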
Recovery in C-Store
Basic idea: A crashed node recovers by running a query (copying state) from other projections.
K-Safety: Sufficient projections and join indexes are maintained so that K nodes can fail within time t, the time to recover, and the system will still be able to maintain transactional consistency.
Three cases to consider.
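A rough way to picture K-safety is to check that every column is still stored on some surviving node after any K nodes fail. The sketch below makes a strong simplifying assumption: it only tracks which columns each node stores and ignores sort orders and join indexes, which a real C-Store design would also need to reconstruct tables. The function name `is_k_safe` and the `placement` mapping are hypothetical.

```python
from itertools import combinations


def is_k_safe(placement, k):
    """placement maps a node name to the set of columns stored on that node.

    Returns True if every column survives the failure of any k nodes.
    """
    all_columns = set().union(*placement.values())
    for failed in combinations(placement, k):
        surviving = set()
        for node, columns in placement.items():
            if node not in failed:
                surviving |= columns
        if surviving != all_columns:
            return False
    return True
```

For example, with columns a, b and c each stored on two of three nodes, the placement tolerates one failure but not two.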
Recovery: Case 1
If the failed node suffered no data loss (no dirty pages are found for aborted transactions), then we can restore it by executing the updates that have been queued for it elsewhere in the system.
This assumes that those updates were successfully saved on other nodes and that they can be identified by conditions on timestamp, transaction ID, etc.
Pages of committed transactions may not have been written to disk, so we simply need REDO.
Recovery: Case 2
If both the RS and the WS are destroyed on the failed node, then we have to reconstruct both segments from other projections and join indexes in the system.
First, restore the segments by exploiting Insertion Vectors and Deleted Record Vectors from other nodes.
Second, run the queued updates as in Case 1.
Recovery: Case 3
If the WS is damaged but the RS is intact on the failed node, then we can reconstruct the WS from other corresponding WS segments and/or RS segments.
Corresponding WS segments are identified by checking the range of the sort key.
The sort keys are used to find storage keys, and the other tuple columns are then found by following the appropriate join indexes.
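The three cases can be summarized as a dispatch on which stores survived the crash. This is only a restatement of the slides as code: the function name `recovery_plan` and the step descriptions are hypothetical, and the combination of a damaged RS with an intact WS is rejected because the slides do not cover it.

```python
def recovery_plan(rs_ok: bool, ws_ok: bool) -> list:
    """Return the recovery steps for a failed node, following the three cases."""
    if rs_ok and ws_ok:
        # Case 1: no data loss, so only the queued updates are replayed (REDO).
        return ["replay queued updates"]
    if not rs_ok and not ws_ok:
        # Case 2: rebuild both stores, then replay as in Case 1.
        return [
            "rebuild RS and WS from other projections and join indexes",
            "replay queued updates",
        ]
    if rs_ok and not ws_ok:
        # Case 3: only the WS has to be rebuilt.
        return ["rebuild WS from corresponding WS and/or RS segments"]
    raise ValueError("RS damaged with WS intact is not one of the three cases")
```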
Queries for Recovering WS
Note that each WS segment, S, contains only tuples with an insertion timestamp later than some time t_lastmove(S).
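The selection condition implied by this note, combined with the sort-key range check from Case 3, can be sketched as a filter over tuples from a source segment on another node. This is not the actual recovery query: the function name, the tuple layout, and the field names `insert_ts` and `sort_key` are hypothetical, and handling of deleted tuples is left out.

```python
def ws_recovery_tuples(source_tuples, t_lastmove, key_lo, key_hi):
    """Select the tuples needed to rebuild a damaged WS segment S.

    source_tuples: iterable of dicts with "insert_ts" and "sort_key" fields.
    t_lastmove:    t_lastmove(S), the time of the last move out of S.
    key_lo/key_hi: inclusive sort-key range covered by S.
    """
    return [
        t for t in source_tuples
        if t["insert_ts"] > t_lastmove          # inserted after the last move
        and key_lo <= t["sort_key"] <= key_hi   # falls inside S's key range
    ]
```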
References
Mike Stonebraker, Daniel Abadi, Adam Batkin, Xuedong Chen, Mitch Cherniack, Miguel Ferreira, Edmond Lau, Amerson Lin, Sam Madden, Elizabeth O'Neil, Pat O'Neil, Alex Rasin, Nga Tran and Stan Zdonik. C-Store: A Column Oriented DBMS. VLDB, 2005.
Raghu Ramakrishnan and Johannes Gehrke. Database Management Systems (3rd edition).