Don’t be lazy, be consistent: Postgres-R, a new way to implement Database Replication
Paper by Bettina Kemme and Gustavo Alonso, VLDB 2000
Presentation and discussion led by Nickolay Tchervenski
Sept 27, 2007, CS 848, University of Waterloo
Intro
- Eager vs. lazy replication schemes: issues and benefits. If we want consistency and have the resources, go with eager.
- The paper proposes:
  - Run the transaction locally on shadow copies
  - Propagate updates at the end
  - Acquire all locks in one atomic step
  - DB engine modifications are necessary
Replication Model
- Shadow copies – use the local DBMS to do the updates; triggers, consistency checks, and write-read dependencies come for “free”
- Propagate all updates at commit time
- Local reads are not communicated to other sites
- Group communication – multicast the write set; its total delivery order determines the serialization order of transactions (see sketch below)
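A minimal sketch of this model, with a hypothetical total-order multicast primitive (`to_multicast`), a toy `Site` class, and a simplified write-set structure; these names are illustrative assumptions, not Postgres-R’s actual interfaces:

```python
from dataclasses import dataclass, field

@dataclass
class WriteSet:
    """Updates a local transaction performed on its shadow copies."""
    txn_id: int
    origin_site: str
    changes: list = field(default_factory=list)   # (relation, key, new_values) tuples

class Site:
    """A replica; here it just records delivered write sets in arrival order."""
    def __init__(self, name):
        self.name = name
        self.delivered = []

    def deliver(self, ws):
        self.delivered.append(ws)

def to_multicast(group, ws):
    """Hypothetical total-order multicast: every site delivers write sets
    in the same global order, and that order fixes the serialization
    order of the corresponding transactions."""
    for site in group:
        site.deliver(ws)

# Local reads and shadow-copy updates stay at the origin site;
# only the collected write set is propagated, at commit time.
sites = [Site("A"), Site("B"), Site("C")]
ws = WriteSet(txn_id=42, origin_site="A")
ws.changes.append(("accounts", "id=7", {"balance": 120}))
to_multicast(sites, ws)
```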
Replication Protocol
- Local read phase – execute reads and “shadow” updates locally
- Send phase – multicast the write set
- Lock phase (on delivery of the write set)
  - Obtain all of the transaction’s locks in one atomic step
  - Conflict test against local transactions: abort a conflicting local transaction if it is still in its first two phases; if it was already in its send phase, multicast an abort message as well
  - Enqueue the lock request behind transactions already in their write phase
  - Multicast commit if the transaction is local
- Write phase – apply the updates; remote transactions wait until the commit message arrives (sketched below)
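A sketch of the lock and write phases under the same illustrative assumptions as above (a hypothetical per-item lock queue, simple phase tags, and callbacks standing in for the abort/commit multicasts); it is meant to show the conflict test, not Postgres-R’s actual code:

```python
from collections import defaultdict

READ, SEND, LOCK, WRITE = "read", "send", "lock", "write"

class Txn:
    def __init__(self, txn_id, local=True):
        self.id = txn_id
        self.local = local        # originated at this site vs. delivered from a remote site
        self.phase = READ
        self.write_set = []       # items updated on shadow copies during the read phase

def lock_phase(txn, lock_queues, multicast_abort):
    """Runs when txn's write set is delivered in total order: request all
    of its locks in one atomic step and apply the conflict test."""
    txn.phase = LOCK
    for item in txn.write_set:
        queue = lock_queues[item]
        holder = queue[0] if queue else None
        if holder is not None and holder.phase in (READ, SEND):
            # Local transaction not yet ordered: abort it. If it already
            # multicast its write set (send phase), also multicast an
            # (unordered) abort message so other sites discard it.
            if holder.phase == SEND:
                multicast_abort(holder)
            holder.phase = "aborted"
            queue[0] = txn
        else:
            # Item is free, or the holder is already in its write phase:
            # enqueue behind it and wait.
            queue.append(txn)

def write_phase(txn, multicast_commit):
    """Apply the write set once all locks are granted. A local transaction
    multicasts commit; a remote one waits for the origin site's decision."""
    txn.phase = WRITE
    if txn.local:
        multicast_commit(txn)

# Example: a remote write set conflicts with a local transaction still in its read phase.
lock_queues = defaultdict(list)
local_t = Txn(1, local=True);  local_t.write_set = ["accounts:id=7"]
lock_queues["accounts:id=7"].append(local_t)          # local_t holds the lock, still in READ
remote_t = Txn(2, local=False); remote_t.write_set = ["accounts:id=7"]
lock_phase(remote_t, lock_queues, multicast_abort=lambda t: None)
assert local_t.phase == "aborted" and lock_queues["accounts:id=7"][0] is remote_t
```

The key point is that all locks for a write set are requested in one atomic step, in the order the write set was delivered, so every site resolves conflicting transactions in the same order.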
Architecture of Postgres-R
Postgres-R Implementation Highlights
- Shadow copies – tuple-based multiversion scheme
- Locking – simple tuple-level locking
- Index locking – issues with B-trees
- Write set – ship the SQL statements or the data itself (see sketch below)
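A small sketch of the two write-set representations (hypothetical structures, not Postgres-R’s actual wire format): shipping SQL generally means each replica re-executes the statements, while shipping data lets replicas apply the changed tuple values directly.

```python
from dataclasses import dataclass

@dataclass
class SqlWriteSet:
    """Ship the statements; compact to build, but every replica re-executes them."""
    txn_id: int
    statements: list   # e.g. UPDATE/INSERT/DELETE strings

@dataclass
class DataWriteSet:
    """Ship the changed tuples; replicas apply the values directly,
    which matches the tuple-based multiversion shadow copies."""
    txn_id: int
    tuples: list       # (relation, key, new_values) entries

sql_ws = SqlWriteSet(42, ["UPDATE accounts SET balance = balance + 20 WHERE id = 7"])
data_ws = DataWriteSet(42, [("accounts", "id=7", {"balance": 120})])
```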
Neat Ideas
- Local vs. remote transactions and how each is handled
- Updates are bundled into write sets
- Two messages per transaction:
  - Write set (totally ordered) – via group communication
  - Commit/abort (unordered)
- Significant reuse of the DB engine; not hard to integrate into a suitable existing DB
Issues
- A group communication system is required
- Readers are aborted when a conflicting write set arrives (read vs. lock phase); other ways to tackle this include different isolation levels such as cursor stability (CS) or snapshot isolation (SI)
- The DB engine code must be modified, and not all DBMSs may be suitable (multiversion tuples, etc.)
- Performance tests rely on tweaks, e.g. not forcing buffer-pool pages to disk at commit
- A better, less CPU-intensive communication module would help
Conclusion
- A synchronous (eager) model that ensures consistency at lower cost and with better scalability
- Uses group communication services, including for crash recovery
- Can still be improved