School of Information Technologies Michael Cahill 1, Uwe Röhm and Alan Fekete School of IT, University of Sydney {mjc, roehm, Serializable Isolation for Snapshot Databases 1. also
Outline
Snapshot isolation ≠ serializable
Why you should care
Previous work: applications deal with it
Our approach: fix the database
Implementation and evaluation
Snapshot isolation ≠ serializable
Snapshot isolation:
–Transactions read a consistent snapshot of data
–DBMS maintains multiple versions of data items to avoid locking for reads
–Transactions don’t see concurrent writes
BUT: Not equivalent to a serial execution
–In a serial execution, one transaction would see the other
Why you should care
[Figure: three versions of a Doctor/Shift/Status table, labelled with transactions T1 and T2]

Doctor  Shift    Status
Jones   12 June  on duty
Smith   12 June  on duty

Doctor  Shift    Status
Jones   12 June  on duty
Smith   12 June  reserve

Doctor  Shift    Status
Jones   12 June  reserve
Smith   12 June  on duty
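The anomaly in the doctors table can be reproduced with a toy model of snapshot isolation. This is a minimal sketch, not the behaviour of any particular DBMS: the classes `ToyDB` and `SnapshotTxn`, the helper `go_on_reserve`, the rule that at least one doctor must stay on duty, and the choice of which transaction updates which doctor are all assumptions made for illustration.

```python
class ToyDB:
    """Hypothetical in-memory store with one version counter per item."""
    def __init__(self, data):
        self.data = dict(data)
        self.versions = {key: 0 for key in data}

    def begin(self):
        return SnapshotTxn(self)


class SnapshotTxn:
    """Reads come from a private snapshot; writes are buffered until commit."""
    def __init__(self, db):
        self.db = db
        self.snapshot = dict(db.data)            # consistent snapshot at begin
        self.start_versions = dict(db.versions)
        self.writes = {}

    def read(self, key):
        return self.writes.get(key, self.snapshot[key])

    def write(self, key, value):
        self.writes[key] = value

    def commit(self):
        # Only write-write conflicts are checked (first committer wins).
        for key in self.writes:
            if self.db.versions[key] != self.start_versions[key]:
                raise RuntimeError("write-write conflict on " + key)
        for key, value in self.writes.items():
            self.db.data[key] = value
            self.db.versions[key] += 1


def go_on_reserve(txn, doctor):
    """Assumed rule: move `doctor` to reserve only if two doctors are on duty."""
    on_duty = [d for d in ("Jones", "Smith") if txn.read(d) == "on duty"]
    if len(on_duty) >= 2:
        txn.write(doctor, "reserve")


db = ToyDB({"Jones": "on duty", "Smith": "on duty"})
t1, t2 = db.begin(), db.begin()     # concurrent: both see the same snapshot
go_on_reserve(t1, "Smith")
go_on_reserve(t2, "Jones")
t1.commit()
t2.commit()                          # no write-write conflict, so both commit
final_state = dict(db.data)
print(final_state)
```

Each transaction checks the rule against its own snapshot and writes a different row, so the first-committer-wins check never fires and both doctors end up on reserve. Running the same two calls one after the other would leave one doctor on duty.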
Vendor advice
Oracle: “Database inconsistencies can result unless such application-level consistency checks are coded with this in mind, even when using serializable transactions.”
“PostgreSQL's Serializable mode does not guarantee serializable execution...”
Previous work
H. Berenson, P. Bernstein, J. Gray, J. Melton, E. O'Neil, P. O'Neil in SIGMOD1995: “A Critique of ANSI SQL Isolation Levels”
A. Bernstein, P. Lewis and S. Lu in ICDE2000: “Semantic Conditions for Correctness at Different Isolation Levels”
A. Fekete, D. Liarokapis, E. O'Neil, P. O'Neil, D. Shasha in TODS2005: “Making Snapshot Isolation Serializable”
–Analyze the graph of transaction conflicts
–Conditions on the graph for application to be serializable at SI
–If a dangerous structure is found, modify the application
S. Jorwekar, A. Fekete, K. Ramamritham, S. Sudarshan in VLDB2007: “Automating the Detection of Snapshot Isolation Anomalies”
M. Alomari, M. Cahill, A. Fekete, U. Röhm in ICDE2008: “The Cost of Serializability on Platforms That Use Snapshot Isolation”
Static analysis of SI anomalies
Build static dependency graph, check for dangerous structures:
[Figure: a cycle through a pivot that has both an incoming conflict and an outgoing conflict]
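The check for dangerous structures can be sketched as a small graph search. This is a simplified illustration rather than the analysis from the cited papers: the edge format `(src, dst, vulnerable)`, where `vulnerable` marks a conflict edge of the kind shown in the figure, and the function names `reachable` and `find_pivots` are assumptions.

```python
from collections import defaultdict


def reachable(graph, start, goal):
    """Return True if `goal` can be reached from `start` over zero or more edges."""
    stack, seen = [start], set()
    while stack:
        node = stack.pop()
        if node == goal:
            return True
        if node in seen:
            continue
        seen.add(node)
        stack.extend(graph[node])
    return False


def find_pivots(edges):
    """Find nodes with a vulnerable edge in and out that lie on a cycle.

    edges: iterable of (src, dst, vulnerable) tuples (hypothetical format).
    """
    graph = defaultdict(set)
    vuln_in, vuln_out = defaultdict(set), defaultdict(set)
    for src, dst, vulnerable in edges:
        graph[src].add(dst)
        if vulnerable:
            vuln_out[src].add(dst)
            vuln_in[dst].add(src)

    pivots = set()
    for node in list(graph):
        for a in vuln_in[node]:          # a -> node is the incoming conflict
            for b in vuln_out[node]:     # node -> b is the outgoing conflict
                if reachable(graph, b, a):   # the path b ... a closes the cycle
                    pivots.add(node)
    return pivots


# Two programs that each read what the other writes (write skew).
skew_pivots = find_pivots([("P1", "P2", True), ("P2", "P1", True)])
# A chain with consecutive vulnerable edges but no cycle.
chain_pivots = find_pivots([("A", "B", True), ("B", "C", True)])
print(skew_pivots, chain_pivots)
```

In the write-skew graph both programs are pivots, while the acyclic chain yields none, which is the distinction the static analysis relies on.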
Limitations of previous work
Determining the conflict graph is non-trivial
The analysis must be repeated for every change to the application
Ad hoc queries are not supported
Difficult to automate: reasoning is required to avoid false positives
Our approach
New algorithm for serializable isolation
–Online, dynamic
–Modifications to standard Snapshot Isolation
Core Idea:
–Detect read-write conflicts at runtime
–Abort transactions with consecutive rw-edges
–Don’t do full cycle detection
Challenges
During runtime, rw-conflicts can interleave arbitrarily
Have to consider begin and commit timestamps:
–which snapshot is a transaction reading?
–can conflict with committed transactions
Want to use existing engines as much as possible
Low runtime overhead
But minimize unnecessary aborts
SI anomalies: a simple case
[Figure: pivot commits last]
The algorithm in a nutshell
Add two flags to each transaction (in & out)
Set T0.out if rw-conflict T0 → T1
Set T0.in if rw-conflict TN → T0
Abort T0 (the pivot) if both T0.in and T0.out are set
–If T0 has already committed, abort the conflicting transaction
In the following, we illustrate the main cases; for full details, see the paper
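The flag rules on this slide can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the implementation from the paper: the class `Txn`, its fields `in_flag`, `out_flag`, `committed` and `aborted`, and the function `mark_rw_conflict` are hypothetical names, and the sketch ignores timestamps and how conflicts are discovered.

```python
class Txn:
    """Hypothetical per-transaction state; only the two flags come from the slide."""
    def __init__(self, name):
        self.name = name
        self.in_flag = False    # some other transaction has an rw-conflict into this one
        self.out_flag = False   # this transaction has an rw-conflict out to another one
        self.committed = False
        self.aborted = False


def mark_rw_conflict(reader, writer):
    """Record the rw-conflict reader -> writer.

    Returns the transaction chosen for abort, or None if no pivot was found.
    """
    reader.out_flag = True
    writer.in_flag = True
    for pivot, other in ((reader, writer), (writer, reader)):
        if pivot.in_flag and pivot.out_flag:
            # A committed pivot can no longer be aborted, so abort the other party.
            victim = other if pivot.committed else pivot
            victim.aborted = True
            return victim
    return None


tn, t0, t1 = Txn("TN"), Txn("T0"), Txn("T1")
first = mark_rw_conflict(tn, t0)    # TN -> T0: only T0's in flag is set
second = mark_rw_conflict(t0, t1)   # T0 -> T1: T0 now has both flags
print(first, second.name)
```

No cycle search happens here: the decision uses only the two booleans on each transaction, which is what keeps the runtime check cheap.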
Detection: write before read
[Figure: T0 reads the old version of y; T1.in = true, T0.out = true]
Detection: read before write
How can we detect this?
[Figure: the reader takes an SIREAD lock on x; a later write lock on x sets TN.out = true and T0.in = true]
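One way to picture the read-before-write case is a lock table in which reads leave a non-blocking SIREAD marker that later writers inspect. The sketch below is an assumption-laden illustration, not Berkeley DB's lock manager: `LockTable`, `Txn` and their methods are hypothetical, and the sketch does not check whether the two transactions actually overlap in time.

```python
from collections import defaultdict


class Txn:
    """Hypothetical transaction carrying the two conflict flags."""
    def __init__(self, name):
        self.name = name
        self.in_flag = False
        self.out_flag = False


class LockTable:
    """Hypothetical lock table that tracks SIREAD markers per item."""
    def __init__(self):
        self.siread = defaultdict(set)   # item -> transactions that read it

    def read(self, txn, item):
        # SIREAD never blocks; it only records that the read happened.
        self.siread[item].add(txn)

    def write(self, txn, item):
        # Taking the write lock reveals earlier readers of the same item:
        # each one has an rw-conflict reader -> txn.
        for reader in self.siread[item]:
            if reader is not txn:
                reader.out_flag = True
                txn.in_flag = True


locks = LockTable()
tn, t0 = Txn("TN"), Txn("T0")
locks.read(tn, "x")     # TN reads x and leaves an SIREAD marker
locks.write(t0, "x")    # T0 writes x and finds TN's marker
print(tn.out_flag, t0.in_flag)
```

Because the marker does not block, readers and writers still never wait for each other; the marker only feeds the flag check.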
Main Disadvantage: False positives
[Figure: no cycle, yet an unnecessary abort]
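A false positive can be shown on a three-transaction history: the pivot has both an incoming and an outgoing rw-edge, yet the conflict graph has no cycle. The helper names `flagged_pivots` and `has_cycle` and the edge-list format are assumptions for this sketch.

```python
def flagged_pivots(rw_edges):
    """Transactions with both an incoming and an outgoing rw-edge."""
    has_in = {dst for _, dst in rw_edges}
    has_out = {src for src, _ in rw_edges}
    return has_in & has_out


def has_cycle(edges):
    """Depth-first search for a cycle in a directed graph given as (src, dst) pairs."""
    graph = {}
    for src, dst in edges:
        graph.setdefault(src, []).append(dst)
        graph.setdefault(dst, [])
    WHITE, GREY, BLACK = 0, 1, 2
    color = {node: WHITE for node in graph}

    def visit(node):
        color[node] = GREY
        for nxt in graph[node]:
            if color[nxt] == GREY:
                return True                  # back edge: cycle found
            if color[nxt] == WHITE and visit(nxt):
                return True
        color[node] = BLACK
        return False

    return any(color[node] == WHITE and visit(node) for node in graph)


rw_edges = [("TN", "T0"), ("T0", "T1")]
flagged = flagged_pivots(rw_edges)   # T0 has both flags set
cyclic = has_cycle(rw_edges)         # but nothing leads from T1 back to TN
print(flagged, cyclic)
```

Here T0 would be aborted even though the execution is serializable in the order TN, T0, T1. That is the price of skipping full cycle detection.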
Prototype: Berkeley DB
Implemented in Oracle Berkeley DB
–Open source: extensible
–Already includes SI and 2-phase locking (S2PL)
–Page-level locking: avoids phantoms
Modified 692 lines of code out of 200K
–Most changes related to locking: increased locking code by 10%
Experimental setup
Question: what are the costs and benefits of Serializable SI?
Comparing
–standard SI
–serializable SI (SSI)
–serializable isolation with two-phase locking (S2PL)
SmallBank benchmark [ICDE2008]
–Familiar banking-style transactions (balance, deposit, transfer, etc.)
–Includes a write skew by design
–Update-heavy
Benchmark run on a commodity PC running Linux
Experimental scenarios
Scenario 1: short transactions
–medium/high contention (1% probability of collisions)
–CPU bound (no waits for I/O)
Scenario 2: long transactions
–medium/high contention
–I/O bound (flushing the log)
Scenario 3: low contention
–low probability of collisions (0.1%)
–I/O bound
Graphs show avg of 5 runs & 95% confidence intervals
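For readers who want to reproduce the "avg of 5 runs & 95% confidence intervals" style of reporting, one common approach is a Student's t interval. The slides do not say how the intervals were computed, so the method is an assumption, and the throughput numbers below are made up for illustration.

```python
import math
import statistics

# Two-sided 95% critical value of Student's t for 4 degrees of freedom (5 runs).
T_CRIT_95 = {4: 2.776}


def mean_ci95(samples):
    """Return (mean, half_width) of a 95% t-based confidence interval."""
    n = len(samples)
    mean = statistics.mean(samples)
    std_err = statistics.stdev(samples) / math.sqrt(n)   # sample std. deviation
    return mean, T_CRIT_95[n - 1] * std_err


# Hypothetical throughput figures (transactions per second) from five runs.
runs = [100, 102, 98, 101, 99]
mean, half_width = mean_ci95(runs)
print(f"{mean:.1f} +/- {half_width:.2f}")
```

With only five runs the critical value is well above the 1.96 used for large samples, so the intervals are correspondingly wider.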
Scenario 1 (short txns): Throughput
But SI is NOT serializable!
Scenario 1: abort rates at MPL 20
Scenario 2 (long txns): Throughput
Scenario 2: abort rates at MPL 20
Scenario 3 (low cont.): Throughput
Conclusions
New algorithm for serializable isolation
–Online, dynamic, and general solution
–Modification to standard Snapshot Isolation
–Keeps the features that make SI attractive: readers don’t block writers, much better scalability than S2PL
Feasible to add to a Snapshot Isolation DBMS with minor changes
Ongoing work
Further reduce the runtime overhead
–Fewer false positives
Applying the algorithm to other engines
–Row-level versioning, dealing with phantoms