1 Automating the Detection of Snapshot Isolation Anomalies Sudhir Jorwekar (IIT Bombay) Alan Fekete (Univ. Sydney) Krithi Ramamritham (IIT Bombay) S. Sudarshan.

1 Automating the Detection of Snapshot Isolation Anomalies Sudhir Jorwekar (IIT Bombay) Alan Fekete (Univ. Sydney) Krithi Ramamritham (IIT Bombay) S. Sudarshan (IIT Bombay) Presented By: Viraj Churi(10305072) Rupesh Bende(09305921)

2 Motivation To get serializable execution with non-blocking reads. Many industry applications run on systems that use Snapshot Isolation as the isolation level (highest level of consistency).  e.g. Oracle, PostgreSQL, SQL Server etc. Automation of SI anomalies detection in a set of transaction programs. (manual analysis already exists)

3 Contents Introduction Snapshot Isolation Protocol  Examples of SI-Anomalies  Detecting SI-Anomalies  Removing SI-Anomalies Syntactic Analysis Eliminating False Positive Tools used for Automation Conclusion

4 Introduction  Snapshot Isolation(SI) provides significantly improved concurrency over 2PL, allowing reads to be non-blocking  Unfortunately it can lead to non-serializable execution but it is widely used.  SI Anomalies Detection/Correction :  Step 1: Find set of programs possibly leading to SI anomalies  Step 2: Address the problem of False positive(programs wrongly identified as possibly leading to anomalies). NOTE:- Unlike earlier work it is designed to be automated rather than manually carried out.

5 Some important terms to be Understood : Isolation Level: ‘Isolation Level’ is a setting that decides how data which is a part of an ongoing transaction is made visible to other transactions.  High Level of Isolation(serializable):  High level of consistency  Low level of concurrency  Low Level of Isolation:  Low level consistency  High level of concurrency Serializability: A Tx schedule will be called serializable, if it is equivalent to serial serial schedule.

6 Concurrency : Allowing, Interleaving of read/writes by different applications at a time.  Usual Problems Lost update (txn states not present in final state) Inconsistent read (partial effect of txn seen) DBMS must control concurrent execution of transactions, to ensure consistenc  Solution: Concurrency Control mechanism  Lock Based:  Shared Lock(S)  Exclusive Lock(X)  Timestamp based:

7 Two-Phase Locking(2PL) Protocol: This is a protocol which ensures conflict-serializable schedules.  Phase 1: Growing Phase transaction may obtain locks transaction may not release locks  Phase 2: Shrinking Phase transaction may release locks transaction may not obtain locks Two-Phase Locking(2PL) Protocol: This is same as 2PL with some enhancement as “ It holds the exclusive lock until it commits”

8 Multiversion Protocol : With each data item Q, a sequence of versions is associated. Each version Qk contains three data fields:  Content is the value of version Qk.  W-timestamp(Qk) is the timestamp of the transaction that created version Qk.  R-timestamp(Qk) is the largest timestamp of any transaction that successfully read version Qk.  Transaction (Ti) issues a read(Q) or write(Q) operation. Let Qk denote the version of Q and W-timestamp(Qk)<=TS(Ti). 1. If transaction Ti issues a read(Q), then the value returned is the content of version Qk and R-timestamp(Qk) is updated by the largest value of TS(Ti) and current R-timestamp(Qk). 2. If transaction Ti issues write(Q), and if TS(Ti)<R-timestamp(Qk), then the system rolls back transaction Ti. On the other hand, if TS(Ti) = W-timestamp(Qk), the system overwrites the contents of Qk; otherwise it creates a new version of Q.

9 SI is an attractive optimistic concurrency control protocol widely implemented and widely used.  Attractive features: Reads are always non-blocking Consistency problems are not widely seen  Reason:  Data items that are read are also updated  Violation of integrity constraint, so rollback eliminates the anomalies.  Drawback: Vulnerable to anomalies e.g. “write skew”, “read-only Tx” Tx can lead to non-serializable & cause database inconsistent.

10 Snapshot Isolation Protocol SI is an extension of multiversion concurrency control. A transaction T1 executing with Snapshot Isolation  takes snapshot of committed data at start of T1 called start- timestamp  always reads/modifies data in its own snapshot  updates of concurrent transactions are not visible to T1  T1 is allowed to commit only when another Tx t2 running concurrently has not already written the data item that T1 intends to write.  Intuition: snapshot should be consistent, if the database was consistent before. Read doesn’t give current value (instead gives value at it was when transaction started) Not equivalent to a serial execution  In a serial execution, one transaction would see the other

11 Example: Transaction T2 and T3 read data from snapshot of committed data made by Transaction T1. T2 will not be allowed to commit if T3 has already written the data T2 intends to write. (First Committer wins) T1T2T3 W(Y:=0) R(Z:=2) Commit Start R(Y)  0 W(Y:=1) W(Y:=2) W(Z:=4) Commit R(Z)  2 R(Y)  1 Commit-Req Abort Concurrent updates not visible Own updates are visible Not first-committer of X Serialization error, T2 is rolled back

12 Benefits of SI No extra storage for multiple versions. Reading is never blocked, even by concurrent writer. So, throughput is good. Performance similar to Read Committed Better concurrency than serializable isolation level Prevents classical anomalies:  No lost update (because first committer wins)  No inconsistent read (all txn reads see the same set of complete txn)

13 Example of SI Anomalies SI breaks serializability when transactions modify. Among concurrent transaction, neither one sees the effect of the other. Doesn’t always give serializable execution, integrity constraints can be violated. Two common types of anomalies: [Fekete et al. SIGMOD’05]  Write skew anomaly  Read-only Transactional anomaly

14 Anomaly: Write Skew (with Updates) Constraint: X+Y>=0 Initially, X = 100 and Y = 0 T1: withdraw 70 from XT2: withdraw 90 from Y R(X,100) R(Y,0) R(X,100) R(Y,0) W(Y,-90) W(X,30) Commit X + Y = -60 Skewed Write

15 Anomaly: Write Skew (with Inserts) A voucher with unique voucher# is to be created for every bill Programmer codes : m = select max(vno) ; insert new tuple (billno, voucher#=m+1) ; Let max(vno)=10 and new vouchers for bill numbers X and Y are to be created T1: Insert bill no.XT2: Insert bill no.Y R(max(vno),10) Insert(X,11) Insert(Y,11) Commit Duplicate voucher# created Skewed Write

16 Whether a transaction leads to an anomaly can by determined by a serialization graph for an execution. Graph Structure: Nodes: Transaction Edges: Conflicts or Dependencies b/w Transactions. In the graph there will be an edge from Ti to Tj, if Ti is reading a version of and item and Tj produces a later version of the same item. Key Theorem: If graph is acyclic then execution will be serializable & no anomalies can occur. Detecting SI-Anomalies TiTj

17 Application consists of transaction programs  from which different transactions are generated depending on the parameter values (input) the control structures (decision based on value) Transactions might interleave in different ways. Hence, it is infeasible to enumerate every possible execution. Goal is to ensure that every possible execution in given application is serializable (not just a particular execution). Detecting SI-Anomalies(Cont…)

18 Detecting SI-Anomalies: Static Analysis SDG: Static Dependency Graph [Fekete et al. TODS’05] It is drawn for a given collection of application programs A. and denoted by SDG(A). Nodes : Application Programs Edges : For two application programs p1 and p2, an edge from p1 to p2 will be marked vulnerable, if there is some execution of the system in which T1 and T2 be any execution instances of P1and P2 respectively where  There is dependency between T1 and T2  T1 and T2 are concurrent Vulnerable edge is shown by dashed line. P2P1

19 Conflicts under SI Read Dependency (WR) P1→ P2  P1 modifies a data item, which is seen by P2’s read.  P1 must completely precede P2 (because of snapshot read) Write Dependency (WW) P1→ P2  P1 modifies a data item, that is later modified by P2  P1 must completely precede P2 (because of first-committer-wins) Antidependency (RW) P1→ P2  P1 reads a data item, and doesn’t see P2’s modifications  Either P1 precedes P2, or P1 and P2 are concurrent.

20 Static Analysis (contd…) Conditions for Vulnerability  RW conflict from T1 to T2, without WW conflict  T1 and T2 are concurrent. T1 T2 and T1 T2 Pivot (Dangerous Structure)  A transaction program P is a pivot in a SDG, if there is a cycle containing subpath with P Q R S PP3 Pivot RW WW

Static Analysis (contd…) RP Q RP Q If P,Q and R are programs from collection A such that There sre Vulnerable edges from – R to P – P to Q and path from Q to R then P becomes Pivot w.r.t A If Q==R then – Both P and R becomes Pivots. R P Pivot

22 Static Analysis (contd…) Theorem [Fekete TODS’05]  Absence of pivot in the SDG(A)  serializable execution under SI. What if pivot exists in the SDG ?  Modify some application programs without changing business logic.  Change at least one vulnerable edge to be non-vulnerable. Not always possible, also may have different impacts on concurrency.

23 Removing SI-Anomalies Strict 2PL for Pivots  Run the pivot programs with true serializability (using 2PL), rather than using SI.  This doesn’t require any modification in pivot programs, except for configuration information for the session.  Implementation: A version of an item X produced by an SI-Tx T must be protected by an exclusive lock before T commits. The lock is obtained by following the normal locking rules.  Microsoft server 2005 and MySQL use both SI and 2PL  Oracle and PostgreSQL use SI for Isolation level serializability

24 Removing SI-Anomalies(Cont…) Materializing Conflicts:  Programmers can explicitly introduce extra conflicts to transactions, in order to prevent the transactions from running concurrently.  Implementation: One option: make both transactions write the same row of some table (for a given vulnerable edge). This means “First-committer wins” is invoked, so Txs can’t run concurrently and edge becomes non vulnerable.

25 Removing SI-Anomalies(Cont…) Promotion:  In this approach we change the program at the tail end of the edge (the one with the read with read-write conflict). E.g. Read is promoted to write.  Implementation: By inserting an update statement along with select statement. e.g.P1(prior): “ select T.c from T where …” P1(later): “update T set T.c=T.c” “ select T.c from T where …” Similar effect is obtained by replacing “select..” by “select..for update” (which we abbreviate to SFU), used in Oracle. Reads by select statement are treated as updates. – SFU in PostgreSQL is slightly different. Promotion can be applied to Pivot P or predecessor of P when Pred(P) has read-write conflict with Pivot, by promoting read to write. Note: Promotion does not solve the phantom problem

26 Transaction Programs in SQL Transaction Programs (in SQL) These are set of SQL statements including SELECT, INSERT, DELETE, UPDATE. Parameterization is achieved by WHERE clause e.g. WHERE col=:UserInput Identifying Set of Transaction Programs (SQL)  Extracting Tx programs from the source code of the application program.  By logging the SQL statements submitted to the Database.  If the application has control flow the application level Tx is split into multiple straight–line Txs, because different execution paths in the application may generate different set of SQL statements

Transaction Programs in SQL Issue : Control flow in programs Ex if else statements,loops Gives rise to multiple execution paths and may generate different set of SQL statements Solution: Split application program into multiple straight line transaction

Transaction Programs in SQL Program: W; if C then X else Y; Z Either X or Y is executed but not both. Split into two straight-line transactions: if not(C) abort; W;X; Z if C abort; W; Y; Z

29 Mini Banking example Schema  account (accno, balance, acctype)  customer (id, name, address)  owner (id,accno)  txn (txnid, txntype, accno, id, amount, timestamp)  batchaudit (bid, start, end, inamount, outamount) Transactions  Creating new Account (CA1, CA2)  Update Customer Info (UCI)  Deposit (DEP)  Withdraw (W1, W2)  End-of-day Audit (EOD)

Mini Banking example Update, Deposit and Audit does not have any control flows and hence covered with only 1 transaction program each. Create Account is split into 2 programs. CAc1 is the program where customer is already recorded. CAc2 is the one where the customer is not already recorded, and a new customer record is entered in DB. Withdraw is also split into 2 programs. ShW1 is the one where the resultant balance is non- negative. ShW1 is the case where the balance is negative, and hence withdrawal will not be successful.

31 Identifying Dependencies Defining Syntactic read and write sets for each Tx.  rset(P) is the set of columns read by transaction program P  wset(P) is the set of columns written by P.  The entries in these sets are like “tablename.column”  They specify upperbound on data items read or written by transaction For UCI transaction shown below: begin; select * from customer where id=:id; update customer set name=?, address=? where id=:id; commit; rset(UCI) = {customer.id, customer.name, customer.address} wset(UCI) = {customer.name, customer.address}

32 Syntactic Column-based Analysis Column-based Syntactic Dependency Graph (CSDG) Nodes are transaction programs – An edge P i → P j is created when – (rset(Pi) ∩ wset(Pj) != ∅ ) ∨ – (wset(Pi) ∩ rset(Pj) != ∅ ) ∨ – (wset(Pi) ∩ wset(Pj) != ∅ ) – An edge P i → P j is marked as pseudovulnerable (PVUL) whenever rset(P i )  wset(P j )    P is a syntactic pseudopivot, if some cycles of edges in CSDG contains a sub path as: RPQ

33 CSDG for banking example CSDG for Banking Application Syntactic Pseudopivot Pseudovulnerable

34 Syntactic Analysis (contd…) Syntactic Analysis is safe, that is there are no false-negative(a potential anomaly is not identified). That is CSDG has an edge if SDG has and edge Every pivot is a syntactic pseudopivot. [but not vice versa] Theorem 1: If a set of transaction programs contains no syntactic pseudopivots, then every execution under SI will in fact be serializable.

False Positive Syntactical analysis is conservative and may lead to false positives. False positive: it is wrong identification of a threat or dangerous condition that turns out to be harmless. Transaction which is pseudo pivot but not pivot. We present techniques that identify some common false positives using properties of columns and of the programs.

36 False Positives CSDG for Banking Application Syntactic Pseudopivot Pseudovulnerable Many transactions which can never cause any anomaly are detected as syntactic pseudopivot. False Positives

37 Eliminating False Positives 1: Modification Protected Readset (MPR) The Oracle and PostgreSQL, implementation of SI treat a tuple as the lowest level data item. Read and Write sets identify rows instead of specific columns. Snapshot Isolation forces that two Txs are not concurrent if both commit updates on any columns of same row.These columns may not be overlapping. This may give rise to pseudopivots which are not true pivots.

Eliminating False Positives 1: Modification Protected Readset (MPR) Transaction-1: Select balance from account where accno = :accno Update account set balance=balance+1000 where accno= :accno T1 is a syntactic pseudopivot since there is a rw self-loop However, if two instances of above transaction run for the same value of :accno, they cannot successfully run concurrently under SI, since both update same row. Hence this transaction cannot be a real pivot. Idea: ignore rw conflicts if the transaction updates all rows that it reads. – This gives rise to broad definition which covers significant number of false positives.

Eliminating False Positives 1: Modification Protected Readset (MPR) But we have to be careful: Transaction-2: Select * from account where balance = :balval Update account set balance = balance + 1 where balance=:balval and name < “N” Update account set balance = balance - 1 where balance=:balval and name >= “N” The above transaction updates all rows that it reads. However, suppose two copies of the above transaction with :balval as 5 and 6 run concurrently. They write different rows so no ww conflict is present and both can commits. – However, the resultant execution is not equivalent to any serial execution of the two transactions Hence this is not just a pseudopivot, it is a true pivot.

Eliminating False Positives 1: Modification Protected Readset (MPR) Stable Predicate: Predicate C used in P 1 is stable w.r.t P 2, if and only if for every possible schedule H containing execution instances of P 1 and P 2 as T 1 and T 2 resp, the set of rows identified by C in T 1 doesn’t depend on the serialization order. T1 Update account set balance=balance+1000 where accno==5 T2 Update account set balance=balance+1000 where accno==5 Here accno==5 of T1 is stable w.r.t T2.

41 Modification Protected Readset (MPR) MPR-Select A select statement S in a transaction program P 1 is said to be MPR w.r.t transaction program P 2, if either

42 MPR Transactions MPR Transaction:  A transaction program P 1 is said to be MPR w.r.t P 2 if, Every select query as well as every sub-query of an insert, delete or update in P 1 is an MPR-Select w.r.t P 2. WHERE clause predicates of every update/delete statement in P 1 are stable w.r.t. P 2. Theorem 2: If a transaction program P 1 is MPR w.r.t P 2, and if the DBMS uses row-level granularity for the first-committer-wins check, the edge from P 1 to P 2 can not be vulnerable.

MPR Transactions Suppose P1 is MPR w.r.t. P2, T1 arises from executing P1, T2 arises from P2 and there is some read-write dependency from T1 to T2. T2 cannot affect a predicate based on which T1 select rows, so there is no predicate-read-to-write dependency. Thus the dependency must be data-item-read-to-write, but when T1 reads a row and T2 updates the row then T1 and T2 both modify that row, and so the two cannot run concurrently to commitment. Hence the edge T1 to T2 cannot be vulnerable. Removal of such edges may lead to removal of falsely identified pseudopivots which may not give rise to anomaly.

44 MPR Analysis MPR Analysis: We say that a transaction is found to be a false positive using MPR analysis if  It is detected as a syntactic pseudopivot, and  After eliminating vulnerable edges using theorem-2, the transaction is found not be a pivot MPR analysis depends on understanding predicates - Idea: create a simpler syntactic definition that does not need to understand predicates Insert-Delete Stable Table: A table t is said to be insert-delete stable w.r.t Tx program P, if P doesn’t contain any insert or delete statement which operates on table t.

45 MPR Analysis(Cont…) Syntactically Stable Column: Column c of table t, denoted by t.c is said to be syntactically stable w.r.t. Tx program P if t.c  wset(P) Syntactically Stable Predicate: Consider a predicate C and Tx program P. If for every tablename.column used in C is stable w.r.t. P and every table on which C operates is insert-delete stable w.r.t P, then C is syntactically stable w.r.t. P. Syntactically MPR-select: Same as MPR-select, except 1. “WHERE clause predicate C must be syntactically stable w.r.t. Transaction program P2.” 2. The where clause predicate C used in select statement is of the form (D and D’) where D is the predicate used by update/delete statement

46 MPR Analysis(Cont…) Example: Update customer transaction(UCI): rset and wset of UCI overlap with each other and hence there is pseudovulnerable self-loop from UCI to itself. Theorem 3: If S is a select statement in a transaction program P 1 such that S is syntactically MPR w.r.t. Tx program P 2 then S is MPR w.r.t. P 2

MPR Analysis(Cont…) Syntactically MPR-Select:. A select statement S in transaction program P1 is said to be syntactically MPR w.r.t. transaction program P2, if either rset(S) ∩ wset(P2) = ∅ or all of following conditions are true The WHERE clause predicate C used in S is syntactically stable w.r.t. P2. P1 contains a statement M, such that – M is an update or delete statement 3 – The WHERE clause predicate D used by M to identify rows to be modified is such that C = (D and D′) for some D′, and D must be syntactically stable w.r.t. P2. – Whenever the program executes S, it either also executes M or aborts. From above definition and using Theorem-2 and Theorem-3 it is clear that there cannot be vulnerable edge from UCI to itself.

48 Eliminating False Positives 2: New Identifier Generation Test begin; select max(accno)+1 as m from account; insert into account(accno, balance, type) values (:m, 0, :type); commit; New Identifier Generation Analysis:  for assigning new primary key (numeric)  if two transactions read same max value and create same identifier, SI will not prevent concurrent execution but primary key or referential constraint will!  Explicitly check Select-max conflict, outside snapshot rset(CA1) = {account.accno} wset(CA1) = {account.*}

49 Eliminating False Positives 3: Existence Check Before Insert begin; select accno as found from account where accno=:m; if(found==null) insert into account values (:m, 0, :type); else print ‘Error: Requested account number is already in use’; endif commit; rset(CA1) = {account.accno}, wset(CA1) = {account.*} Select with given PK... if not found (Insert values with same PK) Select using primary key can not conflict with Insert of other transaction having same pattern.

50 After Eliminating False Positives Remaining Syntactic Pseudopivot Eliminated False Positives: UCI: MPR DEP: MPR CA1 & CA2: NIGA / ECIA

51 Tool for Analyzing an application The automated tool has the following flow of activities: 1. Find the set of transaction programs. 2. Extract the syntactic read and write sets. 3. Create CSDG using conservative syntactic analysis and detect syntactic pseudopivots. 4. Eliminate/Reduce false positives. 5. Select appropriate techniques to avoid anomalies (currently done manually). After using the techniques to avoid anomalies, we can rerun the analysis to check whether they worked.

52 Experimental Results TPC-C Mini Bank Acad.Finance Distinct Transactions772634 Syntactic Pseudopivots detected 472534 EFP1: MPR Detected32114 EFP2: New Identifier Generation Protection detected 0233 EFP3: Existence check before Insert protection 0020 Remaining Potential pivots 03928 Verified true pivots0322

Implementation issue in Avoiding Anomalies It may not suffice use the promotion technique in some transactions where the conflict is between a predicate read and a write,because it does not prevent phantoms. ABCD E Two Choices Exist. For promotion.In large set of programs we need to minimize number of progrmas that need to be promoted.

54 Conclusion Theory of Syntactic Analysis to obtain a superset of transactions that may cause anomalies. Studied some general patterns of false positives and proposed sufficient conditions for identifying such transactions. Developed a tool that can automate the testing of database applications for safety against SI anomalies  identified some genuine problems in production code.  Implementation issues discussed in paper.

55 References 1. “Automating the Detection of Snapshot Isolation Anomalies” Sudhir Jorwekar, Alan Fekete, Krithi Ramamritham, S. Sudarshan VLDB 2007: 1263-1274 2. “Making snapshot isolation serializable” Alan Fekete, Dimitrios Liarokapis, Elizabeth O'Neil, Patrick O'Neil and Dennis Shasha ACM TODS, 30(2) June 2005 3. “ Allocating Isolation Levels to Transactions ”, Alan Fekete, PODS 2005

56 Thank You!

1 Automating the Detection of Snapshot Isolation Anomalies Sudhir Jorwekar (IIT Bombay) Alan Fekete (Univ. Sydney) Krithi Ramamritham (IIT Bombay) S. Sudarshan.

Similar presentations

Presentation on theme: "1 Automating the Detection of Snapshot Isolation Anomalies Sudhir Jorwekar (IIT Bombay) Alan Fekete (Univ. Sydney) Krithi Ramamritham (IIT Bombay) S. Sudarshan."— Presentation transcript:

Similar presentations

About project

Feedback

Log in

Auth with social network:

1 Automating the Detection of Snapshot Isolation Anomalies Sudhir Jorwekar (IIT Bombay) Alan Fekete (Univ. Sydney) Krithi Ramamritham (IIT Bombay) S. Sudarshan.

Similar presentations

Presentation on theme: "1 Automating the Detection of Snapshot Isolation Anomalies Sudhir Jorwekar (IIT Bombay) Alan Fekete (Univ. Sydney) Krithi Ramamritham (IIT Bombay) S. Sudarshan."— Presentation transcript:

Similar presentations

About project

Feedback