Distributed Systems Lecture 12 Concurrency and replication control 1
Previous lecture Internet Routing algorithms 2
Transactions
Banking transaction for a customer (e.g., at an ATM or browser)
– Transfer $100 from savings to checking account;
– Transfer $200 from money-market to checking account;
– Withdraw $400 from checking account.
Transaction (invoked at client): /* Every step is an RPC */
1. savings.withdraw(100) /* includes verification */
2. checking.deposit(100) /* depends on success of 1 */
3. mnymkt.withdraw(200) /* includes verification */
4. checking.deposit(200) /* depends on success of 3 */
5. checking.withdraw(400) /* includes verification */
6. dispense(400)
7. commit
[Figure: the client issues the transaction to the server] 3
Bank server: coordinator interface
All methods are RPCs from a client to the server. Transaction calls are made at a client and return values from the server:
openTransaction() -> trans
  Starts a new transaction and delivers a unique transaction identifier (TID) trans. This TID is used in the other operations of the transaction.
closeTransaction(trans) -> (commit, abort)
  Ends a transaction: a commit return value indicates that the transaction has committed; an abort return value indicates that it has aborted.
abortTransaction(trans)
  Aborts the transaction.
Transactions can be implemented using RPCs/RMIs! 4
Bank server: account, branch interfaces
Operations of the Account interface
deposit(amount)
  Deposit amount in the account
withdraw(amount)
  Withdraw amount from the account
getBalance() -> amount
  Return the balance of the account
setBalance(amount)
  Set the balance of the account to amount
Operations of the Branch interface
create(name) -> account
  Create a new account with a given name
lookup(name) -> account
  Return a reference to the account with the given name
branchTotal() -> amount
  Return the total of all the balances at the branch 5
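The coordinator and account operations above can be sketched in a single process. This is only an illustration under stated assumptions: there are no RPCs and no concurrency control, accounts are addressed by name rather than as separate objects, and the class name Coordinator, the snake_case method names, and the starting balances are all hypothetical. Tentative updates are buffered per TID and applied only when the transaction closes, which gives the all-or-nothing behaviour.

```python
import itertools


class Coordinator:
    """Toy single-process stand-in for the coordinator interface (hypothetical)."""

    def __init__(self, balances):
        self.balances = dict(balances)   # committed state
        self.tentative = {}              # TID -> {account: tentative balance}
        self._next_tid = itertools.count(1)

    def open_transaction(self):
        tid = next(self._next_tid)
        self.tentative[tid] = {}
        return tid

    def get_balance(self, tid, account):
        # A transaction sees its own tentative writes first.
        return self.tentative[tid].get(account, self.balances[account])

    def deposit(self, tid, account, amount):
        self.tentative[tid][account] = self.get_balance(tid, account) + amount

    def withdraw(self, tid, account, amount):
        balance = self.get_balance(tid, account)
        if amount > balance:             # the "verification" step
            self.abort_transaction(tid)
            raise ValueError("insufficient funds; transaction aborted")
        self.tentative[tid][account] = balance - amount

    def close_transaction(self, tid):
        # Apply all tentative updates at once, then report commit.
        self.balances.update(self.tentative.pop(tid))
        return "commit"

    def abort_transaction(self, tid):
        # Discard tentative updates; committed state is untouched.
        self.tentative.pop(tid, None)


# The banking transaction from the earlier slide, with assumed starting balances.
bank = Coordinator({"savings": 500, "checking": 150, "mnymkt": 300})
t = bank.open_transaction()
bank.withdraw(t, "savings", 100)
bank.deposit(t, "checking", 100)
bank.withdraw(t, "mnymkt", 200)
bank.deposit(t, "checking", 200)
bank.withdraw(t, "checking", 400)
result = bank.close_transaction(t)
```

If any withdraw fails verification, the sketch aborts the whole transaction and none of its earlier steps become visible.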
Transaction definition
Sequence of operations that forms a single step, transforming the server data from one consistent state to another
– All or nothing principle: a transaction either completes successfully, and the effects are recorded in the objects, or it has no effect at all (even with multiple clients, or crashes)
A transaction is indivisible (atomic) from the point of view of other transactions
– No access to intermediate results/states of other transactions
– Free from interference by operations of other transactions
However…
– Transactions could run concurrently, i.e., with multiple clients
– Transactions may be distributed, i.e., across multiple servers 6
Transaction failure modes
Transaction:
1. savings.deduct(100)
2. checking.add(100)
3. mnymkt.deduct(200)
4. checking.add(200)
5. checking.deduct(400)
6. dispense(400)
7. commit
[Figure annotations on the steps above]
– A failure at these points means the customer loses money; we need to restore the old state
– A failure at these points does not cause lost money, but old steps cannot be repeated
– This is the point of no return
– A failure after the commit point (ATM crashes) needs corrective action; no undoing possible. 7
Transactions in traditional databases (ACID) Atomicity: store tentative object updates (for later undo/redo) – many different ways of doing this Durability: store entire results of transactions (all updated objects) to recover from permanent server crashes 8
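Lost update problem
One transaction causes loss of information for another. Consider three account objects a, b, c.
Transaction T1: balance = b.getBalance(); b.setBalance(balance*1.1); a.withdraw(balance*0.1)
Transaction T2: balance = b.getBalance(); b.setBalance(balance*1.1); c.withdraw(balance*0.1)
If both transactions read b before either writes it, T1's or T2's update on the shared object b is lost.
[Figure: resulting balances a: 80, b: 220 (written by both transactions), c: 280] 9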
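A small replay of the two transactions makes the lost update concrete. This is a sketch, not part of the original slides: the starting balances (a = 100, b = 200, c = 300) are an assumption chosen to be consistent with the surviving figure values, and the step names and the run helper are hypothetical.

```python
def run(schedule, balances):
    """Execute (transaction, step) pairs in the given order on a copy of balances."""
    state = dict(balances)
    local = {}  # each transaction's private variable `balance`
    for txn, step in schedule:
        if step == "read_b":
            local[txn] = state["b"]
        elif step == "write_b":
            state["b"] = round(local[txn] * 1.1, 2)
        elif step.startswith("withdraw_"):
            account = step[-1]
            state[account] = round(state[account] - local[txn] * 0.1, 2)
    return state


initial = {"a": 100, "b": 200, "c": 300}  # assumed starting balances

# Both transactions read b before either writes it: one 10% increase is lost.
interleaved = [
    ("T1", "read_b"), ("T2", "read_b"),
    ("T2", "write_b"), ("T1", "write_b"),
    ("T1", "withdraw_a"), ("T2", "withdraw_c"),
]

# T1 runs to completion, then T2: both increases are applied.
serial = [
    ("T1", "read_b"), ("T1", "write_b"), ("T1", "withdraw_a"),
    ("T2", "read_b"), ("T2", "write_b"), ("T2", "withdraw_c"),
]

lost = run(interleaved, initial)
correct = run(serial, initial)
```

Under these assumed balances the interleaved run leaves b at 220, while the serial run leaves it at 242.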
Inconsistent retrieval problem
Partial, incomplete results of one transaction are retrieved by another transaction
Transaction T1 | Transaction T2
a.withdraw(100) |
 | total = a.getBalance()
 | total = total + b.getBalance()
b.deposit(100) |
 | total = total + c.getBalance()
T1's partial result is used by T2, giving the wrong result for T2
[Figure: account balances and the running total during the interleaving; surviving values: 0.00 for a, 300 for b] 10
Serial equivalence
An interleaving of the operations of 2 or more transactions is said to be serially equivalent if the combined effect is the same as if these transactions had been performed sequentially (in some order)
Transaction T1 | Transaction T2
balance = b.getBalance() |
b.setBalance(balance*1.1) |
 | balance = b.getBalance()
 | b.setBalance(balance*1.1)
a.withdraw(balance*0.1) |
 | c.withdraw(balance*0.1)
== T1 (complete) followed by T2 (complete)
[Figure: resulting balances b: 242, c: 278] 11
Checking serial equivalence: conflicting operations
The effect of an operation refers to
– The value of an object set by a write operation
– The result returned by a read operation
Two operations are said to be conflicting if their combined effect depends on the order in which they are executed, e.g., read-write, write-read, write-write (all on the same variable). NOT read-read, NOT on different variables. In other words
– They are by different transactions
– They are on the same object, and
– At least one of them is a write
Two interleavings of a set of transactions are equivalent iff
– They involve the same actions of the same transactions, and
– Every pair of conflicting actions is ordered the same way
An interleaving is serially equivalent if it is equivalent to some serial execution: start from the original operation sequence and swap the order of non-conflicting operations to obtain a sequence in which one transaction finishes completely before the second transaction starts
Why is the above result important?
– Serial equivalence is the basis for concurrency control protocols for transactions 12
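The conflict rules can be turned into a direct check for the two-transaction case. The sketch below is an illustration only: the (transaction, kind, object) tuple encoding and the function names are assumptions, and withdraw is modelled as a single write. With more than two transactions, a cycle check over the induced ordering would be needed instead of the pairwise test used here.

```python
def conflicts(op1, op2):
    """Different transactions, same object, at least one write."""
    (t1, kind1, obj1), (t2, kind2, obj2) = op1, op2
    return t1 != t2 and obj1 == obj2 and "write" in (kind1, kind2)


def serially_equivalent(schedule):
    """Two-transaction check: all conflicting pairs must be ordered the same way."""
    orders = set()
    for i, first in enumerate(schedule):
        for second in schedule[i + 1:]:
            if conflicts(first, second):
                orders.add((first[0], second[0]))
    # Seeing both (T1, T2) and (T2, T1) rules out every serial order.
    return not any((b, a) in orders for (a, b) in orders)


# Lost-update interleaving: both read b before either writes it.
lost_update = [
    ("T1", "read", "b"), ("T2", "read", "b"),
    ("T2", "write", "b"), ("T1", "write", "b"),
    ("T1", "write", "a"), ("T2", "write", "c"),
]

# Interleaving in which T1 finishes with b before T2 touches it.
good = [
    ("T1", "read", "b"), ("T1", "write", "b"),
    ("T2", "read", "b"), ("T2", "write", "b"),
    ("T1", "write", "a"), ("T2", "write", "c"),
]
```

In the second schedule the operations on a and c do not conflict with anything, so they can be swapped freely to reach "T1 then T2".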
Read and write conflict rules
Operations of different transactions | Conflict | Reason
read, read | No | Because the effect of a pair of read operations does not depend on the order in which they are executed
read, write | Yes | Because the effect of a read and a write operation depends on the order of their execution
write, write | Yes | Because the effect of a pair of write operations depends on the order of their execution
13
How can we prevent isolation from being violated? Concurrent operations must be consistent: – If transaction T has executed a read operation on object A, a concurrent transaction U must not write to A until T commits or aborts – If T has executed a write operation on object A, a concurrent U must not read or write to A until T commits or aborts. How to implement this? – Locks Implementing concurrent transactions 14
Example: concurrent transactions (exclusive locks)
Transaction T1 | Locks (T1) | Transaction T2 | Locks (T2)
OpenTransaction() | | |
balance = b.getBalance() | Lock B | |
 | | OpenTransaction() |
 | | balance = b.getBalance() | WAIT on B …
b.setBalance(balance*1.1) | | |
a.withdraw(balance*0.1) | Lock A | |
CloseTransaction() | UnLock B, UnLock A | |
 | | | … Lock B
 | | b.setBalance(balance*1.1) |
 | | c.withdraw(balance*0.1) | Lock C
 | | CloseTransaction() | UnLock B, UnLock C
15
Basic locking
Transaction managers (on server side) set locks on objects they need.
– A concurrent transaction cannot access locked objects
Two phase locking:
– In the first (growing) phase of the transaction, new locks are only acquired, and in the second (shrinking) phase, locks are only released
– A transaction is not allowed to acquire any new locks once it has released any one lock
Strict two phase locking:
– Locking on an object is performed only before the first request to read/write that object is about to be applied.
– Unlocking is performed by the commit/abort operations of the transaction coordinator. To prevent dirty reads and premature writes, a transaction waits for another to commit/abort
However, use of separate read and write locks leads to more concurrency than a single exclusive lock 16
2P Locking: non-exclusive lock (per object)
Non-exclusive lock compatibility
Lock already set | Lock requested: read | Lock requested: write
none | OK | OK
read | OK | WAIT
write | WAIT | WAIT
A read lock is promoted to a write lock when the transaction needs write access to the same object
A read lock shared with other transactions' read lock(s) cannot be promoted. The transaction waits for the other read locks to be released
Cannot demote a write lock to a read lock during a transaction: this violates the 2P principle 17
Locking procedure in strict-2P locking
When an operation accesses an object:
– If you can, promote a lock (nothing -> read -> write)
– Do not promote the lock if it would result in a conflict with another transaction's already-existing lock; wait until all shared locks are released, then lock & proceed
When a transaction commits or aborts:
– Release all locks that were set by the transaction 18
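The compatibility rules and the promotion procedure can be sketched as a per-object lock table. This is a non-blocking illustration, not a production lock manager: request returns "OK" or "WAIT" instead of suspending the caller, and the class name ObjectLock and its method names are hypothetical.

```python
class ObjectLock:
    """Read/write lock state for one object (hypothetical, non-blocking sketch)."""

    def __init__(self):
        self.readers = set()   # transactions holding a read lock
        self.writer = None     # transaction holding the write lock, if any

    def request(self, txn, mode):
        # A write lock held by another transaction blocks reads and writes.
        if self.writer is not None and self.writer != txn:
            return "WAIT"
        if mode == "read":
            if self.writer != txn:       # a writer already has read access
                self.readers.add(txn)
            return "OK"
        # mode == "write": a read lock shared with others cannot be promoted.
        if self.readers - {txn}:
            return "WAIT"
        self.readers.discard(txn)        # promotion: read -> write
        self.writer = txn
        return "OK"

    def release_all(self, txn):
        # Strict 2PL: called only when txn commits or aborts.
        self.readers.discard(txn)
        if self.writer == txn:
            self.writer = None


lock_b = ObjectLock()
steps = [
    lock_b.request("T1", "read"),    # T1 read-locks B
    lock_b.request("T2", "read"),    # T2 shares the read lock
    lock_b.request("T1", "write"),   # promotion must wait for T2
]
lock_b.release_all("T2")             # T2 commits
steps.append(lock_b.request("T1", "write"))  # promotion now succeeds
```

There is deliberately no demote operation, matching the rule that a write lock is never downgraded during a transaction.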
Example: concurrent transactions (non-exclusive locks)
Transaction T1: OpenTransaction(); balance = b.getBalance()
Transaction T2: OpenTransaction(); balance = b.getBalance()
Then: b.setBalance(balance*1.1); Commit
Lock annotations, in order:
– R-Lock B …
– Cannot promote lock on B, wait
– Promote lock on B (once the other transaction's read lock has been released)
19
Deadlocks
Necessary conditions for deadlocks
– Non-shareable resources (exclusive lock modes)
– No preemption on locks
– Hold & Wait or Circular Wait
[Figure: two wait-for graphs with "Held by" and "Wait for" edges. Hold & Wait: transactions T, U and objects A, B. Circular Wait: transactions T, U, V, W, ... and objects A, B] 20
Naïve Deadlock Resolution Using Timeout
T: Operations | T: Locks | U: Operations | U: Locks
a.deposit(100); | write lock A | |
 | | b.deposit(200) | write lock B
b.withdraw(100) | waits for U's lock on B | |
 | | a.withdraw(200); | waits for T's lock on A
 | (timeout elapses) | |
 | T's lock on A becomes vulnerable, unlock A, abort T | |
 | | a.withdraw(200); | write locks A
 | | | unlock A, B
Disadvantages? 21
Strategies to fight deadlock
– Lock timeout (costly and open to false positives)
– Deadlock Prevention: violate one of the necessary conditions for deadlock (from 2 slides ago), e.g., lock all objects before the transaction starts, aborting the entire transaction if any lock fails
– Deadlock Avoidance: have transactions declare the max resources they will request, but allow them to lock at any time (Banker's algorithm)
– Deadlock Detection: detect cycles in the wait-for graph, and then abort one or more of the transactions in the cycle 22
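Deadlock detection reduces to finding a cycle in the wait-for graph. The sketch below uses a depth-first search; the graph encoding (a dict mapping each waiting transaction to the transactions whose locks it waits for) and the function name find_cycle are assumptions made for illustration.

```python
def find_cycle(wait_for):
    """Return a list of transactions forming a cycle in the wait-for graph, or None."""
    visiting = []   # current depth-first path
    done = set()    # transactions already shown to lead to no cycle

    def visit(txn):
        if txn in done:
            return None
        if txn in visiting:
            # Reached a transaction already on the path: the tail is a cycle.
            return visiting[visiting.index(txn):]
        visiting.append(txn)
        for holder in wait_for.get(txn, ()):
            cycle = visit(holder)
            if cycle:
                return cycle
        visiting.pop()
        done.add(txn)
        return None

    for txn in list(wait_for):
        cycle = visit(txn)
        if cycle:
            return cycle
    return None


# T waits for U's lock on B, U waits for T's lock on A.
deadlocked = find_cycle({"T": ["U"], "U": ["T"]})

# A chain of waits with no cycle is not a deadlock.
no_deadlock = find_cycle({"T": ["U"], "U": ["V"], "V": []})
```

A detector would then pick one transaction from the returned cycle to abort; which victim to choose is a policy decision the sketch leaves open.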
How about handling multiple instances of same object? So far – Operations between multiple clients and one server Concurrency control What if object is replicated at multiple servers? Replication = multiple identical copies of same object/data 23
Why use replication?
Enhances a service (object/data/service)
– Increased availability of service: when servers fail or when the network is partitioned, the service is still available on at least one server
– Fault tolerance: under the fail-stop model, if up to f of f+1 servers crash, at least one is still alive
– Load balancing: one approach is to assign multiple server IPs to the same name in DNS, which returns answers/IPs round-robin
P: probability that one server fails; 1 – P = availability of service. E.g., P = 5% => service is available 95% of the time.
P^n: probability that all n servers fail (assuming independent failures); 1 – P^n = availability of replicated service. E.g., P = 5%, n = 3 => service is available 99.9875% of the time 24
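The availability arithmetic can be checked in a few lines. The sketch assumes that replicas fail independently, which the formula 1 – P^n requires; the function name availability is hypothetical.

```python
def availability(p_fail, n):
    """Probability that at least one of n independently failing replicas is up."""
    return 1 - p_fail ** n


single = availability(0.05, 1)       # one server, P = 5%
replicated = availability(0.05, 3)   # three replicas, P = 5%
```

With P = 5%, one server gives 95% availability and three replicas give 1 – 0.05^3 = 99.9875%. Correlated failures, such as a shared power supply or a network partition, would make the real figure lower.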
Replication goals
Replication Transparency
– User/client need not know that multiple physical copies of data exist
Replication Consistency
– Data is consistent on all of the replicas of an object (or is converging towards becoming consistent)
[Figure: clients reach the service through front ends (FE); the service consists of servers, each running a replica manager (RM)] 25
Replication management
Request Communication
– Requests made from a client are handled by the FE. The FE sends requests to one or more RMs
Coordination: The RMs decide
– Whether the request is to be applied
– The order of requests
  FIFO ordering: If a FE issues r then r', then any correct RM handles r and then r'.
  Causal ordering: If the issue of r "happened before" the issue of r', then any correct RM handles r and then r'.
  Total ordering: If a correct RM handles r and then r', then any correct RM handles r and then r'.
Execution:
– The RMs execute the request 26
Agreement: The RMs attempt to reach consensus on the effect of the request – E.g., Two phase commit or Paxos (this is per-object!) – If this succeeds, effect of request is made permanent Response – One or more RMs responds to the FE – The first response to arrive is good enough because all the RMs will return the same answer – Thus each RM is a replicated state machine “Multiple copies of the same State Machine begun in the Start state, and receiving the same Inputs in the same order will arrive at the same State having generated the same Outputs.” [Wikipedia, Schneider 90] Replication management 27
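The replicated state machine property quoted above can be demonstrated with a deterministic toy replica. This is a sketch under assumptions: the class name AccountRM, the request encoding as (operation, amount) pairs, and the rule that an overdrawing withdrawal is ignored are all hypothetical.

```python
class AccountRM:
    """Deterministic replica manager holding a single account balance."""

    def __init__(self):
        self.balance = 0

    def apply(self, request):
        op, amount = request
        if op == "deposit":
            self.balance += amount
        elif op == "withdraw" and amount <= self.balance:
            self.balance -= amount   # overdrawing withdrawals are ignored
        return self.balance          # the output for this input


requests = [("deposit", 100), ("withdraw", 30), ("withdraw", 500), ("deposit", 5)]

# Three replicas start in the same state and receive the same inputs in the same order.
replicas = [AccountRM() for _ in range(3)]
outputs = [[rm.apply(r) for r in requests] for rm in replicas]

# A replica that receives the first two requests in the opposite order diverges.
reordered = [requests[1], requests[0]] + requests[2:]
stray = AccountRM()
stray_outputs = [stray.apply(r) for r in reordered]
```

Because every in-order replica produces the same outputs, the first response to reach the FE is as good as any other; the stray replica shows why the ordering guarantees from the coordination phase matter.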
What the client sees: linearizability
Let the sequence of read and update operations that client i performs in some execution be o_i1, o_i2, …
– "Program order" for the client
A replicated shared object service is linearizable if for any execution (real), there is some interleaving of operations (virtual) issued by all clients that:
– Meets the specification of a single correct copy of objects
– Is consistent with the real times at which each operation occurred during the execution
Main goal: any client will see (at any point of time) a copy of the object that is correct and consistent 28
The real-time requirement of linearizability is hard, if not impossible, to achieve in real systems A less strict criterion is sequential consistency: – A replicated shared object service is sequentially consistent if for any execution (real), there is some interleaving of clients’ operations (virtual) that: Meets the specification of a single correct copy of objects (globally) Is consistent with the program order in which each individual client executes those operations Total order not required across clients at run-time (but the interleaving is a total order of course that is consistent with what the clients saw) Each client’s ops always see the same global order of updates on objects – Though different clients may see a given update at different physical times Linearizability implies sequential consistency. Not vice-versa! Challenge with guaranteeing sequential consistency? – Ensuring that all replicas of an object are consistent Sequential consistency 29
Linearizability vs. sequential consistency
Both care about the illusion of a single copy
– From a client's perspective the system should (almost) always behave as if there is a single copy
Linearizability cares about time
– Bob writes on Facebook at 1:00 PM
– Alice writes on Facebook at 1:15 PM
– Everyone sees the updates in that order
Sequential consistency cares about program order
– It is not necessary that everyone will see the updates in that order
– But everyone will see the same order 30
Passive replication
Request communication – The request is issued to the primary RM and carries a unique request ID
Coordination – Primary takes requests atomically, in order, and checks the ID (resends the stored response if the ID is not new)
Execution – Primary executes & stores the response
Agreement – If update, primary sends updated state/result, req-ID and response to all backup RMs (1-phase commit enough)
Response – Primary sends result to the front end
[Figure: clients and front ends talk to the primary RM, which updates the backup RMs] 31
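The five phases of passive replication can be sketched with one primary and in-process backups. This is an illustration only: there is no network, failure handling, or leader election, the state is a plain key-value map, and the class names Primary and Backup and their methods are hypothetical.

```python
class Backup:
    """Backup RM: stores whatever the primary pushes to it."""

    def __init__(self):
        self.state = {}
        self.responses = {}

    def update(self, state, req_id, response):
        self.state = dict(state)
        self.responses[req_id] = response


class Primary:
    """Primary RM: handles requests one at a time, in arrival order."""

    def __init__(self, backups):
        self.state = {}
        self.responses = {}   # request ID -> stored response
        self.backups = backups

    def handle(self, req_id, key, value):
        # Coordination: a repeated ID gets the stored response, not a re-execution.
        if req_id in self.responses:
            return self.responses[req_id]
        # Execution: apply the update and store the response.
        self.state[key] = value
        response = ("ok", key, value)
        self.responses[req_id] = response
        # Agreement: push state, request ID, and response to every backup.
        for backup in self.backups:
            backup.update(self.state, req_id, response)
        # Response: returned to the front end.
        return response


backups = [Backup(), Backup()]
primary = Primary(backups)
first = primary.handle("req-1", "x", 1)
retry = primary.handle("req-1", "x", 999)   # duplicate ID from a retrying FE
```

Because the backups also hold the response table, a backup promoted to primary could answer a retried request without executing it twice.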
Fault tolerance in passive replication
The system implements linearizability, since the primary sequences operations in order
If the primary fails, a backup becomes primary by leader election, and the replica managers that survive agree on which operations had been performed at the point when the new primary takes over
– This keeps the system linearizable in spite of crashes
Can use view-synchronous group communication among the RM group
– Causal-total order on multicasts and membership updates
However, there is the overhead of election 32
Active replication
Request communication – The request contains a unique ID and is multicast to all RMs by a reliable totally-ordered multicast
Coordination – Group communication ensures that requests are delivered to each RM in the same order (but possibly at different physical times)
Execution – Each replica executes the request. (Correct replicas return the same result since they are running the same program, i.e., they are replicated state machines)
Agreement – No agreement phase is needed, because of the multicast delivery semantics of requests
Response – Each replica sends its response directly to the FE
Processes must be deterministic. In the real world most servers are nondeterministic, but the method is still used for real-time systems
[Figure: clients and front ends multicast to a group of RMs] 33
RMs work as replicated state machines, playing equivalent roles – Each responds to a given series of requests in the same way. If any RM crashes, state is maintained by other correct RMs This system implements sequential consistency – Use FIFO-total ordering in multicasts from FE to RM group However (out of band): If clients or FEs are multi-threaded and communicate with one another while waiting for responses from the service, we may need to incorporate causal-total ordering Fault tolerance in active replication 34
Transactions: one-copy serializability
Strong notion of consistency for transactions
In a non-replicated system, transactions appear to be performed one at a time in some order. This is achieved by ensuring a serially equivalent interleaving of transaction operations
One-copy serializability: The effect of transactions performed by clients on replicated objects should be the same as if they had been performed one at a time on a single set of objects (i.e., 1 replica per object).
– Equivalent to combining serial equivalence + replication transparency/consistency 35
Assume no crashes/failures All client requests are directed to a single primary RM Concurrency control is applied at the primary To commit a transaction, the primary communicates with the backup RMs and replies to the client View synchronous communication gives one-copy serializability Disadvantage – Performance is low since primary RM is bottleneck Replication + concurrency control: primary copy replication 36
Read one write all
An FE (front end) may communicate with any RM.
Every write operation must be performed at all of the RMs
– Each contacted RM sets a write lock on the object.
A read operation can be performed at any single RM
– A contacted RM sets a read lock on the object.
Consider pairs of conflicting operations of different transactions on the same object.
– W-W: Any pair of write operations will require locks at all of the RMs => not allowed
– W-R: A read operation and a write operation will require conflicting locks at some RM => not allowed
One-copy serializability is achieved
Disadvantage?
– Deadlocks
– Failures block the system (e.g., writes) 37
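Read one write all can be sketched with a small lock table per RM. This is a non-blocking illustration under assumptions: a False return stands for "the transaction must wait", locks are never released, and the class name RM and the functions rowa_read and rowa_write are hypothetical.

```python
class RM:
    """Replica manager with per-object read and write locks (hypothetical sketch)."""

    def __init__(self):
        self.read_locks = {}    # object -> set of transactions
        self.write_locks = {}   # object -> transaction

    def lock_read(self, txn, obj):
        if self.write_locks.get(obj, txn) != txn:
            return False        # another transaction holds the write lock
        self.read_locks.setdefault(obj, set()).add(txn)
        return True

    def lock_write(self, txn, obj):
        if self.write_locks.get(obj, txn) != txn:
            return False        # W-W conflict at this RM
        if self.read_locks.get(obj, set()) - {txn}:
            return False        # W-R conflict at this RM
        self.write_locks[obj] = txn
        return True


def rowa_read(txn, obj, rms, index):
    """Read one: lock the object at a single RM chosen by the FE."""
    return rms[index].lock_read(txn, obj)


def rowa_write(txn, obj, rms):
    """Write all: every RM must grant the write lock (stops at the first refusal)."""
    return all(rm.lock_write(txn, obj) for rm in rms)


rms = [RM(), RM()]
t_reads = rowa_read("T", "x", rms, 0)    # T reads x at RM 0
u_reads = rowa_read("U", "x", rms, 1)    # U reads x at RM 1
t_writes = rowa_write("T", "x", rms)     # blocked by U's read lock at RM 1
u_writes = rowa_write("U", "x", rms)     # blocked by T's write lock at RM 0
```

In this run T ends up holding the write lock at RM 0 while waiting on RM 1, and U waits on RM 0: neither write can proceed, which is the deadlock disadvantage named above.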
Summary
Concurrency control
– Transaction control
– Deadlocks
Replication control
– Active and passive replicas 38
Next lecture Gossiping 39