Reliable Distributed Systems Transactions in Large, Complex Systems and Web Services.

Slides:



Advertisements
Similar presentations
Two phase commit. Failures in a distributed system Consistency requires agreement among multiple servers –Is transaction X committed? –Have all servers.
Advertisements

Optimistic Methods for Concurrency Control By : H.T. Kung & John T. Robinson Presenters: Munawer Saeed.
Recovery Amol Deshpande CMSC424.
TRANSACTION PROCESSING SYSTEM ROHIT KHOKHER. TRANSACTION RECOVERY TRANSACTION RECOVERY TRANSACTION STATES SERIALIZABILITY CONFLICT SERIALIZABILITY VIEW.
1 Transactions and Web Services. 2 Web Environment Web Service activities form a unit of work, but ACID properties are not always appropriate since Web.
Lock-Based Concurrency Control
CMPT 401 Summer 2007 Dr. Alexandra Fedorova Lecture X: Transactions.
COS 461 Fall 1997 Transaction Processing u normal systems lose their state when they crash u many applications need better behavior u today’s topic: how.
Quick Review of Apr 29 material
Jan. 2014Dr. Yangjun Chen ACS Database recovery techniques (Ch. 21, 3 rd ed. – Ch. 19, 4 th and 5 th ed. – Ch. 23, 6 th ed.)
EEC 688/788 Secure and Dependable Computing Lecture 12 Wenbing Zhao Department of Electrical and Computer Engineering Cleveland State University
Distributed Systems Fall 2010 Replication Fall 20105DV0203 Outline Group communication Fault-tolerant services –Passive and active replication Highly.
CMPT Dr. Alexandra Fedorova Lecture X: Transactions.
Transaction Processing Lecture ACID 2 phase commit.
Reliable Distributed Systems Fault Tolerance (Recoverability  High Availability)
Transaction Management and Concurrency Control
CS514: Intermediate Course in Operating Systems Professor Ken Birman Ben Atkin: TA Lecture 9: Sept. 21.
Recovery Fall 2006McFadyen Concepts Failures are either: catastrophic to recover one restores the database using a past copy, followed by redoing.
EEC 688/788 Secure and Dependable Computing Lecture 12 Wenbing Zhao Department of Electrical and Computer Engineering Cleveland State University
Sinfonia: A New Paradigm for Building Scalable Distributed Systems Marcos K. Aguilera, Arif Merchant, Mehul Shah, Alistair Veitch, Christonos Karamanolis.
EEC 693/793 Special Topics in Electrical Engineering Secure and Dependable Computing Lecture 12 Wenbing Zhao Department of Electrical and Computer Engineering.
EEC 693/793 Special Topics in Electrical Engineering Secure and Dependable Computing Lecture 12 Wenbing Zhao Department of Electrical and Computer Engineering.
Distributed Systems Fall 2009 Replication Fall 20095DV0203 Outline Group communication Fault-tolerant services –Passive and active replication Highly.
Distributed Systems 2006 Group Membership * *With material adapted from Ken Birman.
©Silberschatz, Korth and Sudarshan19.1Database System Concepts Distributed Transactions Transaction may access data at several sites. Each site has a local.
CS 425 / ECE 428 Distributed Systems Fall 2014 Indranil Gupta (Indy) Lecture 18: Replication Control All slides © IG.
1 ICS 214B: Transaction Processing and Distributed Data Management Distributed Database Systems.
Computer Science Lecture 16, page 1 CS677: Distributed OS Last Class:Consistency Semantics Consistency models –Data-centric consistency models –Client-centric.
TRANSACTION PROCESSING TECHNIQUES BY SON NGUYEN VIJAY RAO.
Transaction. A transaction is an event which occurs on the database. Generally a transaction reads a value from the database or writes a value to the.
Transaction log grows unexpectedly
Analyzing different protocols for E-business 1 Fatma Sayed Gad Elrab Supervisors Prof. Dr. Ezzat abd El Tawab Korany Dr. Saleh Abdel Shachour El Shehaby.
Transactional Web Services, WS-Transaction and WS-Coordination Based on “WS Transaction Specs,” by Laleci, Introducing WS-Transaction Part 1 & 2, by Little.
CS5412: TRANSACTIONS (I) Ken Birman CS5412 Spring 2012 (Cloud Computing: Birman) 1 Lecture XVII.
Highly Available ACID Memory Vijayshankar Raman. Introduction §Why ACID memory? l non-database apps: want updates to critical data to be atomic and persistent.
Transaction Communications Yi Sun. Outline Transaction ACID Property Distributed transaction Two phase commit protocol Nested transaction.
TRANSACTIONS. Objectives Transaction Concept Transaction State Concurrent Executions Serializability Recoverability Implementation of Isolation Transaction.
Transaction Lectured by, Jesmin Akhter, Assistant professor, IIT, JU.
Database System Concepts, 5th Ed. ©Silberschatz, Korth and Sudarshan See for conditions on re-usewww.db-book.com Chapter 15: Transactions.
PAVANI REDDY KATHURI TRANSACTION COMMUNICATION. OUTLINE 0 P ART I : I NTRODUCTION 0 P ART II : C URRENT R ESEARCH 0 P ART III : F UTURE P OTENTIAL 0 R.
CS5412: TRANSACTIONS (I) Ken Birman CS5412 Spring 2015 (Cloud Computing: Birman) 1 Lecture XVI.
Lecture 13 Advanced Transaction Models. 2 Protocols considered so far are suitable for types of transactions that arise in traditional business applications,
XA Transactions.
Transactions and Concurrency Control. Concurrent Accesses to an Object Multiple threads Atomic operations Thread communication Fairness.
Transactions. Transaction: Informal Definition A transaction is a piece of code that accesses a shared database such that each transaction accesses shared.
Chapter 4 Wenbing Zhao Department of Electrical and Computer Engineering Cleveland State University Building Dependable Distributed Systems.
CS514: Intermediate Course in Operating Systems Professor Ken Birman Ben Atkin: TA Lecture 8: Sept. 19.
CSE 486/586, Spring 2012 CSE 486/586 Distributed Systems Replication Steve Ko Computer Sciences and Engineering University at Buffalo.
NOEA/IT - FEN: Databases/Transactions1 Transactions ACID Concurrency Control.
Topics in Distributed Databases Database System Implementation CSE 507 Some slides adapted from Navathe et. Al and Silberchatz et. Al.
Distributed Databases – Advanced Concepts Chapter 25 in Textbook.
Database recovery techniques
Transaction Management
CS5412: Transactions (I) Lecture XVI Ken Birman
CS514: Intermediate Course in Operating Systems
Commit Protocols CS60002: Distributed Systems
Outline Announcements Fault Tolerance.
EEC 688/788 Secure and Dependable Computing
EEC 688/788 Secure and Dependable Computing
Introduction of Week 13 Return assignment 11-1 and 3-1-5
Lecture 21: Replication Control
Distributed Databases Recovery
UNIVERSITAS GUNADARMA
Transactions in Distributed Systems
Chapter 5 Architectural Design.
EEC 688/788 Secure and Dependable Computing
Lecture 21: Replication Control
Distributed Databases
Transaction Communication
Transactions, Properties of Transactions
Presentation transcript:

Reliable Distributed Systems Transactions in Large, Complex Systems and Web Services

Large complex systems They will often have many components Operations may occur over long periods of time Well need to ensure all-or-nothing outcomes but also need to allow high levels of concurrency

Concerns about transactions While running a transaction acquires locks Other transactions will block on these locks hence the longer a transaction runs the more it cuts system-wide concurrency Some subsystems may not employ transactional interfaces Application may be a script, not a single program

Transactions on distributed objects Idea was proposed by Liskovs Argus group Each object translates an abstract set of operations into the concrete operations that implement it Result is that object invocations may nest: Library update operations, do A series of file read and write operations that do A series of accesses to the disk device

Nested transactions Call the traditional style of flat transaction a top level transaction Argus short hand: actions The main program becomes the top level action Within it objects run as nested actions

Arguments for nested transactions It makes sense to treat each object invocation as a small transaction: begin when the invocation is done, and commit or abort when result is returned Can use abort as a tool: try something; if it doesnt work just do an abort to back out of it. Turns out we can easily extend transactional model to accommodate nested transactions Liskov argues that in this approach we have a simple conceptual framework for distributed computing

Nested transactions: picture T 1 : fetch(ken).... set_salary(ken, )... commit open_file... seek... read seek... write lower level operations...

Observations Can number operations using the obvious notation T 1, T Subtransaction commit should make results visible to the parent transaction Subtransaction abort should return to state when subtransaction (not parent) was initiated Data managers maintain a stack of data versions

Stacking rule Abstractly, when subtransaction starts, we push a new copy of each data item on top of the stack for that item When subtransaction aborts we pop the stack When subtransaction commits we pop two items and push top one back on again In practice, can implement this much more efficiently!!!

Data objects viewed as stacks x y z T0T0 T T 1.1 T Transaction T 0 wrote 6 into x Transaction T 1 spawned subtransactions that wrote new values for y and z

Locking rules? When subtransaction requests lock, it should be able to obtain locks held by its parent Subtransaction aborts, locks return to prior state Subtransaction commits, locks retained by parent... Moss has shown that this extended version of 2-phase locking guarantees serializability of nested transactions

Commit issue? Each transaction will have touched some set of data managers Includes those touched by nested sub-actions But not things done by sub-actions that aborted Commit transaction by running 2PC against this set Well discuss this in upcoming lectures but

2-Phase commit: Glimpse Goal is simply to ensure that either All processes do an update, or No process does the update For example, at the end of a transaction we want all processes to commit or all to abort The two phase aspect involves 1. Asking: Can you commit transaction t x ? 2. Then doing Commit or Abort

Experience with model? Some major object oriented distributed projects have successfully used transactions Seems to work only for database style applications (e.g. the separation of data from computation is natural and arises directly in the application) Seems to work only for short-running applications (Will revisit this issue shortly!)

Web Services Supports nested transaction model but many vendors might opt for only flat transactions Also provides a related model called business transactions Again, application accesses multiple objects Again, each access is a transaction But instead of a parent transaction, we use some form of script of actions and compensating actions to take if an action fails

Transactions in Web Services Imagine a travel agency that procures air tickets, hotel stays, and rental cars for traveling customers. And imagine that the agency wants to automate the whole process. Where all partners expose WS interfaces This process can be very lengthy. And typically spans multiple sub-processes, each in a different administrative domain. What to do when say the agency could find air- tickets and hotel accommodation,but no rental car?

Transaction Hierarchy in WS Basic unit is the activity : a computation executed as a set of scoped operations. Top-level process is "Business Activity" May run for a long time, so holding locks on resources until commit is not viable. Have to expose results of uncommitted business activities to concurrently executing activities.

Transaction Hierarchy in WS Small lower-level interactions are called Atomic Transactions Short; executed within limited trust domains. Satisfy ACID properties. Imagine a tree structure here (similar to nested txs)

Fault-tolerance We know how faults are handled in atomic transactions. What about faults in Business Activities? Say Business Transaction B contains atomic transactions A1 and A2, and A1 fails and A2 succeeds – need to undo A2 after it had committed Issue: since we arent using nested transactions, how can we obtain desired all- or-nothing outcome?

Compensating actions Idea is to write a form of script If then Else The compensation might undo some actions much as an abort would, but without the overheads of a full nested transaction model (Model has also been called sagas)

The WS-Coordination Spec. A standard that describes how different Web Services work together reliably. The coordination framework contains the Activation, Registration and Coordination Services…

Some Terminology The Coordination type identifies what kind the activity is (Atomic Transaction/ Business Activity) Each message sent by a participant contains a CoordinationContext for message to be understood: Has an activity identifier (unique for each activity) A pointer to the registration service used by the participant. The coordination type.

The Coordinator Activation Service: used to create activities Participants specify the coordination type Activation Service returns the CoordinationContext thats used in later stages. Registration Service: used by participants to register with (respective) coordinator for a given coordination protocol. Coordination Protocol Services: A set of these for each supported coordination type.

WS-Transaction Specifies protocols for each coordination type. Atomic Transactions Completion, PhaseZero, 2PC, etc. Business Transactions BusinessAgreement, BusinessAgreementWithComplete

Example WS-Coord Message Flow

Protocol Message Flow

Handling a Business Activity

Transactions in WS – Resources brary/en-us/dnglobspec/html/ws-coordination.asp brary/en-us/dnglobspec/html/ws-coordination.asp brary/en-us/dnglobspec/html/ws-transaction.asp brary/en-us/dnglobspec/html/ws-transaction.asp wstx1/ wstx1/ wstx2/ wstx2/

Recap Weve considered two mechanisms for applying transactions in complex systems with many objects Nested transactions, but these can hold locks for a long time Business transactions, which are a bit more like a command script In remainder of todays talk look at transactions on replicated data

Reliability and transactions Transactions are well matched to database model and recoverability goals Transactions dont work well for non- database applications (general purpose O/S applications) or availability goals (systems that must keep running if applications fail) When building high availability systems, encounter replication issue

Types of reliability Recoverability Server can restart without intervention in a sensible state Transactions do give us this High availability System remains operational during failure Challenge is to replicate critical data needed for continued operation

Replicating a transactional server Two broad approaches Treat replication as a special situation Leads to a primary server approach with a warm standby Most common in commercial products Just use distributed transactions to update multiple copies of each replicated data item Very much like doing a nested transaction but now the components are the replicas Well discuss this kind of replication in upcoming lectures

Server replication Suppose the primary sends the log to the backup server It replays the log and applies committed transactions to its replicated state If primary crashes, the backup soon catches up and can take over

Primary/backup primary backup Clients initially connected to primary, which keeps backup up to date. Backup tracks log log

Primary/backup primary backup Primary crashes. Backup sees the channel break, applies committed updates. But it may have missed the last few updates!

Primary/backup primary backup Clients detect the failure and reconnect to backup. But some clients may have gone away. Backup state could be slightly stale. New transactions might suffer from this

Issues? Under what conditions should backup take over Revisits the consistency problem seen earlier with clients and servers Could end up with a split brain Also notice that still needs 2PC to ensure that primary and backup stay in same states! Either want both to reflect a committed transaction, or (if the transaction aborted), neither to reflect it

Split brain: reminder primary backup Clients initially connected to primary, which keeps backup up to date. Backup follows log log

Split brain: reminder Transient problem causes some links to break but not all. Backup thinks it is now primary, primary thinks backup is down primary backup

Split brain: reminder Some clients still connected to primary, but one has switched to backup and one is completely disconnected from both primary backup

Implication? A strict interpretation of ACID leads to conclusions that There are no ACID replication schemes that provide high availability Well see more on this issue soon… Most real systems evade the limitation by weakening ACID

Real systems They use primary-backup with logging But they simply omit the 2PC Server might take over in the wrong state (may lag state of primary) Can use hardware to reduce or eliminate split brain problem

How does hardware help? Idea is that primary and backup share a disk Hardware is configured so only one can write the disk If server takes over it grabs the token Token loss causes primary to shut down (if it hasnt actually crashed)

Reconciliation This is the problem of fixing the transactions impacted by loss of tail of log in a failure Usually just a handful of transactions They committed but backup doesnt know because it never saw a commit record Someday, primary recovers and discovers the problem Need to apply the missing ones Also causes cascaded rollback Worst case may require human intervention Similar to compensation in Web Services

Summary? We looked at a variety of situations in which transactions touch multiple objects …because of nesting … because of complex business applications … because of primary/backup replication We left one major stone unturned: How should a server replicate its state for load balancing and high availability? Well see the answer in future lectures