Transactions in Distributed Systems


Transactions in Distributed Systems 4/24/2019

Distributed vs. Local Systems Accessing information from a remote node is slower; access to local information is much faster. Distributed systems suffer partial crashes and problems with message transmission; local systems see only total crashes.

What is a transaction? In databases, a transaction is “a collection of operations that represent a unit of consistency and recovery”. Overall view of a transaction: Initialization (memory, resources, etc.). Reads and modifications of consistent resource objects. Abort: changes are undone and resources are released. Commit: changes are saved and resources are released; resources and information remain consistent.

Distributed Transactions The problems and solution ideas are very similar to those in database transactions: concurrency control, atomicity, and recovery. Distributed transactions face additional problems due to their reliance on network communication.

Problems with Distributed Transactions Concurrency control of transactions. Inconsistencies due to failures. Recovery from system crashes. Presented solutions: Argus, QuickSilver, RVM.

Concurrency Solution Use read and write locks to synchronize access to and modification of system resources. A two-phase locking mechanism provides full serializability.
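The locking discipline on this slide can be sketched in a few lines. This is a minimal toy lock manager, not code from Argus or any real system; the class and method names are illustrative. Under strict two-phase locking, locks are only released at commit/abort time (`release_all`).

```python
# Toy sketch of strict two-phase locking: transactions acquire read/write
# locks as they go and release nothing until commit or abort.

class LockError(Exception):
    pass

class LockManager:
    def __init__(self):
        self.locks = {}  # resource -> ("r", {txn, ...}) or ("w", txn)

    def acquire_read(self, txn, resource):
        mode, holders = self.locks.get(resource, ("r", set()))
        if mode == "w":
            if holders != txn:
                raise LockError("write-locked by another transaction")
            return  # txn already holds the write lock
        holders.add(txn)
        self.locks[resource] = ("r", holders)

    def acquire_write(self, txn, resource):
        mode, holders = self.locks.get(resource, ("r", set()))
        if mode == "w":
            if holders != txn:
                raise LockError("write-locked by another transaction")
            return
        if holders - {txn}:  # other readers block the upgrade
            raise LockError("read-locked by other transactions")
        self.locks[resource] = ("w", txn)

    def release_all(self, txn):
        # Strictness: locks drop only here, at commit/abort time.
        for res in list(self.locks):
            mode, holders = self.locks[res]
            if mode == "w" and holders == txn:
                del self.locks[res]
            elif mode == "r":
                holders.discard(txn)
                if not holders:
                    del self.locks[res]
```

Holding every lock until the end of the transaction is what makes the schedule serializable: no other transaction can observe a partially finished one.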

Inconsistent system states due to Failures This is solved by making operations atomic: an operation either commits or aborts.

Solution to Recovery Save the useful data of a transaction in stable storage. Provide a mechanism to bring the whole system back to a consistent state before continuing the transaction, or abort the transaction.

Committing a transaction Have a mechanism to ensure that all operations, remote or local, commit or abort at “nearly” the same time when the transaction commits. Ensure that after the transaction commits, the entire system remains consistent.

Argus - Barbara Liskov et al. A programming language and system for distributed programs. Intended for programs that keep online data for long periods of time. Guardians provide clean encapsulation of objects and resources. Actions provide atomicity of processes.

Assumptions Nodes can communicate only by exchanging messages. A failed node cannot send messages. Messages may be lost, delayed, or delivered out of order. Message corruption is detectable.

Guardians An object that encapsulates resources in the system. Resides on a single node; each node can host more than one guardian. Resources are accessed through handlers (location-independent). Guardians can create other guardians at a specified node. A guardian contains stable objects and volatile objects.
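The guardian idea can be illustrated with a small sketch. The classes and method names below are hypothetical, chosen only to mirror the slide: state is split into stable and volatile parts, and callers reach it only through named handlers, which keeps the guardian's location transparent.

```python
# Illustrative sketch of a guardian: an object that owns resources on one
# node and exposes them only through handlers. Not actual Argus code.

class Guardian:
    def __init__(self, node):
        self.node = node
        self.stable = {}    # survives crashes (kept in stable storage)
        self.volatile = {}  # lost on a crash; rebuilt during recovery

    def handler(self, name, *args):
        # Callers never touch state directly; they invoke named handlers.
        return getattr(self, name)(*args)

class PrinterGuardian(Guardian):
    def enqueue(self, job):
        # The print queue must survive crashes, so it lives in stable state.
        self.stable.setdefault("queue", []).append(job)
        return len(self.stable["queue"])
```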

Printer Guardian

Atomic Actions (Actions) Actions are total: each either commits or aborts (using atomic objects). Actions can be nested (actions and subactions). When an action commits, it propagates its locks and the object versions of the local guardian to its parent action. The p-list (the participating guardians of committed descendants) propagates to the parent.

Actions are Serializable Strict two-phase locking is used (locks are held until the action commits or aborts). One can prove that using a strict two-phase locking mechanism implies serializability of actions.

Action Tree

Concurrency Control Solution Access to resources is synchronized via locks. Every operation is a read or a write. This provides serializability, totality, and synchronized access of actions to shared objects.

Locks in Nested Transactions - J. Eliot B. Moss “An action can acquire a read lock iff all holders of write locks are ancestors.” “It can acquire a write lock iff all holders of read or write locks are ancestors.”
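Moss's two rules are easy to state as predicates over the action tree. A small sketch, with an illustrative encoding of the tree as a child-to-parent map (the function names are mine, not Moss's):

```python
# Moss's locking rules for nested transactions, as predicates.

def ancestors(action, parent):
    """The action itself plus all of its ancestors in the action tree."""
    seen = {action}
    while action in parent:
        action = parent[action]
        seen.add(action)
    return seen

def can_read(action, write_holders, parent):
    # Read lock allowed iff every write-lock holder is an ancestor.
    return write_holders <= ancestors(action, parent)

def can_write(action, read_holders, write_holders, parent):
    # Write lock allowed iff every lock holder (read or write) is an ancestor.
    return (read_holders | write_holders) <= ancestors(action, parent)
```

The effect is that a subaction may see its ancestors' uncommitted work (that is safe: the ancestor cannot run concurrently with it) but never a sibling's.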

Inconsistencies Inconsistencies are solved via the totality of actions: an action either commits or aborts.

Recovery from Crashes Solved by the atomicity of actions and by keeping versions of the stable objects. Versions are passed up the action tree when an action commits.

Two-Phase Commit Protocol (I) The topaction guardian sends “prepare” messages to the guardians participating in the transaction. Each participant receives the message, records the versions of the objects modified by the topaction's descendants, and saves a “prepare” record in stable storage. Read locks are released. The participant replies with an “ok” message; if it cannot save the necessary information, it sends a “refuse” message. If the topaction receives any “refuse” reply, or a participant does not respond, it aborts the transaction and sends “abort” messages.

Two-Phase Commit Protocol (II) If every participant replies “ok”, the topaction guardian writes a committed record and sends “commit” messages to the participants. Each participant writes a commit record, installs the new versions of the objects, releases its locks, and sends a “done” message. The topaction guardian saves a final record after all participants have acknowledged. The two-phase commit protocol ensures that the transaction either commits everywhere or aborts everywhere.
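The message flow of the two slides above can be condensed into a sketch. This is a toy model, not Argus code: the `Participant` and coordinator names are illustrative, and an in-memory list stands in for stable storage.

```python
# Minimal sketch of two-phase commit: phase 1 collects votes, phase 2
# commits only if every participant voted "ok".

class Participant:
    def __init__(self, name, can_prepare=True):
        self.name = name
        self.can_prepare = can_prepare
        self.log = []  # stands in for stable storage

    def prepare(self):
        if self.can_prepare:
            self.log.append("prepare")  # force prepare record to stable storage
            return "ok"
        return "refuse"

    def commit(self):
        self.log.append("commit")  # install new versions, release locks
        return "done"

    def abort(self):
        self.log.append("abort")   # discard tentative versions

def two_phase_commit(coordinator_log, participants):
    # Phase 1: collect votes.
    votes = [p.prepare() for p in participants]
    if all(v == "ok" for v in votes):
        coordinator_log.append("committed")  # the commit point
        for p in participants:
            p.commit()                       # phase 2: commit everywhere
        return "committed"
    coordinator_log.append("aborted")
    for p in participants:
        p.abort()
    return "aborted"
```

The coordinator's "committed" record is the single point that decides the outcome: participants that already prepared will learn the decision from it after a crash.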

Data on Action Commits and Aborts

Timing of Topactions

QuickSilver - Schmuck & Wyllie An operating system that supports atomicity, recoverability, and concurrency of transactions. Flexible concurrency-control policy: each node can have its own. Transaction commit uses a one-phase or two-phase commit protocol depending on the node. No nested transactions.

Transaction Management It is made up of three main pieces: a transaction manager (TM), transactional IPC, and a log manager (LM).

Transaction Manager The transaction manager handles coordination of a transaction from start to finish. The TM starts a transaction when it receives an IPC request from a process. The TM assigns a globally unique TID and registers it with the kernel.

Transactional IPC IPCs are performed on behalf of a transaction. A participant of a transaction is either the process that created the transaction or a process that received a request carrying its TID. Remote requests are handled by the local communication manager (CM) process.

Log Manager Records are appended at the end of a log file. The log is used for recovery during the two-phase commit protocol. The LM may also be used for checkpoints in long-running computations.

Concurrency QuickSilver allows each node to have its own concurrency-control policy. The DFS does not enforce full serializability (which improves performance). Write locks on directories are obtained only when a directory is renamed, created, or deleted; read locks are not required when reading a directory. Read locks on files are released when the file is closed; write locks on files are held until the transaction commits.

Recovery Recovery is handled by the QuickSilver distributed file system (DFS). It guarantees that changes are saved on commit and undoes any changes on abort.

Transaction Commit The TM collects information about all the participants from the kernel. If the CM is a participant, the TM asks it for the list of machines that received a request. The TM then asks all remote machines to commit recursively. Any communication or machine failure detected by the CM causes the transaction to abort.

Comparing AIX and QuickSilver

Recoverable Virtual Memory - Satyanarayanan et al. Addresses only the problem of recovery. Stores virtual memory in external data segments kept in stable storage. Portable: a library that is linked into applications. “Values simplicity over generality” by adopting a layered approach. Provides independent control over atomicity and concurrency, as well as over problems such as deadlock and starvation.

Layered Approach of RVM

Segments and Regions Applications map regions of segments into their virtual memory.

Sequence of Operations Select the regions of virtual memory to be mapped. Get a global transaction ID. A successful commit saves the changes in the log.
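The sequence above can be sketched as a toy recoverable region. The API names here are illustrative and simplified, not the real RVM library interface; a `bytearray` stands in for the mapped region and a list for the stable-storage log.

```python
# Toy sketch of the RVM pattern: mutate a mapped region inside a
# transaction; commit persists the new value to a log, abort restores
# the old one.

class RecoverableRegion:
    def __init__(self, data):
        self.data = bytearray(data)  # region "mapped" into virtual memory
        self.log = []                # append-only commit log (stable storage)
        self._undo = None

    def begin_transaction(self):
        self._undo = bytes(self.data)  # snapshot, so abort can roll back

    def end_transaction(self, commit=True):
        if commit:
            self.log.append(bytes(self.data))  # commit saves changes in log
        else:
            self.data[:] = self._undo          # abort undoes in-memory changes
        self._undo = None
```

Between `begin_transaction` and `end_transaction` the application mutates the mapped memory directly; only the commit makes those changes durable.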

Crash Recovery Recovery consists of reading the log from tail to head and reconstructing the last committed changes. The modifications are applied to the external data segment, and the log is then emptied.

Truncation Reclaiming log space by applying the logged changes to the external data segment. Necessary because log space is finite.
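Truncation is essentially the apply-and-empty step. A minimal sketch, assuming a hypothetical record format of `(offset, bytes)` pairs; the real RVM log format is more involved.

```python
# Sketch of log truncation: fold committed changes into the external
# data segment, then reclaim the log space.

def truncate(log, segment):
    """Apply every committed change to the segment, then empty the log."""
    for offset, data in log:
        segment[offset:offset + len(data)] = data
    log.clear()  # the segment now reflects all committed state
    return segment
```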

In Summary Argus provides a complete solution with atomicity, concurrency control, and recovery; however, it is complex, slow, and unoptimized. QuickSilver shifts the problem of concurrency control to the individual nodes and performs as well as AIX. RVM addresses only recovery of virtual memory and introduces a “neat” layered structure, leaving the other problems to higher layers.