Fault Tolerance - Transactions

Slides:



Advertisements
Similar presentations
Chapter 8 Fault Tolerance
Advertisements

TRANSACTION PROCESSING SYSTEM ROHIT KHOKHER. TRANSACTION RECOVERY TRANSACTION RECOVERY TRANSACTION STATES SERIALIZABILITY CONFLICT SERIALIZABILITY VIEW.
1 CSIS 7102 Spring 2004 Lecture 8: Recovery (overview) Dr. King-Ip Lin.
CMPT 401 Summer 2007 Dr. Alexandra Fedorova Lecture X: Transactions.
CMPT Dr. Alexandra Fedorova Lecture X: Transactions.
Transaction Processing Lecture ACID 2 phase commit.
1 Transaction Management Database recovery Concurrency control.
Last Class: Weak Consistency
Database Management Systems I Alex Coman, Winter 2006
Computer Science Lecture 16, page 1 CS677: Distributed OS Last Class:Consistency Semantics Consistency models –Data-centric consistency models –Client-centric.
Computer Science Lecture 12, page 1 CS677: Distributed OS Last Class Vector timestamps Global state –Distributed Snapshot Election algorithms.
Transaction. A transaction is an event which occurs on the database. Generally a transaction reads a value from the database or writes a value to the.
Transactions and Reliability. File system components Disk management Naming Reliability  What are the reliability issues in file systems? Security.
08_Transactions_LECTURE2 DBMSs should guarantee ACID properties (Atomicity, Consistency, Isolation, Durability). This is typically done by guaranteeing.
Transaction Lectured by, Jesmin Akhter, Assistant professor, IIT, JU.
Computer Science Lecture 12, page 1 CS677: Distributed OS Last Class Vector timestamps Global state –Distributed Snapshot Election algorithms –Bully algorithm.
Presented By: Shreya Patel ( ) Vidhi Patel ( ) Universal College Of Engineering And Technology.
The Relational Model1 Transaction Processing Units of Work.
Transactions and Concurrency Control. Concurrent Accesses to an Object Multiple threads Atomic operations Thread communication Fairness.
Chapter 15: Transactions Loc Hoang CS 157B. Definition n A transaction is a discrete unit of work that must be completely processed or not processed at.
V1.7Fault Tolerance1. V1.7Fault Tolerance2 A characteristic of Distributed Systems is that they are tolerant of partial failures within the distributed.
Database System Concepts, 6 th Ed. ©Silberschatz, Korth and Sudarshan See for conditions on re-usewww.db-book.com Chapter 14: Transactions.
Software System Lab. Transactions Transaction Concept A transaction is a unit of program execution that accesses and possibly updates various.
Introduction to Fault Tolerance By Sahithi Podila.
Advanced Database CS-426 Week 6 – Transaction. Transactions and Recovery Transactions A transaction is an action, or a series of actions, carried out.
10-Jun-16COMP28112 Lecture 131 Distributed Transactions.
Transactions.
Semantic Concurrency Control for Real-Time Diagramming
Chapter 14: Transactions
Database Systems (資料庫系統)
Database Management System
COMP28112 – Lecture 14 Byzantine fault tolerance: dealing with arbitrary failures The Byzantine Generals’ problem (Byzantine Agreement) 13-Oct-18 COMP28112.
Dependability Dependability is the ability to avoid service failures that are more frequent or severe than desired. It is an important goal of distributed.
Fault Tolerance - Transactions
ACID PROPERTIES.
Transactions Properties.
CS122B: Projects in Databases and Web Applications Winter 2018
Fault Tolerance - Transactions
COMP28112 – Lecture 13 Byzantine fault tolerance: dealing with arbitrary failures The Byzantine Generals’ problem (Byzantine Agreement) 19-Nov-18 COMP28112.
Outline Announcements Fault Tolerance.
Distributed Systems, Consensus and Replicated State Machines
Fault Tolerance Distributed Web-based Systems
Transactions.
Fault Tolerance - Transactions
Lecture 21: Concurrency & Locking
Introduction to Fault Tolerance
EEC 688/788 Secure and Dependable Computing
EEC 688/788 Secure and Dependable Computing
Database Security Transactions
Distributed Transactions
Lecture 21: Replication Control
Atomic Commit and Concurrency Control
COMP28112 – Lecture 13 Byzantine fault tolerance: dealing with arbitrary failures The Byzantine Generals’ problem (Byzantine Agreement) 22-Feb-19 COMP28112.
Lecture 20: Intro to Transactions & Logging II
Distributed Transactions
Transaction management
Distributed Transactions
Chapter 14: Transactions
Distributed Transactions
EEC 688/788 Secure and Dependable Computing
Distributed Transactions
Lecture 21: Replication Control
Abstractions for Fault Tolerance
Concurrency Control.
UNIT -IV Transaction.
CS122B: Projects in Databases and Web Applications Winter 2019
CS122B: Projects in Databases and Web Applications Spring 2018
Transaction Communication
Transactions, Properties of Transactions
Fault Tolerance - Transactions
Presentation transcript:

Fault Tolerance - Transactions COMP28112 Lecture 11 Fault Tolerance - Transactions 5-Apr-19 COMP28112 Lecture 11

Key Definitions “A characteristic feature of distributed systems that distinguishes them from single-machine (centralized) systems is the notion of partial failure”. [Tanenbaum, p.321] The goal is to tolerate faults, that is, to operate in an acceptable way, when a (partial) failure occurs. Being fault tolerant is strongly related to dependability: “Dependability is defined as the trustworthiness of a computing system which allows reliance to be justifiably placed on the service it delivers” [IFIP 10.4 Working Group on Dependable Computing and Fault Tolerance, http://www.dependability.org] 5-Apr-19 COMP28112 Lecture 11

Requirements for Dependability Availability: the probability that the system operates correctly at any given moment. Reliability: length of time that it can run continuously without failure. Safety: if and when failures occur, the consequences are not catastrophic for the system. Maintainability: how easily a failed system can be repaired. 5-Apr-19 COMP28112 Lecture 11

Types of Failures Crash: Server halts! Omission failures: Server fails to respond to incoming requests Server fails to receive incoming messages Server fails to send messages Response failures: A server’s response is incorrect Timing failures: Server fails to respond within a certain time Arbitrary (byzantine) failures: A component may produce output it should never have produced (which may not be detected as incorrect) – arbitrary responses at arbitrary times. Benign (i.e., omission/timing) failures are by far the most common; we’ll see problems related to byzantine failures later on. 5-Apr-19 COMP28112 Lecture 11

The two generals must attack at the same time to succeed! The two generals’ problem (or paradox)… (pitfalls and challenges of communication with unreliable links…) Two armies, each led by a general, are preparing to attack a village. The armies are outside the village, each on its own hill. The generals can communicate only by sending messengers passing through the valley. The two generals must attack at the same time to succeed! http://en.wikipedia.org/wiki/Two_Generals'_Problem 5-Apr-19 COMP28112 Lecture 11

Failure masking using redundancy Physical redundancy: A well-known engineering technique (e.g., B747s or A380s have four engines but – subject to certain conditions – can fly on three) Even nature does it! Time redundancy: An action is performed, if need be, again and again. Especially helpful when faults are transient and intermittent. Information redundancy: e.g., send extra bits when transmitting information to allow recovery. 5-Apr-19 COMP28112 Lecture 11

Redundancy… …creates several problems: …costs money! But, above all: Consistency of replicas (e.g., all data need to be updated). Should improve (overall) system performance. (we’ll return to these!) …costs money! But, above all: We still need to make sure that any failure won’t leave our system in an inconsistent (corrupted) state! 5-Apr-19 COMP28112 Lecture 11

Example: A Simple Application (a client communicating with a remote server) Transfer £100 from account 1 to account 2 x = read_balance(1); y = read_balance(2); write_balance(1, x – 100); write_balance(2, y + 100); Crashes can occur at any time during the execution What problems can arise because of this? 5-Apr-19 COMP28112 Lecture 11

CRASH CRASH CRASH CRASH Crash Acct Balance 1 100 2 200 Acct Balance 1 x = read_balance(1); Acct Balance 1 100 2 200 Acct Balance 1 100 2 Acct Balance 1 200 2 100 CRASH CRASH CRASH CRASH y = read_balance(2); write_balance(1, x - 100); write_balance(2, y + 100); 5-Apr-19 COMP28112 Lecture 11

The sequence of operations MUST execute as an ATOMIC operation All-or-Nothing Either ALL operations execute or NONE x = read_balance(1); y = read_balance(2); write_balance(1, x - 100); write_balance(2, y + 100); The sequence of operations MUST execute as an ATOMIC operation 5-Apr-19 COMP28112 Lecture 11

Concurrent Users Transfer £100 from acct 1 to 2 x = read_bal(1) Multiple users can be transferring funds simultaneously. What problems can arise because of this? Concurrent Users Transfer £100 from acct 1 to 2 x = read_bal(1) y = read_bal(2) write_bal(1, x-100) write_bal(2, y+100) Transfer £300 from acct 1 to 2 u = read_bal(1) v = read_bal(2) write_bal(1, u-300) write_bal(2, v+300) 5-Apr-19 COMP28112 Lecture 11

Possible Sequence of Events 1 x = read_bal(1) 2 u = read_bal(1) 3 v = read_bal(2) 4 write_bal(1, u-300) 5 y = read_bal(2) 6 write_bal(1, x-100) 7 write_bal(2, y+100) 8 write_bal(2, v+300) Acct Balance 1 -200 2 200 Acct Balance 1 100 2 200 Acct Balance 1 2 500 Acct Balance 1 2 300 Acct Balance 1 2 200 5-Apr-19 COMP28112 Lecture 11

Does all this remind you anything? What you expect What you got Acct Balance 1 -300 2 600 Acct Balance 1 2 500 The two transfers got in each other’s way Does all this remind you anything? 5-Apr-19 COMP28112 Lecture 11

Isolated Execution We must ensure that “concurrent” applications do not interfere with each other But what does interfere mean? 5-Apr-19 COMP28112 Lecture 11

Serial (=Sequential) Executions Concurrent executions do not interfere with each other if their execution is equivalent to a serial one: The reads and writes get the same result as if the transfers happened one at a time (i.e. they don’t interleave). Simple but naive solution: One transfer at a time Not scalable and very very slow How do we maximise concurrency without corrupting the data? Good question! 5-Apr-19 COMP28112 Lecture 11

Can crashes cause problems? x = read_balance(1); CRASH Acct Balance 1 2 300 Acct Balance 1 2 200 Acct Balance 1 100 2 200 y = read_balance(2); write_balance(1, x - 100); write_balance(2, y + 100); 5-Apr-19 COMP28112 Lecture 11

Data surviving crashes could be in anyone of these three states ? Acct Balance 1 100 2 200 Acct Balance 1 2 200 Acct Balance 1 2 300

Durable Updates are persistent once the application successfully completes Acct Balance 1 2 300 5-Apr-19 COMP28112 Lecture 11

An application should not violate a database’s integrity constraints Balance of ALL customers should not exceed their overdraft limit All account holders have a name and an address Transfer £500 from account 1 to account 2 Transfer should not be permitted if overdraft limit is £200 for account 1 Acct Balance 1 100 2 200 Consistency 5-Apr-19 COMP28112 Lecture 11

Wouldn’t it be great if we had an abstraction (and an implementation) that provided us with the ACID properties? Atomicity Consistency Isolation Durability 5-Apr-19 COMP28112 Lecture 11

Transactions (=individual, indivisible operations) to the rescue Originated from the database community Simple way to write database applications Provides the ACID properties Transaction either commits or aborts Fast, recovers from all sorts of failures, highly available, manages concurrency, ... In use everywhere and everyday begin_tx ... commit_tx 5-Apr-19 COMP28112 Lecture 11

How Transactions are Implemented Managing multiple “simultaneous” users Concurrency control algorithms Ensure the execution is equivalent to a “serial” execution (key assumption: transactions have a short duration in the order of milliseconds: you don’t want to “block” other transactions for too long) Durability Recovery algorithms Replay the actions of committed transactions and undo the effects left behind by aborted transactions 5-Apr-19 COMP28112 Lecture 11

Concurrency Control Two-phase locking Hmm, if only I was able to lock available hotel and band slots in lab exercise 2… it would make my life easier! Two-phase locking “Acquire locks” phase Get a read lock before reading Get a write lock before writing Read locks conflict with write locks Write locks conflict with read and write locks “Release locks” phase when the transaction terminates (commit or abort) What does all this remind you of? ( recall COMP25111, lectures on semaphores and thread synchronisation: there are some key problems in core Computer Science!) 5-Apr-19 COMP28112 Lecture 11

Conclusion Redundancy is the key to deal with failures We need to avoid corruption of data due to failures: Use transactions. Reading: Tanenbaum et al: Sections 1.3.2, 8.1-8.3 (weak on transactions). Coulouris et al: Sections 2.3.2, 13.1, 13.2. 5-Apr-19 COMP28112 Lecture 11