CSIS 7102 Spring 2004 Lecture 6: Distributed databases

Presentation transcript:

CSIS 7102 Spring 2004 Lecture 6: Distributed databases Dr. King-Ip Lin

Table of contents Distributed databases: overview and key features Atomicity: two-phase commit, three-phase commit Concurrency control: locking protocols, deadlock handling, timestamp ordering Issues with replication

Distributed databases So far, we have assumed a centralized database Data is stored in one location (e.g. a single hard disk) A centralized database management system handles transactions To handle multiple requests, a client-server system is used Clients send requests for data to the server The server handles query processing, transaction management, etc.

Distributed databases This is not the only possibility In many cases, it may be advantageous for data to be distributed Branches of a bank Different parts of the government storing different kinds of data about a person Different organizations sharing parts of their data Thus, distributed databases

Distributed databases Data spread over multiple machines (also referred to as sites or nodes) A network interconnects the machines Data is shared by users on multiple machines

Distributed databases Homogeneous distributed databases Same software/schema on all sites, data may be partitioned among sites Goal: provide a view of a single database, hiding details of distribution Heterogeneous distributed databases Different software/schema on different sites Goal: integrate existing databases to provide useful functionality

Distributed databases Advantages of distributed databases Sharing data – users at one site are able to access data residing at other sites. Autonomy – each site is able to retain a degree of control over data stored locally. Higher system availability through redundancy — data can be replicated at remote sites, and the system can function even if a site fails.

Distributed databases Key features of distributed databases Typically geographically distributed, with (relatively) slow connections Typically autonomous, in terms of both administration and execution However, many cases allow for a coordinator site for each transaction (a different coordinator for each transaction) Local vs. global transactions A local transaction accesses data in the single site at which the transaction was initiated. A global transaction either accesses data in a site different from the one at which the transaction was initiated or accesses data in several different sites.

Distributed databases Global transactions → new issues in transaction processing Commit coordination: no node can unilaterally decide to commit A transaction cannot be committed at one site and aborted at another Data replication: the same data may reside at different sites Possibility of reading different copies → locking has to be careful Ensuring correctness → updates have to be careful

Distributed databases – rules of the game Transaction may access data at several sites. Each site has a local transaction manager responsible for: Maintaining a log for recovery purposes Participating in coordinating the concurrent execution of the transactions executing at that site. Each site has a transaction coordinator, which is responsible for: Starting the execution of transactions that originate at the site. Distributing subtransactions at appropriate sites for execution. Coordinating the termination of each transaction that originates at the site, which may result in the transaction being committed at all sites or aborted at all sites.

Atomicity in distributed databases Ensuring atomicity means guarding against failures. Many more kinds of failures in distributed databases Failure of a site. Loss of messages Handled by network transmission control protocols such as TCP/IP Failure of a communication link Handled by network protocols, by routing messages via alternative links Network partition A network is said to be partitioned when it has been split into two or more subsystems that lack any connection between them Note: a subsystem may consist of a single node Hard to distinguish between a site failure and a network partition

Atomicity in distributed databases Challenge with respect to atomicity Consistency over multiple sites Cannot allow one site to commit and the other site to abort Two basic protocols 2-phase commit (most common) 3-phase commit

Two-phase commit Goals Given a transaction that is running on multiple sites, ensure either all the sites commit together or abort together. Assume that when a site fails, it does not send wrong messages to confuse anyone; it just stops working Need to handle the case that some sites fail during the 2-phase commit process

Two-phase commit Simple idea Select one site as the coordinator (the other sites are called participants) Ask all the sites whether each of them wants to commit or abort (phase 1) Wait to collect all the answers and make the final decision; broadcast the decision to all the sites; sites act accordingly (phase 2) Issues If a site failed and then quickly recovered, how does it know what it has done? What if a site fails in the middle? Does everybody have to wait for it? What if the coordinator fails?

Two-phase commit If a site failed and then quickly recovered, how does it know what it has done? Need to have a log to record what has been done Log kept in “stable storage” Should it log before it acts? Write-ahead logging

Two-phase commit What if a participant site fails? By our assumption, it will not respond The coordinator will wait for a timeout, and then decide that the site has failed The decision should be: abort What if the coordinator fails? Trickier; will deal with it later

Two-phase commit: phase 1 Phase 1: coordinator asks for a decision Coordinator (Ci) asks all participants to prepare to commit transaction T. Ci adds the record <prepare T> to the log and forces the log to stable storage sends prepare T messages to all sites at which T executed Why should the coordinator write the record before sending messages?

Two-phase commit: phase 1 Upon receiving the message, the transaction manager at the site determines if it can commit the transaction if not, add a record <no T> to the log and send an abort T message to Ci if the transaction can be committed, then: add the record <ready T> to the log force all log records for T to stable storage send a ready T message to Ci Why can’t the site commit right away?
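
A minimal sketch of a participant's phase-1 handling, in Python. This is not the textbook's pseudocode; log is a plain list standing in for the site's stable-storage log, and can_commit and send_to_coordinator are hypothetical callables supplied by the site.

def on_prepare(T, log, can_commit, send_to_coordinator):
    if can_commit(T):
        log.append(f"<ready {T}>")        # force T's log records to stable storage first
        send_to_coordinator(("ready", T))
    else:
        log.append(f"<no {T}>")
        send_to_coordinator(("abort", T))
    # The site must NOT commit yet: the coordinator may still decide to abort.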

Two-phase commit: phase 2 Phase 2: coordinator makes the decision and broadcasts the result T can be committed if Ci received a ready T message from all the participating sites; otherwise T must be aborted. Coordinator adds a decision record, <commit T> or <abort T>, to the log and forces the record onto stable storage. Once the record is in stable storage, the decision is irrevocable (even if failures occur) Notice that the transaction is deemed committed/aborted at this point in time

Two-phase commit: phase 2 Coordinator sends a message to each participant informing it of the decision (commit or abort) Participants take appropriate action locally Each participant also records in its log whether it commits <commit T> or aborts <abort T>
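
To make the two phases concrete, here is a minimal sketch of the coordinator's side of 2PC in Python. The helpers are hypothetical: send(site, msg) delivers a message, collect_votes(T) returns a dict {site: "ready" or "abort"} containing the replies received before a timeout, and log is a list standing in for forced stable-storage records.

def two_phase_commit(T, participants, log, send, collect_votes):
    # Phase 1: ask every participant to prepare T
    log.append(f"<prepare {T}>")                   # written (and forced) before sending
    for site in participants:
        send(site, ("prepare", T))
    votes = collect_votes(T)

    # Phase 2: decide and broadcast; the forced decision record is the commit point
    all_ready = len(votes) == len(participants) and \
                all(v == "ready" for v in votes.values())
    decision = "commit" if all_ready else "abort"  # any 'no' vote or timeout aborts
    log.append(f"<{decision} {T}>")                # irrevocable once on stable storage
    for site in participants:
        send(site, (decision, T))
    return decision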

Two phase commit : participant failures Suppose a participating site S fails. What must it do when it comes back up? First, check what is in the log Case 1: S sees <commit T> Meaning: Coordinator has decided to commit T and the decision is final Thus: S should make sure the transaction commits at that site (redo T)

Two phase commit : participant failures Case 2: S sees <abort T> Meaning: Coordinator has decided to abort T and the decision is final Thus: S should make sure the transaction aborts at that site (undo T)

Two phase commit : participant failures Case 3: S sees <ready T> Meaning: T can be committed from the point of view of S only Does S know the final decision yet? Thus: S must query the coordinator about the final decision, and act accordingly

Two phase commit : participant failures Case 4: S sees nothing Meaning: S has not even responded to the initial query from the coordinator Thus: S must send its decision to the coordinator But is it really necessary? If the coordinator does not hear from S for a long time, it will assume S has failed, thus aborting the transaction Thus S can safely decide to abort without any problem and without sending its decision to the coordinator (Why?)
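
The four cases can be collected into one recovery routine. A minimal sketch, assuming log is the list of records found on stable storage for T, and redo, undo and query_coordinator are hypothetical callables.

def recover_participant(T, log, redo, undo, query_coordinator):
    if f"<commit {T}>" in log:
        redo(T)                              # Case 1: decision was commit
    elif f"<abort {T}>" in log:
        undo(T)                              # Case 2: decision was abort
    elif f"<ready {T}>" in log:
        decision = query_coordinator(T)      # Case 3: outcome unknown, must ask
        redo(T) if decision == "commit" else undo(T)
    else:
        undo(T)                              # Case 4: never voted ready, safe to abort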

Two-phase commit: coordinator failure Suppose the coordinator fails Then participants must make a decision Case 1 : a site sees <commit T> Meaning: T has committed Thus: broadcast the result and ensure everyone commits Case 2 : a site sees <abort T> Meaning: T has aborted Thus: broadcast the result and ensure everyone aborts

Two-phase commit: coordinator failure Case 3 : a site sees nothing Meaning: No decision has been made (or a decision has been made to abort) Thus: it is safe to abort (instead of waiting for the coordinator)

Two-phase commit: coordinator failure Case 4 : none of the above Meaning: every participant that is alive has told the coordinator that it can commit Thus, it is possible that the coordinator has made a decision but has yet to send it out Note that the decision may still be to abort T All participants must wait for the coordinator to recover to learn its decision Thus two-phase commit is blocking in this case

Two-phase commit: network partition If the coordinator and all its participants remain in one partition, the failure has no effect on the commit protocol. If the coordinator and its participants belong to several partitions: Sites that are not in the partition containing the coordinator think the coordinator has failed, and execute the protocol to deal with failure of the coordinator. No harm results, but sites may still have to wait for the decision from the coordinator. The coordinator and the sites in the same partition as the coordinator think that the sites in the other partitions have failed, and follow the usual commit protocol. Again, no harm results

Three-phase commit Limitation of two-phase commit Blocking when the coordinator dies To overcome it, create a new phase called pre-commit Coordinator tells at least k sites that it wants to commit Thus now, 3 phases Phase 1 : Coordinator checks if T can commit; participants send their choice to the coordinator Phase 2 : Coordinator makes the decision If commit, send a pre-commit message to k sites If abort, send a message to everyone to abort Phase 3 : If commit, the final commit decision is broadcast and everyone commits
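
A simplified sketch of the coordinator side of the three phases as outlined above; real 3PC has more corner cases than this. broadcast, collect_votes and wait_for_acks are hypothetical messaging helpers, and k is the number of sites that must see the pre-commit.

def three_phase_commit(T, participants, k, broadcast, collect_votes, wait_for_acks):
    broadcast(participants, ("prepare", T))            # phase 1: collect votes
    votes = collect_votes(T)
    if len(votes) < len(participants) or "abort" in votes.values():
        broadcast(participants, ("abort", T))          # phase 2: abort everyone
        return "abort"
    broadcast(participants[:k], ("pre-commit", T))     # phase 2: announce intent to >= k sites
    wait_for_acks(participants[:k], T)
    broadcast(participants, ("commit", T))             # phase 3: final commit
    return "commit"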

Three-phase commit What 3-phase buys If the coordinator fails, participants can figure out the commit decision from the pre-commit messages and then go on to commit If no pre-commit message is found, one can safely abort No blocking Limitations No more than k sites can fail Otherwise, pre-commit messages may be lost Network partition can cause problems Maybe all pre-commit messages reside in one partition Thus, not widely used

Concurrency control in distributed databases Modify concurrency control schemes for use in a distributed environment. Assumptions: Each site participates in the execution of a commit protocol to ensure global transaction atomicity. Data items may be replicated at multiple sites However, updates (writes) have to be done on ALL the copies of an item

Locking protocols in distributed databases Two-phase locking based protocols Key questions: Who manages the locks? Centralized vs. distributed How many items to lock? In the case when data has copies at multiple sites Tradeoff between efficiency and concurrency Efficiency includes messages sent between sites

Locking protocols in distributed databases – centralized vs. distributed Centralized lock manager All lock requests for all items go to one site Even if the item does not reside at that site When a transaction needs to lock a data item, it sends a lock request to that site, and its lock manager determines whether the lock can be granted immediately If yes, the lock manager sends a message to the site which initiated the request If no, the request is delayed until it can be granted, at which time a message is sent to the initiating site

Locking protocols in distributed databases – centralized vs. distributed Centralized lock manager After obtaining the lock A transaction can read from any one site that contains the item A transaction must write to ALL sites that contain the item Advantages Simple to implement Simple deadlock handling Disadvantages The lock manager site becomes a bottleneck Vulnerability – if the lock manager site goes down, everything is blocked

Locking protocols in distributed databases – centralized vs. distributed Distributed lock manager Each site has its own lock manager to handle requests for items Need a special protocol to access replicated data Advantages Distributed workload Fault-tolerant Disadvantages Deadlock handling complicated Potentially more messages.

Locking protocols in distributed databases – Distributed protocols Primary copy Choose one replica of the data item to be the primary copy. The site containing that replica is called the primary site for the data item Different data items can have different primary sites When a transaction needs to lock a data item Q, it requests a lock at the primary site of Q. It implicitly gets a lock on all replicas of the data item

Locking protocols in distributed databases – Distributed protocols Primary copy Benefit Concurrency control for replicated data handled similarly to unreplicated data - simple implementation. Drawback If the primary site of Q fails, Q is inaccessible even though other sites containing a replica may be accessible

Locking protocols in distributed databases – Distributed protocols Majority protocol Local lock manager at each site administers lock and unlock requests for data items stored at that site. When a transaction wishes to lock an unreplicated data item Q residing at site Si, a message is sent to Si's lock manager. If Q is locked in an incompatible mode, then the request is delayed until it can be granted. When the lock request can be granted, the lock manager sends a message back to the initiator indicating that the lock request has been granted.

Locking protocols in distributed databases – Distributed protocols Majority protocol In case of replicated data If Q is replicated at n sites, then a lock request message must be sent to more than half of the n sites in which Q is stored. The transaction does not operate on Q until it has obtained a lock on a majority of the replicas of Q. When writing the data item, transaction performs writes on all replicas.

Locking protocols in distributed databases – Distributed protocols Majority protocol Benefit Can be used even when some sites are unavailable details on how to handle writes in the presence of site failures later Drawback Requires 2(n/2 + 1) messages for handling lock requests, and (n/2 + 1) messages for handling unlock requests. Potential for deadlock even with a single item - e.g., each of 3 transactions may have locks on 1/3rd of the replicas of a data item. Can be overcome by a predetermined order of sites being locked
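
A minimal sketch of taking an exclusive lock under the majority protocol. request_lock(site, Q, mode) is a hypothetical per-site call that returns True when that site's lock manager grants the lock.

def majority_lock(Q, sites_with_Q, request_lock):
    granted = []
    # Requesting sites in a fixed (sorted) order is the simple way to avoid
    # the single-item deadlock mentioned above.
    for site in sorted(sites_with_Q):
        if request_lock(site, Q, "exclusive"):
            granted.append(site)
        if len(granted) > len(sites_with_Q) // 2:
            return granted            # majority held: the transaction may operate on Q
    return None                       # could not reach a majority (caller must wait/retry)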

Locking protocols in distributed databases – Distributed protocols Biased protocol (read-one, write-all) Local lock manager at each site as in the majority protocol; however, requests for shared locks are handled differently than requests for exclusive locks. Shared locks. When a transaction needs to lock data item Q, it simply requests a lock on Q from the lock manager at one site containing a replica of Q. Exclusive locks. When a transaction needs to lock data item Q, it requests a lock on Q from the lock manager at all sites containing a replica of Q. Advantage - imposes less overhead on read operations. Disadvantage - additional overhead on writes
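
A minimal sketch of the biased (read-one, write-all) rule, using the same hypothetical request_lock call as above.

def biased_lock(Q, sites_with_Q, mode, request_lock):
    if mode == "shared":
        # any single replica suffices for a read
        return request_lock(sites_with_Q[0], Q, "shared")
    # an exclusive lock must be granted by every replica
    return all(request_lock(s, Q, "exclusive") for s in sites_with_Q)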

Locking protocols in distributed databases – Distributed protocols Quorum Consensus Protocol A generalization of both the majority and biased protocols Each site is assigned a weight. Let S be the total of all site weights Choose two values: a read quorum Qr and a write quorum Qw such that Qr + Qw > S and 2 * Qw > S Quorums can be chosen (and S computed) separately for each item Each read must lock enough replicas that the sum of the site weights is >= Qr Each write must lock enough replicas that the sum of the site weights is >= Qw For now we assume all replicas are written Extensions to allow some sites to be unavailable are described later
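
A minimal sketch of quorum-consensus locking. replicas is assumed to be a list of (site, weight) pairs, and request_lock is the same hypothetical per-site call as above.

def quorum_lock(Q, replicas, mode, Qr, Qw, request_lock):
    S = sum(w for _, w in replicas)
    assert Qr + Qw > S and 2 * Qw > S          # the two quorum conditions
    needed = Qr if mode == "shared" else Qw
    locked_weight = 0
    for site, weight in replicas:
        if request_lock(site, Q, mode):
            locked_weight += weight
        if locked_weight >= needed:
            return True                        # quorum reached
    return False                               # caller must wait or retry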

Deadlocks in distributed databases Deadlock can occur in distributed databases Even worse, deadlocks can be distributed Consider the following two transactions, with item X and transaction T1 at site 1, and item Y and transaction T2 at site 2: T1: write(X); write(Y) T2: write(Y); write(X)

Deadlocks in distributed databases However, the following schedule can occur:
T1: X-lock(X); Write(X); X-lock(Y) -- wait
T2: X-lock(Y); Write(Y); X-lock(X) -- wait
Now there is a deadlock between T1 and T2 However, at site 1 (where X resides), the only thing happening is T2 waiting for T1 At site 2 (where Y resides), the only thing happening is T1 waiting for T2 So no deadlock is detected at the individual sites

Deadlocks in distributed databases Deadlock detection needs to be more careful A local wait-for graph is constructed at each site A global wait-for graph combines the information from all sites Deadlock is detected from the global wait-for graph Notice that having no cycle in any local wait-for graph does not imply that the global wait-for graph has no cycle

Deadlocks in distributed databases (figure: the local wait-for graphs at each site and the combined global wait-for graph)

Deadlocks in distributed databases A global wait-for graph is constructed and maintained in a single site; the deadlock-detection coordinator Real graph: Real, but unknown, state of the system. Constructed graph: Approximation generated by the controller during the execution of its algorithm. The real graph can be unknown due to Network delays (changes are not propagated) Network partition

Deadlocks in distributed databases The global wait-for graph can be (re)constructed when: a new edge is inserted in or removed from one of the local wait-for graphs. a number of changes have occurred in a local wait-for graph. the coordinator needs to invoke cycle-detection. If the coordinator finds a cycle, it selects a victim and notifies all sites. The sites roll back the victim transaction.
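
A minimal sketch of what the detection coordinator does: union the local wait-for graphs reported by the sites, then run a depth-first search cycle check. Each local graph is assumed to be a dict mapping a waiting transaction to the set of transactions it waits for.

def build_global_graph(local_graphs):
    global_graph = {}
    for g in local_graphs:
        for t, waits_for in g.items():
            global_graph.setdefault(t, set()).update(waits_for)
    return global_graph

def has_cycle(graph):
    visiting, done = set(), set()
    def dfs(t):
        visiting.add(t)
        for u in graph.get(t, ()):
            if u in visiting:
                return True                   # back edge: a cycle, i.e. a deadlock
            if u not in done and dfs(u):
                return True
        visiting.discard(t)
        done.add(t)
        return False
    return any(t not in done and dfs(t) for t in graph)

# The two-site example above: site 1 reports only T2 -> T1, site 2 reports only
# T1 -> T2; neither local graph has a cycle, but the global graph does:
# has_cycle(build_global_graph([{"T2": {"T1"}}, {"T1": {"T2"}}]))  # True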

Deadlocks in distributed databases Limitations: false cycles Suppose the local wait-for graphs are as in the figure (not reproduced here); in the standard textbook example, site S1 has the edge T1 → T2 and site S2 has the edge T3 → T1 Now suppose T2 releases the resources it holds on S1 The edge from T1 to T2 should be deleted Then, T2 requests resources held by T3 on S2 An edge from T2 to T3 should be added at site S2 If the second message arrives at the coordinator before the first, a cycle T1 → T2 → T3 → T1 is detected while in fact there is no deadlock This can be avoided if (global) two-phase locking is maintained

Timestamp ordering in distributed databases Timestamp based techniques can be used in distributed databases Main issue: how to generate unique timestamps for transactions across multiple sites Solution: Each site generates a unique local timestamp using either a logical counter or the local clock. A globally unique timestamp is obtained by concatenating the unique local timestamp with the unique site identifier.

Timestamp ordering in distributed databases A site with a slow clock will assign smaller timestamps Still logically correct: serializability is not affected But: it “disadvantages” transactions from that site To fix this problem Define within each site Si a logical clock (LCi), which generates the unique local timestamp Require that Si advance its logical clock whenever a request is received from a transaction Ti with timestamp <x,y> and x is greater than the current value of LCi i.e. whenever a site sees a timestamp larger than its clock, it advances its clock accordingly In this case, site Si advances its logical clock to the value x + 1.
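
A minimal sketch of this scheme: a per-site logical counter concatenated with the site identifier, with the counter pushed forward whenever a larger remote timestamp is observed. The class and method names are illustrative, not from the source.

class SiteClock:
    def __init__(self, site_id):
        self.site_id = site_id      # unique site identifier (low-order part)
        self.lc = 0                 # logical counter LCi (high-order part)

    def new_timestamp(self):
        self.lc += 1
        return (self.lc, self.site_id)   # tuples compare lexicographically

    def observe(self, ts):
        x, _ = ts                   # timestamp <x, y> arriving with a request
        if x > self.lc:             # a faster site has been seen
            self.lc = x + 1         # advance LCi to x + 1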

Issues with replication Replication is useful in distributed databases Data at multiple sites → lower access times Data warehouses In some cases, no need to access the most recent version of the data However, replication may have adverse effects on consistency/isolation Reading different versions of data → non-serializability

Issues with replication E.g.: master-slave replication: updates are performed at a single “master” site, and propagated to “slave” sites. Propagation is not part of the update transaction: it is decoupled May be immediately after the transaction commits May be periodic Data may only be read at slave sites, not updated No need to obtain locks at any remote site Particularly useful for distributing information E.g. from a central office to branch offices Also useful for running read-only queries offline from the main database

Issues with replication Replicas should see a transaction-consistent snapshot of the database That is, a state of the database reflecting all effects of all transactions up to some point in the serialization order, and no effects of any later transactions. E.g. Oracle provides a create snapshot statement to create a snapshot of a relation or a set of relations at a remote site snapshot refresh either by recomputation or by incremental update Automatic refresh (continuous or periodic) or manual refresh

Issues with replication With multimaster replication (also called update-anywhere replication), updates are permitted at any replica, and are automatically propagated to all replicas Basic model in distributed databases, where transactions are unaware of the details of replication, and the database system propagates updates as part of the same transaction Coupled with two-phase commit Many systems support lazy propagation, where updates are transmitted after the transaction commits Allows updates to occur even if some sites are disconnected from the network, but at the cost of consistency

Issues with replication Two approaches to lazy propagation Updates at any replica are translated into an update at the primary site, and then propagated back to all replicas Updates to an item are ordered serially But transactions may read an old value of an item and use it to perform an update, resulting in non-serializability Updates are performed at any replica and propagated to all other replicas Causes even more serialization problems: the same data item may be updated concurrently at multiple sites!

Issues with replication Conflict detection is a problem Some conflicts due to the lack of distributed concurrency control can be detected when updates are propagated to other sites (will see later, in Section 23.5.4) Conflict resolution is very messy Resolution may require committed transactions to be rolled back Durability is violated Automatic resolution may not be possible, and human intervention may be required