04/20/2005Yan Huang - CSCI5330 Database Implementation – Distributed Database Systems Distributed Database Systems.

Slides:



Advertisements
Similar presentations
Types of Distributed Database Systems
Advertisements

V. Megalooikonomou Distributed Databases (based on notes by Silberchatz,Korth, and Sudarshan and notes by C. Faloutsos at CMU) Temple University – CIS.
ISOM Distributed Databases Arijit Sengupta. ISOM Learning Objectives Understand the concept and necessity of distributed databases Understand the types.
Distributed Databases John Ortiz. Lecture 24Distributed Databases2  Distributed Database (DDB) is a collection of interrelated databases interconnected.
Distributed databases
MIS 385/MBA 664 Systems Implementation with DBMS/ Database Management Dave Salisbury ( )
Chapter 13 (Web): Distributed Databases
Advanced Database Systems September 2013 Dr. Fatemeh Ahmadi-Abkenari 1.
Distributed Databases Logical next step in geographically dispersed organisations goal is to provide location transparency starting point = a set of decentralised.
Copyright © 2007 Ramez Elmasri and Shamkant B. Navathe Slide
Distributed Database Management Systems
Overview Distributed vs. decentralized Why distributed databases
Lecture-12 Concurrency Control in Distributed Databases
José Alferes Versão modificada de Database System Concepts, 5th Ed. ©Silberschatz, Korth and Sudarshan Chapters 22: Distributed Databases.
Chapter 18.2: Distributed Coordination Silberschatz, Galvin and Gagne ©2005 Operating System Concepts Chapter 18 Distributed Coordination Chapter.
©Silberschatz, Korth and Sudarshan19.1Database System Concepts Lecture-10 Distributed Database System A distributed database system consists of loosely.
Chapter 12 Distributed Database Management Systems
Distributed Databases
Concurrency Control in Distributed Databases
Distributed Databases
1 Distributed and Parallel Databases. 2 Distributed Databases Distributed Systems goal: –to offer local DB autonomy at geographically distributed locations.
Database Design – Lecture 16
04/18/2005Yan Huang - CSCI5330 Database Implementation – Distributed Database Systems Distributed Database Systems.
DISTRIBUTED DATABASE SYSTEM.  A distributed database system consists of loosely coupled sites that share no physical component  Database systems that.
1 Distributed Data Management Distributed Systems Department of Computer Science UC Irvine.
Distributed Databases By Dr.S.Sridhar, Ph.D.(JNUD), RACI(Paris, NICE), RMR(USA), RZFM(Germany) DIRECTOR ARUNAI ENGINEERING COLLEGE TIRUVANNAMALAI.
©Silberschatz, Korth and Sudarshan19.1Database System Concepts 1 Chapter 19: Distributed Databases Heterogeneous and Homogeneous Databases Distributed.
Session-8 Data Management for Decision Support
Lecture 16- Distributed Databases Advanced Databases Masood Niazi Torshiz Islamic Azad University- Mashhad Branch
10 1 Chapter 10 Distributed Database Management Systems Database Systems: Design, Implementation, and Management, Sixth Edition, Rob and Coronel.
Database Systems: Design, Implementation, and Management Tenth Edition Chapter 12 Distributed Database Management Systems.
Database Systems: Design, Implementation, and Management Ninth Edition Chapter 12 Distributed Database Management Systems.
Week 5 Lecture Distributed Database Management Systems Samuel ConnSamuel Conn, Asst Professor Suggestions for using the Lecture Slides.
Distributed Database Systems Overview
Concurrency Control in Distributed Databases Gul Sabah Arif.
Operating Systems Distributed Coordination. Topics –Event Ordering –Mutual Exclusion –Atomicity –Concurrency Control Topics –Event Ordering –Mutual Exclusion.
A/18-849B/95-811A/19-729A Internet-Scale Sensor Systems: Design and Policy Lecture 7 Part 1. Distributed Databases Part 2. IrisNet Query Processing.
Distributed Databases DBMS Textbook, Chapter 22, Part II.
Distributed DBMSs- Concept and Design Jing Luo CS 157B Dr. Lee Fall, 2003.
Distributed Databases
ASMA AHMAD 28 TH APRIL, 2011 Database Systems Distributed Databases I.
1 Distributed Databases BUAD/American University Distributed Databases.
Databases Illuminated
Lecture 15- Parallel Databases (continued) Advanced Databases Masood Niazi Torshiz Islamic Azad University- Mashhad Branch
Distributed database system
Topic Distributed DBMS Database Management Systems Fall 2012 Presented by: Osama Ben Omran.
MBA 664 Database Management Systems Dave Salisbury ( )
Distributed Transaction Management. Outline Introduction Concurrency Control Protocols  Locking  Timestamping Deadlock Handling Replication.
1 Transaction System Processes. 2 Data Servers Used in LANs, where there is a very high speed connection between the clients and the server, the client.
Chapter 19 Distributed Databases. 2 Distributed Database System n A distributed DBS consists of loosely coupled sites that share no physical component.
19.1Database System Concepts - 6 th Edition Chapter 19: Distributed Databases Heterogeneous and Homogeneous Databases Distributed Data Storage Distributed.
Introduction to Distributed Databases Yiwei Wu. Introduction A distributed database is a database in which portions of the database are stored on multiple.
 Distributed Database Concepts  Parallel Vs Distributed Technology  Advantages  Additional Functions  Distribution Database Design  Data Fragmentation.
Distributed DBMS, Query Processing and Optimization
©Silberschatz, Korth and Sudarshan18.1Database System Concepts 3 rd Edition 1 Chapter 18: Distributed Databases Distributed Data Storage Network Transparency.
Topics in Distributed Databases Database System Implementation CSE 507 Some slides adapted from Navathe et. Al and Silberchatz et. Al.
Distributed Databases
1 Chapter 22 Distributed DBMSs - Concepts and Design Simplified Transparencies © Pearson Education Limited 1995, 2005.
Database System Concepts, 5th Ed. ©Silberschatz, Korth and Sudarshan See for conditions on re-usewww.db-book.com Chapter 22: Distributed.
Distributed Databases – Advanced Concepts Chapter 25 in Textbook.
CHAPTER 25 - Distributed Databases and Client–Server Architectures
Distributed Databases
Lecture 17- Distributed Databases (continued)
Chapter 19: Distributed Databases
CSIS 7102 Spring 2004 Lecture 6: Distributed databases
Distributed Database Systems
MANAGING DATA RESOURCES
Distributed Databases
A View over Distributed databases
Introduction of Week 14 Return assignment 12-1
Presentation transcript:

04/20/2005Yan Huang - CSCI5330 Database Implementation – Distributed Database Systems Distributed Database Systems

04/18/2005Yan Huang - CSCI5330 Database Implementation – Distributed Database Systems Concurrency Control Global transaction automicity by 2PC or persistent message data DBMS data DBMS data DBMS data DBMS Distributed Database System How to handle concurrent transactions?

04/18/2005Yan Huang - CSCI5330 Database Implementation – Distributed Database Systems Concurrency Control Lock based Time stamp based Validation based

04/18/2005Yan Huang - CSCI5330 Database Implementation – Distributed Database Systems Single-Lock-Manager Approach data DBMS data DBMS data DBMS data DBMS Distributed Database System Designated lock manager

04/18/2005Yan Huang - CSCI5330 Database Implementation – Distributed Database Systems Single-Lock-Manager Approach The transaction can read the data item from any one of the sites at which a replica of the data item resides. Writes must be performed on all replicas of a data item Advantages of scheme:  Simple implementation  Simple deadlock handling Disadvantages of scheme are:  Bottleneck: lock manager site becomes a bottleneck  Vulnerability: system is vulnerable to lock manager site failure.

04/18/2005Yan Huang - CSCI5330 Database Implementation – Distributed Database Systems Distributed Lock Manager data DBMS data DBMS data DBMS data DBMS Distributed Database System Lock manager

04/18/2005Yan Huang - CSCI5330 Database Implementation – Distributed Database Systems Distributed Lock Manager Advantage: work is distributed and can be made robust to failures Disadvantage: deadlock detection is more complicated  Lock managers cooperate for deadlock detection

04/18/2005Yan Huang - CSCI5330 Database Implementation – Distributed Database Systems Dealing with Replica Primary copy Majority protocol Biased protocol Quorum consensus

04/18/2005Yan Huang - CSCI5330 Database Implementation – Distributed Database Systems Primary Copy Choose one replica of data item to be the primary copy.  Site containing the replica is called the primary site for that data item  Different data items can have different primary sites When a transaction needs to lock a data item Q, it requests a lock at the primary site of Q.  Implicitly gets lock on all replicas of the data item Benefit  Concurrency control for replicated data handled similarly to unreplicated data - simple implementation. Drawback  If the primary site of Q fails, Q is inaccessible even though other sites containing a replica may be accessible.

04/18/2005Yan Huang - CSCI5330 Database Implementation – Distributed Database Systems Majority Protocol In case of replicated data  If Q is replicated at n sites, then a lock request message must be sent to more than half of the n sites in which Q is stored.  The transaction does not operate on Q until it has obtained a lock on a majority of the replicas of Q.  When writing the data item, transaction performs writes on all replicas. Benefit  Can be used even when some sites are unavailable Need to handle writes in the presence of site failure Drawback  Requires 2(n/2 + 1) messages for handling lock requests, and (n/2 + 1) messages for handling unlock requests.  Potential for deadlock even with single item - e.g., each of 3 transactions may have locks on 1/3rd of the replicas of a data.

04/18/2005Yan Huang - CSCI5330 Database Implementation – Distributed Database Systems Biased Protocol Local lock manager at each site as in majority protocol, however, requests for shared locks are handled differently than requests for exclusive locks. Shared locks. When a transaction needs to lock data item Q, it simply requests a lock on Q from the lock manager at one site containing a replica of Q. Exclusive locks. When transaction needs to lock data item Q, it requests a lock on Q from the lock manager at all sites containing a replica of Q. Advantage - imposes less overhead on read operations. Disadvantage - additional overhead on writes

04/18/2005Yan Huang - CSCI5330 Database Implementation – Distributed Database Systems Quorum Consensus Protocol A generalization of both majority and biased protocols Each site is assigned a weight.  Let S be the total of all site weights Choose two values read quorum Q r and write quorum Q w  Such that Q r + Q w > S and 2 * Q w > S  Quorums can be chosen (and S computed) separately for each item Each read must lock enough replicas that the sum of the site weights is >= Q r Each write must lock enough replicas that the sum of the site weights is >= Q w

04/18/2005Yan Huang - CSCI5330 Database Implementation – Distributed Database Systems Local Global Deadlock Handling

04/18/2005Yan Huang - CSCI5330 Database Implementation – Distributed Database Systems Timestamping Timestamp based concurrency-control protocols can be used in distributed systems Each transaction must be given a unique timestamp

04/18/2005Yan Huang - CSCI5330 Database Implementation – Distributed Database Systems Distributed Query Processing For centralized systems, the primary criterion for measuring the cost of a particular strategy is the number of disk accesses. In a distributed system, other issues must be taken into account:  The cost of a data transmission over the network.  The potential gain in performance from having several sites process parts of the query in parallel.

04/18/2005Yan Huang - CSCI5330 Database Implementation – Distributed Database Systems Query Transformation Translating algebraic queries on fragments.  It must be possible to construct relation r from its fragments  Replace relation r by the expression to construct relation r from its fragments Consider the horizontal fragmentation of the account relation into account 1 =  branch-name = “Hillside” (account) account 2 =  branch-name = “Valleyview” (account) The query  branch-name = “Hillside” (account) becomes  branch-name = “Hillside” (account 1  account 2 ) which is optimized into  branch-name = “Hillside” (account 1 )   branch-name = “Hillside” (account 2 )

04/18/2005Yan Huang - CSCI5330 Database Implementation – Distributed Database Systems Example Query (Cont.) Since account 1 has only tuples pertaining to the Hillside branch, we can eliminate the selection operation. Apply the definition of account 2 to obtain  branch-name = “Hillside” (  branch-name = “Valleyview” (account) This expression is the empty set regardless of the contents of the account relation. Final strategy is for the Hillside site to return account 1 as the result of the query.

04/18/2005Yan Huang - CSCI5330 Database Implementation – Distributed Database Systems Simple Join Processing Consider the following relational algebra expression in which the three relations are neither replicated nor fragmented account depositor branch account is stored at site S 1 depositor at S 2 branch at S 3 For a query issued at site S I, the system needs to produce the result at site S I

04/18/2005Yan Huang - CSCI5330 Database Implementation – Distributed Database Systems Possible Query Processing Strategies Ship copies of all three relations to site S I and choose a strategy for processing the entire locally at site S I. Ship a copy of the account relation to site S 2 and compute temp 1 = account depositor at S 2. Ship temp 1 from S 2 to S 3, and compute temp 2 = temp 1 branch at S 3. Ship the result temp 2 to S I. Devise similar strategies, exchanging the roles S 1, S 2, S 3 Must consider following factors:  amount of data being shipped  cost of transmitting a data block between sites  relative processing speed at each site

04/18/2005Yan Huang - CSCI5330 Database Implementation – Distributed Database Systems Semijoin Strategy Let r 1 be a relation with schema R 1 stores at site S 1 Let r 2 be a relation with schema R 2 stores at site S 2 Evaluate the expression r 1 r 2 and obtain the result at S 1.  1. Compute temp 1   R1  R2 (r1) at S1.  2. Ship temp 1 from S 1 to S 2.  3. Compute temp 2  r 2 temp1 at S 2  4. Ship temp 2 from S 2 to S 1.  5. Compute r 1 temp 2 at S 1. This is the same as r 1 r 2.

04/18/2005Yan Huang - CSCI5330 Database Implementation – Distributed Database Systems Formal Definition The semijoin of r 1 with r 2, is denoted by: r 1 r 2 it is defined by:   R1 (r 1 r 2 ) Thus, r 1 r 2 selects those tuples of r 1 that contributed to  r 1 r 2. In step 3 above, temp 2 =r 2 r 1. For joins of several relations, the above strategy can be extended to a series of semijoin steps.

04/18/2005Yan Huang - CSCI5330 Database Implementation – Distributed Database Systems Join Strategies that Exploit Parallelism Consider r 1 r 2 r 3 r 4 where relation ri is stored at site S i. The result must be presented at site S 1. r 1 is shipped to S 2 and r 1 r 2 is computed at S 2 : simultaneously r 3 is shipped to S 4 and r 3 r 4 is computed at S 4 S 2 ships tuples of (r 1 r 2 ) to S 1 as they produced; S 4 ships tuples of (r 3 r 4 ) to S 1 Once tuples of (r 1 r 2 ) and (r 3 r 4 ) arrive at S 1 (r 1 r 2 ) (r 3 r 4 ) is computed in parallel with the computation of (r 1 r 2 ) at S 2 and the computation of (r 3 r 4 ) at S 4.

04/18/2005Yan Huang - CSCI5330 Database Implementation – Distributed Database Systems Heterogeneous Distributed Databases Many database applications require data from a variety of preexisting databases located in a heterogeneous collection of hardware and software platforms Data models may differ (hierarchical, relational, etc.) Transaction commit protocols may be incompatible Concurrency control may be based on different techniques (locking, timestamping, etc.) System-level details almost certainly are totally incompatible. A multidatabase system is a software layer on top of existing database systems, which is designed to manipulate information in heterogeneous databases  Creates an illusion of logical database integration without any physical database integration

04/18/2005Yan Huang - CSCI5330 Database Implementation – Distributed Database Systems Advantages Preservation of investment in existing  hardware  system software  Applications Local autonomy and administrative control Allows use of special-purpose DBMSs Step towards a unified homogeneous DBMS  Full integration into a homogeneous DBMS faces Technical difficulties and cost of conversion Organizational/political difficulties  Organizations do not want to give up control on their data  Local databases wish to retain a great deal of autonomy

04/18/2005Yan Huang - CSCI5330 Database Implementation – Distributed Database Systems Unified View of Data Agreement on a common data model  Typically the relational model Agreement on a common conceptual schema  Different names for same relation/attribute  Same relation/attribute name means different things Agreement on a single representation of shared data  E.g. data types, precision,  Character sets ASCII vs EBCDIC Sort order variations Agreement on units of measure Variations in names  E.g. Köln vs Cologne, Mumbai vs Bombay

04/18/2005Yan Huang - CSCI5330 Database Implementation – Distributed Database Systems Query Processing Several issues in query processing in a heterogeneous database Schema translation  Write a wrapper for each data source to translate data to a global schema  Wrappers must also translate updates on global schema to updates on local schema Limited query capabilities  Some data sources allow only restricted forms of selections E.g. web forms, flat file data sources  Queries have to be broken up and processed partly at the source and partly at a different site Removal of duplicate information when sites have overlapping information  Decide which sites to execute query Global query optimization