UNIT 12 Distributed Database


Contents
  12.1 Introduction
  12.2 The Twelve Objectives
  12.3 Problems of Distributed Database Systems
  12.4 Gateways
  12.5 Client/Server Systems

12.1 Introduction

Distributed Database System
  - A system involving multiple sites connected together via a communication network.
  - A user at any site can access data stored at any site.
  - Each site is a database system in its own right, with its own local database, local users, local DBMS, and local data communications (DC) manager.
[Fig. 12.1: A typical distributed database system. Each site comprises users, a communication manager, a DBMS, and a database; the sites are linked by a communication network.]

Distributed Database System (cont.)
Assumptions
  - The system is homogeneous, in the sense that each site runs its own copy of the same DBMS. (This can be relaxed to allow heterogeneous DBMSs.)
  - The communication network is slow, because the component sites are geographically dispersed.
Advantages
  - Enables the structure of the database to mirror that of the enterprise.
  - Combines efficiency of processing with increased accessibility.
Disadvantage
  - Complex to implement.

Distributed Database System (cont.)
Sample Systems
  Prototypes:
  - SDD-1: Computer Corporation of America (CCA), late 1970s and early 1980s.
  - R*: IBM, early 1980s.
  - Distributed INGRES: Berkeley, early 1980s.
  Commercial Products:
  - INGRES/STAR: Relational Technology Inc.
  - SQL*STAR: Oracle Corp.
  - DB2 version 2 release 2: IBM
  - SQL Server: Microsoft
A Fundamental Principle
  To the user, a distributed system should look exactly like a non-distributed system.

12.2 The Twelve Objectives

The Twelve Objectives
1. Local Autonomy
  - All operations at a given site are controlled by that site and should not depend on other sites.
  - Local data is locally owned and managed.
  - Not wholly achievable => sites should be autonomous to the maximum extent possible.
2. No Reliance on a Central Site
  - All sites must be treated as equals.
  - A central site may become a bottleneck.
3. Continuous Operation
  - Reliability
  - Availability
  - Never require the system to be shut down to perform some function, e.g. adding a new site.

The Twelve Objectives (cont.)
4. Location Independence (Location Transparency)
  - Users should not need to know at which site the data is stored, but should be able to behave as if the entire database were stored at their own local site.
  - A request for some remote data => the system should find the data automatically.
  Advantages
  <1> Simplifies user programs and activities.
  <2> Allows data to be moved from one site to another at any time without invalidating any programs or activities.
  <e.g.> Without location independence, a user at another site would have to name the site explicitly:
      SELECT S# FROM S AT SITE A WHERE SNAME = 'John'
  [Figure: three sites, A, B, and C]

The Twelve Objectives (cont.)
5. Fragmentation Independence (Fragmentation Transparency)
Data Fragmentation
  - A given logical object can be divided up into pieces (fragments) for physical storage purposes.
<e.g.> User perception of relation EMP:

  EMP#  DEPT#  SALARY
  E1    DX     45K
  E2    DY     40K
  E3    DZ     50K
  E4    DY     63K
  E5    DZ     40K

  Physical storage: a New York fragment and a London fragment, each stored at its own site.
[Fig. 12.2: An example of fragmentation.]

The Twelve Objectives (cont.)
Data Fragmentation
  - A fragment can be any subrelation derivable via restriction and projection (with the primary key retained).
  - Advantage: data can be stored at the location where it is most frequently used.
Fragmentation Independence
  - Users should be able to behave as if the relations were not fragmented at all.
  - One of the reasons why relational technology is suitable for distributed DBMSs.
  - Users should be presented with a view of the data => the system must support updates against join and union views.
  Advantages
  (1) Simplifies user programs and activities.
  (2) Allows data to be re-fragmented at any time.
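The restriction-and-union idea can be sketched in a few lines of Python. This is only an illustration: the rows are the EMP tuples from Fig. 12.2, but the fragmentation rule (departments DX and DZ in New York, DY in London) and the helper names `restrict` and `union` are assumptions made for the example, not part of the slides.

```python
# EMP relation from Fig. 12.2 (salary in thousands).
EMP = [
    {"EMP#": "E1", "DEPT#": "DX", "SALARY": 45},
    {"EMP#": "E2", "DEPT#": "DY", "SALARY": 40},
    {"EMP#": "E3", "DEPT#": "DZ", "SALARY": 50},
    {"EMP#": "E4", "DEPT#": "DY", "SALARY": 63},
    {"EMP#": "E5", "DEPT#": "DZ", "SALARY": 40},
]


def restrict(relation, predicate):
    """Relational restriction: keep the tuples that satisfy the predicate."""
    return [row for row in relation if predicate(row)]


def union(*fragments):
    """Rebuild the user's view as the union of fragments, keyed on EMP#."""
    by_key = {}
    for fragment in fragments:
        for row in fragment:
            by_key[row["EMP#"]] = row
    return sorted(by_key.values(), key=lambda row: row["EMP#"])


# Assumed fragmentation rule (hypothetical): split by department.
new_york = restrict(EMP, lambda r: r["DEPT#"] in ("DX", "DZ"))
london = restrict(EMP, lambda r: r["DEPT#"] == "DY")

# Fragmentation independence: the user sees the union, not the fragments.
user_view = union(new_york, london)
```

Because the two restrictions are disjoint and together cover every tuple, `user_view` contains exactly the rows of `EMP`, which is what lets the system re-fragment the data without the user noticing.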

The Twelve Objectives (cont.)
6. Replication Independence (Replication Transparency)
Data Replication
<e.g.> User perception of relation EMP:

  EMP#  DEPT#  SALARY
  E1    DX     45K
  E2    DY     40K
  E3    DZ     50K
  E4    DY     63K
  E5    DZ     40K

  Physical storage:

  New York fragment          London fragment
  EMP#  DEPT#  SALARY        EMP#  DEPT#  SALARY
  E1    DX     45K           E2    DY     40K
  E3    DZ     50K           E4    DY     63K
  E5    DZ     40K

  New York stores the New York fragment plus a replica (copy) of the London fragment.
  London stores the London fragment plus a replica (copy) of the New York fragment.
[Fig. 12.3: An example of replication.]

The Twelve Objectives (cont.)
Data Replication
  - A given fragment of a relation can be represented at the physical level by many distinct copies of the same object at many distinct sites.
  - Unit of replication: the fragment (which may not be a complete relation).
  - Advantage: better performance and availability.
  - Disadvantage: the update propagation problem.
Replication Independence
  - Users should be able to behave as if the data were not replicated at all.
  Advantages
  (1) Simplifies user programs and activities.
  (2) Allows replicas to be created and destroyed dynamically.

The Twelve Objectives (cont.)
7. Distributed Query Processing: message transfer cost, optimization
8. Distributed Transaction Management: concurrency control, recovery control
9. Hardware Independence: IBM, DEC, HP, PC, ...
10. Operating System Independence: VMS, UNIX, ...
11. Network Independence: BITNET, INTERNET, ARPANET, ...
12. DBMS Independence: relational, hierarchical, network, ...
  - The distributed system may be heterogeneous.

12.3 Problems of Distributed Database Systems

Basic Point: Networks are slow!
Overriding Objective: minimize the number and volume of messages.
This gives rise to problems in the following areas:
  - Query Processing
  - Update Propagation
  - Concurrency
  - Recovery
  - Catalog Management

Query Processing: Example
Query optimization is more important in a distributed system.
Example (Date, Vol. 2, p. 303)
Database:
  S (S#, CITY): 10,000 tuples, stored at site A.
  P (P#, COLOR): 100,000 tuples, stored at site B.
  SP (S#, P#): 1,000,000 tuples, stored at site A.
Assume each tuple is 100 bits long.
Site A holds S and SP; site B holds P.

Query Processing: Example (cont.)
Query: "Select S# for London suppliers of red parts"

  SELECT S.S#
  FROM S, P, SP
  WHERE S.CITY = 'London'
    AND S.S# = SP.S#
    AND SP.P# = P.P#
    AND P.COLOR = 'Red'

Estimates
  Number of red parts = 10
  Number of shipments by London suppliers = 100,000
Communication Assumptions
  Data rate = 10,000 bits per second
  Access delay = 1 second
T[i] = total communication time for strategy i
     = total access delay + total data volume / data rate
     = (number of messages * 1 sec) + (total number of bits / 10,000) sec

Query Processing: Example (cont.)
(Site A holds S and SP; site B holds P.)
Strategy 1
  1. Join S and SP at site A.
  2. Select the tuples of the join for which the city is 'London' (100,000 tuples).
  3. For each of those tuples, check site B to see whether the part is red (2 messages: 1 query, 1 response).
  T[1] = (100,000 * 2) * 1 = 200,000 sec, about 2.3 days
Strategy 2
  Move relations S and SP to site B and process the query at B.
  T[2] = 2 + (10,000 + 1,000,000) * 100 / 10,000 = 10,102 sec, about 2.8 hours
Strategy 3
  Move relation P to site A and process the query at A.
  T[3] = 1 + (100,000 * 100) / 10,000 = 1,001 sec, about 16.7 min

Query Processing: Example (cont.)
(Site A holds S and SP; site B holds P.)
Strategy 4
  1. Select the tuples from P where the color is red (10 tuples).
  2. For each one, check site A to see whether there exists a shipment relating the part to a London supplier (2 * 10 messages).
  T[4] = 2 * 10 * 1 = 20 sec
Strategy 5
  1. Select the tuples from P where the color is red (10 tuples).
  2. Move the result to site A and complete the processing at A.
  T[5] = 1 + (10 * 100) / 10,000 = 1.1 sec
Note: Each of the five strategies represents a plausible solution, but the variation in communication time is enormous.
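The cost formula T[i] = (number of messages * access delay) + (total bits / data rate) is easy to evaluate mechanically. The sketch below plugs in the assumptions stated in the example (10,000 bits per second, 1 second access delay, 100-bit tuples); the function name `comm_time` and the dictionary layout are choices made here for illustration only.

```python
DATA_RATE = 10_000    # bits per second, as assumed in the example
ACCESS_DELAY = 1.0    # seconds per message, as assumed in the example
TUPLE_BITS = 100      # every tuple is assumed to be 100 bits long


def comm_time(messages, bits):
    """Total communication time: access delays plus transfer time."""
    return messages * ACCESS_DELAY + bits / DATA_RATE


strategies = {
    # 1: one query and one response per London shipment (volume ignored)
    1: comm_time(2 * 100_000, 0),
    # 2: ship S and SP (two messages) to site B
    2: comm_time(2, (10_000 + 1_000_000) * TUPLE_BITS),
    # 3: ship P (one message) to site A
    3: comm_time(1, 100_000 * TUPLE_BITS),
    # 4: one query and one response per red part (volume ignored)
    4: comm_time(2 * 10, 0),
    # 5: ship the 10 red-part tuples (one message) to site A
    5: comm_time(1, 10 * TUPLE_BITS),
}

for number, seconds in strategies.items():
    print(f"T[{number}] = {seconds:,.1f} sec")
```

Under these assumptions the five strategies come to 200,000 sec, 10,102 sec, 1,001 sec, 20 sec, and 1.1 sec respectively, a spread of more than five orders of magnitude.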

Query Processing: Semijoin
Semijoin of A with B (used in SDD-1). Ref. p.529 [18.15], p.626 [21.26]
<e.g.> Database:
  S: 1,000 tuples, at site A
  SP: 2,000 tuples, at site B
  Number of tuples in S where S.S# = SP.S#: 100
  Length of an S tuple: 100 bits
  Length of an SP tuple: 100 bits
  Length of the S# field: 10 bits
Regular Join:
  <1> Ship S to site B (1,000 * 100 bits).
  <2> Join S and SP at site B.
  communication time = 1 + 1,000 * 100 / 10,000 = 11 sec

Query Processing: Semijoin (cont.)
<1> Site B:
  step 1. Project SP on S# (giving SP', at most 2,000 values of 10 bits each).
  step 2. Ship SP' to site A.
<2> Site A:
  step 3. Join SP' with S on S# (giving S', 100 tuples of 100 bits each).
  step 4. Ship the result S' to site B.
<3> Site B:
  step 5. Join S' with SP.
communication time = 1 + 10 * 2,000 / 10,000 + 1 + 100 * 100 / 10,000 = 1 + 2 + 1 + 1 = 5 sec
[Figure: site A holds S (1,000 tuples); site B holds SP (2,000 tuples). SP' (S# values such as S1, S4, ..., S921) travels from B to A, and S' (100 tuples) travels from A to B.]
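A small sketch can make both the cost comparison and the data flow concrete. The cost functions below reuse the figures from the example (1 second access delay, 10,000 bits per second); the `semijoin` function and its dict-based rows are hypothetical helpers introduced only to show which data would cross the network at each step.

```python
DATA_RATE = 10_000   # bits per second, as in the example
ACCESS_DELAY = 1     # seconds per message, as in the example


def regular_join_cost(s_tuples=1_000, s_tuple_bits=100):
    """Ship all of S to site B in one message."""
    return ACCESS_DELAY + s_tuples * s_tuple_bits / DATA_RATE


def semijoin_cost(sp_tuples=2_000, key_bits=10, matching_s=100, s_tuple_bits=100):
    """Ship the S# projection of SP to A, then the reduced S' back to B."""
    ship_keys = ACCESS_DELAY + sp_tuples * key_bits / DATA_RATE
    ship_reduced = ACCESS_DELAY + matching_s * s_tuple_bits / DATA_RATE
    return ship_keys + ship_reduced


def semijoin(s_rows, sp_rows):
    """Follow steps 1 to 5 on in-memory rows (lists of dicts with an 'S#' key)."""
    sp_keys = {sp["S#"] for sp in sp_rows}                  # steps 1-2: SP' sent to A
    s_reduced = [s for s in s_rows if s["S#"] in sp_keys]   # steps 3-4: S' sent to B
    return [{**s, **sp}                                     # step 5: final join at B
            for s in s_reduced for sp in sp_rows if s["S#"] == sp["S#"]]


# Tiny hypothetical data set to exercise the steps.
S = [{"S#": "S1", "CITY": "London"}, {"S#": "S2", "CITY": "Paris"}]
SP = [{"S#": "S1", "P#": "P1"}, {"S#": "S1", "P#": "P2"}]
joined = semijoin(S, SP)
```

With the example's figures, `regular_join_cost()` gives 11 seconds and `semijoin_cost()` gives 5 seconds. The semijoin only pays off when the projected keys and the reduced relation together are smaller than the relation that would otherwise be shipped whole.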

Update Propagation
Basic problem with data replication
  - An update to any given logical data object must be propagated to all stored copies of that object.
  - Some sites may be unavailable (because of site or network failure) at the time of the update => data is less available!
A possible solution: Primary Copy (used in Distributed INGRES)
  - One copy of each object is designated as the primary copy.
  - Primary copies of different objects will generally be at different sites.
  Update operation
  1. The update is complete as soon as the primary copy has been updated.
  2. Control is returned and the transaction can continue executing.
  3. The site holding the primary copy broadcasts the update to all other sites.
Further problem: violation of the local autonomy objective.
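The primary copy scheme can be sketched as a toy in-memory model. Everything here is hypothetical (the class name `ReplicatedObject`, the per-site pending queues, and the `propagate` call that stands in for the broadcast); it only illustrates that the update returns once the primary is written and that unavailable sites catch up later.

```python
class ReplicatedObject:
    """Toy model of one logical object replicated at several sites."""

    def __init__(self, sites, primary):
        self.primary = primary
        self.copies = {site: None for site in sites}
        # Updates not yet delivered to each non-primary site.
        self.pending = {site: [] for site in sites if site != primary}

    def update(self, value):
        """Steps 1-2: write the primary copy and return control at once."""
        self.copies[self.primary] = value
        for queue in self.pending.values():
            queue.append(value)
        return True

    def propagate(self, available_sites):
        """Step 3: deliver queued updates to the sites that are reachable."""
        for site in available_sites:
            queue = self.pending.get(site, [])
            if queue:
                self.copies[site] = queue[-1]   # latest value wins
                queue.clear()


obj = ReplicatedObject(sites=["A", "B", "C"], primary="A")
obj.update(42)           # completes even though B and C are untouched
obj.propagate(["B"])     # C is assumed to be down for now
```

After these calls the copies at A and B hold 42 while C still holds its old value, which shows the trade-off: the update stays available, but replicas are temporarily inconsistent and every update depends on the primary site.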

Concurrency
Most distributed systems are based on locking. Requests to test, set, and release locks are messages, and messages mean overhead.
<e.g.> Consider a transaction T that needs to update an object for which there exist replicas at n remote sites:
    n lock requests
    n lock grants
    n update messages
    n acknowledgments
    n unlock requests
    = 5n messages
  The total time can be several orders of magnitude greater than in a centralized system.
Solution: adopt the primary copy strategy.
  - The site holding the primary copy of X handles all locking operations involving X.
  - 1 lock request, 1 lock grant, n updates, n acknowledgments, and 1 unlock request: 2n + 3 messages (2n + 3 <= 5n).
Problem: loss of local autonomy.
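The two message counts can be compared directly. This is just the arithmetic from the slide written out; the function names are made up for the example.

```python
def messages_all_copies(n):
    """Lock, update, and unlock every one of the n replicas individually."""
    lock_requests = lock_grants = updates = acks = unlock_requests = n
    return lock_requests + lock_grants + updates + acks + unlock_requests  # 5n


def messages_primary_copy(n):
    """Lock only the primary copy; still send n updates and receive n acks."""
    lock_request = lock_grant = unlock_request = 1
    updates = acks = n
    return lock_request + lock_grant + updates + acks + unlock_request  # 2n + 3


# Message counts for a few replica counts: (n, all copies, primary copy).
comparison = [(n, messages_all_copies(n), messages_primary_copy(n))
              for n in (1, 2, 5, 10)]
```

For n = 1 the two schemes cost the same (5 messages); for n = 10 the primary copy strategy needs 23 messages instead of 50, and the saving grows linearly with n.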

Concurrency (cont.)
Global Deadlock Problem
  - Neither site can detect it using only information that is internal to that site, i.e. there is no cycle in either local wait-for graph, but there is a cycle in the global one.
<e.g.>
  Site X: T1x holds lock Lx; T2x waits for T1x to release Lx.
  Site Y: T2y holds lock Ly; T1y waits for T2y to release Ly.
  Across sites: T1x waits for T1y to complete; T2y waits for T2x to complete.
  - Global deadlock detection incurs further communication overhead.
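The example can be checked with a small wait-for-graph sketch. The graph encoding (a dict mapping each waiting transaction to the transactions it waits for) and the helper names `has_cycle` and `merge` are assumptions for illustration; real systems use a variety of detection schemes.

```python
def has_cycle(graph):
    """Depth-first search for a cycle in a directed wait-for graph."""
    WHITE, GREY, BLACK = 0, 1, 2
    color = {}

    def visit(node):
        color[node] = GREY
        for nxt in graph.get(node, ()):
            state = color.get(nxt, WHITE)
            if state == GREY:                 # back edge: a cycle exists
                return True
            if state == WHITE and visit(nxt):
                return True
        color[node] = BLACK
        return False

    return any(color.get(n, WHITE) == WHITE and visit(n) for n in list(graph))


def merge(*graphs):
    """Combine several wait-for graphs into one global graph."""
    merged = {}
    for graph in graphs:
        for node, targets in graph.items():
            merged.setdefault(node, []).extend(targets)
    return merged


# Edges read "waiter -> transaction being waited for".
site_x = {"T2x": ["T1x"]}                     # T2x waits for T1x to release Lx
site_y = {"T1y": ["T2y"]}                     # T1y waits for T2y to release Ly
cross_site = {"T1x": ["T1y"], "T2y": ["T2x"]} # waits for remote agents to complete

global_graph = merge(site_x, site_y, cross_site)
```

Here `has_cycle(site_x)` and `has_cycle(site_y)` are both false, while `has_cycle(global_graph)` is true. Building the global graph is exactly the step that requires the extra inter-site communication.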

Recovery
Two-Phase Commit
  - Important whenever a given transaction can interact with multiple, independent "resource managers" (sites).
  - The transaction issues a single system-wide COMMIT (or ROLLBACK), which is handled by a new system component: the Coordinator.
  - The coordinator goes through the following two-phase process:
    <1> The coordinator asks every participant to decide whether it can commit (reply OK) or must roll back.
    <2> If all participants reply OK, the coordinator broadcasts 'COMMIT'; otherwise it broadcasts 'ROLLBACK'. All participants must obey.
Points arising
  - The coordinator function must be performed by different sites for different transactions => no reliance on a central site. (P. 612)
  - The participants must do what they are told by the coordinator => loss of local autonomy.
  - The coordinator must communicate with the participants => more messages and more overhead.
  - No protocol can guarantee that all participants will commit or roll back in unison.
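The two phases can be sketched as follows. This is a deliberately minimal, failure-free model: the `Participant` class, its `prepare` and `finish` methods, and the `two_phase_commit` function are all hypothetical names, and the sketch omits logging, timeouts, and crash recovery, which are the hard parts of the protocol in practice.

```python
class Participant:
    """A site taking part in the transaction (hypothetical model)."""

    def __init__(self, name, can_commit=True):
        self.name = name
        self.can_commit = can_commit
        self.state = "ACTIVE"

    def prepare(self):
        """Phase 1: reply OK (True) if this site is able to commit."""
        return self.can_commit

    def finish(self, decision):
        """Phase 2: obey the coordinator's decision."""
        self.state = decision


def two_phase_commit(participants):
    """Coordinator logic: collect every vote, then broadcast the outcome."""
    votes = [p.prepare() for p in participants]             # phase 1
    decision = "COMMIT" if all(votes) else "ROLLBACK"
    for p in participants:                                  # phase 2
        p.finish(decision)
    return decision


ok_sites = [Participant("A"), Participant("B")]
mixed_sites = [Participant("A"), Participant("B", can_commit=False)]

outcome_ok = two_phase_commit(ok_sites)        # every site replied OK
outcome_mixed = two_phase_commit(mixed_sites)  # one site could not commit
```

A single negative vote forces every participant to roll back, including sites that were ready to commit, which is the loss of local autonomy noted above.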

Catalog Management
Contents of the catalog
  - Not only data regarding relations, indexes, users, etc., but also all the control information necessary for independence.
Where and how?
  1. Centralized: violates "no reliance on a central site".
  2. Fully replicated: loss of local autonomy.
  3. Partitioned: nonlocal operations are very expensive.
  4. Combination of (1) and (3): violates "no reliance on a central site".

12.4 Gateways

Gateways
DBMS independence
  - Suppose INGRES provides a "gateway" that makes ORACLE "look like" INGRES.
[Fig. 12.4: A hypothetical INGRES-provided gateway to ORACLE. An INGRES user issues INGRES SQL to INGRES/STAR, which reaches the INGRES database directly and the ORACLE database through the gateway (ORACLE SQL); together they appear as one distributed INGRES database.]

12.5 Client/Server Systems

Client/Server Architecture
[Fig. 12.5: A client/server system. Applications run on the client machine and the DBMS runs on the server machine, connected by transparent remote access.]
  - Some sites are client sites, and others are server sites.
  - Client: application or front end. Server: DBMS or back end.
  - There are a great many commercial client/server products, but few "true" general-purpose distributed systems (although the long-term trend might be important).
  - Several variations exist.

Client/Server Systems
Client/Server Standards
  - SQL/92 standard
  - ISO (International Organization for Standardization) Remote Data Access standard: RDA
  - Distributed Relational Database Architecture standard: DRDA
Client/Server Application Programming
  - Stored procedure: a precompiled program that is stored at the server site.
  - It is invoked from the client by a remote procedure call (RPC).
  - One stored procedure can be shared by many clients.
  - Optimization can be done at precompile time instead of at run time.
  - Provides better security.
  - However, there are no standards in this area.

ADVANCED DATABASE SYSTEMS
National Dong Hwa University, Department of Information Management
Prof. Weipang Yang

Contents: Part I
  Unit 0 Introduction to Database Systems
  Unit 1 Introduction to DBMS
  Unit 2 DB2 and SQL
  Unit 3 The Relational Model
  Unit 4 The Hierarchical Model
  Unit 5 The Network Model
  Unit 6 Access Methods
References:
  1. C. J. Date, An Introduction to Database Systems, 8th edition, 2004.
  2. J. D. Ullman, Principles of Database and Knowledge-Base Systems, Vol. I, 1988.
  3. Cited papers

Contents: Part II
  Unit 7 Logical Database Design
  Unit 8 Database Recovery
  Unit 9 Concurrency Control
  Unit 10 Security and Integrity
  Unit 11 Query Optimization
  Unit 12 Distributed Database
References: as for Part I (Date; Ullman; cited papers).

Contents: Part III
  Unit 13 Object-oriented Database
  Unit 14 Logic-Based Database
  Unit 15 Image Database
  Unit 16 Multimedia Database
  Unit 17 Multidatabase
  Unit 18 Real-time Database
  Unit 19 Parallel Database
  Unit 20 Temporal Database
  Unit 21 Active Database
  Unit 22 Bioinformatic Database
  Unit 23 ….

Study and Research on Databases
  Level 5: Doing Research
  Level 4: Survey Papers: Special Topics (Unit 13 - )
  Level 3: DBMS: Advanced Topics (Unit 7 - 12). Date, Vol. 1, 2; Ullman
  Level 2: DBMS: Fundamentals (Unit 1 - 6). Date, Vol. 1; using DB2; project: design a "mini" DBMS
  Level 1: Using DBMS

Where to find database papers?
Journals:
  - ACM Trans. on Database Systems (TODS)
  - IEEE Trans. on Software Engineering (TOSE)
  - IEEE Trans. on Knowledge and Data Engineering
  - ACM Trans. on Office Information Systems (TOOIS)
  - Journal of Information Science and Engineering
  - …
Conferences:
  - VLDB (Very Large Data Bases)
  - ACM SIGMOD
  - PODS (Principles of Database Systems)
  - IEEE Data Engineering
  - NCS, ICS (National/International Computer Symposium)