Distributed Database Systems


Distributed Database Systems Dr. Mohamed Osman Hegazi

Definitions
Distributed database (DDB): a collection of multiple logically interrelated databases distributed over a computer network.
Distributed database management system (DDBMS): the software that permits the management of the distributed database and makes the distribution transparent to the users.
Distributed database system (DDBS) = DDB + DDBMS
The two important terms in these definitions are:
- Logically interrelated (the application)
- Distributed over a network

Motivation for Distributed Databases
The development of computer networks promotes decentralization.
In a company, the database organization might reflect the organizational structure, which is distributed into units. Each unit maintains its own database.
Sharing of data can be achieved by developing a distributed database system which:
- Makes data accessible to all units
- Stores data close to where it is most frequently used

DDBMS Advantages
- Data are located near the site of greatest demand
- Faster data access
- Faster data processing
- Growth facilitation
- Improved communications
- Reduced operating costs
- User-friendly interface
- Less danger of a single-point failure
- Processor independence

DDBMS Disadvantages
- Complexity of management and control
- Security
- Lack of standards
- Increased storage requirements
- Greater difficulty in managing the data environment
- Increased training cost

The Concept of a DDB
A DDBS is not merely a collection of files that can be individually stored at each node of a computer network. To form a DDBS, the files should not only be logically related, but there should also be structure among the files, and access should be via a common interface.

Distributed Database Management Systems

An Example
EMP(ENO, ENAME, TITLE)
ASG(ENO, PNO, DUR, RESP)
PROJ(PNO, PNAME, BUDGET)
PAY(TITLE, SAL)

Distributed Query
If these tables are stored in one place, we can use, for example, the following query to get the name and salary of each employee who has worked on a project for more than 12 months:
  SELECT ENAME, SAL
  FROM EMP, ASG, PAY
  WHERE ASG.DUR > 12
    AND EMP.ENO = ASG.ENO
    AND PAY.TITLE = EMP.TITLE
But if these tables are distributed over different sites, executing the query requires a lot of additional processing. The DDBMS does this work and lets the end user feel like the only user of a single database (transparency).
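
The single-site case can be tried out directly. The following is a minimal sketch using Python's built-in sqlite3 module with an in-memory database. The sample rows and the helper names (build_sample_db, employees_over_12_months) are made up for illustration and are not from the slides; the ORDER BY clause is added only to make the output deterministic.

```python
import sqlite3


def build_sample_db():
    """Create the EMP, ASG and PAY tables in memory with illustrative rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE EMP(ENO TEXT, ENAME TEXT, TITLE TEXT);
        CREATE TABLE ASG(ENO TEXT, PNO TEXT, DUR INTEGER, RESP TEXT);
        CREATE TABLE PAY(TITLE TEXT, SAL INTEGER);
        INSERT INTO EMP VALUES ('E1', 'Ann', 'Engineer'), ('E2', 'Bob', 'Analyst');
        INSERT INTO ASG VALUES ('E1', 'P1', 24, 'Manager'), ('E2', 'P2', 6, 'Analyst');
        INSERT INTO PAY VALUES ('Engineer', 40000), ('Analyst', 34000);
    """)
    return conn


def employees_over_12_months(conn):
    """Return (ENAME, SAL) for employees with an assignment longer than 12 months."""
    query = """
        SELECT ENAME, SAL
        FROM EMP, ASG, PAY
        WHERE ASG.DUR > 12
          AND EMP.ENO = ASG.ENO
          AND PAY.TITLE = EMP.TITLE
        ORDER BY ENAME
    """
    return conn.execute(query).fetchall()


if __name__ == "__main__":
    print(employees_over_12_months(build_sample_db()))
```

In a distributed setting the three tables could live at three different sites, and the joins would require shipping data between them; that is the work the DDBMS hides from the user.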

Distributed Database Transparency
The concept of a DDB is to fragment the data and store each fragment at its own site. Data may also be replicated at different sites (replication). The DDBMS hides these details and makes the distribution transparent to the users.
Distributed database transparency features:
- Distribution transparency
- Transaction transparency
- Failure transparency
- Performance transparency
- Heterogeneity transparency

Distributed DB Design
Top-down approach: start from a single database and decide how to split it and allocate the parts to individual sites.
Two issues in top-down design:
- Fragmentation
- Allocation
Multi-databases (bottom-up approach): combine existing databases and decide how to deal with heterogeneity and autonomy.

Fragmentation
Horizontal
- Primary: depends on local attributes
- Derived: depends on a foreign relation
Vertical

Example
Employee relation E(#, name, loc, sal, …)
Motivation: two sites, Sa and Sb (Qa → Sa, Qb → Sb)
40% of queries:
  Qa: select * from E where loc=Sa and …
40% of queries:
  Qb: select * from E where loc=Sb and …

E
  #   Name   Loc  Sal
  5   Joe    Sa   10
  7   Sally  Sb   25
  8   Tom    Sa   15
  ..
F = {F1, F2}
F1 = σ_{loc=Sa}(E), stored at Sa
  #   Name   Loc  Sal
  5   Joe    Sa   10
  8   Tom    Sa   15
  ..
F2 = σ_{loc=Sb}(E), stored at Sb
  #   Name   Loc  Sal
  7   Sally  Sb   25
  ..
⇒ primary horizontal fragmentation
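
Primary horizontal fragmentation can be sketched in a few lines. The example below uses the three sample tuples listed for E and plain Python lists; the helper names select and reconstruct are hypothetical, chosen here for illustration.

```python
# Sample tuples of E(#, Name, Loc, Sal) as listed on the slide.
E = [
    (5, "Joe", "Sa", 10),
    (7, "Sally", "Sb", 25),
    (8, "Tom", "Sa", 15),
]


def select(relation, predicate):
    """Relational selection: keep the tuples that satisfy the predicate."""
    return [row for row in relation if predicate(row)]


# One selection predicate per site, on the Loc attribute (index 2).
F1 = select(E, lambda row: row[2] == "Sa")  # stored at Sa
F2 = select(E, lambda row: row[2] == "Sb")  # stored at Sb


def reconstruct(*fragments):
    """Rebuild the relation as the union of its horizontal fragments."""
    merged = set()
    for fragment in fragments:
        merged.update(fragment)
    return sorted(merged)


if __name__ == "__main__":
    print("F1 at Sa:", F1)
    print("F2 at Sb:", F2)
    complete = reconstruct(F1, F2) == sorted(E)
    disjoint = not (set(F1) & set(F2))
    print("complete:", complete, "disjoint:", disjoint)
```

The two checks at the end reflect the usual correctness conditions for a fragmentation: every tuple of E appears in some fragment, and no tuple appears in two of them.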

Qa: select … loc = SA …
Qb: select … loc = SB …
Candidate fragment predicates:
- loc=SA ∧ sal < 10
- loc=SA ∧ sal ≥ 10
- loc=SB ∧ sal < 10
- loc=SB ∧ sal ≥ 10
[Figure: candidate fragmentations F1, F2 and F3. Prefer F2 to F1 and F3.]

Horizontal fragmentation: the fragments stand in a peer-to-peer relationship (siblings).

Vertical Fragmentation
Example: E is split into E1 and E2.
R[T] ⇒ R1[T1], R2[T2], …, Rn[Tn], with Ti ⊆ T
⇒ Just like normalization of relations

Vertical Fragmentation Example
PROJ
  PNO  PNAME              BUDGET  LOC
  P1   Instrumentation    150000  Montreal
  P2   Database Develop.  135000  New York
  P3   CAD/CAM            250000  New York
  P4   Maintenance        310000  Paris
  P5   CAD/CAM            500000  Boston
PROJ1: information about project budgets
  PNO  BUDGET
  P1   150000
  P2   135000
  P3   250000
  P4   310000
  P5   500000
PROJ2: information about project names and locations
  PNO  PNAME              LOC
  P1   Instrumentation    Montreal
  P2   Database Develop.  New York
  P3   CAD/CAM            New York
  P4   Maintenance        Paris
  P5   CAD/CAM            Boston
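
The split into PROJ1 and PROJ2 amounts to two projections that both keep the key PNO, and the original relation is recovered by joining them on that key. The sketch below uses three rows of the PROJ example (P1, P3 and P4); the function names project and reconstruct are hypothetical.

```python
# PROJ(PNO, PNAME, BUDGET, LOC): three rows taken from the example.
PROJ = [
    ("P1", "Instrumentation", 150000, "Montreal"),
    ("P3", "CAD/CAM", 250000, "New York"),
    ("P4", "Maintenance", 310000, "Paris"),
]


def project(relation, columns):
    """Relational projection onto the given column indexes, without duplicates."""
    seen = []
    for row in relation:
        piece = tuple(row[i] for i in columns)
        if piece not in seen:
            seen.append(piece)
    return seen


PROJ1 = project(PROJ, (0, 2))     # PNO, BUDGET
PROJ2 = project(PROJ, (0, 1, 3))  # PNO, PNAME, LOC


def reconstruct(proj1, proj2):
    """Join the two fragments on PNO and restore the original column order."""
    budget_by_pno = dict(proj1)
    return [
        (pno, pname, budget_by_pno[pno], loc)
        for pno, pname, loc in proj2
        if pno in budget_by_pno
    ]


if __name__ == "__main__":
    print(PROJ1)
    print(PROJ2)
    print(reconstruct(PROJ1, PROJ2) == PROJ)
```

Repeating the key in every fragment is what makes the join lossless; a fragment without PNO could not be matched back to its rows.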

Grouping Attributes
Example: E(#, NM, LOC, SAL)
- Alternative 1: E1(#, NM, LOC), E2(#, SAL)
- Alternative 2: E1(#, NM), E2(#, LOC), E3(#, SAL)
Which is the right vertical fragmentation?

Vertical fragmentation: the fragments stand in a branch relationship (parent and children).

Hybrid Fragmentation
R is horizontally fragmented (HF) into R1 and R2.
R1 is vertically fragmented (VF) into R11 and R12.
R2 is vertically fragmented (VF) into R21, R22 and R23.

Allocation
Example: E ⇒ F1 = σ_{loc=Sa}(E); F2 = σ_{loc=Sb}(E)
[Figure: E is fragmented into F1 and F2; copies of the fragments are placed at Site a, Site b and Site c, with F1 appearing at more than one site.]
Do we replicate fragments?
Where do we place each copy of each fragment?

Allocation Alternatives
Non-replicated
- Partitioned: each fragment resides at only one site
Replicated
- Fully replicated: each fragment at each site
- Partially replicated: each fragment at some of the sites
Rule: if (read-only queries) / (update queries) ≥ 1, replication is advantageous; otherwise replication may cause problems.
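
The three alternatives can be told apart mechanically from a placement table. The sketch below is illustrative only: classify_allocation and replication_advisable are hypothetical helper names, and the second function reads the slide's rule as "the ratio of read-only queries to update queries is at least 1", which is an assumption about the intended threshold.

```python
def classify_allocation(placement, sites):
    """Classify a placement that maps each fragment to the set of sites holding a copy."""
    all_sites = set(sites)
    holders_list = [set(holders) for holders in placement.values()]
    if any(not holders for holders in holders_list):
        raise ValueError("every fragment must be stored at one site at least")
    if all(len(holders) == 1 for holders in holders_list):
        return "partitioned (non-replicated)"
    if all(holders == all_sites for holders in holders_list):
        return "fully replicated"
    return "partially replicated"


def replication_advisable(read_only_queries, update_queries):
    """Rule of thumb (assumed threshold): replicate when reads / updates >= 1."""
    if update_queries == 0:
        return True  # no updates, so copies never need to be synchronized
    return read_only_queries / update_queries >= 1


if __name__ == "__main__":
    sites = ["a", "b", "c"]
    print(classify_allocation({"F1": {"a"}, "F2": {"b"}}, sites))
    print(classify_allocation({"F1": {"a", "c"}, "F2": {"b"}}, sites))
    print(classify_allocation({"F1": set(sites), "F2": set(sites)}, sites))
    print(replication_advisable(80, 20), replication_advisable(10, 40))
```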

Optimization Problem
What is the best placement of fragments and/or the best number of copies to:
- minimize query response time
- maximize throughput
- minimize "some cost"
- ...
Subject to constraints:
- Available storage
- Available bandwidth, processing power, …
- Keep 90% of response times below X
Very hard problem
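
For a toy instance the problem can be solved by exhaustive search, which also shows why it becomes hard: the number of candidate placements grows as (number of sites) to the power (number of fragments), even before replication is considered. The sketch below is a simplification under stated assumptions, not a method from the slides: it places each fragment at exactly one site, uses storage capacity as the only constraint, and counts each remote access as one unit of cost. The function name best_placement and all numbers are hypothetical.

```python
from itertools import product


def best_placement(fragment_sizes, site_capacity, access_counts, remote_cost=1):
    """Search all non-replicated placements and return (cost, placement).

    fragment_sizes: {fragment: size}
    site_capacity:  {site: available storage}
    access_counts:  {(site, fragment): accesses issued at that site}
    Cost model (assumed): each access from a site that does not store the
    fragment costs remote_cost; local accesses are free.
    """
    fragments = sorted(fragment_sizes)
    sites = sorted(site_capacity)
    best = None
    for choice in product(sites, repeat=len(fragments)):
        placement = dict(zip(fragments, choice))
        # Storage constraint: total size placed at a site must fit its capacity.
        used = {site: 0 for site in sites}
        for fragment, site in placement.items():
            used[site] += fragment_sizes[fragment]
        if any(used[site] > site_capacity[site] for site in sites):
            continue
        cost = sum(
            count * remote_cost
            for (site, fragment), count in access_counts.items()
            if placement[fragment] != site
        )
        if best is None or cost < best[0]:
            best = (cost, placement)
    return best


if __name__ == "__main__":
    sizes = {"F1": 10, "F2": 10}
    capacity = {"Sa": 15, "Sb": 15}
    accesses = {("Sa", "F1"): 40, ("Sb", "F2"): 40, ("Sa", "F2"): 5, ("Sb", "F1"): 5}
    print(best_placement(sizes, capacity, accesses))
```

With these numbers each site can hold only one fragment, and the cheapest feasible placement keeps every fragment at the site that queries it most.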

Static Data Allocation and Dynamic Data Allocation
Static data allocation: the allocation sites do not change and no extra storage space is needed (the data does not grow in size).
Dynamic data allocation: the location of the data changes dynamically as the data expands, which usually follows from the nature of the systems producing the data.
Data allocation problems can be treated with two types of models:
- Adaptive models: apply when the location changes because of system activity, as in online systems that store data online (for example, airline bookings). These models keep additional temporary copies of the data, which are then handled as duplicate copies (replication).
- Non-adaptive models: solve the dynamic allocation when the system is established or when it is reorganized.

Replication
Replication means storing copies of the same data at more than one location (site). These copies must be updated consistently, despite the distance between them.
When the copies are updated is controlled by one of two techniques:
- Lazy replication: the replicas are updated after work on one of the copies (the master copy) has completed, so the update happens outside the boundaries of the transaction.
- Eager replication: the replicated data is updated within the transaction boundaries, while working on one of the copies.
Where updates are applied follows one of two approaches:
- Central update (primary copy): update the primary copy first and then the secondary copies. Because every update passes through the primary copy, the copies do not have to synchronize updates among themselves, which makes consistency easier to control but may turn the primary copy into a bottleneck.
- Update everywhere: any copy may be updated, so all copies have an equal opportunity to receive the update.
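
The difference between eager and lazy propagation can be shown with a toy primary-copy store. This is a minimal in-memory sketch, not a real replication protocol: the class PrimaryCopyStore and its methods are hypothetical, a single call to write stands in for a transaction, and failures and concurrent writers are ignored.

```python
class PrimaryCopyStore:
    """Toy primary-copy replication over in-memory dictionaries."""

    def __init__(self, replica_count, eager):
        self.primary = {}
        self.replicas = [{} for _ in range(replica_count)]
        self.eager = eager
        self.pending = []  # updates not yet propagated (lazy mode only)

    def write(self, key, value):
        """Apply an update at the primary copy, then propagate per the mode."""
        self.primary[key] = value
        if self.eager:
            # Eager: the replicas change before the write (the "transaction") returns.
            for replica in self.replicas:
                replica[key] = value
        else:
            # Lazy: remember the update and propagate it after the transaction.
            self.pending.append((key, value))

    def propagate(self):
        """Push queued updates to every replica, in the order they were made."""
        for key, value in self.pending:
            for replica in self.replicas:
                replica[key] = value
        self.pending.clear()

    def read(self, replica_index, key):
        """Read from one secondary copy; returns None if the key is absent."""
        return self.replicas[replica_index].get(key)


if __name__ == "__main__":
    eager = PrimaryCopyStore(replica_count=2, eager=True)
    eager.write("x", 1)
    print("eager replica sees:", eager.read(0, "x"))

    lazy = PrimaryCopyStore(replica_count=2, eager=False)
    lazy.write("x", 1)
    print("lazy replica before propagation:", lazy.read(0, "x"))
    lazy.propagate()
    print("lazy replica after propagation:", lazy.read(0, "x"))
```

In lazy mode a reader at a secondary copy can observe stale data until propagation runs, which is the price paid for shorter transactions.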

Replication strategies by where and when updates are applied
- Primary copy, eager: early solutions in Ingres
- Primary copy, lazy: Sybase/IBM/Oracle; placement strategies; serialization-graph based
- Update everywhere, eager: ROWA/ROWAA; quorum based; Oracle synchronous replication
- Update everywhere, lazy: Oracle Advanced Replication; weak consistency strategies

Distributed Query Processing
The aim of query processing over distributed data is to make the distributed data appear as a single database system. The problem can be broken down into several levels according to the data issues involved.
Query processing takes SQL or OQL statements as input and passes them through several stages until the query is executed.
Query decomposition and data localization: the first stage analyzes the query and translates it into relational algebra; the second stage localizes the data by distributing the query over the fragments.
Query optimization: the third stage seeks an optimal execution of the query by keeping the execution cost as low as possible and removing unneeded expressions. Query optimization is one of the most important aspects of dealing with queries, and many algorithms have been proposed for it.
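
Data localization can be illustrated with the horizontally fragmented relation E(#, Name, Loc, Sal). A query without a predicate on Loc has to read every fragment, while a query that fixes Loc can skip the fragments whose defining predicate contradicts it. The sketch below is a simplified illustration: the FRAGMENTS catalog and the functions localize and run_query are hypothetical, and only equality predicates on Loc are used for pruning.

```python
# Catalog of horizontal fragments, each defined by the Loc value it holds.
FRAGMENTS = {
    "F1": {"site": "Sa", "loc": "Sa", "rows": [(5, "Joe", "Sa", 10), (8, "Tom", "Sa", 15)]},
    "F2": {"site": "Sb", "loc": "Sb", "rows": [(7, "Sally", "Sb", 25)]},
}


def localize(loc=None):
    """Return the fragments a query must read; loc=None means no predicate on Loc."""
    return sorted(
        name
        for name, fragment in FRAGMENTS.items()
        if loc is None or fragment["loc"] == loc
    )


def run_query(loc=None, min_sal=0):
    """Evaluate: select * from E where Loc = loc and Sal >= min_sal."""
    result = []
    for name in localize(loc):
        # Every row of a selected fragment already satisfies the Loc predicate.
        for row in FRAGMENTS[name]["rows"]:
            if row[3] >= min_sal:
                result.append(row)
    return sorted(result)


if __name__ == "__main__":
    print(localize("Sa"))   # only the fragment stored at Sa is contacted
    print(localize())       # no predicate on Loc: all fragments are needed
    print(run_query(loc="Sa", min_sal=12))
```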


Concurrency Control in Distributed Databases
Concurrency control comprises the activities that keep transactions consistent across all the data in the system. The DDBMS must synchronize data that is distributed over the sites of the network, each of which runs programs that deal with its own data. This makes concurrency control for distributed data one of the more complex issues.
Four techniques are used to control concurrency in a distributed database:
- Locking techniques
- Timestamps
- Optimistic algorithms: all operations on the data are performed except those that update it; an update is applied to the local data first.
- More complex timestamp algorithms

Distributed Concurrency Control
Nonreplicated scheme
- Each site maintains a local lock manager to administer lock and unlock requests for local data
- Deadlock handling is more complex
Single-coordinator approach
- The system maintains a single lock manager that resides at a single chosen site
- Can be used with replicated data
- Advantages: simple implementation; simple deadlock handling
- Disadvantages: bottleneck; vulnerability
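
The two schemes differ only in where a lock request is sent. The sketch below models that routing with exclusive locks on named data items; it is a simplification with hypothetical class names (LockManager, SingleCoordinator, LocalLockManagers), and it ignores shared locks, waiting queues and deadlock detection.

```python
class LockManager:
    """Exclusive locks on named data items, held by transaction identifiers."""

    def __init__(self):
        self.owner = {}  # data item -> transaction holding the lock

    def lock(self, txn, item):
        """Grant the lock if the item is free or already held by txn."""
        holder = self.owner.get(item)
        if holder is None or holder == txn:
            self.owner[item] = txn
            return True
        return False  # the caller has to wait or abort

    def unlock(self, txn, item):
        """Release the lock if txn holds it."""
        if self.owner.get(item) == txn:
            del self.owner[item]


class SingleCoordinator:
    """Single-coordinator approach: every request goes to one lock manager."""

    def __init__(self):
        self.manager = LockManager()

    def lock(self, txn, item):
        return self.manager.lock(txn, item)

    def unlock(self, txn, item):
        self.manager.unlock(txn, item)


class LocalLockManagers:
    """Nonreplicated scheme: an item is locked at the site that stores it."""

    def __init__(self, item_site):
        self.item_site = dict(item_site)  # data item -> site storing it
        self.managers = {site: LockManager() for site in set(self.item_site.values())}

    def lock(self, txn, item):
        return self.managers[self.item_site[item]].lock(txn, item)

    def unlock(self, txn, item):
        self.managers[self.item_site[item]].unlock(txn, item)


if __name__ == "__main__":
    central = SingleCoordinator()
    print(central.lock("T1", "Q"), central.lock("T2", "Q"))

    local = LocalLockManagers({"Q": "Sa", "R": "Sb"})
    print(local.lock("T1", "Q"), local.lock("T2", "R"), local.lock("T2", "Q"))
```

With a single coordinator the whole lock table sits in one place, which is why deadlock handling is simple and why that site becomes both a bottleneck and a single point of failure.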

Distributed Concurrency Control (continued)
Majority protocol
- A lock manager runs at each site
- When a transaction wishes to lock a data item Q that is replicated at n different sites, it must send a lock request to more than half of the n sites at which Q is stored
- Complex to implement
- Difficult to handle deadlocks
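
The majority rule can be sketched as follows. This is an illustrative simplification rather than a full protocol: the names SiteLockTable and majority_lock are hypothetical, the request is sent to all n replica sites instead of a chosen majority, locks are exclusive, and a transaction that fails to reach a majority immediately gives back the locks it did obtain (a design choice made here to keep the example free of waiting and deadlock handling).

```python
class SiteLockTable:
    """Lock table of one site holding a replica of the data item."""

    def __init__(self):
        self.owner = {}  # data item -> transaction holding the lock at this site

    def try_lock(self, txn, item):
        """Grant the local lock if the item is free or already held by txn."""
        holder = self.owner.get(item)
        if holder is None or holder == txn:
            self.owner[item] = txn
            return True
        return False

    def release(self, txn, item):
        """Release the local lock if txn holds it."""
        if self.owner.get(item) == txn:
            del self.owner[item]


def majority_lock(txn, item, replica_sites):
    """Lock item if more than half of its replica sites grant the request."""
    granted = [site for site in replica_sites if site.try_lock(txn, item)]
    if 2 * len(granted) > len(replica_sites):
        return True
    for site in granted:
        site.release(txn, item)  # back off: a minority of locks is useless
    return False


if __name__ == "__main__":
    sites = [SiteLockTable() for _ in range(3)]
    print(majority_lock("T1", "Q", sites))  # T1 obtains all three local locks
    print(majority_lock("T2", "Q", sites))  # T2 obtains none and fails
```

Because two transactions cannot both hold more than half of the n local locks, at most one of them can lock Q at a time, even though no single site decides.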