Distributed Databases

Slides:



Advertisements
Similar presentations
Database Systems: Design, Implementation, and Management
Advertisements

ISOM Distributed Databases Arijit Sengupta. ISOM Learning Objectives Understand the concept and necessity of distributed databases Understand the types.
Distributed Databases John Ortiz. Lecture 24Distributed Databases2  Distributed Database (DDB) is a collection of interrelated databases interconnected.
Distributed databases
Transaction.
MIS 385/MBA 664 Systems Implementation with DBMS/ Database Management Dave Salisbury ( )
Chapter 13 (Web): Distributed Databases
Chapter 12 (Online): Distributed Databases
Distributed Databases Logical next step in geographically dispersed organisations goal is to provide location transparency starting point = a set of decentralised.
Distributed Database Management Systems
Overview Distributed vs. decentralized Why distributed databases
Distributed Systems Fall 2009 Replication Fall 20095DV0203 Outline Group communication Fault-tolerant services –Passive and active replication Highly.
1 © Prentice Hall, 2002 Chapter 13: Distributed Databases Modern Database Management 6 th Edition Jeffrey A. Hoffer, Mary B. Prescott, Fred R. McFadden.
©Silberschatz, Korth and Sudarshan19.1Database System Concepts Lecture-10 Distributed Database System A distributed database system consists of loosely.
Chapter 12 Distributed Database Management Systems
McGraw-Hill/Irwin Copyright © 2007 by The McGraw-Hill Companies, Inc. All rights reserved. Chapter 17 Client-Server Processing, Parallel Database Processing,
Chapter 13 (Web): Distributed Databases
Definition of terms Definition of terms Explain business conditions driving distributed databases Explain business conditions driving distributed databases.
Distributed databases
Distributed Databases
DISTRIBUTED DATABASES AND DDBMS.  Understand the concept of “Distributed Data”  Describe various Distributed Data and DDBMS implementations  Explain.
1 Distributed and Parallel Databases. 2 Distributed Databases Distributed Systems goal: –to offer local DB autonomy at geographically distributed locations.
IMS 4212: Distributed Databases 1 Dr. Lawrence West, Management Dept., University of Central Florida Distributed Databases Business needs.
Database Design – Lecture 16
1 Chapter 13: Distributed Databases. Chapter 13 2 Definitions Distributed Database: A single logical database that is spread physically across computers.
04/18/2005Yan Huang - CSCI5330 Database Implementation – Distributed Database Systems Distributed Database Systems.
DISTRIBUTED DATABASE SYSTEM.  A distributed database system consists of loosely coupled sites that share no physical component  Database systems that.
Lecture 11 Distributed Databases and Cloud computing
Massively Distributed Database Systems - Distributed DBS Spring 2014 Ki-Joune Li Pusan National University.
Lecture 5: Sun: 1/5/ Distributed Algorithms - Distributed Databases Lecturer/ Kawther Abas CS- 492 : Distributed system &
Session-8 Data Management for Decision Support
Database Systems: Design, Implementation, and Management Tenth Edition Chapter 12 Distributed Database Management Systems.
Database Systems: Design, Implementation, and Management Ninth Edition Chapter 12 Distributed Database Management Systems.
Week 5 Lecture Distributed Database Management Systems Samuel ConnSamuel Conn, Asst Professor Suggestions for using the Lecture Slides.
Distributed Database Systems Overview
G063 - Distributed Databases. Learning Objectives: By the end of this topic you should be able to: explain how databases may be stored in more than one.
Oracle's Distributed Database Bora Yasa. Definition A Distributed Database is a set of databases stored on multiple computers at different locations and.
Distributed Databases DBMS Textbook, Chapter 22, Part II.
Distributed Databases
ASMA AHMAD 28 TH APRIL, 2011 Database Systems Distributed Databases I.
1 Distributed Databases BUAD/American University Distributed Databases.
Databases Illuminated
Distributed Database. Introduction A major motivation behind the development of database systems is the desire to integrate the operational data of an.
Distributed database system
Distributed Databases
Topic Distributed DBMS Database Management Systems Fall 2012 Presented by: Osama Ben Omran.
MBA 664 Database Management Systems Dave Salisbury ( )
Distributed Database: Part 2. Distributed DBMS Distributed database requires distributed DBMS Distributed database requires distributed DBMS Functions.
© 2011 Pearson Education, Inc. Publishing as Prentice Hall 1 Chapter 12 (Online): Distributed Databases Modern Database Management 10 th Edition Jeffrey.
Introduction to Distributed Databases Yiwei Wu. Introduction A distributed database is a database in which portions of the database are stored on multiple.
Distributed Database Management Systems. Reading Textbook: Ch. 1, Ch. 3 Textbook: Ch. 1, Ch. 3 For next class: Ch. 4 For next class: Ch. 4 FarkasCSCE.
 Distributed Database Concepts  Parallel Vs Distributed Technology  Advantages  Additional Functions  Distribution Database Design  Data Fragmentation.
Distributed DBMS, Query Processing and Optimization
Chapter 1 Database Access from Client Applications.
1 Chapter 22 Distributed DBMS Concepts and Design CS 157B Edward Chen.
1 Information Retrieval and Use De-normalisation and Distributed database systems Geoff Leese September 2008, revised October 2009.
Database Administration: The Complete Guide to Practices and Procedures Chapter 19 Data Movement and Distribution.
© 2009 Pearson Education, Inc. Publishing as Prentice Hall 1 Lecture 11 Distributed Databases Modern Database Management 9 th Edition Jeffrey A. Hoffer,
DISTRIBUTED DATABASES AND DDBMS. Learning Objectives  Describe various DDBMS implementations  Explain how database design affects the DDBMS environment.
Distributed Databases
1 Chapter 22 Distributed DBMSs - Concepts and Design Simplified Transparencies © Pearson Education Limited 1995, 2005.
LM 9. Distributed Database Dr. Lei Li 1. Note: The content of the slides including figures are mainly based on a publicly available textbook chapter:
Introduction & Uses of Distributed Database
Distributed DBMS Concepts of Distributed DBMS
Chapter 19: Distributed Databases
Distributed Databases
Distributed Databases
A View over Distributed databases
Distributed Databases
Presentation transcript:

Distributed Databases

Objectives key terms in the distributed database area Distributed vs. Decentralized Database Homogenous vs. Heterogeneous Decentralized Database Location transparency vs. local autonomy Asynchronous vs. Synchronous distributed databases Horizontal vs. Vertical partitioning Full refresh vs. differential refresh Push replication vs. Pull replication Local transaction vs. Global Transaction

Objectives Describe salient characteristics of distributed database environments Explain advantages and risks of distributed databases Explain strategies and options for distributed database design Discuss synchronous and asynchronous data replication and partitioning Discuss optimized query processing in distributed databases

Distributed vs. Decentralized Database Both are stored on computers in multiple locations Distributed Database Geographical distribution of a SINGLE database Decentralized Database A collection of independent databases on non-networked computers Users at various sites cannot share data

Distributed Database Require multiple DBMS running at remote sites There are different types of distributed database environments The degree to which these DBMS cooperate Having a master site to coordinate requests involving data from multiple sites

Reasons for Distributed Database Distribution and Autonomy of Business Units Departments/Facilities are geographically distributed Each has the authority to create and control own data Business mergers create this environment Data sharing Consolidate data across local databases on demand. Data communication costs and reliability Economical and reliable to locate data where needed. High cost for remote transactions / large data volumes Dependence on data communications can be risky

Reasons for Distributed Database Multiple application vendor environment Each unit may have different vendor applications A distributed DBMS can provide functionality that cuts across separate applications Database recovery Replicating data on separate computers may ensure that a damaged database can be quickly recovered

Homogeneous vs. Heterogeneous Distributed Database Homogeneous Distributed Database - The same DBMS is used at each node Difficult for most organizations to force a homogeneous environment Heterogeneous Distributed Database Potentially different DBMS are used at each node Much more difficult to manage

Typical Homogeneous Environment Data distributed across all the nodes. Same DBMS at each node. A central DBMS coordinates database access and update across the notes No exclusively local data All access is through one, global schema. The global schema is the union of all the local schema.

Figure 13-2 – Homogeneous Database Everyone is a GLOBAL user Identical DBMSs

Typical Heterogeneous Environment Data distributed across all the nodes. Different DBMSs may be used at each node. Local access is done using the local DBMS and schema. Remote access is done using the global schema.

Figure 13-3 –Typical Heterogeneous Environment Local user accesses his own data Non-identical DBMSs

Major Objectives of Distributed Database Allow users to share data yet be able to operate independently when network link fails. Location Transparency User does not have to know the location of the data Data requests automatically forwarded to appropriate sites Local Autonomy Local site can operate with its database when network connections fail Each site controls its own data, security, logging, recovery

Trade-Offs in Distributed Database When do you update data across the database? Synchronous Distributed Database All copies of the same data are always identical Updates apply immediately to all copies throughout network Good for data integrity High overhead  slow response times Asynchronous Distributed Database Some data inconsistency is tolerated Data update propagation is delayed Lower data integrity Less overhead  faster response time

Advantages of Distributed Database Increased reliability and availability Even when a component fails the database may continue to function albeit at a reduced level Allow Local control over data. Local control promotes data integrity and administration Modular growth Easy to add a connection to a new location Less chance of disrupting existing users during expansion Lower communication costs. Faster response for certain queries. Query local data Parallel queries

Disadvantages of Distributed Database Software cost and complexity. Processing overhead. Data integrity exposure. Slower response for certain queries. If data are not distributed properly, according to their usage, or if queries are not formulated correctly, queries can be extremely slow

Options for Distributing a Database Data Replication Horizontal Partitioning Vertical Partitioning Combinations of the above

Data Replication Advantages Reliability – if one node fails, you can find data at another node Fast response at sites that have a full copy May avoid complicated distributed transaction integrity routines (if replicated data is refreshed at scheduled intervals.) De-couples nodes -transactions proceed even if some nodes are down. Reduced network traffic at prime time, if updates can be delayed to non-primetime hours

Data Replication Disadvantages - Storage requirements Complexity and cost of updating. Integrity exposure of getting incorrect data if replicated data is not updated simultaneously.

Data Replication Best for non-volatile/static, non-collaborative data Catalogs Telephone directories Train Schedules Not good for on-line applications Airline reservations ATM transactions

Types of Data Replication Push Replication Updating site sends changes to other sites Pull Replication Receiving sites control when update messages will be processed

Types of Push Replication Snapshot Replication Changes periodically sent to master site Master collects updates in log Near Real-Time Replication Broadcast update orders without requiring confirmation Update messages stored in message queue until processed by receiving site

Issues in Data Replication Use Data timeliness – high tolerance for out-of-date data may be required DBMS capabilities – if DBMS cannot support multi-node queries, replication may be necessary Performance implications – refreshing may cause performance problems for busy nodes Network heterogeneity – complicates replication Network communication capabilities – complete refreshes place heavy demand on telecommunications

Horizontal Partitioning Different rows of a table at different sites Advantages - Data stored close to where it is used  efficiency Local access optimization  better performance Only relevant data is available  security Unions across partitions  ease of query Disadvantages Accessing data across partitions  inconsistent access speed If no data replication  backup vulnerability

Vertical Partitioning Different columns of a table at different sites Advantages and disadvantages are the same as for horizontal partitioning except that combining data across partitions is more difficult because it requires joins (instead of unions)

Factors in Choice of Distributed Strategy No approach to data distribution is ALWAYS best Choice depends on Funding, autonomy, security. Site data referencing patterns. Growth and expansion needs. Technological capabilities. Costs of managing complex technologies. Need for reliable service.

Distributed DBMS Distributed database requires distributed DBMS Functions of a distributed DBMS: Locate data with a distributed data dictionary Determine location from which to retrieve data and process query components DBMS translation between nodes with different local DBMSs (handle heterogeneous DBMS translation using middleware) Data consistency (via multiphase commit protocols) Global primary key control Scalability Security, concurrency, query optimization, failure recovery

Distributed DBMS Data Reference Local Transaction - references local data. Global Transaction - references non-local data.

Distributed DBMS Architecture

Distributed DBMS Transparency Objectives Location Transparency User/application does not need to know where data resides Replication Transparency User/application does not need to know about duplication Failure Transparency Either all of the actions of a transaction are committed or else none of them is committed. If a transaction fails at one site it don’t commit at other sites A system should detect a failure (broken communication link, erroneous data, disk head crash), reconfigure the system and recover Each site has a transaction manager Logs transactions and before and after images Requires special commit protocol

Failure Transparency Two-Phase Commit Commit Protocol: Ensures that a global transaction is either successfully completed at each site or else aborted. Two-Phase Commit Prepare Phase: Check if operation ok at all participating sites Commit Phase: Only if all participating sites agree, do you issue the commite

Distributed DBMS Transparency Objectives Concurrency Transparency Allow multiple users to run transactions concurrently, with each transaction appears as if it is the only activity in the system Timestamping Ensure that even if two events occur simultaneously at different sites, each will have a unique timestamp. Alternative to locks in distributed databases

Distributed DBMS Vendors Oracle Microsoft Informix Sybase IBM Computer Associates Ingress Others……