DISTRIBUTED COMPUTING

Sunita Mahajan, Principal, Institute of Computer Science, MET League of Colleges, Mumbai
Seema Shah, Principal, Vidyalankar Institute of Technology, Mumbai University

Chapter 12: Distributed Database Management System

Topics
- Introduction
- Distributed DBMS architectures
- Data storage in a distributed DBMS
- Distributed catalog management
- Distributed query processing
- Distributed transactions
- Distributed concurrency control
- Distributed database recovery
- Mobile databases
- Case study: Distribution and replication in Oracle

Introduction

Distributed Database Concepts
- Distributed database (DDB)
- Distributed database management system (DDBMS)
- Distributed processing
- Parallel database
- Advantages of DDBMS
- Disadvantages of DDBMS

Nationalized Bank’s Database
A distributed database is a logically interrelated collection of shared data that is physically distributed over a computer network.

Distributed Database Management Systems
The database is split into multiple fragments stored at different nodes (sites).
Characteristics of a DDBMS:
- A collection of logically related shared data
- The data is split into fragments, and fragments can be replicated
- Fragments and replicas are allocated to one or more sites
- All sites are interconnected by a network
- Local applications at each site are handled by the on-site DBMS
- Each site's DBMS takes part in at least one global application

Distributed Database
Transparencies in a distributed database:
- Distribution transparency
- Replication transparency
- Fragmentation transparency
Data resides in databases at individual nodes.

Distributed Processing
Difference between distributed processing and a distributed DBMS:
- Distributed processing: a set of processing units networked together to access centralized data.
- Distributed DBMS: the data is fragmented across multiple nodes but accessed as a single, homogeneous entity.

Distributed processing: data resides in a centralized database.

Parallel DBMS (1): shared-memory architecture

Parallel DBMS (2): shared-disk and shared-nothing architectures

Advantages of DDBMS
- Reflects the organizational structure
- Improved shareability and local autonomy
- Improved availability and reliability
- Improved performance
- Improved economics
- Modular growth

Disadvantages of DDBMS
- Complexity
- Cost
- Security
- More difficult integrity control
- Lack of proper standards
- Lack of experience
- More complex design

Functions of DDBMS
- Communication services to provide remote data access
- Keeping track of data
- System catalog management
- Distributed query processing
- Replicated data management
- Distributed database recovery
- Security
- Distributed directory management

Types of Distributed Databases
- Homogeneous DDBMS
- Heterogeneous DDBMS
- Multidatabase systems (MDBS)

Homogeneous and heterogeneous DDBMS

Multidatabase systems

An MDBS can be classified as unfederated or federated.

Distributed DBMS Architectures

Distributed DBMS Architectures
- Client-server architecture
- Collaborating server architecture
- Middleware architecture


Data Storage in DDBMS

Data Storage in DDBMS
A single relation may be fragmented across several sites, replicated at several sites, or both.
Objectives for the definition and allocation of fragments:
- Locality of reference
- Improved reliability and availability
- Acceptable performance
- Balanced storage capacities and costs
- Minimal communication costs

Data Allocation
Motivation for data allocation:
- Increased availability of data
- Faster query evaluation
Strategies for data allocation:
- Centralized
- Partitioned (fragmented)
- Complete replication
- Selective replication

A Comparison of Data Allocation Strategies

Fragmentation
Why fragment?
- Usage
- Efficiency
- Parallelism
- Security
Disadvantages of fragmentation:
- Performance
- Integrity

Fragmentation
- Horizontal fragmentation
- Vertical fragmentation
Correctness rules: completeness, reconstruction, disjointness
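A minimal Python sketch of the two fragmentation types and the three correctness rules, using a hypothetical ACCOUNT relation (acc_no, branch, balance) held as a list of tuples. Horizontal fragments are selections reconstructed by union; vertical fragments are projections that each keep the key (column 0) and are reconstructed by joining on it. The relation, the predicates, and all helper names are illustrative assumptions.

```python
# Hypothetical ACCOUNT relation: (acc_no, branch, balance); acc_no is the key.
ACCOUNT = [
    (101, "Mumbai", 500.0),
    (102, "Pune", 1200.0),
    (103, "Mumbai", 75.0),
]


def horizontal(rows, predicate):
    """Horizontal fragment: the tuples that satisfy a selection predicate."""
    return [r for r in rows if predicate(r)]


def vertical(rows, cols):
    """Vertical fragment: projection onto `cols`, always keeping the key (column 0)."""
    return [tuple(r[c] for c in (0, *cols)) for r in rows]


def union(*fragments):
    """Reconstruct a horizontally fragmented relation."""
    return sorted({r for f in fragments for r in f})


def join_on_key(left, right):
    """Reconstruct a vertically fragmented relation by joining on the key."""
    right_by_key = {r[0]: r[1:] for r in right}
    return sorted(l + right_by_key[l[0]] for l in left if l[0] in right_by_key)


h1 = horizontal(ACCOUNT, lambda r: r[1] == "Mumbai")
h2 = horizontal(ACCOUNT, lambda r: r[1] != "Mumbai")
# Completeness and reconstruction: every tuple is in some fragment; the union gives R back.
assert union(h1, h2) == sorted(ACCOUNT)
# Disjointness: no tuple is stored in both horizontal fragments.
assert not set(h1) & set(h2)

v1 = vertical(ACCOUNT, (1,))  # (acc_no, branch)
v2 = vertical(ACCOUNT, (2,))  # (acc_no, balance)
# Reconstruction: joining on the repeated key restores R.
assert join_on_key(v1, v2) == sorted(ACCOUNT)
```

For vertical fragments, disjointness is relaxed for the key: it must be repeated in every fragment, otherwise the join could not rebuild the relation.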

Replication
Some relations are replicated and stored at multiple sites. Replication increases the availability of data and speeds up query evaluation.

Distributed Catalog Management
Catalog approaches:
- Centralized global catalog
- Replicated global catalog
- Dispersed catalog
- Local-master catalog
Issues:
- Naming objects
- Catalog structure
- Distributed data independence

Naming objects
- Every data item must have a system-wide unique name.
- It should be possible to locate a data item efficiently.
- It should be possible to change a data item's location transparently.
- Each site should be able to create data items autonomously.
Solution: use names with multiple fields, a local name field and a birth site field.
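A small Python sketch of the two-field naming scheme, assuming hypothetical `GlobalName` and `Site` classes. Each site checks uniqueness only against its own local names, yet the resulting names are unique system-wide because the birth site field differs between sites. The `name@site` string form is an illustrative display format, not a standard.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GlobalName:
    """System-wide name made of a local name field and a birth site field."""
    local_name: str   # chosen autonomously by the creating site
    birth_site: str   # site where the object was created; never changes

    def __str__(self) -> str:
        # Hypothetical display format.
        return f"{self.local_name}@{self.birth_site}"


class Site:
    def __init__(self, site_id: str) -> None:
        self.site_id = site_id
        self._local_names = set()

    def create(self, local_name: str) -> GlobalName:
        # Uniqueness is checked only against this site's own names:
        # no other site has to be consulted.
        if local_name in self._local_names:
            raise ValueError(f"{local_name!r} already exists at {self.site_id}")
        self._local_names.add(local_name)
        return GlobalName(local_name, self.site_id)


s1, s2 = Site("S1"), Site("S2")
a = s1.create("Account")
b = s2.create("Account")
assert a != b  # same local name, different birth sites
```

Because the birth site records where an object was created rather than where it currently lives, moving a fragment or replica later does not change its global name.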

Catalog Structure (R* distributed database project)
- Each site maintains a local catalog for all copies of data stored at that site.
- The catalog at an object's birth site keeps track of where its replicas and fragments are located.
- This catalog contains a precise description of each replica's contents: the list of columns for a vertical fragment, or the selection condition for a horizontal fragment.

Distributed Data Independence
- Queries should be written without regard to how a relation is fragmented or replicated.
- Users need not specify the full name of the data objects accessed when a query is evaluated.
- A user may create a synonym for a global relation name to refer to relations created by other users.
- The DBMS maintains a table of synonyms as part of the system catalog.

Distributed Query Processing

Distributed query processing
- Non-join queries in a DDBMS
- Joins in a DDBMS: semijoins and Bloomjoins
- Cost-based query optimization, whose challenges include minimizing communication costs and preserving the autonomy of individual sites
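A minimal Python sketch of a semijoin and a Bloomjoin between two sites, with in-memory lists standing in for the sites. It assumes hypothetical relations R(sid, name) at site A and S(sid, bid) at site B, with the result needed at A. The semijoin ships R's join keys to B, reduces S there, and ships back only the matching tuples; the Bloomjoin ships a k-bit vector instead of the key set, at the price of possible false positives. The `shipped` count mixes keys and tuples and is only a crude, hypothetical cost measure.

```python
# Hypothetical data: R(sid, name) is stored at site A, S(sid, bid) at site B,
# and the join result is needed at site A.
R = [(1, "ann"), (2, "bob"), (3, "cy")]
S = [(2, 10), (2, 11), (4, 12), (5, 13), (6, 14), (7, 15), (8, 16)]


def local_join(r, s):
    """Join R and S on sid (performed at site A)."""
    bids = {}
    for sid, bid in s:
        bids.setdefault(sid, []).append(bid)
    return [(sid, name, bid) for sid, name in r for bid in bids.get(sid, [])]


def semijoin(r, s):
    """Ship R's join keys to B, reduce S there, ship the reduction back."""
    keys = {sid for sid, _ in r}              # site A -> site B
    reduced = [t for t in s if t[0] in keys]  # S semijoin R, computed at site B
    shipped = len(keys) + len(reduced)        # values/tuples sent over the network
    return local_join(r, reduced), shipped


def bloomjoin(r, s, k=8):
    """Like a semijoin, but ship a k-bit vector instead of the key set."""
    bits = [False] * k
    for sid, _ in r:
        bits[hash(sid) % k] = True            # site A -> site B: k bits
    # Hash collisions may let non-matching tuples through; the final join drops them.
    reduced = [t for t in s if bits[hash(t[0]) % k]]
    return local_join(r, reduced), len(reduced)


result, cost = semijoin(R, S)
assert result == local_join(R, S) == [(2, "bob", 10), (2, "bob", 11)]
assert cost < len(S)  # 5 values shipped instead of the 7 tuples of S
```

Both strategies trade an extra round of messages for less data shipped; they pay off when only a small fraction of S joins with R.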

Updating Distributed Data

Distributed transactions
- The atomicity of global transactions must be ensured.
- The ACID properties must hold: atomicity, consistency, isolation, durability.
- The modules present at each site are the transaction manager, scheduler, buffer manager, recovery manager, and transaction coordinator.

Distributed transactions

Distributed Concurrency Control

Distributed Concurrency Control
Some definitions:
- Schedule: a sequence of operations by a set of concurrent transactions
- Serial schedule: the operations of each transaction are executed without any interleaving from other transactions
- Non-serial schedule: operations from a set of transactions are interleaved
- Locking: a procedure to control concurrent access to the database
- Shared lock: allows the holder only to read the data item
- Exclusive lock: allows the holder to read and update the data item
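A minimal Python sketch of a single-site lock table enforcing the shared/exclusive rules above: a shared lock is compatible only with other shared locks, and an exclusive lock is compatible with nothing. The `LockTable` class and its methods are hypothetical; a real lock manager would also queue waiting requests instead of just refusing them.

```python
from enum import Enum


class Mode(Enum):
    SHARED = "S"
    EXCLUSIVE = "X"


# COMPATIBLE[(held, requested)]: can `requested` be granted while another
# transaction holds the item in mode `held`?
COMPATIBLE = {
    (Mode.SHARED, Mode.SHARED): True,
    (Mode.SHARED, Mode.EXCLUSIVE): False,
    (Mode.EXCLUSIVE, Mode.SHARED): False,
    (Mode.EXCLUSIVE, Mode.EXCLUSIVE): False,
}


class LockTable:
    def __init__(self):
        self.holders = {}  # item -> {txn: mode}

    def request(self, txn, item, mode):
        """Grant the lock and return True, or return False (caller must wait)."""
        held = self.holders.setdefault(item, {})
        others = [m for t, m in held.items() if t != txn]
        if all(COMPATIBLE[(h, mode)] for h in others):
            # Grant or upgrade; never downgrade an exclusive lock to shared.
            if held.get(txn) != Mode.EXCLUSIVE:
                held[txn] = mode
            return True
        return False

    def release_all(self, txn):
        for held in self.holders.values():
            held.pop(txn, None)


locks = LockTable()
assert locks.request("T1", "x", Mode.SHARED)
assert locks.request("T2", "x", Mode.SHARED)         # readers share the item
assert not locks.request("T3", "x", Mode.EXCLUSIVE)  # a writer must wait
```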

Objectives of concurrency control
All concurrency control mechanisms must preserve data consistency and complete each atomic action in finite time. Important capabilities are:
- Be resilient to site and communication link failures
- Allow parallelism to meet performance requirements
- Keep overhead low and perform well despite communication delays
- Place few constraints on the structure of atomic actions

Distributed serializability
If the local schedule at every site is serializable, the global schedule is also serializable, provided that the local serialization orders are identical.
Two major approaches to concurrency control:
- Locking guarantees that concurrent execution is equivalent to some (unpredictable) serial execution of the transactions.
- Timestamping guarantees that concurrent execution is equivalent to the specific serial execution defined by the timestamps.

Locking protocols
- Centralized 2PL (two-phase locking)
- Primary copy 2PL
- Distributed 2PL
- Majority locking
- Biased protocol
- Quorum consensus protocol
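The quorum consensus protocol assigns each site a weight; with S the total weight, a read quorum Q_r and a write quorum Q_w must satisfy Q_r + Q_w > S and 2 * Q_w > S. A minimal Python sketch of these conditions follows, assuming equal weights of 1 in the example. Majority locking and the biased protocol appear as special cases; the function names are hypothetical.

```python
def valid_quorums(weights, q_read, q_write):
    """Check the quorum consensus conditions for one replicated item."""
    total = sum(weights.values())
    # Every read quorum overlaps every write quorum, so a read sees the latest write.
    reads_see_writes = q_read + q_write > total
    # Any two write quorums overlap, so two conflicting writes cannot both proceed.
    writes_conflict = 2 * q_write > total
    return reads_see_writes and writes_conflict


def has_quorum(weights, locked_sites, quorum):
    """True if the sites on which a lock was obtained carry enough total weight."""
    return sum(weights[s] for s in locked_sites) >= quorum


sites = {"S1": 1, "S2": 1, "S3": 1}
assert valid_quorums(sites, 2, 2)      # majority locking
assert valid_quorums(sites, 1, 3)      # biased: read one copy, write all copies
assert not valid_quorums(sites, 1, 2)  # a read could miss the latest write
assert has_quorum(sites, {"S1", "S3"}, 2)
```

Raising a site's weight lets it satisfy more of a quorum on its own, which is how the protocol can favor well-connected or reliable sites.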

Timestamp protocol
The objective is to order transactions globally so that older transactions (smaller timestamps) get priority in the event of a conflict.
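A minimal Python sketch of basic timestamp ordering, one common timestamp protocol. Global timestamps are assumed to be (local counter, site id) pairs, so timestamps from different sites are unique and comparable. Conflicting operations must run in timestamp order, which makes older transactions come first in the equivalent serial schedule; an operation that arrives too late makes its transaction restart. The class and function names are hypothetical.

```python
class Abort(Exception):
    """Raised when an operation arrives too late for the timestamp order."""


def global_timestamp(local_counter, site_id):
    # (local counter, site id): unique across sites; ties broken by site id.
    return (local_counter, site_id)


class Item:
    def __init__(self, value=None):
        self.value = value
        # (0, "") is smaller than any timestamp with a counter >= 1.
        self.read_ts = (0, "")   # largest timestamp that has read the item
        self.write_ts = (0, "")  # timestamp of the last writer


def read(ts, item):
    if ts < item.write_ts:
        # A younger transaction already overwrote the value this one should see.
        raise Abort(ts)
    item.read_ts = max(item.read_ts, ts)
    return item.value


def write(ts, item, value):
    if ts < item.read_ts or ts < item.write_ts:
        # A younger transaction already read or wrote the item.
        raise Abort(ts)
    item.write_ts = ts
    item.value = value


x = Item(0)
old, young = global_timestamp(1, "S1"), global_timestamp(2, "S2")
read(young, x)
try:
    write(old, x, 7)  # too late: a younger transaction has read x
except Abort:
    pass  # the older transaction restarts with a new timestamp
```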

Distributed deadlock management
Deadlocks can be avoided, prevented, or detected. Approaches to deadlock detection:
- Centralized deadlock detection
- Hierarchical deadlock detection
- Distributed deadlock detection

Deadlock example
Consider three transactions T1, T2, and T3 at sites S1, S2, and S3. Objects x, y, and z are replicated at all three sites; x1 denotes the copy of x at S1, y2 the copy of y at S2, and z3 the copy of z at S3.

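Deadlock Example (cont.)
At time t1, T1 sets a shared lock on x, T2 sets an exclusive lock on y, and T3 sets a shared lock on z. At t2, T1 requests an exclusive lock on y, but T2 already holds an exclusive lock on y, so T1 has to wait. At t3, T2 requests an exclusive lock on z, but T3 holds a shared lock on z, so T2 has to wait. At t3, T3 requests an exclusive lock on x, but T1 holds a shared lock on x, so T3 has to wait. T1 waits for T2, T2 waits for T3, and T3 waits for T1: the transactions are deadlocked, but the cycle only appears when the wait-for information from S1, S2, and S3 is combined.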

Wait-for graphs (WFG)
Phantom deadlocks are deadlocks that are detected but do not actually exist; they are caused by delays in propagating local wait-for information.

Centralized deadlock detection
- A single site is designated as the deadlock detection coordinator (DDC).
- The DDC is responsible for constructing and maintaining the global WFG.
- Each lock manager periodically sends its local WFG to the DDC.
- The DDC builds the global WFG and checks it for cycles.
- If a cycle is detected, the DDC breaks it by rolling back a selected transaction.
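A minimal Python sketch of a deadlock detection coordinator, applied to the deadlock example above. Each local WFG is a dict mapping a waiting transaction to the transactions it waits for; the local graphs are assumed to be those of the example (T3 waits for T1 at S1, T1 for T2 at S2, T2 for T3 at S3). Each local graph is acyclic, but their union has a cycle, which the DDC breaks. The class name and the victim policy (highest transaction id) are hypothetical; real systems typically weigh transaction age, work done, or locks held.

```python
def find_cycle(graph):
    """Return one cycle in a wait-for graph {waiter: [holders]} as a list, or None."""
    state = {}  # txn -> "active" while on the DFS path, "done" once fully explored
    path = []

    def visit(t):
        state[t] = "active"
        path.append(t)
        for u in graph.get(t, ()):
            if state.get(u) == "active":
                return path[path.index(u):]  # back edge: u ... t forms a cycle
            if u not in state:
                cycle = visit(u)
                if cycle:
                    return cycle
        path.pop()
        state[t] = "done"
        return None

    for t in list(graph):
        if t not in state:
            cycle = visit(t)
            if cycle:
                return cycle
    return None


class DeadlockDetectionCoordinator:
    """Hypothetical DDC: merges local WFGs into a global WFG and breaks cycles."""

    def __init__(self):
        self.local_wfgs = {}  # site -> {waiter: [holders]}

    def receive(self, site, wfg):
        # Each lock manager sends (a copy of) its local WFG.
        self.local_wfgs[site] = {w: list(h) for w, h in wfg.items()}

    def global_wfg(self):
        merged = {}
        for wfg in self.local_wfgs.values():
            for waiter, holders in wfg.items():
                edges = merged.setdefault(waiter, [])
                for h in holders:
                    if h not in edges:
                        edges.append(h)
        return merged

    def roll_back(self, txn):
        # Remove the victim's wait edges and every edge pointing at it.
        for wfg in self.local_wfgs.values():
            wfg.pop(txn, None)
            for holders in wfg.values():
                if txn in holders:
                    holders.remove(txn)

    def detect_and_resolve(self):
        victims = []
        while (cycle := find_cycle(self.global_wfg())) is not None:
            victim = max(cycle)  # hypothetical policy: highest transaction id
            self.roll_back(victim)
            victims.append(victim)
        return victims


# Local WFGs from the deadlock example: each is acyclic on its own.
local_wfgs = {
    "S1": {"T3": ["T1"]},  # T3 waits for T1's shared lock on x1
    "S2": {"T1": ["T2"]},  # T1 waits for T2's exclusive lock on y2
    "S3": {"T2": ["T3"]},  # T2 waits for T3's shared lock on z3
}
ddc = DeadlockDetectionCoordinator()
for site, wfg in local_wfgs.items():
    ddc.receive(site, wfg)
print(ddc.detect_and_resolve())  # ['T3']
```

Because the DDC works on snapshots that arrive at different times, it can report the phantom deadlocks described earlier if a transaction has already released its locks by the time its edges are merged.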

Hierarchical deadlock detection
S1, S2, S3, and S4 are the sites where transactions take place. DD12 denotes the deadlock detector for sites 1 and 2, and so on.

Distributed Deadlock detection
T_ext is an external node added to a local WFG to indicate that a transaction has an agent at a remote site.

Distributed database recovery

Distributed database recovery
Failures in a distributed environment that affect recovery:
- Loss of messages
- Failure of a communication link
- Failure of a site
- Network partitioning
Distributed recovery protocols:
- Two-phase commit (2PC)
- Three-phase commit (3PC)

Network partitioning
If a site cannot communicate with another site, any one of these failures may be the cause, and the site generally cannot tell which one has occurred.

Two-phase commit
- A transaction is divided into several subtransactions.
- One node acts as the coordinator; all other nodes are participants (subordinates).
- 2PC operates in two phases: Phase 1, voting; Phase 2, decision (termination).
Voting phase:
- The coordinator sends a prepare-to-commit message to the participants.
- Each participant responds with yes or no.
Decision phase:
- If the coordinator receives yes from every participant, it sends a commit message; otherwise it sends abort.
- Each participant must acknowledge the commit or abort message.
- The coordinator writes an end log record after receiving acknowledgements from all participants.
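A minimal in-memory Python sketch of the two phases above. It assumes no network, failures, or timeouts: messages are method calls, and appending to a `log` list stands in for forcing a log record to stable storage. The `Participant` class and `two_phase_commit` function are hypothetical names.

```python
from dataclasses import dataclass, field


@dataclass
class Participant:
    name: str
    votes_yes: bool = True
    log: list = field(default_factory=list)  # stands in for the stable log

    def on_prepare(self):
        if self.votes_yes:
            self.log.append("prepare")  # force-write before voting yes
            return "yes"
        self.log.append("abort")  # a no vote is a unilateral abort
        return "no"

    def on_decision(self, decision):
        if self.log[-1] != decision:
            self.log.append(decision)
        return "ack"


def two_phase_commit(coordinator_log, participants):
    # Phase 1 (voting): send prepare to every participant and collect votes.
    votes = [p.on_prepare() for p in participants]
    decision = "commit" if all(v == "yes" for v in votes) else "abort"
    # Phase 2 (decision): force-write the decision, then send it to everyone.
    coordinator_log.append(decision)
    acks = [p.on_decision(decision) for p in participants]
    # Write the end record once every participant has acknowledged.
    if all(a == "ack" for a in acks):
        coordinator_log.append("end")
    return decision


coord_log = []
sites = [Participant("S1"), Participant("S2", votes_yes=False)]
assert two_phase_commit(coord_log, sites) == "abort"
```

A single no vote is enough to abort, and the coordinator's decision record is written before any decision message leaves, which is what makes the outcome recoverable after a crash.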

2PC discussed
- Two-phase commit exchanges two rounds of messages: voting and termination.
- Before a message is sent, the log record describing the sender's decision is forced to stable storage.
- A transaction is committed when the coordinator's commit log record reaches stable storage.
- 2PC assumes a fail-stop model: failed sites simply stop working.

Site crash: recovery procedure
When a site comes back up, the recovery procedure checks the log for each transaction T:
- If a commit record exists, redo T; if an abort record exists, undo T.
- If there is a prepare log record but no commit or abort record, repeatedly contact the coordinator to find the status of T.
- If there is no prepare, commit, or abort record, abort and undo T.
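The restart rules above can be sketched as one Python decision function. It assumes the site's log records for T are given as a list of strings, and that `ask_coordinator` is a hypothetical callback returning "commit", "abort", or None when the coordinator cannot be reached.

```python
import time


def recover(log_records, ask_coordinator, retry_delay=0.0):
    """Decide what a restarting site does with transaction T, from T's log records."""
    if "commit" in log_records:
        return "redo"
    if "abort" in log_records:
        return "undo"
    if "prepare" in log_records:
        # In doubt: this site voted yes, so only the coordinator knows the outcome.
        while (decision := ask_coordinator()) is None:
            time.sleep(retry_delay)  # keep asking; T stays blocked meanwhile
        return "redo" if decision == "commit" else "undo"
    # No prepare record: this site never voted yes, so it may abort unilaterally.
    return "undo"


answers = iter([None, None, "commit"])  # coordinator unreachable twice
assert recover(["prepare"], lambda: next(answers)) == "redo"
```

The in-doubt branch is the source of 2PC blocking: a site that voted yes cannot decide on its own.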

Recovery procedure (cont.)
- If the coordinator fails before the participants receive the decision, transaction T is blocked until the coordinator recovers.
- If a remote site does not respond during the commit protocol, either the communication link or the site has failed.
Actions taken by the current site:
- If it is the coordinator of T, abort T.
- If it is a participant that has not voted yes, abort T.
- If it is a participant that has voted yes, T is blocked until the coordinator responds.

2PC with Presumed Abort
Basic observations about 2PC protocols:
- Ack messages are used only to let the coordinator know that all participants are aware of the decision.
- If the coordinator site fails after sending prepare but before writing a commit or abort record, it has no information about T when it comes back up, so it is free to abort T.
- If a subtransaction does no updates, it has no changes to commit or abort; it is a reader.

2PC with Presumed Abort (cont.)
- When the coordinator aborts a transaction, it can undo T immediately, so the default decision is abort.
- No acknowledgement is needed after an abort message.
- Abort log records need not be forced; they can simply be appended to the log tail.
- If a subtransaction does no updates, it responds that it is a reader and writes no log record.
- The coordinator treats a reader response as a yes vote.
- If all subtransactions are readers, the second phase is not required.
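A minimal Python sketch of the coordinator's second-phase plan under presumed abort. It assumes votes arrive as "yes", "no", or "reader", and, as a simplifying assumption of this sketch, sends the decision only to participants that voted yes: readers have nothing to commit, and no voters have already aborted. The function name and return shape are hypothetical.

```python
def presumed_abort_decision(votes):
    """Coordinator's phase-2 plan under 2PC with presumed abort (a sketch).

    votes maps each participant to "yes", "no", or "reader".
    Returns (decision, participants sent a phase-2 message, acks awaited).
    """
    if all(v == "reader" for v in votes.values()):
        # Read-only transaction: nothing to commit anywhere, so skip phase 2.
        return "commit", [], False
    # Only participants that voted yes are in doubt and need the decision.
    in_doubt = [p for p, v in votes.items() if v == "yes"]
    if all(v in ("yes", "reader") for v in votes.values()):
        # A reader counts as yes; a commit must be acknowledged.
        return "commit", in_doubt, True
    # Abort is the default: no acks are needed and the record need not be forced.
    return "abort", in_doubt, False


assert presumed_abort_decision({"A": "reader", "B": "reader"}) == ("commit", [], False)
```

The savings come from the abort path and from read-only participants, which together avoid forced writes and acknowledgement messages.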

Three-phase commit
A third phase is introduced to avoid blocking. The three phases are:
- Phase 1, voting: the coordinator sends a prepare message and receives yes votes from all participants.
- Phase 2, precommit: the coordinator sends a precommit (or abort) message to all participants, which respond with an ack.
- Phase 3, termination: once a sufficient number of acks has been received, the coordinator force-writes a commit log record and then sends a commit message to all participants.

Advantages of 3PC
- The coordinator postpones the commit decision until a sufficient number of sites know about it.
- If the coordinator fails, the participants can communicate with each other and decide whether to commit or abort.
- Because of the precommit phase, the transaction is not blocked.
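A minimal Python sketch of how the surviving participants can decide after the coordinator fails, which is why 3PC does not block. It assumes no network partitions and at most k site failures, with the coordinator having waited for k + 1 precommit acks before committing. Under those assumptions, if any survivor is precommitted or committed the outcome can only be commit, and if every survivor is merely uncertain, no site can have committed, so abort is safe. The state names and the function are hypothetical.

```python
def termination_decision(states):
    """Decide T's outcome from the states of the participants still reachable.

    Each state is "aborted", "uncertain" (voted yes, no precommit yet),
    "precommitted", or "committed".
    """
    if "aborted" in states:
        return "abort"
    if "committed" in states or "precommitted" in states:
        # The coordinator had reached the precommit phase, so every site voted yes.
        return "commit"
    # All survivors are uncertain: under the assumptions above the coordinator
    # cannot have committed (too few precommit acks), so abort is safe.
    return "abort"


assert termination_decision(["uncertain", "precommitted"]) == "commit"
```

A full termination protocol would also move the uncertain sites to precommitted before committing; the sketch shows only the decision rule.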

Mobile Databases


Mobile Database Environment
- A corporate database server and DBMS, which manage corporate data and provide applications
- A remote database and DBMS, which store mobile data and provide applications
- A mobile database platform, e.g. a laptop or PDA
- Two-way communication links between the mobile and corporate databases

Case study – Distribution and Replication in Oracle

Oracle’s Distributed Functionality
- Connectivity
- Global database names
- Database links
- Referential integrity
- Heterogeneous distributed databases
- Distributed query optimization

Oracle’s Replication Functionality
- Oracle supports synchronous and asynchronous replication through Oracle Advanced Replication.
- There is a master site and multiple slave sites; the master can replicate changes to the slave sites.
- Oracle supports four types of replication: read-only snapshots, updatable snapshots, multimaster replication, and procedural replication.

Summary
- Distributed DBMS architectures
- Data storage in a distributed DBMS
- Distributed catalog management
- Distributed query processing
- Distributed transactions
- Distributed concurrency control
- Distributed database recovery
- Mobile databases