Distributed Databases

Presentation transcript:

Distributed Databases
Logical next step in geographically dispersed organisations
Goal is to provide location transparency
Starting point = a set of decentralised DBs located in different places, developed for the specific information needs of each site
Aim: to integrate these decentralised DBs into a coherent DDB

Advantages of Distributed DBs:
–Increased reliability of systems and availability of data
–Local control preserved
–Modular growth possible at each site and at new sites
–Optimised communication costs
–Faster response times

Control in normal DBs
Transaction control: ability of the DBMS to ensure the successful completion of transactions
–commit transactions
–roll-back to previous state
Concurrency control: ability of the DBMS to arbitrate between concurrent uses of data:
–simultaneous access
–simultaneous update
–deletion
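Commit and roll-back can be illustrated with a minimal sketch using Python's built-in sqlite3 module as a stand-in for a single-site DBMS. The account table, the transfer function and the "no negative balance" rule are assumptions made for the illustration; they do not come from the slides.

```python
import sqlite3

# In-memory database standing in for a single-site DBMS (illustrative only).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (id INTEGER PRIMARY KEY, balance INTEGER NOT NULL)")
conn.executemany("INSERT INTO account VALUES (?, ?)", [(1, 100), (2, 50)])
conn.commit()


def transfer(conn, src, dst, amount):
    """Move `amount` from src to dst; commit on success, roll back on failure."""
    try:
        conn.execute("UPDATE account SET balance = balance - ? WHERE id = ?", (amount, src))
        conn.execute("UPDATE account SET balance = balance + ? WHERE id = ?", (amount, dst))
        (balance,) = conn.execute(
            "SELECT balance FROM account WHERE id = ?", (src,)
        ).fetchone()
        if balance < 0:
            # Hypothetical business rule: overdrafts are not allowed.
            raise ValueError("insufficient funds")
        conn.commit()      # both updates become permanent together
        return True
    except ValueError:
        conn.rollback()    # both updates are undone: back to the previous state
        return False


def balances(conn):
    """Return {account id: balance} for inspection."""
    return dict(conn.execute("SELECT id, balance FROM account ORDER BY id"))
```

Calling transfer(conn, 1, 2, 30) commits both updates, whereas transfer(conn, 1, 2, 500) leaves both balances exactly as they were before the call.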

Control in Distributed DBs
Different portions of the overall database reside at different locations
These portions are controlled by different processors, sometimes running different DBMSs
A common schema means queries can involve any portion of the DB residing at any location

Options for Distributed DBs
Issue of physical design (data structure)
Performance of the DB (response time...) depends upon good design
There are a number of options:
–data replication
–horizontal partitioning
–vertical partitioning
–combinations of the above

Data replication
Store a separate copy of the full tables in each location
If a copy is stored at every site: Full Replication
Advantages:
–reliability
–fast response
Disadvantages:
–storage requirements
–complexity and cost of updating

Horizontal partitioning
Some of the rows of the tables are stored in one location; others are stored at other locations
eg: customers banking out of a particular branch
Advantages:
–efficiency
–local optimisation
–security
Disadvantages:
–inconsistent access speed
–backup vulnerability
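As a rough sketch of the branch example, the rows of a table can be split by the value of one column and the full table recovered as the union of the fragments. The customer rows and the function names below are hypothetical.

```python
# Hypothetical customer rows; the "branch" column decides where a row is stored.
customers = [
    {"id": 1, "name": "Ann", "branch": "Detroit"},
    {"id": 2, "name": "Bob", "branch": "Chicago"},
    {"id": 3, "name": "Cid", "branch": "Detroit"},
]


def horizontal_partition(rows, column):
    """Group whole rows into one fragment per distinct value of `column`."""
    fragments = {}
    for row in rows:
        fragments.setdefault(row[column], []).append(row)
    return fragments


def union_fragments(fragments):
    """Rebuild the full table as the union of all row fragments."""
    all_rows = [row for fragment in fragments.values() for row in fragment]
    return sorted(all_rows, key=lambda row: row["id"])


fragments = horizontal_partition(customers, "branch")
```

Each site holds complete rows, so a query about Detroit customers never leaves the Detroit fragment; a query over all customers has to visit every site.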

Vertical partitioning
Some columns are projected into base relations at different sites
All relations share a common key (domain) so the full table can be reconstructed
Advantages:
–tailor-made support for functional areas
–same as horizontal partitioning
Disadvantages:
–some queries might be very slow
–users must understand some design issues
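A minimal sketch of the idea, assuming a hypothetical employee table split between two functional areas: each fragment keeps the shared key plus its own columns, and joining the fragments on that key rebuilds the full rows.

```python
# Hypothetical employee rows; "id" is the key shared by every fragment.
employees = [
    {"id": 1, "name": "Ann", "salary": 3000, "skill": "SQL"},
    {"id": 2, "name": "Bob", "salary": 2500, "skill": "COBOL"},
]


def vertical_partition(rows, key, column_groups):
    """Project each group of columns, together with the key, into its own fragment."""
    return {
        site: [{col: row[col] for col in (key, *cols)} for row in rows]
        for site, cols in column_groups.items()
    }


def join_fragments(fragments, key):
    """Rebuild full rows by joining all fragments on the shared key."""
    merged = {}
    for fragment in fragments.values():
        for row in fragment:
            merged.setdefault(row[key], {}).update(row)
    return [merged[k] for k in sorted(merged)]


fragments = vertical_partition(
    employees, "id", {"payroll": ["name", "salary"], "projects": ["skill"]}
)
```

The join in join_fragments is also why some queries can be slow: any question that needs columns from both fragments has to bring them together across sites.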

Combinations of the three methods
Most of the time, companies will use different methods
Each method is efficient in certain situations + some other security requirements
eg: local customers, information originating at a certain site, shared processes that require the same data at all sites
It is a design issue to try to identify the optimal distribution - data at the sites where it is used most

Distributed DBMS
Additional roles to play in the case of a distributed DB:
–determine the location where data to be retrieved is located
–translate the request into the language used by the local DBMS
–deal with normal data management functions, security matters, locking, query optimisation...

Heterogeneous Distributed DBMS
A different DBMS running at each site
A master DBMS controlling the interactions amongst the parts
Not practical today (compatibility)
More often, each DBMS follows the same data architecture

Problems with global transactions
DBMSs can be radically different - relational versus network
Only some state-of-the-art commercial products have translating capabilities
One alternative solution is to put some essential data and the directory of the data locations on a central server
Real distributed DBMSs solve these problems for the users with the help of the NOS (network operating system)

Commit Protocol
To ensure the integrity of the data in update operations
Well defined procedure based on the exchange of messages (“ok” or “not ok”)
Each global transaction is either completed in full or aborted
Two-phase commit:
–site originating the transaction sends requests to all sites involved in the update
–all sites attempt to process their part of the transaction without committing the data (temp files)
–they notify the first site whether OK or not
–the first site collects all OKs and sends order to commit the data
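The four steps of the two-phase commit can be sketched as follows. This is a simplified, in-memory illustration: the Site class, its can_commit flag and the pending dictionary (standing in for the temp files) are assumptions, and real protocols also deal with logging, timeouts and site failures, which are not modelled here.

```python
class Site:
    """One participating site; `pending` plays the role of the temp files."""

    def __init__(self, name, can_commit=True):
        self.name = name
        self.can_commit = can_commit  # hypothetical switch to simulate a local failure
        self.data = {}                # committed data
        self.pending = None           # staged, not yet committed

    def prepare(self, updates):
        """Phase 1: process the updates without committing; answer OK / not OK."""
        if not self.can_commit:
            return False
        self.pending = dict(updates)
        return True

    def commit(self):
        """Phase 2 (all OK): make the staged updates permanent."""
        if self.pending is not None:
            self.data.update(self.pending)
        self.pending = None

    def abort(self):
        """Phase 2 (some site not OK): discard the staged updates."""
        self.pending = None


def two_phase_commit(sites, updates):
    """Run by the originating site: commit everywhere or nowhere."""
    votes = [site.prepare(updates) for site in sites]  # phase 1: collect the votes
    if all(votes):
        for site in sites:                             # phase 2: order to commit
            site.commit()
        return True
    for site in sites:                                 # phase 2: order to abort
        site.abort()
    return False
```

With two healthy sites the call returns True and both hold the new data; if any one site answers "not ok", no site changes its committed data.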

Timestamping
Alternative to locking (which carries the possibility of deadlocks)
Ensures that transactions are processed in serial order so locking is not needed
All updated records carry the timestamp of the transaction that last modified them
If a new transaction attempts to update a record with an earlier timestamp = OK
If new transaction...with a later stamp, update access is denied, the transaction is re-stamped and is re-started

Updated record: example
Record in a DB, timestamp 168
–Record update: 170 = OK (the record now carries timestamp 170)
–Record update: 165 = Denied
–Record update: 170 = Transaction re-started (ie: do it again)
+++: costly deadlock situations are avoided
----: transactions may sometimes be restarted even though they did not conflict with previous ones.
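The rule can be sketched in a few lines. This is a simplified illustration that tracks only the timestamp of the last update to each record; the TimestampedStore class and its logical clock are assumptions, and the fresh timestamp handed out on restart (171 in the usage note) is chosen for the illustration rather than taken from the slide.

```python
import itertools


class TimestampedStore:
    """Records carry the timestamp of the transaction that last updated them."""

    def __init__(self, start=1):
        self._clock = itertools.count(start)  # hypothetical source of new timestamps
        self.records = {}                     # key -> (value, timestamp of last update)

    def new_timestamp(self):
        return next(self._clock)

    def try_update(self, key, value, ts):
        """Accept the update unless the record carries a later timestamp."""
        _, record_ts = self.records.get(key, (None, 0))
        if record_ts > ts:
            return False                      # denied: a later transaction got there first
        self.records[key] = (value, ts)
        return True

    def update(self, key, value, ts):
        """Re-stamp and re-start a denied update until it is accepted."""
        while not self.try_update(key, value, ts):
            ts = self.new_timestamp()
        return ts
```

Starting from a record stamped 168 and a clock at 171: an update by transaction 170 is accepted, a later attempt by transaction 165 is denied, and update() re-stamps that transaction as 171 and applies it. No transaction ever waits for a lock, so no deadlock can form.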

Effect of design on speed
How to design fast queries
Simple example with two sites in a relational DB:
–supplier (supplier#,..., city): 10,000 records stored in Detroit
–part (part#,..., colour): 100,000 records stored in Chicago
–shipment (supplier#,..., part#): 1,000,000 records stored in Detroit
–each record is 100 characters long + there are 10 red parts
–data transmission is 10,000 characters/second, 1 second delay in any communication
–data processing negligible
Write the SQL statement
Imagine how the query can be carried out between the two sites

SQL statement
select supplier.supplier#
from supplier, part, shipment
where supplier.city = 'Cleveland'
  and supplier.supplier# = shipment.supplier#
  and shipment.part# = part.part#
  and part.colour = 'Red'
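To get a feel for how much the execution plan matters, the communication time of a few candidate plans can be worked out from the figures in the example (100 characters per record, 10,000 characters/second, 1 second delay per communication, processing ignored). The three strategies below are illustrative choices, not necessarily the ones the lecture had in mind.

```python
RECORD_CHARS = 100      # characters per record
RATE = 10_000           # characters transmitted per second
DELAY = 1.0             # seconds of delay for each communication


def transfer_seconds(records, messages=1):
    """Time to ship `records` records using `messages` separate communications."""
    return messages * DELAY + records * RECORD_CHARS / RATE


# Strategy A (assumed): ship the whole part table from Chicago to Detroit.
ship_all_parts = transfer_seconds(100_000)

# Strategy B (assumed): ship supplier and shipment from Detroit to Chicago,
# one communication per table.
ship_detroit_tables = transfer_seconds(10_000 + 1_000_000, messages=2)

# Strategy C (assumed): select the 10 red parts in Chicago first,
# then ship only those records to Detroit.
ship_red_parts = transfer_seconds(10)

for label, seconds in [
    ("A: all parts to Detroit", ship_all_parts),
    ("B: supplier + shipment to Chicago", ship_detroit_tables),
    ("C: red parts only to Detroit", ship_red_parts),
]:
    print(f"{label}: {seconds:,.1f} s")
```

Under these assumptions strategy A takes about 1,001 seconds (roughly 17 minutes), strategy B about 10,102 seconds (close to 3 hours) and strategy C about 1.1 seconds, because the selection on colour is applied before anything is transmitted.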

Conclusions
Reasonably easy to optimise a query with two tables
Very complex with more than two (try with 30!)
Rules:
–Queries must be broken down into components isolated at different sites (minimise communication time and traffic)
–Determine which site has the potential to yield FEWER selected records
–Move preliminary results to the site where the rest of the work can be performed (ie: try to move as few records as possible)