Parallel and Distributed Databases CS263 Lecture 16.


LECTURE PLAN
- Parallel DBMS: what and why?
- What is a client/server DBMS?
- Why do we need distributed DBMSs?
- Date’s rules for a distributed DBMS
- Benefits of a distributed DBMS
- Issues associated with a distributed DBMS
- Disadvantages of a distributed DBMS

PARALLEL DATABASE SYSTEM

PARALLEL DBMSs: WHY DO WE NEED THEM?
More and more data! We have databases that hold very large amounts of data, on the order of 10,000,000,000,000 bytes (about 10 terabytes).
Faster and faster access! We have data applications that need to process data at very high speeds: tens of thousands of transactions per second.
SINGLE-PROCESSOR DBMSs AREN’T UP TO THE JOB!

PARALLEL DBMSs: BENEFITS OF A PARALLEL DBMS
INTERQUERY PARALLELISM: It is possible to process a number of transactions in parallel with each other. Improves throughput.
INTRAQUERY PARALLELISM: It is possible to process ‘sub-tasks’ of a single transaction in parallel with each other. Improves response time.
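As a rough illustration of intraquery parallelism, the sketch below splits one scan into per-partition sub-tasks and runs them in a thread pool. The names `count_matches` and `parallel_count` and the sample data are hypothetical and do not come from the lecture; a real parallel DBMS would partition the data across CPUs or disks rather than across Python threads.

```python
from concurrent.futures import ThreadPoolExecutor


def count_matches(partition, predicate):
    """Sub-task: scan one partition and count the matching records."""
    return sum(1 for record in partition if predicate(record))


def parallel_count(records, predicate, workers=4):
    """Intraquery parallelism: one query split into per-partition sub-tasks."""
    # Round-robin split so every worker gets a similar share of the records.
    partitions = [records[i::workers] for i in range(workers)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partial_counts = pool.map(count_matches, partitions, [predicate] * workers)
        # Combine the partial results into the answer to the original query.
        return sum(partial_counts)


balances = list(range(10_000))  # hypothetical account balances
large = parallel_count(balances, lambda b: b >= 9_000)
print(large)
```

Interquery parallelism would instead submit several independent queries to the pool at once. In CPython the threads mainly show how the work is decomposed; they are not expected to speed up a CPU-bound scan.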

PARALLEL DBMSs: HOW TO MEASURE THE BENEFITS
Speed-up. As you multiply resources by a certain factor, the time taken to execute a transaction should be reduced by the same factor:
- 10 seconds to scan a DB of 10,000 records using 1 CPU
- 1 second to scan a DB of 10,000 records using 10 CPUs
Scale-up. As you multiply resources by a certain factor, the size of a task that can be executed in a given time should increase by the same factor:
- 1 second to scan a DB of 1,000 records using 1 CPU
- 1 second to scan a DB of 10,000 records using 10 CPUs
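The two measures can be written as simple ratios. The sketch below is one common way to express them, applied to the figures on the slide; the function names are hypothetical.

```python
def speed_up(time_small_system, time_large_system):
    """Speed-up: the same task, elapsed time on the small system
    divided by elapsed time on the large system."""
    return time_small_system / time_large_system


def scale_up(time_small_task_small_system, time_large_task_large_system):
    """Scale-up: a task N times larger run on a system N times larger.
    1.0 is linear (ideal) scale-up; below 1.0 is sub-linear."""
    return time_small_task_small_system / time_large_task_large_system


# Slide figures: 10,000 records take 10 s on 1 CPU and 1 s on 10 CPUs.
s = speed_up(10.0, 1.0)
print(s, "linear" if s == 10 else "sub-linear")

# Slide figures: 1,000 records on 1 CPU and 10,000 records on 10 CPUs both take 1 s.
print(scale_up(1.0, 1.0))
```

With 10 times the resources, a speed-up of 10 is linear; anything lower means part of the added capacity was lost to overheads such as start-up cost, interference or skew.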

PARALLEL DBMSs: SPEED-UP
[Figure: number of transactions/second plotted against number of CPUs. Points shown: 1000/sec at 5 CPUs; 2000/sec at 10 CPUs (linear speed-up, ideal); 1600/sec at 16 CPUs (sub-linear speed-up).]

PARALLEL DBMSs: SCALE-UP
[Figure: number of transactions/second plotted against number of CPUs and database size. Starting point: 1000/sec with 5 CPUs and a 1 GB database. With 10 CPUs and a 2 GB database: 1000/sec is linear scale-up (ideal); 900/sec is sub-linear scale-up.]

SHARED MEMORY: PARALLEL DATABASE ARCHITECTURE
[Figure: several CPUs attached to one shared MEMORY.]

SHARED DISK: PARALLEL DATABASE ARCHITECTURE
[Figure: several CPUs, each with its own memory (M), sharing the same disks.]

SHARED NOTHING: PARALLEL DATABASE ARCHITECTURE
[Figure: several CPUs, each with its own memory (M) and its own disk.]

MAINFRAME DATABASE SYSTEM

[Figure: dumb terminals attached by a specialised network connection to a mainframe computer, which holds the presentation logic, business logic and data logic.]

CLIENT/SERVER DATABASE SYSTEM

CLIENT/SERVER DBMS: CLIENT PROCESS
- Manages user interface
- Accepts user data
- Processes application/business logic
- Generates database requests (SQL)
- Transmits database requests to server
- Receives results from server
- Formats results according to application logic
- Presents results to the user

CLIENT/SERVER DBMS: SERVER PROCESS
- Accepts database requests
- Processes database requests
- Performs integrity checks
- Handles concurrent access
- Optimises queries
- Performs security checks
- Enacts recovery routines
- Transmits result of database request to client

CLIENT/SERVER DBMS ARCHITECTURE (FAT CLIENT)
[Figure: clients #1 to #3 send data requests to the database server and receive data responses. Labels: presentation logic, business logic, data logic.]

CLIENT/SERVER DBMS ARCHITECTURE (THIN CLIENT)
[Figure: clients #1 to #3 send data requests to the database server and receive data responses. Labels: presentation logic, business logic, data logic, PL/SQL.]

DISTRIBUTED PROCESSING ARCHITECTURE
[Figure: LANs of clients at Leyton, Stratford, Barking and Leytonstone linked by a wide area network to a single DBMS.]

DISTRIBUTED DATABASE SYSTEM

DISTRIBUTED DATABASES: WHAT IS A DISTRIBUTED DATABASE?
- A distributed database system is a collection of logically related databases that co-operate in a transparent manner.
- Transparent implies that each user within the system may access all of the data within all of the databases as if they were a single database.
- There should be ‘location independence’: because the user is unaware of where the data is located, the data can be moved from one physical location to another without affecting the user.
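A toy model can make location independence concrete. In the sketch below the user names only a relation, and a catalog decides which site answers; the class `DistributedDatabase`, its methods and the sample row are hypothetical and are not taken from any real DDBMS.

```python
class DistributedDatabase:
    """Toy model: users name a relation, the catalog decides which site answers."""

    def __init__(self):
        self.sites = {}    # site name -> {relation name -> list of rows}
        self.catalog = {}  # relation name -> site currently holding it

    def store(self, site, relation, rows):
        self.sites.setdefault(site, {})[relation] = list(rows)
        self.catalog[relation] = site

    def move(self, relation, new_site):
        """Relocate a relation; for users only the catalog entry changes."""
        old_site = self.catalog[relation]
        rows = self.sites[old_site].pop(relation)
        self.store(new_site, relation, rows)

    def select(self, relation):
        """User-facing query: no site name appears in the call."""
        return self.sites[self.catalog[relation]][relation]


ddb = DistributedDatabase()
ddb.store("Stratford", "Account", [("JONES", "STRATFORD")])
before = ddb.select("Account")
ddb.move("Account", "Barking")
after = ddb.select("Account")
print(before == after)
```

The call `ddb.select("Account")` is identical before and after the move, which is the property the slide describes.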

DISTRIBUTED DATABASE ARCHITECTURE
[Figure: sites at Leytonstone, Stratford, Barking and Leyton linked by a wide area network; each site has its own clients and its own DBMS.]

M:N CLIENT/SERVER DBMS ARCHITECTURE
[Figure: clients #1 to #3 connected to database servers #1 and #2.]
NOT TRANSPARENT!

COMPONENTS OF A DDBMS
[Figure: Site 1 and Site 2 connected by a computer network, showing the GSC, DDBMS, DC and LDBMS components and a DB.]
- LDBMS = Local DBMS
- DC = Data Communications
- GSC = Global Systems Catalog
- DDBMS = Distributed DBMS

DISTRIBUTED DATABASES: ADVANTAGES
- Reduced communication overhead: most data access is local, which is less expensive and performs better.
- Improved processing power: instead of one server handling the full database, a collection of machines handles the same database.
- Removal of reliance on a central site: if a server fails, the only part of the system affected is the relevant local site. The rest of the system remains functional and available.

DISTRIBUTED DATABASES: ADVANTAGES (CONTINUED)
- Expandability: it is easier to accommodate growth in the size of the global (logical) database.
- Local autonomy: the database is brought nearer to its users. This can effect a cultural change, as it allows potentially greater control over local data.

DISTRIBUTED DATABASES: DATE’S TWELVE RULES FOR A DDBMS
Fundamental principle: a distributed system looks exactly like a non-distributed system to the user!
1. Local autonomy
2. No reliance on a central site
3. Continuous operation
4. Location independence
5. Fragmentation independence
6. Replication independence
7. Distributed query processing
8. Distributed transaction processing
9. Hardware independence
10. Operating system independence
11. Network independence
12. DBMS independence

DISTRIBUTED DATABASES: ISSUES
- Data Allocation
- Data Fragmentation
- Distributed Catalogue Management
- Distributed Transactions
- Distributed Queries (see chapter 20)

DISTRIBUTED DATABASES: DATA ALLOCATION METRICS
1. Locality of reference: is the data near to the sites that need it?
2. Reliability and availability: does the strategy improve fault tolerance and accessibility?
3. Performance: does the strategy result in bottlenecks or under-utilisation of resources?
4. Storage costs: how does the strategy affect the availability and cost of data storage?
5. Communication costs: how much network traffic will result from the strategy?

DISTRIBUTED DATABASES: DATA ALLOCATION STRATEGIES (CENTRALISED)
- Locality of Reference: Lowest
- Reliability/Availability: Lowest
- Storage Costs: Lowest
- Performance: Unsatisfactory
- Communication Costs: Highest

DISTRIBUTED DATABASES: DATA ALLOCATION STRATEGIES (PARTITIONED/FRAGMENTED)
- Locality of Reference: High
- Reliability/Availability: Low (item), High (system)
- Storage Costs: Lowest
- Performance: Satisfactory
- Communication Costs: Low

DISTRIBUTED DATABASES: DATA ALLOCATION STRATEGIES (COMPLETE REPLICATION)
- Locality of Reference: Highest
- Reliability/Availability: Highest
- Storage Costs: Highest
- Performance: High
- Communication Costs: High (update), Low (read)

DISTRIBUTED DATABASES: DATA ALLOCATION STRATEGIES (SELECTIVE REPLICATION)
- Locality of Reference: High
- Reliability/Availability: Low (item), High (system)
- Storage Costs: Average
- Performance: Satisfactory
- Communication Costs: Low

DISTRIBUTED DATABASES: WHY FRAGMENT DATA?
- Usage: applications are usually interested in ‘views’, not whole relations.
- Efficiency: it is more efficient if data is close to where it is frequently used.
- Parallelism: it is possible to run several ‘sub-queries’ in tandem.
- Security: data not required by local applications is not stored at the local site.

DISTRIBUTED DATABASES: HORIZONTAL DATA FRAGMENTATION
Horizontal fragmentation consists of a restriction on a relation, e.g. σ branch = ‘Stratford’ (Account).

ACCT NO.  CUSTOMER  BRANCH     BALANCE
          KHAN      STRATFORD
          ONO       BARKING
          GREEN     BARKING
          SMITH     STRATFORD
324       GRAY      BARKING
200       JONES     STRATFORD

DISTRIBUTED DATABASES: HORIZONTAL DATA FRAGMENTATION

STRATFORD BRANCH
ACCT NO.  CUSTOMER  BRANCH     BALANCE
          KHAN      STRATFORD
          SMITH     STRATFORD
200       JONES     STRATFORD

BARKING BRANCH
ACCT NO.  CUSTOMER  BRANCH     BALANCE
          ONO       BARKING
          GREEN     BARKING
324       GRAY      BARKING
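The split into a Stratford fragment and a Barking fragment can be sketched as a restriction on the branch column. Customer and branch values follow the slide; the balances did not survive in this transcript, so the `balance` figures below are invented for illustration, and `horizontal_fragment` is a hypothetical helper name.

```python
# Customers and branches follow the slide; balances are made up.
ACCOUNT = [
    {"customer": "KHAN", "branch": "STRATFORD", "balance": 100},
    {"customer": "ONO", "branch": "BARKING", "balance": 250},
    {"customer": "GREEN", "branch": "BARKING", "balance": 75},
    {"customer": "SMITH", "branch": "STRATFORD", "balance": 310},
    {"customer": "GRAY", "branch": "BARKING", "balance": 40},
    {"customer": "JONES", "branch": "STRATFORD", "balance": 500},
]


def horizontal_fragment(relation, branch):
    """Restriction: keep whole rows whose branch matches."""
    return [row for row in relation if row["branch"] == branch]


stratford = horizontal_fragment(ACCOUNT, "STRATFORD")
barking = horizontal_fragment(ACCOUNT, "BARKING")

# The original relation is recovered as the union of the fragments.
rebuilt = stratford + barking

print(sorted(row["customer"] for row in stratford))
print(len(rebuilt) == len(ACCOUNT))
```

Every row lands in exactly one fragment here because each account belongs to exactly one branch, so the union loses nothing and duplicates nothing.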

DISTRIBUTED DATABASES: VERTICAL DATA FRAGMENTATION
Vertical fragmentation consists of a projection on a relation, e.g. π S#, NAME, SITE, PHONE NO (Student).

S#   NAME   SITE       PHONE NO  LOGIN    PASSWORD
456  KHAN   STRATFORD            KHA456T  KJTR78
324  GRAY   BARKING              GRA324S  ZZEE56
200  JONES  STRATFORD            JON200T  XXYY22

DISTRIBUTED DATABASES: VERTICAL DATA FRAGMENTATION

STUDENT ADMINISTRATION
S#   NAME   SITE       PHONE NO.
456  KHAN   STRATFORD
324  GRAY   BARKING
200  JONES  STRATFORD

NETWORK ADMINISTRATION
S#   LOGIN-ID  PASSWORD
456  KHA456T   KJTR78
324  GRA324S   ZZEE56
200  JON200T   XXYY22
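Vertical fragments are projections that each keep the key, so the relation can be rebuilt by joining on it. The sketch below uses the Student rows from the slide, leaving out the phone numbers because they did not survive in this transcript; `vertical_fragment` and `join_on_key` are hypothetical helper names.

```python
STUDENT = [
    {"s_no": 456, "name": "KHAN", "site": "STRATFORD",
     "login": "KHA456T", "password": "KJTR78"},
    {"s_no": 324, "name": "GRAY", "site": "BARKING",
     "login": "GRA324S", "password": "ZZEE56"},
    {"s_no": 200, "name": "JONES", "site": "STRATFORD",
     "login": "JON200T", "password": "XXYY22"},
]


def vertical_fragment(relation, columns):
    """Projection: keep only the named columns of every row."""
    return [{column: row[column] for column in columns} for row in relation]


# Both fragments keep the key S# so they can be re-joined later.
student_admin = vertical_fragment(STUDENT, ["s_no", "name", "site"])
network_admin = vertical_fragment(STUDENT, ["s_no", "login", "password"])


def join_on_key(left, right, key):
    """Rebuild rows by matching the shared key in both fragments."""
    right_by_key = {row[key]: row for row in right}
    return [{**row, **right_by_key[row[key]]} for row in left]


rebuilt = join_on_key(student_admin, network_admin, "s_no")
print(rebuilt == STUDENT)
```

The student administration site never holds passwords and the network administration site never holds names or sites, which matches the security argument for fragmenting data.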

DISTRIBUTED DATABASES: DISTRIBUTED CATALOG MANAGEMENT
Centralised Global Catalog: one site maintains the full global catalog. All changes to any local system catalog have to be propagated to the site maintaining the global catalog. Bad performance, single point of failure, compromises site autonomy.
Dispersed Catalog: there is no physical global catalog. Each time a remote data item is required, the catalogs from ALL other sites are examined for the item. This has severe performance penalties.

DISTRIBUTED DATABASES: DISTRIBUTED CATALOG MANAGEMENT (CONTINUED)
Replicated Global Catalog: each site maintains its own copy of the global catalog. Although this greatly speeds up remote data location, it is very inefficient to maintain: details of every data item added, changed or deleted locally have to be propagated to ALL other sites.
Local-Master Catalog: each site maintains both its local system catalog and a catalog of all of its data items that are replicated at other sites. This avoids compromising site autonomy, is fairly efficient, and is not a single point of failure.
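The trade-off between the first three strategies can be made concrete with a deliberately crude cost model. The sketch below assumes one message per site contacted and a requesting site that is not the central catalog site; both assumptions and the function name `catalog_messages` are mine, not the lecture's.

```python
def catalog_messages(strategy, n_sites):
    """Return (messages per remote lookup, messages per local catalog change)."""
    others = n_sites - 1
    costs = {
        "centralised": (1, 1),      # ask, or inform, the one catalog site
        "dispersed": (others, 0),   # poll every other site; nothing to propagate
        "replicated": (0, others),  # look up locally; propagate to all other sites
    }
    return costs[strategy]


for strategy in ("centralised", "dispersed", "replicated"):
    lookup, change = catalog_messages(strategy, n_sites=4)
    print(f"{strategy:12} lookup={lookup} change={change}")
```

The model ignores the failure and autonomy problems of the centralised catalog, which the slide lists separately; it only shows where each strategy pays its message cost.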

DISTRIBUTED DATABASES: DISTRIBUTED TRANSACTIONS
ATOMIC DISTRIBUTED TRANSACTION
[Figure: Stratford clients submit a global transaction to the Stratford DBMS; the Stratford, Barking and Leyton DBMSs each manage their own DB.]
Global transaction:
(a) Debit Stratford A/C £500
(b) Credit Barking A/C £350
(c) Credit Leyton A/C £150

TWO-PHASE COMMIT (2PC) - OK

TWO-PHASE COMMIT (2PC) - ABORT ‘Global Abort’
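The two outcomes can be sketched with a coordinator that collects votes and then issues a global commit or a global abort. This is a minimal in-memory illustration, not a real protocol implementation: there is no logging, timeout handling or network. The `Participant` class, the no-overdraft voting rule and the opening balances are assumptions made for the example.

```python
class Participant:
    """One local DBMS taking part in a global transaction."""

    def __init__(self, name, balance):
        self.name = name
        self.balance = balance
        self.pending = None  # change prepared but not yet committed

    def prepare(self, delta):
        """Phase 1: vote 'ready' only if the change can be applied locally."""
        if self.balance + delta < 0:
            return False  # assumed rule: an overdraft forces an abort vote
        self.pending = delta
        return True

    def commit(self):
        """Phase 2 (global commit): make the prepared change permanent."""
        self.balance += self.pending
        self.pending = None

    def abort(self):
        """Phase 2 (global abort): discard any prepared change."""
        self.pending = None


def two_phase_commit(changes):
    """changes: list of (participant, delta). Returns True on global commit."""
    # Phase 1: the coordinator collects a vote from every participant.
    votes = [participant.prepare(delta) for participant, delta in changes]
    # Phase 2: commit only if every vote was 'ready'; otherwise abort everywhere.
    if all(votes):
        for participant, _ in changes:
            participant.commit()
        return True
    for participant, _ in changes:
        participant.abort()
    return False


stratford = Participant("Stratford", 1000)  # hypothetical opening balances
barking = Participant("Barking", 0)
leyton = Participant("Leyton", 0)

# The slide's global transaction: debit 500, credit 350, credit 150.
ok = two_phase_commit([(stratford, -500), (barking, 350), (leyton, 150)])
print(ok, stratford.balance, barking.balance, leyton.balance)

# A second attempt that Stratford cannot honour ends in a global abort.
failed = two_phase_commit([(stratford, -600), (barking, 350), (leyton, 250)])
print(failed, stratford.balance, barking.balance, leyton.balance)
```

In the second run Barking and Leyton both vote ready, yet their balances stay unchanged because a single abort vote cancels the transaction at every site. That all-or-nothing result is what makes the distributed transaction atomic.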

DISTRIBUTED DATABASES: DISADVANTAGES OF DDBMSs
- Architectural complexity.
- Cost.
- Security.
- Integrity control more difficult.
- Lack of standards.
- Lack of experience.
- Database design more complex.