Distributed DB: CSE2132 Database Systems, Week 12 Lecture, Distributed Database



Evolution of Distributed DBMS
• CENTRALISED systems: all system components located on a single computer
  – data
  – DBMS
  – secondary storage (disks etc.)
  – access via serially connected 'dumb terminals'; all processing occurs at the central site
• 1980s changes:
  – business operations decentralised geographically; businesses needed to become 'leaner and meaner', quick-reacting, dispersed operations
  – technological change: low-cost, powerful computing platforms
• These changes led to DISTRIBUTED SYSTEMS

Distributed Data Base Management Systems
• System components are distributed over multiple sites, interconnected via a communication system (network)
• Managed by a Distributed Data Base Management System (DDBMS)
• DDBMS advantages:
  – data located near the site of greatest demand
  – faster data access: the desired data subset is locally available
  – faster data processing: system processing load is spread over multiple CPUs
  – growth facilitation: easy to add new sites to the network
  – less danger of a single point of failure
• DDBMS disadvantages:
  – complexity of management and control
  – security: weaker due to distribution (more people involved) and network traffic
  – lack of standards: many communication protocols exist, e.g. TCP/IP, NetBIOS, DECnet

Distributed Processing
• The database's logical processing is shared among multiple CPUs
• The actual database resides on a single computer

Distributed Database
• The actual database is stored over two or more independent computers; each part is known as a database fragment

Distribution Options
• Two important components of a DDBMS are:
  – Transaction Processor (TP): receives and processes the application's data requests
  – Data Processor (DP): stores and retrieves data located at the site
• Data and processing distribution options:
  – Single site processing, single site data
    all processing is done on a single CPU (host computer)
    all data is stored on the host computer's local disk(s)
    e.g. traditional mainframe / minicomputer DBMS with dumb terminals, or single-user microcomputer DBMS

Distribution Options (continued)
  – Multi site processing, single site data
    multiple processes run on different computers
    all data is stored on a single computer's local disk(s), e.g. a LAN file server
    the TP acts as a redirector: it routes all network data requests to the file server
    the file server appears to the end user as a hard disk, e.g. F:
    all data selection, search and update occur at the workstation
    the entire file must be transported across the network, which is inefficient and costly (communication)

Distribution Options (1)
• Client-server is a term used very loosely and is difficult to define clearly.
  Client-server is similar to file server, except that database processing occurs at the server site, known as a database server.
  Client-server is about a split in processing rather than a split in data.
  A 3-tier architecture is often employed. This requires communications middleware to resolve translation issues by the use of an agreed protocol (e.g. TCP/IP, IPX/SPX or NetBEUI).
  Client-server approaches overlap with, and are used in conjunction with, distributed databases.
• Multiple site processing, multiple site data
  This describes a fully distributed DBMS. Such systems can be classified as homogeneous or heterogeneous.

Distribution Options (2)
• Homogeneous DDBMS
  The same DBMS runs at each site. While there is still much complexity to deal with, the task of accessing data over many sites is simpler than in the heterogeneous case.
• Heterogeneous DDBMS
  Supports different DBMSs, and even different DBMS models (relational, network, hierarchical), at each site. Such implementations of a DDBMS operate under certain restrictions (e.g. access to other databases is read only, or only other relational databases can be accessed).

Date's Distributed Database Rules
• DISTRIBUTED DATABASE SHOULD LOOK EXACTLY LIKE A NON DISTRIBUTED 'LOCAL' DATABASE
• LOCAL ACCESS TO LOCAL DATA
• Many vendors claim to be delivering distributed database applications, so criteria are needed: DATE'S 12 RULES
  – 0 - TO THE USER A DISTRIBUTED DATABASE SHOULD LOOK NO DIFFERENT TO A NON DDB: distribution transparency, database treated as a single logical database
  – 1 - LOCAL AUTONOMY: local data owned and managed locally
  – 2 - NO RELIANCE ON CENTRAL SITE
  – 3 - CONTINUOUS OPERATION: failure transparency, site independence; even in the event of a node failure the system continues to operate
  – 4 - LOCATION INDEPENDENCE: location transparency
  – 5 - FRAGMENTATION INDEPENDENCE: fragmentation transparency

Date's rules continued
  – 6 - REPLICATION INDEPENDENCE
  – 7 - DISTRIBUTED QUERY PROCESSING: multiple site queries
  – 8 - DISTRIBUTED TRANSACTION MANAGEMENT: transaction transparency, multiple site updates
  – 9 - HARDWARE INDEPENDENCE
  – 10 - OPERATING SYSTEM INDEPENDENCE
  – 11 - COMMUNICATION NETWORK INDEPENDENCE
  – 12 - DATABASE INDEPENDENCE

Levels of Distribution Transparency
• Can be used as a method of classification, by determining the level of transparency supported by the DDBMS. From the highest level:
  – FRAGMENTATION transparency
    No need to specify fragment names or locations
      select * from employee where dob < '1940-01-01';
  – LOCATION transparency
    Specify fragment names but not locations
      select * from e1 where dob < '1940-01-01'
      UNION
      select * from e2 where dob < '1940-01-01'
      UNION
      select * from e3 where dob < '1940-01-01';
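The difference between the two levels above can be tried out on a single machine. The following is a minimal sketch, assuming SQLite via Python's `sqlite3` module as a stand-in for a DDBMS; the fragment contents, the column names `emp_id` and `name`, and the `employee` view are illustrative assumptions, not part of the lecture. A view over the fragments plays the role of the global schema that a DDBMS with fragmentation transparency would maintain.

```python
import sqlite3

# One in-memory database stands in for the whole DDBMS (assumption for illustration).
conn = sqlite3.connect(":memory:")

# Three horizontal fragments of the employee relation: e1, e2, e3.
for frag in ("e1", "e2", "e3"):
    conn.execute(f"CREATE TABLE {frag} (emp_id INTEGER PRIMARY KEY, name TEXT, dob TEXT)")
conn.executemany("INSERT INTO e1 VALUES (?, ?, ?)",
                 [(1, "Ann", "1938-05-02"), (2, "Bob", "1955-11-30")])
conn.executemany("INSERT INTO e2 VALUES (?, ?, ?)", [(3, "Cy", "1939-12-31")])
conn.executemany("INSERT INTO e3 VALUES (?, ?, ?)", [(4, "Di", "1962-07-14")])

# Fragmentation transparency: the user only ever names "employee".
conn.execute("""
    CREATE VIEW employee AS
    SELECT * FROM e1 UNION ALL SELECT * FROM e2 UNION ALL SELECT * FROM e3
""")
frag_transparent = conn.execute(
    "SELECT emp_id FROM employee WHERE dob < '1940-01-01' ORDER BY emp_id"
).fetchall()

# Location transparency: the user names each fragment, but not where it lives.
loc_transparent = conn.execute("""
    SELECT emp_id FROM e1 WHERE dob < '1940-01-01'
    UNION
    SELECT emp_id FROM e2 WHERE dob < '1940-01-01'
    UNION
    SELECT emp_id FROM e3 WHERE dob < '1940-01-01'
    ORDER BY emp_id
""").fetchall()
```

Both queries return the same employees; only the amount the user has to know about the fragmentation differs.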

Levels of Distribution Transparency (continued)
  – LOCAL MAPPING transparency (lowest level)
    Need to specify both fragment and location (using pseudo-SQL)
      select * from e1 node melbourne where dob < '1940-01-01'
      UNION
      select * from e2 node sydney where dob < '1940-01-01'
      UNION
      select * from e3 node adelaide where dob < '1940-01-01';
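The pseudo-SQL `node` clause has no direct equivalent in standard SQL. As a rough sketch only, SQLite's `ATTACH DATABASE` can imitate it: each attached database stands in for one site, and the query must name both the site and the fragment. The sample rows and the column name `emp_id` are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Each attached in-memory database plays the role of one node (assumption).
nodes = {"melbourne": "e1", "sydney": "e2", "adelaide": "e3"}
for node in nodes:
    conn.execute(f"ATTACH DATABASE ':memory:' AS {node}")

# Create one fragment per node and load a few sample rows.
sample_rows = {
    "melbourne": [(1, "1938-05-02"), (2, "1955-11-30")],
    "sydney": [(3, "1939-12-31")],
    "adelaide": [(4, "1962-07-14")],
}
for node, frag in nodes.items():
    conn.execute(f"CREATE TABLE {node}.{frag} (emp_id INTEGER, dob TEXT)")
    conn.executemany(f"INSERT INTO {node}.{frag} VALUES (?, ?)", sample_rows[node])

# Local mapping transparency: the query spells out node AND fragment.
born_before_1940 = conn.execute("""
    SELECT emp_id FROM melbourne.e1 WHERE dob < '1940-01-01'
    UNION
    SELECT emp_id FROM sydney.e2 WHERE dob < '1940-01-01'
    UNION
    SELECT emp_id FROM adelaide.e3 WHERE dob < '1940-01-01'
    ORDER BY emp_id
""").fetchall()
```

If a fragment moved from one site to another, every query written at this level would have to be edited, which is why it is the lowest level of transparency.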

DDBMS Operations
• Join operation (most vendors supply)
  – Easier to achieve
  – Query optimisation critical
• Update operation
  – e.g. debit / credit of two accounts at different sites
  – more difficult to manage
  – needs sophisticated transaction management; the most popular strategy is TWO PHASE COMMIT
• Two Phase Commit requires three operations:
  – DO: performs the operation and records before and after images in the transaction log
  – UNDO: undoes an operation using log entries created in DO
  – REDO: redoes an operation using log entries created in DO
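The three operations can be sketched with an in-memory log of before and after images. This is a simplified illustration under assumed names (`LogEntry`, `Site`, `acct_A`); a real DBMS writes the log to stable storage before touching the data, which this sketch does not attempt.

```python
from dataclasses import dataclass


@dataclass
class LogEntry:
    key: str
    before: int  # before image
    after: int   # after image


class Site:
    """One site holding some data and a transaction log (hypothetical)."""

    def __init__(self, data):
        self.data = dict(data)
        self.log = []

    def do(self, key, new_value):
        # DO: record before and after images, then perform the operation.
        self.log.append(LogEntry(key, self.data[key], new_value))
        self.data[key] = new_value

    def undo(self):
        # UNDO: restore before images, newest entry first.
        for entry in reversed(self.log):
            self.data[entry.key] = entry.before

    def redo(self):
        # REDO: reapply after images, oldest entry first.
        for entry in self.log:
            self.data[entry.key] = entry.after


site = Site({"acct_A": 100})
site.do("acct_A", 70)            # debit 30
after_do = site.data["acct_A"]
site.undo()
after_undo = site.data["acct_A"]
site.redo()
after_redo = site.data["acct_A"]
```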

Two Phase Commit
• The site originating the transaction (coordinator) sends a request to the other sites (subordinates); each site processes its sub-transaction but does not commit.
• Phase 1 - Preparation:
  – 1. coordinator sends PREPARE TO COMMIT to all subordinates
  – 2. subordinates receive the message, write log entries and reply to the coordinator: READY to COMMIT or NOT READY
  – 3. coordinator checks that all nodes are ready to commit; if not, it broadcasts an ABORT; if all are ready, it proceeds to Phase 2
• Phase 2 - Final COMMIT
  – 1. coordinator broadcasts a COMMIT message to all subordinates and awaits replies
  – 2. each subordinate receives the COMMIT and updates its database
  – 3. subordinates reply with COMMITTED or NOT COMMITTED to the coordinator
  – If any subordinate did not commit, the coordinator sends ABORT, forcing an UNDO
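The message flow above can be sketched as a small in-process simulation. The class and function names (`Subordinate`, `two_phase_commit`) and the `can_commit` flag are hypothetical, and the sketch leaves out everything that makes the protocol hard in practice: network messages, timeouts, and recovery after a crash.

```python
class Subordinate:
    """A participating site (hypothetical, in-process stand-in)."""

    def __init__(self, name, can_commit=True):
        self.name = name
        self.can_commit = can_commit  # simulates a site that cannot prepare
        self.state = "INIT"
        self.log = []

    def prepare(self):
        # Phase 1, step 2: write a log entry and vote.
        self.log.append("PREPARE")
        self.state = "READY" if self.can_commit else "NOT READY"
        return self.state

    def commit(self):
        # Phase 2, steps 2-3: update the database and report back.
        self.log.append("COMMIT")
        self.state = "COMMITTED"
        return self.state

    def abort(self):
        # ABORT forces an UNDO of the sub-transaction at this site.
        self.log.append("ABORT")
        self.state = "ABORTED"


def two_phase_commit(subordinates):
    """Run both phases as the coordinator; return the final outcome."""
    # Phase 1: collect votes from every subordinate.
    votes = [s.prepare() for s in subordinates]
    if any(v != "READY" for v in votes):
        for s in subordinates:
            s.abort()
        return "ABORTED"
    # Phase 2: broadcast COMMIT and check the replies.
    replies = [s.commit() for s in subordinates]
    if any(r != "COMMITTED" for r in replies):
        for s in subordinates:
            s.abort()
        return "ABORTED"
    return "COMMITTED"


ok_sites = [Subordinate("sydney"), Subordinate("melbourne")]
ok_outcome = two_phase_commit(ok_sites)

bad_sites = [Subordinate("sydney"), Subordinate("adelaide", can_commit=False)]
bad_outcome = two_phase_commit(bad_sites)
```

In the second run one site votes NOT READY, so the coordinator aborts everywhere and no site commits.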

DISTRIBUTION STRATEGIES
The overall data model - the company view
• Distribution principles
  – Examine geography / frequency of access
  – Guiding principle: minimise network traffic and communication costs

• The sales team
  – Situated in Sydney
  – Needs most access to customer / order / order-line / product
• The supply branch
  – Situated in Melbourne
  – Needs most access to warehouse / inventory / product
• Data model partitioning
  What do we do with RELATIONS at the BOUNDARY?
  FRAGMENTATION vs REPLICATION

• Possible starting points:
  – Which site accesses it most?
    Storage at a single site
    Minimises update complexity
  – Is there a case for fragmentation?
    Minimises local access time
    Minimises network traffic
  – Is there a case for replication?
    Minimises local access time
    Minimises network traffic
• Fragmentation
  – Horizontal fragmentation
    Based on SELECTION, e.g. fragment the customer table on customer_city
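Horizontal fragmentation on `customer_city` can be sketched in a few lines. The customer rows and the `cust_id` and `name` columns are made-up sample data; only the `customer_city` attribute comes from the slide. Each fragment is a selection on the city value, and the original table is recovered as the union of the fragments.

```python
# Sample customer table (illustrative data, not from the lecture).
customers = [
    {"cust_id": 1, "name": "Acme", "customer_city": "Sydney"},
    {"cust_id": 2, "name": "Bolt", "customer_city": "Melbourne"},
    {"cust_id": 3, "name": "Crux", "customer_city": "Sydney"},
]


def horizontal_fragment(rows, attribute):
    """Split rows into fragments, one per distinct value of `attribute`."""
    fragments = {}
    for row in rows:
        fragments.setdefault(row[attribute], []).append(row)
    return fragments


fragments = horizontal_fragment(customers, "customer_city")

# Reconstruction: the union of all fragments gives back the original table.
reconstructed = sorted(
    (row for frag in fragments.values() for row in frag),
    key=lambda row: row["cust_id"],
)
```

Every row lands in exactly one fragment, so the union loses nothing and duplicates nothing.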

  – Vertical fragmentation
    Based on PROJECTION, e.g. fragment the product table on the attributes needed by each group
      SALES : P_CODE, DESC, UNIT_PRICE
      SUPPLY : P_CODE, PACK_SIZE
  – Hybrid fragmentation
    Assume 3 warehouses:
      Footscray - p_code < 100
      Collingwood - p_code = 100
      Dandenong - p_code > 100
    Further subdivide the vertical SUPPLY fragment by warehouse:
      SELECT P_CODE < 100 => FOOTSCRAY
      SELECT P_CODE = 100 => COLLINGWOOD
      SELECT P_CODE > 100 => DANDENONG
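A short sketch of the vertical and hybrid cases, using the attribute lists and warehouse ranges from the slide. The product rows and the helper names `project` and `warehouse_for` are illustrative assumptions. The key point is that every vertical fragment keeps the key `P_CODE`, so the original rows can be rebuilt by joining on it.

```python
# Sample product table (illustrative data).
products = [
    {"P_CODE": 50, "DESC": "Widget", "UNIT_PRICE": 2.5, "PACK_SIZE": 10},
    {"P_CODE": 100, "DESC": "Gadget", "UNIT_PRICE": 9.0, "PACK_SIZE": 4},
    {"P_CODE": 150, "DESC": "Gizmo", "UNIT_PRICE": 5.0, "PACK_SIZE": 12},
]


def project(rows, attributes):
    """Vertical fragment: keep only the listed attributes of each row."""
    return [{a: row[a] for a in attributes} for row in rows]


# Vertical fragmentation: both fragments retain the key P_CODE.
sales = project(products, ["P_CODE", "DESC", "UNIT_PRICE"])
supply = project(products, ["P_CODE", "PACK_SIZE"])


def warehouse_for(p_code):
    """Warehouse ranges as given on the slide."""
    if p_code < 100:
        return "Footscray"
    if p_code == 100:
        return "Collingwood"
    return "Dandenong"


# Hybrid fragmentation: split the SUPPLY fragment horizontally by warehouse.
supply_by_warehouse = {}
for row in supply:
    supply_by_warehouse.setdefault(warehouse_for(row["P_CODE"]), []).append(row)

# Reconstruction: union the supply fragments, then join with SALES on P_CODE.
supply_index = {
    row["P_CODE"]: row for frag in supply_by_warehouse.values() for row in frag
}
rebuilt = [{**s, **supply_index[s["P_CODE"]]} for s in sales]
```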

Replication
• Efficient retrievals vs multi-site updating
  – Costly, accident prone
• Updating techniques
  – Conservative: don't commit until all sites accept
  – Primary node: one site accepts updates and broadcasts them
  – Majority voting
  – Snapshots, etc.
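The first three updating techniques differ mainly in who has to agree before an update counts. The sketch below reduces each one to its decision rule; the function names and the use of plain dictionaries as replicas are assumptions for illustration, and real systems add locking, versioning and failure handling on top.

```python
def conservative_accepts(acks):
    """Conservative: commit only if every replica site accepted."""
    return all(acks)


def majority_accepts(acks):
    """Majority voting: commit if more than half of the sites accepted."""
    return sum(acks) > len(acks) / 2


def primary_node_update(primary, replicas, key, value):
    """Primary node: one site accepts the update, then broadcasts it."""
    primary[key] = value
    for replica in replicas:
        replica[key] = value


# Three replica sites, one of which is unreachable and did not accept.
acks = [True, True, False]
conservative_result = conservative_accepts(acks)
majority_result = majority_accepts(acks)

primary, replicas = {}, [{}, {}]
primary_node_update(primary, replicas, "stock", 42)
```

With one site down, the conservative rule blocks the update while majority voting lets it through, which is the availability versus consistency trade-off behind the list above.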

Query Optimization
• Query: list the supplier numbers for Cleveland suppliers of red parts

    Relation                        Rows         Site
    SUPPLIER(SUPPLIER#, CITY)       10,000       DETROIT
    PART(PART#, COLOUR)             100,000      CHICAGO
    SHIPMENT(SUPPLIER#, PART#)      1,000,000    DETROIT

    SELECT S.SUPPLIER#
    FROM SUPPLIER S, PART P, SHIPMENT H
    WHERE S.SUPPLIER# = H.SUPPLIER#
      AND H.PART# = P.PART#
      AND S.CITY = 'CLEVELAND'
      AND P.COLOUR = 'RED';

• Time for the query varies from 1 second to 2.3 days, depending on the query plan selected
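The slide does not say how the two extremes arise, so the following is only a back-of-the-envelope sketch. Every parameter in it is an assumption chosen for illustration (a 1 second delay per message, a 10,000 bits per second link, 100-bit tuples, 10 red parts, 100,000 shipments from Cleveland suppliers); none of these figures appear in the lecture. Under those assumptions, a simple communication cost model gives times of the same order as the quoted range.

```python
# All constants below are illustrative assumptions, not values from the lecture.
ACCESS_DELAY_S = 1.0           # fixed delay per message
DATA_RATE_BPS = 10_000         # link speed in bits per second
TUPLE_BITS = 100               # size of one tuple
PART_ROWS = 100_000            # from the slide: PART cardinality
RED_PARTS = 10                 # assumed number of red parts
CLEVELAND_SHIPMENTS = 100_000  # assumed shipments by Cleveland suppliers


def transfer_time(messages, tuples):
    """Communication time = per-message delay + time to ship the tuples."""
    return messages * ACCESS_DELAY_S + tuples * TUPLE_BITS / DATA_RATE_BPS


# Plan A: select red parts in Chicago, ship only those to Detroit, join there.
plan_a_seconds = transfer_time(messages=1, tuples=RED_PARTS)

# Plan B: ship the whole PART relation to Detroit, then join.
plan_b_seconds = transfer_time(messages=1, tuples=PART_ROWS)

# Plan C: join SUPPLIER and SHIPMENT in Detroit, then ask Chicago about each
# resulting shipment individually (one request and one reply per shipment).
plan_c_seconds = transfer_time(messages=2 * CLEVELAND_SHIPMENTS, tuples=0)
plan_c_days = plan_c_seconds / 86_400
```

Under these assumptions Plan A takes about a second, Plan B about 17 minutes and Plan C over two days, which is why the choice of where to do the selection and how many messages to exchange dominates distributed query cost.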