DDBMS Distributed Database Management Systems Fragmentation


DDBMS Distributed Database Management Systems Fragmentation Data Allocation

Distributed Database
A collection of multiple, logically interrelated databases distributed over a computer network.
Distributed Database Management System
A software system that supports the transparent creation, access, and manipulation of interrelated data located at different sites of a computer network.

Centralized vs Distributed
Differences between the two approaches:
  Centralized: achieves efficiency through local optimization, using complex physical data structures.
  Distributed: achieves efficiency through global optimization of processing, including the cost of network communication and of local processing.

DDBMS Benefits: Transparency
Separation of high-level semantics from low-level implementation issues.
Extension of the data independence concept in centralized databases.
Basic concepts: fragmentation, allocation, replication.

Transparency
[Figure: layers of transparency around the data: language transparency, fragmentation transparency, replication transparency, network transparency, data independence, data.]

Transparency
Network
  Protect the user from the operational details of the network.
  Users do not have to specify where the data is located.
  Two aspects: location transparency and naming transparency.
Replication
  Replicas (copies) are created for performance and reliability reasons.
  Replication causes difficult update problems.
  Users should be made unaware of the existence of these copies.

Transparency
Fragmentation
  Basic fragments are parts of a relation: vertical (a subset of columns) or horizontal (a subset of rows).
  Fragments are also created for performance and reliability reasons.
  Users should be made unaware of the existence of fragments.

DDBMS Benefits: Performance
Greater throughput due to:
  Data localization: reduces communication overhead.
  Parallelism: inter-query and intra-query parallelism.

DDBMS ARCHITECTURAL ALTERNATIVES
Autonomy: the degree to which each DBMS can operate independently (distribution of control, not data).
Distribution: describes where the data is physically located.
Heterogeneity: indicates the uniformity of the DBMSs with respect to data.

DDBMS ARCHITECTURAL ALTERNATIVES
[Figure: the three dimensions of architectural alternatives: distribution, autonomy, heterogeneity.]

DDBMS ARCHITECTURAL ALTERNATIVES
Autonomy (A): A0: tight integration; A1: semi-autonomous; A2: total isolation
Distribution (D): D0: non-distributed; D1: client-server; D2: peer-to-peer
Heterogeneity (H): H0: homogeneous; H1: heterogeneous
3 * 3 * 2 = 18 alternatives. Some alternatives are meaningless or not practical!
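The count of 18 can be checked by enumerating the combinations. The following is a minimal sketch; the (A, D, H) tuple layout and the variable names are choices made here for illustration, not part of the slides.

```python
from itertools import product

# Dimension values as labelled on the slide.
AUTONOMY = ["A0", "A1", "A2"]        # tight integration, semi-autonomous, total isolation
DISTRIBUTION = ["D0", "D1", "D2"]    # non-distributed, client-server, peer-to-peer
HETEROGENEITY = ["H0", "H1"]         # homogeneous, heterogeneous

# Every (A, D, H) combination is one architectural alternative.
alternatives = list(product(AUTONOMY, DISTRIBUTION, HETEROGENEITY))

print(len(alternatives))                   # 3 * 3 * 2 combinations
print(("A0", "D2", "H0") in alternatives)  # the peer-to-peer point of the design space
```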

Major DDBMS Architectures: client-server, peer-to-peer, multidatabase

CLIENT SERVER (Ax, D1, Hy)
Distribute the functionality between client and server to better manage the complexity of the DBMS.
Two-level architecture.

CLIENT SERVER Typical Scenario
1. The client parses a query, decomposes it into independent site queries, and sends each one to the appropriate server.
2. Each server processes its local query and sends the result relation to the client.
3. The client combines the results of all the sub-queries.
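The three steps of the scenario can be sketched with in-memory data. This is only an illustration under simplifying assumptions: the "servers" are plain Python lists, the same predicate is sent to every site, and the names SERVERS, run_local_query, and client_query are hypothetical.

```python
# Hypothetical in-memory "servers": site name -> rows stored at that site.
SERVERS = {
    "site1": [{"name": "Ann", "dept": "CSE", "sal": 3500}],
    "site2": [{"name": "Bob", "dept": "EE", "sal": 2800},
              {"name": "Cy", "dept": "CSE", "sal": 4100}],
}

def run_local_query(site, predicate):
    """Step 2: a server evaluates its site query and returns a result relation."""
    return [row for row in SERVERS[site] if predicate(row)]

def client_query(predicate):
    """Steps 1 and 3: send one site query per server, then combine the results."""
    partial_results = [run_local_query(site, predicate) for site in SERVERS]
    combined = []
    for result in partial_results:
        combined.extend(result)
    return combined

high_paid = client_query(lambda row: row["sal"] > 3000)
print(sorted(row["name"] for row in high_paid))
```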

PEER-TO-PEER (A0, D2, H0)
Global users submit requests via an external schema (ES) defined over a global conceptual schema (GCS), which is system-maintained and is the union of all local conceptual schemas (LCSs).
The global system has no control over local data and processing.

MULTI-DATABASE (A2, Dx, Hy)
The global conceptual schema (GCS) either exists only as a union of some of the local conceptual schemas (LCSs), or does not exist at all.

Three orthogonal dimensions for data distribution:
  level of sharing
  access pattern behaviour
  level of knowledge (of access pattern behaviour)

[Figure: the three dimensions as axes. Level of sharing: no sharing, data, data+program. Access pattern behaviour: static, dynamic. Level of knowledge: partial information, complete information.]

The three orthogonal dimensions
Level of sharing
  no sharing: each application and its data execute at one site
  data sharing: programs are replicated at each site; data is not replicated, but is moved to a site as needed
  data+program sharing: both data and programs may be moved to a site as needed

The three orthogonal dimensions
Access pattern behaviour
  static: access patterns to data do not change over time
  dynamic: access patterns to data change over time (the question is how dynamic)
Level of knowledge (of access pattern behaviour)
  no knowledge: not a likely scenario
  partial knowledge: can predict, but users may deviate significantly
  complete knowledge: can predict, and no major changes

DISTRIBUTED DATABASE DESIGN
TOP-DOWN approach: we have a database. How do we partition and/or replicate it across sites?
BOTTOM-UP approach: we have existing databases at different sites. How do we integrate them and deal with heterogeneity and autonomy?

DISTRIBUTED DATABASE DESIGN: Top-Down
The typical approach to database design.
Information about the distribution of accesses among sites and the nature of access to the database at each site needs to be gathered during requirements analysis.
After conceptual schema design, design the local conceptual schemas by distributing the entities over the sites of the distributed system.
The distribution activity consists of:
  data fragmentation: split the schema into pieces
  data allocation: assign the schema pieces to sites

DISTRIBUTED DATABASE DESIGN: Bottom-Up
Necessary when databases already exist and need to be integrated into one database.
Similar to view integration, but the databases may be heterogeneous.

Top-Down Design
[Figure: top-down design process. Boxes: requirements analysis, system requirements, conceptual design, view design, global conceptual schema, external schema definitions, access information, distribution design.]

Top-Down Design (cont'd)
[Figure, continued. Boxes: distribution design, user input, logical conceptual schema, physical design, physical schema, observation and monitoring.]

Reasons for Fragmentation
A relation is not an appropriate unit of distribution:
  Application views are usually subsets of relations.
  Application access is local to subsets of relations.
  Applications at different sites may require different parts of the same relation:
    store once: high remote access costs
    duplicate: high update costs
  Concurrency of access is more limited.

Reasons for Fragmentation
BUT some applications may suffer from data fragmentation:
  Their required data is located in two or more fragments.
  It may be harder to check integrity constraints.
Careful design of data fragmentation is required!

Correctness Rules for Fragmentation
Completeness: if a relation R is decomposed into fragments R1, R2, …, Rn, each tuple/attribute that can be found in R can also be found in one or more of the Ri's.
Reconstruction: if a relation R is decomposed into fragments R1, R2, …, Rn, it should be possible to define a relational operator $\nabla$ such that $R = \nabla R_i,\ \forall R_i \in F_R$, where $F_R$ is the set of fragments of R.
Disjointness: if a relation R is horizontally decomposed into fragments R1, R2, …, Rn and data item di is in Rj, it is not in any other fragment Rk ($k \neq j$).
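For horizontal fragments the three rules can be checked mechanically. The sketch below assumes tuples are hashable Python tuples and that the reconstruction operator is set union (the natural choice for horizontal fragmentation); the function names and sample data are illustrative only.

```python
def is_complete(relation, fragments):
    """Completeness: every tuple of R appears in at least one fragment."""
    covered = set().union(*fragments)
    return set(relation) <= covered

def reconstructs(relation, fragments):
    """Reconstruction: the union of the fragments gives back exactly R."""
    return set().union(*fragments) == set(relation)

def is_disjoint(fragments):
    """Disjointness: no tuple appears in more than one fragment."""
    total = sum(len(set(fragment)) for fragment in fragments)
    return total == len(set().union(*fragments))

# Hypothetical relation with tuples (employee number, department).
R = [(1, "CSE"), (2, "EE"), (3, "CSE")]
good = [[(1, "CSE"), (3, "CSE")], [(2, "EE")]]
overlapping = [[(1, "CSE"), (3, "CSE")], [(2, "EE"), (3, "CSE")]]
lossy = [[(1, "CSE")], [(2, "EE")]]

print(is_complete(R, good), reconstructs(R, good), is_disjoint(good))
print(is_disjoint(overlapping), is_complete(R, lossy))
```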

Types of Fragmentation: horizontal, vertical, hybrid

HORIZONTAL DATA FRAGMENTATION
What is it? It partitions a relation along its tuples, so that each fragment holds a subset of the tuples of the relation.
Types of horizontal fragmentation:
  Primary: based on a predicate Pi that selects tuples from a relation R
  Derived: based on the partitioning of a relation by predicates defined on another relation, to which it is related by foreign keys

HORIZONTAL DATA FRAGMENTATION: Primary
Each fragment Ri is a selection on a relation R using a predicate Pi: $R_i = \sigma_{P_i}(R)$.
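Primary horizontal fragmentation amounts to running one selection per predicate. A minimal sketch follows, assuming a small hypothetical EMP relation stored as a list of dictionaries; the attribute names and the two predicates are made up for illustration.

```python
# Hypothetical relation EMP(eno, dept, sal).
EMP = [
    {"eno": 1, "dept": "CSE", "sal": 3500},
    {"eno": 2, "dept": "EE", "sal": 2800},
    {"eno": 3, "dept": "CSE", "sal": 2500},
]

def select(relation, predicate):
    """Relational selection: keep the tuples that satisfy the predicate."""
    return [t for t in relation if predicate(t)]

# One predicate Pi per fragment Ri; together they cover every tuple exactly once.
predicates = {
    "emp_cse": lambda t: t["dept"] == "CSE",
    "emp_other": lambda t: t["dept"] != "CSE",
}

fragments = {name: select(EMP, p) for name, p in predicates.items()}
print({name: [t["eno"] for t in rows] for name, rows in fragments.items()})
```

Because the two predicates are mutually exclusive and exhaustive, the resulting fragments are complete and disjoint.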

HORIZONTAL DATA FRAGMENTATION: Derived
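Derived fragmentation is commonly expressed as a semijoin: the member relation is split according to which fragment of the owner relation each tuple's foreign key points into. The sketch below assumes hypothetical PROJECT (owner) and WORKS_ON (member) relations joined on a made-up attribute pno; the location predicate is also illustrative.

```python
# Hypothetical owner relation PROJECT(pno, loc) and member relation WORKS_ON(eno, pno).
PROJECT = [{"pno": "P1", "loc": "Paris"}, {"pno": "P2", "loc": "Tokyo"}]
WORKS_ON = [{"eno": 1, "pno": "P1"}, {"eno": 2, "pno": "P2"}, {"eno": 3, "pno": "P1"}]

def semijoin(member, owner_fragment, key):
    """Keep the member tuples whose foreign key matches a tuple in the owner fragment."""
    keys = {t[key] for t in owner_fragment}
    return [t for t in member if t[key] in keys]

# Primary fragmentation of the owner, by a predicate on loc.
project_paris = [p for p in PROJECT if p["loc"] == "Paris"]
project_tokyo = [p for p in PROJECT if p["loc"] == "Tokyo"]

# Derived fragmentation of the member follows the owner's fragments.
works_on_paris = semijoin(WORKS_ON, project_paris, "pno")
works_on_tokyo = semijoin(WORKS_ON, project_tokyo, "pno")
print(works_on_paris, works_on_tokyo)
```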


HORIZONTAL FRAGMENTATION: INFORMATION REQUIREMENTS
Database information
  Relations in the database and the relationships between them, especially joins.
  [Figure: a link between an owner relation and a member relation, with Project and Works On as the example.]

HORIZONTAL FRAGMENTATION: INFORMATION REQUIREMENTS
Application information
  User query predicates: examine the most important applications (80/20 rule)
    simple: p1: DEPT = ‘CSE’; p2: SAL > 3000
    conjunctive (minterm predicate): m1 = p1 AND p2, e.g., (DEPT = ‘CSE’) AND (SAL > 3000)
  minterm selectivity: the number of tuples returned for a given minterm
  access frequency: how often user queries access the data
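Minterm predicates and their selectivities can be computed directly from the simple predicates. In the standard definition, a minterm is a conjunction in which each simple predicate appears either as is or negated; the slide shows only the all-positive case m1. The EMP data and helper names below are assumptions for illustration.

```python
from itertools import product

# Hypothetical relation EMP(dept, sal).
EMP = [
    {"dept": "CSE", "sal": 3500},
    {"dept": "CSE", "sal": 2500},
    {"dept": "EE", "sal": 4000},
    {"dept": "EE", "sal": 2000},
]

# Simple predicates p1 and p2 from the slide, as (label, test) pairs.
simple = [
    ("DEPT = 'CSE'", lambda t: t["dept"] == "CSE"),
    ("SAL > 3000", lambda t: t["sal"] > 3000),
]

def minterms(simple_predicates):
    """Build every conjunction of the simple predicates in natural or negated form."""
    result = []
    for signs in product([True, False], repeat=len(simple_predicates)):
        label = " AND ".join(
            name if keep else f"NOT({name})"
            for (name, _), keep in zip(simple_predicates, signs)
        )
        def test(t, signs=signs):
            return all(fn(t) == keep for (_, fn), keep in zip(simple_predicates, signs))
        result.append((label, test))
    return result

# Minterm selectivity: number of tuples that satisfy each minterm.
selectivity = {label: sum(1 for t in EMP if test(t)) for label, test in minterms(simple)}
for label, count in selectivity.items():
    print(label, count)
```

Every tuple satisfies exactly one minterm, so the selectivities add up to the size of the relation.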

Predicates & Minterms
Aspects of simple predicates:
Completeness: a set of simple predicates is said to be complete iff there is an equal probability of access by every application to any tuple belonging to any minterm fragment that is defined according to the set.
For example, if the only application that accesses Project accesses it by department, the set of department predicates is complete.

Predicates & Minterms
Minimality: if a predicate influences how fragmentation is performed (that is, it causes a fragment to be further split into, say, fi and fj), there should be at least one application that accesses fi and fj differently.

VERTICAL FRAGMENTATION
Produces fragments R1, R2, R3, …, Rn of a relation R.
Each fragment contains a subset of R's attributes as well as the primary key of R.
Divides relations “vertically” by columns (attributes).
The objective is to obtain fragments such that applications only need to access one fragment, which minimizes the execution time of applications.
Inherently more complicated than horizontal data fragmentation, due to the total number of alternatives available.
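Vertical fragmentation can be sketched as a projection that always keeps the primary key, with a join on that key to rebuild the relation. The EMP relation, its key eno, and the two attribute groups below are hypothetical.

```python
# Hypothetical relation EMP(eno, name, dept, sal) with primary key eno.
EMP = [
    {"eno": 1, "name": "Ann", "dept": "CSE", "sal": 3500},
    {"eno": 2, "name": "Bob", "dept": "EE", "sal": 2800},
]

def project(relation, attrs, key="eno"):
    """Projection that always carries the primary key along with the chosen attributes."""
    columns = [key] + [a for a in attrs if a != key]
    return [{c: t[c] for c in columns} for t in relation]

def join_on_key(left, right, key="eno"):
    """Rebuild tuples by joining two vertical fragments on the primary key."""
    right_by_key = {t[key]: t for t in right}
    return [{**l, **right_by_key[l[key]]} for l in left]

emp_public = project(EMP, ["name", "dept"])  # fragment for directory-style applications
emp_pay = project(EMP, ["sal"])              # fragment for payroll-style applications

rebuilt = join_on_key(emp_public, emp_pay)
print(rebuilt == EMP)
```

Keeping the key in every fragment is what makes the reconstruction by join possible.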


VERTICAL FRAGMENTATION: INFORMATION REQUIREMENTS

HYBRID FRAGMENTATION


ALLOCATION
Given:
  a set of fragments F = {F1, F2, …, Fn}
  a set of sites S = {S1, S2, …, Sm}
  a set of transactions T = {T1, T2, …, Tp}
Find an “optimal” distribution of F to S.

ALLOCATION
1. Minimize cost
  Cost of storing each Fi at Sj
  Cost of querying Fi at Sj
  Cost of updating Fi at all sites where it is stored
  Cost of data communication
2. Maximize performance
  Minimize response time at each site
  Maximize system throughput at each site
This problem is NP-hard!
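For a tiny instance, the cost-minimizing allocation can be found by trying every placement. The sketch below is a deliberately simplified model, not the full problem from the slide: it assumes no replication, ignores update costs, and uses made-up access frequencies and unit costs. Exhaustive search like this is only feasible for very small inputs, which is consistent with the problem being NP-hard.

```python
from itertools import product

FRAGMENTS = ["F1", "F2"]
SITES = ["S1", "S2"]

# Hypothetical numbers: accesses issued from each site to each fragment.
ACCESS_FREQ = {("S1", "F1"): 10, ("S2", "F1"): 2, ("S1", "F2"): 1, ("S2", "F2"): 8}
STORAGE_COST = {"S1": 1, "S2": 1}   # assumed cost of storing one fragment at a site
REMOTE_ACCESS_COST = 5              # assumed communication cost per remote access

def total_cost(assignment):
    """Storage cost plus communication cost of all accesses from non-home sites."""
    cost = 0
    for fragment, home in assignment.items():
        cost += STORAGE_COST[home]
        for site in SITES:
            if site != home:
                cost += REMOTE_ACCESS_COST * ACCESS_FREQ[(site, fragment)]
    return cost

def best_allocation():
    """Enumerate every non-replicated placement and keep the cheapest one."""
    candidates = (
        dict(zip(FRAGMENTS, placement))
        for placement in product(SITES, repeat=len(FRAGMENTS))
    )
    return min(candidates, key=total_cost)

best = best_allocation()
print(best, total_cost(best))
```

Each fragment ends up at the site that accesses it most, which is what the simple cost model rewards.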