PMIT-6101 Advanced Database Systems By- Jesmin Akhter Assistant Professor, IIT, Jahangirnagar University.



Lecture 07 Distributed Database Design

Slide 3: Outline
Distributed Database Design
• Distributed Design Issues
  o Data Allocation Model

Slide 4: Vertical Fragmentation
• A vertical fragmentation of a relation R produces fragments R1, R2, ..., Rr, each of which contains a subset of R's attributes as well as the primary key of R.
• The objective of vertical fragmentation is to partition a relation into a set of smaller relations so that many of the user applications will run on only one fragment.
• In this context, an "optimal" fragmentation is one that minimizes the execution time of the user applications that run on these fragments.
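The definition of vertical fragmentation can be illustrated with a minimal sketch. It assumes a toy relation held as a list of dictionaries; the relation EMP, its attributes, and the helper names are hypothetical and do not come from the slides.

```python
# Toy relation (hypothetical data): "eno" plays the role of the primary key.
EMP = [
    {"eno": 1, "name": "A. Rahman", "title": "Engineer", "salary": 40000},
    {"eno": 2, "name": "B. Khan", "title": "Analyst", "salary": 34000},
]


def vertical_fragment(relation, key, attrs):
    """Project the relation onto the primary key plus the given attributes."""
    return [{a: row[a] for a in (key, *attrs)} for row in relation]


def join_on_key(left, right, key):
    """Rebuild rows by joining two fragments on the shared primary key."""
    right_by_key = {row[key]: row for row in right}
    return [{**row, **right_by_key[row[key]]} for row in left]


# Each fragment carries the primary key plus a subset of the attributes.
EMP1 = vertical_fragment(EMP, "eno", ["name", "title"])
EMP2 = vertical_fragment(EMP, "eno", ["salary"])

# Joining the fragments on the key gives back the original relation.
RECONSTRUCTED = join_on_key(EMP1, EMP2, "eno")
```

An application that only reads names and titles would touch EMP1 alone, which is the situation vertical fragmentation aims for. Repeating the primary key in every fragment is what makes the join-based reconstruction possible.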

Slide 5: Vertical Fragmentation
Vertical fragmentation is more difficult than horizontal fragmentation because more alternatives exist. Example:
• In horizontal partitioning, if the total number of simple predicates in Pr is n, there are 2^n possible minterm predicates that can be defined on it.
• Some of these will contradict the existing implications, further reducing the candidate fragments that need to be considered.
• In vertical partitioning, if a relation has m non-primary-key attributes, the number of possible fragmentations (ways to partition the attributes) is B(m), the m-th Bell number.
• B(m) is bounded above by m^m and grows very quickly with m:
  o for m = 10, B(m) ≈ 115,000
  o for m = 15, B(m) ≈ 10^9
  o for m = 30, B(m) ≈ 10^23
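The growth of the search space can be checked directly. The sketch below computes Bell numbers with the Bell triangle; the function name is an illustrative choice, not something from the slides.

```python
def bell_numbers(limit):
    """Return [B(0), B(1), ..., B(limit)] computed with the Bell triangle."""
    bells = [1]
    row = [1]
    for _ in range(limit):
        # Each new row starts with the last entry of the previous row.
        new_row = [row[-1]]
        for value in row:
            # Each further entry adds the entry above-left to the one before it.
            new_row.append(new_row[-1] + value)
        row = new_row
        bells.append(row[0])
    return bells


B = bell_numbers(30)
for m in (10, 15, 30):
    # Compare attribute partitions with minterm predicates for the same count.
    print(f"m = {m}: B(m) = {B[m]}, 2^m = {2 ** m}")
```

For m = 10 this gives B(10) = 115,975 and for m = 15 it gives B(15) = 1,382,958,545, matching the rounded figures on the slide. This is why exhaustive search over vertical fragmentations is impractical and heuristics are used instead.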

Slide 6: Vertical Fragmentation
Two types of heuristic approaches exist for the vertical fragmentation of global relations:
• Grouping: starts by assigning each attribute to its own fragment and, at each step, joins some of the fragments until some criterion is satisfied.
  o Grouping was first suggested for centralized databases [Hammer and Niamir, 1979] and was later used for distributed databases [Sacca and Wiederhold, 1985].
• Splitting: starts with the whole relation and decides on beneficial partitionings based on the access behavior of applications to the attributes.
  o The technique was also first discussed for centralized database design [Hoffer and Severance, 1975] and was then extended to the distributed environment [Navathe et al., 1984].

Slide 7: Hybrid Fragmentation
• In most cases, a simple horizontal or vertical fragmentation of a database schema will not be sufficient to satisfy the requirements of user applications.
• In this case, a vertical fragmentation may be followed by a horizontal one, or vice versa, producing a tree-structured partitioning.
• Since the two types of partitioning strategies are applied one after the other, this alternative is called hybrid fragmentation. It has also been named mixed fragmentation or nested fragmentation.

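Slide 8
[Figure: partitioning tree. R is split by horizontal fragmentation (HF) into R1 and R2; vertical fragmentation (VF) then splits R1 into R11 and R12, and R2 into R21, R22, and R23.]
Hybrid fragmentation is also called mixed fragmentation or nested fragmentation.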

Slide 9: Correctness of Hybrid Fragmentation
• To reconstruct the original global relation under hybrid fragmentation, one starts at the leaves of the partitioning tree and moves upward, performing joins (to recombine vertical fragments) and unions (to recombine horizontal fragments).
• The fragmentation is complete if the intermediate and leaf fragments are complete.
• Similarly, disjointness is guaranteed if the intermediate and leaf fragments are disjoint.
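The bottom-up reconstruction can be sketched on a small partitioning tree. This is a simplified illustration: the relation, its attributes, the selection predicate, and the helper names are all hypothetical, and each horizontal fragment is split into only two vertical fragments.

```python
# Hypothetical global relation; "id" is the primary key.
R = [
    {"id": 1, "city": "Dhaka", "name": "Ali", "budget": 100},
    {"id": 2, "city": "Dhaka", "name": "Sara", "budget": 250},
    {"id": 3, "city": "Khulna", "name": "Omar", "budget": 175},
]


def select(relation, predicate):
    """Horizontal fragmentation step: keep the rows satisfying a predicate."""
    return [row for row in relation if predicate(row)]


def project(relation, attrs):
    """Vertical fragmentation step: keep only the listed attributes."""
    return [{a: row[a] for a in attrs} for row in relation]


def join(left, right, key):
    """Recombine two vertical fragments on the primary key."""
    index = {row[key]: row for row in right}
    return [{**row, **index[row[key]]} for row in left if row[key] in index]


def union(*relations):
    """Recombine disjoint horizontal fragments."""
    return [row for relation in relations for row in relation]


# Horizontal fragmentation (HF) at the root of the tree.
R1 = select(R, lambda row: row["city"] == "Dhaka")
R2 = select(R, lambda row: row["city"] != "Dhaka")

# Vertical fragmentation (VF) of each horizontal fragment; the key is repeated.
R11, R12 = project(R1, ["id", "city"]), project(R1, ["id", "name", "budget"])
R21, R22 = project(R2, ["id", "city"]), project(R2, ["id", "name", "budget"])

# Bottom-up reconstruction: joins at the VF level, then a union at the HF level.
REBUILT = union(join(R11, R12, "id"), join(R21, R22, "id"))
```

The two selection predicates are complementary, so the horizontal fragments are complete and disjoint. Every vertical fragment keeps the key, so no tuple is lost or duplicated when the tree is folded back up.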

Slide 10: Allocation
Allocation Problem
Given
• F = {F1, F2, ..., Fn}: fragments
• S = {S1, S2, ..., Sm}: network sites
• Q = {q1, q2, ..., qq}: the set of applications running on these sites
the allocation problem involves finding the "optimal" distribution of F to S.
Optimality can be defined with respect to two measures:
• Minimal cost. The cost function consists of:
  o the cost of storing each Fi at a site Sj,
  o the cost of querying Fi at site Sj,
  o the cost of updating Fi at all sites where it is stored,
  o the cost of data communication.
• Performance:
  o minimize the response time,
  o maximize the system throughput at each site.

Slide 11: Allocation Model
General Form
$$\min(\text{Total Cost})$$
subject to
• response time constraint
• storage constraint
• processing constraint
Decision Variable
$$x_{ij} = \begin{cases} 1 & \text{if fragment } F_i \text{ is stored at site } S_j \\ 0 & \text{otherwise} \end{cases}$$

Slide 12: Allocation Model
Total Cost
$$\text{Total Cost} = \sum_{\text{all queries}} \text{query processing cost} + \sum_{\text{all sites}} \sum_{\text{all fragments}} \text{cost of storing a fragment at a site}$$
Storage Cost (of fragment Fj at Sk)
$$\text{Storage Cost} = (\text{unit storage cost at } S_k) \times (\text{size of } F_j) \times x_{jk}$$
where x_jk = 1 if fragment Fj is stored at site Sk and 0 otherwise.
Query Processing Cost
The model of the database allocation problem (DAP) specifies the query processing cost as consisting of the processing cost (PC) and the transmission cost (TC). Thus the query processing cost (QPC) for application qi is:
$$\text{QPC} = \text{processing component} + \text{transmission component}$$
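The total-cost formula can be evaluated on a tiny instance. In this minimal sketch every number, the fragment and site names, and the function names are hypothetical; the query processing costs are taken as given inputs rather than derived.

```python
# Hypothetical instance: two fragments, two sites, two queries.
size = {"F1": 100, "F2": 40}              # size of each fragment
unit_storage_cost = {"S1": 5, "S2": 2}    # unit storage cost at each site

# Decision variable: x[(fragment, site)] is 1 if the fragment is stored there.
x = {("F1", "S1"): 1, ("F1", "S2"): 0, ("F2", "S1"): 1, ("F2", "S2"): 1}

# Query processing cost per query, assumed to be already known here.
query_processing_cost = {"q1": 30, "q2": 12}


def storage_cost(fragment, site):
    """(unit storage cost at the site) * (size of the fragment) * x."""
    return unit_storage_cost[site] * size[fragment] * x[(fragment, site)]


def total_cost():
    """Sum of query processing costs plus storage cost over all placements."""
    storage = sum(storage_cost(f, s) for s in unit_storage_cost for f in size)
    return sum(query_processing_cost.values()) + storage


TOTAL = total_cost()
```

With these numbers the storage term is 500 + 200 + 80 = 780 and the query term is 42, so the total is 822. Setting an entry of x to 0 removes that copy's storage cost, which is how the decision variable drives the objective.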

Slide 13: Allocation Model
Query Processing Cost
• The processing component (PC) consists of three cost factors:
$$\text{PC} = \text{access cost (AC)} + \text{integrity enforcement cost (IE)} + \text{concurrency control cost (CC)}$$
• Access cost
$$\text{AC} = \sum_{\text{all sites}} \sum_{\text{all fragments}} (\text{no. of update accesses} + \text{no. of read accesses}) \times x_{jk} \times \text{local processing cost at a site}$$
  o The first two terms calculate the number of accesses of user query qi to fragment Fj.
  o We assume that the local costs of processing update accesses and read accesses are identical.
  o The summation gives the total number of accesses for all the fragments referenced by qi. Multiplication by LPC_k gives the cost of this access at site Sk.
  o We again use the decision variable x_jk (1 if Fj is stored at Sk, 0 otherwise) to select only those cost values for the sites where fragments are stored.
• Integrity enforcement and concurrency control costs
  o These can be calculated similarly.
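The access-cost summation for a single query can be written out as follows. This is a minimal sketch: the access counts, local processing costs, and the IE and CC values are hypothetical placeholders, and all names are illustrative.

```python
# Hypothetical data for one query q_i.
update_accesses = {"F1": 2, "F2": 0}   # update accesses of q_i to each fragment
read_accesses = {"F1": 5, "F2": 3}     # read accesses of q_i to each fragment
lpc = {"S1": 4, "S2": 1}               # local processing cost (LPC_k) per site

# Decision variable: x[(fragment, site)] is 1 if the fragment is stored there.
x = {("F1", "S1"): 1, ("F1", "S2"): 0, ("F2", "S1"): 1, ("F2", "S2"): 1}


def access_cost():
    """Sum over sites and fragments of (updates + reads) * x * LPC."""
    return sum(
        (update_accesses[f] + read_accesses[f]) * x[(f, s)] * lpc[s]
        for s in lpc
        for f in update_accesses
    )


# IE and CC are stand-in constants here; the slides only say they are
# computed in a similar way to the access cost.
integrity_enforcement_cost = 5
concurrency_control_cost = 2

AC = access_cost()
PC = AC + integrity_enforcement_cost + concurrency_control_cost
```

Here F1 contributes 7 accesses at S1 (cost 28) and F2 contributes 3 accesses at both S1 (cost 12) and S2 (cost 3), giving AC = 43 and PC = 50. The pair (F1, S2) contributes nothing because its x entry is 0.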

Slide 14: Allocation Model
Query Processing Cost
Transmission component
$$\text{TC} = \text{cost of processing updates} + \text{cost of processing retrievals}$$
• In update queries it is necessary to inform all the sites where replicas exist, while in retrieval queries it is sufficient to access only one of the copies.
• In addition, at the end of an update request there is no data transmission back to the originating site other than a confirmation message, whereas retrieval-only queries may result in significant data transmission.
• Cost of updates
$$\sum_{\text{all sites}} \sum_{\text{all fragments}} \text{update message cost} + \sum_{\text{all sites}} \sum_{\text{all fragments}} \text{acknowledgment cost}$$
• Retrieval cost
$$\sum_{\text{all fragments}} \min_{\text{all sites}} (\text{cost of retrieval command} + \text{cost of sending back the result})$$
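The asymmetry between updates and retrievals shows up clearly in code. The sketch below assumes a query originating at site S1 (so messages to S1 cost nothing), hypothetical message costs, and an allocation in which every fragment is stored at least once; none of these values come from the slides.

```python
# Decision variable: x[(fragment, site)] is 1 if the fragment is stored there.
x = {("F1", "S1"): 1, ("F1", "S2"): 1, ("F2", "S1"): 0, ("F2", "S2"): 1}

updated = ["F1"]            # fragments the query updates
retrieved = ["F1", "F2"]    # fragments the query reads

# Hypothetical per-site costs as seen from the originating site S1.
update_message_cost = {"S1": 0, "S2": 3}
acknowledgment_cost = {"S1": 0, "S2": 1}
retrieval_command_cost = {"S1": 0, "S2": 2}
result_transfer_cost = {"S1": 0, "S2": 6}


def update_cost():
    """Every site holding a replica receives the update and acknowledges it."""
    return sum(
        (update_message_cost[s] + acknowledgment_cost[s]) * x[(f, s)]
        for f in updated
        for s in update_message_cost
    )


def retrieval_cost():
    """Each fragment is read from the cheapest site that holds a copy."""
    total = 0
    for f in retrieved:
        holders = [s for s in retrieval_command_cost if x[(f, s)]]
        total += min(
            retrieval_command_cost[s] + result_transfer_cost[s] for s in holders
        )
    return total


TC = update_cost() + retrieval_cost()
```

F1 is replicated, so its update costs 4 (message plus acknowledgment to S2) while its retrieval costs 0 because the local copy is chosen. F2 exists only at S2, so reading it costs 8, for a transmission component of 12. Replication therefore lowers retrieval cost but raises update cost, which is the trade-off the allocation model has to balance.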

Slide 15: Allocation Model
Constraints
• Response time
$$\text{execution time of query} \le \text{max. allowable response time for that query}$$
• Storage constraint (for a site)
$$\sum_{\text{all fragments}} \text{storage requirement of a fragment at that site} \le \text{storage capacity at that site}$$
• Processing constraint (for a site)
$$\sum_{\text{all queries}} \text{processing load of a query at that site} \le \text{processing capacity of that site}$$
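For a very small instance the constrained problem can be solved by enumeration. The sketch below is a deliberately reduced version of the model: it uses storage cost alone as a stand-in objective, enforces only the storage constraint, and adds the assumption that every fragment must be stored at least once. All data and names are hypothetical.

```python
from itertools import product

fragment_size = {"F1": 100, "F2": 40}     # storage requirement of each fragment
capacity = {"S1": 120, "S2": 120}         # storage capacity of each site
unit_storage_cost = {"S1": 5, "S2": 2}    # unit storage cost at each site


def storage_ok(x):
    """Storage constraint: fragments placed at a site must fit its capacity."""
    return all(
        sum(fragment_size[f] * x[(f, s)] for f in fragment_size) <= capacity[s]
        for s in capacity
    )


def every_fragment_stored(x):
    """Added assumption: each fragment has at least one copy somewhere."""
    return all(any(x[(f, s)] for s in capacity) for f in fragment_size)


def cost(x):
    """Stand-in objective: storage cost only (query costs are left out)."""
    return sum(
        unit_storage_cost[s] * fragment_size[f] * x[(f, s)]
        for f in fragment_size
        for s in capacity
    )


def best_allocation():
    """Try all 0/1 assignments of x and keep the cheapest feasible one."""
    keys = [(f, s) for f in fragment_size for s in capacity]
    best = None
    for bits in product((0, 1), repeat=len(keys)):
        x = dict(zip(keys, bits))
        if not (every_fragment_stored(x) and storage_ok(x)):
            continue
        if best is None or cost(x) < cost(best):
            best = x
    return best


BEST = best_allocation()
```

Both fragments together need 140 units, which exceeds either site's capacity, so the cheapest feasible choice puts F1 at the cheaper site S2 and F2 at S1. The search visits 2^(n*m) assignments for n fragments and m sites, so this approach is only workable for toy instances.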

Slide 16: Solved Problem

Slide 17: Solved Problem

Slide 18: Sample Questions
Lecture 1:
Q-1: What is a distributed database?
Q-2: What is not a distributed database?
Q-3: What are the problem areas and disadvantages of a distributed database system?
Q-4: Distributed database reality, or the real view of a distributed DBMS?
Q-5: Implicit assumptions, applications, and promises of a distributed database management system?
Q-6: Network transparency, replication transparency, data independence?
Q-7: Network transparency, location transparency, naming transparency (Slides 32, 33)?
Q-8: Logical data independence, physical data independence?
Q-9: Fully transparent access (Slide 28, Lec-1)?
Q-10: Improved performance and reliability of a distributed database (Slides 38, 39)?
Q-11: What is being distributed?
Q-12: What do you mean by a distributed processing system?

Slide 19: Sample Questions
Lecture 2-3:
• Normalization (all slides)
• Relational Algebra (only algebra)
Lecture 4:
• Reasons for Fragmentation, Slides 6, 7, 9, 10
Lecture 5:
• Degree of Fragmentation, Slides 4, 5, 6, 7
• Comparisons of Relocation Alternatives (Slide 9)
• Database Information (Slide 13)
• Application Information (Slides 17, 18, 19)
• Algorithm (Slides 25, 26, 27, 28, 29)
• Example solving (Slides 34, 38)
Lecture 6+7:
• Important aspect or desirable property of simple predicates: Slides 7, 8, 10, 11, 12, 13, 14, 16
• Problem solving (Slides 18, 19, 20)
• Slides 33-37 (Allocation Model)

Slide 20: Thank You