Distributed Database Management Systems


Distributed Database Management Systems Lecture 35

In the previous lecture: query optimization; centralized QO; best access path; join processing; QO in a distributed environment.

In this lecture: query optimization; fragmented queries; joins replaced by semijoins; three major QO algorithms.

Semijoin based Algorithms

Semijoins reduce the cost of join queries. Semijoin is ……. A join of two relations can be replaced by a semijoin (SJ) of one or both relations.

So R ⋈A S can be replaced by any of:
(R ⋉A S) ⋈A S
R ⋈A (S ⋉A R)
(R ⋉A S) ⋈A (S ⋉A R)
Which one? We need to estimate costs.

Same assumptions: R at site 1, S at site 2, and Size(R) < Size(S).
ΠA(S) → site 1
Site 1 computes R’ = R ⋉A S’, where S’ = ΠA(S)
R’ → site 2
Site 2 computes R’ ⋈A S

Ignoring Tmsg, the semijoin approach is better if size(ΠA(S)) + size(R ⋉A S) < size(R). Join is better if ….. Semijoin is better if …..
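The comparison above can be sketched in a few lines of Python. This is a minimal sketch, assuming sizes are plain numbers in the same unit and that Tmsg is ignored, as on the slide. The function names and the sample sizes are hypothetical and are not taken from the lecture.

```python
def join_transfer_cost(size_r: int) -> int:
    """Data shipped by the plain join approach: R is sent whole to the site of S."""
    return size_r


def semijoin_transfer_cost(size_proj_a_s: int, size_r_semijoin_s: int) -> int:
    """Data shipped by the semijoin approach: the projection of S on A goes
    to site 1, then the reduced relation R' goes to site 2."""
    return size_proj_a_s + size_r_semijoin_s


def semijoin_is_better(size_r: int, size_proj_a_s: int, size_r_semijoin_s: int) -> bool:
    """Apply the slide's condition: projection + reduced R < size(R)."""
    return semijoin_transfer_cost(size_proj_a_s, size_r_semijoin_s) < join_transfer_cost(size_r)


# Hypothetical sizes: the semijoin removes most of R in the first case
# and almost nothing in the second.
strong_reduction = semijoin_is_better(size_r=1000, size_proj_a_s=100, size_r_semijoin_s=200)
weak_reduction = semijoin_is_better(size_r=1000, size_proj_a_s=100, size_r_semijoin_s=950)
print(strong_reduction, weak_reduction)
```

With these made-up numbers the semijoin pays off only when it discards a large share of R; once the reduced relation is nearly as large as R, the extra transfer of the projection makes the plain join cheaper.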

SJ with more than two tables: this is more complex. The semijoin approach can be applied to each individual join. Consider EMP ⋈ ASG ⋈ PROJ.

EMP ⋈ ASG ⋈ PROJ = EMP’ ⋈ ASG’ ⋈ PROJ, where EMP’ = EMP ⋉ ASG and ASG’ = ASG ⋉ PROJ. EMP can be reduced further by using EMP” = EMP ⋉ (ASG ⋉ PROJ) rather than EMP’.
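A small sketch may make the difference between the single semijoin and the chained semijoin concrete. It assumes relations are lists of dictionaries and that the join attributes are eNo (EMP and ASG) and pNo (ASG and PROJ). The helper `semijoin` and the toy rows are hypothetical, not data from the lecture.

```python
def semijoin(left, right, attr):
    """Return the rows of `left` that match at least one row of `right` on `attr`."""
    keys = {row[attr] for row in right}
    return [row for row in left if row[attr] in keys]


# Toy relations (made-up data).
EMP = [{"eNo": 1}, {"eNo": 2}, {"eNo": 3}]
ASG = [{"eNo": 1, "pNo": 10}, {"eNo": 2, "pNo": 20}]
PROJ = [{"pNo": 10}]

emp_single = semijoin(EMP, ASG, "eNo")          # EMP reduced by ASG only
asg_reduced = semijoin(ASG, PROJ, "pNo")        # ASG reduced by PROJ
emp_chained = semijoin(EMP, asg_reduced, "eNo")  # EMP reduced by (ASG reduced by PROJ)

print(emp_single)
print(emp_chained)
```

Because the reduced ASG is a subset of ASG, the chained result can never be larger than the single-semijoin result; in this toy data it drops employee 2, whose only assignment is to a project that is not in PROJ.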

Many SJ expressions are possible for a relation. The “full reducer” is the SJ expression that reduces R the most. It does not exist for cyclic queries.

SELECT eName
FROM EMP, ASG, PROJ
WHERE EMP.eNo = ASG.eNo AND ASG.pNo = PROJ.pNo AND EMP.city = PROJ.city

[Figure: two query graphs over EMP, ASG and PROJ. Cyclic query: edges labelled eNo, pNo and city. Tree query: edges labelled "eNo, city" and "pNo, city".]

A full reducer may be hard to find, although it is easy for a chained query. Most systems use single SJs to reduce relation size.

Distributed Query Processing Algorithms

The three main representative algorithms are the Distributed INGRES algorithm, the R* algorithm and the SDD-1 algorithm.

R* Algorithm: static and exhaustive. The algorithm supports fragmentation, but the actual implementation doesn’t. It involves master, execution and apprentice sites.

The optimizer of the master site makes inter-site decisions; apprentice sites make local decisions. It optimizes local processing time and communication time.

The optimizer, based on the statistics of the DB and the sizes of intermediate results, decides about: join ordering; join algorithm (nested-loop or merge join); access path (indexed or sequential).

Inter-site transfers: ship-whole or fetch-as-needed.
Ship-whole: the entire relation is transferred and stored in a temporary relation. With the merge-join approach, tuples can be processed as they arrive.
Fetch-as-needed: the external relation is sequentially scanned; each join attribute value is sent to the site of the other relation; the relevant tuples are selected at that site and sent to the first site.

Inter-site transfers: comparison.
Ship-whole: larger data transfer; smaller number of messages; better if relations are small.
Fetch-as-needed: number of messages = O(cardinality of external relation); data transfer per message is minimal; better if relations are large and the join selectivity is good.

Example: for the join of an external relation R with an internal relation S, there are four strategies.

1- Move the outer relation’s tuples to the site of the inner relation. They can be joined as they arrive.
Total Cost = LT(retrieve card(R) tuples from R) + CT(size(R)) + LT(retrieve s tuples from S) * card(R)
Here LT is local processing time, CT is communication time, and s is the average number of S tuples that match one tuple of R.

2- Move the inner relation to the site of the outer relation. The tuples cannot be joined as they arrive; they need to be stored in a temporary relation T.
Total Cost = LT(retrieve card(S) tuples from S) + CT(size(S)) + LT(store card(S) tuples as T) + LT(retrieve card(R) tuples from R) + LT(retrieve s tuples from T) * card(R)

3- Fetch inner tuples as needed. For each tuple in R, send the join attribute value to the site of S. Retrieve the matching inner tuples at the site of S. Send the matching S tuples to the site of R. Join them as they arrive.

Total Cost = LT(retrieve card(R) tuples from R) + CT(length(A) * card(R)) + LT(retrieve s tuples from S) * card(R) + CT(s * length(S)) * card(R)

4- Move both inner and outer relations to another site.
Example: a query consisting of the join of PROJ (external) and ASG (internal) on pNo. There are four strategies.

1- Ship PROJ to the site of ASG. 2- Ship ASG to the site of PROJ. 3- Fetch ASG tuples as needed for each tuple of PROJ. 4- Move both to a third site. Optimization involves costing each possibility.
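The three cost formulas given for strategies 1 to 3 can be compared with a short sketch. It is only an illustration under stated assumptions: LT and CT are modelled as simple linear functions with unit weights, size(R) is taken as card(R) * length(R), and the `Stats` fields and sample numbers are hypothetical. Strategy 4 is left out because no formula is given for it.

```python
from dataclasses import dataclass


@dataclass
class Stats:
    card_r: int   # card(R), cardinality of the outer (external) relation
    card_s: int   # card(S), cardinality of the inner (internal) relation
    len_r: int    # length of one R tuple
    len_s: int    # length of one S tuple
    len_a: int    # length(A), length of the join attribute
    s: float      # average number of S tuples matching one R tuple


def LT(tuples: float, unit: float = 1.0) -> float:
    """Assumed linear local processing time for a number of tuples."""
    return unit * tuples


def CT(data: float, unit: float = 1.0) -> float:
    """Assumed linear communication time for an amount of data."""
    return unit * data


def ship_outer(st: Stats) -> float:
    """Strategy 1: move outer relation tuples to the site of the inner relation."""
    return LT(st.card_r) + CT(st.card_r * st.len_r) + LT(st.s) * st.card_r


def ship_inner(st: Stats) -> float:
    """Strategy 2: move the inner relation and store it as T before joining."""
    return (LT(st.card_s) + CT(st.card_s * st.len_s) + LT(st.card_s)
            + LT(st.card_r) + LT(st.s) * st.card_r)


def fetch_as_needed(st: Stats) -> float:
    """Strategy 3: send each join value, receive the matching inner tuples."""
    return (LT(st.card_r) + CT(st.len_a * st.card_r)
            + LT(st.s) * st.card_r + CT(st.s * st.len_s) * st.card_r)


# Hypothetical statistics: a small outer relation and a large inner relation.
example = Stats(card_r=10, card_s=1000, len_r=4, len_s=4, len_a=1, s=2)
costs = {
    "ship outer": ship_outer(example),
    "ship inner": ship_inner(example),
    "fetch as needed": fetch_as_needed(example),
}
cheapest = min(costs, key=costs.get)
print(costs, cheapest)
```

With these made-up statistics, shipping the small outer relation is cheapest and shipping the large inner relation is by far the most expensive. Different weights for LT and CT, or a per-message overhead, could change the ranking.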

That is it regarding the R* algorithm for distributed query optimization. Let’s review it.

SDD-1 Algorithm: SDD-1 stands for System for Distributed Databases, a non-commercial database.

Based on the Hill Climbing Algorithm: no semijoins; no replication or fragmentation; the cost of transferring the result from the final result site to the user site is not considered; can minimize either total time or response time.

Inputs include: the query graph; the locations of relations; the relations’ statistics.

1- Do the initial local processing. 2- Select the initial best plan (ES0): calculate the cost of moving all relations to a single site, for each candidate site; the plan with the least cost is ES0.

3- Split ES0 into ES1 and ES2. ES1: send one of the relations to the other site, where the relations are joined. ES2: send the result back to the site chosen in ES0.

4- Replace ES0 with ES1 and ES2 when cost(ES1) + cost(local join) + cost(ES2) < cost(ES0). 5- Recursively apply steps 3 and 4 to ES1 and ES2 until there is no improvement.

Example: “Find the salaries of engineers working on the CAD/CAM project.” It involves EMP, PAY, PROJ and ASG:
Πsal(PAY ⋈title (EMP ⋈eNo (ASG ⋈pNo (σpName = ‘CAD/CAM’ (PROJ)))))

Assume Tmsg = 0 and TTR = 1. The length of a tuple is 1, so size(R) = card(R).

Relation   Size   Site
EMP        8      1
PAY        4      2
PROJ       1      3
ASG        10     4

Site 1, considering only transfer costs:
PAY → site 1 = 4
PROJ → site 1 = 1
ASG → site 1 = 10
Total = 15


Cost for site 2 = 19. Cost for site 3 = 22. Cost for site 4 = 13. So site 4 is our ES0: move all relations to site 4.
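The selection of ES0 in this example can be reproduced with a short sketch. It assumes Tmsg = 0 and TTR = 1 as stated, so the cost of a plan is just the total size shipped. The placement of one relation per site (EMP at 1, PAY at 2, PROJ at 3, ASG at 4) is the one consistent with the four costs quoted above; the variable names are hypothetical.

```python
# Relation sizes from the example (tuple length 1, so size = cardinality).
sizes = {"EMP": 8, "PAY": 4, "PROJ": 1, "ASG": 10}

# Assumed placement, consistent with the stated costs 15, 19, 22 and 13.
sites = {"EMP": 1, "PAY": 2, "PROJ": 3, "ASG": 4}


def transfer_cost(target_site: int) -> int:
    """Cost of moving every relation not already at target_site to it."""
    return sum(size for rel, size in sizes.items() if sites[rel] != target_site)


costs = {site: transfer_cost(site) for site in sorted(set(sites.values()))}
es0_site = min(costs, key=costs.get)

print(costs)
print(es0_site)
```

The site holding the largest relation wins here because moving everything else to it ships the least data, which is why site 4 (holding ASG) becomes the initial plan.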

Thanks