Distributed DBMS© M. T. Özsu & P. Valduriez Ch.6/1 Outline Introduction Background Distributed Database Design Database Integration Semantic Data Control.

Slides:



Advertisements
Similar presentations
Distributed DBMS©M. T. Özsu & P. Valduriez Ch.15/1 Outline Introduction Background Distributed Database Design Database Integration Semantic Data Control.
Advertisements

Choosing an Order for Joins
Distributed DBMS©M. T. Özsu & P. Valduriez Ch.14/1 Outline Introduction Background Distributed Database Design Database Integration Semantic Data Control.
Outline  Introduction  Background  Distributed DBMS Architecture  Distributed Database Design  Semantic Data Control ➠ View Management ➠ Data Security.
CS4432: Database Systems II
Distributed DBMSPage 6. 1© 1998 M. Tamer Özsu & Patrick Valduriez Outline Introduction Background Distributed DBMS Architecture Distributed Database Design.
Distributed DBMSPage © 1998 M. Tamer Özsu & Patrick Valduriez Outline Introduction Background Distributed DBMS Architecture Distributed Database.
Distributed Query Processing –An Overview
Distributed DBMSPage © 1998 M. Tamer Özsu & Patrick Valduriez Outline Introduction Background Distributed DBMS Architecture Distributed Database.
Distributed DBMS © M. T. Özsu & P. Valduriez Ch.7/1 Outline Introduction Background Distributed Database Design Database Integration Semantic Data Control.
Distributed databases
Chapter 13 (Web): Distributed Databases
1 Minggu 12, Pertemuan 23 Introduction to Distributed DBMS (Chapter , 22.6, 3rd ed.) Matakuliah: T0206-Sistem Basisdata Tahun: 2005 Versi: 1.0/0.0.
Session – 10 QUERY OPTIMIZATION Matakuliah: M0184 / Pengolahan Data Distribusi Tahun: 2005 Versi:
1 Distributed Databases CS347 Lecture 14 May 30, 2001.
Distributed DBMSPage 4. 1© 1998 M. Tamer Özsu & Patrick Valduriez Outline Introduction Background  Distributed DBMS Architecture  Datalogical Architecture.
Distributed DBMSPage 5. 1 © 1998 M. Tamer Özsu & Patrick Valduriez Outline Introduction Background Distributed DBMS Architecture  Distributed Database.
Institut für Scientific Computing – Universität WienP.Brezany Optimization of Distributed Queries Univ.-Prof. Dr. Peter Brezany Institut für Scientific.
Distributed DBMSPage 5. 1 © 1998 M. Tamer Özsu & Patrick Valduriez Outline Introduction Background Distributed DBMS Architecture  Distributed Database.
Query Optimization. General Overview Relational model - SQL  Formal & commercial query languages Functional Dependencies Normalization Physical Design.
16.5 Introduction to Cost- based plan selection Amith KC Student Id: 109.
Chapter 12 Distributed Database Management Systems
Distributed Databases
Outline Introduction Background Distributed Database Design
Distributed DBMS © M. T. Özsu & P. Valduriez Ch.7/1 Outline Introduction Background Distributed Database Design Database Integration Semantic Data Control.
Distributed Databases and DBMSs: Concepts and Design
Query Processing Presented by Aung S. Win.
1 Distributed and Parallel Databases. 2 Distributed Databases Distributed Systems goal: –to offer local DB autonomy at geographically distributed locations.
III. Current Trends: 1 - Distributed DBMSsSlide 1/32 III. Current Trends Part 1: Distributed DBMSs: Concepts and Design Lecture 12 (2 hours) Lecturer:
low level data manipulation
DISTRIBUTED DATABASE DESIGN
Access Path Selection in a Relational Database Management System Selinger et al.
Session-9 Data Management for Decision Support
Distributed DBMS © M. T. Özsu & P. Valduriez Ch.7/1 Οι διαφάνειες καλύπτουν μέρος των Κεφαλαίων 7&8: Distributed Database QueryProcessing and Optimization.
Query Optimization. Query Optimization Query Optimization The execution cost is expressed as weighted combination of I/O, CPU and communication cost.
Overview of Query Processing
1 12. Course Summary Course Summary Distributed Database Systems.
Query Optimization Chap. 19. Evaluation of SQL Conceptual order of evaluation – Cartesian product of all tables in from clause – Rows not satisfying where.
Session-8 Data Management for Decision Support
10 1 Chapter 10 Distributed Database Management Systems Database Systems: Design, Implementation, and Management, Sixth Edition, Rob and Coronel.
Database Systems: Design, Implementation, and Management Tenth Edition Chapter 12 Distributed Database Management Systems.
Database Systems: Design, Implementation, and Management Ninth Edition Chapter 12 Distributed Database Management Systems.
Distributed Database Systems Overview
PMIT-6102 Advanced Database Systems By- Jesmin Akhter Assistant Professor, IIT, Jahangirnagar University.
Distributed DBMS© M. T. Özsu & P. Valduriez Ch.8/1 Outline Introduction Background Distributed Database Design Database Integration Semantic Data Control.
DDBMS Distributed Database Management Systems Fragmentation
Distributed DBMSs- Concept and Design Jing Luo CS 157B Dr. Lee Fall, 2003.
Kjell Orsborn UU - DIS - UDBL DATABASE SYSTEMS - 10p Course No. 2AD235 Spring 2002 A second course on development of database systems Kjell.
PMIT-6101 Advanced Database Systems By- Jesmin Akhter Assistant Professor, IIT, Jahangirnagar University.
Distributed DBMSPage © 1998 M. Tamer Özsu & Patrick Valduriez Outline Introduction Background Distributed DBMS Architecture Distributed Database.
Topic Distributed DBMS Database Management Systems Fall 2012 Presented by: Osama Ben Omran.
R*: An overview of the Architecture By R. Williams et al. Presented by D. Kontos Instructor : Dr. Megalooikonomou.
Distributed Database Management Systems. Reading Textbook: Ch. 1, Ch. 3 Textbook: Ch. 1, Ch. 3 For next class: Ch. 4 For next class: Ch. 4 FarkasCSCE.
Distributed DBMS, Query Processing and Optimization
1 Chapter 22 Distributed DBMS Concepts and Design CS 157B Edward Chen.
Lecture 15: Query Optimization. Very Big Picture Usually, there are many possible query execution plans. The optimizer is trying to chose a good one.
Distributed Database Design Bayu Adhi Tama, MTI Fasilkom-Unsri Adapted from Connolly, et al., Database Systems 4 th Edition, Pearson Education Limited,
L4: Query Optimization (1) - 1 L4: Query Processing and Optimization v 4.1 Query Processing  Query Decomposition  Data Localization v 4.1 Query Optimization.
Distributed DBMS© 2001 M. Tamer Özsu & Patrick Valduriez Page 1.1 Outline n Introduction Background Distributed DBMS Architecture Distributed Database.
1 Chapter 22 Distributed DBMSs - Concepts and Design Simplified Transparencies © Pearson Education Limited 1995, 2005.
CS742 – Distributed & Parallel DBMSPage 3. 1M. Tamer Özsu Outline Introduction & architectural issues Data distribution  Distributed query processing.
Outline Background Introduction Distributed DBMS Architecture
Distributed Databases
Outline Introduction Background Distributed DBMS Architecture
Outline Introduction Background Distributed DBMS Architecture
Database Architecture
Query Optimization.
Distributed Database Management Systems
Distributed Database Management Systems
Outline Introduction Background Distributed DBMS Architecture
Presentation transcript:

Distributed DBMS© M. T. Özsu & P. Valduriez Ch.6/1 Outline Introduction Background Distributed Database Design Database Integration Semantic Data Control Distributed Query Processing ➡ Overview ➡ Query decomposition and localization ➡ Distributed query optimization Multidatabase Query Processing Distributed Transaction Management Data Replication Parallel Database Systems Distributed Object DBMS Peer-to-Peer Data Management Web Data Management Current Issues

Distributed DBMS© M. T. Özsu & P. Valduriez Ch.6/2 Query Processing in a DDBMS high level user query query processor Low-level data manipulation commands for D-DBMS

Distributed DBMS© M. T. Özsu & P. Valduriez Ch.6/3 Query Processing Components Query language that is used ➡ SQL: “intergalactic dataspeak” Query execution methodology ➡ The steps that one goes through in executing high-level (declarative) user queries. Query optimization ➡ How do we determine the “best” execution plan? We assume a homogeneous D-DBMS

Distributed DBMS© M. T. Özsu & P. Valduriez Ch.6/4 SELECTENAME FROMEMP,ASG WHEREEMP.ENO = ASG.ENO ANDRESP = "Manager" Strategy 1  ENAME (  RESP=“Manager”  EMP.ENO=ASG.ENO (EMP×ASG)) Strategy 2  ENAME (EMP ⋈ ENO (  RESP=“Manager” (ASG)) Strategy 2 avoids Cartesian product, so may be “better” Selecting Alternatives

Distributed DBMS© M. T. Özsu & P. Valduriez Ch.6/5 What is the Problem? Site 1Site 2Site 3Site 4Site 5 EMP 1 =   ENO≤“E3” (EMP)EMP 2 =   ENO>“E3” (EMP) ASG 2 =   ENO>“E3” (ASG) ASG 1 =  ENO≤“E3” (ASG) Result Site 5 Site 1Site 2Site 3Site 4 ASG 1 EMP 1 EMP 2 ASG 2 Site 4Site 3 Site 1Site 2 Site 5 EMP ’ 1 =EMP 1 ⋈ ENO ASG ’ 1 result= (EMP 1  EMP 2 ) ⋈ ENO σ RESP=“Manager” (ASG 1 ×  ASG 2 ) EMP ’ 2 =EMP 2 ⋈ ENO ASG ’ 2

Distributed DBMS© M. T. Özsu & P. Valduriez Ch.6/6 Cost of Alternatives Assume ➡ size (EMP) = 400, size (ASG) = 1000 ➡ tuple access cost = 1 unit; tuple transfer cost = 10 units Strategy 1 ➡ produce ASG': (10+10)  tuple access cost20 ➡ transfer ASG' to the sites of EMP: (10+10)  tuple transfer cost200 ➡ produce EMP': (10+10)  tuple access cost  240 ➡ transfer EMP' to result site: (10+10)  tuple transfer cost 200 Total Cost460 Strategy 2 ➡ transfer EMP to site 5: 400  tuple transfer cost4,000 ➡ transfer ASG to site 5: 1000  tuple transfer cost10,000 ➡ produce ASG': 1000  tuple access cost1,000 ➡ join EMP and ASG': 400  20  tuple access cost 8,000 Total Cost23,000

Distributed DBMS© M. T. Özsu & P. Valduriez Ch.6/7 Query Optimization Objectives Minimize a cost function I/O cost + CPU cost + communication cost These might have different weights in different distributed environments Wide area networks ➡ communication cost may dominate or vary much ✦ bandwidth ✦ speed ✦ high protocol overhead Local area networks ➡ communication cost not that dominant ➡ total cost function should be considered Can also maximize throughput

Distributed DBMS© M. T. Özsu & P. Valduriez Ch.6/8 Complexity of Relational Operations Assume ➡ relations of cardinality n ➡ sequential scan OperationComplexity Select Project (without duplicate elimination) O( n ) Project (with duplicate elimination) Group O( n  log n ) Join Semi-join Division Set Operators O( n  log n ) Cartesian ProductO( n 2 )

Distributed DBMS© M. T. Özsu & P. Valduriez Ch.6/9 Query Optimization Issues – Types Of Optimizers Exhaustive search ➡ Cost-based ➡ Optimal ➡ Combinatorial complexity in the number of relations Heuristics ➡ Not optimal ➡ Regroup common sub-expressions ➡ Perform selection, projection first ➡ Replace a join by a series of semijoins ➡ Reorder operations to reduce intermediate relation size ➡ Optimize individual operations

Distributed DBMS© M. T. Özsu & P. Valduriez Ch.6/10 Query Optimization Issues – Optimization Granularity Single query at a time ➡ Cannot use common intermediate results Multiple queries at a time ➡ Efficient if many similar queries ➡ Decision space is much larger

Distributed DBMS© M. T. Özsu & P. Valduriez Ch.6/11 Query Optimization Issues – Optimization Timing Static ➡ Compilation  optimize prior to the execution ➡ Difficult to estimate the size of the intermediate results  error propagation ➡ Can amortize over many executions ➡ R* Dynamic ➡ Run time optimization ➡ Exact information on the intermediate relation sizes ➡ Have to reoptimize for multiple executions ➡ Distributed INGRES Hybrid ➡ Compile using a static algorithm ➡ If the error in estimate sizes > threshold, reoptimize at run time ➡ Mermaid

Distributed DBMS© M. T. Özsu & P. Valduriez Ch.6/12 Query Optimization Issues – Statistics Relation ➡ Cardinality ➡ Size of a tuple ➡ Fraction of tuples participating in a join with another relation Attribute ➡ Cardinality of domain ➡ Actual number of distinct values Common assumptions ➡ Independence between different attribute values ➡ Uniform distribution of attribute values within their domain

Distributed DBMS© M. T. Özsu & P. Valduriez Ch.6/13 Query Optimization Issues – Decision Sites Centralized ➡ Single site determines the “best” schedule ➡ Simple ➡ Need knowledge about the entire distributed database Distributed ➡ Cooperation among sites to determine the schedule ➡ Need only local information ➡ Cost of cooperation Hybrid ➡ One site determines the global schedule ➡ Each site optimizes the local subqueries

Distributed DBMS© M. T. Özsu & P. Valduriez Ch.6/14 Query Optimization Issues – Network Topology Wide area networks (WAN) – point-to-point ➡ Characteristics ✦ Low bandwidth ✦ Low speed ✦ High protocol overhead ➡ Communication cost will dominate; ignore all other cost factors ➡ Global schedule to minimize communication cost ➡ Local schedules according to centralized query optimization Local area networks (LAN) ➡ Communication cost not that dominant ➡ Total cost function should be considered ➡ Broadcasting can be exploited (joins) ➡ Special algorithms exist for star networks

Distributed DBMS© M. T. Özsu & P. Valduriez Ch.6/15 Distributed Query Processing Methodology Calculus Query on Distributed Relations CONTROL SITE LOCAL SITES Query Decomposition Query Decomposition Data Localization Data Localization Algebraic Query on Distributed Relations Global Optimization Global Optimization Fragment Query Local Optimization Local Optimization Optimized Fragment Query with Communication Operations Optimized Local Queries GLOBAL SCHEMA GLOBAL SCHEMA FRAGMENT SCHEMA FRAGMENT SCHEMA STATS ON FRAGMENTS STATS ON FRAGMENTS LOCAL SCHEMAS LOCAL SCHEMAS