Distributed DBMSPage 7-9. 1© 1998 M. Tamer Özsu & Patrick Valduriez Outline Introduction Background Distributed DBMS Architecture Distributed Database.

Slides:



Advertisements
Similar presentations
Distributed DBMS©M. T. Özsu & P. Valduriez Ch.15/1 Outline Introduction Background Distributed Database Design Database Integration Semantic Data Control.
Advertisements

Choosing an Order for Joins
Distributed DBMS©M. T. Özsu & P. Valduriez Ch.14/1 Outline Introduction Background Distributed Database Design Database Integration Semantic Data Control.
Outline  Introduction  Background  Distributed DBMS Architecture  Distributed Database Design  Semantic Data Control ➠ View Management ➠ Data Security.
CS4432: Database Systems II
Distributed DBMSPage 6. 1© 1998 M. Tamer Özsu & Patrick Valduriez Outline Introduction Background Distributed DBMS Architecture Distributed Database Design.
Distributed Query Processing –An Overview
Distributed DBMSPage © 1998 M. Tamer Özsu & Patrick Valduriez Outline Introduction Background Distributed DBMS Architecture Distributed Database.
Distributed DBMS© M. T. Özsu & P. Valduriez Ch.6/1 Outline Introduction Background Distributed Database Design Database Integration Semantic Data Control.
Distributed DBMS © M. T. Özsu & P. Valduriez Ch.7/1 Outline Introduction Background Distributed Database Design Database Integration Semantic Data Control.
Distributed DBMSPage © 1998 M. Tamer Özsu & Patrick Valduriez Outline Introduction Background Distributed DBMS Architecture Distributed Database.
Paper by: A. Balmin, T. Eliaz, J. Hornibrook, L. Lim, G. M. Lohman, D. Simmen, M. Wang, C. Zhang Slides and Presentation By: Justin Weaver.
Distributed DBMSPage © 1998 M. Tamer Özsu & Patrick Valduriez Outline Introduction Background Distributed DBMS Architecture Distributed Database.
Distributed DBMSPage © 1998 M. Tamer Özsu & Patrick Valduriez Outline Introduction Background Distributed DBMS Architecture Distributed Database.
Session – 10 QUERY OPTIMIZATION Matakuliah: M0184 / Pengolahan Data Distribusi Tahun: 2005 Versi:
Distributed DBMSPage © 1998 M. Tamer Özsu & Patrick Valduriez Outline Introduction Background Distributed DBMS Architecture Distributed Database.
1 Distributed Databases CS347 Lecture 14 May 30, 2001.
Distributed DBMSPage 4. 1© 1998 M. Tamer Özsu & Patrick Valduriez Outline Introduction Background  Distributed DBMS Architecture  Datalogical Architecture.
Distributed DBMSPage 5. 1 © 1998 M. Tamer Özsu & Patrick Valduriez Outline Introduction Background Distributed DBMS Architecture  Distributed Database.
Distributed DBMSPage © 1998 M. Tamer Özsu & Patrick Valduriez Outline Introduction Background Distributed DBMS Architecture Distributed Database.
Distributed DBMSPage © 1998 M. Tamer Özsu & Patrick Valduriez Outline Introduction Background Distributed DBMS Architecture Distributed Database.
Institut für Scientific Computing – Universität WienP.Brezany Optimization of Distributed Queries Univ.-Prof. Dr. Peter Brezany Institut für Scientific.
1 Introduction to Load Balancing: l Definition of Distributed systems. Collection of independent loosely coupled computing resources. l Load Balancing.
Distributed DBMSPage 5. 1 © 1998 M. Tamer Özsu & Patrick Valduriez Outline Introduction Background Distributed DBMS Architecture  Distributed Database.
Query Optimization. General Overview Relational model - SQL  Formal & commercial query languages Functional Dependencies Normalization Physical Design.
16.5 Introduction to Cost- based plan selection Amith KC Student Id: 109.
Chapter 12 Distributed Database Management Systems
Distributed DBMS© 2001 M. Tamer Özsu & Patrick Valduriez Page 1.1 Outline  Introduction à What is a distributed DBMS à Problems à Current state-of-affairs.
Query Processing & Optimization
L Distributed Query Optimization Algorithms -- 1 Distributed Query Optimization Algorithms v System R and R* v Hill Climbing and SDD-1.
Distributed DBMSPage © 1998 M. Tamer Özsu & Patrick Valduriez Outline Introduction Background Distributed DBMS Architecture Distributed Database.
Distributed DBMSPage © 1998 M. Tamer Özsu & Patrick Valduriez Outline Introduction Background Distributed DBMS Architecture Distributed Database.
Distributed Databases
Distributed DBMS © M. T. Özsu & P. Valduriez Ch.7/1 Outline Introduction Background Distributed Database Design Database Integration Semantic Data Control.
Distributed DBMSPage © 1998 M. Tamer Özsu & Patrick Valduriez Outline Introduction Background Distributed DBMS Architecture Distributed Database.
Query Processing Presented by Aung S. Win.
Relational Database Performance CSCI 6442 Copyright 2013, David C. Roberts, all rights reserved.
low level data manipulation
Access Path Selection in a Relational Database Management System Selinger et al.
Session-9 Data Management for Decision Support
Distributed DBMS © M. T. Özsu & P. Valduriez Ch.7/1 Οι διαφάνειες καλύπτουν μέρος των Κεφαλαίων 7&8: Distributed Database QueryProcessing and Optimization.
Query Optimization. Query Optimization Query Optimization The execution cost is expressed as weighted combination of I/O, CPU and communication cost.
Overview of Query Processing
Distributed DBMS© 2001 M. Tamer Özsu & Patrick Valduriez Page 0.1 Outline Introduction Background Distributed DBMS Architecture Distributed Database Design.
Query Optimization Chap. 19. Evaluation of SQL Conceptual order of evaluation – Cartesian product of all tables in from clause – Rows not satisfying where.
Distributed DBMSPage © 1998 M. Tamer Özsu & Patrick Valduriez Outline Introduction Background Distributed DBMS Architecture Distributed Database.
PMIT-6102 Advanced Database Systems By- Jesmin Akhter Assistant Professor, IIT, Jahangirnagar University.
Distributed DBMS© M. T. Özsu & P. Valduriez Ch.8/1 Outline Introduction Background Distributed Database Design Database Integration Semantic Data Control.
DDBMS Distributed Database Management Systems Fragmentation
Distributed DBMSs- Concept and Design Jing Luo CS 157B Dr. Lee Fall, 2003.
Distributed DBMSPage © 1998 M. Tamer Özsu & Patrick Valduriez Outline Introduction Background Distributed DBMS Architecture Distributed Database.
Query Processing – Query Trees. Evaluation of SQL Conceptual order of evaluation – Cartesian product of all tables in from clause – Rows not satisfying.
Distributed DBMSPage © 1998 M. Tamer Özsu & Patrick Valduriez Outline Introduction Background Distributed DBMS Architecture Distributed Database.
1 Minggu 6, Pertemuan 12 Query Processing Matakuliah: T0206-Sistem Basisdata Tahun: 2005 Versi: 1.0/0.0.
Distributed DBMS© 2001 M. Tamer Özsu & Patrick Valduriez Page 1.1 Outline n Introduction Background Distributed DBMS Architecture Distributed Database.
CS742 – Distributed & Parallel DBMSPage 3. 1M. Tamer Özsu Outline Introduction & architectural issues Data distribution  Distributed query processing.
Outline Background Introduction Distributed DBMS Architecture
Outline Introduction Background Distributed DBMS Architecture
R*: An Overview of the Architecture
Outline Introduction Background Distributed DBMS Architecture
Outline Introduction Background Distributed DBMS Architecture
Outline Introduction Background Distributed DBMS Architecture
Outline Introduction Background Distributed DBMS Architecture
Database Architecture
Query Optimization.
Distributed Database Management Systems
Course Instructor: Supriya Gupta Asstt. Prof
Distributed Database Management Systems
Outline Introduction Background Distributed DBMS Architecture
Outline Introduction Background Distributed DBMS Architecture
Presentation transcript:

Distributed DBMSPage © 1998 M. Tamer Özsu & Patrick Valduriez Outline Introduction Background Distributed DBMS Architecture Distributed Database Design  Distributed Query Processing à Query Processing Methodology à Distributed Query Optimization Distributed Transaction Management (Extensive) n Building Distributed Database Systems (RAID) Mobile Database Systems Privacy, Trust, and Authentication Peer to Peer Systems

Distributed DBMSPage © 1998 M. Tamer Özsu & Patrick Valduriez Query Processing high level user query query processor low level data manipulation commands

Distributed DBMSPage © 1998 M. Tamer Özsu & Patrick Valduriez Query Processing Components Query language that is used à SQL: “intergalactic dataspeak” Query execution methodology à The steps that one goes through in executing high- level (declarative) user queries. Query optimization à How do we determine the “best” execution plan?

Distributed DBMSPage © 1998 M. Tamer Özsu & Patrick Valduriez SELECTENAME  Project FROMEMP,ASG  Select WHEREEMP.ENO = ASG.ENO  Join ANDDUR > 37 Strategy 1  ENAME (  DUR>37  EMP.ENO=ASG.ENO  (EMP  ASG)) Strategy 2  ENAME (EMP ENO (  DUR>37 (ASG))) Strategy 2 avoids Cartesian product, so is “better” Selecting Alternatives

Distributed DBMSPage © 1998 M. Tamer Özsu & Patrick Valduriez What is the Problem? Site 1Site 2Site 3Site 4Site 5 EMP 1 =  ENO≤“E3” (EMP)EMP 2 =  ENO>“E3” (EMP) ASG 2 =  ENO>“E3” (ASG) ASG 1 =  ENO≤“E3” (ASG) Result Site 5 Site 1Site 2Site 3Site 4 ASG 1 EMP 1 EMP 2 ASG 2 result 2 =(EMP 1   EMP 2 ) ENO  DUR>37 (ASG 1  ASG 1 ) Site 4 result = EMP 1 ’  EMP 2 ’ Site 3 Site 1Site 2 EMP 2 ’ =EMP 2 ENO ASG 2 ’ EMP 1 ’ =EMP 1 ENO ASG 1 ’ ASG 1 ’ =  DUR>37 (ASG 1 )ASG 2 ’ =  DUR>37 (ASG 2 ) Site 5 ASG 2 ’ ASG 1 ’ EMP 1 ’ EMP 2 ’

Distributed DBMSPage © 1998 M. Tamer Özsu & Patrick Valduriez Assume: à size (EMP) = 400, size (ASG) = 1000 à tuple access cost = 1 unit; tuple transfer cost = 10 units Strategy 1  produce ASG': (10+10)  tuple access cost 20  transfer ASG' to the sites of EMP: (10+10)  tuple transfer cost 200  produce EMP': (10+10)  tuple access cost  2 40  transfer EMP' to result site: (10+10)  tuple transfer cost 200 Total cost 460 Strategy 2  transfer EMP to site 5:400  tuple transfer cost 4,000  transfer ASG to site 5 :1000  tuple transfer cost 10,000  produce ASG':1000  tuple access cost 1,000  join EMP and ASG':400  20  tuple access cost 8,000 Total cost23,000 Cost of Alternatives

Distributed DBMSPage © 1998 M. Tamer Özsu & Patrick Valduriez Minimize a cost function I/O cost + CPU cost + communication cost These might have different weights in different distributed environments Wide area networks à communication cost will dominate (80 – 200 ms)  low bandwidth  low speed  high protocol overhead à most algorithms ignore all other cost components Local area networks à communication cost not that dominant (1 – 5 ms) à total cost function should be considered Can also maximize throughput Query Optimization Objectives

Distributed DBMSPage © 1998 M. Tamer Özsu & Patrick Valduriez Assume à relations of cardinality n à sequential scan Complexity of Relational Operations OperationComplexity Select Project (without duplicate elimination) O( n ) Project (with duplicate elimination) Group O( n log n ) Join Semi-join Division Set Operators O( n log n ) Cartesian ProductO( n 2 )

Distributed DBMSPage © 1998 M. Tamer Özsu & Patrick Valduriez Query Optimization Issues – Types of Optimizers Exhaustive search à cost-based à optimal à combinatorial complexity in the number of relations Heuristics à not optimal à regroup common sub-expressions à perform selection, projection first à replace a join by a series of semijoins à reorder operations to reduce intermediate relation size à optimize individual operations

Distributed DBMSPage © 1998 M. Tamer Özsu & Patrick Valduriez Query Optimization Issues – Optimization Granularity Single query at a time à cannot use common intermediate results Multiple queries at a time à efficient if many similar queries à decision space is much larger

Distributed DBMSPage © 1998 M. Tamer Özsu & Patrick Valduriez Query Optimization Issues – Optimization Timing Static  compilation  optimize prior to the execution  difficult to estimate the size of the intermediate results  error propagation à can amortize over many executions à R* Dynamic à run time optimization à exact information on the intermediate relation sizes à have to reoptimize for multiple executions à Distributed INGRES Hybrid à compile using a static algorithm à if the error in estimate sizes > threshold, reoptimize at run time à MERMAID

Distributed DBMSPage © 1998 M. Tamer Özsu & Patrick Valduriez Query Optimization Issues – Statistics Relation à cardinality à size of a tuple à fraction of tuples participating in a join with another relation Attribute à cardinality of domain à actual number of distinct values Common assumptions à independence between different attribute values à uniform distribution of attribute values within their domain

Distributed DBMSPage © 1998 M. Tamer Özsu & Patrick Valduriez Query Optimization Issues – Decision Sites Centralized à single site determines the “best” schedule à simple à need knowledge about the entire distributed database Distributed à cooperation among sites to determine the schedule à need only local information à cost of cooperation Hybrid à one site determines the global schedule à each site optimizes the local subqueries

Distributed DBMSPage © 1998 M. Tamer Özsu & Patrick Valduriez Query Optimization Issues – Network Topology Wide area networks (WAN) – point-to-point à characteristics  low bandwidth  low speed  high protocol overhead à communication cost will dominate; ignore all other cost factors à global schedule to minimize communication cost à local schedules according to centralized query optimization Local area networks (LAN) à communication cost not that dominant à total cost function should be considered à broadcasting can be exploited (joins) à special algorithms exist for star networks