Outline Introduction Background Distributed DBMS Architecture

Slides:



Advertisements
Similar presentations
Choosing an Order for Joins
Advertisements

Outline  Introduction  Background  Distributed DBMS Architecture  Distributed Database Design  Semantic Data Control ➠ View Management ➠ Data Security.
CS4432: Database Systems II
CSE544 Database Statistics Tuesday, February 15 th, 2011 Dan Suciu , Winter
Distributed DBMSPage © 1998 M. Tamer Özsu & Patrick Valduriez Outline Introduction Background Distributed DBMS Architecture Distributed Database.
Distributed DBMSPage © 1998 M. Tamer Özsu & Patrick Valduriez Outline Introduction Background Distributed DBMS Architecture Distributed Database.
Distributed DBMS© M. T. Özsu & P. Valduriez Ch.6/1 Outline Introduction Background Distributed Database Design Database Integration Semantic Data Control.
Distributed DBMS © M. T. Özsu & P. Valduriez Ch.7/1 Outline Introduction Background Distributed Database Design Database Integration Semantic Data Control.
Paper by: A. Balmin, T. Eliaz, J. Hornibrook, L. Lim, G. M. Lohman, D. Simmen, M. Wang, C. Zhang Slides and Presentation By: Justin Weaver.
Session – 10 QUERY OPTIMIZATION Matakuliah: M0184 / Pengolahan Data Distribusi Tahun: 2005 Versi:
1 Distributed Databases CS347 Lecture 14 May 30, 2001.
Distributed DBMSPage 5. 1 © 1998 M. Tamer Özsu & Patrick Valduriez Outline Introduction Background Distributed DBMS Architecture  Distributed Database.
CS 347Notes 041 CS 347: Distributed Databases and Transaction Processing Notes04: Query Optimization Hector Garcia-Molina.
Institut für Scientific Computing – Universität WienP.Brezany Optimization of Distributed Queries Univ.-Prof. Dr. Peter Brezany Institut für Scientific.
1 Introduction to Load Balancing: l Definition of Distributed systems. Collection of independent loosely coupled computing resources. l Load Balancing.
Distributed DBMSPage 5. 1 © 1998 M. Tamer Özsu & Patrick Valduriez Outline Introduction Background Distributed DBMS Architecture  Distributed Database.
16.5 Introduction to Cost- based plan selection Amith KC Student Id: 109.
Chapter 12 Distributed Database Management Systems
Query Processing & Optimization
L Distributed Query Optimization Algorithms -- 1 Distributed Query Optimization Algorithms v System R and R* v Hill Climbing and SDD-1.
©Silberschatz, Korth and Sudarshan14.1Database System Concepts 3 rd Edition Chapter 14: Query Optimization Overview Catalog Information for Cost Estimation.
Query Processing Presented by Aung S. Win.
Relational Database Performance CSCI 6442 Copyright 2013, David C. Roberts, all rights reserved.
LABERIO: Dynamic load- balanced routing in OpenFlow- enabled networks Author:Hui Long, Yao Shen*, Minyi Guo, Feilong Tang IEEE 27th International.
low level data manipulation
Access Path Selection in a Relational Database Management System Selinger et al.
1 6. Distributed Query Optimization Chapter 9 Optimization of Distributed Queries.
Query Optimization. Query Optimization Query Optimization The execution cost is expressed as weighted combination of I/O, CPU and communication cost.
Overview of Query Processing
Query Optimization Chap. 19. Evaluation of SQL Conceptual order of evaluation – Cartesian product of all tables in from clause – Rows not satisfying where.
CS 338Query Evaluation7-1 Query Evaluation Lecture Topics Query interpretation Basic operations Costs of basic operations Examples Textbook Chapter 12.
Query Processor  A query processor is a module in the DBMS that performs the tasks to process, to optimize, and to generate execution strategy for a high-level.
PMIT-6102 Advanced Database Systems By- Jesmin Akhter Assistant Professor, IIT, Jahangirnagar University.
Distributed DBMS© M. T. Özsu & P. Valduriez Ch.8/1 Outline Introduction Background Distributed Database Design Database Integration Semantic Data Control.
1 Chapter 10 Joins and Subqueries. 2 Joins & Subqueries Joins – Methods to combine data from multiple tables – Optimizer information can be limited based.
Distributed DBMSPage © 1998 M. Tamer Özsu & Patrick Valduriez Outline Introduction Background Distributed DBMS Architecture Distributed Database.
Introduction to Query Optimization, R. Ramakrishnan and J. Gehrke 1 Introduction to Query Optimization Chapter 13.
Query Processing – Query Trees. Evaluation of SQL Conceptual order of evaluation – Cartesian product of all tables in from clause – Rows not satisfying.
L4: Query Optimization (1) - 1 L4: Query Processing and Optimization v 4.1 Query Processing  Query Decomposition  Data Localization v 4.1 Query Optimization.
1 Minggu 6, Pertemuan 12 Query Processing Matakuliah: T0206-Sistem Basisdata Tahun: 2005 Versi: 1.0/0.0.
Distributed DBMS© 2001 M. Tamer Özsu & Patrick Valduriez Page 1.1 Outline n Introduction Background Distributed DBMS Architecture Distributed Database.
CS742 – Distributed & Parallel DBMSPage 3. 1M. Tamer Özsu Outline Introduction & architectural issues Data distribution  Distributed query processing.
Igor EPIMAKHOV Abdelkader HAMEURLAIN Franck MORVAN
Query Optimization Heuristic Optimization
Outline Background Introduction Distributed DBMS Architecture
Introduction to Load Balancing:
UNIT 11 Query Optimization
Distributed Database Management Systems
Query Optimization Kush Kashyap B.Tech -IT.
Distributed Databases
Outline Introduction Background Distributed DBMS Architecture
Outline Introduction Background Distributed DBMS Architecture
Database Performance Tuning and Query Optimization
Introduction to Query Optimization
Chapter 15 QUERY EXECUTION.
R*: An Overview of the Architecture
Outline Introduction Background Distributed DBMS Architecture
Akshay Tomar Prateek Singh Lohchubh
Outline Introduction Background Distributed DBMS Architecture
Outline Introduction Background Distributed DBMS Architecture
Outline Introduction Background Distributed DBMS Architecture
Outline Introduction Background Distributed DBMS Architecture
Chapter 11 Database Performance Tuning and Query Optimization
Query Optimization.
Distributed Database Management Systems
Query Processing.
Course Instructor: Supriya Gupta Asstt. Prof
Distributed Database Management Systems
Outline Introduction Background Distributed DBMS Architecture
Outline Introduction Background Distributed DBMS Architecture
Presentation transcript:

Outline Introduction Background Distributed DBMS Architecture Distributed Database Design Distributed Query Processing Query Processing Methodology Distributed Query Optimization Transaction Management Building Distributed Database Systems (RAID) Mobile Database Systems Privacy, Trust, and Authentication Peer to Peer Systems

Useful References Textbook Principles of Distributed Database Systems, Chapter 6

Query Processing high level user query low level data manipulation processor low level data manipulation commands

Query Processing Components Query language that is used SQL: “intergalactic dataspeak” Query execution methodology The steps that one goes through in executing high- level (declarative) user queries. Query optimization How do we determine the “best” execution plan?

Selecting Alternatives SELECT ENAME  Project FROM EMP,ASG  Select WHERE EMP.ENO = ASG.ENO  Join AND DUR > 37 Strategy 1 ENAME(DUR>37EMP.ENO=ASG.ENO(EMP  ASG)) Strategy 2 ENAME(EMP ENO (DUR>37 (ASG))) Strategy 2 avoids Cartesian product, so is “better”

result2=(EMP1EMP2) ENODUR>37(ASG1ASG1) What is the Problem? Site 1 Site 2 Site 3 Site 4 Site 5 ASG1=ENO≤“E3”(ASG) ASG2=ENO>“E3”(ASG) EMP1=ENO≤“E3”(EMP) EMP2=ENO>“E3”(EMP) Result Site 5 Site 5 result = EMP1’EMP2’ result2=(EMP1EMP2) ENODUR>37(ASG1ASG1) EMP1’ EMP2’ ASG1 ASG2 EMP1 EMP2 Site 3 Site 4 EMP1’=EMP1 ENOASG1’ EMP2’=EMP2 ENOASG2’ Site 1 Site 2 Site 3 Site 4 ASG1’ ASG2’ Site 1 Site 2 ASG1’=DUR>37(ASG1) ASG2’=DUR>37(ASG2)

Cost of Alternatives Assume: Strategy 1 Strategy 2 size(EMP) = 400, size(ASG) = 1000 tuple access cost = 1 unit; tuple transfer cost = 10 units Strategy 1 produce ASG': (10+10)tuple access cost 20 transfer ASG' to the sites of EMP: (10+10)tuple transfer cost 200 produce EMP': (10+10) tuple access cost2 40 transfer EMP' to result site: (10+10) tuple transfer cost 200 Total cost 460 Strategy 2 transfer EMP to site 5:400tuple transfer cost 4,000 transfer ASG to site 5 :1000tuple transfer cost 10,000 produce ASG':1000tuple access cost 1,000 join EMP and ASG':40020tuple access cost 8,000 Total cost 23,000

Query Optimization Objectives Minimize a cost function I/O cost + CPU cost + communication cost These might have different weights in different distributed environments Wide area networks communication cost will dominate (80 – 200 ms) low bandwidth low speed high protocol overhead most algorithms ignore all other cost components Local area networks communication cost not that dominant (1 – 5 ms) total cost function should be considered Can also maximize throughput

Complexity of Relational Operations Select Project O(n) Assume relations of cardinality n sequential scan (without duplicate elimination) Project (with duplicate elimination) O(nlog n) Group Join Semi-join O(nlog n) Division Set Operators Cartesian Product O(n2)

Query Optimization Issues – Types of Optimizers Exhaustive search cost-based optimal combinatorial complexity in the number of relations Heuristics not optimal regroup common sub-expressions perform selection, projection first replace a join by a series of semijoins reorder operations to reduce intermediate relation size optimize individual operations

Query Optimization Issues – Optimization Granularity Single query at a time cannot use common intermediate results Multiple queries at a time efficient if many similar queries decision space is much larger

Query Optimization Issues – Optimization Timing Static compilation  optimize prior to the execution difficult to estimate the size of the intermediate results  error propagation can amortize over many executions R* Dynamic run time optimization exact information on the intermediate relation sizes have to reoptimize for multiple executions Distributed INGRES Hybrid compile using a static algorithm if the error in estimate sizes > threshold, reoptimize at run time MERMAID

Query Optimization Issues – Statistics Relation cardinality size of a tuple fraction of tuples participating in a join with another relation Attribute cardinality of domain actual number of distinct values Common assumptions independence between different attribute values uniform distribution of attribute values within their domain

Query Optimization Issues – Decision Sites Centralized single site determines the “best” schedule simple need knowledge about the entire distributed database Distributed cooperation among sites to determine the schedule need only local information cost of cooperation Hybrid one site determines the global schedule each site optimizes the local subqueries

Query Optimization Issues – Network Topology Wide area networks (WAN) – point-to-point characteristics low bandwidth low speed high protocol overhead communication cost will dominate; ignore all other cost factors global schedule to minimize communication cost local schedules according to centralized query optimization Local area networks (LAN) communication cost not that dominant total cost function should be considered broadcasting can be exploited (joins) special algorithms exist for star networks