CS 347: Distributed Databases and Transaction Processing. Notes 04: Query Optimization. Hector Garcia-Molina.

Presentation transcript:

Slide 1: CS 347: Distributed Databases and Transaction Processing, Notes 04: Query Optimization, Hector Garcia-Molina

Slide 2: Query optimization
- Cost estimation
- Strategies for exploring plans
[Figure: finding the minimum-cost plan for query Q]

Slide 3: Cost estimation
- As in a centralized system: estimate result sizes

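Slide 4: But the number of IOs may not be the best metric, e.g., transmission time may dominate.
[Figure: timeline of work at two sites (times T1, T2) and transmission of the answer; the dominating cost may be measured in time or in $]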

Slide 5: Another reason why plain IOs are not enough: parallelism

          Plan A     Plan B
site 1    50 IOs     100 IOs
site 2    70 IOs
site 3    50 IOs

Slide 6: Cost metrics
- IOs, bytes transmitted, $, ...
- Can be added together
Response time metric
- Cannot be added
- Needs scheduling and dependency information
- Skew is important
[Figure: Task 1, Task 2, Task 3]
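
The difference between an additive cost metric and response time can be checked with the per-site IO counts from the Plan A / Plan B slide. The following is a minimal sketch: the function names are hypothetical, and it assumes the sites run fully in parallel with no dependencies between them, so that the slowest site determines response time.

```python
# Per-site IO counts from the Plan A / Plan B slide.
plan_a = {"site 1": 50, "site 2": 70, "site 3": 50}
plan_b = {"site 1": 100}


def total_cost(plan):
    """Additive metric: sum the IOs performed at every site."""
    return sum(plan.values())


def response_time(plan):
    """Assumption: sites work fully in parallel with no dependencies,
    so the busiest site determines the elapsed time (in IO units)."""
    return max(plan.values())


for name, plan in (("A", plan_a), ("B", plan_b)):
    print(f"Plan {name}: total = {total_cost(plan)} IOs, "
          f"response time = {response_time(plan)} IOs")
```

Under these assumptions Plan B does less total work (100 versus 170 IOs), while Plan A finishes sooner (70 versus 100 IOs of elapsed work). With real dependencies or skew, the maximum would have to be replaced by a schedule-aware computation.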

Slide 7: Take into account (in a parallel/distributed system):
- Start-up costs (for parallel operation)
- Data distribution costs/time
- Contention (memory, disk, network, ...)
- Assembling the result

Slide 8: Example: Response time
[Figure: timelines for Site 1 through Site 4, each divided into phases: startup, distribution, searching + sending results, final processing]

Slide 9: Searching strategies
(1) Exhaustive (with pruning)
(2) Hill climbing (greedy)
(3) Query separation

Slide 10: (1) Exhaustive
- consider "all" query plans with a set of techniques
- prune some plans
- heuristics

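Slide 11: Example: join R ⋈_A S ⋈_B T with |R| > |S| > |T|
[Figure: tree of partial plans rooted at R ⋈ S ⋈ T]
First level (which pair to join first):
- R ⋈ S: pruned (2)
- R × T: pruned (1)
- S ⋈ R
- S ⋈ T: pruned (2)
- T ⋈ S
- T × R: pruned (1)
Second level: (S ⋈ R) ⋈ T and (T ⋈ S) ⋈ R
Third level: ship S to R, or semi-join; ship T to S, or semi-join
Prune 1: cross-product not necessary
Prune 2: larger relation first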
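
The two pruning rules from this example can be sketched as a small enumeration over the first join of each plan. This is only an illustration: the concrete sizes are assumptions (the slide gives only the ordering |R| > |S| > |T|), and the names `sizes`, `join_attr`, and `first_joins` are hypothetical.

```python
from itertools import permutations

# Assumed sizes; only the ordering |R| > |S| > |T| comes from the slide.
sizes = {"R": 30, "S": 20, "T": 10}

# Join predicates: R and S share attribute A, S and T share attribute B.
join_attr = {frozenset("RS"): "A", frozenset("ST"): "B"}


def first_joins():
    """Enumerate every ordered pair (left, right) and apply both rules."""
    kept, pruned = [], []
    for left, right in permutations(sizes, 2):
        if frozenset((left, right)) not in join_attr:
            # Rule 1: no shared attribute, so this would be a cross-product.
            pruned.append((left, right, "1: cross-product not necessary"))
        elif sizes[left] > sizes[right]:
            # Rule 2: the larger relation would come first.
            pruned.append((left, right, "2: larger relation first"))
        else:
            kept.append((left, right))
    return kept, pruned


kept, pruned = first_joins()
print("kept:", kept)
for left, right, reason in pruned:
    print(f"pruned {left}, {right} ({reason})")
```

Only the pairs (S, R) and (T, S) survive, which matches the two subtrees that are expanded further on the slide.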

Slide 12: In generating plans, keep the goal in mind:
- e.g., if the goal is parallelism (in a system with a fast network), consider partitioning relation(s) first
- e.g., if the goal is reducing network traffic, consider semi-joins

Slide 13: (2) Hill climbing
[Figure: plan space ranging from worse plans to better plans; x marks the initial plan; move 1 leads to a better plan]

Slide 14: (2) Hill climbing
[Figure: plan space ranging from worse plans to better plans; x marks the initial plan; moves 1 and 2 lead to successively better plans]

Slide 15: Example: R ⋈_A S ⋈_B T ⋈_C V (tuple size = 1)

Rel   Site   Size
R     1      10
S     2      20
T     3      30
V     4      40

Goal: minimize data transmission

Slide 16: Initial plan: send relations to one site
What site do we send all relations to?
To site 1: cost = 20 + 30 + 40 = 90
To site 2: cost = 10 + 30 + 40 = 80
To site 3: cost = 10 + 20 + 40 = 70
To site 4: cost = 10 + 20 + 30 = 60 (lowest)
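
The choice of the initial site can be reproduced from the relation table in the example. A minimal sketch, with hypothetical names (`relations`, `ship_all_cost`) and cost counted as the number of tuples shipped, since tuple size is 1:

```python
# (site, size) for each relation, from the example's table; tuple size = 1.
relations = {"R": (1, 10), "S": (2, 20), "T": (3, 30), "V": (4, 40)}


def ship_all_cost(target_site):
    """Cost of sending every relation not already at target_site to it."""
    return sum(size for site, size in relations.values()
               if site != target_site)


costs = {site: ship_all_cost(site) for site, _ in relations.values()}
best_site = min(costs, key=costs.get)

for site, cost in costs.items():
    print(f"To site {site}: cost = {cost}")
print("Cheapest target: site", best_site)
```

The cheapest target is the site that already holds the largest relation, because that relation never has to move.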

Slide 17: P0: R (1 → 4), S (2 → 4), T (3 → 4); compute R ⋈ S ⋈ T ⋈ V at site 4

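Slide 18: Local search
Consider sending each relation to its neighbor, e.g.:
[Figure: instead of sending R and S separately to the final site, send R to S's site, compute R ⋈ S there, and send the result on]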

Slide 19: Assume sizes: R ⋈ S = 20, S ⋈ T = 5, T ⋈ V = 1
Option (a): send R to S's site and ship R ⋈ S onward
Original cost = 30 (R: 10, S: 20); new cost = 30 (R: 10, R ⋈ S: 20). No savings.

Slide 20: Option (b): send S to R's site and ship R ⋈ S onward
Original cost = 30 (R: 10, S: 20); new cost = 40 (S: 20, R ⋈ S: 20). Worse off!

Slide 21: Option (c): send T to S's site and ship S ⋈ T onward
Original cost = 50 (S: 20, T: 30); new cost = 35 (T: 30, S ⋈ T: 5). A win!

Slide 22: Option (d): send S to T's site and ship S ⋈ T onward
Original cost = 50 (S: 20, T: 30); new cost = 25 (S: 20, S ⋈ T: 5). A bigger win!
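
The four local-search options (a) through (d) can be recomputed from the relation sizes and the assumed join sizes. The sketch below uses hypothetical helper names and models each move as "ship one relation to its neighbor, join there, then ship the join result to the final site"; it illustrates the cost arithmetic and is not a full hill-climbing optimizer.

```python
# Relation sizes and assumed join-result sizes from the example (tuple size 1).
size = {"R": 10, "S": 20, "T": 30, "V": 40}
join_size = {("R", "S"): 20, ("S", "T"): 5, ("T", "V"): 1}


def direct_cost(x, y):
    """Current plan: ship both relations separately to the final site."""
    return size[x] + size[y]


def via_neighbor_cost(moved, host):
    """Candidate move: ship `moved` to `host`'s site, join there,
    then ship the join result to the final site."""
    pair = tuple(sorted((moved, host)))
    return size[moved] + join_size[pair]


# (label, relation that moves, relation that hosts the join)
options = [("a", "R", "S"), ("b", "S", "R"), ("c", "T", "S"), ("d", "S", "T")]

savings = {}
for label, moved, host in options:
    old = direct_cost(moved, host)
    new = via_neighbor_cost(moved, host)
    savings[label] = old - new
    print(f"Option ({label}): old cost = {old}, new cost = {new}, "
          f"saving = {old - new}")

best = max(savings, key=savings.get)
print("Greedy choice: option", best)
```

A greedy step picks the option with the largest saving, which here is option (d).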

Slide 23: P1:
P1a: S (2 → 3); α = S ⋈ T
P1b: R (1 → 4), α (3 → 4); compute answer at site 4

Slide 24: Repeat local search
- Treat α = S ⋈ T as a relation
[Figure: alternative ways of shipping R and α, compared as in the first round of local search]

Slide 25: Hill climbing may miss the best plan! [optional]
Example: the best plan could be P_B:
- T (3 → 4); β = T ⋈ V
- β (4 → 2); β' = β ⋈ S
- β' (2 → 1); β'' = β' ⋈ R
- β'' (1 → 4); compute answer
[Figure: join tree for P_B, joining T with V first, then S, then R]

Slide 26: Hill climbing may miss the best plan! [optional]
Example: the best plan could be P_B:
- T (3 → 4), cost 30; β = T ⋈ V
- β (4 → 2), cost 1; β' = β ⋈ S
- β' (2 → 1), cost 1; β'' = β' ⋈ R
- β'' (1 → 4), cost 1; compute answer
Total = 33
Costs could be low because β is very selective
[Figure: join tree for P_B, joining T with V first, then S, then R, with each intermediate result of size 1]
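
To see why the greedy search can stop short of P_B, it helps to total the transmission cost of each plan discussed so far. The sketch below encodes each plan as a list of shipments. The sizes of β, β', and β'' (all 1) are read from the slide's cost annotations and should be treated as assumptions, as should the ASCII names `alpha` and `beta` used for the intermediate results.

```python
# Each shipment: (what is sent, from site, to site, size shipped).
plans = {
    "P0": [("R", 1, 4, 10), ("S", 2, 4, 20), ("T", 3, 4, 30)],
    "P1": [("S", 2, 3, 20), ("R", 1, 4, 10), ("alpha", 3, 4, 5)],
    # Intermediate sizes of 1 are taken from the slide's annotations.
    "PB": [("T", 3, 4, 30), ("beta", 4, 2, 1),
           ("beta'", 2, 1, 1), ("beta''", 1, 4, 1)],
}


def transmission_cost(shipments):
    """Total data moved: sum of the sizes of all shipments."""
    return sum(size for _, _, _, size in shipments)


totals = {name: transmission_cost(steps) for name, steps in plans.items()}
for name, total in totals.items():
    print(f"{name}: total transmission = {total}")
```

P_B (33) is slightly cheaper than P1 (35), but its first shipment already costs 30, so no single local move from P0 makes it look attractive.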

Slide 27: (3) Query separation
- separate the query into 2 or more steps
- optimize each step independently

Slide 28: Example: simple queries
e.g., σ_c1 [ (σ_c2 R) ⋈_A (σ_c3 S) ]
1. Compute R' = π_A [σ_c2 R] and S' = π_A [σ_c3 S]
2. Compute J = R' ⋈ S'

Slide 29: σ_c1 [ (σ_c2 R) ⋈_A (σ_c3 S) ]
1. Compute R' = π_A [σ_c2 R] and S' = π_A [σ_c3 S]
2. Compute J = R' ⋈ S'
3. Compute Ans = σ_c1 { [J ⋈ σ_c2 R] ⋈ [J ⋈ σ_c3 S] }

Slide 30: In other words:
(a) Compute the A values in the answer (steps 1, 2)
(b) Get tuples from the sites with matching A values and compute the answer (step 3)
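
The three steps of query separation can be traced on a toy instance. Everything concrete below is an assumption made for illustration: the tuples, the predicates c1, c2, c3, and the layout of R as (A, x) and S as (A, y). The point is only that steps 1 and 2 exchange A values alone, and step 3 fetches full tuples for the matching A values.

```python
# Toy data (assumed): R holds (A, x) at one site, S holds (A, y) at another.
R = [(1, 12), (2, 3), (3, 40), (4, 25)]
S = [(1, 50), (3, 500), (4, 7), (5, 9)]


def c2(x):            # assumed local predicate on R
    return x > 10


def c3(y):            # assumed local predicate on S
    return y < 100


def c1(x, y):         # assumed predicate on the joined tuple
    return x > y


# Step 1: local processing, projecting onto the join attribute A only.
R_prime = {a for a, x in R if c2(x)}
S_prime = {a for a, y in S if c3(y)}

# Step 2: the "simple query": single-attribute inputs and output.
J = R_prime & S_prime

# Step 3: fetch only tuples whose A value is in J, then finish the query.
R_match = [(a, x) for a, x in R if a in J and c2(x)]
S_match = [(a, y) for a, y in S if a in J and c3(y)]
answer = [(a, x, y) for a, x in R_match
          for b, y in S_match if a == b and c1(x, y)]

# Reference: evaluate the original query directly for comparison.
direct = [(a, x, y) for a, x in R for b, y in S
          if a == b and c2(x) and c3(y) and c1(x, y)]

print("J =", sorted(J))
print("answer =", answer)
```

On this instance the separated evaluation and the direct evaluation return the same tuples, while only the A values and the matching tuples would need to cross the network.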

Slide 31: Simple query
- Relations have a single attribute
- Output has a single attribute
e.g., J ← R' ⋈ S'

Slide 32: Idea
Decompose the query into:
- Local processing
- Simple query (or queries)
- Final processing
Optimize the simple query

Slide 33: Philosophy
- The hard part is the distributed join
- Do this part with only keys; get the rest of the data later
- It is simpler to optimize simple queries

Slide 34: Summary: Query Optimization
- Cost estimation
- Strategies
  - Exhaustive
  - Hill climbing
  - Separation

Slide 35: Words of wisdom
"Optimization is like chess playing", i.e., you may have to make sacrifices (move data, partition relations, build indexes) for later gains!