CS 286, UC Berkeley, Spring 2007, R. Ramakrishnan 1 Evaluation of Recursive Queries Part 2: Pushing Selections.

Slides:



Advertisements
Similar presentations
Chapter 5: Other Relational Languages
Advertisements

SQL: The Query Language Part 2
Copyright © 2011 Pearson Education, Inc. Publishing as Pearson Addison-Wesley Chapter 5 More SQL: Complex Queries, Triggers, Views, and Schema Modification.
Relational Calculus and Datalog
TURKISH STATISTICAL INSTITUTE 1 /34 SQL FUNDEMANTALS (Muscat, Oman)
1 Copyright © 2011, Oracle and/or its affiliates. All rights reserved.
CPSC 504: Data Management Discussion on Chandra&Merlin 1977 Laks V.S. Lakshmanan Dept. of CS UBC.
CMPT 354 Views and Indexes Spring 2012 Instructor: Hassan Khosravi.
Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke1 SQL: Queries, Programming, Triggers Chapter 5 Modified by Donghui Zhang.
Query Optimization CS634 Lecture 12, Mar 12, 2014 Slides based on “Database Management Systems” 3 rd ed, Ramakrishnan and Gehrke.
Ver 1,12/09/2012Kode :CCs 111,Sistem basis DataFASILKOM Chapter 5: Other Relational Languages Database System Concepts, 5th Ed. ©Silberschatz, Korth and.
SECTION 21.5 Eilbroun Benjamin CS 257 – Dr. TY Lin INFORMATION INTEGRATION.
1 Recursive SQL, Deductive Databases, Query Evaluation Book Chapter of Ramankrishnan and Gehrke DBMS Systems, 3 rd ed.
1 Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke Deductive Databases Chapter 25.
Web Science & Technologies University of Koblenz ▪ Landau, Germany Implementation Techniques Acknowledgements to Angele, Gehrke, STI.
Database Management Systems 3ed, R. Ramakrishnan and Johannes Gehrke1 Evaluation of Relational Operations: Other Techniques Chapter 14, Part B.
Database Management Systems, R. Ramakrishnan and J. Gehrke1 SQL: Queries, Programming, Triggers Chapter 5.
Copyright © 2011 Pearson Education, Inc. Publishing as Pearson Addison-Wesley Chapter 5 More SQL: Complex Queries, Triggers, Views, and Schema Modification.
Incremental Maintenance for Non-Distributive Aggregate Functions work done at IBM Almaden Research Center Themis Palpanas (U of Toronto) Richard Sidle.
Berendt: Advanced databases, winter term 2007/08, 1 Advanced databases – Inferring implicit/new.
Midterm Review Spring Overview Sorting Hashing Selections Joins.
DB performance tuning using indexes Section 8.5 and Chapters 20 (Raghu)
SECTIONS 21.4 – 21.5 Sanuja Dabade & Eilbroun Benjamin CS 257 – Dr. TY Lin INFORMATION INTEGRATION.
CS263 Lecture 19 Query Optimisation.  Motivation for Query Optimisation  Phases of Query Processing  Query Trees  RA Transformation Rules  Heuristic.
CS 286, UC Berkeley, Spring 2007, R. Ramakrishnan 1 Evaluation of Recursive Queries Part 1: Efficient fixpoint evaluation “Seminaïve Evaluation”
Chapter 5 Other Relational Languages By Cui, Can B.
Database Systems More SQL Database Design -- More SQL1.
1 Relational Algebra and Calculus Yanlei Diao UMass Amherst Feb 1, 2007 Slides Courtesy of R. Ramakrishnan and J. Gehrke.
Cs5611 Recursive SQL, Deductive Databases, Query Evaluation Slides based on book chapter, By Ramankrishnan and Gehrke DBMS Systems, 3 rd ed.
Deductive Databases Chapter 25
Database System Concepts, 6 th Ed. ©Silberschatz, Korth and Sudarshan See for conditions on re-usewww.db-book.com Chapter 3: Introduction.
©Silberschatz, Korth and Sudarshan5.1Database System Concepts Chapter 5: Other Relational Languages Query-by-Example (QBE) Datalog.
Database Management Systems, 2 nd Edition. R. Ramakrishnan and J. Gehrke1 Data Warehousing and Decision Support Chapter 25, Part B.
CSE314 Database Systems More SQL: Complex Queries, Triggers, Views, and Schema Modification Doç. Dr. Mehmet Göktürk src: Elmasri & Navanthe 6E Pearson.
Access Path Selection in a Relational Database Management System Selinger et al.
1 ICS 184: Introduction to Data Management Lecture Note 10 SQL as a Query Language (Cont.)
Query Optimization Arash Izadpanah. Introduction: What is Query Optimization? Query optimization is the process of selecting the most efficient query-evaluation.
SQL: Data Manipulation Presented by Mary Choi For CS157B Dr. Sin Min Lee.
1 Single Table Queries. 2 Objectives  SELECT, WHERE  AND / OR / NOT conditions  Computed columns  LIKE, IN, BETWEEN operators  ORDER BY, GROUP BY,
Datalog Inspired by the impedance mismatch in relational databases. Main expressive advantage: recursive queries. More convenient for analysis: papers.
NULLs & Outer Joins Objectives of the Lecture : To consider the use of NULLs in SQL. To consider Outer Join Operations, and their implementation in SQL.
CS240A: Databases and Knowledge Bases Recursive Queries in SQL 1999 Carlo Zaniolo Department of Computer Science University of California, Los Angeles.
1 Relational Algebra and Calculas Chapter 4, Part A.
IS 230Lecture 6Slide 1 Lecture 7 Advanced SQL Introduction to Database Systems IS 230 This is the instructor’s notes and student has to read the textbook.
SQL SeQueL -Structured Query Language SQL SQL better support for Algebraic operations SQL Post-Relational row and column types,
Introduction to SQL ; Christoph F. Eick & R. Ramakrishnan and J. Gehrke 1 Using SQL as a Query Language COSC 6340.
1 Theory, Practice & Methodology of Relational Database Design and Programming Copyright © Ellis Cohen Subqueries These slides are licensed under.
SqlExam1Review.ppt EXAM - 1. SQL stands for -- Structured Query Language Putting a manual database on a computer ensures? Data is more current Data is.
Lu Chaojun, SJTU 1 Extended Relational Algebra. Bag Semantics A relation (in SQL, at least) is really a bag (or multiset). –It may contain the same tuple.
1 Theory, Practice & Methodology of Relational Database Design and Programming Copyright © Ellis Cohen Collection Operators These slides are.
Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke1 Database Management Systems Chapter 5 SQL.
1 SQL: Structured Query Language (‘Sequel’) Chapter 5.
1 Theory, Practice & Methodology of Relational Database Design and Programming Copyright © Ellis Cohen Collection Operators These slides are.
Relational Operator Evaluation. overview Projection Two steps –Remove unwanted attributes –Eliminate any duplicate tuples The expensive part is removing.
Agenda for Class - 03/04/2014 Answer questions about HW#5 and HW#6 Review query syntax. Discuss group functions and summary output with the GROUP BY statement.
CS240A: Databases and Knowledge Bases Recursive Queries in SQL 2003 Carlo Zaniolo Department of Computer Science University of California, Los Angeles.
1 SQL: The Query Language. 2 Example Instances R1 S1 S2 v We will use these instances of the Sailors and Reserves relations in our examples. v If the.
SQL and Query Execution for Aggregation. Example Instances Reserves Sailors Boats.
SQL: Interactive Queries (2) Prof. Weining Zhang Cs.utsa.edu.
1 CS122A: Introduction to Data Management Lecture 9 SQL II: Nested Queries, Aggregation, Grouping Instructor: Chen Li.
More SQL: Complex Queries, Triggers, Views, and Schema Modification
More SQL: Complex Queries,
Basic SQL Lecture 6 Fall
Writing Correlated Subqueries
Evaluation of Relational Operations: Other Operations
CS222: Principles of Data Management Notes #13 Set operations, Aggregation Instructor: Chen Li.
SQL: Structured Query Language
CS222P: Principles of Data Management Notes #13 Set operations, Aggregation, Query Plans Instructor: Chen Li.
Lecture 5- Query Optimization (continued)
Datalog Inspired by the impedance mismatch in relational databases.
Presentation transcript:

CS 286, UC Berkeley, Spring 2007, R. Ramakrishnan 1 Evaluation of Recursive Queries Part 2: Pushing Selections

CS 286, UC Berkeley, Spring 2007, R. Ramakrishnan 2 Aggregate Operators  The notation in the head indicates grouping; the remaining arguments (Part, in this example) are the GROUP BY fields.  In order to apply such a rule, must have all of Assembly relation available.  Stratification with respect to use of is the usual restriction to deal with this problem; similar to negation. NumParts(Part, SUM ( )) :- Assembly(Part, Subpt, Qty). SELECT A.Part, SUM (A.Qty) FROM Assembly A GROUP BY A.Part

CS 286, UC Berkeley, Spring 2007, R. Ramakrishnan 3 Datalog vs. SQL Notation  Don’t let the rule syntax of Datalog fool you: a collection of Datalog rules can be rewritten in SQL syntax, if recursion is allowed. WITH RECURSIVE Comp(Part, Subpt) AS ( SELECT A1.Part, A1.Subpt FROM Assembly A1) UNION ( SELECT A2.Part, C1.Subpt FROM Assembly A2, Comp C1 WHERE A2.Subpt=C1.Part) SELECT * FROM Comp C2

CS 286, UC Berkeley, Spring 2007, R. Ramakrishnan 4 Evaluation of Datalog Programs  Repeated inferences: When recursive rules are repeatedly applied in the naïve way, we make the same inferences in several iterations.  Unnecessary inferences: Also, if we just want to find the components of a particular part, say wheel, computing the fixpoint of the Comp program and then selecting tuples with wheel in the first column is wasteful, in that we compute many irrelevant facts.

CS 286, UC Berkeley, Spring 2007, R. Ramakrishnan 5 Avoiding Repeated Inferences  Seminaive Fixpoint Evaluation: Avoid repeated inferences by ensuring that when a rule is applied, at least one of the body facts was generated in the most recent iteration. (Which means this inference could not have been carried out in earlier iterations.) For each recursive table P, use a table delta_P to store the P tuples generated in the previous iteration. Rewrite the program to use the delta tables, and update the delta tables between iterations. Comp(Part, Subpt) :- Assembly(Part, Part2, Qty), delta_Comp(Part2, Subpt).

CS 286, UC Berkeley, Spring 2007, R. Ramakrishnan 6 Avoiding Unnecessary Inferences  There is a tuple (S1,S2) in SameLev if there is a path up from S1 to some node and down to S2 with the same number of up and down edges. SameLev(S1,S2) :- Assembly(P1,S1,Q1), Assembly(P1,S2,Q2). SameLev(S1,S2) :- Assembly(P1,S1,Q1), SameLev(P1,P2), Assembly(P2,S2,Q2). trike wheel frame spoke tire seat pedal rim tube

CS 286, UC Berkeley, Spring 2007, R. Ramakrishnan 7 Avoiding Unnecessary Inferences  Suppose that we want to find all SameLev tuples with spoke in the first column. We should “push” this selection into the fixpoint computation to avoid unnecessary inferences.  But we can’t just compute SameLev tuples with spoke in the first column, because some other SameLev tuples are needed to compute all such tuples: SameLev(spoke,seat) :- Assembly(wheel,spoke,2), SameLev(wheel,frame), Assembly(frame,seat,1).

CS 286, UC Berkeley, Spring 2007, R. Ramakrishnan 8 “Magic Sets” Idea  Idea: Define a “filter” table that computes all relevant values, and restrict the computation of SameLev to infer only tuples with a relevant value in the first column. Magic_SL(P1) :- Magic_SL(S1), Assembly(P1,S1,Q1). Magic_SL(spoke). SameLev(S1,S2) :- Magic_SL(S1), Assembly(P1,S1,Q1), Assembly(P1,S2,Q2). SameLev(S1,S2) :- Magic_SL(S1), Assembly(P1,S1,Q1), SameLev(P1,P2), Assembly(P2,S2,Q2).

CS 286, UC Berkeley, Spring 2007, R. Ramakrishnan 9 The Magic Sets Algorithm  Generate an “adorned” program Program is rewritten to make the pattern of bound and free arguments in the query explicit; evaluating SameLevel with the first argument bound to a constant is quite different from evaluating it with the second argument bound This step was omitted for simplicity in previous slide  Add filters of the form “Magic_P” to each rule in the adorned program that defines a predicate P to restrict these rules  Define new rules to define the filter tables of the form Magic_P

CS 286, UC Berkeley, Spring 2007, R. Ramakrishnan 10 Generating Adorned Rules  The adorned program for the query pattern SameLev bf, assuming a left-to-right order of rule evaluation : SameLev bf (S1,S2) :- Assembly(P1,S1,Q1), Assembly(P1,S2,Q2). SameLev bf (S1,S2) :- Assembly(P1,S1,Q1), SameLev bf (P1,P2), Assembly(P2,S2,Q2).  An argument of (a given body occurrence of) SameLev is b if it appears to the left in the body, or in a b arg of the head of the rule.  Assembly is not adorned because it is an explicitly stored table.

CS 286, UC Berkeley, Spring 2007, R. Ramakrishnan 11 Defining Magic Tables  After modifying each rule in the adorned program by adding filter “Magic” predicates, a rule for Magic_P is generated from each occurrence O of P in the body of such a rule: Delete everything to the right of O Add the prefix “Magic” and delete the free columns of O Move O, with these changes, into the head of the rule SameLev bf (S1,S2) :- Magic_SL bf (S1), Assembly(P1,S1,Q1), SameLev bf (P1,P2), Assembly(P2,S2,Q2). Magic_SL bf (P1) :- Magic_SL bf (S1), Assembly(P1,S1,Q1).

CS 286, UC Berkeley, Spring 2007, R. Ramakrishnan 12 Nested Queries in SQL (No Recursion) SELECTE, Sal, Avg, Ecnt FROMemp(E, Sal, D, J), dinfo(D, Avg, Ecnt) WHEREJ = “Sr pgmer” dinfo(D, A, C) AS SELECTD, AVG(Sal), count(*) FROMemp GROUPBYD “Find senior programmers and their salary, and also average salary and headcount in their depts.”

CS 286, UC Berkeley, Spring 2007, R. Ramakrishnan 13 Example – Datalog and Magic  Datalog Einfo(E, Sal, Avg, Ecnt) :- J=“Sr pgmer”, emp(E, Sal, D, J), dinfo(D, Avg, Ecnt). dinfo(D, A, C) :- …  MAGIC m_emp fffb (J) :- J=“Sr pgmer”. m_dinfo bff (D) :- { J = “Sr pgmer” }, m_emp fffb (J), emp(E, Sal, D, J). dinfo bff (D, A, C) :- m_dinfo bff (D), …

CS 286, UC Berkeley, Spring 2007, R. Ramakrishnan 14 Magic 1.Identifies subqueries Idea: Use rules of the form: If is a (sub)query and also conditions hold, Then are also subqueries. 2.Restricts computation Idea: Modify the view definition by joining with the table of queries. This join acts as a “Filter”. 3. Classify queries Idea: Using “Adornments”, or “Query forms”. All queries of the form p bf (c,y)? are “MAGIC” tuples m_p bf (c). 4. Magic on subqueries :- …, p(x, y, z), q 1 (y, u), q 2 (z, y) … Can use magic for q 1 subqueries: m_q 1 (y) :- … p(x, y, z). q 1 (x, y) :- m_q 1 (x), … q 2 subqueries handled some other way. So, suitable for rule-based optimizer.

CS 286, UC Berkeley, Spring 2007, R. Ramakrishnan 15 Dealing with Subqueries – Other ways  CORRELATION When a subquery is generated, compute all answers, then continue. Not (gasp!) set-oriented. Current DB solution. (DB2 etc.)  PROLOG When a subquery is generated, compute one answer, then continue. May have to “BACKTRACK”.

CS 286, UC Berkeley, Spring 2007, R. Ramakrishnan 16 Example – Recursion, Duplicates SELECTP, S, count(*) FROMcontains(P, S) GROUPBY[P, S] Contains(p, s) AS (SELECTP, S FROMsubpart(P, S) ) UNION (SELECTP, S FROMsubpart(P, T), contains(T, S) ) “Find all subparts of a part along with a count of how often the subpart is used.”

CS 286, UC Berkeley, Spring 2007, R. Ramakrishnan 17 Correlation SELECTEname FROMemp e1 WHEREJob = “Sr pgmer” AND Sal > (SELECTAVG(e2,Sal) FROMemp e2 WHEREe2.D = e1.D ) OUTER INNER For each senior programmer, the average salary of her/his department is computed. Not set-oriented. Possible redundancy.

CS 286, UC Berkeley, Spring 2007, R. Ramakrishnan 18 Decorrelation SELECTEname FROMemp, dep_avgsal WHEREJob = “Sr pgmer” AND Sal > AsalAND emp.D = dep_avgsal.D dep_avgsal AS SELECTD, AVG(Sal) FROMemp GROUPBYD Set-oriented, no redundancy. But…, irrelevant computation.

CS 286, UC Berkeley, Spring 2007, R. Ramakrishnan 19 Voila! Magic! msg(D) AS SELECTDISTINCTD FROMemp WHEREJob = “Sr pgmer” dep_avgsal(D, ASal)AS SELECT D, AVG(Sal) FROMmsg, emp WHEREmsg.D = emp.D GROUPYD SELECTEname FROMemp, dep_avgsal WHEREJob = “Sr pgmer” AND Sal > Asal AND emp.D = dep_avgsal.D

CS 286, UC Berkeley, Spring 2007, R. Ramakrishnan 20 From Datalog to SQL Conditions X + Y >10 Grouping and aggregation Multisets If you don’t remove duplicates. That is a feature! Johntoy20 Joetoy30 Susancs50 Davidcs* avgsal = 25

CS 286, UC Berkeley, Spring 2007, R. Ramakrishnan 21 Example- Conditions SELECTEname, Mgr FROMemp, dept WHEREJob = “Sr pgmer”AND Sal > 50000AND emp.D = dept.D Cast Magic m_emp fbfb (Sal, Job)AS Job = “Sr Pgmer”AND Sal > What really happens? σ job=“Sr Pgmer” emp Sal>50000 ⋈ D=D dept “Grounding”

CS 286, UC Berkeley, Spring 2007, R. Ramakrishnan 22 Datalog to SQL: A summary Conditions Magic transformation is followed by some “GROUNDING” steps. Multisets (Duplicates) Semantics # copies of a tuple = # of derivations Operationally Just skip duplicate checks Magic All “magic” tables are DISTINCT Groupby, Aggregates Must check if restrictions (selections, conditions) can be “pushed down” With recursion, may need stratification.

CS 286, UC Berkeley, Spring 2007, R. Ramakrishnan 23 Comparing Magic and Correlation We must consider three factors: 1.Binding propagation 2.Repeated work (duplicates) 3.Set-Orientation Correlation*Magic √√ √ √ × (√) ×

CS 286, UC Berkeley, Spring 2007, R. Ramakrishnan 24 Experiments Experiments run on DB V2R2 DBMS. Benchmark DB itm34170,00041,850 wkc Itl782,550, ,980 itp43339,440144,250 Table Tuple Size#Tuples #Column #4k Pages

CS 286, UC Berkeley, Spring 2007, R. Ramakrishnan 25 Results Experiment 1 Binding propagation, no duplicates, set-orientation not significant. Query TimeI/O Original100 Correlated Magic Experiment 2 Binding set contains duplicates (~100), set- orientation not significant. Query TimeI/O Original100 Correlated Magic

CS 286, UC Berkeley, Spring 2007, R. Ramakrishnan 26 Results – Cont. Experiment 3 Binding set has some duplicates, set-orientation is significant. (Bindings on non- index column) Query TimeI/O Original100 Correlated Magic5546 Experiment 4 Variant of experiments 3 with more expensive subquery. (10 binding). Query TimeI/O Original100 Correlated Magic Original100 Correlated Magic bindings 100 bindings

CS 286, UC Berkeley, Spring 2007, R. Ramakrishnan 27 Conclusions  Magic is: Applicable to full SQL. Suitable for rule-based optimization. Efficient. Stable. Parallelizable.