Advanced Topics in Database Systems: Cost-Based Optimization of Decision Support Queries using Transient-Views. Subbu N. Subramanian, IBM Santa Teresa Labs.


Advanced Topics in Database Systems
Cost-Based Optimization of Decision Support Queries using Transient-Views
Subbu N. Subramanian, IBM Santa Teresa Labs, San Jose, CA
Shivakumar Venkataraman, IBM Santa Teresa Labs, San Jose, CA 95141

Introduction
Decision support applications and their queries typically:
1. Handle huge amounts of data.
2. Integrate data from multiple heterogeneous data sources.
3. Access data sources spread over a wide geographical area.
4. Contain redundancies, such as repeated access to the same data source and multiple executions of similar processing sequences.
Purpose of this paper:
1. Propose an architecture for processing complex decision support queries involving multiple, heterogeneous data sources.
2. Introduce the notion of transient-views.
3. Develop a cost-based algorithm that takes a query plan as input and generates an optimal "covering plan" by minimizing redundancies in the original plan.
4. Validate our approach with a detailed performance study based on TPC-D benchmark queries on a commercial database system.
5. Compare and contrast our approach with work in related areas, in particular answering queries using views and optimization using common sub-expressions.

Motivating Example
List technology stocks owned by computer engineers (portfolio data, local RDBMS) that have a higher average sales volume over the past year than the maximum sales volume any oil stock (stock exchange data, remote DBMS) owned by a chemical engineer reached in the first six months of the year; also list the name of the computer engineer.
DB2: Rinvest(name, prof, ticker, qty, date, price)
Web: Wstock(ticker, cat, date, vol, endprice)
    SELECT Rinvest.name, Rinvest.ticker
    FROM Rinvest, Wstock
    WHERE Wstock.cat = 'Tech'
      AND Rinvest.prof = 'Computer Engineer'
      AND Wstock.date <= '12/31/97'
      AND Wstock.date >= '01/01/97'
      AND Rinvest.ticker = Wstock.ticker
    GROUP BY Rinvest.name, Rinvest.ticker, Wstock.ticker
    HAVING AVG(Wstock.vol) > (
        SELECT MAX(Wstock.vol)
        FROM Wstock, Rinvest
        WHERE Wstock.cat = 'Oil'
          AND Rinvest.prof = 'Chemical Engineer'
          AND Rinvest.ticker = Wstock.ticker
          AND Wstock.date <= '06/30/97'
          AND Wstock.date >= '01/01/97'
    )
Typically, such complex queries contain sub-queries that share computational steps. Analysis reveals that there is redundancy in computation even in simple queries. Currently available database query optimizers cannot identify these redundancies, so the results of one execution are not reused when processing other segments of the query. Since decision support queries are typically time consuming to run, especially in a heterogeneous database system (HDBS) setting, identifying and sharing computational results judiciously could lead to significant improvements in performance.

Transient Views
- Materialized views that exist only in the context of the execution of a query.
- An encapsulation of similar execution steps combined into one.
- Identified by analyzing the query plan generated by an optimizer for 'equivalent' subparts.
- Operators used: SCAN, JOIN, GROUP BY, UNION, RQUERY (remote query).
Related work?

Architecture
Parser: The query from the user is first parsed, semantically analyzed, and translated into an internal representation referred to as the Query Graph Model (QGM).
Query Rewrite: The output of the parser is transformed by a set of rewrite rules [PHH92]. These rewrite rules are usually heuristics that transform the query representation into a better form, so that the query optimizer can search a larger space for an optimal plan.
Query Optimizer: The optimizer takes the rewritten query as input and generates query plans based on the capabilities of the remote data source.
Code Generation: Code is generated for those portions of the query that need to be executed locally as well as remotely.

Equivalence Algorithm - Notation
Plan tree: The plan tree corresponding to a query Q is a directed acyclic graph (DAG) whose internal nodes correspond to the plan operators (introduced below) and whose leaf nodes correspond to the relations at the data source. The plan "tree" could be a DAG since it could have common sub-expressions. The algebraic expression computed by the root node of the tree is equivalent to Q.
Plan Properties: Associated with a plan is a set of properties that provide meta-information on the work done by the plan. The property information summarizes the various tasks underlying the plan and its cost.
Plan Operators: The plan operators are an encapsulated representation of the fundamental tasks performed by the query engine.
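The notation above can be made concrete with a small sketch. The `PlanNode` class, its field names, and the sample plan below are hypothetical and chosen only for illustration; they are not the representation used by the authors. The sketch shows why a plan "tree" with common sub-expressions is really a DAG: shared subplans are stored once, so the number of distinct nodes is smaller than the size of the fully expanded tree.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class PlanNode:
    """Hypothetical plan node: an operator, its inputs, and its properties."""
    op: str                                   # e.g. "TABLE", "SCAN", "JOIN"
    inputs: Tuple["PlanNode", ...] = ()       # child subplans
    props: Tuple[Tuple[str, str], ...] = ()   # (property, value) pairs


def tree_size(node: PlanNode) -> int:
    """Node count if every shared subplan were expanded into its own copy."""
    return 1 + sum(tree_size(child) for child in node.inputs)


def dag_size(node: PlanNode, seen=None) -> int:
    """Node count when each shared subplan object is counted only once."""
    seen = set() if seen is None else seen
    if id(node) in seen:
        return 0
    seen.add(id(node))
    return 1 + sum(dag_size(child, seen) for child in node.inputs)


# Leaves are relations at the data sources.
rinvest = PlanNode("TABLE", props=(("name", "Rinvest"),))
wstock = PlanNode("TABLE", props=(("name", "Wstock"),))

# Two access subplans that are shared by both branches of the query.
scan_r = PlanNode("SCAN", (rinvest,))
rquery_w = PlanNode("RQUERY", (wstock,))

join_a = PlanNode("JOIN", (scan_r, rquery_w), (("branch", "a"),))
join_b = PlanNode("JOIN", (scan_r, rquery_w), (("branch", "b"),))
group_a = PlanNode("GROUP BY", (join_a,), (("branch", "a"),))
group_b = PlanNode("GROUP BY", (join_b,), (("branch", "b"),))
root = PlanNode("UNION", (group_a, group_b))

print(tree_size(root), dag_size(root))
```

Sharing is tracked by object identity here, which keeps the sketch short; a real optimizer would more likely key on a plan signature.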

Example Display the investor names and their investment value in technology stocks and oil stocks as of '11/03/97' for those investors for whom the average trading volumes of the technology and oil stocks in their portfolio was higher than the market trading volume of oil and technology stocks. Consider only those stocks in the portfolio that were bought prior to '10/01/97'.

Equivalence Of Subplans
Definition 1: Let D be a database, P be a plan, and P(D) be the result obtained by applying P on D. We say plans P1 and P2 are equivalent if there exists a plan P, and expressions E1 and E2 that represent a sequence of applications of the operators, such that, for any database D, P1(D) and P2(D) can be obtained from P(D) by applying E1 and E2 respectively. P is called a covering plan of P1 and P2.
Definition 2: Let m and n be two nodes of a plan tree T. We define a relation ~ on nodes of T as follows: m ~ n if, depending on whether m and n are
1. Tables - m and n should be identical;
2. Plan Operators - their properties should agree as follows
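Definition 1 can be illustrated with a small sketch over in-memory rows. Everything below is an assumption made for illustration: the sample rows, the function names `p1`, `p2`, `cover`, `e1`, and `e2`, and the choice of selection predicates (loosely modelled on the motivating example). The two subplans select different slices of a stock relation, the covering plan selects a superset of both, and each subplan's result is recovered by applying a compensating selection to the covering plan's output. The definition requires this to hold for every database; the sketch only checks one.

```python
from datetime import date

# Hypothetical sample database D: rows of a Wstock-like relation.
D = [
    {"ticker": "IBM", "cat": "Tech", "date": date(1997, 3, 1), "vol": 100},
    {"ticker": "IBM", "cat": "Tech", "date": date(1997, 9, 1), "vol": 300},
    {"ticker": "XON", "cat": "Oil", "date": date(1997, 2, 1), "vol": 250},
    {"ticker": "XON", "cat": "Oil", "date": date(1997, 8, 1), "vol": 400},
    {"ticker": "GE", "cat": "Industrial", "date": date(1997, 5, 1), "vol": 500},
    {"ticker": "IBM", "cat": "Tech", "date": date(1996, 12, 1), "vol": 50},
]

YEAR_START = date(1997, 1, 1)
HALF_END = date(1997, 6, 30)
YEAR_END = date(1997, 12, 31)


def p1(db):
    """Subplan 1: technology stocks over the whole of 1997."""
    return [r for r in db
            if r["cat"] == "Tech" and YEAR_START <= r["date"] <= YEAR_END]


def p2(db):
    """Subplan 2: oil stocks over the first half of 1997."""
    return [r for r in db
            if r["cat"] == "Oil" and YEAR_START <= r["date"] <= HALF_END]


def cover(db):
    """Covering plan P: one pass that retains everything either subplan needs."""
    return [r for r in db
            if r["cat"] in ("Tech", "Oil") and YEAR_START <= r["date"] <= YEAR_END]


def e1(rows):
    """Compensation E1: narrow the covering result down to subplan 1."""
    return [r for r in rows if r["cat"] == "Tech"]


def e2(rows):
    """Compensation E2: narrow the covering result down to subplan 2."""
    return [r for r in rows if r["cat"] == "Oil" and r["date"] <= HALF_END]


shared = cover(D)  # computed once, reused by both subplans
print(p1(D) == e1(shared), p2(D) == e2(shared))
```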

Equivalence Of Subplans (2)

Equivalence Of Subplans (3) Equivalence Algorithm !

Cost Considerations
Materializing the transient view involves two major steps:
(1) Use the property information associated with each subplan in an equivalence class to generate an (SQL) query that defines the transient view corresponding to the equivalence class.
(2) Run the query through the optimizer to generate the (optimized) covering plan as well as the cost for materializing the transient-view.
Below, we explain both these steps in detail.

Cost Considerations (2)
Query defining the transient view:
    SELECT Cols_view, AF_view(Exp)
    FROM T_view
    WHERE JP_view AND SP_view
    GROUP BY GC_view
The cost and cardinality information for materializing the transient-view is obtained by passing the above query through the optimizer module of the query processing engine.
Query computing subplan i from the transient view:
    SELECT Cols_i, AF_i(Exp)
    FROM TRANSIENT-VIEW
    WHERE SP_i
    GROUP BY GC_i
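The two query templates above can be tried out with Python's built-in `sqlite3` module. This is a minimal sketch under stated assumptions, not the system described in the slides: the `wstock` table, its sample rows, the ISO date format, and the `transient_view` temporary table are all made up for illustration. The view groups by ticker and category and stores SUM, COUNT, and MAX, so that both an AVG query and a MAX query can be answered from it by re-aggregating; the sketch compares each answer with the one computed directly from the base table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE wstock (ticker TEXT, cat TEXT, date TEXT, vol INTEGER)")
conn.executemany(
    "INSERT INTO wstock VALUES (?, ?, ?, ?)",
    [
        ("IBM", "Tech", "1997-03-01", 100),
        ("IBM", "Tech", "1997-09-01", 300),
        ("MSFT", "Tech", "1997-04-01", 120),
        ("MSFT", "Tech", "1997-10-01", 180),
        ("XON", "Oil", "1997-02-01", 250),
        ("XON", "Oil", "1997-08-01", 400),
        ("CHV", "Oil", "1997-05-01", 90),
        ("IBM", "Tech", "1996-12-01", 50),   # outside the date range
    ],
)

# Selection predicate shared by both subplans (SP_view in the template).
RANGE = "date >= '1997-01-01' AND date <= '1997-12-31'"

# Transient view: SELECT Cols_view, AF_view(Exp) FROM T_view WHERE SP_view GROUP BY GC_view.
# A temporary table stands in for a view that lives only while the query runs.
conn.execute(f"""
    CREATE TEMP TABLE transient_view AS
    SELECT ticker, cat,
           SUM(vol) AS sum_vol, COUNT(vol) AS cnt_vol, MAX(vol) AS max_vol
    FROM wstock
    WHERE {RANGE}
    GROUP BY ticker, cat
""")


def rows(sql):
    """Run a query and return its rows in a deterministic order."""
    return sorted(conn.execute(sql).fetchall())


# Subplan 1: average volume per technology stock.
avg_direct = rows(
    f"SELECT ticker, AVG(vol) FROM wstock WHERE {RANGE} AND cat = 'Tech' GROUP BY ticker"
)
avg_via_view = rows(
    "SELECT ticker, SUM(sum_vol) * 1.0 / SUM(cnt_vol) "
    "FROM transient_view WHERE cat = 'Tech' GROUP BY ticker"
)

# Subplan 2: maximum volume over all oil stocks.
max_direct = rows(
    f"SELECT cat, MAX(vol) FROM wstock WHERE {RANGE} AND cat = 'Oil' GROUP BY cat"
)
max_via_view = rows(
    "SELECT cat, MAX(max_vol) FROM transient_view WHERE cat = 'Oil' GROUP BY cat"
)

print(avg_direct == avg_via_view, max_direct == max_via_view)
```

AVG cannot be re-aggregated from stored averages, which is why the sketch keeps SUM and COUNT in the view and divides them in the per-subplan query.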

Equivalence Class Pruning

Final Plan

TPC-D Benchmark

Query Rewrite Vs Transient-View