Optimizing Queries across Diverse Data Sources. Laura M. Haas, Donald Kossmann, Edward L. Wimmers, Jun Yang. Presented by Siddhartha Dasari.



Overview: Problem Definition; Architectural View of Garlic; Query Plan Generation; Query Optimization; Conclusions.

What are we trying to solve? Middleware systems sit on top of many data sources. The goal is to optimize queries over sources with varying query processing capabilities, using a cost-based model, as implemented in the Garlic approach.

Cost-based (Disco, Garlic, ...); quality-based (ObjectGlobe, HiQIQ, ...); adaptive query optimization (Telegraph, Tukwila, ...); capability-based (Tsimmis, Infomaster).

Optimizing Queries. Reasons to optimize: people don't like to wait; they want programs to be fast, so response times should be short. A faster server alone does not fix a slow query.

How to optimize? Filter as much as possible: the WHERE clause is the most important part of your query. Never write "SELECT *"; name only the fields you actually need. When joining two tables, use all the keys that relate them.

How does Garlic do query optimization? Plan cost is made up of: processing costs (estimated from a cost model of CPU and I/O); communication costs (estimated using constants in the catalog); costs to initiate subqueries and methods (estimated using constants in the catalog); wrapper costs (estimated by the wrapper). Plans are pruned during enumeration: plan A is not used as a building block for a more complex plan if a cheaper alternative is available. Plans with unique properties are not pruned.
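The pruning rule above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not Garlic's implementation: the `Plan` class, the plan names, the cost numbers, and the `sorted_by_sid` property are all hypothetical. A plan is dropped only when another plan is at least as cheap and offers at least the same properties, so a more expensive plan with a unique property survives.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Plan:
    # Hypothetical plan record: a label, an estimated cost, and its properties.
    name: str
    cost: float
    properties: frozenset


def prune(plans):
    """Keep a plan unless another plan dominates it (cheaper or equal cost,
    same or more properties, and strictly better in at least one of the two)."""
    kept = []
    for p in plans:
        dominated = any(
            q is not p
            and q.cost <= p.cost
            and q.properties >= p.properties
            and (q.cost < p.cost or q.properties > p.properties)
            for q in plans
        )
        if not dominated:
            kept.append(p)
    return kept


plans = [
    Plan("scan_then_filter", 120.0, frozenset()),
    Plan("pushdown_filter", 40.0, frozenset()),
    Plan("pushdown_sorted", 90.0, frozenset({"sorted_by_sid"})),
]
survivors = [p.name for p in prune(plans)]
```

Here `scan_then_filter` is pruned because `pushdown_filter` is cheaper with the same properties, while `pushdown_sorted` is kept despite its higher cost because no other plan delivers sorted output.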

How to optimize? (cont.) Example 1. Query: SELECT * FROM Employees, with the program then filtering on Dept (if Dept = 'R&D'). Corrected: SELECT Name, Salary FROM Employees WHERE Dept = 'R&D'. Example 2. For i = 1 to 2000, call the query SELECT Salary FROM Employees WHERE EmpID = Parameter(i). Corrected: SELECT Salary FROM Employees WHERE EmpID >= 1 AND EmpID <= 2000.
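Both corrections can be tried out with Python's built-in `sqlite3` module. The sketch below is illustrative only: the in-memory table, its 3000 generated rows, and the salary values are assumptions made up for the demonstration; only the table and column names come from the slide.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Employees (EmpID INTEGER PRIMARY KEY, Name TEXT, Dept TEXT, Salary REAL)"
)
# Assumed sample data: 3000 employees, odd IDs in R&D, even IDs in Sales.
conn.executemany(
    "INSERT INTO Employees VALUES (?, ?, ?, ?)",
    [(i, f"emp{i}", "R&D" if i % 2 else "Sales", 1000.0 + i) for i in range(1, 3001)],
)

# Pattern to avoid: 2000 separate queries, one per employee.
looped = []
for i in range(1, 2001):
    row = conn.execute("SELECT Salary FROM Employees WHERE EmpID = ?", (i,)).fetchone()
    looped.append(row[0])

# Corrected: a single range query returns the same salaries in one call.
batched = [
    r[0]
    for r in conn.execute(
        "SELECT Salary FROM Employees WHERE EmpID >= 1 AND EmpID <= 2000 ORDER BY EmpID"
    )
]

# Corrected Example 1: filter and project in the query, not in the program.
rd_rows = conn.execute(
    "SELECT Name, Salary FROM Employees WHERE Dept = ?", ("R&D",)
).fetchall()
```

The looped and batched versions return the same data; the difference is the number of calls made to the database, which matters far more when the database is remote.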

Garlic Architecture. A wrapper acts as an interface between query services and a data source. The catalog contains the local and global schemas.

Query services contain the query language processor and the distributed query execution engine. The query language processor generates an execution plan based on the input query. The query execution engine passes subqueries to the wrappers and assembles the final result. Assembly may include performing joins, applying predicates, sorting, and computing aggregates.

What do wrappers do? Wrappers can wrap various types of data sources. Garlic wrappers are specific to Garlic: each provides an interface to its data source using Garlic's internal protocols. Data is described in an object-oriented model, and methods can be applied to the data. The wrapper describes the capabilities of its data source using rules. A wrapper does not have to reflect the full query functionality of its data source.

What are STARs? STARs = STrategy Alternative Rules. The rules are a high-level, declarative, compact specification of the legal alternatives. STARs define high-level constructs from low-level database operators or other STARs. Example: JoinRoot(T1, T2, P) = { PermutedJoin(T1, T2, P), PermutedJoin(T2, T1, P) }
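One way to read the JoinRoot rule is as a function that returns every alternative it allows. The Python sketch below assumes that reading; it is not Garlic's rule language. The body of `permuted_join` (a single nested-loop join alternative), the tuple encoding of a plan, and the table and predicate names in the usage line are all hypothetical.

```python
def permuted_join(outer, inner, pred):
    # Hypothetical lower-level STAR: yields one nested-loop join alternative
    # with the given outer and inner inputs.
    return [("NLJoin", outer, inner, pred)]


def join_root(t1, t2, pred):
    # JoinRoot(T1, T2, P): the alternatives are the join in both input orders.
    return permuted_join(t1, t2, pred) + permuted_join(t2, t1, pred)


alternatives = join_root("Courses", "Complaints", "Courses.id = Complaints.course_id")
```

Calling `join_root` produces two candidate plans, one with each table as the outer input; the optimizer would then cost both and keep the better one.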

How are plans constructed? Tuples are operated upon by POPs (Plan OPerators). A POP generally corresponds to one executable operator. POPs include join, sort, filter, fetch, temp, scan, and pushdown (work to be performed by the source). POPs have properties that describe the specifics of the operation. The Source property records where the output stream comes from (needed?).

Example: the PushDown POP performs operations at the data source. Data sources only return OIDs. Wrappers take PushDowns and perform them on their sources by translating them into queries or API calls. The Source property shows where execution occurs. Properties of a POP (e.g. predicates) are functions of the parent POP. Additional properties: cost, card.
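A POP and its properties can be pictured as a small record. The following Python sketch is an assumption-laden illustration rather than Garlic's data structure: the `POP` fields mirror the properties named on the slides (source, predicates, cost, card), while the helper functions, the per-tuple CPU cost, and all numbers are hypothetical. It shows the properties of a Filter being computed from the PushDown it consumes.

```python
from dataclasses import dataclass, field


@dataclass
class POP:
    op: str
    inputs: list = field(default_factory=list)
    source: str = "Garlic"          # where the operator executes
    preds: frozenset = frozenset()  # predicates applied so far
    cost: float = 0.0
    card: float = 0.0               # estimated number of tuples produced


def push_down(source, collection, preds, card, cost):
    # Work done inside the data source; cost and card would come from the wrapper.
    return POP(f"PushDown({collection})", [], source, frozenset(preds), cost, card)


def filter_pop(child, pred, selectivity, cpu_cost_per_tuple=0.01):
    # A Filter run by the middleware: its properties are derived from its input.
    return POP(
        "Filter",
        [child],
        "Garlic",
        child.preds | {pred},
        child.cost + child.card * cpu_cost_per_tuple,
        child.card * selectivity,
    )


plan = filter_pop(
    push_down("relational_wrapper", "Courses", {"dept = 'CS'"}, card=200.0, cost=5.0),
    "credits > 3",
    selectivity=0.25,
)
```

The resulting plan records that the first predicate was evaluated at the source and the second in the middleware, and its cost and cardinality follow from the values reported for the PushDown.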

What do STARs do? STARs can be viewed as a grammar with a set of rules. A STAR determines how POPs can be combined in a plan. Here f1, f2, ... are the names of STARs or POPs.

STARs (cont.) A STAR can retrieve columns that are needed by another STAR. The Access Root STAR is used to create access plans.

How is plan enumeration performed? The Access Root STAR is used to create plans that select all attributes used in the query (there is no real variability in these plans; each performs a PushDown). The Join Root STAR is used to create plans that perform joins. The Finish Root STAR is used to add any missing parts of the query (e.g. projections, ordering). Pruning is performed throughout to minimize the number of plans to enumerate.
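The three phases can be sketched as a small bottom-up enumerator. This is a simplified illustration, not Garlic's optimizer: the cost formula, the `PROBE_COST` constant, the single join selectivity shared by all table pairs, and the dictionary encoding of a plan are assumptions, and pruning is reduced to keeping the cheapest plan per set of tables.

```python
from itertools import combinations

PROBE_COST = 0.1  # assumed cost charged per outer tuple of a join


def access_root(table, card):
    # Access phase: one PushDown plan per table.
    return {"plan": f"PushDown({table})", "cost": float(card), "card": float(card)}


def join_root(left, right, selectivity):
    # Join phase: left is the outer input, so the two orders cost differently.
    return {
        "plan": f"Join({left['plan']}, {right['plan']})",
        "cost": left["cost"] + right["cost"] + left["card"] * PROBE_COST,
        "card": left["card"] * right["card"] * selectivity,
    }


def finish_root(plan):
    # Finish phase: stands in for projections, ordering, and so on.
    return {**plan, "plan": f"Finish({plan['plan']})"}


def enumerate_plans(cards, selectivity):
    tables = sorted(cards)
    best = {frozenset([t]): access_root(t, cards[t]) for t in tables}
    for size in range(2, len(tables) + 1):
        for subset in combinations(tables, size):
            whole = frozenset(subset)
            for k in range(1, size):
                for left in combinations(subset, k):
                    l = frozenset(left)
                    cand = join_root(best[l], best[whole - l], selectivity)
                    # Pruning: keep only the cheapest plan for this set of tables.
                    if whole not in best or cand["cost"] < best[whole]["cost"]:
                        best[whole] = cand
    return finish_root(best[frozenset(tables)])


result = enumerate_plans({"A": 10, "B": 1000}, selectivity=0.01)
```

With these made-up cardinalities the enumerator tries both join orders and keeps the one with the small table `A` as the outer input, since it is cheaper under the assumed cost formula.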

How are data source capabilities determined? The wrapper implements STARs that describe the capabilities of its data source. These STARs follow the POP structure described previously. Simple STARs can model the basic capabilities of a data source. Modeling complex capabilities is arguably not needed, as Garlic's query engine can compensate for them. A wrapper can add STARs iteratively: first to introduce a source quickly into the mediated schema, then to improve performance.
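The idea that the middleware compensates for whatever a source cannot do can be sketched as a split of the query's predicates. The Python below is a hypothetical illustration, not a Garlic wrapper interface: the `(column, operator, value)` predicate encoding, the `split_predicates` helper, and the operator sets for the two example sources are all assumptions.

```python
def split_predicates(predicates, supported_ops):
    """Split predicates into those the source can evaluate (pushed down)
    and those the middleware must apply afterwards (residual)."""
    pushed = [p for p in predicates if p[1] in supported_ops]
    residual = [p for p in predicates if p[1] not in supported_ops]
    return pushed, residual


# Hypothetical query predicates as (column, operator, value) triples.
query_preds = [("dept", "=", "CS"), ("title", "LIKE", "%data%"), ("credits", ">", 3)]

# A first, simple wrapper that only declares equality comparisons.
pushed_v1, residual_v1 = split_predicates(query_preds, {"="})

# A later version of the same wrapper that also declares range comparisons.
pushed_v2, residual_v2 = split_predicates(query_preds, {"=", ">", "<"})
```

Both versions give a correct answer because the residual predicates are still applied after the data comes back; the second version simply lets more of the work run at the source, which is the performance improvement the slide refers to.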

Modeling wrappers using STARs. Example: a university offers courses, course descriptions, and an online complaint mechanism, held in three kinds of sources (relational, text, mail). A mail message has a sender, date, body, and subject.

Modeling wrappers using STARs. The class objects have attributes such as courses and professor.

Modeling Wrappers using STARs

Disco optimizer

Tsimmis

Future Trends

Any Questions? Thank you