Query Optimization for Semistructured Data. Jason McHugh, Jennifer Widom, Stanford University. Rajendra S. Thapa.

Road Map
- Lore System
- Query Execution Engine
- Statistics and Cost Model
- Performance Results

Lore Data Model: OEM (Object Exchange Model)
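As a rough illustration of OEM, the sketch below gives each object a unique object identifier (oid). An object is either atomic, holding a value, or complex, holding labeled edges to subobjects. The class names, fields and sample values are hypothetical. This is a sketch, not Lore's storage format.

```python
from dataclasses import dataclass, field
from typing import Optional, Union


@dataclass
class OEMObject:
    """One OEM object: atomic if `value` is set, otherwise complex."""
    oid: int
    value: Optional[Union[int, str]] = None
    # Outgoing (label, child oid) edges; only complex objects have them.
    edges: list = field(default_factory=list)

    def is_atomic(self) -> bool:
        return self.value is not None


class OEMDatabase:
    """Hypothetical in-memory OEM graph with named entry points."""

    def __init__(self):
        self.objects = {}   # oid -> OEMObject
        self.names = {}     # name such as "DBGroup" -> oid

    def new_object(self, value=None):
        obj = OEMObject(oid=len(self.objects), value=value)
        self.objects[obj.oid] = obj
        return obj.oid

    def add_edge(self, parent, label, child):
        self.objects[parent].edges.append((label, child))

    def children(self, oid, label):
        """All `label`-labeled subobjects of object `oid`."""
        return [c for (lab, c) in self.objects[oid].edges if lab == label]


# A DBGroup object with one Member that has Name and Age subobjects.
db = OEMDatabase()
group = db.new_object()
db.names["DBGroup"] = group
member = db.new_object()
db.add_edge(group, "Member", member)
db.add_edge(member, "Name", db.new_object("Smith"))
db.add_edge(member, "Age", db.new_object(28))
```

OEM is self-describing, so a second Member could omit Age entirely, or carry two Age subobjects, without any schema change.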

DataGuide
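A DataGuide is a structural summary of an OEM database in which every label path of the database appears exactly once. The sketch below builds one with a subset construction, in the spirit of converting an NFA to a DFA. Each DataGuide node stands for the set of objects reachable by one label path. The edge encoding and function names are assumptions made for illustration, and the sketch ignores incremental maintenance.

```python
from collections import deque


def build_dataguide(edges, root):
    """Build a strong DataGuide for an OEM graph.

    `edges` maps oid -> list of (label, child oid). Each DataGuide node is
    the frozenset of objects reachable from `root` by one label path; label
    paths reaching the same set share a node, so cycles terminate.
    """
    start = frozenset([root])
    guide = {start: {}}               # target set -> {label: target set}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        by_label = {}                 # label -> children of any object in node
        for oid in node:
            for label, child in edges.get(oid, []):
                by_label.setdefault(label, set()).add(child)
        for label, children in by_label.items():
            target = frozenset(children)
            guide[node][label] = target
            if target not in guide:
                guide[target] = {}
                queue.append(target)
    return guide


def label_paths(guide, root, max_depth=5):
    """List the DataGuide's label paths; max_depth stops at cycles."""
    paths = []

    def walk(node, prefix, depth):
        if depth == max_depth:
            return
        for label, target in sorted(guide[node].items()):
            paths.append(".".join(prefix + (label,)))
            walk(target, prefix + (label,), depth + 1)

    walk(frozenset([root]), (), 0)
    return paths


# Two Member objects share the path Member.Name; only one has an Age.
edges = {0: [("Member", 1), ("Member", 2)],
         1: [("Name", 3), ("Age", 4)],
         2: [("Name", 5)]}
print(label_paths(build_dataguide(edges, 0), 0))
```

Both Members are summarized by a single Member node, so Member.Name is listed once even though two objects sit on that path.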

Path Expressions
Simple path expression: specifies a single-step navigation in the database. For example, x.Member y denotes that variable y ranges over all Member-labeled subobjects of the object assigned to x.
Path expression: an ordered list of simple path expressions. For example, in DBGroup.Member x, x.Age y, variable y ranges over all objects that can be reached by starting at the DBGroup object, following an edge labeled Member, and then following an edge labeled Age.
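A small evaluator makes the meaning of a path expression concrete: it enumerates every assignment of objects to the expression's variables. The graph encoding (`EDGES`, `NAMES`), the function `eval_path` and the sample data are hypothetical, and the evaluator works strictly top-down.

```python
# Hypothetical OEM graph: oid -> list of (label, child oid) edges,
# plus named entry points such as "DBGroup".
EDGES = {
    0: [("Member", 1), ("Member", 2)],
    1: [("Name", 3), ("Age", 4)],
    2: [("Name", 5), ("Age", 6)],
}
NAMES = {"DBGroup": 0}


def eval_path(path, edges, names):
    """Return every evaluation (variable -> oid dict) of a path expression.

    `path` is an ordered list of simple path expressions (source, label, var),
    e.g. [("DBGroup", "Member", "x"), ("x", "Age", "y")]. The source is either
    a name or a variable bound by an earlier step.
    """
    bindings = [{}]
    for source, label, var in path:
        next_bindings = []
        for b in bindings:
            start = b[source] if source in b else names[source]
            # Extend this evaluation once per matching labeled edge.
            for edge_label, child in edges.get(start, []):
                if edge_label == label:
                    next_bindings.append({**b, var: child})
        bindings = next_bindings
    return bindings


print(eval_path([("DBGroup", "Member", "x"), ("x", "Age", "y")], EDGES, NAMES))
```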

Query Language
Query: SELECT x FROM DBGroup.Member x WHERE exists y in x.Age: y < 30
[Figure: sample DBGroup database and query result; surviving labels include Smith, 28, Gates, 252, CIS 411]
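A hand-written, top-down evaluation of the example query makes the existential semantics of `exists y in x.Age` explicit. The database, `children` and `query` are hypothetical. Skipping non-integer Age values is a simplification of this sketch.

```python
# Hypothetical sample database (values are illustrative only).
EDGES = {
    0: [("Member", 1), ("Member", 2), ("Member", 3)],
    1: [("Age", 11)],
    2: [("Age", 12), ("Age", 13)],   # two Age subobjects
    3: [("Name", 14)],               # no Age subobject at all
}
VALUES = {11: 28, 12: 45, 13: 25, 14: "Smith"}
NAMES = {"DBGroup": 0}


def children(oid, label):
    return [c for (lab, c) in EDGES.get(oid, []) if lab == label]


def query():
    """SELECT x FROM DBGroup.Member x WHERE exists y in x.Age: y < 30"""
    result = []
    for x in children(NAMES["DBGroup"], "Member"):
        # Existential test: x qualifies if ANY Age subobject is < 30;
        # a member with no Age subobject simply does not qualify.
        if any(isinstance(VALUES.get(y), int) and VALUES[y] < 30
               for y in children(x, "Age")):
            result.append(x)
    return result


print(query())
```

Member 2 qualifies through its second Age subobject even though its first does not. Member 3 is silently excluded rather than causing an error, which is the usual behavior for missing data in semistructured queries.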

Lore architecture

Components: textual interface, query processing, data engine.
Query processing stages: parsing → preprocessing → logical query plan generation → query optimization → physical query plan generation → execution of the physical query plan.

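Queries can be executed in many ways: top-down, bottom-up, or hybrid. For example, SELECT x FROM DBGroup.Member x WHERE exists y in x.Age: y < 30 can be evaluated top-down by starting at DBGroup and following Member and then Age edges. It can also be evaluated bottom-up, by starting at the atomic objects that satisfy y < 30 and following Age and Member edges backward. A hybrid plan combines the two.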

Top-down preferred
Query: SELECT x FROM A.B x WHERE exists y in x.C: y = 5
[Figure: database in which only one A.B.C path exists but several C-labeled leaves have value 5]
Top-down evaluation explores only the single A.B.C path, whereas bottom-up evaluation would visit every leaf object with value 5 and its parents.

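Bottom-up preferred
Query: SELECT x FROM A.B x WHERE exists y in x.C: y = 5
[Figure: many A.B.C paths, with leaf values 5, 4, 4]
There are many A.B.C paths, but only one leaf satisfies the predicate, so bottom-up evaluation, which starts from that leaf, is the better candidate.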
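The trade-off between the two examples above can be sketched by counting how many objects each strategy touches. The databases are hypothetical shapes loosely modeled on the figures. Bottom-up evaluation is assumed to have a value index (`by_value`) and parent pointers (`parents`), and "objects visited" is only a crude stand-in for a real cost.

```python
from collections import defaultdict


def make_db(edges, values):
    """Hypothetical OEM graph plus the parent and value indexes that
    bottom-up evaluation relies on."""
    parents = defaultdict(list)             # child -> [(label, parent)]
    for p, out in edges.items():
        for label, c in out:
            parents[c].append((label, p))
    by_value = defaultdict(list)            # value -> [oid]
    for oid, v in values.items():
        by_value[v].append(oid)
    return edges, values, parents, by_value


def top_down(db, root):
    """SELECT x FROM A.B x WHERE exists y in x.C: y = 5, from A downward."""
    edges, values, _, _ = db
    visited, result = 0, []
    for label, x in edges.get(root, []):
        if label != "B":
            continue
        visited += 1
        for label2, y in edges.get(x, []):
            if label2 == "C":
                visited += 1
                if values.get(y) == 5 and x not in result:
                    result.append(x)
    return sorted(result), visited


def bottom_up(db, root):
    """Same query, starting from the objects whose value is 5."""
    _, _, parents, by_value = db
    visited, result = 0, []
    for y in by_value.get(5, []):
        visited += 1
        for label, x in parents.get(y, []):
            if label != "C":
                continue
            visited += 1
            # x qualifies only if it is a B-labeled child of the root.
            if ("B", root) in parents.get(x, []) and x not in result:
                result.append(x)
    return sorted(result), visited


# One A.B.C path, but three leaves with value 5 (top-down wins).
db1 = make_db({0: [("B", 1), ("D", 2), ("D", 3)],
               1: [("C", 4)], 2: [("C", 5)], 3: [("C", 6)]},
              {4: 5, 5: 5, 6: 5})
# Three A.B.C paths, only one leaf with value 5 (bottom-up wins).
db2 = make_db({0: [("B", 1), ("B", 2), ("B", 3)],
               1: [("C", 4)], 2: [("C", 5)], 3: [("C", 6)]},
              {4: 5, 5: 4, 6: 4})
print(top_down(db1, 0), bottom_up(db1, 0))
print(top_down(db2, 0), bottom_up(db2, 0))
```

Both strategies return the same answer on each database. Only the amount of work differs, which is exactly what the optimizer has to estimate.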

Hybrid preferred
Query: SELECT x FROM A.B x WHERE exists y in x.C: y = 5
[Figure: A.B.C paths with leaf values 5, 4, 4, plus additional B and D objects]
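A hybrid plan can be sketched as evaluating part of the query top-down and part bottom-up, then joining the two halves where they meet. The split used here is an assumption chosen for illustration: A.B x is evaluated top-down, the y = 5 predicate bottom-up, and the halves are joined on x. The sample database is also hypothetical.

```python
from collections import defaultdict


def build_parents(edges):
    """Reverse index: child oid -> list of (label, parent oid)."""
    parents = defaultdict(list)
    for parent, out in edges.items():
        for label, child in out:
            parents[child].append((label, parent))
    return parents


def hybrid(edges, values, root):
    """SELECT x FROM A.B x WHERE exists y in x.C: y = 5, evaluated hybrid-style."""
    parents = build_parents(edges)
    # Top-down half: navigate from A along B edges.
    from_top = {x for label, x in edges.get(root, []) if label == "B"}
    # Bottom-up half: start at objects with value 5 and go up C edges.
    from_bottom = {x
                   for y, v in values.items() if v == 5
                   for label, x in parents.get(y, []) if label == "C"}
    # The halves meet at x: keep only candidates found from both directions.
    return sorted(from_top & from_bottom)


# Hypothetical database: three B objects (C values 5, 4, 4) plus two D
# objects whose C subobjects also have value 5.
edges = {0: [("B", 1), ("B", 2), ("B", 3), ("D", 7), ("D", 8)],
         1: [("C", 4)], 2: [("C", 5)], 3: [("C", 6)],
         7: [("C", 9)], 8: [("C", 10)]}
values = {4: 5, 5: 4, 6: 4, 9: 5, 10: 5}
print(hybrid(edges, values, 0))
```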

Query Execution Engine
- Logical query plans: logical query plan operators; structure of the plan
- Physical query plans: operators; some physical plans
- Statistics and cost model
- Plan enumeration

Query Execution Engine: Logical Query Plans
Logical operators: Discover, Chain, Glue, CreateTemp, Project, ...
Variable binding: a variable x in the query is said to be bound if some object o has been assigned to x.
Evaluation: an evaluation of a query plan (or subplan) is a list of all variables appearing in the plan, along with the object (if any) bound to each variable.
Rotation
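The definitions of binding and evaluation can be sketched with a few logical operator nodes. The operator names come from the slides, but the fields are assumptions. Glue is modeled here simply as a node that combines two subplans.

```python
from dataclasses import dataclass
from typing import Union


@dataclass
class Name:
    name: str          # named entry point, e.g. "DBGroup"
    var: str           # variable bound to that object


@dataclass
class Discover:
    source: str        # variable holding the parent object
    label: str         # edge label to follow
    target: str        # variable bound to each matching child


@dataclass
class Glue:
    left: "Plan"
    right: "Plan"


Plan = Union[Name, Discover, Glue]


def plan_variables(plan):
    """All variables appearing in a (sub)plan, in first-appearance order."""
    if isinstance(plan, Name):
        return [plan.var]
    if isinstance(plan, Discover):
        return [plan.source, plan.target]
    seen = []
    for v in plan_variables(plan.left) + plan_variables(plan.right):
        if v not in seen:
            seen.append(v)
    return seen


def empty_evaluation(plan):
    """An evaluation maps every plan variable to its bound object, or None."""
    return {v: None for v in plan_variables(plan)}


def is_bound(evaluation, var):
    return evaluation.get(var) is not None


plan = Glue(Name("DBGroup", "t1"), Discover("t1", "Member", "x"))
evaluation = empty_evaluation(plan)   # {'t1': None, 'x': None}
evaluation["t1"] = 0                  # Name binds t1 to the DBGroup object (oid 0)
print(is_bound(evaluation, "t1"), is_bound(evaluation, "x"))  # True False
```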

Representation of a path expression in the logical query plan
Path expression: x.B y, y.C z, z.D v
Logical plan: Chain over Discover(x, "B", y), Discover(y, "C", z), Discover(z, "D", v)
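Turning a textual path expression into the Chain-of-Discover representation is mechanical. The sketch below assumes the simple "var.Label var" syntax used on this slide. The class fields and the `path_to_chain` helper are hypothetical, and the sketch does no error handling.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Discover:
    source: str
    label: str
    target: str


@dataclass(frozen=True)
class Chain:
    steps: tuple   # Discover nodes, in path order


def path_to_chain(path):
    """Translate e.g. 'x.B y, y.C z, z.D v' into a Chain of Discover nodes."""
    steps = []
    for simple in path.split(","):
        lhs, target = simple.split()          # 'x.B', 'y'
        source, label = lhs.split(".")        # 'x', 'B'
        steps.append(Discover(source, label, target))
    return Chain(tuple(steps))


chain = path_to_chain("x.B y, y.C z, z.D v")
print(chain)
```

In the resulting chain, each step's source is the target of the previous step. This is what lets the steps be evaluated in order, each with its source variable already bound.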

Complete logical query plan for SELECT x FROM DBGroup.Member x WHERE exists y in x.Age: y < 30
[Figure: plan tree with operators Project(t2), Glue, Chain, Name("DBGroup", t1), Discover(t1, "Member", x), Discover(x, "Age", y), Exists(y), Select(y, < 30), CreateTemp(x, t2)]

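Query Execution Engine: Physical Query Plans
Physical operators:
- Scan(x, l, y)
- Lindex(x, l, y)
- Pindex(PathExpression, x)
- Bindex(l, x, y)
- Name(x, n)
- Vindex(Op, Value, l, x)
- ...
[Figure: object x with three l-labeled subobjects a, b, c; Scan(x, l, y) binds y to each of them in turn, y = {a, b, c}]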
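The access pattern of each physical operator can be sketched over a toy database with hand-built indexes. The index layouts and function names are assumptions. In particular, `pindex` takes a dotted path such as "A.B.C" rather than the slide's path-expression syntax, and it emulates the path index by navigation.

```python
import operator
from collections import defaultdict

# Hypothetical tiny database: A has two B children, each with a C child.
EDGES = {0: [("B", 1), ("B", 2)], 1: [("C", 3)], 2: [("C", 4)]}
VALUES = {3: 5, 4: 7}          # atomic values
NAMES = {"A": 0}               # named entry points

# Indexes built once from the data (layouts are assumptions of this sketch).
LINDEX = defaultdict(list)     # (child, label) -> parents
BINDEX = defaultdict(list)     # label -> (parent, child) pairs
VINDEX = defaultdict(list)     # label -> atomic objects reached via that label
for parent, out in EDGES.items():
    for label, child in out:
        LINDEX[(child, label)].append(parent)
        BINDEX[label].append((parent, child))
        if child in VALUES:
            VINDEX[label].append(child)


def scan(x, label):
    """Scan(x, l, y): x is bound; return every l-labeled subobject y."""
    return [c for (lab, c) in EDGES.get(x, []) if lab == label]


def lindex(y, label):
    """Lindex(x, l, y): y is bound; find its parents x via an l edge."""
    return list(LINDEX.get((y, label), []))


def bindex(label):
    """Bindex(l, x, y): neither variable bound; return all l-labeled edges."""
    return list(BINDEX.get(label, []))


def name(n):
    """Name(x, n): bind x to the object named n."""
    return NAMES[n]


def vindex(op, value, label):
    """Vindex(Op, Value, l, x): atomic objects x with an incoming l edge
    whose value v satisfies op(v, value)."""
    return [x for x in VINDEX.get(label, []) if op(VALUES[x], value)]


def pindex(path):
    """Pindex: objects reachable along a simple path such as "A.B.C".
    Emulated by navigation here; a path index would answer it directly."""
    first, *labels = path.split(".")
    frontier = [name(first)]
    for label in labels:
        frontier = [c for x in frontier for c in scan(x, label)]
    return frontier


print(vindex(operator.lt, 6, "C"))   # [3]
```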

Some physical plans for a simple logical query plan
Logical query plan for A.B x, x.C y: Chain(Discover(A, "B", x), Discover(x, "C", y))

Physical plans for A.B x, x.C y
- Scan plan: NLJ (nested-loop join) of Scan(A, "B", x) and Scan(x, "C", y)
- Lindex plan: NLJs over Lindex(x, "C", y), Lindex(t, "B", x) and Name(t, A)

More physical plans for A.B x, x.C y
- Bindex plan: NLJs over Bindex(t, "B", x), Name(t, A) and Scan(x, "C", y)
- Pindex plan: Pindex("A.B x, x.C y", y)
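These physical plans are alternative implementations of the same logical plan. The sketch below shows this by evaluating A.B x, x.C y with a Scan plan and with a Bindex plan on a small hypothetical database, then checking that both yield the same (x, y) bindings. Reading the Bindex plan as "find all B edges, keep those leaving A, then scan down along C" is an interpretation of the slide.

```python
# Hypothetical database: A (oid 0) has B children 1 and 2 and a D child 5;
# object 5 also has a B child, which the Bindex plan must filter out.
EDGES = {0: [("B", 1), ("B", 2), ("D", 5)],
         1: [("C", 3)], 2: [("C", 4)],
         5: [("B", 6)], 6: [("C", 8)]}
NAMES = {"A": 0}


def scan_plan():
    """Scan plan: NLJ(Scan(A, "B", x), Scan(x, "C", y)), i.e. top-down."""
    out = []
    for lx, x in EDGES.get(NAMES["A"], []):       # outer: Scan(A, "B", x)
        if lx != "B":
            continue
        for ly, y in EDGES.get(x, []):            # inner: Scan(x, "C", y)
            if ly == "C":
                out.append((x, y))
    return sorted(out)


def bindex_plan():
    """Bindex plan: every B edge (t, x), kept only if t is A, then Scan down."""
    b_edges = [(p, c) for p, outs in EDGES.items() for lab, c in outs if lab == "B"]
    out = []
    for t, x in b_edges:                          # Bindex(t, "B", x)
        if t != NAMES["A"]:                       # Name(t, A)
            continue
        for ly, y in EDGES.get(x, []):            # Scan(x, "C", y)
            if ly == "C":
                out.append((x, y))
    return sorted(out)


print(scan_plan(), bindex_plan())
```

The Bindex plan reads every B edge in the database, including the one under object 5. That is why it can lose badly to the Scan plan when A has few B children, which is consistent with the worst plan in Experiment 1 using Bindex operators.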

How physical plans are produced
Each logical plan node creates an optimal physical plan given a set of bound variables. During plan enumeration, we track for each variable:
1. whether the variable is bound;
2. which plan operator bound the variable;
3. all other plan operators that use the variable;
4. whether the variable is stored within a temporary result.
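The bookkeeping listed above can be sketched as a per-variable record, plus a rule that picks a physical operator for Discover(x, l, y) according to which variables are bound. The fixed preference order is a hypothetical simplification: Scan if x is bound, otherwise Lindex if y is bound, otherwise Bindex. A cost-based optimizer would instead cost the alternatives.

```python
from dataclasses import dataclass, field


@dataclass
class VarInfo:
    """Per-variable state tracked during plan enumeration."""
    bound: bool = False
    bound_by: str = ""                            # operator that bound it
    used_by: list = field(default_factory=list)   # other operators using it
    in_temp: bool = False                         # stored in a temporary result?


def physical_for_discover(x, label, y, state):
    """Pick a physical operator for Discover(x, label, y) (hypothetical rule)."""
    x_bound = state.get(x, VarInfo()).bound
    y_bound = state.get(y, VarInfo()).bound
    if x_bound:
        op = f'Scan({x}, "{label}", {y})'      # navigate down from x
    elif y_bound:
        op = f'Lindex({x}, "{label}", {y})'    # navigate up from y
    else:
        op = f'Bindex("{label}", {x}, {y})'    # enumerate all label edges
    # Record the chosen operator's effect on the tracked state.
    for v in (x, y):
        info = state.setdefault(v, VarInfo())
        if info.bound:
            info.used_by.append(op)
        else:
            info.bound, info.bound_by = True, op
    return op


state = {"x": VarInfo(bound=True, bound_by="Name(x, DBGroup)")}
op = physical_for_discover("x", "Member", "y", state)
print(op, state["y"].bound_by == op)
```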

How physical plans are produced: example
Query: SELECT x FROM DBGroup.Member x WHERE exists y in x.Age: y < 30
[Figure: logical plan]

Possible physical plans
[Fig. (a): logical plan]

Possible physical plans
[Fig. (c): logical plan; physical plans]

More physical plans
[Fig. (d): logical plan]

Statistics and Cost Model
Each physical plan is assigned a cost based on the estimated I/O and CPU time required to execute it. The costing procedure is recursive: a plan's cost is derived from the costs of its subplans. Plans are compared on estimated I/O first and then on CPU time to decide which plan is cheaper.
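A recursive cost function can be sketched as follows, reading "I/O first, then CPU time" as a lexicographic comparison of (I/O, CPU) pairs. The additive combination of child costs and the sample numbers are assumptions. A real cost model would, for example, charge a nested-loop join's inner side once per outer binding.

```python
from dataclasses import dataclass, field


@dataclass
class PlanNode:
    """Hypothetical physical plan node with local cost estimates."""
    op: str
    io: float                # estimated I/O for this operator alone
    cpu: float               # estimated CPU time for this operator alone
    children: list = field(default_factory=list)


def cost(node):
    """Recursive costing: own estimate plus the costs of all subplans."""
    io, cpu = node.io, node.cpu
    for child in node.children:
        c_io, c_cpu = cost(child)
        io += c_io
        cpu += c_cpu
    return io, cpu


def cheaper(a, b):
    """Tuple ordering compares I/O first; CPU time only breaks ties."""
    return a if cost(a) <= cost(b) else b


a = PlanNode("NLJ", 1, 1, [PlanNode("Scan", 4, 2), PlanNode("Scan", 5, 2)])
b = PlanNode("NLJ", 1, 1, [PlanNode("Lindex", 9, 2)])
c = PlanNode("Pindex", 8, 100)
print(cost(a), cost(b), cost(c))
print(cheaper(a, b).op, cheaper(b, c).op)
```

Plans a and b tie on I/O, so CPU time decides in favor of b. Plan c wins on I/O despite its much higher CPU estimate.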

Performance Results: Experiment 1
A simple query: SELECT DBGroup.Movie.Title
- 11 different query plans
- The best plan uses Lore's path index to quickly locate all movie titles.
- The second-best plan uses a top-down strategy.
- The worst plan uses Bindex operators and hash joins.

Performance Results: Experiment 2
The same query, restricted to movies with a Genre subobject whose value is 'Comedy' (a point query).

Performance Results: Experiment 3
- The same point query
- Not all possible plans were executed; instead, different plans were generated by disallowing the use of particular operators or indexes.

Performance Results: Experiment 4
The query selects movies with a certain quality rating.

Future Work
- Optimization techniques for branching path expressions: a query rewrite that moves Where-clause predicates into the From clause, and a transformation that introduces a Group-by clause when a large number of paths pass through a small number of objects.
- Partially correlated subplans: similar to correlated subqueries, but relying on the bindings passed between portions of the physical query plan rather than on the query itself.
- Statistics: efficient statistics-gathering algorithms; statistics about the location of objects on disk; modifications to the cost formulas to generate more accurate cost estimates.