An Efficient, Cost-Driven Index Selection Tool for MS-SQL Server


An Efficient, Cost-Driven Index Selection Tool for MS-SQL Server
Surajit Chaudhuri, Vivek Narasayya
Presented by Robert Chen

Motivation
Automate the choice of indexes in the physical design of a SQL database:
– input: a workload of SQL queries
– output: a suggested set of suitable indexes
– goal: performance competitive with that of systems hand-tuned by administrators.

Basic approaches
Textbook solutions
– take semantic information, produce a design
– ignore workload information
Expert system approach
– knowledge is encoded as rules
– disconnected from the query optimizer
Cost-driven approach
– use the optimizer's cost estimates to compare the goodness of hypothetical designs

Efficiency
Want to reduce:
– the number of indexes considered: the Index Selection Tool iteratively picks more complex index structures
– the number of configurations enumerated
– the total number of optimizer invocations: the Cost Evaluation module takes queries and a configuration and returns estimated costs

Architecture of the Index Selection Tool
Workload → Candidate Index Selection → Configuration Enumeration → Multi-Column Index Generation → Final indexes
The Cost Evaluation module, backed by What-if Index Creation inside SQL Server, serves all of these stages.

Starting point
Admissible indexes
– indexable columns: columns that appear in the WHERE clause of a query
– indexes can be multi-column
Batch invocation of the optimizer
– each invocation is expensive since it requires communication across process boundaries, so optimizer calls are batched.

Cost Evaluation
Efficiency goal: reduce the number of optimizer calls, in addition to batching.
Atomic configurations: a configuration C is atomic if some query uses all indexes in C.
Derive the cost of a configuration from atomic configurations:
– Cost(Q, C) = min_i { Cost(Q, C_i) } over atomic configurations C_i ⊆ C (for SELECT)
– Cost(Q, C) = T + the update overhead of the indexes in C (for INSERT/DELETE; an overestimate)
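The SELECT-side derivation rule above can be sketched as follows. This is a minimal illustration, not the tool's implementation: the helper name `derive_select_cost` and the `atomic_costs` table are hypothetical; in the real system the atomic costs come from the SQL Server optimizer.

```python
def derive_select_cost(query, config, atomic_costs):
    """Cost(Q, C) = min over atomic configurations Ci that are subsets
    of C of Cost(Q, Ci). `atomic_costs` maps (query, frozenset of
    index names) -> optimizer-estimated cost."""
    candidates = [
        cost
        for (q, atomic), cost in atomic_costs.items()
        if q == query and atomic <= config  # Ci must be a subset of C
    ]
    return min(candidates)

# Hypothetical atomic costs for a query "Q1".
atomic_costs = {
    ("Q1", frozenset({"idx_a"})): 10.0,
    ("Q1", frozenset({"idx_b"})): 7.0,
    ("Q1", frozenset({"idx_a", "idx_b"})): 6.5,
}
print(derive_select_cost("Q1", frozenset({"idx_a", "idx_b"}), atomic_costs))
```

Because every atomic configuration here is a subset of {idx_a, idx_b}, the derived cost is the minimum of the three, 6.5.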

Identifying atomic configurations
Reduce the number of atomic configurations:
– limit the number of indexes per table (j)
– limit the number of tables per configuration (t)
– with (j, t) = (2, 2), these are called single-join atomic configurations (a two-tier scheme)
Reduce the cost of each evaluation (relevant index set optimization):
– Cost(Q, C) = Cost(Q, C′), where C′ is the subset of C consisting only of indexes on columns indexable for Q.

Relevant atomic configurations
Decide which atomic configurations to evaluate (adaptive detection):
1. N = 2; A = { atomic configurations of size ≤ 2 }
2. Evaluate everything in A; set A′ = {}
3. For each C in A, test whether the evaluated cost of C differs significantly from the derived cost; if so, add all atomic configurations of size N+1 that are supersets of C to A′
4. If A′ = {}, exit; else A = A′, N = N + 1, go to step 2.
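The four-step adaptive-detection loop can be sketched as below. The helpers `evaluate` (an optimizer call) and `derive` (the atomic-configuration cost derivation) are hypothetical stand-ins, and `tol` is an assumed threshold for "differs significantly":

```python
from itertools import combinations

def adaptive_atomic_configs(indexes, evaluate, derive, tol=0.1):
    n = 2
    # Step 1: start with all atomic configurations of size <= 2.
    frontier = [frozenset(c) for k in (1, 2) for c in combinations(indexes, k)]
    evaluated = {}
    while frontier:
        # Step 2: evaluate everything in the current set.
        for config in frontier:
            evaluated[config] = evaluate(config)
        # Step 3: grow only where evaluated and derived costs disagree.
        next_frontier = []
        for config in frontier:
            if abs(evaluated[config] - derive(config, evaluated)) > tol * evaluated[config]:
                for idx in indexes:
                    superset = config | {idx}
                    if len(superset) == n + 1 and superset not in next_frontier:
                        next_frontier.append(superset)
        # Step 4: stop when no configuration needed refinement.
        frontier, n = next_frontier, n + 1
    return evaluated

# Toy costs with no index interaction, so derivation is always accurate
# and the loop stops after the first round (3 singletons + 3 pairs).
costs = {"a": 5.0, "b": 3.0, "c": 4.0}
result = adaptive_atomic_configs(
    list(costs),
    evaluate=lambda c: min(costs[i] for i in c),
    derive=lambda c, ev: min(ev[frozenset({i})] for i in c),
)
```

With interacting indexes, a large evaluated-versus-derived gap would push size-3 supersets into the next round, which is exactly how the tool spends optimizer calls only where derivation is unreliable.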

Candidate Index Selection
Goal: pick the candidate set of indexes from the admissible indexes.
Approach: determine the best configuration for each query; the candidate index set is the union of all such best configurations.
How is the best configuration for each query determined? By running Enumerate(I_i, W_i) on each query's admissible indexes.
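The union step can be sketched in a few lines. The names `candidate_indexes`, `admissible`, and `best_config` are hypothetical; `best_config` stands in for the per-query Enumerate(I_i, W_i) call:

```python
def candidate_indexes(workload, admissible, best_config):
    """Candidate set = union, over queries, of each query's best
    configuration. `admissible` maps a query to its admissible
    indexes; `best_config(q, indexes)` returns the best subset."""
    candidates = set()
    for q in workload:
        candidates |= set(best_config(q, admissible[q]))
    return candidates

# Toy stand-in: pretend the best configuration is the first index.
workload = ["Q1", "Q2"]
admissible = {"Q1": ["idx_a", "idx_b"], "Q2": ["idx_b", "idx_c"]}
chosen = candidate_indexes(workload, admissible, lambda q, idxs: idxs[:1])
```

Because each index only has to win for one query to survive, the candidate set stays much smaller than the full admissible set while keeping every per-query winner.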

Configuration Enumeration
The problem: given an index set I and a workload W, determine the best subset of I with k or fewer indexes: Enumerate(k, I, W), implemented via Greedy(m, k).
In this system, the value of k is constrained by the single-join atomic-configuration pruning.

Greedy(m, k)
let S = the best m-index configuration, found by naïve enumeration
if m = k, exit
while (|S| < k) {
    pick a new index I such that Cost(S ∪ {I}) ≤ Cost(S ∪ {I′}) for all I′ ≠ I;
    if (Cost(S ∪ {I}) ≥ Cost(S)) exit;
    else S = S ∪ {I};
}
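A runnable sketch of Greedy(m, k), assuming a caller-supplied `cost` function standing in for the optimizer-based workload cost (the `benefit` model in the demo is invented for illustration):

```python
from itertools import combinations

def greedy(m, k, indexes, cost):
    """Seed with the best m-index configuration via naive enumeration,
    then add one index at a time (up to k) while the cost keeps
    dropping."""
    seed = min(combinations(indexes, m), key=lambda c: cost(frozenset(c)))
    s = frozenset(seed)
    while len(s) < k:
        remaining = [i for i in indexes if i not in s]
        if not remaining:
            break
        best = min(remaining, key=lambda i: cost(s | {i}))
        if cost(s | {best}) >= cost(s):
            break  # no addition improves the configuration
        s |= {best}
    return s

# Toy cost model: each index has a benefit, plus a small per-index
# maintenance penalty, so low-benefit indexes are not worth adding.
benefit = {"a": 30, "b": 20, "c": 5, "d": 1}
def workload_cost(config):
    return 100 - sum(benefit[i] for i in config) + 2 * len(config)
```

Starting from the single best index ("a"), the greedy loop adds "b" and then "c", while "d" would raise the cost and is left out.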

Multi-Column Index Generation
Goal: choose a set of admissible multi-column (two-column) indexes M(a, b).
– MC-LEAD: the leading column a must come from the output of the configuration-enumeration step.
– MC-ALL: both columns (a, b) are important, so all pairs of indexable columns are considered.

Putting it all together
candidate index set = admissible index set;
repeat {
    index selection: Greedy(2, k), subject to the (2, 2) constraints;
    select multi-column indexes using MC-LEAD;
    candidate index set = union of the above two steps;
} until (no considerable improvement is made)
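The outer loop can be sketched as below. `enumerate_config` and `mc_lead` are hypothetical stand-ins for Greedy(2, k) under the (2, 2) constraints and for MC-LEAD multi-column generation; the convergence test (an unchanged chosen configuration) approximates "no considerable improvement":

```python
def index_selection_tool(admissible, workload, k, enumerate_config, mc_lead):
    """Alternate enumeration with multi-column generation until the
    chosen configuration stops changing."""
    candidates = set(admissible)
    previous = None
    while True:
        chosen = enumerate_config(candidates, workload, k)
        if chosen == previous:  # no considerable improvement: done
            return chosen
        # Widen the candidate set with MC-LEAD two-column indexes.
        candidates = set(chosen) | set(mc_lead(chosen, workload))
        previous = chosen

# Toy stand-ins: pick the first k candidates, generate no new indexes.
final = index_selection_tool(
    {"idx_a", "idx_b", "idx_c"},
    workload=["Q1"],
    k=2,
    enumerate_config=lambda cands, w, k: frozenset(sorted(cands)[:k]),
    mc_lead=lambda chosen, w: set(),
)
```

Each pass can only introduce multi-column indexes whose leading columns already proved useful, so the candidate set stays focused as it grows.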

Performance
It works: overall running time improves by a factor of 4 to 10 over baseline algorithms, while the drop in solution quality is small (below 10%).

Summary
Three novel techniques:
– remove spurious indexes from consideration by taking syntactic and cost information into account
– an iterative approach to handling multi-column indexes
– reduce the number of atomic configurations (and thus the number of optimizer calls) that must be evaluated for a workload.