Index Interactions in Physical Design Tuning Modeling, Analysis, and Applications Karl Schnaitter, UC Santa Cruz Neoklis Polyzotis, UC Santa Cruz Lise.

Presentation transcript:

Index Interactions in Physical Design Tuning: Modeling, Analysis, and Applications
Karl Schnaitter, UC Santa Cruz
Neoklis Polyzotis, UC Santa Cruz
Lise Getoor, Univ. of Maryland
VLDB 2009, Lyon, France

Slide 2: Index Selection (University of California, Santa Cruz)
Index selection problem:
  – Given a query workload
  – Choose indices that improve workload performance
Does index benefit depend on other indices?
  – If so, this is called index interaction
Index “benefit” is a key concept
  – Informally, for an index i, [benefit of i] = [exec cost without i] – [exec cost with i]
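The informal benefit definition above can be sketched in a few lines. This is a minimal illustration, not the authors' code: the cost table `TOY_COST`, the helper `toy_cost`, and the numbers in it are hypothetical stand-ins for optimizer cost estimates.

```python
from typing import Callable, FrozenSet, Iterable

CostFn = Callable[[FrozenSet[str]], float]

# Hypothetical optimizer cost estimates for one query under each index set.
TOY_COST = {
    frozenset(): 100.0,
    frozenset("a"): 60.0,
    frozenset("b"): 70.0,
    frozenset("ab"): 50.0,
}

def toy_cost(indices: Iterable[str]) -> float:
    """Look up the (made-up) execution cost for a set of available indices."""
    return TOY_COST[frozenset(indices)]

def benefit(cost: CostFn, new: Iterable[str], existing: Iterable[str]) -> float:
    """[exec cost without the new indices] minus [exec cost with them]."""
    existing = frozenset(existing)
    return cost(existing) - cost(existing | frozenset(new))

print(benefit(toy_cost, {"a"}, set()))   # benefit of a on its own
print(benefit(toy_cost, {"a"}, {"b"}))   # benefit of a when b already exists
```

In this toy table the benefit of `a` drops once `b` is present, which is exactly the dependence the slide calls index interaction.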

Slide 3: Related Work
Interactions are a key concern in physical tuning
  – [Whang et al. 1981] make assumptions implying that indices on different tables do not interact
  – [Finkelstein et al. 1988] assume that indices do not interact if they are relevant to separate queries
  – [Bruno and Chaudhuri 2007] explicitly account for some interactions in on-line index selection
  – Many more…
These studies treat interactions as a secondary issue, and often rely on ad hoc assumptions

Slide 4: Index Interactions
Let S be a set of indices relevant to a query Q
[Figure: cost(X), cost(X ∪ {a}), cost(X ∪ {b}), and cost(X ∪ {a,b}), with benefit({a}, X) and benefit({a}, X ∪ {b}) marked]
Indices a,b are independent with respect to X

Slide 5: Index Interactions
Let S be a set of indices relevant to a query Q
[Figure: cost(X), cost(X ∪ {a}), cost(X ∪ {b}), and cost(X ∪ {a,b}), with benefit({a}, X) and benefit({a}, X ∪ {b}) marked]
Indices a,b positively interact with respect to X

Slide 6: Index Interactions
Let S be a set of indices relevant to a query Q
[Figure: cost(X), cost(X ∪ {a}), cost(X ∪ {b}), and cost(X ∪ {a,b}), with benefit({a}, X) and benefit({a}, X ∪ {b}) marked]
Indices a,b negatively interact with respect to X
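The three cases on the preceding slides compare benefit({a}, X) with benefit({a}, X ∪ {b}). A small sketch of that comparison follows. The figures did not survive in the transcript, so the direction is an assumption: "positive" is taken to mean that the benefit of a grows when b is present, and "negative" that it shrinks. The three cost tables are hypothetical.

```python
def benefit(cost, a, X):
    """benefit({a}, X) = cost(X) - cost(X ∪ {a}), with cost given as a dict."""
    X = frozenset(X)
    return cost[X] - cost[X | {a}]

def classify(cost, a, b, X=frozenset()):
    """Compare the benefit of a with and without b (assumed sign convention)."""
    alone = benefit(cost, a, X)
    with_b = benefit(cost, a, frozenset(X) | {b})
    if with_b > alone:
        return "positive"
    if with_b < alone:
        return "negative"
    return "independent"

def table(none, a, b, ab):
    """Build a hypothetical cost table over the index sets {}, {a}, {b}, {a,b}."""
    return {frozenset(): none, frozenset("a"): a,
            frozenset("b"): b, frozenset("ab"): ab}

INDEPENDENT = table(100.0, 70.0, 80.0, 50.0)  # a saves 30 either way
POSITIVE = table(100.0, 95.0, 95.0, 40.0)     # a saves 5 alone, 55 with b
NEGATIVE = table(100.0, 60.0, 60.0, 55.0)     # a saves 40 alone, 5 with b

for name, costs in [("independent", INDEPENDENT),
                    ("positive", POSITIVE),
                    ("negative", NEGATIVE)]:
    print(name, "->", classify(costs, "a", "b"))
```

Exact float comparison is fine for a toy table; with real optimizer estimates a tolerance would be more appropriate.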

Slide 7: Degree of Interaction
doi(a,b,X) = degree of interaction between a,b with respect to X [formula not captured]
doi(a,b) = [formula not captured]
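The formulas on this slide are missing from the transcript. A plausible reconstruction is given here, hedged accordingly: the numerator follows directly from the two benefit terms shown on the preceding slides, while the normalizing denominator cost(X ∪ {a,b}) and the maximization over X are assumptions based on the form used in the accompanying VLDB 2009 paper.

```latex
\[
\mathrm{doi}(a,b,X) \;=\;
\frac{\bigl|\,\mathrm{benefit}(\{a\},X) - \mathrm{benefit}(\{a\},X\cup\{b\})\,\bigr|}
     {\mathrm{cost}(X\cup\{a,b\})},
\qquad
\mathrm{doi}(a,b) \;=\; \max_{X \subseteq S\setminus\{a,b\}} \mathrm{doi}(a,b,X)
\]
% Expanding benefit({a}, Y) = cost(Y) - cost(Y ∪ {a}) shows the numerator is symmetric in a and b:
\[
\mathrm{benefit}(\{a\},X) - \mathrm{benefit}(\{a\},X\cup\{b\})
= \mathrm{cost}(X) - \mathrm{cost}(X\cup\{a\}) - \mathrm{cost}(X\cup\{b\}) + \mathrm{cost}(X\cup\{a,b\})
\]
```

Under this form, doi(a,b) = doi(b,a), and a value of 0 means a and b are independent with respect to every X.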

Slide 8: Problem Statement
The Degree of Interaction Problem:
  – Which indices in S interact?
  – How strong are the interactions?

Slide 9: Outline
Properties of Query Optimization
Degree of Interaction Algorithm
Applying Interaction Information

Slide 11: Query Optimization
Computing doi(a,b) is not practical if the optimizer is totally arbitrary
  – Need to compute [formula not captured]
In practice, query optimization is not arbitrary
  – E.g., we expect [formula not captured]
We put mild assumptions on query optimization:
  – Plans are selected from some fixed space P
  – Optimizer chooses the cheapest feasible plan from P
  – Ties are broken consistently
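The three assumptions can be modeled with a toy optimizer. This is a sketch under stated assumptions, not a description of any real optimizer: the plan space `PLANS`, the `Plan` type, and the tie-breaking rule (cost first, then plan name) are all hypothetical choices. A plan is treated as feasible when every index it uses is available.

```python
from typing import FrozenSet, Iterable, List, NamedTuple

class Plan(NamedTuple):
    name: str
    used: FrozenSet[str]   # indices the plan needs
    cost: float

# Hypothetical fixed plan space P for one query.
PLANS: List[Plan] = [
    Plan("full_scan", frozenset(), 80.0),
    Plan("probe_a", frozenset("a"), 50.0),
    Plan("probe_b", frozenset("b"), 50.0),
    Plan("intersect_ab", frozenset("ab"), 20.0),
]

def optimize(plans: List[Plan], available: Iterable[str]) -> Plan:
    """Pick the cheapest plan whose indices are all available.

    Sorting on (cost, name) breaks ties the same way on every call.
    """
    available = frozenset(available)
    feasible = [p for p in plans if p.used <= available]
    return min(feasible, key=lambda p: (p.cost, p.name))

print(optimize(PLANS, set()).name)        # only the scan is feasible
print(optimize(PLANS, {"a", "b"}).name)   # cheapest plan uses both indices
```

One consequence of this model is that adding indices never raises the cost, because every plan feasible for X stays feasible for any superset of X.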

Slide 12: Index Benefit Graph
An Index Benefit Graph (IBG) encodes the selection of optimal plans for a query
  – Introduced by [Frank, Omiecinski, and Navathe 1992]
Example IBG when S = {a,b,c,d}
[Figure: example IBG; each node is labeled with the indices used in its optimal plan and the cost of that plan, with costs ranging from 20 to 80]
  – There are 16 subsets of S
  – IBG has 8 nodes
  – But IBG can compute [formula not captured]
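A compact sketch of how an IBG can be built and then queried is shown here. It assumes the usual construction: the root is S, and a node Y gets one child Y − {a} for each index a used by the optimal plan at Y. The plan space `PLANS` and the helper names are hypothetical, and the toy optimizer follows the cheapest-feasible-plan assumption from the previous slide. The numbers do not reproduce the slide's figure.

```python
from itertools import combinations

# Hypothetical plan space: (indices the plan uses, plan cost).
PLANS = [
    (frozenset("ab"), 20.0),
    (frozenset("bc"), 45.0),
    (frozenset("a"), 50.0),
    (frozenset("c"), 65.0),
    (frozenset(), 80.0),
]

def toy_optimize(available):
    """Return (used, cost) for the cheapest plan feasible under `available`."""
    feasible = [p for p in PLANS if p[0] <= available]
    return min(feasible, key=lambda p: p[1])

def build_ibg(S, optimize):
    """Map each reachable index set Y to (used(Y), cost(Y))."""
    root = frozenset(S)
    nodes = {}
    stack = [root]
    while stack:
        Y = stack.pop()
        if Y in nodes:
            continue
        used, cost = optimize(Y)
        nodes[Y] = (used, cost)
        # One child per used index: what happens if that index is taken away?
        stack.extend(Y - {a} for a in used)
    return root, nodes

def ibg_cost(root, nodes, X):
    """Recover cost(X) for any X ⊆ S by walking down from the root."""
    X = frozenset(X)
    Y = root
    while True:
        used, cost = nodes[Y]
        missing = used - X
        if not missing:          # the plan at Y only needs indices in X
            return cost
        Y = Y - {min(missing)}   # drop an index the plan needs but X lacks

def all_subsets(S):
    items = sorted(S)
    return [frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]

root, nodes = build_ibg("abcd", toy_optimize)
print(len(nodes), "IBG nodes for", len(all_subsets("abcd")), "subsets")
```

The walk in `ibg_cost` keeps X ⊆ Y at every step, so when it stops the plan stored at Y is feasible for X and no cheaper plan can exist for X. That is why a graph with fewer nodes than subsets can still answer cost queries for every subset.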

Slide 13: Outline
Properties of Query Optimization
Degree of Interaction Algorithm
Applying Interaction Information

Slide 14: Naive Algorithm
Recall that we want the degree of interaction between all pairs of indices in S
Each doi(a,b) may be computed directly
Upon termination, T[a,b] = doi(a,b) for all a,b
Can save time using an IBG as a cache of the cost function
Downside: iteration over all subsets of S
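The naive approach can be sketched as a direct loop over pairs and subsets. Two things here are assumptions rather than content from the slide: the doi formula (absolute change in benefit, normalized by cost(X ∪ {a,b}), maximized over X), and the toy plan table `PLANS`. The pseudocode shown on the original slide was not captured, so this is an illustration of the idea only.

```python
from itertools import combinations

# Hypothetical plan space: indices used -> plan cost.
PLANS = {
    frozenset(): 100.0,
    frozenset("a"): 60.0,
    frozenset("b"): 70.0,
    frozenset("c"): 90.0,
}

def toy_cost(X):
    """Cheapest plan whose indices are all contained in X."""
    X = frozenset(X)
    return min(c for used, c in PLANS.items() if used <= X)

def subsets(items):
    items = list(items)
    for r in range(len(items) + 1):
        for combo in combinations(items, r):
            yield frozenset(combo)

def naive_doi(S, cost):
    """Fill T[a, b] with the assumed doi(a, b) by visiting every subset X."""
    S = sorted(S)
    T = {}
    for a, b in combinations(S, 2):
        rest = [i for i in S if i not in (a, b)]
        best = 0.0
        for X in subsets(rest):
            # benefit({a}, X) - benefit({a}, X ∪ {b}), expanded into costs
            change = cost(X) - cost(X | {a}) - cost(X | {b}) + cost(X | {a, b})
            best = max(best, abs(change) / cost(X | {a, b}))
        T[a, b] = T[b, a] = best
    return T

T = naive_doi("abc", toy_cost)
for pair in [("a", "b"), ("a", "c"), ("b", "c")]:
    print(pair, round(T[pair], 3))
```

The inner loop visits 2^(|S|−2) subsets per pair, which is the downside the slide points out.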

Slide 15: The QInteract Algorithm
Naive Algorithm (condensed)
[Pseudocode not captured]
We should avoid evaluating doi(a,b,X) for all X ⊆ S
The QInteract algorithm processes two index sets per IBG node
QInteract Algorithm
[Pseudocode not captured]

Slide 16: QInteract Example
Let’s calculate doi(a,b) on the graph below
What happens on iteration Y = {u}?
[Figure: IBG over the indices a, b, u, v with plan costs from 20 to 50; the node Y is highlighted]

Slide 17: Interleaved IBG Processing
In QInteract, the IBG is built, then analyzed
  – I.e., IBG construction and analysis are serial
We can discover interactions in a partial IBG
IBG construction and analysis may be interleaved
  – Improves accuracy of doi over time
[Figure: partially constructed IBG for S = {a,b,c,d}]

Slide 18: Outline
Properties of Query Optimization
Degree of Interaction Algorithm
Applying Interaction Information
  – Visualizing Index Interactions
  – Scheduling Index Creation

Slide 20: Visualizing Index Interactions
We can visualize the doi function as a graph
  – Nodes correspond to indices
  – Edge between a and b has weight doi(a,b)
[Figure: interaction graph for TPC-H Query 7 with index nodes O(CK,OK), C(CK,NK), C(NK,CK), LI(SK,SD,D,EP,OK), LI(SD,D), LI(SD,Q), S(NK,N,SK), S(NK,SK), S(SK,NK)]

Slide 21: Interaction Graph
The connected components have special meaning
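Extracting the connected components of an interaction graph is a standard graph traversal. The sketch below assumes the doi values are already available as a dictionary keyed by index pairs, and it uses a threshold to drop weak edges, in the spirit of the 0.1 threshold mentioned in the experimental slides. The doi values are made up.

```python
def interaction_components(indices, doi, threshold=0.0):
    """Group indices into connected components of the thresholded doi graph."""
    adj = {i: set() for i in indices}
    for (a, b), weight in doi.items():
        if a != b and weight > threshold:
            adj[a].add(b)
            adj[b].add(a)

    seen = set()
    components = []
    for start in sorted(adj):
        if start in seen:
            continue
        comp = set()
        stack = [start]
        while stack:                      # iterative depth-first search
            node = stack.pop()
            if node in comp:
                continue
            comp.add(node)
            stack.extend(adj[node] - comp)
        seen |= comp
        components.append(frozenset(comp))
    return components

# Hypothetical doi values for four indices.
TOY_DOI = {("a", "b"): 0.5, ("b", "c"): 0.05, ("c", "d"): 0.3}

print(interaction_components("abcd", TOY_DOI, threshold=0.1))
print(interaction_components("abcd", TOY_DOI, threshold=0.0))
```

Raising the threshold splits the single component into two, because the weak b–c edge is the only link between them.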

Slide 22: Outline
Properties of Query Optimization
Degree of Interaction Algorithm
Applying Interaction Information
  – Visualizing Index Interactions
  – Scheduling Index Creation

Slide 23: Scheduling Index Creation
Suppose we want to materialize new indices
In what order should they be created?
[Figure: three panels, one each for Schedule = a,b,c, Schedule = b,a,c, and Schedule = c,a,b; x-axis: materialized indices, y-axis: benefit]
Choose first schedule to maximize benefit over time (shaded area)

Slide 24: Scheduling Index Creation
We define an optimization problem
  – M = preexisting indices
  – {a_1, …, a_n} = new indices to create
  – Permute new indices as t_1, …, t_n to maximize [formula not captured]
This problem is computationally hard
  – There is a connection to the Set Cover problem, since each new index “covers” more benefit
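The objective on this slide did not survive in the transcript. One reading consistent with "benefit over time (shaded area)" is the sum, over each step k, of the benefit of the first k indices in the schedule. The sketch below uses that reading as an assumption, further assumes every index takes one time unit to build, and finds the best schedule by brute force. The plan table `PLANS` is hypothetical.

```python
from itertools import permutations

# Hypothetical plan space: indices used -> plan cost.
PLANS = {
    frozenset(): 100.0,
    frozenset("a"): 95.0,
    frozenset("c"): 80.0,
    frozenset("ac"): 75.0,
    frozenset("ab"): 40.0,
    frozenset("abc"): 20.0,
}

def toy_cost(X):
    """Cheapest plan whose indices are all contained in X."""
    X = frozenset(X)
    return min(c for used, c in PLANS.items() if used <= X)

def schedule_value(order, cost, M=frozenset()):
    """Assumed objective: total benefit accumulated after each build step."""
    built = set(M)
    base = cost(frozenset(built))
    total = 0.0
    for index in order:
        built.add(index)
        total += base - cost(frozenset(built))
    return total

def best_schedule(new_indices, cost, M=frozenset()):
    """Exhaustive search over all n! orders; only viable for tiny n."""
    return max(permutations(sorted(new_indices)),
               key=lambda order: schedule_value(order, cost, M))

for order in permutations("abc"):
    print(order, schedule_value(order, toy_cost))
print("best:", best_schedule("abc", toy_cost))
```

Exhaustive search is only a reference point here; the slide notes the problem is computationally hard, so it does not scale beyond a handful of indices.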

Slide 25: Greedy Scheduling
We are tempted to use a greedy heuristic
This results in the third schedule
Greedy schedule can be suboptimal by a factor of about (n – 1)
[Figure: three panels, one each for Schedule = a,b,c, Schedule = b,a,c, and Schedule = c,a,b; x-axis: materialized indices, y-axis: benefit]
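One natural reading of the greedy heuristic is: at each step, build the index that yields the largest immediate benefit. The slide does not spell out the rule, so that reading is an assumption, and the plan table `PLANS` below is hypothetical. It is chosen so that greedy picks c first, mirroring the slide's remark that greedy produces the third schedule (c,a,b).

```python
# Hypothetical plan space: indices used -> plan cost.
PLANS = {
    frozenset(): 100.0,
    frozenset("a"): 95.0,
    frozenset("c"): 80.0,
    frozenset("ac"): 75.0,
    frozenset("ab"): 40.0,
    frozenset("abc"): 20.0,
}

def toy_cost(X):
    """Cheapest plan whose indices are all contained in X."""
    X = frozenset(X)
    return min(c for used, c in PLANS.items() if used <= X)

def greedy_schedule(new_indices, cost, M=frozenset()):
    """Repeatedly build the index that lowers the cost the most right now."""
    built = set(M)
    remaining = sorted(new_indices)
    order = []
    while remaining:
        # Largest immediate benefit == lowest cost after adding the index.
        nxt = min(remaining, key=lambda i: cost(frozenset(built | {i})))
        order.append(nxt)
        remaining.remove(nxt)
        built.add(nxt)
    return order

print(greedy_schedule("abc", toy_cost))
```

Greedy is drawn to c because a and b are nearly useless on their own, even though building a and then b unlocks the large joint benefit sooner. Interactions between a and b are what the myopic rule misses.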

Slide 26: Interaction-Aware Scheduling
Scheduling can use the interaction graph
Idea:
  – First find optimal sub-schedules for each C_i
  – Then choose the best interleaving of sub-schedules
This heuristic avoids the pitfalls of greedy scheduling
We can also show stronger performance guarantees
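The two-step idea can be sketched with brute force at both steps. Several assumptions are made here: the C_i are taken to be the connected components of the interaction graph, the schedule objective is the accumulated benefit after each unit-time build step, and the plan table `PLANS` is hypothetical (in it, c is independent of a and b). The authors' actual procedure for each step is not given on the slide.

```python
from itertools import permutations

# Hypothetical plan space: indices used -> plan cost.
PLANS = {
    frozenset(): 100.0,
    frozenset("a"): 95.0,
    frozenset("c"): 80.0,
    frozenset("ac"): 75.0,
    frozenset("ab"): 40.0,
    frozenset("abc"): 20.0,
}

def toy_cost(X):
    X = frozenset(X)
    return min(c for used, c in PLANS.items() if used <= X)

def schedule_value(order, cost, M=frozenset()):
    """Assumed objective: total benefit accumulated after each build step."""
    built = set(M)
    base = cost(frozenset(built))
    total = 0.0
    for index in order:
        built.add(index)
        total += base - cost(frozenset(built))
    return total

def interleavings(seqs):
    """Yield every merge of the sequences that keeps each one's internal order."""
    seqs = [s for s in seqs if s]
    if not seqs:
        yield ()
        return
    for i, s in enumerate(seqs):
        rest = seqs[:i] + [s[1:]] + seqs[i + 1:]
        for tail in interleavings(rest):
            yield (s[0],) + tail

def interaction_aware_schedule(components, cost, M=frozenset()):
    # Step 1: best order inside each component, found by exhaustive search.
    subs = [max(permutations(sorted(c)),
                key=lambda o: schedule_value(o, cost, M))
            for c in components]
    # Step 2: best interleaving of the fixed sub-schedules.
    return max(interleavings(subs), key=lambda o: schedule_value(o, cost, M))

print(interaction_aware_schedule([{"a", "b"}, {"c"}], toy_cost))
```

On this toy input the sketch returns a,b,c, whereas a benefit-greedy rule would start with c. Splitting by component shrinks the search from n! orders to a product of per-component factorials plus the interleaving step.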

Slide 27: Conclusions
Index interactions provide useful insights for physical design tuning
The doi metric is an effective characterization of interaction relationships
We can analyze interactions efficiently when the Index Benefit Graph has limited size
Future work?

Slide 28: Thank You

Slide 29: Performance Evaluation
QInteract implementation in Java
  – Uses JDBC to connect to IBM DB2 database
Experiments use 22 TPC-H benchmark queries
We generate indices based on the DB2 advisor
  – S_ALL = all indices recommended by DB2
  – S_1C = indices in S_ALL with first column only
We monitor the progress of the “serial” and “interleaved” approaches over time

Slide 30: Experimental Results
[Figure: two panels, "S_ALL index set, 0.1 threshold" and "S_1C index set, 0.1 threshold"]

Slide 31: Applications
QInteract returns doi(a,b) for all a,b
We propose two applications of this information
  – Visualizing index interactions
      Illustrates the global interactions as a graph
      Useful when manually tuning the index set
  – Scheduling index construction
      Want to choose when new indices will be created
      Goal is to increase performance as quickly as possible
      Knowledge of index interactions can help

Slide 32: Problem Statement
The Degree of Interaction Problem:
  – Which indices in S interact?
  – How strong are the interactions?
It may be useful to ignore “minor” interactions
A threshold-based variant: [statement not captured]

Slide 33: Index Selection
Index selection problem: [statement not captured]
We can quantify the benefit of an index: benefit(a, X) = cost(X) – cost(X ∪ {a})
Does benefit(a, X) depend on X?
  – If so, this is called index interaction

Slide 34: Future Work
Expand our support for updates
Implementation of visualization tool
Experiments with materialization scheduling
Incremental updates to doi function
Exploring stronger assumptions on query optimization
  – Efficient upper bounds on doi function?