Ranking Inexact Answers

Presentation transcript:

1 Ranking Inexact Answers

2 Ranking Issues
When inexact querying is allowed, there may be MANY answers
– different answers have different levels of incompleteness
Ranking the answers allows the user to quickly see the (hopefully) most relevant answers
Preference: Create answers in ranking order
– Why is this important?
We will consider several different approaches to this problem

3 Tree Pattern Relaxation
Amer-Yahia, Cho, Srivastava, EDBT 2002

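4 Tree Patterns
Queries are tree patterns, as considered in previous lessons
[Figure: tree pattern with root Book; Book has children Collection and Editor; Editor has child Name and descendant Address]
A double line indicates a descendant edge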

5 Relaxed Queries
Four types of “relaxations” are allowed on the trees
Node Generalization: Assume that we know a type/super-type relationship among labels. Allow a label to be changed to its super-type
[Figure: the pattern over Book, Collection, Editor, Name, Address, and the same pattern with Book changed to Document]

6 Relaxed Queries
Leaf Node Deletion: Delete a leaf node (and its incoming edge) from the tree
[Figure: the pattern over Book, Collection, Editor, Name, Address, and the same pattern with the leaf Collection deleted]

7 Relaxed Queries
Edge Generalization: Change a parent-child edge to an ancestor-descendant edge
[Figure: the pattern over Book, Collection, Editor, Name, Address, before and after one parent-child edge is generalized]

8 Relaxed Queries
Subtree Promotion: A query subtree can be promoted so that it is directly connected to its former grandparent by an ancestor-descendant edge
[Figure: the pattern over Book, Collection, Editor, Name, Address, before and after a subtree is promoted]

9 Composing Relaxations
Relaxations can be composed. Are the following relaxations of Q?
[Figure: Q, the pattern over Book, Collection, Editor, Name, Address, followed by three candidate patterns built from the labels Book, Collection, Address, Name and Document]

10 Approximate Answers and Ranking
An approximate answer to Q is an exact answer to a relaxed query derived from Q
In order to give different answers different rankings, tree patterns are weighted
Each node and edge has 2 weights
– value when exactly satisfied, value when satisfied by a relaxation
[Figure: the pattern over Book, Collection, Editor, Name, Address with a weight pair on each of its 5 nodes and 4 edges: (7, 1) (4, 3) (2, 1) (6, 0) (5, 0) (8, 5) (6, 0) (4, 0) (3, 0)]
A fragment of a document that exactly satisfies the query will have a score of 45, the sum of the nine exact weights
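The score of 45 can be checked with a few lines of code. This is a minimal sketch: the list holds the nine weight pairs in the order they appear on the slide, the assignment of pairs to particular nodes and edges is not recoverable from the transcript, and the function name and the boolean flags are assumptions made for the example.

```python
# (exact weight, relaxed weight) for the 5 nodes and 4 edges of the pattern,
# in the order listed on the slide; which pair belongs to which node or edge is not known here.
WEIGHTS = [(7, 1), (4, 3), (2, 1), (6, 0), (5, 0), (8, 5), (6, 0), (4, 0), (3, 0)]


def score(weights, exact_flags):
    """Sum the exact weight where the component matched exactly, else the relaxed weight."""
    return sum(
        exact if is_exact else relaxed
        for (exact, relaxed), is_exact in zip(weights, exact_flags)
    )


# Every node and edge satisfied exactly.
all_exact = score(WEIGHTS, [True] * len(WEIGHTS))

# Every node and edge satisfied only through a relaxation.
all_relaxed = score(WEIGHTS, [False] * len(WEIGHTS))
```

Here all_exact is 7+4+2+6+5+8+6+4+3 = 45, and all_relaxed is 1+3+1+0+0+5+0+0+0 = 10.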

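11 Example Ranking
[Figure: the weighted pattern over Book, Collection, Editor, Name, Address, with weights (7, 1) (4, 3) (2, 1) (6, 0) (5, 0) (8, 5) (6, 0) (4, 0) (3, 0)]
[Figure: document fragment with nodes Book, Person, Name, Address, Details and values Sam, NY]
How much would this answer score?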

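12 Example Ranking
[Figure: the weighted pattern over Book, Collection, Editor, Name, Address, with weights (7, 1) (4, 3) (2, 1) (6, 0) (5, 0) (8, 5) (6, 0) (4, 0) (3, 0)]
[Figure: document fragment with nodes Book, Person, Name, Address, Details and values Sam, NY]
How much would this answer score?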

13 Problem Definition
Given an XML document D, a weighted tree pattern Q and a threshold t, find all approximate answers of Q in D whose scores are ≥ t
Naive strategy to solve the problem:
– Find all relaxations of Q
– For each relaxation, compute all exact answers
– Remove answers with score below t
Is this a good strategy?

14 Problem Definition
Given an XML document D, a weighted tree pattern Q and a threshold t, find all approximate answers of Q in D whose scores are ≥ t
A better strategy to compute an answer to a relaxation of a query:
– Intuition: Compute the query as a series of joins
– Can use stack-merge algorithms (studied before) for computing joins
– Filter out intermediate results whose scores are too low

15 The Query Plan
We now show how to derive a plan for evaluating queries in this setting
First, we show how an exact plan is derived
Then, we consider how each individual relaxation can be added in
Finally, we show the complete relaxed plan

16 Query Plan: Exact Answers
[Figure: the weighted pattern over Book, Collection, Editor, Name, Address, with weights (7, 1) (4, 3) (2, 1) (5, 0) (8, 5) (6, 0) (4, 0) (3, 0) (6, 0)]
[Figure: join plan over the inputs Book, Collection, Editor, Address, Name with join predicates c(Book, Collection), c(Book, Editor), c(Editor, Name), d(Editor, Address)]
c(x,y) = y is child of x
d(x,y) = y is descendant of x

17 Query Plan: Exact Answers
[Figure: the weighted pattern over Book, Collection, Editor, Name, Address, with weights (7, 1) (4, 3) (2, 1) (5, 0) (8, 5) (6, 0) (4, 0) (3, 0) (6, 0)]
[Figure: join plan over the inputs Book, Collection, Editor, Address, Name with join predicates c(Book, Collection), c(Book, Editor), c(Editor, Name), d(Editor, Address)]
Remember, to compute a join, e.g., of Book and Collection, we actually find the list of Books and the list of Collections (from the index) and perform the stack-merge algorithm
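As a reminder of what such a join looks like, here is a minimal sketch of a stack-based structural join. It is not taken from the slides: it assumes each element is stored in the index as a (start, end, level) region, that both input lists are sorted by start position, and the names Elem and structural_join are made up for the example.

```python
from typing import List, NamedTuple, Tuple


class Elem(NamedTuple):
    start: int  # position of the opening tag (assumed region encoding)
    end: int    # position of the closing tag
    level: int  # depth of the element in the document tree


def structural_join(
    ancestors: List[Elem], descendants: List[Elem], child_only: bool = False
) -> List[Tuple[Elem, Elem]]:
    """Pair each descendant with the ancestors that contain it.

    With child_only=True this evaluates c(x, y); otherwise d(x, y).
    Both lists must be sorted by start position.
    """
    result = []
    stack: List[Elem] = []  # nested ancestor candidates, outermost at the bottom
    i = 0
    for d in descendants:
        # Push every ancestor candidate that opens before d.
        while i < len(ancestors) and ancestors[i].start < d.start:
            a = ancestors[i]
            while stack and stack[-1].end < a.start:
                stack.pop()  # closed before a opens, so it cannot contain a or d
            stack.append(a)
            i += 1
        # Drop candidates that closed before d opens.
        while stack and stack[-1].end < d.start:
            stack.pop()
        # Everything left on the stack contains d.
        for a in stack:
            if not child_only or a.level + 1 == d.level:
                result.append((a, d))
    return result


# Hypothetical regions for a small document: Book > Editor > Details > Address.
books = [Elem(1, 12, 0)]
editors = [Elem(4, 11, 1)]
addresses = [Elem(8, 9, 3)]
```

With these regions, structural_join(books, editors, child_only=True) pairs the Book with its Editor child, and structural_join(editors, addresses) pairs the Editor with its Address descendant, while the child-only variant of the second join returns nothing because Address is two levels below Editor.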

18 Adding Relaxations into Plan
Node generalization: Book relaxed to Document
[Figure: join plan over Collection, Editor, Address, Name in which the input Book is replaced by Document; the predicates shown are c(Book, Collection), c(Book, Editor), c(Document, Collection), c(Document, Editor), c(Editor, Name), d(Editor, Address)]

19 Adding Relaxations into Plan
Edge generalization: Relax Editor-Name Edge
[Figure: join plan over Book, Collection, Editor, Address, Name with predicates c(Book, Collection), c(Book, Editor), c(Editor, Name), d(Editor, Address)]
The predicate c(Editor, Name) becomes:
c(Editor, Name) or (Not exists c(Editor, Name) and d(Editor, Name))
Written in short as: c(Editor, Name) or d(Editor, Name)
We only allow relaxations when a direct child does not exist

20 Adding Relaxations into Plan
Subtree Promotion: Promote tree rooted at Name
[Figure: join plan over Book, Collection, Editor, Address, Name with predicates c(Book, Collection), c(Book, Editor), c(Editor, Name), d(Editor, Address)]
The predicate c(Editor, Name) becomes:
c(Editor, Name) or (Not exists c(Editor, Name) and d(Book, Name))
Written in short as: c(Editor, Name) or d(Book, Name)

21 Adding Relaxations into Plan
Leaf Node Deletion: Make Address Optional
[Figure: join plan over Book, Collection, Editor, Address, Name with predicates c(Book, Collection), c(Book, Editor), c(Editor, Name), d(Editor, Address); the join with Address is an outer join]
Outer Join Operator: Means that we should join if possible, but not delete values that cannot join
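The effect of the outer join can be illustrated with a short sketch. The function name, the tuple layout and the sample data are assumptions made for the example; a real plan would use the stack-based join rather than this nested loop.

```python
def left_outer_join(left, right, matches):
    """Join each left item with its matching right items; keep unmatched left items with None."""
    out = []
    for l in left:
        partners = [r for r in right if matches(l, r)]
        if partners:
            out.extend((l, r) for r in partners)
        else:
            out.append((l, None))  # Address is optional: keep the Editor anyway
    return out


# Hypothetical data: (editor id) and (address, id of the editor it lies under).
editors = ["e1", "e2"]
addresses = [("NY", "e1")]

joined = left_outer_join(editors, addresses, lambda e, a: a[1] == e)
```

Here joined keeps both editors: e1 is paired with its address, and e2 survives with None in place of the missing Address, which is what makes leaf node deletion possible inside a single plan.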

22 Combining All Possible Relaxations
All approximate answers can be derived from the following query plan
[Figure: relaxed join plan over the inputs Document, Collection, Editor, Address, Name with join predicates:]
– c(Book, Collection) OR d(Document, Collection)
– c(Document, Editor) OR d(Document, Editor)
– c(Editor, Name) OR d(Editor, Name) OR d(Document, Name)
– d(Editor, Address) OR d(Document, Address)
[Figure: the weighted pattern over Book, Collection, Editor, Name, Address, with weights (7, 1) (4, 3) (2, 1) (5, 0) (8, 5) (6, 0) (4, 0) (3, 0) (6, 0)]

23 Creating “Best Answers”
Want to find answers whose ranking is over the threshold t
Naive solution: Create all answers. Delete answers with low ranking
Algorithm Thres: The goal of the algorithm is to prune intermediate answers that cannot possibly meet the specified threshold

24 Associating Nodes with Maximal Weight
The maximal weight of a node in the evaluation plan is the largest value by which the score of an intermediate answer computed for that node can grow
[Figure: relaxed join plan over the inputs Document, Collection, Editor, Address, Name with join predicates:]
– c(Book, Collection) OR d(Document, Collection)
– c(Document, Editor) OR d(Document, Editor)
– c(Editor, Name) OR d(Editor, Name) OR d(Document, Name)
– d(Editor, Address) OR d(Document, Address)

25
[Figure: the weighted pattern over Book, Collection, Editor, Name, Address, with weights (7, 1) (4, 3) (2, 1) (5, 0) (8, 5) (6, 0) (4, 0) (3, 0) (6, 0)]
[Figure: relaxed join plan over the inputs Document, Collection, Editor, Address, Name with join predicates:]
– c(Book, Collection) OR d(Document, Collection)
– c(Document, Editor) OR d(Document, Editor)
– c(Editor, Name) OR d(Editor, Name) OR d(Document, Name)
– d(Editor, Address) OR d(Document, Address)
Maximal weights annotated on the plan nodes: (38) (39) (30) (40) (39) (41) (21) (7) (0)

26 Algorithm Thres
The relaxed query evaluation plan is computed bottom-up
– Note that the joins are computed for all matching intermediate results at the same time
At each step, intermediate results are computed, along with their scores
If the sum of an intermediate result's score and the maximal weight of the current node is less than the threshold, prune the intermediate result
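The pruning rule of Thres can be sketched as a filter applied at each plan node. The function name, the pair layout and the sample scores are assumptions made for the example; only the rule itself (prune when score plus maximal weight is below the threshold) comes from the slide.

```python
def prune(intermediate, max_weight, threshold):
    """Keep only intermediate results that can still reach the threshold.

    intermediate: list of (result, score) pairs computed at the current plan node
    max_weight:   largest value by which a score at this node can still grow
    """
    return [
        (result, score)
        for result, score in intermediate
        if score + max_weight >= threshold
    ]


# Hypothetical intermediate results at a node whose maximal weight is 30.
candidates = [("r1", 10), ("r2", 3), ("r3", 5)]
survivors = prune(candidates, max_weight=30, threshold=35)
```

With a threshold of 35, r1 (10 + 30 = 40) and r3 (5 + 30 = 35) survive, while r2 (3 + 30 = 33) is pruned because no later join can lift it to the threshold.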

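27 Example: Threshold = 35
[Figure: document fragment with nodes Book, Editor, Name, Address, Details and values Sam, NY]
[Figure: relaxed join plan over the inputs Document, Collection, Editor, Name, Address with join predicates:]
– c(Book, Collection) OR d(Document, Collection)
– c(Document, Editor) OR d(Document, Editor)
– c(Editor, Name) OR d(Editor, Name) OR d(Document, Name)
– d(Editor, Address) OR d(Document, Address)
Maximal weights annotated on the plan nodes: (38) (39) (30) (40) (39) (41) (21) (7) (0)
[Figure: the weighted pattern over Book, Collection, Editor, Name, Address, with weights (7, 1) (4, 3) (2, 1) (5, 0) (8, 5) (6, 0) (4, 0) (3, 0) (6, 0)]
When will the answer be pruned?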

28 Test Yourself

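29 Example Ranking
[Figure: the weighted pattern over Book, Collection, Editor, Name, Address, with weights (7, 1) (4, 3) (2, 1) (6, 0) (5, 0) (8, 5) (6, 0) (4, 0) (3, 0)]
[Figure: document fragment with nodes Document, Collection, Name, Address and values Sam, NY]
How much would this answer score?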

30 Query Plan
[Figure: weighted pattern over Book, Collection, Editor, Name, FName, LName, with weights (8, 5) (7, 1) (4, 3) (2, 1) (5, 0) (6, 0) (2, 1) (2, 0) (1, 0)]
1. What will the exact plan look like?
2. What will the plan look like if all possible relaxations are added?
3. What is the maximal weight by which the score of an intermediate answer can grow, for each node?

31 Algorithm OptiThres
Predict during evaluation whether a subsequent relaxation produces additional plans that will not meet the threshold
In this case, undo this relaxation from the plan, e.g.,
– Convert outer join to join
– Revert to original node type
– Etc.
Improves efficiency since fewer intermediate results are created and fewer conditions are tested

32 Three Weights for OptiThres
relaxNode: the largest value by which the score of an intermediate result for the left child of a join can grow if it joins with a relaxed match to the right node
– used to decide whether to unrelax the right node
relaxJoin: the largest value by which the score of an intermediate result for the left child of a join can grow if it cannot join with any right child
– used to decide whether an outer join can be reverted to a join

33 Three Weights for OptiThres (cont)
relaxPred: the largest value by which the sum of scores of a pair of intermediate results for both children can grow if they are joined using a relaxed structural predicate
– used to decide whether to unrelax edge generalization and subtree promotion

34 The Algorithm
Before computing each join node, check if:
– maximal left intermediate result + relaxNode < threshold → unrelax right node
– maximal left intermediate result + relaxJoin < threshold → unrelax outer-join
– maximal left intermediate result + maximal right intermediate result + relaxPred < threshold → unrelax join predicate
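The three checks can be written as a small function. This is a sketch under assumptions: the function and key names are made up, and the value used below for the maximal right intermediate result (6) is hypothetical, since the transcript does not give it. The other numbers (threshold 14, maximal left score 2, and the weights 11, 8, 9) are the ones used in the example that follows.

```python
def optithres_checks(max_left, max_right, relax_node, relax_join, relax_pred, threshold):
    """Decide, before computing a join node, which relaxations can be undone."""
    return {
        # Even a relaxed match on the right cannot lift a left result to the threshold.
        "unrelax_right_node": max_left + relax_node < threshold,
        # A left result that finds no partner cannot reach the threshold.
        "unrelax_outer_join": max_left + relax_join < threshold,
        # A pair joined through a relaxed predicate cannot reach the threshold.
        "unrelax_join_predicate": max_left + max_right + relax_pred < threshold,
    }


decisions = optithres_checks(
    max_left=2,
    max_right=6,  # hypothetical value, not given in the transcript
    relax_node=11,
    relax_join=8,
    relax_pred=9,
    threshold=14,
)
```

With these inputs the first two checks succeed (2 + 11 = 13 and 2 + 8 = 10 are both below 14), so the right node and the outer join are unrelaxed, while the third check fails under the assumed right-hand score (2 + 6 + 9 = 17), so the join predicate stays relaxed.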

35
[Figure: query pattern over Proceeding, Publisher, Month, with weights (7, 2) (6, 2) (2, 1) (2, 0) (10, 1)]
[Figure: relaxed plan over Document, Person, Month with join predicates c(Document, Month) OR d(Document, Month) and c(Document, Person) OR d(Document, Person); the joins are annotated with (relaxNode, relaxJoin, relaxPred) = (11, 8, 9) and (-, 0, 2)]
Suppose threshold = 14
Suppose that the maximal score for Document nodes is 2, i.e., there are no Proceeding nodes in the database
– 2+11<14: Unrelax Person

36
[Figure: query pattern over Proceeding, Publisher, Month, with weights (7, 2) (6, 2) (2, 1) (2, 0) (10, 1)]
[Figure: relaxed plan over Document, Publisher, Month with join predicates c(Document, Month) OR d(Document, Month) and c(Document, Publisher) OR d(Document, Publisher); the joins are annotated with (relaxNode, relaxJoin, relaxPred) = (11, 8, 9) and (-, 0, 2)]
Suppose threshold = 14
Suppose that the maximal score for Document nodes is 2, i.e., there are no Proceeding nodes in the database
– 2+11<14
– 2+8<14: Unrelax Join

37
[Figure: query pattern over Proceeding, Publisher, Month, with weights (7, 2) (6, 2) (2, 1) (2, 0) (10, 1)]
[Figure: relaxed plan over Document, Publisher, Month with join predicates c(Document, Month) OR d(Document, Month) and c(Document, Publisher) OR d(Document, Publisher); the joins are annotated with (relaxNode, relaxJoin, relaxPred) = (11, 8, 9) and (-, 0, 2)]
Suppose threshold = 14
Suppose that the maximal score for Document nodes is 2, i.e., there are no Proceeding nodes in the database
– 2+11<14
– 2+8<14
– >14: Do not unrelax!

38 Query Plan (cont)
[Figure: weighted pattern over Book, Collection, Editor, Name, FName, LName, with weights (8, 5) (7, 1) (4, 3) (2, 1) (5, 0) (6, 0) (2, 1) (2, 0) (1, 0)]
4. Fill in the three values: relaxNode, relaxJoin, relaxPred for each node