Published by Jasper Chase. Modified over 8 years ago.
1 Ranking Inexact Answers
2 Ranking Issues
When inexact querying is allowed, there may be many answers.
– Different answers have different levels of incompleteness.
Ranking the answers lets the user quickly see the (hopefully) most relevant answers.
Preference: create answers in ranking order.
– Why is this important?
We will consider several different approaches to this problem.
3 Tree Pattern Relaxation (Amer-Yahia, Cho, Srivastava, EDBT 2002)
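4 Tree Patterns
Queries are tree patterns, as considered in previous lessons.
[Figure: tree pattern rooted at Book. Book has children Collection and Editor; Editor has child Name and descendant Address. A double line indicates a descendant edge.]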
5 Relaxed Queries
Four types of “relaxations” are allowed on the trees.
Node Generalization: Assume that we know a type/super-type relationship among labels. A label may be changed to its super-type.
[Figure: the original query rooted at Book, and its relaxation in which Book is replaced by Document.]
6 Relaxed Queries
Leaf Node Deletion: Delete a leaf node (and its incoming edge) from the tree.
[Figure: the original query, and its relaxation with the leaf Collection deleted.]
7 Relaxed Queries
Edge Generalization: Change a parent-child edge to an ancestor-descendant edge.
[Figure: the original query, and a relaxation in which one parent-child edge is replaced by an ancestor-descendant edge.]
8 Relaxed Queries
Subtree Promotion: A query subtree can be promoted so that it is directly connected to its former grandparent by an ancestor-descendant edge.
[Figure: the original query, and a relaxation in which a subtree below Editor is reattached to Book.]
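The four relaxations can be sketched as small operations on a tree-pattern data structure. This is only an illustrative sketch: the `Node` class, the `SUPERTYPE` table, and all function names are assumptions introduced here and do not come from the slides or from the EDBT 2002 paper.

```python
from dataclasses import dataclass, field
from typing import List


# Hypothetical representation: each node stores its label, the kind of edge
# to its parent ("child" or "descendant"), and its children.
@dataclass
class Node:
    label: str
    edge: str = "child"
    children: List["Node"] = field(default_factory=list)


# Assumed type hierarchy, taken from the Book -> Document example.
SUPERTYPE = {"Book": "Document"}


def generalize_node(node: Node) -> None:
    """Node generalization: replace the label by its super-type, if known."""
    node.label = SUPERTYPE.get(node.label, node.label)


def delete_leaf(parent: Node, label: str) -> None:
    """Leaf node deletion: drop a leaf child together with its incoming edge."""
    parent.children = [c for c in parent.children
                       if not (c.label == label and not c.children)]


def generalize_edge(node: Node) -> None:
    """Edge generalization: parent-child edge becomes ancestor-descendant."""
    node.edge = "descendant"


def promote_subtree(grandparent: Node, parent: Node, label: str) -> None:
    """Subtree promotion: reattach a subtree to its former grandparent
    using an ancestor-descendant edge."""
    for child in parent.children:
        if child.label == label:
            parent.children.remove(child)
            child.edge = "descendant"
            grandparent.children.append(child)
            return  # stop right after mutating the list


def show(node: Node) -> str:
    """Render a pattern as a string: '/' is a child edge, '//' a descendant edge."""
    sep = "/" if node.edge == "child" else "//"
    inner = "".join(show(c) for c in node.children)
    return f"{sep}{node.label}" + (f"[{inner}]" if inner else "")


def build_query() -> Node:
    """The example query: Book with Collection and Editor; Editor with Name and //Address."""
    editor = Node("Editor", children=[Node("Name"), Node("Address", edge="descendant")])
    return Node("Book", children=[Node("Collection"), editor])
```

Because each operation mutates the pattern in place, relaxations compose simply by calling them one after another on the same tree.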
9 Composing Relaxations
Relaxations can be composed. Are the following relaxations of Q?
[Figure: the query Q (Book with Collection and Editor; Editor with Name and Address) and several candidate tree patterns built from the labels Book, Document, Collection, Name, and Address.]
10 Approximate Answers and Ranking
An approximate answer to Q is an exact answer to a relaxed query derived from Q.
To give different answers different rankings, tree patterns are weighted. Each node and edge has two weights: its value when exactly satisfied, and its value when satisfied by a relaxation.
[Figure: the query tree annotated with the weight pairs (7, 1) (4, 3) (2, 1) (6, 0) (5, 0) (8, 5) (6, 0) (4, 0) (3, 0).]
A fragment of a document that exactly satisfies the query will have a score of 45, the sum of the nine exact weights.
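The score of 45 can be checked with a few lines of code. This is a minimal sketch: the flat `WEIGHTS` list follows the order in which the pairs appear on the slide, since the assignment of each pair to a specific node or edge is not recoverable from the text. Treating an unmatched component as contributing 0 is an assumption made here.

```python
# (exact, relaxed) weight pairs in the order they appear on the slide.
WEIGHTS = [(7, 1), (4, 3), (2, 1), (6, 0), (5, 0), (8, 5), (6, 0), (4, 0), (3, 0)]


def score(satisfied):
    """Sum the weights of all query components.

    satisfied[i] is "exact", "relaxed", or "none" for the i-th node or edge.
    Assumption: a component that is not matched at all contributes 0.
    """
    total = 0
    for (exact, relaxed), how in zip(WEIGHTS, satisfied):
        if how == "exact":
            total += exact
        elif how == "relaxed":
            total += relaxed
    return total


print(score(["exact"] * 9))    # all nodes and edges exactly satisfied
print(score(["relaxed"] * 9))  # every component satisfied only by a relaxation
```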
11 Example Ranking
[Figure: the weighted query (weight pairs (7, 1) (4, 3) (2, 1) (6, 0) (5, 0) (8, 5) (6, 0) (4, 0) (3, 0)) next to a document fragment with the elements Book, Person, Name, Address, and Details and the text values Sam and NY.]
How much would this answer score?
12 Example Ranking
[Figure: the same weighted query and document fragment as on the previous slide (elements Book, Person, Name, Address, Details; text values Sam and NY).]
How much would this answer score?
13 Problem Definition
Given an XML document D, a weighted tree pattern Q, and a threshold t, find all approximate answers of Q in D whose scores are ≥ t.
Naive strategy to solve the problem:
– Find all relaxations of Q.
– For each relaxation, compute all exact answers.
– Remove answers with a score below t.
Is this a good strategy?
14 Problem Definition
Given an XML document D, a weighted tree pattern Q, and a threshold t, find all approximate answers of Q in D whose scores are ≥ t.
A better strategy to compute an answer to a relaxation of a query:
– Intuition: compute the query as a series of joins.
– The stack-merge algorithms (studied before) can be used to compute the joins.
– Filter out intermediate results whose scores are too low.
15 The Query Plan
We now show how to derive a plan for evaluating queries in this setting.
– First, we show how an exact plan is derived.
– Then, we consider how each individual relaxation can be added in.
– Finally, we show the complete relaxed plan.
16 Query Plan: Exact Answers
[Figure: the weighted query (weight pairs (7, 1) (4, 3) (2, 1) (6, 0) (5, 0) (8, 5) (6, 0) (4, 0) (3, 0)) and its exact evaluation plan over the inputs Book, Collection, Editor, Address, and Name.]
Join predicates in the plan: c(Book, Collection), c(Book, Editor), c(Editor, Name), d(Editor, Address).
c(x,y) = y is a child of x
d(x,y) = y is a descendant of x
17 Query Plan: Exact Answers
[Figure: the same weighted query and exact plan, with join predicates c(Book, Collection), c(Book, Editor), c(Editor, Name), d(Editor, Address).]
Remember: to compute a join, e.g., of Book and Collection, we find the list of Books and the list of Collections (from the index) and run the stack-merge algorithm on them.
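The predicates c(x,y) and d(x,y) can be illustrated with a small sketch. It assumes a region encoding in which every element carries `(start, end, level)` numbers. That encoding, the `Elem` type, and the `join` helper are assumptions introduced here. The join below is a plain nested loop for clarity and is not the stack-merge algorithm mentioned on the slide.

```python
from typing import Callable, List, NamedTuple, Tuple


class Elem(NamedTuple):
    """Hypothetical index entry: label plus (start, end, level) region numbers."""
    label: str
    start: int
    end: int
    level: int


def d(x: Elem, y: Elem) -> bool:
    """d(x, y): y is a descendant of x (y's region lies strictly inside x's)."""
    return x.start < y.start and y.end < x.end


def c(x: Elem, y: Elem) -> bool:
    """c(x, y): y is a child of x (a descendant exactly one level deeper)."""
    return d(x, y) and y.level == x.level + 1


def join(left: List[Elem], right: List[Elem],
         pred: Callable[[Elem, Elem], bool]) -> List[Tuple[Elem, Elem]]:
    """Naive structural join: every (x, y) pair satisfying the predicate."""
    return [(x, y) for x in left for y in right if pred(x, y)]


# Made-up example document: Book > Editor > (Name, Details > Address).
book = Elem("Book", 1, 12, 1)
editor = Elem("Editor", 2, 11, 2)
name = Elem("Name", 3, 4, 3)
details = Elem("Details", 5, 10, 3)
address = Elem("Address", 6, 7, 4)

print(join([book], [editor], c))     # Editor is a child of Book
print(join([editor], [address], c))  # empty: Address is not a direct child
print(join([editor], [address], d))  # Address is a descendant of Editor
```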
18 Adding Relaxations into Plan
Node generalization: Book relaxed to Document.
[Figure: the exact plan over Book, Collection, Editor, Address, and Name, with Book replaced by Document.]
Predicates c(Book, Collection) and c(Book, Editor) become c(Document, Collection) and c(Document, Editor); c(Editor, Name) and d(Editor, Address) are unchanged.
19 Adding Relaxations into Plan
Edge generalization: relax the Editor-Name edge.
[Figure: the plan over Book, Collection, Editor, Address, and Name with predicates c(Book, Collection), c(Book, Editor), c(Editor, Name), d(Editor, Address).]
The predicate c(Editor, Name) is replaced by:
c(Editor, Name) or (Not exists c(Editor, Name) and d(Editor, Name))
Written in short as: c(Editor, Name) or d(Editor, Name)
We only allow the relaxation when a direct child does not exist.
20 Adding Relaxations into Plan
Subtree Promotion: promote the tree rooted at Name.
[Figure: the plan over Book, Collection, Editor, Address, and Name with predicates c(Book, Collection), c(Book, Editor), c(Editor, Name), d(Editor, Address).]
The predicate c(Editor, Name) is replaced by:
c(Editor, Name) or (Not exists c(Editor, Name) and d(Book, Name))
Written in short as: c(Editor, Name) or d(Book, Name)
21 Adding Relaxations into Plan
Leaf Node Deletion: make Address optional.
[Figure: the plan over Book, Collection, Editor, Address, and Name with predicates c(Book, Collection), c(Book, Editor), c(Editor, Name), d(Editor, Address); the join with Address is drawn as an outer join.]
Outer join operator: join when possible, but do not discard values that cannot join.
22 Combining All Possible Relaxations
All approximate answers can be derived from the following query plan.
[Figure: the weighted query and the fully relaxed plan over Document, Collection, Editor, Address, and Name.]
Join predicates in the relaxed plan:
– c(Book, Collection) OR d(Document, Collection)
– c(Document, Editor) OR d(Document, Editor)
– c(Editor, Name) OR d(Editor, Name) OR d(Document, Name)
– d(Editor, Address) OR d(Document, Address)
23 Creating “Best Answers”
We want to find answers whose ranking is over the threshold t.
Naive solution: create all answers, then delete answers with a low ranking.
Algorithm Thres: the goal of the algorithm is to prune intermediate answers that cannot possibly meet the specified threshold.
24 Associating Nodes with Maximal Weight
The maximal weight of a node in the evaluation plan is the largest value by which the score of an intermediate answer computed for that node can grow.
[Figure: the fully relaxed plan over Document, Collection, Editor, Address, and Name, with predicates c(Book, Collection) OR d(Document, Collection); c(Document, Editor) OR d(Document, Editor); c(Editor, Name) OR d(Editor, Name) OR d(Document, Name); d(Editor, Address) OR d(Document, Address).]
25 [Figure: the weighted query and the fully relaxed plan, with each plan node annotated with its maximal weight. The annotations shown are (38), (39), (30), (40), (39), (41), (21), (7), (0).]
26 Algorithm Thres
The relaxed query evaluation plan is computed bottom-up.
– Note that the joins are computed for all matching intermediate results at the same time.
At each step, intermediate results are computed along with their scores.
If the score of an intermediate result plus the maximal weight of the current node is less than the threshold, the intermediate result is pruned.
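The pruning rule of Algorithm Thres can be sketched as a single filter step. The function name `prune` and the `(answer, score)` pair layout are assumptions made for illustration, and the numbers in the usage example are made up rather than taken from the slide's worked example.

```python
def prune(results, max_weight, threshold):
    """Keep only intermediate results that can still reach the threshold.

    results:    list of (answer, score) pairs produced at the current plan node
    max_weight: largest value by which a score can still grow above this node
    threshold:  minimal final score t
    """
    return [(answer, s) for answer, s in results if s + max_weight >= threshold]


# Illustrative values only: with threshold 35 and a maximal weight of 21,
# a result scoring 7 can reach at most 28 and is dropped.
intermediate = [("a1", 7), ("a2", 16), ("a3", 27)]
print(prune(intermediate, max_weight=21, threshold=35))
```

A result is kept when its best possible final score equals the threshold exactly, which matches the "scores are ≥ t" condition in the problem definition.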
27 Example: Threshold = 35
[Figure: a document fragment with the elements Book, Editor, Name, Address, and Details and the text values Sam and NY; the weighted query; and the fully relaxed plan annotated with the maximal weights (38), (39), (30), (40), (39), (41), (21), (7), (0).]
When will the answer be pruned?
Values shown on the slide: 7, 7, 16, 27.
28 Test Yourself
29 Example Ranking
[Figure: the weighted query (weight pairs (7, 1) (4, 3) (2, 1) (6, 0) (5, 0) (8, 5) (6, 0) (4, 0) (3, 0)) next to a document fragment with the elements Document, Collection, Name, and Address and the text values Sam and NY.]
How much would this answer score?
30 Query Plan
[Figure: a weighted query over Book, Collection, Editor, Name, FName, and LName, with the weight pairs (8, 5) (7, 1) (4, 3) (2, 1) (5, 0) (6, 0) (2, 1) (2, 0) (1, 0).]
1. What will the exact plan look like?
2. What will the plan look like if all possible relaxations are added?
3. What is the maximal weight by which the score of an intermediate answer can grow, for each node?
31 Algorithm OptiThres
Predict during evaluation whether a subsequent relaxation produces additional plans that will not meet the threshold.
In this case, undo this relaxation from the plan, e.g.:
– Convert an outer join to a join
– Revert to the original node type
– Etc.
This improves efficiency, since fewer intermediate results are created and fewer conditions are tested.
32 Three Weights for OptiThres
relaxNode: the largest value by which the score of an intermediate result for the left child of a join can grow if it joins with a relaxed match to the right node.
– Used to decide whether to unrelax the right node.
relaxJoin: the largest value by which the score of an intermediate result for the left child of a join can grow if it cannot join with any right child.
– Used to decide whether an outer join can be reverted to a join.
33 Three Weights for OptiThres (cont.)
relaxPred: the largest value by which the sum of scores of a pair of intermediate results for both children can grow if they are joined using a relaxed structural predicate.
– Used to decide whether to unrelax edge generalization and subtree promotion.
34 The Algorithm
Before computing each join node, check:
– If maximal left intermediate result + relaxNode < threshold, unrelax the right node.
– If maximal left intermediate result + relaxJoin < threshold, unrelax the outer join.
– If maximal left intermediate result + maximal right intermediate result + relaxPred < threshold, unrelax the join predicate.
35 [Figure: the query Proceeding with Publisher and Month, with weight pairs (7, 2) (6, 2) (2, 1) (2, 0) (10, 1); and a relaxed plan over Document, Person, and Month with predicates c(Document, Month) OR d(Document, Month) and c(Document, Person) OR d(Document, Person). Join nodes are annotated with (relaxNode, relaxJoin, relaxPred) triples: (11, 8, 9) and (-, 0, 2).]
Suppose threshold = 14.
Suppose that the maximal score for Document nodes is 2, i.e., there are no Proceeding nodes in the database.
– 2 + 11 < 14: unrelax Person.
36 [Figure: the same query and plan, now over Document, Publisher, and Month, with predicates c(Document, Month) OR d(Document, Month) and c(Document, Publisher) OR d(Document, Publisher), and the triples (11, 8, 9) and (-, 0, 2).]
Suppose threshold = 14.
Suppose that the maximal score for Document nodes is 2, i.e., there are no Proceeding nodes in the database.
– 2 + 11 < 14
– 2 + 8 < 14: unrelax the join.
37 [Figure: the same query and plan over Document, Publisher, and Month, with predicates c(Document, Month) OR d(Document, Month) and c(Document, Publisher) OR d(Document, Publisher), and the triples (11, 8, 9) and (-, 0, 2).]
Suppose threshold = 14.
Suppose that the maximal score for Document nodes is 2, i.e., there are no Proceeding nodes in the database.
– 2 + 11 < 14
– 2 + 8 < 14
– 2 + 10 + 9 > 14: do not unrelax!
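The three checks from the example can be reproduced with a short sketch. The function name `optithres_checks` and the dictionary keys are assumptions introduced here. The value 10 for the maximal right score is read off the slide's inequality 2 + 10 + 9 > 14.

```python
def optithres_checks(max_left, max_right, relax_node, relax_join, relax_pred, threshold):
    """Decide, before computing a join, which relaxations can be undone.

    Each flag is True when even the best case stays below the threshold,
    so the corresponding relaxation cannot contribute a qualifying answer.
    """
    return {
        "unrelax_node": max_left + relax_node < threshold,
        "unrelax_join": max_left + relax_join < threshold,
        "unrelax_pred": max_left + max_right + relax_pred < threshold,
    }


# Numbers from the example: threshold 14, maximal Document score 2,
# (relaxNode, relaxJoin, relaxPred) = (11, 8, 9), maximal right score 10.
print(optithres_checks(max_left=2, max_right=10,
                       relax_node=11, relax_join=8, relax_pred=9,
                       threshold=14))
```

With these inputs the node and the outer join are unrelaxed (13 < 14 and 10 < 14), while the relaxed predicate is kept because 21 is not below 14.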
38 Query Plan (cont.)
[Figure: the weighted query over Book, Collection, Editor, Name, FName, and LName, with the weight pairs (8, 5) (7, 1) (4, 3) (2, 1) (5, 0) (6, 0) (2, 1) (2, 0) (1, 0).]
4. Fill in the three values relaxNode, relaxJoin, and relaxPred for each node.