Published by Roxanne Foster. Modified over 8 years ago.
1
UNIVERSITY of PENNSYLVANIA, Grigoris Karvounarakis, October 04
Lazy Query Evaluation for Active XML
Abiteboul, Benjelloun, Cautis, Manolescu, Milo, Preda
INRIA Futurs
Presented by: Grigoris Karvounarakis, Univ. of Pennsylvania
CIS 650, October 14, 2004
2
Active XML
[Figure: Active XML document tree; the function nodes are labeled.]
3
Tree Pattern Queries
[Figure: example tree pattern query; labels mark the result nodes and a descendant edge.]
4
Tree Pattern Queries
Similar to pattern trees from the TAX/TLC algebras, plus:
variable nodes, used to bind variables to subtrees (variable nodes with the same name must be mapped to elements with the same tag name);
result nodes.
Embedding (of a query q into a document d) = match.
Result of an embedding = bindings of the output variables on the witness tree.
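The notion of an embedding can be sketched in a few lines of Python. This is only a Boolean check under simplifying assumptions: the classes `Elem` and `Pat` and the function `embeds` are hypothetical names, text values are modeled as leaf elements, and variable bindings and the same-name constraint on variable nodes are left out.

```python
from dataclasses import dataclass, field

@dataclass
class Elem:
    tag: str                      # element name, or a text value for leaves
    children: list = field(default_factory=list)

@dataclass
class Pat:
    label: str                    # tag or value to match; "*" matches anything
    axis: str = "/"               # edge from the parent pattern node: "/" or "//"
    children: list = field(default_factory=list)

def descendants(e):
    """Yield all proper descendants of document node e."""
    for c in e.children:
        yield c
        yield from descendants(c)

def embeds(p, e):
    """True if pattern node p (with its subtree) can be mapped onto document node e."""
    if p.label != "*" and p.label != e.tag:
        return False
    for pc in p.children:
        # child edge: look at children only; descendant edge: look at all descendants
        candidates = e.children if pc.axis == "/" else descendants(e)
        if not any(embeds(pc, c) for c in candidates):
            return False
    return True
```

Calling `embeds(query_root, document_root)` answers whether at least one match exists; collecting the bindings of the output variables would require enumerating the matches instead of stopping at the first one.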
5
No embedding …
6
No embedding …
… but if we evaluate function node 1
7
Embedding Example
8
Embedding Example
9
Embedding Example
[Figure labels: X, Y]
10
Relevant rewriting
Function node 1 (getNearbyRestos) is a relevant function node.
In general, a function node is relevant if there exists some rewriting of the document in which some of the nodes it produces belong to a match.
Rewriting the document by invoking relevant function nodes produces relevant rewritings: $d_1 \xrightarrow{v_1} d_2 \xrightarrow{v_2} \cdots\ d_n$
A document that contains no calls that are relevant to a query q is said to be complete for q.
11
Problem definition
Given an Active XML document d and a query q, find an efficient way to evaluate the query over the document.
Naïve approach: interleave query evaluation with function calls.
Better: try to compute (a superset of) the relevant function calls for q, then execute q over the rewriting of d that results from executing these function calls.
12
Problem definition
Efficiency tradeoff:
time to compute the approximation of the set of relevant functions (larger for a more accurate approximation);
time to execute the function calls (smaller for a more accurate approximation);
time to execute the query over the resulting rewriting of the document (smaller document for a more accurate approximation).
13
Outline
Definitions
Finding relevant calls
Sequencing relevant calls
Improving accuracy
Reducing detection time
Conclusions - Discussion
14
Linear Path Queries
/*()
/nyHotels/*()
/nyHotels/hotel/*()
/nyHotels/hotel/name/*()
/nyHotels/hotel/rating/*()
/nyHotels/hotel/nearby/*()
/nyHotels/hotel/nearby//*()
/nyHotels/hotel/nearby//restaurant/*()
/nyHotels/hotel/nearby//restaurant/name/*()
/nyHotels/hotel/nearby//restaurant/address/*()
/nyHotels/hotel/nearby//restaurant/rating/*()
15
Linear Path Queries
Correct, but usually inaccurate.
They ignore filtering conditions on the path from the root or in other branches that could make some of the functions irrelevant (e.g., a getNearbyRestos() function node under a hotel cannot be relevant if the hotel rating is not “*****”).
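As an illustration, the linear path queries for the hotel example can be generated from the query tree. This is a sketch whose rule is reconstructed from the list of paths on the Linear Path Queries slide, not taken from the paper: one `/*()` at the top level, one `path/*()` per query node, and one `path//*()` per descendant edge. The class `QNode` and the function `linear_path_queries` are hypothetical names.

```python
from dataclasses import dataclass, field

@dataclass
class QNode:
    label: str
    axis: str = "/"               # edge from the parent: "/" (child) or "//" (descendant)
    children: list = field(default_factory=list)

def linear_path_queries(root):
    """Return one LPQ string per position where a function call could matter."""
    lpqs = ["/*()"]               # a top-level call could produce the root element
    def visit(node, parent_path):
        if node.axis == "//":
            # a call anywhere below the parent could produce this node
            lpqs.append(parent_path + "//*()")
        path = parent_path + node.axis + node.label
        lpqs.append(path + "/*()")
        for child in node.children:
            visit(child, path)
    visit(root, "")
    return lpqs

# Query tree of the running example (structure assumed from the listed paths).
query = QNode("nyHotels", "/", [QNode("hotel", "/", [
    QNode("name"),
    QNode("rating"),
    QNode("nearby", "/", [QNode("restaurant", "//", [
        QNode("name"), QNode("address"), QNode("rating")])]),
])])

for lpq in linear_path_queries(query):
    print(lpq)
```

Because each path is built only from the labels between the root and one node, conditions in sibling branches (such as the hotel rating) never appear in it, which is the source of the inaccuracy noted above.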
16
Node Focused Queries
For each node in the query tree, replace it with an OR node that adds a *() branch matching any function (as with LPQs).
Then, for every node v in the resulting query tree, create q_v = q - {v and its subtree}, with output node f_v at the position of the *() OR-sibling of v.
Each such query tree involves the path from the root to the node (as in an LPQ) plus any parts of the tree that would have to be matched anyway for the whole query tree to match.
17
NFQ Example
[Figure: query tree with nodes nyHotels, hotel, name, nearby, restaurant, name, address, rating; values “Best Western” and “*****”; variables X and Y; * function branches.]
18
NFQ Example
[Figure: same query tree labels as on the previous slide.]
19
NFQ Example
[Figure: NFQ with nodes nyHotels and *.]
20
NFQ Example
[Figure: NFQ with nodes nyHotels and *.]
21
NFQ Example
[Figure: NFQ with nodes nyHotels and *.]
22
Another NFQ Example
[Figure: query tree with nodes nyHotels, hotel, name, nearby, restaurant, name, address, rating; values “Best Western” and “*****”; variables X and Y; * function branches.]
23
Another NFQ Example
[Figure: NFQ with nodes nyHotels, hotel, name, nearby, rating; values “Best Western” and “*****”; * function branches.]
24
Another NFQ Example
[Figure: NFQ with nodes nyHotels, hotel, name, nearby, rating; values “Best Western” and “*****”; * function branches.]
25
Another NFQ Example
[Figure: NFQ with nodes nyHotels, hotel, name, nearby, rating; values “Best Western” and “*****”; fewer * function branches than on the previous slide.]
26
Node Focused Queries
Assuming that functions can return data of arbitrary type, the function nodes that are relevant for a query q are precisely the ones retrieved by the NFQs of q.
27
Outline
Definitions
Finding relevant calls
Sequencing relevant calls
Improving accuracy
Reducing detection time
Conclusions - Discussion
28
Sequencing relevant calls
Naïve NFQA algorithm:
1. Evaluate all NFQs.
2. Pick one of the returned functions, say f_v.
3. Evaluate the function and rewrite the document ($d \xrightarrow{f_v} d'$).
4. Repeat from step 1 until all NFQs return empty results (i.e., there are no more relevant calls).
After every iteration, although the NFQs remain the same, their results can change, since evaluating a function in step 3 can introduce new function nodes or make some results irrelevant.
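The loop above can be sketched as follows. This is a minimal illustration, not the paper's implementation: `evaluate_nfq` and `invoke` are hypothetical callbacks supplied by the caller, and the toy document is a flat list in which calls are `("call", name)` tuples backed by a made-up `SERVICES` table.

```python
def nfqa(document, nfqs, evaluate_nfq, invoke):
    """Naive NFQA: re-evaluate every NFQ after each single function call.

    evaluate_nfq(nfq, document) -> list of relevant calls found by that NFQ
    invoke(call, document)      -> new document with the call replaced by its result
    """
    while True:
        relevant = [c for q in nfqs for c in evaluate_nfq(q, document)]
        if not relevant:
            return document       # no relevant calls left: document is complete
        document = invoke(relevant[0], document)

# Toy setting (all names hypothetical): calling f yields data plus a new call g.
SERVICES = {"f": ["b", ("call", "g")], "g": ["c"]}

def toy_eval(nfq, doc):
    # Treat every remaining call as relevant, regardless of the NFQ.
    return [x for x in doc if isinstance(x, tuple)]

def toy_invoke(call, doc):
    i = doc.index(call)
    return doc[:i] + SERVICES[call[1]] + doc[i + 1:]

result = nfqa(["a", ("call", "f")], ["q"], toy_eval, toy_invoke)
```

Note that the loop terminates only if the calls eventually stop producing new relevant calls; the sketch does not guard against services that keep returning further calls.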
29
Improving NFQA
“Predict” when NFQ results cannot possibly have changed, and avoid reevaluating them.
Identify dependences between NFQs and the effects of executing the functions they return.
30
Influence of NFQs
[Figure: two NFQs, labeled NFQ 1 and NFQ 2, over the nyHotels query tree.]
NFQ 1 can influence NFQ 2, but not vice versa.
31
Influence of NFQs
NFQ 1 may influence NFQ 2 iff the output function node of NFQ 1 is an ancestor (in the query tree) of the output node of NFQ 2.
Two NFQs belong to the same layer if they may influence each other (directly or transitively).
Inside every layer, we have to reevaluate every NFQ after every function call.
Multiple equivalent NFQs (i.e., NFQs in the same layer) can exist only under //, because, without knowing the output type, each node could appear as a descendant of the other. E.g., for //a and //b: in /a/b, //a matches /a and //b matches /a/b, while in /b/a, //b matches /b and //a matches /b/a.
32
Influence of NFQs
L_1 < L_2 iff some NFQ in L_1 may influence (directly or transitively) some NFQ in L_2.
We have to process L_1 before L_2 (without having to process L_1 again afterwards).
When processing of L_1 has finished, the OR-nodes corresponding to returned functions are redundant, so the NFQs in L_2 can be simplified by removing them.
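Grouping NFQs into layers and ordering the layers amounts to finding the strongly connected components of the "may influence" graph and sorting them topologically. The sketch below assumes the direct influence relation is already given as a dictionary; the function name `layers` and the NFQ names in the example are hypothetical.

```python
def layers(nfqs, influences):
    """Group NFQs into layers and order the layers for processing.

    nfqs:       list of NFQ names
    influences: dict mapping a name to the set of names it may directly influence
    """
    def reachable(start):
        seen, stack = {start}, [start]
        while stack:
            for nxt in influences.get(stack.pop(), ()):
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        return seen

    reach = {q: reachable(q) for q in nfqs}
    groups = []
    for q in nfqs:
        for g in groups:
            rep = g[0]
            if q in reach[rep] and rep in reach[q]:   # mutual influence: same layer
                g.append(q)
                break
        else:
            groups.append([q])
    # A layer that influences another one reaches strictly more NFQs,
    # so sorting by decreasing reach size puts it first.
    groups.sort(key=lambda g: -len(reach[g[0]]))
    return groups

# Example: the NFQs for //a and //b influence each other, and both influence q_c.
example = layers(
    ["q_c", "q_a", "q_b"],
    {"q_a": {"q_b", "q_c"}, "q_b": {"q_a"}, "q_c": set()},
)
```

This quadratic approach is chosen for brevity; for large numbers of NFQs a linear-time strongly-connected-components algorithm would be the usual choice.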
33
Parallelizing calls
Let q_lin be the linear path from the root to the output node of NFQ q, not inclusive (note: q_lin is a regular expression).
Two NFQs q, q' that belong to the same layer are independent iff the regular languages of q_lin and q'_lin have no words in common.
E.g., //a and //b are independent, but //a//c and //b//c are not (both match /a/b/c).
If all NFQs in a layer are independent, we can call all functions returned by the same NFQ in a step of NFQA in parallel.
Other sufficient conditions could exist, too …
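The independence test can be sketched for simple paths made of `/` and `//` steps. The approach below is an assumption about how one might check it, not the paper's procedure: each path is read as a small automaton over element labels, and the two automata are explored together to see whether some label sequence is accepted by both. The names `parse`, `step`, and `independent` are hypothetical.

```python
def parse(path):
    """Split '/a//b' into steps [('/', 'a'), ('//', 'b')]."""
    steps, i = [], 0
    while i < len(path):
        axis = "//" if path.startswith("//", i) else "/"
        i += len(axis)
        j = path.find("/", i)
        j = len(path) if j == -1 else j
        steps.append((axis, path[i:j]))
        i = j
    return steps

def step(steps, state, symbol):
    """States reachable from `state` after reading one element label."""
    out = set()
    if state < len(steps):
        axis, label = steps[state]
        if label in ("*", symbol):
            out.add(state + 1)    # this label satisfies the next step
        if axis == "//":
            out.add(state)        # or it is an intermediate element that is skipped
    return out

def independent(p, q):
    """True if no sequence of labels matches both linear paths."""
    a, b = parse(p), parse(q)
    # Labels mentioned in either path, plus one symbol standing for any other label.
    alphabet = {label for _, label in a + b if label != "*"} | {"#other"}
    seen, stack = {(0, 0)}, [(0, 0)]
    while stack:
        s, t = stack.pop()
        if s == len(a) and t == len(b):
            return False          # a common word exists
        for sym in alphabet:
            for s2 in step(a, s, sym):
                for t2 in step(b, t, sym):
                    if (s2, t2) not in seen:
                        seen.add((s2, t2))
                        stack.append((s2, t2))
    return True
```

For the examples on the slide, `independent("//a", "//b")` holds because the two paths must end in different labels, while `independent("//a//c", "//b//c")` fails because the label sequence a, b, c is accepted by both.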
34
Outline
Definitions
Finding relevant calls
Sequencing relevant calls
Improving accuracy
Reducing detection time
Conclusions - Discussion
35
Using types
Use the function return type to “predict” the shape of the data that a function call can return.
Similar to checking for the existence of a possible rewriting.
If this shape cannot match the corresponding part of the query pattern, the call can be discarded.
In some cases, one can go further and restrict not only the output type but also the specific names of functions that could match.
Refined NFQs:
Use the set of function names of appropriate return type instead of *().
Use F-guides (later) to make them even more refined.
36
Refined NFQ example
[Figure: NFQ with nodes nyHotels, hotel, name, nearby, rating; values “Best Western” and “*****”; * function branches.]
37
Refined NFQ example
[Figure: the same NFQ, with function branches labeled getRating and getNearbyRestos in place of some * branches.]
38
Pushing queries
Similar to pushing selections into scans in relational queries, or pushing queries to data sources in mediator systems.
Reduce the amount of (useless) data that is transferred (assuming functions correspond to remote (web) services) by filtering out irrelevant matches and projecting only on output variable nodes.
39
Outline
Definitions
Finding relevant calls
Sequencing relevant calls
Improving accuracy
Reducing detection time
Conclusions - Discussion
40
Lenient rewriting
Trade accuracy for efficiency.
Use XPath or LPQs instead of NFQs (faster processing).
Use a lenient form of type checking (ignoring order and cardinality of elements).
41
Function call guides
Similar to dataguides, but for function calls.
One occurrence for each path that leads to some function node, plus pointers to the function nodes.
42
Function call guides
Paths that don’t lead to functions are left out.
43
Function call guides
[Figure: F-guide with pointers to getRating calls; pointers to getNearbyRestos and getNearbyMuseums calls; pointers to getHotels calls.]
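A function call guide can be approximated by a map from each root-to-call path to the calls found at that path. The sketch below makes several assumptions: the `Node` class, the `is_call` flag, and `build_fguide` are hypothetical names, and a flat dictionary stands in for the tree-shaped guide.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    tag: str                      # element name, or function name for a call
    children: list = field(default_factory=list)
    is_call: bool = False

def build_fguide(root):
    """Map each path that leads to a function call to the list of calls found there."""
    guide = {}
    def visit(node, path):
        if node.is_call:
            guide.setdefault(path, []).append(node)   # pointer to the call node
            return
        for child in node.children:
            visit(child, path + (node.tag,))
    visit(root, ())
    return guide

# Toy document (structure assumed for illustration).
doc = Node("nyHotels", [
    Node("getHotels", is_call=True),
    Node("hotel", [
        Node("name", [Node("Best Western")]),
        Node("rating", [Node("getRating", is_call=True)]),
        Node("nearby", [Node("getNearbyRestos", is_call=True),
                        Node("getNearbyMuseums", is_call=True)]),
    ]),
])
fguide = build_fguide(doc)
```

Paths such as nyHotels/hotel/name never appear as keys because no call sits below them, which mirrors the rule that paths not leading to functions are left out.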
44
Function call guides
Use F-guides for:
Generation of refined NFQs: use the return type within the appropriate part of the F-guide to get only the function names that can indeed appear in the corresponding tree fragment.
Efficient approximation of the relevant function nodes: evaluate the queries (NFQs) on the F-guide; evaluate the queries on the original document using LPQs.
Initial filtering: NFQs for nodes that don’t have any children in the F-guide can be dropped.
45
Conclusions
Active XML: an interesting new area.
Nothing fundamentally novel: it applies known tools (distributed processing, lazy evaluation) in a new context, giving new life to documents.
Greatest challenge: formulating the right research questions well. Answers to these well-formulated questions are fairly easy.
Contributions of this paper:
Formulates such an interesting question.
Thorough understanding of different aspects of the problem (accuracy vs. performance and their effect on overall efficiency).
46
Questions?