Published by Bartholomew Greene. Modified over 9 years ago
1
XPath Query Evaluation
2
Goal
Evaluating an XPath query against a given document
– To find all matches
We will also consider the use of types
Complexity is important
– Documents can be huge
3
Data complexity vs. combined complexity
Two inputs to the query evaluation problem
– Data (XML document) of size |D|
– Query (XPath expression) of size |Q|
– Usually |Q| << |D|
Polynomial data complexity
– Complexity that is polynomial in |D|, possibly exponential in |Q|
Polynomial combined complexity
– Complexity that is polynomial in both |D| and |Q|
Fixed-parameter tractable complexity
– Complexity Poly(|D|)*f(|Q|) for some function f
4
XPath standard semantics
5
Core XPath
locpath ::= '/' locpath | locpath '/' locpath | locpath '|' locpath | locstep.
locstep ::= axis '::' ntst '[' bexpr ']' ... '[' bexpr ']'.
bexpr ::= bexpr 'and' bexpr | bexpr 'or' bexpr | 'not(' bexpr ')' | locpath.
axis ::= 'self' | 'child' | 'parent' | 'descendant' | 'descendant-or-self' | 'ancestor' | 'ancestor-or-self' | 'following' | 'following-sibling' | 'preceding' | 'preceding-sibling'.
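The grammar maps directly onto an abstract syntax tree, which is the kind of query parse tree the evaluation algorithms work on. The sketch below is only an illustration: the class names (`Step`, `Root`, `Slash`, `Alt`, `And`, `Or`, `Not`) and the printer `show` are hypothetical, and the parenthesization rule in `show` is an assumption, since the grammar as written does not fix precedence.

```python
from dataclasses import dataclass, field
from typing import List, Union

AXES = {"self", "child", "parent", "descendant", "descendant-or-self",
        "ancestor", "ancestor-or-self", "following", "following-sibling",
        "preceding", "preceding-sibling"}


@dataclass
class Step:  # locstep ::= axis '::' ntst '[' bexpr ']' ... '[' bexpr ']'
    axis: str
    ntst: str
    preds: List["BExpr"] = field(default_factory=list)

    def __post_init__(self) -> None:
        if self.axis not in AXES:
            raise ValueError(f"unknown axis: {self.axis}")


@dataclass
class Root:  # '/' locpath
    path: "LocPath"


@dataclass
class Slash:  # locpath '/' locpath
    left: "LocPath"
    right: "LocPath"


@dataclass
class Alt:  # locpath '|' locpath
    left: "LocPath"
    right: "LocPath"


@dataclass
class And:  # bexpr 'and' bexpr
    left: "BExpr"
    right: "BExpr"


@dataclass
class Or:  # bexpr 'or' bexpr
    left: "BExpr"
    right: "BExpr"


@dataclass
class Not:  # 'not(' bexpr ')'
    arg: "BExpr"


LocPath = Union[Root, Slash, Alt, Step]
BExpr = Union[And, Or, Not, Root, Slash, Alt, Step]  # bexpr ::= ... | locpath


def show(e: BExpr) -> str:
    """Print an AST in the concrete syntax of the grammar, adding parentheses
    around nested unions and nested boolean operators."""
    def wrap(x: BExpr, kinds: tuple) -> str:
        return f"({show(x)})" if isinstance(x, kinds) else show(x)

    if isinstance(e, Step):
        return f"{e.axis}::{e.ntst}" + "".join(f"[{show(p)}]" for p in e.preds)
    if isinstance(e, Root):
        return "/" + wrap(e.path, (Alt,))
    if isinstance(e, Slash):
        return wrap(e.left, (Alt,)) + "/" + wrap(e.right, (Alt,))
    if isinstance(e, Alt):
        return f"{show(e.left)} | {show(e.right)}"
    if isinstance(e, (And, Or)):
        op = "and" if isinstance(e, And) else "or"
        return f"{wrap(e.left, (And, Or))} {op} {wrap(e.right, (And, Or))}"
    if isinstance(e, Not):
        return f"not({show(e.arg)})"
    raise TypeError(f"not a Core XPath node: {e!r}")


if __name__ == "__main__":
    q = Root(Slash(Step("descendant", "a", [And(Step("child", "b"), Not(Step("child", "c")))]),
                   Step("child", "*")))
    print(show(q))  # /descendant::a[child::b and not(child::c)]/child::*
```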
6
XPath Query Evaluation
Input: XML document D, XPath query Q
Output: a subset of the nodes of D, as defined by Q
We will follow "Efficient Algorithms for Processing XPath Queries", Gottlob, Koch, Pichler, TODS 2005
7
Simple algorithm
process-location-step(n, Q) {
  S := apply Q.first to n;
  if |Q| > 1
    for each node n' in S do process-location-step(n', Q.next)
  else
    add S to the result
}
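A runnable version of the simple algorithm, sketched in Python under these assumptions: a hypothetical `Node` class with parent and child links, and steps modeled as functions from one node to a list of nodes (only `child` and `parent` are shown). Nothing is deduplicated, so the same node can be reached many times; the function also returns how many calls it made.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass(eq=False)  # identity-based equality and hashing for nodes
class Node:
    tag: str
    children: List["Node"] = field(default_factory=list)
    parent: Optional["Node"] = None

    def add(self, kid: "Node") -> "Node":
        kid.parent = self
        self.children.append(kid)
        return kid


# A step maps one context node to a list of nodes (no duplicate elimination).
Step = Callable[[Node], List[Node]]


def child(n: Node) -> List[Node]:
    return list(n.children)


def parent(n: Node) -> List[Node]:
    return [n.parent] if n.parent is not None else []


def process_location_step(n: Node, q: List[Step], result: List[Node]) -> int:
    """Apply the first step to n and recurse on every resulting node with the
    rest of the query. Matches are appended to `result`; returns the number
    of calls made, including this one."""
    s = q[0](n)
    calls = 1
    if len(q) > 1:
        for n2 in s:
            calls += process_location_step(n2, q[1:], result)
    else:
        result.extend(s)
    return calls


if __name__ == "__main__":
    root = Node("r")
    for _ in range(4):
        root.add(Node("a"))
    matches: List[Node] = []
    calls = process_location_step(root, [child, parent, child, parent, child], matches)
    print(calls, len(matches), len(set(matches)))  # 41 64 4
```

The query child/parent/child/parent/child on a root with 4 children (an input chosen for illustration) makes 41 calls and produces 64 matches although only 4 distinct nodes are selected; each further parent/child pair multiplies the number of matches by 4.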
8
Complexity
Worst case: in each step of Q the axis is "following"
So each step of Q is applied to O(|D|) nodes
And we get Time(|Q|) = |D|*Time(|Q|-1)
I.e., the complexity is O(|D|^|Q|)
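Unrolling the recurrence shows where the exponent comes from, writing T for Time and assuming, as on the slide, that applying a single step to one node costs O(|D|):

```latex
\begin{aligned}
T(|Q|) &= |D| \cdot T(|Q|-1) = |D|^{2} \cdot T(|Q|-2) = \cdots \\
       &= |D|^{|Q|-1} \cdot T(1) = |D|^{|Q|-1} \cdot O(|D|) = O\!\left(|D|^{|Q|}\right)
\end{aligned}
```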
9
Early systems performance
(Figure taken from Gottlob, Koch, Pichler '05)
10
Internet Explorer 6
(Figure taken from Gottlob, Koch, Pichler '05)
11
IE6 – performance as a function of document size
(Figure taken from Gottlob, Koch, Pichler '05)
12
Polynomial data complexity
Polynomial data complexity is sometimes considered good even if it is exponential in the query size
But can we have polynomial combined complexity for XPath query evaluation?
Yes!
13
Two main principles
Query parse trees: the query is divided into parts according to its structure (not to be confused with the XML tree structure)
Context-value tables: for every expression e occurring in the parse tree, compute a table of all valid combinations of context c and value v such that e evaluates to v in c
14
XPath query parse tree
descendant::b/following-sibling::*[position() != last()]
15
Bottom-up vs. top-down evaluation
We will discuss two kinds of query evaluation algorithms:
– Bottom-up means that the query parse tree is processed from the leaves up to the root
– Top-down means that the parse tree is processed from the root down to the leaves
During processing, we fill in the context-value tables
16
Bottom-up evaluation
Main idea: compute the value of each leaf for every possible context
Propagate upwards until the root
Dynamic programming algorithm, to avoid re-evaluating queries in the same context
17
Operational semantics
Needed as a first step for evaluation algorithms
Similar ideas are used in compiler design
Here the semantics is based on the notion of contexts
18
Contexts
The domain of contexts is $C = \mathrm{dom} \times \{\langle k, n\rangle \mid 1 \le k \le n \le |\mathrm{dom}|\}$
A context is $c = \langle x, k, n\rangle$, where
– x is the context node
– k is the context position
– n is the context size
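Enumerating C for a tiny domain makes the size of the context space concrete: |C| = |dom| · |dom|(|dom|+1)/2, which is O(|dom|^3) and is where the cubic factor in the later space bounds comes from. The node names below are placeholders.

```python
from typing import Iterator, List, Tuple

Context = Tuple[str, int, int]  # (context node x, context position k, context size n)


def contexts(dom: List[str]) -> Iterator[Context]:
    """Enumerate C = dom x {<k, n> | 1 <= k <= n <= |dom|}."""
    m = len(dom)
    for x in dom:
        for n in range(1, m + 1):
            for k in range(1, n + 1):
                yield (x, k, n)


if __name__ == "__main__":
    all_ctx = list(contexts(["r", "a", "b"]))
    print(len(all_ctx))  # 18 = 3 * (3 * 4 // 2)
```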
19
Types
20
Semantics for XPath expressions
The semantics of evaluating an expression is a set of 4-tuples, where the first three elements form the context and the fourth is the value obtained by evaluating the expression in that context
21
Some notations
T(t): all nodes satisfying a predicate t
E(e): all nodes satisfying a regular expression e (applied with respect to a given axis)
$\mathrm{idx}_\chi(x, S)$: the index of a node x in the set S with respect to the axis χ and the document order
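A small sketch of the index function, assuming the XPath 1.0 convention that ancestor, ancestor-or-self, preceding and preceding-sibling are reverse axes (ordered by reverse document order) and all others are forward axes. Document order is given here by a hypothetical dictionary `doc_pos` from node to its preorder position.

```python
from typing import Dict, Iterable

# Reverse axes in XPath 1.0; every other axis is a forward axis.
REVERSE_AXES = {"ancestor", "ancestor-or-self", "preceding", "preceding-sibling"}


def idx(axis: str, x: str, s: Iterable[str], doc_pos: Dict[str, int]) -> int:
    """1-based index of node x in the set s: document order for forward axes,
    reverse document order for reverse axes."""
    ordered = sorted(s, key=lambda node: doc_pos[node], reverse=axis in REVERSE_AXES)
    return ordered.index(x) + 1


if __name__ == "__main__":
    pos = {"r": 0, "a": 1, "b": 2, "c": 3}  # hypothetical document order
    siblings = {"a", "b", "c"}
    print(idx("following-sibling", "a", siblings, pos))  # 1
    print(idx("preceding-sibling", "a", siblings, pos))  # 3
```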
23
Context-value table
Given a query subexpression e, the context-value table of e specifies all combinations of context c and value v such that computing e in context c results in v
The bottom-up algorithm follows: compute the context-value tables bottom-up with respect to the query parse tree
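As a rough illustration, the sketch below fills context-value tables bottom-up for a toy fragment only (position(), last(), integer constants and a few binary operators, an assumption for brevity). Each parse-tree node gets one table, a dict from a context (x, k, n) to a value, and a parent's table is built row by row from its children's tables. The demo uses the predicate [position() != last()] from the parse-tree example.

```python
from typing import Dict, List, Tuple, Union

Context = Tuple[int, int, int]  # (node x, position k, size n)
# Toy expression tree: "position()", "last()", an int constant,
# or (op, left, right) with op one of the keys of OPS.
Expr = Union[str, int, Tuple[str, "Expr", "Expr"]]

OPS = {
    "=": lambda a, b: a == b,
    "!=": lambda a, b: a != b,
    "<": lambda a, b: a < b,
    "and": lambda a, b: bool(a) and bool(b),
    "or": lambda a, b: bool(a) or bool(b),
}


def table(e: Expr, ctxs: List[Context]) -> Dict[Context, object]:
    """Context-value table of e over ctxs, filled bottom-up: the children's
    tables are computed first and then combined row by row."""
    if isinstance(e, int):
        return {c: e for c in ctxs}
    if isinstance(e, str):
        if e == "position()":
            return {c: c[1] for c in ctxs}
        if e == "last()":
            return {c: c[2] for c in ctxs}
        raise ValueError(f"unknown leaf: {e}")
    op, left, right = e
    lt, rt = table(left, ctxs), table(right, ctxs)
    return {c: OPS[op](lt[c], rt[c]) for c in ctxs}


if __name__ == "__main__":
    nodes = [10, 11, 12]  # hypothetical node ids, in document order
    ctxs = [(x, k, len(nodes)) for k, x in enumerate(nodes, start=1)]
    t = table(("!=", "position()", "last()"), ctxs)
    print(t)  # {(10, 1, 3): True, (11, 2, 3): True, (12, 3, 3): False}
    print([c[0] for c in ctxs if t[c]])  # [10, 11]
```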
24
Bottom-up algorithm
25
Example 4 times
26
Complexity
O(|D|^3*|Q|) space, ignoring strings and numbers
– O(|Q|) tables with 3 columns, each holding values in 1…|D|, thus O(|D|^3*|Q|)
– An extra O(|D|*|Q|) multiplicative factor for strings and numbers
O(|D|^5*|Q|) time, ignoring strings and numbers
– Combining two node sets can take O(|D|^2)
– An extra O(|Q|) factor in case of strings and numbers
27
Optimization
Represent contexts as pairs of current and previous node
This brings the time complexity down to O(|D|^4*|Q|^2)
Space complexity can be brought down to O(|D|^2*|Q|^2) via further optimizations
28
Top-down evaluation
Similar idea
But allows computing values only for the contexts that are needed
Same worst-case bounds
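A top-down counterpart for a toy fragment (position(), last(), integer constants, comparisons, and/or; an assumption for brevity) evaluates an expression only for the contexts actually requested. The sketch uses Python's `functools.lru_cache` as the context-value table; this memoization is a choice of the sketch, not necessarily how the slides' algorithm is formulated. Because `and` short-circuits, contexts that fail the left operand never get entries for the right one.

```python
from functools import lru_cache
from typing import Tuple, Union

Context = Tuple[int, int, int]  # (node x, position k, size n)
Expr = Union[str, int, Tuple[str, "Expr", "Expr"]]

CMP = {"=": lambda a, b: a == b, "!=": lambda a, b: a != b, "<": lambda a, b: a < b}


@lru_cache(maxsize=None)
def value(e: Expr, c: Context):
    """Value of e in context c, computed only when requested (top-down).
    The cache plays the role of the context-value table: each (e, c) pair
    is evaluated at most once."""
    if isinstance(e, int):
        return e
    if e == "position()":
        return c[1]
    if e == "last()":
        return c[2]
    if isinstance(e, str):
        raise ValueError(f"unknown leaf: {e}")
    op, left, right = e
    if op == "and":  # the right operand is only needed when the left one holds
        return bool(value(left, c)) and bool(value(right, c))
    if op == "or":
        return bool(value(left, c)) or bool(value(right, c))
    return CMP[op](value(left, c), value(right, c))


if __name__ == "__main__":
    e = ("and", ("<", "position()", 2), ("!=", "position()", "last()"))
    ctxs = [(10, 1, 3), (11, 2, 3), (12, 3, 3)]
    print([c[0] for c in ctxs if value(e, c)])  # [10]
    # 14 entries; a full table would have 6 subexpressions x 3 contexts = 18
    print(value.cache_info().currsize)
```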
29
Top-down or bottom-up?
A general question in processing XML trees
The tradeoff:
– It is usually easier to combine the results computed at the children to obtain the result at the parent
So bottom-up traversal is usually easier to design
– On the other hand, some of the computation is redundant, since we don't know whether it will become relevant
So top-down traversal may be more efficient
30
Linear-time fragment
Core XPath includes only navigation – / and //
Core XPath can be evaluated in O(|D|*|Q|)
Observation: there is no need to consider the entire triple, only the current context node
Top-down or bottom-up evaluation with essentially the same algorithm
But smaller tables are maintained: for every query node, all document nodes and the values of the evaluation
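The linear bound can be illustrated by applying each location step to a whole set of nodes at once, so that every axis costs one pass over the document. This is a simplified sketch under stated assumptions: nodes are numbered 0..N-1 in document order with node 0 as the root, a step is a hypothetical (axis, node test, predicates) triple, and predicates are restricted to relative location paths (and, or, not and | are omitted). Predicates are evaluated right to left through inverse axes, which yields the set of nodes from which a path is nonempty.

```python
from typing import Callable, Dict, List, Set, Tuple


class Doc:
    """Nodes are 0..N-1 in document (preorder) order; node 0 is the root."""

    def __init__(self, parent: List[int], label: List[str]) -> None:
        self.parent, self.label, self.n = parent, label, len(parent)
        self.kids: List[List[int]] = [[] for _ in parent]
        for i, p in enumerate(parent):
            if p >= 0:
                self.kids[p].append(i)  # children end up in document order


def child(d: Doc, s: Set[int]) -> Set[int]:
    return {i for i in range(d.n) if d.parent[i] in s}


def parent(d: Doc, s: Set[int]) -> Set[int]:
    return {d.parent[i] for i in s if d.parent[i] >= 0}


def descendant(d: Doc, s: Set[int]) -> Set[int]:
    out: Set[int] = set()
    for i in range(d.n):  # preorder: a parent is visited before its children
        if d.parent[i] in s or d.parent[i] in out:
            out.add(i)
    return out


def ancestor(d: Doc, s: Set[int]) -> Set[int]:
    out: Set[int] = set()
    for i in reversed(range(d.n)):  # descendants are visited before ancestors
        if (i in s or i in out) and d.parent[i] >= 0:
            out.add(d.parent[i])
    return out


def siblings(d: Doc, s: Set[int], backwards: bool) -> Set[int]:
    out: Set[int] = set()
    for ks in d.kids:
        seen = False
        for k in (reversed(ks) if backwards else ks):
            if seen:
                out.add(k)
            seen = seen or k in s
    return out


def foll_or_prec(d: Doc, s: Set[int], backwards: bool) -> Set[int]:
    # following = ancestor-or-self/following-sibling/descendant-or-self (dually for preceding)
    f = siblings(d, s | ancestor(d, s), backwards)
    return f | descendant(d, f)


AXES: Dict[str, Callable[[Doc, Set[int]], Set[int]]] = {
    "self": lambda d, s: set(s), "child": child, "parent": parent,
    "descendant": descendant, "ancestor": ancestor,
    "descendant-or-self": lambda d, s: s | descendant(d, s),
    "ancestor-or-self": lambda d, s: s | ancestor(d, s),
    "following-sibling": lambda d, s: siblings(d, s, False),
    "preceding-sibling": lambda d, s: siblings(d, s, True),
    "following": lambda d, s: foll_or_prec(d, s, False),
    "preceding": lambda d, s: foll_or_prec(d, s, True),
}
INVERSE = {"self": "self", "child": "parent", "parent": "child",
           "descendant": "ancestor", "ancestor": "descendant",
           "descendant-or-self": "ancestor-or-self", "ancestor-or-self": "descendant-or-self",
           "following-sibling": "preceding-sibling", "preceding-sibling": "following-sibling",
           "following": "preceding", "preceding": "following"}
Step = Tuple[str, str, list]  # (axis, node test "*" or a label, list of predicate paths)


def filt(d: Doc, step: Step, s: Set[int]) -> Set[int]:
    """Keep the nodes of s that pass the node test and every predicate."""
    _, test, preds = step
    out = {i for i in s if test == "*" or d.label[i] == test}
    for p in preds:
        out &= holds(d, p)
    return out


def holds(d: Doc, path: List[Step]) -> Set[int]:
    """Nodes from which the relative path is nonempty, computed right to left
    through inverse axes: one O(|D|) pass per step."""
    y = filt(d, path[-1], set(range(d.n)))
    for i in range(len(path) - 1, 0, -1):
        y = filt(d, path[i - 1], AXES[INVERSE[path[i][0]]](d, y))
    return AXES[INVERSE[path[0][0]]](d, y)


def evaluate(d: Doc, path: List[Step]) -> List[int]:
    """Absolute location path from the root, one set-at-a-time pass per step."""
    s = {0}
    for step in path:
        s = filt(d, step, AXES[step[0]](d, s))
    return sorted(s)


if __name__ == "__main__":
    # r(0) has children a(1), a(5); a(1) has b(2), c(3), b(4); a(5) has c(6)
    doc = Doc([-1, 0, 1, 1, 1, 0, 5], ["r", "a", "b", "c", "b", "a", "c"])
    print(evaluate(doc, [("descendant", "b", []), ("following-sibling", "*", [])]))  # [3, 4]
    print(evaluate(doc, [("descendant", "c", [[("preceding-sibling", "b", [])]])]))  # [3]
```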
31
Types are helpful
They can direct the search
– In some parts of the tree there is no hope of matching a given subexpression of the query
– As a result, the tables may have fewer entries
Whiteboard discussion
32
Type checking and inference
Type checking a single document: straightforward
– Polynomial combined complexity if the automaton representing the type is deterministic; otherwise exponential in the automaton size but polynomial in the document size
Type checking the results of an (XPath) query
Inferring the type of the results of a query
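Type checking a single document amounts to one bottom-up run of a tree automaton over the document. The sketch assumes a deterministic unranked tree automaton encoded as hypothetical (label, regular expression over child states, state) rules, with single-character state names so that Python's `re.fullmatch` can test the sequence of child states; it is an illustration, not a DTD or XML Schema validator.

```python
import re
from typing import List, Optional, Set, Tuple

Tree = Tuple[str, list]      # (label, list of child trees)
Rule = Tuple[str, str, str]  # (label, regex over child states, resulting state)


def run(tree: Tree, rules: List[Rule]) -> Optional[str]:
    """Bottom-up run: compute the children's states first, then pick the
    unique rule whose label matches and whose regex accepts the word of
    child states. Returns None if no rule applies."""
    label, children = tree
    states = []
    for c in children:
        q = run(c, rules)
        if q is None:
            return None  # some subtree has no valid state: reject
        states.append(q)
    word = "".join(states)
    hits = {st for (lab, rx, st) in rules if lab == label and re.fullmatch(rx, word)}
    if len(hits) > 1:
        raise ValueError("rules are not deterministic for this input")
    return hits.pop() if hits else None


def type_check(tree: Tree, rules: List[Rule], accepting: Set[str]) -> bool:
    return run(tree, rules) in accepting


if __name__ == "__main__":
    rules = [("book", "TA+", "B"), ("title", "", "T"), ("author", "", "A")]
    good = ("book", [("title", []), ("author", []), ("author", [])])
    bad = ("book", [("author", []), ("title", [])])
    print(type_check(good, rules, {"B"}), type_check(bad, rules, {"B"}))  # True False
```

Each node is visited once and tried against each rule, so the run is polynomial in both the document and the rule set for this deterministic encoding.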
33
Type inference
An (incomplete) algorithm for type inference can work its way up to the top of the query parse tree, inferring a type in a bottom-up fashion
– Start by inferring a type for the leaves (simple queries), then use it for their parents
Type inference is inherently incomplete. It can be performed for some languages that are "regular" in a certain sense.
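In a deliberately crude setting, a type can be approximated by the set of element names a query may return, given a hypothetical DTD-like schema that only records which names may appear as children of which. The sketch infers such a set bottom-up over the query parse tree for child and descendant steps: each leaf becomes a function on label sets and "/" composes them. It ignores order and cardinality constraints, so the result is only an approximation of the real result type.

```python
from typing import Callable, Dict, FrozenSet, Set, Tuple, Union

Schema = Dict[str, Set[str]]  # element name -> names allowed as its children
# A query is a leaf step (axis, name test) with axis "child" or "descendant",
# or ("/", q1, q2) for composition.
Query = Union[Tuple[str, str], Tuple[str, "Query", "Query"]]
Labels = FrozenSet[str]


def infer(q: Query, schema: Schema) -> Callable[[Labels], Labels]:
    """Bottom-up over the query parse tree: leaves become functions on label
    sets, and '/' composes the functions inferred for its two children."""
    if q[0] == "/":
        f, g = infer(q[1], schema), infer(q[2], schema)
        return lambda labels: g(f(labels))
    axis, test = q
    if axis not in ("child", "descendant"):
        raise ValueError(f"unsupported axis: {axis}")

    def step(labels: Labels) -> Labels:
        reach = {c for l in labels for c in schema.get(l, set())}
        if axis == "descendant":  # close the child relation transitively
            frontier = set(reach)
            while frontier:
                frontier = {c for l in frontier for c in schema.get(l, set())} - reach
                reach |= frontier
        return frozenset(l for l in reach if test == "*" or l == test)

    return step


if __name__ == "__main__":
    schema = {"lib": {"book"}, "book": {"title", "author"}, "title": set(), "author": set()}
    print(sorted(infer(("/", ("child", "book"), ("child", "*")), schema)(frozenset({"lib"}))))
    # ['author', 'title']
    print(infer(("child", "title"), schema)(frozenset({"lib"})))  # frozenset(): never matches
```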
34
Restricted language allowing for type inference
Axes: child, descendant, parent, ancestor, following-sibling, etc.
Variables can be bound to nodes in the input tree and then passed as parameters
An equality test can be performed between node IDs, but not between node values
35
Type checking
In addition to inferring a type, we need to verify containment in another type
Type inference can be used as a tool for type checking
Type checking was shown to be decidable for the same language fragment, but with high complexity
36
Intuitive connection to text
Queries => regular expressions
Types (tree automata) => context-free languages
Type inference => intersection of a context-free and a regular language, resulting in a context-free one
Type checking => type inference + inclusion of context-free languages (with some restrictions to guarantee decidability)