Download presentation
Presentation is loading. Please wait.
1
Managing XML and Semistructured Data
Lecture 4: Path Expressions Prof. Dan Suciu Spring 2001
2
In this lecture Path expressions Regular path expressions
Evaluation techniques Resources: Data on the Web Abiteboul, Buneman, Suciu : section 4.1
3
Path Expressions Examples: Bib.paper Bib.book.publisher
Bib.paper.author.lastname Given an OEM instance, the answer of a path expression p is a set of objects
4
Path Expressions Examples: DB = Bib.paper={&o12,&o29}
&96 &243 &206 &25 “Serge” “Abiteboul” 1997 “Victor” “Vianu” 122 133 paper book references author title year http publisher page firstname lastname first last Bib &o44 &o45 &o46 &o47 &o48 &o49 &o50 &o51 &o52 Examples: DB = Bib.paper={&o12,&o29} Bib.book.publisher={&o51} Bib.paper.author.lastname={&o71,&206}
5
Answer of a Path Expression
Simple evaluation algorithms for Answer(P,DB): Runs in PTIME in size(P), size(db): PTIME complexity Answer(P, DB) = f(P, root(DB)) Where: f(e, x) = {x} f(L.P, x) = {f(P,y) | (x,L,y) edges(DB)}
6
Regular Path Expressions
R ::= label | _ | R.R | (R|R) | R* | R+ | R? Examples: Bib.(paper|book).author Bib.book.author.lastname? Bib.book.(references)*.author Bib.(_)*.zip
7
Applications of Regular Path Expressions
Navigating uncertain structure: Bib.book.author.lastname? Syntactic substitute for inheritance: Bib.(paper|book).author Better: Bib.publication.author, but we don’t have inheritance
8
Applications of Regular Path Expressions
Computing transitive closure: Bib.(_)*.zip = everything accessible Bib.book.(references)*.author = everything accessible via references Some regular expressions of doubtful practical use: (references.references)* = a path with an even number of references (_._)* = paths of even length (_._._.(_)?)* = paths of length (3m + 4n) for some m,n But make great examples for illustration
9
Answer of a Regular Path Expression
Recall: Lang(R) = the set of words P generated by R Answer of regular path expressions: Answer(R,DB) = {Answer(P,DB) | P Lang(R)} Need an evaluation algorithm that copes with cycles
10
Regular Path Expressions
Recall: each regular expression NDFA Example: R = (a.a)*.a.b A = a states(A) = {s1,s2,s3,s4} initial(A) = s1 terminal(A) = {s4} s1 s2 a a b s3 s4
11
Regular Path Expressions
Canonical Evaluation Algorithm Answer(R,DB): construct A from R construct product automaton G = A x DB: nodes(G) = states(A) x nodes(db) edges(G) = {((s,x),L,(s’,x’) | (s,L,s’) edges(A), (x,L,x’) edges(DB)} root(G) = (initial(A), root(DB)) compute Gacc = set of nodes accessible from root(G) return {x | s terminal(A) s.t. (s,x) Gacc}
12
Regular Path Expressions
Example: R = _.(_._)*.a A = DB = s1 s2 s3 _ a &o1 &o2 &o3 &o4 a b Answer of R on DB = { &o2, &o3}
13
Compute Product Automaton G
_ a _ s1,&o1 s2,&o1 s3,&o1 a a a s1,&o2 s2,&o2 s3,&o2 a a a a a a s1,&o3 s1,&o4 s2,&o3 s2,&o4 s3,&o3 s3,&o4 b b b
14
Compute Accessible Part Gacc
_ a _ s1,&o1 s2,&o1 s3,&o1 a a a s1,&o2 s2,&o2 s3,&o2 a a a a a a s1,&o3 s1,&o4 s2,&o3 s2,&o4 s3,&o3 s3,&o4 b b b Answer(R,DB) = {&o2, &o3}
15
Complexity of Regular Path Expressions
The evaluation algorithm runs in PTIME in size(R), size(DB) Even when there are cycles in DB
Similar presentations
© 2025 SlidePlayer.com. Inc.
All rights reserved.