
Presentation transcript:

1 Web Data Management Path Expressions

2 In this lecture
Path expressions
Regular path expressions
Evaluation techniques
Resources: Data on the Web (Abiteboul, Buneman, Suciu), section 4.1

3 Path Expressions
Examples:
–Bib.paper
–Bib.book.publisher
–Bib.paper.author.lastname
Given an OEM instance, the answer of a path expression p is a set of objects.

4 Path Expressions
Examples, evaluated on the OEM instance DB:
[Figure: DB drawn as an edge-labeled graph. Object identifiers: &o1, &o12, &o24, &o29, &o43, &o44 to &o52, &o70, &o71, &96, &243, &206, &25. Edge labels: Bib, paper, book, references, author, title, year, http, publisher, page, firstname, lastname, first, last. Atomic values: “Serge”, “Abiteboul”, 1997, “Victor”, “Vianu”.]
Bib.paper = {&o12, &o29}
Bib.book.publisher = {&o51}
Bib.paper.author.lastname = {&o71, &206}

5 Answer of a Path Expression
Simple evaluation algorithm for Answer(P, DB):
–Runs in PTIME in size(P) and size(DB)
Answer(P, DB) = f(P, root(DB))
where:
f(ε, x) = {x}
f(L.P, x) = ∪ {f(P, y) | (x, L, y) ∈ edges(DB)}
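The recursive definition of f can be sketched in Python as follows. This is a minimal sketch, not the lecture's own code: it evaluates f one label at a time over a set of current objects, which computes the same result as the recursion while visiting each (label position, object) pair only once. The function name `answer`, the triple encoding of edges, and the toy edge list `EDGES` are assumptions made for illustration. The edge list is a hypothetical fragment chosen to agree with the answers listed on slide 4, not a transcription of the figure.

```python
def answer(path, edges, root):
    """Evaluate a simple path expression such as 'Bib.paper'.

    edges is a list of (source, label, target) triples; root is the entry object.
    """
    labels = path.split(".") if path else []

    # Index the graph: out[(x, L)] = every y with (x, L, y) in edges.
    out = {}
    for x, label, y in edges:
        out.setdefault((x, label), set()).add(y)

    current = {root}                      # f(ε, x) = {x}
    for label in labels:                  # f(L.P, x) = union of f(P, y) over edges (x, L, y)
        current = {y for x in current for y in out.get((x, label), set())}
    return current


# Hypothetical fragment of an OEM instance (assumed, for illustration only).
EDGES = [
    ("root", "Bib", "&o1"),
    ("&o1", "paper", "&o12"),
    ("&o1", "book", "&o24"),
    ("&o1", "paper", "&o29"),
    ("&o24", "publisher", "&o51"),
]

print(answer("Bib.paper", EDGES, "root"))
print(answer("Bib.book.publisher", EDGES, "root"))
```

Each step touches at most every edge once, so the running time is bounded by size(P) × size(DB), which is in line with the PTIME claim on the slide.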

6 Regular Path Expressions
R ::= label | _ | R.R | (R|R) | R* | R+ | R?
Examples:
–Bib.(paper|book).author
–Bib.book.author.lastname?
–Bib.book.(references)*.author
–Bib.(_)*.zip

7 Applications of Regular Path Expressions
Navigating uncertain structure:
–Bib.book.author.lastname?
Syntactic substitution for inheritance:
–Bib.(paper|book).author
–Better: Bib.publication.author, but we don’t have inheritance

8 Applications of Regular Path Expressions
Computing transitive closure:
–Bib.(_)*.zip = every zip object accessible from Bib at any depth
–Bib.book.(references)*.author = the authors of every book accessible via a chain of references
Some regular expressions of doubtful practical use:
–(references.references)* = a path with an even number of references
–(_._)* = paths of even length
–(_._._.(_)?)* = paths of length (3m + 4n) for some m, n
But they make great examples for illustration.
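The length claims for the last two expressions can be checked with a short Python sketch. It relies on an assumption made only for this illustration: every edge label is encoded as a single character, so the wildcard _ of the path language corresponds to `.` in Python's `re` syntax and concatenation is implicit. The names `EVEN`, `THREE_OR_FOUR` and `matching_lengths` are hypothetical.

```python
import re

# One character per label: '.' in Python's re plays the role of '_' here.
EVEN = re.compile(r"(?:..)*")              # (_._)*
THREE_OR_FOUR = re.compile(r"(?:....?)*")  # (_._._.(_)?)*


def matching_lengths(pattern, max_len):
    """Return every path length n <= max_len that the pattern accepts."""
    return [n for n in range(max_len + 1) if pattern.fullmatch("x" * n)]


print(matching_lengths(EVEN, 8))            # [0, 2, 4, 6, 8]
print(matching_lengths(THREE_OR_FOUR, 12))  # [0, 3, 4, 6, 7, 8, 9, 10, 11, 12]
```

Each iteration of the second expression consumes 3 or 4 labels, so the accepted lengths are exactly the numbers of the form 3m + 4n. Lengths 1, 2 and 5 are the only ones that cannot be written that way.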

9 Answer of a Regular Path Expression
Recall:
–Lang(R) = the set of words P generated by R
Answer of a regular path expression:
–Answer(R, DB) = ∪ {Answer(P, DB) | P ∈ Lang(R)}
Lang(R) can be infinite and DB can contain cycles, so we need an evaluation algorithm that copes with cycles.

10 Regular Path Expressions
Recall: each regular expression corresponds to an NDFA
Example: R = (a.a)*.a.b
[Figure: automaton A with states s1, s2, s3, s4 and four transitions labeled a, a, a, b]
states(A) = {s1, s2, s3, s4}
initial(A) = s1
terminal(A) = {s4}
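To make the automaton concrete, here is a small Python sketch that simulates an NDFA for R = (a.a)*.a.b on a word given as a list of labels. The transition table is an assumption: the figure did not survive in the transcript, so `TRANSITIONS` is one plausible automaton with the stated states, initial state and terminal state, not necessarily the one drawn on the slide. The function name `accepts` is also hypothetical.

```python
# Assumed transitions for R = (a.a)*.a.b (a guess consistent with the slide's
# state set, initial state and terminal state; the original figure is lost).
TRANSITIONS = {
    ("s1", "a"): {"s2", "s3"},  # start another a.a pair, or take the final a
    ("s2", "a"): {"s1"},        # finish an a.a pair
    ("s3", "b"): {"s4"},        # final b
}
INITIAL = "s1"
TERMINAL = {"s4"}


def accepts(word):
    """Return True if the sequence of labels is in Lang(R)."""
    current = {INITIAL}
    for label in word:
        # Follow every transition on this label from every current state.
        current = {t for s in current for t in TRANSITIONS.get((s, label), set())}
    return bool(current & TERMINAL)


print(accepts(["a", "b"]))            # True
print(accepts(["a", "a", "a", "b"]))  # True
print(accepts(["a", "a", "b"]))       # False
```

Tracking a set of current states is what makes the nondeterminism harmless: the simulation never has to guess which a is the last one.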

11 Regular Path Expressions
Canonical Evaluation Algorithm
Answer(R, DB):
1. construct A from R
2. construct product automaton G = A × DB:
–nodes(G) = states(A) × nodes(DB)
–edges(G) = {((s,x), L, (s’,x’)) | (s,L,s’) ∈ edges(A), (x,L,x’) ∈ edges(DB)}
–root(G) = (initial(A), root(DB))
3. compute G_acc = set of nodes accessible from root(G)
4. return {x | ∃ s ∈ terminal(A) s.t. (s,x) ∈ G_acc}
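Steps 2 to 4 above can be sketched in Python as a breadth-first search over pairs (s, x). This is a minimal sketch under stated assumptions, not the lecture's implementation. Step 1 is skipped, so the automaton is written by hand. Steps 2 and 3 are merged, so only the accessible part of G is ever built. An automaton edge labeled _ is taken to match any database label. The names `answer_regular`, `A_EDGES` and `DB_EDGES` are hypothetical. The automaton is a hand-built one for R = _.(_._)*.a, and the database is an assumed four-object graph with a cycle, chosen for illustration.

```python
from collections import deque

WILDCARD = "_"


def answer_regular(a_edges, initial, terminal, db_edges, root):
    """Evaluate an automaton against an edge-labeled graph via the product G = A x DB."""
    a_out, db_out = {}, {}
    for s, label, t in a_edges:
        a_out.setdefault(s, []).append((label, t))
    for x, label, y in db_edges:
        db_out.setdefault(x, []).append((label, y))

    start = (initial, root)               # root(G) = (initial(A), root(DB))
    seen = {start}                        # G_acc, built incrementally
    queue = deque([start])
    while queue:
        s, x = queue.popleft()
        for a_label, t in a_out.get(s, []):
            for d_label, y in db_out.get(x, []):
                # A product edge exists when the labels agree (or A has a wildcard).
                if a_label in (WILDCARD, d_label) and (t, y) not in seen:
                    seen.add((t, y))      # each pair is visited once, so cycles terminate
                    queue.append((t, y))
    return {x for s, x in seen if s in terminal}


# Hand-built automaton for R = _.(_._)*.a (assumed): initial s1, terminal {s3}.
A_EDGES = [("s1", "_", "s2"), ("s2", "_", "s1"), ("s2", "a", "s3")]

# Hypothetical database with a cycle &o2 -> &o3 -> &o4 -> &o2.
DB_EDGES = [
    ("&o1", "a", "&o2"),
    ("&o2", "a", "&o3"),
    ("&o3", "b", "&o4"),
    ("&o4", "a", "&o2"),
]

print(answer_regular(A_EDGES, "s1", {"s3"}, DB_EDGES, "&o1"))
```

On this assumed graph the call returns the set containing &o2 and &o3. The `seen` set holds at most |states(A)| × |nodes(DB)| pairs, which is why the search stays polynomial even though the database is cyclic.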

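12 Regular Path Expressions
Example: R = _.(_._)*.a
[Figure: automaton A with states s1, s2, s3 and transitions labeled _, _, a]
[Figure: database DB with objects &o1, &o2, &o3, &o4 and edges labeled a, a, b, a]
Answer of R on DB = {&o2, &o3}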

13 Compute Product Automaton G
[Figure: product automaton G. Its nodes are the 12 pairs (s, x) with s in {s1, s2, s3} and x in {&o1, &o2, &o3, &o4}, drawn as three copies of DB, one per state of A.]

14 Compute Accessible Part G_acc
[Figure: the same 12 product nodes, with the part accessible from root(G) = (s1, &o1) marked]
Answer(R, DB) = {&o2, &o3}

15 Complexity of Regular Path Expressions
The evaluation algorithm runs in PTIME in size(R) and size(DB), even when there are cycles in DB.