Management of XML and Semistructured Data Lecture 5: Query Languages Wednesday, 4/1/2001
Strudel and StruQL
Strudel = a Web site management tool
Idea: separate the following three tasks
– Management of data: use some database
– Management of the site’s structure: use StruQL
– Management of the site’s presentation: use HTML templates (this was before XML...)
Example: Bibliography Data
Input data:
  {Bib: { paper: { author: “Jones”, author: “Smith”, title: “The Comma”, year: 1994 },
          paper: { author: “Jones”, title: “The Dot”, year: 1998 },
          paper: { author: “Mark”, .... },
          ... } }
[Figure: the same data drawn as a tree, with Bib at the top, paper nodes below it, and author, title, and year edges leading to leaves such as “Jones”, “Smith”, “The Comma”, ...]
Simple Website Definition in StruQL
StruQL query:
  WHERE  Root -> “Bib.paper.author” -> A
  CREATE Root(), HomePage(A)
  LINK   Root() -> “person” -> HomePage(A),
         HomePage(A) -> “name” -> A,
         HomePage(A) -> “home” -> Root()
Result:
[Figure: Root() has a “person” edge to each of HomePage(“Smith”), HomePage(“Jones”), HomePage(“Mark”); each HomePage has a “name” edge to its author string (“Smith”, “Jones”, “Mark”) and a “home” edge back to Root().]
Root(), HomePage(A) = Skolem functions (more later)
Complex Website Definition in StruQL
  WHERE  Root -> “Bib” -> X, X -> “paper” -> P,
         P -> “author” -> A, P -> “title” -> T, P -> “year” -> Y
  CREATE Root(), HomePage(A), YearPage(A,Y), PubPage(P)
  LINK   Root() -> “person” -> HomePage(A),
         HomePage(A) -> “yearentry” -> YearPage(A,Y),
         YearPage(A,Y) -> “publication” -> PubPage(P),
         PubPage(P) -> “author” -> HomePage(A),
         PubPage(P) -> “title” -> T
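Example: A Complex Web Site
[Figure: the resulting site graph. Root() has “person” edges to HomePage(“Smith”), HomePage(“Jones”), HomePage(“Mark”). Each HomePage has “yearentry” edges to its year pages: YearPage(“Smith”, 1994), YearPage(“Smith”, 1996), YearPage(“Jones”, 1994), YearPage(“Jones”, 1998), YearPage(“Mark”, 1996). Year pages have “publication” edges to PubPage(“The Comma”), PubPage(“The Dot”), ...; each PubPage has “author” edges back to home pages and a “title” edge to its title (“The Comma”, “The Dot”).]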
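The complex site definition can be sketched in Python as follows. This is only an illustration under stated assumptions, not Strudel's implementation: the input is a small list of dictionaries modeled on the bibliography example, the `id` field and the helper names `skolem` and `build_site` are hypothetical, and a Skolem term is modeled as a tuple so that equal arguments always name the same page.

```python
# Assumed input, modeled on the bibliography example ("id" is a made-up node identifier).
PAPERS = [
    {"id": "p1", "author": ["Jones", "Smith"], "title": "The Comma", "year": 1994},
    {"id": "p2", "author": ["Jones"], "title": "The Dot", "year": 1998},
]

def skolem(name, *args):
    """A Skolem term: the same name and arguments always denote the same page."""
    return (name,) + args

def build_site(papers):
    """One pass over the WHERE bindings (P, A, T, Y); CREATE and LINK add pages and edges."""
    root = skolem("Root")
    pages, links = {root}, set()
    for p in papers:
        for a in p["author"]:
            t, y = p["title"], p["year"]
            home = skolem("HomePage", a)
            year_page = skolem("YearPage", a, y)
            pub = skolem("PubPage", p["id"])
            pages |= {home, year_page, pub}
            links |= {
                (root, "person", home),
                (home, "yearentry", year_page),
                (year_page, "publication", pub),
                (pub, "author", home),
                (pub, "title", t),
            }
    return pages, links

pages, links = build_site(PAPERS)
```

Because pages and links are sets, a binding that yields an already created page such as HomePage(“Jones”) adds nothing new, which is exactly the grouping effect of the Skolem functions.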
Skolem Functions
– Maier, 1986: in OO systems
– Kifer et al., 1989: F-logic
– Hull and Yoshikawa, 1990: deductive db (ILOG)
– Papakonstantinou et al., 1996: semistructured db (MSL)
Skolem Functions in Logic
Origins: first-order logic
The satisfiability problem: given a formula φ, does it have a model?
Skolem Functions in Logic
Example: does φ have a model?
Skolem functions: replace ∃ with functions, drop ∀
Fact: φ has a model iff φ’ has a model
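As a worked illustration of Skolemization, take a hypothetical formula (chosen here for illustration, not the one from the slide). Each existentially quantified variable is replaced by a new function of the universally quantified variables in whose scope it appears, and the remaining variables are read as implicitly universal:

```latex
\begin{align*}
\varphi  &= \forall x\, \exists y\, \forall z\, \exists u\; \bigl( P(x, y, z) \land Q(z, u) \bigr) \\
\varphi' &= P\bigl(x, f(x), z\bigr) \land Q\bigl(z, g(x, z)\bigr)
\end{align*}
```

Here y depends only on x, so it becomes f(x), while u depends on both x and z, so it becomes g(x, z). The two formulas are not equivalent, but one has a model exactly when the other does.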
Skolem Functions in Databases
Recall Datalog:
  Answer(title, author) :- Paper(author, title, year)
Means:
  ∀author. ∀title. ∀year. (Paper(author, title, year) → Answer(title, author))
Skolem Functions in Databases
Now consider:
  Answer(author, x) :- Paper(author, title, year)
I want to “create a new object x”. What is the meaning?
Skolem Functions in Databases
Better: use Skolem functions directly in Datalog
Choices:
  Answer(author, NewObj(author)) :- Paper(author, title, year)
  Answer(author, NewObj(author,title)) :- Paper(author, title, year)
  Answer(author, NewObj(title,year)) :- Paper(author, title, year)
  Answer(author, NewObj()) :- Paper(author, title, year)
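The four choices differ in how many new objects they create. A small Python sketch makes this concrete, assuming a toy Paper relation modeled on the bibliography example; the helpers `answer` and `count_objects` are hypothetical names, and a Skolem term is modeled as a tuple.

```python
# Paper(author, title, year): a toy relation (assumed data).
PAPER = [
    ("Jones", "The Comma", 1994),
    ("Smith", "The Comma", 1994),
    ("Jones", "The Dot", 1998),
]

def answer(skolem_args):
    """Evaluate Answer(author, NewObj(args)) :- Paper(author, title, year).

    skolem_args lists which rule variables are passed to NewObj. Equal
    arguments give the same object, different arguments a different one.
    """
    result = set()
    for author, title, year in PAPER:
        binding = {"author": author, "title": title, "year": year}
        new_obj = ("NewObj",) + tuple(binding[v] for v in skolem_args)
        result.add((author, new_obj))
    return result

def count_objects(skolem_args):
    """Number of distinct new objects the rule creates."""
    return len({obj for _, obj in answer(skolem_args)})

per_author = count_objects(["author"])                  # one object per author
per_author_title = count_objects(["author", "title"])   # one per (author, title) pair
per_title_year = count_objects(["title", "year"])       # one per (title, year) pair
single = count_objects([])                              # one object shared by all
```

On this data the counts are 2, 3, 2, and 1 respectively: the arguments of the Skolem function decide how the new objects are grouped.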
Skolem Functions in StruQL
StruQL’s semantics:
– Input graph: (Node, Edge)
– Output graph: (Node’, Edge’)
Example:
  WHERE  Root -> “Bib.paper.author” -> A
  CREATE Root(), HomePage(A)
  LINK   Root() -> “person” -> HomePage(A),
         HomePage(A) -> “name” -> A,
         HomePage(A) -> “home” -> Root()
Meaning:
  Node’(Root()) :-
  Node’(HomePage(A)) :- Edge(Root,Bib,X), Edge(X,paper,Y), Edge(Y,author,A)
  Edge’(Root(),person,HomePage(A)) :- Edge(Root,Bib,X), Edge(X,paper,Y), Edge(Y,author,A)
  Edge’(HomePage(A),name,A) :- Edge(Root,Bib,X), Edge(X,paper,Y), Edge(Y,author,A)
  Edge’(HomePage(A),home,Root()) :- Edge(Root,Bib,X), Edge(X,paper,Y), Edge(Y,author,A)
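These rules can be evaluated directly over an edge relation. The following Python sketch assumes a small input graph modeled on the bibliography example; the node identifiers (`root`, `bib`, `p1`, `p2`) and the function names are made up for illustration, and this is not how Strudel itself is implemented.

```python
# Edge(source, label, target) for the input graph (assumed data).
EDGES = {
    ("root", "Bib", "bib"),
    ("bib", "paper", "p1"), ("bib", "paper", "p2"),
    ("p1", "author", "Jones"), ("p1", "author", "Smith"),
    ("p1", "title", "The Comma"), ("p1", "year", 1994),
    ("p2", "author", "Jones"),
    ("p2", "title", "The Dot"), ("p2", "year", 1998),
}

def authors(edges, root="root"):
    """Bindings of A for Edge(Root,Bib,X), Edge(X,paper,Y), Edge(Y,author,A)."""
    return {
        a
        for (s1, l1, x) in edges if s1 == root and l1 == "Bib"
        for (s2, l2, y) in edges if s2 == x and l2 == "paper"
        for (s3, l3, a) in edges if s3 == y and l3 == "author"
    }

def evaluate(edges):
    """Compute Node' and Edge'; Skolem terms are tuples such as ("HomePage", "Jones")."""
    root_page = ("Root",)
    new_nodes, new_edges = {root_page}, set()
    for a in authors(edges):
        home = ("HomePage", a)
        new_nodes.add(home)
        new_edges |= {
            (root_page, "person", home),
            (home, "name", a),
            (home, "home", root_page),
        }
    return new_nodes, new_edges

nodes, links = evaluate(EDGES)
```

“Jones” is bound twice by the body, once per paper, but HomePage(“Jones”) is created only once because both bindings produce the same Skolem term.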
XPath (11/99)
Building block for other W3C standards:
– XSL Transformations (XSLT)
– XML Link (XLink)
– XML Pointer (XPointer)
– XML Query
Was originally part of XSL
Example for XPath Queries
<bib>
  <book>
    <publisher> Addison-Wesley </publisher>
    <author> Serge Abiteboul </author>
    <author> <firstname> Rick </firstname> <lastname> Hull </lastname> </author>
    <author> Victor Vianu </author>
    <title> Foundations of Databases </title>
    <year> 1995 </year>
  </book>
  <book>
    <publisher> Freeman </publisher>
    <author> Jeffrey D. Ullman </author>
    <title> Principles of Database and Knowledge Base Systems </title>
    <year> 1998 </year>
  </book>
</bib>
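Data Model for XPath
[Figure: the document as a tree. The root is at the top, the root element bib is below it, then book nodes with children publisher, author, ..., and text leaves such as “Addison-Wesley” and “Serge Abiteboul”.]
Much like the XQuery data model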
XPath: Simple Expressions
/bib/book/year
  Result: <year> 1995 </year> <year> 1998 </year>
/bib/paper/year
  Result: empty (there were no papers)
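A minimal Python sketch of these two queries, using the standard library's xml.etree.ElementTree, which implements only a subset of XPath. The sample document is an assumed reconstruction of the bibliography example. ElementTree evaluates paths relative to the element it is called on, so /bib/book/year is written as book/year on the bib element.

```python
import xml.etree.ElementTree as ET

# Assumed sample data modeled on the bibliography example.
BIB = ET.fromstring("""
<bib>
  <book><publisher>Addison-Wesley</publisher><author>Serge Abiteboul</author>
        <title>Foundations of Databases</title><year>1995</year></book>
  <book><publisher>Freeman</publisher><author>Jeffrey D. Ullman</author>
        <title>Principles of Database and Knowledge Base Systems</title><year>1998</year></book>
</bib>
""")

# /bib/book/year: BIB is the bib element, so the path starts at its children.
years = [y.text for y in BIB.findall("book/year")]

# /bib/paper/year: there is no paper element, so nothing matches.
paper_years = BIB.findall("paper/year")
```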
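XPath: Restricted Kleene Closure
//author
  Result: <author> Serge Abiteboul </author>
          <author> <firstname> Rick </firstname> <lastname> Hull </lastname> </author>
          <author> Victor Vianu </author>
          <author> Jeffrey D. Ullman </author>
/bib//firstname
  Result: <firstname> Rick </firstname>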
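The descendant step can be tried with Python's xml.etree.ElementTree, where // must be written relative to a start element as .//. This is a sketch on assumed sample data; the element names firstname and lastname and the helper `text_of` are assumptions made for the example.

```python
import xml.etree.ElementTree as ET

# Assumed sample data modeled on the bibliography example.
BIB = ET.fromstring("""
<bib>
  <book>
    <author>Serge Abiteboul</author>
    <author><firstname>Rick</firstname> <lastname>Hull</lastname></author>
    <author>Victor Vianu</author>
  </book>
  <book>
    <author>Jeffrey D. Ullman</author>
  </book>
</bib>
""")

def text_of(elem):
    """All text below elem, with runs of whitespace collapsed."""
    return " ".join("".join(elem.itertext()).split())

# //author: author elements at any depth, in document order.
authors = [text_of(a) for a in BIB.findall(".//author")]

# /bib//firstname: firstname elements at any depth below bib.
first_names = [f.text for f in BIB.findall(".//firstname")]
```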
XPath: Text Nodes
/bib/book/author/text()
  Result: Serge Abiteboul, Victor Vianu, Jeffrey D. Ullman
Rick Hull doesn’t appear because his author element contains firstname and lastname elements rather than text
Functions in XPath:
– text() = matches the text value
– node() = matches any node (= * or text())
– name() = returns the name of the current tag
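Python's xml.etree.ElementTree has no text() step, but the effect of the query can be approximated with the .text property, which holds the text directly inside an element before its first child. The sketch below uses assumed sample data and only emulates the query; it is not a general text() implementation.

```python
import xml.etree.ElementTree as ET

# Assumed sample data modeled on the bibliography example.
BIB = ET.fromstring("""
<bib>
  <book>
    <author>Serge Abiteboul</author>
    <author><firstname>Rick</firstname><lastname>Hull</lastname></author>
    <author>Victor Vianu</author>
  </book>
  <book>
    <author>Jeffrey D. Ullman</author>
  </book>
</bib>
""")

# Emulates /bib/book/author/text(): keep only authors that directly contain text.
direct_text = [
    a.text.strip()
    for a in BIB.findall("book/author")
    if a.text and a.text.strip()
]
```

The structured author contributes nothing, because its text lives one level down, inside firstname and lastname.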
XPath: Wildcard
//author/*
  Result: <firstname> Rick </firstname> <lastname> Hull </lastname>
* matches any element
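The wildcard is one of the XPath features that Python's xml.etree.ElementTree does support. A short sketch on assumed sample data (the element names firstname and lastname are assumptions):

```python
import xml.etree.ElementTree as ET

# Assumed sample data modeled on the bibliography example.
BIB = ET.fromstring("""
<bib>
  <book>
    <author>Serge Abiteboul</author>
    <author><firstname>Rick</firstname><lastname>Hull</lastname></author>
  </book>
</bib>
""")

# //author/*: every element child of every author. Text-only authors have none.
children = [(c.tag, c.text) for c in BIB.findall(".//author/*")]
```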
XPath: Attribute Nodes
/bib/book/@price
  Result: the price attribute of each book that has one
@price means that price has to be an attribute
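Python's xml.etree.ElementTree cannot return attribute nodes from a path, but it can filter on them with [@price] and read the value with get(). In this sketch the price value "55" and its placement on the second book are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Assumed sample data; the price attribute and its value are hypothetical.
BIB = ET.fromstring("""
<bib>
  <book><title>Foundations of Databases</title></book>
  <book price="55"><title>Principles of Database and Knowledge Base Systems</title></book>
</bib>
""")

# Emulates /bib/book/@price: select books that carry the attribute, then read it.
priced_books = BIB.findall("book[@price]")
prices = [b.get("price") for b in priced_books]
```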
XPath: Qualifiers
/bib/book/author[firstname]
  Result: <author> <firstname> Rick </firstname> <lastname> Hull </lastname> </author>
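A qualifier that tests for a child element is also supported by Python's xml.etree.ElementTree. A sketch on assumed sample data:

```python
import xml.etree.ElementTree as ET

# Assumed sample data modeled on the bibliography example.
BIB = ET.fromstring("""
<bib>
  <book>
    <author>Serge Abiteboul</author>
    <author><firstname>Rick</firstname><lastname>Hull</lastname></author>
    <author>Victor Vianu</author>
  </book>
</bib>
""")

# /bib/book/author[firstname]: authors that have a firstname child.
structured = BIB.findall("book/author[firstname]")
names = [(a.findtext("firstname"), a.findtext("lastname")) for a in structured]
```

The qualifier only filters: the result is the author element itself, not the firstname child that was tested.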
XPath: More Qualifiers
/bib/book/author[firstname][address[//zip][city]]/lastname
  Result: … …
XPath: More Qualifiers
/bib/book[@price < “60”]
/bib/book[author/@age < “25”]
/bib/book[author/text()]
XPath: Summary
  bib               matches a bib element
  *                 matches any element
  /                 matches the root element
  /bib              matches a bib element under root
  bib/paper         matches a paper in bib
  bib//paper        matches a paper in bib, at any depth
  //paper           matches a paper at any depth
  paper|book        matches a paper or a book
  @price            matches a price attribute
  bib/book/@price   matches a price attribute in book, in bib
  …                 matches …
XPath: More Details
An XPath expression, p, establishes a relation between:
– A context node, and
– A node in the answer set
In other words, p denotes a function from a context node to a set of nodes:
– S[p] : Nodes -> {Nodes}
Examples:
– author/firstname
– . = self
– .. = parent
– part/*/*/subpart/../name = part/*/*[subpart]/name
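The last equivalence can be checked with Python's xml.etree.ElementTree, which supports both the parent step and child-existence qualifiers. The catalog document below is hypothetical; only the names part, subpart, and name come from the example, the rest are made up.

```python
import xml.etree.ElementTree as ET

# Hypothetical parts catalog; group and assembly are made-up element names.
DOC = ET.fromstring("""
<catalog>
  <part>
    <group>
      <assembly><name>engine</name><subpart>piston</subpart></assembly>
      <assembly><name>frame</name></assembly>
    </group>
  </part>
</catalog>
""")

# Go down to subpart, step back up to its parent, then take name.
via_parent = [e.text for e in DOC.findall("part/*/*/subpart/../name")]

# Keep only grandchildren of part that have a subpart, then take name.
via_qualifier = [e.text for e in DOC.findall("part/*/*[subpart]/name")]
```

Both paths return the name of the one assembly that has a subpart and skip the other.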
The Root and the Root
– bib is the “document element”
– The “root” is above bib
– /bib = returns the document element
– / = returns the root
Why? Because we may have comments before and after the bib element; they become siblings of bib
This is advanced xmlogy
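The distinction shows up in a DOM parser. The following sketch uses Python's xml.dom.minidom, where the Document node plays the role of the root and documentElement is the document element; the sample document is made up.

```python
from xml.dom import minidom

# A made-up document with comments before and after the document element.
doc = minidom.parseString("<!-- before --><bib><book/></bib><!-- after -->")

# Children of the root: the two comments are siblings of bib.
root_children = [n.nodeName for n in doc.childNodes]

# The document element is the single element child of the root.
document_element = doc.documentElement.nodeName
```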
XPath: More Details
We can navigate along 13 axes:
– ancestor
– ancestor-or-self
– attribute
– child
– descendant
– descendant-or-self
– following
– following-sibling
– namespace
– parent
– preceding
– preceding-sibling
– self
XPath: More Details
Examples:
– child::author/child::lastname = author/lastname
– child::author/descendant::zip = author//zip
– child::author/parent::* = author/..
– child::author/attribute::age = author/@age
What does this mean?
– paper/publisher/parent::*/author
– /bib//address[ancestor::book]
– /bib//author/ancestor::*//zip
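Python's xml.etree.ElementTree does not implement the named axes, but the ancestor axis is easy to emulate with a parent map. The sketch below evaluates /bib//address[ancestor::book] by hand; the document, the zip values, and the helper `ancestors` are all made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical data: one address inside a book, one outside any book.
DOC = ET.fromstring("""
<bib>
  <book>
    <author><lastname>Hull</lastname><address><zip>98195</zip></address></author>
  </book>
  <publisher><address><zip>10001</zip></address></publisher>
</bib>
""")

# ElementTree elements do not know their parent, so build the map once.
PARENT = {child: parent for parent in DOC.iter() for child in parent}

def ancestors(node):
    """Yield the ancestors of node, nearest first (the ancestor axis)."""
    while node in PARENT:
        node = PARENT[node]
        yield node

# /bib//address[ancestor::book]: addresses that have a book somewhere above them.
in_books = [
    a for a in DOC.iter("address")
    if any(n.tag == "book" for n in ancestors(a))
]
zips = [a.findtext("zip") for a in in_books]
```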
XPath: Even More Details
name() = the name of the current node
– /bib//*[name()=“book”] is the same as /bib//book
What does this mean?
– /bib//*[ancestor::*[name()!=“book”]]
– In a different notation: bib.[^book]*._
Navigation axes give us strictly more power!