
1 Schema-Free XQuery
Based on the work of Yunyao Li, Cong Yu and H. V. Jagadish, from the University of Michigan.
Presented by Gil Barash in the course SDBI 05.

2 Content
What is XQuery
The problem of Schema-Based queries
MLCAS
Integrating MLCAS with XQuery
Conclusion

3 XQuery
XQuery is an XML query language, sometimes referred to as the SQL of XML files.
It is built on XPath expressions.
It is supported by all major database engines.
At the time of this talk it was about to become a W3C standard (XQuery 1.0 became a W3C Recommendation in January 2007).

4 XPath
XPath is used to navigate through XML documents.
Before we can write an XQuery query, we should first get familiar with XPath …

5 Bibliography XML (version 1)
The document, drawn as a tree:
  bibliography
    bib
      year: 1999
      article
        title: SQL
        author: Bob
      book
        title: XML
        author: Mary
    bib
      year: 2000
      article
        title: D.B.
        author: David
      book
        title: .NET
        author: Bill

6 XPath - example
The expression: /bibliography/bib/*
Will return the nodes: year, article and book.
Reading the expression piece by piece:
  /                 look from the root of the document
  bibliography/bib  under the path "bibliography/bib"
  /*                for all child nodes

7 XPath - example
The expression: /bibliography//title
Will return both the titles "SQL" and "XML".
Reading the expression piece by piece:
  /bibliography  for all child nodes of the root which are named "bibliography"
  //             look for any descendant (not only direct children)
  title          for the nodes named "title"

8 XPath - example
The expression: //bib[1]
Will return the subtree rooted at the first "bib".
Reading the expression piece by piece:
  //      look anywhere in the document
  bib[1]  for the 1st bib node
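The three XPath examples can be tried out with a few lines of Python. This is only a sketch: it uses the limited XPath subset of the standard library's xml.etree.ElementTree, the XML text is rebuilt from the tree on slide 5, and because ElementTree paths are relative to an element, "/bibliography/bib/*" is written as "./bib/*" from the root element.

```python
import xml.etree.ElementTree as ET

# Version 1 of the bibliography, rebuilt from the tree on slide 5.
DOC = """
<bibliography>
  <bib>
    <year>1999</year>
    <article><title>SQL</title><author>Bob</author></article>
    <book><title>XML</title><author>Mary</author></book>
  </bib>
  <bib>
    <year>2000</year>
    <article><title>D.B.</title><author>David</author></article>
    <book><title>.NET</title><author>Bill</author></book>
  </bib>
</bibliography>
"""
root = ET.fromstring(DOC)  # root is the <bibliography> element

# /bibliography/bib/*  : every child of every bib
children = [e.tag for e in root.findall("./bib/*")]

# /bibliography//title : every title at any depth
titles = [e.text for e in root.findall(".//title")]

# //bib[1]             : the first bib among its siblings
first_bib = root.findall(".//bib[1]")

print(children)
print(titles)
print(first_bib[0].find("year").text)
```

On the full two-bib document the first expression returns the children of both bib nodes, and the second returns all four titles, not only "SQL" and "XML".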

9 XQuery queries
Suppose we want to find the title of the book of which Mary is an author. Our query will be:
  for $x in doc("doc.xml")/bibliography/bib/book
  where $x/author/text() = "Mary"
  return $x/title
(XQuery keywords are case-sensitive and written in lower case.)

10 XQuery - example
  for $x in doc("doc.xml")/bibliography/bib/book
For all subtrees (marked as $x) in the document "doc.xml" under the XPath /bibliography/bib/book.
  where $x/author/text() = "Mary"
If in the subtree $x there is a path /author, and the text of the node at the end of the path is "Mary".

11 XQuery - example
  return $x/title
Return the node which is under the path /title from the $x subtree.

12 Bibliography XML (version 1)
(Diagram: the version 1 bibliography tree, here with Mary as the author of both the article "SQL" and the book "XML" in the first bib.)
  for $x in doc("doc.xml")/bibliography/bib/book
  where $x/author/text() = "Mary"
  return $x/title
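To make the for / where / return steps concrete, here is a rough Python equivalent of the query. It is a sketch, not an XQuery engine: the function name titles_of_books_by is made up for illustration, and the XML is rebuilt from the slide 12 variant of the tree, in which Mary wrote both the article and the book.

```python
import xml.etree.ElementTree as ET

# Slide 12 variant: Mary is the author of the article AND of the book.
DOC = """
<bibliography>
  <bib>
    <year>1999</year>
    <article><title>SQL</title><author>Mary</author></article>
    <book><title>XML</title><author>Mary</author></book>
  </bib>
  <bib>
    <year>2000</year>
    <article><title>D.B.</title><author>David</author></article>
    <book><title>.NET</title><author>Bill</author></book>
  </bib>
</bibliography>
"""

def titles_of_books_by(root, name):
    """Hypothetical helper mirroring the three clauses of the query."""
    result = []
    # for $x in doc("doc.xml")/bibliography/bib/book
    for x in root.findall("./bib/book"):
        # where $x/author/text() = name
        if any(a.text == name for a in x.findall("author")):
            # return $x/title
            result.extend(t.text for t in x.findall("title"))
    return result

root = ET.fromstring(DOC)
print(titles_of_books_by(root, "Mary"))
```

Only "XML" comes back: the article "SQL" is also by Mary, but the path in the for clause looks at book nodes only.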

13 XQuery - example
Suppose we want to find the authors that wrote a book with Mary.
(Diagram: a bibliography tree in which the book "D.B." in the second bib lists two authors, David and Bill.)

14 XQuery - example
Suppose we want to find the authors that wrote a book with Mary.
  for $b in doc("doc.xml")/bibliography/bib/book,
      $a in $b/author
  where $b/author/text() = "Mary" and $a/text() != "Mary"
  return $a

15 XQuery - example
  for $b in doc("doc.xml")/bibliography/bib/book,
      $a in $b/author
– For all subtrees (marked as $b) in the document "doc.xml" under the XPath /bibliography/bib/book
– And all subtrees (marked as $a) in the tree $b under the XPath /author
Ahhh … so $b is a book and $a is an author of that book.

16 XQuery - example
  where $b/author/text() = "Mary" and $a/text() != "Mary"
– If $b contains a path /author ending with "Mary"
– And $a isn't "Mary"
  return $a
Return the subtree $a.
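The two-variable query can be mirrored in Python in the same way. This is a sketch under assumptions: the sample document is made up for illustration (a book co-written by Mary and Bob), and coauthors is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data: the first book has two authors.
DOC = """
<bibliography>
  <bib>
    <year>1999</year>
    <book><title>XML</title><author>Mary</author><author>Bob</author></book>
  </bib>
  <bib>
    <year>2000</year>
    <book><title>D.B.</title><author>David</author><author>Bill</author></book>
  </bib>
</bibliography>
"""

def coauthors(root, name):
    """Authors who share a book with `name`, following the query clause by clause."""
    result = []
    for b in root.findall("./bib/book"):            # $b is a book
        book_authors = [a.text for a in b.findall("author")]
        for a in book_authors:                      # $a is an author of that book
            # where $b/author/text() = name and $a/text() != name
            if name in book_authors and a != name:
                result.append(a)                    # return $a
    return result

root = ET.fromstring(DOC)
print(coauthors(root, "Mary"))
```

Note that the fixed path ./bib/book is still spelled out; that dependency on the structure is what the next part of the talk is about.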

17 Content
What is XQuery
The problem of Schema-Based queries
MLCAS
Integrating MLCAS with XQuery
Conclusion

18 The Schema-Based problem
Remember the first query?
  for $x in doc("doc.xml")/bibliography/bib/book
  where $x/author/text() = "Mary"
  return $x/title
We wanted to find the title of a book of which Mary is an author.
We never said that it would be under the path /bibliography/bib/book.

19 The Schema-Based problem
Furthermore, suppose we want to get the year of the book that Mary wrote …
(XML fragment: a bib with the year 1999 and the publication "SQL" by Mary …)
Notice that the year of the book IS NOT a descendant node of the book node, but of the bib node.

20 The Schema-Based problem
Before (getting the title):
  for $x in doc("doc.xml")/bibliography/bib/book
  where $x/author/text() = "Mary"
  return $x/title
After (getting the year):
  for $x in doc("doc.xml")/bibliography/bib
  where $x/book/author/text() = "Mary"
  return $x/year
$x is now the bib node. If there exists a book written by Mary under that bib, then the year of that bib is returned.

21 The Schema-Based problem
We could never have written that query without knowing the structure of the XML file.
The query we wrote will not work on other files that represent the same data under a different structure.

22 Bibliography XML (version 2)
(Diagram, before and after: the same kind of bibliography data, with the entries 1999 / SQL / Bob and 2000 / D.B. / David, arranged under a different structure than version 1.)

23 The Schema-Based problem
Our query (getting the year) from before:
  for $x in doc("doc.xml")/bibliography/bib
  where $x/book/author/text() = "Mary"
  return $x/year
(Diagram: the version 2 bibliography tree.)
$x is a "bib" node, and in this version it has no child named year, so the query returns nothing.
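The breakage is easy to reproduce. In the sketch below, the exact version 2 layout is an assumption (the transcript does not preserve the diagram): the year is placed under book instead of under bib. The data and the helper name year_of_books_by are illustrative only.

```python
import xml.etree.ElementTree as ET

# Version 1 layout: year is a child of bib.
V1 = ("<bibliography><bib><year>1999</year>"
      "<book><title>SQL</title><author>Mary</author></book>"
      "</bib></bibliography>")

# ASSUMED version 2 layout: same data, but year sits inside book.
V2 = ("<bibliography><bib>"
      "<book><year>1999</year><title>SQL</title><author>Mary</author></book>"
      "</bib></bibliography>")

def year_of_books_by(root, name):
    """Mirror of the 'getting the year' query; it hard-codes the version 1 paths."""
    years = []
    for x in root.findall("./bib"):                             # $x is a bib node
        if any(a.text == name for a in x.findall("./book/author")):
            years.extend(y.text for y in x.findall("year"))     # return $x/year
    return years

print(year_of_books_by(ET.fromstring(V1), "Mary"))
print(year_of_books_by(ET.fromstring(V2), "Mary"))
```

The same function finds the year in the first layout and finds nothing in the second, even though both documents carry the same information.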

24 3 kinds of people …
If the user has FULL knowledge of the structure, she can simply use XQuery.
If the user has NO knowledge of the structure, she can use keyword-based queries (like XKeyword).
If the user has PARTIAL knowledge of the structure, she can use schema-free queries, and make good use of her knowledge.

25 Partial knowledge
Suppose you want to search for all the books about Albert Einstein …
With a keyword-based search, you enter the keywords "Albert Einstein".
Now, what if you want all the books written by Albert Einstein?
Your query will not change, even though you know what you are really looking for.

26 XQuery with partial knowledge
Suppose we want to find the title and year of the publications of which Mary is an author:
  for $a in doc("doc.xml")//author,
      $b in doc("doc.xml")//title,
      $c in doc("doc.xml")//year
  where $a/text() = "Mary"
  return { $b, $c }
All we know are the names of the nodes we are looking for.

27 XQuery with partial knowledge
(Diagram: the version 1 bibliography tree, here with Mary as the author of both "SQL" and "XML" in the first bib.)
  for $a in doc("doc.xml")//author,
      $b in doc("doc.xml")//title,
      $c in doc("doc.xml")//year
  where $a/text() = "Mary"
  return { $b, $c }
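A quick Python sketch shows why this query, as written, is not good enough: the three variables range over the whole document independently, so the result is a cross product. The XML is rebuilt from the slide 5 tree (Mary is the author of "XML" only); the variable names are illustrative.

```python
import xml.etree.ElementTree as ET
from itertools import product

DOC = """
<bibliography>
  <bib>
    <year>1999</year>
    <article><title>SQL</title><author>Bob</author></article>
    <book><title>XML</title><author>Mary</author></book>
  </bib>
  <bib>
    <year>2000</year>
    <article><title>D.B.</title><author>David</author></article>
    <book><title>.NET</title><author>Bill</author></book>
  </bib>
</bibliography>
"""
root = ET.fromstring(DOC)

# $a in //author where text() = "Mary", $b in //title, $c in //year
marys = [a for a in root.iter("author") if a.text == "Mary"]
titles = list(root.iter("title"))
years = list(root.iter("year"))

# Nothing ties $b and $c to the publication of $a, so every combination qualifies.
results = [(b.text, c.text) for a, b, c in product(marys, titles, years)]

print(len(results))
print((".NET", "1999") in results)
```

One Mary, four titles and two years give eight answers, including pairs such as ".NET" with 1999 that have nothing to do with Mary.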

28 Content
What is XQuery
The problem of Schema-Based queries
MLCAS
–LCA
–MLCA
–MLCAS
Integrating MLCAS with XQuery
Conclusion

29 LCA
We would like to guess which part of the XML document is relevant for our search.
By reducing the XML tree, we would get more precise answers and avoid wrong ones.
(Diagram: the version 1 bibliography tree.)

30 LCA
Lowest Common Ancestor
(Diagram: the first bib of the bibliography tree: year 1999, article "SQL" by Bob, book "XML" by Mary.)
What is the LCA of "title" and "author"?

31 LCA
Lowest Common Ancestor
The LCA of "author" and "title" (for an author and a title of the same book): "book" is the root of the tree we should look within.

32 LCA
Lowest Common Ancestor
The LCA of "author" and "title" (for an author and a title taken from different publications): "bib" doesn't help us refine our search.
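A small Python sketch of the LCA computation, for readers who want to check the two cases above. ElementTree elements do not store a parent pointer, so the sketch first builds a child-to-parent map; the helper names are illustrative.

```python
import xml.etree.ElementTree as ET

# The first bib from the diagram.
DOC = """
<bibliography>
  <bib>
    <year>1999</year>
    <article><title>SQL</title><author>Bob</author></article>
    <book><title>XML</title><author>Mary</author></book>
  </bib>
</bibliography>
"""
root = ET.fromstring(DOC)

# child -> parent lookup table
parents = {child: parent for parent in root.iter() for child in parent}

def lca(a, b):
    """Lowest common ancestor: collect a's ancestors, then climb from b."""
    seen = set()
    while a is not None:
        seen.add(a)
        a = parents.get(a)      # None once we step above the root
    while b not in seen:
        b = parents[b]
    return b

titles = {t.text: t for t in root.iter("title")}
authors = {a.text: a for a in root.iter("author")}

print(lca(titles["XML"], authors["Mary"]).tag)   # same book
print(lca(titles["XML"], authors["Bob"]).tag)    # different publications
```

The first pair meets at "book", the second only at "bib", which matches slides 31 and 32.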

33 Content
What is XQuery
The problem of Schema-Based queries
MLCAS
–LCA
–MLCA
–MLCAS
Integrating MLCAS with XQuery
Conclusion

34 MLCA
Blindly computing the LCA might bring undesired results.
What we are looking for is the Meaningful Lowest Common Ancestor.

35 Entity Type
The type of a node is its tag name.
(Diagram: the first bib of the bibliography tree, with the nodes of the "title" type highlighted.)

36 Meaningfully Related
Consider two nodes "A" and "B", of type "T1" and "T2" respectively.
If one node is an ancestor of the other (diagram: A above B), we say that A and B are meaningfully related.
If both nodes share a common ancestor C (diagram: A and B below C), we say that A and B are related, being descendants of node C.
So far, this is much like LCA …

37 Meaningfully Related
There is an exception to the second case:
Suppose that node B* is of the same type as B.
(Diagram: D has the children C and B; C has the children A and B*. Example: D = bib, C = book, A = Author, B* = the book's Title, B = another Title under bib.)
In this case, nodes "A" and "B" are NOT meaningfully related.

38 MLCA
So we say that a node "D" is the MLCA of nodes "A" and "B" (of types "T1" and "T2") if:
–"D" is a common ancestor of nodes "A" and "B".
–There is no node "C" that is the LCA of types "T1" and "T2" and is a descendant of node "D".
(Diagram: the tree from the previous slide, with the relation between A and B crossed out.)

39 MLCA
For multiple nodes, we require that every subset has an MLCA, and that the MLCA of the whole set is an ancestor of the MLCAs of the subsets.
For example, if we are looking at the types year, title and author:
–bib is the MLCA of the types year, title and author
–book is the MLCA of the types title and author
(Diagram: the second bib subtree, with the year 2000 and the publications "D.B." by David and ".NET" by Bill.)
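The "meaningfully related" test for a pair of nodes can be sketched in Python as follows. Two points are assumptions of this sketch rather than something the slides state: the check is applied in both directions (a closer node of A's type also disqualifies the pair), and a node is never compared with itself. Function names are illustrative.

```python
import xml.etree.ElementTree as ET

DOC = """
<bibliography>
  <bib>
    <year>1999</year>
    <article><title>SQL</title><author>Bob</author></article>
    <book><title>XML</title><author>Mary</author></book>
  </bib>
  <bib>
    <year>2000</year>
    <article><title>D.B.</title><author>David</author></article>
    <book><title>.NET</title><author>Bill</author></book>
  </bib>
</bibliography>
"""
root = ET.fromstring(DOC)
parents = {child: parent for parent in root.iter() for child in parent}

def lca(a, b):
    seen = set()
    while a is not None:
        seen.add(a)
        a = parents.get(a)
    while b not in seen:
        b = parents[b]
    return b

def meaningfully_related(a, b):
    """True if no node of the same type pairs up strictly below lca(a, b)."""
    d = lca(a, b)
    below_d = set(d.iter()) - {d}            # proper descendants of D
    for b_star in root.iter(b.tag):          # the B* exception from slide 37
        if b_star is not a and lca(a, b_star) in below_d:
            return False
    for a_star in root.iter(a.tag):          # assumed symmetric check
        if a_star is not b and lca(a_star, b) in below_d:
            return False
    return True

node = {e.text: e for e in root.iter() if len(e) == 0}   # leaf elements by text
print(meaningfully_related(node["Mary"], node["XML"]))
print(meaningfully_related(node["Mary"], node["SQL"]))
print(meaningfully_related(node["Mary"], node["1999"]))
print(meaningfully_related(node["Mary"], node["2000"]))
```

Mary is related to "XML" (they meet at book) and to 1999 (they meet at bib, and no other year is closer), but not to "SQL" or to 2000.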

40 MLCA
Let's try the query again …
  for $a in doc("doc.xml")//author,
      $b in doc("doc.xml")//title,
      $c in doc("doc.xml")//year
  where $a/text() = "Mary"
  return { $b, $c }

41 Bibliography XML
(Diagram: the version 1 bibliography tree.)
  for $a in doc("doc.xml")//author,
      $b in doc("doc.xml")//title,
      $c in doc("doc.xml")//year
  where $a/text() = "Mary"
  return { $b, $c }
"bib" is the MLCA of "author", "title" and "year".
With "author" = Mary, the results are: year 1999 with title SQL, and year 1999 with title XML.

42 Content
What is XQuery
The problem of Schema-Based queries
MLCAS
–LCA
–MLCA
–MLCAS
Integrating MLCAS with XQuery
Conclusion

43 MLCAS
The result of the query was almost right.
The problem was that "bib" is the MLCA of several groups of nodes which satisfy the query.
To solve this, we use the Meaningful Lowest Common Ancestor Structure.
Nodes requested: Title, Author, Year.
(Diagram: the first bib subtree, next to the two results: year 1999 with title SQL, and year 1999 with title XML.)

44 MLCAS
Given a set of types {t_1, …, t_m} from the query,
an MLCAS is a set of nodes {r, a_1, …, a_m},
where {a_1, …, a_m} are nodes matching the types {t_1, …, t_m},
and r is the MLCA of {a_1, …, a_m}.

45 MLCAS example
We are looking for the types Author, Title and Year.
bib nodes are the MLCA of the types Author, Title, Year; book is the MLCA of the types Title, Author.
For each set of nodes matching those types, we look for the MLCA of the set:
–{David, SQL, 1999}: there is none. bibliography is the LCA of the nodes David, SQL, 1999. So this set isn't good for us.
–{Mary, SQL, 1999}: bib is the LCA of the nodes Mary, SQL, while book is the MLCA of the types Title, Author. So this set isn't good for us.
–{Bob, SQL, 1999}: a bib node is the MLCA of the set. So this set is good for us.
(Diagram: the bibliography tree with its two bib subtrees, bib[1] and bib[2].)
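The example can be checked with a brute-force Python sketch that tries every combination of author, title and year nodes and keeps those whose members are pairwise meaningfully related. Using the pairwise test is this sketch's reading of the multi-node rule on slide 39, not a quote from the paper, and the function names are illustrative.

```python
import xml.etree.ElementTree as ET
from itertools import combinations, product

DOC = """
<bibliography>
  <bib>
    <year>1999</year>
    <article><title>SQL</title><author>Bob</author></article>
    <book><title>XML</title><author>Mary</author></book>
  </bib>
  <bib>
    <year>2000</year>
    <article><title>D.B.</title><author>David</author></article>
    <book><title>.NET</title><author>Bill</author></book>
  </bib>
</bibliography>
"""
root = ET.fromstring(DOC)
parents = {child: parent for parent in root.iter() for child in parent}

def lca(a, b):
    seen = set()
    while a is not None:
        seen.add(a)
        a = parents.get(a)
    while b not in seen:
        b = parents[b]
    return b

def meaningfully_related(a, b):
    d = lca(a, b)
    below_d = set(d.iter()) - {d}
    closer_b = any(x is not a and lca(a, x) in below_d for x in root.iter(b.tag))
    closer_a = any(x is not b and lca(x, b) in below_d for x in root.iter(a.tag))
    return not (closer_a or closer_b)

def mlcas(types):
    """All node tuples (one node per type) that are pairwise meaningfully related."""
    pools = [list(root.iter(t)) for t in types]
    return [nodes for nodes in product(*pools)
            if all(meaningfully_related(x, y) for x, y in combinations(nodes, 2))]

groups = mlcas(["author", "title", "year"])
answers = [(t.text, y.text) for a, t, y in groups if a.text == "Mary"]
print([(a.text, t.text, y.text) for a, t, y in groups])
print(answers)
```

On this document the sketch keeps exactly one group per publication, so for Mary only ("XML", "1999") remains, while {Bob, SQL, 1999} is kept and {David, SQL, 1999} and {Mary, SQL, 1999} are rejected, as on the slide.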

46 MLCAS query example
  for $a in doc("doc.xml")//year,
      $b in doc("doc.xml")//title,
      $c in doc("doc.xml")//author
  where $c/text() = "Mary"
  return { $a, $b }
(Diagram: the first bib subtree, showing how the year, title and author nodes are grouped per author, Bob and Mary.)

47 Other work on creating meaningful results
"Integrating Keyword Search into XML Query Processing (XML-QL)" - Daniela Florescu and Ioana Manolescu from INRIA Rocquencourt, France and Donald Kossmann from Univ. of Passau, Germany.
–Use of hierarchical location in the XML (at what level the keyword should be).
–Use of semantic location in the XML (tag name, CDATA, attribute …).
–Use of the user's knowledge of the structure of the XML file (e.g. if she knows that books are under the bib tag, she can ask for those elements only).

48 Other work on creating meaningful results
"XSEarch: A Semantic Search Engine for XML" - Sara Cohen, Jonathan Mamou, Yaron Kanza and Yehoshua Sagiv from the Hebrew University.
–Enables the user to specify a tag name under which the keyword should be found.
–Use of the fact that if the shortest path between two elements goes through the same tag name more than once, they are probably not meaningfully related.
–Gives ranking to the results.
(Diagram: a bib subtree with the publications "D.B." by David and ".NET" by Bill.)

49 Content
What is XQuery
The problem of Schema-Based queries
MLCAS
Integrating MLCAS with XQuery
–mlcas
–Expand
Conclusion

50 Integrating MLCAS with XQuery
In order to integrate MLCAS into XQuery we will introduce a new function into XQuery: mlcas (surprising, isn't it?)
Whenever we want to make sure that the nodes exist in an MLCAS, we will add the condition:
  exists mlcas($a, $b, $c)
(exists is a built-in function in XQuery.)

51 Query example number 1
Find the title and year of the publications of which Mary is an author.
  for $a in doc("doc.xml")//author,
      $b in doc("doc.xml")//title,
      $c in doc("doc.xml")//year
  where $a/text() = "Mary" and exists mlcas($a, $b, $c)
  return { $b, $c }
This will make sure that the "author", "title" and "year" that we get really belong to the same publication.

52 Query example number 2
Find additional authors of the publications of which Mary is an author.
  for $a in doc("doc.xml")//author,
      $b in doc("doc.xml")//author
  where $a/text() = "Mary" and $a != $b and exists mlcas($a, $b)
  return $b
This will make sure that both authors really belong to the same publication.
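Query example number 2 can be imitated in Python with the pairwise test as a stand-in for "exists mlcas($a, $b)". Everything here is a sketch under assumptions: the sample data (a book by Mary and Bob, an article by Ann) is made up, the stand-in function is not the paper's implementation, and because both variables have the same type the test skips comparing a node with itself.

```python
import xml.etree.ElementTree as ET

# Hypothetical data: Mary and Bob share a book; Ann wrote an article in the same bib.
DOC = """
<bibliography>
  <bib>
    <year>1999</year>
    <article><title>SQL</title><author>Ann</author></article>
    <book><title>XML</title><author>Mary</author><author>Bob</author></book>
  </bib>
  <bib>
    <year>2000</year>
    <book><title>.NET</title><author>Bill</author></book>
  </bib>
</bibliography>
"""
root = ET.fromstring(DOC)
parents = {child: parent for parent in root.iter() for child in parent}

def lca(a, b):
    seen = set()
    while a is not None:
        seen.add(a)
        a = parents.get(a)
    while b not in seen:
        b = parents[b]
    return b

def exists_mlcas(a, b):
    """Stand-in for 'exists mlcas($a, $b)': no same-typed node pairs up lower."""
    d = lca(a, b)
    below_d = set(d.iter()) - {d}
    closer_b = any(x is not a and lca(a, x) in below_d for x in root.iter(b.tag))
    closer_a = any(x is not b and lca(x, b) in below_d for x in root.iter(a.tag))
    return not (closer_a or closer_b)

# for $a in //author, $b in //author
# where $a/text() = "Mary" and $a != $b and exists mlcas($a, $b) return $b
coauthors = [b.text
             for a in root.iter("author")
             for b in root.iter("author")
             if a.text == "Mary" and a.text != b.text and exists_mlcas(a, b)]
print(coauthors)
```

No path such as /bibliography/bib/book appears in the query, yet Ann (same bib, different publication) and Bill (different bib) are filtered out.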

53 Query example number 3
Find the year and author of the publications with titles similar to a publication of which Mary is an author.
  for $a in doc("doc.xml")//author,
      $t in doc("doc.xml")//title,
      $y in doc("doc.xml")//year,
      $t2 in { for $aM in doc("doc.xml")//author,
                   $tM in doc("doc.xml")//title
               where $aM/text() = "Mary" and exists mlcas($aM, $tM)
               return $tM }
  where $t ≈ $t2 and exists mlcas($y, $a, $t)
  return { $y, $a }

54 Not integrated enough?
The user who wants to use the MLCAS feature has to add the line
  and exists mlcas($a, $b, …)
to the where clause.
This might not be simple enough, especially when changing an already existing query.

55 The mlcas keyword
The keyword mlcas will be used to ask the system to use MLCAS when choosing nodes:
  for $a in mlcas doc("doc.xml")//author,
      $b in mlcas doc("doc.xml")//title
  where $a/text() = "Mary"
  return $b
The explicit condition "and exists mlcas($a, $b)" is no longer needed in the where clause.

56 Some we know
Suppose you do know that you are interested only in the first "bib" node.
You can make use of your knowledge …
(Diagram: the version 1 bibliography tree.)

57 Some we know
  for $b in doc("doc.xml")//bib[1],
      $a in mlcas $b//author,
      $t in mlcas $b//title
  return { $a, $t }
(Diagram: the version 1 bibliography tree.)

58 Some we know
(Diagram: the same query shown against the version 1 bibliography tree.)

59 Content
What is XQuery
The problem of Schema-Based queries
MLCAS
Integrating MLCAS with XQuery
–mlcas
–Expand
Conclusion

60 Many ways to say …
There are different tag names that represent the same thing.
  Author: Author / Writer / Au
  Title: Title / Name / Headline
Fewer than 20% of people choose the same term for a single well-known object.
Our partial knowledge of the XML file must still be accurate about how the file tags the information we want.

61 The expand keyword
To solve this issue, we will include yet another keyword: expand
Whenever we are not sure of the exact tag name, we can use the expand keyword to find it for us.
  for $a in mlcas doc("doc.xml")//expand(author),
      $b in mlcas doc("doc.xml")//title
  where $a/text() = "Mary"
  return $b

62 The expand keyword
The synonyms of a word can be found using a domain-specific thesaurus developed by domain experts, or using WordNet.
Another application is an ontology-driven hierarchical thesaurus. For example, use the word "publication" to get both "book" and "article" tags.
Think of other applications where this can be useful. (Google?)
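The idea behind expand can be sketched in Python with a tiny hand-written thesaurus. The dictionary below is a stand-in for a real domain thesaurus or WordNet and contains only the synonyms listed on slide 60 plus the "publication" example; the function names and the sample document are illustrative assumptions.

```python
import xml.etree.ElementTree as ET

# Hand-written stand-in for a thesaurus (synonyms taken from slide 60).
THESAURUS = {
    "author": {"author", "writer", "au"},
    "title": {"title", "name", "headline"},
    # Hierarchical entry: a broader term expands to narrower tags.
    "publication": {"book", "article"},
}

def expand(tag):
    """Return the set of tag names to accept for `tag` (itself if unknown)."""
    return THESAURUS.get(tag.lower(), {tag.lower()})

def find_expanded(root, tag):
    """Like //expand(tag): every element whose tag is an accepted synonym."""
    names = expand(tag)
    return [e for e in root.iter() if e.tag.lower() in names]

# Hypothetical document that uses different tag names than the query expects.
DOC = """
<bib>
  <book><headline>XML</headline><writer>Mary</writer></book>
  <article><name>SQL</name><au>Bob</au></article>
</bib>
"""
root = ET.fromstring(DOC)

print([e.text for e in find_expanded(root, "author")])
print([e.text for e in find_expanded(root, "title")])
print([e.tag for e in find_expanded(root, "publication")])
```

A query written with "author" and "title" still finds the writer / au and headline / name elements, which is the point of the keyword.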

63 Ontology-based Query Processing
"An Ontology for Domain-oriented Semantic Similarity Search On XML Data" - Anja Theobald from the University of the Saarland, Germany.
–Use of tag name and keyword similarity.
–Use of WordNet and Google to give a ranking to how similar objects are.
   WordNet is used to get synonyms or broader terms.
   Google is used to get a rank of how close two terms are.
–Gives ranking to the results.

64 Ontology-based Query Processing
(Taken from "The Index Based XXL Search Engine for Querying XML Data with Relevance Ranking" by Anja Theobald and Gerhard Weikum, University of the Saarland, Germany.)

65 Ontology-based Query Processing
(Taken from a presentation of Anja Theobald, 26.02.03)
XXL query: ... WHERE #.~universe AS U AND U.#.~appearance AS A AND U.#.S ~ "star"
(Figure: the XXL query representation, with the nodes ~universe, ~appearance and ~ "star" connected by % wildcards, matched against an XML data graph with the nodes galaxy, object, description ("…light and heat…"), sun, appearance, location and history. Similarity terms shown: sim(universe, galaxy), sim(app, app) and sim(star, sun) * tfidf(sun); the values 0.94, 1.0 and 0.43 appear alongside them.)

66 Content
What is XQuery
The problem of Schema-Based queries
MLCAS
Integrating MLCAS with XQuery
Conclusion

67 Conclusion
We wanted to find a way to get accurate results from an XML file whose structure we don't know.
We used the MLCAS concept to get meaningful results.
We integrated this ability into an already existing query language.

68 Thank you Questions?

69 Computing MLCAS
One could implement MLCAS computation using the definition of MLCAS:
–"D" is an MLCA for nodes "A" and "B", of types "T1" and "T2" respectively, if:
   "D" is a common ancestor of nodes "A" and "B".
   There is no node "C" that is the LCA of types "T1" and "T2" and is a descendant of node "D".
Take each pair {n1, n2} where "n1" and "n2" are of types "T1" and "T2" respectively.
Find their LCA by going up from both nodes until you find a common ancestor, and produce a tree, rooted at the LCA, with n1 and n2 as its leaves.
For each pair of trees that you found (TA and TB), if the root of TA is a descendant of the root of TB, remove TB.
–Because TB contradicts the second rule: there is no node "C" that is the LCA of types "T1" and "T2" and is a descendant of node "D".
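The steps above can be written down almost literally in Python. This is a naive sketch of the procedure as the slide describes it (all pairs, then pruning), not an efficient algorithm; a "tree" is represented simply as a (root, n1, n2) triple and the function names are illustrative.

```python
import xml.etree.ElementTree as ET

DOC = """
<bibliography>
  <bib>
    <year>1999</year>
    <article><title>SQL</title><author>Bob</author></article>
    <book><title>XML</title><author>Mary</author></book>
  </bib>
  <bib>
    <year>2000</year>
    <article><title>D.B.</title><author>David</author></article>
    <book><title>.NET</title><author>Bill</author></book>
  </bib>
</bibliography>
"""
root = ET.fromstring(DOC)
parents = {child: parent for parent in root.iter() for child in parent}

def lca(a, b):
    """Go up from both nodes until a common ancestor is found."""
    seen = set()
    while a is not None:
        seen.add(a)
        a = parents.get(a)
    while b not in seen:
        b = parents[b]
    return b

def mlca_pairs(t1, t2):
    # Step 1: one candidate tree (root, n1, n2) per pair of nodes of types t1, t2.
    trees = [(lca(n1, n2), n1, n2)
             for n1 in root.iter(t1) for n2 in root.iter(t2)]
    roots = {r for r, _, _ in trees}

    # Step 2: remove TB whenever some TA is rooted at a descendant of TB's root.
    def has_lower_root(r):
        return any(d is not r and d in roots for d in r.iter())

    return [(r, n1, n2) for r, n1, n2 in trees if not has_lower_root(r)]

result = sorted((r.tag, n1.text, n2.text) for r, n1, n2 in mlca_pairs("title", "author"))
for row in result:
    print(row)
```

Of the 16 title/author pairs, the 12 whose LCA is a bib or the bibliography node are pruned, leaving one pair per publication. The pairwise LCA step alone is quadratic in the number of matching nodes, which is why this is only a way to understand the definition.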

