Schema-Free XQuery. Based on the work of Yunyao Li, Cong Yu and H. V. Jagadish, from the University of Michigan.

Schema-Free XQuery. Based on the work of Yunyao Li, Cong Yu and H. V. Jagadish, from the University of Michigan. Presented by Gil Barash in the course SDBI 05.

Content: What is XQuery; The problem of Schema-Based queries; MLCAS; Integrating MLCAS with XQuery; Conclusion

XQuery
XQuery is an XML query language, sometimes referred to as the SQL of XML files. It is built on XPath expressions. It is supported by all major database engines. It will soon become a W3C standard.

XPath
XPath is used to navigate through XML documents. Before we can write an XQuery query, we should first get familiar with XPath.

Bibliography XML (version 1)
bibliography
  bib
    year: 1999
    article: title "SQL", author "Bob"
    book: title "XML", author "Mary"
  bib
    year: 2000
    article: title "D.B.", author "David"
    book: title ".NET", author "Bill"

XPath - example
The expression /bibliography/bib/* returns the year, article and book nodes:
– / : look from the root of the document
– bibliography/bib : under the path "bibliography/bib"
– /* : for all child nodes

XPath - example
The expression /bibliography//title returns both the titles "SQL" and "XML":
– /bibliography : for all child nodes of the root which are named "bibliography"
– // : look for any descendant (not only direct children)
– title : for the nodes named "title"

XPath - example
The expression //bib[1] returns the subtree rooted at the first "bib":
– // : look anywhere in the document
– bib[1] : for the first bib node
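The three XPath expressions above can be tried out with a small Python sketch. It assumes the version 1 bibliography from the earlier slide, rebuilt by hand as a string (the original markup did not survive in this transcript). It uses the standard library's xml.etree.ElementTree, which supports only a subset of XPath and evaluates paths relative to an element, so each path is rewritten relative to the bibliography root rather than with a leading slash.

```python
import xml.etree.ElementTree as ET

# Hand-rebuilt version 1 bibliography (an assumption based on the slide's tree).
DOC = """<bibliography>
  <bib>
    <year>1999</year>
    <article><title>SQL</title><author>Bob</author></article>
    <book><title>XML</title><author>Mary</author></book>
  </bib>
  <bib>
    <year>2000</year>
    <article><title>D.B.</title><author>David</author></article>
    <book><title>.NET</title><author>Bill</author></book>
  </bib>
</bibliography>"""

root = ET.fromstring(DOC)  # root is the <bibliography> element

# /bibliography/bib/*  : every child of every bib
child_tags = [e.tag for e in root.findall("./bib/*")]

# /bibliography//title : every title at any depth below bibliography
titles = [e.text for e in root.findall(".//title")]

# //bib[1]             : the first bib node
first_bib = root.findall(".//bib[1]")

print(child_tags)
print(titles)
print(first_bib[0].findtext("year"))
```

Because the full document has two bib nodes, the first two expressions return the nodes of both; the slide shows only the first bib.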

XQuery queries
Suppose we want to find the title of the book of which Mary is an author. Our query will be:
    for $x in doc("doc.xml")/bibliography/bib/book
    where $x/author/text() = "Mary"
    return $x/title

XQuery - example
    for $x in doc("doc.xml")/bibliography/bib/book
For all subtrees (bound to $x) in the document "doc.xml" under the XPath /bibliography/bib/book.
    where $x/author/text() = "Mary"
If the subtree $x contains a path /author and the text of the node at the end of that path is "Mary".

XQuery - example
    return $x/title
Return the node which is under the path /title in the $x subtree.

Bibliography XML (version 1)
bibliography
  bib
    year: 1999
    article: title "SQL", author "Mary"
    book: title "XML", author "Mary"
  bib
    year: 2000
    article: title "D.B.", author "David"
    book: title ".NET", author "Bill"
    for $x in doc("doc.xml")/bibliography/bib/book
    where $x/author/text() = "Mary"
    return $x/title
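To see what this query selects on the tree above, here is a rough Python equivalent. It is only a sketch: the function name titles_of_books_by is made up for illustration, the document is rebuilt by hand from the slide's tree, and ElementTree stands in for a real XQuery engine.

```python
import xml.etree.ElementTree as ET

# Hand-rebuilt from the slide: Mary wrote both the article and the book in the first bib.
DOC = """<bibliography>
  <bib>
    <year>1999</year>
    <article><title>SQL</title><author>Mary</author></article>
    <book><title>XML</title><author>Mary</author></book>
  </bib>
  <bib>
    <year>2000</year>
    <article><title>D.B.</title><author>David</author></article>
    <book><title>.NET</title><author>Bill</author></book>
  </bib>
</bibliography>"""

def titles_of_books_by(root, name):
    """Hypothetical helper mirroring the for / where / return clauses."""
    titles = []
    for x in root.findall("./bib/book"):                        # for $x in .../bibliography/bib/book
        if any(a.text == name for a in x.findall("author")):    # where $x/author/text() = name
            titles.extend(t.text for t in x.findall("title"))   # return $x/title
    return titles

root = ET.fromstring(DOC)
result = titles_of_books_by(root, "Mary")
print(result)
```

Only the book title is returned: the article "SQL" is also by Mary, but the path names book explicitly.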

XQuery - example
Suppose we want to find the authors that wrote a book with Mary.
bibliography
  bib
    year: 1999
    article: title "SQL", author "Mary"
    book: title "XML", author "Mary"
  bib
    year: 2000
    book: title "D.B.", author "David", author "Bill"

XQuery - example
Suppose we want to find the authors that wrote a book with Mary.
    for $b in doc("doc.xml")/bibliography/bib/book,
        $a in $b/author
    where $b/author/text() = "Mary" and $a/text() != "Mary"
    return $a

XQuery - example
    for $b in doc("doc.xml")/bibliography/bib/book,
        $a in $b/author
– For all subtrees (bound to $b) in the document "doc.xml" under the XPath /bibliography/bib/book
– And all subtrees (bound to $a) in the tree $b under the XPath /author
So $b is a book and $a is an author of that book.

XQuery - example
    where $b/author/text() = "Mary" and $a/text() != "Mary"
– If $b contains a path /author ending with "Mary"
– And $a isn't "Mary"
    return $a
Return the subtree $a.
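The two-variable query can be mirrored in Python as a nested loop. This is a sketch only: coauthors is a hypothetical helper name, and the document is rebuilt by hand from the tree on the earlier slide, where the second bib holds a book with two authors.

```python
import xml.etree.ElementTree as ET

# Hand-rebuilt from the slide's tree: the book "D.B." has two authors.
DOC = """<bibliography>
  <bib>
    <year>1999</year>
    <article><title>SQL</title><author>Mary</author></article>
    <book><title>XML</title><author>Mary</author></book>
  </bib>
  <bib>
    <year>2000</year>
    <book><title>D.B.</title><author>David</author><author>Bill</author></book>
  </bib>
</bibliography>"""

def coauthors(root, name):
    """Hypothetical helper: authors who share a book with `name`."""
    found = []
    for b in root.findall("./bib/book"):              # $b ranges over the books
        names = [a.text for a in b.findall("author")]
        for a in names:                               # $a ranges over the authors of $b
            if name in names and a != name:           # where $b/author = name and $a != name
                found.append(a)                       # return $a
    return found

root = ET.fromstring(DOC)
print(coauthors(root, "David"))
print(coauthors(root, "Mary"))
```

On this particular tree Mary is the sole author of her book, so the query for Mary comes back empty, while asking for David's co-authors yields Bill.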

Content: What is XQuery; The problem of Schema-Based queries; MLCAS; Integrating MLCAS with XQuery; Conclusion

The Schema-Based problem
Remember the first query? We wanted to find the title of a book of which Mary is an author. We never said that it would be under the path /bibliography/bib/book.
    for $x in doc("doc.xml")/bibliography/bib/book
    where $x/author/text() = "Mary"
    return $x/title

The Schema-Based problem
Furthermore, suppose we want to get the year of the book that Mary wrote. Notice that the year of the book is NOT a descendant of the book node, but of the bib node.

The Schema-Based problem
Before (getting the title):
    for $x in doc("doc.xml")/bibliography/bib/book
    where $x/author/text() = "Mary"
    return $x/title
After (getting the year):
    for $x in doc("doc.xml")/bibliography/bib
    where $x/book/author/text() = "Mary"
    return $x/year
$x is now the bib node. If there exists a book written by Mary under that bib, then the year of that bib is returned.

The Schema-Based problem
We could never have written that query without knowing the structure of the XML file. The query we wrote will not work on other files that represent the same data under a different structure.

Bibliography XML (version 2)
bibliography
  bib
    book: year 1999, title "SQL", author "Bob"
  bib
    book: year 2000, title "D.B.", author "David"

The Schema-Based problem
Our query (getting the year) from before:
    for $x in doc("doc.xml")/bibliography/bib
    where $x/book/author/text() = "Mary"
    return $x/year
bibliography
  bib
    book: year 1999, title "SQL", author "Bob"
  bib
    book: year 2000, title "D.B.", author "David"
$x is a "bib" node, and in this version it has no child named year.
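The breakage is easy to reproduce. The sketch below runs a Python stand-in for the "getting the year" query against both layouts. The two documents are hand-rebuilt from the slides' trees, and years_by_author is a made-up helper name. Since version 2 has no author called Mary, the second call asks for Bob instead.

```python
import xml.etree.ElementTree as ET

# Version 1: year is a child of bib (hand-rebuilt from the slide).
V1 = """<bibliography>
  <bib>
    <year>1999</year>
    <article><title>SQL</title><author>Mary</author></article>
    <book><title>XML</title><author>Mary</author></book>
  </bib>
</bibliography>"""

# Version 2: year moved inside book (hand-rebuilt from the slide).
V2 = """<bibliography>
  <bib><book><year>1999</year><title>SQL</title><author>Bob</author></book></bib>
  <bib><book><year>2000</year><title>D.B.</title><author>David</author></book></bib>
</bibliography>"""

def years_by_author(root, name):
    """Hypothetical stand-in for the schema-based 'getting the year' query."""
    years = []
    for x in root.findall("./bib"):                                  # for $x in .../bibliography/bib
        if any(a.text == name for a in x.findall("./book/author")):  # where $x/book/author/text() = name
            years.extend(y.text for y in x.findall("year"))          # return $x/year
    return years

v1_years = years_by_author(ET.fromstring(V1), "Mary")
v2_years = years_by_author(ET.fromstring(V2), "Bob")
print(v1_years)
print(v2_years)
```

On version 2 the where clause still matches, but the return path finds nothing because bib no longer has a year child.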

3 kinds of people …
– If the user has FULL knowledge of the structure, she can simply use XQuery.
– If the user has NO knowledge of the structure, she can use keyword-based queries (like XKeyword).
– If the user has PARTIAL knowledge of the structure, she can use schema-free queries, and make good use of her knowledge.

Partial knowledge
Suppose you want to find all the books about Albert Einstein. With a keyword-based search, you enter the keyword "Albert Einstein". Now, what if you want all the books written by Albert Einstein? Your query will not change, even though you know what you are really looking for.

XQuery with partial knowledge
Suppose we want to find the title and year of the publications of which Mary is an author:
    for $a in doc("doc.xml")//author,
        $b in doc("doc.xml")//title,
        $c in doc("doc.xml")//year
    where $a/text() = "Mary"
    return { $b, $c }
All we know are the names of the nodes we are looking for.

XQuery with partial knowledge
bibliography
  bib
    year: 1999
    article: title "SQL", author "Mary"
    book: title "XML", author "Mary"
  bib
    year: 2000
    article: title "D.B.", author "David"
    book: title ".NET", author "Bill"
    for $a in doc("doc.xml")//author,
        $b in doc("doc.xml")//title,
        $c in doc("doc.xml")//year
    where $a/text() = "Mary"
    return { $b, $c }
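Nothing in this query ties $a, $b and $c to the same publication, so it behaves like a Cartesian product filtered only on the author. The Python sketch below counts what comes back, using a hand-rebuilt copy of the tree above; it illustrates the problem and is not how an XQuery engine evaluates the query.

```python
import xml.etree.ElementTree as ET
from itertools import product

# Hand-rebuilt from the slide's tree.
DOC = """<bibliography>
  <bib>
    <year>1999</year>
    <article><title>SQL</title><author>Mary</author></article>
    <book><title>XML</title><author>Mary</author></book>
  </bib>
  <bib>
    <year>2000</year>
    <article><title>D.B.</title><author>David</author></article>
    <book><title>.NET</title><author>Bill</author></book>
  </bib>
</bibliography>"""

root = ET.fromstring(DOC)
authors = root.findall(".//author")   # $a in //author
titles = root.findall(".//title")     # $b in //title
years = root.findall(".//year")       # $c in //year

# where $a/text() = "Mary" return { $b, $c }
results = [(b.text, c.text)
           for a, b, c in product(authors, titles, years)
           if a.text == "Mary"]

print(len(results))
print((".NET", "2000") in results)
```

Every title is paired with every year once for each Mary node, including pairs such as (".NET", "2000") that have nothing to do with Mary.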

Content: What is XQuery; The problem of Schema-Based queries; MLCAS (LCA, MLCA, MLCAS); Integrating MLCAS with XQuery; Conclusion

LCA
We would like to guess which part of the XML document is relevant for our search. By reducing the XML tree, we would get more precise answers and avoid wrong ones.
bibliography
  bib
    year: 1999
    article: title "SQL", author "Mary"
    book: title "XML", author "Mary"
  bib
    year: 2000
    article: title "D.B.", author "David"
    book: title ".NET", author "Bill"

LCA: Lowest Common Ancestor
bibliography
  bib
    year: 1999
    article: title "SQL", author "Bob"
    book: title "XML", author "Mary"
What is the LCA of "title" and "author"?

LCA: Lowest Common Ancestor
bibliography
  bib
    year: 1999
    article: title "SQL", author "Bob"
    book: title "XML", author "Mary"
Here the LCA of "author" and "title" is "book": it is the root of the subtree we should look within.

LCA: Lowest Common Ancestor
bibliography
  bib
    year: 1999
    article: title "SQL", author "Bob"
    book: title "XML", author "Mary"
Here the LCA of "author" and "title" is "bib" (the author and the title come from different publications), which doesn't help us refine our search.
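Both outcomes can be reproduced with a small LCA routine. This is a sketch under a few assumptions: the tree is rebuilt by hand from the slide, ElementTree elements carry no parent pointer so a parent map is built first, and the helper names (ancestors_or_self, lca) are chosen for illustration.

```python
import xml.etree.ElementTree as ET

# Hand-rebuilt from the slide's tree.
DOC = """<bibliography>
  <bib>
    <year>1999</year>
    <article><title>SQL</title><author>Bob</author></article>
    <book><title>XML</title><author>Mary</author></book>
  </bib>
</bibliography>"""

root = ET.fromstring(DOC)
# ElementTree has no parent links, so map every child to its parent.
parent = {child: p for p in root.iter() for child in p}

def ancestors_or_self(node):
    """The node itself followed by its ancestors, up to the root."""
    chain = [node]
    while node in parent:
        node = parent[node]
        chain.append(node)
    return chain

def lca(a, b):
    """Walk up from b until we hit a node that is also on a's path to the root."""
    on_path = set(ancestors_or_self(a))
    return next(n for n in ancestors_or_self(b) if n in on_path)

book_title = root.find("./bib/book/title")
book_author = root.find("./bib/book/author")
article_title = root.find("./bib/article/title")

print(lca(book_title, book_author).tag)     # title and author of the same book
print(lca(article_title, book_author).tag)  # title and author of different publications
```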

Content: What is XQuery; The problem of Schema-Based queries; MLCAS (LCA, MLCA, MLCAS); Integrating MLCAS with XQuery; Conclusion

MLCA
Blindly computing the LCA might bring undesired results. What we are looking for is the Meaningful Lowest Common Ancestor.

Entity Type
The type of a node is its tag name.
bibliography
  bib
    year: 1999
    article: title "SQL", author "Bob"
    book: title "XML", author "Mary"
In this tree, "SQL" and "XML" are the nodes of the "title" type.

Meaningfully Related
Consider two nodes "A" and "B", of type "T1" and "T2" respectively.
– If …, we say that A and B are meaningfully related. [Diagram: nodes A and B]
– If …, we say that A and B are related, being descendants of node C. [Diagram: node C with descendants A and B]
So far, this is much like LCA …

Meaningfully Related
There is an exception to the second case. Suppose that node B* is of the same type as B.
[Diagram: D is above C and B; C is above A and B*. Example: a bib above a book and a title; the book above an author and another title.]
In this case, nodes "A" and "B" are NOT meaningfully related.

MLCA
So we say that a node "D" is the MLCA of nodes "A" and "B" (of types "T1" and "T2") if:
– "D" is a common ancestor of nodes "A" and "B".
– There is no node "C" that is the LCA of types "T1" and "T2" and is a descendant of node "D".
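The two conditions translate fairly directly into a pairwise check. The sketch below is one reading of the slide's rule, not the authors' implementation: it takes D to be the LCA of A and B, then rejects it if any other pair of the same two types has an LCA strictly below D. The tree is rebuilt by hand from the deck, and all helper names are assumptions.

```python
import xml.etree.ElementTree as ET

# Hand-rebuilt from the deck's bibliography tree.
DOC = """<bibliography>
  <bib>
    <year>1999</year>
    <article><title>SQL</title><author>Bob</author></article>
    <book><title>XML</title><author>Mary</author></book>
  </bib>
  <bib>
    <year>2000</year>
    <article><title>D.B.</title><author>David</author></article>
    <book><title>.NET</title><author>Bill</author></book>
  </bib>
</bibliography>"""

root = ET.fromstring(DOC)
parent = {child: p for p in root.iter() for child in p}

def ancestors_or_self(node):
    chain = [node]
    while node in parent:
        node = parent[node]
        chain.append(node)
    return chain

def lca(a, b):
    on_path = set(ancestors_or_self(a))
    return next(n for n in ancestors_or_self(b) if n in on_path)

def is_proper_descendant(node, ancestor):
    return node is not ancestor and ancestor in ancestors_or_self(node)

def mlca(a, b):
    """Return the LCA of a and b if it is meaningful by the slide's rule, else None."""
    d = lca(a, b)
    for a2 in root.iter(a.tag):          # every node of type T1
        for b2 in root.iter(b.tag):      # every node of type T2
            if is_proper_descendant(lca(a2, b2), d):
                return None              # a lower LCA of the same types exists under D
    return d

mary = next(e for e in root.iter("author") if e.text == "Mary")
xml_title = next(e for e in root.iter("title") if e.text == "XML")
sql_title = next(e for e in root.iter("title") if e.text == "SQL")

print(mlca(mary, xml_title).tag)   # same book
print(mlca(mary, sql_title))       # bib is their LCA, but book lies below it
```

The nested loops recompute LCAs for every pair on each call, which is fine for a toy tree but would need an index on a real document.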

MLCA
For multiple nodes, we require that all the subsets have an MLCA and that the MLCA of the whole set be an ancestor of the MLCAs of the subsets.
bib
  year: 2000
  book: title "D.B.", author "David"
  book: title ".NET", author "Bill"
For example, if we are looking at the types year, title and author:
– bib is the MLCA of the types year, title and author
– book is the MLCA of the types title and author

MLCA
Let's try the query again …
    for $a in doc("doc.xml")//author,
        $b in doc("doc.xml")//title,
        $c in doc("doc.xml")//year
    where $a/text() = "Mary"
    return { $b, $c }

Bibliography XML
bibliography
  bib
    year: 1999
    article: title "SQL", author "Bob"
    book: title "XML", author "Mary"
  bib
    year: 2000
    article: title "D.B.", author "David"
    book: title ".NET", author "Bill"
    for $a in doc("doc.xml")//author,
        $b in doc("doc.xml")//title,
        $c in doc("doc.xml")//year
    where $a/text() = "Mary"
    return { $b, $c }
"bib" is the MLCA of "author", "title" and "year". With "author" = Mary, the results are: year 1999 with title SQL, and year 1999 with title XML.

Content: What is XQuery; The problem of Schema-Based queries; MLCAS (LCA, MLCA, MLCAS); Integrating MLCAS with XQuery; Conclusion

MLCAS
The result of the query was almost right. The problem was that "bib" is the MLCA of several groups of nodes which satisfy the query. To solve this, we use the Meaningful Lowest Common Ancestor Structure.
bib
  year: 1999
  article: title "SQL", author "Bob"
  book: title "XML", author "Mary"
Nodes requested: Title, Author, Year. Results so far: year 1999 with title SQL, and year 1999 with title XML.

MLCAS
Given a set of types {t1, …, tm} from the query, an MLCAS is a set of nodes {r, a1, …, am}, where {a1, …, am} are nodes matching the types {t1, …, tm} and r is the MLCA of {a1, …, am}.

MLCAS example
We are looking for the types Author, Title and Year. The bib nodes are the MLCA of the types Author, Title, Year.
bibliography
  bib[1]
    year: 1999
    article: title "SQL", author "Bob"
    book: title "XML", author "Mary"
  bib[2]
    year: 2000
    article: title "D.B.", author "David"
    book: title ".NET", author "Bill"
Sets of nodes matching those types, and the MLCA of each set:
– {David, SQL, 1999}: there is none. bibliography is the LCA of the nodes David, SQL, 1999. So this set isn't good for us.
– {Mary, SQL, 1999}: book is the MLCA of the types Title, Author, but bib is the LCA of the nodes Mary, SQL. So this set isn't good for us.
– {Bob, SQL, 1999}: the MLCA is bib[1]. So this set is good for us.

MLCAS query example
    for $a in doc("doc.xml")//year,
        $b in doc("doc.xml")//title,
        $c in doc("doc.xml")//author
    where $c/text() = "Mary"
    return { $a, $b }
[Figure: the first bib subtree (year 1999; title SQL, author Bob; title XML, author Mary) split into two structures: {year 1999, title SQL, author Bob} and {year 1999, title XML, author Mary}.]

Other work on creating meaningful results
"Integrating Keyword Search into XML Query Processing (XML-QL)", by Daniela Florescu and Ioana Manolescu from INRIA Rocquencourt, France, and Donald Kossmann from Univ. of Passau, Germany.
– Use of hierarchical location in the XML (at what level the keyword should be).
– Use of semantic location in the XML (tag name, CDATA, attribute …).
– Use of the user's knowledge of the structure of the XML file (e.g. if she knows that books are under the bib tag, she can ask for those elements only).

Other work on creating meaningful results
"XSEarch: A Semantic Search Engine for XML", by Sara Cohen, Jonathan Mamou, Yaron Kanza and Yehoshua Sagiv from the Hebrew University.
– Enables the user to specify a tag name under which the keyword should be found.
– Uses the fact that if the shortest path between two elements goes through the same tag name more than once, they are probably not meaningfully related.
– Gives ranking to the results.
[Figure: a bib subtree with the titles "D.B." and ".NET" and the authors David and Bill.]

Content: What is XQuery; The problem of Schema-Based queries; MLCAS; Integrating MLCAS with XQuery (mlcas, expand); Conclusion

Integrating MLCAS with XQuery
To integrate MLCAS into XQuery we introduce a new function: mlcas (surprising, isn't it?). Whenever we want to make sure that the nodes belong to one MLCAS, we add the condition exists mlcas($a, $b, $c) (exists is already available in XQuery, as the built-in function fn:exists).

Query example number 1
Find the title and year of the publications of which Mary is an author.
    for $a in doc("doc.xml")//author,
        $b in doc("doc.xml")//title,
        $c in doc("doc.xml")//year
    where $a/text() = "Mary" and exists mlcas($a, $b, $c)
    return { $b, $c }
This makes sure that the "author", "title" and "year" that we get really belong to the same publication.
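The effect of the added condition can be simulated in Python by filtering the Cartesian product with an MLCAS test. This is a sketch with stated simplifications: in_mlcas is a hypothetical helper that treats a group of nodes as one MLCAS when every pair in it has an MLCA, which is one reading of the multi-node rule from the earlier slide. The tree is rebuilt by hand, and none of this is the authors' implementation.

```python
import xml.etree.ElementTree as ET
from itertools import combinations, product

# Hand-rebuilt from the deck's bibliography tree.
DOC = """<bibliography>
  <bib>
    <year>1999</year>
    <article><title>SQL</title><author>Bob</author></article>
    <book><title>XML</title><author>Mary</author></book>
  </bib>
  <bib>
    <year>2000</year>
    <article><title>D.B.</title><author>David</author></article>
    <book><title>.NET</title><author>Bill</author></book>
  </bib>
</bibliography>"""

root = ET.fromstring(DOC)
parent = {child: p for p in root.iter() for child in p}

def ancestors_or_self(node):
    chain = [node]
    while node in parent:
        node = parent[node]
        chain.append(node)
    return chain

def lca(a, b):
    on_path = set(ancestors_or_self(a))
    return next(n for n in ancestors_or_self(b) if n in on_path)

def mlca(a, b):
    """LCA of a and b, or None if a pair of the same types has a lower LCA beneath it."""
    d = lca(a, b)
    for a2 in root.iter(a.tag):
        for b2 in root.iter(b.tag):
            c = lca(a2, b2)
            if c is not d and d in ancestors_or_self(c):
                return None
    return d

def in_mlcas(*nodes):
    """Simplified test (an assumption): every pair of nodes must have an MLCA."""
    return all(mlca(x, y) is not None for x, y in combinations(nodes, 2))

authors = root.findall(".//author")
titles = root.findall(".//title")
years = root.findall(".//year")

# where $a/text() = "Mary" and exists mlcas($a, $b, $c) return { $b, $c }
results = [(b.text, c.text)
           for a, b, c in product(authors, titles, years)
           if a.text == "Mary" and in_mlcas(a, b, c)]
print(results)
```

Pairs that cross publication boundaries, such as Mary with the title "SQL" or with the year 2000, are filtered out because a lower LCA for the same two types exists.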

Query example number 2
Find additional authors of the publications of which Mary is an author.
    for $a in doc("doc.xml")//author,
        $b in doc("doc.xml")//author
    where $a/text() = "Mary" and $a != $b and exists mlcas($a, $b)
    return $b
This makes sure that both authors really belong to the same publication.

Query example number 3
Find the year and author of the publications with titles similar to a publication of which Mary is an author.
    for $a in doc("doc.xml")//author,
        $t in doc("doc.xml")//title,
        $y in doc("doc.xml")//year,
        $t2 in { for $aM in doc("doc.xml")//author,
                     $tM in doc("doc.xml")//title
                 where $aM/text() = "Mary" and exists mlcas($aM, $tM)
                 return $tM }
    where $t ≈ $t2 and exists mlcas($y, $a, $t)
    return { $y, $a }

Not integrated enough?
A user who wants the MLCAS feature has to add the line "and exists mlcas($a, $b, …)" to the where clause. This might not be simple enough, especially when changing an already existing query.

The mlcas keyword
The keyword mlcas is used to ask the system to use MLCAS when choosing nodes:
    for $a in mlcas doc("doc.xml")//author,
        $b in mlcas doc("doc.xml")//title
    where $a/text() = "Mary"
    return $b
The condition "and exists mlcas($a, $b)" is no longer needed in the where clause.

Some we know
Suppose you do know that you are interested only in the first "bib" node. You can make use of your knowledge …
bibliography
  bib
    year: 1999
    article: title "SQL", author "Bob"
    book: title "XML", author "Mary"
  bib
    year: 2000
    article: title "D.B.", author "David"
    book: title ".NET", author "Bill"

Some we know
    for $b in doc("doc.xml")//bib[1],
        $a in mlcas $b//author,
        $t in mlcas $b//title
    return { $a, $t }
bibliography
  bib
    year: 1999
    article: title "SQL", author "Bob"
    book: title "XML", author "Mary"
  bib
    year: 2000
    article: title "D.B.", author "David"
    book: title ".NET", author "Bill"

Content: What is XQuery; The problem of Schema-Based queries; MLCAS; Integrating MLCAS with XQuery (mlcas, expand); Conclusion

Many ways to say …
There are different tag names that represent the same thing:
– Author: Author / Writer / Au
– Title: Title / Name / Headline
Less than 20% of people choose the same term for a single well-known object. So even our partial knowledge of the XML file still has to be accurate about how it tags the information we want.

The expand keyword
To solve this issue, we include yet another keyword: expand. Whenever we are not sure of the exact tag name, we can use the expand keyword to find it for us.
    for $a in mlcas doc("doc.xml")//expand(author),
        $b in mlcas doc("doc.xml")//title
    where $a/text() = "Mary"
    return $b

The expand keyword
The synonyms of a word can be found using a domain-specific thesaurus (developed by domain experts) or WordNet. Another application is an ontology-driven hierarchical thesaurus: for example, use the word "publication" to get both "book" and "article" tags. Think of other applications where this can be useful (Google?).
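A minimal way to picture expand is a lookup in a synonym table followed by a tag match. The sketch below assumes a tiny hand-written THESAURUS dictionary, seeded with the synonyms listed on the earlier slide, in place of a real thesaurus or WordNet. The expand function and the sample document are likewise illustrative only.

```python
import xml.etree.ElementTree as ET

# Hypothetical synonym table standing in for a domain thesaurus or WordNet.
THESAURUS = {
    "author": {"author", "writer", "au"},
    "title": {"title", "name", "headline"},
}

# Illustrative document whose tags differ from the ones the user has in mind.
DOC = """<bibliography>
  <book><headline>XML</headline><writer>Mary</writer></book>
  <book><title>SQL</title><au>Bob</au></book>
</bibliography>"""

def expand(root, term):
    """Return every element whose tag is the term or one of its listed synonyms."""
    names = THESAURUS.get(term, {term})
    return [e for e in root.iter() if e.tag.lower() in names]

root = ET.fromstring(DOC)
author_texts = [e.text for e in expand(root, "author")]
title_texts = [e.text for e in expand(root, "title")]
print(author_texts)
print(title_texts)
```

A term that is missing from the table simply falls back to an exact tag match, so expand never returns less than the plain path would.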

Ontology-based Query Processing
"An Ontology for Domain-oriented Semantic Similarity Search on XML Data", by Anja Theobald from the University of the Saarland, Germany.
– Use of tag name and keyword similarity.
– Use of WordNet and Google to rank how similar objects are: WordNet is used to get synonyms or broader terms; Google is used to rank how close two terms are.
– Gives ranking to the results.

Ontology-based Query Processing Taken from “ The Index Based XXL Search Engine for Querying XML Data with Relevance Ranking ” by: Anja Theobald, Gerhard Weikum University of the Saarland, Germany

Ontology-based Query Processing (taken from a presentation of Anja Theobald)
XXL query: ... WHERE #.~universe AS U AND U.#.~appearance AS A AND U.#.S ~ "star"
[Figure: the XXL query representation (~universe, ~appearance, ~"star") matched against an XML data graph (galaxy, object, description "…light and heat…", sun, appearance, location, history), annotated with similarity scores such as sim(universe, galaxy), sim(app, app) = 1.0 and sim(star, sun) * tfidf(sun).]

Content: What is XQuery; The problem of Schema-Based queries; MLCAS; Integrating MLCAS with XQuery; Conclusion

Conclusion
We wanted to find a way to get accurate results from an XML file whose structure we don't know. We used the MLCAS concept to get meaningful results. We integrated this ability into an already existing query language.

Thank you Questions?

Computing MLCAS
One could implement MLCAS computation using the definition of MLCA: "D" is an MLCA for nodes "A" and "B", of types "T1" and "T2" respectively, if:
– "D" is a common ancestor of nodes "A" and "B".
– There is no node "C" that is the LCA of types "T1" and "T2" and is a descendant of node "D".
The steps:
– Take each pair {n1, n2} where "n1" and "n2" are of types "T1" and "T2" respectively.
– Find their LCA by going up from both nodes until you find a common ancestor, and produce a tree rooted at the LCA with n1 and n2 as its leaves.
– For each pair of trees that you found (TA and TB), if the root of TA is a descendant of the root of TB, remove TB, because TB contradicts the second rule above.
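The steps above can be sketched in Python as a brute-force pass over all pairs. This follows the slide's description literally and makes no attempt at efficiency. The tree is rebuilt by hand from the deck, and mlca_pairs and the other helper names are assumptions made for this illustration.

```python
import xml.etree.ElementTree as ET

# Hand-rebuilt from the deck's bibliography tree.
DOC = """<bibliography>
  <bib>
    <year>1999</year>
    <article><title>SQL</title><author>Bob</author></article>
    <book><title>XML</title><author>Mary</author></book>
  </bib>
  <bib>
    <year>2000</year>
    <article><title>D.B.</title><author>David</author></article>
    <book><title>.NET</title><author>Bill</author></book>
  </bib>
</bibliography>"""

root = ET.fromstring(DOC)
parent = {child: p for p in root.iter() for child in p}

def ancestors_or_self(node):
    chain = [node]
    while node in parent:
        node = parent[node]
        chain.append(node)
    return chain

def lca(a, b):
    # Go up from both nodes until a common ancestor is found.
    on_path = set(ancestors_or_self(a))
    return next(n for n in ancestors_or_self(b) if n in on_path)

def is_proper_descendant(node, ancestor):
    return node is not ancestor and ancestor in ancestors_or_self(node)

def mlca_pairs(type1, type2):
    """Brute-force version of the slide's steps for two types."""
    # Steps 1 and 2: one (root, n1, n2) tree per pair of nodes of the two types.
    trees = [(lca(n1, n2), n1, n2)
             for n1 in root.iter(type1)
             for n2 in root.iter(type2)]
    roots = [t[0] for t in trees]
    # Step 3: remove every tree TB whose root has another tree's root below it.
    return [t for t in trees
            if not any(is_proper_descendant(r, t[0]) for r in roots)]

pairs = [(r.tag, n1.text, n2.text) for r, n1, n2 in mlca_pairs("title", "author")]
print(pairs)
```

Each surviving pair is a title and an author of the same publication, rooted at that publication's article or book node.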