Fuhr: Information Retrieval Methods for XML Documents. XIRQL: A Query Language for Information Retrieval in XML Documents (German title: XIRQL: Eine Anfragesprache für Information Retrieval in XML-Dokumenten). Norbert Fuhr, Universität Dortmund.

1 - XIRQL: A Query Language for Information Retrieval in XML Documents (XIRQL: Eine Anfragesprache für Information Retrieval in XML-Dokumenten). Norbert Fuhr, Universität Dortmund

2 - Outline of Talk
I. XML retrieval
II. XIRQL: XML IR Query Language
III. XIRQL vs. XQuery
IV. User Interface
V. INEX: Initiative for the Evaluation of XML Retrieval
VI. Summary

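3 - I. XML documents
    <document class="H.3.3">
      <author>John Smith</author>
      <title>XML Retrieval</title>
      <chapter>
        <heading>Introduction</heading>
        This text explains all about XML and IR.
      </chapter>
      <chapter>
        <heading>XML Query Language XQL</heading>
        <section>
          <heading>Examples</heading>
        </section>
        <section>
          <heading>Syntax</heading>
          Now we describe the XQL syntax.
        </section>
      </chapter>
    </document>
Elements:
• start tag
• end tag
• content
• attribute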

4 - Tree view
    document (class="H.3.3")
      author: John Smith
      title: XML Retrieval
      chapter
        heading: Introduction
        This...
      chapter
        heading: XML Query Language XQL
        section
          heading: Examples
        section
          heading: Syntax
          We describe syntax of XQL

5 - XML query languages
• Data-centric view: XML as exchange format for structured data
• Document-centric view: XML as format for representing the logical structure of documents
W3C WG proposal for an XML query language: XQuery, which focuses on the data-centric view.
Here:
• Information retrieval for the document-centric view
• Starting point: XPath (XQL)

6 - XPath
[Figure: tree view of the example document]
Path condition, parent/child: chapter/heading

7 - XPath
[Figure: tree view of the example document]
Path condition, ancestor-descendant: chapter//heading

8 - XPath
[Figure: tree view of the example document]
Filter wrt. structure: //chapter[heading]

9 - XPath
[Figure: tree view of the example document]
Filter wrt. content: [author="John Smith"]

10 - XPath properties
Conditions wrt. logical structure
Conditions wrt. content
Results are always complete elements
- Boolean retrieval (poor retrieval quality)
- Relevance-oriented search (irrespective of structure) not supported
- Few data types only
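The path conditions and filters from the XPath slides can be tried out with a minimal sketch like the following. It uses Python's standard `xml.etree.ElementTree`, which supports only a subset of XPath. The markup of the example document is an assumption reconstructed from the tree view, and the `collection` wrapper element is hypothetical, added only so that the `document` element itself can be filtered.

```python
import xml.etree.ElementTree as ET

# Example document; the exact markup is an assumption inferred from the tree view.
DOC = """
<document class="H.3.3">
  <author>John Smith</author>
  <title>XML Retrieval</title>
  <chapter>
    <heading>Introduction</heading>
    This text explains all about XML and IR.
  </chapter>
  <chapter>
    <heading>XML Query Language XQL</heading>
    <section><heading>Examples</heading></section>
    <section><heading>Syntax</heading>Now we describe the XQL syntax.</section>
  </chapter>
</document>
"""

# Hypothetical wrapper so that predicates can also be applied to <document>.
collection = ET.Element("collection")
collection.append(ET.fromstring(DOC))


def texts(path):
    """Return the text of every element selected by the path."""
    return [elem.text for elem in collection.findall(path)]


# Path condition, parent/child: only headings directly below a chapter.
child_headings = texts(".//chapter/heading")

# Path condition, ancestor-descendant: headings at any depth below a chapter.
descendant_headings = texts(".//chapter//heading")

# Filter wrt. structure: chapters that have a heading child.
chapters_with_heading = collection.findall(".//chapter[heading]")

# Filter wrt. content: documents whose author child has exactly this text.
by_author = collection.findall("document[author='John Smith']")
no_match = collection.findall("document[author='Jane Doe']")
```

Every result is a list of complete elements that either match or do not match, which is the Boolean behaviour the properties slide criticises.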

11 - II. XIRQL: XML IR Query Language
Extend XPath by:
• Probabilistic retrieval with weighted document indexing
• Relevance-oriented search (irrespective of structure)
• (Extensible) data types with vague predicates
• Structural relativism

12 - II.1 Probabilistic Retrieval with XIRQL
Problem: weighting of different forms of occurrence of terms
/document[.//heading $cw$ "XML"
  .//section//* $cw$ "XML"]
[Figure: tree view of the example document]

13 - Weighting of term occurrences in documents
a) Weighting wrt. single query conditions
P(.//heading $cw$ "XML", d) = 0.5
P(.//section//* $cw$ "XML", d) = 0.7
• Possible overlapping of query conditions
• Dependent probabilistic events
• Only probability intervals for answers
• No linear ranking of documents
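Why only intervals result can be shown with the two probabilities on this slide. The sketch below applies the standard Fréchet bounds, which hold for any two events whose dependence is unknown. The slide does not name these bounds, so their use here is an assumption, and the function names are illustrative.

```python
def conjunction_bounds(p, q):
    """Bounds on P(A and B) when the dependence of A and B is unknown."""
    return max(0.0, p + q - 1.0), min(p, q)


def disjunction_bounds(p, q):
    """Bounds on P(A or B) when the dependence of A and B is unknown."""
    return max(p, q), min(1.0, p + q)


# Probabilities of the two query conditions given on the slide.
p_heading, p_section = 0.5, 0.7

and_low, and_high = conjunction_bounds(p_heading, p_section)
or_low, or_high = disjunction_bounds(p_heading, p_section)
```

With these values a conjunction lies between about 0.2 and 0.5 and a disjunction between 0.7 and 1.0. Documents whose intervals overlap cannot be put into a single linear order.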

14 - Weighting of term occurrences in documents
b) Weighting wrt. document parts
• Term weighting depends on context of term occurrence
• All occurrences within same context refer to same probabilistic event
• Only identical and independent events
• Point probabilities for answers
• Linear ranking of documents

15 - Index nodes as units for term weighting
Application of known indexing functions (e.g. tf*idf)
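A minimal sketch of term weighting per index node, assuming one possible tf*idf variant. The node numbers, the node texts, the smoothing of idf and the scaling into (0, 1] are all assumptions made for illustration. The slide only says that a known indexing function is applied.

```python
import math
from collections import Counter

# Hypothetical index nodes: node id -> text belonging to that node only.
index_nodes = {
    1: "XML Retrieval John Smith",
    2: "Introduction This text explains all about XML and IR",
    3: "XML Query Language XQL",
    4: "Examples",
    5: "Syntax Now we describe the XQL syntax",
}


def tokenize(text):
    return [word.lower() for word in text.split()]


def tfidf_weights(nodes):
    """Return {(node_id, term): weight} with weights scaled into (0, 1]."""
    counts = {nid: Counter(tokenize(text)) for nid, text in nodes.items()}
    n = len(nodes)
    # Number of index nodes in which each term occurs.
    df = Counter(term for c in counts.values() for term in c)
    raw = {}
    for nid, c in counts.items():
        longest = max(c.values())
        for term, tf in c.items():
            idf = math.log(1 + n / df[term])  # smoothed, always positive
            raw[(nid, term)] = (tf / longest) * idf
    top = max(raw.values())
    return {key: value / top for key, value in raw.items()}


weights = tfidf_weights(index_nodes)
```

Each pair of node and term then plays the role of one weighted event such as [5,syntax].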

16 - Probabilistic events and event expressions
Problem: combination of term weights consistent with probability theory
• Basic event: term occurrence in an index node
• Basic events are independent (different terms, same term in different index nodes)
• Event expressions describe combination of basic events in a document wrt. a query

17 - Event expressions
[Figure: tree view of the example document]
Query: //section[.//* $cw$ "XQL" ∧ .//* $cw$ "syntax"]
Event expression: [5,XQL] ∧ [5,syntax]

18 - Event expressions
Query: /document/chapter[.//* $cw$ "XQL" ∧ .//* $cw$ "syntax"]
Event expression: ([3,XQL] ∨ [5,XQL]) ∧ [5,syntax]
[Figure: tree view of the example document]

19 - Evaluation of event expressions (as in probabilistic Datalog)
1. Transform the event expression into disjunctive normal form: e = C_1 ∨ … ∨ C_n, where each C_i is a conjunction of event atoms and an event atom is a positive or negated basic event.
2. Apply the inclusion/exclusion formula:
P(e) = P(C_1 \vee \dots \vee C_n) = \sum_{i=1}^{n} (-1)^{i-1} \sum_{1 \le j_1 < \dots < j_i \le n} P(C_{j_1} \wedge \dots \wedge C_{j_i})
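The two steps can be sketched as follows for an expression that is already in disjunctive normal form. The example is the event expression ([3,XQL] ∨ [5,XQL]) ∧ [5,syntax] from the preceding slide. The three probabilities are hypothetical, since the slides give no weights for these events, and the function names are illustrative.

```python
from itertools import combinations


def conjunction_probability(atoms, prob):
    """P of a conjunction of event atoms, assuming independent basic events.

    atoms is a set of (event, positive) pairs; positive=False means negated.
    """
    signs = {}
    for event, positive in atoms:
        if signs.setdefault(event, positive) != positive:
            return 0.0  # an event and its negation cannot both hold
    p = 1.0
    for event, positive in signs.items():
        p *= prob[event] if positive else 1.0 - prob[event]
    return p


def dnf_probability(dnf, prob):
    """Inclusion/exclusion over the conjunctions C_1..C_n of a DNF."""
    total = 0.0
    for size in range(1, len(dnf) + 1):
        sign = 1.0 if size % 2 == 1 else -1.0
        for subset in combinations(dnf, size):
            # The conjunction of several C_i is the union of their atoms.
            merged = frozenset().union(*subset)
            total += sign * conjunction_probability(merged, prob)
    return total


# Hypothetical weights of the basic events.
prob = {("3", "XQL"): 0.5, ("5", "XQL"): 0.8, ("5", "syntax"): 0.7}

# DNF of ([3,XQL] or [5,XQL]) and [5,syntax].
dnf = [
    frozenset({(("3", "XQL"), True), (("5", "syntax"), True)}),
    frozenset({(("5", "XQL"), True), (("5", "syntax"), True)}),
]

result = dnf_probability(dnf, prob)
```

With these weights the sum is 0.35 + 0.56 - 0.28 = 0.63, which agrees with the direct computation (1 - 0.5 * 0.2) * 0.7. The number of terms grows exponentially with n, so this direct form only suits short expressions.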

20 - II.2 Relevance-oriented search (queries irrespective of document structure)
1) Restrict possible answers (not all elements suitable)
2) Retrieval strategy: return most specific element satisfying the query
But: combination with weighted indexing?
Solution:
• Index nodes as roots of possible answers
• Augmentation as concept for computing tradeoff between indexing weights and specificity of answers

21 - Index nodes for relevance-oriented search
[Figure: tree view of the example document]

22 - Augmentation ... by disjunction
Example query: syntax ∧ example
[Figure: section1 carries 0.5 example; section2 carries 0.8 XQL and 0.7 syntax; the chapter carries 0.3 XQL and, propagated from its sections, 0.5 example and 0.7 syntax; further label: *0.5]

23 - Augmentation ... by disjunction
Example query: XQL
[Figure: section1 carries 0.5 example; section2 carries 0.8 XQL and 0.7 syntax; the chapter carries 0.3 XQL, 0.5 example and 0.7 syntax; resulting chapter weight for XQL: 0.86]

24 - Augmentation ... with augmentation weight
Example query: XQL
[Figure: section1 carries 0.5 example; section2 carries 0.8 XQL and 0.7 syntax; the chapter carries 0.3 XQL, 0.30 example and 0.42 syntax; resulting chapter weight for XQL: 0.64]
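The numbers on the augmentation slides can be reproduced with a small sketch. An augmentation weight of 0.6 is an assumption inferred from the figures (0.5 becomes 0.30 and 0.7 becomes 0.42). The function names are illustrative.

```python
def disjunction(p, q):
    """P(A or B) for independent events A and B."""
    return p + q - p * q


def augment(own_weight, child_weight, augmentation_weight=1.0):
    """Weight of a term at a node after propagation from a child index node.

    The child's weight is multiplied by the augmentation weight before it
    is combined, by disjunction, with the node's own weight.
    """
    return disjunction(own_weight, augmentation_weight * child_weight)


# Chapter has 0.3 for XQL, section2 has 0.8 for XQL.
plain = augment(0.3, 0.8)        # propagation by disjunction only
damped = augment(0.3, 0.8, 0.6)  # with the inferred augmentation weight

# Terms that occur only in a section reach the chapter with a damped weight.
example_at_chapter = augment(0.0, 0.5, 0.6)
syntax_at_chapter = augment(0.0, 0.7, 0.6)
```

Without damping the chapter (0.86) outranks section2 (0.8) for the query XQL. With damping (0.3 + 0.48 - 0.144 = 0.636) the more specific section2 ranks first, which matches the strategy of returning the most specific element.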

25 - II.3 XIRQL: Data types with vague predicates
XML markup allows for detailed markup of text elements
• Exploit markup for more precise searches
• Consider also vagueness and imprecision of IR
• Data types with vague queries
"Search for an artist named Ulbrich, living in the Rhine-Main area of Germany about 100 years ago" → Ernst Olbrich, Darmstadt, 1899
• (Extensible) data types for document-centric view (person names, dates, geographic locations, classifications, images, audio, ...)
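A minimal sketch of two vague predicates for the Ulbrich example. Both similarity functions are assumptions chosen for illustration (string similarity from Python's `difflib`, a linear fall-off for years) and are not the predicates defined by XIRQL. The target year 1902 assumes the query is posed in 2002.

```python
from difflib import SequenceMatcher


def name_similarity(query, value):
    """Vague equality on person names: 1.0 for identical strings."""
    return SequenceMatcher(None, query.lower(), value.lower()).ratio()


def about_year(target, value, tolerance=10):
    """Vague equality on years: falls linearly to 0 at the tolerance."""
    return max(0.0, 1.0 - abs(value - target) / tolerance)


name_score = name_similarity("Ulbrich", "Olbrich")
year_score = about_year(1902, 1899)

# Treating both conditions as independent events gives a combined weight.
combined = name_score * year_score
```

The record for Ernst Olbrich, 1899, fails an exact match on both conditions but still receives a positive weight and can be ranked.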

26 - Extensible type hierarchy
• Extensible type hierarchy with vague predicates for each data type
1) text: substring-match
2) Western language: single word search, truncation, word distance
3) English text: stemming, noun phrases
• Data types of XML documents defined in extended DTD (XML schema)

27 - II.4 Structural Relativism
• Drop distinction attribute/element: ~author searches for attribute or element
• Generalize to data types: #personname searches for attributes/elements of specific data type
• Exploit ontology over element names: region – country – continent
• Edit distance on paths: author="Smith" vs. author/name vs. author/name/lastname
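The first point, dropping the distinction between attribute and element, can be sketched as below. The helper `values_named` and the two sample fragments are hypothetical and only mimic what a condition such as ~author is described as doing.

```python
import xml.etree.ElementTree as ET


def values_named(root, name):
    """Yield values stored under `name`, whether as element or attribute."""
    for elem in root.iter():
        if elem.tag == name and elem.text and elem.text.strip():
            yield elem.text.strip()
        if name in elem.attrib:
            yield elem.attrib[name]


# The same fact, once marked up as an element and once as an attribute.
as_element = ET.fromstring("<book><author>Smith</author></book>")
as_attribute = ET.fromstring('<book author="Smith"/>')

from_element = list(values_named(as_element, "author"))
from_attribute = list(values_named(as_attribute, "author"))
```

Both fragments yield the same value, so a query no longer depends on which of the two markup styles a document author chose.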

28 - III. XIRQL vs. XQuery
XQuery (proposed as standard XML query language by W3C WG):
• No IR support (weighting, vague predicates, relevance-oriented search, structural relativism)
• Aggregation operators (sum, count, min, max, avg)
• Restructuring of results

29 - XIRQL as IR extension of XQuery subset
XQuery structure:
    FOR PathExpression
    WHERE AdditionalSelectionCriteria
    RETURN ResultConstruction
XIRQL subset:
    FOR $X IN PathExpression
    RETURN $X

30 - IV. User Interface
• Query formulation
• Result visualization

31 - Query Formulation: Layout-oriented

32 - Query Formulation: Structure-oriented

33 - Visualization of Results: Textbars

34 - Visualization of Results: Treemaps

35 - V. INEX: Initiative for the Evaluation of XML Retrieval
• 49 participating groups from 20 countries
• Documents: 7 years of IEEE-CS journals (12107 articles, 494 MB)
• Queries: 30 content-only, 30 content+structural conditions
• Results due: August 15, 2002
• Final Workshop: December 2002

36 - Example query
• Title: Nonmonotonic Reasoning
• Description: Retrieve all articles from the years 1999 and 2000 that deal with nonmonotonic reasoning. Do not retrieve articles that are calendars or calls for papers.
• Condition:
/article[./bdy/sec $cw$ "nonmonotonic reasoning"
  ./hdr/yr[. = 2000 ∨ . = 1999]
  .//. $cw$ "belief revision"
  ¬ .//tig/atl $cw$ "calendar"]

37 - VI. Summary
• Data-centric vs. document-centric view on XML (database vs. IR view)
• IR methods for XML must support uncertainty and vagueness…

38 - XIRQL: XML query language implementing
• Combination of structural conditions with probabilistic weighting
• Relevance-oriented search by augmentation
• Extensible data types with vague predicates
• Structural relativism
HyREX: Open source XML retrieval engine: dortmund.de/ir/projects/hyrex