ISP 433/533 Week 11 XML Retrieval

Structured Information
Traditional IR
– Unit of information: terms and documents
– No structure
Need more granularity
Document has structure
– E.g. title, sections, footnotes, etc.
A markup language is a mechanism to identify structures in a document
– Data + metadata

Extensible Markup Language (XML)
– Markup (tags; not a fixed set)
– Content
– Nested, named trees with attributes
[Example XML document: books titled “One Fish Two Fish” and “Goodnight Moon”, with authors John Meyer, Peter Smith and Margaret Brown and a price of 7.95]

Elements
– Delimited by angle brackets
– Identify the nature of the content they surround
Elements can be nested within other elements
– A tree structure
Elements may have attributes
– E.g.
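To make the "nested, named trees with attributes" idea concrete, here is a minimal sketch that parses a small XML document with Python's standard `xml.etree.ElementTree` and prints one line per element, indented by depth. The tag names (`bib`, `book`, `title`, `author`, `price`) and the `id` attribute are assumptions chosen for illustration. The slide's own markup was not preserved.

```python
import xml.etree.ElementTree as ET

# Hypothetical book list; the tag names (bib, book, title, author, price)
# and the id attribute are assumptions, not taken from the slide.
BIB = """<bib>
  <book id="b1">
    <title>One Fish Two Fish</title>
    <author>John Meyer</author>
    <author>Peter Smith</author>
    <price>7.95</price>
  </book>
  <book id="b2">
    <title>Goodnight Moon</title>
    <author>Margaret Brown</author>
  </book>
</bib>"""


def outline(elem, depth=0):
    """Return one line per element; indentation mirrors the nesting depth."""
    attrs = "".join(f' {name}="{value}"' for name, value in elem.attrib.items())
    text = (elem.text or "").strip()
    lines = ["  " * depth + f"<{elem.tag}{attrs}> {text}".rstrip()]
    for child in elem:  # each child is itself a named subtree
        lines.extend(outline(child, depth + 1))
    return lines


root = ET.fromstring(BIB)
lines = outline(root)
print("\n".join(lines))
```

Element names and nesting carry the structure, and attributes such as `id` sit on the element itself rather than in its content.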

Unit of Retrieval
Traditional IR
– Document
XML IR
– Element or fragment of an element
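One way to see the change in granularity is to list every element of a document as a candidate answer, identified by its path. Traditional IR would return only the root. The sketch below assumes a hypothetical article with `title`, `sec` and `p` tags and uses XPath-style positional paths such as `/article[1]/sec[2]`.

```python
import xml.etree.ElementTree as ET

# Hypothetical article; tag names and text are assumptions for illustration.
ARTICLE = """<article>
  <title>XML Retrieval</title>
  <sec><title>Introduction</title><p>Elements are retrieval units.</p></sec>
  <sec><title>Evaluation</title><p>INEX provides topics.</p></sec>
</article>"""


def retrieval_units(elem, path):
    """Yield (path, normalized text) for elem and every element below it."""
    yield path, " ".join(" ".join(elem.itertext()).split())
    counts = {}  # 1-based position among same-named siblings
    for child in elem:
        counts[child.tag] = counts.get(child.tag, 0) + 1
        yield from retrieval_units(child, f"{path}/{child.tag}[{counts[child.tag]}]")


root = ET.fromstring(ARTICLE)
units = list(retrieval_units(root, f"/{root.tag}[1]"))
for path, text in units:
    print(path, "->", text)
```

A document-level system sees one unit here. An element-level system sees eight, ranging from a single paragraph to the whole article.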

Example Retrieval Units

Requirements for XML Retrieval
Basic needs for XML retrieval:
– query both data and metadata
– express the query in a user-convenient way
– return proper document fragments
– rank the results according to their relevance

INEX
The Initiative for the Evaluation of XML Retrieval
– an international, coordinated effort to promote evaluation procedures for content-based XML retrieval
– provides a large test collection of XML documents (12,000 articles from IEEE Computer Society publications since 1995)
– introduces both content-only (CO) and content-and-structure (CAS) topics
– designed as a long-term initiative with yearly workshops (currently in its second year)

INEX CO Topic Example
Title: semantic web
Description: Research and business opportunities and challenges in developing and deploying the concept of the Semantic Web and the associated idea of web services.
Narrative: To be relevant, a document/component must either discuss the technical issues and opportunities associated with the semantic web, or it must discuss the business challenges, especially the question of viable business models for web services.
Keywords: semantic web, ontologies, SOAP, UDDI, RDF…

INEX CAS Topic Example
Title:
– target elements: //fig, //p, //ip1
– “Corba architecture” in //fgc
– “Figure Corba Architecture” in //p, //ip1
Description: Find figures that describe the Corba architecture and the paragraphs that refer to those figures.
Narrative: To be relevant, a figure must describe the standard Corba architecture or a system architecture that relies heavily on Corba… Retrieved components would ideally contain both the figure and the paragraph referring to it.
Keywords: CORBA Object Request Broker Architecture …
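A CAS topic combines a structural constraint (which elements to return) with content conditions (what they must be about). The minimal sketch below evaluates a simplified version of this topic over a hypothetical article. It returns `fig` elements whose `fgc` (figure caption) mentions Corba architecture, plus `p` elements that mention a figure and Corba. Plain substring matching stands in for real relevance ranking, and only the tag names come from the topic.

```python
import xml.etree.ElementTree as ET

# Hypothetical article; fig, fgc (figure caption) and p follow the topic's
# element names, everything else is invented for illustration.
ARTICLE = """<article>
  <p>Figure 1 shows the Corba architecture.</p>
  <fig><fgc>Corba architecture overview</fgc></fig>
  <fig><fgc>Experimental results</fgc></fig>
  <p>An unrelated paragraph.</p>
</article>"""


def has_terms(elem, terms):
    """Content condition: every term occurs (case-insensitively) in elem's text."""
    text = " ".join(elem.itertext()).lower()
    return all(term in text for term in terms)


root = ET.fromstring(ARTICLE)
# Structure + content: //fig elements whose fgc child mentions Corba architecture
figs = [fig for fig in root.iter("fig")
        if any(has_terms(cap, ["corba", "architecture"]) for cap in fig.findall("fgc"))]
# Second target: //p elements that refer to a figure and mention Corba
paras = [p for p in root.iter("p") if has_terms(p, ["figure", "corba"])]
print([f.findtext("fgc") for f in figs], [p.text for p in paras])
```

A CO topic would drop the structural part and score every element on the keywords alone.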

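An Inverted Index for XML
Postings record (document number, position, nesting level). An element posting stores the element's start:end region, and a text posting stores the word's position.
Element index
– (1, 1:23, 0) (1, 8:22, 1) (1, 14:21, 2) …
– (1, 2:7, 1) (1, 9:13, 2) (1, 15:20, 3) …
Text index
– “information”: (1, 3, 2) …
– “retrieval”: (1, 4, 2) …
Example document headings: “Information Retrieval Using RDBMS”, “Beyond Simple Translation”, “Extension of IR Features”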
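The postings above can be reproduced by numbering start tags, words and end tags with one shared counter. An element then contains a word exactly when start < position < end. The headings come from the slide. The nested `<section>`/`<title>` layout is an assumption, chosen because it yields the slide's numbers. Postings are written here as flat tuples (doc, start, end, level) instead of (doc, start:end, level).

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Headings from the slide; the nested section/title tags are an assumption.
DOC = ("<section><title>Information Retrieval Using RDBMS</title>"
       "<section><title>Beyond Simple Translation</title>"
       "<section><title>Extension of IR Features</title></section>"
       "</section></section>")


def build_indexes(xml_text, docno=1):
    """Number start tags, words and end tags with a single shared counter."""
    element_index = defaultdict(list)  # tag  -> [(docno, start, end, level)]
    text_index = defaultdict(list)     # term -> [(docno, position, level)]
    counter = 0

    def add_words(text, level):
        nonlocal counter
        for word in (text or "").split():
            counter += 1
            text_index[word.lower()].append((docno, counter, level))

    def visit(elem, level):
        nonlocal counter
        counter += 1
        start = counter                     # position of the start tag
        add_words(elem.text, level + 1)     # words sit one level below elem
        for child in elem:
            visit(child, level + 1)
            add_words(child.tail, level + 1)
        counter += 1                        # position of the end tag
        element_index[elem.tag].append((docno, start, counter, level))

    visit(ET.fromstring(xml_text), 0)
    for postings in element_index.values():
        postings.sort()                     # order by (docno, start)
    return element_index, text_index


def contains(region, word_posting):
    """Structural test: the word lies strictly inside the element's region."""
    d1, start, end, _ = region
    d2, pos, _ = word_posting
    return d1 == d2 and start < pos < end


elements, terms = build_indexes(DOC)
print(elements["section"])  # [(1, 1, 23, 0), (1, 8, 22, 1), (1, 14, 21, 2)]
print(elements["title"])    # [(1, 2, 7, 1), (1, 9, 13, 2), (1, 15, 20, 3)]
print(terms["information"], terms["retrieval"])  # [(1, 3, 2)] [(1, 4, 2)]
# Join: which title elements contain "retrieval"?
titles_with_retrieval = [r for r in elements["title"]
                         if any(contains(r, w) for w in terms["retrieval"])]
print(titles_with_retrieval)  # [(1, 2, 7, 1)]
```

Because containment is a comparison of integers, an index can answer structural questions without opening the document.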

XPath
XPath is a non-XML language for identifying particular parts of XML documents
– picking nodes and sets of nodes
Similar to Unix file-system path expressions, e.g. “/people/person/name/first_name”
– “*” wildcard
– “..” parent
– “.” context node
– “//” descendants
– “@” attribute
– [ ] predicate: specifies a condition
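Python's `xml.etree.ElementTree` supports a limited subset of XPath, which is enough to try the operators above. The document below is hypothetical and shaped to match the `/people/person/name/first_name` path. Paths are written relative to the `<people>` root because ElementTree evaluates them from the element you call `findall` on. ElementTree cannot return attribute nodes, so attribute values are read with `.get()`.

```python
import xml.etree.ElementTree as ET

# Hypothetical data shaped like the slide's /people/person/name/first_name path.
PEOPLE = """<people>
  <person id="p1">
    <name><first_name>John</first_name><last_name>Smith</last_name></name>
    <profession>physicist</profession>
  </person>
  <person id="p2">
    <name><first_name>Mary</first_name><middle_initial>K</middle_initial><last_name>Jones</last_name></name>
    <profession>chemist</profession>
  </person>
</people>"""

root = ET.fromstring(PEOPLE)  # the context node is <people>

# /people/person/name/first_name, written relative to <people>
first_names = [e.text for e in root.findall("./person/name/first_name")]
# "*" wildcard: every child element of every <name>
name_parts = [e.tag for e in root.findall("./person/name/*")]
# "//" descendants: last_name at any depth below the context node
last_names = [e.text for e in root.findall(".//last_name")]
# [] predicate with an "@" attribute test: the person whose id is p2
p2 = root.find("./person[@id='p2']")
# Attribute values themselves are read from the element
ids = [p.get("id") for p in root.findall("./person")]
print(first_names, name_parts, last_names, p2.findtext("name/first_name"), ids)
```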

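XPath Example: chapter/heading
[Figure: example document tree. A document element (class="H.3.3") has an author (“John Smith”), a title (“XML Retrieval”) and chapter elements containing heading and section elements. Headings include “Introduction”, “XML Query Language XQL”, “Syntax” and “Examples”, and text includes “This...” and “We describe syntax of XQL”.]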

XPath Example: chapter//heading
[Figure: same example document tree]

XPath Example: //chapter[heading]
[Figure: same example document tree]

XPath Example: …author="John Smith"]
[Figure: same example document tree]
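The four examples can be tried against a document loosely modelled on the slide's tree. Its exact layout cannot be recovered, so the nesting below is an assumption. A third chapter with no heading is added only so that the `[heading]` predicate has something to filter out. The `[author='John Smith']` predicate is ElementTree's "child with exactly this text" test and is used here as an illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical document loosely modelled on the slide's tree.
COLLECTION = """<collection>
<document class="H.3.3">
  <author>John Smith</author>
  <title>XML Retrieval</title>
  <chapter><heading>Introduction</heading>This ...</chapter>
  <chapter><heading>XML Query Language XQL</heading>
    <section><heading>Examples</heading></section>
    <section><heading>Syntax</heading>We describe syntax of XQL</section>
  </chapter>
  <chapter>No heading here</chapter>
</document>
</collection>"""

root = ET.fromstring(COLLECTION)
doc = root.find("document")

# chapter/heading: heading children of chapter children of <document>
child_headings = [e.text for e in doc.findall("chapter/heading")]
# chapter//heading: headings at any depth inside a chapter
all_headings = [e.text for e in doc.findall("chapter//heading")]
# //chapter[heading]: chapters that have a heading child
chapters_with_heading = doc.findall(".//chapter[heading]")
# document[author='John Smith']: documents whose author child has that exact text
by_smith = root.findall("document[author='John Smith']")
print(child_headings, all_headings, len(chapters_with_heading), len(by_smith))
```

The gap between `child_headings` and `all_headings` is the difference between `/` (child) and `//` (descendant).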

More XPath Examples
– All the elements that have attribute “id”
//middle_initial/../first_name
– All the first_name elements that are siblings of middle_initial elements
//person[profession='physicist']
– All person elements that have a profession child element with the value “physicist”
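The same ElementTree subset covers these three patterns. The sketch uses hypothetical data and ElementTree's `[@attrib]` predicate to find elements that carry an `id` attribute. It then reaches siblings through the `..` parent step and filters with a child-value predicate.

```python
import xml.etree.ElementTree as ET

# Hypothetical data; only person 1 carries an id attribute.
PEOPLE = """<people>
  <person id="p1">
    <name><first_name>John</first_name><last_name>Smith</last_name></name>
    <profession>physicist</profession>
  </person>
  <person>
    <name><first_name>Mary</first_name><middle_initial>K</middle_initial><last_name>Jones</last_name></name>
    <profession>chemist</profession>
  </person>
</people>"""

root = ET.fromstring(PEOPLE)

# Elements that have an "id" attribute
with_id = [e.tag for e in root.findall(".//*[@id]")]
# //middle_initial/../first_name: go up to the parent, then down to first_name
siblings = [e.text for e in root.findall(".//middle_initial/../first_name")]
# //person[profession='physicist']: filter persons by a child's text value
physicists = [p.findtext("name/first_name")
              for p in root.findall(".//person[profession='physicist']")]
print(with_id, siblings, physicists)
```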

XQuery
A language to query data that is similar to XML in structure
– nested, named trees with attributes
Based on XPath
FOR/LET PathExpression
WHERE AdditionalSelectionCriteria
RETURN ResultConstruction

XQuery Example
Find the name(s) of customers who have ordered the part whose part_id is "xx"
FOR $c IN customers
FOR $o IN orders
WHERE $c/cust_id = $o/cust_id AND $o/part_id = "xx"
RETURN $c/name
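The FOR/WHERE/RETURN evaluation can be mimicked with nested loops: each FOR clause binds a variable, the WHERE clause filters the bound pairs, and RETURN builds one result per surviving pair. The sketch below is a Python analogue, not an XQuery engine. The customer and order data are hypothetical, and only the field names come from the slide.

```python
import xml.etree.ElementTree as ET

# Hypothetical data; only cust_id, part_id and name come from the slide.
DATA = """<db>
  <customers>
    <customer><cust_id>c1</cust_id><name>Ann</name></customer>
    <customer><cust_id>c2</cust_id><name>Bob</name></customer>
  </customers>
  <orders>
    <order><cust_id>c2</cust_id><part_id>xx</part_id></order>
    <order><cust_id>c1</cust_id><part_id>yy</part_id></order>
    <order><cust_id>c2</cust_id><part_id>xx</part_id></order>
  </orders>
</db>"""


def customers_who_ordered(root, part_id):
    """Nested loops mimic the two FOR clauses; the if mimics WHERE."""
    names = []
    for c in root.iterfind("customers/customer"):          # FOR $c IN customers
        for o in root.iterfind("orders/order"):            # FOR $o IN orders
            if (c.findtext("cust_id") == o.findtext("cust_id")
                    and o.findtext("part_id") == part_id):  # WHERE ... AND ...
                names.append(c.findtext("name"))           # RETURN $c/name
    return names


root = ET.fromstring(DATA)
print(customers_who_ordered(root, "xx"))
```

Bob appears twice because each matching (customer, order) pair yields a result. In XQuery, `distinct-values()` is the usual way to remove such duplicates.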

Another XQuery Example
Find titles and prices of books by ‘Meyer’ or ‘Smith’
FOR $b IN document("bib.xml")//book
WHERE contains($b/author, "Meyer") OR contains($b/author, "Smith")
RETURN ($b/title, $b/price)
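A Python analogue of this query over a hypothetical `bib.xml` whose `book` elements have `author` children. Substring tests stand in for `contains()`. Checking each `author` separately also handles books with several authors.

```python
import xml.etree.ElementTree as ET

# Hypothetical bib.xml content in which <author> is a child of <book>.
BIB = """<bookinfo>
  <book><title>Book A</title><author>Gina Meyer</author><price>5.75</price></book>
  <book><title>Book B</title><author>Ann Brown</author><price>13.95</price></book>
  <book><title>Book C</title><author>Ann Brown</author><author>Peter Smith</author><price>7.95</price></book>
</bookinfo>"""


def books_by(root, surnames):
    """Return (title, price) for every book with an author containing a surname."""
    hits = []
    for b in root.iter("book"):                                   # FOR $b IN //book
        authors = [a.text or "" for a in b.findall("author")]
        if any(s in a for a in authors for s in surnames):        # WHERE ... contains ...
            hits.append((b.findtext("title"), b.findtext("price")))  # RETURN title, price
    return hits


hits = books_by(ET.fromstring(BIB), ["Meyer", "Smith"])
print(hits)
```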

One Document Structure: the previous XQuery works
[Figure: document tree rooted at bookinfo whose book elements have title, author and price children (values include “Gina Meyer”, $5.75 and $13.95)]

Another Document Structure: the same XQuery doesn’t work
[Figure: document tree rooted at bookinfo in which author elements (with name children such as “Dr. Meyer” and “M. Brown”) contain the book elements (title and price, e.g. “Goodnight Moon”, “One Fish Two Fish” $12.50, “Cat in the Hat” $14.95)]
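Why does the same query fail? The path `$b/author` assumes that `author` is a child of `book`. The minimal sketch below runs one path-based query over two hypothetical layouts of the same fact. In the second layout, `book` sits inside `author`, so the query silently returns nothing.

```python
import xml.etree.ElementTree as ET

# Two hypothetical layouts of the same fact (Gina Meyer wrote Book A).
BOOK_HAS_AUTHOR = """<bookinfo>
  <book><title>Book A</title><author>Gina Meyer</author></book>
</bookinfo>"""
AUTHOR_HAS_BOOK = """<bookinfo>
  <author><name>Gina Meyer</name><book><title>Book A</title></book></author>
</bookinfo>"""


def titles_by(xml_text, surname):
    """The same path-based query: //book whose author child contains surname."""
    root = ET.fromstring(xml_text)
    return [b.findtext("title") for b in root.iter("book")
            if any(surname in (a.text or "") for a in b.findall("author"))]


print(titles_by(BOOK_HAS_AUTHOR, "Meyer"))  # ['Book A']
print(titles_by(AUTHOR_HAS_BOOK, "Meyer"))  # []: <book> has no <author> child here
```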

Problem with XQuery
– Requires knowledge of document structure
– Dependent on document structure
– Difficult for naive users
Need extensions to solve the problem
– Still in active research

Don’t know the tags?
– Integrating with full-text keyword search
– Automatically identifying tag names
– Translating query terms to tag names
– Query expansion
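A toy version of "translate query terms to tag names" expands each user term into candidate tag names and keeps only the tags that actually occur in the document. The synonym table is entirely hypothetical. A real system would curate or learn it, for example from a thesaurus.

```python
import xml.etree.ElementTree as ET

# Hypothetical term -> tag synonym table (an assumption for illustration).
TAG_SYNONYMS = {
    "writer": {"author", "name"},
    "cost": {"price"},
}

DOC = """<bookinfo>
  <author><name>Gina Meyer</name>
    <book><title>Book A</title><price>5.75</price></book>
  </author>
</bookinfo>"""


def map_terms_to_tags(terms, root):
    """Expand each query term to the candidate tag names present in the document."""
    present = {e.tag for e in root.iter()}
    mapping = {}
    for term in terms:
        candidates = {term} | TAG_SYNONYMS.get(term, set())
        mapping[term] = sorted(candidates & present)
    return mapping


mapping = map_terms_to_tags(["writer", "cost", "title"], ET.fromstring(DOC))
print(mapping)
```

Terms that map to no tag can still be matched as plain full-text keywords.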

Don’t know the structure?
Schema-free XQuery
– Automatically identifies the minimal, meaningful set of nodes that can provide the answer
[Figure: bookinfo tree with book elements containing title, name and price children (values include “Gina Meyer”, $5.75 and $13.95)]
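The intuition behind finding a "meaningful" set of nodes can be approximated with lowest common ancestors (LCAs). For each node that matches "Meyer", keep the title whose LCA with that node is deepest, meaning it is the most closely related title. This is a simplified LCA heuristic, not the full definition used by Schema-free XQuery. It returns the same answer on both hypothetical layouts.

```python
import xml.etree.ElementTree as ET

BOOK_HAS_AUTHOR = """<bookinfo>
  <book><title>Book A</title><author>Gina Meyer</author></book>
  <book><title>Book B</title><author>Ann Brown</author></book>
</bookinfo>"""
AUTHOR_HAS_BOOK = """<bookinfo>
  <author><name>Gina Meyer</name><book><title>Book A</title></book></author>
  <author><name>Ann Brown</name><book><title>Book B</title></book></author>
</bookinfo>"""


def ancestors(node, parents):
    """node, its parent, ..., up to the root (inclusive)."""
    chain = [node]
    while node in parents:
        node = parents[node]
        chain.append(node)
    return chain


def lca(a, b, parents):
    """Lowest common ancestor of two elements."""
    above_b = set(ancestors(b, parents))
    for node in ancestors(a, parents):  # walk upward from a
        if node in above_b:
            return node
    return None


def related_titles(xml_text, surname):
    """For each node mentioning surname, return the title(s) with the deepest LCA."""
    root = ET.fromstring(xml_text)
    parents = {child: parent for parent in root.iter() for child in parent}
    matches = [e for e in root.iter() if surname in (e.text or "")]
    titles = list(root.iter("title"))
    answers = []
    for m in matches:
        scored = [(len(ancestors(lca(m, t, parents), parents)), t) for t in titles]
        if not scored:
            continue
        best = max(depth for depth, _ in scored)
        answers.extend(t.text for depth, t in scored if depth == best)
    return answers


print(related_titles(BOOK_HAS_AUTHOR, "Meyer"), related_titles(AUTHOR_HAS_BOOK, "Meyer"))
```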

Querying XML with Natural Language
– Translate a natural-language query into Schema-free XQuery
– NaLIX demo

Relevance Scoring
Query: articles about “search engine”

TermJoin
– A user-defined score function generates a score from term occurrences and other information
– The scored term occurrences are then joined with the elements that contain them
[Figure: example elements annotated with scores 1, 2, 4 and 5]
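A rough sketch of element scoring in the spirit of TermJoin uses a pluggable, user-defined score function applied to each element's term occurrences. For brevity it scans element text instead of joining index postings. The article and the score function (total occurrences, or 0 if a query term is missing) are hypothetical.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical article; the id attributes only make the output readable.
ARTICLE = """<article id="a1">
  <sec id="s1"><p id="p1">A search engine ranks pages.</p><p id="p2">The engine crawls; search is fast.</p></sec>
  <sec id="s2"><p id="p3">Databases store tables.</p></sec>
</article>"""


def term_counts(elem, terms):
    """Occurrences of each query term anywhere inside elem."""
    words = re.findall(r"[a-z0-9]+", " ".join(elem.itertext()).lower())
    return {t: words.count(t) for t in terms}


def total_if_all_present(counts):
    """Hypothetical user-defined score: total occurrences, 0 if any term is absent."""
    return sum(counts.values()) if all(counts.values()) else 0


def score_elements(root, terms, score_fn=total_if_all_present):
    """Score every element, drop zero scores, and rank by descending score."""
    scored = [(score_fn(term_counts(e, terms)), e.get("id")) for e in root.iter()]
    return sorted([s for s in scored if s[0] > 0], key=lambda s: -s[0])


ranking = score_elements(ET.fromstring(ARTICLE), ["search", "engine"])
print(ranking)
```

Swapping in a different `score_fn` changes the ranking policy without touching the traversal. That separation is the point of a user-defined score function.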