Structured Text Retrieval Models. Str. Text Retrieval Text Retrieval retrieves documents based on index terms. Observation: Documents have implicit structure.

Slides:



Advertisements
Similar presentations
XML: Extensible Markup Language
Advertisements

Modern information retrieval Modelling. Introduction IR systems usually adopt index terms to process queries IR systems usually adopt index terms to process.
Data Modeling and Database Design Chapter 1: Database Systems: Architecture and Components.
Query Languages. Information Retrieval Concerned with the: Representation of Storage of Organization of, and Access to Information items.
Module 4 Social Determinants of Financial Reporting
Image Indexing and Retrieval using Moment Invariants Imran Ahmad School of Computer Science University of Windsor – Canada.
The Trie Data Structure Basic definition: a recursive tree structure that uses the digital decomposition of strings to represent a set of strings for searching.
Modern Information Retrieval Chapter 8 Indexing and Searching.
Research topics Semantic Web - Spring 2007 Computer Engineering Department Sharif University of Technology.
IR Models: Overview, Boolean, and Vector
Modern Information Retrieval
T.Sharon - A.Frank 1 Internet Resources Discovery (IRD) Classic Information Retrieval (IR)
ISP 433/533 Week 2 IR Models.
IR Models: Structural Models
Query Languages: Patterns & Structures. Pattern Matching Pattern –a set of syntactic features that must occur in a text segment Types of patterns –Words:
1 Extending PRIX for Similarity-based XML Query Group Members: Yan Qi, Jicheng Zhao, Dan Situ, Ning Liao.
Chapter 2Modeling 資工 4B 陳建勳. Introduction.  Traditional information retrieval systems usually adopt index terms to index and retrieve documents.
Experiments on Using Semantic Distances Between Words in Image Caption Retrieval Presenter: Cosmin Adrian Bejan Alan F. Smeaton and Ian Quigley School.
Chapter 4 : Query Languages Baeza-Yates, 1999 Modern Information Retrieval.
Information Retrieval and Extraction 資訊檢索與擷取 Chia-Hui Chang National Central University
Latent Semantic Analysis (LSA). Introduction to LSA Learning Model Uses Singular Value Decomposition (SVD) to simulate human learning of word and passage.
1 Chapter 2 Database Environment. 2 Chapter 2 - Objectives u Purpose of three-level database architecture. u Contents of external, conceptual, and internal.
Chapter 4 Query Languages.... Introduction Cover different kinds of queries posed to text retrieval systems Keyword-based query languages  include simple.
XML files (with LINQ). Introduction to LINQ ( Language Integrated Query ) C#’s new LINQ capabilities allow you to write query expressions that retrieve.
On the Use of Regular Expressions for Searching Text Charles L.A. Clarke and Gordon V. Cormack Fast Text Searching.
Introduction n Keyword-based query answering considers that the documents are flat i.e., a word in the title has the same weight as a word in the body.
Modern Information Retrieval Chap. 02: Modeling (Structured Text Models)
Database System Development Lifecycle © Pearson Education Limited 1995, 2005.
Query Relevance Feedback and Ontologies How to Make Queries Better.
Indexing Knowledge Daniel Vasicek 2014 March 27 Introduction Basic topic is : All Human Knowledge Who Cares? Simple Examples.
Chapter 2 CIS Sungchul Hong
1 ECE 453 – CS 447 – SE 465 Software Testing & Quality Assurance Instructor Kostas Kontogiannis.
Nancy Lawler U.S. Department of Defense ISO/IEC Part 2: Classification Schemes Metadata Registries — Part 2: Classification Schemes The revision.
Winter 2007SEG2101 Chapter 71 Chapter 7 Introduction to Languages and Compiler.
Processing of structured documents Spring 2003, Part 7 Helena Ahonen-Myka.
1 University of Palestine Topics In CIS ITBS 3202 Ms. Eman Alajrami 2 nd Semester
Chapter 6: Information Retrieval and Web Search
 Three-Schema Architecture Three-Schema Architecture  Internal Level Internal Level  Conceptual Level Conceptual Level  External Level External Level.
Book: Bayesian Networks : A practical guide to applications Paper-authors: Luis M. de Campos, Juan M. Fernandez-Luna, Juan F. Huete, Carlos Martine, Alfonso.
Lecture # 3 & 4 Chapter # 2 Database System Concepts and Architecture Muhammad Emran Database Systems 1.
44220: Database Design & Implementation Modelling the ‘Real’ World Ian Perry Room: C41C Ext.: 7287
Semantic web course – Computer Engineering Department – Sharif Univ. of Technology – Fall Knowledge Representation Semantic Web - Fall 2005 Computer.
A Probabilistic Quantifier Fuzzification Mechanism: The Model and Its Evaluation for Information Retrieval Felix Díaz-Hemida, David E. Losada, Alberto.
Programming Languages and Design Lecture 3 Semantic Specifications of Programming Languages Instructor: Li Ma Department of Computer Science Texas Southern.
For Monday Read chapter 24, sections 1-3 Homework: –Chapter 23, exercise 8.
Sept. 27, 2002 ISDB’02 Transforming XPath Queries for Bottom-Up Query Processing Yoshiharu Ishikawa Takaaki Nagai Hiroyuki Kitagawa University of Tsukuba.
Vector Space Models.
Information Retrieval
CIS 530 Lecture 2 From frequency to meaning: vector space models of semantics.
Copyright (c) 2014 Pearson Education, Inc. Introduction to DBMS.
1 Chapter 2 Database Environment Pearson Education © 2009.
Enable Semantic Interoperability for Decision Support and Risk Management Presented by Dr. David Li Key Contributors: Dr. Ruixin Yang and Dr. John Qu.
Introduction: Databases and Database Systems Lecture # 1 June 19,2012 National University of Computer and Emerging Sciences.
Databases and Information Retrieval: Rethinking the Great Divide SIGMOD Panel 14 Jun 2005 Jayavel Shanmugasundaram Cornell University.
Chapter 13. Structured Text Retrieval With Mounia Lalmas 무선 / 이동 시스템 연구실 김민혁.
1 Keyword Search over XML. 2 Inexact Querying Until now, our queries have been complex patterns, represented by trees or graphs Such query languages are.
Plan for Today’s Lecture(s)
Chapter 2 Database Environment.
CS 4700: Foundations of Artificial Intelligence
Chapter 2 Database Environment.
Chapter 2 Database Environment Pearson Education © 2009.
Database.
Ontology Reuse In MBSE Henson Graves Abstract January 2011
HCC class lecture 13 comments
INFORMATION RETRIEVAL TECHNIQUES BY DR. ADNAN ABID
Introduction to Data Structure
Chapter 2 Database Environment Pearson Education © 2009.
Recuperação de Informação B
Chapter 2 Database Environment Pearson Education © 2009.
CH 4 - Language semantics
Presentation transcript:

Structured Text Retrieval Models

Str. Text Retrieval Text Retrieval retrieves documents based on index terms. Observation: Documents have implicit structure. Regular text retrieval and indexing strategies lose the information available within the structure. Text Retrieval desired based on structure. e.g. All documents having “George Bush” in the caption of a photo.

Models for Str. Text Retrieval PAT Expressions Overlapped Lists Proximal Nodes List of References Tree-based Query Languages (SFQL,CCL)

Proximal Nodes By Gonzalo Navarro and Ricardo Baeza-Yates Based on hierarchical structure of documents Structure computation is static and all structural elements are defined. “nodes” Model attempts to define operators on these nodes based on their definition and content. Only nodes at a particular hierarchy are returned as results.

Proximal Nodes Document Chapter Section

Proximal Nodes Nodes are structural in nature, e.g. Chapter, Section, etc. Each node has a defined segment (Contiguous part of text) Operators are defined with respect to this model. Structure operators and Text operators.

Proximal Nodes Structure Operators Name Inclusion Positional Inclusion Distance operators Child/Parent operators Set Manipulation operators Text Operators Match

Retrieval on Evidence By Mounia Lalmas Based on documents made up of objects. Objects are modeled as independent entities and can be in different media, language or locations. Document indexing – degree of uncertainty that the index term actually represents the object. Uncertainty must be captured to get better results. Use the Dempster-Shafer theory of evidence

Retrieval on Evidence Model takes into consideration disparity between indexing vocabularies. Aggregation of indexing vocabulary and also the aggregation of the uncertainty. Object o Є O and a type t Є T, the function type is defined as O →∂(T) Aggregation is defined over objects and composite object types contain all the types of the contained objects

Retrieval on Evidence Indexing vocabulary is defined over a proposition-space. e.g. Wine (english,text), Blue(colour,feature) Sentence space defines that indexes in the same proposition space can be used together. Semantic between indexing vocabulary is maintained using the the notion of worlds.

Retrieval on Evidence Each type t has S, W, v, π S t is the sentence space for a type W is the possible worlds associated with S t v t is {true, false} over W t x P t Π t is {true, false} over W t x S t Logical and equivalence between sentences is built around the notion of their semantics being equivalent in all or most worlds.

Retrieval on Evidence However, the uncertainty of the representation remains. This is represented by the weighting function based on the Dempster Shafer model. These objects and their syntactic and semantic models are aggregated for the objects which contain them. E.g. A section containing sentences indexed by terms a,b,c,d.. Will be equivalent to sentences over the worlds also implying a,b,c,d…

Comparisons Proximal Nodes is based on structured documents. It presents the matter clearly and provides approaches towards building a software architecture. It presents findings of conducted experiments. The Evidence paper tries to model heterogeneous documents, made up of different media, languages, etc. Overall the model is complex and no results are given to its implementation and performance.