IR Models: Structural Models

Slides:



Advertisements
Similar presentations
INFO624 - Week 2 Models of Information Retrieval Dr. Xia Lin Associate Professor College of Information Science and Technology Drexel University.
Advertisements

Modern information retrieval Modelling. Introduction IR systems usually adopt index terms to process queries IR systems usually adopt index terms to process.
Multimedia Database Systems
Basic IR: Modeling Basic IR Task: Slightly more complex:
INSTRUCTOR: DR.NICK EVANGELOPOULOS PRESENTED BY: QIUXIA WU CHAPTER 2 Information retrieval DSCI 5240.
Modern Information Retrieval Chapter 1: Introduction
Query Languages. Information Retrieval Concerned with the: Representation of Storage of Organization of, and Access to Information items.
Modern Information Retrieval by R. Baeza-Yates and B. Ribeiro-Neto
Web Search - Summer Term 2006 II. Information Retrieval (Basics Cont.)
IR Models: Overview, Boolean, and Vector
T.Sharon - A.Frank 1 Internet Resources Discovery (IRD) Classic Information Retrieval (IR)
Basic IR: Queries Query is statement of user’s information need. Index is designed to map queries to likely to be relevant documents. Query type, content,
Models for Information Retrieval Mainly used in science and research, (probably?) less often in real systems But: Research results have significance for.
Information Retrieval Modeling CS 652 Information Extraction and Integration.
T.Sharon - A.Frank 1 Internet Resources Discovery (IRD) IR Queries.
Modern Information Retrieval Chapter 2 Modeling. Can keywords be used to represent a document or a query? keywords as query and matching as query processing.
Chapter 2Modeling 資工 4B 陳建勳. Introduction.  Traditional information retrieval systems usually adopt index terms to index and retrieve documents.
Modeling Modern Information Retrieval
Chapter 4 : Query Languages Baeza-Yates, 1999 Modern Information Retrieval.
IR Models: Latent Semantic Analysis. IR Model Taxonomy Non-Overlapping Lists Proximal Nodes Structured Models U s e r T a s k Set Theoretic Fuzzy Extended.
Vector Space Model CS 652 Information Extraction and Integration.
Modern Information Retrieval Chapter 2 Modeling. Can keywords be used to represent a document or a query? keywords as query and matching as query processing.
Modern Information Retrieval Chapter 4 Query Languages.
IR Models: Review Vector Model and Probabilistic.
Other IR Models Non-Overlapping Lists Proximal Nodes Structured Models Retrieval: Adhoc Filtering Browsing U s e r T a s k Classic Models boolean vector.
Chapter 4 Query Languages.... Introduction Cover different kinds of queries posed to text retrieval systems Keyword-based query languages  include simple.
Modeling (Chap. 2) Modern Information Retrieval Spring 2000.
Introduction n Keyword-based query answering considers that the documents are flat i.e., a word in the title has the same weight as a word in the body.
Modern Information Retrieval Chap. 02: Modeling (Structured Text Models)
LIS618 lecture 1 Thomas Krichel economic rational for traditional model In olden days the cost of telecommunication was high. database use.
Querying Structured Text in an XML Database By Xuemei Luo.
1 University of Palestine Topics In CIS ITBS 3202 Ms. Eman Alajrami 2 nd Semester
Information Retrieval Chapter 2: Modeling 2.1, 2.2, 2.3, 2.4, 2.5.1, 2.5.2, Slides provided by the author, modified by L N Cassel September 2003.
Information Retrieval Models - 1 Boolean. Introduction IR systems usually adopt index terms to process queries Index terms:  A keyword or group of selected.
1 Information Retrieval Acknowledgements: Dr Mounia Lalmas (QMW) Dr Joemon Jose (Glasgow)
Introduction to Digital Libraries Searching
IR Models J. H. Wang Mar. 11, The Retrieval Process User Interface Text Operations Query Operations Indexing Searching Ranking Index Text quer y.
Information Retrieval and Web Search IR models: Vectorial Model Instructor: Rada Mihalcea Class web page: [Note: Some.
Introduction to Digital Libraries hussein suleman uct cs honours 2003.
Ranking in Information Retrieval Systems Prepared by: Mariam John CSE /23/2006.
Information Retrieval Model Aj. Khuanlux MitsophonsiriCS.426 INFORMATION RETRIEVAL.
CSCE 5300 Information Retrieval and Web Search Introduction to IR models and methods Instructor: Rada Mihalcea Class web page:
1 University of Palestine Topics In CIS ITBS 3202 Ms. Eman Alajrami 2 nd Semester
Information Retrieval
Introduction to Information Retrieval Example of information need in the context of the world wide web: “Find all documents containing information on computer.
1 Information Retrieval LECTURE 1 : Introduction.
Information Retrieval
The Boolean Model Simple model based on set theory
Information Retrieval and Web Search IR models: Boolean model Instructor: Rada Mihalcea Class web page:
C.Watterscsci64031 Classical IR Models. C.Watterscsci64032 Goal Hit set of relevant documents Ranked set Best match Answer.
Set Theoretic Models 1. IR Models Non-Overlapping Lists Proximal Nodes Structured Models Retrieval: Adhoc Filtering Browsing U s e r T a s k Classic Models.
Information Retrieval and Web Search Introduction to IR models and methods Rada Mihalcea (Some of the slides in this slide set come from IR courses taught.
Introduction n IR systems usually adopt index terms to process queries n Index term: u a keyword or group of selected words u any word (more general) n.
The Development of a search engine & Comparison according to algorithms Sung-soo Kim The final report.
Information Retrieval Models School of Informatics Dept. of Library and Information Studies Dr. Miguel E. Ruiz.
Chapter 13. Structured Text Retrieval With Mounia Lalmas 무선 / 이동 시스템 연구실 김민혁.
Lecture 1: Introduction and the Boolean Model Information Retrieval
INFORMATION RETRIEVAL TECHNIQUES BY DR. ADNAN ABID
موضوع پروژه : بازیابی اطلاعات Information Retrieval
INFORMATION RETRIEVAL TECHNIQUES BY DR. ADNAN ABID
Introduction to Information Retrieval
Models for Retrieval and Browsing - Structural Models and Browsing
Query Languages Berlin Chen 2003 Reference:
Models for Retrieval and Browsing - Fuzzy Set, Extended Boolean, Generalized Vector Space Models Berlin Chen 2003 Reference: 1. Modern Information Retrieval,
Recuperação de Informação B
Recuperação de Informação B
Information Retrieval and Web Design
Recuperação de Informação B
Berlin Chen Department of Computer Science & Information Engineering
Advanced information retrieval
Presentation transcript:

IR Models: Structural Models

IR Model Taxonomy s e Adhoc r Filtering T a k Browsing Set Theoretic Fuzzy Extended Boolean Classic Models boolean vector probabilistic Algebraic Generalized Vector Lat. Semantic Index Neural Networks U s e r T a k Retrieval: Adhoc Filtering Non-Overlapping Lists Proximal Nodes Structured Models Probabilistic Inference Network Belief Network Browsing Browsing Flat Structure Guided Hypertext

Introduction Keyword-based query answering considers that the documents are flat A word in the title has the same weight as a word in the body of the document, etc. But, the document structure is information which can improve IR Enhancing keyword-based models Enabling alternative forms of requests Example of enhancing keyword-based models: Words appearing in the title or in sub-titles within the document could receive higher weights.

Introduction Consider the following information need: Retrieve all documents which contain a page in which the string “atomic holocaust” appears in italics in the text surrounding a Figure whose label contains the word earth The corresponding query could be: same-page( near( “atomic holocaust”, Figure( label( “earth” ))))

Introduction Advanced interfaces that facilitate the specification of the structure are also highly desirable Models which allow combining information on text content with information on document structure are called structured text models

Solution? Brute force approach: Problem: Treat each document as being made up of many (sub-)documents Problem: How many sub-documents? How to represent relations of components of original document?

Non-Overlapping Lists Idea: divide the text in non-overlapping regions which are collected in a list Multiple ways to divide the text in non-overlapping parts yield multiple lists: a list for chapters a list for sections a list for subsections

Non-Overlapping Lists Chapter L1 Sections L2 SubSections L3 SubSubSections

Non-Overlapping Lists Regions are non-overlapping which limits the queries that can be asked Types of queries: select a region that contains a given word select a region A that does not contain a region B (regions A and B belong to distinct lists) select a region not contained within any other region

Non-Overlapping Lists The non-overlapping lists model is simple and allows efficient implementation But, types of queries that can be asked are limited Also, model does not include any provision for ranking the documents by degree of similarity to the query What does structural similarity mean?

Proximal Nodes Idea: define a strict hierarchical index over the text. This enrichs the previous model that used flat lists. Multiple index hierarchies might be defined Two distinct index hierarchies might refer to text regions that overlap

Definitions Each indexing structure is a strict hierarchy composed of chapters sections subsections paragraphs lines Each of these components is called a node To each node is associated a text region

Proximal Nodes Chapter Sections SubSections SubSubSections holocaust 10 256 48,324

Proximal Nodes Key points: In the hierarchical index, one node might be contained within another node But, two nodes of the same level in a hierarchy cannot overlap The inverted list for keywords complements the hierarchical index The implementation here is more complex than that for non-overlapping lists

Proximal Nodes Queries are now regular expressions: search for strings references to structural components combination of these Model is a compromise between expressiveness and efficiency Model is more expressive than non-overlapping lists

Proximal Nodes Query: find the sections, the subsections, and the subsubsections that contain the word “holocaust” [(*section) with (“holocaust”)] Simple query processing: traverse the inverted list for “holocaust” and determine all match points use the match points to search in the hierarchical index for the structural components

Proximal Nodes Query: [(*section) with (“holocaust”)] Sophisticated query processing: get the first entry in the inverted list for “holocaust” use this match point to search in the hierarchical index for the structural components Innermost matching component: smaller one Check if innermost matching component includes the second entry in the inverted list for “holocaust” If it does, check the third entry and so on This allows matching efficiently the nearby (or proximal) nodes

Conclusions Model allows formulating queries that are more sophisticated than those allowed by non-overlapping lists To speed up query processing, nearby nodes are inspected Model is a compromise between efficiency and expressiveness