Chapter 2: Modeling. Computer Science and Information Engineering 4B, 86075800, 陳建勳.


Introduction
Traditional information retrieval systems usually adopt index terms to index and retrieve documents.
An index term is a keyword (or group of related words) which has some meaning of its own (usually a noun).

The advantage of using index terms
Simple.
The semantics of the documents and of the user information need can be naturally expressed through sets of index terms.
Ranking algorithms are at the core of information retrieval systems: they predict which documents are relevant and which are not.

A taxonomy of information retrieval models
User task: Retrieval (Ad hoc, Filtering)
    Classic Models: Boolean, Vector, Probabilistic
        Set Theoretic: Fuzzy, Extended Boolean
        Algebraic: Generalized Vector, Latent Semantic Indexing, Neural Networks
        Probabilistic: Inference Network, Belief Network
    Structured Models: Non-overlapping lists, Proximal Nodes
User task: Browsing
    Flat, Structure Guided, Hypertext

Figure 2.2 Retrieval models most frequently associated with distinct combinations of a document logical view and a user task.
User task | Index Terms | Full Text | Full Text + Structure
Retrieval | Classic, Set Theoretic, Algebraic, Probabilistic | Classic, Set Theoretic, Algebraic, Probabilistic | Structured
Browsing | Flat | Hypertext | Structure Guided, Hypertext

Retrieval: Ad hoc and Filtering
Ad hoc: The documents in the collection remain relatively static while new queries are submitted to the system.
Filtering: The queries remain relatively static while new documents come into the system.

Filtering
Typically, the filtering task simply indicates to the user the documents which might be of interest to him.
Routing: Rank the filtered documents and show this ranking to the user.
User profiles can be constructed in two ways.
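The difference between filtering and routing can be sketched as follows. This is only an illustration under stated assumptions: the user profile is modeled as a set of keywords, and the score (the fraction of profile keywords found in a document), the threshold, and the sample documents are all hypothetical choices that do not come from the slides.

```python
def profile_score(profile: set, doc_tokens: list) -> float:
    """Fraction of profile keywords that occur in the document (illustrative score)."""
    if not profile:
        return 0.0
    return len(profile & set(doc_tokens)) / len(profile)


def filter_documents(profile: set, incoming: dict, threshold: float = 0.5) -> list:
    """Filtering: keep the incoming documents whose score reaches the threshold."""
    return [
        name
        for name, tokens in incoming.items()
        if profile_score(profile, tokens) >= threshold
    ]


def route_documents(profile: set, incoming: dict, threshold: float = 0.5) -> list:
    """Routing: rank the filtered documents by decreasing score."""
    kept = filter_documents(profile, incoming, threshold)
    return sorted(
        kept,
        key=lambda name: profile_score(profile, incoming[name]),
        reverse=True,
    )


# Hypothetical static profile and a batch of newly arrived documents.
profile = {"retrieval", "ranking"}
incoming = {
    "n1": ["sports", "news"],
    "n2": ["ranking", "functions"],
    "n3": ["retrieval", "ranking", "models"],
}
filtered = filter_documents(profile, incoming)
routed = route_documents(profile, incoming)
```

Here the profile stays fixed while documents arrive, which is the reverse of the ad hoc setting, where the collection stays fixed and the queries change.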

A formal characterization of IR models
D: A set composed of logical views (or representations) for the documents in the collection.
Q: A set composed of logical views (or representations) for the user information needs (queries).
F: A framework for modeling document representations, queries, and their relationships.
R(q_i, d_j): A ranking function which defines an ordering among the documents with regard to the query q_i.

Classic information retrieval models
Basic concepts:
Each document is described by a set of representative keywords called index terms.
Distinct index terms have varying relevance when describing document contents; this is captured by assigning a numerical weight to each index term of a document.

Definitions
k_i: A generic index term.
K: The set of all index terms {k_1, ..., k_t}.
w_{i,j}: A weight associated with index term k_i of a document d_j.
g_i: A function that returns the weight associated with k_i in any t-dimensional vector, i.e., g_i(d_j) = w_{i,j}.

Boolean model
Based on a binary decision criterion without any notion of a grading scale.
Boolean expressions have precise semantics, but it is not simple to translate an information need into a Boolean expression.
A query can be represented as a disjunction of conjunctive vectors (i.e., in disjunctive normal form, DNF).
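The DNF representation can be illustrated with a small sketch. The three index terms, the query ka AND (kb OR NOT kc), and the sample documents are assumptions chosen for illustration; the code enumerates the binary weight vectors that satisfy the query and marks a document as relevant (1) only if its own binary vector is one of them.

```python
from itertools import product

TERMS = ("ka", "kb", "kc")  # hypothetical index terms


def query(ka: bool, kb: bool, kc: bool) -> bool:
    """Illustrative Boolean query: ka AND (kb OR NOT kc)."""
    return ka and (kb or not kc)


def to_dnf(predicate) -> set:
    """Enumerate every binary weight vector that satisfies the query.

    Each returned tuple is one conjunctive component of the DNF.
    """
    return {
        bits
        for bits in product((0, 1), repeat=len(TERMS))
        if predicate(*map(bool, bits))
    }


def boolean_sim(doc_terms: set, dnf: set) -> int:
    """Return 1 if the document's binary vector equals a conjunctive component, else 0."""
    doc_vector = tuple(int(term in doc_terms) for term in TERMS)
    return int(doc_vector in dnf)


QUERY_DNF = to_dnf(query)

# Hypothetical documents, each described by the index terms it contains.
docs = {
    "d1": {"ka", "kb"},
    "d2": {"ka", "kc"},
    "d3": {"kb", "kc"},
}
results = {name: boolean_sim(terms, QUERY_DNF) for name, terms in docs.items()}
```

The result is strictly 0 or 1 for every document, which is the "no grading scale" property noted above: a document that matches part of the query is treated the same as one that matches nothing.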

Vector model
Assigns non-binary weights to index terms in queries and in documents.
Computes the degree of similarity between each document and the query.
The ranked answer set is more precise than the answer set retrieved by the Boolean model.
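A minimal sketch of similarity-based ranking follows. The slide does not name the similarity measure, so the cosine of the angle between the document vector and the query vector is assumed here, as it is the measure usually paired with the vector model. The weight vectors are made-up values.

```python
import math


def cosine_similarity(doc_weights: list, query_weights: list) -> float:
    """Cosine of the angle between a document vector and a query vector."""
    dot = sum(d * q for d, q in zip(doc_weights, query_weights))
    norm_d = math.sqrt(sum(d * d for d in doc_weights))
    norm_q = math.sqrt(sum(q * q for q in query_weights))
    if norm_d == 0 or norm_q == 0:
        return 0.0  # an all-zero vector shares no terms with anything
    return dot / (norm_d * norm_q)


# Hypothetical weights over three index terms (k1, k2, k3).
documents = {
    "d1": [1.0, 0.0, 2.0],
    "d2": [0.0, 3.0, 0.0],
}
query_vector = [1.0, 0.0, 1.0]

# Rank documents by decreasing similarity to the query.
ranking = sorted(
    documents,
    key=lambda name: cosine_similarity(documents[name], query_vector),
    reverse=True,
)
```

Unlike the Boolean model, a document that matches the query only partially still receives a nonzero score and a place in the ranking.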

Idea
We think of the documents as a collection C of objects and of the user query as a specification of a set A of objects. In this scenario, the IR problem can be reduced to the problem of determining which documents are in the set A and which ones are not (i.e., the IR problem can be viewed as a clustering problem).

Intra-cluster: One needs to determine which features better describe the objects in the set A.
Inter-cluster: One needs to determine which features better distinguish the objects in the set A from the remaining objects in the collection C.

tf: Intra-cluster similarity is quantified by measuring the raw frequency of a term k_i inside a document d_j. This term frequency is usually referred to as the tf factor and provides one measure of how well that term describes the document contents.
idf: Inter-cluster dissimilarity is quantified by measuring the inverse of the frequency of a term k_i among the documents in the collection. This factor is usually referred to as the inverse document frequency (idf).
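The two factors can be combined into a term weight as sketched below. The slide does not give formulas, so this assumes one common formulation: tf is the raw frequency normalized by the most frequent term in the document, idf is log(N / n_i) with N documents in total and n_i documents containing the term, and the weight is their product. The toy collection is invented for illustration.

```python
import math
from collections import Counter


def tf_idf(docs: dict) -> dict:
    """Compute tf-idf weights for every term of every document.

    tf  = raw frequency / frequency of the most frequent term in the document
    idf = log(N / n_i), N = number of documents, n_i = documents containing the term
    """
    n_docs = len(docs)
    doc_freq = Counter(term for tokens in docs.values() for term in set(tokens))
    weights = {}
    for name, tokens in docs.items():
        counts = Counter(tokens)
        max_count = max(counts.values()) if counts else 1
        weights[name] = {
            term: (count / max_count) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        }
    return weights


# Hypothetical tokenized collection.
collection = {
    "d1": ["retrieval", "model", "model"],
    "d2": ["retrieval", "index"],
    "d3": ["retrieval", "model", "term"],
}
weights = tf_idf(collection)
```

In this toy collection "retrieval" occurs in every document, so its idf is log(1) = 0 and it receives zero weight everywhere: a term that appears in all documents does nothing to distinguish one from another.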

The vector model is simple and fast, and it is a popular retrieval model.
Disadvantage: Index terms are assumed to be mutually independent, so the model does not account for index term dependencies.

Probabilistic model
We can think of the querying process as a process of specifying the properties of an ideal answer set. The problem is that we do not know exactly what these properties are.

Structured text retrieval models
Retrieval models which combine information on text content with information on the document structure are called structured text retrieval models.
Match point: The position in the text of a sequence of words which matches the user query.
Region: A contiguous portion of the text.
Node: A structural component of the document, such as a chapter, a section, or a subsection.

Model based on non-overlapping lists
Divide the whole text of each document into non-overlapping text regions, which are collected in a list.
Text regions in the same list do not overlap, but text regions from distinct lists might overlap.
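One possible way to picture this structure is sketched below. The representation is an assumption made for illustration: each region is a half-open (start, end) pair of text positions, each list is kept sorted, and the chapter and section boundaries are invented. The lookup finds the region of a given list that contains a match point.

```python
from bisect import bisect_right


def is_non_overlapping(regions: list) -> bool:
    """Check that no two regions of one list overlap (regions are half-open [start, end))."""
    ordered = sorted(regions)
    return all(
        prev_end <= start
        for (_, prev_end), (start, _) in zip(ordered, ordered[1:])
    )


def region_containing(regions: list, position: int):
    """Return the region of a sorted, non-overlapping list that contains the position."""
    starts = [start for start, _ in regions]
    index = bisect_right(starts, position) - 1
    if index >= 0 and regions[index][0] <= position < regions[index][1]:
        return regions[index]
    return None


# Two hypothetical lists over the same text: regions within a list never overlap,
# but a section overlaps the chapter that contains it.
chapters = [(0, 100), (100, 250)]
sections = [(0, 40), (40, 100), (100, 180), (180, 250)]

match_point = 120  # position of a hypothetical query match in the text
chapter_hit = region_containing(chapters, match_point)
section_hit = region_containing(sections, match_point)
```

Because the regions of a list are disjoint and sorted, a binary search over their start positions is enough to locate the enclosing region.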

Model based on proximal nodes
A model which allows the definition of independent hierarchical indexing structures over the same document text.
Each of these indexing structures is a strict hierarchy composed of chapters, sections, paragraphs, pages, and lines, which are called nodes.

Models for browsing
Flat browsing
Structure guided browsing
The hypertext model

Flat browsing
The documents might be represented as dots in a plane or as elements in a list.
Keywords of interest found while browsing could be added to the original query (relevance feedback).
Disadvantage: In a given page or screen there may not be any indication of the context the user is in.

Structure guided browsing
Documents are organized in a directory structure, which groups documents covering related topics.
The same idea can be applied to a single document.
A history map can be used.

The hypertext model
Written text is usually conceived to be read sequentially.
The reader should not expect to fully understand the message conveyed by the writer by randomly reading pieces of text here and there.