Recuperação de Informação B (Information Retrieval B), Chapter 02: Modeling (Set Theoretic Models), Section 2.6, September 08, 1999.

Set Theoretic Models
- The Boolean model imposes a binary criterion for deciding relevance.
- The question of how to extend the Boolean model to accommodate partial matching and a ranking has attracted considerable attention in the past.
- We now discuss two set theoretic models for this:
  - Fuzzy Set Model
  - Extended Boolean Model

Fuzzy Set Model
- Queries and docs are represented by sets of index terms: matching is approximate from the start.
- This vagueness can be modeled using a fuzzy framework, as follows:
  - each term is associated with a fuzzy set
  - each doc has a degree of membership in this fuzzy set
- This interpretation provides the foundation for many models for IR based on fuzzy theory.
- Here, we discuss the model proposed by Ogawa, Morita, and Kobayashi (1991).

Fuzzy Set Theory
- A framework for representing classes whose boundaries are not well defined.
- The key idea is to introduce the notion of a degree of membership associated with the elements of a set.
- This degree of membership varies from 0 to 1 and allows modeling the notion of marginal membership.
- Thus, membership is now a gradual notion, contrary to the crisp notion enforced by classic Boolean logic.

Fuzzy Set Theory
- Definition: A fuzzy subset $A$ of $U$ is characterized by a membership function $\mu(A,u) : U \to [0,1]$, which associates with each element $u$ of $U$ a number $\mu(A,u)$ in the interval $[0,1]$.
- Definition: Let $A$ and $B$ be two fuzzy subsets of $U$, and let $\neg A$ be the complement of $A$. Then
  - $\mu(\neg A,u) = 1 - \mu(A,u)$
  - $\mu(A \cup B,u) = \max(\mu(A,u), \mu(B,u))$
  - $\mu(A \cap B,u) = \min(\mu(A,u), \mu(B,u))$
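The three operations in the definition above can be tried out on small, finite fuzzy sets. The sketch below is a minimal illustration, assuming a fuzzy set is stored as a Python dict from element to membership degree, with absent elements treated as membership 0; this representation and the names `complement`, `union`, and `intersection` are hypothetical choices for this example, not part of the slides.

```python
from typing import Dict

# A fuzzy subset of a finite universe U, stored as u -> mu(A, u) in [0, 1].
# Elements missing from the dict are assumed to have membership 0.
FuzzySet = Dict[str, float]


def complement(a: FuzzySet, universe: set) -> FuzzySet:
    """mu(not A, u) = 1 - mu(A, u) for every u in the universe."""
    return {u: 1.0 - a.get(u, 0.0) for u in sorted(universe)}


def union(a: FuzzySet, b: FuzzySet) -> FuzzySet:
    """mu(A union B, u) = max(mu(A, u), mu(B, u))."""
    return {u: max(a.get(u, 0.0), b.get(u, 0.0)) for u in sorted(a.keys() | b.keys())}


def intersection(a: FuzzySet, b: FuzzySet) -> FuzzySet:
    """mu(A intersect B, u) = min(mu(A, u), mu(B, u))."""
    return {u: min(a.get(u, 0.0), b.get(u, 0.0)) for u in sorted(a.keys() | b.keys())}


if __name__ == "__main__":
    U = {"d1", "d2", "d3"}
    A = {"d1": 0.9, "d2": 0.4}
    B = {"d2": 0.7, "d3": 0.2}
    print(complement(A, U))
    print(union(A, B))
    print(intersection(A, B))
```

Here union takes the larger degree and intersection the smaller one, so for `A` and `B` above the union gives `d2` a degree of 0.7 and the intersection gives it 0.4.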

Fuzzy Information Retrieval
- Fuzzy sets are modeled based on a thesaurus.
- This thesaurus is built as follows:
  - Let $\vec{c}$ be a term-term correlation matrix.
  - Let $c(i,l)$ be a normalized correlation factor for $(k_i,k_l)$:
    $$c(i,l) = \frac{n(i,l)}{n_i + n_l - n(i,l)}$$
    where $n_i$ is the number of docs which contain $k_i$, $n_l$ is the number of docs which contain $k_l$, and $n(i,l)$ is the number of docs which contain both $k_i$ and $k_l$.
- We now have the notion of proximity among index terms.
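The thesaurus construction above amounts to counting documents. The following is a minimal sketch, assuming each document has already been reduced to the set of index terms it contains (term frequencies play no role in the formula); `term_correlations` is a hypothetical helper name.

```python
from itertools import combinations
from typing import Dict, List, Set, Tuple


def term_correlations(docs: List[Set[str]]) -> Dict[Tuple[str, str], float]:
    """Return c(i, l) = n(i,l) / (n_i + n_l - n(i,l)) for every ordered term pair."""
    n: Dict[str, int] = {}                   # n_i: number of docs containing k_i
    n_pair: Dict[Tuple[str, str], int] = {}  # n(i,l): docs containing both (sorted key)
    for terms in docs:
        for t in terms:
            n[t] = n.get(t, 0) + 1
        for a, b in combinations(sorted(terms), 2):
            n_pair[(a, b)] = n_pair.get((a, b), 0) + 1

    c: Dict[Tuple[str, str], float] = {}
    for ki in n:
        for kl in n:
            if ki == kl:
                both = n[ki]  # every doc containing k_i "co-occurs" with k_i
            else:
                key = (ki, kl) if ki < kl else (kl, ki)
                both = n_pair.get(key, 0)
            # Denominator >= max(n_i, n_l) >= 1, so c(i, l) lies in [0, 1].
            c[(ki, kl)] = both / (n[ki] + n[kl] - both)
    return c
```

The resulting matrix is symmetric, and the diagonal entries are $c(i,i) = n_i / n_i = 1$.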

Fuzzy Information Retrieval
- The correlation factor $c(i,l)$ can be used to define fuzzy set membership for a document $d_j$ as follows:
  $$\mu(i,j) = 1 - \prod_{k_l \in d_j} \bigl(1 - c(i,l)\bigr)$$
  where $\mu(i,j)$ is the membership of doc $d_j$ in the fuzzy subset associated with $k_i$.
- The above expression computes an algebraic sum over all terms in the doc $d_j$.
- A doc $d_j$ belongs to the fuzzy set for $k_i$ if its own terms are associated with $k_i$.
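Given correlation factors, the membership $\mu(i,j)$ of a document in the fuzzy set for $k_i$ is a single product over the document's terms. A minimal sketch, assuming the correlations are held in a dict keyed by $(k_i, k_l)$ and that pairs missing from it are uncorrelated ($c = 0$); `membership` is a hypothetical name.

```python
import math
from typing import Dict, Set, Tuple


def membership(ki: str, doc_terms: Set[str],
               c: Dict[Tuple[str, str], float]) -> float:
    """mu(i, j) = 1 - prod over k_l in d_j of (1 - c(i, l)).

    Pairs absent from c are treated as c(i, l) = 0 (an assumption of this sketch).
    An empty document gives an empty product of 1, hence membership 0.
    """
    product = math.prod(1.0 - c.get((ki, kl), 0.0) for kl in doc_terms)
    return 1.0 - product
```

For example, with $c(a,b) = 0.5$ and $c(a,c) = 0.2$, a document containing $k_b$ and $k_c$ gets $\mu = 1 - 0.5 \times 0.8 = 0.6$ in the fuzzy set for $k_a$, even though it does not contain $k_a$ itself.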

Fuzzy Information Retrieval
- $\mu(i,j) = 1 - \prod_{k_l \in d_j} \bigl(1 - c(i,l)\bigr)$, where $\mu(i,j)$ is the membership of doc $d_j$ in the fuzzy subset associated with $k_i$.
- If doc $d_j$ contains a term $k_l$ which is closely related to $k_i$, we have
  - $c(i,l) \sim 1$
  - $\mu(i,j) \sim 1$
  - index $k_i$ is a good fuzzy index for doc $d_j$
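The step from $c(i,l) \sim 1$ to $\mu(i,j) \sim 1$ can be made explicit. Every $c(i,l)$ lies in $[0,1]$, since the denominator $n_i + n_l - n(i,l)$ is at least $\max(n_i, n_l) \ge n(i,l)$, so each factor of the product lies in $[0,1]$ as well. Dropping all factors except the one for a fixed term $k_l \in d_j$ can therefore only increase the product:

```latex
\mu(i,j) \;=\; 1 - \prod_{k_m \in d_j} \bigl(1 - c(i,m)\bigr)
        \;\ge\; 1 - \bigl(1 - c(i,l)\bigr)
        \;=\; c(i,l)
\qquad \text{for every } k_l \in d_j .
```

In particular, $c(i,i) = 1$, so any doc that contains $k_i$ itself has $\mu(i,j) = 1$.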

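Fuzzy IR: An Example
- $q = k_a \wedge (k_b \vee \neg k_c)$
- $\vec{q}_{dnf} = (1,1,1) \vee (1,1,0) \vee (1,0,0) = \vec{cc}_1 \vee \vec{cc}_2 \vee \vec{cc}_3$
- $$\begin{aligned}
  \mu(q,d_j) &= \mu(cc_1 \cup cc_2 \cup cc_3, j) \\
  &= 1 - \bigl(1 - \mu(a,j)\,\mu(b,j)\,\mu(c,j)\bigr) \\
  &\quad \times \bigl(1 - \mu(a,j)\,\mu(b,j)\,(1 - \mu(c,j))\bigr) \\
  &\quad \times \bigl(1 - \mu(a,j)\,(1 - \mu(b,j))\,(1 - \mu(c,j))\bigr)
  \end{aligned}$$

(Figure: Venn diagram of $k_a$, $k_b$, $k_c$ marking the regions $cc_1$, $cc_2$, $cc_3$.)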
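The example formula can be checked numerically. A minimal sketch, assuming the per-term memberships $\mu(a,j)$, $\mu(b,j)$, $\mu(c,j)$ have already been computed; `query_membership` is a hypothetical name. Each conjunctive component multiplies the memberships of its terms (using $1 - \mu$ for a negated term), and the components are combined with the algebraic sum, exactly as in the expansion of $\mu(q,d_j)$.

```python
def query_membership(mu_a: float, mu_b: float, mu_c: float) -> float:
    """Membership of d_j in the fuzzy set for q = k_a AND (k_b OR NOT k_c).

    The three factors correspond to the q_dnf components
    cc1 = (1,1,1), cc2 = (1,1,0), cc3 = (1,0,0).
    """
    cc1 = mu_a * mu_b * mu_c
    cc2 = mu_a * mu_b * (1.0 - mu_c)
    cc3 = mu_a * (1.0 - mu_b) * (1.0 - mu_c)
    # Algebraic sum over the components: 1 - prod(1 - mu(cc_i, j)).
    return 1.0 - (1.0 - cc1) * (1.0 - cc2) * (1.0 - cc3)
```

With crisp memberships (0 or 1) the function reproduces the Boolean answer, e.g. 1.0 for `(1, 0, 0)` and 0.0 for `(1, 0, 1)`; with fractional memberships it yields a graded value that can be used for ranking.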

Fuzzy Information Retrieval
- Fuzzy IR models have been discussed mainly in the literature associated with fuzzy theory.
- Experiments with standard test collections are not available.
- This makes fuzzy IR models difficult to compare with other retrieval models at this time.

Extended Boolean Model
- Boolean retrieval is simple and elegant.
- But no ranking is provided.
- How to extend the model?
  - interpret conjunctions and disjunctions in terms of Euclidean distances
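The slide stops before giving formulas, so the following is only a sketch of what the Euclidean-distance interpretation commonly looks like for a two-term query. It assumes normalized document weights $x$ and $y$ in $[0,1]$ for terms $k_x$ and $k_y$, and uses plain Euclidean ($p = 2$) distances: an OR query scores a document by its normalized distance from the point $(0,0)$, and an AND query by one minus its normalized distance from $(1,1)$. The names `sim_or` and `sim_and` are hypothetical.

```python
import math


def sim_or(x: float, y: float) -> float:
    """OR of k_x, k_y: normalized Euclidean distance from (0, 0).

    x, y are the document's weights for k_x and k_y, assumed to lie in [0, 1].
    """
    return math.sqrt((x ** 2 + y ** 2) / 2.0)


def sim_and(x: float, y: float) -> float:
    """AND of k_x, k_y: 1 minus the normalized Euclidean distance from (1, 1)."""
    return 1.0 - math.sqrt(((1.0 - x) ** 2 + (1.0 - y) ** 2) / 2.0)
```

For a document with $x = 1$ and $y = 0$, `sim_and` gives about 0.29 and `sim_or` about 0.71, whereas the plain Boolean model would return 0 and 1. The partial scores are what make a ranking possible.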