Extended Boolean Model n Boolean model is simple and elegant. n But, no provision for a ranking n As with the fuzzy model, a ranking can be obtained by.

Slides:



Advertisements
Similar presentations
Traditional IR models Jian-Yun Nie.
Advertisements

Modern information retrieval Modelling. Introduction IR systems usually adopt index terms to process queries IR systems usually adopt index terms to process.
Basic IR: Modeling Basic IR Task: Slightly more complex:
Modern Information Retrieval Chapter 1: Introduction
The Probabilistic Model. Probabilistic Model n Objective: to capture the IR problem using a probabilistic framework; n Given a user query, there is an.
| 1 › Gertjan van Noord2014 Zoekmachines Lecture 4.
CS 430 / INFO 430 Information Retrieval
CS 430 / INFO 430 Information Retrieval
Web Search - Summer Term 2006 II. Information Retrieval (Basics Cont.)
IR Models: Overview, Boolean, and Vector
IR Models: Structural Models
Models for Information Retrieval Mainly used in science and research, (probably?) less often in real systems But: Research results have significance for.
Modern Information Retrieval Chapter 2 Modeling. Can keywords be used to represent a document or a query? keywords as query and matching as query processing.
Chapter 2Modeling 資工 4B 陳建勳. Introduction.  Traditional information retrieval systems usually adopt index terms to index and retrieve documents.
Modeling Modern Information Retrieval
Project Management: The project is due on Friday inweek13.
Construction of Index: (Page 197) Objective: Given a document, find the number of occurrences of each word in the document. Example: Computer Science students.
Modern Information Retrieval Chapter 2 Modeling. Can keywords be used to represent a document or a query? keywords as query and matching as query processing.
IR Models: Review Vector Model and Probabilistic.
Other IR Models Non-Overlapping Lists Proximal Nodes Structured Models Retrieval: Adhoc Filtering Browsing U s e r T a s k Classic Models boolean vector.
Recuperação de Informação. IR: representation, storage, organization of, and access to information items Emphasis is on the retrieval of information (not.
Modeling (Chap. 2) Modern Information Retrieval Spring 2000.
Introduction n Keyword-based query answering considers that the documents are flat i.e., a word in the title has the same weight as a word in the body.
1 A Theoretical Framework for Association Mining based on the Boolean Retrieval Model on the Boolean Retrieval Model Peter Bollmann-Sdorra.
1 CS 430: Information Discovery Lecture 12 Extending the Boolean Model.
PrasadL2IRModels1 Models for IR Adapted from Lectures by Berthier Ribeiro-Neto (Brazil), Prabhakar Raghavan (Yahoo and Stanford) and Christopher Manning.
Information Retrieval Chapter 2: Modeling 2.1, 2.2, 2.3, 2.4, 2.5.1, 2.5.2, Slides provided by the author, modified by L N Cassel September 2003.
Information Retrieval Models - 1 Boolean. Introduction IR systems usually adopt index terms to process queries Index terms:  A keyword or group of selected.
Introduction to Digital Libraries Searching
Advanced information retrieval Chapter. 02: Modeling (Set Theoretic Models) – Fuzzy model.
Chapter. 02: Modeling Contenue... 19/10/2015Dr. Almetwally Mostafa 1.
Generalized Vector Model n Classic models enforce independence of index terms. n For the Vector model: u Set of term vectors {k1, k2,..., kt} are linearly.
Information Retrieval CSE 8337 Spring 2005 Modeling Material for these slides obtained from: Modern Information Retrieval by Ricardo Baeza-Yates and Berthier.
Ground Truth Free Evaluation of Segment Based Maps Rolf Lakaemper Temple University, Philadelphia,PA,USA.
Najah Alshanableh. Fuzzy Set Model n Queries and docs represented by sets of index terms: matching is approximate from the start n This vagueness can.
Query Sensitive Embeddings Vassilis Athitsos, Marios Hadjieleftheriou, George Kollios, Stan Sclaroff.
1 Patrick Lambrix Department of Computer and Information Science Linköpings universitet Information Retrieval.
Vector Space Models.
Information Retrieval and Web Search Probabilistic IR and Alternative IR Models Rada Mihalcea (Some of the slides in this slide set come from a lecture.
The Boolean Model Simple model based on set theory
Information Retrieval and Web Search IR models: Boolean model Instructor: Rada Mihalcea Class web page:
Recuperação de Informação B Cap. 02: Modeling (Set Theoretic Models) 2.6 September 08, 1999.
Set Theoretic Models 1. IR Models Non-Overlapping Lists Proximal Nodes Structured Models Retrieval: Adhoc Filtering Browsing U s e r T a s k Classic Models.
Introduction n IR systems usually adopt index terms to process queries n Index term: u a keyword or group of selected words u any word (more general) n.
Probabilistic Model n Objective: to capture the IR problem using a probabilistic framework n Given a user query, there is an ideal answer set n Querying.
Vector Space Classification 1.Vector space text classification 2.Rochhio Text Classification.
1 Boolean Model. 2 A document is represented as a set of keywords. Queries are Boolean expressions of keywords, connected by AND, OR, and NOT, including.
Fuzzy Relations( 關係 ), Fuzzy Graphs( 圖 形 ), and Fuzzy Arithmetic( 運算 ) Chapter 4.
CS 430: Information Discovery
Latent Semantic Indexing
Information Retrieval and Web Search
Representation of documents and queries
Recuperação de Informação B
Recuperação de Informação B
Models for Retrieval and Browsing - Fuzzy Set, Extended Boolean, Generalized Vector Space Models Berlin Chen 2003 Reference: 1. Modern Information Retrieval,
Recuperação de Informação B
Recuperação de Informação B
Recuperação de Informação B
Recuperação de Informação B
Recuperação de Informação B
Berlin Chen Department of Computer Science & Information Engineering
Recuperação de Informação B
Recuperação de Informação B
Information Retrieval and Web Design
Modeling in Information Retrieval - Fuzzy Set, Extended Boolean, Generalized Vector Space, Set-based Models, and Best Match Models Berlin Chen Department.
Modeling in Information Retrieval - Fuzzy Set, Extended Boolean, Generalized Vector Space, Set-based Models, and Best Match Models Berlin Chen Department.
Advanced information retrieval
Modeling in Information Retrieval - Fuzzy Set, Extended Boolean, Generalized Vector Space, Set-based Models, and Best Match Models Berlin Chen Department.
Berlin Chen Department of Computer Science & Information Engineering
CS 430: Information Discovery
Presentation transcript:

Extended Boolean Model n Boolean model is simple and elegant. n But, no provision for a ranking n As with the fuzzy model, a ranking can be obtained by relaxing the condition on set membership n Extend the Boolean model with the notions of partial matching and term weighting n Combine characteristics of the Vector model with properties of Boolean algebra

The Idea n The extended Boolean model (introduced by Salton, Fox, and Wu, 1983) is based on a critique of a basic assumption in Boolean algebra n Let, u q = kx  ky u wxj = fxj * idf(x) associated with [kx,dj] max(idf(i)) u Further, wxj = x and wyj = y

The Idea: n qand = kx  ky; wxj = x and wyj = y dj dj+1 y = wyj x = wxj(0,0) (1,1) kx ky sim(qand,dj) = 1 - sqrt( (1-x) + (1-y) ) 2 22 AND

The Idea: n qor = kx  ky; wxj = x and wyj = y dj dj+1 y = wyj x = wxj(0,0) (1,1) kx ky sim(qor,dj) = sqrt( x + y ) 2 22 OR

Generalizing the Idea n We can extend the previous model to consider Euclidean distances in a t-dimensional space n This can be done using p-norms which extend the notion of distance to include p-distances, where 1  p   is a new parameter n A generalized conjunctive query is given by u qor = k1 k2... kt n A generalized disjunctive query is given by u qand = k1 k2... kt p  p  p   p  p  p

Generalizing the Idea u sim(qand,dj) = 1 - ((1-x1) + (1-x2) (1-xm) ) m ppp p 1 u sim(qor,dj) = (x1 + x xm ) m ppp p 1

Properties u sim(qor,dj) = (x1 + x xm ) m u If p = 1 then (Vector like) F sim(qor,dj) = sim(qand,dj) = x xm m u If p =  then (Fuzzy like) F sim(qor,dj) = max (wxj) F sim(qand,dj) = min (wxj) ppp p 1

Properties u By varying p, we can make the model behave as a vector, as a fuzzy, or as an intermediary model u This is quite powerful and is a good argument in favor of the extended Boolean model u (k1 k2) k3 F k1 and k2 are to be used as in a vectorial retrieval while the presence of k3 is required.    2

Properties u q = (k1 k2) k3 u sim(q,dj) = ( (1 - ( (1-x1) + (1-x2) ) ) + x3 ) 2 2    2 p p p 1 p 1 p

Conclusions n Model is quite powerful n Properties are interesting and might be useful n Computation is somewhat complex n However, distributivity operation does not hold for ranking computation: u q1 = (k1  k2)  k3 u q2 = (k1  k3)  (k2  k3) u sim(q1,dj)  sim(q2,dj)