Recuperação de Informação B

Slides:



Advertisements
Similar presentations
Traditional IR models Jian-Yun Nie.
Advertisements

Modern information retrieval Modelling. Introduction IR systems usually adopt index terms to process queries IR systems usually adopt index terms to process.
Basic IR: Modeling Basic IR Task: Slightly more complex:
The Probabilistic Model. Probabilistic Model n Objective: to capture the IR problem using a probabilistic framework; n Given a user query, there is an.
Extended Boolean Model n Boolean model is simple and elegant. n But, no provision for a ranking n As with the fuzzy model, a ranking can be obtained by.
| 1 › Gertjan van Noord2014 Zoekmachines Lecture 4.
CS 430 / INFO 430 Information Retrieval
CS 430 / INFO 430 Information Retrieval
IR Models: Overview, Boolean, and Vector
IR Models: Structural Models
Modern Information Retrieval Chapter 2 Modeling. Can keywords be used to represent a document or a query? keywords as query and matching as query processing.
Modeling Modern Information Retrieval
Project Management: The project is due on Friday inweek13.
Construction of Index: (Page 197) Objective: Given a document, find the number of occurrences of each word in the document. Example: Computer Science students.
IR Models: Latent Semantic Analysis. IR Model Taxonomy Non-Overlapping Lists Proximal Nodes Structured Models U s e r T a s k Set Theoretic Fuzzy Extended.
IR Models: Review Vector Model and Probabilistic.
Other IR Models Non-Overlapping Lists Proximal Nodes Structured Models Retrieval: Adhoc Filtering Browsing U s e r T a s k Classic Models boolean vector.
Modeling (Chap. 2) Modern Information Retrieval Spring 2000.
1 CS 430: Information Discovery Lecture 12 Extending the Boolean Model.
Information Retrieval Chapter 2: Modeling 2.1, 2.2, 2.3, 2.4, 2.5.1, 2.5.2, Slides provided by the author, modified by L N Cassel September 2003.
Information Retrieval Models - 1 Boolean. Introduction IR systems usually adopt index terms to process queries Index terms:  A keyword or group of selected.
Introduction to Digital Libraries Searching
IR Models J. H. Wang Mar. 11, The Retrieval Process User Interface Text Operations Query Operations Indexing Searching Ranking Index Text quer y.
Advanced information retrieval Chapter. 02: Modeling (Set Theoretic Models) – Fuzzy model.
Chapter. 02: Modeling Contenue... 19/10/2015Dr. Almetwally Mostafa 1.
Generalized Vector Model n Classic models enforce independence of index terms. n For the Vector model: u Set of term vectors {k1, k2,..., kt} are linearly.
Information Retrieval CSE 8337 Spring 2005 Modeling Material for these slides obtained from: Modern Information Retrieval by Ricardo Baeza-Yates and Berthier.
Ground Truth Free Evaluation of Segment Based Maps Rolf Lakaemper Temple University, Philadelphia,PA,USA.
Najah Alshanableh. Fuzzy Set Model n Queries and docs represented by sets of index terms: matching is approximate from the start n This vagueness can.
Query Sensitive Embeddings Vassilis Athitsos, Marios Hadjieleftheriou, George Kollios, Stan Sclaroff.
1 Patrick Lambrix Department of Computer and Information Science Linköpings universitet Information Retrieval.
Information Retrieval and Web Search Probabilistic IR and Alternative IR Models Rada Mihalcea (Some of the slides in this slide set come from a lecture.
The Boolean Model Simple model based on set theory
Information Retrieval and Web Search IR models: Boolean model Instructor: Rada Mihalcea Class web page:
Recuperação de Informação B Cap. 02: Modeling (Set Theoretic Models) 2.6 September 08, 1999.
Set Theoretic Models 1. IR Models Non-Overlapping Lists Proximal Nodes Structured Models Retrieval: Adhoc Filtering Browsing U s e r T a s k Classic Models.
Recuperação de Informação B Cap. 02: Modeling (Latent Semantic Indexing & Neural Network Model) 2.7.2, September 27, 1999.
Introduction n IR systems usually adopt index terms to process queries n Index term: u a keyword or group of selected words u any word (more general) n.
Probabilistic Model n Objective: to capture the IR problem using a probabilistic framework n Given a user query, there is an ideal answer set n Querying.
Eick: kNN kNN: A Non-parametric Classification and Prediction Technique Goals of this set of transparencies: 1.Introduce kNN---a popular non-parameric.
1 Boolean Model. 2 A document is represented as a set of keywords. Queries are Boolean expressions of keywords, connected by AND, OR, and NOT, including.
Fuzzy Relations( 關係 ), Fuzzy Graphs( 圖 形 ), and Fuzzy Arithmetic( 運算 ) Chapter 4.
Recuperação de Informação B Modern Information Retrieval Cap. 2: Modeling Section 2.8 : Alternative Probabilistic Models September 20, 1999.
Section 14.2 Computing Partial Derivatives Algebraically
Why the interest in Queries?
CS 430: Information Discovery
Latent Semantic Indexing
Information Retrieval on the World Wide Web
Information Retrieval and Web Search
Representation of documents and queries
INFORMATION RETRIEVAL TECHNIQUES BY DR. ADNAN ABID
Recuperação de Informação B
Models for Retrieval and Browsing - Fuzzy Set, Extended Boolean, Generalized Vector Space Models Berlin Chen 2003 Reference: 1. Modern Information Retrieval,
Recuperação de Informação B
Recuperação de Informação B
Recuperação de Informação B
Recuperação de Informação B
Recuperação de Informação B
Recuperação de Informação B
Recuperação de Informação B
Berlin Chen Department of Computer Science & Information Engineering
Recuperação de Informação B
Recuperação de Informação B
Information Retrieval and Web Design
Modeling in Information Retrieval - Fuzzy Set, Extended Boolean, Generalized Vector Space, Set-based Models, and Best Match Models Berlin Chen Department.
Modeling in Information Retrieval - Fuzzy Set, Extended Boolean, Generalized Vector Space, Set-based Models, and Best Match Models Berlin Chen Department.
Advanced information retrieval
Modeling in Information Retrieval - Fuzzy Set, Extended Boolean, Generalized Vector Space, Set-based Models, and Best Match Models Berlin Chen Department.
Berlin Chen Department of Computer Science & Information Engineering
CS 430: Information Discovery
Presentation transcript:

Recuperação de Informação B Cap. 02: Modeling (Extended Boolean Model) 2.6.2 September 13, 1999

Extended Boolean Model Boolean model is simple and elegant. But, no provision for a ranking As with the fuzzy model, a ranking can be obtained by relaxing the condition on set membership Extend the Boolean model with the notions of partial matching and term weighting Combine characteristics of the Vector model with properties of Boolean algebra

The Idea The extended Boolean model (introduced by Salton, Fox, and Wu, 1983) is based on a critique of a basic assumption in Boolean algebra Let, q = kx  ky wxj = fxj * idf(x) associated with [kx,dj] max(idf(i)) Further, wxj = x and wyj = y

The Idea: qand = kx  ky; wxj = x and wyj = y (1,1) ky dj+1 AND y = wyj dj (0,0) x = wxj kx sim(qand,dj) = 1 - sqrt( (1-x) + (1-y) ) 2

The Idea: qor = kx  ky; wxj = x and wyj = y (1,1) ky dj+1 OR dj y = wyj (0,0) x = wxj kx sim(qor,dj) = sqrt( x + y ) 2

Generalizing the Idea We can extend the previous model to consider Euclidean distances in a t-dimensional space This can be done using p-norms which extend the notion of distance to include p-distances, where 1  p   is a new parameter A generalized conjunctive query is given by qor = k1 k2 . . . kt A generalized disjunctive query is given by qand = k1 k2 . . . kt p  p  p   p  p  p

Generalizing the Idea sim(qor,dj) = (x1 + x2 + . . . + xm ) m p 1 p 1 p p p sim(qand,dj) = 1 - ((1-x1) + (1-x2) + . . . + (1-xm) ) m

Properties sim(qor,dj) = (x1 + x2 + . . . + xm ) m If p = 1 then (Vector like) sim(qor,dj) = sim(qand,dj) = x1 + . . . + xm m If p =  then (Fuzzy like) sim(qor,dj) = max (wxj) sim(qand,dj) = min (wxj) p 1

Properties By varying p, we can make the model behave as a vector, as a fuzzy, or as an intermediary model This is quite powerful and is a good argument in favor of the extended Boolean model (k1 k2) k3 k1 and k2 are to be used as in a vectorial retrieval while the presence of k3 is required.    2

Properties   q = (k1 k2) k3 sim(q,dj) = ( (1 - ( (1-x1) + (1-x2) ) ) + x3 ) 2 2  2 1 p 1 p p p p

Conclusions Model is quite powerful Properties are interesting and might be useful Computation is somewhat complex However, distributivity operation does not hold for ranking computation: q1 = (k1  k2)  k3 q2 = (k1  k3)  (k2  k3) sim(q1,dj)  sim(q2,dj)