Modern Information Retrieval Chapter 2 Modeling

Probabilistic model
The appearance or absence of an index term in a document is interpreted as evidence either that the document is relevant or that it is irrelevant to a query.
→ Establish a weight for each term.

A collection of $N$ documents:
→ $R$ of which are relevant, and $R_t$ of those relevant documents contain term $t$
→ $f_t$ of which contain $t$
→ These values can be obtained from a training set with relevance judgments.
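
As a rough illustration of how the four counts might be gathered from judged documents, the sketch below uses a made-up five-document training set. The `training_set` layout and the `collection_counts` helper are assumptions for this example, not something the slides specify.

```python
# Hypothetical training set: document id -> (set of index terms, relevance judgment).
training_set = {
    "d1": ({"probabilistic", "model"}, True),
    "d2": ({"vector", "model"}, False),
    "d3": ({"probabilistic", "weight"}, True),
    "d4": ({"boolean", "query"}, False),
    "d5": ({"probabilistic", "query"}, False),
}


def collection_counts(judged_docs, term):
    """Return (N, R, R_t, f_t) for one term over a judged collection."""
    N = len(judged_docs)
    R = sum(1 for _, relevant in judged_docs.values() if relevant)
    f_t = sum(1 for terms, _ in judged_docs.values() if term in terms)
    R_t = sum(
        1 for terms, relevant in judged_docs.values() if relevant and term in terms
    )
    return N, R, R_t, f_t


counts = collection_counts(training_set, "probabilistic")
print(counts)  # (5, 2, 2, 3)
```

Only the presence or absence of a term in each document is recorded here, which matches the binary view of index terms taken by this model.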

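Computing probabilities
→ $\Pr[\text{relevant} \mid t] = R_t / f_t$
→ $\Pr[\text{irrelevant} \mid t] = (f_t - R_t) / f_t$
→ $\Pr[\text{relevant} \mid \bar{t}] = (R - R_t) / (N - f_t)$
→ $\Pr[\text{irrelevant} \mid \bar{t}] = (N - f_t - (R - R_t)) / (N - f_t)$
Here $\bar{t}$ means that term $t$ is absent from the document.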
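
The four estimates can be sketched directly from the counts. The function name `term_probabilities`, the dictionary keys, and the sample counts below are illustrative assumptions; exact fractions are used so that the complementary pairs visibly sum to 1.

```python
from fractions import Fraction


def term_probabilities(N, R, R_t, f_t):
    """Return the four conditional probabilities as exact fractions."""
    return {
        "relevant|t": Fraction(R_t, f_t),
        "irrelevant|t": Fraction(f_t - R_t, f_t),
        "relevant|not t": Fraction(R - R_t, N - f_t),
        "irrelevant|not t": Fraction(N - f_t - (R - R_t), N - f_t),
    }


# Hypothetical counts chosen only for illustration.
probs = term_probabilities(N=10, R=4, R_t=3, f_t=5)
for name, value in probs.items():
    print(f"Pr[{name}] = {value}")
```

This sketch raises `ZeroDivisionError` when no document contains the term (`f_t == 0`) or every document does (`f_t == N`); how such cases should be handled is not covered by the slides.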

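Computing the weight $W_t$ for $t$
$$
W_t = \frac{\Pr[\text{relevant} \mid t] \times \Pr[\text{irrelevant} \mid \bar{t}]}{\Pr[\text{irrelevant} \mid t] \times \Pr[\text{relevant} \mid \bar{t}]}
= \frac{\dfrac{R_t}{f_t} \times \dfrac{N - f_t - (R - R_t)}{N - f_t}}{\dfrac{f_t - R_t}{f_t} \times \dfrac{R - R_t}{N - f_t}}
= \frac{R_t / (R - R_t)}{(f_t - R_t) / (N - f_t - (R - R_t))}
$$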
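
A minimal sketch of the weight computation, written in both the simplified odds-ratio form and the unsimplified ratio of probability products, to show that they give the same value. The function names and the sample counts are assumptions made for this example.

```python
from fractions import Fraction


def term_weight(N, R, R_t, f_t):
    """W_t in its simplified odds-ratio form."""
    relevant_odds = Fraction(R_t, R - R_t)
    irrelevant_odds = Fraction(f_t - R_t, N - f_t - (R - R_t))
    return relevant_odds / irrelevant_odds


def term_weight_from_probabilities(N, R, R_t, f_t):
    """W_t as the ratio of probability products, before simplification."""
    p_rel_t = Fraction(R_t, f_t)
    p_irr_t = Fraction(f_t - R_t, f_t)
    p_rel_not_t = Fraction(R - R_t, N - f_t)
    p_irr_not_t = Fraction(N - f_t - (R - R_t), N - f_t)
    return (p_rel_t * p_irr_not_t) / (p_irr_t * p_rel_not_t)


# Hypothetical counts chosen only for illustration.
counts = dict(N=10, R=4, R_t=3, f_t=5)
print(term_weight(**counts))                     # 6
print(term_weight_from_probabilities(**counts))  # 6
```

Both functions raise `ZeroDivisionError` when a denominator is zero, for example when every relevant document contains the term (`R == R_t`). Real systems often smooth the counts to avoid this, but that adjustment is not part of the slides and is left out here.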

→ $W_t > 1$ indicates that the appearance of $t$ supports the hypothesis that the document is relevant.
  Example: $N = 20$, $R = 13$, $R_t = 11$, $f_t = 12$ → $W_t = 33$
→ $W_t < 1$ indicates that the appearance of $t$ suggests that the document is irrelevant.
  Example: $N = 20$, $R = 13$, $R_t = 4$, $f_t = 7$ → $W_t \approx 0.59$
→ $W_t = 1$ indicates that $t$ is neutral.
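
For reference, the two example values can be checked by substituting the counts into the simplified form of the weight, $W_t = \dfrac{R_t / (R - R_t)}{(f_t - R_t) / (N - f_t - (R - R_t))}$:

$$
W_t = \frac{11 / (13 - 11)}{(12 - 11) / (20 - 12 - (13 - 11))} = \frac{11/2}{1/6} = 33
$$

$$
W_t = \frac{4 / (13 - 4)}{(7 - 4) / (20 - 7 - (13 - 4))} = \frac{4/9}{3/4} = \frac{16}{27} \approx 0.59
$$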

→ A negative weight indicates that the document is predicted to be irrelevant.
→ A zero weight indicates that the document is neutral.
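
The slide text does not say how a document weight is formed from the term weights. Since each $W_t$ is a non-negative ratio, a negative or zero document weight only makes sense on a logarithmic scale, so the sketch below assumes the document weight is the sum of $\log W_t$ over the query terms that appear in the document. That scoring rule, the `document_score` helper, and the term names are all assumptions.

```python
import math


def document_score(term_weights, query_terms, document_terms):
    """Sum log W_t over query terms that appear in the document (assumed scoring rule)."""
    return sum(
        math.log(term_weights[t])
        for t in query_terms
        if t in document_terms and t in term_weights
    )


# Hypothetical term names; 33 and 0.59 are the example weights from the slides.
weights = {"good_term": 33.0, "bad_term": 0.59, "neutral_term": 1.0}
query = {"good_term", "bad_term", "neutral_term"}

print(document_score(weights, query, {"good_term"}))     # positive
print(document_score(weights, query, {"bad_term"}))      # negative
print(document_score(weights, query, {"neutral_term"}))  # 0.0
```

Under this assumption, $W_t > 1$ contributes a positive amount, $W_t < 1$ a negative amount, and $W_t = 1$ contributes nothing, which agrees with the interpretation of the term weights given earlier.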

Comparison
→ The Boolean model is the weakest model: it offers no partial matching.
→ The vector model and the probabilistic model are comparable, though the vector model is more popular: term frequency is not considered in the probabilistic model.
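
The "no partial matching" point can be illustrated with a small sketch that compares a Boolean AND query with a cosine similarity over raw term-frequency vectors. The helper names, the toy documents, and the use of unweighted term frequencies are assumptions for this example; the slides do not prescribe a particular weighting scheme for the vector model.

```python
import math
from collections import Counter


def boolean_and_match(query_terms, document_tokens):
    """Boolean AND: the document matches only if it contains every query term."""
    return set(query_terms) <= set(document_tokens)


def cosine_similarity(query_terms, document_tokens):
    """Cosine of the angle between raw term-frequency vectors."""
    q, d = Counter(query_terms), Counter(document_tokens)
    dot = sum(q[t] * d[t] for t in q)
    norm = math.sqrt(sum(v * v for v in q.values())) * math.sqrt(
        sum(v * v for v in d.values())
    )
    return dot / norm if norm else 0.0


query = ["information", "retrieval"]
partial_doc = ["information", "theory", "information"]  # hypothetical document
full_doc = ["information", "retrieval", "models"]       # hypothetical document

print(boolean_and_match(query, partial_doc))      # False
print(cosine_similarity(query, partial_doc) > 0)  # True
print(boolean_and_match(query, full_doc))         # True
```

The document that contains only one of the two query terms is rejected outright by the Boolean query but still receives a non-zero similarity in the vector model, so it can be ranked below the full match instead of being dropped.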