Probabilistic Ranking Principle Hongning Wang

Slides:



Advertisements
Similar presentations
Information Retrieval and Organisation Chapter 11 Probabilistic Information Retrieval Dell Zhang Birkbeck, University of London.
Advertisements

Language Models Naama Kraus (Modified by Amit Gross) Slides are based on Introduction to Information Retrieval Book by Manning, Raghavan and Schütze.
1 Language Models for TR (Lecture for CS410-CXZ Text Info Systems) Feb. 25, 2011 ChengXiang Zhai Department of Computer Science University of Illinois,
Language Models Hongning Wang
1 Essential Probability & Statistics (Lecture for CS598CXZ Advanced Topics in Information Retrieval ) ChengXiang Zhai Department of Computer Science University.
The Probabilistic Model. Probabilistic Model n Objective: to capture the IR problem using a probabilistic framework; n Given a user query, there is an.
Probabilistic Information Retrieval Chris Manning, Pandu Nayak and
Introduction to Information Retrieval Information Retrieval and Data Mining (AT71.07) Comp. Sc. and Inf. Mgmt. Asian Institute of Technology Instructor:
CpSc 881: Information Retrieval
Modern Information Retrieval by R. Baeza-Yates and B. Ribeiro-Neto
Probabilistic Ranking Principle
Information Retrieval Models: Probabilistic Models
Visual Recognition Tutorial
IR Models: Overview, Boolean, and Vector
Chapter 7 Retrieval Models.
Hinrich Schütze and Christina Lioma
Incorporating Language Modeling into the Inference Network Retrieval Framework Don Metzler.
Introduction to Information Retrieval Introduction to Information Retrieval Hinrich Schütze and Christina Lioma Lecture 11: Probabilistic Information Retrieval.
1 CS 430 / INFO 430 Information Retrieval Lecture 12 Probabilistic Information Retrieval.
Probabilistic IR Models Based on probability theory Basic idea : Given a document d and a query q, Estimate the likelihood of d being relevant for the.
1 CS 430 / INFO 430 Information Retrieval Lecture 12 Probabilistic Information Retrieval.
Modeling Modern Information Retrieval
Language Models for TR Rong Jin Department of Computer Science and Engineering Michigan State University.
Vector Space Model CS 652 Information Extraction and Integration.
1 CS 430 / INFO 430 Information Retrieval Lecture 10 Probabilistic Information Retrieval.
Retrieval Models II Vector Space, Probabilistic.  Allan, Ballesteros, Croft, and/or Turtle Properties of Inner Product The inner product is unbounded.
IR Models: Review Vector Model and Probabilistic.
Other IR Models Non-Overlapping Lists Proximal Nodes Structured Models Retrieval: Adhoc Filtering Browsing U s e r T a s k Classic Models boolean vector.
Language Modeling Approaches for Information Retrieval Rong Jin.
Probabilistic Models in IR Debapriyo Majumdar Information Retrieval – Spring 2015 Indian Statistical Institute Kolkata Using majority of the slides from.
2008 © ChengXiang Zhai China-US-France Summer School, Lotus Hill Inst., Statistical Models for Information Retrieval and Text Mining ChengXiang.
Modeling (Chap. 2) Modern Information Retrieval Spring 2000.
1 Vector Space Model Rong Jin. 2 Basic Issues in A Retrieval Model How to represent text objects What similarity function should be used? How to refine.
Latent Semantic Analysis Hongning Wang VS model in practice Document and query are represented by term vectors – Terms are not necessarily orthogonal.
IRDM WS Chapter 4: Advanced IR Models 4.1 Probabilistic IR Principles Probabilistic IR with Term Independence Probabilistic.
Language Models Hongning Wang Two-stage smoothing [Zhai & Lafferty 02] c(w,d) |d| P(w|d) = +  p(w|C) ++ Stage-1 -Explain unseen words -Dirichlet.
Latent Semantic Analysis Hongning Wang Recap: vector space model Represent both doc and query by concept vectors – Each concept defines one dimension.
IR Models J. H. Wang Mar. 11, The Retrieval Process User Interface Text Operations Query Operations Indexing Searching Ranking Index Text quer y.
ECE 8443 – Pattern Recognition LECTURE 07: MAXIMUM LIKELIHOOD AND BAYESIAN ESTIMATION Objectives: Class-Conditional Density The Multivariate Case General.
Relevance Feedback Hongning Wang What we have learned so far Information Retrieval User results Query Rep Doc Rep (Index) Ranker.
LANGUAGE MODELS FOR RELEVANCE FEEDBACK Lee Won Hee.
Boolean Model Hongning Wang Abstraction of search engine architecture User Ranker Indexer Doc Analyzer Index results Crawler Doc Representation.
Language Models Hongning Wang Recap: document generation model 4501: Information Retrieval Model of relevant docs for Q Model of non-relevant.
Boolean Model Hongning Wang Abstraction of search engine architecture User Ranker Indexer Doc Analyzer Index results Crawler Doc Representation.
KNN & Naïve Bayes Hongning Wang Today’s lecture Instance-based classifiers – k nearest neighbors – Non-parametric learning algorithm Model-based.
Language Modeling Putting a curve to the bag of words Courtesy of Chris Jordan.
Web-Mining Agents Probabilistic Information Retrieval Prof. Dr. Ralf Möller Universität zu Lübeck Institut für Informationssysteme Karsten Martiny (Übungen)
CpSc 881: Information Retrieval. 2 Using language models (LMs) for IR ❶ LM = language model ❷ We view the document as a generative model that generates.
Information Retrieval Models: Vector Space Models
Relevance Feedback Hongning Wang
Introduction to Information Retrieval Introduction to Information Retrieval Lecture Probabilistic Information Retrieval.
A Study of Poisson Query Generation Model for Information Retrieval
Statistical Language Models Hongning Wang CS 6501: Text Mining1.
Essential Probability & Statistics (Lecture for CS397-CXZ Algorithms in Bioinformatics) Jan. 23, 2004 ChengXiang Zhai Department of Computer Science University.
2010 © University of Michigan Probabilistic Models in Information Retrieval SI650: Information Retrieval Winter 2010 School of Information University of.
KNN & Naïve Bayes Hongning Wang
A Study of Smoothing Methods for Language Models Applied to Ad Hoc Information Retrieval Chengxiang Zhai, John Lafferty School of Computer Science Carnegie.
Introduction to Information Retrieval Probabilistic Information Retrieval Chapter 11 1.
Essential Probability & Statistics
Probabilistic Retrieval Models
Language Models for Text Retrieval
Information Retrieval Models: Probabilistic Models
Relevance Feedback Hongning Wang
John Lafferty, Chengxiang Zhai School of Computer Science
CS 4501: Information Retrieval
Probabilistic Ranking Principle
Language Models Hongning Wang
CS 430: Information Discovery
Language Models for TR Rong Jin
ADVANCED TOPICS IN INFORMATION RETRIEVAL AND WEB SEARCH
Presentation transcript:

Probabilistic Ranking Principle Hongning Wang

Recap: latent semantic analysis Solution – Low rank matrix approximation Imagine this is our observed term-document matrix Imagine this is *true* concept-document matrix Random noise over the word selection in each document Information Retrieval2

Recap: latent semantic analysis Map to a lower dimensional space Information Retrieval3

Recap: latent semantic analysis Visualization Information Retrieval4

Recap: LSA for retrieval Information Retrieval5

Recap: LSA for retrieval q: “human computer interaction” HCI Graph theory Information Retrieval6

Notion of relevance 4501: Information Retrieval7 Relevance  (Rep(q), Rep(d)) Similarity P(r=1|q,d) r  {0,1} Probability of Relevance P(d  q) or P(q  d) Probabilistic inference Different rep & similarity Vector space model (Salton et al., 75) Prob. distr. model (Wong & Yao, 89) … Generative Model Regression Model (Fox 83) Classical prob. Model (Robertson & Sparck Jones, 76) Doc generation Query generation LM approach (Ponte & Croft, 98) (Lafferty & Zhai, 01a) Prob. concept space model (Wong & Yao, 95) Different inference system Inference network model (Turtle & Croft, 91) Today’s lecture

Basic concepts in probability Random experiment – An experiment with uncertain outcome (e.g., tossing a coin, picking a word from text) Sample space (S) – All possible outcomes of an experiment, e.g., tossing 2 fair coins, S={HH, HT, TH, TT} Event (E) – E  S, E happens iff outcome is in S, e.g., E={HH} (all heads), E={HH,TT} (same face) – Impossible event ({}), certain event (S) Probability of event – 0 ≤ P(E) ≤ : Information Retrieval8

Essential probability concepts 4501: Information Retrieval9

Essential probability concepts 4501: Information Retrieval10

Interpretation of Bayes’ rule 4501: Information Retrieval11

Theoretical justification of ranking As stated by William Cooper – Rank by probability of relevance leads to the optimal retrieval effectiveness 4501: Information Retrieval12 “If a reference retrieval system’s response to each request is a ranking of the documents in the collections in order of decreasing probability of usefulness to the user who submitted the request, where the probabilities are estimated as accurately as possible on the basis of whatever data made available to the system for this purpose, then the overall effectiveness of the system to its users will be the best that is obtainable on the basis of that data.”

Justification 4501: Information Retrieval13 Your decision criterion?

Justification 4501: Information Retrieval14

According to PRP, what we need is A relevance measure function F(q,d) – For all q, d 1, d 2, F(q,d 1 ) > F(q,d 2 ) iff. p(Rel|q,d 1 ) >p(Rel|q,d 2 ) – Assumptions Independent relevance Sequential browsing 4501: Information Retrieval15 Most existing research on IR models so far has fallen into this line of thinking… (Limitations?)

Probability of relevance Three random variables – Query Q – Document D – Relevance R  {0,1} Goal: rank D based on P(R=1|Q,D) – Compute P(R=1|Q,D) – Actually, one only needs to compare P(R=1|Q,D 1 ) with P(R=1|Q,D 2 ), i.e., rank documents Several different ways to define P(R=1|Q,D) 4501: Information Retrieval16

Conditional models for P(R=1|Q,D) Basic idea: relevance depends on how well a query matches a document – P(R=1|Q,D)=g(Rep(Q,D)|  ) Rep(Q,D): feature representation of query-doc pair – E.g., #matched terms, highest IDF of a matched term, docLen – Using training data (with known relevance judgments) to estimate parameter  – Apply the model to rank new documents Special case: logistic regression 4501: Information Retrieval a functional form 17

Regression for ranking? 4501: Information Retrieval18

Features/Attributes for ranking Typical features considered in ranking problems Average Absolute Query Frequency Query Length Average Absolute Document Frequency Document Length Average Inverse Document Frequency Number of Terms in common between query and document 4501: Information Retrieval19

Regression for ranking 4501: Information Retrieval20 Optimal regression model x y Y is discrete in a ranking problem! What if we have an outlier?

Regression for ranking 4501: Information Retrieval21 x P(y|x) What if we have an outlier? Sigmoid function

Conditional models for P(R=1|Q,D) Pros & Cons Advantages – Absolute probability of relevance available – May re-use all the past relevance judgments Problems – Performance heavily depends on the selection of features – Little guidance on feature selection Will be covered with more details in later learning-to-rank discussions 4501: Information Retrieval22

Recap: interpretation of Bayes’ rule 4501: Information Retrieval23

Recap: probabilistic ranking principle 4501: Information Retrieval24

Recap: conditional models for P(R=1|Q,D) Basic idea: relevance depends on how well a query matches a document – P(R=1|Q,D)=g(Rep(Q,D)|  ) Rep(Q,D): feature representation of query-doc pair – E.g., #matched terms, highest IDF of a matched term, docLen – Using training data (with known relevance judgments) to estimate parameter  – Apply the model to rank new documents Special case: logistic regression 4501: Information Retrieval a functional form 25

Generative models for P(R=1|Q,D) Basic idea – Compute Odd(R=1|Q,D) using Bayes’ rule Assumption – Relevance is a binary variable Variants – Document “generation” P(Q,D|R)=P(D|Q,R)P(Q|R) – Query “generation” P(Q,D|R)=P(Q|D,R)P(D|R) 4501: Information Retrieval Ignored for ranking 26

Document generation model 4501: Information Retrieval Model of relevant docs for Q Model of non-relevant docs for Q Assume independent attributes of A 1 …A k ….(why?) Let D=d 1 …d k, where d k  {0,1} is the value of attribute A k (Similarly Q=q 1 …q k ) Terms occur in doc Terms do not occur in doc documentrelevant(R=1)nonrelevant(R=0) term present A i =1pipi uiui term absent A i =01-p i 1-u i Ignored for ranking 27 informationretrievalretrievedishelpfulforyoueveryone Doc Doc

Document generation model 4501: Information Retrieval Terms occur in doc Terms do not occur in doc documentrelevant(R=1)nonrelevant(R=0) term present A i =1pipi uiui term absent A i =01-p i 1-u i Assumption: terms not occurring in the query are equally likely to occur in relevant and nonrelevant documents, i.e., p t =u t Important tricks 28

Robertson-Sparck Jones Model (Robertson & Sparck Jones 76) 4501: Information Retrieval Two parameters for each term A i : p i = P(A i =1|Q,R=1): prob. that term A i occurs in a relevant doc u i = P(A i =1|Q,R=0): prob. that term A i occurs in a non-relevant doc (RSJ model) How to estimate these parameters? Suppose we have relevance judgments, “+0.5” and “+1” can be justified by Bayesian estimation as priors Per-query estimation! 29

Parameter estimation 4501: Information Retrieval30

Maximum likelihood vs. Bayesian Maximum likelihood estimation – “Best” means “data likelihood reaches maximum” – Issue: small sample size Bayesian estimation – “Best” means being consistent with our “prior” knowledge and explaining data well – A.k.a, Maximum a Posterior estimation – Issue: how to define prior? 4501: Information Retrieval31 ML: Frequentist’s point of view MAP: Bayesian’s point of view

Illustration of Bayesian estimation 4501: Information Retrieval32 Prior: p(  ) Likelihood: p(X|  ) X=(x 1,…,x N ) Posterior: p(  |X)  p(X|  )p(  )    : prior mode  ml : ML estimate  : posterior mode

Maximum likelihood estimation 4501: Information Retrieval33 Set partial derivatives to zero ML estimate Sincewe have Requirement from probability

Robertson-Sparck Jones Model (Robertson & Sparck Jones 76) 4501: Information Retrieval Two parameters for each term A i : p i = P(A i =1|Q,R=1): prob. that term A i occurs in a relevant doc u i = P(A i =1|Q,R=0): prob. that term A i occurs in a non-relevant doc (RSJ model) How to estimate these parameters? Suppose we have relevance judgments, “+0.5” and “+1” can be justified by Bayesian estimation as priors Per-query estimation! 34

RSJ Model without relevance info (Croft & Harper 79) 4501: Information Retrieval Suppose we do not have relevance judgments, - We will assume p i to be a constant - Estimate u i by assuming all documents to be non-relevant N: # documents in collection n i : # documents in which term A i occurs (RSJ model) IDF weighted Boolean model? Reminder: 35 informationretrievalretrievedishelpfulforyoueveryone Doc Doc

RSJ Model: summary The most important classic probabilistic IR model Use only term presence/absence, thus also referred to as Binary Independence Model – Essentially Naïve Bayes for doc ranking – Designed for short catalog records When without relevance judgments, the model parameters must be estimated in an ad-hoc way Performance isn’t as good as tuned VS models 4501: Information Retrieval36

Improving RSJ: adding TF 4501: Information Retrieval Let D=d 1 …d k, where d k is the frequency count of term A k 2-Poisson mixture model for TF Many more parameters to estimate! (how many exactly?) Compound with document length! Eliteness: if the term is about the concept asked in the query 37

BM25/Okapi approximation (Robertson et al. 94) Idea: model p(D|Q,R) with a simpler function that approximates 2-Possion mixture model Observations: – log O(R=1|Q,D) is a sum of term weights occurring in both query and document – Term weight W i = 0, if TF i =0 – W i increases monotonically with TF i – W i has an asymptotic limit The simple function is 4501: Information Retrieval38

Adding doc. length Incorporating doc length – Motivation: the 2-Poisson model assumes equal document length – Implementation: penalize long doc 4501: Information Retrieval where Pivoted document length normalization 39

Adding query TF Incorporating query TF – Motivation Natural symmetry between document and query – Implementation: a similar TF transformation as in document TF The final formula is called BM25, achieving top TREC performance 4501: Information Retrieval BM: best match 40

The BM25 formula 4501: Information Retrieval “Okapi TF/BM25 TF” becomes IDF when no relevance info is available 41

The BM25 formula 4501: Information Retrieval42 TF-IDF component for document TF component for query Vector space model with TF-IDF schema!

Extensions of “Doc Generation” models Capture term dependence [Rijsbergen & Harper 78] Alternative ways to incorporate TF [Croft 83, Kalt96] Feature/term selection for feedback [Okapi’s TREC reports] Estimate of the relevance model based on pseudo feedback [Lavrenko & Croft 01] 4501: Information Retrieval43 to be covered later

Query generation models 4501: Information Retrieval Assuming uniform document prior, we have Now, the question is how to compute ? Generally involves two steps: (1) estimate a language model based on D (2) compute the query likelihood according to the estimated model Language models, we will cover it in next lecture! Query likelihood p(q|  d ) Document prior 44

What you should know Essential concepts in probability Justification of ranking by relevance Derivation of RSJ model Maximum likelihood estimation BM25 formula 4501: Information Retrieval45

Today’s reading Chapter 11. Probabilistic information retrieval – 11.2 The Probability Ranking Principle – 11.3 The Binary Independence Model – Okapi BM25: a non-binary model 4501: Information Retrieval46