IRDM WS Chapter 4: Advanced IR Models
4.1 Probabilistic IR Principles
    Probabilistic IR with Term Independence
    Probabilistic IR with 2-Poisson Model (Okapi BM25)
    Extensions of Probabilistic IR
4.2 Statistical Language Models
4.3 Latent-Concept Models
IRDM WS Probabilistic Retrieval: Principles [Robertson and Sparck Jones 1976]
Goal: Ranking based on sim(doc d, query q) = P[R | d]
      = P[ doc d is relevant for query q | d has term vector X1, ..., Xm ]
Assumptions:
  Relevant and irrelevant documents differ in their terms.
Binary Independence Retrieval (BIR) Model:
  Probabilities for term occurrence are pairwise independent for different terms.
  Term weights are binary {0, 1}.
  For terms that do not occur in query q, the probability of occurrence is the same
  for relevant and irrelevant documents.
IRDM WS Probabilistic IR with Term Independence: Ranking Proportional to Relevance Odds
sim(d, q) = P[R | d] is rank-equivalent to the odds for relevance:
  O(R | d) = P[R | d] / P[¬R | d]
           = P[d | R] P[R] / ( P[d | ¬R] P[¬R] )            (Bayes' theorem)
           ∝ P[d | R] / P[d | ¬R]                            (odds for relevance)
           = prod_{i=1..m} P[Xi | R] / P[Xi | ¬R]            (independence or linked dependence)
(Xi = 1 if d includes the i-th term, 0 otherwise)
IRDM WS Probabilistic Retrieval: Ranking Proportional to Relevance Odds (cont.)
With binary features Xi:
  O(R | d) ∝ prod_{i: Xi=1} pi / qi  *  prod_{i: Xi=0} (1 - pi) / (1 - qi)
with estimators pi = P[Xi=1 | R] and qi = P[Xi=1 | ¬R].
Since pi = qi for terms not in q, and the factor prod_{i in q} (1 - pi)/(1 - qi) does not
depend on the document, ranking reduces to a sum over the query terms occurring in d:
  sim(d, q) ∝ sum_{i in q ∩ d} log( pi (1 - qi) / ( qi (1 - pi) ) )
IRDM WS Probabilistic Retrieval: Robertson / Sparck Jones Formula
Estimate pi and qi based on a training sample (query q on a small sample of the corpus)
or based on intellectual assessment of the first round's results (relevance feedback):
Let N be the #docs in the sample, R the #relevant docs in the sample,
    ni the #docs in the sample that contain term i, ri the #relevant docs in the sample that contain term i.
Estimate:
  pi = ri / R    and    qi = (ni - ri) / (N - R)
or (Lidstone smoothing with λ = 0.5):
  pi = (ri + 0.5) / (R + 1)    and    qi = (ni - ri + 0.5) / (N - R + 1)
Weight of term i in doc d:
  wi = log( pi (1 - qi) / ( qi (1 - pi) ) )
     = log( (ri + 0.5)(N - R - ni + ri + 0.5) / ( (ni - ri + 0.5)(R - ri + 0.5) ) )
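As a quick illustration, a minimal Python sketch of this estimation with Lidstone smoothing (λ = 0.5); the function names and the stats dictionary layout are ours, not part of the slide:

    import math

    def rsj_weight(r_i: int, n_i: int, R: int, N: int, lam: float = 0.5) -> float:
        """Robertson/Sparck Jones term weight with Lidstone smoothing.
        r_i: #relevant sample docs containing term i, n_i: #sample docs containing term i,
        R: #relevant docs in the sample, N: #docs in the sample."""
        p_i = (r_i + lam) / (R + 2 * lam)             # P[X_i = 1 | relevant]
        q_i = (n_i - r_i + lam) / (N - R + 2 * lam)   # P[X_i = 1 | irrelevant]
        return math.log(p_i * (1 - q_i) / (q_i * (1 - p_i)))

    def bir_score(doc_terms: set, query_terms: set, stats: dict, R: int, N: int) -> float:
        """Sum of RSJ weights over the query terms that occur in the document.
        stats maps term -> (r_i, n_i)."""
        return sum(rsj_weight(*stats[t], R, N)
                   for t in query_terms & doc_terms if t in stats)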
IRDM WS Probabilistic Retrieval: tf*idf Formula
Assumptions (without training sample or relevance feedback):
  pi is the same for all i.
  Most documents are irrelevant.
  Each individual term i is infrequent.
This implies:
  qi ≈ ni / N    and    pi / (1 - pi) = c    with constant c
  wi = log( pi (1 - qi) / ( qi (1 - pi) ) ) ≈ log c + log( (N - ni) / ni ) ≈ log c + log( N / ni )
i.e. a scalar product over the product of tf and dampened idf values for the query terms
(see the sketch below).
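A minimal sketch of such a tf*idf-style score, assuming a dampened idf of log(N/ni) and raw term frequencies; the exact damping is not spelled out on the slide, so this is illustrative only:

    import math
    from collections import Counter

    def tfidf_score(doc_tokens: list, query_tokens: list, df: dict, N: int) -> float:
        """Scalar product of tf and dampened idf over the query terms.
        df maps term -> #docs containing the term (n_i); N is the corpus size."""
        tf = Counter(doc_tokens)
        return sum(tf[t] * math.log(N / df[t])
                   for t in query_tokens if t in tf and t in df)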
IRDM WS Example for Probabilistic Retrieval
Documents d1, ..., d4 with relevance feedback; N = 4 docs in the sample, R = 2 of them relevant.
Query q: t1 t2 t3 t4 t5 t6
Term statistics and estimates (Lidstone smoothing with λ = 0.5):
        t1    t2    t3    t4    t5    t6
  ri     2     1     1     2     1     0
  ni     2     1     2     3     2     0
  pi    5/6   1/2   1/2   5/6   1/2   1/6
  qi    1/6   1/6   1/2   1/2   1/2   1/6
Resulting term weights wi = log( pi (1 - qi) / ( qi (1 - pi) ) ):
  w1 = log 25, w2 = log 5, w3 = log 1, w4 = log 5, w5 = log 1, w6 = log 1
Score of a new document d5: sim(d5, q) = sum of the weights wi of the query terms that occur in d5.
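The table values can be checked with a few lines of Python (a throwaway verification script, not part of the slides):

    from fractions import Fraction as F

    R, N, lam = 2, 4, F(1, 2)
    r = {'t1': 2, 't2': 1, 't3': 1, 't4': 2, 't5': 1, 't6': 0}   # relevant sample docs containing the term
    n = {'t1': 2, 't2': 1, 't3': 2, 't4': 3, 't5': 2, 't6': 0}   # all sample docs containing the term

    for t in r:
        p = (r[t] + lam) / (R + 2 * lam)              # smoothed P[X=1 | relevant]
        q = (n[t] - r[t] + lam) / (N - R + 2 * lam)   # smoothed P[X=1 | irrelevant]
        w = p * (1 - q) / (q * (1 - p))               # odds ratio inside the log
        print(t, p, q, 'w = log', w)                  # e.g. t1 5/6 1/6 w = log 25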
IRDM WS Laplace Smoothing (with Uniform Prior)
Probabilities pi and qi for term i are estimated by MLE for a binomial distribution
(repeated coin tosses for relevant docs, showing term i with prob. pi;
 repeated coin tosses for irrelevant docs, showing term i with prob. qi).
To avoid overfitting to the feedback/training data, the estimates should be smoothed
(e.g. with a uniform prior). Instead of estimating pi = k/n, estimate
  pi = (k + 1) / (n + 2)                                       (Laplace's law of succession)
or, as a heuristic generalization,
  pi = (k + λ) / (n + 2λ)   with λ > 0 (e.g. λ = 0.5)          (Lidstone's law of succession)
For a multinomial distribution (n tosses of a w-sided die) estimate:
  pi = (ki + 1) / (n + w)
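A tiny Python helper illustrating the estimators side by side (the function names are ours):

    def mle(k: int, n: int) -> float:
        return k / n                          # unsmoothed maximum-likelihood estimate

    def laplace(k: int, n: int) -> float:
        return (k + 1) / (n + 2)              # Laplace's law of succession (uniform prior)

    def lidstone(k: int, n: int, lam: float = 0.5) -> float:
        return (k + lam) / (n + 2 * lam)      # Lidstone's heuristic generalization

    def multinomial_laplace(k_i: int, n: int, w: int) -> float:
        return (k_i + 1) / (n + w)            # w-sided die: add-one smoothing per facet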
IRDM WS Probabilistic IR with Poisson Model (Okapi BM25)
Generalize the binary term weight into
  wi = log( p_tf q_0 / ( q_tf p_0 ) )
with p_j, q_j denoting the prob. that the term occurs j times in a relevant / irrelevant doc.
Postulate Poisson (or Poisson-mixture) distributions:
  p_j = e^{-λ} λ^j / j!        q_j = e^{-μ} μ^j / j!
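For intuition, a small sketch of this generalized weight under the two-Poisson assumption; the means lam_rel and lam_irr are hypothetical parameters, not values from the slide:

    import math

    def poisson_pmf(j: int, lam: float) -> float:
        return math.exp(-lam) * lam ** j / math.factorial(j)

    def two_poisson_weight(tf: int, lam_rel: float, lam_irr: float) -> float:
        """log( p_tf * q_0 / (q_tf * p_0) ) with Poisson-distributed term frequencies.
        Under a pure Poisson model this simplifies to tf * log(lam_rel / lam_irr)."""
        p_tf, p_0 = poisson_pmf(tf, lam_rel), poisson_pmf(0, lam_rel)
        q_tf, q_0 = poisson_pmf(tf, lam_irr), poisson_pmf(0, lam_irr)
        return math.log(p_tf * q_0 / (q_tf * p_0))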
IRDM WS Okapi BM25
Approximation of the Poisson model by a similarly-shaped function:
  wi ≈ ( tf / (k1 + tf) ) * log( (N - ni + 0.5) / (ni + 0.5) )
finally leads to Okapi BM25 (which achieved best TREC results):
  score(d, q) = sum_{i in q} log( (N - ni + 0.5) / (ni + 0.5) )
                * ( (k1 + 1) tf_i(d) ) / ( k1 ((1 - b) + b |d| / avgdoclength) + tf_i(d) )
with avgdoclength the average document length, tuning parameters k1 and b,
a non-linear influence of tf, and consideration of the doc length;
or in the most comprehensive, tunable form with parameters k1, k2, k3, b:
  score(d, q) = sum_{i in q} [ log( (N - ni + 0.5) / (ni + 0.5) )
                * ( (k1 + 1) tf_i(d) ) / ( k1 ((1 - b) + b |d| / avgdoclength) + tf_i(d) )
                * ( (k3 + 1) tf_i(q) ) / ( k3 + tf_i(q) ) ]
                + k2 |q| (avgdoclength - |d|) / (avgdoclength + |d|)
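A compact, self-contained BM25 scorer along these lines; the defaults k1 = 1.2 and b = 0.75 are common settings, not values prescribed by the slide:

    import math
    from collections import Counter

    def bm25_score(doc_tokens, query_tokens, df, N, avgdl, k1=1.2, b=0.75):
        """Okapi BM25 score of one document for a query.
        df maps term -> document frequency n_i; N is the corpus size; avgdl the average doc length."""
        tf = Counter(doc_tokens)
        dl = len(doc_tokens)
        score = 0.0
        for t in query_tokens:
            if t not in tf or t not in df:
                continue
            idf = math.log((N - df[t] + 0.5) / (df[t] + 0.5))
            norm_tf = ((k1 + 1) * tf[t]) / (k1 * ((1 - b) + b * dl / avgdl) + tf[t])
            score += idf * norm_tf
        return score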
IRDM WS Poisson Mixtures for Capturing tf Distribution
[Figure: distribution of tf values for the term "said", motivating Katz's K-mixture.
 Source: Church/Gale 1995]
IRDM WS Katz's K-Mixture
Katz's K-mixture:
  P[tf = k] = (1 - α) δ(k = 0) + ( α / (β + 1) ) ( β / (β + 1) )^k
with δ(G) = 1 if G is true, 0 otherwise.
Parameter estimation for a given term (with collection frequency cf and document frequency df
in a corpus of N docs):
  λ = cf / N             (observed mean tf)
  β = (cf - df) / df     (extra occurrences (tf > 1) per doc containing the term)
  α = λ / β
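A short Python sketch of fitting and evaluating the K-mixture from these corpus statistics (cf, df, N are assumed inputs; cf > df is assumed so that β > 0):

    def fit_k_mixture(cf: int, df: int, N: int):
        """Estimate K-mixture parameters from collection frequency, document frequency, corpus size."""
        lam = cf / N               # observed mean tf per document
        beta = (cf - df) / df      # extra occurrences per document containing the term
        alpha = lam / beta
        return alpha, beta

    def k_mixture_pmf(k: int, alpha: float, beta: float) -> float:
        """P[tf = k] under Katz's K-mixture."""
        delta = 1.0 if k == 0 else 0.0
        return (1 - alpha) * delta + (alpha / (beta + 1)) * (beta / (beta + 1)) ** k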
IRDM WS Extensions of Probabilistic IR
Consider term correlations in documents (with binary Xi)
→ problem of estimating the m-dimensional prob. distribution
  P[X1=... ∧ X2=... ∧ ... ∧ Xm=...] =: fX(X1, ..., Xm)
One possible approach, the Tree Dependence Model:
  a) Consider only 2-dimensional probabilities (for term pairs i, j):
     fij(Xi, Xj) = P[Xi=... ∧ Xj=...] = sum over all values of the other Xk of fX(X1, ..., Xm)
  b) For each term pair, estimate the error between independence and the actual correlation.
  c) Construct a tree with terms as nodes and the m-1 highest error (or correlation) values
     as weighted edges.
IRDM WS Considering Two-dimensional Term Correlation
Variant 1: Error of approximating f by g (Kullback-Leibler divergence),
with g assuming pairwise term independence:
  ε(f, g) = sum_{X1,...,Xm ∈ {0,1}} f(X1, ..., Xm) log( f(X1, ..., Xm) / g(X1, ..., Xm) )
Variant 2: Correlation coefficient for term pairs:
  ρ(Xi, Xj) = Cov(Xi, Xj) / ( σ(Xi) σ(Xj) )
Variant 3: level-α values or p-values of the Chi-square independence test
IRDM WS Example for Approximation Error (KL Divergence)
m = 2; given are the documents: d1 = (1,1), d2 = (0,0), d3 = (1,1), d4 = (0,1)
Estimation of the 2-dimensional prob. distribution f:
  f(1,1) = P[X1=1 ∧ X2=1] = 2/4, f(0,0) = 1/4, f(0,1) = 1/4, f(1,0) = 0
Estimation of the 1-dimensional marginal distributions g1 and g2:
  g1(1) = P[X1=1] = 2/4, g1(0) = 2/4
  g2(1) = P[X2=1] = 3/4, g2(0) = 1/4
Estimation of the 2-dim. distribution g with independent Xi:
  g(1,1) = g1(1)*g2(1) = 3/8, g(0,0) = 1/8, g(0,1) = 3/8, g(1,0) = 1/8
Approximation error (KL divergence):
  ε(f, g) = 2/4 log 4/3 + 1/4 log 2 + 1/4 log 2/3 + 0
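This example can be reproduced with a few lines of Python (verification only, not from the slides):

    import math
    from fractions import Fraction as F

    docs = [(1, 1), (0, 0), (1, 1), (0, 1)]
    n = len(docs)

    # 2-dimensional empirical distribution f and 1-dimensional marginals g1, g2
    f = {xy: F(sum(d == xy for d in docs), n) for xy in [(0, 0), (0, 1), (1, 0), (1, 1)]}
    g1 = {v: F(sum(d[0] == v for d in docs), n) for v in (0, 1)}
    g2 = {v: F(sum(d[1] == v for d in docs), n) for v in (0, 1)}

    # KL divergence between f and the independence approximation g(x, y) = g1(x) * g2(y)
    kl = sum(f[x, y] * math.log(f[x, y] / (g1[x] * g2[y]))
             for (x, y) in f if f[x, y] > 0)
    print(kl)   # = 2/4 log(4/3) + 1/4 log(2) + 1/4 log(2/3) ≈ 0.216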
IRDM WS Constructing the Term Dependence Tree
Given: a complete graph (V, E) with m nodes Xi ∈ V and m(m-1)/2 undirected edges E
       with weights ε(i,j) (or ρ(i,j))
Wanted: a spanning tree (V, E') with maximal sum of edge weights
Algorithm:
  Sort the m(m-1)/2 edges of E in descending order of weight
  E' := ∅
  Repeat until |E'| = m-1:
    E' := E' ∪ {(i,j) ∈ E | (i,j) has max. weight in E}, provided that E' remains acyclic;
    E  := E  – {(i,j) ∈ E | (i,j) has max. weight in E}
Example: [Figure: complete weighted graph over the terms Web, Internet, Surf, Swim
and the resulting term dependence tree]
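The greedy construction is essentially Kruskal's algorithm with descending weights; a minimal sketch, where the edge weights at the bottom are made-up numbers for illustration:

    def max_spanning_tree(nodes, weighted_edges):
        """Greedy (Kruskal-style) maximum spanning tree.
        weighted_edges: list of (weight, u, v) tuples."""
        parent = {v: v for v in nodes}              # union-find forest for cycle detection

        def find(v):
            while parent[v] != v:
                parent[v] = parent[parent[v]]       # path halving
                v = parent[v]
            return v

        tree = []
        for w, u, v in sorted(weighted_edges, reverse=True):   # descending weight
            ru, rv = find(u), find(v)
            if ru != rv:                            # adding the edge keeps E' acyclic
                parent[ru] = rv
                tree.append((u, v, w))
            if len(tree) == len(nodes) - 1:
                break
        return tree

    # Hypothetical pairwise weights (e.g. KL-based epsilon values) for the slide's four terms
    edges = [(0.9, 'Web', 'Internet'), (0.7, 'Web', 'Surf'), (0.6, 'Surf', 'Swim'),
             (0.3, 'Internet', 'Surf'), (0.2, 'Internet', 'Swim'), (0.1, 'Web', 'Swim')]
    print(max_spanning_tree(['Web', 'Internet', 'Surf', 'Swim'], edges))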
IRDM WS Estimation of Multidimensional Probabilities with the Term Dependence Tree
Given is a term dependence tree (V = {X1, ..., Xm}, E'). Let X1 be the root, let the nodes be
preorder-numbered, and assume that Xi and Xj are independent for (i,j) ∉ E'. Then:
  P[X1, ..., Xm] = P[X1] * prod_{i=2..m} P[Xi | parent(Xi)]
Example (e.g. for a chain Web – Internet – Surf – Swim):
  P[Web, Internet, Surf, Swim] = P[Web] * P[Internet | Web] * P[Surf | Internet] * P[Swim | Surf]
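A toy sketch evaluating such a tree factorization for binary variables; the tree shape and all CPT numbers below are made up for illustration:

    # Parent of each node (root has parent None); cpt gives the hypothetical P[child=1 | parent value]
    parents = {'Web': None, 'Internet': 'Web', 'Surf': 'Internet', 'Swim': 'Surf'}
    p_root = 0.4                                           # hypothetical P[Web=1]
    cpt = {('Internet', 0): 0.1, ('Internet', 1): 0.8,
           ('Surf', 0): 0.2, ('Surf', 1): 0.6,
           ('Swim', 0): 0.3, ('Swim', 1): 0.5}

    def joint(assignment):
        """P[X1, ..., Xm] = P[root] * prod_i P[Xi | parent(Xi)] for a full 0/1 assignment."""
        prob = p_root if assignment['Web'] == 1 else 1 - p_root
        for node, par in parents.items():
            if par is None:
                continue
            p1 = cpt[(node, assignment[par])]
            prob *= p1 if assignment[node] == 1 else 1 - p1
        return prob

    print(joint({'Web': 1, 'Internet': 1, 'Surf': 0, 'Swim': 0}))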
IRDM WS Bayesian Networks
A Bayesian network (BN) is a directed, acyclic graph (V, E) with the following properties:
  Nodes V represent random variables and edges E represent dependencies.
  For a root R ∈ V the BN captures the prior probability P[R = ...].
  For a node X ∈ V with parents(X) = {P1, ..., Pk} the BN captures the conditional probability
  P[X = ... | P1, ..., Pk].
  Node X is conditionally independent of a non-parent node Y given its parents
  parents(X) = {P1, ..., Pk}:  P[X | P1, ..., Pk, Y] = P[X | P1, ..., Pk].
This implies:
  by the chain rule:         P[X1, ..., Xn] = prod_{i=1..n} P[Xi | Xi-1, ..., X1]
  by cond. independence:     P[X1, ..., Xn] = prod_{i=1..n} P[Xi | parents(Xi)]
IRDM WS Example of Bayesian Network (Belief Network)
[Figure: network with edges Cloudy → Sprinkler, Cloudy → Rain, Sprinkler → Wet, Rain → Wet]
Captured probability tables: P[C] = 0.5, P[S | C], P[R | C], and P[W | S, R]
(one row per combination of the parents' truth values).
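To make the inference concrete, a small enumeration-based sketch over this network; the CPT numbers are the values commonly used for this textbook example (e.g. in Russell/Norvig) and are an assumption here, since only P[C] = 0.5 is reproduced above:

    from itertools import product

    # NOTE: assumed textbook CPT values; only P_C = 0.5 comes from the slide.
    P_C = 0.5
    P_S_given_C = {True: 0.1, False: 0.5}                      # P[Sprinkler=T | Cloudy]
    P_R_given_C = {True: 0.8, False: 0.2}                      # P[Rain=T | Cloudy]
    P_W_given_SR = {(True, True): 0.99, (True, False): 0.90,   # P[Wet=T | Sprinkler, Rain]
                    (False, True): 0.90, (False, False): 0.0}

    def joint(c, s, r, w):
        """P[C, S, R, W] = P[C] * P[S|C] * P[R|C] * P[W|S,R] (chain rule + cond. independence)."""
        p = P_C if c else 1 - P_C
        p *= P_S_given_C[c] if s else 1 - P_S_given_C[c]
        p *= P_R_given_C[c] if r else 1 - P_R_given_C[c]
        p *= P_W_given_SR[(s, r)] if w else 1 - P_W_given_SR[(s, r)]
        return p

    # Marginal P[Wet=T] by summing the joint over all other variables
    p_wet = sum(joint(c, s, r, True) for c, s, r in product([True, False], repeat=3))
    print(p_wet)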
IRDM WS Bayesian Inference Networks for IR
[Figure: three-layer network with document nodes d1, ..., dN, term nodes t1, ..., tM,
and a query node q]
All nodes are binary random variables, with:
  P[dj] = 1/N
  P[ti | dj ∈ parents(ti)] = 1 if ti occurs in dj, 0 otherwise
  P[q | parents(q)] = 1 if ∀t ∈ parents(q): t is relevant for q, 0 otherwise
IRDM WS Advanced Bayesian Network for IR
[Figure: network extended by a layer of concept / topic nodes c1, ..., cK between the
document nodes d1, ..., dN and the term nodes t1, ..., tM, feeding the query node q]
Problems:
  parameter estimation (sampling / training)
  (non-) scalable representation
  (in-) efficient prediction
  lack of fully convincing experiments
IRDM WS Additional Literature for Chapter 4
Probabilistic IR:
  Grossman/Frieder: Sections 2.2 and 2.4
  S.E. Robertson, K. Sparck Jones: Relevance Weighting of Search Terms, JASIS 27(3), 1976
  S.E. Robertson, S. Walker: Some Simple Effective Approximations to the 2-Poisson Model
    for Probabilistic Weighted Retrieval, SIGIR 1994
  K.W. Church, W.A. Gale: Poisson Mixtures, Natural Language Engineering 1(2), 1995
  C.T. Yu, W. Meng: Principles of Database Query Processing for Advanced Applications,
    Morgan Kaufmann, 1997, Chapter 9
  D. Heckerman: A Tutorial on Learning with Bayesian Networks, Technical Report MSR-TR-95-06,
    Microsoft Research, 1995