Probabilistic Information Retrieval


Probabilistic Information Retrieval [ProbIR]. Suman K Mitra, DAIICT, Gandhinagar. suman_mitra@daiict.ac.in

Acknowledgments: Alexander Dekhtyar, University of Maryland; Mandar Mitra, ISI, Kolkata; Prasenjit Majumder, DAIICT, Gandhinagar.

Why use Probabilities?
- Information Retrieval deals with uncertain information
- Probability is a measure of uncertainty
- Probabilistic Ranking Principle: provable minimization of risk
- Probabilistic Inference: to justify your decision

Basic IR System. (Figure: the Document Collection feeds (1) Document Representation; the Query feeds (2) Query Representation; (3) the two are matched; (4) results are returned.) Questions at each step: How good is the representation? How exact is the representation? How well is the query matched? How relevant is the result to the query?

Approaches and main contributors:
- Probability Ranking Principle: Robertson, 1970s onwards
- Information Retrieval as Probabilistic Inference: Van Rijsbergen et al., 1970s onwards
- Probabilistic Indexing: Fuhr et al., 1980s onwards
- Bayesian Nets in Information Retrieval: Turtle and Croft, 1990s onwards
- Probabilistic Logic Programming in Information Retrieval: Fuhr et al., 1990s onwards

Probability Ranking Principle
We have a collection of documents, a representation of the documents, a user query, and a representation of that query; the system must return a set of documents. Question: in what order should documents be presented to the user? Logically: best document first, then the next best, and so on. Requirement: a formal way to judge the goodness of documents with respect to the query. Possibility: the probability of relevance of the document with respect to the query.

Probability Ranking Principle
"If a retrieval system's response to each request is a ranking of the documents in the collection in order of decreasing probability of usefulness to the user who submitted the request ... where the probabilities are estimated as accurately as possible on the basis of whatever data have been made available to the system for this purpose ... then the overall effectiveness of the system to its users will be the best that is obtainable on the basis of those data." - W. S. Cooper

Probability Basics
Bayes' Rule: let a and b be two events; then
p(a|b) = p(b|a) p(a) / p(b).
The odds of an event a are defined as
O(a) = p(a) / p(not a) = p(a) / (1 - p(a)).
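These two definitions can be stated as a minimal Python sketch (all probabilities below are made-up illustration values):

```python
# A minimal sketch of Bayes' rule and odds (numbers are invented).

def bayes(p_b_given_a: float, p_a: float, p_b: float) -> float:
    """p(a|b) = p(b|a) p(a) / p(b)"""
    return p_b_given_a * p_a / p_b

def odds(p_a: float) -> float:
    """O(a) = p(a) / (1 - p(a))"""
    return p_a / (1.0 - p_a)

p_R = 0.01            # prior p(R): say 1% of documents are relevant
p_x_given_R = 0.2     # p(x|R)
p_x = 0.05            # p(x)
print(bayes(p_x_given_R, p_R, p_x))   # p(R|x) = 0.04
print(odds(p_R))                      # O(R) ~ 0.0101
```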

Conditional probability satisfies all axioms of probability:
(i) 0 <= p(a|b) <= 1
(ii) p(S|b) = 1, where S is the sample space
(iii) if a_1, a_2, ... are mutually exclusive events, then p(U_i a_i | b) = Σ_i p(a_i|b)

Hence (i): p(a|b) = p(a ∩ b)/p(b) >= 0, and p(a ∩ b) <= p(b), so p(a|b) <= 1.
Hence (ii): p(S|b) = p(S ∩ b)/p(b) = p(b)/p(b) = 1.
Hence (iii): if the a_i are mutually exclusive, so are the events a_i ∩ b, and then
p(U_i a_i | b) = p(U_i (a_i ∩ b))/p(b) = Σ_i p(a_i ∩ b)/p(b) = Σ_i p(a_i|b).

Probability Ranking Principle
Let x be a document in the collection. Let R represent relevance of a document w.r.t. a given (fixed) query and let NR represent non-relevance.
- p(R|x): probability that a retrieved document x is relevant
- p(NR|x): probability that a retrieved document x is non-relevant
- p(R), p(NR): prior probabilities of retrieving a relevant and a non-relevant document, respectively
- p(x|R), p(x|NR): probability that, if a relevant (non-relevant) document is retrieved, it is x

Probability Ranking Principle
Ranking principle (Bayes' Decision Rule): if p(R|x) > p(NR|x), then x is relevant; otherwise x is not relevant.

Probability Ranking Principle
Does the PRP minimize the average probability of error? Consider the decision table (1 marks an error):

                   Actual: x is R    Actual: x is NR
  If we decide NR        1                 0
  If we decide R         0                 1

p(error|x) is p(R|x) if we decide NR, and p(NR|x) if we decide R. p(error) is minimal when all p(error|x) are minimal, and Bayes' decision rule minimizes each p(error|x).

X is either in R or NR, so for each document x
p(error|x) = p(R|x) if we decide NR, and p(NR|x) if we decide R,
and p(error) = Σ_x p(error|x) p(x), where p(x) is constant with respect to the decision.

Minimization of p(error): define the decision for x to be R if p(R|x) > p(NR|x), and NR otherwise. This picks, for each x, the smaller of p(R|x) and p(NR|x) as p(error|x); since every term p(error|x) is minimized and p(x) is the same whatever we decide, the sum p(error) is minimized. Hence Bayes' decision rule minimizes p(error).
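A small numeric check of this argument, with a made-up three-document collection and uniform p(x) (the p(R|x) values are invented):

```python
# p(R|x) for three documents; p(x) is taken uniform.
p_rel = {"d1": 0.9, "d2": 0.4, "d3": 0.7}

def p_error(decide_R: dict) -> float:
    # p(error) = sum_x p(error|x) p(x): the error is p(NR|x) = 1 - p(R|x)
    # when deciding R, and p(R|x) when deciding NR.
    return sum((1 - p) if decide_R[x] else p for x, p in p_rel.items()) / len(p_rel)

bayes_rule = {x: p > 0.5 for x, p in p_rel.items()}  # decide R iff p(R|x) > p(NR|x)
other_rule = {"d1": True, "d2": True, "d3": False}   # an arbitrary alternative
print(p_error(bayes_rule))   # 0.2667 = (0.1 + 0.4 + 0.3) / 3
print(p_error(other_rule))   # 0.4667 = (0.1 + 0.6 + 0.7) / 3
```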

Probability Ranking Principle: Issues
- How do we compute all those probabilities? We cannot compute exact probabilities; we have to use estimates from the (ground-truth) data (Binary Independence Retrieval; Bayesian Networks?).
- Assumptions: "relevance" of each document is independent of the relevance of other documents; most applications are for the Boolean model.

Probability Ranking Principle
Simple case: no selection costs. x is relevant iff p(R|x) > p(NR|x) (Bayes' Decision Rule). PRP: rank all documents by p(R|x).

Probability Ranking Principle
More complex case: retrieval costs. Let C be the cost of retrieval of a relevant document, C' the cost of retrieval of a non-relevant document, and let d be a document. Probability Ranking Principle: if
C p(R|d) + C' (1 - p(R|d)) <= C p(R|d') + C' (1 - p(R|d'))
for all d' not yet retrieved, then d is the next document to be retrieved.
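A sketch of this selection rule in Python (the costs and probabilities are invented; with C a negative "cost", i.e. a gain, for relevant documents, minimizing expected cost reproduces ranking by p(R|d)):

```python
# Next document = the one with the smallest expected retrieval cost
#   C * p(R|d) + C' * p(NR|d).
C, C_prime = -1.0, 1.0                       # invented cost values
p_rel = {"d1": 0.8, "d2": 0.5, "d3": 0.65}   # invented p(R|d)

def expected_cost(d: str) -> float:
    return C * p_rel[d] + C_prime * (1 - p_rel[d])

print(sorted(p_rel, key=expected_cost))      # ['d1', 'd3', 'd2']
```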

Binary Independence Model

Binary Independence Model
- Traditionally used in conjunction with the PRP
- "Binary" = Boolean: documents are represented as binary vectors of terms x = (x_1, ..., x_n), where x_i = 1 iff term i is present in document x
- "Independence": terms occur in documents independently
- Different documents can be modelled as the same vector
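A minimal sketch of this representation (the vocabulary and document text are invented):

```python
# Binary term vectors: x_i = 1 iff term i occurs in the document.
vocab = ["information", "retrieval", "probability", "ranking"]

def to_binary_vector(text: str) -> list[int]:
    tokens = set(text.lower().split())
    return [1 if term in tokens else 0 for term in vocab]

print(to_binary_vector("Probability ranking in retrieval"))  # [0, 1, 1, 1]
# Note: any two documents with the same term set map to the same vector.
```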

Binary Independence Model
Queries are also binary vectors of terms. Given query q, for each document d we need to compute p(R|q,d); we replace this with computing p(R|q,x), where x is the vector representing d. Since we are interested only in ranking, we use odds:
O(R|q,x) = p(R|q,x) / p(NR|q,x).

Binary Independence Model
By Bayes' rule,
O(R|q,x) = [p(R|q) / p(NR|q)] [p(x|R,q) / p(x|NR,q)].
The first factor is constant for each query; the second needs estimation. Using the independence assumption,
p(x|R,q) / p(x|NR,q) = Π_i p(x_i|R,q) / p(x_i|NR,q),
so:
O(R|q,x) = O(R|q) Π_i p(x_i|R,q) / p(x_i|NR,q).

Binary Independence Model
Since x_i is either 0 or 1:
O(R|q,x) = O(R|q) Π_{x_i=1} [p(x_i=1|R,q) / p(x_i=1|NR,q)] Π_{x_i=0} [p(x_i=0|R,q) / p(x_i=0|NR,q)].
Let p_i = p(x_i=1|R,q) and r_i = p(x_i=1|NR,q). Assume p_i = r_i for all terms not occurring in the query (q_i = 0).

Binary Independence Model
Then
O(R|q,x) = O(R|q) Π_{x_i=q_i=1} (p_i / r_i) Π_{x_i=0, q_i=1} [(1 - p_i) / (1 - r_i)]
(the first product is over all matching terms, the second over non-matching query terms), which can be rearranged as
O(R|q,x) = O(R|q) Π_{x_i=q_i=1} [p_i (1 - r_i) / (r_i (1 - p_i))] Π_{q_i=1} [(1 - p_i) / (1 - r_i)]
(now the first product is over all matching terms and the second over all query terms).

Binary Independence Model
The last product is constant for each query, so the first product is the only quantity to be estimated for ranking. Retrieval Status Value:
RSV = log Π_{x_i=q_i=1} [p_i (1 - r_i) / (r_i (1 - p_i))] = Σ_{x_i=q_i=1} log [p_i (1 - r_i) / (r_i (1 - p_i))].

Binary Independence Model
Everything boils down to computing the RSV: RSV = Σ_{x_i=q_i=1} c_i, where c_i = log [p_i (1 - r_i) / (r_i (1 - p_i))]. So, how do we compute the c_i's from our data?

Binary Independence Model
Estimating the RSV coefficients: for each term i, look at the following table (N documents in total, S of them relevant; n of the documents contain term i, s of those are relevant):

              relevant    non-relevant      total
  x_i = 1        s           n - s            n
  x_i = 0      S - s     N - n - S + s      N - n
  total          S           N - S            N

Estimates: p_i ≈ s/S, r_i ≈ (n - s)/(N - S), and
c_i ≈ log [ s / (S - s) ] / [ (n - s) / (N - n - S + s) ].

PRP and BIR: the lessons
- Getting reasonable approximations of probabilities is possible.
- Simple methods work only with restrictive assumptions: term independence; terms not in the query do not affect the outcome; Boolean representation of documents/queries; document relevance values are independent.
- Some of these assumptions can be removed.

Probabilistic weighting scheme
Add 0.5 to each count in the estimates: the log of a ratio of probabilities may otherwise go to positive or negative infinity when a count is zero. This gives the smoothed weight
c_i ≈ log [ (s + 0.5) / (S - s + 0.5) ] / [ (n - s + 0.5) / (N - n - S + s + 0.5) ].
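A sketch of the smoothed estimate and the resulting RSV, assuming we have the contingency counts (N, n, S, s) for each query term (all counts below are invented):

```python
import math

# Smoothed BIM term weight:
#   c_i = log [ (s+0.5)/(S-s+0.5) ] / [ (n-s+0.5)/(N-n-S+s+0.5) ]
def c_i(N: int, n: int, S: int, s: int) -> float:
    return math.log(((s + 0.5) / (S - s + 0.5)) /
                    ((n - s + 0.5) / (N - n - S + s + 0.5)))

# RSV(d) = sum of c_i over terms present in both query and document.
def rsv(query_terms, doc_terms, stats):
    return sum(c_i(*stats[t]) for t in query_terms if t in doc_terms)

# stats[term] = (N, n, S, s) -- invented counts for illustration
stats = {"retrieval": (1000, 100, 20, 10), "probabilistic": (1000, 50, 20, 5)}
print(rsv({"retrieval", "probabilistic"}, {"retrieval", "text"}, stats))
```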

Probabilistic weighting scheme [S. Robertson]
In general form, the weighting function is
w = log { [(s + 0.5)/(S - s + 0.5)] / [(n - s + 0.5)/(N - n - S + s + 0.5)] } × [f (k1 + 1) / (K + f)] × [q (k3 + 1) / (k3 + q)],
where k1 and k3 are constants and K is a document-length component specified per model below, and:
- q: within-query frequency (wqf)
- f: within-document frequency (wdf)
- n: number of documents in the collection indexed by this term
- N: total number of documents in the collection
- s: number of relevant documents indexed by this term
- S: total number of relevant documents
- L: normalised document length (i.e. the length of this document divided by the average length of documents in the collection)

Probabilistic weighting scheme [S. Robertson]: BM11
Stephen Robertson's BM11 uses the general form for weights with K = k1 L, but adds an extra item to the sum of term weights to give the overall document score:
k2 nq (1 - L) / (1 + L),
where nq is the number of terms in the query (the query length) and k2 is another constant. This term is 0 when L = 1.

Probabilistic weighting scheme: BM15
BM15 is the same as BM11 with the term k1 L replaced by k1 (i.e. K = k1: no length normalisation inside the term-frequency component).

Probabilistic weighting scheme: BM25
BM25 combines BM11 and BM15 with a scaling factor b, which turns BM15 into BM11 as it moves from 0 to 1: b = 1 gives BM11, b = 0 gives BM15. General form: K = k1 ((1 - b) + b L). Commonly used defaults: k1 between 1.2 and 2.0, b = 0.75, k2 = 0.
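A minimal sketch of one BM25 term weight following the general form above. Without relevance information (s = S = 0) the relevance weight reduces to an idf-like log term; the parameter defaults shown are common choices, not taken from the slide:

```python
import math

def bm25_term_weight(f: float, q: float, n: int, N: int, L: float,
                     k1: float = 1.2, b: float = 0.75, k3: float = 8.0) -> float:
    """One term's contribution to the document score (no relevance info)."""
    w1 = math.log((N - n + 0.5) / (n + 0.5))   # smoothed weight with s = S = 0
    K = k1 * ((1 - b) + b * L)                 # b=1 -> BM11, b=0 -> BM15
    return w1 * (f * (k1 + 1) / (K + f)) * (q * (k3 + 1) / (k3 + q))

# f: within-document frequency, q: within-query frequency,
# n: docs containing the term, N: total docs, L: normalised doc length
print(bm25_term_weight(f=3, q=1, n=100, N=10000, L=1.2))
```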

Bayesian Networks

Recall the two BIM assumptions:
1. Independence assumption: terms are independent. Can they be dependent?
2. Binary assumption: x_i = 0 or 1. Can x_i take values 0, 1, 2, ..., n?
A possible way to relax both could be Bayesian Networks.

Bayesian Network (BN) Basics
A Bayesian Network is a hybrid system of probability theory and graph theory. Graph theory lets the user build an interface to model highly interacting variables; probability theory ensures that the system as a whole is consistent, and provides ways to interface models to data. A BN models the joint probability distribution (JPD) in a compact way, using a graph to reflect conditional independence relationships.

BN: Representation
- Nodes: random variables
- Arcs: conditional independence (or causality); undirected arcs give MRFs, directed arcs give BNs

BN: Advantages
- An arc from node A to B implies: A causes B
- Compact representation of the JPD of the nodes
- Easier to learn (fit to data)

The joint probability distribution over U = {x_1, ..., x_n} can be decomposed by the global structure G as
P(U | B) = Π_i P(x_i | Par(x_i), θ, M_i, G),
where:
- G = global structure, a DAG that contains a node for each variable x_i in U; edges represent the probabilistic dependencies between the variables
- M = set of local structures {M_1, M_2, ..., M_n}: n mappings for n variables; M_i maps each value of {x_i, Par(x_i)} to a parameter θ. Here Par(x_i) denotes the set of parent nodes of x_i in G.
(Figure: an example DAG over x_1, x_2, x_3, x_4.)

BN: An Example (the "sprinkler" network, by Kevin Murphy). Four Boolean variables: Cloudy (C), Sprinkler (S), Rain (R), Wet Grass (W); C is a parent of S and R, and S and R are parents of W.

  Cloudy:     p(C=T) = 1/2, p(C=F) = 1/2
  Sprinkler:  C=T: p(S=On) = 0.1, p(S=Off) = 0.9;   C=F: p(S=On) = 0.5, p(S=Off) = 0.5
  Rain:       C=T: p(R=Y) = 0.8, p(R=N) = 0.2;      C=F: p(R=Y) = 0.2, p(R=N) = 0.8
  Wet Grass:  S=On, R=Y:  p(W=Y) = 0.99, p(W=N) = 0.01;   S=On, R=N:  p(W=Y) = 0.9, p(W=N) = 0.1
              S=Off, R=Y: p(W=Y) = 0.9,  p(W=N) = 0.1;    S=Off, R=N: p(W=Y) = 0.0, p(W=N) = 1.0

By the chain rule, p(C,S,R,W) = p(C) p(S|C) p(R|C,S) p(W|C,S,R); using the conditional independences encoded in the graph, this simplifies to p(C,S,R,W) = p(C) p(S|C) p(R|C) p(W|S,R).
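A sketch of inference by enumeration on this network, using the CPTs above:

```python
import itertools

pC = {True: 0.5, False: 0.5}
pS = {True: {True: 0.1, False: 0.9}, False: {True: 0.5, False: 0.5}}  # pS[c][s]
pR = {True: {True: 0.8, False: 0.2}, False: {True: 0.2, False: 0.8}}  # pR[c][r]
pW = {(True, True): 0.99, (True, False): 0.9,                          # p(W=Y|s,r)
      (False, True): 0.9, (False, False): 0.0}

def joint(c, s, r, w):
    # p(C,S,R,W) = p(C) p(S|C) p(R|C) p(W|S,R)
    pw = pW[(s, r)] if w else 1 - pW[(s, r)]
    return pC[c] * pS[c][s] * pR[c][r] * pw

# p(R=Y | W=Y) by enumeration: sum the joint over the hidden variables.
num = sum(joint(c, s, True, True) for c in (True, False) for s in (True, False))
den = sum(joint(c, s, r, True)
          for c, s, r in itertools.product((True, False), repeat=3))
print(num / den)   # ~0.708: rain is the likelier explanation of wet grass
```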

BN: As Model
A BN is a hybrid system of probability theory and graph theory: graph theory lets the user build an interface to model the highly interacting variables, and probability theory provides ways to interface the model to data.

BN: Notations
- The network is specified by a structure and a set of parameters θ that encode the local probability distributions
- The structure again has two parts, G and M: G is the global structure (the DAG) and M is the set of mappings for the n variables (arcs)

BN: Learning
Structure: the DAG. Parameters: the CPDs. The structure may be known or unknown; parameters can only be learnt if the structure is either known or has been learnt earlier.

BN: Learning
- Structure known, full data observed (nothing missing): parameter learning by MLE or MAP
- Structure known, full data NOT observed (data missing): parameter learning by EM
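A sketch of MLE parameter learning with known structure and fully observed data: each CPT entry is just a conditional relative frequency (the data below are invented):

```python
from collections import Counter

# Fully observed samples of (C, S): learn p(S=On | C) by counting.
data = [(True, False), (True, False), (True, True),
        (False, True), (False, True), (False, False)]

pair_counts = Counter(data)                     # counts of (c, s)
c_counts = Counter(c for c, _ in data)          # counts of c

for c in (True, False):
    mle = pair_counts[(c, True)] / c_counts[c]  # MLE: N(c, S=On) / N(c)
    print(f"p(S=On | C={c}) = {mle:.2f}")
# p(S=On | C=True) = 0.33, p(S=On | C=False) = 0.67
```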

BN: Learning Structure
Goal: find the best DAG that fits the data. Objective function: the posterior p(G|D) ∝ p(D|G) p(G); the denominator p(D) is constant and independent of G. Searching over structures is NP-hard. Scoring criteria used: AIC, BIC.

References (basics)
S. E. Robertson, "The Probability Ranking Principle in IR", Journal of Documentation, 33, 294-304, 1977.
K. Sparck Jones, S. Walker and S. E. Robertson, "A probabilistic model of information retrieval: development and comparative experiments - Part 1", Information Processing and Management, 36, 779-808, 2000.
K. Sparck Jones, S. Walker and S. E. Robertson, "A probabilistic model of information retrieval: development and comparative experiments - Part 2", Information Processing and Management, 36, 809-840, 2000.
S. E. Robertson and H. Zaragoza, "The Probabilistic Relevance Framework: BM25 and Beyond", Foundations and Trends in Information Retrieval, 3, 333-389, 2009.
S. E. Robertson, C. J. van Rijsbergen and M. F. Porter, "Probabilistic models of indexing and searching", Information Retrieval Research, Oddy et al. (Eds.), 35-56, 1981.
N. Fuhr and C. Buckley, "A probabilistic learning approach for document indexing", ACM Transactions on Information Systems, 9, 223-248, 1991.
H. R. Turtle and W. B. Croft, "Evaluation of an inference network-based retrieval model", ACM Transactions on Information Systems, 9, 187-222, 1991.

Thank You. Discussions.