Relevance Feedback
Users learn how to modify queries; the response list must contain at least some relevant documents.
Relevance feedback: `correcting' the ranks to the user's taste automates the query refinement process.
Rocchio's method folds user feedback into the query vector:
add a weighted sum of the vectors for the relevant documents D+,
subtract a weighted sum of the vectors for the irrelevant documents D- (see the update formula below).
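For reference, the standard Rocchio update can be written as follows; the weights alpha, beta, gamma are the usual tuning parameters and are not named on the slide.

% Rocchio query update: q' is the refined query vector,
% D+ / D- are the relevant / irrelevant document sets,
% alpha, beta, gamma are tunable weights (assumed notation).
\[
\vec{q}\,' \;=\; \alpha\,\vec{q}
  \;+\; \frac{\beta}{|D^{+}|}\sum_{\vec{d}\in D^{+}}\vec{d}
  \;-\; \frac{\gamma}{|D^{-}|}\sum_{\vec{d}\in D^{-}}\vec{d}
\]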
Relevance Feedback (contd.)
Pseudo-relevance feedback: D+ and D- are generated automatically.
E.g., the Cornell SMART system: the top 10 documents reported by the first round of query execution are included in D+; the weight on D- is typically set to 0 (D- is not used).
Not a commonly available feature:
Web users want instant gratification.
System complexity: executing a second query round is slower and more expensive for major search engines.
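A minimal sketch of this loop, assuming bag-of-words vectors stored as dicts and a first-round scoring function passed in as `search`; the function names and default weights are illustrative, not part of the slide.

# Pseudo-relevance feedback sketch (illustrative names and weights).
# Vectors are dicts mapping term -> weight; search(q, corpus, k) is assumed
# to return the top-k document vectors from a first-round run.

def rocchio_update(q, d_plus, d_minus, alpha=1.0, beta=0.75, gamma=0.0):
    """Rocchio refinement: q' = alpha*q + beta*mean(D+) - gamma*mean(D-)."""
    q_new = {t: alpha * w for t, w in q.items()}
    for docs, sign, weight in ((d_plus, +1, beta), (d_minus, -1, gamma)):
        if not docs or weight == 0:
            continue
        for d in docs:
            for t, w in d.items():
                q_new[t] = q_new.get(t, 0.0) + sign * weight * w / len(docs)
    return q_new

def pseudo_relevance_feedback(q, corpus, search, k=10):
    """Treat the top-k first-round results as D+ (the D- weight is 0, so D- is unused)."""
    d_plus = search(q, corpus, k)
    return rocchio_update(q, d_plus, d_minus=[])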
Basic IR Process
How do we represent text objects?
What similarity function should be used?
How do we refine the query according to user feedback?
A Brief Review of Probability
Probability space
Random variables (discrete, continuous, and mixed)
Probability distributions (binomial, multinomial, Gaussian)
Expectation, variance
Independence, conditional independence, conditional probability, Bayes' theorem, ...
Definition of Probability
Experiment: toss a coin twice.
Sample space: the possible outcomes of an experiment, S = {HH, HT, TH, TT}.
Event: a subset of the possible outcomes, e.g. A = {HH}, B = {HT, TH}.
Probability of an event: a number Pr(A) assigned to event A, satisfying
Axiom 1: Pr(A) >= 0
Axiom 2: Pr(S) = 1
Axiom 3: for every sequence of disjoint events A_1, A_2, ..., Pr(A_1 U A_2 U ...) = Pr(A_1) + Pr(A_2) + ...
Example: Pr(A) = n(A)/N (relative frequency): frequentist statistics.
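A small sketch of the frequentist estimate Pr(A) ~ n(A)/N for the event A = {HH}, using simulated coin tosses; purely illustrative and not part of the slide.

import random

# Frequentist estimate of Pr(A) for A = {HH}: count how often the event
# occurs in N repetitions of the two-toss experiment, then take n(A)/N.
def estimate_pr_hh(n_trials=100_000, seed=0):
    rng = random.Random(seed)
    n_a = 0
    for _ in range(n_trials):
        toss1, toss2 = rng.choice("HT"), rng.choice("HT")
        if (toss1, toss2) == ("H", "H"):   # event A = {HH}
            n_a += 1
    return n_a / n_trials

print(estimate_pr_hh())  # close to the exact value 1/4 for a fair coin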
Independence
Two events A and B are independent in case Pr(A,B) = Pr(A)Pr(B).
Pr(A,B): the probability that both A and B occur.
Independence is not the same as disjointness: Pr(A,B) = 0 when A and B are disjoint!
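A worked two-coin example (not on the slide) showing events that are independent yet not disjoint:

% A = {HH, HT} (first toss heads), B = {HH, TH} (second toss heads).
\Pr(A) = \Pr(B) = \tfrac{1}{2}, \qquad
\Pr(A,B) = \Pr(\{HH\}) = \tfrac{1}{4} = \Pr(A)\Pr(B),
\quad\text{yet } A \cap B = \{HH\} \neq \emptyset .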
Conditional Probability
If A and B are events with Pr(A) > 0, the conditional probability of B given A is Pr(B|A) = Pr(A,B) / Pr(A).
Example (drug trial):
Pr(Drug1 = Succ | Women) = ?
Pr(Drug2 = Succ | Women) = ?
Pr(Drug1 = Succ | Men) = ?
Pr(Drug2 = Succ | Men) = ?
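The slide's table of trial outcomes is not reproduced here; the sketch below only shows how such conditional probabilities would be computed from counts, using entirely hypothetical numbers.

# Hypothetical counts (NOT the slide's data) for a drug-trial table:
# counts[(drug, gender)] = (successes, trials).
counts = {
    ("Drug1", "Women"): (20, 50),
    ("Drug2", "Women"): (30, 60),
    ("Drug1", "Men"):   (40, 50),
    ("Drug2", "Men"):   (25, 40),
}

def pr_success_given(drug, gender):
    """Pr(drug = Succ | gender) = #(Succ, gender, drug) / #(gender, drug)."""
    succ, total = counts[(drug, gender)]
    return succ / total

print(pr_success_given("Drug1", "Women"))  # 0.4 with these made-up counts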
Conditional Independence
Events A and B are conditionally independent given C in case Pr(A,B|C) = Pr(A|C)Pr(B|C).
Example: A: success, B: women, C: Drug 1. Are A and B conditionally independent given C?
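An equivalent way to read the definition (a standard identity, not stated on the slide): once we condition on C, learning B does not change the probability of A.

% Equivalent characterization, valid when Pr(B,C) > 0:
\Pr(A, B \mid C) = \Pr(A \mid C)\,\Pr(B \mid C)
\;\iff\;
\Pr(A \mid B, C) = \Pr(A \mid C).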
Bayes' Rule
Suppose that B_1, B_2, ..., B_n form a partition of the sample space S, and suppose that Pr(A) > 0. Then
Pr(B_i|A) = Pr(A,B_i) / Pr(A)                                          (definition of conditional prob.)
          = Pr(A|B_i)Pr(B_i) / Pr(A)                                   (reversing the definition of conditional prob.)
          = Pr(A|B_i)Pr(B_i) / [Pr(A,B_1) + ... + Pr(A,B_n)]           (using the partition property of B_1, ..., B_n)
          = Pr(A|B_i)Pr(B_i) / [Pr(A|B_1)Pr(B_1) + ... + Pr(A|B_n)Pr(B_n)]
The denominator is only a normalization factor, so Pr(B_i|A) ~ Pr(B_i) * Pr(A|B_i).
Bayes' Rule: Example
Observed sequence q: t, h, t, h, t, t
C1: h, h, h, t, h, h  =>  bias b1 = 5/6
C2: t, t, h, t, h, h  =>  bias b2 = 1/2
C3: t, h, t, t, t, h  =>  bias b3 = 1/3
Priors: p(C1) = p(C2) = p(C3) = 1/3
Likelihoods: p(q|C1) ~ 5.2*10^-4, p(q|C2) ~ 0.015, p(q|C3) ~ 0.022
Posteriors: p(C1|q) = ?, p(C2|q) = ?, p(C3|q) = ?
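A small sketch answering the slide's question under the usual i.i.d. coin model (each toss lands heads with probability equal to the coin's bias); the posterior values are computed here, not quoted from the slide.

# Posterior over coins given q = t,h,t,h,t,t (2 heads, 4 tails),
# assuming i.i.d. tosses with P(head) = bias and a uniform prior.
biases = {"C1": 5 / 6, "C2": 1 / 2, "C3": 1 / 3}
prior = {c: 1 / 3 for c in biases}
heads, tails = 2, 4

likelihood = {c: b**heads * (1 - b) ** tails for c, b in biases.items()}
evidence = sum(prior[c] * likelihood[c] for c in biases)        # Pr(q)
posterior = {c: prior[c] * likelihood[c] / evidence for c in biases}

for c in biases:
    print(c, round(posterior[c], 3))   # roughly C1: 0.014, C2: 0.410, C3: 0.576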
Why Bayes' Rule is Important
C1: bias = 5/6, C2: bias = 1/2, C3: bias = 1/3; O: t, h, t, h, t, t
Observations (O) -> Conclusion (C): we want Pr(C|O).
It is easy to compute Pr(O|Ci). Bayes' rule helps us convert the computation of Pr(C|O) into the computation of Pr(O|C) and Pr(C).
Bayesian Learning
Pr(C|O) ~ Pr(C) * Pr(O|C)   (posterior ~ prior * likelihood)
First, you have prior knowledge about conclusions, i.e., Pr(C).
Then, based on your observation O, you estimate the likelihood Pr(O|C) for each possible conclusion C.
Finally, your expectation of conclusion C (i.e., the posterior Pr(C|O)) is shaped by the product of prior and likelihood.
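Writing out the proportionality with its normalizing constant (a standard identity, not spelled out on the slide):

% Posterior = prior * likelihood, normalized over all candidate conclusions c.
\Pr(C \mid O) \;=\; \frac{\Pr(C)\,\Pr(O \mid C)}{\sum_{c} \Pr(c)\,\Pr(O \mid c)}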