
1 Relevance Feedback. Hongning Wang, CS@UVa.

2 What we have learned so far. [Architecture diagram: a crawler builds the indexed corpus; the doc analyzer and indexer produce the index (doc rep); the user's query is turned into a query rep; the ranker matches the two and returns results to the user; evaluation and feedback close the loop. The ranking procedure has been our research attention so far; this lecture turns to feedback.]

3 User feedback. An IR system is an interactive system: the user translates an information need into a query, and the system returns ranked documents based on an inferred information need. There is a gap between the true and the inferred information need, and feedback should help close this gap.

4 Relevance feedback. [Loop diagram: the user's query goes to the retrieval engine over the document collection, which returns scored results (d1 3.5, d2 2.4, ..., dk 0.5); the user judges them (d1 +, d2 -, d3 +, ..., dk -); the judgments feed back to produce an updated query for the next round.]

5 Relevance feedback in real systems. Google used to provide such functions, letting users mark a result as relevant or nonrelevant – guess why?

6 Relevance feedback in real systems. Popularly used in image search systems.

7 Pseudo feedback. What if users are reluctant to provide any feedback? [Loop diagram: the query goes to the retrieval engine over the document collection; the top k results (d1 3.5, d2 2.4, ..., dk 0.5) are simply assumed relevant (judgments d1 +, d2 +, d3 +, ..., dk -), and these assumed judgments drive the query update exactly as in true relevance feedback.]
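A minimal sketch of this loop in Python; retrieve and expand_query are hypothetical stand-ins for the engine's own ranking and query-update components, not functions defined in the lecture.

    def pseudo_feedback(query, collection, retrieve, expand_query, k=10):
        """One round of pseudo (blind) feedback.

        retrieve(query, collection) -> list of (doc_id, score), best first.
        expand_query(query, assumed_relevant) -> updated query.
        Both are hypothetical stand-ins for the engine's components.
        """
        results = retrieve(query, collection)
        assumed_relevant = [doc_id for doc_id, _ in results[:k]]  # top k assumed relevant
        updated = expand_query(query, assumed_relevant)
        return retrieve(updated, collection)  # second-round retrieval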

8 Basic idea in feedback: query expansion. – Feedback documents can help discover related query terms. – E.g., for query = "information retrieval", relevant or pseudo-relevant docs will likely share closely related words such as "search", "search engine", "ranking", "query". – Expanding the original query with such words will increase recall, and sometimes also precision; see the sketch below.
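As a concrete illustration of term selection, a sketch that scores candidate terms by their frequency in the feedback documents weighted by inverse document frequency; this particular scoring heuristic is an assumption for illustration, not the method prescribed by the slides.

    from collections import Counter
    import math

    def expansion_terms(feedback_docs, collection_df, n_docs, top_m=5):
        """Pick expansion terms: tf in feedback docs, weighted by idf.

        feedback_docs: list of tokenized docs; collection_df: {term: doc freq}.
        """
        tf = Counter(w for doc in feedback_docs for w in doc)
        scores = {
            w: count * math.log(n_docs / (1 + collection_df.get(w, 0)))
            for w, count in tf.items()
        }
        return sorted(scores, key=scores.get, reverse=True)[:top_m]

    # e.g., for query "information retrieval", feedback docs might yield
    # terms like ["search", "engine", "ranking", "query", ...]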

9 Basic idea in feedback: learning-based retrieval. – Feedback documents can be treated as supervision for ranking model updates. – This will be covered in the "learning-to-rank" lecture.

10 Feedback techniques. Feedback as query expansion: – Step 1: term selection. – Step 2: query expansion. – Step 3: query term re-weighting. Feedback as a training signal: will be covered later.

11 Relevance feedback in vector space models. General idea: query modification – adding new (weighted) terms and adjusting the weights of old terms. The most well-known and effective approach is Rocchio [Rocchio 1971].

12 Illustration of Rocchio feedback. [Figure: in the term vector space, relevant documents (+) and nonrelevant documents (-) scatter around the original query q; the modified query is pulled toward the centroid of the relevant documents and pushed away from the nonrelevant ones.]

13 Standard operation in vector space. Formula for Rocchio feedback: $\vec{q}_m = \alpha \vec{q} + \frac{\beta}{|D_r|} \sum_{\vec{d}_j \in D_r} \vec{d}_j - \frac{\gamma}{|D_n|} \sum_{\vec{d}_j \in D_n} \vec{d}_j$, where $\vec{q}$ is the original query vector, $D_r$ and $D_n$ are the sets of relevant and nonrelevant feedback documents, $\alpha, \beta, \gamma$ are the parameters, and $\vec{q}_m$ is the modified query.
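A minimal sketch of this update with sparse term-weight vectors represented as dicts; the default parameter values below are common illustrative choices, not ones taken from the lecture.

    from collections import defaultdict

    def rocchio(query, rel_docs, nonrel_docs, alpha=1.0, beta=0.75, gamma=0.15):
        """q_m = alpha*q + beta*centroid(rel) - gamma*centroid(nonrel).

        All vectors are {term: weight} dicts.
        """
        q_m = defaultdict(float)
        for t, w in query.items():
            q_m[t] += alpha * w
        for docs, coef in ((rel_docs, beta), (nonrel_docs, -gamma)):
            if not docs:
                continue
            for doc in docs:
                for t, w in doc.items():
                    q_m[t] += coef * w / len(docs)
        # negative weights are usually clipped to zero in practice
        return {t: w for t, w in q_m.items() if w > 0}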

14 Rocchio in practice. Negative (nonrelevant) examples are not very important (why?). Efficiency concern: restrict the vector to a lower dimension (i.e., only consider highly weighted words in the centroid vector). Avoid "training bias": keep a relatively high weight on the original query terms. Rocchio can be used for both relevance feedback and pseudo feedback, and is usually robust and effective.

15 Feedback in probabilistic models. Classic prob. model: ranks by the rel. doc model P(D|Q,R=1) and the nonrel. doc model P(D|Q,R=0). Language model: ranks by the "rel. query" model P(Q|D,R=1). Parameter estimation uses (query, doc, relevance) triples such as (q1,d1,1), (q1,d2,1), (q1,d3,1), (q1,d4,0), (q1,d5,0), (q3,d1,1), (q4,d1,1), (q5,d1,1), (q6,d2,1), (q6,d3,0). Initial retrieval: P(D|Q,R=1) treats the query as a relevant doc; P(Q|D,R=1) treats the doc as a relevant query. Feedback: P(D|Q,R=1) can be improved for the current query and future docs; P(Q|D,R=1) can be improved for the current doc and future queries.

16 Document generation model. Term presence vs. relevance:

                              relevant (R=1)    nonrelevant (R=0)
    term present (A_i = 1)        p_i               u_i
    term absent  (A_i = 0)        1 - p_i           1 - u_i

Important trick/assumption: terms not occurring in the query are equally likely to occur in relevant and nonrelevant documents, i.e., p_t = u_t for non-query terms.

17 Robertson-Sparck Jones model (Robertson & Sparck Jones 76). Two parameters for each term A_i: p_i = P(A_i=1|Q,R=1), the probability that term A_i occurs in a relevant doc, and u_i = P(A_i=1|Q,R=0), the probability that it occurs in a nonrelevant doc; the RSJ model scores a document by summing $\log \frac{p_i (1 - u_i)}{u_i (1 - p_i)}$ over query terms present in it. How to estimate these parameters? Suppose we have relevance judgments: with R judged-relevant docs out of N, r_i of the relevant and n_i of all docs containing term i, take $p_i = \frac{r_i + 0.5}{R + 1}$ and $u_i = \frac{n_i - r_i + 0.5}{N - R + 1}$; the "+0.5" and "+1" can be justified by Bayesian estimation as priors. Note this is per-query estimation: P(D|Q,R=1) can be improved for the current query and future docs.
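A sketch of the smoothed RSJ term weight described above; the variable names r_i, n_i, R, N follow the definitions in the slide.

    import math

    def rsj_weight(r_i, n_i, R, N):
        """Robertson-Sparck Jones term weight with +0.5/+1 smoothing.

        p_i = P(term present | relevant), u_i = P(term present | nonrelevant);
        the weight is log [p_i (1 - u_i)] / [u_i (1 - p_i)].
        """
        p_i = (r_i + 0.5) / (R + 1)
        u_i = (n_i - r_i + 0.5) / (N - R + 1)
        return math.log(p_i * (1 - u_i) / (u_i * (1 - p_i)))

    # Hypothetical judgments: R=4 relevant docs, 3 containing "security",
    # N=100 docs total, of which n_i=10 contain the term:
    # rsj_weight(3, 10, 4, 100)  ->  about 3.3, a strong positive weight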

18 Feedback in language models. Document language model: each document is represented by a (smoothed) unigram language model $p(w|\theta_D)$.

19 Feedback in language models. Approach: – introduce a probabilistic query model; – ranking: measure the distance between the query model and the document model; – feedback: update the query model. Q: Back to the vector space model? A: Kind of, but from a different perspective.

20 Kullback-Leibler (KL) divergence based retrieval model. $score(Q,D) = -D_{KL}(\theta_Q \| \theta_D) = \sum_w p(w|\theta_Q) \log p(w|\theta_D) - \sum_w p(w|\theta_Q) \log p(w|\theta_Q)$. The second term is the query entropy, a query-specific quantity that is ignored for ranking; the query language model $\theta_Q$ needs to be estimated; the document language model $\theta_D$ we already know how to estimate.

21 Background knowledge. The KL divergence between distributions p and q is $D_{KL}(p \| q) = \sum_x p(x) \log \frac{p(x)}{q(x)}$; it is non-negative, equals zero iff p = q, and is not symmetric.

22 Kullback-Leibler (KL) divergence based retrieval model. With the same smoothing strategy as in the query likelihood model (interpolating the document model with the collection model p(w|C)), the KL ranking formula decomposes into a form that can be scored as efficiently as query likelihood.
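A sketch of the rank-equivalent KL score under Jelinek-Mercer smoothing of the document model; the smoothing choice and the lam default are illustrative assumptions.

    import math

    def kl_score(query_model, doc_tf, doc_len, collection_model, lam=0.1):
        """Rank-equivalent KL score: sum_w p(w|theta_Q) * log p(w|theta_D),
        with Jelinek-Mercer smoothing
        p(w|theta_D) = (1-lam)*tf(w,D)/|D| + lam*p(w|C).
        The query-entropy term is dropped: it is constant per query.
        """
        score = 0.0
        for w, p_q in query_model.items():
            p_d = (1 - lam) * doc_tf.get(w, 0) / doc_len \
                  + lam * collection_model.get(w, 1e-9)
            score += p_q * math.log(p_d)
        return score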

23 Feedback as model interpolation. The updated query model interpolates the original query model with a feedback model estimated (via a generative model) from the feedback docs F = {d1, d2, ..., dn}: $\theta_Q' = (1-\alpha)\,\theta_Q + \alpha\,\theta_F$, with α = 0 meaning no feedback and α = 1 full feedback. Q: Rocchio feedback in the vector space model? A: Very similar, but under a different interpretation. Key problem: estimating the feedback model $\theta_F$; the interpolation itself is trivial, as the sketch below shows.
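A one-line sketch of the interpolation, assuming both models are {word: probability} dicts.

    def interpolate(query_model, feedback_model, alpha):
        """theta_Q' = (1-alpha)*theta_Q + alpha*theta_F.

        alpha = 0 -> no feedback; alpha = 1 -> full feedback.
        """
        words = set(query_model) | set(feedback_model)
        return {w: (1 - alpha) * query_model.get(w, 0.0)
                   + alpha * feedback_model.get(w, 0.0)
                for w in words}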

24 Feedback in language models. [Example: for the query "airport security", the feedback documents contribute topical words such as "protect", "passengers", "accidental/malicious harm", "crime", "rules" to the feedback model.]

25 Generative mixture model of feedback. The feedback documents F = {d1, ..., dn} are generated by a two-component mixture: with probability λ a word is drawn from the background model p(w|C) (background words), and with probability 1-λ from the topic model p(w|θ_F) (topic words); λ is the noise ratio in the feedback documents. θ_F is estimated by maximum likelihood: $\log p(F|\theta_F) = \sum_i \sum_w c(w, d_i) \log\big[(1-\lambda)\,p(w|\theta_F) + \lambda\,p(w|C)\big]$.

26 How to estimate θ_F? [Example for "airport security": the background p(w|C) is known (the 0.2, a 0.1, we 0.01, to 0.02, ..., flight 0.0001, company 0.00005, ...); the query topic p(w|θ_F) is unknown (accident = ?, regulation = ?, passenger = ?, rules = ?); the mixing weights are fixed at 0.7 and 0.3.] If we knew the identity (background vs. topic) of each word in the feedback docs, the ML estimator would be a simple count; but we don't.

27 Appeal to the EM algorithm. Introduce a hidden identity variable z_i ∈ {1 (background), 0 (topic)} for each word occurrence, e.g., "the paper presents a text mining algorithm the paper ..." with z_i = 1, 0, 1, 0, ... Suppose the parameters are all known: what is a reasonable guess of z_i? It depends on λ (why?) and on p(w|C) and p(w|θ_F) (how?). E-step: compute the posterior $p(z=1|w) = \frac{\lambda\, p(w|C)}{\lambda\, p(w|C) + (1-\lambda)\, p(w|\theta_F)}$. M-step: re-estimate $p(w|\theta_F) \propto \sum_i c(w, d_i)\,\big(1 - p(z=1|w)\big)$. Why did we not distinguish a word's identity in Rocchio?
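A sketch of the EM iterations for estimating θ_F with λ fixed, pooling word counts over the feedback documents; initialization and the iteration count are illustrative choices.

    from collections import Counter

    def estimate_feedback_model(feedback_docs, p_bg, lam=0.5, iters=20):
        """EM for the two-component mixture: each word occurrence in F comes
        from the background p(w|C) with prob lam, else from p(w|theta_F).
        """
        counts = Counter(w for doc in feedback_docs for w in doc)
        # initialize theta_F uniformly over the observed vocabulary
        p_f = {w: 1.0 / len(counts) for w in counts}
        for _ in range(iters):
            # E-step: posterior prob that w was generated by the topic,
            # i.e., 1 - p(z=1|w) from the formula above
            post = {w: (1 - lam) * p_f[w] /
                       ((1 - lam) * p_f[w] + lam * p_bg.get(w, 1e-9))
                    for w in counts}
            # M-step: re-estimate theta_F from the fractional ("augmented") counts
            total = sum(counts[w] * post[w] for w in counts)
            p_f = {w: counts[w] * post[w] / total for w in counts}
        return p_f

Common background words ("the", "a", ...) receive low topic posteriors and are absorbed by p(w|C), so θ_F concentrates on discriminative words; raising λ strengthens this effect.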

28 A toy example of EM computation. Assume λ = 0.5. Expectation step: augment the data by guessing the hidden variables (computing p(z=1|w) for each word). Maximization step: with the "augmented data", estimate the parameters using maximum likelihood.

29 Example of feedback query model. Query: "airport security", pseudo feedback with the top 10 documents. [Table shows the learned feedback models under λ = 0.9 and λ = 0.7.] Open question: how do we handle negative feedback?

30 Keep this in mind, we will come back to it. [The same estimation setup for a "text mining" example: known background p(w|C) (the 0.2, a 0.1, we 0.01, to 0.02, ..., text 0.0001, mining 0.00005, ...); unknown query topic p(w|θ_F) (text = ?, mining = ?, association = ?, word = ?); mixing weights fixed at 0.7 and 0.3; θ_F is fit by the ML estimator from the observed docs.]

31 What you should know: the purpose of relevance feedback; Rocchio relevance feedback for vector space models; query-model-based feedback for language models.

