
Slide 1: Dragon Star Program Course: Information Retrieval (龙星计划课程: 信息检索). Personalized Search & User Modeling. ChengXiang Zhai (翟成祥), Department of Computer Science, Graduate School of Library & Information Science, Institute for Genomic Biology, and Statistics, University of Illinois, Urbana-Champaign. http://www-faculty.cs.uiuc.edu/~czhai, czhai@cs.uiuc.edu

Slide 2: What is Personalized Search?
Use more user information than the user's query in retrieval:
– "more information" = the user's interaction history ⇒ implicit feedback
– "more information" = the user's judgments or answers to clarification questions ⇒ explicit feedback
Personalization can be done in multiple ways:
– Personalize the collection
– Personalize ranking
– Personalize result presentation
– …
Personalized search = user modeling + model exploitation

Slide 3: Why Personalized Search?
The more we know about the user's information need, the more likely we are to retrieve relevant documents, so we should learn as much about the user as we can.
When a query doesn't work well, personalized search can be extremely helpful.

Slide 4: Client-Side vs. Server-Side Personalization
Server-side (most work, including commercial products):
– Sees global information (all documents, all users)
– Limited user information (can't see activities outside the search results)
– Raises privacy issues
Client-side (UCAIR):
– More information about the user, thus more accurate user modeling (complete interaction history + other user activities)
– More scalable ("distributed personalization")
– Alleviates the privacy problem
Combination of server-side and client-side? How?

Slide 5: Outline
– A framework for optimal interactive retrieval
– Implicit feedback (no user effort)
  – Within a search session
  – For improving result organization
– Explicit feedback (with user effort)
  – Term feedback
  – Active feedback
– Improving search result organization

Slide 6: 1. A Framework for Optimal Interactive Retrieval [Shen et al. 05]

Slide 7: IR as Sequential Decision Making
(Diagram: a dialogue between the user, who has an information need, and the system, which maintains a model of that need.)
– User: A1 = enter a query. System: which documents to present? how to present them? ⇒ R_i: results (i = 1, 2, 3, …)
– User: which documents to view? A2 = view a document. System: which part of the document to show? how? ⇒ R': document content
– User: view more? A3 = click the "Back" button, …

Slide 8: Retrieval Decisions
The user U issues actions A1, A2, …, A_{t-1}, A_t; the system returns responses R1, R2, …, R_{t-1}. Let H = {(A_i, R_i)}, i = 1, …, t-1 be the interaction history and C the document collection.
Given U, C, A_t, and H, choose the best R_t from r(A_t), the set of all possible responses to A_t.
Examples: if A_t = the query "Jaguar", r(A_t) = all possible rankings of C and the best R_t is the best ranking for the query; if A_t = a click on the "Next" button, r(A_t) = all possible rankings of the unseen docs and the best R_t is the best ranking of those unseen docs.

Slide 9: A Risk Minimization Framework
Observed: the user U, the interaction history H, the current user action A_t, the document collection C, and the set of all possible responses r(A_t) = {r_1, …, r_n}.
Inferred: the user model M = (S, θ_U, …), where S = seen documents and θ_U = the information need.
Given a loss function L(r_i, A_t, M), the optimal response r* is the one with minimum loss under the Bayes risk:
r* = arg min_{r ∈ r(A_t)} ∫ L(r, A_t, M) P(M | U, H, A_t, C) dM

Slide 10: A Simplified Two-Step Decision-Making Procedure
Approximate the Bayes risk by the loss at the mode of the posterior distribution:
– Step 1: compute an updated user model M* based on the currently available information
– Step 2: given M*, choose a response to minimize the loss function

Slide 11: Optimal Interactive Retrieval
(Diagram of the interaction loop between the user and the IR system over collection C: the user issues A1; the system infers M*_1 from P(M_1 | U, H, A_1, C) and returns R1 minimizing L(r, A_1, M*_1); the user issues A2; the system infers M*_2 from P(M_2 | U, H, A_2, C) and returns R2 minimizing L(r, A_2, M*_2); and so on.)

Slide 12: Refinement of Risk Minimization
– r(A_t): decision space (depends on A_t)
  – r(A_t) = all possible subsets of C (document selection)
  – r(A_t) = all possible rankings of docs in C
  – r(A_t) = all possible rankings of unseen docs
  – r(A_t) = all possible subsets of C + summarization strategies
– M: user model
  – Essential component: θ_U = user information need
  – S = seen documents
  – n = "topic is new to the user"
– L(R_t, A_t, M): loss function
  – Generally measures the utility of R_t for a user modeled as M
  – Often encodes retrieval criteria (e.g., using M to select a ranking of docs)
– P(M | U, H, A_t, C): user model inference
  – Often involves estimating a unigram language model θ_U

Slide 13: Case 1: Context-Insensitive IR
– A_t = "enter a query Q"
– r(A_t) = all possible rankings of docs in C
– M = θ_U, a unigram language model (word distribution)
– p(M | U, H, A_t, C) = p(θ_U | Q)
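In this case the user-model posterior is peaked at the query's own language model, and responding optimally amounts to ranking documents against θ_U. A minimal sketch in Python, assuming the common KL-divergence (cross-entropy) scoring with Dirichlet-smoothed document models; the slide itself does not fix the loss function or the smoothing method:

```python
import math
from collections import Counter

def query_lm(query_tokens):
    """Case 1: p(theta_U | Q) is peaked at the query's maximum-likelihood model."""
    counts = Counter(query_tokens)
    total = len(query_tokens)
    return {w: c / total for w, c in counts.items()}

def doc_lm(doc_tokens, coll_prob, mu=2000.0):
    """Dirichlet-smoothed document model p(w | theta_d) (a standard, assumed choice)."""
    counts = Counter(doc_tokens)
    length = len(doc_tokens)
    return lambda w: (counts[w] + mu * coll_prob.get(w, 1e-8)) / (length + mu)

def kl_score(theta_q, p_d):
    """Ranking by -D(theta_q || theta_d) is rank-equivalent to this cross entropy."""
    return sum(p * math.log(p_d(w)) for w, p in theta_q.items())

# Usage sketch: rank each doc by kl_score(query_lm(q_tokens), doc_lm(d_tokens, coll_prob))
```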

Slide 14: Case 2: Implicit Feedback
– A_t = "enter a query Q"
– r(A_t) = all possible rankings of docs in C
– M = θ_U, a unigram language model (word distribution)
– H = {previous queries} + {viewed snippets}
– p(M | U, H, A_t, C) = p(θ_U | Q, H)

Slide 15: Case 3: General Implicit Feedback
– A_t = "enter a query Q", or the "Back" button, or the "Next" button
– r(A_t) = all possible rankings of unseen docs in C
– M = (θ_U, S), where S = seen documents
– H = {previous queries} + {viewed snippets}
– p(M | U, H, A_t, C) = p(θ_U | Q, H)

Slide 16: Case 4: User-Specific Result Summary
– A_t = "enter a query Q"
– r(A_t) = {(D, σ)}, D ⊆ C, |D| = k, σ ∈ {"snippet", "overview"}
– M = (θ_U, n), n ∈ {0, 1}: "topic is new to the user"
– p(M | U, H, A_t, C) = p(θ_U, n | Q, H); M* = (θ*, n*)
Loss for the summary choice:
– σ_i = snippet: loss 1 if n* = 1, loss 0 if n* = 0
– σ_i = overview: loss 0 if n* = 1, loss 1 if n* = 0
Choose the k most relevant docs; if the topic is new (n* = 1), give an overview summary, otherwise a regular snippet summary.

Slide 17: What You Should Know
– Advantages and disadvantages of client-side vs. server-side personalization
– The optimal interactive retrieval framework provides a general way to model personalized search: maximum user modeling and immediate benefit ("eager feedback")
– Personalization can potentially be done for all the components and steps in a retrieval system

Slide 18: 2. Implicit Feedback [Shen et al. 05, Tan et al. 06]

Slide 19: "Jaguar" Example
The query "Jaguar" could refer to the car, the software, or the animal. Suppose we know:
1. The previous query was "racing cars" vs. "Apple OS"
2. "car" occurs far more frequently than "Apple" in pages browsed by the user in the last 20 days
3. The user just viewed an "Apple OS" document

Slide 20: How can we exploit such implicit feedback information, which already exists naturally, to improve ranking accuracy?

Slide 21: Risk Minimization for Implicit Feedback
– A_t = "enter a query Q"
– r(A_t) = all possible rankings of docs in C
– M = θ_U, a unigram language model (word distribution)
– H = {previous queries} + {viewed snippets}
– p(M | U, H, A_t, C) = p(θ_U | Q, H)
We need to estimate a context-sensitive language model.

Slide 22: Scenario 1: Use Information in One Session [Shen et al. 05]
(Diagram: the user model is built from the within-session query history Q1, …, Qk and the clickthrough history C1, …, C_{k-1}, where C_i = {C_{i,1}, C_{i,2}, C_{i,3}, …} is the set of clicked result snippets for query Q_i. Example: Q1 = "Apple software", a clicked snippet = "Apple - Mac OS X: The Apple Mac OS X product page. Describes features in the current version of Mac OS X, …", and the current query Qk = "Jaguar".)

Slide 23: Method 1: Fixed-Coefficient Interpolation (FixInt)
– Average the user's query history Q1, …, Q_{k-1} and clickthrough history C1, …, C_{k-1}
– Linearly interpolate the history models
– Linearly interpolate the current query Qk and the history model (see the sketch below)
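A minimal sketch of FixInt over word-distribution dictionaries. The slide does not spell out the formula, so the roles of the two weights are an assumption: beta mixes the averaged clickthrough and query-history models, and alpha mixes the current query model with that combined history model (the names alpha and beta and the defaults 0.1 and 1.0 come from the results slide later in the deck).

```python
def average_models(models):
    """Average a list of unigram models represented as {word: prob} dicts."""
    if not models:
        return {}
    vocab = set().union(*models)
    return {w: sum(m.get(w, 0.0) for m in models) / len(models) for w in vocab}

def fixint(theta_qk, past_query_models, clickthrough_models, alpha=0.1, beta=1.0):
    """FixInt sketch: fixed-coefficient interpolation of the current query model
    with the averaged history models (formula assumed, see note above)."""
    h_q = average_models(past_query_models)      # averaged past-query model
    h_c = average_models(clickthrough_models)    # averaged clickthrough model
    vocab = set(theta_qk) | set(h_q) | set(h_c)
    history = {w: beta * h_c.get(w, 0.0) + (1.0 - beta) * h_q.get(w, 0.0) for w in vocab}
    return {w: alpha * theta_qk.get(w, 0.0) + (1.0 - alpha) * history[w] for w in vocab}
```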

Slide 24: Method 2: Bayesian Interpolation (BayesInt)
– Average the user's query history Q1, …, Q_{k-1} and clickthrough history C1, …, C_{k-1}
– Use the averaged history models as a Dirichlet prior on the current query Qk
– Intuition: trust the current query Qk more if it is longer (see the sketch below)
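The slide only names a Dirichlet prior; a plausible reading is that the averaged history models act as pseudo-counts added to the current query's word counts, so a longer query pulls the estimate toward itself. A sketch under that assumption (mu and nu defaults taken from the results slide; h_q and h_c are averaged history models as in the FixInt sketch):

```python
def bayesint(query_counts, h_q, h_c, mu=0.2, nu=5.0):
    """BayesInt sketch (assumed form): Dirichlet-prior interpolation of the current
    query's raw word counts with the averaged query-history model h_q and the averaged
    clickthrough model h_c, both {word: prob} dicts.
    query_counts: {word: raw count in the current query Qk}."""
    qlen = sum(query_counts.values())
    vocab = set(query_counts) | set(h_q) | set(h_c)
    denom = qlen + mu + nu
    return {w: (query_counts.get(w, 0) + mu * h_q.get(w, 0.0) + nu * h_c.get(w, 0.0)) / denom
            for w in vocab}
```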

Slide 25: Method 3: Online Bayesian Updating (OnlineUp)
Intuition: update the language model incrementally, folding in each query and clickthrough set as it arrives (Q1, C1, Q2, C2, …, Qk).

Slide 26: Method 4: Batch Bayesian Update (BatchUp)
Intuition: all clickthrough data C1, …, C_{k-1} are equally useful, so fold them in together when estimating the model for Qk.

Slide 27: TREC-Style Evaluation
– Data collection: TREC AP88-90
– Topics: 30 hard topics from TREC topics 1-150
– System: search engine + RDBMS
– Context: query and clickthrough history of 3 participants (http://sifaka.cs.uiuc.edu/ir/ucair/QCHistory.zip)

Slide 28: Example of a Hard Topic 2 (283 relevant docs in 242,918 documents)
Topic: Acquisitions. Document discusses a currently proposed acquisition involving a U.S. company and a foreign company. To be relevant, a document must discuss a currently proposed acquisition (which may or may not be identified by type, e.g., merger, buyout, leveraged buyout, hostile takeover, friendly acquisition). The suitor and target must be identified by name; the nationality of one of the companies must be identified as U.S. and the nationality of the other company must be identified as NOT U.S.

Slide 29: Performance on the Hard Topic
– Q1: "acquisition u.s. foreign company" (MAP 0.004, Pr@20 0.000)
– Q2: "acquisition merge takeover u.s. foreign company" (MAP 0.026, Pr@20 0.100)
– Q3: "acquire merge foreign abroad international" (MAP 0.004, Pr@20 0.050)
– Q4: "acquire merge takeover foreign european japan" (MAP 0.027, Pr@20 0.200)

Slide 30: Overall Effect of Search Context
(MAP / Pr@20 on the hard queries; parameters: FixInt (α=0.1, β=1.0), BayesInt (μ=0.2, ν=5.0), OnlineUp (μ=5.0, ν=15.0), BatchUp (μ=2.0, ν=15.0))
– Q3 alone: 0.0421 / 0.1483 (same for all methods)
– Q3 + H_Q + H_C: FixInt 0.0726 / 0.1967 (+72.4% / +32.6%); BayesInt 0.0816 / 0.2067 (+93.8% / +39.4%); OnlineUp 0.0706 / 0.1783 (+67.7% / +20.2%); BatchUp 0.0810 / 0.2067 (+92.4% / +39.4%)
– Q4 alone: 0.0536 / 0.1933 (same for all methods)
– Q4 + H_Q + H_C: FixInt 0.0891 / 0.2233 (+66.2% / +15.5%); BayesInt 0.0955 / 0.2317 (+78.2% / +19.9%); OnlineUp 0.0792 / 0.2067 (+47.8% / +6.9%); BatchUp 0.0950 / 0.2250 (+77.2% / +16.4%)
Short-term context helps the system improve retrieval accuracy; BayesInt is better than FixInt, and BatchUp is better than OnlineUp.

Slide 31: Using Clickthrough Data Only (BayesInt, μ=0.0, ν=5.0)
Clickthrough is the major contributor:
– Q3: MAP 0.0421, Pr@20 0.1483; Q3 + H_C: 0.0766, 0.2033 (+81.9%, +37.1%)
– Q4: MAP 0.0536, Pr@20 0.1930; Q4 + H_C: 0.0925, 0.2283 (+72.6%, +18.1%)
Performance on unseen docs:
– Q3: MAP 0.0331, Pr@20 0.125; Q3 + H_C: 0.0661, 0.178 (+99.7%, +42.4%)
– Q4: MAP 0.0442, Pr@20 0.165; Q4 + H_C: 0.0739, 0.188 (+67.2%, +13.9%)
Snippets for non-relevant docs are still useful:
– Q3: MAP 0.0421, Pr@20 0.1483; Q3 + H_C: 0.0521, 0.1820 (+23.8%, +23.0%)
– Q4: MAP 0.0536, Pr@20 0.1930; Q4 + H_C: 0.0620, 0.1850 (+15.7%, -4.1%)

Slide 32: Sensitivity of BatchUp Parameters
BatchUp is stable across different parameter settings; the best performance is achieved at μ=2.0, ν=15.0.

Slide 33: A User Study of Implicit Feedback
– The UCAIR toolbar (a client-side personalized search agent using implicit feedback) is used in this study
– 6 participants use the UCAIR toolbar to do web search
– 32 topics are selected from the TREC Web track and Terabyte track
– Participants explicitly evaluate the relevance of the top 30 search results from Google and UCAIR

Slide 34: UCAIR Outperforms Google: Precision at N Docs
– Google: prec@5 0.538, prec@10 0.472, prec@20 0.377, prec@30 0.308
– UCAIR: prec@5 0.581, prec@10 0.556, prec@20 0.453, prec@30 0.375
– Improvement: 8.0%, 17.8%, 20.2%, 21.8%
More user interactions ⇒ better user models ⇒ better retrieval accuracy.

Slide 35: UCAIR Outperforms Google: Precision-Recall Curve (figure)

Slide 36: Scenario 2: Use the Entire History of a User [Tan et al. 06]
Challenge: the search log is noisy.
– How do we handle the noise?
– Can we still improve performance?
Solution: assign weights to the history data (cosine similarity, EM algorithm).
Conclusions:
– All of the history information is potentially useful
– It is most helpful for recurring queries
– History weighting is crucial (EM is better than cosine)

Slide 37: Algorithm Illustration (figure)

Slide 38: Sample Results: EM vs. Baseline (figure)
History is helpful, and weighting is important.

Slide 39: Sample Results: Different Weighting Methods (figure)
EM is better than cosine; a hybrid is feasible.

Slide 40: What You Should Know
– All search history information helps
– Clickthrough information is especially useful; it is useful even when the clicked document is non-relevant
– Recurring queries get the most help, but fresh queries can also benefit from history information

Slide 41: 3. Explicit Feedback [Shen et al. 05, Tan et al. 07]

Slide 42: Term Feedback for Information Retrieval with Language Models. Bin Tan, Atulya Velivelli, Hui Fang, ChengXiang Zhai. University of Illinois at Urbana-Champaign.

Slide 43: Problems with Document-Based Feedback
– A relevant document may contain non-relevant parts
– None of the top-ranked documents may be relevant
– The user only indirectly controls the learned query model

Slide 44: What about Term Feedback?
Present a list of terms to the user and ask for judgments:
– More direct contribution to estimating the query model θ_q
– Works even when there is no relevant document among the top results
Challenges:
– How do we select the terms to present to the user?
– How do we exploit term feedback to improve our estimate of θ_q?

Slide 45: Improving θ_q with Term Feedback
(Diagram: the query is run by the retrieval engine against the document collection, producing an initial ranking (d1 3.5, d2 2.4, …); terms are extracted from the top results and shown to the user; the user's term judgments feed the term feedback models, which yield an improved estimate of θ_q.)

Slide 46: Feedback Term Selection
General (old) idea:
– The original query is used for an initial retrieval run
– Feedback terms are selected from the top N documents
New idea:
– Model subtopics
– Select terms so that every subtopic is represented well
– Benefits: avoid bias in term feedback; infer relevant subtopics, thus achieving subtopic feedback

Slide 47: User-Guided Query Model Refinement
(Diagram: the document space contains an explored area and unexplored areas; subtopics T1, T2, T3 are represented by terms t11, t12, t21, t22, t31, t32, …, and the user's +/- judgments on those terms indicate the inferred topic preference direction, i.e., the most promising new topic areas to move to.)

Slide 48: Collaborative Estimation of θ_q
(Diagram.) Start from the original query model θ_q and retrieve the top N docs ranked by D(θ_q || θ_d). Cluster them into K subtopic clusters C_1, …, C_K with models p(w|θ_1), …, p(w|θ_K), and present L feedback terms t_1, …, t_L for judgment. Three ways to use the judged terms:
– TFB: estimate a term feedback model directly from the judged terms, e.g., P(t_1|θ_TFB) = 0.2, …, P(t_3|θ_TFB) = 0.1, …
– CFB: weight the clusters by the term judgments (e.g., C_1: 0.2, C_2: 0.1, C_3: 0.3, …, C_K: 0.1) and mix them: P(w|θ_CFB) = 0.2 P(w|θ_1) + 0.1 P(w|θ_2) + …
– TCFB: combine TFB and CFB
The refined query model θ_q' is then used to re-rank documents by D(θ_q' || θ_d). A sketch follows below.
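A sketch of how the three models could be assembled from the term judgments; the uniform TFB weighting, the count-based cluster weights, and the interpolation weight are illustrative assumptions rather than the paper's exact estimators.

```python
def term_feedback_models(judged_terms, cluster_models, cluster_of, lam=0.5):
    """judged_terms: {term: True/False judgment}; cluster_models: list of {word: prob}
    subtopic models p(w|theta_j); cluster_of: {term: index of its cluster}."""
    checked = [t for t, ok in judged_terms.items() if ok]
    # TFB: a direct model over the checked terms (uniform here for simplicity)
    tfb = {t: 1.0 / len(checked) for t in checked} if checked else {}
    # CFB: weight each subtopic cluster by how many of its presented terms were checked
    weights = [0.0] * len(cluster_models)
    for t in checked:
        weights[cluster_of[t]] += 1.0
    total = sum(weights) or 1.0
    weights = [w / total for w in weights]
    vocab = set().union(*cluster_models)
    cfb = {w: sum(wt * m.get(w, 0.0) for wt, m in zip(weights, cluster_models)) for w in vocab}
    # TCFB: interpolate the two
    tcfb = {w: lam * tfb.get(w, 0.0) + (1.0 - lam) * cfb.get(w, 0.0) for w in set(tfb) | vocab}
    return tfb, cfb, tcfb
```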

Slide 49: Discovering Subtopic Clusters with PLSA [Hofmann 99, Zhai et al. 04]
Query = "transportation tunnel disaster". Each word w in a feedback document d is "generated" either by the background model θ_B (with probability λ_B) or by one of k themes θ_1, …, θ_k chosen with document-specific weights π_{d,1}, …, π_{d,k} (with probability 1 - λ_B). Example themes: θ_1: traffic 0.3, railway 0.2, …; θ_2: tunnel 0.1, fire 0.05, smoke 0.02, …; θ_k: tunnel 0.2, amtrack 0.1, train 0.05, …; background θ_B: is 0.05, the 0.04, a 0.03, …. Parameters are fit by the maximum likelihood estimator (EM algorithm).
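A compact sketch of that EM procedure with the background model held fixed; the dense-array formulation, variable names, and fixed iteration count are my choices, not the paper's.

```python
import numpy as np

def plsa_with_background(tf, p_bg, k, lambda_b=0.9, iters=50, seed=0):
    """Simplified PLSA with a fixed background model theta_B.
    tf: (n_docs, n_words) term-frequency matrix of the feedback documents (floats).
    p_bg: (n_words,) background distribution p(w | theta_B).
    Returns p_wt (k, n_words) = p(w | theta_j) and pi (n_docs, k) = pi_{d,j}."""
    rng = np.random.default_rng(seed)
    n_docs, n_words = tf.shape
    p_wt = rng.random((k, n_words)) + 1e-3
    p_wt /= p_wt.sum(axis=1, keepdims=True)
    pi = np.full((n_docs, k), 1.0 / k)
    for _ in range(iters):
        # E-step: probability that word w in doc d came from a theme (not the background)
        mix = pi @ p_wt                                             # (n_docs, n_words)
        from_topic = (1 - lambda_b) * mix / ((1 - lambda_b) * mix + lambda_b * p_bg + 1e-12)
        # ...and, given a theme, the responsibility of theme j for that word
        z = pi[:, :, None] * p_wt[None, :, :]                       # (n_docs, k, n_words)
        z /= z.sum(axis=1, keepdims=True) + 1e-12
        expected = tf[:, None, :] * from_topic[:, None, :] * z      # expected theme counts
        # M-step: re-estimate theme word distributions and document mixing weights
        p_wt = expected.sum(axis=0)
        p_wt /= p_wt.sum(axis=1, keepdims=True) + 1e-12
        pi = expected.sum(axis=2)
        pi /= pi.sum(axis=1, keepdims=True) + 1e-12
    return p_wt, pi
```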

Slide 50: Selecting Representative Terms
– Original query terms are excluded
– Shared terms are assigned to their most likely cluster
Example clusters with term probabilities (the query terms "tunnel" and "transport" appear but are excluded from presentation):
– Cluster 1: tunnel 0.0768, transport 0.0364, traffic 0.0206, railwai 0.0186, harbor 0.0146, rail 0.0140, bridg 0.0139, kilomet 0.0136, truck 0.0133, construct 0.0131, …
– Cluster 2: tunnel 0.0935, fire 0.0295, truck 0.0236, french 0.0220, smoke 0.0157, car 0.0154, italian 0.0152, firefight 0.0144, blaze 0.0127, blanc 0.0121, …
– Cluster 3: tunnel 0.0454, transport 0.0406, toll 0.0166, amtrak 0.0153, train 0.0129, airport 0.0122, turnpik 0.0105, lui 0.0095, jersei 0.0093, pass 0.0087, …

Slide 51: User Interface for Term Feedback
(Screenshots of the clarification forms, with the feedback terms laid out by cluster.)

Slide 52: Experiment Setup
– TREC 2005 HARD Track; AQUAINT corpus (3 GB); 50 hard query topics
– NIST assessors spend up to 3 minutes per topic providing feedback through a Clarification Form (CF)
– Submitted CF layouts: 1x48, 3x16, 6x8 (clusters x terms per cluster)
– Baseline: KL-divergence retrieval method with 5 pseudo-feedback docs
– 48 feedback terms generated from the top 60 docs of the baseline run

Slide 53: Retrieval Accuracy Comparison
(1C: 1x48; 3C: 3x16; 6C: 6x8. In CFB1C the user feedback plays no role. Overall: Baseline < TFB < CFB < TCFB.)
– MAP: Baseline 0.219; TFB 1C/3C/6C 0.288 / 0.288 / 0.278; CFB 1C/3C/6C 0.254 / 0.305 / 0.301; TCFB 1C/3C/6C 0.274 / 0.309 / 0.304
– Pr@30: Baseline 0.393; TFB 0.467 / 0.475 / 0.457; CFB 0.399 / 0.480 / 0.473; TCFB 0.431 / 0.491 / 0.473
– Relevant retrieved: Baseline 4339; TFB 4753 / 4762 / 4740; CFB 4600 / 4907 / 4872; TCFB 4767 / 4947 / 4906
– MAP improvement: Baseline 0%; TFB +31.5% / +31.5% / +26.9%; CFB +16.0% / +39.3% / +37.4%; TCFB +25.1% / +41.1% / +38.8%

Slide 54: Reduction of the Number of Terms Presented
(MAP as a function of the number of terms presented; e.g., #terms = 12 means layouts 1x12 / 3x4 / 6x2.)
#terms | TFB 1C / 3C / 6C | CFB 3C / 6C | TCFB 3C / 6C
6:  0.245 / 0.240 / 0.227 | 0.279 / 0.279 | 0.281 / 0.274
12: 0.261 / 0.261 / 0.242 | 0.299 / 0.286 | 0.297 / 0.281
18: 0.275 / 0.274 / 0.256 | 0.301 / 0.282 | 0.300 / 0.286
24: 0.276 / 0.281 / 0.265 | 0.303 / 0.292 | 0.305 / 0.292
30: 0.280 / 0.285 / 0.270 | 0.304 / 0.296 | 0.307 / 0.296
36: 0.282 / 0.288 / 0.272 | 0.307 / 0.297 | 0.309 / 0.297
42: 0.283 / 0.288 / 0.275 | 0.306 / 0.298 | 0.309 / 0.300
48: 0.288 / 0.288 / 0.278 | 0.305 / 0.301 | 0.309 / 0.303

Slide 55: Clarification Form Completion Time (figure)
More than half of the forms were completed in just 1 minute.

Slide 56: Term Relevance Judgment Quality
(Term relevance defined as in [Zaragoza et al. 04]; columns are CF types 1x48 / 3x16 / 6x8.)
– #checked terms: 14.8 / 13.3 / 11.2
– #relevant terms: 15.0 / 12.6 / 11.2
– #relevant checked terms: 7.9 / 6.9 / 5.9
– Precision: 0.534 / 0.519 / 0.527
– Recall: 0.526 / 0.548 / 0.527

Slide 57: Had the User Checked All "Relevant Terms"...
(MAP with actual judgments → MAP with ideal judgments)
– TFB1: 0.288 → 0.354; TFB3: 0.288 → 0.354; TFB6: 0.278 → 0.346
– CFB3: 0.305 → 0.325; CFB6: 0.301 → 0.326
– TCFB3: 0.309 → 0.345; TCFB6: 0.304 → 0.341

Slide 58: Comparison to Relevance Feedback
– 5 feedback docs: MAP 0.302, Pr@30 0.586, RelRet 4779
– 10 feedback docs: MAP 0.345, Pr@30 0.670, RelRet 4916
– 20 feedback docs: MAP 0.389, Pr@30 0.772, RelRet 5004
– TCFB3C: MAP 0.309, Pr@30 0.491, RelRet 4947
In MAP, TCFB3C is roughly equivalent to relevance feedback with 5 judged documents.

Slide 59: Term Feedback Helps Difficult Topics (figure)
The largest gains come from topics with no relevant documents in the top 5 results.

Slide 60: Related Work
– Early work: [Harman 88], [Spink 94], [Koenemann & Belkin 96], …
– More recent: [Ruthven 03], [Anick 03], …
– Main differences of this work: a language-model framework, and consistent effectiveness

Slide 61: Conclusions and Future Work
A novel way of improving query model estimation through term feedback:
– Active feedback based on subtopics
– User-system collaboration
– Achieves a large performance improvement over the non-feedback baseline with a small amount of user effort
– Can compete with relevance feedback, especially when the latter is unable to help
Future: explore more complex interaction processes, such as combining term feedback with relevance feedback, and incremental feedback.

Slide 62: What You Should Know
– Term feedback can be quite useful when the query is difficult and relevance feedback isn't feasible
– Language models handle term weighting well in term feedback

Slide 63: Active Feedback in Ad Hoc IR. Xuehua Shen, ChengXiang Zhai. Department of Computer Science, University of Illinois, Urbana-Champaign.

Slide 64: Normal Relevance Feedback (RF)
(Diagram: the user's query goes to the retrieval system over the document collection; the top K results (d1 3.5, d2 2.4, …, dk 0.5) are shown; the user's feedback judgments (d1 +, d2 -, …, dk -) are fed back into the system.)

Slide 65: Document Selection in RF
(Same loop as normal RF, but the system must decide which k documents to present for judgment.)
Can we do better than just presenting the top K? (Consider diversity…)

Slide 66: Active Feedback (AF)
An IR system actively selects documents for obtaining relevance judgments. If the user is willing to judge K documents, which K documents should we present in order to maximize learning effectiveness?

Slide 67: Outline
– Framework and specific methods
– Experiment design and results
– Summary and future work

Slide 68: A Framework for Active Feedback
– Consider active feedback as a decision problem: decide on the K documents (D) for relevance judgment
– Formalize it as an optimization problem: optimize the expected learning benefit (loss) of requesting relevance judgments on D from the user
– Consider two cases of the loss function, depending on the interaction between documents:
  – Independent loss: the value of each judged document for learning is independent of the others
  – Dependent loss

Slide 69: Independent Loss
Rank documents by the expected loss of each individual document and then select the top K. Two special cases:
– A smaller (constant) loss for relevant docs than for non-relevant docs: select the K docs most likely to be relevant, i.e., the usual Top-K presentation
– A document is more useful for learning if its relevance prediction is more uncertain: uncertainty sampling (see the sketch below)
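A tiny sketch of uncertainty sampling under that reading: score each candidate by how close its estimated relevance probability is to 0.5 and present the K most uncertain ones (the probability estimates would come from whatever relevance model the system already has).

```python
def uncertainty_sampling(candidates, k):
    """candidates: list of (doc_id, p_relevant) pairs; returns the K doc ids whose
    relevance prediction is most uncertain (closest to 0.5)."""
    ranked = sorted(candidates, key=lambda pair: abs(pair[1] - 0.5))
    return [doc_id for doc_id, _ in ranked[:k]]
```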

Slide 70: Dependent Loss
Heuristic: consider relevance first, then diversity, so that both are modeled.
– Gapped Top K: pick one document every G+1 positions in the baseline ranking
– K Cluster Centroid: select the top N docs of the baseline retrieval, cluster them into K clusters, and present the K cluster centroids
– MMR-style selection is another option
(See the sketch after this slide.)
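Minimal sketches of the two selection heuristics; k-means here merely stands in for whichever clustering the experiments actually used, and the TF-IDF vector representation is assumed.

```python
import numpy as np
from sklearn.cluster import KMeans

def gapped_top_k(ranked_doc_ids, k, gap):
    """Gapped Top-K: walk down the baseline ranking, keeping one doc every gap+1 positions."""
    return ranked_doc_ids[::gap + 1][:k]

def k_cluster_centroid(ranked_doc_ids, doc_vectors, n, k):
    """K Cluster Centroid: cluster the top N baseline docs and present, for each cluster,
    the member closest to its centroid. doc_vectors: {doc_id: numpy vector}."""
    top = ranked_doc_ids[:n]
    X = np.array([doc_vectors[d] for d in top])
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    picked = []
    for j in range(k):
        members = [i for i, label in enumerate(km.labels_) if label == j]
        best = min(members, key=lambda i: np.linalg.norm(X[i] - km.cluster_centers_[j]))
        picked.append(top[best])
    return picked
```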

Slide 71: Illustration of the Three AF Methods
Given the baseline ranking 1, 2, 3, …, 16, …:
– Top-K (normal feedback): take positions 1..K
– Gapped Top-K: take every (G+1)-th document
– K-Cluster Centroid: take the centroid of each of K clusters, aiming at high diversity

Slide 72: Evaluating Active Feedback
(Diagram: for each query, select K docs with one of the strategies (Top-K, Gapped, Clustering), look up their +/- judgments in the judgment file, and compare the no-feedback initial results against the feedback results obtained after learning from the judged docs.)

Slide 73: Retrieval Methods (Lemur Toolkit)
– KL-divergence retrieval between the query Q and each document D
– Feedback docs F = {d1, …, dn} selected by active feedback
– Mixture-model feedback; the system only learns from the relevant judged docs
– Default parameter settings unless otherwise stated

Slide 74: Comparison of the Three AF Methods
(Evaluation includes the judged docs; #AFRel = average number of relevant docs among the judged docs per topic.)
HARD collection:
– Baseline: MAP 0.301, Pr@10 0.501
– Pseudo FB: MAP 0.320, Pr@10 0.515
– Top-K: #AFRel 3.0, MAP 0.325, Pr@10 0.527
– Gapped: #AFRel 2.6, MAP 0.330, Pr@10 0.548
– Clustering: #AFRel 2.4, MAP 0.332, Pr@10 0.565
AP88-89 collection:
– Baseline: MAP 0.201, Pr@10 0.326
– Pseudo FB: MAP 0.218, Pr@10 0.343
– Top-K: #AFRel 2.2, MAP 0.228, Pr@10 0.351
– Gapped: #AFRel 1.5, MAP 0.234, Pr@10 0.389
– Clustering: #AFRel 1.3, MAP 0.237, Pr@10 0.393
Top-K is the worst, and Clustering uses the fewest relevant docs.

Slide 75: Appropriate Evaluation of Active Feedback
– Original DB with the judged docs (AP88-89, HARD): can't tell whether the ranking of un-judged documents is improved
– Original DB without the judged docs: different methods end up with different test documents
– New DB (AP88-89 → AP90): shows the learning effect more explicitly, but the new docs must be similar to the original docs

Slide 76: Retrieval Performance on the AP90 Dataset
– MAP: Baseline 0.203, Pseudo FB 0.220, Top K 0.220, Gapped Top K 0.222, K Cluster Centroid 0.223
– Pr@10: Baseline 0.295, Pseudo FB 0.317, Top K 0.321, Gapped Top K 0.326, K Cluster Centroid 0.325
Top-K is consistently the worst of the three AF methods.

Slide 77: Feedback Model Parameter α (figure)
Varying the feedback interpolation parameter α can amplify the effect of feedback.

Slide 78: Summary
– Introduced the active feedback problem
– Proposed a preliminary framework and three methods (Top-K, Gapped Top-K, Clustering)
– Studied the evaluation strategy
– Experiment results show that presenting the top K is not the best strategy, and that clustering can generate fewer but higher-quality feedback examples

Slide 79: Future Work
– Explore other methods for active feedback
– Develop a general framework
– Combine pseudo feedback and active feedback

Slide 80: What You Should Know
– What active feedback is
– Top-K isn't a good strategy for active feedback; diversifying the presented results is beneficial

Slide 81: Learn from Web Search Logs to Organize Search Results. Xuanhui Wang and ChengXiang Zhai. Department of Computer Science, University of Illinois, Urbana-Champaign.

Slide 82: Motivation
Search engine utility = ranking accuracy + result presentation + …
There is lots of research on improving ranking accuracy but relatively little work on improving result presentation. What is the best way to present search results?

Slide 83: Ranked List Presentation (screenshot)

Slide 84: However, When the Query Is Ambiguous…
Query = "Jaguar" (car, software, animal): a single ranked list is unlikely to be optimal for any particular user.

Slide 85: Cluster Presentation (e.g., [Hearst et al. 96, Zamir & Etzioni 99])
(Screenshot from http://vivisimo.com)

Slide 86: Deficiencies of Data-Driven Clustering
– Different users may prefer different ways of grouping the results. E.g., for query = "area codes": "phone codes" vs. "zip codes", or "international codes" vs. "local codes"
– Cluster labels may not be informative enough to help a user choose the right cluster. E.g., label = "panthera onca"
– We need to group search results from a user's perspective

Slide 87: Our Idea: User-Oriented Clustering
User-oriented clustering:
– Partition search results according to the aspects that interest users
– Label each aspect with words meaningful to users
Exploit search logs to do both:
– Partitioning: learn the "interesting aspects" of an arbitrary query and classify results into these aspects
– Labeling: learn "representative queries" for the identified aspects and use them to label the aspects

Slide 88: Rest of the Talk
– General approach
– Technical details
– Experiment results

Slide 89: Illustration of the General Idea
(Diagram for query = "car": retrieval over the log finds similar past queries (car rental, car pricing, used car, hertz car rental, car accidents, car audio, car crash, …); clustering groups them into aspects such as {car rental, hertz car rental, …}, {car pricing, used car, …}, {car accidents, car crash, …}, {car audio, car stereo, …}; the search results (www.avis.com, www.hertz.com, www.cars.com, …) are then categorized into these aspects and labeled, e.g., "Car rental", "Used cars", "Car accidents".)

Slide 90: User-Oriented Clustering via Log Mining
(Flow diagram: the query is used to retrieve similar past queries from the search-history collection of query pseudo-docs; the retrieved queries are clustered into query aspects 1..k; the search results are categorized into these aspects; and each aspect is labeled.)

Slide 91: Implementation Strategy
The same pipeline with concrete choices:
– Query pseudo-doc = query text + clicked snippets, pooling identical queries
– Retrieval of similar queries: BM25 (Lemur); see the sketch below
– Clustering of similar queries into aspects: star clustering
– Labeling: the center query of each cluster
– Categorization of results into aspects: centroid-based classifier
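For the retrieval step, a plain BM25 scorer over the query pseudo-docs is enough to convey the idea; the parameter values k1=1.2 and b=0.75 are the usual defaults and an assumption here, since the slide only names BM25 and Lemur.

```python
import math
from collections import Counter

def bm25_score(query_tokens, pseudo_doc_tokens, doc_freq, n_docs, avg_doc_len,
               k1=1.2, b=0.75):
    """Score one query pseudo-doc (query text + clicked snippets) against the new query.
    doc_freq: {term: number of pseudo-docs containing it}; n_docs: collection size."""
    tf = Counter(pseudo_doc_tokens)
    dl = len(pseudo_doc_tokens)
    score = 0.0
    for term in set(query_tokens):
        if tf[term] == 0:
            continue
        df = doc_freq.get(term, 0)
        idf = math.log(1.0 + (n_docs - df + 0.5) / (df + 0.5))
        score += idf * tf[term] * (k1 + 1.0) / (tf[term] + k1 * (1.0 - b + b * dl / avg_doc_len))
    return score
```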

Slide 92: More Details: Search Engine Log
A search log records user activities (queries, clicks), organized into sessions. It reflects user information needs and is a valuable resource for learning to improve search engine utility.

Slide 93: More Details: Building the History Collection
(Diagram: for every query, e.g., "car rental", pool the sessions that issued it, recover the snippets of the clicked URLs U1, U2, …, and store them as that query's pseudo-document, e.g., "car rental" → "Car rental, rental cars" …, "National car rental" …; "jaguar car" → "jaguar, car, parts" …. Together these pseudo-docs form the history collection.)

Slide 94: More Details: Star Clustering [Aslam et al. 04]
1. Form a similarity graph over the query pseudo-docs: TF-IDF weight vectors, cosine similarity, thresholding.
2. Iteratively identify a "star center" and its "satellites".
The "star center" query serves as the label for its cluster. A sketch follows below.
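A minimal sketch of that procedure; taking the highest-degree unassigned node as the next star center and assigning each satellite to a single cluster are simplifying assumptions.

```python
import numpy as np

def star_clustering(vectors, threshold=0.3):
    """vectors: list of TF-IDF numpy vectors, one per query pseudo-doc.
    Returns a list of (center_index, member_indices) clusters."""
    n = len(vectors)
    unit = [v / (np.linalg.norm(v) + 1e-12) for v in vectors]
    # Thresholded cosine-similarity graph
    adj = [[j for j in range(n) if j != i and float(unit[i] @ unit[j]) >= threshold]
           for i in range(n)]
    unassigned = set(range(n))
    clusters = []
    while unassigned:
        # Pick the unassigned node with the most unassigned neighbors as the star center
        center = max(unassigned, key=lambda i: sum(1 for j in adj[i] if j in unassigned))
        satellites = [j for j in adj[center] if j in unassigned]
        clusters.append((center, [center] + satellites))
        unassigned -= {center, *satellites}
    return clusters
```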

Slide 95: Centroid-Based Classifier
– Represent each query pseudo-doc as a term vector (TF-IDF weighting)
– Compute a centroid vector for each cluster/aspect
– Assign a new result vector to the cluster whose centroid is closest to it (see the sketch below)
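A short sketch of the classifier; cosine similarity is assumed as the closeness measure, since the slide only says "closest".

```python
import numpy as np

def build_centroids(cluster_vectors):
    """cluster_vectors: {aspect_label: list of TF-IDF numpy vectors}.
    Returns one centroid (mean vector) per aspect."""
    return {label: np.mean(vecs, axis=0) for label, vecs in cluster_vectors.items()}

def assign_result(result_vector, centroids):
    """Assign a new search-result vector to the aspect with the closest centroid."""
    def cos(a, b):
        return float(a @ b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12)
    return max(centroids, key=lambda label: cos(result_vector, centroids[label]))
```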

Slide 96: Evaluation: Data Preparation
– Log data: the May 2006 search log released by Microsoft Live Labs
– The first 2/3 simulates the history; the last 1/3 simulates future queries
– History collection: 169,057 queries, with 3.5 clicked URLs per query on average
– The "future" collection is further split into two sets for validation and testing
– Test case: a session with more than 4 clicks and at least 100 matching queries in the history (172 and 177 test cases in the two test sets)
– Clicked URLs are used to approximate relevant documents [Joachims, 2002]

Slide 97: Experiment Design
– Baseline method: the original search engine ranking
– Cluster-based method: traditional clustering based solely on content
– Log-based method: our method based on search logs
Evaluation:
– Based on the user's perceived ranking accuracy: the user is assumed to first view the cluster with the largest number of relevant docs
– Measures: Precision at 5 documents (P@5) and Mean Reciprocal Rank (MRR) of the first relevant document

Slide 98: Overall Comparison (figure)
The log-based method clearly outperforms both the baseline and the cluster-based method.

Slide 99: Diversity Analysis
Do queries with diverse results benefit more? Bin the queries by the size ratio of the two largest (primary/secondary) clusters, a proxy for result diversity. Queries with more diverse results benefit more.

Slide 100: Query Difficulty Analysis
Do difficult queries benefit more? Bin the queries by their Mean Average Precision (MAP). Difficult queries benefit more.

Slide 101: Effectiveness of Learning (figure)
P@5 improves as more history information becomes available.

Slide 102: Sample Results: Partitioning
The log-based method and regular clustering partition the results differently. For the query "area codes", one partitions into "international codes" vs. "local codes", the other into "phone codes" vs. "zip codes".

Slide 103: Sample Results: Labeling (figure)
Example labels for the queries "apple" and "jaguar".

Slide 104: Related Work
– Categorization-based (e.g., [Chen & Dumais 00]): labels are meaningful to users, but the partitioning may not match a user's perspective
– Faceted search and browsing (e.g., [Yee et al. 03]): labels are meaningful to users and the partitioning is generally useful, but faceted metadata is needed
– Rather than pre-specifying fixed categories/metadata, we learn them dynamically from the search log

Slide 105: Conclusions and Future Work
– Proposed a general strategy for organizing search results based on interesting topic aspects learned from search logs, and experimented with one way of implementing it
– Results show that user-oriented clustering is better than data-oriented clustering, and that it particularly helps difficult topics and topics with diverse results
Future directions:
– Mixture of data-driven and user-driven clustering
– Study user interaction/feedback with the cluster interface
– Use the general search log to "smooth" a personal search log
– Query-sensitive result presentation

Slide 106: What You Should Know
– The search history of many users can be combined to benefit a particular user's search
– The difference between user-oriented and data-oriented result organization, and their respective advantages and disadvantages
– How to evaluate clustering results indirectly through the perceived precision

Slide 107: Future Research Directions in Personalized Search
– Robust personalization: an optimization framework for progressive personalization (gradually becoming more aggressive in using context/history information)
– More in-depth analysis of implicit feedback information: e.g., why does a user add a query term and then drop it after viewing a particular document?
– More computer-user dialogue to help bridge the vocabulary gap
– In general, aim at improving performance for difficult topics
– What is the right architecture for supporting personalized search?

Slide 108: Roadmap
This lecture: personalized search (understanding users). Next lecture: NLP for IR (understanding documents).

Slide 109: User-Centered Search Engine
(Diagram: a personalized search agent sits between the user and multiple sources, e.g., a web search engine, an email search engine, and desktop files; for a query such as "java" it can also draw on the user's query history and viewed web pages. A search agent can know a particular user very well.)

Slide 110: User-Centered Adaptive IR (UCAIR)
A novel retrieval strategy emphasizing:
– User modeling ("user-centered")
– Search context modeling ("adaptive")
– Interactive retrieval
Implemented as a personalized search agent that:
– Sits on the client side (owned by the user)
– Integrates information around a user (1 user vs. N sources, as opposed to 1 source vs. N users)
– Collaborates with other agents
– Goes beyond search toward task support

Slide 111: A Simple UCAIR System (UCAIR Toolbar)
(Diagram: the user's queries, results, and clickthrough flow through the UCAIR toolbar, which keeps a search history log (e.g., past queries and clicked results), performs user modeling, query modification, and result re-ranking over a result buffer, and talks to the underlying search engine.)

Slide 112: Challenges in UCAIR
– What is an appropriate retrieval framework for UCAIR?
– How do we optimize retrieval performance in interactive retrieval?
– How do we develop robust and accurate retrieval models that exploit user information and search context?
– How do we evaluate UCAIR methods?
– …

Slide 113: The Rest of the Talk
– Part I: A risk minimization framework for UCAIR
– Part II: Improving document ranking with implicit feedback
– Part III: User-specific summarization of search results
Joint work with Xuehua Shen, Bin Tan, and Qiaozhu Mei.

