
1 Automatic Term Mismatch Diagnosis for Selective Query Expansion
Le Zhao and Jamie Callan
Language Technologies Institute, School of Computer Science
Carnegie Mellon University, Pittsburgh, PA
SIGIR 2012, Portland, OR

2 Main Points
– An important problem: term mismatch, and a traditional solution
– A new diagnostic intervention approach
– Simulated user studies
– Diagnosis & intervention effectiveness

3 Term Mismatch Problem
– Average term mismatch rate: 30-40% [Zhao10] (Web, short queries, stemmed, inlinks included): relevant docs are not returned
– A common cause of search failure [Harman03, Zhao10]
– Frequent user frustration [Feild10]
– Here: 50%-300% gain in retrieval accuracy

4 Term Mismatch Problem
Example query (TREC 2006 Legal discovery task): approval of (cigarette company) logos on television watched by children

Term:      approval  logos  television  watched  children
Mismatch:  94%       86%    79%         90%      82%

Every term in this query has a high mismatch rate; "approval" has the highest.
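The mismatch rates above are defined against the relevant set: a term's mismatch rate is the fraction of relevant documents it fails to match, i.e. 1 - P(t | R). A minimal sketch with toy data (invented documents, not the paper's collection):

```python
def mismatch_rate(term, relevant_docs):
    """Fraction of relevant documents missing `term` (1 - P(t | R))."""
    if not relevant_docs:
        return 0.0
    hits = sum(1 for doc in relevant_docs if term in doc.lower().split())
    return 1.0 - hits / len(relevant_docs)

# Toy relevant set: only 1 of 4 relevant documents mentions "logos".
relevant = [
    "the company aired tv ads with cartoon mascots",
    "brand logos appeared during a televised broadcast",
    "promotion signage targeted teen viewers",
    "network guidelines on cigarette marketing to children",
]
print(mismatch_rate("logos", relevant))       # 0.75
print(mismatch_rate("television", relevant))  # 1.0 ("televised" != "television")
```

Note that the second call shows why stemming matters: a naive exact-match check misses morphological variants, inflating the mismatch rate.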

5 The Traditional Solution: Boolean Conjunctive Normal Form (CNF) Expansion
Keyword query: approval of logos on television watched by children
Manual CNF (TREC Legal track 2006):
(approval OR guideline OR strategy) AND
(logos OR promotion OR signage OR brand OR mascot OR marque OR mark) AND
(television OR TV OR cable OR network) AND
(watched OR view OR viewer) AND
(children OR child OR teen OR juvenile OR kid OR adolescent)
– Expressive & compact (one CNF encodes hundreds of keyword alternatives)
– Highly effective (this work: 50-300% over the keyword baseline)
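A CNF query like the one above is an AND of OR-clauses: a document matches if every clause is satisfied by at least one of its synonyms. A minimal Boolean-matching sketch (plain set intersection, ignoring the probabilistic ranking used later in the talk):

```python
# The manual CNF from the slide, as a list of OR-clauses (ANDed together).
CNF = [
    {"approval", "guideline", "strategy"},
    {"logos", "promotion", "signage", "brand", "mascot", "marque", "mark"},
    {"television", "tv", "cable", "network"},
    {"watched", "view", "viewer"},
    {"children", "child", "teen", "juvenile", "kid", "adolescent"},
]

def matches_cnf(doc, cnf):
    """True iff every OR-clause shares at least one word with the document."""
    words = set(doc.lower().split())
    return all(clause & words for clause in cnf)

doc = "brand guideline for tv viewer surveys show teen exposure to cigarette ads"
print(matches_cnf(doc, CNF))  # True: each clause is hit by one synonym
```

This illustrates the "1 CNF == 100s of alternatives" point: the single query above accepts any of 3 x 7 x 4 x 3 x 6 = 1512 keyword combinations.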

6 The Potential
Query: approval logos television watched children

Term        Recall   Expansion                 Recall after
approval    6.49%    +guideline +strategy      12.8%
logos       14.1%    +promotion +signage ...   19.7%
television  21.3%    +tv +cable +network       22.4%
watched     10.4%    +view +viewer             19.5%
children    18.0%    +child +teen +kid ...     19.3%
Overall     2.04%                              8.74%

Expansion raises overall recall from 2.04% to 8.74%, part of the 50-300% gain in retrieval accuracy.

7 CNF Expansion
Widely used in practice:
– Librarians [Lancaster68, Harter86]
– Lawyers [Lawlor62, Blair85, Baron07]
– Search experts [Clarke95, Hearst96, Mitra98]
Less well studied in research:
– Users do not create effective free-form Boolean queries ([Hearst09] cites many studies).
Question: how to guide user effort in productive directions
– restricting to CNF expansion (targeting the mismatch problem)
– focusing on problem terms when expanding
(See also: WikiQuery [Open Source IR Workshop].)

8 Main Points
– An important problem: term mismatch, and a traditional solution
– A new diagnostic intervention approach
– Simulated user studies
– Diagnosis & intervention effectiveness

9 Diagnostic Intervention
Goal: the least user effort → near-optimal performance
– E.g., expanding 2 terms → 90% of the total improvement
Query: approval of logos on television watched by children
Diagnosis: select high-idf (rare) terms
CNF expansion of the diagnosed terms, e.g.:
(approval OR guideline OR strategy) AND logos AND television AND (watch OR view OR viewer) AND children
or, for a different diagnosis:
(approval OR guideline OR strategy) AND logos AND (television OR tv OR cable OR network) AND watch AND children

10 Diagnostic Intervention
Goal: the least user effort → near-optimal performance
– E.g., expanding 2 terms → 90% of the total improvement
Query: approval of logos on television watched by children
Diagnosis: select high-idf (rare) terms
Bag-of-word expansion of the diagnosed terms, e.g.:
[ 0.9 (approval logos television watch children) 0.1 (0.4 guideline 0.3 strategy 0.5 view 0.4 viewer) ]
or, for a different diagnosis:
[ 0.9 (approval logos television watch children) 0.1 (0.4 guideline 0.3 strategy 0.5 tv 0.4 cable 0.2 network) ]
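The bracketed expansion queries above have an Indri-style #weight structure: a 0.9/0.1 interpolation between the original keywords and the weighted expansion terms. A small sketch of building such a query string (the operator layout mirrors the slide; treat the exact syntax as illustrative rather than the paper's implementation):

```python
def weighted_expansion_query(original_terms, expansion_weights,
                             orig_w=0.9, exp_w=0.1):
    """Interpolate the original bag-of-words query with weighted
    expansion terms, Indri #weight style."""
    orig = "#combine(" + " ".join(original_terms) + ")"
    exp = "#weight(" + " ".join(f"{w} {t}" for t, w in expansion_weights) + ")"
    return f"#weight({orig_w} {orig} {exp_w} {exp})"

q = weighted_expansion_query(
    ["approval", "logos", "television", "watch", "children"],
    [("guideline", 0.4), ("strategy", 0.3), ("view", 0.5), ("viewer", 0.4)],
)
print(q)
```

Keeping most of the mass (0.9) on the original query hedges against noisy expansion terms, the standard trade-off in pseudo- and user-guided relevance feedback.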

11 Diagnostic Intervention
Diagnosis methods:
– Baseline: rareness (high idf)
– High predicted term mismatch, from P(t | R) [Zhao10]
Intervention methods:
– Baseline: bag-of-word expansion (Relevance Model [Lavrenko01]), with manual or automatic expansion terms
– CNF expansion (probabilistic Boolean ranking), e.g.:
(approval OR guideline OR strategy) AND_p logos AND_p television AND_p (watch OR view OR viewer) AND_p children
where AND_p denotes the probabilistic AND used for ranking.
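One common way to score a CNF query probabilistically, sketched below under assumed semantics (not necessarily the paper's exact formula): each OR-clause fires with probability 1 - prod(1 - p(t)) over its matching terms, and the AND multiplies the clause probabilities, so missing an entire clause zeroes the score.

```python
from math import prod  # Python 3.8+

def cnf_score(doc_terms, cnf, term_prob, default_p=0.5):
    """Score a document (a set of terms) against a CNF query.

    Each matching term t contributes a match probability term_prob[t];
    an OR-clause fires with probability 1 - prod(1 - p(t)) over its
    matching terms, and AND multiplies clause probabilities, so a
    document missing a whole clause scores 0 (strict conjunction).
    """
    score = 1.0
    for clause in cnf:
        miss = prod(1.0 - term_prob.get(t, default_p)
                    for t in clause if t in doc_terms)
        score *= 1.0 - miss
    return score

cnf = [{"child", "teen"}, {"cigar"}]
probs = {"child": 0.8, "teen": 0.6, "cigar": 0.9}
print(cnf_score({"child", "cigar"}, cnf, probs))  # ~0.72 (0.8 * 0.9)
```

The term probabilities could come from the predicted P(t | R) values, which is what would tie the diagnosis and the ranking together; that linkage is an assumption here.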

12 Main Points
– An important problem: term mismatch, and a traditional solution
– A new diagnostic intervention approach
– Evaluation: simulated user studies
– Diagnosis & intervention effectiveness

13 Diagnostic Intervention (We Hope to)
User → Keyword query → Diagnosis system (P(t | R) or idf) → Problem query terms → User expansion → Expansion terms → Query formulation (CNF or Keyword) → Retrieval engine → Evaluation
Example: keyword query (child AND cigar) → diagnosis (child > cigar) → expansion (child → teen) → formulated query (child OR teen) AND cigar


15 We Ended up Using Simulation
Expert user → Keyword query → Diagnosis system (P(t | R) or idf) → Problem query terms → User expansion → Expansion terms → Query formulation (CNF or Keyword) → Retrieval engine → Evaluation
Offline: the expert builds the full CNF, e.g. (child OR teen) AND (cigar OR tobacco)
Online simulation: keyword query (child AND cigar) → diagnosis (child > cigar) → expansion (child → teen) → formulated query (child OR teen) AND cigar

16 Diagnostic Intervention Datasets
Document sets:
– TREC 2007 Legal track: 7 million tobacco corporate documents; trained on 2006
– TREC 4 Ad hoc track: 0.5 million newswire documents; trained on TREC 3
CNF queries:
– TREC 2007 built by lawyers; TREC 4 by Univ. of Waterloo [Clarke95]
– 50 topics each, 2-3 keywords per query
Relevance judgments:
– TREC 2007 sparse; TREC 4 dense
Evaluation measures:
– TREC 2007 statAP; TREC 4 MAP

17 Main Points
– An important problem: term mismatch, and a traditional solution
– A new diagnostic intervention approach
– Simulated user studies
– Diagnosis & intervention effectiveness

18 Results – Diagnosis
P(t | R) vs. idf diagnosis, measured by diagnostic CNF expansion on TREC 4 and TREC 2007: an 8%-50% gain, falling between the No Expansion and Full Expansion baselines. (Figure omitted.)

19 Results – Expansion Intervention
CNF vs. bag-of-word expansion, with P(t | R)-guided expansion on TREC 4 and TREC 2007: a 50% to 300% gain, with a similar level of gain in top precision. (Figure omitted.)

20 Main Points
– An important problem: term mismatch, and a traditional solution
– A new diagnostic intervention approach
– Simulated user studies
– Diagnosis & intervention effectiveness

21 Conclusions
One of the most effective ways to engage user interaction:
– CNF queries gain 50-300% over the keyword baseline.
Mismatch diagnosis enables simple & effective interactions:
– Automatic diagnosis saves 33% of user effort.
Expansion in CNF is easier and better than in bag-of-word:
– Bag-of-word expansion requires balanced expansion of all terms.
New research questions:
– How to learn from manual CNF queries to improve automatic CNF expansion
– How to get ordinary users to create effective CNF expansions (with the help of interfaces or search tools)

22 Acknowledgements
Helpful discussions & feedback: Chengtao Wen, Grace Hui Yang, Jin Young Kim, Charlie Clarke, and the SIGIR reviewers.
Access to data: Charlie Clarke, Gordon Cormack, Ellen Voorhees, NIST.
Supported by NSF grant IIS-1018317. Opinions are solely the authors'.

23 END 23

24 The Potential
Query: approval logos television watched children

Expanding "logos" step by step:
            logos   +promotion  +signage  +brand... (all)
Mismatch    85.9%   81.1%       80.9%     80.3%
Recall      14.1%   18.9%       19.1%     19.7%

Per-term recall before and after expansion:
approval     6.49%  → +guideline +strategy    = 12.8%
logos       14.1%   → +promotion +signage ... = 19.7%
television  21.3%   → +tv +cable +network     = 22.4%
watched     10.4%   → +view +viewer           = 19.5%
children    18.0%   → +child +teen +kid ...   = 19.3%
Overall      2.04%  →                         =  8.74%

Expansion raises overall recall from 2.04% to 8.74%, part of the 50-300% gain.

25 Failure Analysis (vs. baseline)
Diagnosis: 4 topics had wrong P(t | R) predictions and lower MAP.
Intervention: 3 topics had the right diagnosis but lower MAP:
– 2 of the 3: no manual expansion for the selected term; users do not always recognize which terms need help.
– 1 of the 3: wrong expansion terms from the expert ("apatite rocks" in nature, not the "apatite" chemical); CNF expansion can be difficult without looking at retrieval results.

26 Failure Analysis – Comparing Diagnosis Methods: P(t | R) vs. idf
(Per-query figure omitted; its legend marks queries where the user didn't expand, queries with wrong expansions, and queries with unexpanded terms.)

27 Term Mismatch Diagnosis
Predicting term recall P(t | R) [Zhao10]:
– Query-dependent features (modeling the causes of mismatch):
synonyms of term t based on query q's context;
how likely these synonyms occur in place of t;
whether t is an abstract term;
how rarely t occurs in the collection C
– Regression prediction: f_i(t, q, C) → P(t | R)
– Previously used for term weighting in long queries
A lower predicted P(t | R) → a higher likelihood of mismatch → t is more problematic.
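The regression f_i(t, q, C) → P(t | R) can be illustrated with a toy linear model over two of the listed features, rareness and synonym replaceability. The weights, document frequencies, and feature definitions below are invented placeholders, not the trained model from [Zhao10]:

```python
import math

# Toy collection statistics (document frequencies) -- invented numbers.
DF = {"children": 400, "logos": 30}
N = 1000  # collection size

def predict_term_recall(term, syn_df):
    """Toy linear model over two illustrative features: rareness (idf)
    and how often synonyms occur in place of the term (syn_df is the
    document frequency of the term's likely synonyms)."""
    idf = math.log(N / DF.get(term, 1))
    syn_rate = min(syn_df / max(DF.get(term, 1), 1), 1.0)
    # Illustrative weights: rarer terms and easily-replaced terms
    # get lower predicted recall, i.e. higher mismatch.
    p = 0.9 - 0.1 * idf - 0.3 * syn_rate
    return max(0.0, min(1.0, p))

# "logos" (rare, often replaced by synonyms) gets a lower predicted
# P(t | R) than "children", matching the diagnosis earlier in the talk.
print(predict_term_recall("logos", syn_df=60))
print(predict_term_recall("children", syn_df=100))
```

In the real system the weights would be fit by regression on training topics with relevance judgments, and the prediction would then rank query terms by mismatch for diagnosis.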

28 Online or Offline Study?
An online study would require controlling confounding variables:
– quality of the expansion terms
– the user's prior knowledge of the topic
– interaction effectiveness & effort
and enrolling many users.
Offline simulations avoid all of these and still allow reasonable observations.

29 Simulation Assumptions
Full expansion is used to simulate partial expansions, under 3 assumptions about the user's expansion process:
– Independent expansion of query terms:
A1: the same set of expansion terms is produced for a given query term, no matter which subset of query terms gets expanded;
A2: the same sequence of expansion terms is produced, no matter which subset gets expanded
– A3: Re-constructing the keyword query from the CNF: a procedure ensures the vocabulary stays faithful to that of the original keyword description
Highly effective CNF queries ensure a reasonable keyword baseline.

30 Results – Level of Expansion
More expansion per query term yields better retrieval, a result of the expansion terms being effective.
Queries with significant gains after expanding more than 4 terms:
– Topic 84: cigarette sales in James Bond movies

31 Online Simulation
Expert user → Keyword query → Diagnosis system (P(t | R) or idf) → Problem query terms → User expansion → Expansion terms → Query formulation (CNF or Keyword) → Retrieval engine → Evaluation
Offline: full CNF, e.g. (child OR youth) AND (cigar OR tobacco)
Online simulation: keyword query (child AND cigar) → diagnosis (child > cigar) → expansion (child → youth) → formulated query (child OR youth) AND cigar
Term idf values for the example query: 1.22, 1.92, 1.69, 1.87, 1.40; idf diagnosis flags the most infrequent term.

