Analyzing and Evaluating Query Reformulation Strategies in Web Search Logs. Reporter: Hsan-Yu Lin.

1 Analyzing and Evaluating Query Reformulation Strategies in Web Search Logs
Reporter: Hsan-Yu Lin

2 Outline
– Introduction
– Related Work
– Reformulation Strategies
– Reformulation Effectiveness Metrics
– Discussion and Conclusion

3 Introduction
Query reformulation (refinement)
– Users frequently modify a previous search query in the hope of retrieving better results
Goal:
– Examine the types of query reformulation users perform
– Evaluate them using effectiveness metrics such as click data

4 Related Work
Computer-Generated Reformulations

5 Related Work
Query Session Boundary Detection
– Automatic new topic identification using multiple linear regression (Information Processing & Management, 2006): based on time and common words
– Identification of User Sessions with Hierarchical Agglomerative Clustering (ASIS&T ‘06): uses hierarchical clustering to find a better session timeout value

6 Procedure
1. Create a taxonomy of query reformulation strategies defined in a formal language
2. Build an unsupervised, rule-based classifier to detect the different query reformulation strategies
3. Analyze correlations between query reformulation strategies and effectiveness metrics

7 Reformulation Strategies
Definitions:
– $\_$ : space character
– $P = \{\text{'}, \text{-}, \text{.}\}$ : punctuation
– $\lambda$ : empty string
– $\Sigma = \{a, \dots, z\} \cup \{0, \dots, 9\} \cup P$ : alphabet
– $c_i \in \Sigma$ : character
– $w_i \in \Sigma^*$ : word
– $z_i \in (\Sigma \cup \{\_\})^*$ : any string
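The definitions above translate almost directly into membership tests. The following is a minimal Python sketch, not code from the study: it assumes queries are already lowercase (Σ contains no uppercase letters), and the helper names `is_word`, `is_query_string`, and `words` are hypothetical.

```python
import string

# Alphabet from the definitions slide (illustrative sketch, not the study's code).
P = {"'", "-", "."}                                   # punctuation
SIGMA = set(string.ascii_lowercase) | set(string.digits) | P
SPACE = " "                                           # the slide's "_" symbol


def is_word(w: str) -> bool:
    """w in Sigma*; the empty string (lambda) counts as a word."""
    return all(c in SIGMA for c in w)


def is_query_string(z: str) -> bool:
    """z in (Sigma U {_})*: characters from Sigma plus spaces."""
    return all(c in SIGMA or c == SPACE for c in z)


def words(z: str) -> list[str]:
    """Split a query string into its non-empty words w_1 ... w_n."""
    if not is_query_string(z):
        raise ValueError(f"query uses characters outside the alphabet: {z!r}")
    return [w for w in z.split(SPACE) if w]


print(words("seattle pizza palace"))  # ['seattle', 'pizza', 'palace']
```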

8 Reformulation Strategies
REFORM. 1: WORD REORDER – seattle pizza palace → pizza seattle palace
REFORM. 2: WHITESPACE AND PUNCTUATION – wal mart → walmart; tomatoprices → tomato prices
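Word reordering and whitespace/punctuation changes can each be detected with a single comparison. A minimal sketch, assuming queries are split on spaces and using the punctuation set P from the definitions; the function names are hypothetical and the study's actual rules may differ.

```python
from collections import Counter

P = "'-."  # punctuation set from the definitions slide


def is_word_reorder(q1: str, q2: str) -> bool:
    """REFORM. 1: the same words (as a multiset) in a different order."""
    w1, w2 = q1.split(), q2.split()
    return w1 != w2 and Counter(w1) == Counter(w2)


def squash(q: str) -> str:
    """Drop spaces and the punctuation characters in P."""
    return "".join(c for c in q if c != " " and c not in P)


def is_whitespace_punctuation(q1: str, q2: str) -> bool:
    """REFORM. 2: the queries differ only in spaces or punctuation."""
    return q1 != q2 and squash(q1) == squash(q2)


print(is_word_reorder("seattle pizza palace", "pizza seattle palace"))  # True
print(is_whitespace_punctuation("wal mart", "walmart"))                # True
print(is_whitespace_punctuation("tomatoprices", "tomato prices"))      # True
```

Comparing word multisets rather than sets keeps a query with a repeated word from matching one where the repeat was dropped.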

9 REFORM. 3: REMOVE WORDS – yahoo stock price → price yahoo
REFORM. 4: ADD WORDS – eastlake home → eastlake home price index
REFORM. 5: URL STRIPPING – http www.yahoo.com → yahoo
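Adding or removing words is a containment test on the two word lists, and URL stripping can be approximated by deleting URL fragments. This sketch assumes a small, non-exhaustive list of URL fragments (`URL_TOKENS`, `URL_SUFFIXES`); those lists and all function names are hypothetical. It needs Python 3.9+ for `str.removeprefix`.

```python
from collections import Counter

# Assumed URL fragments; the study's actual rule may differ.
URL_TOKENS = {"http", "https", "www"}
URL_SUFFIXES = (".com", ".org", ".net", ".edu", ".gov")


def contains_words(big: list[str], small: list[str]) -> bool:
    """True if every word of `small` occurs in `big` (counting repeats)."""
    have = Counter(big)
    return all(have[w] >= n for w, n in Counter(small).items())


def is_remove_words(q1: str, q2: str) -> bool:
    """REFORM. 3: q2 keeps a strict, non-empty subset of q1's words."""
    w1, w2 = q1.split(), q2.split()
    return 0 < len(w2) < len(w1) and contains_words(w1, w2)


def is_add_words(q1: str, q2: str) -> bool:
    """REFORM. 4: the mirror image of removing words."""
    return is_remove_words(q2, q1)


def strip_url(q: str) -> str:
    """Remove protocol tokens, a 'www.' prefix, and a known top-level domain."""
    kept = []
    for tok in q.split():
        if tok in URL_TOKENS:
            continue
        tok = tok.removeprefix("www.")
        for suffix in URL_SUFFIXES:
            if tok.endswith(suffix):
                tok = tok[: -len(suffix)]
                break
        if tok:
            kept.append(tok)
    return " ".join(kept)


def is_url_stripping(q1: str, q2: str) -> bool:
    """REFORM. 5: q2 is what remains after removing URL parts from q1."""
    return q1 != q2 and strip_url(q1) == q2


print(is_remove_words("yahoo stock price", "price yahoo"))        # True
print(is_add_words("eastlake home", "eastlake home price index"))  # True
print(is_url_stripping("http www.yahoo.com", "yahoo"))            # True
```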

10 Reformulation Strategies
REFORM. 6: STEMMING – running over bridges → run over bridge
REFORM. 7: FORM ACRONYM – personal computer → pc
REFORM. 8: EXPAND ACRONYM – pda → personal digital assistant
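Stemming and acronym rules can also be checked mechanically. The slide does not say which stemmer the study used, so the sketch below uses a deliberately tiny suffix stripper loosely modeled on Porter-style rules (not a real stemmer), and treats an acronym as the first letters of two or more words. All names are hypothetical.

```python
def toy_stem(word: str) -> str:
    """Tiny stand-in stemmer: strips '-ing' and a plural '-s' only."""
    if word.endswith("ing") and len(word) > 5:
        stem = word[:-3]
        # running -> runn -> run: undouble a final consonant (except l, s, z)
        if stem[-1] == stem[-2] and stem[-1] not in "aeiouslz":
            stem = stem[:-1]
        return stem
    if word.endswith("s") and not word.endswith("ss"):
        return word[:-1]
    return word


def is_stemming(q1: str, q2: str) -> bool:
    """REFORM. 6: different words that share the same stems, in order."""
    w1, w2 = q1.split(), q2.split()
    return (w1 != w2 and len(w1) == len(w2)
            and [toy_stem(w) for w in w1] == [toy_stem(w) for w in w2])


def acronym(q: str) -> str:
    """First letters of a multi-word query ('' for a single word)."""
    ws = q.split()
    return "".join(w[0] for w in ws) if len(ws) >= 2 else ""


def is_form_acronym(q1: str, q2: str) -> bool:
    """REFORM. 7: q2 is a single token equal to q1's acronym."""
    return len(q2.split()) == 1 and acronym(q1) == q2


def is_expand_acronym(q1: str, q2: str) -> bool:
    """REFORM. 8: the mirror image of forming an acronym."""
    return is_form_acronym(q2, q1)


print(is_stemming("running over bridges", "run over bridge"))     # True
print(is_form_acronym("personal computer", "pc"))                 # True
print(is_expand_acronym("pda", "personal digital assistant"))     # True
```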

11 Reformulation Strategies
REFORM. 9: SUBSTRING – is there spyware on my computer → is there spywa
REFORM. 10: SUPERSTRING – nevada police rec → nevada police records 2008
REFORM. 11: ABBREVIATION – shortened dict → short dictionary
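Substring, superstring, and abbreviation are character-level relations between the two queries. A minimal sketch with hypothetical function names; note that these tests overlap with the word-level rules (for example, "yahoo stock" is both a substring of "yahoo stock price" and a word removal from it), so a classifier must decide which rule takes priority.

```python
def is_substring(q1: str, q2: str) -> bool:
    """REFORM. 9: q2 is a proper, non-empty substring of q1."""
    return bool(q2) and q2 != q1 and q2 in q1


def is_superstring(q1: str, q2: str) -> bool:
    """REFORM. 10: the mirror image of a substring."""
    return is_substring(q2, q1)


def is_abbreviation(q1: str, q2: str) -> bool:
    """REFORM. 11: same word count; each word pair is prefix-related."""
    w1, w2 = q1.split(), q2.split()
    return (q1 != q2 and len(w1) == len(w2)
            and all(a.startswith(b) or b.startswith(a) for a, b in zip(w1, w2)))


print(is_substring("is there spyware on my computer", "is there spywa"))  # True
print(is_superstring("nevada police rec", "nevada police records 2008"))  # True
print(is_abbreviation("shortened dict", "short dictionary"))              # True
```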

12 Reformulation Strategies
REFORM. 12: WORD SUBSTITUTION
– Synonym: easter egg search → easter egg hunt
– Hyponym: crimson scarf → red scarf
– Hypernym: personal computer → laptop
– Meronym: finger → hand
– Holonym: automobile → wheel
REFORM. 13: SPELLING CORRECTION – reformualtion → reformulation
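The study relied on WordNet to recognize these relations. To keep the example self-contained, the sketch below swaps WordNet for a hypothetical hand-built lexicon that holds only the slide's examples, with the relation labels copied from the slide. It isolates the differing span by stripping the words both queries share at the start and end, which also handles multi-word spans such as "personal computer" → "laptop".

```python
from typing import Optional

# Hypothetical lexicon standing in for WordNet; it holds only the slide's
# examples, keyed by (original span, replacement span).
TOY_LEXICON = {
    ("search", "hunt"): "synonym",
    ("crimson", "red"): "hyponym",
    ("personal computer", "laptop"): "hypernym",
    ("finger", "hand"): "meronym",
    ("automobile", "wheel"): "holonym",
}


def differing_span(q1: str, q2: str) -> tuple[str, str]:
    """Strip shared leading and trailing words; return what remains of each."""
    w1, w2 = q1.split(), q2.split()
    shortest = min(len(w1), len(w2))
    i = 0
    while i < shortest and w1[i] == w2[i]:
        i += 1
    j = 0
    while j < shortest - i and w1[-1 - j] == w2[-1 - j]:
        j += 1
    return " ".join(w1[i:len(w1) - j]), " ".join(w2[i:len(w2) - j])


def word_substitution(q1: str, q2: str) -> Optional[str]:
    """REFORM. 12: relation label for the substituted span, or None."""
    old, new = differing_span(q1, q2)
    if not old or not new:
        return None
    return TOY_LEXICON.get((old, new))


print(word_substitution("easter egg search", "easter egg hunt"))  # synonym
print(word_substitution("personal computer", "laptop"))           # hypernym
```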

13 Undetected Reformulations
Categories of reformulations not covered by the taxonomy:
– Semantic rephrasing: how to calculate nutritional values → weight watchers calculator
– Multi-reformulations: lane county gabrage → lane county garbage disposal (adding words plus spelling correction)
– Classifier rule limitations: spelling correction was limited to a Levenshtein edit distance of 2; limitations of the WordNet database
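The edit-distance limit can be illustrated with a standard dynamic-programming Levenshtein distance. In this sketch, comparing whole queries (rather than individual words) is an assumption, as is the name `MAX_EDITS`. It also shows why the multi-reformulation example slips through: appending " disposal" alone costs nine insertions, far above two edits.

```python
def levenshtein(a: str, b: str) -> int:
    """Dynamic-programming edit distance (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute (free if equal)
        prev = cur
    return prev[-1]


MAX_EDITS = 2  # the edit-distance limit mentioned on the slide


def is_spelling_correction(q1: str, q2: str) -> bool:
    """REFORM. 13 (assumed whole-query form): at most MAX_EDITS edits apart."""
    return q1 != q2 and levenshtein(q1, q2) <= MAX_EDITS


print(is_spelling_correction("reformualtion", "reformulation"))                       # True
print(is_spelling_correction("lane county gabrage", "lane county garbage disposal"))  # False
```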

14 Undetected Reformulations

15 The Rule-based Classifier
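The classifier applies its rules to each consecutive query pair and needs no labeled training data. The sketch below combines a subset of the rules (acronym, stemming, URL stripping, abbreviation, and word substitution are omitted for brevity) in a hypothetical priority order; the slides do not give the study's actual order, and every name here is illustrative. Pairs that match no rule are labeled as new queries.

```python
from collections import Counter

P = "'-."  # punctuation set from the definitions slide


def levenshtein(a: str, b: str) -> int:
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


def contains_words(big: list[str], small: list[str]) -> bool:
    have = Counter(big)
    return all(have[w] >= n for w, n in Counter(small).items())


def squash(q: str) -> str:
    return "".join(c for c in q if c != " " and c not in P)


def classify(q1: str, q2: str) -> str:
    """Label the pair (q1 -> q2) with the first matching rule, else 'new query'."""
    w1, w2 = q1.split(), q2.split()
    if q1 == q2:
        return "same query"
    if w1 != w2 and Counter(w1) == Counter(w2):
        return "word reorder"
    if squash(q1) == squash(q2):
        return "whitespace and punctuation"
    if 0 < len(w2) < len(w1) and contains_words(w1, w2):
        return "remove words"
    if 0 < len(w1) < len(w2) and contains_words(w2, w1):
        return "add words"
    if q2 and q2 in q1:
        return "substring"
    if q1 and q1 in q2:
        return "superstring"
    if levenshtein(q1, q2) <= 2:
        return "spelling correction"
    return "new query"


print(classify("yahoo stock price", "price yahoo"))                    # remove words
print(classify("reformualtion", "reformulation"))                      # spelling correction
print(classify("lane county gabrage", "lane county garbage disposal"))  # new query
```

On the slides' examples, both the semantic-rephrasing pair and the multi-reformulation pair fall through to "new query", consistent with the undetected categories listed earlier.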

16 Measures for Session Boundary Detection
Test data:
– Queries from 100 users in the AOL query logs, used for evaluation
– Same queries were removed (40.8% of queries)
– 9,091 query pairs
– 2,483 reformulations and 6,608 new queries (27.3% reformulations)
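The test-set figures are internally consistent:

```latex
\begin{aligned}
2{,}483 + 6{,}608 &= 9{,}091 \ \text{query pairs} \\
\frac{2{,}483}{9{,}091} &\approx 0.273 = 27.3\% \ \text{reformulations}
\end{aligned}
```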

17 Measures for Session Boundary Detection
Aim for high precision, but not necessarily high recall
– The study is interested in inter-reformulation rather than intra-reformulation
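Treating "reformulation" as the positive class (an assumption; the slide does not name one), precision is the share of pairs labeled as reformulations that truly are, and recall is the share of true reformulations that get labeled. A minimal sketch with made-up labels, not the study's data, shows how a cautious classifier can reach perfect precision with low recall.

```python
def precision_recall(gold: list[str], pred: list[str], positive: str = "reformulation"):
    """Precision and recall of `pred` against `gold` for one positive label."""
    pairs = list(zip(gold, pred))
    tp = sum(g == positive and p == positive for g, p in pairs)
    fp = sum(g != positive and p == positive for g, p in pairs)
    fn = sum(g == positive and p != positive for g, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Made-up labels: a cautious classifier that flags only the pair it is sure about.
gold = ["reformulation", "reformulation", "reformulation", "new", "new"]
pred = ["reformulation", "new", "new", "new", "new"]
print(precision_recall(gold, pred))  # (1.0, 0.3333333333333333)
```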

18 Reformulation Effectiveness Metrics
Data: AOL query logs (released on 08/03/2006)
Queries: 36,389,567
– 16,069,421 new queries
– 14,861,326 same queries
– 3,411,706 reformulations
Metrics:
– Click Pattern
– Click URL
– Rank Change of Clicked Results

19 Click Pattern

20 (SkipSkip + ClickSkip) vs. (SkipClick + ClickClick)
(SkipSkip) vs. (SkipClick)
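Each click-pattern label appears to pair the action on the initial query with the action on its reformulation, so SkipClick would mean no click on the initial query and a click on the reformulation. Under that reading, the sketch below tallies patterns and, as an assumption about how to summarize each "vs." comparison, reports it as a share: reformulations followed by a click overall, and among initial skips only. The observations are made up, and the function names are hypothetical.

```python
from collections import Counter


def click_pattern(initial_clicked: bool, reformulation_clicked: bool) -> str:
    """E.g. (False, True) -> 'SkipClick'."""
    first = "Click" if initial_clicked else "Skip"
    second = "Click" if reformulation_clicked else "Skip"
    return first + second


def compare_patterns(pairs: list[tuple[bool, bool]]) -> tuple[float, float]:
    """Express the slide's two comparisons as click-through shares."""
    counts = Counter(click_pattern(a, b) for a, b in pairs)
    clicked = counts["SkipClick"] + counts["ClickClick"]
    skipped = counts["SkipSkip"] + counts["ClickSkip"]
    # Share of reformulations followed by a click, whatever the initial action.
    overall = clicked / (clicked + skipped) if clicked + skipped else 0.0
    # Among initial queries with no click, share whose reformulation got a click.
    after_skip = counts["SkipSkip"] + counts["SkipClick"]
    recovered = counts["SkipClick"] / after_skip if after_skip else 0.0
    return overall, recovered


# Made-up (initial clicked?, reformulation clicked?) observations.
pairs = [(False, True), (False, False), (True, True), (False, True)]
print(compare_patterns(pairs))  # (0.75, 0.6666666666666666)
```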

21 Click URL

22 Rank Change and Median Time between Queries

23 Discussion
Different reformulation strategies were effective depending on the action taken on the initial query
– Word substitution: Skip → Skip, Click → Click
– Spelling correction: Skip → Click, Click → Skip

24 Limitations
– Lack of context
– Normalized query logs
– Ambiguous queries: ‘american airlines’, ‘delta airlines’
– Search engine effects

25 CONCLUSIONS
– Describes the human side of query reformulation and contributes to our understanding of how users interact with search
– Adding/removing words, word substitution, acronym expansion, and spelling correction seem most effective
– Acronym formation and word reordering may be less beneficial to the user

