Analyzing and Evaluating Query Reformulation Strategies in Web Search Logs Reporter: Hsan-Yu Lin
Outline Introduction Related Work Reformulation Strategies Reformulation Effectiveness Metrics Discussion And Conclusion
Introduction Query reformulation (refinement) – Users frequently modify a previous search query in the hope of retrieving better results Goal: – Identify the types of query reformulation users perform – Evaluate them using effectiveness metrics such as click data
Related Work Computer-Generated Reformulations
Related Work Query Session Boundary Detection – Automatic new topic identification using multiple linear regression (Information Processing & Management 2006) using time and common words – Identification of User Sessions with Hierarchical Agglomerative Clustering (ASIS&T ‘06) using hierarchical clustering to find better timeout value
Procedure 1. Create a taxonomy of query reformulation strategies defined by a formal language 2. Build an unsupervised rule-based classifier to detect the different query reformulation strategies 3. Analyze correlations between query reformulation strategies and effectiveness metrics
Reformulation Strategies Definitions: _ : space character; P = {', −, .} : punctuation; λ : empty string; Σ = {[a-z], [0-9]} ∪ P : alphabet; c_i ∈ Σ : character; w_i ∈ Σ* : word; z_i ∈ (Σ ∪ {_})* : any string
Reformulation Strategies REFORM. 1: WORD REORDER – seattle pizza palace --> pizza seattle palace REFORM. 2: WHITESPACE AND PUNCTUATION – wal mart, tomatoprices --> walmart tomato prices
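The two rules on this slide can be approximated with plain string operations. The following is a minimal sketch, not the authors' classifier: the function names are hypothetical, and the punctuation set assumes an ASCII hyphen in place of the minus sign shown in the definitions.

```python
# Punctuation set P from the definitions slide (ASCII hyphen is an assumption).
PUNCTUATION = set("',-.")


def is_word_reorder(q1: str, q2: str) -> bool:
    """True when q2 uses exactly the same words as q1 in a different order."""
    w1, w2 = q1.split(), q2.split()
    return w1 != w2 and sorted(w1) == sorted(w2)


def squash(query: str) -> str:
    """Drop spaces and punctuation so only letters and digits remain."""
    return "".join(ch for ch in query if ch != " " and ch not in PUNCTUATION)


def is_whitespace_punct_change(q1: str, q2: str) -> bool:
    """True when the two queries differ only in spaces and punctuation."""
    return q1 != q2 and squash(q1) == squash(q2)


print(is_word_reorder("seattle pizza palace", "pizza seattle palace"))
print(is_whitespace_punct_change("wal mart, tomatoprices", "walmart tomato prices"))
```

Both calls print True for the slide's examples. Comparing sorted word lists rather than sets keeps repeated words significant.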
REFORM. 3: REMOVE WORDS – yahoo stock price --> price yahoo REFORM. 4: ADD WORDS – eastlake home --> eastlake home price index REFORM. 5: URL STRIPPING – http yahoo
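Word removal and addition can be sketched as proper-subset tests on the word sets of the two queries. This is an illustrative reading of the examples, with hypothetical function names; it ignores word order and repeated words, which matches the "yahoo stock price" example, where the surviving words also change order.

```python
def words(query: str) -> set:
    """Split a query on whitespace and return its distinct words."""
    return set(query.split())


def is_remove_words(q1: str, q2: str) -> bool:
    """True when q2 keeps only some of the words of q1 (proper subset)."""
    return words(q2) < words(q1)


def is_add_words(q1: str, q2: str) -> bool:
    """True when q2 contains every word of q1 plus at least one new word."""
    return words(q1) < words(q2)


print(is_remove_words("yahoo stock price", "price yahoo"))
print(is_add_words("eastlake home", "eastlake home price index"))
```

Both calls print True. URL stripping is left out because the slide's example did not survive extraction.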
Reformulation Strategies REFORM. 6: STEMMING – running over bridges --> run over bridge REFORM. 7: FORM ACRONYM – personal computer --> pc REFORM. 8: EXPAND ACRONYM – pda --> personal digital assistant
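The acronym rules can be illustrated by comparing one query with the first letters of the other. This sketch assumes an acronym is formed from the first character of every word, which fits both examples on the slide but may be narrower than the rule actually used; the function names are hypothetical. Stemming is omitted because it needs an external stemmer.

```python
def acronym(phrase: str) -> str:
    """Concatenate the first character of each word in the phrase."""
    return "".join(word[0] for word in phrase.split())


def is_form_acronym(q1: str, q2: str) -> bool:
    """True when q2 is the first-letter acronym of a multi-word q1."""
    return len(q1.split()) > 1 and acronym(q1) == q2


def is_expand_acronym(q1: str, q2: str) -> bool:
    """True when q1 is the first-letter acronym of a multi-word q2."""
    return is_form_acronym(q2, q1)


print(is_form_acronym("personal computer", "pc"))
print(is_expand_acronym("pda", "personal digital assistant"))
```

Both calls print True. Defining expansion as the mirror image of formation keeps the two rules consistent with each other.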
Reformulation Strategies REFORM. 9: SUBSTRING – is there spyware on my computer --> is there spywa REFORM. 10: SUPERSTRING – nevada police rec --> nevada police records 2008 REFORM. 11: ABBREVIATION – shortened dict --> short dictionary
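These three rules operate on characters rather than whole words. The sketch below is one plausible reading of the examples, with hypothetical function names. It treats substring and superstring as plain containment, and treats abbreviation as a word-by-word comparison in which one word of each pair is a prefix of the other. The exact conditions in the original taxonomy may differ.

```python
def is_substring(q1: str, q2: str) -> bool:
    """True when q2 is a strictly shorter string contained in q1."""
    return q1 != q2 and q2 in q1


def is_superstring(q1: str, q2: str) -> bool:
    """True when q2 strictly contains q1."""
    return is_substring(q2, q1)


def is_abbreviation(q1: str, q2: str) -> bool:
    """True when the queries pair up word by word and, in every pair,
    one word is a prefix of the other."""
    w1, w2 = q1.split(), q2.split()
    if q1 == q2 or len(w1) != len(w2):
        return False
    return all(a.startswith(b) or b.startswith(a) for a, b in zip(w1, w2))


print(is_substring("is there spyware on my computer", "is there spywa"))
print(is_superstring("nevada police rec", "nevada police records 2008"))
print(is_abbreviation("shortened dict", "short dictionary"))
```

All three calls print True for the slide's examples.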
Reformulation Strategies REFORM. 12: WORD SUBSTITUTION – Synonym: easter egg search --> easter egg hunt – Hyponym: crimson scarf --> red scarf – Hypernym: personal computer --> laptop – Meronym: finger --> hand – Holonym: automobile --> wheel REFORM. 13: SPELLING CORRECTION – reformualtion --> reformulation
Undetected Reformulations Categories of reformulations that are not included in the taxonomy: – Semantic Rephrasing: how to calculate nutritional values --> weight watchers calculator – Multi-Reformulations: lane county gabrage --> lane county garbage disposal (add words and spelling correction) – Classifier Rule Limitations: spelling correction used a Levenshtein edit distance of 2; WordNet database limitations
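The edit-distance threshold explains why the multi-reformulation example goes undetected. The sketch below implements the standard Levenshtein distance with dynamic programming and applies the threshold of 2 from the slide. Applying it to the whole query string is an assumption, and the names are hypothetical.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions, and
    substitutions needed to turn a into b."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


MAX_EDITS = 2  # threshold stated on the slide


def is_spelling_correction(q1: str, q2: str) -> bool:
    """True when the queries differ by at most MAX_EDITS edits."""
    return 0 < levenshtein(q1, q2) <= MAX_EDITS


print(is_spelling_correction("reformualtion", "reformulation"))
print(is_spelling_correction("lane county gabrage", "lane county garbage disposal"))
```

The first call prints True: the swapped letters cost two substitutions. The second prints False, because the added word pushes the distance far beyond the threshold even though the misspelling alone would have qualified.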
Undetected Reformulations
The Rule-based Classifier
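A rule-based classifier of this kind can be pictured as an ordered list of predicates, where the first rule that fires labels the query pair. The sketch below is a hypothetical illustration of that structure with only three simplified rules. The rule order, the names, and the fallback label are assumptions, not the classifier described in the talk.

```python
from typing import Callable, List, Tuple

# A rule is a label plus a predicate over (previous query, next query).
Rule = Tuple[str, Callable[[str, str], bool]]

RULES: List[Rule] = [
    ("word reorder",
     lambda a, b: a.split() != b.split() and sorted(a.split()) == sorted(b.split())),
    ("remove words", lambda a, b: set(b.split()) < set(a.split())),
    ("add words", lambda a, b: set(a.split()) < set(b.split())),
]


def classify(q1: str, q2: str) -> str:
    """Return the label of the first matching rule, or 'new query'."""
    if q1 == q2:
        return "same query"
    for label, matches in RULES:
        if matches(q1, q2):
            return label
    return "new query"


print(classify("seattle pizza palace", "pizza seattle palace"))
print(classify("eastlake home", "eastlake home price index"))
print(classify("pda", "seattle weather"))
```

The three calls print word reorder, add words, and new query. Because the first match wins, the order of the list determines the label whenever two rules would both apply.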
Measures For Session Boundary Detection Test data: – 100 users in the AOL query logs for evaluation – Same queries were removed (40.8% of queries) – 9,091 query pairs – 2,483 reformulations and 6,608 new queries (27.3% reformulations)
Measures For Session Boundary Detection Aim for high precision, not necessarily high recall – interested in inter-reformulation rather than intra-reformulation
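Precision and recall for this task can be computed by treating "reformulation" as the positive class. The sketch below uses a made-up toy labelling purely for illustration. Only the two counts used for the base rate come from the test data above.

```python
from typing import Sequence, Tuple


def precision_recall(predicted: Sequence[bool],
                     actual: Sequence[bool]) -> Tuple[float, float]:
    """Precision and recall, with True meaning 'this pair is a reformulation'."""
    tp = sum(1 for p, a in zip(predicted, actual) if p and a)
    fp = sum(1 for p, a in zip(predicted, actual) if p and not a)
    fn = sum(1 for p, a in zip(predicted, actual) if not p and a)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Toy labels (invented): a cautious classifier that flags few pairs.
predicted = [True, False, False, False, True]
actual = [True, True, True, False, True]
print(precision_recall(predicted, actual))

# Base rate from the test data: 2,483 reformulations among 9,091 query pairs.
print(f"{2483 / 9091:.1%}")
```

The toy classifier reaches precision 1.0 with recall 0.5, which is the trade-off the slide prefers: every pair it flags is a true reformulation, even though it misses some. The last line prints 27.3%, matching the share reported for the test data.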
Reformulation Effectiveness Metrics Data: AOL query logs (released on 08/03/2006) Queries: 36,389,567 – 16,069,421 new queries – 14,861,326 same queries – 3,411,706 reformulations Metrics – Click Pattern – Click URL – Rank Change of Clicked Results
Click Pattern
(SkipSkip + ClickSkip) vs. (SkipClick + ClickClick); (SkipSkip) vs. (SkipClick)
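The two comparisons above can be read as ratios over click-pattern counts. In the sketch below, each query pair is labelled by whether the user clicked a result for the first and for the second query. The function names and the toy data are hypothetical, and reading the comparisons as these two rates is an interpretation of the slide.

```python
from collections import Counter
from typing import Iterable, Tuple


def click_pattern(clicked_first: bool, clicked_second: bool) -> str:
    """Name the pattern, e.g. (False, True) -> 'SkipClick'."""
    return (("Click" if clicked_first else "Skip")
            + ("Click" if clicked_second else "Skip"))


def count_patterns(pairs: Iterable[Tuple[bool, bool]]) -> Counter:
    """Count how often each of the four click patterns occurs."""
    return Counter(click_pattern(first, second) for first, second in pairs)


def second_query_click_rate(counts: Counter) -> float:
    """(SkipClick + ClickClick) as a share of all pairs."""
    clicked = counts["SkipClick"] + counts["ClickClick"]
    total = clicked + counts["SkipSkip"] + counts["ClickSkip"]
    return clicked / total if total else 0.0


def recovery_rate(counts: Counter) -> float:
    """SkipClick as a share of pairs whose first query got no click."""
    total = counts["SkipSkip"] + counts["SkipClick"]
    return counts["SkipClick"] / total if total else 0.0


# Toy data (invented): (clicked on first query, clicked on reformulation).
pairs = [(False, True), (False, False), (True, True), (True, False), (False, True)]
counts = count_patterns(pairs)
print(second_query_click_rate(counts), recovery_rate(counts))
```

Computing both rates per reformulation type would allow strategies to be compared on how often the reformulated query leads to a click.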
Click URL
Rank Change and Median Time between Queries
Discussion Different reformulation strategies were effective depending on the user's action on the initial query – Word substitution: Skip Skip Click Click – Spelling correction: Skip Click Click Skip
Limitations – Lack of Context – Normalized Query Logs – Ambiguous Queries: ‘american airlines’, ‘delta airlines’ – Search Engine Effects
CONCLUSIONS – Describes the human side of query reformulation and contributes to our understanding of users in search interaction – Add/remove words, word substitution, acronym expansion, and spelling correction seem most effective – Acronym formation and reordering words may be less beneficial to the user