1
Online Spelling Correction for Query Completion
Huizhong Duan, UIUC
Bo-June (Paul) Hsu, Microsoft
WWW 2011, March 31, 2011
2
Background
Query misspellings are common (>10%):
- Typing quickly: exxit, mis[s]pell
- Inconsistent rules: concieve, conceirge
- Keyboard adjacency: imporyant
- Ambiguous word breaking: silver_light
- New words: kinnect
3
Spelling Correction
Goal: help users formulate their intent
- Offline: after entering the query
- Online: while entering the query
  - Inform users of potential errors
  - Help express information needs
  - Reduce effort to input the query
4
Motivation
Existing search engines offer limited online spelling correction.
- Offline spelling correction (see paper): model based on (weighted) edit distance (sketched below); data from query similarity, click logs, …
- Auto completion with error tolerance (Chaudhuri & Kaushik, 09): fuzzy search over a trie with a pre-specified maximum edit distance
  - Poor model for phonetic and transposition errors
  - Linear lookup time is not sufficient for interactive use
Goal: improve the error model and reduce correction time.
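For concreteness, the baseline's error model is a (weighted) edit distance. Below is a minimal sketch of such a distance; the default costs and the substitution-cost hook (where keyboard-adjacency or other weights would plug in) are illustrative, not the baseline's actual parameters.

```python
def weighted_edit_distance(source, target, sub_cost=None, ins_cost=1.0, del_cost=1.0):
    """Baseline-style weighted edit distance: standard Levenshtein DP where the
    substitution cost may depend on the character pair (e.g., cheaper for
    keyboard-adjacent keys). Costs here are illustrative."""
    if sub_cost is None:
        sub_cost = lambda a, b: 0.0 if a == b else 1.0
    m, n = len(source), len(target)
    d = [[0.0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        d[i][0] = d[i - 1][0] + del_cost
    for j in range(1, n + 1):
        d[0][j] = d[0][j - 1] + ins_cost
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            d[i][j] = min(
                d[i - 1][j] + del_cost,                       # delete a source char
                d[i][j - 1] + ins_cost,                       # insert a target char
                d[i - 1][j - 1] + sub_cost(source[i - 1], target[j - 1]),
            )
    return d[m][n]

print(weighted_edit_distance("imporyant", "important"))   # 1.0 (y -> t)
```

A distance like this charges a transposition such as concieve → conceive as two independent substitutions and gives phonetic confusions no special treatment, which is exactly the weakness noted above.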
5
Outline: Introduction, Model, Search, Evaluation, Conclusion
6
Offline Spelling Correction
Training: query correction pairs (faecbok ← facebook, kinnect ← kinect, …) and a query histogram (facebook 0.01, kinect 0.005, …) are used to learn transformation probabilities (ec ← ec 0.1, nn ← n 0.2, …) and a probability trie over queries.
Decoding: the fully entered query is corrected afterwards, e.g. elefnat → elephant.
(Diagram: training/decoding pipeline with the query trie and its node probabilities, e.g. a 0.4, b 0.2, c 0.2.)
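The pipeline above can be read as a noisy-channel ranker: the query histogram provides a prior over intended queries and the correction pairs train an error (transformation) model. A minimal sketch of the scoring step under that reading, with error_model_prob as a crude stand-in for the trained transformation model and the toy numbers taken from the slide:

```python
import math

# Toy query prior from the slide's query histogram.
QUERY_PRIOR = {"facebook": 0.01, "kinect": 0.005}

def error_model_prob(observed: str, intended: str) -> float:
    """Stand-in for the trained transformation model P(observed | intended):
    a crude score that just favors small character differences."""
    diff = sum(a != b for a, b in zip(observed, intended))
    diff += abs(len(observed) - len(intended))
    return math.exp(-diff)

def rank_corrections(observed: str, k: int = 5):
    """Noisy-channel ranking: prior(intended) * P(observed | intended)."""
    scored = [(q, p * error_model_prob(observed, q)) for q, p in QUERY_PRIOR.items()]
    scored.sort(key=lambda x: x[1], reverse=True)
    return scored[:k]

print(rank_corrections("faecbok"))   # facebook ranks first
print(rank_corrections("kinnect"))   # kinect ranks first
```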
7
Online Spelling Correction
Training: query correction pairs and a query histogram, as in the offline case, are used to learn transformation probabilities (ae ← ea 0.1, nn ← n 0.2, …) and the query trie.
Decoding: the partial query is corrected and completed while it is being entered, e.g. elefn → elephant.
(Diagram: training/decoding pipeline with the query trie and its node probabilities.)
8
9
Joint-sequence modeling (Bisani & Ney, 08)
- Learn common error patterns from spelling correction pairs, without segmentation labels
- Adjust the correction likelihood by interpolating the model with an identity transformation model (sketched below)
- Training: Expectation Maximization (E-step, M-step) with pruning and smoothing
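The interpolation can be sketched directly; the transformation units, their probabilities, and the weight λ below are illustrative placeholders for what the joint-sequence model would actually learn.

```python
# Toy transformation probabilities as a joint-sequence model might learn them,
# keyed by (intended_unit, observed_unit).
LEARNED = {("n", "nn"): 0.2, ("ea", "ae"): 0.1, ("e", "e"): 0.7}

def identity_prob(intended: str, observed: str) -> float:
    """Identity transformation model: all probability mass on exact copies."""
    return 1.0 if intended == observed else 0.0

def interpolated_prob(intended: str, observed: str, lam: float = 0.8) -> float:
    """Interpolate the learned error model with the identity model, so rare or
    noisy transformation patterns cannot dominate the correction likelihood."""
    learned = LEARNED.get((intended, observed), 0.0)
    return lam * learned + (1.0 - lam) * identity_prob(intended, observed)

print(interpolated_prob("e", "e"))    # 0.8 * 0.7 + 0.2 * 1.0 = 0.76
print(interpolated_prob("n", "nn"))   # 0.8 * 0.2 = 0.16 (pure error pattern)
```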
10
- Estimate the query prior from the empirical query frequency
- Add a future score for A* search (see the sketch below)

Query log:
  Query   Prob
  a       0.4
  ab      0.2
  ac      0.2
  abc     0.1
  abcc    0.1

(Diagram: the query log compiled into a trie; each node carries a probability, e.g. a 0.4, b 0.2, c 0.2, with $ edges marking end of query.)
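One way to realize this in code: store each query's empirical probability at its trie node and propagate a future score, the best probability reachable at or below each node, so an A* search can use it as a bound. A minimal sketch under those assumptions:

```python
class TrieNode:
    def __init__(self):
        self.children = {}   # char -> TrieNode
        self.prob = 0.0      # probability of a query ending exactly here ($ edge)
        self.future = 0.0    # best probability of any completion at or below this node

def build_trie(query_probs):
    root = TrieNode()
    for query, p in query_probs.items():
        node = root
        node.future = max(node.future, p)
        for ch in query:
            node = node.children.setdefault(ch, TrieNode())
            node.future = max(node.future, p)   # propagate the future score
        node.prob = p
    return root

# Query probabilities from the slide's table.
trie = build_trie({"a": 0.4, "ab": 0.2, "ac": 0.2, "abc": 0.1, "abcc": 0.1})

node = trie.children["a"].children["b"]
print(node.prob, node.future)   # 0.2 0.2  (best completion under "ab")
```

The future score is an optimistic upper bound on the probability of any completion below the node, which is what an A*-style priority needs.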
11
Outline: Introduction, Model, Search, Evaluation, Conclusion
12
(Diagram: A* search over the query trie; each node carries its probability from the query log, e.g. a 0.4, b 0.2, c 0.2, c 0.1, with $ marking end of query.)
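A self-contained sketch of the best-first (A*-style) enumeration over such a trie, repeating the tiny trie from the previous sketch so it runs on its own. It is reduced to exact-prefix completion so the search mechanics stay visible; the paper's algorithm additionally expands error transformations of the typed prefix.

```python
import heapq

# Repeats the trie from the previous sketch so this block runs on its own.
class TrieNode:
    def __init__(self):
        self.children = {}   # char -> TrieNode
        self.prob = 0.0      # probability of a query ending exactly here ($)
        self.future = 0.0    # best probability reachable at or below this node

def build_trie(query_probs):
    root = TrieNode()
    for query, p in query_probs.items():
        node = root
        node.future = max(node.future, p)
        for ch in query:
            node = node.children.setdefault(ch, TrieNode())
            node.future = max(node.future, p)
        node.prob = p
    return root

def top_completions(root, prefix, k=3):
    """Best-first (A*-style) enumeration of the k most probable queries
    extending a typed prefix. The future score is the optimistic priority;
    error tolerance is omitted to keep the search mechanics visible."""
    node = root
    for ch in prefix:                       # exact walk down to the prefix node
        if ch not in node.children:
            return []
        node = node.children[ch]

    results = []
    # Heap entries: (negated priority, query text, node). A node of None marks
    # a finished query whose exact probability is the priority.
    frontier = [(-node.future, prefix, node)]
    while frontier and len(results) < k:
        neg_score, text, cur = heapq.heappop(frontier)
        if cur is None:                     # exact score popped: emit in order
            results.append((text, -neg_score))
            continue
        if cur.prob > 0:                    # the query may end here
            heapq.heappush(frontier, (-cur.prob, text, None))
        for ch, child in cur.children.items():
            heapq.heappush(frontier, (-child.future, text + ch, child))
    return results

trie = build_trie({"a": 0.4, "ab": 0.2, "ac": 0.2, "abc": 0.1, "abcc": 0.1})
print(top_completions(trie, "a"))   # [('a', 0.4), ('ab', 0.2), ('ac', 0.2)]
```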
13
Outline: Introduction, Model, Search, Evaluation, Conclusion
14
Data Sets

            Correctly Spelled    Misspelled       Total
  Unique    101,640 (70%)        44,226 (30%)     145,866
  Total     1,126,524 (80%)      283,854 (20%)    1,410,378

            Correctly Spelled    Misspelled       Total
  Unique    7,585 (76%)          2,374 (24%)      9,959
15
Metrics

Offline:
- Recall@K = # correct in top K / # queries
- Precision@K = (# correct / # suggested) in top K

Online:
- MinKeyStrokes (MKS) = # characters typed + # arrow keys + 1 enter key
- Penalized MKS (PMKS) = MKS + 0.1 × # suggested queries
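A sketch of how the online metrics could be computed for one target query, assuming a suggest(prefix) function that returns the ranked suggestion list after each keystroke. The keystroke simulation and the PMKS bookkeeping are this sketch's reading of the formulas above, not the paper's exact procedure.

```python
def min_keystrokes(target, suggest, max_k=10):
    """MKS: characters typed + arrow keys to reach the target suggestion + 1 enter.
    PMKS additionally charges 0.1 per suggested query shown before acceptance
    (one possible reading of the slide's formula)."""
    best_mks = len(target) + 1          # fallback: type the whole query, press enter
    best_pmks = float(best_mks)
    shown = 0
    for i in range(1, len(target) + 1):
        prefix = target[:i]
        suggestions = suggest(prefix)[:max_k]
        shown += len(suggestions)
        if target in suggestions:
            rank = suggestions.index(target) + 1      # 1-based rank in the list
            mks = i + rank + 1                        # chars + arrow keys + enter
            if mks < best_mks:
                best_mks = mks
                best_pmks = mks + 0.1 * shown
    return best_mks, best_pmks

# Toy usage with a hypothetical prefix-match suggester.
def toy_suggest(prefix):
    return [q for q in ("facebook", "face", "kinect") if q.startswith(prefix)]

print(min_keystrokes("facebook", toy_suggest))   # e.g. (3, 3.2)
```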
16
Results

              All Queries                  Misspelled Queries
              R@1      R@10     MKS        R@1      R@10     MKS
  Proposed    0.918*   0.976    11.86*     0.677*   0.900*   11.96*
  Edit Dist   0.899    0.973    13.39      0.579    0.887    14.53
  Google      N/A      N/A      13.01      N/A      N/A      13.49

- Baseline: weighted edit distance (Chaudhuri and Kaushik, 09); Google Suggest (August 2010) shown for comparison
- The proposed system outperforms the baseline on all metrics (p < 0.05) except R@10
- Google Suggest saves users 0.4 keystrokes over the baseline; the proposed system reduces keystrokes by a further 1.1
- 1.5 keystroke savings for misspelled queries!
17
Risk Pruning

- Apply a threshold to preserve suggestion relevance
- Risk = geometric mean of the transformation probability per character of the input query (sketched below)
- Prune suggestions with many high-risk words

                  All Queries
                R@1     R@10    P@1     P@10    MKS     PMKS
  No Pruning    0.918   0.976   0.920   0.262   11.86   19.60
  With Pruning  0.916   0.969   0.927   0.304   11.87   19.42

Pruning high-risk suggestions hurts recall and MKS slightly, but improves precision and PMKS significantly.
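The risk score can be sketched straight from the definition; char_trans_probs stands for the per-character transformation probabilities produced while decoding the input query (an assumed interface). Treating a low geometric mean as high risk, pruning whole suggestions rather than counting high-risk words, and the threshold value are all simplifications in this sketch.

```python
import math

def risk_score(char_trans_probs):
    """Geometric mean of the transformation probability assigned to each
    character of the input query (probabilities assumed strictly positive)."""
    if not char_trans_probs:
        return 0.0
    return math.exp(sum(math.log(p) for p in char_trans_probs) / len(char_trans_probs))

def keep_suggestion(char_trans_probs, threshold=0.2):
    """Prune a suggestion when its per-character transformations are, on
    (geometric) average, too improbable. The threshold is illustrative."""
    return risk_score(char_trans_probs) >= threshold

print(keep_suggestion([0.9, 0.8, 0.7, 0.9]))     # True: a confident correction
print(keep_suggestion([0.9, 0.01, 0.02, 0.9]))   # False: relies on unlikely edits
```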
18
Beam Pruning
Prune search paths to speed up correction:
- Absolute: limit the maximum number of paths expanded per query position
- Relative: keep only paths within a probability threshold of the best path at each query position
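Both variants can be sketched as filters over the set of live search paths at one query position; the (probability, partial-correction) path representation and the parameter values are illustrative.

```python
def absolute_beam(paths, max_paths=100):
    """Absolute pruning: keep at most max_paths highest-scoring paths
    at the current query position."""
    return sorted(paths, key=lambda p: p[0], reverse=True)[:max_paths]

def relative_beam(paths, ratio=1e-3):
    """Relative pruning: keep only paths whose probability is within a
    factor ratio of the best path at the current query position."""
    if not paths:
        return paths
    best = max(score for score, _ in paths)
    return [p for p in paths if p[0] >= best * ratio]

# Toy usage: paths as (probability, partial-correction) pairs.
paths = [(0.30, "face"), (0.10, "fase"), (0.0001, "fqce"), (0.00001, "fzce")]
print(absolute_beam(paths, max_paths=2))   # [(0.3, 'face'), (0.1, 'fase')]
print(relative_beam(paths, ratio=1e-3))    # drops the two lowest-probability paths
```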
19
Example
20
Outline: Introduction, Model, Search, Evaluation, Conclusion
21
Summary
- Modeled transformations using an unsupervised joint-sequence model trained from spelling correction pairs
- Proposed an efficient A* search algorithm with a modified trie data structure and beam pruning techniques
- Applied risk pruning to preserve suggestion relevance
- Defined metrics for evaluating online spelling correction

Future Work
- Explore additional sources of spelling correction pairs
- Utilize an n-gram language model as the query prior
- Extend the technique to other applications