Online Spelling Correction for Query Completion
Huizhong Duan, UIUC
Bo-June (Paul) Hsu, Microsoft
WWW 2011, March 31, 2011
Background

Query misspellings are common (>10%):
- Typing quickly: exxit, mis[s]pell
- Inconsistent rules: concieve, conceirge
- Keyboard adjacency: imporyant
- Ambiguous word breaking: silver_light
- New words: kinnect
Spelling Correction

Goal: help users formulate their intent
- Offline: after entering the query
- Online: while entering the query
  - Inform users of potential errors
  - Help express information needs
  - Reduce the effort to input the query
Motivation

Existing search engines offer limited online spelling correction.
- Offline spelling correction (see paper)
  - Model: (weighted) edit distance
  - Data: query similarity, click log, …
- Auto completion with error tolerance (Chaudhuri & Kaushik, 09)
  - Poor model for phonetic and transposition errors
  - Fuzzy search over a trie with a pre-specified maximum edit distance
  - Linear lookup time is not sufficient for interactive use

Goal: improve the error model and reduce correction time
Outline
- Introduction
- Model
- Search
- Evaluation
- Conclusion
Offline Spelling Correction

[Figure: training/decoding pipeline. Training: query correction pairs (e.g., faecbok ← facebook, kinnect ← kinect) and a query histogram (facebook 0.01, kinect 0.005) yield learned substring transformation probabilities (e.g., nn ← n 0.2) and a trie over the query log. Decoding: the full misspelled query is corrected, e.g. elefnat → elephant.]
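To make the decoding step concrete, here is a minimal noisy-channel scoring sketch: a candidate correction is scored by its query-histogram prior times the probability of transforming it into the observed query under learned substring transformations. The probability values, the hand-picked segmentation, and the identity/unseen probabilities are illustrative assumptions; the actual system sums or maximizes over latent segmentations rather than fixing one.

```python
# Toy noisy-channel scoring for offline correction (illustrative values only).
# score(intended -> observed) = P(intended) * P(observed | intended)

prior = {"facebook": 0.01, "kinect": 0.005}                        # query histogram
trans = {("ce", "ec"): 0.1, ("oo", "o"): 0.05, ("n", "nn"): 0.2}   # learned (intended, observed) segments
IDENTITY = 0.95   # assumed probability of copying a segment unchanged
UNSEEN = 1e-4     # assumed probability of an unlisted transformation

def channel_prob(segmentation):
    """P(observed | intended) for one fixed segmentation: product over segments
    of the learned transformation probability (or the identity probability)."""
    p = 1.0
    for intended_seg, observed_seg in segmentation:
        if intended_seg == observed_seg:
            p *= IDENTITY
        else:
            p *= trans.get((intended_seg, observed_seg), UNSEEN)
    return p

# One hand-picked segmentation of "facebook" -> "faecbok":
# f | a | ce->ec | b | oo->o | k
seg = [("f", "f"), ("a", "a"), ("ce", "ec"), ("b", "b"), ("oo", "o"), ("k", "k")]
print(prior["facebook"] * channel_prob(seg))   # 0.01 * 0.95**4 * 0.1 * 0.05
```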
Online Spelling Correction

[Figure: same training/decoding pipeline, but decoding operates on the query prefix as it is typed, e.g. elefn → elephant, with learned transformations such as ae ← ea 0.1.]
Joint-Sequence Modeling (Bisani & Ney, 08)
- Learn common error patterns from spelling correction pairs without segmentation labels
- Train with Expectation Maximization (E-step / M-step), with pruning and smoothing
- Adjust the correction likelihood by interpolating the model with an identity transformation model
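A small sketch of the interpolation step above, mixing the EM-trained transformation probabilities with an identity (no-error) model so that exact copies keep high probability. The mixing weight and table entries are assumptions for illustration, not values from the paper.

```python
# Interpolating the learned transformation model with an identity model.
LAMBDA = 0.7   # assumed mixing weight

def learned_prob(intended_seg, observed_seg):
    # Stand-in for the EM-trained joint-sequence transformation probability.
    table = {("ce", "ec"): 0.1, ("n", "nn"): 0.2}
    return table.get((intended_seg, observed_seg), 1e-4)

def identity_prob(intended_seg, observed_seg):
    # Identity transformation model: all probability mass on exact copies.
    return 1.0 if intended_seg == observed_seg else 0.0

def interpolated_prob(intended_seg, observed_seg):
    return (LAMBDA * learned_prob(intended_seg, observed_seg)
            + (1 - LAMBDA) * identity_prob(intended_seg, observed_seg))

print(interpolated_prob("ab", "ab"))   # copies stay likely (~0.3 + small learned term)
print(interpolated_prob("ce", "ec"))   # 0.7 * 0.1 = 0.07
```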
Query Prior
- Estimate from empirical query frequency
- Add a future score to each trie node for A* search

Query   Prob
a       0.4
ab      0.2
ac      0.2
abc     0.1
abcc    0.1

[Figure: trie built over the query log, with per-node probabilities and an end-of-query marker $]
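A sketch of the data structure implied here: a trie built from the query histogram, with each node annotated with a "future score", the best completion probability anywhere in its subtree. Because this score never underestimates the best completion, it can serve as the optimistic estimate for the A* search in the next section. The dict-based representation and field names are assumptions.

```python
# Query-prior trie with a precomputed "future score" per node (toy histogram
# from the slide above).

END = "$"  # end-of-query marker

def build_trie(query_probs):
    root = {}
    for query, prob in query_probs.items():
        node = root
        for ch in query:
            node = node.setdefault(ch, {})
        node[END] = prob                      # terminal probability of the full query
    return root

def add_future_scores(node):
    """Annotate each node with the maximum completion probability in its subtree."""
    best = node.get(END, 0.0)
    for ch, child in list(node.items()):
        if ch in (END, "future"):
            continue
        best = max(best, add_future_scores(child))
    node["future"] = best
    return best

trie = build_trie({"a": 0.4, "ab": 0.2, "ac": 0.2, "abc": 0.1, "abcc": 0.1})
add_future_scores(trie)
print(trie["a"]["future"])        # 0.4 (best completion under prefix "a")
print(trie["a"]["b"]["future"])   # 0.2
```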
Outline
- Introduction
- Model
- Search
- Evaluation
- Conclusion
[Figure: example of searching the query trie; node probabilities as in the query-prior trie above]
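Below is a sketch of an A* search over the trie from the query-prior sketch above, combining a toy per-character error model with the query prior. States are (trie node, query position) pairs, and the priority is the probability so far times the node's future score, so the first fully-consumed state popped corresponds to the best suggestion under this simplified model. The error-model probabilities and the expansion limit are assumptions; the paper's decoder works over joint-sequence segments rather than single characters.

```python
import heapq
import itertools

END = "$"   # end-of-query marker, as in the trie sketch above

def edit_prob(op, query_ch=None, trie_ch=None):
    """Toy per-character channel model (assumed values): probability of the
    observed query character given the intended trie character, or of an
    insertion/deletion."""
    if op == "match":
        return 0.9 if query_ch == trie_ch else 0.05   # copy vs. substitution
    return 0.02                                        # insertion or deletion

def best_completion(node, prefix):
    """Greedily follow the child whose subtree holds the best completion."""
    while node.get(END, 0.0) < node["future"]:
        ch, node = max(((k, c) for k, c in node.items() if k not in (END, "future")),
                       key=lambda kc: kc[1]["future"])
        prefix += ch
    return prefix

def astar_suggest(trie, query, max_expansions=10000):
    counter = itertools.count()                  # tie-breaker for the heap
    heap = [(-trie["future"], next(counter), 1.0, 0, trie, "")]
    for _ in range(max_expansions):
        if not heap:
            break
        _, _, prob, pos, node, prefix = heapq.heappop(heap)
        if pos == len(query):
            # Query fully consumed: suggest the best completion below this node.
            return best_completion(node, prefix), prob * node["future"]
        for ch, child in node.items():
            if ch in (END, "future"):
                continue
            # Match/substitute: intended ch, observed query[pos].
            p = prob * edit_prob("match", query[pos], ch)
            heapq.heappush(heap, (-(p * child["future"]), next(counter), p, pos + 1, child, prefix + ch))
            # Insertion in the intended query (no observed character consumed).
            p = prob * edit_prob("insert")
            heapq.heappush(heap, (-(p * child["future"]), next(counter), p, pos, child, prefix + ch))
        # Deletion: observed query[pos] has no intended counterpart.
        p = prob * edit_prob("delete")
        heapq.heappush(heap, (-(p * node["future"]), next(counter), p, pos + 1, node, prefix))
    return None

# With the toy trie from the query-prior sketch, a mistyped prefix is corrected:
# astar_suggest(trie, "ad")  ->  ("ab", 0.009)
```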
Outline
- Introduction
- Model
- Search
- Evaluation
- Conclusion
Data Sets

Training query correction pairs:
           Correctly Spelled     Misspelled        Total
  Unique   101,640 (70%)         44,226 (30%)      145,866
  Total    1,126,524 (80%)       283,854 (20%)     1,410,378

Test queries:
           Correctly Spelled     Misspelled        Total
  Unique   7,585 (76%)           2,374 (24%)       9,959
Metrics

Offline:
- Recall: #correct in top K / #queries
- Precision: (#correct / #suggested) in top K

Online:
- MinKeyStrokes (MKS): #characters + #arrow keys + 1 enter key
- Penalized MKS (PMKS): MKS × #suggested queries
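A sketch of how the MKS metric could be computed for one target query, assuming a hypothetical suggest(prefix) callback that returns the ranked suggestion list shown after each keystroke. The selection cost (one down-arrow per rank) and the fallback of typing the intended query in full are assumptions about details not spelled out on the slide.

```python
def min_keystrokes(typed, target, suggest):
    """MKS = min over prefix lengths of (#characters typed + #arrow keys + 1 enter).

    typed   -- the character sequence the user actually types (may be misspelled)
    target  -- the query the user intends to submit
    suggest -- hypothetical callback: suggest(prefix) -> ranked list of suggestions
    """
    best = len(target) + 1                           # assumed fallback: type the intended query, press enter
    for n in range(1, len(typed) + 1):
        suggestions = suggest(typed[:n])
        if target in suggestions:
            arrows = suggestions.index(target) + 1   # down-arrow presses to reach the suggestion
            best = min(best, n + arrows + 1)         # characters + arrows + enter
    return best

# PMKS would then multiply MKS by the number of suggested queries shown.
```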
Results

Baseline: weighted edit distance (Chaudhuri & Kaushik, 09)
- The proposed system outperforms the baseline in all metrics (p < 0.05)
- Google Suggest (August 2010) saves users 0.4 keystrokes over the baseline
- The proposed system further reduces user keystrokes, with additional keystroke savings for misspelled queries

[Table: metrics for the proposed system, the edit-distance baseline, and Google Suggest on all queries vs. misspelled queries; only partially recoverable. Proposed: 0.918*, 0.677*, 0.900*, 11.96* (* = statistically significant); Google Suggest: MKS 13.01 (all queries), 13.49 (misspelled), other metrics N/A.]
Risk Pruning
- Apply a threshold to preserve suggestion relevance
- Risk = geometric mean of the transformation probability per character in the input query
- Prune suggestions with many high-risk words
- Pruning high-risk suggestions lowers recall and MKS slightly, but improves precision and PMKS significantly

[Figure: metrics on all queries, no pruning vs. with pruning]
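A minimal sketch of the risk computation described above: a word's risk is based on the geometric mean of the per-character transformation probabilities that produced it from the typed query, and suggestions containing too many high-risk words are pruned. The threshold values and the exact high-risk test are illustrative assumptions.

```python
import math

def geometric_mean(probs):
    return math.exp(sum(math.log(p) for p in probs) / len(probs))

def is_high_risk(char_trans_probs, threshold=0.5):
    # A word is treated as high risk when the transformation that produced it
    # from the typed characters is unlikely on average (assumed threshold).
    return geometric_mean(char_trans_probs) < threshold

def should_prune(word_trans_probs, max_high_risk=1, threshold=0.5):
    """word_trans_probs maps each word in the suggestion to the per-character
    transformation probabilities for that word; prune the suggestion if it
    contains more than max_high_risk high-risk words."""
    risky = sum(is_high_risk(p, threshold) for p in word_trans_probs.values())
    return risky > max_high_risk

# Example: one confidently matched word, one very unlikely word.
print(should_prune({"facebook": [0.9] * 8, "xzqv": [0.05] * 4}))   # False (only one risky word)
```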
Beam Pruning
- Prune search paths to speed up correction
- Absolute: limit the maximum number of paths expanded per query position
- Relative: keep only paths within a probability threshold of the best path at each query position
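A compact sketch of the two pruning modes listed above, applied to the candidate paths alive at one query position; the limits are illustrative assumptions.

```python
def beam_prune(paths, max_paths=100, rel_threshold=1e-3):
    """paths: list of (probability, state) candidates at one query position.
    Absolute pruning keeps at most max_paths candidates; relative pruning drops
    any path whose probability is below rel_threshold times the best path."""
    paths = sorted(paths, key=lambda ps: ps[0], reverse=True)
    if not paths:
        return paths
    best = paths[0][0]
    return [ps for ps in paths[:max_paths] if ps[0] >= best * rel_threshold]
```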
Example
Outline
- Introduction
- Model
- Search
- Evaluation
- Conclusion
Summary
- Modeled transformations using an unsupervised joint-sequence model trained from spelling correction pairs
- Proposed an efficient A* search algorithm with a modified trie data structure and beam pruning techniques
- Applied risk pruning to preserve suggestion relevance
- Defined metrics for evaluating online spelling correction

Future Work
- Explore additional sources of spelling correction pairs
- Utilize an n-gram language model as the query prior
- Extend the technique to other applications