A Unified and Discriminative Model for Query Refinement. Jiafeng Guo (1), Gu Xu (2), Xueqi Cheng (1), Hang Li (2). (1) Institute of Computing Technology, CAS, China; (2) Microsoft Research Asia, China.


A Unified and Discriminative Model for Query Refinement. Jiafeng Guo (1), Gu Xu (2), Xueqi Cheng (1), Hang Li (2). (1) Institute of Computing Technology, CAS, China. (2) Microsoft Research Asia, China.

Outline Motivation Our Approach Experimental Results Conclusion

Introduction
Information retrieval: users issue search queries such as "university of california", "kelley blue book", "prison break", "best movie", "download free music", "free online games", "ny times", ...
Word mismatch: the query term "ny times" fails to match the document term "New York Times".

Cont'
Query refinement handles ill-formed queries. Tasks and examples:
– Spelling error correction (misspelled words): sytem number → system number
– Word stemming (words to be stemmed): data mine → data mining
– Phrase segmentation (phrases to be quoted): the office show → "the office" show
– Word splitting (mistakenly merged words): nypark → ny park
– Word merging (mistakenly split words): on line game → online game
– Acronym expansion (acronyms to be expanded): nfs → need for speed

Previous Work
Query refinement:
– Spelling error correction: [1] Exploring distributional similarity based query spelling correction (Li et al., ACL '06); [2] Spelling correction as an iterative process that exploits the collective knowledge of web users (Cucerzan et al., EMNLP '04); [3] Learning a spelling error model from search query logs (Ahmad et al., EMNLP '05); [4] Improving query spelling correction using web search results (Chen et al., EMNLP '07)
– Word stemming: [5] Context sensitive stemming for web search (Peng et al., SIGIR '07)
– Query segmentation: [6] Query segmentation for web search (Risvik et al., WWW '03); [7] Learning noun phrase query segmentation (Bergsma et al., EMNLP '07)
Summary of approaches:
– [1][2][3]: spelling correction, generative; [1][3]: spelling correction, discriminative; [5]: word stemming, generative; [6]: phrase segmentation, generative; [7]: phrase segmentation, discriminative
Existing work: separate tasks, mostly generative models. Our goal: a unified framework with a discriminative model.

Cont'
Why a unified framework?
– There are various query refinement tasks, with mutual dependencies between them
– A unified framework can incorporate different tasks easily and address them simultaneously to boost accuracy
Why not a cascaded model?
– It ignores the dependencies between the tasks
– It accumulates errors through the processes
A case of query refinement: Original: Papers on machinlearn → Refined: Papers on "machine learning" (spelling error correction: machin → machine; word stemming: learn → learning; phrase segmentation: quote "machine learning")

Cont'
Why a discriminative model?
– It enjoys all the merits of discriminative learning
– Query refinement is by nature a structured prediction problem
– A direct application of existing models would not work
Our proposal: Conditional Random Fields for Query Refinement (CRF-QR)
(Running example: Papers on machinlearn → Papers on "machine learning")

Outline Motivation Our Approach Experimental Results Conclusion

Our Approach
Query refinement is cast as a structured prediction problem.
Running example: Original: Papers on machinlearn → Refined: Papers on "machine learning" (spelling error correction, word stemming, phrase segmentation)

Conventional CRF
A conditional probability model: refined query words y_1, ..., y_n given query words x_1, ..., x_n, with a linear chain over the y's. [Figure: linear-chain graphical model with nodes y_{i-1}, y_i, y_{i+1} over x_{i-1}, x_i, x_{i+1}; both the space of x and the space of y range over the entire query vocabulary, e.g. "the", "online", "paper", "mp3", "book", ...]
Because each y_i ranges over the whole vocabulary, learning is intractable!
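For reference, the conditional probability model depicted here is the standard linear-chain CRF; written out (my notation, following common CRF conventions, not copied from the slides):

```latex
P(y \mid x) \;=\; \frac{1}{Z(x)} \exp\!\Big( \sum_{i=1}^{n} \sum_{k} \lambda_k\, f_k(y_{i-1}, y_i, x, i) \Big),
\qquad
Z(x) \;=\; \sum_{y'} \exp\!\Big( \sum_{i=1}^{n} \sum_{k} \lambda_k\, f_k(y'_{i-1}, y'_i, x, i) \Big).
```

With each y_i ranging over the whole vocabulary V, the partition function Z(x) sums over |V|^n label sequences, which is why direct learning is intractable here.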

CRF-QR Basic Model
Introducing refinement operations: each position gets an operation variable o_i in addition to x_i and y_i. [Figure: the linear-chain model augmented with operation nodes o_{i-1}, o_i, o_{i+1}.]

Refinement Operations
– Spelling error correction: Deletion (delete a letter in the word); Insertion (insert a letter into the word); Substitution (replace a letter in the word with another letter); Transposition (switch two letters in the word)
– Word splitting: Splitting (split one word into two words)
– Word merging: Merging (merge two words into one word)
– Phrase segmentation: Begin / Middle / End / Out (mark a word as beginning of, middle of, end of, or outside a phrase)
– Word stemming: +s/-s, +ed/-ed, +ing/-ing (add or remove the suffix)
– Acronym expansion: Expansion (expand an acronym)
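As an illustration of the four spelling-error-correction operations, the sketch below (my own illustration, not the paper's code) enumerates every string reachable from a word by one deletion, insertion, substitution, or transposition; in the model such candidates would be scored, not all accepted.

```python
import string

def edit_candidates(word):
    """All strings reachable from `word` by one spelling-correction
    operation: Deletion, Insertion, Substitution, or Transposition."""
    letters = string.ascii_lowercase
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [l + r[1:] for l, r in splits if r]                           # drop a letter
    inserts = [l + c + r for l, r in splits for c in letters]               # add a letter
    substitutes = [l + c + r[1:] for l, r in splits if r for c in letters]  # replace a letter
    transposes = [l + r[1] + r[0] + r[2:] for l, r in splits if len(r) > 1] # swap adjacent letters
    return set(deletes + inserts + substitutes + transposes)

# "sytem" reaches "system" via Insertion; "from" reaches "form" via Transposition
print("system" in edit_candidates("sytem"), "form" in edit_candidates("from"))
```

In practice a lexicon or language model filters this candidate set down to plausible words.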

Conditional Function
The basic CRF-QR model is defined by a conditional function built from potential functions over the cliques of the graph. [Figure: the chain model with x, y, and o nodes.]
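A sketch of this conditional function, consistent with the graph above but hedged (the paper's exact notation may differ): the basic CRF-QR model conditions jointly on refined words and operations,

```latex
P(y, o \mid x) \;=\; \frac{1}{Z(x)} \exp\!\Big( \sum_{i=1}^{n} \sum_{k} \lambda_k\, f_k(y_{i-1}, y_i, o_i, x, i) \Big),
```

where each operation o_i restricts y_i to the few candidates reachable from x_i under that operation, so the sums defining Z(x) stay small.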

Function of Operations
[Figure: for x_i = "machin", operations such as Insertion, Deletion, +ed, and +ing map to candidates like "machine", "machined", "machining".]
1. o constrains the mapping from x's to y's (it reduces the output space)
2. o indexes the mapping from x's to y's (it captures properties common across words)
Learning becomes efficient!

Learning and Prediction
Learning:
– Labeled data (x, y, o)
– Maximize the regularized log-likelihood function
– Quasi-Newton method
– The global optimum is guaranteed
Prediction:
– Viterbi algorithm
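Prediction by the Viterbi algorithm can be sketched as below: an illustrative dynamic program over per-position candidate sets, with a toy scoring function standing in for the learned potentials (this is my own sketch, not the paper's implementation).

```python
def viterbi(candidates, score):
    """Viterbi decoding over per-position candidate sets.

    candidates[i] lists the refined-word options for query word x_i;
    score(prev, cur, i) is a log-potential (prev is None at i = 0).
    Returns the highest-scoring sequence of refined words."""
    # best[c] = (best path score ending in candidate c, that path)
    best = {c: (score(None, c, 0), [c]) for c in candidates[0]}
    for i in range(1, len(candidates)):
        new_best = {}
        for cur in candidates[i]:
            # pick the best predecessor for `cur`
            prev, (s, path) = max(
                ((p, best[p]) for p in best),
                key=lambda kv: kv[1][0] + score(kv[0], cur, i))
            new_best[cur] = (s + score(prev, cur, i), path + [cur])
        best = new_best
    return max(best.values())[1]

# Toy scoring that prefers the corrected/stemmed word "machine"
cands = [["papers"], ["on"], ["machin", "machine", "machining"]]
toy_score = lambda prev, cur, i: 1.0 if cur == "machine" else 0.0
print(viterbi(cands, toy_score))  # ['papers', 'on', 'machine']
```

Because operations keep each candidate set small, this decoding is fast even though the vocabulary is huge.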

Features
[Figure: the chain model with x, y, and o nodes.]
Two feature types are defined on the graph. The features include lexicon-based, position-based, word-based, corpus-based, and query-based features.

CRF-QR Extended Model
Some words need multiple refinement operations. Example: original query "hotel bopk", expected query "hotel booking" (bopk → book by spelling error correction, then book → booking by word stemming). The basic model can only produce "hotel book"; the extended model applies refinements in multiple stages to reach "hotel booking". [Figure: the chain model with an additional layer of y variables.]

Outline Motivation Our Approach Experimental Results Conclusion

Experimental Results
Data set:
– 10,000 randomly selected queries; average length: 2.8 words
– Annotated by four human annotators
– Four refinement types: spelling error correction, word merging, word splitting, phrase segmentation
– Training set: 7,000 queries; test set: 3,000 queries

Baseline Methods
Cascaded approach:
– Build one sub-model for each task, with the same structure and feature set for each sub-model
– Sequentially connect the sub-models, in different orders [figure: several orderings of spelling error correction, word splitting, word merging, and phrase segmentation]
Generative approach:
– Source-channel model for spelling error correction, word splitting, and word merging (channel model: assume equal translation probabilities; source model: language-model probabilities)
– Mutual information for phrase segmentation (cf. [5])
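The mutual-information baseline for phrase segmentation can be sketched as follows (my own sketch; counts would come from a query log or corpus, and the threshold above which a pair is quoted as a phrase is a free parameter):

```python
import math

def mutual_information(bigram_counts, unigram_counts, total):
    """Pointwise mutual information for adjacent word pairs:
    MI(a, b) = log( P(a, b) / (P(a) * P(b)) ).
    Pairs scoring above some threshold would be quoted as phrases."""
    mi = {}
    for (a, b), n_ab in bigram_counts.items():
        p_ab = n_ab / total
        p_a = unigram_counts[a] / total
        p_b = unigram_counts[b] / total
        mi[(a, b)] = math.log(p_ab / (p_a * p_b))
    return mi

# Toy counts: "new york" co-occurs far above chance, "free york" does not
uni = {"new": 10, "york": 10, "free": 10}
bi = {("new", "york"): 8, ("free", "york"): 1}
print(mutual_information(bi, uni, 100))
```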

Experiment on Query Refinement
Comparisons between CRF-QR and the baselines on query refinement at the query level (%). Relative improvement: F1 score 2.26%, accuracy 1.21%.

Cont'
Comparisons between CRF-QR and the baselines on individual query refinement tasks (%). CRF-QR performs best!

Case Study
Why can CRF-QR outperform the baseline methods?
– The cascaded approach suffers from neglecting the mutual dependencies between tasks. E.g., nypark hitel → ny "park hotel"
– The cascaded approach accumulates errors. E.g., bankin las vegas → banking "las vegas" (correct: bank in "las vegas")
– The generative approach produces more incorrect results. E.g., pick up stix → pick up six; door to door → "door to" door

Error Analysis
(1) Most errors were made by one of the refinement tasks. E.g., parnell roberts → pernell roberts. Possible remedies: adding new features; increasing the data size for language-model training.
(2) Competition between refinement tasks. E.g., skate board dudes → "skate board" dudes (correct: skateboard dudes). Possible remedies: adding new features; increasing the training data size.
(3) Some queries were difficult to refine even for humans. E.g., ohio buckeye card → "ohio buckeye" card (or ohio "buckeye card").

Experiment on Relevance Search
Results on relevance search with (a) the entire query set and (b) the refined queries only. Measure: NDCG.
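For readers unfamiliar with the metric, NDCG can be computed as in the sketch below. This uses the common exponential-gain, log2-discount formulation; the slides do not state which variant was used, so treat the exact formula as an assumption.

```python
import math

def ndcg(relevances, k):
    """NDCG@k for one ranked list of graded relevance labels."""
    def dcg(rels):
        # gain (2^r - 1) discounted by log2 of the 1-based rank + 1
        return sum((2 ** r - 1) / math.log2(i + 2) for i, r in enumerate(rels[:k]))
    ideal = dcg(sorted(relevances, reverse=True))
    return dcg(relevances) / ideal if ideal > 0 else 0.0

print(ndcg([3, 2, 1], 3))  # a perfectly ordered list scores 1.0
print(ndcg([0, 2, 3], 3))  # a misordered list scores below 1.0
```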

Cont’ Results on Relevance Search by Query Refinement Tasks

Outline Motivation Our Approach Experimental Results Conclusion

Conclusion
Query refinement:
– Automatically reformulates ill-formed queries
– Better represents users' search needs
CRF-QR model: unified and discriminative
Experimental results: effective for both query refinement and relevance search

Thank You!