Enrich Query Representation by Query Understanding Gu Xu Microsoft Research Asia.


Enrich Query Representation by Query Understanding Gu Xu Microsoft Research Asia

Mismatching Problem
Mismatching is a fundamental problem in search
– Examples: NY ↔ New York, game cheats ↔ game cheatcodes
Search engine challenges
– Head (frequent) queries: rich information is available, such as clicks, query sessions, and anchor texts
– Tail (infrequent) queries: information becomes sparse and limited
Our proposal
– Enrich both queries and documents, and conduct matching on the enriched representations

Matching at Different Semantic Levels
Level of semantics: Term, Sense, Topic, Structure
– Term: match exactly the same terms (example terms: NY, New York, disk, disc)
– Sense: match terms with the same meanings (NY ↔ New York, motherboard ↔ mainboard, utube ↔ youtube)
– Topic: match topics of query and documents (query "Microsoft Office" has topic PC Software; a page with "… working for Microsoft … my office is in …" has topic Personal Homepage)
– Structure: match intent with answers (structures of query and document)
  Microsoft Office home → find homepage of Microsoft Office
  21 movie → find movie named 21
  buy laptop less than 1000 → find online dealers to buy laptop with less than 1000 dollars
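
The difference between term-level and sense-level matching can be sketched in a few lines. This is only an illustration: the `SENSE_TABLE` dictionary and the function names are hypothetical, and the table is filled with the slide's own examples rather than with mined data.

```python
# Hypothetical sense table built from the slide's examples; a real system
# would have to learn such equivalences from data.
SENSE_TABLE = {
    "ny": "new york",
    "mainboard": "motherboard",
    "utube": "youtube",
}


def normalize(text: str) -> str:
    """Lowercase the text and map each token to its canonical sense."""
    tokens = text.lower().split()
    return " ".join(SENSE_TABLE.get(tok, tok) for tok in tokens)


def term_match(query: str, doc: str) -> bool:
    """Term level: every query token must appear verbatim in the document."""
    doc_tokens = set(doc.lower().split())
    return all(tok in doc_tokens for tok in query.lower().split())


def sense_match(query: str, doc: str) -> bool:
    """Sense level: compare after mapping both sides to canonical senses."""
    return term_match(normalize(query), normalize(doc))
```

With this table, the query "NY hotels" fails to match "hotels in New York" at the term level but matches at the sense level.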

System View
[Diagram: system architecture, split into Online and Offline parts]
Components shown: Search Log Data, Web Data, Offline Query Processing, Offline Document Processing, Query Knowledge, Query Index, Document Index, Document Representations, Query, Online Query Processing, Query Representation, Matching, Ranked Documents

Enrich Query Representation
Representation (example query: michael jordan berkele)
– Term level: michael jordan berkele
– Sense level: michael I. jordan berkeley (berkele → berkeley)
– Topic level: academic
– Structure level: michael jordan berkeley
Understanding
– Tokenization: C# C 1, MAX_PATH MAX PATH
– Query refinement: alternative query finding (ill-formed → well-formed)
  Ambiguity: msil or mail
  Equivalence (or dependency): department or dept, login or sign on
– Query classification: definition of classes; accuracy and efficiency
– Query parsing: named entity segmentation and disambiguation; large-scale knowledge base

QUERY REFINEMENT USING CRF-QR (SIGIR’08)

Query Refinement
Example: Papers on Machin Learn → Papers on “Machine Learning”
– Spelling error correction: Machin → Machine
– Inflection: Learn → Learning
– Phrase segmentation: “Machine Learning”
Operations are mutually dependent: Spelling Error Correction, Inflection, Phrase Segmentation

Conventional CRF
[Diagram: linear-chain CRF over query words X = x0, x1, x2, x3, … (papers, on, machin, learn, …) and output words Y. Each output position ranges over arbitrary words (paper, in, upon, machine, machines, learning, learns, …), so the model is intractable.]

CRF for Query Refinement
[Diagram: CRF with query words X, refined words Y, and operations O]
Operation     Description
Deletion      Delete a letter in a word
Insertion     Insert a letter into a word
Substitution  Replace one letter with another
Exchange      Switch two letters in a word
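
The four letter-level operations on this slide can be sketched as candidate generators. This is a minimal illustration, not the CRF-QR model itself. It assumes a lowercase ASCII alphabet, treats "Exchange" as swapping two adjacent letters, and keeps only candidates found in a caller-supplied vocabulary. All function names are hypothetical.

```python
from typing import Callable, Dict, List, Set

ALPHABET = "abcdefghijklmnopqrstuvwxyz"  # assumption: lowercase ASCII only


def deletion(word: str) -> Set[str]:
    """Delete one letter from the word."""
    return {word[:i] + word[i + 1:] for i in range(len(word))}


def insertion(word: str) -> Set[str]:
    """Insert one letter at any position in the word."""
    return {word[:i] + ch + word[i:]
            for i in range(len(word) + 1) for ch in ALPHABET}


def substitution(word: str) -> Set[str]:
    """Replace one letter with a different letter."""
    return {word[:i] + ch + word[i + 1:]
            for i in range(len(word)) for ch in ALPHABET if ch != word[i]}


def exchange(word: str) -> Set[str]:
    """Swap two adjacent letters (assumed reading of 'switch two letters')."""
    return {word[:i] + word[i + 1] + word[i] + word[i + 2:]
            for i in range(len(word) - 1) if word[i] != word[i + 1]}


OPERATIONS: Dict[str, Callable[[str], Set[str]]] = {
    "Deletion": deletion,
    "Insertion": insertion,
    "Substitution": substitution,
    "Exchange": exchange,
}


def candidates(word: str, vocabulary: Set[str]) -> Dict[str, List[str]]:
    """For each operation, list the in-vocabulary words it produces."""
    return {name: sorted(op(word) & vocabulary)
            for name, op in OPERATIONS.items()}
```

For example, with the vocabulary {"machine", "learn", "mail"}, the word "machin" yields "machine" under Insertion and "msil" yields "mail" under Substitution.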

CRF for Query Refinement
[Diagram: for query words x2, x3, the operations o2, o3 (Deletion, Insertion, +ed, +ing, …) each select a small set of candidates y2, y3 from the full vocabulary (lean, walk, machined, machining, machine, machina, macin, machi, learning, learned, learn, lear, blearn, clearn, …)]
1. O constrains the mapping from X to Y (reduces the space)
2. O indexes the mapping from X to Y (shares parameters)

NAMED ENTITY RECOGNITION IN QUERY (SIGIR’09, SIGKDD’09)

Named Entity Recognition in Query
Query: harry potter
– harry potter: Movie (0.5), Book (0.4), Game (0.1)
Query: harry potter film
– harry potter: Movie (0.95)
Query: harry potter author
– harry potter: Book (0.95)

Challenges
Named Entity Recognition in Document
Challenges
– Queries are short (2-3 words on average): fewer context features
– Queries are not well-formed (typos, lowercase, …): fewer content features
Knowledge Database
– Coverage and freshness
– Ambiguity

Our Approach to NERQ
The goal of NERQ becomes finding the best triple (e, t, c)* for query q satisfying [equation lost in extraction]
Example: q = Harry Potter Walkthrough
– e = “Harry Potter” (named entity)
– t = “# Walkthrough” (context)
– c = “Game” (class)
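
A rough sketch of the search over triples follows. It assumes the score of a triple factorizes as p(e) · p(c|e) · p(t|c); the slides name p(e) and p(c|e), while the p(t|c) term and the product form are assumptions here. All probability tables are hypothetical toy values, except that the p(c|e) row for "harry potter" reuses the 0.5 / 0.4 / 0.1 figures from the earlier slide.

```python
from typing import Dict, Iterator, Optional, Tuple

# Hypothetical toy model; only the p(c|e) row echoes numbers from the slides.
P_E: Dict[str, float] = {"harry potter": 0.01}
P_C_GIVEN_E: Dict[str, Dict[str, float]] = {
    "harry potter": {"Movie": 0.5, "Book": 0.4, "Game": 0.1},
}
P_T_GIVEN_C: Dict[str, Dict[str, float]] = {
    "Movie": {"#": 0.2, "# film": 0.3, "# walkthrough": 0.01},
    "Book": {"#": 0.2, "# author": 0.3, "# walkthrough": 0.01},
    "Game": {"#": 0.1, "# walkthrough": 0.5},
}

Triple = Tuple[str, str, str]


def segmentations(query: str) -> Iterator[Tuple[str, str]]:
    """Yield every (entity, context) split; '#' replaces the entity span."""
    tokens = query.lower().split()
    for i in range(len(tokens)):
        for j in range(i + 1, len(tokens) + 1):
            entity = " ".join(tokens[i:j])
            context = " ".join(tokens[:i] + ["#"] + tokens[j:])
            yield entity, context


def best_triple(query: str) -> Optional[Tuple[Triple, float]]:
    """Return the highest-scoring (e, t, c) and its score, or None."""
    best: Optional[Tuple[Triple, float]] = None
    for e, t in segmentations(query):
        for c, p_c in P_C_GIVEN_E.get(e, {}).items():
            # Assumed factorization: p(e) * p(c|e) * p(t|c).
            score = P_E.get(e, 0.0) * p_c * P_T_GIVEN_C.get(c, {}).get(t, 0.0)
            if score > 0.0 and (best is None or score > best[1]):
                best = ((e, t, c), score)
    return best
```

Under these toy numbers, "harry potter walkthrough" resolves to the Game class and "harry potter film" to the Movie class, which mirrors how a context word disambiguates the entity on the slides.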

Training With Topic Model
Ideal training data: T = {(e_i, t_i, c_i)}
Real training data: T = {(e_i, t_i, *)}
– Queries are ambiguous (harry potter, harry potter review)
– Training data are relatively few

Training With Topic Model (cont.)
Named entities (e): harry potter, kung fu panda, iron man, …
Contexts (t): # wallpapers, # movies, # walkthrough, # book price, …
Topics (c): Movie, Game, Book, …
# is a placeholder for the named entity. Here # means “harry potter”.
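
One way to collect such contexts from a query log is sketched below: for each seed entity, every query containing it contributes one context in which the entity is replaced by "#". The function name, the whitespace tokenization, and the rule of taking only the first occurrence per query are assumptions made for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def context_documents(queries: Iterable[str],
                      seeds: Iterable[str]) -> Dict[str, List[str]]:
    """Group '#'-contexts by seed entity, one context per matching query."""
    seed_list = [s.lower() for s in seeds]
    docs: Dict[str, List[str]] = defaultdict(list)
    for query in queries:
        tokens = query.lower().split()
        for seed in seed_list:
            seed_tokens = seed.split()
            n = len(seed_tokens)
            for i in range(len(tokens) - n + 1):
                if tokens[i:i + n] == seed_tokens:
                    # Replace the entity span with the '#' placeholder.
                    docs[seed].append(
                        " ".join(tokens[:i] + ["#"] + tokens[i + n:]))
                    break  # assumption: use the first occurrence only
    return dict(docs)
```

Each resulting list can then be treated as the "document" of contexts for its entity when training the topic model.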

Weakly Supervised Topic Model
Introducing supervisions
– Supervisions are always better
– Alignment between implicit topics and explicit classes
Weak supervisions
– Label named entities rather than queries (doc. class labels)
– Multiple class labels (binary indicator)
Example: Kung Fu Panda; classes Movie, Game, Book; distribution over classes: ? ?

WS-LDA
LDA + soft constraints (w.r.t. supervisions)
[Equation: objective combining the LDA probability with soft constraints. The soft constraints pair the document probability on the i-th class with the document binary label on the i-th class (example labels: 1, 1, 0).]
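
The "LDA + soft constraints" idea can be illustrated numerically. The sketch assumes the soft constraint is the sum over classes of (binary label × document probability on that class), added to an LDA log-likelihood with a weight `lam`. The additive log form, the weight, and all names and numbers are assumptions, since the slide's equation did not survive.

```python
from typing import Sequence


def soft_constraint(labels: Sequence[int],
                    class_probs: Sequence[float]) -> float:
    """Sum of document class probabilities over the labelled classes."""
    if len(labels) != len(class_probs):
        raise ValueError("labels and class_probs must have the same length")
    return sum(y * p for y, p in zip(labels, class_probs))


def ws_lda_objective(log_likelihood: float,
                     labels: Sequence[int],
                     class_probs: Sequence[float],
                     lam: float = 1.0) -> float:
    """Assumed objective: LDA log-likelihood plus weighted soft constraint."""
    return log_likelihood + lam * soft_constraint(labels, class_probs)
```

With labels (1, 1, 0) and hypothetical class probabilities (0.6, 0.3, 0.1), the constraint rewards the 0.9 of probability mass that falls on labelled classes, so training is pushed toward topics that agree with the weak labels.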

System Flow Chart
Offline
– Input: set of named entities with labels
– Create “context” documents for each seed and train WS-LDA (output: contexts)
– Find new named entities by using the obtained contexts, and estimate p(c|e) (WS-LDA) and p(e) (output: entities)
Online
– Input: query
– Evaluate each possible triple (e, t, c)
– Output: results

Extension: Leveraging Clicks
Context t: # wallpapers, # movies, # walkthrough, # book price, …
Clicked host name t’: cheats.ign.com, …
Classes: Movie, Game, Book
Other features: URL words, title words, snippet words, content words

Summary
The goal of query understanding is to enrich query representation and essentially solve the problem of term mismatching.

THANKS!