Published by Tabitha Hutchcraft. Modified over 10 years ago.
1
Enrich Query Representation by Query Understanding
Gu Xu, Microsoft Research Asia
2
Mismatching Problem
Mismatching is a fundamental problem in search.
– Examples: NY ↔ New York, game cheats ↔ game cheatcodes
Search engine challenges:
– Head (frequent) queries: rich information is available — clicks, query sessions, anchor texts, etc.
– Tail (infrequent) queries: information becomes sparse and limited.
Our proposal: enrich both queries and documents, and conduct matching on the enriched representations.
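The idea of matching on enriched representations can be sketched in a few lines. The expansion table below is purely illustrative (not from the talk); it only shows why two surface-mismatched queries can still match after enrichment.

```python
# Illustrative expansion table; a real system would mine these from
# clicks, query sessions, and anchor texts.
EXPANSIONS = {
    "ny": {"new york"},
    "cheats": {"cheatcodes"},
}

def enrich(terms):
    """Add known alternative forms to a bag of query/document terms."""
    enriched = set(terms)
    for t in terms:
        enriched |= EXPANSIONS.get(t, set())
    return enriched

def term_match(query_terms, doc_terms):
    # Match on the enriched sets rather than the raw surface forms.
    return bool(enrich(query_terms) & enrich(doc_terms))
```

On raw terms, ["ny"] and ["new york"] share nothing; after enrichment they match.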
3
Matching at Different Semantic Levels
Levels of semantics: term, sense, topic, structure.
– Term level: match exactly the same terms. Examples: NY ↔ New York, disk ↔ disc.
– Sense level: match terms with the same meanings. Examples: NY ↔ New York, motherboard ↔ mainboard, utube ↔ youtube.
– Topic level: match the topics of query and documents. Example: "Microsoft Office" (topic: PC software) vs. "… working for Microsoft … my office is in …" (topic: personal homepage).
– Structure level: match intent with answers (structures of query and document). Examples: "Microsoft Office home" → find the homepage of Microsoft Office; "21 movie" → find the movie named 21; "buy laptop less than 1000" → find online dealers selling laptops for less than 1000 dollars.
4
System View (diagram)
– Offline: search log data and web data feed offline query processing (producing query knowledge and the query index) and offline document processing (producing document representations and the document index).
– Online: an incoming query goes through online query processing to build its query representation, which is matched against document representations to produce ranked documents.
5
Enrich Query Representation
Running example: "michael jordan berkele" → (term level) "michael jordan berkeley" → (sense level) "michael I. jordan berkeley" → (structure level) academic: michael jordan, berkeley.
Representation:
– Term level: tokenization (C# → C, 1,000 → 1 000, MAX_PATH → MAX PATH); query refinement (ill-formed → well-formed).
Understanding:
– Sense level: alternative query finding; ambiguity (msil or mail); equivalence or dependency (department or dept, login or sign on).
– Topic level: query classification; definition of classes; accuracy & efficiency.
– Structure level: query parsing; named entity segmentation and disambiguation; large-scale knowledge base.
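The term-level tokenization examples above can be sketched with a simple splitter. This is a minimal illustration of the slide's 1,000 → 1 000 and MAX_PATH → MAX PATH cases, not the production tokenizer; note that "C#" is the hard case, since a naive non-alphanumeric split would also drop the "#".

```python
import re

def tokenize(query):
    # Split on commas, underscores, and whitespace so that
    # "1,000" -> ["1", "000"] and "MAX_PATH" -> ["MAX", "PATH"],
    # mirroring the slide's examples. "C#" survives intact here,
    # which is exactly why tokenization needs care in practice.
    return [t for t in re.split(r"[,_\s]+", query) if t]
```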
6
QUERY REFINEMENT USING CRF-QR (SIGIR’08)
7
Query Refinement
Example: "Papers on Machin Learn" → "Papers on "Machine Learning"" via three operations: spelling error correction (machin → machine), inflection (learn → learning), and phrase segmentation (quoting "machine learning").
The operations are mutually dependent: spelling error correction ↔ inflection ↔ phrase segmentation.
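As a toy illustration of the three operations applied in sequence (the dictionaries below are illustrative stand-ins for the learned CRF-QR model, and the real model decides all three jointly rather than in a fixed cascade):

```python
# Toy lookup tables standing in for learned refinement decisions.
SPELLING = {"machin": "machine"}
INFLECTION = {"learn": "learning"}      # context-dependent in the real model
PHRASES = {("machine", "learning")}

def refine(terms):
    terms = [SPELLING.get(t, t) for t in terms]      # spelling error correction
    terms = [INFLECTION.get(t, t) for t in terms]    # inflection
    out, i = [], 0
    while i < len(terms):                            # phrase segmentation
        if i + 1 < len(terms) and (terms[i], terms[i + 1]) in PHRASES:
            out.append('"%s %s"' % (terms[i], terms[i + 1]))
            i += 2
        else:
            out.append(terms[i])
            i += 1
    return out
```

The cascade also shows why the operations are mutually dependent: "learn" can only be inflected and quoted as a phrase after "machin" has been corrected to "machine".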
8
Conventional CRF
Input X = (papers, on, machin, learn); output Y ranges over all candidate refinements of each word (machine, machines, learning, learns, paper, in, upon, …). Because every position has a huge candidate set, inference over all label sequences is intractable.
9
CRF for Query Refinement
An operation variable O is introduced between X and Y. Example letter-level operations:
Operation | Description
Deletion | Delete a letter in a word
Insertion | Insert a letter into a word
Substitution | Replace one letter with another
Exchange | Switch two letters in a word
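The four letter-level operations in the table are simple string edits; a minimal sketch (positions and letters would be chosen by the model, not hard-coded):

```python
def apply_op(word, op, i, ch=None):
    """Apply one of the slide's character-level edit operations at index i."""
    if op == "delete":          # delete a letter in a word
        return word[:i] + word[i + 1:]
    if op == "insert":          # insert a letter into a word
        return word[:i] + ch + word[i:]
    if op == "substitute":      # replace one letter with another
        return word[:i] + ch + word[i + 1:]
    if op == "exchange":        # switch two adjacent letters
        return word[:i] + word[i + 1] + word[i] + word[i + 2:]
    raise ValueError("unknown operation: %s" % op)
```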
10
CRF for Query Refinement (cont.)
1. O constrains the mapping from X to Y (reduces the space): instead of all possible words (lean, walk, machined, machining, learning, macin, lear, …), only candidates reachable from x through some operation are considered.
11
CRF for Query Refinement (cont.)
1. O constrains the mapping from X to Y (reduces the space).
2. O indexes the mapping from X to Y (shares parameters): operations such as Deletion, Insertion, +ed, and +ing carry the weights, so the same operation is scored consistently across different words.
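Both roles of the operation variable can be sketched together. The operations and weights below are toy values (in CRF-QR the weights are learned and feature-dependent); the point is that parameters attach to operations, not to individual word pairs.

```python
# Toy per-operation weights; one weight per operation means "+ing" is
# scored the same way for every word (parameter sharing).
OP_WEIGHT = {"identity": 0.5, "+ing": 0.8, "+ed": 0.3, "delete_last": 0.2}

def candidates(word):
    # O constrains Y: only words reachable by a known operation.
    yield word, "identity"
    yield word + "ing", "+ing"
    yield word + "ed", "+ed"
    if len(word) > 1:
        yield word[:-1], "delete_last"

def best_refinement(word):
    # O indexes the parameters: the score depends on the operation used.
    return max(candidates(word), key=lambda pair: OP_WEIGHT[pair[1]])
```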
12
NAMED ENTITY RECOGNITION IN QUERY (SIGIR’09, SIGKDD’09)
13
Named Entity Recognition in Query
The query "harry potter" alone is ambiguous: harry potter – Movie (0.5), Book (0.4), Game (0.1).
Context resolves the ambiguity:
– "harry potter film" → harry potter – Movie (0.95)
– "harry potter author" → harry potter – Book (0.95)
14
Challenges
Compared with named entity recognition in documents:
– Queries are short (2–3 words on average), so there are fewer context features.
– Queries are not well formed (typos, lower-casing, …), so there are fewer content features.
Knowledge base:
– Coverage and freshness
– Ambiguity
15
Our Approach to NERQ
The goal of NERQ becomes to find the best triple (e, t, c)* for query q satisfying
(e, t, c)* = arg max over (e, t, c) of Pr(e, t, c | q)
Example: q = "Harry Potter Walkthrough" → e = "Harry Potter" (named entity), t = "# Walkthrough" (context), c = "Game" (class).
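A minimal sketch of triple scoring, assuming the factorization Pr(q, e, t, c) = Pr(e) · Pr(c | e) · Pr(t | c) — this matches the quantities p(e) and p(c|e) estimated offline later in the talk, but the toy probability tables here are invented for illustration:

```python
# Illustrative probability tables (not real estimates).
P_E = {"harry potter": 0.7, "kung fu panda": 0.3}
P_C_GIVEN_E = {"harry potter": {"Movie": 0.5, "Book": 0.4, "Game": 0.1}}
P_T_GIVEN_C = {"Movie": {"# film": 0.6},
               "Book": {"# film": 0.05},
               "Game": {"# film": 0.05}}

def best_triple(query):
    best, best_p = None, 0.0
    for e in P_E:                                   # candidate entities
        if e not in query:
            continue
        t = query.replace(e, "#").strip() or "#"    # '#' marks the entity
        for c, pce in P_C_GIVEN_E.get(e, {}).items():
            p = P_E[e] * pce * P_T_GIVEN_C.get(c, {}).get(t, 0.0)
            if p > best_p:
                best, best_p = (e, t, c), p
    return best
```

For "harry potter film", the "# film" context pulls the class to Movie even though "harry potter" alone is ambiguous.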
16
Training With a Topic Model
Ideal training data: T = {(e_i, t_i, c_i)}. Real training data: T = {(e_i, t_i, *)}.
– Queries are ambiguous (harry potter, harry potter review).
– Training data are relatively scarce.
17
Training With a Topic Model (cont.)
Named entities (harry potter, kung fu panda, iron man, …) are associated with contexts (# wallpapers, # movies, # walkthrough, # book price, …), where # is a placeholder for the named entity (here # means "harry potter"). The topics learned by the model (Movie, Game, Book, …) play the role of classes.
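Turning a query log into per-entity "context documents" for the topic model can be sketched as follows (the exact matching and normalization rules are assumptions; only the '#' placeholder convention comes from the slide):

```python
from collections import defaultdict

def context_documents(queries, entities):
    """Collect, per named entity, the contexts of queries containing it,
    with '#' standing in for the entity itself."""
    docs = defaultdict(list)
    for q in queries:
        for e in entities:
            if e in q:
                docs[e].append(q.replace(e, "#").strip())
    return dict(docs)
```

Each resulting list is one pseudo-document over context words, ready to be fed to (WS-)LDA.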
18
Weakly Supervised Topic Model
Introducing supervision:
– Supervision is always helpful.
– It aligns the implicit topics with explicit classes.
Weak supervision:
– Label named entities rather than queries (analogous to document class labels).
– Multiple class labels per entity (binary indicators), e.g. Kung Fu Panda labeled over {Movie, Game, Book}, alongside a distribution over classes.
19
WS-LDA
WS-LDA = LDA + soft constraints (w.r.t. the supervision). The objective combines the LDA probability with soft constraints that push the document's probability on the i-th class toward the document's binary label on the i-th class (e.g. labels 1, 1, 0).
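One way to write the objective sketched on this slide; the exact notation is an assumption, reconstructed from the slide's "LDA probability", "document probability on i-th class", and "document binary label on i-th class" pieces:

```latex
% Maximize, per document d, the LDA likelihood plus a soft-constraint
% term that rewards probability mass on the labeled classes:
\max_{\theta}\;
  \log p(\mathbf{w}_d \mid \theta)
  \;+\; \lambda \sum_{i} y_{d,i}\, \log p(c_i \mid d)
% where y_{d,i} \in \{0, 1\} is document d's binary label on the i-th
% class, p(c_i \mid d) its probability on that class, and \lambda
% trades off the likelihood against the supervision.
```

Because the constraint is soft (a weighted additive term rather than a hard restriction), unlabeled mass can still flow to other topics.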
20
System Flow Chart
Offline:
1. Start from a set of seed named entities with class labels.
2. Create a "context" document for each seed and train WS-LDA (→ contexts).
3. Find new named entities using the obtained contexts, and estimate p(c|e) (WS-LDA) and p(e) (→ entities).
Online:
– For an input query, evaluate each possible triple (e, t, c) and return the results.
21
Extension: Leveraging Clicks
In addition to word contexts t (# wallpapers, # movies, # walkthrough, # book price, …), use clicked host names t' (www.imdb.com, www.wikipedia.com, www.gamespot.com, www.sparknotes.com, cheats.ign.com, …) as a second view on each class (Movie, Game, Book). Candidate features: URL words, title words, snippet words, content words, and other features.
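One simple way to combine the two views is to treat clicked hosts as extra "context words" in the same pseudo-document; the `host:` prefix below is a hypothetical convention (not from the talk) to keep the two feature spaces from colliding.

```python
def combined_context(word_contexts, clicked_hosts):
    """Merge word contexts (t) and clicked host names (t') into one
    bag of features for the topic model."""
    return word_contexts + ["host:" + h for h in clicked_hosts]
```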
22
Summary The goal of query understanding is to enrich the query representation and thereby address the fundamental problem of term mismatching.
23
THANKS!