Named Entity Recognition in Query Jiafeng Guo, Gu Xu, Xueqi Cheng, Hang Li (ACM SIGIR 2009) Speaker: Yi-Lin,Hsu Advisor: Dr. Koh, Jia-ling Date: 2009/11/16.


Outline
– Introduction to NERQ
– NERQ Problem
– Implementation
– WSLDA
– Experimental Results
– Conclusion and Future Work

Introduction to NERQ
Named entity recognition (NER) is a subtask of information extraction that seeks to locate and classify atomic elements in text into predefined categories such as the names of persons, organizations, locations, expressions of time, quantities, monetary values, and percentages.

Introduction to NERQ
NERQ involves 2 tasks:
– 1. Detection of the named entity in a given query
– 2. Classification of the named entity into predefined classes
– Example: mine movie titles
– Applications: Web search, etc.
Challenges
– Queries are usually very short
– Queries are not necessarily in standard form

Query Data
New data source for NER
– About 70% of search queries contain named entities.
– Rich context for determining the classes of entities.
Query context
– “harry potter walkthrough” → “harry potter cheats” (contexts in the same class)
Wisdom of crowds
– Very large-scale data that keeps growing
– Frequently updated with emerging named entities

NERQ Problem
A query containing one named entity is represented as a triple (e, t, c):
– e: the named entity
– t: the context of e, written in the form α#β, where # marks the position of e
– c: the class of e
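The triple representation can be sketched in a few lines of Python. This is only an illustration: the names `Triple` and `to_context` are hypothetical, whitespace tokenization and lowercasing are assumptions, and the class label "Game" in the example is chosen for illustration rather than taken from the slides.

```python
from typing import NamedTuple, Optional


class Triple(NamedTuple):
    """One query viewed as (e, t, c); the class may be unknown."""
    entity: str
    context: str
    cls: Optional[str] = None


def to_context(query: str, entity: str) -> Optional[str]:
    """Replace the entity's tokens in the query with '#'.

    Returns None when the entity does not occur in the query.
    """
    q = query.lower().split()
    e = entity.lower().split()
    n = len(e)
    if n == 0:
        return None
    for i in range(len(q) - n + 1):
        if q[i:i + n] == e:
            # alpha is q[:i], beta is q[i + n:]
            return " ".join(q[:i] + ["#"] + q[i + n:])
    return None


context = to_context("harry potter walkthrough", "harry potter")
triple = Triple("harry potter", context, "Game")  # class label is illustrative
print(triple)
```

An empty α or β is allowed, so a query that consists only of the entity yields the context "#".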

Topic Model for NERQ
Given training data T = {(e_i, t_i, c_i) | i = 1..N}, the learning problem can be formalized as:
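A plausible form of this objective is maximum likelihood over the training triples. The sketch below assumes that a triple factorizes as Pr(e) Pr(c|e) Pr(t|c), that is, the context depends on the entity only through its class; this factorization is an assumption inferred from the probabilities named on the Offline Training slide and is not stated explicitly in this transcript.

```latex
\max \prod_{i=1}^{N} \Pr(e_i, t_i, c_i)
  \;=\; \max \prod_{i=1}^{N} \Pr(e_i)\,\Pr(c_i \mid e_i)\,\Pr(t_i \mid c_i)
```

Under the same assumption, if the classes c_i were not observed, the class would be treated as a latent variable and the product would run over \Pr(e_i) \sum_{c} \Pr(c \mid e_i)\,\Pr(t_i \mid c) instead.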

Implementation
– Offline Training
– Online Prediction

Offline Training
Scan the query log with the seed named entities and collect the queries that contain them.
– Seed example: Harry Potter
– Matching queries in the query log: Harry Potter trail, Harry Potter walk through, Harry Potter cheats
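The scanning step can be sketched as follows. This is a minimal illustration under stated assumptions: the function name `collect_contexts` is hypothetical, matching is by lowercased whitespace tokens, and the query log is a small in-memory list rather than a real log.

```python
from collections import Counter, defaultdict


def collect_contexts(query_log, seeds):
    """For each seed entity, count the contexts of the queries containing it."""
    counts = defaultdict(Counter)
    for query in query_log:
        tokens = query.lower().split()
        for seed in seeds:
            s = seed.lower().split()
            n = len(s)
            if n == 0:
                continue
            for i in range(len(tokens) - n + 1):
                if tokens[i:i + n] == s:
                    # Replace the matched entity with the placeholder '#'.
                    ctx = " ".join(tokens[:i] + ["#"] + tokens[i + n:])
                    counts[seed][ctx] += 1
                    break  # count each query once per seed
    return counts


log = [
    "Harry Potter trail",
    "Harry Potter walk through",
    "Harry Potter cheats",
    "cheap flights",  # contains no seed, so it is skipped
]
contexts = collect_contexts(log, ["Harry Potter"])
print(dict(contexts["Harry Potter"]))
```

The per-seed context counts are the kind of input a topic model could then be trained on, with each seed entity playing the role of a document and each context the role of a word.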

Offline Training
(Figure: a query split into named entity, context, and class; visible labels: Harry Potter, trails, New Moon, movie.)
– Pr(e): the total frequency of queries containing e in the query log
– Pr(c|e): estimated by WS-LDA
– Pr(c|t): fixed

Online Prediction
(Figure: the query tokens harry, potter, trails.)
Find the most likely triple (e, t, c) in G(q), the set of candidate triples that can be generated from the query q.
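A minimal sketch of this search, assuming that a triple is scored as Pr(e) · Pr(c|e) · Pr(t|c) and that G(q) is built by treating every contiguous token span as a candidate entity. The scoring factorization, the function names `candidates` and `predict`, and all probability values below are assumptions made for illustration; none of the numbers come from the paper.

```python
def candidates(query):
    """Yield every (entity, context) split of the query's tokens."""
    tokens = query.lower().split()
    for i in range(len(tokens)):
        for j in range(i + 1, len(tokens) + 1):
            entity = " ".join(tokens[i:j])
            context = " ".join(tokens[:i] + ["#"] + tokens[j:])
            yield entity, context


def predict(query, p_entity, p_class_given_entity, p_context_given_class):
    """Return the highest-scoring (e, t, c) and its score, or (None, 0.0)."""
    best, best_score = None, 0.0
    for e, t in candidates(query):
        pe = p_entity.get(e, 0.0)
        for c, pce in p_class_given_entity.get(e, {}).items():
            ptc = p_context_given_class.get(c, {}).get(t, 0.0)
            score = pe * pce * ptc
            if score > best_score:
                best, best_score = (e, t, c), score
    return best, best_score


# Toy probabilities, invented purely for this example.
p_entity = {"harry potter": 0.6, "potter": 0.1}
p_class_given_entity = {
    "harry potter": {"Movie": 0.5, "Book": 0.3, "Game": 0.2},
    "potter": {"Book": 1.0},
}
p_context_given_class = {
    "Movie": {"# trails": 0.2},
    "Game": {"# trails": 0.01},
    "Book": {"# trails": 0.005, "harry # trails": 0.001},
}

best, score = predict("harry potter trails", p_entity,
                      p_class_given_entity, p_context_given_class)
print(best, score)
```

Because queries are short, enumerating all spans is cheap: a query of n tokens has n(n+1)/2 candidate entities.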

WSLDA

WSLDA
Introduce weak supervision
– LDA log likelihood + soft constraints
– Soft constraints
(Equation labels: LDA probability; soft constraints; document probability on the i-th class; document binary label on the i-th class.)

WSLDA
Objective function:
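One form consistent with the description "LDA log likelihood + soft constraints" is sketched below. The symbols are assumptions introduced here: w is a document (the contexts of one entity), Θ the model parameters, y_i the binary label of the document on the i-th class, \bar{z}_i the document's probability on the i-th class, K the number of classes, and λ a weight balancing the two terms.

```latex
O(w \mid y, \Theta) \;=\; \log p(w \mid \Theta) \;+\; \lambda\, C(y, \Theta),
\qquad
C(y, \Theta) \;=\; \sum_{i=1}^{K} y_i\, \bar{z}_i
```

Under this reading, the constraint term rewards a document for placing probability mass on the classes it is labeled with, while the likelihood term remains that of ordinary LDA.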

Experiments
– A real data set consisting of 6 billion queries, 930 million of them unique.
– Four semantic classes: “Movie”, “Game”, “Book”, and “Music”.
– 4 human annotators.
– 180 named entities were selected from the web sites of Amazon, GameSpot, and Lyrics: 120 for training and 60 for testing.
– Finally, we obtained 432,304 contexts and about 1.5 million named entities.

Experiments
Randomly sampled 400 queries from the recognition results (0.14 million) for evaluation.
Example queries:
– pics of fight club
– braveheart quote
– watch gladiator online
– american beauty company
– 12 angry men characters
– mario kart guide
– pc mass effect
– crysis mods
– mother teresa images
– condemned screenshots
– 4 minutes lyric
– king kong
– the black swan summary
– blackwater novel
– new moon
– rehab the song
– nineteen minutes synopsis
– umbrella chords
– all summer long video
– girlfriend lyrics

Experiments
The performance of NERQ is evaluated in terms of Top-N accuracy.
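Top-N accuracy can be computed as sketched below, assuming the usual definition: a query counts as correct when its ground-truth result appears among the N highest-ranked predictions. The function name and the toy data are hypothetical and do not reproduce the paper's evaluation.

```python
def top_n_accuracy(ranked_predictions, gold, n):
    """Fraction of queries whose gold answer is within the top n predictions.

    ranked_predictions: one list per query, best prediction first.
    gold: the correct answer for each query, in the same order.
    """
    if not gold:
        return 0.0
    hits = sum(
        1 for preds, truth in zip(ranked_predictions, gold)
        if truth in preds[:n]
    )
    return hits / len(gold)


# Toy example with (entity, class) pairs as answers.
ranked = [
    [("gladiator", "Movie"), ("gladiator", "Game")],
    [("condemned", "Movie"), ("condemned", "Game")],
    [("new moon", "Movie")],
]
gold = [("gladiator", "Movie"), ("condemned", "Game"), ("new moon", "Book")]

top1 = top_n_accuracy(ranked, gold, 1)
top2 = top_n_accuracy(ranked, gold, 2)
print(top1, top2)
```

In this toy data one of three queries is right at rank 1 and a second becomes right at rank 2, so the measure can only stay equal or grow as N increases.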

Experiments
We compared the WS-LDA approach with two baseline methods: Determ and LDA.
– Determ learns the contexts of a class by simply aggregating all the contexts of the named entities belonging to that class.
– LDA and WS-LDA take a probabilistic approach.
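The Determ baseline as described above can be sketched as a plain aggregation. This assumes each seed entity carries exactly one class label; the function name `determ_contexts` and the example data are hypothetical.

```python
from collections import defaultdict


def determ_contexts(entity_class, entity_contexts):
    """Pool the contexts of all entities that share a class."""
    class_contexts = defaultdict(set)
    for entity, cls in entity_class.items():
        class_contexts[cls].update(entity_contexts.get(entity, ()))
    return dict(class_contexts)


# Illustrative seeds, each with a single assumed class.
entity_class = {"gladiator": "Movie", "braveheart": "Movie", "crysis": "Game"}
entity_contexts = {
    "gladiator": {"watch # online"},
    "braveheart": {"# quote"},
    "crysis": {"# mods"},
}

learned = determ_contexts(entity_class, entity_contexts)
print(learned)
```

Because every context of an entity is credited to that entity's single class, an ambiguous entity passes all of its contexts to one class, which is the situation a probabilistic model such as LDA or WS-LDA handles with class distributions instead.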

Experiments
(Figure: contexts learned for each class, Movie, Game, Book, and Music, by Determ, LDA, and WS-LDA.)

Table 5: Comparisons on Learned Named Entities of Each Class (P@N)
Columns: Movie, Game, Book, Music, Average-Class

Experiments
Comparisons between WS-LDA and LDA

Conclusion
– Formalized the problem of NERQ
– Proposed a novel method for NERQ
– Developed a new topic model called WSLDA
Future work:
– We plan to add more classes and conduct further experiments.
– The proposed method focuses on queries with a single named entity.
– Some queries contain named entities outside the predefined classes (e.g. american beauty company).
– Some contexts were not learned by our approach because they are uncommon (e.g. lyrics for # by chris brown).
