Named Entity Recognition in Query Jiafeng Guo, Gu Xu, Xueqi Cheng, Hang Li (ACM SIGIR 2009) Speaker: Yi-Lin,Hsu Advisor: Dr. Koh, Jia-ling Date: 2009/11/16.


1 Named Entity Recognition in Query Jiafeng Guo, Gu Xu, Xueqi Cheng, Hang Li (ACM SIGIR 2009) Speaker: Yi-Lin,Hsu Advisor: Dr. Koh, Jia-ling Date: 2009/11/16

2 Outline
– Introduction to NERQ
– NERQ Problem
– Implementation
– WSLDA
– Experimental Results
– Conclusion and Future Work
2009/10/22

3 Introduction to NERQ
Named entity recognition (NER) is a subtask of information extraction that seeks to locate and classify atomic elements in text into predefined categories such as the names of persons, organizations, and locations, expressions of time, quantities, monetary values, and percentages.

4 Introduction to NERQ
NERQ involves 2 tasks:
– 1. Detection of the named entity in a given query
– 2. Classification of the named entity into predefined classes
– Example: mine movie titles
– Applications: Web search, etc.
Challenges:
– Queries are usually very short
– Queries are not necessarily in standard form

5 Query Data
New data source for NER
– About 70% of search queries contain named entities.
– Rich context for determining the classes of entities.
Query context
– “harry potter walkthrough” → “harry potter cheats” (contexts in the same class)
Wisdom of crowds
– Very large-scale data that keeps growing
– Frequently updated with emerging named entities

6 NERQ Problem
A query containing one named entity is represented as a triple (e, t, c):
– e: the named entity
– t: the context of e, written as α#β, where # marks the position of e and α and β are the text to its left and right
– c: the class of e
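
As a minimal sketch of the (e, t, c) representation, the snippet below builds a context string by replacing the entity with `#`, reading α#β as "left text, placeholder, right text". The names `Triple` and `to_context` are hypothetical, whitespace tokenization is an assumption, and the class label in the example is illustrative only.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    entity: str   # e: the named entity
    context: str  # t: the query with the entity replaced by "#"
    cls: str      # c: the class of the entity


def to_context(query: str, entity: str) -> str:
    """Replace the first token-level occurrence of `entity` in `query` with '#'."""
    q, e = query.split(), entity.split()
    for i in range(len(q) - len(e) + 1):
        if q[i:i + len(e)] == e:
            return " ".join(q[:i] + ["#"] + q[i + len(e):])
    raise ValueError(f"{entity!r} does not occur in {query!r}")


# Illustrative triple; the class "Game" is chosen by hand for the example.
triple = Triple(
    entity="harry potter",
    context=to_context("harry potter walkthrough", "harry potter"),
    cls="Game",
)
```

Matching on tokens rather than raw substrings keeps an entity such as "king kong" from matching inside a longer word.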

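7 Probabilistic Approach
\[
\begin{aligned}
(e, t, c)^{*} &= \arg\max_{(e,t,c)} \Pr(q, e, t, c) \\
&= \arg\max_{(e,t,c)} \Pr(q \mid e, t, c)\,\Pr(e, t, c) \\
&= \arg\max_{(e,t,c)} \Pr(e, t, c) \qquad (1)
\end{aligned}
\]
\[
\begin{aligned}
\Pr(e, t, c) &= \Pr(e)\,\Pr(c \mid e)\,\Pr(t \mid e, c) \\
&= \Pr(e)\,\Pr(c \mid e)\,\Pr(t \mid c) \qquad (2)
\end{aligned}
\]
Assumption made in (2): the context t depends only on the class c, not on the specific entity e, so Pr(t | e, c) = Pr(t | c).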
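
The factorization in (2) can be illustrated with a small scoring function. This is only a sketch: the function name `joint_probability`, the nested-dictionary layout, and every probability value below are made up for illustration and do not come from the presentation.

```python
def joint_probability(e, t, c, p_e, p_c_given_e, p_t_given_c):
    """Pr(e, t, c) = Pr(e) * Pr(c | e) * Pr(t | c), as in equation (2).

    Missing entries are treated as probability 0 (an assumption of this sketch).
    """
    return (
        p_e.get(e, 0.0)
        * p_c_given_e.get(e, {}).get(c, 0.0)
        * p_t_given_c.get(c, {}).get(t, 0.0)
    )


# Toy tables with invented numbers.
p_e = {"harry potter": 0.01}
p_c_given_e = {"harry potter": {"Movie": 0.5, "Game": 0.3, "Book": 0.2}}
p_t_given_c = {
    "Movie": {"# walkthrough": 0.001},
    "Game": {"# walkthrough": 0.2},
}

scores = {
    c: joint_probability("harry potter", "# walkthrough", c,
                         p_e, p_c_given_e, p_t_given_c)
    for c in ("Movie", "Game", "Book")
}
best_class = max(scores, key=scores.get)
```

With these toy numbers the context "# walkthrough" outweighs the entity's prior preference for "Movie", which is the effect the context term Pr(t | c) is meant to capture.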

8 Topic Model for NERQ
Given training data T = {(e_i, t_i, c_i) | i = 1, ..., N}, the learning problem can be formalized as:
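
One natural reading, assuming the parameters are fit by maximum likelihood over the training triples (an assumption; the slide's own formula is not reproduced in this transcript), is to maximize the product of the joint probabilities and expand each factor with equation (2):

```latex
\max \prod_{i=1}^{N} \Pr(e_i, t_i, c_i)
  = \max \prod_{i=1}^{N} \Pr(e_i)\,\Pr(c_i \mid e_i)\,\Pr(t_i \mid c_i)
```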

9 Implementation
– Offline Training
– Online Prediction

10 Offline Training
Scan the query log with the seed named entities and collect the queries that contain them.
– Seed: Harry Potter
– Matching queries in the query log: Harry Potter trail, Harry Potter walk through, Harry Potter cheats
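
The seed-scanning step might look roughly like the following. This is a sketch under stated assumptions: the query log is an in-memory list of strings, matching is case-insensitive on whitespace tokens, and `collect_contexts` is a hypothetical helper name.

```python
from collections import Counter, defaultdict


def collect_contexts(query_log, seeds):
    """Map each seed entity to a Counter of the contexts it appears in.

    A context is the query with the seed replaced by '#'.
    """
    contexts = defaultdict(Counter)
    for query in query_log:
        tokens = query.lower().split()
        for seed in seeds:
            s = seed.lower().split()
            for i in range(len(tokens) - len(s) + 1):
                if tokens[i:i + len(s)] == s:
                    ctx = " ".join(tokens[:i] + ["#"] + tokens[i + len(s):])
                    contexts[seed][ctx] += 1
                    break  # count each query at most once per seed
    return contexts


# Toy log built from the example queries in this deck.
log = [
    "Harry Potter trail",
    "Harry Potter walk through",
    "Harry Potter cheats",
    "harry potter cheats",
    "mario kart guide",
]
seed_contexts = collect_contexts(log, ["Harry Potter"])
```

A real query log would be streamed from disk rather than held in a list; the counting logic stays the same.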

11 Offline Training
– Pr(e): the total frequency of queries containing e in the query log
– Pr(c|e): estimated by WS-LDA
– Pr(t|c): fixed
[Figure: a query split into named entity, context, and class; values shown: Harry Potter, trails, New Moon, movie]

12 Online Prediction
For an input query such as “harry potter trails”, find the most likely triple (e, t, c) in G(q), the set of candidate triples generated from the query q.
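
A minimal sketch of the search over G(q) follows. It assumes G(q) contains every contiguous token span as a candidate entity paired with every class, which may be broader than the set actually used; `candidate_triples`, `predict`, and the toy score table are hypothetical.

```python
def candidate_triples(query, classes):
    """Yield (entity, context, class) for every contiguous token span of the query."""
    tokens = query.split()
    n = len(tokens)
    for i in range(n):
        for j in range(i + 1, n + 1):
            entity = " ".join(tokens[i:j])
            context = " ".join(tokens[:i] + ["#"] + tokens[j:])
            for c in classes:
                yield entity, context, c


def predict(query, classes, score):
    """Return the candidate triple with the highest score(e, t, c)."""
    return max(candidate_triples(query, classes), key=lambda tr: score(*tr))


# Toy scores standing in for Pr(e) * Pr(c | e) * Pr(t | c); the numbers are invented.
toy_scores = {
    ("harry potter", "# trails", "Movie"): 0.004,
    ("harry potter", "# trails", "Game"): 0.001,
}
best = predict(
    "harry potter trails",
    ["Movie", "Game"],
    lambda e, t, c: toy_scores.get((e, t, c), 0.0),
)
```

Passing the scoring function as a parameter keeps candidate generation separate from the probability model, so the same search works with any trained estimates.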

13 WSLDA

14 WSLDA
Introduce weak supervision
– Objective: LDA log likelihood + soft constraints
– Terms labeled in the formula: LDA probability; soft constraints; document probability on the i-th class; document binary label on the i-th class

15 WSLDA
Objective function:
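
One form consistent with the description "LDA log likelihood + soft constraints" can be written as below. This is a hedged sketch, not the slide's formula: the symbols are assumptions, with \mathbf{w} the words of a document, \Theta the model parameters, y_i the document's binary label on the i-th class, \bar{z}_i the document's probability on the i-th class, K the number of classes, and \lambda a weight on the constraint term.

```latex
\mathcal{O}(\mathbf{w} \mid \mathbf{y}, \Theta)
  = \log p(\mathbf{w} \mid \Theta) + \lambda \sum_{i=1}^{K} y_i \, \bar{z}_i
```

Under this reading, the constraint term rewards topic assignments that place probability mass on the classes a document is labeled with, without forbidding the others.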

16 Experiments
– A real data set consisting of 6 billion queries (930 million unique queries)
– Four semantic classes: “Movie”, “Game”, “Book”, and “Music”
– 4 human annotators
– 180 named entities were selected from the web sites of Amazon, GameSpot, and Lyrics; 120 were used for training and 60 for testing.
– Finally, we obtained 432,304 contexts and about 1.5 million named entities.

17 Experiments
Randomly sampled 400 queries from the recognition results (0.14 million) for evaluation.
Example queries:
– pics of fight club
– braveheart quote
– watch gladiator online
– american beauty company
– 12 angry men characters
– mario kart guide
– pc mass effect
– crysis mods
– mother teresa images
– condemned screenshots
– 4 minutes lyric
– king kong
– the black swan summary
– blackwater novel
– new moon
– rehab the song
– nineteen minutes synopsis
– umbrella chords
– all summer long video
– girlfriend lyrics

18 Experiments
The performance of NERQ is evaluated in terms of Top N accuracy.
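
Top N accuracy could be computed as sketched below, assuming it counts a query as correct when its gold answer appears among the N highest-ranked candidates (the slide does not spell out the definition). The function name and the toy rankings are hypothetical.

```python
def top_n_accuracy(ranked_predictions, gold, n):
    """Fraction of queries whose gold answer is among the first n ranked predictions."""
    if len(ranked_predictions) != len(gold):
        raise ValueError("ranked_predictions and gold must have the same length")
    if not gold:
        return 0.0
    hits = sum(1 for preds, g in zip(ranked_predictions, gold) if g in preds[:n])
    return hits / len(gold)


# Toy rankings for three queries; the labels are invented for illustration.
ranked = [["Movie", "Game"], ["Game", "Movie"], ["Book", "Music"]]
gold = ["Movie", "Movie", "Music"]
top1 = top_n_accuracy(ranked, gold, 1)
top2 = top_n_accuracy(ranked, gold, 2)
```

Here only the first query is right at rank 1, while all three gold answers appear within the top 2.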

19 Experiments
We performed experiments to compare the WS-LDA approach with two baseline methods: Determ and LDA.
– Determ learns the contexts of a class by simply aggregating all the contexts of the named entities belonging to that class.
– LDA and WS-LDA take a probabilistic approach.
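
The Determ baseline as described (pool the contexts of every entity labeled with a class) can be sketched as follows. Normalizing the pooled counts into a distribution is an assumption of this sketch, and the function name, data layout, and counts are hypothetical.

```python
from collections import Counter, defaultdict


def determ_context_distribution(entity_contexts, entity_classes):
    """Pool context counts per class, then normalize them to relative frequencies."""
    pooled = defaultdict(Counter)
    for entity, counts in entity_contexts.items():
        for cls in entity_classes.get(entity, ()):
            pooled[cls].update(counts)  # every labeled class receives all contexts
    distributions = {}
    for cls, counts in pooled.items():
        total = sum(counts.values())
        distributions[cls] = {ctx: n / total for ctx, n in counts.items()}
    return distributions


# Toy counts, invented for illustration.
entity_contexts = {
    "harry potter": {"# cheats": 2, "# walkthrough": 2},
    "mario kart": {"# cheats": 4},
}
entity_classes = {"harry potter": ["Movie", "Game"], "mario kart": ["Game"]}
dist = determ_context_distribution(entity_contexts, entity_classes)
```

In this toy run "# cheats" ends up with half of the Movie mass simply because "harry potter" is also labeled Movie, which hints at why deterministic pooling can blur classes for ambiguous entities.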

20 Experiments
[Table: learned contexts for each class (Movie Contexts, Game Contexts, Book Contexts, Music Contexts), each with columns Determ, LDA, and WS-LDA]

21 Table 5: Comparisons on Learned Named Entities of Each Class (P@N)
[Table columns: Movie, Game, Book, Music, Average-Class]
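
P@N (precision at N) is conventionally the share of the top N ranked items that are correct. The sketch below follows that convention and divides by N even when fewer than N items are available; the function name, the ranking, and the set of correct entities are toy data, not results from the table.

```python
def precision_at_n(ranked_items, relevant, n):
    """Share of the first n ranked items that belong to `relevant`."""
    if n <= 0:
        raise ValueError("n must be positive")
    top = ranked_items[:n]
    return sum(1 for item in top if item in relevant) / n


# Toy ranking of entities learned for the "Movie" class (illustrative only).
ranked_movies = ["gladiator", "braveheart", "mario kart", "fight club"]
true_movies = {"gladiator", "braveheart", "fight club"}
p_at_2 = precision_at_n(ranked_movies, true_movies, 2)
p_at_4 = precision_at_n(ranked_movies, true_movies, 4)
```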

22 Experiments
Comparisons between WS-LDA and LDA

23 Conclusion
– Formalized the problem of NERQ
– Proposed a novel method for NERQ
– Developed a new topic model called WSLDA
Future work:
– We plan to add more classes and conduct further experiments.
– The proposed method focuses on queries with a single named entity.
– Some queries contain a named entity outside the predefined classes (e.g. “american beauty company”).
– Some contexts were not learned by our approach because they are uncommon (e.g. “lyrics for # by chris brown”).

