LINDEN: Linking Named Entities with Knowledge Base via Semantic Knowledge
Wei Shen †, Jianyong Wang †, Ping Luo ‡, Min Wang ‡
† Tsinghua University, Beijing, China; ‡ HP Labs China, Beijing, China
WWW 2012
Presented by Tom Chao Zhou, July 17

Outline: Motivation, Problem Definition, Previous Methods, LINDEN Framework, Experiments, Conclusion

Outline: Motivation, Problem Definition, Previous Methods, LINDEN Framework, Experiments, Conclusion

Motivation
- Many large-scale knowledge bases have emerged: DBpedia, YAGO, Freebase, etc.

Motivation
- Many large-scale knowledge bases have emerged: DBpedia, YAGO, Freebase, etc.
- As the world evolves, new facts come into existence and are digitally expressed on the Web.
- Maintaining and growing the existing knowledge bases requires integrating the extracted facts with the knowledge base.
- Challenges:
  - Name variations: “National Basketball Association” → “NBA”; “New York City” → “Big Apple”
  - Entity ambiguity: “Michael Jordan” may refer to the NBA player or the Berkeley professor

Outline: Motivation, Problem Definition, Previous Methods, LINDEN Framework, Experiments, Conclusion

Problem Definition
- Entity linking task:
  - Input: a textual named entity mention m, already recognized in the unstructured text
  - Output: the corresponding real-world entity e in the knowledge base
- If the matching entity e for entity mention m does not exist in the knowledge base, return NIL for m.

Entity linking task (example)
- Source: From Information to Knowledge: Harvesting Entities and Relationships from Web Sources. PODS ’10.
- Example text: “German Chancellor Angela Merkel and her husband Joachim Sauer went to Ulm, Germany.”
- Each mention is linked to its entity in the knowledge base; a mention without a matching entity is linked to NIL.
Figure 1: An example of YAGO

Outline: Motivation, Problem Definition, Previous Methods, LINDEN Framework, Experiments, Conclusion

Previous Methods
- Essential step of entity linking: define a similarity measure between the text around the entity mention and the document associated with the entity.
- Bag-of-words model: represent the context as a term vector and measure the co-occurrence statistics of terms.
- Limitation: it cannot capture semantic knowledge.
- Example text: “Michael Jordan wins the NBA championship.” The bag-of-words model cannot work well here.
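To make the limitation concrete, here is a minimal sketch (not from the paper) of the bag-of-words baseline: the mention context and a candidate's entity document are compared as term-count vectors via cosine similarity. The example texts are made up for illustration; the point is that surface overlap on “michael” and “jordan” still scores the wrong candidate.

```python
from collections import Counter
from math import sqrt

def cosine(a, b):
    # Cosine similarity between two term-count vectors (bag-of-words model).
    dot = sum(count * b[term] for term, count in a.items())
    norm = sqrt(sum(c * c for c in a.values())) * sqrt(sum(c * c for c in b.values()))
    return dot / norm if norm else 0.0

context = Counter("michael jordan wins the nba championship".split())
# Term vector for the *wrong* candidate (the Berkeley professor): it still
# overlaps on "michael" and "jordan", so the bag-of-words score is not zero.
wrong_candidate = Counter("michael jordan professor of machine learning at berkeley".split())
print(cosine(context, wrong_candidate))
```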

Outline: Motivation, Problem Definition, Previous Methods, LINDEN Framework, Experiments, Conclusion

LINDEN Framework
- Candidate Entity Generation: for each named entity mention m, retrieve the set of candidate entities E_m.
- Named Entity Disambiguation: for each candidate entity e ∈ E_m, define a scoring measure and rank E_m.
- Unlinkable Mention Prediction: for the entity e_top with the highest score in E_m, validate whether e_top is the target entity for mention m.

Candidate Entity Generation
- Intuitively, the candidates in E_m should have a name matching the surface form of m.
- We build a dictionary that contains a vast amount of information about the surface forms of entities: name variations, abbreviations, confusable names, spelling variations, nicknames, etc.
- The dictionary leverages four structures of Wikipedia: entity pages, redirect pages, disambiguation pages, and hyperlinks in Wikipedia articles.

Candidate Entity Generation (Cont’d)
- For each mention m, search it in the surface-form field of the dictionary.
- If a hit is found, add all target entities of that surface form to the set of candidate entities E_m.
Table 1: An example of the dictionary
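A minimal sketch of the lookup step, assuming a pre-built surface-form dictionary like Table 1; the entries below are taken from the example on the semantic network slide, and the function name is hypothetical:

```python
# Hypothetical surface-form dictionary: surface form -> candidate entities.
SURFACE_FORMS = {
    "michael jordan": {"Michael J. Jordan", "Michael I. Jordan"},
    "nba": {"National Basketball Association", "Nepal Basketball Association"},
}

def generate_candidates(mention):
    # Search the mention in the surface-form field; if a hit is found,
    # return all target entities of that surface form as E_m.
    return SURFACE_FORMS.get(mention.lower(), set())

print(generate_candidates("NBA"))
# e.g. {'National Basketball Association', 'Nepal Basketball Association'}
```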

Named Entity Disambiguation
- Goal: rank the candidate entities according to their scores.
- Four features are defined:
  - Feature 1: Link probability, based on the count information in the dictionary.
  - Semantic-network-based features:
    - Feature 2: Semantic associativity, based on the Wikipedia hyperlink structure.
    - Feature 3: Semantic similarity, derived from the taxonomy of YAGO.
    - Feature 4: Global coherence, the global document-level topical coherence among entities.

Link Probability
- Feature 1: link probability LP(e|m) for candidate entity e, where count_m(e) is the number of links which point to entity e and have the surface form m.
Table 1: An example of the dictionary (with the LP column)
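The slide's equation did not survive extraction; a plausible reconstruction from the definition above is

\[
LP(e \mid m) = \frac{count_m(e)}{\sum_{e' \in E_m} count_m(e')},
\]

i.e., the fraction of links with surface form m that point to entity e, among all links carrying that surface form.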

Semantic Network Construction
- Recognize all the Wikipedia concepts Γ_d in the document d, using the open-source toolkit Wikipedia-Miner.
- Example: “The Chicago Bulls’ player Michael Jordan won his first NBA championship in 1991.”
  - Set of entity mentions: {Michael Jordan, NBA}
  - Candidate entities: Michael Jordan → {Michael J. Jordan, Michael I. Jordan}; NBA → {National Basketball Association, Nepal Basketball Association}
  - Γ_d: {NBA All-Star Game, David Joel Stern, Charlotte Bobcats, Chicago Bulls}
- The network is built over the hyperlink structure of Wikipedia articles and the taxonomy of concepts in YAGO.
Figure 2: An example of the constructed semantic network

Semantic Associativity
- Feature 2: semantic associativity SA(e) for each candidate entity e.
Figure 2: An example of the constructed semantic network

Semantic Associativity (Cont’d)
- Given two Wikipedia concepts e_1 and e_2, their semantic associativity is computed with the Wikipedia Link-based Measure (WLM) [1], where E_1 and E_2 are the sets of Wikipedia concepts that hyperlink to e_1 and e_2 respectively, and W is the set of all concepts in Wikipedia.
[1] D. Milne and I. H. Witten. An effective, low-cost measure of semantic relatedness obtained from Wikipedia links. In Proceedings of WIKIAI, 2008.
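The WLM formula itself is missing from the transcript; as defined by Milne and Witten [1] (written here in the familiar "one minus normalized link distance" form), the semantic associativity between e_1 and e_2 is

\[
SA(e_1, e_2) = 1 - \frac{\log\big(\max(|E_1|, |E_2|)\big) - \log\big(|E_1 \cap E_2|\big)}{\log |W| - \log\big(\min(|E_1|, |E_2|)\big)}.
\]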

Semantic Similarity
- Feature 3: semantic similarity SS(e) for each candidate entity e, where Θ_k is the set of k context concepts in Γ_d which have the highest semantic similarity with entity e.
Figure 2: An example of the constructed semantic network (k = 2)
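The equation is missing from the transcript; given the definition of Θ_k above, a plausible reconstruction is the average similarity of e to its k most similar context concepts:

\[
SS(e) = \frac{1}{k} \sum_{c \in \Theta_k} Sim(e, c).
\]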

Semantic Similarity (Cont’d)
- Given two Wikipedia concepts e_1 and e_2, assume the sets of their super classes are Φ_e1 and Φ_e2.
- For each class C_1 in the set Φ_e1, assign a target class ε(C_1) in the other set Φ_e2, where sim(C_1, C_2) is the semantic similarity between two classes C_1 and C_2.
- To compute sim(C_1, C_2), adopt the information-theoretic approach introduced in [2], where C_0 is the lowest common ancestor node of class nodes C_1 and C_2 in the hierarchy, and P(C) is the probability that a randomly selected object belongs to the subtree rooted at C in the taxonomy.
[2] D. Lin. An information-theoretic definition of similarity. In Proceedings of ICML, pages 296–304, 1998.
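The two equations referenced here are missing from the transcript; reconstructed from the definitions given, the target-class assignment and Lin's information-theoretic class similarity [2] are

\[
\varepsilon(C_1) = \arg\max_{C_2 \in \Phi_{e_2}} sim(C_1, C_2),
\qquad
sim(C_1, C_2) = \frac{2 \log P(C_0)}{\log P(C_1) + \log P(C_2)}.
\]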

Semantic Similarity (Cont’d)
- Calculate the semantic similarity from one set of classes Φ_e1 to another set of classes Φ_e2.
- Define the semantic similarity between Wikipedia concepts e_1 and e_2 from these set-level similarities.
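The formulas are again missing. One plausible reading, consistent with the per-class assignment ε above but an assumption rather than a verbatim quote from the paper, is to average the class-to-target similarities in each direction and then symmetrize:

\[
Sim(\Phi_{e_1} \rightarrow \Phi_{e_2}) = \frac{1}{|\Phi_{e_1}|} \sum_{C_1 \in \Phi_{e_1}} sim\big(C_1, \varepsilon(C_1)\big),
\qquad
Sim(e_1, e_2) = \frac{1}{2}\Big( Sim(\Phi_{e_1} \rightarrow \Phi_{e_2}) + Sim(\Phi_{e_2} \rightarrow \Phi_{e_1}) \Big).
\]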

Global Coherence
- Feature 4: global coherence GC(e) for each candidate entity e, measured as the average semantic associativity of candidate entity e to the mapping entities of the other mentions, where e_m' is the mapping entity of mention m'.
- Since the true mapping entities are unknown, substitute the most likely assigned entity for the mapping entity in Formula 9: the most likely assigned entity e'_m' for mention m' is defined as the candidate entity with the maximum link probability in E_m'.
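Formula 9 is not in the transcript; based on the description above, a plausible reconstruction (using M_d for the set of entity mentions in document d, a notation assumed here) is

\[
GC(e) = \frac{1}{|M_d| - 1} \sum_{m' \in M_d,\; m' \neq m} SA\big(e, e'_{m'}\big).
\]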

Global Coherence (Cont’d)
Figure 2: An example of the constructed semantic network

Candidates Ranking
- Generate a feature vector F_m(e) for each e ∈ E_m.
- Calculate Score_m(e) for each candidate e, where w is the weight vector which gives a different weight to each feature element in F_m(e).
- Rank the candidates and pick the top candidate as the predicted mapping entity for mention m.
- To learn w, we use a max-margin technique on the training data set: assume Score_m(e*) is larger than any other Score_m(e) with a margin, and minimize the objective over ξ_m ≥ 0.
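The scoring function and the learning objective are missing from the transcript; a plausible reconstruction of a standard max-margin formulation matching this description (with C a regularization constant introduced here for illustration) is

\[
Score_m(e) = w \cdot F_m(e),
\qquad
\min_{w,\; \xi_m \ge 0} \; \frac{1}{2}\|w\|^2 + C \sum_m \xi_m
\quad \text{s.t.} \quad
w \cdot F_m(e^*_m) \ge w \cdot F_m(e) + 1 - \xi_m \;\; \forall e \in E_m \setminus \{e^*_m\}.
\]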

Unlinkable Mention Prediction
- Predict mention m as an unlinkable mention (return NIL) if:
  - the set E_m generated in the Candidate Entity Generation module is empty, or
  - Score_m(e_top) is smaller than the learned threshold τ.
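A minimal sketch of this decision rule (function and variable names are hypothetical), assuming the candidate set and scores come from the previous modules:

```python
def predict(candidates, score, tau):
    # Return None (NIL) for an unlinkable mention: either no candidates were
    # generated for the mention, or the top-scoring candidate e_top falls
    # below the learned threshold tau.
    if not candidates:
        return None
    e_top = max(candidates, key=score)
    return e_top if score(e_top) >= tau else None
```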

Outline: Motivation, Problem Definition, Previous Methods, LINDEN Framework, Experiments, Conclusion

Experiment Setup
- Data sets:
  - CZ data set: newswire data used by Cucerzan [3]
  - TAC-KBP2009 data set: used in the Knowledge Base Population (KBP) track at the Text Analysis Conference (TAC) 2009
- Parameter learning: 10-fold cross-validation
[3] S. Cucerzan. Large-Scale Named Entity Disambiguation Based on Wikipedia Data. In Proceedings of EMNLP-CoNLL, pages 708–716, 2007.

Results over the CZ data set

Results on the TAC-KBP2009 data set

Outline: Motivation, Problem Definition, Previous Methods, LINDEN Framework, Experiments, Conclusion

Conclusion
- LINDEN: a novel framework to link named entities in text with YAGO.
- Leverages the rich semantic knowledge derived from Wikipedia and the taxonomy of YAGO.
- Significantly outperforms the state-of-the-art methods in terms of accuracy.

Thanks! Q&A