Named Entity Disambiguation on an Ontology Enriched by Wikipedia Hien Thanh Nguyen 1, Tru Hoang Cao 2 1 Ton Duc Thang University, Vietnam 2 Ho Chi Minh.

Presentation transcript:

Named Entity Disambiguation on an Ontology Enriched by Wikipedia Hien Thanh Nguyen 1, Tru Hoang Cao 2 1 Ton Duc Thang University, Vietnam 2 Ho Chi Minh City University of Technology, Vietnam International IEEE Conference - RIVF’08

2 Outline: Introduction, Background, Approach, Evaluation, Conclusion

3 Introduction No explicit semantic information about data and objects is presented in most Web pages. The Semantic Web aims to solve this problem by making semantic metadata available in web page content –Ex: the entity “John McCarthy” pointing to the homepage of the inventor of the Lisp programming language –Entity disambiguation

4 Introduction - Entity disambiguation Entity disambiguation is the process of identifying when different references correspond to the same real-world entity (Jorge Cardoso and Amit Sheth). Our work aims at detecting named entities in a text and linking them to a given ontology.

5 Introduction - What are Named Entities? Named Entities (NEs) include people, organizations, locations, dates, times, money, measures, percentages, etc. Example: “Ms. Washington's candidacy is being championed by several powerful lawmakers including her boss, Chairman John Dingell (D., Mich.) of the House Energy and Commerce Committee.”

6 Introduction – Basic problem in NE Many NEs share the same name –Ambiguity of NE types: John Smith (company vs. person), May (person vs. month), Washington (person vs. location), etc. –Ambiguity of referent (e.g. Paris may be the capital of France or a small town in Texas)

7 Introduction - Our contributions are two-fold Utilizing ontological concepts, and properties of instances in a specific KB, to automatically generate a corpus of labeled training data. Exploiting Wikipedia to enrich the training data with new and informative features. Exploring a range of features extracted from texts, a KB, and Wikipedia.

8 Background - Ontology An ontology schema defines a taxonomy of classes and properties (relations and attributes). A knowledge base contains semantic descriptions, including attributes and relations, of named entities in the real world.

9 Background - Wikipedia Each article defines an entity or a concept. Four sources of information: –Title –Redirect titles –Categories –Hyperlinks (outlinks vs. inlinks)
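
As a rough illustration of how the four information sources listed on this slide might be held together for one article, here is a minimal Python sketch. The class name `WikiArticle`, the function `article_features`, and all field values are assumptions made for illustration; the slides do not specify a data structure or how the sources are combined.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class WikiArticle:
    """Hypothetical container for the four information sources of one article."""
    title: str
    redirects: list[str] = field(default_factory=list)
    categories: list[str] = field(default_factory=list)
    outlinks: list[str] = field(default_factory=list)  # titles this article links to
    inlinks: list[str] = field(default_factory=list)   # titles that link to this article


def article_features(article: WikiArticle) -> list[str]:
    """Flatten title, redirects, categories, and hyperlinks into lowercase feature strings."""
    sources = ([article.title], article.redirects, article.categories,
               article.outlinks, article.inlinks)
    return [text.lower() for source in sources for text in source]


# Illustrative values only; they are not taken from the slides or from a Wikipedia dump.
example = WikiArticle(
    title="John McCarthy (computer scientist)",
    redirects=["John McCarthy"],
    categories=["Computer scientists"],
    outlinks=["Lisp (programming language)"],
    inlinks=["Artificial intelligence"],
)
features = article_features(example)
```

Keeping outlinks and inlinks as separate fields mirrors the distinction the slide draws, so either direction can be dropped or weighted differently later.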

10 Background - Wikipedia

11 Approach Exploiting terms (i.e. base noun phrases) and named entities co-occurring with an ambiguous name for disambiguation. Casting the problem as a ranking problem –Using TFIDF to calculate similarity and choose the candidate with the highest score

12 Approach Constructing a corpus –Utilizing classes and properties to generate a snippet for each instance in an ontology –Feature generation for enriching the representation of those instances. Analyzing a text for disambiguation and identification of the NEs occurring therein
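
The snippet-generation step can be sketched as follows. This is only a guess at the general idea, assuming an instance is described by its class names, attribute values, and the names of related entities; the dictionary layout, the function name `build_snippet`, and the sample instance are hypothetical and do not come from the KIM ontology or the slides.

```python
def build_snippet(instance: dict) -> str:
    """Concatenate class names, attribute values, and related entity names into one text."""
    parts = list(instance["classes"])
    parts += [str(value) for value in instance["attributes"].values()]
    # Each relation is a (property name, related entity name) pair; only the name is kept.
    parts += [name for _, name in instance["relations"]]
    return " ".join(parts)


# Hypothetical instance used purely for illustration.
instance = {
    "classes": ["Person", "Scientist"],
    "attributes": {"name": "John McCarthy"},
    "relations": [("inventorOf", "Lisp")],
}
snippet = build_snippet(instance)
print(snippet)
```

The resulting text plays the role of a labeled training document for that instance, which the Wikipedia-derived features can then extend.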

13 Approach - Construct corpus

14 Approach- Construct corpus

15 Approach – Disambiguation process For each ambiguous name –Looking up candidates –Extracting base noun phrases in the same sentence and in the headline –Extracting named entities in the whole text –Using TFIDF to rank and choose the candidate with the highest score
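
The ranking step above can be sketched in Python as follows. The slides only say that TFIDF is used to score candidates; the cosine similarity, the smoothed IDF formula, the function names, and the toy token lists below are assumptions chosen to make a small self-contained example.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return one {term: tf * idf} dict per tokenized document (smoothed IDF, an assumption)."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vectors.append({t: tf[t] * (math.log((1 + n) / (1 + df[t])) + 1.0) for t in tf})
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def rank_candidates(context, candidates):
    """Rank candidate instances by similarity between their snippets and the context tokens."""
    ids = list(candidates)
    vectors = tfidf_vectors([candidates[i] for i in ids] + [context])
    context_vec = vectors[-1]
    scores = {i: cosine(context_vec, vec) for i, vec in zip(ids, vectors)}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Toy data: the context holds terms and NEs extracted around the ambiguous name "Georgia".
candidates = {
    "georgia_country": ["georgia", "country", "tbilisi", "caucasus"],
    "georgia_us_state": ["georgia", "state", "atlanta", "united", "states"],
}
context = ["georgia", "atlanta", "governor"]
ranking = rank_candidates(context, candidates)
best_id = ranking[0][0]
```

In this toy case the co-occurring name "atlanta" pulls the U.S. state ahead of the country, which is the kind of evidence the slide describes.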

16 Approach – An example

17 Evaluation Using the KIM Ontology. 140 news articles from several news agencies. Focusing on four names: John McCarthy, John Williams, Georgia, and Columbia. Accuracy is measured as the number of correct assignments of NEs (in text) to ontology instances divided by the total number of assignments.
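
The accuracy measure described on this slide amounts to a simple ratio. A minimal sketch, assuming predictions and gold labels are both stored as mappings from a mention identifier to an ontology instance identifier (the identifiers below are made up and are not results from the evaluation):

```python
def accuracy(predicted: dict, gold: dict) -> float:
    """Correct NE-to-instance assignments divided by the total number of assignments."""
    if not predicted:
        return 0.0
    correct = sum(1 for mention, inst in predicted.items() if gold.get(mention) == inst)
    return correct / len(predicted)


# Made-up assignments for illustration only.
gold = {"m1": "georgia_us_state", "m2": "georgia_country", "m3": "columbia_univ", "m4": "mccarthy_cs"}
predicted = {"m1": "georgia_us_state", "m2": "georgia_us_state", "m3": "columbia_univ", "m4": "mccarthy_cs"}
score = accuracy(predicted, gold)
```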

18 Evaluation

19 Conclusion Our approach is quite natural and similar to the way humans disambiguate, relying on co-occurring NEs and terms to resolve other ambiguous entities in a given context. Wikipedia editions are currently available for approximately 200 languages, so our method can be used to build NE disambiguation systems for a large number of languages. The features from Wikipedia and the NEs in the whole text are meaningful evidence for disambiguation. In the future: detecting NEs outside the ontology and investigating other similarity metrics.

20 Thanks for your attention!