Named Entity Disambiguation: A Hybrid Statistical and Rule-based Incremental Approach Hien Nguyen * (Ton Duc Thang University, Vietnam) Tru Cao (Ho Chi.

Presentation transcript:

Named Entity Disambiguation: A Hybrid Statistical and Rule-based Incremental Approach. Hien Nguyen* (Ton Duc Thang University, Vietnam), Tru Cao (Ho Chi Minh City University of Technology, Vietnam). Semantic Web Group (VN-KIM), Faculty of Computer Science & Engineering, Ho Chi Minh City University of Technology.

Outline: Introduction; Wikipedia; Algorithm; Experimental results; Concluding remarks.

Introduction: Named Entities. Named entities (NEs) include people, organizations, locations, dates, times, money, measures, percentages, etc. Example: “Ms. Washington's candidacy is being championed by several powerful lawmakers including her boss, Chairman John Dingell (D., Mich.) of the House Energy and Commerce Committee.”

Introduction: Problem. Different NEs may have the same name. “John McCarthy has been a staple of the Ultimate Fighting Championship since its second event on March 11, 1994.” John McCarthy → John McCarthy (referee). “John McCarthy, professor of computer science at Stanford University, who developed LISP.” John McCarthy → John McCarthy (computer scientist). “John McCarthy, Britain's longest-held hostage in Lebanon, has been set free after more than five years in captivity.” John McCarthy → John McCarthy (journalist).

Introduction: Motivation. Web searches: Queries about named entities (NEs) constitute a significant portion of popular web queries (Bunescu et al., EACL 2006). About 30% of search engine queries include person names (R. Guha et al., WWW 2004). Named entity disambiguation can improve the effectiveness of web search results for popular named entities. Web-based information extraction: Identifying NEs exactly in web pages can improve accuracy in IE tasks (e.g. extracting relationships between NEs). Question answering: Identifying NEs exactly in questions can improve the accuracy of answers.

Introduction: NE Disambiguation. Mapping entity names (in a text) to actual entities in a KB of discourse (e.g. Wikipedia). Ambiguity arises in three cases: an ambiguous entity name is not in the KB; an ambiguous entity name occurs in the KB but refers to a named entity outside the KB; an ambiguous entity name refers to two or more named entities in the KB.

Introduction: NE disambiguation But much like the first presidential debate held two weeks ago in Oxford, Mississippi, a draw for Obama would be considered a win.

Introduction: NE disambiguation. Gamsakhurdia is seen as a national hero by those who mourn him. Zviad Gamsakhurdia, Georgia's first president after independence from the USSR, has been buried in the capital Tbilisi 14 years after his death.

NE disambiguation John McCarthy, 'great man' of computer science, wins major award

Introduction: Approach. Disambiguation based on context: co-occurring entity names; co-occurring NE identifiers; tokens in a context window centered on the name under consideration. Disambiguation based on a KB: we view instances in the KB as carrying two kinds of information, attributes and relations, and we represent those instances by their attributes and relations.

Introduction: Approach. Text containing ambiguous names: all keywords in the window text centred around the ambiguous name; the whole text is extended with the page titles of the previously identified NEs. Wikipedia article: entity page titles; redirecting page titles; category labels; hyperlink labels. The two representations are matched using heuristics plus TF-IDF vector similarity.

Wikipedia. Wikipedia is a free encyclopedia written collaboratively by a global community of more than 150,000 volunteers. These volunteers have contributed more than 11 million articles in 265 languages. More than 275 million people visit the Wikipedia site every month. The English version has 2,697,848 articles (as of Jan 14th, 2009).

Wikipedia – Pages & Titles: Page titles (ID)

Wikipedia – Pages & Titles: Disambiguation text

Wikipedia – Category Category

Wikipedia – Redirect pages Redirect page titles

Wikipedia – Hyperlinks Hyperlinks

Algorithm. Hybrid statistical and rule-based incremental algorithm. Rule-based NE disambiguation: utilizing Wikipedia disambiguation texts. E.g. in “… Rockville, Maryland …”, the disambiguation text “Maryland” helps identify that Rockville is an area in Maryland.
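The disambiguation-text rule can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the title patterns ("Name, Qualifier" and "Name (qualifier)") and the requirement that the qualifier directly follow the name in the text are all assumptions made for illustration.

```python
import re


def disambiguation_text(title):
    """Return the qualifier part of a Wikipedia-style page title, or None.

    'Rockville, Maryland' -> 'Maryland'
    'John McCarthy (referee)' -> 'referee'
    """
    m = re.match(r"^(.*?)\s*\((.+)\)$", title)
    if m:
        return m.group(2)
    if ", " in title:
        return title.split(", ", 1)[1]
    return None


def resolve_by_disambiguation_text(name, text, candidate_titles):
    """Return the one candidate whose qualifier follows the name in the text.

    Returns None when no candidate, or more than one, matches (hypothetical
    policy: the rule only fires when it is unambiguous).
    """
    matches = []
    for title in candidate_titles:
        qualifier = disambiguation_text(title)
        if qualifier is None:
            continue
        # The name, an optional comma, whitespace, an optional "(", the qualifier.
        pattern = re.escape(name) + r",?\s+\(?" + re.escape(qualifier) + r"\b"
        if re.search(pattern, text):
            matches.append(title)
    return matches[0] if len(matches) == 1 else None


candidates = ["Rockville, Maryland", "Rockville, Connecticut", "Rockville, Indiana"]
chosen = resolve_by_disambiguation_text(
    "Rockville", "He moved to Rockville, Maryland last year.", candidates
)
```

Names left unresolved by this rule (the function returns None) would be handed to the later stages.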

Algorithm. Rule-based NE disambiguation (cont.). Exploiting coreference relationships between referents: propagation of the identified NE, if any, along its coreference chain; extension of the whole text with the Wikipedia entity page titles of the identified NEs. E.g. “On Thursday morning, Sen. Barack Obama warned supporters not to get "cocky," while a few hours later McCain pledged to Pennsylvania voters he would erase Obama's lead by Election Day.”
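A minimal sketch of the two steps on this slide, assuming coreference chains are already available as lists of mention ids (how they are produced is outside this sketch). The function names, the mention ids and the rule of skipping chains with conflicting entities are hypothetical.

```python
def propagate_along_chains(chains, resolved):
    """Copy an identified entity to every mention in its coreference chain.

    chains:   list of lists of mention ids
    resolved: dict mapping mention id -> Wikipedia page title
    A chain is used only if its resolved mentions agree on a single entity.
    """
    result = dict(resolved)
    for chain in chains:
        entities = {resolved[m] for m in chain if m in resolved}
        if len(entities) == 1:
            entity = next(iter(entities))
            for mention in chain:
                result.setdefault(mention, entity)
    return result


def extend_text_with_titles(tokens, resolved):
    """Append the page-title tokens of already identified NEs to the text tokens."""
    extended = list(tokens)
    for title in sorted(set(resolved.values())):
        extended.extend(title.lower().split())
    return extended


# "Sen. Barack Obama" (m1) and "Obama's" (m3) corefer; "McCain" (m2) is alone.
chains = [["m1", "m3"], ["m2"]]
resolved = propagate_along_chains(chains, {"m1": "Barack Obama"})
tokens = extend_text_with_titles(["obama", "lead"], resolved)
```

Because the extended text feeds later decisions, each newly identified NE adds evidence for the names that are still ambiguous, which is what makes the process incremental.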

Algorithm. After the rule-based stage, for the remaining ambiguous names, the whole text vector is matched against the Wikipedia candidate entity pages. The extracted context surrounding ambiguous names: all keywords in the window text centred around the ambiguous name; the whole text is extended with the page titles of the previously identified NEs. Wikipedia article: entity page titles; redirecting page titles; category labels; hyperlink labels. The two are compared by TF-IDF vector similarity.
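The statistical stage can be sketched as ranking candidate pages by cosine similarity between TF-IDF vectors. This is an illustrative sketch only: the slides do not give the exact weighting, so the smoothed IDF used here, the function names and the toy feature lists are all assumptions.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """docs: dict name -> list of tokens. Returns dict name -> {term: weight}."""
    n = len(docs)
    df = Counter()
    for tokens in docs.values():
        df.update(set(tokens))
    vectors = {}
    for name, tokens in docs.items():
        tf = Counter(tokens)
        # Smoothed IDF (an assumption; the original weighting is not specified).
        vectors[name] = {
            t: c * (math.log((1 + n) / (1 + df[t])) + 1.0) for t, c in tf.items()
        }
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def rank_candidates(context_tokens, candidates):
    """Rank candidate pages (name -> feature tokens) against the context."""
    docs = dict(candidates)
    docs["__context__"] = context_tokens
    vectors = tfidf_vectors(docs)
    context_vec = vectors.pop("__context__")
    scored = [(cosine(context_vec, vec), name) for name, vec in vectors.items()]
    return sorted(scored, reverse=True)


# Toy features standing in for titles, redirects, category and hyperlink labels.
candidates = {
    "John McCarthy (computer scientist)": [
        "computer", "scientist", "lisp", "stanford", "artificial", "intelligence"],
    "John McCarthy (referee)": ["referee", "ultimate", "fighting", "championship"],
    "John McCarthy (journalist)": ["journalist", "hostage", "lebanon"],
}
context = ["professor", "computer", "science", "stanford", "lisp"]
ranking = rank_candidates(context, candidates)
```

The top-ranked candidate would be assigned to the ambiguous name; a real system would likely also need a threshold for names whose referent is not in the KB.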

Algorithm

Experimental results. Experiments: 10 news articles from CNN on Travel, Entertainment, World, World Business, and Americas.

Experimental results. D1: obtained after running GATE. D2: obtained after GATE’s errors were corrected.

Experimental results. We measure accuracy as the number of correct assignments of an NE in the text to a Wikipedia NE, divided by the total number of assignments.
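The accuracy measure can be written out as a short sketch. The dictionary representation of assignments and the mention ids are assumptions made for illustration, and the numbers in the example are made up, not results from the experiments.

```python
def accuracy(assignments, gold):
    """Fraction of assignments (mention -> Wikipedia NE) that match the gold NE.

    assignments: dict mention id -> predicted Wikipedia page title
    gold:        dict mention id -> correct Wikipedia page title
    """
    if not assignments:
        return 0.0
    correct = sum(1 for mention, title in assignments.items()
                  if gold.get(mention) == title)
    return correct / len(assignments)


# Illustrative data only: three of the four assignments are correct.
predicted = {"m1": "Barack Obama", "m2": "John McCain",
             "m3": "Oxford, Mississippi", "m4": "Georgia (U.S. state)"}
gold = {"m1": "Barack Obama", "m2": "John McCain",
        "m3": "Oxford, Mississippi", "m4": "Georgia (country)"}
score = accuracy(predicted, gold)
```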

Experimental results Results:

Concluding remarks. The proposed method is a hybrid and incremental process that utilizes previously identified NEs and related terms co-occurring with ambiguous names in a text for entity disambiguation. Work under investigation: disambiguating cases where ambiguous names occur in a KB but refer to named entities outside the KB.

Thanks for your attention VN-KIM Group Contact author: or