Lecture 24: NER & Entity Linking


Lecture 24: NER & Entity Linking Kai-Wei Chang CS @ University of Virginia kw@kwchang.net Course webpage: http://kwchang.net/teaching/NLP16 CS6501-NLP

Organizing knowledge
It’s a version of Chicago – the standard classic Macintosh menu font, with that distinctive thick diagonal in the “N”. Chicago was used by default for Mac menus through MacOS 7.6, and OS 8 was released mid-1997. Chicago VIII was one of the early 70s-era Chicago albums to catch my ear, along with Chicago II.
Slides are adapted from Dan Roth

Cross-document co-reference resolution (same Chicago passage as on the “Organizing knowledge” slide)

Reference resolution (disambiguation to Wikipedia) (same Chicago passage as on the “Organizing knowledge” slide)

The “Reference” Collection has Structure (same Chicago passage as on the “Organizing knowledge” slide). Relation labels: Is_a, Is_a, Used_In, Released, Succeeded

Analysis of Information Networks (same Chicago passage as on the “Organizing knowledge” slide)

Wikipedia as a knowledge resource …. Is_a Is_a Used_In Released Succeeded

Wikification: The Reference Problem
Cycles of Knowledge: Grounding for/using Knowledge
Blumenthal (D) is a candidate for the U.S. Senate seat now held by Christopher Dodd (D), and he has held a commanding lead in the race since he entered it. But the Times report has the potential to fundamentally reshape the contest in the Nutmeg State.
Slide notes: Let’s start with what wikification is. Read wiki titles slower.

Challenging
Dealing with Ambiguity of Natural Language: mentions of entities and concepts could have multiple meanings.
Dealing with Variability of Natural Language: a given concept could be expressed in many ways.
Wikification addresses these two issues in a specific way: The Reference Problem. What is meant by this concept? (WSD + Grounding.) More than just co-reference (within and across documents).

We expect this idea to inspire a completely new philosophy for fact extraction, which will be essential for the creation of a high-quality StructNet.
Incorporate arbitrary global features and soft constraints; features are automatically learned selectively
Save some human knowledge engineering efforts
Capture inter-dependency knowledge on the entire structure
Efficient linear-time beam search by joint learning and decoding
No need to model joint/conditional distribution with a large number of variables
Extract all facts jointly in a unified framework
Local predictions can be mutually improved
Avoid error compounding problem

Wikification: Subtasks
Wikification and Entity Linking require addressing several sub-tasks:
Identifying Target Mentions: mentions in the input text that should be Wikified.
Identifying Candidate Titles: candidate Wikipedia titles that could correspond to each mention.
Candidate Title Ranking: rank the candidate titles for a given mention.
NIL Detection and Clustering: identify mentions that do not correspond to a Wikipedia title. Entity Linking: cluster NIL mentions that represent the same entity.
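The candidate-generation and NIL sub-tasks above can be sketched in a few lines of Python. This is only an illustration under loud assumptions: `ANCHORS` is a hypothetical toy dictionary (not a real Wikipedia dump), a mention is treated as NIL simply when it has no candidates, and NIL mentions are clustered by lower-cased surface form, which is far cruder than what a real system would do.

```python
from collections import defaultdict

# Hypothetical toy anchor dictionary: surface form -> candidate titles.
# Entries are illustrative assumptions, not extracted from Wikipedia.
ANCHORS = {
    "chicago": ["Chicago", "Chicago (band)", "Chicago (typeface)"],
}

def candidate_titles(mention):
    """Return candidate titles for a mention (empty list if unknown)."""
    return ANCHORS.get(mention.lower(), [])

def split_linkable_and_nil(mentions):
    """Separate mentions with candidates from NIL mentions.

    NIL mentions are clustered by lower-cased surface form, a crude
    stand-in for real NIL clustering.
    """
    linkable = {}
    nil_clusters = defaultdict(list)
    for idx, mention in enumerate(mentions):
        cands = candidate_titles(mention)
        if cands:
            linkable[idx] = cands
        else:
            nil_clusters[mention.lower()].append(idx)
    return linkable, dict(nil_clusters)

mentions = ["Chicago", "Blumenthal", "blumenthal"]
linkable, nil_clusters = split_linkable_and_nil(mentions)
```

Here the two "Blumenthal" mentions end up in one NIL cluster only because the toy dictionary has no entry for them; ranking the candidates for "Chicago" is a separate step.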

High-level Algorithmic Approach
Input: a text document d. Output: a set of pairs (mi, ti), where the mi are mentions in d and the ti are the corresponding Wikipedia titles, or NIL.
(1) Identify mentions mi in d.
(2) Local Inference. For each mi in d:
    Identify a set of relevant titles T(mi).
    Rank titles ti ∈ T(mi). [E.g., consider local statistics of the occurrences of edges (mi, ti), (mi, *), and (*, ti) in the Wikipedia graph.]
(3) Global Inference. For each document d:
    Consider all mi ∈ d and all ti ∈ T(mi).
    Re-rank titles ti ∈ T(mi). [E.g., if m, m’ are related by virtue of being in d, their corresponding titles t, t’ may also be related.]
Slide notes: Don’t focus on “our contribution” only; focus on describing an algorithmic approach (which is impressive and good).
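As one way to read the "local statistics of edges" in step (2), the sketch below ranks titles by the fraction of times a mention string links to each title. The counts in `LINKS` are made-up numbers for illustration, and using count(m, t) / count(m, *) as the ranking score is an assumption; the slide does not say which statistic is actually used.

```python
from collections import Counter

# Hypothetical (mention, title) anchor-link counts; numbers are invented.
LINKS = Counter({
    ("chicago", "Chicago"): 80,
    ("chicago", "Chicago (band)"): 15,
    ("chicago", "Chicago (typeface)"): 5,
    ("the windy city", "Chicago"): 10,
})

def count_mention(m):
    """Count of edges (m, *): links whose anchor text is m."""
    return sum(c for (mm, _), c in LINKS.items() if mm == m)

def count_title(t):
    """Count of edges (*, t): links that point to title t."""
    return sum(c for (_, tt), c in LINKS.items() if tt == t)

def p_title_given_mention(m, t):
    """Estimate P(t | m) = count(m, t) / count(m, *)."""
    total = count_mention(m)
    return LINKS[(m, t)] / total if total else 0.0

def rank_titles(m):
    """Rank the candidate titles T(m) by the local estimate of P(t | m)."""
    cands = [t for (mm, t) in LINKS if mm == m]
    return sorted(cands, key=lambda t: p_title_given_mention(m, t), reverse=True)

ranking = rank_titles("chicago")
```

A score like this ignores the document context entirely, which is why the most frequent sense always wins; the global step is what lets context override it.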

Local approach
A text document; identified mentions; Wikipedia articles.
Local score of matching the mention to the title (decomposed by mi).
Γ is a solution to the problem: a set of pairs (m, t), where m is a mention in the document and t is the matched Wikipedia title.
Slide notes: Changed; change in related slides.
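The formula on this slide did not survive in the transcript. A form consistent with the description above (a local score that decomposes over mentions) would be the following, where the symbol φ for the local score and N for the number of mentions are assumptions rather than notation taken from the transcript:

```latex
\Gamma^{*}_{\text{local}} \;=\; \arg\max_{\Gamma} \sum_{i=1}^{N} \phi(m_i, t_i),
\qquad \Gamma = \{(m_1, t_1), \dots, (m_N, t_N)\}
```

Because the sum decomposes by mention, each t_i can be chosen independently as the highest-scoring title in T(m_i).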

Global Approach: Using Additional Structure
Text document(s) (news, blogs, …); Wikipedia articles.
Adding a “global” term to evaluate how good the structure of the solution is.
Use the local solutions Γ’ (each mention considered independently).
Evaluate the structure based on pair-wise coherence scores Ψ(ti, tj).
Choose those that satisfy document coherence conditions.
Slide notes: Add a cartoon of edge cover to justify that the overall problem is hard.
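A minimal sketch of this two-stage idea follows: first compute the local solution Γ’ with each mention considered independently, then re-rank each mention's candidates by adding pairwise coherence with the titles that Γ’ assigns to the other mentions. The scores in `phi` and `psi` are invented, and simply adding the local score to the summed coherence is an assumption about how the two terms are combined.

```python
def local_solution(candidates, phi):
    """Gamma': pick the best title for each mention independently."""
    return {m: max(ts, key=lambda t: phi[(m, t)]) for m, ts in candidates.items()}

def global_rerank(candidates, phi, psi):
    """Re-rank each mention's titles using coherence with Gamma'."""
    gamma_prime = local_solution(candidates, phi)
    result = {}
    for m, ts in candidates.items():
        # Titles chosen locally for the *other* mentions in the document.
        context = [t for other, t in gamma_prime.items() if other != m]

        def score(t):
            coherence = sum(psi.get(frozenset((t, c)), 0.0) for c in context)
            return phi[(m, t)] + coherence

        result[m] = max(ts, key=score)
    return result

# Hypothetical toy inputs (all scores are invented for illustration).
candidates = {
    "Chicago": ["Chicago", "Chicago (typeface)"],
    "Macintosh": ["Macintosh"],
}
phi = {
    ("Chicago", "Chicago"): 0.8,
    ("Chicago", "Chicago (typeface)"): 0.2,
    ("Macintosh", "Macintosh"): 1.0,
}
psi = {
    frozenset(("Chicago (typeface)", "Macintosh")): 0.9,
    frozenset(("Chicago", "Macintosh")): 0.1,
}

local = local_solution(candidates, phi)
linked = global_rerank(candidates, phi, psi)
```

With these toy scores the local step links "Chicago" to the city, while the re-ranked solution prefers the typeface because it is more coherent with "Macintosh". Scoring against the fixed Γ’ keeps the work proportional to the number of candidate pairs instead of searching over all joint assignments.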