Entity Linking Survey 2015.10.22.

Slides:



Advertisements
Similar presentations
Arnd Christian König Venkatesh Ganti Rares Vernica Microsoft Research Entity Categorization Over Large Document Collections.
Advertisements

Linking Entities in #Microposts ROMIL BANSAL, SANDEEP PANEM, PRIYA RADHAKRISHNAN, MANISH GUPTA, VASUDEVA VARMA INTERNATIONAL INSTITUTE OF INFORMATION TECHNOLOGY,
October 2014 Paul Kantor’s Fusion Fest Workshop Making Sense of Unstructured Data Dan Roth Department of Computer Science University of Illinois at Urbana-Champaign.
Linking Named Entity in Tweets with Knowledge Base via User Interest Modeling Date : 2014/01/22 Author : Wei Shen, Jianyong Wang, Ping Luo, Min Wang Source.
Textual Relations Task Definition Annotate input text with disambiguated Wikipedia titles: Motivation Current state-of-the-art Wikifiers, using purely.
DSPIN: Detecting Automatically Spun Content on the Web Qing Zhang, David Y. Wang, Geoffrey M. Voelker University of California, San Diego 1.
Encyclopaedic Annotation of Text.  Entity level difficulty  All the entities in a document may not be in reader’s knowledge space  Lexical difficulty.
Global and Local Wikification (GLOW) in TAC KBP Entity Linking Shared Task 2011 Lev Ratinov, Dan Roth This research is supported by the Defense Advanced.
A Web of Concepts Dalvi, et al. Presented by Andrew Zitzelberger.
Language-Independent Set Expansion of Named Entities using the Web Richard C. Wang & William W. Cohen Language Technologies Institute Carnegie Mellon University.
Ranking by Odds Ratio A Probability Model Approach let be a Boolean random variable: document d is relevant to query q otherwise Consider document d as.
Named Entity Disambiguation Based on Explicit Semantics Martin Jačala and Jozef Tvarožek Špindlerův Mlýn, Czech Republic January 23, 2012 Slovak University.
Overview of Search Engines
Artificial Intelligence Research Centre Program Systems Institute Russian Academy of Science Pereslavl-Zalessky Russia.
Information Retrieval in Practice
Projects ( ) Ida Mele. Rules Students have to work in teams (max 2 people). The project has to be delivered by the deadline that will be published.
Extracting Key Terms From Noisy and Multi-theme Documents Maria Grineva, Maxim Grinev and Dmitry Lizorkin Institute for System Programming of RAS.
C OLLECTIVE ANNOTATION OF WIKIPEDIA ENTITIES IN WEB TEXT - Presented by Avinash S Bharadwaj ( )
Design Challenges and Misconceptions in Named Entity Recognition Lev Ratinov and Dan Roth The Named entity recognition problem: identify people, locations,
1 Wikification CSE 6339 (Section 002) Abhijit Tendulkar.
GLOSSARY COMPILATION Alex Kotov (akotov2) Hanna Zhong (hzhong) Hoa Nguyen (hnguyen4) Zhenyu Yang (zyang2)
A Two Tier Framework for Context-Aware Service Organization & Discovery Wei Zhang 1, Jian Su 2, Bin Chen 2,WentingWang 2, Zhiqiang Toh 2, Yanchuan Sim.
Open Information Extraction using Wikipedia
Learning to Link with Wikipedia David Milne and Ian H. Witten Department of Computer Science, University of Waikato CIKM 2008 (Best Paper Award) Presented.
Chapter 6: Information Retrieval and Web Search
1 Yang Yang *, Yizhou Sun +, Jie Tang *, Bo Ma #, and Juanzi Li * Entity Matching across Heterogeneous Sources *Tsinghua University + Northeastern University.
Intelligent Web Topics Search Using Early Detection and Data Analysis by Yixin Yang Presented by Yixin Yang (Advisor Dr. C.C. Lee) Presented by Yixin Yang.
Page 1 March 2011 Local and Global Algorithms for Disambiguation to Wikipedia Lev Ratinov 1, Dan Roth 1, Doug Downey 2, Mike Anderson 3 1 University of.
Understanding User’s Query Intent with Wikipedia G 여 승 후.
A Repetition Based Measure for Verification of Text Collections and for Text Categorization Dmitry V.Khmelev Department of Mathematics, University of Toronto.
Page 1 INARC Report Dan Roth, UIUC March 2011 Local and Global Algorithms for Disambiguation to Wikipedia Lev Ratinov & Dan Roth Department of Computer.
2015/12/121 Extracting Key Terms From Noisy and Multi-theme Documents Maria Grineva, Maxim Grinev and Dmitry Lizorkin Proceeding of the 18th International.
Answer Mining by Combining Extraction Techniques with Abductive Reasoning Sanda Harabagiu, Dan Moldovan, Christine Clark, Mitchell Bowden, Jown Williams.
LINDEN : Linking Named Entities with Knowledge Base via Semantic Knowledge Date : 2013/03/25 Resource : WWW 2012 Advisor : Dr. Jia-Ling Koh Speaker : Wei.
TWC Illuminate Knowledge Elements in Geoscience Literature Xiaogang (Marshall) Ma, Jin Guang Zheng, Han Wang, Peter Fox Tetherless World Constellation.
Divided Pretreatment to Targets and Intentions for Query Recommendation Reporter: Yangyang Kang /23.
University Of Seoul Ubiquitous Sensor Network Lab Query Dependent Pseudo-Relevance Feedback based on Wikipedia 전자전기컴퓨터공학 부 USN 연구실 G
Information Retrieval in Practice
Automatically Labeled Data Generation for Large Scale Event Extraction
Measuring Monolinguality
Concept Grounding to Multiple Knowledge Bases via Indirect Supervision
Designing Cross-Language Information Retrieval System using various Techniques of Query Expansion and Indexing for Improved Performance  Hello everyone,
Clustering of Web pages
CSE5544 Final Project Interactive Visualization Tool(s) for IEEE Vis Publication Exploration and Analysis Team Name: Publication Miner Team Members:
INAGO Project Automatic Knowledge Base Generation from Text for Interactive Question Answering.
CSE5544 Final Project Interactive Visualization Tool(s) for IEEE Vis Publication Exploration and Analysis Team Name: Publication Miner Team Members:
Statistical Learning Methods for Natural Language Processing on the Internet 徐丹云.
Two Discourse Driven Language Models for Semantics
Generating Natural Answers by Incorporating Copying and Retrieving Mechanisms in Sequence-to-Sequence Learning Shizhu He, Cao liu, Kang Liu and Jun Zhao.
GLOW- Global and Local Algorithms for Disambiguation to Wikipedia
X Ambiguity & Variability The Challenge The Wikifier Solution
Summarizing Entities: A Survey Report
Lecture 24: NER & Entity Linking
Web IR: Recent Trends; Future of Web Search
Social Knowledge Mining
Applying Key Phrase Extraction to aid Invalidity Search
Presentation 王睿.
Relational Inference for Wikification
ISWC 2013 Entity Recommendations in Web Search
Family History Technology Workshop
Introduction to Information Retrieval
Chapter 5: Information Retrieval and Web Search
Effective Entity Recognition and Typing by Relation Phrase-Based Clustering
CS246: Information Retrieval
Summarization for entity annotation Contextual summary
Text Annotation: DBpedia Spotlight
The Winograd Schema Challenge Hector J. Levesque AAAI, 2011
Text Analytics Solutions with Azure Machine Learning
presented by Thomas L. Packer
Topic: Semantic Text Mining
Presentation transcript:

Entity Linking Survey 2015.10.22

Content Motivation Definition Evaluation A General Architecture Identify mentions Generating Candidate Titles Local & Global Inference NIL Detection & Clustering Challenges

Motivation Dealing with Ambiguity of Natural Language Mentions of entities and concepts could have multiple meanings Dealing with Variability of Natural Language A given concept could be expressed in many ways It’s a version of Chicago – the standard classic Macintosh menu font, with that distinctive thick diagonal in the ”N”. Chicago was used by default for Mac menus through MacOS 7.6, and OS 8 was released mid-1997.. Chicago VIII was one of the early 70s-era Chicago albums to catch my ear, along with Chicago II.

Definition The task consists of: A definition of the mentions (concepts, entities) to highlight A mention: a phrase used to refer to something in the world Named entity (person, organization, place), object, substance, event, philosophy, mental state, rule … Determining the target encyclopedic resource (KB) Defining what to point to in the KB (title) NIL Clustering cluster relevant mentions representing a single concept or entity

Evaluation Mention Selection Linking accuracy NIL clustering Are the mentions chosen for linking correct (R/P) Linking accuracy Evaluate quality of links chosen per-mention Ranking Accuracy (including NIL) NIL clustering Entity Linking: evaluate out-of-KB clustering (co-reference) B-cubed, CEAFm

A General Architecture Input: A text document d; Output: a set of pairs (mi ,ti) mi are mentions in d; tj(mi ) are corresponding Wikipedia titles, or NIL id. (1) Identify mentions mi in d (2) Identify a set of relevant titles T(mi ) (3) Local Inference For each mi in d: Rank titles ti ∈ T(mi ) [E.g., consider local statistics of edges [(mi ,ti) , (mi ,*), and (*, ti )] occurrences in the Wikipedia graph] (4) Global Inference For each document d: Consider all mi ∈ d; and all ti ∈ T(mi ) Re-rank titles ti ∈ T(mi ) [E.g., if m, m’ are related by virtue of being in d, their corresponding titles t, t’ may also be related] (5) NIL Clustering decide NIL Links and cluster relevant mentions

Identify mentions every n-gram is a potential concept mention Intractable for larger documents Surface form based filtering Shallow parsing (especially NP chunks), NP’s augmented with surrounding tokens, capitalized words Remove: single characters, “stop words”, punctuation, etc. Statistics based filtering Name tagging (Finkel et al., 2005; Ratinov and Roth, 2009; Li et al., 2012) Mention extraction (Florian et al., 2006, Li and Ji, 2014) Key phrase extraction, independence tests (Mihalcea and Csomai, 2007), common word removal (Mendes et al., 2012; ) Mention Expansion Optional post-processing Co-reference resolution, Known Aliases, Acronym,

Generating Candidate Titles Named Dictionary Based The main approach Dictionary D is constructed by leveraging features from Wikipedia or data mining Entity pages, Redirect pages, Disambiguation pages, Bold phrases from the first paragraphs, hyperlinks in Wikipedia articles Looking up methods: Exact matching, Super or substring of the mention, String similarity Search Engine based Effective at identifying some very difficult mapping Morph (Chris Christie, Jabba the Hutt), Acronym (CCP, Communist Party of China)

Local & Global Inference Local score of matching the mention to the title (decomposed by mi) Document-wide Conceptual Coherence

Features Corpus Recall Tweets 52.71% ACE 86.85% Wiki 98.59% No features are superior than others over all kinds of data sets “popularity” is not robust across domains

NIL Detection & Clustering Binary classification NIL Clustering Agglomerative Clustering B-cubed + F-Measure 85.4%-85.8% Partitioning Clustering B-cubed + F-Measure 85.5%-86.9% Co-reference method

Challenges Popularity Bias Better named entity recognition Give me all spacecrafts that flew to Mars. Bruno Mars Mars Better named entity recognition meaning representation Relational Information in the text Commonsense Knowledge

Better named entity recognition They performed Kashmir, written by Page and Plant. Page played unusual chords on his Gibson. They performed [GPE Kashmir] , written by [WORK_OF_ART Page and Plant] . Page played unusual chords on his [PERSON Gibson]

Better meaning representation Which soccer players were born on Malta? They performed Kashmir, written by Page and Plant. Page played unusual chords on his Gibson.

Commonsense Knowledge 2008-07-26 During talks in Geneva attended by William J. Burns Iran refused to respond to Solana’s offers. William J. Burns (1861-1932) William Joseph Burns (1956- )

Thank you!