Encyclopaedic Annotation of Text



• Entity-level difficulty
  ◦ Not all the entities in a document may be in the reader's knowledge space
• Lexical difficulty
  ◦ Arises from words that are difficult with respect to the reader's level
  ◦ Collaborate → work together; biennially → every two years
• Syntactic difficulty
  ◦ Arises from complex syntactic constructs with respect to the reader's level

• Annotate text with encyclopaedic references
  ◦ Find important concepts and entities
  ◦ Link identified entities and concepts to their respective knowledge sources
• Entity Linking (EL) problem
  ◦ The task of linking name mentions in text with their referent entities in a knowledge base
• Entity Disambiguation (ED) problem
  ◦ A name mention may have many candidate referents

• Word Sense Disambiguation
  ◦ Predicting the sense of a word in a sentence when the word may have multiple senses
  ◦ Mapping of word to sense
• EL/ED
  ◦ Models an encyclopaedia in which important words, phrases or entities in a page are linked to their respective informative pages

• Given a text, generate Wikipedia-like annotations automatically
Resource: Wikify! Linking Documents to Encyclopedic Knowledge, Rada Mihalcea and Andras Csomai

• Collaborative encyclopaedia
• Wikipedia article
  ◦ Defines and describes an entity or an event
  ◦ Consists of a hypertext document with hyperlinks to other pages within or outside Wikipedia
  ◦ Uniquely referenced by an identifier
    ▪ counter for drinks → bar (counter)
• Hyperlink
    ▪ Unique identifier + anchor text
    ▪ “Henry Barnard, [[United States|American]] [[educationalist]], was born in [[Hartford, Connecticut]]”
• Disambiguation page
    ▪ Consists of links to articles defining the different meanings of the entity
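The hyperlink structure (unique identifier plus optional anchor text) can be illustrated with a small parser. This is a minimal sketch that assumes only the two link forms shown in the example sentence, [[target]] and [[target|anchor]]; the function name extract_links is hypothetical, and real wiki markup has more cases than this handles.

```python
import re

# Matches [[target]] or [[target|anchor text]] in wiki markup.
WIKILINK = re.compile(r"\[\[([^\[\]|]+)(?:\|([^\[\]]*))?\]\]")

def extract_links(markup):
    """Return (target, anchor) pairs; the anchor defaults to the target."""
    links = []
    for match in WIKILINK.finditer(markup):
        target = match.group(1).strip()
        anchor = match.group(2)
        links.append((target, anchor.strip() if anchor else target))
    return links

sentence = ("Henry Barnard, [[United States|American]] [[educationalist]], "
            "was born in [[Hartford, Connecticut]]")
for target, anchor in extract_links(sentence):
    print(f"{anchor!r} -> {target!r}")
```

Each pair is a sense annotation: the anchor text is the surface form and the target is the article it refers to.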

• Keyword extraction follows the Wikipedia manual
  ◦ Link to articles that provide a deeper understanding of topics such as technical terms, names and places
  ◦ Avoid linking terms that are unrelated to the main topic or that have no article to explain them
  ◦ Avoid too many links

• Supervised or unsupervised
• Candidate keywords should be limited to those that have a valid corresponding Wikipedia article
  ◦ Keyword vocabulary that contains only the Wikipedia article titles
    ▪ Augment the list with different morphological forms
    ▪ “dissecting” or “dissections” can be linked to the same article, “dissection”

• Unsupervised keyword extraction from a document
  ◦ Candidate extraction
    ▪ From the input document, extract all possible n-grams that are also present in the controlled vocabulary
  ◦ Keyword ranking
    ▪ Assign each candidate a score reflecting its likelihood of being a valuable keyphrase
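The two steps, candidate extraction and keyword ranking, can be sketched as follows. The slide does not name the ranking score, so the sketch assumes a link-probability ratio (how often a term appears as a link out of all documents where it appears) as one possible choice. The function names, the vocabulary and all counts are made up for illustration.

```python
import re

def candidate_ngrams(text, vocabulary, max_n=3):
    """Return every n-gram (up to max_n words) that is in the controlled vocabulary."""
    tokens = re.findall(r"\w+", text.lower())
    found = []
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            gram = " ".join(tokens[i:i + n])
            if gram in vocabulary:
                found.append(gram)
    return found

def rank_by_link_probability(candidates, link_counts, doc_counts):
    """Score = (#documents where the term is a link) / (#documents containing the term)."""
    scores = {}
    for term in set(candidates):
        total = doc_counts.get(term, 0)
        scores[term] = link_counts.get(term, 0) / total if total else 0.0
    # Highest score first; ties broken alphabetically for a stable order.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))

# Hypothetical vocabulary and counts.
vocabulary = {"dissection", "frog", "anatomy", "classic"}
link_counts = {"dissection": 40, "frog": 30, "anatomy": 90, "classic": 2}
doc_counts = {"dissection": 100, "frog": 300, "anatomy": 150, "classic": 2000}

text = "The dissection of a frog is a classic anatomy exercise."
candidates = candidate_ngrams(text, vocabulary)
ranking = rank_by_link_probability(candidates, link_counts, doc_counts)
print(ranking)
```

A term such as "classic" occurs often but is rarely linked, so it falls to the bottom of the ranking even though it is in the vocabulary.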

• Links can be treated as sense annotations
• Wikipedia data has larger coverage of sense annotations of entities (nouns)
  ◦ Presence of a huge number of named entities
  ◦ Multi-word expressions (e.g., mother church)

• Knowledge-driven methods
  ◦ Lesk algorithm
    ▪ Finds the most likely meaning of a word in a given context, based on a measure of contextual overlap between the dictionary definitions of the ambiguous word and the context
    ▪ Modelling Wikification as WSD
    ▪ Dictionary definition → Wikipedia page
    ▪ Context → paragraph in which the word occurs
• Data-driven methods
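The overlap idea behind Lesk can be sketched in a few lines. This is a simplified variant that counts shared content words between the context and each candidate definition. The stopword list and the two definitions are hypothetical stand-ins, not real Wikipedia text.

```python
import re

# A tiny, hypothetical stopword list; a real system would use a fuller one.
STOPWORDS = {"a", "an", "the", "of", "in", "on", "at", "for", "to",
             "and", "or", "is", "are", "was", "where", "that", "with"}

def content_words(text):
    """Lowercased word set with stopwords removed."""
    return {w for w in re.findall(r"\w+", text.lower()) if w not in STOPWORDS}

def simplified_lesk(context, definitions):
    """Pick the sense whose definition shares the most content words with the context."""
    context_words = content_words(context)
    best_sense, best_overlap = None, -1
    for sense, definition in definitions.items():
        overlap = len(context_words & content_words(definition))
        if overlap > best_overlap:
            best_sense, best_overlap = sense, overlap
    return best_sense

# In the Wikification setting each "definition" would be a Wikipedia page
# and the context would be the paragraph containing the ambiguous word.
definitions = {
    "bar (counter)": "A counter where drinks are served to customers.",
    "bar (law)": "The legal profession of lawyers admitted to practise in court.",
}
context = "He ordered two drinks at the bar and waited at the counter."
print(simplified_lesk(context, definitions))
```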

Local approaches
• Disambiguate each mention in a document separately
• Utilize clues such as the textual similarity between the document and each candidate disambiguation's Wikipedia page
[Figure: document mentions connected to candidate labels by mention-to-label compatibility]

Michael Jeffrey Jordan (born February 17, 1963), also known by his initials, MJ, is an American former professional basketball player. Jordan joined the NBA's Chicago Bulls in … Michael Jordan fuelled the success of Nike's Air Jordan sneakers. He also starred in the 1996 feature film Space Jam as himself.

• It’s a version of Chicago – the standard classic Macintosh menu font, with that distinctive thick diagonal in the “N”.
• Chicago was used by default for Mac menus through MacOS 7.6, and OS 8 was released mid-…
• Chicago VIII was one of the early 70s-era Chicago albums to catch my ear, along with Chicago II.
Resource: Local and Global Algorithms for Disambiguation to Wikipedia, Ratinov et al.

[Figure: the example texts above annotated with the relation labels Used_In, Is_a, Succeeded and Released]

Collective Entity Linking
[Figure: document mentions connected to candidate labels by mention-to-label compatibility; candidate labels connected to each other by inter-label topical coherence]

[Figure: mentions in the text document(s) (news, blogs, …) on one side and Wikipedia articles on the other]
Entity linking is cast as a many-to-one matching in a bipartite graph between mentions and Wikipedia articles.

• Γ is a solution to the problem
  ◦ A set of pairs (m, t)
  ◦ m: a mention in the document
  ◦ t: the matched Wikipedia title
• Local term: the score of matching the mention to the title
• “Global” term: evaluates how good the structure of the solution is
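The equation these slides annotate did not survive in the transcript. A plausible reconstruction, combining the local score and the global term described here, is given below. The symbols φ (local mention-to-title score), Ψ (global coherence of the whole assignment) and N (number of mentions) are assumed notation, not taken from the slide text.

```latex
\Gamma^{*} \;=\; \arg\max_{\Gamma}\;
  \Bigg[\, \sum_{i=1}^{N} \phi(m_i, t_i) \;+\; \Psi(\Gamma) \Bigg],
\qquad
\Gamma = \{(m_1, t_1), \dots, (m_N, t_N)\}
```

A purely local approach corresponds to dropping Ψ, in which case each pair (m_i, t_i) can be optimized independently.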


Augment Mention List → Construct Disambiguation Candidates → Ranker → Linker

• Text(t) → TF-IDF summary of the Wikipedia page for title t
• Context(t) → TF-IDF summary of the context within which t is hyperlinked in Wikipedia
• Text(m) → TF-IDF summary of the document d containing m
• Context(m) → TF-IDF summary of the context window of m
• Local features
  ◦ cosine-sim(Text(t), Text(m))
  ◦ cosine-sim(Text(t), Context(m))
  ◦ cosine-sim(Context(t), Text(m))
  ◦ cosine-sim(Context(t), Context(m))
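The four local features are all cosine similarities between TF-IDF summaries, which can be sketched without any external library. The smoothed IDF used here is one common variant and is an assumption, as are the function names and the two short page texts, which are invented stand-ins for Wikipedia pages.

```python
import math
import re
from collections import Counter

def tokenize(text):
    return re.findall(r"\w+", text.lower())

def tfidf_vector(text, doc_freq, n_docs):
    """Sparse TF-IDF vector using a smoothed IDF (an assumed variant)."""
    counts = Counter(tokenize(text))
    return {
        term: tf * (math.log((1 + n_docs) / (1 + doc_freq.get(term, 0))) + 1)
        for term, tf in counts.items()
    }

def cosine_sim(u, v):
    dot = sum(weight * v[term] for term, weight in u.items() if term in v)
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def local_features(text_t, context_t, text_m, context_m):
    """The four similarity features, in the order listed above."""
    return [cosine_sim(text_t, text_m), cosine_sim(text_t, context_m),
            cosine_sim(context_t, text_m), cosine_sim(context_t, context_m)]

# Hypothetical page texts for two candidate titles of the mention "Chicago".
pages = {
    "Chicago (typeface)": "Chicago is a typeface used as the menu font on the Macintosh",
    "Chicago (band)": "Chicago is an American rock band known for albums such as Chicago II",
}
doc_freq = Counter(term for text in pages.values() for term in set(tokenize(text)))
n_docs = len(pages)

context_m = tfidf_vector("the standard classic Macintosh menu font", doc_freq, n_docs)
scores = {title: cosine_sim(tfidf_vector(text, doc_freq, n_docs), context_m)
          for title, text in pages.items()}
print(max(scores, key=scores.get))
```

Only cosine-sim(Text(t), Context(m)) is computed in the demo. local_features shows how the full feature vector would be assembled once all four summaries are available.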

• Wikipedia relatedness measures
  ◦ Normalized Google Distance
  ◦ Pointwise Mutual Information
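Both measures can be computed from link sets. The sketch below assumes that each title is represented by the set of article ids linking to it and that n_articles is the total number of articles; the slide does not specify either, and the ids and counts are made up. PMI is given in its standard logarithmic form.

```python
import math

def normalized_google_distance(links_a, links_b, n_articles):
    """NGD over link sets: 0 for identical sets, larger means less related."""
    common = len(links_a & links_b)
    if common == 0:
        return float("inf")
    larger = max(len(links_a), len(links_b))
    smaller = min(len(links_a), len(links_b))
    denominator = math.log(n_articles) - math.log(smaller)
    if denominator == 0:
        return 0.0
    return (math.log(larger) - math.log(common)) / denominator

def pointwise_mutual_information(links_a, links_b, n_articles):
    """log of p(a, b) / (p(a) * p(b)), with probabilities estimated from link sets."""
    common = len(links_a & links_b)
    if common == 0:
        return float("-inf")
    p_joint = common / n_articles
    p_a = len(links_a) / n_articles
    p_b = len(links_b) / n_articles
    return math.log(p_joint / (p_a * p_b))

# Hypothetical ids of articles linking to each title.
links_typeface = {1, 2, 3, 4}
links_macos = {3, 4, 5, 6, 7}
n_articles = 1000

ngd = normalized_google_distance(links_typeface, links_macos, n_articles)
pmi = pointwise_mutual_information(links_typeface, links_macos, n_articles)
print(round(ngd, 3), round(pmi, 3))
```

Scores like these are the kind of pairwise signal a global term can aggregate over all title pairs in a candidate solution.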