Named Entity Disambiguation Based on Explicit Semantics Martin Jačala and Jozef Tvarožek Špindlerův Mlýn, Czech Republic January 23, 2012 Slovak University.

Presentation transcript:

Named Entity Disambiguation Based on Explicit Semantics
Martin Jačala and Jozef Tvarožek
Slovak University of Technology, Bratislava, Slovakia
Špindlerův Mlýn, Czech Republic, January 23, 2012

Problem
Given an input text, detect named entities and decide on the correct meaning of each.
"The jaguar is a big cat, a feline in the Panthera genus, and is the only Panthera species found in the Americas." → Jaguar (animal)
"Jaguar cars today are designed in Jaguar Land Rover's engineering centres at the Whitley plant in Coventry." → Jaguar Cars

Current solutions
Charles-Miller hypothesis
  – Similar words occur in semantically similar contexts
  – Hence, we should be able to distinguish the correct sense
Dictionary approach, sense clustering
Semantic similarity
Use of big data
  – Wikipedia, Open Directory Project

General scheme
To detect the most probable sense
  – we rank the possible senses, and
  – pick the best
Diagram components: entity boundary detection, document transformation, candidate meanings selection, similarity computation, semantic space, disambiguation dictionary

Entity boundary detection
Stanford named entity recognizer
  – Conditional random fields
  – F1 score 88 to 93% (CoNLL 2003 dataset)
  – Used only for detection of surface forms
Jenny Rose Finkel, Trond Grenager, and Christopher Manning. Incorporating Non-local Information into Information Extraction Systems by Gibbs Sampling. Proceedings of the 43rd Annual Meeting of the Association for Computational Linguistics (ACL 2005), pp.

Disambiguation dictionary
Built from explicit semantics in Wikipedia
  – Disambiguation and redirect pages
  – Hyperlinks in articles' text
Candidate meanings for each surface form, e.g.
  Jaguar = {Jaguar, Jaguar Cars, Mac OS X 10.2,...}
  Big Blue = {IBM, Pacific Ocean, New York Giants,...}
  Jordan = {Jordan, Jordan River, Michael Jordan,...}
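A minimal sketch of how such a dictionary could be assembled, assuming the redirect pages, disambiguation pages, and hyperlink anchors have already been extracted from a Wikipedia dump. The function name and the three input structures are hypothetical and are not taken from the presentation.

```python
from collections import defaultdict


def build_disambiguation_dictionary(redirects, disambiguation_pages, anchor_links):
    """Map each surface form to the set of candidate article titles.

    All three inputs are hypothetical, pre-extracted structures:
      redirects:            {redirect title: target article}
      disambiguation_pages: {ambiguous title: list of listed articles}
      anchor_links:         iterable of (anchor text, target article) pairs
    """
    dictionary = defaultdict(set)
    for title, target in redirects.items():
        dictionary[title].add(target)
    for title, targets in disambiguation_pages.items():
        dictionary[title].update(targets)
    for anchor, target in anchor_links:
        dictionary[anchor].add(target)
    return dict(dictionary)


# Toy input echoing the surface forms on the slide.
candidates = build_disambiguation_dictionary(
    redirects={"Big Blue": "IBM"},
    disambiguation_pages={"Jaguar": ["Jaguar", "Jaguar Cars", "Mac OS X 10.2"]},
    anchor_links=[("Jaguar", "Jaguar Cars"), ("Big Blue", "New York Giants")],
)
```

Using a set per surface form means a candidate reached through several sources (here, "Jaguar Cars") is stored only once.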

Vector space
Concept space = Wikipedia's articles
  – Explicit (human-defined concepts)
We construct a term-concept matrix
  – terms = rows, concepts = columns
  – tf-idf values
For an input text, the "semantic" vector is
  – the sum of term vectors for all terms in the input
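The construction above can be sketched roughly as follows. This is an illustration only: whitespace tokenization, the raw-count tf, and the log(N / df) idf variant are assumptions, since the slide does not specify the weighting details, and the three toy "articles" are invented.

```python
import math
from collections import Counter, defaultdict


def build_term_concept_matrix(concept_texts):
    """Return {term: {concept: tf-idf weight}} from {concept: article text}."""
    tokenized = {c: text.lower().split() for c, text in concept_texts.items()}
    n_concepts = len(tokenized)
    # Document frequency: number of concept articles in which each term occurs.
    df = Counter(term for tokens in tokenized.values() for term in set(tokens))
    matrix = defaultdict(dict)
    for concept, tokens in tokenized.items():
        for term, tf in Counter(tokens).items():
            matrix[term][concept] = tf * math.log(n_concepts / df[term])
    return matrix


def semantic_vector(text, matrix):
    """Sum the term vectors (matrix rows) of all terms in the input text."""
    vector = defaultdict(float)
    for term in text.lower().split():
        for concept, weight in matrix.get(term, {}).items():
            vector[concept] += weight
    return dict(vector)


# Invented toy concept articles, not real Wikipedia content.
toy_concepts = {
    "Mammal": "cat feline species animal",
    "Sports Car": "car engine racing design",
    "Britain": "coventry england car plant",
}
matrix = build_term_concept_matrix(toy_concepts)
vec = semantic_vector("a big cat and a feline species", matrix)
```

Terms that do not occur in any concept article (here "a", "big", "and") contribute nothing, so the resulting vector has weight only on the "Mammal" concept.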

Example
Concepts: Mammal, Britain, Technology, Racing, Wilderness, Amazonia, Sports Car, Leopard
"The jaguar is a big cat, a feline in the Panthera genus, and is the only Panthera species found in the Americas." → (30, 0, 0, 0, 430, 320, 0, 100)
"Jaguar cars today are designed in Jaguar Land Rover's engineering centres at the Whitley plant in Coventry." → (0, 20, 473, 845, 0, 0, 304, 0)

Ranking
Candidate meaning description = wiki page
  – Description is transformed to the vector space
Ranking algorithm: given a surface form, entity meanings are ranked according to the cosine similarity between the input text and each of the candidate meaning descriptions
Extension: add the entity occurrence context as input to the similarity computation
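The ranking step can be illustrated with a short sketch. Vectors are represented as sparse dictionaries from concept name to weight; that representation, the function names, and all numeric values are assumptions made for the example, not figures from the presentation.

```python
import math


def cosine(u, v):
    """Cosine similarity of two sparse vectors given as {concept: weight}."""
    dot = sum(weight * v.get(concept, 0.0) for concept, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank_candidates(input_vector, candidate_vectors):
    """Return candidate meanings sorted by decreasing similarity to the input."""
    return sorted(
        candidate_vectors,
        key=lambda name: cosine(input_vector, candidate_vectors[name]),
        reverse=True,
    )


# Invented toy vectors for the surface form "Jaguar".
input_vector = {"Mammal": 2.0, "Wilderness": 3.0}
candidate_vectors = {
    "Jaguar": {"Mammal": 1.0, "Wilderness": 1.0, "Leopard": 1.0},
    "Jaguar Cars": {"Sports Car": 2.0, "Britain": 1.0},
    "Mac OS X 10.2": {"Technology": 3.0},
}
ranking = rank_candidates(input_vector, candidate_vectors)
best = ranking[0]
```

With these toy values the animal sense shares concepts with the input and is ranked first, while the other two candidates have zero similarity.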

Baseline
Dictionary contains surface forms with context
  – extracted from Wikipedia using letter capitalization
Context = sliding window of 50 words around each occurrence of a hypertext link
Article names are the entity labels
  – Description is the average of contexts in Wikipedia
Disambiguation result is the best similarity match of the input text to the entity descriptions
Bunescu, R., Pasca, M.: Using encyclopedic knowledge for named entity disambiguation. In: Proceedings of EACL. pp. (2006)
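A rough sketch of the baseline's context extraction and averaging, under stated assumptions: the window is centred on the link position with half of the 50 words on each side, and a context is averaged as a bag of word counts. Neither detail is given on the slide, and both function names are hypothetical.

```python
from collections import Counter


def context_window(tokens, link_index, size=50):
    """Return up to size // 2 tokens on each side of the link, plus the link token."""
    half = size // 2
    start = max(0, link_index - half)
    return tokens[start:link_index + half + 1]


def average_context(contexts):
    """Average the bag-of-words counts over all contexts of one entity."""
    total = Counter()
    for context in contexts:
        total.update(context)
    return {term: count / len(contexts) for term, count in total.items()}


# Placeholder tokens w0 .. w99 with a hypothetical link at position 60.
tokens = [f"w{i}" for i in range(100)]
window = context_window(tokens, 60)
description = average_context([["big", "cat"], ["cat"]])
```

Near the start or end of an article the window is simply truncated, so early hyperlinks yield shorter contexts.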

Datasets
Three manually disambiguated datasets
  – Development, Evaluation1, Evaluation2
Each contains 20 "random" newswire articles containing ambiguous entities
  – Columbia, Texas, Jaguar, etc.
References to entities not in Wikipedia were discarded
  – Each entity has 18 meanings on average
  – 78% of entities have more than two meanings

Evaluation results

Evaluation discussion
The baseline method fails to capture the Wikipedia content because of the way it is written
  – Hyperlinks usually appear early in the article
Latent semantic analysis (250 dimensions)
  – Speeds up the process
  – Decreases accuracy

Summary
Named entity disambiguation based on explicit semantics
Explicit semantics within Wikipedia
  – Link descriptions
  – Redirect and disambiguation pages
7 to 8% improvement
  – 86 to 92% accuracy (up from 76 to 85%)