Word sense induction using continuous vector space models

Presentation transcript:

Word sense induction using continuous vector space models
Mikael Kågebäck, Fredrik Johansson, Richard Johansson*, Devdatt Dubhashi
LAB, Chalmers University of Technology
*Språkbanken, University of Gothenburg

Word Sense Induction (WSI): the automatic discovery of word senses. Given a corpus, discover the senses of a given word, e.g. rock (the stone vs. the music genre).

Applications of WSI
- Novel sense detection
- Temporal/geographical word sense drift
- Localized word sense lexicons
- Machine translation
- Text understanding
- more…

Context clustering: compute embeddings for word instances in a corpus, based on their contexts; cluster this space; and let the cluster centroids represent the senses. Pioneered by Hinrich Schütze (1998). Assumption: the distributional hypothesis holds.
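A minimal sketch of this context-clustering idea (not the authors' code), assuming `wv` is a pretrained word-vector lookup such as a gensim `KeyedVectors`; `instances` and the function names are hypothetical:

```python
import numpy as np
from sklearn.cluster import KMeans

def context_vector(tokens, i, wv, window=5):
    """Average the embeddings of the words around position i (one instance of the target)."""
    ctx = tokens[max(0, i - window):i] + tokens[i + 1:i + 1 + window]
    vecs = [wv[w] for w in ctx if w in wv]
    return np.mean(vecs, axis=0) if vecs else None

def induce_senses(instances, wv, n_senses=3):
    """Cluster the instance vectors; each centroid stands for one induced sense."""
    # instances: list of (tokens, index) pairs, one per occurrence of the target word
    X = np.array([v for v in (context_vector(t, i, wv) for t, i in instances) if v is not None])
    return KMeans(n_clusters=n_senses, n_init=10).fit(X).cluster_centers_
```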

Instance-context Embeddings (ICE): based on word embeddings computed using the skip-gram model, i.e. a low-rank approximate factorization C ≈ UVᵀ of a normalized co-occurrence matrix C, with word embeddings as the rows of U and context-word embeddings as the rows of V.
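For concreteness, both matrices can be read off a trained model. This sketch assumes gensim's skip-gram implementation with negative sampling (`sentences` is a hypothetical tokenized corpus; attribute names as in gensim 4.x):

```python
from gensim.models import Word2Vec

model = Word2Vec(sentences, vector_size=100, window=10, sg=1, negative=5, min_count=5)
U = model.wv.vectors  # word embeddings, one row per vocabulary word
V = model.syn1neg     # context-word embeddings used by negative sampling
# Levy and Goldberg (2014): U @ V.T approximates a shifted PMI co-occurrence matrix.
```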

Instance-context Embeddings (ICE): form the instance vector as a weighted mean of the skip-gram context vectors, but:
- apply a triangular window function over the context, and
- weight each context word by its skip-gram association with the target word.
This naturally removes stop words; the weights are related to PMI, cf. Levy and Goldberg (2014).
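The exact weighting formula did not survive in this transcript; the sketch below assumes, consistently with the PMI remark above, that each context word is weighted by its skip-gram word-context score (clipped at zero) times the triangular window. Function and variable names are hypothetical:

```python
import numpy as np

def ice_vector(tokens, i, model, window=10):
    """Weighted mean of context embeddings for the word instance at position i."""
    u_w = model.wv[tokens[i]]                       # target word embedding (row of U)
    vec, total = np.zeros(model.wv.vector_size), 0.0
    for j in range(max(0, i - window), min(len(tokens), i + window + 1)):
        if j == i or tokens[j] not in model.wv.key_to_index:
            continue
        v_c = model.syn1neg[model.wv.key_to_index[tokens[j]]]  # context embedding (row of V)
        tri = 1.0 - abs(i - j) / (window + 1)       # triangular window weight
        score = max(u_w @ v_c, 0.0)                 # ~shifted PMI; stop words score near zero
        vec += tri * score * v_c
        total += tri * score
    return vec / total if total > 0 else vec
```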

[Figure: instance vectors for 'paper' plotted with t-SNE, comparing ICE against the plain mean vector.]

Proposed algorithm
1. Train a skip-gram model on the corpus.
2. Compute instance representations using ICE, one for each instance of a word in the corpus.
3. Cluster them using (nonparametric) k-means, selecting the number of clusters with the criterion of Pham et al. (2005).
4. (Evaluation) Disambiguate test data using the obtained cluster centroids.
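A sketch of the clustering step (step 3), with `X` the matrix of ICE instance vectors for one lemma (a hypothetical name); the f(K) selection follows Pham et al. (2005), including their suggested 0.85 acceptance threshold:

```python
import numpy as np
from sklearn.cluster import KMeans

def choose_k(X, k_max=10):
    """Select the number of clusters with the f(K) criterion of Pham et al. (2005)."""
    d = X.shape[1]
    s_prev, alpha, best_k, best_f = None, None, 1, 1.0
    for k in range(1, min(k_max, len(X)) + 1):
        s = KMeans(n_clusters=k, n_init=10).fit(X).inertia_  # within-cluster distortion S_K
        if k == 1:
            f = 1.0
        else:
            alpha = 1 - 3 / (4 * d) if k == 2 else alpha + (1 - alpha) / 6
            f = s / (alpha * s_prev) if s_prev > 0 else 1.0
        if f < best_f:
            best_k, best_f = k, f
        s_prev = s
    return best_k if best_f < 0.85 else 1  # f(K) >= 0.85 everywhere -> assume one sense

def cluster_instances(X):
    """Nonparametric k-means: pick K, then return one centroid per induced sense."""
    return KMeans(n_clusters=choose_k(X), n_init=10).fit(X).cluster_centers_
```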

SemEval 2013, task 13. WSI: identify senses in ukWaC. WSD: disambiguate test words to one of the induced senses. Evaluation: compare against the annotated WordNet labels.
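The WSD step can then be as simple as nearest-centroid assignment (Euclidean distance is an assumption here; the instance vector and centroids come from the sketches above):

```python
import numpy as np

def disambiguate(instance_vec, centroids):
    """Label a test instance with the induced sense whose centroid is nearest."""
    return int(np.argmin(np.linalg.norm(centroids - instance_vec, axis=1)))
```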

Detailed results: SemEval 2013, task 13. [Result tables not preserved in the transcript.]

Conclusions: using skip-gram word embeddings clearly boosts WSI performance. They provide a semantic representation for each word and tell us which context words are most important.

ICE profile. [Figure not preserved in the transcript.]

Evaluation: SemEval 2013, task 13. Drawn from ukWaC: 50 lemmas, 100 instances per lemma, annotated with WordNet senses.