+ Improving Vector Space Word Representations Using Multilingual Correlation Manaal Faruqui and Chris Dyer Language Technologies Institute Carnegie Mellon University

+ Distributional Semantics
"You shall know a word by the company it keeps" (Harris, 1954; Firth, 1957)
…I will take what is mine with fire and blood…
…the end battle would be between fire and ice…
…My dragons are large and can breathe fire now…
…flame is the visible portion of a fire…
…take place whereby fires can sustain their own heat…

+ Translational Semantics
What other information can we use? Multilingual information! (Bannard & Callison-Burch, 2005)
तीन सौ से अधिक लोगों को बैठाने वाला वायुयान … / That plane can seat more than 300 people
रूसी वायुयान बहुत बड़े हैं / Russian airplanes are huge
Both English words align to the same Hindi word वायुयान, so plane ≅ airplane

+ Outline
Distributional semantics: monolingual context
Translational semantics: multilingual context
Better semantic representations: using distributional + translational semantics

+ Word Vector Representations
How to encode such co-occurrences? As a word (rows) × context (columns) count matrix:
           day   night   …   cold
sleep        0       1   0      2
winter       3       3   5      0
…
the         10       1   2      9

+ Word Vector Representations
Latent Semantic Analysis (Deerwester et al., 1990): apply Singular Value Decomposition (SVD) to the word × context matrix to obtain lower-dimensional word vectors.
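As a rough illustration of this step (not the authors' exact pipeline), LSA-style vectors can be obtained with a truncated SVD; the count matrix X is assumed to have been built beforehand from the corpus:

```python
from sklearn.decomposition import TruncatedSVD

# X: |vocabulary| x |contexts| co-occurrence count matrix (a scipy sparse matrix works too).
svd = TruncatedSVD(n_components=80, random_state=0)  # 80 dimensions, as in the talk's monolingual setup
word_vectors = svd.fit_transform(X)                  # row i: the 80-dimensional LSA vector of word i
```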

+ Multilingual Information
English: dragon, German: Drache, French: dragon, Spanish: dragón
One option: append (concatenate) the vectors from each language. Problem?

+ Multilingual Information
Disadvantages of vector concatenation: ✗
Vector size increases
Idiosyncratic information
What if a word is OOV?

+ Multilingual Information
So, what can we do? Treat the two languages as two views: Canonical Correlation Analysis!
English contexts of "fire":
…I will take what is mine with fire and blood…
…the end battle would be between fire and ice…
…My dragons are large and can breathe fire now…
German contexts of "Feuer":
…Das Ende der Schlacht würde zwischen Feuer und Eis gesehen… ("the end of the battle would be seen between fire and ice")
…ist Feuer eine Oxidationsreaktion mit… ("fire is an oxidation reaction with…")
…Das Licht des Feuers ist eine physikalische Erscheinung… ("the light of the fire is a physical phenomenon")

+ Canonical Correlation Analysis (CCA)
Project two sets of vectors, Ω and Θ (of equal cardinality), into a space where they are maximally correlated.
A convex optimization problem with an exact solution!

+ Canonical Correlation Analysis (CCA)
W, V = CCA(Ω, Θ),  with k = min(r(Ω), r(Θ))
X″ = X × W,  where X is n₁ × d₁ and W is d₁ × k, so X″ is n₁ × k
Y″ = Y × V,  where Y is n₂ × d₂ and V is d₂ × k, so Y″ is n₂ × k
X″ and Y″ are now maximally correlated!
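A minimal sketch of this projection with scikit-learn's CCA (an iterative implementation, not necessarily the MATLAB tool mentioned later in the talk); Omega, Theta, and X_all are assumed inputs: Omega and Theta hold the row-aligned vector pairs, X_all the full English vocabulary:

```python
from sklearn.cross_decomposition import CCA

# Omega: n x d1 English vectors, Theta: n x d2 foreign vectors, row i of each is a translation pair.
k = 48                          # projected dimensionality, e.g. 0.6 * 80
cca = CCA(n_components=k)
cca.fit(Omega, Theta)           # learns the two projection directions (W for Omega, V for Theta)

# Project every English vector, including words that had no translation pair.
X_proj = cca.transform(X_all)   # shape: n_all x k, maximally correlated with the foreign view
```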

+ Canonical Correlation Analysis (CCA)
Problems addressed?
Vector size increases → doesn't increase
Idiosyncratic information → lets you choose!
What if a word is OOV? → projection vectors for everyone!

+ Canonical Correlation Analysis (CCA)
Ok, but how do we get sets Ω and Θ of equal cardinality? The two vocabularies can't be of equal size!
Get word alignments from a parallel corpus.
Preserve only words that are in the original vocabulary.
For every English word, select the best-aligned foreign word.
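For instance, a sketch of this selection step, under the assumption that the word alignments have already been read into (english_word, foreign_word) pairs; the variable names are hypothetical:

```python
from collections import Counter, defaultdict

# aligned_pairs: iterable of (english_word, foreign_word) tuples from the word-aligned parallel corpus
counts = defaultdict(Counter)
for en, fg in aligned_pairs:
    counts[en][fg] += 1

# For every English word in our vocabulary, keep only its most frequently aligned foreign word.
best_foreign = {en: fg_counts.most_common(1)[0][0]
                for en, fg_counts in counts.items() if en in en_vocab}
```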

+ Experimental Setup: LSA Word Vector Learning
Monolingual data:
                English        German         French         Spanish
News corpus     WMT-2011       WMT-2011       WMT-2011       WMT-2011
Tokens          360,000,000    290,000,000    263,000,000    164,000,000
Types           180,000        294,000        137,000        145,000
Tokenizer and lowercasing: WMT scripts

+ Experimental Setup: LSA Word Vector Learning
Parallel data (News Commentary + Europarl, WMT):
                De-En          Fr-En          Es-En
Tokens          128,000,000    138,000,000    134,000,000
Word pairs      roughly 37,000–38,000 per language pair
Word alignment tool: fast_align (Dyer et al., 2013)

+ Experimental Setup: LSA Word Vector Learning
Corpus preprocessing:
Numeric tokens (23.45, 21st, 0.5e10, …) are mapped to NUM
Garbage tokens (anchfgugsjh, wekjfbg, bhguyq, …) are mapped to UNK
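A rough sketch of this normalization (the talk does not spell out the exact rules, so the digit test below is an assumption):

```python
import re

def normalize_token(tok, vocab):
    """Map numeric tokens to NUM and out-of-vocabulary tokens to UNK."""
    if re.search(r"\d", tok):   # 23.45, 21st, 0.5e10, ... all contain a digit
        return "NUM"
    if tok not in vocab:        # garbage or rare tokens such as 'anchfgugsjh'
        return "UNK"
    return tok
```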

+ Experimental Setup: Evaluation Benchmarks
Word similarity evaluation:
WS-353 (Finkelstein et al., 2001)
WS-353-SIM (Agirre et al., 2009)
WS-353-REL (Agirre et al., 2009)
RG-65 (Rubenstein and Goodenough, 1965)
MC-30 (Miller and Charles, 1991)
MTurk-287 (Radinsky et al., 2011)
Word relation evaluation:
Semantic relations (Mikolov et al., 2013)
Syntactic relations (Mikolov et al., 2013)
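The similarity benchmarks are scored with Spearman's rank correlation between human ratings and the cosine similarities of the learned vectors; a small sketch (loading the benchmark word pairs is assumed to happen elsewhere):

```python
import numpy as np
from scipy.stats import spearmanr

def evaluate_similarity(vectors, benchmark):
    """vectors: dict word -> np.ndarray; benchmark: list of (word1, word2, human_score)."""
    model_scores, human_scores = [], []
    for w1, w2, gold in benchmark:
        if w1 in vectors and w2 in vectors:
            u, v = vectors[w1], vectors[w2]
            model_scores.append(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))
            human_scores.append(gold)
    return spearmanr(model_scores, human_scores).correlation
```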

+ Experimental Setup: Multilingual Vector Learning
Monolingual vector length: 80. Multilingual vector length: ?
The dimensionality of the projected space can be chosen: k, expressed as a fraction of the original length.
Choose the best value of k on WS-353, with k ∈ {0.1, 0.2, …, 1.0}.
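Tuning k could then look like the sketch below; project_with_cca is a hypothetical helper wrapping the CCA step sketched earlier, evaluate_similarity is the function from the previous sketch, and ws353 is the loaded benchmark:

```python
best_k, best_rho = None, float("-inf")
for ratio in [i / 10 for i in range(1, 11)]:      # 0.1, 0.2, ..., 1.0 of the 80 dimensions
    k = max(1, int(round(ratio * 80)))
    projected = project_with_cca(en_vectors, fg_vectors, n_components=k)
    rho = evaluate_similarity(projected, ws353)
    if rho > best_rho:
        best_k, best_rho = k, rho
```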

+ Experimental Setup: Multilingual Vector Learning
[Figure: Spearman's correlation on WS-353 vs. number of dimensions; best performance at k = 0.6]

+ Experimental Setup: Multilingual Vector Learning
[Figure: Spearman's correlation results]

+ Experimental Setup: Multilingual Vector Learning
[Figure: Accuracy results]

+ Experimental Setup: Multilingual Vectors from Neural Networks
RNNLM (Mikolov et al., 2011): a neural language model that predicts the next word given the history, with recurrent hidden-layer connections.
Skip-gram, word2vec (Mikolov et al., 2013): predicts the context given the word, removes the hidden layer, and represents the vocabulary with Huffman coding.
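For reference, training comparable Skip-gram vectors with gensim (version 4.x API; the hyperparameters here are illustrative, not the talk's exact settings) might look like:

```python
from gensim.models import Word2Vec

# sentences: an iterable of tokenized sentences from the monolingual corpus (assumed to exist)
model = Word2Vec(sentences, vector_size=80, window=5, min_count=5,
                 sg=1,      # sg=1 selects the Skip-gram architecture
                 hs=1,      # hierarchical softmax over a Huffman-coded vocabulary
                 workers=4)
fire_vector = model.wv["fire"]
```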

+ Experimental Setup: Multilingual Vector Learning
[Figure: Results for RNNLM and Skip-gram vectors]

+ Experimental Setup: Multilingual Vectors, Scaling
[Figure: Spearman's correlation on WS-353]

+ Experimental Setup: Multilingual Vectors, Qualitative Analysis
Antonyms and synonyms of "beautiful", monolingual setting.
[Figure: t-SNE visualization; t-SNE tool (van der Maaten and Hinton, 2008)]

+ Experimental Setup: Multilingual Vectors, Qualitative Analysis
Antonyms and synonyms of "beautiful", multilingual setting.
[Figure: t-SNE visualization; t-SNE tool (van der Maaten and Hinton, 2008)]
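A plot of this kind can be reproduced with scikit-learn's t-SNE; the word list below and the vectors dictionary are illustrative assumptions, not the exact words shown on the slide:

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.manifold import TSNE

words = ["beautiful", "gorgeous", "lovely", "charming", "ugly", "hideous", "unattractive"]
X = np.array([vectors[w] for w in words])   # vectors: word -> embedding, as in earlier sketches

coords = TSNE(n_components=2, perplexity=2, random_state=0).fit_transform(X)
for (x, y), w in zip(coords, words):
    plt.scatter(x, y)
    plt.annotate(w, (x, y))
plt.show()
```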

+ Conclusion
CCA: an easy-to-use tool (e.g. in MATLAB). Take vectors from two languages and improve them.
Multilingual information is important, even when the problems are inherently monolingual.
More effective for distributional vectors: semantics generalizes better than syntax.
Vectors available at:

+ Related Work
Document representation: Vinokourov et al., 2002; Platt et al., 2010
Synonymy and paraphrasing: Bannard and Callison-Burch, 2005; Ganitkevitch et al., 2013
Bilingual lexicon induction: Haghighi et al., 2008; Vulic and Moens, 2013
Bilingual word vectors: Klementiev et al., 2012; Zou et al., 2013
Translation models: Kalchbrenner & Blunsom, 2013
Compositional semantics: Hermann & Blunsom, 2014

+ Thanks! Visit us at ACL-demo: wordvectors.org