Accurate Cross-lingual Projection between Count-based Word Vectors by Exploiting Translatable Context Pairs
Shonosuke Ishiwatari, Nobuhiro Kaji, Naoki Yoshinaga, Masashi Toyoda, Masaru Kitsuregawa

Past Work: Mikolov et al. (2013b)
1. Suggested an alternative to dictionary- and phrase-table-based machine translation systems.
2. Based on the observation that related concepts have similar geometric arrangements in vector space, irrespective of language.
3. First, monolingual models are built for each language; a small bilingual dictionary is then used to learn a linear transformation from the source language to the target language.
4. The word whose vector is closest to the projected representation then serves as the translation (see the sketch below).
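A minimal sketch of this projection step, assuming NumPy and illustrative variable names. Mikolov et al. train W by gradient descent on the squared loss; this sketch uses ordinary least squares, which minimizes the same objective:

```python
import numpy as np

def learn_projection(X, Z):
    """Learn a linear map W minimizing sum_i ||W x_i - z_i||^2 over
    dictionary pairs. X: (n, d_src) source vectors; Z: (n, d_tgt)
    target vectors of their translations."""
    # lstsq solves X @ B ~= Z in the least-squares sense; W = B.T then
    # maps a source vector x into the target space via W @ x.
    B, *_ = np.linalg.lstsq(X, Z, rcond=None)
    return B.T

def translate(x, W, tgt_vocab, tgt_vecs):
    """Return the target word whose vector is the nearest neighbor
    (by cosine similarity) of the projected source vector W @ x."""
    proj = W @ x
    sims = (tgt_vecs @ proj) / (
        np.linalg.norm(tgt_vecs, axis=1) * np.linalg.norm(proj) + 1e-12)
    return tgt_vocab[int(np.argmax(sims))]
```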

Proposed Idea
1. The objective is to improve the transformation for count-based word vectors by exploiting the correspondence between dimensions of word vectors.
2. Exploit the fact that the dimensions of count-based word vectors correspond to context words, which can be used to gain insight into the cross-lingual correspondence between dimensions.
3. Also exploit the surface forms of words, which is useful for languages that share vocabulary (e.g., "cocktail" in English and "cóctel" in Spanish). A sketch of the dimension-matching idea follows this list.
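A hypothetical sketch of how translatable context pairs could yield dimension correspondences: if the context word labelling source dimension k translates, via a seed dictionary, to the context word labelling target dimension j, the pair (j, k) is recorded. All names here are illustrative, not the paper's API:

```python
def dimension_pairs(seed_dict, src_contexts, tgt_contexts):
    """Pair source dimension k with target dimension j whenever their
    context words form a translation pair in the seed dictionary.
    seed_dict: {source word: target word}; *_contexts: the context
    word labelling each dimension, in dimension order."""
    tgt_index = {w: j for j, w in enumerate(tgt_contexts)}
    pairs = set()
    for k, src_word in enumerate(src_contexts):
        tgt_word = seed_dict.get(src_word)
        if tgt_word in tgt_index:
            pairs.add((tgt_index[tgt_word], k))
    return pairs
```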

Proposed Method
1. Start from the optimization problem already discussed (learning the transformation matrix on dictionary word pairs).
2. Add a regularizer to this problem to prevent over-fitting.
3. Add terms that exploit the dimension correspondences discussed above. A reconstruction of the resulting objective follows.
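A hedged LaTeX reconstruction of the objective from the description above; the notation and the sign of the last two terms (rewarding large w_jk on corresponding dimensions, per the next slide) are assumptions:

```latex
\min_{W}\;
  \sum_{i} \lVert W \mathbf{x}_i - \mathbf{z}_i \rVert^{2}
  + \beta \lVert W \rVert_{F}^{2}
  - \beta_{\mathrm{train}} \sum_{(j,k) \in D_{\mathrm{train}}} w_{jk}
  - \beta_{\mathrm{sim}} \sum_{(j,k) \in D_{\mathrm{sim}}} w_{jk}
```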

Proposed Method (continued)
1. D_train: the existing training data of word pairs is reused to find correspondences between the dimensions.
2. Word pairs are evaluated with a distance function over their surface forms, and those whose distance falls below a certain threshold are added to D_sim (a sketch follows this list).
3. The last two terms in the objective function push the learning process to strengthen the entry w_jk when the k-th dimension in the source language corresponds to the j-th dimension in the target language.
4. β_train and β_sim are parameters controlling the strength of the new terms.
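The transcript omits the distance function itself; the sketch below assumes a length-normalized edit distance over surface forms, with an illustrative threshold:

```python
def edit_distance(a, b):
    """Levenshtein distance by dynamic programming, one row at a time."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                  # deletion
                           cur[j - 1] + 1,               # insertion
                           prev[j - 1] + (ca != cb)))    # substitution
        prev = cur
    return prev[-1]

def build_d_sim(src_contexts, tgt_contexts, threshold=0.3):
    """Collect dimension pairs whose context words are close in surface
    form. The normalization and the 0.3 threshold are assumptions."""
    pairs = set()
    for k, s in enumerate(src_contexts):
        for j, t in enumerate(tgt_contexts):
            denom = max(len(s), len(t))
            if denom and edit_distance(s, t) / denom < threshold:
                pairs.add((j, k))
    return pairs
```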

Comparison Methods
1. Baseline: learns the transformation matrix on count-based word vectors using the base equation.
2. CBOW: learns the transformation matrix, using the base equation, on word vectors obtained with a neural network (the continuous bag-of-words model).
3. Direct Mapping: the training data was used to "map each dimension in a word vector in the source language to the corresponding dimension in a word vector in the target language" (sketched below).
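One plausible reading of the direct-mapping baseline, sketched under the assumption that it simply copies each source dimension to its corresponding target dimension; the names and the 0/1 weighting are illustrative:

```python
import numpy as np

def direct_mapping_matrix(pairs, d_tgt, d_src):
    """Build a 0/1 matrix that copies source dimension k to target
    dimension j for each corresponding pair (j, k); applying it plays
    the role of the learned transformation W."""
    W = np.zeros((d_tgt, d_src))
    for j, k in pairs:
        W[j, k] = 1.0
    return W
```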

Evaluation Procedure
1. Take a word vector from the source language.
2. Run it through the method in question to obtain a word vector in the target language.
3. Find the top-n (n = 1, 5) most similar word vectors in the target language, ranked by cosine similarity.
4. Check whether the gold translation's vector appears among the chosen top-n (a sketch follows this list).
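A minimal sketch of this top-n accuracy computation, assuming NumPy and illustrative names; any of the comparison methods above is reduced here to multiplication by a learned matrix W:

```python
import numpy as np

def top_n_accuracy(test_pairs, W, tgt_vocab, tgt_vecs, n=5):
    """Fraction of (source vector, gold translation) pairs whose gold
    word appears among the n nearest target vectors, ranked by cosine
    similarity to the projected vector W @ x."""
    norms = np.linalg.norm(tgt_vecs, axis=1) + 1e-12
    hits = 0
    for x, gold in test_pairs:
        proj = W @ x
        sims = (tgt_vecs @ proj) / (norms * (np.linalg.norm(proj) + 1e-12))
        top = np.argsort(-sims)[:n]
        hits += gold in {tgt_vocab[i] for i in top}
    return hits / len(test_pairs)
```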

Results