Alignment by Bilingual Generation and Monolingual Derivation Toshiaki Nakazawa and Sadao Kurohashi Kyoto University.

Slides:



Advertisements
Similar presentations
Statistical Machine Translation
Advertisements

Statistical Machine Translation Part II: Word Alignments and EM Alexander Fraser Institute for Natural Language Processing University of Stuttgart
Statistical Machine Translation Part II: Word Alignments and EM Alexander Fraser ICL, U. Heidelberg CIS, LMU München Statistical Machine Translation.
Iterative Bilingual Lexicon Extraction from Comparable Corpora Using Topic Model and Context Based Methods Chenhui Chu, Toshiaki Nakazawa, Sadao Kurohashi.
Statistical Machine Translation Part II – Word Alignments and EM Alex Fraser Institute for Natural Language Processing University of Stuttgart
Chinese Word Segmentation Method for Domain-Special Machine Translation Su Chen; Zhang Yujie; Guo Zhen; Xu Jin’an Beijing Jiaotong University.
Unsupervised Turkish Morphological Segmentation for Statistical Machine Translation Coskun Mermer and Murat Saraclar Workshop on Machine Translation and.
Flow Network Models for Sub-Sentential Alignment Ying Zhang (Joy) Advisor: Ralf Brown Dec 18 th, 2001.
Statistical Phrase-Based Translation Authors: Koehn, Och, Marcu Presented by Albert Bertram Titles, charts, graphs, figures and tables were extracted from.
Course Summary LING 575 Fei Xia 03/06/07. Outline Introduction to MT: 1 Major approaches –SMT: 3 –Transfer-based MT: 2 –Hybrid systems: 2 Other topics.
Semi-Automatic Learning of Transfer Rules for Machine Translation of Low-Density Languages Katharina Probst April 5, 2002.
Application of RNNs to Language Processing Andrey Malinin, Shixiang Gu CUED Division F Speech Group.
1 The Web as a Parallel Corpus  Parallel corpora are useful  Training data for statistical MT  Lexical correspondences for cross-lingual IR  Early.
MACHINE TRANSLATION AND MT TOOLS: GIZA++ AND MOSES -Nirdesh Chauhan.
Towards Natural Clarification Questions in Dialogue Systems Svetlana Stoyanchev, Alex Liu, and Julia Hirschberg AISB 2014 Convention at Goldsmiths, University.
CS460/626 : Natural Language Processing/Speech, NLP and the Web (Lecture 18– Training and Decoding in SMT System) Kushal Ladha M.Tech Student CSE Dept.,
Natural Language Processing Lab Northeastern University, China Feiliang Ren EBMT Based on Finite Automata State Transfer Generation Feiliang Ren.
Kuang Ru; Jinan Xu; Yujie Zhang; Peihao Wu Beijing Jiaotong University
Probabilistic Model for Definitional Question Answering Kyoung-Soo Han, Young-In Song, and Hae-Chang Rim Korea University SIGIR 2006.
An Integrated Approach for Arabic-English Named Entity Translation Hany Hassan IBM Cairo Technology Development Center Jeffrey Sorensen IBM T.J. Watson.
National Institute of Informatics Kiyoko Uchiyama 1 A Study for Introductory Terms in Logical Structure of Scientific Papers.
Advanced Signal Processing 05/06 Reinisch Bernhard Statistical Machine Translation Phrase Based Model.
Language Knowledge Engineering Lab. Kyoto University NTCIR-10 PatentMT, Japan, Jun , 2013 Description of KYOTO EBMT System in PatentMT at NTCIR-10.
Active Learning for Statistical Phrase-based Machine Translation Gholamreza Haffari Joint work with: Maxim Roy, Anoop Sarkar Simon Fraser University NAACL.
INSTITUTE OF COMPUTING TECHNOLOGY Bagging-based System Combination for Domain Adaptation Linfeng Song, Haitao Mi, Yajuan Lü and Qun Liu Institute of Computing.
2010 Failures in Czech-English Phrase-Based MT 2010 Failures in Czech-English Phrase-Based MT Full text, acknowledgement and the list of references in.
Scalable Inference and Training of Context- Rich Syntactic Translation Models Michel Galley, Jonathan Graehl, Keven Knight, Daniel Marcu, Steve DeNeefe.
Czech-English Word Alignment Ondřej Bojar Magdalena Prokopová
Retrieval Models for Question and Answer Archives Xiaobing Xue, Jiwoon Jeon, W. Bruce Croft Computer Science Department University of Massachusetts, Google,
Phrase Reordering for Statistical Machine Translation Based on Predicate-Argument Structure Mamoru Komachi, Yuji Matsumoto Nara Institute of Science and.
NUDT Machine Translation System for IWSLT2007 Presenter: Boxing Chen Authors: Wen-Han Chao & Zhou-Jun Li National University of Defense Technology, China.
Reordering Model Using Syntactic Information of a Source Tree for Statistical Machine Translation Kei Hashimoto, Hirohumi Yamamoto, Hideo Okuma, Eiichiro.
Accurate Parallel Fragment Extraction from Quasi-Comparable Corpora using Alignment Model and Translation Lexicon Chenhui Chu, Toshiaki Nakazawa, Sadao.
Joining Adjectives て form for い adjectives and な adjectives.
Statistical Phrase Alignment Model Using Dependency Relation Probability Toshiaki Nakazawa and Sadao Kurohashi Kyoto University.
Why Not Grab a Free Lunch? Mining Large Corpora for Parallel Sentences to Improve Translation Modeling Ferhan Ture and Jimmy Lin University of Maryland,
2006/12/081 Large Scale Crawling the Web for Parallel Texts Chikayama Taura lab. M1 Dai Saito.
Bayesian Subtree Alignment Model based on Dependency Trees Toshiaki Nakazawa Sadao Kurohashi Kyoto University 1 IJCNLP2011.
Bayesian Word Alignment for Statistical Machine Translation Authors: Coskun Mermer, Murat Saraclar Present by Jun Lang I2R SMT-Reading Group.
What’s in a translation rule? Paper by Galley, Hopkins, Knight & Marcu Presentation By: Behrang Mohit.
Finding Translation Correspondences from Parallel Parsed Corpus for Example-based Translation Eiji Aramaki (Kyoto-U), Sadao Kurohashi (U-Tokyo), Satoshi.
2007/4/201 Extracting Parallel Texts from Massive Web Documents Chikayama Taura lab. M2 Dai Saito.
Kyoto-U: Syntactical EBMT System for NTCIR-7 Patent Translation Task Kyoto University Toshiaki Nakazawa Sadao Kurohashi.
A non-contiguous Tree Sequence Alignment-based Model for Statistical Machine Translation Jun Sun ┼, Min Zhang ╪, Chew Lim Tan ┼ ┼╪
From Text to Image: Generating Visual Query for Image Retrieval Wen-Cheng Lin, Yih-Chen Chang and Hsin-Hsi Chen Department of Computer Science and Information.
Example-based Machine Translation based on Deeper NLP 1. Graduate School of Information Science and Technology, The University of Tokyo, Tokyo, Japan,
2003 (c) University of Pennsylvania1 Better MT Using Parallel Dependency Trees Yuan Ding University of Pennsylvania.
1 Minimum Error Rate Training in Statistical Machine Translation Franz Josef Och Information Sciences Institute University of Southern California ACL 2003.
Example-based Machine Translation Pursuing Fully Structural NLP Sadao Kurohashi, Toshiaki Nakazawa, Kauffmann Alexis, Daisuke Kawahara University of Tokyo.
い 日本の どこに 行きたい です か。 Where do you want to go in Japan?
Linguistically-motivated Tree-based Probabilistic Phrase Alignment Toshiaki Nakazawa, Sadao Kurohashi (Kyoto University)
Statistical Machine Translation Part II: Word Alignments and EM Alex Fraser Institute for Natural Language Processing University of Stuttgart
Multilingual Information Retrieval using GHSOM Hsin-Chang Yang Associate Professor Department of Information Management National University of Kaohsiung.
Discriminative Modeling extraction Sets for Machine Translation Author John DeNero and Dan KleinUC Berkeley Presenter Justin Chiu.
Large Vocabulary Data Driven MT: New Developments in the CMU SMT System Stephan Vogel, Alex Waibel Work done in collaboration with: Ying Zhang, Alicia.
A Multilingual Hierarchy Mapping Method Based on GHSOM Hsin-Chang Yang Associate Professor Department of Information Management National University of.
Maximum Entropy techniques for exploiting syntactic, semantic and collocational dependencies in Language Modeling Sanjeev Khudanpur, Jun Wu Center for.
A Simulator for the LWA Masaya Kuniyoshi (UNM). Outline 1.Station Beam Model 2.Asymmetry Station Beam 3.Station Beam Error 4.Summary.
RELATIVE CLAUSES Adjectival Clauses/Modifiers. RELATIVE CLAUSES A relative clause is the part of a sentence which describes a noun Eg. The cake (which)
Japanese-Chinese Phrase Alignment Exploiting Shared Chinese Characters Chenhui Chu, Toshiaki Nakazawa and Sadao Kurohashi Graduate School of Informatics,
LING 575 Lecture 5 Kristina Toutanova MSR & UW April 27, 2010 With materials borrowed from Philip Koehn, Chris Quirk, David Chiang, Dekai Wu, Aria Haghighi.
The University of Illinois System in the CoNLL-2013 Shared Task Alla RozovskayaKai-Wei ChangMark SammonsDan Roth Cognitive Computation Group University.
Fabien Cromieres Chenhui Chu Toshiaki Nakazawa Sadao Kurohashi
Statistical Machine Translation Part II: Word Alignments and EM
Suggestions for Class Projects
Eiji Aramaki* Sadao Kurohashi* * University of Tokyo
Improved Word Alignments Using the Web as a Corpus
Statistical Machine Translation Papers from COLING 2004
An Empirical Comparison of Domain Adaptation Methods for
Kyoto University Participation to WAT 2016
Presentation transcript:

Alignment by Bilingual Generation and Monolingual Derivation Toshiaki Nakazawa and Sadao Kurohashi Kyoto University

Outline Background & Related Work Model Overview Model Training Experiments Conclusion 2

Background Alignment quality of GIZA++ is quite insufficient for distant language pairs – Wide range reordering, many-to-many alignment – En-Fr < 10% AER vs. En-Ja, Zh-Ja ≒ 20% AER En: He is my brother. Zh: 他 是 我 哥哥 。 Fr: Il est mon frère. Ja: 彼 は 私 の 兄 です 。 AER: Alignment Error Rate

Tree-based Alignment Model 1.Choose a number of components 2.Generate each of treelet pairs independently 3.Combine treelets so as to create sentences 4 He is my brother 兄 です 彼 は 私 の C1C1 C1C1 C2C2 C2C2 C3C3 C3C3 C4C4 C4C4 [Nakazawa+, 2011]

PrecisionRecallAER GIZA++ & grow Nakazawa+, ・・・ Sure Possible System Output

Related Work 1.Insert pseudo nodes for function words heuristically [Isozaki+, 2010] 2.Delete alignments for some function words after GIZA++ [Wu+, 2011] Ad-hoc and need language-pair dependent rules for the insertion or deletion 6

Motivation Two types of function words Type 1: counterparts exist in the other side less problematic, can be handled as content words Type 2: counterparts do not exist in the other side problematic, we call them “unique function words” Proposed – bilingually generate treelet pairs for content words and Type 1 function words – monolingually derive Type 2, unique function words 7

Case Study of Alignment ゆっくり * 走った * ran* slowly* 彼*彼* は He* the park* in 公園 * で NULL Content words are easy to alignSome function words are also easy to alignSome function words are incorrectly alignedDerive function words to avoid alignment errors (topic marker)

Important Notice We do not explicitly distinguish between content words and function words in the model – All the words are treated in the same manner The proposed model can naturally derive unique function words

Model Overview 10

Generative Story of the Model 11 1.Choose a number of components 2.Generate each of core treelet pairs bilingually 3.Derive treelets monolingually は 体重 * 治療 * 変化 * may weight* change* The not ない treatment* medical* the C1C1 C1C1 C2C2 C2C2 C3C3 C3C3 C4C4 C4C4 C5C5 C5C5 より し に かも しれ ない (topic marker) (light verb) (by)

Generative Story of the Model 12 1.Choose a number of components 2.Generate each of core treelet pairs bilingually 3.Derive treelets monolingually 4.Combine treelets so as to create sentences 体重 * は 治療 * に より 変化 * し ない medical* treatment* may not change* weight* The the かも しれ ない (topic marker) (light verb) (by)

Model in Formal Generative story of joint probability model 1.Choose a number of components 2.Generate each of core treelet pairs bilingually 3.Derive treelets monolingually 4.Combine treelets so as to create sentences Formally Step 1Step 4 Step 2,3 13 See proceedings or Nakazawa+, 2011

Bilingual Generation and Monolingual Derivation Core treelet pair is generated from unknown distribution ~ Dirichlet process Each core treelet and derives sets of treelets and monolingually – The number of derivations: geometric distribution – Derivation probability: estimated from large monolingual corpora 14

Derivation Probability Derivation is conditioned on the word which is connected to – e.g. Not only the lexicalized probability, but also POS-based probability – e.g. 15 medical treatment The example

Derivation Probability Derivation probability is estimated from large monolingual corpus denotes the frequency of connected to Example of counting the derivations 16 medical treatment may not change The 治療 * に より

Natural Distinction The number of derivation patterns from function words becomes larger Probability of each pattern from function words becomes much smaller Derivations from function words rarely occurs – Derivations of function words from content words are prefferred Content WordsFunction Words often surrounded byfunction wordscontent words vocabulary sizevery largesmall

Example of Probability Calculation 体重 * は 治療 * に より 変化 * し ない medical* treatment* may not change* weight* θ T (〈 change, 変化〉) × φ 変化 (に より) × φ 変化 (し) The θ T (〈 not, なかった〉) θ T (〈 weight, 体重〉) × φ weight ( the ) × φ 体重 (は) (by) (topic marker) (light verb) the かも しれ ない θ T (〈 may, かも しれ ない〉) * Ignoring which parameterizes the number of derivations θ T (〈 medical treatment, 治療〉) × φ medical treatment ( The )

Model Training 19

Gibbs Sampling Initialization – Create heuristic phrase alignment like ‘grow-diag- final-and’ on dependency trees using results from GIZA++ Refine the model by changing the alignment – Operators: SWAP, TOGGLE, EXPAND, MERGE, BOUNDARY, TRANSFORM 20

SWAP Operator ・・・ NULL ・・・ SWAP-1 SWAP-2 Swap the counterparts of two alignments 21

TOGGLE Operator Remove existing alignment or add new alignment NULL 22

EXPAND Operator Expand or contract an aligned treelet NULL EXPAND (as core) EXPAND (as derivation) 23

BOUNDARY Operator Change the boundary between two aligned treelets

TRANSFORM Operator Change the attributes of a word between core and derivation

MERGE Operator Include another alignment as derivations or exclude derivations as another alignment

Alignment & Translation Experiment 27

Alignment Experiment Training: En-Ja paper abst. 1M sentence pairs Testing: 500 sentence pairs with Sure and Possible alignments Maximum size of a derivation treelet: 3 Measure: Precision, Recall, Alignment Error Rate (AER) Base analyzer: nlparser (En), JUMAN+KNP (Ja) Monolingual corpora: Web 550M sentences 28

Experimental Results Alignment ModelPrecisionRecallAER GIZA++ & grow Nakazawa+, Proposed Proposed, evaluating core only

Improved Example Sure Possible Core Derivation Proposed

Improved Example ・・・ Sure Possible Core Derivation Proposed

Error Analysis ・・・ noise of parallel sentenceparsing error

Translation Experiment Training: Identical to the alignment experiment Direction: En -> Ja and Ja -> En Tuning: 500 sentences Testing: 500 sentences Decoder: Joshua (hierarchical PSMT) with default settings Measure: BLEU 33

Experimental Results Alignment ModelAER BLEU En -> Ja BLEU Ja -> En GIZA++ & grow Nakazawa+, Proposed †18.46†‡ Proposed, using core only †: significant difference from GIZA++ (p < 0.05) ‡: significant difference from Nakazawa+, 2011 (p < 0.05)

Conclusion Bilingual generation and monolingual derivation – Handle unique function words properly – Good estimation of bilingual generation e.g. θ( ) instead of θ( ) Derivations are acquired from large monolingual corpora Reduce alignment errors of function words and improve translation quality 35

Future Work Word-class-based derivation – e.g. “patient who …” Handling light verb construction – e.g. “introduction is given” Conduct experiments on other language pairs Tree-based decoding

Acknowledgement This work was partially supported by Yahoo Japan Corporation Acknowledgement This work was partially supported by Yahoo Japan Corporation Thank you for your attention!!