Getting the structure right for word alignment: LEAF Alexander Fraser and Daniel Marcu Presenter Qin Gao.


Quick summary  Problem: IBM Models have a 1-N assumption  Solution: a sophisticated generative story, with generative estimation of its parameters  Additional solution: decompose the model into components and train semi-supervised  Result: significant improvement in BLEU (Arabic-English)

The generative story  Source words: Head words (link to zero or more non-head words on the same side); Non-head words (linked from exactly one head word on the same side); Deleted words (no link on the source side)  Target words: Head words (link to zero or more non-head words on the same side); Non-head words (linked from exactly one head word on the same side); Spurious words (no link on the target side)
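The word roles above can be sketched as a small data structure. This is an illustrative sketch only, assuming nothing about the paper's actual implementation; the names `Role` and `LeafWord` are invented for this example.

```python
# Hypothetical sketch of the LEAF word-role structure described above.
# Names (Role, LeafWord) are illustrative, not from the paper's code.
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Optional

class Role(Enum):
    HEAD = "head"          # links to zero or more non-head words (same side)
    NON_HEAD = "non-head"  # linked from exactly one head word (same side)
    DELETED = "deleted"    # source word with no link (source side only)
    SPURIOUS = "spurious"  # target word with no link (target side only)

@dataclass
class LeafWord:
    index: int
    role: Role
    head: Optional[int] = None                           # head index, for non-head words
    dependents: List[int] = field(default_factory=list)  # non-head words, for heads

# A minimal source side: head word 0 with non-head word 1, word 2 deleted
source = [
    LeafWord(0, Role.HEAD, dependents=[1]),
    LeafWord(1, Role.NON_HEAD, head=0),
    LeafWord(2, Role.DELETED),
]
```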

Minimal translational correspondence

The generative story, step by step (the slide figures with example words A B C → X Y Z are omitted):
 1a. Condition on the source word
 1b. Determine the source word class
 2a. Condition on the source classes
 2b. Determine the links between head words and non-head words
 3a. Condition on the source head word
 3b. Determine the target head word
 4a. Condition on the source head word and cept size
 4b. Determine the target cept size
 5a. Condition on the existing sentence length
 5b. Determine the number of spurious target words
 6a. Condition on the target word
 6b. Determine the spurious words
 7a. Condition on the target head word's class and the source word
 7b. Determine the non-head words it links to
 8a. Condition on the classes of the source/target head words
 8b. Determine the position of the target head word
 8c. Condition on the target word class
 8d. Determine the positions of the non-head words
 9. Fill the vacant positions uniformly
 10. The resulting alignment

Unsupervised parameter estimation  Bootstrap using HMM alignments in two directions  Use the intersection to determine head words  Use the 1-N alignment to determine target cepts  Use the M-1 alignment to determine source cepts  Could be infeasible
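The bootstrap heuristics above can be illustrated with toy directional alignments. This is a minimal sketch under the assumption that alignments are stored as sets of (source, target) index pairs; the data is made up.

```python
# Sketch of the bootstrap heuristics: intersect two directional HMM
# alignments to propose head-word links. Both sets are stored as
# (source_index, target_index) pairs; the values are illustrative.
src_to_tgt = {(0, 0), (1, 1), (1, 2)}   # 1-N direction: source 1 links to two targets
tgt_to_src = {(0, 0), (1, 1), (2, 1)}   # M-1 direction: target 1 links to two sources

# Head-word links: pairs both directions agree on
head_links = src_to_tgt & tgt_to_src

# Target cept of source word 1, read off the 1-N alignment
cept_of_source_1 = {t for (s, t) in src_to_tgt if s == 1}
```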

Training: similar to Models 3/4/5  Starting from some alignment (it is not clear how it is obtained), apply one of seven operators to get new alignments:  Move a French non-head word to a new head  Move an English non-head word to a new head  Swap the heads of two French non-head words  Swap the heads of two English non-head words  Swap the English head-word links of two French head words  Link an English word to a French word, making new head words  Unlink an English and a French head word  All alignments that can be generated by one of these operators are called neighbors of the alignment

Training  If we have better alignment in the neighborhood, update the current alignment  Continue until no better alignment can be found  Collect count from the last neighborhood

Semi-supervised training  Decompose the components of the large generative formula and treat them as features in a log-linear model  Add other features as well  Train with the EMD (EM-Discriminative) algorithm
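The log-linear combination can be sketched as follows: each decomposed sub-model becomes a feature with a weight, and an alignment is scored by the exponentiated weighted sum, up to normalization. The feature values and weights below are purely illustrative.

```python
# Sketch of a log-linear model over decomposed components: an alignment's
# unnormalized score is exp(sum_i w_i * f_i(a)). Normalization would
# require summing over all candidate alignments and is omitted here.
import math

def log_linear_score(features, weights):
    return math.exp(sum(w * f for f, w in zip(features, weights)))

features = [0.2, 1.5, -0.3]   # e.g. sub-model log-probs plus extra features
weights = [1.0, 0.5, 2.0]     # tuned discriminatively in the paper's setting
score = log_linear_score(features, weights)
```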

Experiment  First, a very weird operation, they fully link alignments from ALL systems and then compare the performance

Training/Test Set

Experiments  French/English: Phrase based  Arabic/English: Hierarchical (Chiang 2005)  Baseline: GIZA++ Model 4, Union  Baseline Discriminative: Only using Model 4 components as features

Conclusion (mine)  The new structural features are useful in discriminative training  There is no evidence that the generative model is superior to Model 4

Unclear points  Are the F-scores "biased"?  No BLEU score is given for unsupervised LEAF  They used features in addition to the LEAF features, so where does the improvement come from?