
Syntax for MT
EECS 767, Feb. 1, 2006

Outline
Motivation
Syntax-based translation model
  Formalization
  Training
Using syntax in MT
  Using multiple features
  Syntax-based features

The IBM Models
Word reordering
  Single words, not groups
  Conditioned on position of words
Null-word insertion
  Uniform across positions

The Alignment Template Model
Word reordering
  Phrases can be reordered in any way, but tend to stay in the same order as the source.
  Reordering within phrases is defined by templates
Word translations
  Must match up (no null words)

Implied Assumptions
Word order
  Similar to the source sentence
Translation
  Near 1-to-1 correspondence

What goes wrong?
We see many errors in machine translation when we only look at the word level.
Missing content words:
  MT: Condemns US interference in its internal affairs.
  Human: Ukraine condemns US interference in its internal affairs.
Verb phrases:
  MT: Indonesia that oppose the presence of foreign troops.
  Human: Indonesia reiterated its opposition to foreign military presence.
WS 2003 Syntax for Statistical Machine Translation Final Presentation

What goes wrong, cont.
Wrong dependencies:
  MT: …, particularly those who cheat the audience the players.
  Human: …, particularly those players who cheat the audience.
Missing articles:
  MT: …, he is fully able to activate team.
  Human: …, he is fully able to activate the team.
WS 2003 Syntax for Statistical Machine Translation Final Presentation

What goes wrong, cont.
Word salad:
  the world arena on top of the u. s. sampla competitors, and since mid – july has not appeared in sports field, the wounds heal go back to the situation is very good, less than a half hours in the same score to eliminate 6:2 in light of the south african athletes to the second round.
WS 2003 Syntax for Statistical Machine Translation Final Presentation

How can we improve?
Relying on the language model to produce more 'accurate' sentences is not enough.
Many of the problems can be considered 'syntactic'.
Perhaps MT systems don't know enough about what is important to people.
So, include syntax in MT:
  Build a model around syntax
  Include syntax-based features in a model
WS 2003 Syntax for Statistical Machine Translation Final Presentation

A New Translation Story
You have a sentence and its parse tree.
The children at each node in the tree are rearranged.
New nodes may be inserted before or after a child node; these new nodes are assigned a translation.
Each of the leaf lexical nodes is then translated.
Yamada A Syntax-Based Statistical Translation Model Thesis 2002
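The story above can be sketched in code. This is a toy illustration only: the tree representation, function names, and the two-word lexicon are all made up for this example, not taken from the thesis.

```python
# Toy sketch of the three channel operations (reorder, insert, translate)
# on a parse tree represented as (label, children) tuples, where a leaf
# is (label, [word]). All data below is illustrative.

def reorder(node, permutation):
    """Rearrange the children of a node according to a permutation."""
    label, children = node
    return (label, [children[i] for i in permutation])

def insert(node, word, side):
    """Insert a new leaf before ('left') or after ('right') the children."""
    label, children = node
    leaf = ("X", [word])
    return (label, [leaf] + children if side == "left" else children + [leaf])

def translate_leaves(node, lexicon):
    """Translate each leaf word; non-leaf structure is untouched."""
    label, children = node
    if len(children) == 1 and isinstance(children[0], str):
        return (label, [lexicon.get(children[0], children[0])])
    return (label, [translate_leaves(c, lexicon) for c in children])

# A hypothetical English VP, reordered toward Japanese-style word order,
# with a function word inserted and the leaves translated.
vp = ("VP", [("VB", ["adores"]), ("VB2", ["listening"])])
vp = reorder(vp, [1, 0])
vp = insert(vp, "ga", "left")
vp = translate_leaves(vp, {"adores": "daisuki", "listening": "kiku"})
```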

A Syntax-based Model
Assume word order is based on a reordering of the source syntax tree.
Assume null-generated words appear at syntactic boundaries.
(For now) assume a word translates into a single word.
Yamada A Syntax-Based Statistical Translation Model Thesis 2002

Reorder Yamada A Syntax-Based Statistical Translation Model Thesis 2002

Insert Yamada A Syntax-Based Statistical Translation Model Thesis 2002

Translate Yamada A Syntax-Based Statistical Translation Model Thesis 2002

Parameters
Reorder (R) – child node reordering
  Can take any possible child node reordering
  Defines word order in the translated sentence
  Conditioned on the original child node order
  Only applies to non-leaf nodes
Yamada A Syntax-Based Statistical Translation Model Thesis 2002

Parameters, cont.
Insertion (N) – placement and translation
  Left, right, or none
  Defines the word to be inserted
  Placement conditioned on current and parent node labels
  Word choice is unconditioned
Yamada A Syntax-Based Statistical Translation Model Thesis 2002

Parameters, cont.
Translation (T) – 1-to-1
  Conditioned only on the source word
  Can take on null
Translation (T) – N-to-N
  Consider word fertility (for 1-to-N mappings)
  Consider phrase translation at each node
  Limit the size of possible phrases
  Mix phrasal with word-to-word translation
Yamada A Syntax-Based Statistical Translation Model Thesis 2002

Formalization
The parse tree is treated as a set of nodes.
Total probability: sum over all operation sequences that yield the target sentence.
Assume node independence.
Assume the random variables are independent of one another and depend only on certain features.
Yamada A Syntax-Based Statistical Translation Model Thesis 2002
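The equations on this slide appeared as images in the original deck; the following is a reconstruction in the notation of Yamada's thesis, where ε is the source parse tree with nodes ε_1…ε_n, θ = (θ_1,…,θ_n) collects the per-node operations, and n, r, t are the insertion, reorder, and translation tables.

```latex
P(f \mid \varepsilon) \;=\; \sum_{\theta:\, \mathrm{str}(\theta(\varepsilon)) = f} P(\theta \mid \varepsilon)

P(\theta \mid \varepsilon)
\;=\; \prod_{i=1}^{n} P(\theta_i \mid \varepsilon_i)
\;=\; \prod_{i=1}^{n}
  n(\nu_i \mid \mathcal{N}(\varepsilon_i))\,
  r(\rho_i \mid \mathcal{R}(\varepsilon_i))\,
  t(\tau_i \mid \mathcal{T}(\varepsilon_i))
```

The first line is the total probability summed over all operation sequences whose resulting string is f; the second applies the node-independence assumption, with N, R, T picking out the features each table is conditioned on.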

Training (EM)
1. Initialize all probability tables (uniform)
2. Reset all counters
3. For each pair in the training corpus:
   a) Try all possible mappings of N, R, and T
   b) Update the counts as seen in the mappings
4. Normalize the probability tables with the new counts
5. Repeat steps 2–4 several times
Yamada A Syntax-Based Statistical Translation Model Thesis 2002
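The five steps above can be sketched as an EM skeleton. This is a minimal sketch, not the thesis implementation: the expensive "try all possible mappings" step is delegated to a hypothetical `enumerate_mappings` callback (the thesis computes these expectations with dynamic programming), and counts are keyed as (context, outcome) pairs.

```python
from collections import defaultdict

def normalize(counts):
    """Step 4: turn (context, outcome) counts into conditional probabilities."""
    totals = defaultdict(float)
    for (ctx, out), c in counts.items():
        totals[ctx] += c
    return {(ctx, out): c / totals[ctx] for (ctx, out), c in counts.items()}

def em(corpus, tables, enumerate_mappings, iterations=5):
    """enumerate_mappings(tree, sentence, tables) yields (operations, prob)."""
    for _ in range(iterations):                    # step 5: repeat
        counts = defaultdict(float)                # step 2: reset counters
        for tree, sentence in corpus:              # step 3: each training pair
            mappings = list(enumerate_mappings(tree, sentence, tables))
            total = sum(p for _, p in mappings) or 1.0
            for ops, p in mappings:                # step 3b: fractional counts
                for op in ops:
                    counts[op] += p / total
        tables = normalize(counts)                 # step 4: renormalize
    return tables
```

Each mapping contributes its posterior share p/total to every operation it uses, which is exactly the expected-count bookkeeping that makes this EM rather than hard counting.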

Decoding
Modify the original CFG with the new reorderings and their probabilities.
Add in VP -> VP X and X -> word rules from the insertion (N) parameters.
Add lexical rules englishWord -> foreignWord.
Use the noisy-channel approach, starting with a translated sentence.
Proceed through the parse tree using a bottom-up beam search, keeping an N-best list of good partial translations for each subtree.
Yamada&Knight A Decoder for Syntax-based Statistical MT 2002
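The last step can be sketched as follows: each subtree keeps a beam of (partial translation, score) hypotheses, and a parent node combines its children's beams and prunes. The scoring here is a stand-in (a rule score plus the children's log-probability scores), not the full channel model from the paper.

```python
# Sketch of bottom-up N-best combination with beam pruning.
import heapq
import itertools

def combine(children_beams, rule_score, beam_size=10):
    """children_beams: one list of (words, score) hypotheses per child.
    Returns the top beam_size combined hypotheses for the parent."""
    hyps = []
    for combo in itertools.product(*children_beams):
        words = [w for hyp, _ in combo for w in hyp]          # concatenate
        score = rule_score + sum(s for _, s in combo)         # log-prob sum
        hyps.append((words, score))
    return heapq.nlargest(beam_size, hyps, key=lambda h: h[1])
```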

Decoding cont. Yamada&Knight A Decoder for Syntax-based Statistical MT 2002

Performance (Alignment) Yamada A Syntax-Based Statistical Translation Model Thesis 2002

Performance (Alignment), cont.
Counting the number of individual alignments.
"Perfect" means all alignments in a pair are correct.
Yamada A Syntax-Based Statistical Translation Model Thesis 2002

Performance cont. Chinese-English BLEU scores Yamada&Knight A Decoder for Syntax-based Statistical MT 2002

Do we need the entire model to be based on syntax?
Good performance increase
Large computational cost
  Many permutations of CFG rules (120K non-lexical)
How about trying something else?
  Add syntax-based features that look for more specific things

Using Syntax in MT
Multiple features
  Formalization
  Baseline
  Training
Syntax-based features
  Shallow
  Deep

Multiple Features (log-linear)
Calculate the probability using a variety of features, each parameterized by an associated weight.
Find the translated sentence that maximizes the weighted combination of features given the foreign sentence.
JHU WS 2003 Syntax for Statistical Machine Translation Final Report
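The search described above is the standard log-linear decision rule (the slide's formula was an image); with feature functions h_m and weights λ_m it reads:

```latex
\hat{e} \;=\; \arg\max_{e} \Pr(e \mid f)
\;=\; \arg\max_{e} \sum_{m=1}^{M} \lambda_m\, h_m(e, f)
```

The normalization term of the log-linear model does not depend on e, so it cancels inside the argmax.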

Baseline System JHU WS 2003 Syntax for Statistical Machine Translation Final Report


Baseline Features
Alignment template feature
  Uses simple counts
JHU WS 2003 Syntax for Statistical Machine Translation Final Report

Baseline Features
Word selection feature
  Uses lexicon probabilities estimated by relative frequency
  Additional feature capturing word reordering within phrasal alignments
JHU WS 2003 Syntax for Statistical Machine Translation Final Report

Baseline Features
Phrase alignment feature
  A measure of deviation from a monotone alignment
JHU WS 2003 Syntax for Statistical Machine Translation Final Report

Baseline Features
Language model feature
  Standard backing-off trigram probability
Word/phrase penalty features
  Feature counting the number of words in the translated sentence
  Feature counting the number of phrases in the translated sentence
Alignment lexicon feature
  Feature counting the number of times an entry from a given alignment lexicon is used
JHU WS 2003 Syntax for Statistical Machine Translation Final Report

A possible training method: line optimization
JHU WS 2003 Syntax for Statistical Machine Translation Final Report

Use reranking of N-best lists
Feature functions do not need to be integrated into the dynamic-programming search.
A feature function can condition on any part of the English/Chinese sentence, parse tree, or chunks.
Provides a simple software architecture.
Using a fixed set of translations allows each feature function to reduce to a vector of numbers.
You are limited to the improvements present within the N-best lists.
WS 2003 Syntax for Statistical Machine Translation Final Presentation
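Once every candidate in the N-best list has been reduced to a feature vector, reranking is just a dot product against the weight vector. A minimal sketch (feature values and weights are illustrative):

```python
def rerank(nbest, weights):
    """nbest: list of (translation, feature_vector) candidates.
    Returns the candidate with the highest weighted feature score."""
    def score(features):
        return sum(w * h for w, h in zip(weights, features))
    return max(nbest, key=lambda cand: score(cand[1]))
```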

Syntax-based Features
Shallow
  POS and chunk tag counts
  Projected POS language model
Deep
  Tree-to-string
  Tree-to-tree
  Verb arguments
JHU WS 2003 Syntax for Statistical Machine Translation Final Report

Shallow Syntax-Based Features
POS and chunk tag counts
  The baseline system has low-level syntactic problems: too many articles, commas, and singular nouns; too few pronouns, past-tense verbs, and plural nouns.
  The reranker can learn balanced distributions of tags from various features.
  Examples:
    Number of NPs in the English sentence
    Difference in the number of NPs between the English and Chinese sentences
    Number of Chinese N tags translated only to non-N tags in English
JHU WS 2003 Syntax for Statistical Machine Translation Final Report
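Tag-count features of this kind are simple to compute from the two tag sequences. A sketch, assuming the candidate's English POS/chunk tags and the source-side Chinese tags are already available as lists (the tag inventory here is illustrative):

```python
from collections import Counter

def tag_count_features(en_tags, zh_tags, tags=("NP", "DT", "PRP")):
    """Per-tag counts on the English side and English-minus-Chinese
    differences, as a dict of named feature values."""
    en, zh = Counter(en_tags), Counter(zh_tags)
    feats = {}
    for t in tags:
        feats[f"count_{t}"] = en[t]          # raw tag count
        feats[f"diff_{t}"] = en[t] - zh[t]   # cross-language imbalance
    return feats
```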

Shallow Syntax-Based Features
Projected POS language model
  Use word-level alignments to project Chinese POS tags onto the English words
    Possibly keeping relative position within the Chinese phrase
    Possibly keeping NULLs in the POS sequence
    Possibly using lexicalized NULLs from the English word
  Use the projected POS tags to train a language model based on POS N-grams
JHU WS 2003 Syntax for Statistical Machine Translation Final Report
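The projection plus LM training can be sketched as follows. This is a simplified stand-in: alignments are hypothetical (target index, source index) link lists, and the "language model" is just an add-one-smoothed bigram model rather than whatever smoothing the workshop used.

```python
from collections import Counter

def project_tags(alignment, src_tags, tgt_len, null_tag="NULL"):
    """Project source POS tags through word alignment links onto the
    target positions; unaligned target words get the NULL tag."""
    tags = [null_tag] * tgt_len
    for t, s in alignment:
        tags[t] = src_tags[s]
    return tags

def bigram_lm(tag_sequences):
    """Train an add-one-smoothed bigram model over projected tag
    sequences; returns a function p(next_tag | prev_tag)."""
    unigrams, bigrams = Counter(), Counter()
    vocab = set()
    for seq in tag_sequences:
        seq = ["<s>"] + seq
        vocab.update(seq)
        for a, b in zip(seq, seq[1:]):
            unigrams[a] += 1
            bigrams[(a, b)] += 1
    V = len(vocab)
    return lambda a, b: (bigrams[(a, b)] + 1) / (unigrams[a] + V)
```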

Deep Syntax-Based Features
Tree-to-string
  Uses the syntax-based model we saw previously
  Reduces computational cost by limiting the size of reorderings
  Adds features for the probability defined by the model and for the probability of the Viterbi alignment under the model
JHU WS 2003 Syntax for Statistical Machine Translation Final Report

Deep Syntax-Based Features
Tree-to-tree
  Uses tree transformation functions similar to those in the tree-to-string model
  The probability of transforming a source tree into a target tree is modeled as a sequence of steps, starting from the root of the target tree and working down.
JHU WS 2003 Syntax for Statistical Machine Translation Final Report

Tree-to-tree, cont.
At each level of the tree:
1. At most one of the current node's children is grouped with the current node into a single elementary tree, with its probability conditioned on the current node and its children.
2. An alignment of the children of the current elementary tree is chosen, with its probability conditioned on the current node and the children of the child in the elementary tree.
This is similar to the reorder operation in the tree-to-string model, but allows for node addition and removal.
Leaf-level parameters are ignored when calculating the tree-to-tree probability.
JHU WS 2003 Syntax for Statistical Machine Translation Final Report

Verb Arguments
Idea: a feature that counts the difference in the number of arguments to the main verb between the Chinese and English sentences.
Perform a breadth-first traversal of the dependency trees:
  Mark the first verb encountered as the main verb
  The number of arguments equals the number of its children
JHU WS 2003 Syntax for Statistical Machine Translation Final Report
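The traversal above can be sketched directly. The dependency-tree representation, (tag, children) tuples, and the "tag starts with V" verb test are assumptions for this sketch, not the workshop's actual data structures.

```python
from collections import deque

def main_verb_args(root, is_verb=lambda tag: tag.startswith("V")):
    """Breadth-first search for the first verb; its child count is
    taken as the number of arguments. Nodes are (tag, children)."""
    queue = deque([root])
    while queue:
        tag, children = queue.popleft()
        if is_verb(tag):
            return len(children)
        queue.extend(children)
    return 0

def verb_arg_feature(en_tree, zh_tree):
    """The feature: English argument count minus Chinese argument count."""
    return main_verb_args(en_tree) - main_verb_args(zh_tree)
```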

Performance
Some things helped, some things didn't.
Is syntax useful? Necessary?

References
K. Yamada and K. Knight. 2001. A syntax-based statistical translation model. In Proc. of ACL-01.
K. Yamada. 2002. A Syntax-Based Statistical Translation Model. Ph.D. thesis, University of Southern California.
K. Yamada and K. Knight. 2002. A decoder for syntax-based MT. In Proc. of the 40th Annual Meeting of the Association for Computational Linguistics (ACL), Philadelphia, PA.
Franz Josef Och, Daniel Gildea, Sanjeev Khudanpur, Anoop Sarkar, Kenji Yamada, Alex Fraser, Shankar Kumar, Libin Shen, David Smith, Katherine Eng, Viren Jain, Zhen Jin, and Dragomir Radev. 2004. A smorgasbord of features for statistical machine translation. In Proceedings of the Human Language Technology Conference / North American Chapter of the Association for Computational Linguistics Annual Meeting (HLT-NAACL).
Franz Josef Och, Daniel Gildea, Sanjeev Khudanpur, Anoop Sarkar, Kenji Yamada, Alex Fraser, Shankar Kumar, Libin Shen, David Smith, Katherine Eng, Viren Jain, Zhen Jin, and Dragomir Radev. 2003. Final Report of the Johns Hopkins 2003 Summer Workshop on Syntax for Statistical Machine Translation.
Philipp Koehn, Franz Josef Och, and Daniel Marcu. 2003. Statistical phrase-based translation. In Proceedings of the Human Language Technology Conference / North American Chapter of the Association for Computational Linguistics Annual Meeting (HLT-NAACL).