Daniel Gildea (2003): Loosely Tree-Based Alignment for Machine Translation
Linguistics 580 (Machine Translation)
Scott Drellishak, 2/21/2006

Overview
- Gildea presents an alignment model he describes as "loosely tree-based"
- Builds on Yamada & Knight (2001), a tree-to-string model
- Gildea extends it with a clone operation, and also into a tree-to-tree model
- Wants to keep performance reasonable (polynomial in sentence length)

- Background
- Tree-to-String Model
- Tree-to-Tree Model
- Experiment

Background
- Historically, two approaches to MT: transfer-based and statistical
- More recently, though, hybrids
- Probabilistic models of structured representations:
  - Wu (1997): Stochastic Inversion Transduction Grammars
  - Alshawi et al. (2000): Head Transducers
  - Yamada & Knight (2001) (see below)

Gildea’s Proposal
- Need to handle drastic changes to trees (real bitexts aren't isomorphic)
- To do this, Gildea adds a new operation to Y&K's model: subtree clone
- This operation clones a subtree from the source tree to anywhere in the target tree
- Gildea also proposes a tree-to-tree model that uses parallel tree corpora

- Background
- Tree-to-String Model
- Tree-to-Tree Model
- Experiment

Yamada and Knight (2001)
- Y&K's model is tree-to-string: the input is a tree and the output is a string of words
- (Gildea compares it to an "Alexander Calder mobile". Calder invented that kind of hanging sculpture, which resembles Y&K's model because each node of the tree can turn either backwards or forwards. Visualize!)

Y&K Tree-to-String Model
Three steps turn the input into the output:
1. Reorder the children of each node (for m children, m! orderings; conditioned only on the category of the node and its children)
2. Optionally insert words at each node, either before or after all the children (conditioned only on the foreign word)
3. Translate the words at the leaves (conditioned on P(f|e); words can translate to NULL)
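As a concrete illustration, here is a toy Python sketch (not Y&K's or Gildea's code) that applies the three operations in that order to a small English tree. The tree labels, the lookup tables, and the hypothetical Japanese output are all made up for illustration, and the sketch simply takes the single most likely choice at each step instead of defining a full distribution.

```python
# Toy sketch of the Y&K channel operations: reorder, insert, translate.
# All tables below are hypothetical placeholders, not trained parameters.
class Node:
    def __init__(self, label, children=(), word=None):
        self.label, self.children, self.word = label, list(children), word

REORDER = {("VP", ("VB", "NP-OBJ")): ("NP-OBJ", "VB")}            # SVO-to-SOV flip
INSERT  = {"NP-SBJ": ("wa", "after"), "NP-OBJ": ("o", "after")}   # case markers
TTABLE  = {"eats": "taberu", "fish": "sakana", "John": "jon"}

def transform(node):
    """Return the target-language word list for this subtree."""
    if node.word is not None:                            # 3. translate the leaf
        return [TTABLE.get(node.word, node.word)]
    default = tuple(c.label for c in node.children)
    order = REORDER.get((node.label, default), default)  # 1. reorder the children
    by_label = {c.label: c for c in node.children}
    words = []
    for lab in order:
        words.extend(transform(by_label[lab]))
    if node.label in INSERT:                             # 2. optionally insert a word
        w, pos = INSERT[node.label]
        words = words + [w] if pos == "after" else [w] + words
    return words

tree = Node("S", [Node("NP-SBJ", [Node("NNP", word="John")]),
                  Node("VP", [Node("VB", word="eats"),
                              Node("NP-OBJ", [Node("NN", word="fish")])])])
print(" ".join(transform(tree)))    # -> "jon wa sakana o taberu"
```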

Aside: Y&K Suitability
- Recall that this model was used for translating English to Japanese
- Their model is well-suited to this language pair:
  - Japanese is SOV, while English is SVO. Japanese is also generally head-last where English is head-first. Reordering handles both of these.
  - Japanese marks subjects/topics and objects with postpositions. Insertion handles this.

Y&K EM Algorithm
The EM algorithm estimates inside probabilities β bottom-up:

for all nodes ε_i in input tree T do
    for all k, l such that 1 ≤ k ≤ l ≤ N do
        for all orderings ρ of the children ε_1 … ε_m of ε_i do
            for all partitions of span k, l into k_1, l_1 … k_m, l_m do
                (accumulate β(ε_i, k, l) over this ordering and partition)
            end for
        end for
    end for
end for
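The Python sketch below is my simplified reconstruction of this dynamic program's loop structure, not the paper's implementation: it drops the insertion step and scores every reordering uniformly at 1/m!, so only the nesting over nodes, spans, orderings, and partitions mirrors the pseudocode.

```python
# Simplified inside-probability pass for a tree-to-string model (sketch only).
from itertools import permutations
from math import factorial

class Node:
    def __init__(self, label, children=(), word=None):
        self.label, self.children, self.word = label, list(children), word

def inside(node, f, ttable, beta):
    """Fill beta[(id(node), k, l)] ~ P(f[k:l] | subtree rooted at node)."""
    for child in node.children:                        # children first (bottom-up)
        inside(child, f, ttable, beta)
    N = len(f)
    for k in range(N + 1):
        for l in range(k, N + 1):                      # every span f[k:l]
            if node.word is not None:                  # leaf: translate (or NULL)
                target = " ".join(f[k:l])              # "" stands for NULL
                beta[(id(node), k, l)] = ttable.get(node.word, {}).get(target, 0.0)
            else:
                m = len(node.children)
                beta[(id(node), k, l)] = sum(
                    (1.0 / factorial(m)) * split(order, k, l, beta)
                    for order in permutations(node.children))      # all reorderings

def split(children, k, l, beta):
    """Sum over ways to partition f[k:l] among the ordered children."""
    if not children:
        return 1.0 if k == l else 0.0
    head, rest = children[0], children[1:]
    return sum(beta[(id(head), k, mid)] * split(rest, mid, l, beta)
               for mid in range(k, l + 1))

# Toy usage with made-up translation probabilities.
ttable = {"eats": {"taberu": 0.9, "": 0.1}, "fish": {"sakana": 0.9, "": 0.1}}
tree = Node("VP", [Node("VB", word="eats"), Node("NN", word="fish")])
f = ["sakana", "taberu"]
beta = {}
inside(tree, f, ttable, beta)
print(beta[(id(tree), 0, len(f))])   # 0.405 with the toy numbers above
```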

Y&K Performance
- Computational complexity is O(|T| N^(m+2)), where T = tree, N = input length, m = fan-out of the grammar
- "By storing partially complete arcs in the chart and interleaving the inner two loops", this improves to O(|T| n^3 m! 2^m)
- Gildea says "exponential in m" (looks factorial to me) but polynomial in N/n
- If |T| is O(n), then the whole thing is O(n^4)

Y&K Drawbacks
- No alignments with crossing brackets: given the source tree A → (B, Z) with B → (X, Y), the target orders X Z Y and Y Z X are impossible, because each node's children are reordered independently
- Recall that Y&K flatten trees to avoid some of this, but that doesn't catch all cases
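A quick Python check of this limitation (my illustration, not code from the paper): permuting the children of A and of B independently reaches only four of the six orders of X, Y, Z, while grouping the two levels together, as the elementary trees introduced later do, reaches all six.

```python
# Enumerate the word orders reachable from A -> (B, Z), B -> (X, Y).
from itertools import permutations

def per_node_orders():
    """Orders reachable by independently permuting each node's children."""
    out = set()
    for b_kids in permutations(["X", "Y"]):                 # reorder under B
        for a_kids in permutations([list(b_kids), "Z"]):    # reorder under A
            flat = []
            for item in a_kids:
                flat.extend(item if isinstance(item, list) else [item])
            out.add(tuple(flat))
    return out

def elementary_tree_orders():
    """Orders reachable once A and B form one unit over children X, Y, Z."""
    return set(permutations(["X", "Y", "Z"]))

print(sorted(per_node_orders()))         # XZY and YZX are missing
print(sorted(elementary_tree_orders()))  # all 6 orders are reachable
```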

Adding Clone
- Gildea adds a clone operation to Y&K's model
- For each node, allow the insertion of a clone of another node as its child
- The probability of cloning ε_i under ε_j is computed in two steps: the choice to insert a clone, and the choice of which node to clone
- P_clone is a single estimated number; P_makeclone is constant (all nodes equally probable, and nodes can be reused)
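Spelled out, the two-step decomposition in these bullets amounts to something like the following (my reconstruction from this slide, not a formula copied from the paper; n stands for the number of candidate nodes available to clone):

```latex
P(\text{clone } \varepsilon_i \text{ under } \varepsilon_j)
  = P_{\text{clone}} \cdot P_{\text{makeclone}}(\varepsilon_i),
\qquad
P_{\text{makeclone}}(\varepsilon_i) = \frac{1}{n}
```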

- Background
- Tree-to-String Model
- Tree-to-Tree Model
- Experiment

Tree-to-Tree Model
- The output is a tree, not a string, and it must match the tree in the target corpus
- Add two new transformation operations:
  - one source node → two target nodes
  - two source nodes → one target node
- The result is "a synchronous tree substitution grammar, with probabilities parameterized to generate the target tree conditioned on the structure of the source tree."

Calculating Probability
The model works from the root down. At each level:
- At most one of a node's children may be grouped with it, forming an elementary tree (conditioned on the current node and its CFG rule children)
- An alignment of the elementary tree is chosen (conditioned as above). This is like Y&K reordering except that (1) the alignment can include insertions and deletions, and (2) two nodes grouped together are reordered together
- Lexical leaves are translated as before

Elementary Trees?
Elementary trees allow the alignment of trees with different depths. Treating A and B as one e-tree means their children are reordered together: for the source tree A → (B, Z) with B → (X, Y), the combined children X, Y, Z can now be reordered to, e.g., X Z Y.

EM Algorithm
Estimates inside probabilities β bottom-up:

for all nodes ε_a in source tree T_a in bottom-up order do
    for all elementary trees t_a rooted in ε_a do
        for all nodes ε_b in target tree T_b in bottom-up order do
            for all elementary trees t_b rooted in ε_b do
                for all alignments α of the children of t_a and t_b do
                    (accumulate β(ε_a, ε_b) over this pair of elementary trees and alignment)
                end for
            end for
        end for
    end for
end for
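As a companion, here is a heavily simplified Python sketch of the node-pair inside pass (again my reconstruction, not Gildea's trainer). Elementary trees, insertions, deletions, and clones are all omitted, and the children of a node pair must align one-to-one either in order or inverted, each with probability 1/2, so only the two bottom-up loops and the innermost alignment loop correspond to the pseudocode.

```python
# Simplified tree-to-tree inside pass over pairs of (source, target) nodes.
class Node:
    def __init__(self, label, children=(), word=None):
        self.label, self.children, self.word = label, list(children), word

def nodes_bottom_up(node):
    """Yield the nodes of a tree, children before parents."""
    for child in node.children:
        yield from nodes_bottom_up(child)
    yield node

def inside_t2t(src_root, tgt_root, ttable):
    """beta[(id(a), id(b))] ~ P(target subtree at b | source subtree at a)."""
    beta = {}
    for ea in nodes_bottom_up(src_root):
        for eb in nodes_bottom_up(tgt_root):
            if ea.word is not None and eb.word is not None:        # leaf pair
                beta[(id(ea), id(eb))] = ttable.get(ea.word, {}).get(eb.word, 0.0)
            elif len(ea.children) == len(eb.children) == 2:        # binary pair
                a1, a2 = ea.children
                b1, b2 = eb.children
                straight = beta[(id(a1), id(b1))] * beta[(id(a2), id(b2))]
                inverted = beta[(id(a1), id(b2))] * beta[(id(a2), id(b1))]
                beta[(id(ea), id(eb))] = 0.5 * (straight + inverted)
            else:                                                  # shape mismatch
                beta[(id(ea), id(eb))] = 0.0
    return beta

# Toy usage: English order against an inverted target-tree order.
ttable = {"John": {"jon": 0.9}, "eats": {"taberu": 0.8}}
src = Node("S", [Node("NNP", word="John"), Node("VB", word="eats")])
tgt = Node("S", [Node("VB", word="taberu"), Node("NNP", word="jon")])
beta = inside_t2t(src, tgt, ttable)
print(beta[(id(src), id(tgt))])      # 0.5 * (0 + 0.9 * 0.8) = 0.36
```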

Performance
- The outer two loops are O(|T|^2)
- Elementary trees include at most one child, so choosing e-trees is O(m^2)
- Alignment is O(2^(2m))
- Choosing which nodes to insert or clone is O(2^(2m))
- Choosing how to reorder is O((2m)!)
- Overall: O(|T|^2 m^2 4^(2m) (2m)!), quadratic (!) in the size of the input sentence
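Grouping the per-slide factors (nothing new, just the bullets above multiplied out):

```latex
\underbrace{|T|^{2}}_{\text{outer loops}}
\cdot \underbrace{m^{2}}_{\text{e-tree choices}}
\cdot \underbrace{2^{2m}}_{\text{alignments}}
\cdot \underbrace{2^{2m}}_{\text{insert/clone choices}}
\cdot \underbrace{(2m)!}_{\text{reorderings}}
= O\!\left(|T|^{2}\, m^{2}\, 4^{2m}\, (2m)!\right)
```

Since m is fixed by the grammar, only the |T|^2 factor grows with the input, hence quadratic in sentence length.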

Tree-to-Tree Clone
- Allowing m-to-n matching of up to two nodes (e-trees) allows only "limited non-isomorphism"
- So, as before, add a clone operation
- The algorithm is unchanged, except that alignments may now include cloned subtrees, with the same (uniform) probability as in tree-to-string

- Background
- Tree-to-String Model
- Tree-to-Tree Model
- Experiment

The Data
- Parallel Korean-English corpus
- Trees annotated by hand on both sides
- "in this paper we will be using only the Korean trees, modeling their transformation into the English text."
- (That can't be right: only true for TTS?)
- 5083 sentences: 4982 training, 101 evaluation

Aside: Suitability
- Recall that Y&K's model was suited to the English-to-Japanese task
- Gildea is going to compare their model to his, but using a Korean-English corpus. Is that fair?
- In a word, yes. Korean and Japanese are syntactically very similar: agglutinative, head-last (so similar that syntax is the main argument that they're related).

Results
Alignment Error Rate (Och & Ney, 2000):

Model                      AER
IBM Model 1                .37
IBM Model 2                .35
IBM Model 3                .43
Tree-to-String             .42
TTS + clone                .36
TTS + clone, P_ins = .5    .32
Tree-to-Tree               .49
TTT + clone                .36

Results Detailed
- The lexical probabilities come from IBM Model 1, and the node reordering probabilities are initialized to uniform
- Best results when P_ins is set to 0.5 rather than estimated (!)
- "While the model learned by EM tends to overestimate the total number of aligned word pairs, fixing a higher probability for insertions results in fewer total aligned pairs and therefore a better trade-off between precision and recall"

How’d TTS and TTT Do?
- The best results were with tree-to-string, surprisingly
- Y&K + clone was roughly equal to the IBM models; fixing P_ins was best overall
- Tree-to-tree + clone was roughly equal to the IBM models, but it was much more efficient to train (since it's quadratic instead of quartic)
- Still, disappointing results for TTT

Conclusions
- The model allows syntactic information to be used for training without ordering constraints
- Clone operations improve alignment results
- Tree-to-tree + clone is better only in (training) performance, not accuracy (but he's hopeful)
- Future directions: bigger corpora, conditioning on lexicalized trees