1
Daniel Gildea (2003): Loosely Tree-Based Alignment for Machine Translation
Linguistics 580 (Machine Translation)
Scott Drellishak, 2/21/2006
2
Overview
- Gildea presents an alignment model he describes as “loosely tree-based”
- Builds on Yamada & Knight (2001), a tree-to-string model
- Gildea extends it with a clone operation, and also into a tree-to-tree model
- Wants to keep performance reasonable (polynomial in sentence length)
3
- Background
- Tree-to-String Model
- Tree-to-Tree Model
- Experiment
4
Background
- Historically, two approaches to MT: transfer-based and statistical
- More recently, though, hybrids
- Probabilistic models of structured representations:
  - Wu (1997): Stochastic Inversion Transduction Grammars
  - Alshawi et al. (2000): Head Transducers
  - Yamada & Knight (2001) (see below)
5
Gildea’s Proposal
- Need to handle drastic changes to trees (real bitexts aren’t isomorphic)
- To do this, Gildea adds a new operation to Y&K’s model: subtree clone
- This operation clones a subtree from the source tree to anywhere in the target tree
- Gildea also proposes a tree-to-tree model that uses parallel tree corpora
6
- Background
- Tree-to-String Model
- Tree-to-Tree Model
- Experiment
7
Yamada and Knight (2001)
- Y&K’s model is tree-to-string: the input is a tree and the output is a string of words
- (Gildea compares it to an “Alexander Calder mobile”. Calder invented that kind of hanging sculpture, and the comparison fits Y&K’s model because, like a mobile’s arms, each node of the tree can swivel to reverse the order of its children. Visualize!)
8
Y&K Tree-to-String Model
Three steps turn input into output:
1. Reorder the children of each node (for m children, m! possible orderings; conditioned only on the category of the node and its children)
2. Optionally insert words at each node, either before or after all the children (conditioned only on the foreign word)
3. Translate the words at the leaves (conditioned on P(f|e); words can translate to NULL)
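A minimal sketch of this three-step generative story on a toy English-to-Japanese example. The tree, the probability tables, and all the words below are invented for illustration (the real model estimates the tables with EM), and the sketch deterministically follows the most probable choice at each step:

```python
# Toy sketch of Y&K's three-step generative process; all tables are made up.

def argmax(dist):
    """Most probable outcome from a {outcome: probability} table."""
    return max(dist, key=dist.get)

# Hypothetical input tree: (VB (PRP he) (VB1 adores) (VB2 listening))
tree = ("VB", (("PRP", "he"), ("VB1", "adores"), ("VB2", "listening")))

P_reorder = {"VB": {(2, 0, 1): 0.6, (0, 1, 2): 0.4}}           # step 1
P_insert = {"PRP": {("wa", "right"): 0.6, (None, None): 0.4}}  # step 2
P_translate = {"he": {"kare": 0.9, None: 0.1},                 # step 3: P(f|e)
               "adores": {"daisuki": 1.0},
               "listening": {"kiku": 1.0}}

def generate(node):
    label, body = node
    if isinstance(body, str):                         # leaf: translate (step 3)
        f = argmax(P_translate.get(body, {body: 1.0}))
        words = [] if f is None else [f]              # NULL deletes the word
    else:                                             # internal: reorder (step 1)
        order = argmax(P_reorder.get(label, {tuple(range(len(body))): 1.0}))
        words = [w for i in order for w in generate(body[i])]
    ins, side = argmax(P_insert.get(label, {(None, None): 1.0}))
    if ins is not None:                               # insert a word (step 2)
        words = words + [ins] if side == "right" else [ins] + words
    return words

print(" ".join(generate(tree)))   # -> "kiku kare wa daisuki"
```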
9
Aside: Y&K Suitability
- Recall that this model was used for translating English to Japanese
- Their model is well-suited to this language pair:
  - Japanese is SOV, while English is SVO. Japanese is also generally head-last where English is head-first. Reordering handles both of these.
  - Japanese marks subjects/topics and objects with postpositions. Insertion handles this.
10
Y&K EM Algorithm
The EM algorithm estimates inside probabilities β bottom-up:

for all nodes ε_i in input tree T do
  for all k, l such that 1 ≤ k ≤ l ≤ N do
    for all orderings ρ of the children ε_1 … ε_m of ε_i do
      for all partitions of span k, l into k_1, l_1 … k_m, l_m do
        β_i(k, l) += P(ρ | ε_i) · Π_j β_j(k_j, l_j)
      end for
    end for
  end for
end for
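A runnable sketch of this inside computation on a toy tree and target string. Everything here is assumed for illustration: the reordering distribution is uniform, the insertion step is omitted to keep the recursion short, and the tree and lexical table are the same invented toys as before:

```python
import math
from itertools import permutations

# Toy inside probabilities (beta) for the tree-to-string model, no insertion.
target = ["kiku", "kare", "daisuki"]               # hypothetical target words
P_trans = {("he", "kare"): 0.9, ("adores", "daisuki"): 0.8,
           ("listening", "kiku"): 0.7}             # toy P(f|e)
tree = ("VB", (("PRP", "he"), ("VB1", "adores"), ("VB2", "listening")))

def beta(node, k, l):
    """Inside probability that `node` generates target words k..l-1."""
    label, body = node
    if isinstance(body, str):                      # leaf covers exactly one word
        return P_trans.get((body, target[k]), 0.0) if l - k == 1 else 0.0
    total = 0.0
    for order in permutations(body):               # all m! reorderings rho
        p_order = 1.0 / math.factorial(len(body))  # uniform (toy) P(rho | node)
        total += p_order * split(order, k, l)
    return total

def split(children, k, l):
    """Sum over all partitions of span [k, l) among the ordered children."""
    if not children:
        return 1.0 if k == l else 0.0
    return sum(beta(children[0], k, mid) * split(children[1:], mid, l)
               for mid in range(k, l + 1))

print(beta(tree, 0, len(target)))                  # approx. 0.084
```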
11
Y&K Performance
- Computational complexity is O(|T| N^(m+2)), where T = tree, N = input length, m = fan-out of the grammar
- “By storing partially complete arcs in the chart and interleaving the inner two loops”, this improves to O(|T| n^3 m! 2^m)
- Gildea says “exponential in m” (looks factorial to me) but polynomial in N/n
- If |T| is O(n), then the whole thing is O(n^4)
12
Y&K Drawbacks
- No alignments with crossing brackets:
  [Figure: tree A → (B Z), B → (X Y), yielding the string XYZ]
  XZY and YZX are impossible
- Recall that Y&K flatten trees to avoid some of this, but that doesn’t catch all cases
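A quick way to see the gap: enumerate every string that independent child reordering can reach from the slide’s tree. This sketch uses the toy tree A → (B Z), B → (X Y); it illustrates the constraint and is not Gildea’s code:

```python
from itertools import permutations

# Toy tree from the slide: leaves are marked with children = None.
tree = ("A", [("B", [("X", None), ("Y", None)]), ("Z", None)])

def yields(node):
    """Set of terminal strings reachable by independently permuting
    the children of every internal node."""
    label, children = node
    if children is None:
        return {label}
    results = set()
    for order in permutations(children):
        # combine every reachable string of each child, in this order
        partial = {""}
        for child in order:
            partial = {p + s for p in partial for s in yields(child)}
        results |= partial
    return results

print(sorted(yields(tree)))   # ['XYZ', 'YXZ', 'ZXY', 'ZYX'] -- no XZY or YZX
```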
13
Adding Clone
- Gildea adds a clone operation to Y&K’s model
- For each node, allow the insertion of a clone of another node as its child
- Probability of cloning ε_i under ε_j in two steps:
  - Choice to insert: P_ins(clone | ε_j) = P_clone
  - Node to clone: P_makeclone(ε_i | clone) = 1/n, uniform over the n source-tree nodes
- P_clone is one estimated number; P_makeclone is constant (all nodes equally probable, and a node can be cloned more than once)
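A tiny numeric illustration of that two-step factorization, with made-up values for the only two quantities involved (P_clone would be estimated by EM; n is the source-tree size):

```python
# Made-up values: the clone model has just these two degrees of freedom.
P_clone = 0.1   # estimated insertion probability, a single number for the model
n = 12          # nodes in the source tree; each is equally likely to be cloned

p_clone_here = P_clone * (1.0 / n)   # P(insert a clone of a *given* node)
print(f"{p_clone_here:.4f}")         # 0.0083
```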
14
- Background
- Tree-to-String Model
- Tree-to-Tree Model
- Experiment
15
Tree-to-Tree Model
- Output is a tree, not a string, and it must match the tree in the target corpus
- Add two new transformation operations:
  - one source node → two target nodes
  - two source nodes → one target node
- “a synchronous tree substitution grammar, with probabilities parameterized to generate the target tree conditioned on the structure of the source tree.”
16
Calculating Probability
From the root down. At each level:
- At most one of the node’s children is grouped with it, forming an elementary tree (conditioned on the current node and the CFG rule over its children)
- An alignment of the e-tree is chosen (conditioned as above). Like Y&K reordering except: (1) the alignment can include insertions and deletions; (2) two nodes grouped together are reordered together
- Lexical leaves are translated as before
17
Elementary Trees?
Elementary trees allow the alignment of trees with different depths. Treat A and B as an e-tree and reorder their children together:
[Figure: A → (B Z) with B → (X Y) becomes the flat A → (X Z Y)]
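Extending the earlier reachability sketch: allow a node to be grouped with one internal child into an elementary tree, pooling that child’s children with the remaining siblings before reordering. On the same toy tree, the previously unreachable XZY and YZX now appear (a sketch of the idea, not the paper’s algorithm):

```python
from itertools import permutations

tree = ("A", [("B", [("X", None), ("Y", None)]), ("Z", None)])

def yields(node):
    """Strings reachable when a node may merge with one child into an e-tree."""
    label, children = node
    if children is None:
        return {label}
    results = reorder(children)               # plain node-by-node reordering
    for i, (_, cchildren) in enumerate(children):
        if cchildren is not None:             # merge with internal child i:
            pooled = children[:i] + children[i + 1:] + cchildren
            results |= reorder(pooled)        # reorder pooled children together
    return results

def reorder(children):
    """All strings from permuting `children` and recursing into each."""
    out = set()
    for order in permutations(children):
        partial = {""}
        for child in order:
            partial = {p + s for p in partial for s in yields(child)}
        out |= partial
    return out

print(sorted(yields(tree)))   # all six orders, including 'XZY' and 'YZX'
```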
18
EM Algorithm
Estimates inside probabilities β bottom-up:

for all nodes ε_a in source tree T_a in bottom-up order do
  for all elementary trees t_a rooted in ε_a do
    for all nodes ε_b in target tree T_b in bottom-up order do
      for all elementary trees t_b rooted in ε_b do
        for all alignments α of the children of t_a and t_b do
          β(ε_a, ε_b) += P(t_b, α | t_a) · Π β(child pairs aligned by α)
        end for
      end for
    end for
  end for
end for
19
Performance
- Outer two loops are O(|T|^2)
- Elementary trees include at most one child, so choosing e-trees is O(m^2)
- Alignment is O(2^(2m))
- Which nodes to insert or clone is O(2^(2m))
- How to reorder is O((2m)!)
- Overall: O(|T|^2 m^2 4^(2m) (2m)!), quadratic (!) in the size of the input sentence
20
Tree-to-Tree Clone
- Allowing m-to-n matching of up to two nodes (e-trees) permits only “limited non-isomorphism”
- So, as before, add a clone operation
- The algorithm is unchanged, except alignments may now include cloned subtrees, with the same (uniform) probability as in the tree-to-string model
21
- Background
- Tree-to-String Model
- Tree-to-Tree Model
- Experiment
22
The Data
- Parallel Korean-English corpus
- Trees annotated by hand on both sides
- “in this paper we will be using only the Korean trees, modeling their transformation into the English text.”
- (That can’t be right: it’s only true for the tree-to-string model?)
- 5,083 sentences: 4,982 training, 101 evaluation
23
Aside: Suitability
- Recall that Y&K’s model was suited to the English-to-Japanese task
- Gildea is going to compare their model to his, but using a Korean-English corpus. Is that fair?
- In a word, yes. Korean and Japanese are syntactically very similar: agglutinative, head-last (so similar that syntax is the main argument that they’re related)
24
Results
Alignment Error Rate (Och & Ney, 2000):

Model                     AER
IBM Model 1               .37
IBM Model 2               .35
IBM Model 3               .43
Tree-to-String            .42
TTS + clone               .36
TTS + clone, P_ins = .5   .32
Tree-to-Tree              .49
TTT + clone               .36
25
Results Detailed
- Lexical probabilities come from IBM Model 1; node reordering probabilities are initialized to uniform
- Best results when P_ins is set to 0.5 rather than estimated (!)
- “While the model learned by EM tends to overestimate the total number of aligned word pairs, fixing a higher probability for insertions results in fewer total aligned pairs and therefore a better trade-off between precision and recall”
26
How’d TTS and TTT Do?
- Surprisingly, the best results were with tree-to-string
- Y&K + clone was ≈ the IBM models; fixing P_ins was best overall
- Tree-to-tree + clone was ≈ the IBM models, but much more efficient to train (quadratic instead of quartic)
- Still, disappointing results for TTT
27
Conclusions
- The model allows syntactic info to be used for training without ordering constraints
- Clone operations improve alignment results
- Tree-to-tree + clone wins only on training efficiency, not accuracy (but he’s hopeful)
- Future directions: bigger corpora, conditioning on lexicalized trees