Training Tree Transducers Authors: Jonathan Graehl, Kevin Knight Presented by Zhengbo Zhou 11/16/2018
Outline
– Finite State Transducers (FSTs) and R
– Trees and Regular Tree Grammars
– xR and Derivation Tree
– Inside-Outside algorithm and EM training
– Turning tree to string (xRS)
– Example and Related Work
– My thoughts/questions
Finite State Transducers (FSTs) Finite-state transducer, as we have already learned: [Figure: an FST with states q0 and q1 and arcs labeled a:x and b:y]
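As a reminder of how an FST pairs input and output strings, here is a minimal Python sketch. The arc topology (q0 to q1 on a:x, q1 back to q0 on b:y) and the choice of q0 as the only final state are assumptions made for illustration; the slide only tells us the state names and arc labels.

```python
# Hypothetical two-state FST. Which state each arc leaves and enters
# is an assumption, not something taken from the slide.
TRANSITIONS = {
    ("q0", "a"): ("q1", "x"),  # q0 --a:x--> q1
    ("q1", "b"): ("q0", "y"),  # q1 --b:y--> q0
}
START = "q0"
FINALS = {"q0"}  # assumed final state


def transduce(symbols):
    """Return the output string, or None if the input is rejected."""
    state, out = START, []
    for sym in symbols:
        if (state, sym) not in TRANSITIONS:
            return None  # no arc for this symbol in this state
        state, emitted = TRANSITIONS[(state, sym)]
        out.append(emitted)
    return "".join(out) if state in FINALS else None
```

Under these assumptions, transduce("abab") gives "xyxy", while "aa" is rejected.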
R transducer An R transducer compactly represents a potentially infinite set of input/output tree pairs, whereas an FST compactly represents such a set of input/output string pairs. R is a generalization of the FST.
Example of R Sentence: He drinks water. Parse tree: S(PRO(he), VP(V(drinks), NP(water)))
Example for R (cont.) Input (English order): S(PRO, VP(V, NP)), with leaves he, drinks, water. Rule 1: state q at the S root produces an output S whose three children are qleft.vp.v applied to the VP, qpro applied to the PRO, and qright.vp.np applied to the VP. Rules 2, 3, 4: these three states produce V, PRO, and NP respectively. Output (Arabic order): S(V, PRO, NP)
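The reordering example can be sketched in Python as a top-down recursion over (state, subtree) pairs. This is an illustrative sketch, not the paper's implementation: trees are nested tuples, weights are left out, and the state name "qcopy" is a hypothetical helper that copies a subtree unchanged.

```python
# A tree is a tuple: (label, child, child, ...); a leaf is (label,).
ENGLISH = ("S", ("PRO", ("he",)), ("VP", ("V", ("drinks",)), ("NP", ("water",))))


def apply_state(state, tree):
    """Apply the rule for (state, root label) and recurse top-down."""
    label, kids = tree[0], tree[1:]
    if state == "q" and label == "S":
        pro, vp = kids
        # Rule 1: the VP is visited twice, once per state, so it is copied.
        return ("S",
                apply_state("qleft.vp.v", vp),
                apply_state("qpro", pro),
                apply_state("qright.vp.np", vp))
    if state == "qleft.vp.v" and label == "VP":
        return apply_state("qcopy", kids[0])   # Rule 2: keep V, drop NP
    if state == "qpro" and label == "PRO":
        return apply_state("qcopy", tree)      # Rule 3: keep PRO
    if state == "qright.vp.np" and label == "VP":
        return apply_state("qcopy", kids[1])   # Rule 4: keep NP, drop V
    if state == "qcopy":
        return tree                            # hypothetical identity state
    raise ValueError(f"no rule for state {state!r} at {label!r}")


ARABIC = apply_state("q", ENGLISH)
```

Here ARABIC comes out as S(V(drinks), PRO(he), NP(water)). Note that rule 1 both copies the VP and lets each copy delete part of it, which is exactly the kind of movement a plain FST cannot express.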
Trees Definitions:
Regular Tree Grammars (RTG) A Regular Tree Grammar is a common way of compactly representing a potentially infinite set of trees. A weighted RTG (wRTG) plays the same role for trees that a WFSA plays for strings. wRTG G = (∑, N, S, P), where ∑ is the alphabet, N the set of nonterminals, S the start nonterminal, and P the set of weighted productions.
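To make the tuple (∑, N, S, P) concrete, here is a small Python sketch that stores P as a dictionary and computes the weight a wRTG assigns to a given tree by summing over the productions that match it. The grammar (one nonterminal "qe" generating binary trees of A nodes with B leaves) and the function names are hypothetical, and the sketch assumes there are no cycles of chain productions such as n -> n'.

```python
# P: nonterminal -> list of (rhs tree, weight).
# In an rhs, a bare string is a nonterminal; a tuple is a terminal node.
GRAMMAR = {
    "qe": [
        (("A", "qe", "qe"), 0.25),  # qe -> A(qe, qe)  with weight 0.25
        (("B",), 0.75),             # qe -> B          with weight 0.75
    ],
}
START = "qe"


def tree_weight(grammar, nonterminal, tree):
    """Sum, over productions of `nonterminal`, of the weight of deriving `tree`."""
    return sum(w * match(grammar, rhs, tree) for rhs, w in grammar[nonterminal])


def match(grammar, rhs, tree):
    """Weight of rewriting the rhs pattern into exactly `tree`."""
    if isinstance(rhs, str):                      # nonterminal: keep deriving
        return tree_weight(grammar, rhs, tree)
    if rhs[0] != tree[0] or len(rhs) != len(tree):
        return 0.0                                # label or arity mismatch
    result = 1.0
    for r_child, t_child in zip(rhs[1:], tree[1:]):
        result *= match(grammar, r_child, t_child)
    return result
```

With this grammar, the tree A(B, B) gets weight 0.25 × 0.75 × 0.75 = 0.140625, and a tree with an unknown label gets weight 0.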
Sample wRTG
Extended-LHS Tree Transducer (xR) Unlike R, xR explicitly represents lookahead and movement with a more specific LHS. Form of the LHS: The pattern is used to match an input subtree. Patterns are drawn from a set of finite tree patterns.
Binary Relation:
Derivation Tree There are many trees in play now. A derivation tree is neither the input tree nor the output tree: it represents how the transducer's rules were applied. A derivation tree deterministically produces a single weighted output tree.
Derivation tree & derivation wRTG X X’
Inside-Outside algorithm Basic idea of the inside-outside algorithm: use the current rule probabilities to estimate the expected frequencies of certain types of derivation steps, then compute new probabilities for those rules [1]. Roughly, the inside probability of a nonterminal A covers what A derives, either directly (A -> a) or through a rule such as A -> B C; the outside probability of A covers the context around A, through parent rules such as C -> A B or C -> B A.
Inside-Outside for wRTG Inside weights using G are given by βG: Outside weights αG:
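For reference, the usual inside and outside recursions for a wRTG G = (Σ, N, S, P) can be written as below. This is a sketch of the standard definitions, not a copy of the slide: the notation paths_r(N) (positions in r labeled by a nonterminal) and label_r(p) (the label at position p) is assumed here, and the outside case assumes S does not occur on any right-hand side.

```latex
\begin{align*}
\beta_G(n) &= \sum_{(n, r, w) \in P} w \cdot \beta_G(r), && n \in N \\
\beta_G(r) &= \prod_{p \in \mathrm{paths}_r(N)} \beta_G(\mathrm{label}_r(p)), && r \in T_\Sigma(N) \\
\alpha_G(S) &= 1 \\
\alpha_G(n) &= \sum_{\substack{(n', r, w) \in P \\ p \,:\, \mathrm{label}_r(p) = n}}
  w \cdot \alpha_G(n') \prod_{p' \in \mathrm{paths}_r(N) \setminus \{p\}} \beta_G(\mathrm{label}_r(p')), && n \neq S
\end{align*}
```

The inside weight sums over everything a nonterminal can derive; the outside weight sums over every context in which it can appear, multiplying in the inside weights of its siblings.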
EM training EM training (locally) maximizes the corpus likelihood by alternating two steps: estimating the expected count of each decision under the current parameters, then maximizing by assigning those counts to the parameters and renormalizing. Algorithm 2 implements EM training for xR by repeatedly computing inside-outside weights.
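One EM iteration over a single training example can be sketched in Python as follows. This is a simplified illustration rather than the paper's Algorithm 2: the derivation forest is assumed to be acyclic and already built, every rule weight is assumed positive, and the names forest, group_of, and the rule ids r1 to r3 are hypothetical.

```python
import math
from collections import defaultdict


def expected_counts(forest, start, weights):
    """E-step: expected rule counts for one acyclic derivation forest.

    forest maps a nonterminal to a list of (rule_id, child nonterminals).
    """
    beta, order = {}, []

    def inside(n):
        if n not in beta:
            beta[n] = sum(weights[r] * math.prod(inside(c) for c in kids)
                          for r, kids in forest[n])
            order.append(n)            # children are appended before parents
        return beta[n]

    inside(start)
    alpha = defaultdict(float)
    alpha[start] = 1.0
    counts = defaultdict(float)
    for n in reversed(order):          # parents are visited before children
        for r, kids in forest[n]:
            through = alpha[n] * weights[r] * math.prod(beta[c] for c in kids)
            counts[r] += through / beta[start]
            for c in kids:
                alpha[c] += through / beta[c]   # divide out c's own inside weight
    return dict(counts)


def m_step(counts, group_of):
    """M-step: renormalize counts within each normalization group."""
    totals = defaultdict(float)
    for r, c in counts.items():
        totals[group_of[r]] += c
    return {r: c / totals[group_of[r]] for r, c in counts.items()}


# Hypothetical forest: S is derived by r1 (which needs A) or by r2 alone.
FOREST = {"S": [("r1", ["A"]), ("r2", [])], "A": [("r3", [])]}
WEIGHTS = {"r1": 0.5, "r2": 0.5, "r3": 0.5}
COUNTS = expected_counts(FOREST, "S", WEIGHTS)
NEW_WEIGHTS = m_step(COUNTS, {"r1": "q", "r2": "q", "r3": "qa"})
```

In this toy forest the derivation through r1 and r3 has weight 0.25 and the one through r2 has weight 0.5, so r2 receives two thirds of the expected count. Rules that never appear in the forest get no count and are absent from the output; a full trainer would also sum counts over all training examples before renormalizing.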
From tree to string The Extended-LHS Tree Transducer (xR) maps an input tree (say, a parse tree) to an output tree, but the output is still a (parse) tree, not a sentence in another language, which is what machine translation needs. xRS, the tree-to-string transducer, fills this gap.
Tree-to-string transducer Weighted extended-lhs root-to-frontier tree-to-string transducer: X = (∑, Δ, Q, Qi, R). It is similar to xR, but the rhs of each rule is a string instead of a tree.
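A tree-to-string rule can be sketched in Python by returning a list of words instead of a tree. The two rules below are hypothetical and unweighted, with a single implicit state q; the first one also shows the extended LHS, since it matches S together with its VP child in one pattern.

```python
# A tree is a tuple: (label, child, child, ...); a leaf is (word,).
ENGLISH = ("S", ("PRO", ("he",)), ("VP", ("V", ("drinks",)), ("NP", ("water",))))


def to_string(tree):
    """Apply hypothetical xRS-style rules top-down and return a word list."""
    label, kids = tree[0], tree[1:]
    # Hypothetical rule: q S(x0, VP(x1, x2)) -> q x1  q x0  q x2
    if label == "S" and len(kids) == 2 and kids[1][0] == "VP" and len(kids[1]) == 3:
        x0, (_, x1, x2) = kids
        return to_string(x1) + to_string(x0) + to_string(x2)
    # Hypothetical rule: q POS(word) -> word, for any preterminal
    if len(kids) == 1 and len(kids[0]) == 1:
        return [kids[0][0]]
    raise ValueError(f"no rule matches a subtree rooted at {label!r}")


WORDS = to_string(ENGLISH)
```

Here WORDS is ["drinks", "he", "water"]: the same verb-first reordering as in the R example, but the output is a flat string rather than a tree.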
Example The translation model of Yamada and Knight (2001) is implemented as a trainable xRS tree-to-string transducer that embodies:
Example
Related Work
– TSG vs RTG (equivalent)
– xR vs weighted synchronous TSG (similar)
– EM training vs the forward-backward algorithm for finite-state (string) transducers and for HMMs
Questions
– Is there any follow-up work on this tree transducer, especially for machine translation? Precision? Recall?
– I am still a little confused by the descriptions of the two relations =>x and =>G.
– I am not very sure about the inside-outside algorithm.
Questions?
Thank you!!
Reference [1] Fernando Pereira and Yves Schabes. Inside-Outside Reestimation from Partially Bracketed Corpora. 1992.
What might be useful: Kevin Knight and Jonathan Graehl. An Overview of Probabilistic Tree Transducers for Natural Language Processing.
– R: Top-down transducer, introduced before.
– F: Bottom-up transducer (“Frontier-to-root”), with similar rules, but transforming the leaves of the input tree first and working its way up.
– L: Linear transducer, which prohibits copying subtrees. Rule 4 in Figure 4 is an example of a copying production, so this whole transducer is R but not RL.
– N: Non-deleting transducer, which requires that every left-hand-side variable also appear on the right-hand side. A deleting R-transducer can simply delete a subtree (without inspecting it). The transducer in Figure 4 is the deleting kind, because of rules 34-39. It would also be deleting if it included a rule for dropping English determiners, e.g., q NP(x0, x1) → q x1.
– D: Deterministic transducer, with a maximum of one production per <state, symbol> pair.
– T: Total transducer, with a minimum of one production per <state, symbol> pair.
– PDTT: Push-down tree transducer, the transducer analog of CFTG [36].
– subscript: Regular-lookahead transducer, which can check whether an input subtree is tree-regular, i.e., whether it belongs to a specified RTL. Productions only fire when their lookahead conditions are met.