Training Tree Transducers


1 Training Tree Transducers
Authors: Jonathan Graehl and Kevin Knight. Presented by Zhengbo Zhou.

2 Outline
– Finite State Transducers (FSTs) and R
– Trees and Regular Tree Grammars
– xR and Derivation Trees
– Inside-Outside algorithm and EM training
– Turning trees to strings (xRS)
– Example and Related Work
– My thoughts/questions

3 Finite State Transducers (FSTs)
A finite-state transducer, as we have seen, maps input strings to output strings: for example, two states q0 and q1 connected by arcs labeled a:x and b:y (read a, write x; read b, write y).
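As a minimal sketch (the arc directions below are an assumption, since only the states and labels survive from the slide's diagram):

```python
# A toy finite-state transducer: transitions map
# (state, input symbol) -> (output symbol, next state).
# Assumed arcs: q0 --a:x--> q1 and q1 --b:y--> q0.
TRANSITIONS = {
    ("q0", "a"): ("x", "q1"),
    ("q1", "b"): ("y", "q0"),
}

def transduce(s, start="q0"):
    """Run the FST over s, returning the output string (None if stuck)."""
    state, out = start, []
    for symbol in s:
        if (state, symbol) not in TRANSITIONS:
            return None  # no matching transition: input rejected
        piece, state = TRANSITIONS[(state, symbol)]
        out.append(piece)
    return "".join(out)

print(transduce("abab"))  # -> xyxy
```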

4 R transducer An R transducer compactly represent a potentially infinite set of input/output tree pairs. While a FST compactly represent such a set of input/output string pairs. R is a generalization of FST. 11/16/2018

5 Example of R The sentence "He drinks water", with parse tree S(PRO(he), VP(V(drinks), NP(water))).

6 Example for R, cont.
Rule 1 fires at the root: the state q at S(PRO, VP(V, NP)) is rewritten, sending new states down to copies of the subtrees, producing S(qleft.vp.v VP, qpro PRO, qright.vp.np VP). Rules 2, 3, and 4 then finish off the relabeled subtrees, turning English order S(PRO, VP(V, NP)) into Arabic order S(V, PRO, NP).
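As a minimal sketch of this reordering (the tree encoding and helper functions are mine; only the S(PRO, VP(V, NP)) → S(V, PRO, NP) reordering itself comes from the slide):

```python
# Trees as (label, [children]); leaves have empty child lists.
def tree(label, *children):
    return (label, list(children))

english = tree("S",
               tree("PRO", tree("he")),
               tree("VP", tree("V", tree("drinks")), tree("NP", tree("water"))))

def svo_to_vso(t):
    """One hand-coded reordering step, in the spirit of Rule 1:
    S(PRO, VP(V, NP)) -> S(V, PRO, NP)."""
    _, (pro, vp) = t            # unpack S's two children
    v, np = vp[1]               # unpack VP's two children
    return tree("S", v, pro, np)

print(svo_to_vso(english))
# ('S', [('V', [('drinks', [])]), ('PRO', [('he', [])]), ('NP', [('water', [])])])
```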

7 Trees Definitions: trees over an alphabet, their labels, paths, and subtrees.

8 Regular Tree Grammars (RTG)
A regular tree grammar is a common way of compactly representing a potentially infinite set of trees; a wRTG is to trees what a WFSA is to strings. A wRTG is G = (Σ, N, S, P), where Σ is the alphabet, N the nonterminals, S the start nonterminal, and P the weighted productions.
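As a minimal sketch (the grammar, encoding, and sampler below are all invented for illustration), the tuple (Σ, N, S, P) can be written as plain data, with a tiny sampler showing how one recursive production already yields infinitely many trees:

```python
import random

# A wRTG G = (Sigma, N, S, P) as plain data.  Trees are (label, [children]);
# a bare string belonging to N marks a nonterminal leaf to expand further.
N = {"qS", "qNP"}
S = "qS"
P = {
    "qS":  [(("S", ["qNP", ("runs", [])]), 1.0)],
    "qNP": [(("dog", []), 0.7),
            (("mod", ["qNP"]), 0.3)],   # recursion: infinitely many trees
}

def sample(nt):
    """Expand a nonterminal, picking a production with prob. ~ weight."""
    rhss, weights = zip(*P[nt])
    return expand(random.choices(rhss, weights)[0])

def expand(t):
    if isinstance(t, str):              # nonterminal leaf
        return sample(t)
    label, kids = t
    return (label, [expand(k) for k in kids])

print(sample(S))   # e.g. ('S', [('mod', [('dog', [])]), ('runs', [])])
```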

9 Sample wRTG

10 Extended-LHS Tree Transducer (xR)
xR differs from R in that lookahead and movement are represented explicitly by a more specific LHS. An LHS pairs a state with a tree pattern, drawn from a finite set of tree patterns; the pattern must match an input subtree for the rule to fire.
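As a minimal sketch of matching such a pattern against an input subtree (the match() helper and tree encoding are my own; variables are named "x0", "x1", ...):

```python
# Patterns are trees whose leaves may be variables "x0", "x1", ...
# match() returns {variable: matched subtree}, or None on failure.
def match(pattern, t, binds=None):
    binds = {} if binds is None else binds
    if isinstance(pattern, str):        # a variable matches any subtree
        binds[pattern] = t
        return binds
    plabel, pkids = pattern
    tlabel, tkids = t
    if plabel != tlabel or len(pkids) != len(tkids):
        return None
    for pk, tk in zip(pkids, tkids):
        if match(pk, tk, binds) is None:
            return None
    return binds

# An extended LHS may look several levels deep into the input:
pattern = ("S", ["x0", ("VP", ["x1", "x2"])])
inp = ("S", [("PRO", []), ("VP", [("V", []), ("NP", [])])])
print(match(pattern, inp))
# {'x0': ('PRO', []), 'x1': ('V', []), 'x2': ('NP', [])}
```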

11 Binary Relation The derivation relation ⇒X: one step of rewriting by the transducer's rules.

12 Derivation Tree There are many kinds of trees in play now, but the derivation tree is a record of the transducer's rule applications: it is neither the input tree nor the output tree. A derivation tree deterministically produces a single weighted output tree.
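As a minimal sketch of that determinism (rule names, rhs fragments, and the "<i>" hole notation are all invented for illustration):

```python
# A derivation tree is a tree of *rule names*; each rule's rhs is an output
# fragment with numbered holes "<i>" for the i-th child derivation's output.
RHS = {
    "rule1": ("S", ["<1>", "<0>"]),     # splice child 1 first, child 0 second
    "rule2": ("he", []),
    "rule3": ("VP", []),
}

def output_of(deriv):
    """Deterministically read an output tree off a derivation tree."""
    rule, kids = deriv
    outs = [output_of(k) for k in kids]
    return fill(RHS[rule], outs)

def fill(t, outs):
    if isinstance(t, str) and t.startswith("<"):
        return outs[int(t[1:-1])]       # a hole: splice in that child's output
    label, kids = t
    return (label, [fill(k, outs) for k in kids])

print(output_of(("rule1", [("rule2", []), ("rule3", [])])))
# ('S', [('VP', []), ('he', [])])
```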

13 Derivation tree & derivation wRTG

14 Inside-Outside algorithm
The basic idea of the inside-outside algorithm: use the current rule probabilities to estimate the expected frequencies of certain types of derivation steps, then compute new probabilities for those rules [1]. Roughly, the inside probability of a nonterminal A accounts for everything derived below it (e.g. through a production A → BC), while the outside probability accounts for the context above it (e.g. derivations using C → AB or C → BA).

15 Inside-Outside for wRTG
Following the paper's definitions, the inside weight of a nonterminal n sums over its productions: βG(n) = Σ_{(n,r,w)∈P} w · βG(r), where βG(t) for an rhs tree t is the product of βG over t's nonterminal-labeled positions. The outside weights αG satisfy αG(S) = 1, and for any other nonterminal n, αG(n) sums w · αG(n′) · Π βG(sibling nonterminals) over every production (n′, r, w) and every position in r labeled n.
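As a minimal sketch of the inside recursion on a toy acyclic grammar (grammar and encoding invented; a real implementation must memoize and handle cyclic grammars by fixpoint iteration):

```python
# Inside weights for a toy acyclic wRTG (same encoding as before: bare
# strings are nonterminal leaves).
P = {
    "qS": [(("S", ["qA", "qB"]), 1.0)],
    "qA": [(("a", []), 0.4), (("a'", []), 0.6)],
    "qB": [(("b", []), 1.0)],
}

def beta(n):
    """beta(n): sum over n's productions of weight times the rhs's inside."""
    return sum(w * beta_tree(rhs) for rhs, w in P[n])

def beta_tree(t):
    """Inside weight of an rhs tree: product over nonterminal positions."""
    if isinstance(t, str):
        return beta(t)
    _, kids = t
    result = 1.0
    for k in kids:
        result *= beta_tree(k)
    return result

print(beta("qS"))   # 1.0 * (0.4 + 0.6) * 1.0 = 1.0
```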

16 EM training EM training maximizes the corpus likelihood by repeatedly estimating the expected count of each decision and then maximizing: assigning those counts to the parameters and renormalizing. Algorithm 2 implements EM xR training by repeatedly computing inside-outside weights.
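As a minimal sketch of the EM loop's shape (this is not the paper's Algorithm 2: the toy corpus and its derivations are enumerated explicitly here, whereas Algorithm 2 obtains the same expectations from inside-outside weights):

```python
from collections import defaultdict
from math import prod

# Each training example is a list of candidate derivations; a derivation is
# the list of rules it uses.  A rule is (state, name): rules sharing a state
# form one normalization group.  Corpus invented for illustration.
CORPUS = [
    [[("q", "r1"), ("q", "r2")],     # example 1, derivation A
     [("q", "r3")]],                 # example 1, derivation B
    [[("q", "r1"), ("q", "r3")]],    # example 2, single derivation
]

def em(corpus, iterations=20):
    rules = {r for ex in corpus for d in ex for r in d}
    w = {r: 1.0 for r in rules}
    for _ in range(iterations):
        counts = defaultdict(float)
        for derivations in corpus:   # E-step: expected fractional rule counts
            scores = [prod(w[r] for r in d) for d in derivations]
            z = sum(scores)
            for d, s in zip(derivations, scores):
                for r in d:
                    counts[r] += s / z
        totals = defaultdict(float)  # M-step: renormalize per state
        for (state, _), c in counts.items():
            totals[state] += c
        w = {r: counts[r] / totals[r[0]] for r in rules}
    return w

print(em(CORPUS))
```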

17 From tree to string An extended-LHS tree transducer (xR) can turn an input tree (say, a parse tree) into an output tree, but the result is still a (parse) tree, not a sentence in the other language, which is what machine translation needs. For that there is xRS, the tree-to-string transducer.

18 Tree-to-string transducer
A weighted extended-LHS root-to-frontier tree-to-string transducer is X = (Σ, Δ, Q, Qi, R). It is similar to xR, but the right-hand sides are strings instead of trees.
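As a minimal sketch of what such a rule looks like (the state names, patterns, and the sample output word are all invented for illustration):

```python
# A toy xRS rule: the lhs is (state, tree pattern) as in xR, but the rhs is
# a *string*: a list mixing output words with (state, variable) items that
# expand in place.  This one flattens S(x0, VP(x1, x2)) into VSO word order.
RULE = {
    "state": "q",
    "pattern": ("S", ["x0", ("VP", ["x1", "x2"])]),
    "rhs": [("q", "x1"), ("q", "x0"), ("q", "x2")],   # V, then PRO, then NP
    "weight": 1.0,
}

# A leaf-level rule would then emit actual words, e.g. (hypothetically):
WORD_RULE = {
    "state": "q",
    "pattern": ("V", [("drinks", [])]),
    "rhs": ["yashrabu"],      # placeholder translation, illustration only
    "weight": 1.0,
}
```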

19 Example
The paper implements the translation model of (Yamada and Knight 2001): there is a trainable xRS tree-to-string transducer that embodies that model's reordering, word-insertion, and translation operations.

20 Example

21 Related Work
– TSG vs. RTG (equivalent)
– xR vs. weighted synchronous TSG (similar)
– EM training vs. the forward-backward algorithm for finite-state (string) transducers, and likewise for HMMs

22 Questions Is there any future work on this tree transducer, especially for machine translation? Precision? Recall? I am also a little confused by the descriptions of the two relations ⇒X and ⇒G, and not entirely sure about the inside-outside algorithm. Questions?

23 Thank you!!

24 Reference [1] Fernando Pereira and Yves Schabes. Inside-Outside Reestimation from Partially Bracketed Corpora. 1992.

25 What might be useful An Overview of Probabilistic Tree Transducers for Natural Language Processing, by Kevin Knight and Jonathan Graehl.

26
– R: Top-down transducer, introduced before.
– F: Bottom-up ("frontier-to-root") transducer, with similar rules, but transforming the leaves of the input tree first and working its way up.
– L: Linear transducer, which prohibits copying subtrees. Rule 4 in Figure 4 is an example of a copying production, so that whole transducer is R but not RL.
– N: Non-deleting transducer, which requires that every left-hand-side variable also appear on the right-hand side. A deleting R-transducer can simply delete a subtree (without inspecting it). The transducer in Figure 4 is the deleting kind because of some of its rules; it would also be deleting if it included a rule for dropping English determiners, e.g. q NP(x0, x1) → q x1.
– D: Deterministic transducer, with a maximum of one production per <state, symbol> pair.
– T: Total transducer, with a minimum of one production per <state, symbol> pair.
– PDTT: Push-down tree transducer, the transducer analog of CFTG [36].
– Subscript: Regular-lookahead transducer, which can check whether an input subtree is tree-regular, i.e. whether it belongs to a specified RTL. Productions fire only when their lookahead conditions are met.


