Training Tree Transducers
Authors: Jonathan Graehl, Kevin Knight
Presented by Zhengbo Zhou, 11/16/2018
Outline
– Finite State Transducers (FSTs) and R
– Trees and Regular Tree Grammars
– xR and Derivation Tree
– Inside-Outside algorithm and EM training
– Turning tree to string (xRS)
– Example and Related Work
– My thoughts/questions
Finite State Transducers (FSTs)
Finite-state transducer, as we have learned before:
[Figure: FST with states q0 and q1 and arcs labeled a:x and b:y]
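A minimal sketch of how an FST pairs input strings with output strings. The figure's exact topology is not recoverable, so the arcs below (q0 to q1 on a:x, a b:y self-loop on q1, q1 final) are assumptions built only from the slide's labels; `ARCS`, `START`, `FINAL`, and `transduce` are hypothetical names.

```python
from typing import Dict, List, Tuple

# Hypothetical arcs: (state, input symbol) -> list of (output symbol, next state).
# The real figure's arcs are unknown; these only reuse its labels.
ARCS: Dict[Tuple[str, str], List[Tuple[str, str]]] = {
    ("q0", "a"): [("x", "q1")],
    ("q1", "b"): [("y", "q1")],
}
START, FINAL = "q0", {"q1"}


def transduce(inp: str) -> List[str]:
    """Return every output string the FST pairs with `inp` (empty list if rejected)."""
    paths = [(START, "")]  # (current state, output so far); nondeterminism allowed
    for sym in inp:
        paths = [(nxt, out + o)
                 for state, out in paths
                 for o, nxt in ARCS.get((state, sym), [])]
    return [out for state, out in paths if state in FINAL]


print(transduce("abb"))  # ['xyy']
```

An R transducer generalizes exactly this string-to-string pairing to tree-to-tree pairs.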
R transducer
An R transducer compactly represents a potentially infinite set of input/output tree pairs, just as an FST compactly represents such a set of input/output string pairs. R is a generalization of FST from strings to trees.
Example of R
Input sentence: He drinks water
Parse tree: S(PRO(he), VP(V(drinks), NP(water)))
Example for R (cont.)
[Figure: R-transducer rules reordering the tree for "he drinks water"]
Rule 1 rewrites q S(PRO, VP(V, NP)) into S(qleft.vp.v VP, qpro PRO, qright.vp.np VP): it sends the VP subtree to two states (qleft.vp.v and qright.vp.np) and the PRO subtree to state qpro.
Rules 2, 3, 4 then rewrite these states into V, PRO, and NP.
English order: S(PRO, VP(V, NP))
Arabic order: S(V, PRO, NP)
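The rewriting in this example can be sketched as a tiny top-down transducer. The rule set is an assumption reconstructed from the state names on the slide (qleft.vp.v keeps the verb, qright.vp.np keeps the object, qpro keeps the pronoun), not the paper's exact rules; `Call`, `Node`, `RULES`, and `transduce` are hypothetical names, and words are copied unchanged as a simplification.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple, Union

Tree = Tuple[str, tuple]  # (label, children); a word is a leaf with no children


def tree(label: str, *kids: Tree) -> Tree:
    return (label, tuple(kids))


@dataclass(frozen=True)
class Call:
    """RHS leaf 'state x_i': process child i of the matched node in `state`."""
    state: str
    child: int


@dataclass(frozen=True)
class Node:
    """RHS output node with a label and sub-templates."""
    label: str
    kids: tuple


Template = Union[Call, Node]

# Hypothetical rules guessed from the slide's state names (not the paper's exact rules).
# The ("q", "S") rule copies the VP subtree: it is handed to two different states.
RULES: Dict[Tuple[str, str], Template] = {
    ("q", "S"): Node("S", (Call("qleft.vp.v", 1), Call("qpro", 0), Call("qright.vp.np", 1))),
    ("qleft.vp.v", "VP"): Call("q", 0),    # keep only the verb
    ("qright.vp.np", "VP"): Call("q", 1),  # keep only the object NP
    ("qpro", "PRO"): Node("PRO", (Call("q", 0),)),
    ("q", "V"): Node("V", (Call("q", 0),)),
    ("q", "NP"): Node("NP", (Call("q", 0),)),
}


def transduce(state: str, t: Tree) -> Tree:
    """Top-down: pick the rule for (state, root label), then recurse into children."""
    label, kids = t
    if not kids:  # simplification: words are copied unchanged
        return t
    return build(RULES[(state, label)], kids)


def build(tmpl: Template, kids: tuple) -> Tree:
    if isinstance(tmpl, Call):
        return transduce(tmpl.state, kids[tmpl.child])
    return (tmpl.label, tuple(build(k, kids) for k in tmpl.kids))


def words(t: Tree) -> List[str]:
    label, kids = t
    return [label] if not kids else [w for k in kids for w in words(k)]


english = tree("S", tree("PRO", tree("he")),
               tree("VP", tree("V", tree("drinks")), tree("NP", tree("water"))))
arabic = transduce("q", english)
print(words(arabic))  # ['drinks', 'he', 'water']
```

Handing the same VP subtree to two states is the copying behavior that the linear (L) restriction forbids.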
Trees
Definitions:
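Regular Tree Grammars (RTG)
A regular tree grammar (RTG) is a common way of compactly representing a potentially infinite set of trees. A weighted RTG (wRTG) is to trees what a weighted finite-state automaton (WFSA) is to strings.
A wRTG is a tuple G = (Σ, N, S, P):
– Σ: alphabet
– N: nonterminals
– S ∈ N: start nonterminal
– P: weighted productions, each rewriting a nonterminal into a tree over Σ whose leaves may also be nonterminals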
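The definition can be made concrete with a small recognizer: the weight a wRTG assigns to a tree is the sum, over all derivations of that tree, of the product of the production weights used. The toy grammar `P` and the helpers `weight` and `match` are hypothetical illustrations, not taken from the paper.

```python
from typing import Dict, List, Tuple

Tree = Tuple[str, tuple]  # (label, children)


def tree(label: str, *kids: Tree) -> Tree:
    return (label, tuple(kids))


# Toy wRTG G = (Σ, N, S, P), hypothetical. N = keys of P, start nonterminal S = "qS".
# A nonterminal may appear as a childless leaf inside a right-hand side.
P: Dict[str, List[Tuple[Tree, float]]] = {
    "qS": [(tree("S", tree("qNP"), tree("VP", tree("V", tree("drinks")), tree("qNP"))), 1.0)],
    "qNP": [(tree("NP", tree("he")), 0.6), (tree("NP", tree("water")), 0.4)],
}


def weight(n: str, t: Tree) -> float:
    """Sum of weights of all derivations of tree t from nonterminal n."""
    return sum(w * match(rhs, t) for rhs, w in P[n])


def match(rhs: Tree, t: Tree) -> float:
    """Weight with which the RHS template derives t (0.0 if it cannot)."""
    label, kids = rhs
    if label in P and not kids:  # nonterminal leaf: derive this subtree from it
        return weight(label, t)
    if label != t[0] or len(kids) != len(t[1]):
        return 0.0
    result = 1.0
    for r, c in zip(kids, t[1]):
        result *= match(r, c)
    return result


t = tree("S", tree("NP", tree("he")),
         tree("VP", tree("V", tree("drinks")), tree("NP", tree("water"))))
print(weight("qS", t))  # 1.0 * 0.6 * 0.4, about 0.24
```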
Sample wRTG
Extended-LHS Tree Transducer (xR)
Unlike R, xR represents lookahead and movement explicitly, through a more specific left-hand side (LHS).
Form of the LHS:
The LHS pattern is matched against an input subtree. There is a finite set of such tree patterns.
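A hedged sketch of what the more specific LHS buys: a single multi-level pattern can look inside the VP and move the verb in front of the pronoun in one step, whereas the R example needed rule 1 to copy VP into two states. The rule `q S(x0:PRO, VP(x1:V, x2:NP)) -> S(q x1, q x0, q x2)` and the names `Var`, `match`, and `LHS` are hypothetical; treating each `q x_i` as a plain copy is a simplification.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple

Tree = Tuple[str, tuple]  # (label, children)


def tree(label, *kids):
    return (label, tuple(kids))


@dataclass(frozen=True)
class Var:
    """Pattern variable such as x0:PRO; `label` optionally tests the bound subtree's root."""
    name: str
    label: Optional[str] = None


def match(p, t: Tree, binds: Dict[str, Tree]) -> bool:
    """Match a multi-level LHS pattern against subtree t, recording variable bindings."""
    if isinstance(p, Var):
        if p.label is not None and t[0] != p.label:
            return False
        binds[p.name] = t
        return True
    label, kids = p
    if label != t[0] or len(kids) != len(t[1]):
        return False
    return all(match(pk, tk, binds) for pk, tk in zip(kids, t[1]))


# Hypothetical xR rule:  q S(x0:PRO, VP(x1:V, x2:NP)) -> S(q x1, q x0, q x2)
LHS = tree("S", Var("x0", "PRO"), tree("VP", Var("x1", "V"), Var("x2", "NP")))

english = tree("S", tree("PRO", tree("he")),
               tree("VP", tree("V", tree("drinks")), tree("NP", tree("water"))))
binds: Dict[str, Tree] = {}
if match(LHS, english, binds):
    # Each q x_i just copies its subtree here, so the RHS only reorders.
    output = tree("S", binds["x1"], binds["x0"], binds["x2"])
    print(output)
```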
Binary Relation:
Derivation Tree
With so many trees around, note that a derivation tree is neither the input tree nor the output tree: it is a tree of transducer rules, recording how the transducer rewrote the input. A derivation tree deterministically produces a single weighted output tree, whose weight is the product of the weights of the rules it uses.
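A minimal sketch of reading a derivation tree: each node names a rule, its children are the derivations for that rule's state-variable positions in left-to-right order, the output tree is assembled by filling those positions, and the weight is the product of the rule weights. The rule ids and weights are hypothetical, modeled loosely on the R example; the input positions that a full derivation also records are omitted.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass(frozen=True)
class Hole:
    """Marks an RHS 'state x_i' position; filled by the next child derivation."""


HOLE = Hole()


def node(label, *kids):
    return (label, tuple(kids))


# Hypothetical weighted R rules: id -> (weight, RHS template); holes are filled left to right.
RULES: Dict[str, Tuple[float, object]] = {
    "r1": (1.0, node("S", HOLE, HOLE, HOLE)),  # q S(x0, x1) -> S(qleft.vp.v x1, qpro x0, qright.vp.np x1)
    "r2": (0.5, HOLE),                         # qleft.vp.v VP(x0, x1) -> q x0
    "r3": (0.5, HOLE),                         # qright.vp.np VP(x0, x1) -> q x1
    "r4": (1.0, node("PRO", HOLE)),            # qpro PRO(x0) -> PRO(q x0)
    "r5": (1.0, node("V", HOLE)),              # q V(x0) -> V(q x0)
    "r6": (1.0, node("NP", HOLE)),             # q NP(x0) -> NP(q x0)
    "r7": (1.0, node("he")),                   # q he -> he
    "r8": (1.0, node("drinks")),               # q drinks -> drinks
    "r9": (1.0, node("water")),                # q water -> water
}


def run(d):
    """Derivation node = (rule id, child derivations). Returns (output tree, weight)."""
    rule_id, kids = d
    weight, rhs = RULES[rule_id]
    outputs = []
    for k in kids:
        out, w = run(k)
        outputs.append(out)
        weight *= w  # derivation weight = product of rule weights
    return fill(rhs, iter(outputs)), weight


def fill(t, it):
    """Replace holes in the RHS template, left to right, with child outputs."""
    if isinstance(t, Hole):
        return next(it)
    label, kids = t
    return (label, tuple(fill(k, it) for k in kids))


deriv = ("r1", [("r2", [("r5", [("r8", [])])]),
                ("r4", [("r7", [])]),
                ("r3", [("r6", [("r9", [])])])])
out, w = run(deriv)
print(w)  # 0.25
```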
Derivation tree & derivation wRTG
Inside-Outside algorithm
Basic idea of the inside-outside algorithm: use the current rule probabilities to estimate the expected frequencies of certain types of derivation steps, then compute new probabilities for those rules [1].
In general, the inside probability of a nonterminal A looks downward, through expansions such as A → a or A → B C; the outside probability of A looks upward, through parent rules such as C → A B or C → B A, in which A is a child.
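For orientation, here is a sketch of the textbook inside-outside computation for a probabilistic CFG in Chomsky normal form, the string setting of [1]; it is not the paper's wRTG version. The toy grammar `BINARY`/`LEXICAL` and the function names are hypothetical. Inside probabilities fill spans bottom-up, outside probabilities flow from parents (larger spans) to children, and their product divided by the sentence probability gives each rule's expected count.

```python
from collections import defaultdict

# Toy CNF PCFG (hypothetical): binary rules A -> B C and lexical rules A -> a.
BINARY = {("S", "NP", "VP"): 1.0, ("VP", "V", "NP"): 1.0}
LEXICAL = {("NP", "he"): 0.5, ("NP", "water"): 0.5, ("V", "drinks"): 1.0}


def inside_outside(words, start="S"):
    n = len(words)
    beta = defaultdict(float)  # beta[i, j, A]: prob. that A derives words[i:j]
    for i, w in enumerate(words):
        for (a, word), p in LEXICAL.items():
            if word == w:
                beta[i, i + 1, a] += p
    for span in range(2, n + 1):  # smaller spans first
        for i in range(n - span + 1):
            j = i + span
            for k in range(i + 1, j):
                for (a, b, c), p in BINARY.items():
                    beta[i, j, a] += p * beta[i, k, b] * beta[k, j, c]
    alpha = defaultdict(float)  # alpha[i, j, A]: prob. of everything outside A over words[i:j]
    alpha[0, n, start] = 1.0
    for span in range(n, 1, -1):  # parents (larger spans) first
        for i in range(n - span + 1):
            j = i + span
            for k in range(i + 1, j):
                for (a, b, c), p in BINARY.items():
                    out = alpha[i, j, a] * p
                    alpha[i, k, b] += out * beta[k, j, c]  # B as left child of A -> B C
                    alpha[k, j, c] += out * beta[i, k, b]  # C as right child
    return beta, alpha


def expected_counts(words, start="S"):
    """E-step for one sentence: expected number of uses of each rule."""
    beta, alpha = inside_outside(words, start)
    n, z = len(words), beta[0, len(words), start]
    counts = defaultdict(float)
    for (a, w), p in LEXICAL.items():
        for i, word in enumerate(words):
            if word == w:
                counts[a, w] += alpha[i, i + 1, a] * p / z
    for (a, b, c), p in BINARY.items():
        for i in range(n):
            for j in range(i + 2, n + 1):
                for k in range(i + 1, j):
                    counts[a, b, c] += alpha[i, j, a] * p * beta[i, k, b] * beta[k, j, c] / z
    return z, counts


z, counts = expected_counts(["he", "drinks", "water"])
print(z)  # 0.25
```

Re-estimation (the M-step) would then divide each rule's expected count by the total expected count of all rules sharing its left-hand side.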
Inside-Outside for wRTG
Inside weights using G are given by βG:
Outside weights αG:
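The slide's formulas did not survive, so the following is a hedged sketch assuming the usual recursions for an acyclic wRTG (acyclicity lets inside weights be computed bottom-up). β(n) sums, over productions n → r with weight w, w times the product of β over the nonterminal leaves of r. α(S) = 1, and each occurrence of a nonterminal m in a production n → r adds w · α(n) · (product of β over the other nonterminal leaves of r) to α(m). The grammar `PRODS` and all function names are hypothetical.

```python
from math import prod


def tree(label, *kids):
    return (label, tuple(kids))


# Hypothetical acyclic wRTG: (lhs nonterminal, rhs tree, weight). Weights need not be
# normalized. Childless leaves whose label is some lhs are nonterminals.
START = "n0"
PRODS = [
    ("n0", tree("S", tree("n1"), tree("n2")), 1.0),
    ("n1", tree("PRO", tree("he")), 0.5),
    ("n1", tree("PRO", tree("she")), 0.25),
    ("n2", tree("VP", tree("V", tree("drinks")), tree("n3")), 1.0),
    ("n3", tree("NP", tree("water")), 0.25),
    ("n3", tree("NP", tree("tea")), 0.25),
]
NONTERMS = {lhs for lhs, _, _ in PRODS}


def nt_leaves(t):
    """Nonterminal occurrences in an RHS tree, left to right."""
    label, kids = t
    if not kids:
        return [label] if label in NONTERMS else []
    return [n for k in kids for n in nt_leaves(k)]


def inside_outside():
    beta = {}

    def inside(n):  # memoized bottom-up recursion; terminates because G is acyclic
        if n not in beta:
            beta[n] = sum(w * prod(inside(m) for m in nt_leaves(r))
                          for lhs, r, w in PRODS if lhs == n)
        return beta[n]

    inside(START)
    order, seen = [], set()

    def visit(n):  # DFS postorder over "n's RHS mentions m" edges
        if n not in seen:
            seen.add(n)
            for lhs, r, _ in PRODS:
                if lhs == n:
                    for m in nt_leaves(r):
                        visit(m)
            order.append(n)

    visit(START)
    order.reverse()  # reverse postorder: every parent before its children
    alpha = {n: 0.0 for n in order}
    alpha[START] = 1.0
    for n in order:
        for lhs, r, w in PRODS:
            if lhs == n:
                leaves = nt_leaves(r)
                for i, m in enumerate(leaves):
                    rest = prod(beta[x] for j, x in enumerate(leaves) if j != i)
                    alpha[m] += w * alpha[n] * rest
    return beta, alpha


beta, alpha = inside_outside()
print(beta[START], alpha)  # 0.375 {'n0': 1.0, 'n2': 0.75, 'n3': 0.75, 'n1': 0.5}
```

In this toy grammar every nonterminal is used exactly once in every derivation, so α(n) · β(n) equals β(S) for each n, which is a handy sanity check.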
EM training
EM training maximizes the corpus likelihood by repeating two steps: estimate the expected number of times each decision (rule) is used, then maximize by assigning these expected counts to the parameters and renormalizing. Algorithm 2 implements EM training for xR by repeatedly computing inside-outside weights on the derivation wRTG of each training pair.
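The E-step/M-step loop can be sketched on a tiny corpus. As a deliberate simplification, each training pair lists its derivations explicitly (only feasible for toy data) instead of packing them into a derivation wRTG and running inside-outside. The rule groups, corpus, and function name `em` are hypothetical. Because this is EM, the recorded log-likelihood should never decrease.

```python
from collections import defaultdict
from math import log, prod

# Hypothetical setup: rules in the same group share an LHS, so their weights must sum to 1.
GROUP = {"a1": "A", "a2": "A", "b1": "B", "b2": "B"}

# Each training pair is given by the explicit list of its derivations (rule ids used).
CORPUS = [
    [["a1", "b1"], ["a2", "b2"]],
    [["a1", "b2"]],
]


def em(corpus, iters=20):
    size = defaultdict(int)
    for g in GROUP.values():
        size[g] += 1
    theta = {r: 1.0 / size[g] for r, g in GROUP.items()}  # uniform start
    history = []
    for _ in range(iters):
        counts, loglik = defaultdict(float), 0.0
        for derivs in corpus:  # E-step: posterior-weighted rule counts
            weights = [prod(theta[r] for r in d) for d in derivs]
            z = sum(weights)
            loglik += log(z)
            for d, w in zip(derivs, weights):
                for r in d:
                    counts[r] += w / z
        totals = defaultdict(float)  # M-step: renormalize counts within each group
        for r, c in counts.items():
            totals[GROUP[r]] += c
        theta = {r: counts[r] / totals[g] if totals[g] > 0 else theta[r]
                 for r, g in GROUP.items()}
        history.append(loglik)
    return theta, history


theta, history = em(CORPUS)
print(theta)
```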
From tree to string
An extended-LHS tree transducer (xR) can map an input tree (say, a parse tree) to an output tree, but the result is still a (parse) tree, not a sentence in another language as machine translation requires. This motivates xRS, a tree-to-string transducer.
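Tree-to-string transducer
Weighted extended-LHS root-to-frontier tree-to-string transducer: X = (Σ, Δ, Q, Qi, R), with input alphabet Σ, output alphabet Δ, states Q, initial state Qi ∈ Q, and weighted rules R.
It is similar to xR, but the right-hand sides of its rules are strings instead of trees.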
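A minimal sketch of how string right-hand sides change the output: each rule's RHS is a sequence of output words and state calls, and transduction concatenates the resulting strings instead of building a tree. The rules, the names `Var`, `Call`, `RULES`, and `to_string`, and the first-matching-rule strategy are hypothetical simplifications; a weighted xRS would sum over all matching rules.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

Tree = Tuple[str, tuple]  # (label, children)


def tree(label, *kids):
    return (label, tuple(kids))


@dataclass(frozen=True)
class Var:
    name: str


@dataclass(frozen=True)
class Call:
    """RHS item 'state var': transduce the subtree bound to `var` in `state`."""
    state: str
    var: str


# Hypothetical xRS rules: (state, LHS pattern, RHS string of output words and state calls).
RULES = [
    ("q", tree("S", Var("x0"), tree("VP", Var("x1"), Var("x2"))),
     [Call("q", "x1"), Call("q", "x0"), Call("q", "x2")]),
    ("q", tree("PRO", tree("he")), ["he"]),
    ("q", tree("V", tree("drinks")), ["drinks"]),
    ("q", tree("NP", tree("water")), ["water"]),
]


def match(p, t: Tree, binds: Dict[str, Tree]) -> bool:
    """Match an LHS pattern against subtree t, recording variable bindings."""
    if isinstance(p, Var):
        binds[p.name] = t
        return True
    if p[0] != t[0] or len(p[1]) != len(t[1]):
        return False
    return all(match(pk, tk, binds) for pk, tk in zip(p[1], t[1]))


def to_string(state: str, t: Tree) -> List[str]:
    """Apply the first matching rule and concatenate the RHS outputs."""
    for q, lhs, rhs in RULES:
        binds: Dict[str, Tree] = {}
        if q == state and match(lhs, t, binds):
            out: List[str] = []
            for item in rhs:
                out.extend(to_string(item.state, binds[item.var]) if isinstance(item, Call) else [item])
            return out
    raise ValueError(f"no rule for state {state} at node {t[0]}")


english = tree("S", tree("PRO", tree("he")),
               tree("VP", tree("V", tree("drinks")), tree("NP", tree("water"))))
print(" ".join(to_string("q", english)))  # drinks he water
```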
Example
The authors implemented the translation model of Yamada and Knight (2001).
There is a trainable xRS tree-to-string transducer that embodies:
Example
Related Work
– TSG vs. RTG: equivalent
– xR vs. weighted synchronous TSG: similar
– EM training vs. the forward-backward algorithm for finite-state (string) transducers, and also for HMMs
Questions
– Is there any future work on this tree transducer, especially for machine translation? What about precision and recall?
– I am a little confused by the descriptions of the two relations ⇒X and ⇒G.
– I am not very sure about the inside-outside algorithm.
Questions?
Thank you!!
Reference
[1] Fernando Pereira and Yves Schabes. Inside-Outside Reestimation from Partially Bracketed Corpora. 1992.
What might be useful
Kevin Knight and Jonathan Graehl. An Overview of Probabilistic Tree Transducers for Natural Language Processing.
– R: Top-down transducer, introduced before.
– F: Bottom-up (“frontier-to-root”) transducer, with similar rules, but transforming the leaves of the input tree first and working its way up.
– L: Linear transducer, which prohibits copying subtrees. Rule 4 in Figure 4 is an example of a copying production, so this whole transducer is R but not RL.
– N: Non-deleting transducer, which requires that every left-hand-side variable also appear on the right-hand side. A deleting R-transducer can simply delete a subtree (without inspecting it). The transducer in Figure 4 is of the deleting kind, because of rules [...]. It would also be deleting if it included a rule for dropping English determiners, e.g., q NP(x0, x1) → q x1.
– D: Deterministic transducer, with at most one production per <state, symbol> pair.
– T: Total transducer, with at least one production per <state, symbol> pair.
– PDTT: Push-down tree transducer, the transducer analog of CFTG [36].
– subscript: Regular-lookahead transducer, which can check whether an input subtree is tree-regular, i.e., whether it belongs to a specified RTL. Productions only fire when their lookahead conditions are met.
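The L, N, D, and T restrictions are simple syntactic checks on the rules, sketched below. The representation (a rule reduced to its LHS variables and the variables occurring on its RHS, and a transducer reduced to its <state, symbol> rule keys) is a hypothetical simplification, as are the function names.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple


def rule_properties(lhs_vars: List[str], rhs_vars: List[str]) -> Dict[str, bool]:
    """L and N checks for one rule, given its LHS variables and RHS variable occurrences."""
    uses = Counter(rhs_vars)
    copying = any(c > 1 for c in uses.values())     # a subtree is copied: not linear (L)
    deleting = any(v not in uses for v in lhs_vars)  # a subtree is dropped: not non-deleting (N)
    return {"linear": not copying, "nondeleting": not deleting}


def deterministic_and_total(rule_keys: Iterable[Tuple[str, str]],
                            states: List[str], symbols: List[str]) -> Tuple[bool, bool]:
    """D: at most one rule per <state, symbol> pair; T: at least one."""
    per_pair = Counter(rule_keys)
    deterministic = all(c <= 1 for c in per_pair.values())
    total = all(per_pair[(q, s)] >= 1 for q in states for s in symbols)
    return deterministic, total


# Determiner-dropping rule q NP(x0, x1) -> q x1 deletes x0.
print(rule_properties(["x0", "x1"], ["x1"]))
# A rule that hands x1 to two states copies it, like the VP in the R example.
print(rule_properties(["x0", "x1"], ["x1", "x0", "x1"]))
```

A whole transducer is L (or N) only if every one of its rules passes the corresponding check.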