1 Quasi-Synchronous Grammars: Alignment by Soft Projection of Syntactic Dependencies
David A. Smith and Jason Eisner
Center for Language and Speech Processing, Department of Computer Science, Johns Hopkins University

2 Synchronous Grammars
Synchronous grammars elegantly model P(T1, T2, A), conditionalizing for alignment and translation.
Training? Observe parallel trees? Impute trees/links? Project known trees…
Example: Im Anfang war das Wort / In the beginning was the word

3 Projection
Train with bitext: parse one side, align words, project dependencies (a toy sketch follows).
But what about many-to-one links? Non-projective and circular dependencies?
Proposals in Hwa et al., Quirk et al., etc.
Example: Im Anfang war das Wort / In the beginning was the word
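
A minimal sketch of the projection recipe above, assuming one-to-one word links; project_dependencies is a hypothetical helper, not code from any of the cited proposals. It copies each source dependency through the alignment and silently drops unaligned material, one source of the sloppiness the slide asks about.

```python
def project_dependencies(source_heads, alignment):
    """source_heads: maps each source position to its head position
    (the root maps to None). alignment: maps source positions to
    target positions (one-to-one only). Returns target-side heads."""
    target_heads = {}
    for src, head in source_heads.items():
        if src not in alignment:
            continue                          # unaligned source word
        tgt = alignment[src]
        if head is None:
            target_heads[tgt] = None          # projected root
        elif head in alignment:
            target_heads[tgt] = alignment[head]
        # else: the head is unaligned, so this dependency is lost
    return target_heads

# Toy example: "Im Anfang war das Wort" -> "In the beginning was the word"
src_heads = {0: 2, 1: 0, 2: None, 3: 4, 4: 2}   # "war" is the root
align = {0: 0, 1: 2, 2: 3, 3: 4, 4: 5}          # Im->In, Anfang->beginning, ...
print(project_dependencies(src_heads, align))
# {0: 3, 2: 0, 3: None, 4: 5, 5: 3}; "the" (position 1) gets no head at all
```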

4 Divergent Projection
German: Auf diese Frage habe ich leider keine Antwort bekommen
English: I did not unfortunately receive an answer to this question
Phenomena illustrated: monotonic links, NULL, head-swapping, siblings

5 Free Translation
German: Tschernobyl könnte dann etwas später an die Reihe kommen
English: Then we could deal with Chernobyl sometime later
Bad dependencies. Parent-ancestors? NULL

6 Dependency Menagerie

7 Overview
Divergent & sloppy projection
Modeling motivation
Quasi-synchronous grammars (QG): basic parameterization
Modeling experiments
Alignment experiments

8 QG by Analogy
HMM: noisy channel generating states
MEMM: direct generative model of states
CRF: undirected, globally normalized
[Diagram: source and target chains]

9 Words with Senses
English: I have presented the paper about …
German: Ich habe die Veröffentlichung über … präsentiert
"Paper" here really means "conference paper", so its sense is Veröffentlichung rather than das Papier (and "about" maps to über rather than mit).
Senses are now words in a particular (German) sentence: Veröffentlichung.

10 Quasi-Synchronous Grammar
QG: a target-language grammar that generates translations of a particular source-language sentence.
A direct, conditional model of translation as P(T2, A | T1).
This grammar can be CFG, TSG, TAG, etc.

11 Generating a QCFG from T1
U = target-language grammar nonterminals
V = nodes of the given source tree T1; the "senses" α, β, γ ∈ 2^V
Binarized QCFG rules, with A, B, C ∈ U:
⟨A, α⟩ ⇒ ⟨B, β⟩ ⟨C, γ⟩
⟨A, α⟩ ⇒ w
Present modeling restrictions: |α| ≤ 1; dependency grammars (1 node per word); tie parameters that depend on α, β, γ.
"Model 1" property: reuse of senses. Why?
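
An illustrative sketch (not the authors' code) of the paired-nonterminal space under the restriction |α| ≤ 1: each target nonterminal in U carries one source-tree node as its sense, or NULL. The symbol sets are invented for the example.

```python
from itertools import product

U = ["S", "NP", "VP"]                        # target-language nonterminals
V = ["Im", "Anfang", "war", "das", "Wort"]   # nodes of the source tree T1
senses = V + [None]                          # None stands for NULL

paired = list(product(U, senses))
print(len(paired))        # 3 nonterminals x 6 senses = 18 paired symbols

# A binary rule instantiates <A, alpha> => <B, beta> <C, gamma>; with the
# parameter tying described on the slide, its score depends on the relation
# of alpha, beta, gamma in T1 rather than on their identities.
```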

12 Modeling Assumptions
Example: Im Anfang war das Wort / In the beginning was the word
At most 1 sense per English word
Dependency grammar: one node per word
Allow sense "reuse"
Tie parameters for all tokens of "im"

13 Dependency Relations + “none of the above”
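
The relation classes that recur in the results tables below (parent-child, child-parent, same node, grandparent, siblings, c-command, "none of the above") can be read off the source tree. The following sketch, with a made-up head-map encoding and a toy tree, shows one plausible way to classify a pair of senses; it is illustrative, not the paper's exact definition.

```python
def ancestors(node, heads):
    """Yield the proper ancestors of node; heads maps each node to its
    head, with the root mapped to None."""
    node = heads[node]
    while node is not None:
        yield node
        node = heads[node]

def relation(alpha, beta, heads):
    """Relation in T1 of alpha (sense of the target parent) to beta
    (sense of the target child)."""
    if alpha == beta:
        return "same node"
    if heads[beta] == alpha:
        return "parent-child"                 # strictly synchronous case
    if heads[alpha] == beta:
        return "child-parent"                 # head-swapping
    if heads[beta] is not None and heads[heads[beta]] == alpha:
        return "grandparent"
    if heads[alpha] is not None and heads[alpha] == heads[beta]:
        return "siblings"
    if heads[alpha] is not None and heads[alpha] in ancestors(beta, heads):
        return "c-command"
    return "none of the above"

# Toy tree for "Im Anfang war das Wort": "war" is the root.
heads = {"Im": "war", "Anfang": "Im", "war": None, "das": "Wort", "Wort": "war"}
print(relation("war", "Anfang", heads))   # grandparent
print(relation("Im", "Wort", heads))      # siblings
print(relation("Im", "das", heads))       # c-command
```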

14 QCFG Generative Story
Observed source: Auf diese Frage habe ich leider keine Antwort bekommen
Generated target: I did not unfortunately receive an answer to this question (some words linked to NULL)
Example factors: P(breakage), e.g. P(parent-child); monolingual dependency syntax, e.g. P(PRP | no left children of did); lexical translation, e.g. P(I | ich)
Runtime: O(m²n³)
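
Schematically, the score of a single attachment multiplies the three kinds of factors the slide names. All probability values below are invented for illustration only.

```python
import math

p_breakage = {"parent-child": 0.6, "none of the above": 0.05}
p_dep = {("did", "PRP", "no-left-children"): 0.3}  # P(PRP | no left children of "did")
p_trans = {("I", "ich"): 0.7}                      # P(I | ich)

def attachment_logprob(rel, dep_event, trans_event):
    """Log-probability of one attachment as a product of the three factors."""
    return (math.log(p_breakage[rel])
            + math.log(p_dep[dep_event])
            + math.log(p_trans[trans_event]))

print(attachment_logprob("parent-child",
                         ("did", "PRP", "no-left-children"),
                         ("I", "ich")))
```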

15 Training the QCFG
Rough surrogates for translation performance: how can we best model target given source? How can we best match human alignments?
German-English Europarl from SMT05: 1k, 10k, and 100k sentence pairs; German parsed with the Stanford parser.
EM training of monolingual/bilingual parameters.
For efficiency, select alignments in training (not test) from the IBM Model 4 union (sketch below).
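
The alignment-selection trick is easy to state in code: take the union of the two directional IBM Model 4 alignments. A toy sketch with made-up link sets; indices and data are illustrative.

```python
def model4_union(src_to_tgt, tgt_to_src):
    """Each argument is a set of (source_idx, target_idx) links; the second
    is produced in the reverse direction, so its pairs are flipped here."""
    return src_to_tgt | {(s, t) for (t, s) in tgt_to_src}

de_en = {(0, 0), (1, 2)}     # German-to-English links
en_de = {(3, 2), (5, 4)}     # English-to-German links: (en_idx, de_idx)
print(sorted(model4_union(de_en, en_de)))   # [(0, 0), (1, 2), (2, 3), (4, 5)]
```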

16 Cross-Entropy Results (figure; the underlying numbers appear in the table on slide 27)

17 AER Results (figure; the underlying numbers appear in the table on slide 28)

18 AER Comparison
Systems compared: IBM4 German-English, QG German-English, IBM4 English-German

19 Conclusions
Strict isomorphism hurts for modeling translations and for aligning bitext.
Breakages beyond local nodes help most: "none of the above" beats simple head-swapping and 2-to-1 alignments.
Insignificant gains from further breakage taxonomy.

20 Continuing Research
Senses of more than one word should help, while maintaining O(m²n³).
Further refining monolingual features on monolingual data.
Comparison to other synchronizers.
Decoder in progress uses the same direct model of P(T2, A | T1), globally normalized and discriminatively trained.

21 Thanks David Yarowsky Sanjeev Khudanpur Noah Smith Markus Dreyer David Chiang Our reviewers The National Science Foundation

22 Synchronous Grammar as QG
Target nodes correspond to 1 or 0 source nodes.
For every rule ⟨A, α_0⟩ ⇒ ⟨B, α_1⟩ ⟨C, α_2⟩ …:
(∀ i ≠ j) α_i ≠ α_j unless α_i = NULL
(∀ i > 0) α_i is a child of α_0 in T1, unless α_i = NULL
STSG, STAG operate on derivation trees.
Cf. Gildea's clone operation as a quasi-synchronous move.
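
A small sketch of this strictness test, under the same head-map encoding as the earlier examples; illustrative only, not code from the paper.

```python
def is_synchronous_rule(senses, heads):
    """senses: [a0, a1, ..., ak], each a source node or None (NULL).
    heads: maps each source node to its head; the root maps to None."""
    a0, rest = senses[0], senses[1:]
    non_null = [a for a in rest if a is not None]
    if len(non_null) != len(set(non_null)):
        return False                          # repeated non-NULL sense
    return all(heads[a] == a0 for a in non_null)

heads = {"Im": "war", "Anfang": "Im", "war": None, "das": "Wort", "Wort": "war"}
print(is_synchronous_rule(["war", "Im", "Wort"], heads))    # True
print(is_synchronous_rule(["war", "Anfang", None], heads))  # False: Anfang's head is Im
```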

23 Say What You’ve Said

24 Projection
Synchronous grammars can explain the source-target relation, but may need fancy formalisms that are harder to learn. Align as many fragments as possible: explain fragmentariness when target-language requirements override. Some regular phenomena: head-swapping, c-command (STAG), traces.
Pipeline: monolingual parser, word alignment, project to the other language. Empirical model vs. decoding P(T2, A | T1) via a synchronous dependency grammar.
How do you train? Just look at your synchronous corpus… oops. Just look at your parallel corpus and infer the synchronous trees… oops. Just look at your parallel corpus aligned by GIZA and project dependencies over to infer synchronous tree fragments. But how do you project over many-to-one links? How do you resolve non-projective links in the projected version? And can't we use syntax to align better than GIZA did, anyway? Deal with incompleteness in the alignments, unknown words (?)

25 Talking Points
Get the advantages of a synchronous grammar without being so darn rigid/expensive: conditional distribution, alignment, and decoding all taking syntax into account.
What is the generative process? How are the probabilities determined from parameters in a way that combines monolingual and cross-lingual preferences? How are these parameters trained?
Did it work? What are the most closely related ideas, and why is this one better?

27 Cross-Entropy Results

Configuration    CE at 1k   CE at 10k   CE at 100k
NULL             60.86      53.28       46.94
+parent-child    43.82      22.40       13.44
+child-parent    41.27      21.73       12.62
+same node       41.01      21.50       12.38
+all breakages   35.63      18.72       11.27
+siblings        34.59      18.59       11.21
+grandparent     34.52      18.55       11.17
+c-command       34.46      18.59       11.27

28 AER Results

Configuration    AER at 1k   AER at 10k   AER at 100k
parent-child     40.69       39.03        33.62
+child-parent    43.17       39.78        33.79
+same node       43.22       40.86        34.38
+all breakages   37.63       30.51        25.99
+siblings        37.87       33.36        29.27
+grandparent     36.78       32.73        28.84
+c-command       37.04       33.51        27.45
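
For reference, AER here is the standard alignment error rate of Och and Ney, computed against sure and possible gold links (lower is better). A minimal implementation of the textbook formula, not code from the paper:

```python
def aer(predicted, sure, possible):
    """predicted, sure, possible: sets of (source_idx, target_idx) links.
    By convention, every sure link is also a possible link."""
    a = len(predicted & sure)
    s = len(predicted & possible)
    return 1.0 - (a + s) / (len(predicted) + len(sure))

# Toy usage with made-up links
pred = {(0, 0), (1, 2), (2, 3)}
sure = {(0, 0), (1, 2)}
poss = sure | {(2, 3), (3, 4)}
print(round(aer(pred, sure, poss), 3))   # 0.0: all predictions are possible
```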

