Quasi-Synchronous Grammars: Alignment by Soft Projection of Syntactic Dependencies
David A. Smith and Jason Eisner
Center for Language and Speech Processing, Department of Computer Science, Johns Hopkins University

Synchronous Grammars
Synchronous grammars elegantly model P(T₁, T₂, A), conditionalizing for:
- Alignment
- Translation
- Training? Observe parallel trees? Impute trees/links? Project known trees…
Example: Im Anfang war das Wort / In the beginning was the word

Projection
- Train with bitext
- Parse one side
- Align words
- Project dependencies
Open problems: many-to-one links? Non-projective and circular dependencies? Proposals in Hwa et al., Quirk et al., etc.
Example: Im Anfang war das Wort / In the beginning was the word
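As a concrete reference point, here is a minimal sketch of the hard-projection baseline these questions poke at, assuming one-to-one word alignments; every name in it (project_dependencies, source_heads, alignment) is illustrative, not taken from the paper or from Hwa et al. or Quirk et al.

```python
# Minimal sketch of hard dependency projection. `source_heads` maps each
# source word index to its head index; `alignment` maps source indices
# to target indices. Illustrative only.

def project_dependencies(source_heads, alignment):
    """Copy each source head->child arc onto the aligned target words."""
    target_heads = {}
    for child, head in source_heads.items():
        if child in alignment and head in alignment:
            # Both endpoints aligned: copy the arc. Many-to-one links can
            # overwrite each other here, and copied arcs may cross
            # (non-projective) -- the problem cases the slide lists.
            target_heads[alignment[child]] = alignment[head]
        # Unaligned words simply end up with no head at all.
    return target_heads
```

On "Im Anfang war das Wort" / "In the beginning was the word", any English word left out of the alignment (e.g. an extra determiner "the") ends up with no head, which is why harder cases need the repair heuristics proposed in the cited work.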

Divergent Projection
Example: Auf diese Frage habe ich leider keine Antwort bekommen / I did not unfortunately receive an answer to this question (with a NULL alignment)
Configurations illustrated: monotonic, null, head-swapping, siblings

Free Translation
Example: Tschernobyl könnte dann etwas später an die Reihe kommen / Then we could deal with Chernobyl sometime later (with a NULL alignment)
Free translation yields bad dependencies. Back off to parent-ancestors?

Dependency Menagerie

Overview
- Divergent & sloppy projection
- Modeling motivation
- Quasi-Synchronous Grammars (QG)
- Basic parameterization
- Modeling experiments
- Alignment experiments

QG by Analogy
- HMM: noisy channel generating states
- MEMM: direct generative model of states
- CRF: undirected, globally normalized

Words with Senses
Example: I have presented the paper about … / Ich habe die Veröffentlichung über … präsentiert
"I really mean 'conference paper'": the right sense of paper here is Veröffentlichung, not das Papier.
Now the senses are words in a particular (German) sentence.

Quasi-Synchronous Grammar
QG: a target-language grammar that generates translations of a particular source-language sentence. A direct, conditional model of translation as P(T₂, A | T₁). This grammar can be CFG, TSG, TAG, etc.
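To make "direct, conditional model" concrete, here is one schematic way such a model factors over target dependencies (a sketch, not the paper's exact equation), writing a_j for the sense aligned to target node j and h(j) for its head:

```latex
P(T_2, A \mid T_1) \;=\; \prod_{j \in T_2}
  \underbrace{P\bigl(\operatorname{config}(a_{h(j)}, a_j) \mid T_1\bigr)}_{\text{soft projection of source syntax}}
  \cdot
  \underbrace{P\bigl(w_j \mid w_{a_j}\bigr)}_{\text{word translation}}
```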

Generating a QCFG from T₁
- U = target-language grammar nonterminals
- V = nodes of the given source tree T₁
- Binarized QCFG: A, B, C ∈ U; α, β, γ ∈ 2^V
  - ⟨A, α⟩ ⇒ ⟨B, β⟩ ⟨C, γ⟩
  - ⟨A, α⟩ ⇒ w
- Present modeling restrictions:
  - |α| ≤ 1 (the "senses")
  - Dependency grammars (1 node per word)
  - Tie parameters that depend on α, β, γ
  - "Model 1" property: reuse of senses. Why?
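A sketch of that rule space under the stated restrictions, with |α| ≤ 1 so each sense is a single node of T₁ or NULL; all identifiers are illustrative:

```python
# Enumerate the binarized QCFG rule forms <A, alpha> => <B, beta> <C, gamma>
# and <A, alpha> => w. Illustrative sketch: a real system scores these
# rule forms with tied parameters rather than enumerating them.
from itertools import product

NULL = None

def annotated_nonterminals(U, source_nodes):
    """Pair every target nonterminal with every possible sense."""
    return [(A, a) for A in U for a in [NULL, *source_nodes]]

def qcfg_rules(U, source_nodes, target_vocab):
    pairs = annotated_nonterminals(U, source_nodes)
    binary = [(lhs, (left, right))
              for lhs, left, right in product(pairs, repeat=3)]
    lexical = [(lhs, w) for lhs in pairs for w in target_vocab]
    return binary + lexical
```

Tying parameters so that a rule's probability depends on α, β, γ only through their source-tree configuration is what keeps this blown-up nonterminal set manageable.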

Modeling Assumptions
Example: Im Anfang war das Wort / In the beginning was the word
- At most 1 sense per English word
- Dependency grammar: one node per word
- Allow sense "reuse"
- Tie parameters for all tokens of "im"

Dependency Relations
(Figure: the menagerie of configurations: parent-child, child-parent, same node, siblings, grandparent, c-command, plus "none of the above". A classifier sketch follows.)
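A sketch of how those relations might be classified from each source node's head; c-command is omitted for brevity, and the parent_of interface is assumed rather than taken from the paper:

```python
# Classify the source-side relation between a target parent's sense `a`
# and its child's sense `b`. `parent_of` maps each source node to its
# head (the root maps to None). Illustrative sketch.

def classify_config(parent_of, a, b):
    if a is None or b is None:
        return "NULL"
    if parent_of.get(b) == a:
        return "parent-child"        # the strictly synchronous case
    if parent_of.get(a) == b:
        return "child-parent"        # head-swapping
    if a == b:
        return "same node"           # two target words, one source word
    if parent_of.get(a) is not None and parent_of.get(a) == parent_of.get(b):
        return "siblings"
    if parent_of.get(parent_of.get(b)) == a:
        return "grandparent"
    return "none of the above"       # any other breakage
```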

QCFG Generative Story
Example (source side observed): Auf diese Frage habe ich leider keine Antwort bekommen / I did not unfortunately receive an answer to this question (with a NULL alignment)
Factors illustrated: P(parent-child), P(PRP | no left children of did), P(I | ich), P(breakage)
Runtime: O(m²n³)

Training the QCFG
Rough surrogates for translation performance:
- How can we best model the target given the source?
- How can we best match human alignments?
Setup:
- German-English Europarl from SMT05: 1k, 10k, and 100k sentence pairs
- German side parsed with the Stanford parser
- EM training of monolingual/bilingual parameters
- For efficiency, select alignments in training (not test) from the IBM Model 4 union
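The real EM here reestimates monolingual and bilingual parameters inside the O(m²n³) parser; as a runnable miniature of the same E/M pattern, here is the word-translation piece alone, in the style of IBM Model 1 (names illustrative):

```python
from collections import defaultdict

def em_word_translation(bitext, iterations=10):
    """bitext: list of (source_words, target_words) pairs."""
    t = defaultdict(lambda: 1.0)            # t[(e, f)]: translation score
    for _ in range(iterations):
        count = defaultdict(float)          # E-step: expected counts
        total = defaultdict(float)
        for f_sent, e_sent in bitext:
            for e in e_sent:
                z = sum(t[(e, f)] for f in f_sent)
                for f in f_sent:
                    p = t[(e, f)] / z       # soft-alignment posterior
                    count[(e, f)] += p
                    total[f] += p
        for (e, f), c in count.items():     # M-step: renormalize
            t[(e, f)] = c / total[f]
    return t
```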

Cross-Entropy Results

AER Results

AER Comparison
(Chart comparing IBM Model 4 German-English, QG German-English, and IBM Model 4 English-German.)

Conclusions
Strict isomorphism hurts, both for modeling translations and for aligning bitext. Breakages beyond local nodes help most: "none of the above" beats simple head-swapping and 2-to-1 alignments, with insignificant gains from a further breakage taxonomy.

Continuing Research
- Senses of more than one word should help, while maintaining O(m²n³)
- Further refining monolingual features on monolingual data
- Comparison to other synchronizers
- A decoder in progress uses the same direct model of P(T₂, A | T₁), globally normalized and discriminatively trained

Thanks
David Yarowsky, Sanjeev Khudanpur, Noah Smith, Markus Dreyer, David Chiang, our reviewers, and the National Science Foundation.

Synchronous Grammar as QG
- Target nodes correspond to 1 or 0 source nodes.
- For every rule ⟨A, α₀⟩ ⇒ ⟨B₁, α₁⟩ … ⟨Bₖ, αₖ⟩:
  - (∀ i ≠ j) αᵢ ≠ αⱼ unless αᵢ = NULL
  - (∀ i > 0) αᵢ is a child of α₀ in T₁, unless αᵢ = NULL
- STSG and STAG operate on derivation trees.
- Cf. Gildea's clone operation as a quasi-synchronous move.
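The two conditions on rule senses reduce to a simple check; a sketch under the same assumed parent_of interface as above:

```python
# Check whether one rule's senses satisfy the strict-synchrony conditions:
# distinct non-NULL senses, and every child sense a child of alpha_0 in T1.
# Illustrative sketch.

def is_strictly_synchronous(parent_of, senses):
    """senses = [alpha_0, alpha_1, ...]: parent sense, then child senses."""
    non_null = [a for a in senses if a is not None]
    if len(non_null) != len(set(non_null)):   # alpha_i != alpha_j unless NULL
        return False
    head = senses[0]
    return all(a is None or parent_of.get(a) == head
               for a in senses[1:])
```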

Say What You’ve Said

Projection
- Synchronous grammars can explain the source-target relation, but may need fancy formalisms and be harder to learn.
- Align as many fragments as possible: explain fragmentariness when target-language requirements override.
- Some regular phenomena: head-swapping, c-command (STAG), traces.
- Pipeline: monolingual parser, word alignment, project to the other language.
- Empirical model vs. decoding P(T₂, A | T₁) via a synchronous dependency grammar. How do you train? Just look at your synchronous corpus … oops. Just look at your parallel corpus and infer the synchronous trees … oops. Just look at your parallel corpus aligned by Giza and project dependencies over to infer synchronous tree fragments.
- But how do you project over many-to-one links? How do you resolve non-projective links in the projected version? And can't we use syntax to align better than Giza did, anyway?
- Deal with incompleteness in the alignments, unknown words (?)

Talking Points
- Get the advantages of a synchronous grammar without being so darn rigid/expensive: conditional distribution, alignment, and decoding all take syntax into account.
- What is the generative process?
- How are the probabilities determined from parameters in a way that combines monolingual and cross-lingual preferences?
- How are these parameters trained?
- Did it work?
- What are the most closely related ideas, and why is this one better?

Cross-Entropy Results
(Table: cross-entropy at 1k, 10k, and 100k sentence pairs for the configurations NULL, parent-child, child-parent, same node, all breakages, siblings, grandparent, and c-command.)

AER Results
(Table: AER at 1k, 10k, and 100k sentence pairs for the configurations parent-child, child-parent, same node, all breakages, siblings, grandparent, and c-command.)