LING 575 Lecture 5 Kristina Toutanova MSR & UW April 27, 2010 With materials borrowed from Philipp Koehn, Chris Quirk, David Chiang, Dekai Wu, Aria Haghighi.


Tree-based translation

Overview  Motivation  Examples of reordering/translation phenomena  Synchronous context free grammar  Example derivations  ITG grammars  Reordering for ITG grammars  Applications of bracketing ITG grammars  Applications: ITGs for word alignment  Hierarchical phrase-based translation with Hiero  Rule extraction  Model features  Decoding for SCFGs and integrating a LM

Motivation for tree-based translation  Phrases capture contextual translation and local reordering surprisingly well  However, this information is brittle:  “author of the book  本書的作者” tells us nothing about how to translate “author of the pamphlet” or “author of the play”  The Chinese pattern “NOUN1 的 NOUN2” becomes “NOUN2 of NOUN1” in English

Motivation for tree-based translation  There are general principles a phrase-based system is not using  Some languages have adjectives before nouns, some after  Some languages place prepositions before nouns, some after  Some languages put PPs before the head, others after  Some languages place relative clauses before the head, others after  Discontinuous translations are not handled well by phrase-based systems  ne … pas negation in French, separable verbs in German

Types of tree-based systems  Formally tree-based but not using linguistic syntax  Can still model hierarchical nature of language  Can capture hierarchical reordering  Examples: phrase-based ITGs and Hiero (will focus on these in this lecture)  Can use linguistic syntax on source, target, or both sides  Phrase structure trees, dependency trees  Next lecture

Synchronous context-free grammars

 A generalization of context-free grammars Slide from David Chiang, ACL 2006 tutorial

Context-free grammars (example in Japanese) Slide from David Chiang, ACL 2006 tutorial

Synchronous CFGs Slide from David Chiang, ACL 2006 tutorial

Synchronous CFGs Slide from David Chiang, ACL 2006 tutorial

Synchronous CFGs Slide from David Chiang, ACL 2006 tutorial

Rules with probabilities Joint probability of source and target language rewrites, given the non-terminal on the left. Could also use the conditional probability of target given source, or of source given target.
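As a concrete illustration (a sketch, not the deck's own code), a weighted SCFG rule can be stored as a pair of right-hand sides whose nonterminals are linked by shared slot indices; the example rule below mimics the “NOUN1 的 NOUN2” ↔ “NOUN2 of NOUN1” pattern from the motivation slides. All names and the probability value are illustrative.

```python
from dataclasses import dataclass
from typing import Tuple, Union

# terminals are strings; integers are co-indexed nonterminal slots
Symbol = Union[str, int]

@dataclass
class SCFGRule:
    lhs: str                  # left-hand-side nonterminal
    src: Tuple[Symbol, ...]   # source right-hand side
    tgt: Tuple[Symbol, ...]   # target right-hand side (same slots, possibly reordered)
    prob: float               # joint probability of the rewrite pair, given lhs

# "NOUN1 的 NOUN2" on the source becomes "NOUN2 of NOUN1" on the target:
# slots 1 and 2 appear in swapped order across the two sides.
rule = SCFGRule(lhs="NP", src=(1, "的", 2), tgt=(2, "of", 1), prob=0.7)
```

The slot indices, not the positions, carry the synchronization: a decoder that rewrites slot 1 on the source side must rewrite the same slot on the target side.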

Synchronous CFGs Slide from David Chiang, ACL 2006 tutorial

Inversion Transduction Grammars (ITGs)

Stochastic Inversion Transduction Grammars [Wu 97]

Bracketing ITG grammars

Reordering in bracketing ITG grammar

Example re-ordering with ITG

Are there other synchronous parses of this sentence pair? [1,2,3,4]

Example re-ordering with ITG  Other re-orderings with parses  A horizontal bar means the non-terminals are swapped

But some re-orderings are not allowed  When words move inside-out  22 out of the 24 permutations of 4 words are parsable by the bracketing ITG
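The claim above can be checked directly with a small brute-force script (my own sketch, not from the slides): a permutation is parsable by the bracketing ITG exactly when it can be recursively split into two blocks of contiguous values, combined either straight or inverted.

```python
from itertools import permutations

def is_block(xs):
    # the values form one contiguous range, so they can be a single constituent
    return max(xs) - min(xs) + 1 == len(xs)

def itg_parsable(p):
    # a permutation is ITG-parsable if it is a single word, or splits into two
    # contiguous blocks (straight or inverted), each itself ITG-parsable
    if len(p) <= 1:
        return True
    return any(
        is_block(p[:k]) and is_block(p[k:])
        and itg_parsable(p[:k]) and itg_parsable(p[k:])
        for k in range(1, len(p))
    )

# the two "inside-out" permutations (2,4,1,3) and (3,1,4,2) are the only
# permutations of 4 words the bracketing ITG cannot produce
parsable = [p for p in permutations(range(1, 5)) if itg_parsable(p)]
```

Running this confirms the slide's count: 22 of the 24 permutations of 4 words are parsable.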

Number of permutations compared to ones parsable by ITG

Applications of ITGs  ITGs have been applied to word alignment and translation in many previous works  One recent interesting work is Haghighi et al.'s 2009 paper on supervised word alignment with block ITGs  Aria Haghighi, John Blitzer, John DeNero, and Dan Klein, “Better word alignments with supervised ITG models”

Comparison of oracle alignment error (AER) for different alignment spaces Space of all alignments, space of 1-to-1 alignments, space of ITG alignments From Haghighi et al. 2009

Block ITG: adding one-to-many alignments From Haghighi et al. 2009

Comparison of oracle alignment error (AER) for different alignment spaces From Haghighi et al. 2009

Alignment performance using the discriminative model From Haghighi et al. 2009

Training for maximum likelihood  So far results were with MIRA  Requiring only finding the best alignment under the model  Efficient under 1-to-1 and ITG models  If we want to train for maximum likelihood according to a log-linear model  Requires summing over all possible alignments  This is tractable in ITGs (will discuss bitext parsing in a bit)  One of the big advantages of ITGs
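The tractable sum mentioned above is computed with a bitext inside algorithm. The sketch below (illustrative, not the paper's code) computes the total weight of all bracketing-ITG derivations of a sentence pair from word-pair scores t(f, e). Note it runs in O(n^6), and without a normal-form grammar it sums over derivations rather than alignments, because the bracketing ITG has spurious ambiguity for spans of three or more words.

```python
from collections import defaultdict

def itg_inside(src, tgt, t):
    """Total weight of all 1-to-1 bracketing-ITG derivations of (src, tgt).
    t(f, e) is the weight of aligning source word f to target word e."""
    n, m = len(src), len(tgt)
    Z = defaultdict(float)
    # base case: a single source word aligned to a single target word
    for i in range(n):
        for k in range(m):
            Z[(i, i + 1, k, k + 1)] = t(src[i], tgt[k])
    for ls in range(1, n + 1):            # source span length
        for lt in range(1, m + 1):        # target span length
            if ls == 1 and lt == 1:
                continue                  # base case already filled
            for i in range(n - ls + 1):
                j = i + ls
                for k in range(m - lt + 1):
                    l = k + lt
                    total = 0.0
                    for s in range(i + 1, j):        # source split point
                        for u in range(k + 1, l):    # target split point
                            # straight rule X -> [X1 X2]: same order on both sides
                            total += Z[(i, s, k, u)] * Z[(s, j, u, l)]
                            # inverted rule X -> <X1 X2>: target order swapped
                            total += Z[(i, s, u, l)] * Z[(s, j, k, u)]
                    Z[(i, j, k, l)] = total
    return Z[(0, n, 0, m)]
```

With all word-pair weights set to 1, a two-word pair yields 2 (one straight and one inverted derivation), while a one-to-two pair yields 0, since this sketch allows no unaligned words.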

MIRA versus maximum likelihood training

Algorithms for SCFGs  Translation with synchronous CFGs  Bi-text parsing with synchronous CFGs

Review: CKY parsing for CFGs in CNF  Start with spans of length one and construct possible constituents Slide from David Chiang, ACL 2006 tutorial

Review: CKY parsing for CFGs in CNF  Continue with spans of length 2 and construct constituents using words and constructed constituents Slide from David Chiang, ACL 2006 tutorial

Review: CKY parsing for CFGs in CNF  Spans of length 3 Slide from David Chiang, ACL 2006 tutorial

Review: CKY parsing for CFGs in CNF  Spans of length 4 Slide from David Chiang, ACL 2006 tutorial

Review: CKY parsing for CFGs in CNF  The best S constituent covering the whole sentence is the final output Slide from David Chiang, ACL 2006 tutorial

Review: complexity of CKY Slide from David Chiang, ACL 2006 tutorial
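The review slides above come down to a few lines of dynamic programming. Below is a generic CKY recognizer for a CFG in Chomsky Normal Form (the toy grammar and all names are my own, not from the tutorial).

```python
def cky_recognize(words, lexical, binary, start="S"):
    """CKY recognition for a CFG in Chomsky Normal Form.
    lexical: word -> set of nonterminals A with A -> word
    binary:  (B, C) -> set of nonterminals A with A -> B C"""
    n = len(words)
    # chart[i][j] = set of nonterminals deriving words[i:j]
    chart = [[set() for _ in range(n + 1)] for _ in range(n + 1)]
    for i, w in enumerate(words):                  # spans of length 1
        chart[i][i + 1] = set(lexical.get(w, ()))
    for span in range(2, n + 1):                   # longer spans, bottom-up
        for i in range(n - span + 1):
            j = i + span
            for k in range(i + 1, j):              # split point
                for B in chart[i][k]:
                    for C in chart[k][j]:
                        chart[i][j] |= binary.get((B, C), set())
    return start in chart[0][n]

# toy CNF grammar: S -> NP VP, VP -> V NP
lexical = {"she": {"NP"}, "saw": {"V"}, "stars": {"NP"}}
binary = {("NP", "VP"): {"S"}, ("V", "NP"): {"VP"}}
```

The three nested span/split loops give the O(n^3) time the complexity slide refers to, times a grammar constant for the rule lookups.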

Translation with SCFG Slide from David Chiang, ACL 2006 tutorial

Translation Slide from David Chiang, ACL 2006 tutorial

Bi-text parsing Slide from David Chiang, ACL 2006 tutorial

Bi-text parsing  We consider SCFGs with at most two symbols on the right-hand side (rank 2) Slide from David Chiang, ACL 2006 tutorial

Bi-text parsing Slide from David Chiang, ACL 2006 tutorial

Bi-text parsing Slide from David Chiang, ACL 2006 tutorial

Bi-text parsing Slide from David Chiang, ACL 2006 tutorial

Bi-text parsing Slide from David Chiang, ACL 2006 tutorial

Bi-text parsing for grammars with higher rank  There is no Chomsky-like normal form for synchronous CFGs of rank 4 or greater in the general case  With higher-rank grammars we can still translate efficiently by converting the source-side CFG to CNF, parsing, flattening the trees back, and translating  Not so for bi-text parsing:  In general, it is exponential in the rank of the grammar and polynomial in sentence length

David Chiang ISI, USC Hierarchical phrase-based translation

Hierarchical phrase-based translation overview  Motivation  Extracting rules  Scoring derivations  Decoding without an LM  Decoding with a LM

Motivation  Review of phrase based models  Segment input into sequence of phrases  Translate each phrase  Re-order phrases depending on distortion and perhaps the lexical content of the phrases  Properties of phrase-based models  Local re-ordering is captured within phrases for frequently occurring groups of words  Global re-ordering is not modeled well  Only contiguous translations are learned

Chinese-English example Australia is one of the few countries that have diplomatic relations with North Korea. Output from the phrase-based system: it captured some reordering through phrase translation and phrase re-ordering, but did not re-order the relative clause and the noun phrase.

Idea: Hierarchical phrases

Other example hierarchical phrases

A Synchronous CFG for example

General approach  Align parallel training data using word-alignment models (e.g. GIZA++)  Extract hierarchical phrase pairs  Can be represented as SCFG rules  Assign probabilities (scores) to rules  Like in log-linear models for phrase-based MT, can define various features on rules to come up with rule scores  Translating new sentences  Parsing with an SCFG grammar  Integrating a language model
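The "assign probabilities (scores)" step above is typically a log-linear combination: each rule carries feature values, and a derivation's score is the weighted sum of those values plus a weighted language-model feature. A minimal sketch (the feature names are made up for illustration):

```python
def derivation_score(rules, weights, lm_logprob):
    """Log-linear score of a derivation: weighted sum of per-rule feature
    values, plus a weighted language-model log-probability."""
    score = weights["lm"] * lm_logprob
    for rule in rules:
        for name, value in rule["features"].items():
            score += weights[name] * value
    return score

# one rule with two (hypothetical) features: a translation log-probability
# and a constant rule-count feature
rules = [{"features": {"log_p_fe": -1.0, "rule_count": 1.0}}]
weights = {"lm": 0.5, "log_p_fe": 1.0, "rule_count": 0.1}
```

Because the rule features decompose over rules, this score can be accumulated during parsing; only the LM feature (discussed later in the deck) breaks that locality.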

Example derivation

Extracting hierarchical phrases  Start with contiguous phrase pairs, as in phrasal SMT models (called initial phrase pairs)  Make rules for these phrase pairs and add them to the rule-set extracted from this sentence pair

Extracting hierarchical phrase pairs  For every rule of the sentence pair, and every initial phrase pair contained in it, replace initial phrase pair by non-terminal and add new rule
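The substitution step above can be sketched in a few lines: given an initial phrase pair and a smaller initial phrase pair nested inside it (spans here are half-open [start, end) token ranges; the helper name is my own), replace the inner pair by a co-indexed nonterminal. Using the "author of the book" example from earlier in the deck:

```python
def make_hiero_rule(src, tgt, sub_src, sub_tgt, slot=1):
    """Replace a nested initial phrase pair by a linked nonterminal slot X<slot>.
    src/tgt: token lists of the outer initial phrase pair.
    sub_src/sub_tgt: [start, end) spans of the inner pair on each side."""
    (s0, s1), (t0, t1) = sub_src, sub_tgt
    src_side = src[:s0] + [f"X{slot}"] + src[s1:]
    tgt_side = tgt[:t0] + [f"X{slot}"] + tgt[t1:]
    return src_side, tgt_side

# outer pair: 本書 的 作者 <-> author of the book
# inner pair: 本書 <-> the book
src_side, tgt_side = make_hiero_rule(
    ["本書", "的", "作者"], ["author", "of", "the", "book"],
    sub_src=(0, 1), sub_tgt=(2, 4))
```

The result is the rule X → ⟨X1 的 作者, author of X1⟩, which generalizes the original phrase pair to any translatable noun in the slot.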

Another example Traditional phrases Hierarchical phrase

Constraining the grammar rules  This method generates too many phrase pairs and leads to spurious ambiguity  Place constraints on the set of allowable rules for robustness/speed

Adding glue rules

Assigning scores to derivations

Estimating feature values and feature weights

Finding the best translation: decoding

Finding the best translation including an LM

Parsing with Hiero grammars  Modification of CKY which does not require conversion to Chomsky Normal Form  Parsing as weighted deduction (without LM, using source-side grammar)  Goal: prove [S,0,n]

Pseudo-code for parsing If two items are equivalent (same span, same non-terminal), they are merged. For k-best generation, keep pointers to all ways of generating the item, plus their weights.

K-best derivation generation  To generate a k-best list for some item in the chart (e.g. X from 5 to 8), need to consider top k combinations of rules used to form X, plus sub-items used for the rules  E.g. top 4 rules applying at span 5 to 8 and target sides of top 3 derivations from 6 to 8  Naïve method: generate all combinations, sort them and return top k  Faster method: we don’t need to generate all combinations to get top k
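The "faster method" above is the lazy-frontier idea behind cube pruning: when both input lists are sorted, the next-best combination is always adjacent to one already popped, so a heap over the frontier recovers the top k without enumerating all |a|·|b| sums. A sketch (my own, maximizing the sum of two score lists):

```python
import heapq

def k_best_sums(a, b, k):
    """Top-k values of a[i] + b[j], where a and b are sorted descending."""
    if not a or not b:
        return []
    heap = [(-(a[0] + b[0]), 0, 0)]   # min-heap over negated sums
    seen = {(0, 0)}
    out = []
    while heap and len(out) < k:
        neg, i, j = heapq.heappop(heap)
        out.append(-neg)
        # the next-best candidates are the two neighbors of (i, j)
        for ni, nj in ((i + 1, j), (i, j + 1)):
            if ni < len(a) and nj < len(b) and (ni, nj) not in seen:
                seen.add((ni, nj))
                heapq.heappush(heap, (-(a[ni] + b[nj]), ni, nj))
    return out
```

This pops at most k items and pushes at most 2k, so getting the top k costs O(k log k) instead of sorting all combinations.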

K-best combinations of two lists

Integrating an LM with cube pruning

Results using different LM integration methods

Comparison to a phrase-based system

Summary

 Described hierarchical phrase-based translation  Uses hierarchical rules encoding phrase re-ordering and discontinuous lexical correspondence  Rules include traditional contiguous phrase pairs  Can translate efficiently without LM using SCFG parsing  Outperforms phrase-based models for several languages  Hiero is implemented in Moses

 References  Hierarchical phrase-based translation. David Chiang. Computational Linguistics, 2007.  An introduction to synchronous grammars. David Chiang. Notes and slides from the ACL 2006 tutorial.  Stochastic inversion transduction grammars and bilingual parsing of parallel corpora. Dekai Wu. Computational Linguistics, 1997.  Better word alignments with supervised ITG models. A. Haghighi, J. Blitzer, J. DeNero, and D. Klein. ACL 2009.  Many other interesting papers use ITGs and extensions to Hiero; some will be added to the web page

Next lecture  Chris Quirk will talk about SMT systems using linguistic syntax  Using syntax on the source, the target, or both  Different types of syntactic analysis  Other types of synchronous grammars  The list of readings will be updated