Grammatical Machine Translation
Stefan Riezler & John Maxwell, Palo Alto Research Center

Overview
1. Introduction
2. Extracting F-Structure Snippets
3. Parsing-Transfer-Generation
4. Statistical Models and Training
5. Experimental Evaluation
6. Discussion

Section 1: Introduction

Introduction
Recent approaches to SMT use:
- Phrase-based SMT
- Syntactic knowledge
Phrase-based SMT is good at:
- Local ordering
- Short idiomatic expressions
But not so good at:
- Learning long-distance dependencies (LDDs)
- Generalising to unseen phrases that share non-overt linguistic information

Statistical Parsers
Statistical parsers can provide information to:
- Resolve LDDs
- Generalise to unseen phrases that share non-overt linguistic information
Examples: Xia & McCord 2004; Collins et al. 2005; Lin 2004; Ding & Palmer 2005; Quirk et al. 2005

Grammar-based Generation
Could grammar-based generation be useful for MT?
- Quirk et al. 2005: a simple statistical model outperforms the grammar-based generator of Menezes & Richardson 2001 on BLEU score
- Charniak et al. 2003: parsing-based language modelling can improve the grammaticality of translations while not improving BLEU score
Perhaps BLEU is not a sufficient test for grammaticality; further investigation is needed.

Grammatical Machine Translation
Aim: investigate incorporating a grammar-based generator into a dependency-based SMT system.
The authors present:
- A dependency-based SMT model
- Statistical components modelled on the phrase-based system of Koehn et al. 2003
Also used:
- Component weights adjusted by minimum error rate (MER) training (Och 2003)
- A grammar-based generator
- N-gram and distortion models

Section 2: Extracting F-Structure Snippets

Extracting F-Structure Snippets
- Source- and target-language sentences of a bilingual corpus are parsed using LFG grammars
- For each English and German f-structure pair, the two f-structures that most preserve dependencies are selected
- Many-to-many word alignments are used to create many-to-many correspondences between the substructures
- These correspondences are the basis for deciding what goes into the basic transfer rules

Extracting F-Structure Snippets: Example
Dafür bin ich zutiefst dankbar → I have a deep appreciation for that
Many-to-many bidirectional word alignment: (alignment figure not shown)

Transfer Rule Extraction: Example From the aligned words we get the following substructure correspondences:

Transfer Rule Extraction: Example
From the correspondences, two kinds of transfer rules are extracted:
1. Primitive transfer rules
2. Complex transfer rules
Transfer contiguity constraint:
1. Source and target f-structures are each connected.
2. F-structures in the transfer source can only be aligned with f-structures in the transfer target, and vice versa.

Transfer Rule Extraction: Example
Primitive Rule 1:
pred(X1, sein), subj(X1, X2), xcomp(X1, X3)
→ pred(X1, have), subj(X1, X2), obj(X1, X3)

Transfer Rule Extraction: Example
Primitive Rule 2:
pred(X1, ich) → pred(X1, I)

Transfer Rule Extraction: Example
Primitive Rule 3:
pred(X1, dafür)
→ pred(X1, for), obj(X1, X2), pred(X2, that)

Transfer Rule Extraction: Example
Primitive Rule 4:
pred(X1, dankbar), adj(X1, X2), in_set(X3, X2), pred(X3, zutiefst)
→ pred(X1, appreciation), spec(X1, X2), pred(X2, a), adj(X1, X3), in_set(X4, X3), pred(X4, deep)

Transfer Rule Extraction: Example
Complex transfer rules: primitive transfer rules that are adjacent in the f-structure are combined to form more complex rules.
Example (rules 1 & 2 above):
pred(X1, sein), subj(X1, X2), pred(X2, ich), xcomp(X1, X3)
→ pred(X1, have), subj(X1, X2), pred(X2, I), obj(X1, X3)
In the worst case there can be an exponential number of combinations of primitive transfer rules, so the number of primitive rules used to form a complex rule is restricted to 3, keeping the number of transfer rules O(n²) in the worst case.
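
The combination step above can be sketched in a few lines. This is a simplification under stated assumptions: rules are encoded as (source facts, target facts) pairs of strings, and f-structure adjacency is approximated by two rules' source sides sharing an X-variable; the real system works on the f-structure graph itself.

```python
import re
from itertools import combinations

def variables(facts):
    """Collect the X-variables mentioned in facts like 'subj(X1,X2)'."""
    return {v for fact in facts for v in re.findall(r"X\d+", fact)}

def adjacent(r1, r2):
    """Approximate f-structure adjacency: the source sides share a node."""
    return bool(variables(r1[0]) & variables(r2[0]))

def complex_rules(primitives, max_size=3):
    """Combine adjacent primitive rules (source facts, target facts) into
    complex rules of at most max_size primitives, keeping the extracted
    rule set polynomial rather than exponential."""
    out = []
    for k in range(2, max_size + 1):
        for combo in combinations(primitives, k):
            # every rule in the combination must touch at least one other
            if all(any(adjacent(a, b) for b in combo if b is not a)
                   for a in combo):
                src = frozenset().union(*(r[0] for r in combo))
                tgt = frozenset().union(*(r[1] for r in combo))
                out.append((src, tgt))
    return out
```

Applied to primitive rules 1 and 2, this yields exactly the combined sein/ich rule shown above.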

Section 3: Parsing-Transfer-Generation

Parsing
- LFG grammars are used to parse the source and target text
- A FRAGMENT grammar is used to augment the standard grammar, increasing robustness
- The correct parse is determined by the fewest-chunk method

Transfer
- Rules are applied to the source f-structure non-deterministically and in parallel
- Each fact of the German f-structure is translated by exactly one transfer rule
- A default rule is included that allows any fact to be translated as itself
- A chart is used to encode the translations
- Beam search decoding is used to select the most probable translations
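
A minimal sketch of this decoding regime, under simplifying assumptions: the f-structure is flattened to a list of facts, each rule option carries an illustrative log-probability, and the default identity rule gets a fixed low score (the value -10.0 is invented for illustration); the real system decodes over a packed chart rather than a flat list.

```python
import heapq

def transfer_beam_search(facts, rules, beam_size=20, default_logprob=-10.0):
    """Translate each source fact with exactly one rule; a default rule
    lets any fact translate as itself. Keep only the beam_size best
    partial hypotheses at each step."""
    beam = [(0.0, [])]  # (log-probability, translated facts so far)
    for fact in facts:
        # rule options for this fact, plus the identity default rule
        options = rules.get(fact, []) + [(fact, default_logprob)]
        expanded = [(lp + olp, trans + [out])
                    for lp, trans in beam
                    for out, olp in options]
        beam = heapq.nlargest(beam_size, expanded, key=lambda h: h[0])
    return beam[0]  # most probable complete translation
```

With a rule for "pred(X1,sein)" and none for "subj(X1,X2)", the subject fact falls through to the default rule and is carried over unchanged.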

Generation
The method of generation has to be fault tolerant:
- The transfer system can be given a fragmentary parse as input
- The transfer system can output an invalid f-structure
Unknown predicates: default morphology is used to inflect the source stem for English
Unknown structures: a default grammar is used that allows any attribute to be generated in any order with any category

Section 4: Statistical Models & Training

Statistical Components
Modelled on the statistical components of Pharaoh. Pharaoh integrates 8 statistical models:
1. Relative frequency of phrase translations, source-to-target
2. Relative frequency of phrase translations, target-to-source
3. Lexical weighting, source-to-target
4. Lexical weighting, target-to-source
5. Phrase count
6. Language model probability
7. Word count
8. Distortion probability

Statistical Components
The following statistics are computed for each translation:
1. Log-probability of source-to-target transfer rules, where the probability r(e|f) of a rule that transfers source snippet f into target snippet e is estimated by the relative frequency r(e|f) = count(f → e) / Σ_e' count(f → e')
2. Log-probability of target-to-source transfer rules
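
The relative-frequency estimate for features 1 and 2 is simple counting; a sketch, assuming extracted rule instances are given as (source snippet, target snippet) pairs:

```python
import math
from collections import Counter

def rule_logprobs(rule_instances):
    """Relative-frequency estimate r(e|f) = count(f -> e) / count(f -> *),
    returned as log-probabilities (feature 1). Swapping the pair order in
    rule_instances gives the target-to-source model (feature 2)."""
    pair_counts = Counter(rule_instances)                 # counts of (f, e)
    source_counts = Counter(f for f, _ in rule_instances) # counts of f
    return {(f, e): math.log(count / source_counts[f])
            for (f, e), count in pair_counts.items()}
```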

Statistical Components
3. Log-probability of lexical translations from source to target snippets, estimated from Viterbi alignments â between source word positions i = 1, …, n and target word positions j = 1, …, m, for stems f_i and e_j in snippets f and e, with relative word translation frequencies t(e_j | f_i)
4. Log-probability of lexical translations from target to source snippets
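
Feature 3 can be sketched as Koehn-style lexical weighting: for each target word, average t(e_j | f_i) over its aligned source words and sum the logs. This is a simplification; scoring unaligned words against a NULL token is omitted here.

```python
import math

def lexical_logprob(f_words, e_words, alignment, t):
    """Lexical weight of one snippet pair: for each aligned target word e_j,
    average the word translation probabilities t[(e_j, f_i)] over its
    aligned source words, then sum the logs (feature 3; swap directions
    for feature 4)."""
    logp = 0.0
    for j, e in enumerate(e_words):
        links = [i for (i, jj) in alignment if jj == j]  # sources aligned to j
        if links:
            avg = sum(t[(e, f_words[i])] for i in links) / len(links)
            logp += math.log(avg)
    return logp
```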

Statistical Components
5. Number of transfer rules
6. Number of transfer rules with frequency 1
7. Number of default transfer rules
8. Log-probability of strings of predicates from root to frontier of the target f-structure, estimated from predicate trigrams of English
9. Number of predicates in the target language
10. Number of constituent movements during generation, based on the original order of the head predicates of the constituents (for example, AP[2] BP[3] CP[1] counts as two movements, since the head predicate of CP moved from first to third position)

Statistical Components
11. Number of generation repairs
12. Log-probability of the target string as computed by a trigram language model
13. Number of words in the target string
Features 1–10 are used to choose the most probable parse from the transfer chart: 1–7 are tests on source and target f-structure snippets related via transfer rules; 8–10 are language-model and distortion features on the target c- and f-structures. Features 11–13 are computed on the strings that are generated from the target f-structure.
The statistics are combined into a log-linear model whose parameters are adjusted by minimum error rate training.

Section 5: Experimental Evaluation

Experimental Evaluation
- Europarl, German to English; sentences of length 5–15 words
- Training set: 163,141 sentences; development set: 1,967 sentences; test set: 1,755 sentences (same as Koehn et al. 2003)
- Bidirectional word alignment created from the word alignment of IBM Model 4 as implemented by GIZA++ (Och et al. 1999)
- Grammars achieve 100% coverage on unseen data: 80% as full parses, 20% as fragment parses
- 700,000 transfer rules extracted
- For language modelling, the trigram model of Stolcke 2002 is used
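
Creating a bidirectional alignment from the two directional IBM Model 4 Viterbi alignments can be sketched as follows. This is a minimal version: it computes only the high-precision intersection and high-recall union; the grow-diag-final-style heuristics of Koehn et al. 2003, which add selected union links to the intersection, are omitted.

```python
def symmetrize(s2t, t2s):
    """Symmetrize two directional Viterbi alignments. s2t holds
    (source_pos, target_pos) links; t2s holds (target_pos, source_pos)
    links and is flipped before combining. Returns (intersection, union)."""
    forward = set(s2t)
    backward = {(i, j) for (j, i) in t2s}  # flip target-to-source links
    return forward & backward, forward | backward
```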

Experimental Evaluation
For translating the test set:
- 1 parse for each German sentence was used
- 10 transferred f-structures
- 1,000 generated strings for each transferred f-structure
The most probable target f-structure is obtained by a beam search on the transfer chart using features 1–10 above, with a beam size of 20. Features 11–13 are computed on the strings that are generated.

Experimental Evaluation
For automatic evaluation they used NIST combined with the approximate randomization test (Noreen 1989).

NIST scores:
                     IBM Model 4   LFG     Phrase-based SMT
Full test set        5.57          5.62*   6.40*
In-coverage (44%)    —             5.99*   —

(* marks statistically significant differences; the remaining in-coverage cells were not recoverable from the transcript.)
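
The significance test can be sketched as follows. Two simplifying assumptions: scores are treated as per-sentence values whose totals are compared (NIST itself is a corpus-level metric, so the real test shuffles sentence assignments before recomputing the corpus score), and the p-value is the usual smoothed estimate.

```python
import random

def approximate_randomization(scores_a, scores_b, trials=10000, seed=0):
    """Approximate randomization test: randomly swap the two systems'
    per-sentence scores and count how often the shuffled difference in
    totals is at least as large as the observed difference. Returns a
    smoothed p-value estimate."""
    rng = random.Random(seed)
    observed = abs(sum(scores_a) - sum(scores_b))
    hits = 0
    for _ in range(trials):
        sa = sb = 0.0
        for a, b in zip(scores_a, scores_b):
            if rng.random() < 0.5:  # swap this sentence's labels
                a, b = b, a
            sa += a
            sb += b
        if abs(sa - sb) >= observed:
            hits += 1
    return (hits + 1) / (trials + 1)
```

A small p-value means the observed score difference is unlikely under random relabelling, i.e. the systems differ significantly.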

Experimental Evaluation
Manual evaluation, to separate the factors of grammaticality and translation adequacy:
- 500 sentences randomly extracted from the in-coverage examples
- 2 independent human judges
- The judges were presented with the output of the phrase-based SMT system and of the LFG-based system in a blind test, and asked to state a preference for one of the translations based on:
  - Grammaticality / fluency
  - Translational / semantic adequacy
(Preference-count table, J1 vs. J2 with choices P / LFG / equal for both grammaticality and adequacy, not recoverable from the transcript.)

Experimental Evaluation
Promising results for examples that are in coverage of the LFG grammars. However, backing off to robustness techniques for parsing and generation results in a loss of translation quality.
Rule extraction problems:
- 20% of the parses are fragmentary
- Errors occur in the rule extraction process, resulting in ill-formed transfer rules
Parsing-transfer-generation problems:
- Parsing errors → errors in transfer → generation errors
- In coverage: disambiguation errors in parsing and transfer → suboptimal translations

Experimental Evaluation
Despite the use of minimum error rate training and n-gram language models, the system cannot be used to maximize n-gram scores on reference translations in the same way as phrase-based systems, since statistical ordering models are employed in the framework after generation. This gives preference to grammaticality over similarity to the reference translations.

Conclusion
- An SMT model that marries phrase-based SMT with traditional grammar-based MT
- The NIST measure showed that the results achieved are comparable with the phrase-based SMT system of Koehn et al. 2003 for in-coverage examples
- Manual evaluation showed significant improvements in both grammaticality and translational adequacy for in-coverage examples

Conclusion
- It is determinable with this system whether or not a source sentence is in coverage
- Possibility for a hybrid system that achieves improved grammaticality at state-of-the-art translation quality
Future work:
- Improve the translation of in-coverage source sentences, e.g. via stochastic generation
- Apply the system to other language pairs and data sets

References
- Miriam Butt, Helge Dyvik, Tracy King, Hiroshi Masuichi and Christian Rohrer. 2002. The Parallel Grammar Project.
- Eugene Charniak, Kevin Knight and Kenji Yamada. 2003. Syntax-based Language Models for Statistical Machine Translation.
- Michael Collins, Philipp Koehn and Ivona Kucerova. 2005. Clause Restructuring for Statistical Machine Translation.
- Philipp Koehn, Franz Och and Daniel Marcu. 2003. Statistical Phrase-based Translation.
- Philipp Koehn. 2004. Pharaoh: A Beam Search Decoder for Phrase-based Statistical Machine Translation.
- Arul Menezes and Stephen Richardson. 2001. A Best-first Alignment Algorithm for Automatic Extraction of Transfer Mappings from Bilingual Corpora.
- Franz Och, Christoph Tillmann and Hermann Ney. 1999. Improved Alignment Models for Statistical Machine Translation.
- Franz Och. 2003. Minimum Error Rate Training in Statistical Machine Translation.
- Kishore Papineni, Salim Roukos, Todd Ward and Wei-Jing Zhu. 2002. BLEU: A Method for Automatic Evaluation of Machine Translation.
- Stefan Riezler, Tracy King, Ronald Kaplan, Richard Crouch, John Maxwell and Mark Johnson. 2002. Parsing the Wall Street Journal using a Lexical-Functional Grammar and Discriminative Estimation Techniques.
- Stefan Riezler and John Maxwell. 2006. Grammatical Machine Translation.
- Fei Xia and Michael McCord. 2004. Improving a Statistical MT System with Automatically Learned Rewrite Patterns.