A Tree-to-Tree Alignment-based Model for Statistical Machine Translation. Authors: Min ZHANG, Hongfei JIANG, Ai Ti AW, Jun SUN, Sheng LI, Chew Lim TAN.

Presentation transcript:

A Tree-to-Tree Alignment-based Model for Statistical Machine Translation
Authors: Min ZHANG, Hongfei JIANG, Ai Ti AW, Jun SUN, Sheng LI, Chew Lim TAN
Reporter: 江欣倩
Professor: 陳嘉平

Introduction
- Motivation: exploit syntactic structure features to model the translation process.
- Two major benefits of the STSG-based tree-to-tree alignment model:
  - It can explicitly model the syntax of the target language, thereby improving the grammaticality of the target sentence.
  - It has more expressive power and flexibility, since it allows multi-level global structure distortion of the tree topology and fully utilizes source and target parse tree structure features.

Synchronous TSG (STSG)
- Σ_s and Σ_t: source and target terminal alphabets (POS tags or lexical words)
- N_s and N_t: source and target non-terminal alphabets
- S_s ∈ N_s and S_t ∈ N_t: the source and target start symbols
- P: a production rule set. Each rule is a pair of elementary trees (ξ_s ↔ ξ_t) with a linking relation between the leaf nodes of the source elementary tree ξ_s and the leaf nodes of the target elementary tree ξ_t.

PET
- PET: a production (rule) is a pair of elementary trees with alignment information
- ξ_s: a source elementary tree
- ξ_t: a target elementary tree
- A: the alignments between the leaf nodes of the two elementary trees, A ⊆ {(i, j) : i is the position of the i-th leaf node of ξ_s; j is the position of the j-th leaf node of ξ_t}
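The PET definition above can be sketched as a small data structure. This is only an illustration: the class names `Node` and `PET`, the method `is_valid`, and the toy VP rule are assumptions made here, not something taken from the slides.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    """A node of an elementary tree; a node without children is a leaf."""
    label: str
    children: tuple = ()

    def leaves(self):
        """Return the leaf nodes in left-to-right order."""
        if not self.children:
            return [self]
        found = []
        for child in self.children:
            found.extend(child.leaves())
        return found


@dataclass(frozen=True)
class PET:
    """A source tree, a target tree, and leaf alignments (1-based positions)."""
    source: Node
    target: Node
    alignment: frozenset  # set of (i, j) pairs

    def is_valid(self):
        """Check that every (i, j) points at an existing leaf on each side."""
        n_src = len(self.source.leaves())
        n_tgt = len(self.target.leaves())
        return all(1 <= i <= n_src and 1 <= j <= n_tgt
                   for i, j in self.alignment)


# Hypothetical rule: a two-leaf source VP linked leaf-by-leaf to a target VP.
src = Node("VP", (Node("VV"), Node("NP")))
tgt = Node("VP", (Node("VB"), Node("NP")))
pet = PET(src, tgt, frozenset({(1, 1), (2, 2)}))
print(pet.is_valid())
```

Storing alignments as leaf positions rather than node references mirrors the set definition of A and keeps a rule hashable, which is convenient if rules are later counted or stored in a table.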

STSG-based Tree-to-Tree Alignment
- Notation: source sentences, target sentences, source and target parse trees [equation not preserved in the transcript]

STSG-based Tree-to-Tree Alignment
- Hidden variable D [equation not preserved in the transcript]

STSG-based Tree-to-Tree Alignment
- Four sub-models:
  - Parse model
  - Detachment model
  - Translation model
    - Tree alignment selection model
    - Structure transfer model
  - Generation model
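Read as chain-rule factors, the four sub-models suggest a decomposition like the one below. This is a sketch rather than the slide's own equation, which the transcript does not preserve. The symbols f and e for the source and target sentences are assumptions, and the conditioning the authors actually use may differ.

```latex
\Pr(e \mid f) = \sum_{T_s,\, T_t,\, D}
  \underbrace{\Pr(T_s \mid f)}_{\text{parse}}
  \cdot \underbrace{\Pr(D \mid T_s, f)}_{\text{detachment}}
  \cdot \underbrace{\Pr(T_t \mid D, T_s, f)}_{\text{translation}}
  \cdot \underbrace{\Pr(e \mid T_t, D, T_s, f)}_{\text{generation}}
```

Here T_s and T_t are the source and target parse trees and D is the hidden variable. Summing the joint probability over all three recovers the sentence-level translation probability.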

How the tree-to-tree translation model works
1. The source sentence is parsed into a source parse tree T_s.
2. The parse tree T_s is detached into three elementary trees.
3. Three PETs are selected to map the three source elementary trees to three target elementary trees, which are combined into T_t.
4. A target translation is generated from the target parse tree T_t.

How the tree-to-tree translation model works [figure not preserved in the transcript]

Features
- Simplify the model:
  - Parse model
  - Detachment model
  - Generation model
- After model simplification [equation not preserved in the transcript]

Features
- Bidirectional elementary tree mapping probability
- Bidirectional elementary tree lexical translation probability
- Language model
- Number of elementary tree pairs used: K
- Number of target words: I
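Features like these are commonly combined as a weighted sum in log space (a log-linear model). The slides do not state how they are combined, so the sketch below is an assumption. The feature names, probabilities and uniform weights are invented for illustration.

```python
import math


def loglinear_score(features, weights):
    """Weighted sum of feature values; probabilities enter as log values."""
    return sum(weights[name] * value for name, value in features.items())


# Hypothetical feature values for one candidate translation.
features = {
    "tree_map_src2tgt": math.log(0.4),  # elementary tree mapping, one direction
    "tree_map_tgt2src": math.log(0.5),  # elementary tree mapping, other direction
    "lex_src2tgt": math.log(0.3),       # lexical translation, one direction
    "lex_tgt2src": math.log(0.2),       # lexical translation, other direction
    "language_model": math.log(0.01),
    "num_tree_pairs": 3,                # K
    "num_target_words": 5,              # I
}
weights = {name: 1.0 for name in features}  # untuned placeholder weights

print(loglinear_score(features, weights))
```

In practice the weights would be tuned on held-out data. The two count features let the model trade off using many small rules against few large ones, and short against long outputs.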

Rule Extraction
- T(z): a parse tree covering string z
- Two categories:
  - Initial PET: all leaf nodes in both the source and target elementary trees of a PET are terminals, and ∀(i, j) ∈ A: i_1 ≤ i ≤ i_2 ↔ j_1 ≤ j ≤ j_2
  - Abstract PET
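The alignment constraint on an initial PET can be checked directly: a link falls inside the source span exactly when it falls inside the target span. The sketch below is minimal and not from the slides. The function name `is_consistent` and the toy alignment are assumptions.

```python
def is_consistent(alignment, src_span, tgt_span):
    """True if every link (i, j) is inside both spans or outside both.

    Implements: for all (i, j) in A, i1 <= i <= i2  <->  j1 <= j <= j2.
    Spans are inclusive (start, end) pairs of word positions.
    """
    i1, i2 = src_span
    j1, j2 = tgt_span
    return all((i1 <= i <= i2) == (j1 <= j <= j2) for i, j in alignment)


# Hypothetical word alignment: source words 1 and 2 swap order in the target.
A = {(1, 2), (2, 1), (3, 3)}

print(is_consistent(A, (1, 2), (1, 2)))  # both swapped words stay inside
print(is_consistent(A, (1, 2), (2, 3)))  # link (2, 1) leaves the target span
```

The biconditional is vacuously true for a span pair that no link touches, so a real extractor would typically also require at least one link inside the pair.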

Decoding
Two main steps:
1. Use a CFG-based chart parser to parse the input sentence.
2. Run an STSG-based bottom-up beam search algorithm.

An STSG-based bottom-up beam search algorithm
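Beam search keeps the hypothesis set for each tree node small by pruning. The sketch below shows only that pruning step, under assumptions: the function name `prune`, its parameters, and the use of an absolute log-probability cutoff are illustrative and not taken from the slides.

```python
import heapq


def prune(hypotheses, max_len, min_logprob):
    """Keep at most max_len hypotheses scoring at least min_logprob.

    hypotheses: list of (log_probability, translation) pairs.
    Returns the survivors sorted from best to worst.
    """
    survivors = [h for h in hypotheses if h[0] >= min_logprob]
    return heapq.nlargest(max_len, survivors, key=lambda h: h[0])


# Hypothetical candidates for one node of the source parse tree.
candidates = [(-1.0, "a"), (-150.0, "b"), (-3.0, "c"), (-2.0, "d")]
print(prune(candidates, max_len=2, min_logprob=-100.0))
```

In a bottom-up decoder, a step like this would run at every node after the hypotheses of its child nodes have been combined, so that the number of combinations at the parent stays bounded.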

Experiment
- Dataset
  - Chinese-to-English translation
  - HIT Chinese-English corpus (only one reference)
  - LM: 9k English sentences
- Thresholds
  - c = 5
  - pTableLen = 30
  - pTablePro = -100 (log probability)
  - hTableLen = 100
  - hTablePro = -100

Results

Conclusion
- The work shows how to utilize linguistic syntax structure features for SMT.
- The STSG-based tree-to-tree alignment method is much more effective at modeling global reordering and structure transfer than phrase-based and SCFG-based methods.