Phrase Reordering for Statistical Machine Translation Based on Predicate-Argument Structure. Mamoru Komachi, Yuji Matsumoto, Nara Institute of Science and Technology.

Presentation transcript:

1 Phrase Reordering for Statistical Machine Translation Based on Predicate-Argument Structure
Mamoru Komachi, Yuji Matsumoto (Nara Institute of Science and Technology)
Masaaki Nagata (NTT Communication Science Laboratories)

2 Overview of NAIST-NTT System
Improve translation model by phrase reordering
[Figure: the Japanese sentence 住所 を ここ に 書い て 下さい (address-ACC here-LOC write please), shown before and after reordering, aligned with the English "please write down your address here"]

3 Motivation
Translation model using syntactic and semantic information has not yet succeeded
Improve distortion model between language pairs with different word orders
Improve statistical machine translation by using predicate-argument structure
Improve word alignment by phrase reordering

4 Outline
Overview
Phrase Reordering by Predicate-argument Structure
Experiments and Results
Discussions
Conclusions
Future Work

5 Phrase Reordering by Predicate-argument Structure
Phrase reordering by morphological analysis (Niessen and Ney, 2001)
Phrase reordering by parsing (Collins et al., 2005)
Predicate-argument structure analysis
[Figure: 住所 を ここ に 書い て 下さい (address-ACC here-LOC write please), before and after reordering]

6 Predicate-argument Structure Analyzer: SynCha
Predicate-argument structure analyzer based on (Iida et al., 2006) and (Komachi et al., 2006)
Identify predicates (verb/adjective/event-denoting noun) and their arguments
Trained on NAIST Text Corpus
Can cope with zero-anaphora and ellipsis
Achieves F-score 0.8 for arguments within a sentence

7 Predicate-argument Structure Analysis Steps
[Figure: 住所 を ここ に 書い て 下さい (address-ACC here-LOC write please), analyzed step by step; 住所 を is labeled WO-ACC, ここ に is labeled NI-LOC, and 書い て 下さい is labeled predicate]

8 Phrase Reordering Steps
Find predicates (verb/adjective/event-denoting noun)
Use heuristics to match English word order
[Figure: 住所 を ここ に 書い て 下さい (address-ACC here-LOC write please), before and after reordering]
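The two steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the slides do not spell out the heuristics, so the single rule used here (move predicate chunks in front of their arguments) is inferred from the reordered example on the Training Corpus Statistics slide, and the Chunk class, the role labels, and reorder_predicate_first are hypothetical names, not part of SynCha.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Chunk:
    tokens: List[str]  # surface tokens of the chunk
    role: str          # "predicate" or a case label such as "NI-LOC" (labels assumed)


def reorder_predicate_first(chunks: List[Chunk]) -> List[Chunk]:
    """Place predicate chunks before all other chunks.

    Relative order inside each group is preserved. This is an assumed
    stand-in for the unspecified heuristics on the slide.
    """
    predicates = [c for c in chunks if c.role == "predicate"]
    arguments = [c for c in chunks if c.role != "predicate"]
    return predicates + arguments


def to_tokens(chunks: List[Chunk]) -> List[str]:
    """Flatten chunks back into a token sequence."""
    return [token for chunk in chunks for token in chunk.tokens]


# Example sentence taken from the training-corpus slide.
sentence = [
    Chunk(["この", "用紙", "に"], "NI-LOC"),
    Chunk(["記入", "し", "て", "下さい"], "predicate"),
]
reordered = to_tokens(reorder_predicate_first(sentence))
print(" ".join(reordered))
```

On this input the sketch yields the same reordered sentence as the one shown in the slides; real sentences with several predicates or embedded clauses would need more careful rules.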

9 Preprocessing
Japanese side
  Morphological analyzer/Tokenizer: ChaSen
  Dependency parser: CaboCha
  Predicate-argument structure: SynCha
English side
  Tokenizer: tokenizer.sed (LDC)
  Morphological analyzer: MXPOST
  All English words were lowercased for training

10 Aligning Training Corpus
Manually aligned 45,909 sentence pairs out of 39,953 conversations
Example:
  かしこまり まし た 。 この 用紙 に 記入 し て 下さ い 。
  sure. please fill out this form.
[Figure: the same example shown before and after sentence alignment]

11 Training Corpus Statistics
Example:
  Original:  この 用紙 に 記入 し て 下さい (this form-LOC write please)
  English:   please fill out this form
  Reordered: 記入 し て 下さい この 用紙 に
Add each pair to training corpus
Learn word alignment by GIZA++

                      # of sent.
  Improve alignment       33,874
  Degrade alignment        7,959
  No change                4,076
  Total                   45,909

                      # of sent.
  Reordered               18,539
  Contain crossing        39,979
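The "Contain crossing" row counts sentences whose word alignment has at least one pair of crossing links. A minimal sketch of how crossings could be counted is given below; the alignment links are hand-written and hypothetical (they are not GIZA++ output), and count_crossings is an illustrative helper, not a function from any toolkit.

```python
from itertools import combinations
from typing import List, Tuple

# Each link is (source token index, target token index), both 0-based.
Alignment = List[Tuple[int, int]]


def count_crossings(alignment: Alignment) -> int:
    """Count pairs of links that cross each other.

    Two links cross when their source order and target order disagree.
    """
    return sum(
        1
        for (s1, t1), (s2, t2) in combinations(alignment, 2)
        if (s1 - s2) * (t1 - t2) < 0
    )


# Target: please(0) fill(1) out(2) this(3) form(4)
# Original source: この(0) 用紙(1) に(2) 記入(3) し(4) て(5) 下さい(6)
original_links = [(0, 3), (1, 4), (3, 1), (6, 0)]
# Reordered source: 記入(0) し(1) て(2) 下さい(3) この(4) 用紙(5) に(6)
reordered_links = [(0, 1), (3, 0), (4, 3), (5, 4)]

print(count_crossings(original_links))   # 5
print(count_crossings(reordered_links))  # 1
```

With these assumed links, reordering removes most crossings but not all of them, since "please" still precedes "fill" on the English side.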

12 Experiments
WMT 2006 shared task baseline system trained on normal order corpus with default parameters
Baseline system trained on pre-processed corpus with default parameters
Baseline system trained on pre-processed corpus with parameter optimization by a minimum error rate training tool (Venugopal, 2005)

13 Translation Model and Language Model
Translation model
  GIZA++ (Och and Ney, 2003)
Language model
  Back-off word trigram model trained by Palmkit (Ito, 2002)
Decoder
  WMT 2006 shared task baseline system (Pharaoh)

14 Minimum Error Rate Training (MERT)
Optimize translation parameters for Pharaoh decoder
  Phrase translation probability (JE/EJ)
  Lexical translation probability (JE/EJ)
  Phrase penalty
  Phrase distortion probability
Trained with 500 normal order sentences
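The parameters listed above act as weights in the log-linear model that the decoder uses to rank translation hypotheses. The sketch below shows only that scoring step, not MERT itself: it illustrates how changing one weight can change which hypothesis wins. The feature names, the feature values, and the two hypotheses are all hypothetical, and the language model feature is left out because the slide does not list it.

```python
import math
from typing import Dict

Features = Dict[str, float]


def loglinear_score(features: Features, weights: Dict[str, float]) -> float:
    """Weighted sum of log feature values; higher is better."""
    return sum(weights[name] * math.log(value) for name, value in features.items())


def best_hypothesis(hypotheses: Dict[str, Features], weights: Dict[str, float]) -> str:
    """Return the name of the highest-scoring hypothesis."""
    return max(hypotheses, key=lambda name: loglinear_score(hypotheses[name], weights))


# Made-up feature values for two competing hypotheses of one sentence.
hypotheses = {
    "monotone": {
        "phrase_je": 0.5, "phrase_ej": 0.5,
        "lex_je": 0.4, "lex_ej": 0.4,
        "phrase_penalty": math.exp(2),  # two phrases, constant e per phrase
        "distortion": 1.0,              # no reordering cost
    },
    "reordered": {
        "phrase_je": 0.6, "phrase_ej": 0.6,
        "lex_je": 0.5, "lex_ej": 0.5,
        "phrase_penalty": math.exp(2),
        "distortion": 0.2,              # pays for moving a phrase
    },
}

uniform = {name: 1.0 for name in hypotheses["monotone"]}
low_distortion = dict(uniform, distortion=0.1)

print(best_hypothesis(hypotheses, uniform))         # monotone
print(best_hypothesis(hypotheses, low_distortion))  # reordered
```

A tuning procedure such as MERT searches for the weight vector whose top-ranked hypotheses score best on a held-out set under an evaluation metric; the sketch only evaluates two fixed weight settings.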

15 Results
[Table: BLEU and NIST scores per system; the score values are missing from the transcript]
  ASR 1-BEST
    Baseline
    Proposed (w/o MERT)
    Proposed (w/ MERT)
  Correct recognition
    Baseline
    Proposed (w/o MERT)
    Proposed (w/ MERT)

16 Results for the Evaluation Campaign
While it had high accuracy on translation of content words, it had poor results on individual word translation
  ASR: BLEU 12/14, NIST 11/14, METEOR 6/14
  Correct Recognition: BLEU 12/14, NIST 10/14, METEOR 7/14
Pretty high WER

17 Discussion
Better accuracy over the baseline system
  Improve translation model by phrase reordering
Degrade accuracy by MERT
  Could not find a reason yet
  Could be explained by the fact that we did not put any constraints on reordered sentences (they may be ungrammatical on the Japanese side)
Predicate-argument structure accuracy
  SynCha is trained on newswire sources (not optimized for travel conversation)

18 Discussion (Cont.)
Phrase alignment got worse by splitting a case marker from its dependent verb
[Figure: 住所 を ここ に 書い て 下さい (address-ACC here-LOC write please), before and after reordering, aligned with the English "please write down your address here"]

19 Conclusions
Present phrase reordering model based on predicate-argument structure
The phrase reordering model improved translation accuracy over the baseline method

20 Future work
Investigate the reason why MERT does not work
Make reordered corpus more grammatical (reorder only arguments)
Use newswire sources to see the effect of correct predicate-argument structure
Reorder sentences which have crossing alignments only
Use verb clustering and map arguments automatically