INSTITUTE OF COMPUTING TECHNOLOGY Forest-to-String Statistical Translation Rules Yang Liu, Qun Liu, and Shouxun Lin Institute of Computing Technology Chinese Academy of Sciences


Presentation transcript:

INSTITUTE OF COMPUTING TECHNOLOGY Forest-to-String Statistical Translation Rules Yang Liu, Qun Liu, and Shouxun Lin Institute of Computing Technology Chinese Academy of Sciences

Outline: Introduction; Forest-to-String Translation Rules; Training; Decoding; Experiments; Conclusion

Syntactic and Non-syntactic Bilingual Phrases
[Figure: source parse with NP (NR BUSH, NN PRESIDENT) and VP (VV MADE, NN SPEECH), aligned to the target "President Bush made a speech"; one highlighted phrase pair is labeled syntactic and another non-syntactic.]

Importance of Non-syntactic Bilingual Phrases
About 28% of bilingual phrases are non-syntactic on an English-Chinese corpus (Marcu et al., 2006).
Requiring bilingual phrases to be syntactically motivated loses a good amount of valuable knowledge (Koehn et al., 2003).
Keeping the strengths of phrases while incorporating syntax into statistical translation results in significant improvements (Chiang, 2005).

Previous Work
[Figure: the BUSH PRESIDENT MADE SPEECH / "President Bush made a speech" example as treated by Galley et al., 2004.]
[Figure: NPB (DT, JJ, NN) over "the mutual understanding"; the non-syntactic phrase "the mutual" (DT JJ) is given the pseudo label *NPB_*NN, and NPB is rebuilt from *NPB_*NN and NN (Marcu et al., 2006).]
[Figure: the same BUSH PRESIDENT MADE SPEECH tree, with the phrase BUSH PRESIDENT / "President Bush" covered by NP (NR, NN) (Liu et al., 2006).]

Our Work
We augment the tree-to-string translation model with:
forest-to-string rules, which capture non-syntactic phrase pairs;
auxiliary rules, which help integrate forest-to-string rules into the tree-to-string model.


Tree-to-String Rules
[Figure: example tree-to-string rules, including NN GUNMAN -> "the gunman", a VP fragment (SB WAS, inner VP with NP (NN) and VV KILLED) -> "was killed by", and IP -> NP VP PU.]

Derivation
A derivation is a left-most composition of translation rules that explains how a source parse tree, a target sentence, and the word alignment between them are synchronously generated.
[Figure: source parse (IP -> NP VP PU) over GUNMAN, WAS, POLICE, KILLED and the final punctuation, aligned to "the gunman was killed by police ."]

A Derivation Composed of TRs
[Figure, built up over six frames: the derivation starts from IP -> NP VP PU, then applies NP (NN GUNMAN) -> "the gunman", the VP fragment with SB WAS and VV KILLED -> "was killed by", NN POLICE -> "police", and PU -> ".", yielding "the gunman was killed by police ."]
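The composition shown in these frames can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Rule` and `Step` classes and the `realize` function are hypothetical names, the bracketed source fragments are one reading of the flattened figure, and the word alignment is left out. An integer on the target side marks the place where a child derivation is substituted.

```python
from dataclasses import dataclass, field
from typing import List, Union


@dataclass
class Rule:
    source: str                    # source-side tree fragment (kept for display only)
    target: List[Union[str, int]]  # target words; an int i means "insert child i here"


@dataclass
class Step:
    rule: Rule
    children: List["Step"] = field(default_factory=list)


def realize(step: Step) -> List[str]:
    """Expand a derivation top-down, left to right, into target words."""
    words: List[str] = []
    for item in step.rule.target:
        if isinstance(item, int):
            words.extend(realize(step.children[item]))
        else:
            words.append(item)
    return words


# Rules as read off the slides (fragment shapes are an assumption).
ip = Rule("( IP ( NP ) ( VP ) ( PU ) )", [0, 1, 2])
np_gunman = Rule("( NP ( NN GUNMAN ) )", ["the", "gunman"])
vp_passive = Rule("( VP ( SB WAS ) ( VP ( NP ( NN ) ) ( VV KILLED ) ) )",
                  ["was", "killed", "by", 0])
nn_police = Rule("( NN POLICE )", ["police"])
pu = Rule("( PU . )", ["."])

derivation = Step(ip, [Step(np_gunman), Step(vp_passive, [Step(nn_police)]), Step(pu)])
print(" ".join(realize(derivation)))
```

Each `Step` corresponds to one frame of the figure: the root rule is applied first and the remaining rules fill its substitution sites from left to right.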

Forest-to-String and Auxiliary Rules
A forest is a tree sequence. When forest-to-string rules are incorporated, only the root sequence matters.
[Figure: a forest-to-string rule covering NP (NN GUNMAN) and SB WAS -> "the gunman was", and an auxiliary rule IP (NP, VP (SB, VP), PU).]

A Derivation Composed of TRs, FRs, and ARs
[Figure, built up over five frames: the auxiliary rule IP (NP, VP (SB, VP), PU) is applied first; a forest-to-string rule then rewrites the root sequence NP SB as "the gunman was"; a second forest-to-string rule covers VP (NP, VV KILLED) and PU with "killed by" and the final "."; NP (NN POLICE) -> "police" completes "the gunman was killed by police ."]


Training
Extract both tree-to-string and forest-to-string rules from a word-aligned, source-side parsed bilingual corpus.
Bottom-up strategy.
Auxiliary rules are NOT learnt from real-world data.

An Example (training)
[Figure, built up over several frames on the sentence pair BUSH PRESIDENT MADE SPEECH / "President Bush made a speech". Rules are extracted bottom-up: first the word-level rules NR BUSH -> "Bush", NN PRESIDENT -> "President", VV MADE -> "made", NN SPEECH -> "speech"; then rules rooted at NP (NR, NN) and at VP (VV, NN), the latter with target sides such as "a", "made a", "a speech" and "made a speech"; then forest-to-string rules over root sequences such as NP VV covering BUSH PRESIDENT MADE (10 FRs in total). The last frame, covering NP and VP, is annotated max_height = 2.]

Why Don't We Extract Auxiliary Rules?
[Figure: source parse (IP over NP and VP-B, with NP-B, NR, NN, CC, NN and VV nodes) of SHANGHAI PUDONG DEVE WITH LEGAL ESTAB STEP, aligned to "The development of Shanghai's Pudong is in step with the establishment of its legal system".]


Decoding
Input: a source parse tree.
Output: a target sentence.
Bottom-up strategy.
Build auxiliary rules while decoding.
Compute subcell divisions for building auxiliary rules.

An Example (decoding)
[Figure, built up over several frames on the source tree NP (NR BUSH, NN PRESIDENT), VP (VV MADE, NN SPEECH); each frame shows the rule, the derivation, and the translation for one node.]
NR: ( NR BUSH ) ||| Bush ||| 1:1. Translation: Bush
NN: ( NN PRESIDENT ) ||| President ||| 1:1. Translation: President
VV: ( VV MADE ) ||| made ||| 1:1. Translation: made
NN: ( NN SPEECH ) ||| speech ||| 1:1. Translation: speech
NP: ( NP ( NR ) ( NN ) ) ||| X 1 X 2 ||| 1:2 2:1, with ( NR BUSH ) ||| Bush ||| 1:1 and ( NN PRESIDENT ) ||| President ||| 1:1. Translation: President Bush
VP: ( VP ( VV MADE ) ( NN ) ) ||| made a X ||| 1:1 2:3, with ( NN SPEECH ) ||| speech ||| 1:1. Translation: made a speech
NP VV (forest): ( NP ( NN ) ( NN PRESIDENT ) ) ( VV MADE ) ||| President X made a ||| 1:2 2:1 3:3, with ( NR BUSH ) ||| Bush ||| 1:1. Translation: President Bush made a
Root: ( NP ( NP ) ( VP ( VV ) ( NN ) ) ) ||| X 1 X 2 ||| 1:1 2:1 3:2, with the NP VV forest rule above, ( NR BUSH ) ||| Bush ||| 1:1 and ( NN SPEECH ) ||| speech ||| 1:1. Translation: President Bush made a speech
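The rule lines in this example share a three-field layout separated by `|||`: a bracketed source side, a target string, and a list of `a:b` pairs. A minimal sketch of reading that layout follows. The function name `parse_rule` is hypothetical, and the pairs are returned as plain integer tuples without interpreting what each index refers to, since the slides do not spell that out.

```python
from typing import List, Tuple


def parse_rule(line: str) -> Tuple[str, List[str], List[Tuple[int, int]]]:
    """Split 'source ||| target ||| a:b c:d ...' into its three fields."""
    fields = [part.strip() for part in line.split("|||")]
    if len(fields) != 3:
        raise ValueError(f"expected 3 fields separated by '|||', got {len(fields)}")
    source, target, alignment = fields
    pairs = []
    for item in alignment.split():
        left, right = item.split(":")
        pairs.append((int(left), int(right)))
    return source, target.split(), pairs


source, target, pairs = parse_rule("( VP ( VV MADE ) ( NN ) ) ||| made a X ||| 1:1 2:3")
print(source)
print(target)
print(pairs)
```

Raising on a malformed line, rather than guessing at the missing field, keeps a damaged rule table from silently producing wrong rules.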

Subcell Division
[Figure: subcell divisions over the four source words BUSH PRESIDENT MADE SPEECH.]
Example divisions: 1:1 2:2 3:3 4:4, and 1:3 4:4.
All divisions of the span 1:4:
1:4
1:1 2:4
1:2 3:4
1:3 4:4
1:1 2:2 3:4
1:1 2:3 4:4
1:2 3:3 4:4
1:1 2:2 3:3 4:4
A span of n words has 2^(n-1) subcell divisions.
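The 2^(n-1) count follows from the fact that each of the n-1 gaps between adjacent words is either a cut or not. The sketch below enumerates the divisions of a span this way. The function name `subcell_divisions` and the tuple representation of a subcell are assumptions made for illustration, not the decoder's actual data structures.

```python
from itertools import combinations
from typing import List, Tuple

Division = List[Tuple[int, int]]


def subcell_divisions(start: int, end: int) -> List[Division]:
    """Enumerate every way to split the span start:end into contiguous subcells."""
    n = end - start + 1
    cut_points = range(start, end)  # a cut at p separates word p from word p + 1
    divisions: List[Division] = []
    for k in range(n):              # k = number of cuts, from 0 to n - 1
        for cuts in combinations(cut_points, k):
            bounds = [start - 1, *cuts, end]
            divisions.append([(lo + 1, hi) for lo, hi in zip(bounds, bounds[1:])])
    return divisions


for division in subcell_divisions(1, 4):
    print(" ".join(f"{lo}:{hi}" for lo, hi in division))
```

For the four-word example this yields eight divisions, from the undivided span 1:4 down to the fully split 1:1 2:2 3:3 4:4. The exponential growth is worth keeping in mind for long spans.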

Build Auxiliary Rule
[Figure: an auxiliary rule over NP (NR, NN) and VP (VV, NN) is built from the source tree for BUSH PRESIDENT MADE SPEECH.]

Penalize the Use of FRs and ARs
Auxiliary rules, which are built rather than learnt, have no probabilities.
We introduce a feature that sums up the node count of auxiliary rules to balance the preference between conventional tree-to-string rules and the new forest-to-string and auxiliary rules.
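A rough sketch of such a feature is given below. Everything in it is an assumption made for illustration: the dictionary layout of a derivation step, the `kind`, `nodes` and `logprob` keys, the example numbers, and the way the feature is combined with rule scores through a single weight. The slides only state that the feature sums the node counts of the auxiliary rules.

```python
from typing import Dict, List


def auxiliary_node_count(derivation: List[Dict]) -> int:
    """Sum the node counts of the auxiliary rules (kind 'AR') in a derivation."""
    return sum(step["nodes"] for step in derivation if step["kind"] == "AR")


def score(derivation: List[Dict], weight: float) -> float:
    """Toy score: log-probabilities of learnt rules plus a weighted AR node count."""
    # Auxiliary rules carry no probability, so only the other rules contribute here.
    rule_score = sum(step["logprob"] for step in derivation if step["kind"] != "AR")
    return rule_score + weight * auxiliary_node_count(derivation)


# Hypothetical derivation: one auxiliary, one forest-to-string, one tree-to-string rule.
example = [
    {"kind": "AR", "nodes": 5},
    {"kind": "FR", "nodes": 4, "logprob": -1.2},
    {"kind": "TR", "nodes": 2, "logprob": -0.7},
]
print(auxiliary_node_count(example))
print(score(example, weight=-0.1))
```

With a negative weight, derivations that lean on large auxiliary rules score lower, which is one way to express the balance the slide describes. In practice the weight would be tuned together with the other feature weights.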


Experiments
Training corpus: 31,149 sentence pairs with 843K Chinese words and 949K English words.
Development set: 2002 NIST Chinese-to-English test set (571 of 878 sentences).
Test set: 2005 NIST Chinese-to-English test set (1,082 sentences).

Tools
Evaluation: mteval-v11b.pl
Language model: SRI Language Modeling Toolkit (Stolcke, 2002)
Significance test: Zhang et al., 2004
Parser: Xiong et al., 2005
Minimum error rate training: optimizeV5IBMBLEU.m (Venugopal and Vogel, 2005)

Rules Used in Experiments
Rule   L        P        U      Total
BP     251,…
TR     56,983   41,027   3,…    …,539
FR     16,…     …,346    25,…   …,006

Comparison
System    Rule Set        BLEU4
Pharaoh   BP              0.2182 ±
Lynx      BP              0.2059 ±
Lynx      TR              0.2302 ±
Lynx      TR + BP         0.2346 ±
Lynx      TR + FR + AR    0.2402 ± 0.0087

TRs Are Still Dominant
To achieve its best result, Lynx made use of:
26,082 tree-to-string rules
9,219 default rules
5,432 forest-to-string rules
2,919 auxiliary rules

Effect of Lexicalization
Forest-to-String Rule Set   BLEU4
None                        0.2225 ±
L                           0.2297 ±
P                           0.2279 ±
U                           0.2270 ±
L + P + U                   0.2312 ± 0.0082


Conclusion
We augment the tree-to-string translation model with:
forest-to-string rules, which capture non-syntactic phrase pairs;
auxiliary rules, which help integrate forest-to-string rules into the tree-to-string model.
Forest-to-string and auxiliary rules enable tree-to-string models to derive in a more general way and bring significant improvement.

Future Work
Scale up to large data.
Further investigation of auxiliary rules.

Thanks!