Technical Report of NEUNLPLab System for CWMT08 Xiao Tong, Chen Rushan, Li Tianning, Ren Feiliang, Zhang Zhuyu, Zhu Jingbo, Wang Huizhen

Slides:



Advertisements
Similar presentations
Statistical Machine Translation
Advertisements

Statistical Machine Translation Part II: Word Alignments and EM Alexander Fraser Institute for Natural Language Processing University of Stuttgart
Statistical Machine Translation Part II – Word Alignments and EM Alex Fraser Institute for Natural Language Processing University of Stuttgart
Word Sense Disambiguation for Machine Translation Han-Bin Chen
Chinese Word Segmentation Method for Domain-Special Machine Translation Su Chen; Zhang Yujie; Guo Zhen; Xu Jin’an Beijing Jiaotong University.
Discriminative Learning of Extraction Sets for Machine Translation John DeNero and Dan Klein UC Berkeley TexPoint fonts used in EMF. Read the TexPoint.
1 A Tree Sequence Alignment- based Tree-to-Tree Translation Model Authors: Min Zhang, Hongfei Jiang, Aiti Aw, et al. Reporter: 江欣倩 Professor: 陳嘉平.
A Tree-to-Tree Alignment- based Model for Statistical Machine Translation Authors: Min ZHANG, Hongfei JIANG, Ai Ti AW, Jun SUN, Sheng LI, Chew Lim TAN.
Novel Reordering Approaches in Phrase-Based Statistical Machine Translation S. Kanthak, D. Vilar, E. Matusov, R. Zens & H. Ney ACL Workshop on Building.
A Phrase-Based, Joint Probability Model for Statistical Machine Translation Daniel Marcu, William Wong(2002) Presented by Ping Yu 01/17/2006.
The current status of Chinese-English EBMT research -where are we now Joy, Ralf Brown, Robert Frederking, Erik Peterson Aug 2001.
TIDES MT Workshop Review. Using Syntax?  ISI-small: –Cross-lingual parsing/decoding Input: Chinese sentence + English lattice built with all possible.
1 Language Model Adaptation in Machine Translation from Speech Ivan Bulyko, Spyros Matsoukas, Richard Schwartz, Long Nguyen, and John Makhoul.
Course Summary LING 575 Fei Xia 03/06/07. Outline Introduction to MT: 1 Major approaches –SMT: 3 –Transfer-based MT: 2 –Hybrid systems: 2 Other topics.
Symmetric Probabilistic Alignment Jae Dong Kim Committee: Jaime G. Carbonell Ralf D. Brown Peter J. Jansen.
MT Summit VIII, Language Technologies Institute School of Computer Science Carnegie Mellon University Pre-processing of Bilingual Corpora for Mandarin-English.
A Hierarchical Phrase-Based Model for Statistical Machine Translation Author: David Chiang Presented by Achim Ruopp Formulas/illustrations/numbers extracted.
9/12/2003LTI Student Research Symposium1 An Integrated Phrase Segmentation/Alignment Algorithm for Statistical Machine Translation Joy Advisor: Stephan.
Does Syntactic Knowledge help English- Hindi SMT ? Avinesh. PVS. K. Taraka Rama, Karthik Gali.
English-Persian SMT Reza Saeedi 1 WTLAB Wednesday, May 25, 2011.
Large Language Models in Machine Translation Conference on Empirical Methods in Natural Language Processing 2007 報告者:郝柏翰 2013/06/04 Thorsten Brants, Ashok.
Combining Lexical Semantic Resources with Question & Answer Archives for Translation-Based Answer Finding Delphine Bernhard and Iryna Gurevvch Ubiquitous.
An Integrated Approach for Arabic-English Named Entity Translation Hany Hassan IBM Cairo Technology Development Center Jeffrey Sorensen IBM T.J. Watson.
Advanced Signal Processing 05/06 Reinisch Bernhard Statistical Machine Translation Phrase Based Model.
Learning Phonetic Similarity for Matching Named Entity Translation and Mining New Translations Wai Lam, Ruizhang Huang, Pik-Shan Cheung ACM SIGIR 2004.
2010 Failures in Czech-English Phrase-Based MT 2010 Failures in Czech-English Phrase-Based MT Full text, acknowledgement and the list of references in.
Intelligent Database Systems Lab N.Y.U.S.T. I. M. Chinese Word Segmentation and Statistical Machine Translation Presenter : Wu, Jia-Hao Authors : RUIQIANG.
Evaluation of Context-Dependent Phrasal Translation Lexicons for Statistical Machine Translation M arine C ARPUAT and D ekai W U Human Language Technology.
Recent Major MT Developments at CMU Briefing for Joe Olive February 5, 2008 Alon Lavie and Stephan Vogel Language Technologies Institute Carnegie Mellon.
Phrase Reordering for Statistical Machine Translation Based on Predicate-Argument Structure Mamoru Komachi, Yuji Matsumoto Nara Institute of Science and.
The ICT Statistical Machine Translation Systems for IWSLT 2007 Zhongjun He, Haitao Mi, Yang Liu, Devi Xiong, Weihua Luo, Yun Huang, Zhixiang Ren, Yajuan.
Coşkun Mermer, Hamza Kaya, Mehmet Uğur Doğan National Research Institute of Electronics and Cryptology (UEKAE) The Scientific and Technological Research.
NUDT Machine Translation System for IWSLT2007 Presenter: Boxing Chen Authors: Wen-Han Chao & Zhou-Jun Li National University of Defense Technology, China.
Reordering Model Using Syntactic Information of a Source Tree for Statistical Machine Translation Kei Hashimoto, Hirohumi Yamamoto, Hideo Okuma, Eiichiro.
Advanced MT Seminar Spring 2008 Instructors: Alon Lavie and Stephan Vogel.
Why Not Grab a Free Lunch? Mining Large Corpora for Parallel Sentences to Improve Translation Modeling Ferhan Ture and Jimmy Lin University of Maryland,
Compact WFSA based Language Model and Its Application in Statistical Machine Translation Xiaoyin Fu, Wei Wei, Shixiang Lu, Dengfeng Ke, Bo Xu Interactive.
Statistical Machine Translation Part III – Phrase-based SMT / Decoding Alexander Fraser Institute for Natural Language Processing Universität Stuttgart.
INSTITUTE OF COMPUTING TECHNOLOGY Forest-to-String Statistical Translation Rules Yang Liu, Qun Liu, and Shouxun Lin Institute of Computing Technology Chinese.
Intelligent Database Systems Lab 國立雲林科技大學 National Yunlin University of Science and Technology Advisor : Dr. Hsu Student : Sheng-Hsuan Wang Department.
Chinese Word Segmentation Adaptation for Statistical Machine Translation Hailong Cao, Masao Utiyama and Eiichiro Sumita Language Translation Group NICT&ATR.
Yang Liu State Key Laboratory of Intelligent Technology and Systems Tsinghua National Laboratory for Information Science and Technology Department of Computer.
Alignment of Bilingual Named Entities in Parallel Corpora Using Statistical Model Chun-Jen Lee Jason S. Chang Thomas C. Chuang AMTA 2004.
NRC Report Conclusion Tu Zhaopeng NIST06  The Portage System  For Chinese large-track entry, used simple, but carefully- tuned, phrase-based.
LREC 2008 Marrakech 29 May Caroline Lavecchia, Kamel Smaïli and David Langlois LORIA / Groupe Parole, Vandoeuvre-Lès-Nancy, France Phrase-Based Machine.
A non-contiguous Tree Sequence Alignment-based Model for Statistical Machine Translation Jun Sun ┼, Min Zhang ╪, Chew Lim Tan ┼ ┼╪
Improving Named Entity Translation Combining Phonetic and Semantic Similarities Fei Huang, Stephan Vogel, Alex Waibel Language Technologies Institute School.
Imposing Constraints from the Source Tree on ITG Constraints for SMT Hirofumi Yamamoto, Hideo Okuma, Eiichiro Sumita National Institute of Information.
MACHINE TRANSLATION PAPER 1 Daniel Montalvo, Chrysanthia Cheung-Lau, Jonny Wang CS159 Spring 2011.
1 Minimum Error Rate Training in Statistical Machine Translation Franz Josef Och Information Sciences Institute University of Southern California ACL 2003.
A New Approach for English- Chinese Named Entity Alignment Donghui Feng Yayuan Lv Ming Zhou USC MSR Asia EMNLP-04.
NLP. Machine Translation Tree-to-tree – Yamada and Knight Phrase-based – Och and Ney Syntax-based – Och et al. Alignment templates – Och and Ney.
微软亚洲研究院汉英翻译系统 CWMT2008 评测技术报告 张冬冬 李志灏 李沐 周明 微软亚洲研究院.
Statistical Machine Translation Part II: Word Alignments and EM Alex Fraser Institute for Natural Language Processing University of Stuttgart
Approaching a New Language in Machine Translation Anna Sågvall Hein, Per Weijnitz.
Large Vocabulary Data Driven MT: New Developments in the CMU SMT System Stephan Vogel, Alex Waibel Work done in collaboration with: Ying Zhang, Alicia.
A Syntax-Driven Bracketing Model for Phrase-Based Translation Deyi Xiong, et al. ACL 2009.
Review: Review: Translating without in-domain corpus: Machine translation post-editing with online learning techniques Antonio L. Lagarda, Daniel Ortiz-Martínez,
LING 575 Lecture 5 Kristina Toutanova MSR & UW April 27, 2010 With materials borrowed from Philip Koehn, Chris Quirk, David Chiang, Dekai Wu, Aria Haghighi.
Build MT systems with Moses MT Marathon Americas 2016 Hieu Hoang.
Neural Machine Translation
Alexander Fraser CIS, LMU München Machine Translation
Suggestions for Class Projects
Statistical Machine Translation Part III – Phrase-based SMT / Decoding
Build MT systems with Moses
Memory-augmented Chinese-Uyghur Neural Machine Translation
Statistical Machine Translation Papers from COLING 2004
The XMU SMT System for IWSLT 2007
Statistical NLP Spring 2011
Statistical Machine Translation Part VI – Phrase-based Decoding
Presentation transcript:

Technical Report of NEUNLPLab System for CWMT08 Xiao Tong, Chen Rushan, Li Tianning, Ren Feiliang, Zhang Zhuyu, Zhu Jingbo, Wang Huizhen

Outline Overview System description – Basic MT system – Systems for CWMT08 Data Experiment Summary

Outline Overview System description – Basic MT system – Systems for CWMT08 Data Experiment Summary

Our group Natural Language Processing Laboratory, College of information science and engineering, Northeastern University Long history for working on a variety of problems related to machine translation including – Multi-language MT – Rule-based MT – Example-based MT Our research work on SMT started in late 2007 Welcome to our homepage

People (SMT) at NEUNLPLab Faculty – Zhu Jingbo / 朱靖波 (Professor) – Ren Feiliang / 任飞亮 (Lecturer) – Wang Huizhen / 王会珍 (Lecturer) PhD Students – Xiao Tong / 肖桐 Master Students – Li Tianning / 李天宁 – Zhang Zhuyu / 张祝玉 – Chen Rushan / 陈如山

Task of CWMT08 Four sub-tasks – Chinese -> English News – English -> Chinese News – English -> Chinese Science and Technology – System combination of Chinese -> English News We participated in – Sub-task1 (2 systems) – Sub-task2 (2 systems)

Outline Overview System description – Basic MT system – Systems for CWMT08 Data Experiment Summary

System description Our basic system is a state-of-the-art Phrase- based statistical machine translation system. Characters of our system – Phrase-based SMT (Philipp et al, 2003) – Log-linear model (Och and Ney, 2002) – 7 features The major part (decoder) of our system is Moses (

Features – part1 Feature – phrase translation probability – inverse phrase translation probability – lexical translation probability – inverse lexical translation probability – language model – sentence length penalty – MSD reordering model 中国 政府 向 外商 投资 企业 提供 支持 。 the chinese government may provide support to foreign invested enterprises. Phrase pair : 中国政府  the chinese government 中国政府  chinese government 外商  foreign ……. ……

Features – part2 Feature – phrase translation probability – inverse phrase translation probability – lexical translation probability – inverse lexical translation probability – language model – sentence length penalty – MSD reordering model Pr(the chinese government may provide support …) =Pr( the ) × Pr( chinese | the ) × Pr( government | chinese the ) × … Pr(w i | w i-n+1 w i-2 … w i-1 ) …  5-gram (For both Chinese->English and English->Chinese sub-tasks)  smoothing

Features – part3 Feature – phrase translation probability – inverse phrase translation probability – lexical translation probability – inverse lexical translation probability – language model – sentence length penalty – MSD reordering model Swap Monotone source phrase pair 1 phrase pair 2 target source phrase pair 1 phrase pair 2 target source phrase pair 1 phrase pair 2 target discontinuous Discontinuous  Reordering model Three ways (Monotone, Swap and Discontinuous) 6 sub-features

System Pre-processing – Chinese segmentation (our lab) – English tokenization (tokenizeE.perl.tmpl) – NE recognition of time, date and person (a rule-based system of our lab) – Remove case Word alignment – GIZA++ ( – Alignment symmetrization (Philipp et al, 2003) Phrase extraction and scoring (Philipp et al, 2003) Language model – SRILM ( Decoding – Moses decoder ( – NE translation (a rule-based system of our lab) Post-processing – Recase for English (a rule-based system of our lab)

Systems for CWMT08 System 1 for Chinese->English News – Basic system – Limited data condition System 2 for Chinese->English News – Basic system – Large Language model (trained on NIST data) System 1 for English->Chinese News – Basic system – Limited data condition System 2 for English->Chinese News – Modified pro-processing strategy – Limited data condition

Outline Overview System description – Basic MT system – Systems for CWMT08 Data Experiment Summary

Data Data provided within the CWMT08 evaluation tasks ChineseEnglish Sentence Word Vocabulary Data used to train LM (C->E system2) –The English side of the LDC parallel corpus + gigaword-xinhua LDC2003E14, LDC2004T08, LDC2005T06, LDC2004T07, LDC2003E07, LDC2004E12, LDC2007T07

Outline Overview System description – Basic MT system – Systems for CWMT08 Data Experiment Summary

Experimental results CWMT08 Chinese->English News systemtimeBLEU4NIST5GTMmWERmPERICT System12:49: System23:30: CWMT08 English->Chinese News systemtimeBLEU5BLEU6NIST6NIST7GTMmWERmPERICT System12:54: System22:55: Environment Xeon(TM) 3.00GHz + 32GB memory + Linux

Outline Overview System description – Basic MT system – Systems for CWMT08 Data Experiment Summary

We built – Two systems for Chinese->English News – Two systems for English->Chinese News Problems of our system – Word alignment – Reordering In future – Syntax-based SMT – System combination

Thank you