NRC Report Conclusion Tu Zhaopeng 2009-09-08. NIST06  The Portage System  For Chinese large-track entry, used simple, but carefully- tuned, phrase-based.

Slides:



Advertisements
Similar presentations
Statistical Machine Translation Part II: Word Alignments and EM Alexander Fraser Institute for Natural Language Processing University of Stuttgart
Advertisements

Statistical Machine Translation Part II: Word Alignments and EM Alexander Fraser ICL, U. Heidelberg CIS, LMU München Statistical Machine Translation.
Statistical Machine Translation Part II – Word Alignments and EM Alex Fraser Institute for Natural Language Processing University of Stuttgart
Word Sense Disambiguation for Machine Translation Han-Bin Chen
Vamshi Ambati | Stephan Vogel | Jaime Carbonell Language Technologies Institute Carnegie Mellon University A ctive Learning and C rowd-Sourcing for Machine.
Chinese Word Segmentation Method for Domain-Special Machine Translation Su Chen; Zhang Yujie; Guo Zhen; Xu Jin’an Beijing Jiaotong University.
TURKALATOR A Suite of Tools for English to Turkish MT Siddharth Jonathan Gorkem Ozbek CS224n Final Project June 14, 2006.
Unsupervised Turkish Morphological Segmentation for Statistical Machine Translation Coskun Mermer and Murat Saraclar Workshop on Machine Translation and.
Discriminative Learning of Extraction Sets for Machine Translation John DeNero and Dan Klein UC Berkeley TexPoint fonts used in EMF. Read the TexPoint.
1 A Tree Sequence Alignment- based Tree-to-Tree Translation Model Authors: Min Zhang, Hongfei Jiang, Aiti Aw, et al. Reporter: 江欣倩 Professor: 陳嘉平.
Novel Reordering Approaches in Phrase-Based Statistical Machine Translation S. Kanthak, D. Vilar, E. Matusov, R. Zens & H. Ney ACL Workshop on Building.
Statistical Phrase-Based Translation Authors: Koehn, Och, Marcu Presented by Albert Bertram Titles, charts, graphs, figures and tables were extracted from.
“Applying Morphology Generation Models to Machine Translation” By Kristina Toutanova, Hisami Suzuki, Achim Ruopp (Microsoft Research). UW Machine Translation.
1 Improving a Statistical MT System with Automatically Learned Rewrite Patterns Fei Xia and Michael McCord (Coling 2004) UW Machine Translation Reading.
TIDES MT Workshop Review. Using Syntax?  ISI-small: –Cross-lingual parsing/decoding Input: Chinese sentence + English lattice built with all possible.
Minimum Error Rate Training in Statistical Machine Translation By: Franz Och, 2003 Presented By: Anna Tinnemore, 2006.
ACL 2005 WORKSHOP ON BUILDING AND USING PARALLEL TEXTS (WPT-05), Ann Arbor, MI. June Competitive Grouping in Integrated Segmentation and Alignment.
1 Language Model Adaptation in Machine Translation from Speech Ivan Bulyko, Spyros Matsoukas, Richard Schwartz, Long Nguyen, and John Makhoul.
Course Summary LING 575 Fei Xia 03/06/07. Outline Introduction to MT: 1 Major approaches –SMT: 3 –Transfer-based MT: 2 –Hybrid systems: 2 Other topics.
Symmetric Probabilistic Alignment Jae Dong Kim Committee: Jaime G. Carbonell Ralf D. Brown Peter J. Jansen.
Application of RNNs to Language Processing Andrey Malinin, Shixiang Gu CUED Division F Speech Group.
9/12/2003LTI Student Research Symposium1 An Integrated Phrase Segmentation/Alignment Algorithm for Statistical Machine Translation Joy Advisor: Stephan.
Statistical Machine Translation Part IX – Better Word Alignment, Morphology and Syntax Alexander Fraser ICL, U. Heidelberg CIS, LMU München
1 1 Automatic Transliteration of Proper Nouns from Arabic to English Mehdi M. Kashani, Fred Popowich, Anoop Sarkar Simon Fraser University Vancouver, BC.
Spoken Language Translation 1 Intelligent Robot Lecture Note.
English-Persian SMT Reza Saeedi 1 WTLAB Wednesday, May 25, 2011.
Technical Report of NEUNLPLab System for CWMT08 Xiao Tong, Chen Rushan, Li Tianning, Ren Feiliang, Zhang Zhuyu, Zhu Jingbo, Wang Huizhen
Statistical Machine Translation Part V - Advanced Topics Alex Fraser Institute for Natural Language Processing University of Stuttgart Seminar:
An Integrated Approach for Arabic-English Named Entity Translation Hany Hassan IBM Cairo Technology Development Center Jeffrey Sorensen IBM T.J. Watson.
Profile The METIS Approach Future Work Evaluation METIS II Architecture METIS II, the continuation of the successful assessment project METIS I, is an.
Advanced Signal Processing 05/06 Reinisch Bernhard Statistical Machine Translation Phrase Based Model.
INSTITUTE OF COMPUTING TECHNOLOGY Bagging-based System Combination for Domain Adaptation Linfeng Song, Haitao Mi, Yajuan Lü and Qun Liu Institute of Computing.
Statistical Machine Translation Part V – Better Word Alignment, Morphology and Syntax Alexander Fraser CIS, LMU München Seminar: Open Source.
2010 Failures in Czech-English Phrase-Based MT 2010 Failures in Czech-English Phrase-Based MT Full text, acknowledgement and the list of references in.
Intelligent Database Systems Lab N.Y.U.S.T. I. M. Chinese Word Segmentation and Statistical Machine Translation Presenter : Wu, Jia-Hao Authors : RUIQIANG.
1 The LIG Arabic / English Speech Translation System at IWSLT07 Laurent BESACIER, Amar MAHDHAOUI, Viet-Bac LE LIG*/GETALP (Grenoble, France)
Recent Major MT Developments at CMU Briefing for Joe Olive February 5, 2008 Alon Lavie and Stephan Vogel Language Technologies Institute Carnegie Mellon.
The ICT Statistical Machine Translation Systems for IWSLT 2007 Zhongjun He, Haitao Mi, Yang Liu, Devi Xiong, Weihua Luo, Yun Huang, Zhixiang Ren, Yajuan.
1 Gholamreza Haffari Simon Fraser University PhD Seminar, August 2009 Machine Learning approaches for dealing with Limited Bilingual Data in SMT.
Coşkun Mermer, Hamza Kaya, Mehmet Uğur Doğan National Research Institute of Electronics and Cryptology (UEKAE) The Scientific and Technological Research.
NUDT Machine Translation System for IWSLT2007 Presenter: Boxing Chen Authors: Wen-Han Chao & Zhou-Jun Li National University of Defense Technology, China.
Reordering Model Using Syntactic Information of a Source Tree for Statistical Machine Translation Kei Hashimoto, Hirohumi Yamamoto, Hideo Okuma, Eiichiro.
Why Not Grab a Free Lunch? Mining Large Corpora for Parallel Sentences to Improve Translation Modeling Ferhan Ture and Jimmy Lin University of Maryland,
Korea Maritime and Ocean University NLP Jung Tae LEE
Cluster-specific Named Entity Transliteration Fei Huang HLT/EMNLP 2005.
Chinese Word Segmentation Adaptation for Statistical Machine Translation Hailong Cao, Masao Utiyama and Eiichiro Sumita Language Translation Group NICT&ATR.
Alignment of Bilingual Named Entities in Parallel Corpora Using Statistical Model Chun-Jen Lee Jason S. Chang Thomas C. Chuang AMTA 2004.
LREC 2008 Marrakech 29 May Caroline Lavecchia, Kamel Smaïli and David Langlois LORIA / Groupe Parole, Vandoeuvre-Lès-Nancy, France Phrase-Based Machine.
Improving Named Entity Translation Combining Phonetic and Semantic Similarities Fei Huang, Stephan Vogel, Alex Waibel Language Technologies Institute School.
1 Minimum Error Rate Training in Statistical Machine Translation Franz Josef Och Information Sciences Institute University of Southern California ACL 2003.
A New Approach for English- Chinese Named Entity Alignment Donghui Feng Yayuan Lv Ming Zhou USC MSR Asia EMNLP-04.
Statistical Machine Translation Part II: Word Alignments and EM Alex Fraser Institute for Natural Language Processing University of Stuttgart
Large Vocabulary Data Driven MT: New Developments in the CMU SMT System Stephan Vogel, Alex Waibel Work done in collaboration with: Ying Zhang, Alicia.
The P YTHY Summarization System: Microsoft Research at DUC 2007 Kristina Toutanova, Chris Brockett, Michael Gamon, Jagadeesh Jagarlamudi, Hisami Suzuki,
Tight Coupling between ASR and MT in Speech-to-Speech Translation Arthur Chan Prepared for Advanced Machine Translation Seminar.
A Maximum Entropy Language Model Integrating N-grams and Topic Dependencies for Conversational Speech Recognition Sanjeev Khudanpur and Jun Wu Johns Hopkins.
A Syntax-Driven Bracketing Model for Phrase-Based Translation Deyi Xiong, et al. ACL 2009.
Author :K. Thambiratnam and S. Sridharan DYNAMIC MATCH PHONE-LATTICE SEARCHES FOR VERY FAST AND ACCURATE UNRESTRICTED VOCABULARY KEYWORD SPOTTING Reporter.
Review: Review: Translating without in-domain corpus: Machine translation post-editing with online learning techniques Antonio L. Lagarda, Daniel Ortiz-Martínez,
LING 575 Lecture 5 Kristina Toutanova MSR & UW April 27, 2010 With materials borrowed from Philip Koehn, Chris Quirk, David Chiang, Dekai Wu, Aria Haghighi.
Build MT systems with Moses MT Marathon Americas 2016 Hieu Hoang.
N-best list reranking using higher level phonetic, lexical, syntactic and semantic knowledge sources Mithun Balakrishna, Dan Moldovan and Ellis K. Cave.
Neural Machine Translation by Jointly Learning to Align and Translate
Issues in Arabic MT Alex Fraser USC/ISI 9/22/2018 Issues in Arabic MT.
Deep Learning based Machine Translation
Build MT systems with Moses
Statistical Machine Translation Papers from COLING 2004
The XMU SMT System for IWSLT 2007
Statistical NLP Spring 2011
Neural Machine Translation by Jointly Learning to Align and Translate
Presentation transcript:

NRC Report Conclusion Tu Zhaopeng

NIST06  The Portage System  For Chinese large-track entry, used simple, but carefully- tuned, phrase-based system:  Pre-process source text  Viterbi decoding using loglinear model  Nbest rescoring using fancier loglinear model  Post-process raw translation

NIST06  Pre-processing:  Convert to GB2312, removing traditional characters with no GB2312 representation  Segment using LDC segmenter  Translate numbers and dates using rules  Strip non-ASCII OOV’s

NIST06  Post-processing  Truecase using 4-gram HMM (via SRILM disambig) trained on parallel corpus  Detokenization heuristics

NIST06  Rescoring  Rescoring based on 5k-best lists, using Powell’s algorithm to find max-BLEU weights  Features (22)  All 12 decoder features  Character length  IBM2 scores in both directions  IBM1-based “missing word” feature (compare score of best translation for each word to best known)  Posterior probabilities calculated from nbest list for: sentence length, phrases, words, unigrams, and bigrams.

NIST06  Search Parameters

NIST08  Towards Tighter Integration of Rule-based and Statistical MT in Serial System Combination  Rule-based  Systran  Phrase-based  Portage

NIST08  Annotation of Systran output, five different chunk types:  named entities, numbers, dates  unknown words or unlikely sequences of short words  ‘strong’ rules : very reliable chunks, e.g., rules based on a long distance syntactic relationship, or a long multiword expression

NIST09  Serial system combination

NIST09  NRC system trained on SY/EN parallel corpus:  use SYSTRAN to translate ZH half of parallel ZH/EN training corpus, discarding UN, HKH/L corpora for eciency ! 3M sentence pairs  preprocess SY: strip markup, tokenize, lowercase  standard phrase-based training

NIST09  Two strategies that didn't work:  Exploit SY/EN surface similarity: boost HMM ttable scores of similar forms, prior to phrase extraction ! no improvement  Use SY case information: adopt SY case for aligned EN words|no improvement compared to baseline independent truecaser

NIST09  Common features:  phrase table based on symmetrized HMM word alignments (4 features: lex+rf, fwd+bkw)  5g mixture LM from parallel corpus (Foster & Kuhn, WMT07)  6g LM from GW  word count and distortion

NIST09

 Useful  rescoring with IBM- and nbest-based features (Ueng and Ney, CL07; Chen et al, IWSLT05): +0.3 BLEU  greedy feature pruning for rescoring +0.3 BLEU  truecasing with \title trick": +0.3 BLEU