Rule Refinement for Spoken Language Translation by Retrieving the Missing Translation of Content Words Linfeng Song, Jun Xie, Xing Wang, Yajuan Lü and.

Presentation transcript:

Rule Refinement for Spoken Language Translation by Retrieving the Missing Translation of Content Words
Linfeng Song, Jun Xie, Xing Wang, Yajuan Lü and Qun Liu
Institute of Computing Technology, Chinese Academy of Sciences

Motivation
Spoken language translation suffers from a serious problem: content words go missing in the translation.
Example: no, you need 10 minutes to go to the main street, (the bus) comes every 10 minutes

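Motivation
Further investigation shows that this happens because incorrect MT rules are used.
source: 我 想 买 茶叶 送给 家人 做 礼物 。 (I would like to buy tea as a gift for my family.)
rule: #X1# 茶叶 #X2# -> #X1# #X2#
  #X1#: 我 想 买 -> I would like to buy
  #X2#: 送给 家人 做 礼物 。 -> souvenir for my family.
result: I would like to buy souvenir for my family.
The rule has no target-side translation for the content word 茶叶 (tea), so it is missing from the result.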

Motivation
The classic SMT framework has no specific feature for distinguishing bad rules from good ones.
An obvious way to tackle this problem is to find a way to distinguish the bad MT rules from the good ones.

Two rules
R1: 推荐 的 茶 -> tea recommended (a good rule)
R2: 推荐 的 茶 -> tea (a bad rule that misses the translation of the content word “推荐”)

Two rules
R1: 推荐 的 茶 -> tea recommended
R2: 推荐 的 茶 -> tea
R2 may be favored by a classic MT system, since it generates a shorter translation result.

Our Model

R1: 推荐 的 茶 -> tea recommended
R2: 推荐 的 茶 -> tea

Training
bilingual corpus with word alignment info:
  这里 有 推荐 的 日本 茶 吗
  do you have any japanese tea recommended
  ……

Training
bilingual corpus with word alignment info:
  这里 有 推荐 的 日本 茶 吗
  do you have any japanese tea recommended
  ……
extracted pairs:
  推荐 -> recommended
  茶 -> tea
  日本 -> japanese
  日本 茶 -> japanese tea

Training
bilingual corpus with word alignment info:
  这里 有 推荐 的 日本 茶 吗
  do you have any japanese tea recommended
  ……
stoplist: 么 吗 的 …
Each phrase is labeled as either a content phrase or not a content phrase.
Content words are labeled in bold face on the slide.
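The stoplist filter on this slide can be sketched as below. This is a minimal illustration, not the authors' code: the stoplist holds only the three entries visible on the slide, the function names are hypothetical, and the criterion that a content phrase contains no stoplist word is an assumption, since the slide does not state the exact rule.

```python
# Stoplist entries visible on the slide; the real list is longer ("…").
STOPLIST = {"么", "吗", "的"}


def is_content_word(word, stoplist=STOPLIST):
    """A word counts as a content word when it is not in the stoplist."""
    return word not in stoplist


def is_content_phrase(words, stoplist=STOPLIST):
    """Assumption: a content phrase is non-empty and has no stoplist word."""
    return bool(words) and all(is_content_word(w, stoplist) for w in words)


sentence = "这里 有 推荐 的 日本 茶 吗".split()
content_words = [w for w in sentence if is_content_word(w)]

print(content_words)
print(is_content_phrase(["日本", "茶"]))
print(is_content_phrase(["推荐", "的"]))
```

With this three-entry stoplist, 这里 and 有 also pass the filter; a fuller stoplist would presumably exclude them.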

Training
bilingual corpus with word alignment information:
  这里 有 推荐 的 日本 茶 吗
  do you have any japanese tea recommended
  ……
extracted pairs:
  推荐 -> recommended
  茶 -> tea
  日本 -> japanese
  日本 茶 -> japanese tea
  …
Co-relation table:
  茶 -> tea
  茶 -> Japanese tea 4.89
  …
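One way to collect the raw material for such a table from word-aligned data is sketched below. The slide does not say how the scores (such as 4.89) are computed, so this sketch only counts how often each source content word is linked to each target word. The alignment links, the function name and the use of raw counts are all assumptions made for illustration, and word-to-phrase entries such as 茶 -> Japanese tea are not covered.

```python
from collections import Counter

# Stoplist entries visible on the slide; the real list is longer ("…").
STOPLIST = {"么", "吗", "的"}


def aligned_content_pairs(src, tgt, links, stoplist=STOPLIST):
    """Yield (source word, target word) for each alignment link
    whose source word is not in the stoplist."""
    for i, j in links:
        if src[i] not in stoplist:
            yield src[i], tgt[j]


src = "这里 有 推荐 的 日本 茶 吗".split()
tgt = "do you have any japanese tea recommended".split()
# Hypothetical word alignment as (source index, target index) pairs:
# 推荐-recommended, 日本-japanese, 茶-tea.
links = [(2, 6), (4, 4), (5, 5)]

# Raw co-occurrence counts; a real table would turn these into scores.
table = Counter(aligned_content_pairs(src, tgt, links))
for (s, t), count in table.items():
    print(s, t, count)
```

Running this over a whole corpus would accumulate counts per pair, which a scoring function of one's choice could then convert into table values.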

Two penalties
Source Unaligned Penalty – the number of unaligned source content words in a rule
Target Unaligned Penalty – the number of unaligned target content words in a rule
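The two penalties can be illustrated on the rules R1 and R2 from the earlier slides. This is a sketch under stated assumptions: the slides do not show how "unaligned" is decided, so a content word is treated here as unaligned when it has no word-alignment link inside the rule. The links, the target stoplist and the function name are hypothetical.

```python
SRC_STOPLIST = {"么", "吗", "的"}   # entries visible on the slide
TGT_STOPLIST = {"the", "a", "of"}   # hypothetical target-side stoplist


def unaligned_penalties(src, tgt, links,
                        src_stop=SRC_STOPLIST, tgt_stop=TGT_STOPLIST):
    """Return (source penalty, target penalty): counts of content
    words on each side of a rule that have no alignment link."""
    aligned_src = {i for i, _ in links}
    aligned_tgt = {j for _, j in links}
    src_pen = sum(1 for i, w in enumerate(src)
                  if w not in src_stop and i not in aligned_src)
    tgt_pen = sum(1 for j, w in enumerate(tgt)
                  if w not in tgt_stop and j not in aligned_tgt)
    return src_pen, tgt_pen


src = "推荐 的 茶".split()

# R1: 推荐 的 茶 -> tea recommended (推荐-recommended, 茶-tea)
r1 = unaligned_penalties(src, "tea recommended".split(), [(0, 1), (2, 0)])

# R2: 推荐 的 茶 -> tea (only 茶-tea is linked; 推荐 is left unaligned)
r2 = unaligned_penalties(src, "tea".split(), [(2, 0)])

print("R1:", r1)
print("R2:", r2)
```

Under these assumptions R2 receives a source penalty for the untranslated 推荐 while R1 receives none, which is the distinction the penalties are meant to give the decoder.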

Experiment
Data sets
– training: 280K CH-EN spoken language sentences
– tuning: DEVSET2 of IWSLT 2010
– test: DEVSET3 ~ DEVSET6 of IWSLT 2010
– the training set is also used to train our model

Experiment 15

Thanks
Q & A