The LIG Arabic / English Speech Translation System at IWSLT07
Laurent BESACIER, Amar MAHDHAOUI, Viet-Bac LE
LIG*/GETALP (Grenoble, France)

Presentation transcript:

1 The LIG Arabic / English Speech Translation System at IWSLT07
Laurent BESACIER, Amar MAHDHAOUI, Viet-Bac LE
LIG*/GETALP (Grenoble, France)
* Former name: CLIPS

2 OUTLINE
1 Baseline MT system
- Task, data & tools
- Restoring punctuation and case
- Use of out-of-domain data
- Adding a bilingual dictionary
2 Lattice decomposition for CN decoding
- Lattice to CNs
- Word lattices to sub-word lattices
- What SRILM does
- Our algorithm
- Examples in Arabic
3 Speech translation experiments
- Results on IWSLT06
- Results on IWSLT07 (eval)

3 OUTLINE
1 Baseline MT system
- Task, data & tools
- Restoring punctuation and case
- Use of out-of-domain data
- Adding a bilingual dictionary

4 Task, data & tools
- First participation in the IWSLT Arabic/English (A/E) task
- Conventional phrase-based system using Moses + GIZA++ + SRILM
- Use of IWSLT-provided data (20k bitext), except:
  - an 84k A/E bilingual dictionary (external resource)
  - the Buckwalter morphological analyzer
  - LDC's Gigaword corpus (for English LM training)

5 Restoring punctuation and case
- Two separate punctuation and case restoration tools, built using the hidden-ngram and disambig commands from SRILM => used to restore the MT outputs
- Option (2) kept
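The slide only names the two SRILM commands; below is a minimal sketch of how such a two-step restoration pipeline could be wired together. The resource file names (punct.lm, punct.vocab, case.map, case.lm) are hypothetical placeholders, not the actual LIG models.

```python
import subprocess
import tempfile

def restore_punct_and_case(mt_output, punct_lm, punct_vocab, case_map, case_lm):
    """Restore punctuation, then case, for a file of MT hypotheses.

    Sketch only: SRILM's hidden-ngram treats punctuation marks as hidden
    events predicted by a language model, and disambig maps each
    lowercased token to its most likely cased form via a word map and a
    cased LM.  All file names are assumptions for illustration.
    """
    # Step 1: insert punctuation marks as hidden events.
    with_punct = subprocess.run(
        ["hidden-ngram", "-lm", punct_lm, "-hidden-vocab", punct_vocab,
         "-text", mt_output],
        capture_output=True, text=True, check=True).stdout

    # Step 2: restore case by Viterbi disambiguation over the word map.
    with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
        tmp.write(with_punct)
        tmp_name = tmp.name
    truecased = subprocess.run(
        ["disambig", "-map", case_map, "-lm", case_lm,
         "-text", tmp_name, "-keep-unk"],
        capture_output=True, text=True, check=True).stdout
    return truecased
```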

6 Use of out-of-domain data
- Baseline in-domain LM trained on the English part of the A/E bitext
- Interpolated LM between the baseline and the out-of-domain (LDC Gigaword) LM, with weights 0.7 / 0.3
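Linear interpolation simply mixes the two models' probabilities with fixed weights; a minimal sketch using the 0.7/0.3 weights from the slide (function and variable names are illustrative):

```python
IN_DOMAIN_WEIGHT = 0.7   # weight of the in-domain (IWSLT bitext) LM
OUT_DOMAIN_WEIGHT = 0.3  # weight of the out-of-domain (Gigaword) LM

def interpolated_prob(p_in_domain, p_out_of_domain):
    """Linearly interpolate two LM probabilities for the same n-gram.

    p_in_domain and p_out_of_domain are P(word | history) under the
    in-domain and out-of-domain models, respectively.
    """
    return IN_DOMAIN_WEIGHT * p_in_domain + OUT_DOMAIN_WEIGHT * p_out_of_domain
```

Such a mixture can also be precomputed offline with SRILM's ngram tool and its -mix-lm / -lambda options.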

7 Adding a bilingual dictionary
- The 84k A/E bilingual dictionary (external resource)
- Directly concatenated to the training data, followed by retraining and retuning (MERT)
- Submitted MT system (from verbatim transcriptions)
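As an illustration of the "direct concatenation" step, the sketch below appends each dictionary entry as a one-line sentence pair to the two sides of the training bitext. The tab-separated dictionary format and the file names are assumptions, not the actual LIG data layout.

```python
def append_dictionary_to_bitext(dict_path, ar_corpus, en_corpus):
    """Append each dictionary entry as a one-line 'sentence pair'.

    Assumes a hypothetical dictionary file with one 'arabic<TAB>english'
    entry per line; the real file format is not described on the slide.
    """
    with open(dict_path, encoding="utf-8") as d, \
         open(ar_corpus, "a", encoding="utf-8") as ar, \
         open(en_corpus, "a", encoding="utf-8") as en:
        for line in d:
            arabic, english = line.rstrip("\n").split("\t")
            ar.write(arabic + "\n")
            en.write(english + "\n")
```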

8 OUTLINE
2 Lattice decomposition for CN decoding
- Lattice to CNs
- Word lattices to sub-word lattices
- What SRILM does
- Our algorithm
- Examples in Arabic

9 Lattice to CNs
- Moses can take a confusion network (CN) as input, serving as the interface between ASR and MT
- Example of a word lattice and the corresponding word CN (figure on slide)
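To make the lattice-vs-CN distinction concrete: a confusion network is a linear sequence of positions, each holding word alternatives with posterior probabilities (SRILM word meshes write the empty alternative as *DELETE*). A toy sketch with made-up values, not the Moses input format:

```python
# A confusion network as a list of positions ("sausage" slots); each
# position maps word alternatives, including the empty word *DELETE*,
# to posterior probabilities.  All values below are invented.
confusion_network = [
    {"I": 0.9, "eye": 0.1},
    {"want": 0.7, "won't": 0.2, "*DELETE*": 0.1},
    {"a": 0.8, "*DELETE*": 0.2},
    {"ticket": 1.0},
]

def best_path(cn):
    """Greedy consensus: keep the highest-posterior word at each position,
    dropping positions whose best alternative is the empty word."""
    words = [max(pos, key=pos.get) for pos in cn]
    return [w for w in words if w != "*DELETE*"]

print(best_path(confusion_network))  # ['I', 'want', 'a', 'ticket']
```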

10 Word lattices to sub-word lattices
- Problem: the word graphs provided for IWSLT07 do not necessarily use a word decomposition compatible with the one used to train our MT models
  - word units vs. sub-word units
  - different sub-word units used
- Hence the need for a lattice decomposition algorithm

11 What SRILM does
- Example: CANNOT split into CAN and NOT
- -split-multiwords option of lattice-tool
  - the first node keeps all the information
  - new nodes have null scores and zero duration

12 Proposed lattice decomposition algorithm (1)
- Identify the arcs of the graph that will be split (decompoundable words)
- Each arc to be split is decomposed into a number of arcs that depends on the number of sub-word units
- The start / end times of the new arcs are set according to the number of graphemes in each sub-word unit
- So are the acoustic scores
- The LM score of the first sub-word of the decomposed word is equal to the initial LM score of the word, while the LM scores of the following sub-words are set to 0
- Freely available online
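The slide lists the steps of the algorithm; the following is a minimal in-memory sketch of those steps, not the released tool (which reads and writes actual ASR lattice files). The Arc fields, the segment() callback, and the example segmentation "wqAl -> w + qAl" are assumptions for illustration.

```python
from dataclasses import dataclass
from itertools import count

@dataclass
class Arc:
    from_node: int
    to_node: int
    word: str
    start: float      # start time (s)
    end: float        # end time (s)
    ac_score: float   # acoustic log-score
    lm_score: float   # LM log-score

def decompose_lattice(arcs, segment, new_node_ids):
    """Split word arcs into sub-word arcs, following the slide's recipe.

    segment(word) returns the list of sub-word units for a word, or
    [word] if the word is not decompoundable; new_node_ids yields fresh
    node ids for the intermediate nodes created by the split.
    """
    out = []
    for arc in arcs:
        subwords = segment(arc.word)
        if len(subwords) == 1:
            out.append(arc)                       # nothing to split
            continue
        total_len = sum(len(sw) for sw in subwords)
        duration, ac_total = arc.end - arc.start, arc.ac_score
        t, src = arc.start, arc.from_node
        for i, sw in enumerate(subwords):
            frac = len(sw) / total_len            # share of graphemes
            dst = arc.to_node if i == len(subwords) - 1 else next(new_node_ids)
            out.append(Arc(
                from_node=src, to_node=dst, word=sw,
                start=t, end=t + frac * duration,  # times split by graphemes
                ac_score=frac * ac_total,          # acoustic score likewise
                lm_score=arc.lm_score if i == 0 else 0.0,  # LM score on 1st only
            ))
            t += frac * duration
            src = dst
    return out

# Hypothetical usage: one arc carrying the (Buckwalter-transliterated) word
# "wqAl", segmented into the clitic "w" and the stem "qAl".
ids = count(1000)
word_arcs = [Arc(0, 1, "wqAl", 0.00, 0.42, ac_score=-310.5, lm_score=-2.1)]
sub_arcs = decompose_lattice(word_arcs,
                             lambda w: ["w", "qAl"] if w == "wqAl" else [w],
                             ids)
```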

13 Proposed lattice decomposition algorithm (2)

14 Examples in Arabic: word lattice

15 Examples in Arabic: sub-word lattice

16 OUTLINE
3 Speech translation experiments
- Results on IWSLT06
- Results on IWSLT07 (eval)

17 Results on IWSLT06
- Full CN decoding (sub-word CN as input), obtained after applying our word lattice decomposition algorithm
- All the parameters of the log-linear model used for the CN decoder were retuned on the dev06 set
- The "CN posterior probability parameter" also has to be tuned
- (Results table on slide: ASR primary vs. ASR secondary)

18 Results on IWSLT07 (eval)
- (BLEU score table on slide for the A/E ASR condition, including the submitted run LIG_AE_ASR_primary_01)