Published by Lisa Hill. Modified over 9 years ago.
Saab Mansour and Hermann Ney
Human Language Technology and Pattern Recognition, Computer Science Department, RWTH Aachen University, Aachen, Germany
NAACL-HLT 2013
Introduction
Domain adaptation uses data from a particular domain to improve the performance of the translation model (TM) on the test domain.
TM adaptation: build a general-domain phrase table, then use in-domain data to modify its phrase probabilities.
Introduction
The experiments use the Arabic-to-English and German-to-English TED (Technology, Entertainment, Design) tasks of IWSLT (International Workshop on Spoken Language Translation).
Phrase Training
Forced alignment (FA) is used to perform phrase segmentation, alignment training, and probability estimation.
Phrase training is carried out with the SMT system itself: for a training set y, a heuristic-based phrase table $P^{0}_{y}$ is generated; FA training then segments and aligns each sentence, and the phrase probabilities are re-estimated from the FA output, yielding a new phrase table $p'$.
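The re-estimation step can be sketched as relative-frequency counting over the phrase pairs that FA used to cover each sentence pair, i.e. $p'(\tilde f \mid \tilde e) = N(\tilde f, \tilde e) / N(\tilde e)$. This is a minimal sketch under stated assumptions: the input format, the function name `reestimate_phrase_table`, and the use of a single segmentation per sentence are hypothetical (FA training in practice may count over several alignments and also estimates the inverse direction).

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple

PhrasePair = Tuple[str, str]  # (source phrase, target phrase)


def reestimate_phrase_table(
    segmentations: Iterable[List[PhrasePair]],
) -> Dict[PhrasePair, float]:
    """Hypothetical helper: p'(f | e) = N(f, e) / N(e) from FA segmentations.

    Each element of `segmentations` is the list of phrase pairs that forced
    alignment used to cover one training sentence pair (assumed format).
    """
    pair_counts: Counter = Counter()
    target_counts: Counter = Counter()
    for sentence in segmentations:
        for src, tgt in sentence:
            pair_counts[(src, tgt)] += 1
            target_counts[tgt] += 1
    # Normalize each pair count by the count of its target phrase.
    return {
        (src, tgt): count / target_counts[tgt]
        for (src, tgt), count in pair_counts.items()
    }


# Toy FA output for three German-English sentence pairs (invented data).
segs = [
    [("das Haus", "the house"), ("ist klein", "is small")],
    [("das Haus", "the house"), ("ist alt", "is old")],
    [("das Gebäude", "the house"), ("ist klein", "is small")],
]
table = reestimate_phrase_table(segs)
# table[("das Haus", "the house")] is 2/3, table[("das Gebäude", "the house")] is 1/3
```

Only phrase pairs that FA actually used receive probability mass, which is how the new table $p'$ can differ from the heuristic $P^{0}_{y}$.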
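Adaptation
For a training set y', an initial phrase table $P^{0}_{y'}$ is generated, and FA training is performed on $y_{\mathrm{IN}}$ (the in-domain training data) to bias the probabilities toward the in-domain data. This procedure is denoted X-FA-IN, where X names the data used as y' (e.g. OD or ALL). Leaving-one-out is used to avoid over-fitting.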
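Leaving-one-out can be illustrated as follows: while a sentence is being force-aligned, the phrase counts that this same sentence contributed to the initial table are subtracted before probabilities are computed, so the sentence cannot simply reuse phrases that were extracted only from itself. A sentence only has such counts if it is part of y' (for example, IN sentences inside ALL). The sketch is hypothetical: the function names and the choice to give zero-count pairs probability 0 are simplifying assumptions, not the exact scheme of the paper.

```python
from collections import Counter
from typing import Dict, Tuple

PhrasePair = Tuple[str, str]  # (source phrase, target phrase)


def target_marginals(pair_counts: Counter) -> Counter:
    """N(e) = sum over f of N(f, e)."""
    totals: Counter = Counter()
    for (_, tgt), count in pair_counts.items():
        totals[tgt] += count
    return totals


def leave_one_out_probs(
    table_counts: Counter, sentence_counts: Counter
) -> Dict[PhrasePair, float]:
    """Hypothetical helper: p(f | e) with the current sentence's own counts removed.

    table_counts:    phrase-pair counts extracted from all of y'
    sentence_counts: counts extracted from the sentence being force-aligned
                     (empty if that sentence is not part of y')
    """
    table_tgt = target_marginals(table_counts)
    sent_tgt = target_marginals(sentence_counts)
    probs: Dict[PhrasePair, float] = {}
    for pair, count in table_counts.items():
        numerator = count - sentence_counts[pair]
        denominator = table_tgt[pair[1]] - sent_tgt[pair[1]]
        # Simplification: a pair seen only in this sentence gets probability 0.
        probs[pair] = numerator / denominator if numerator > 0 else 0.0
    return probs


# Toy counts: "das Gebäude" and "ist klein" were extracted only from this sentence.
table = Counter({("das Haus", "the house"): 3, ("das Gebäude", "the house"): 1,
                 ("ist klein", "is small"): 1})
sentence = Counter({("das Gebäude", "the house"): 1, ("ist klein", "is small"): 1})
loo = leave_one_out_probs(table, sentence)
# loo[("das Haus", "the house")] is 1.0; both singletons drop to 0.0
```

Without leaving-one-out (an empty `sentence_counts`), "ist klein" would keep probability 1.0 for this sentence, which is the over-fitting the slide refers to.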
Experimental Setup – Training Corpora
Arabic-to-English:
  In-domain: 90K TED sentences
  Other-domain: 7.9M sentences of United Nations data
German-to-English:
  In-domain: 130K TED sentences
  Other-domain: 2.1M sentences from the news-commentary and Europarl corpora
Experimental Setup – Translation System
Baseline system: built with the SMT toolkit Jane 2.0
Measures: BLEU, TER
Arabic-English results are case-sensitive; German-English results are case-insensitive.
Results
Heuristics (IN, OD, ALL): standard phrase extraction using word-alignment training and heuristic phrase extraction over the word alignment.
FA standard (IN-FA, OD-FA, ALL-FA): standard FA phrase training, where the same training set is used both for initial phrase table generation and for the FA procedure.
FA adaptation (OD-FA0-IN, ALL-FA-IN): FA-based adaptation phrase training, where the initial table is generated from general data and FA training is performed on the IN data to achieve adaptation.
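Results – Measures
BLEU (Bilingual Evaluation Understudy), example:
Candidate: the the the the the the the.
Reference 1: The cat is on the mat.
Reference 2: There is a cat on the mat.
Standard unigram precision: 7/7
Modified unigram precision: 2/7 (each candidate word's count is clipped to its maximum count in any single reference; "the" occurs at most twice, in Reference 1)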
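The clipping behind the modified precision can be checked with a short script. It is a sketch that assumes lowercasing and a crude tokenizer (whitespace splitting after removing periods); the helper names are illustrative, and full BLEU additionally combines 1- to 4-gram precisions with a brevity penalty, which is omitted here.

```python
from collections import Counter
from fractions import Fraction
from typing import List


def tokenize(sentence: str) -> List[str]:
    # Crude tokenizer for this example: lowercase, drop periods, split on spaces.
    return sentence.lower().replace(".", " ").split()


def ngram_counts(tokens: List[str], n: int) -> Counter:
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def max_reference_counts(references: List[str], n: int) -> Counter:
    # For each n-gram, the largest count it reaches in any single reference.
    best: Counter = Counter()
    for ref in references:
        for gram, count in ngram_counts(tokenize(ref), n).items():
            best[gram] = max(best[gram], count)
    return best


def standard_precision(candidate: str, references: List[str], n: int = 1) -> Fraction:
    cand = ngram_counts(tokenize(candidate), n)
    ref_max = max_reference_counts(references, n)
    # Every candidate n-gram that appears in some reference counts in full.
    matched = sum(count for gram, count in cand.items() if ref_max[gram] > 0)
    return Fraction(matched, max(sum(cand.values()), 1))


def modified_precision(candidate: str, references: List[str], n: int = 1) -> Fraction:
    cand = ngram_counts(tokenize(candidate), n)
    ref_max = max_reference_counts(references, n)
    # Clip each candidate count to the maximum reference count.
    clipped = sum(min(count, ref_max[gram]) for gram, count in cand.items())
    return Fraction(clipped, max(sum(cand.values()), 1))


candidate = "the the the the the the the."
references = ["The cat is on the mat.", "There is a cat on the mat."]
print(standard_precision(candidate, references))  # 1  (i.e. 7/7)
print(modified_precision(candidate, references))  # 2/7
```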
Results – Measures
TER (Translation Edit Rate), example:
REF: SAUDI ARABIA denied THIS WEEK information published in the AMERICAN new york times
HYP: THIS WEEK THE SAUDIS denied information published in the new york times
TER = 4/13: 4 edits (1 shift, 2 substitutions, and 1 insertion) over 13 reference words.
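The TER arithmetic can be reproduced with a word-level edit distance once the shift is applied. Real TER searches for shifts greedily; here, as an assumption for illustration, the single shift from the example (moving "THIS WEEK" to after "denied") is applied by hand, and the helper names `word_edit_distance` and `shift` are hypothetical.

```python
from typing import List


def word_edit_distance(hyp: List[str], ref: List[str]) -> int:
    """Levenshtein distance over words (insertions, deletions, substitutions)."""
    prev = list(range(len(ref) + 1))
    for i, h in enumerate(hyp, start=1):
        cur = [i] + [0] * len(ref)
        for j, r in enumerate(ref, start=1):
            cur[j] = min(
                prev[j] + 1,             # delete h from the hypothesis
                cur[j - 1] + 1,          # insert r into the hypothesis
                prev[j - 1] + (h != r),  # match (0) or substitution (1)
            )
        prev = cur
    return prev[-1]


def shift(tokens: List[str], start: int, length: int, dest: int) -> List[str]:
    """Move tokens[start:start+length] so it begins at index dest of the remaining list."""
    block = tokens[start:start + length]
    rest = tokens[:start] + tokens[start + length:]
    return rest[:dest] + block + rest[dest:]


REF = "SAUDI ARABIA denied THIS WEEK information published in the AMERICAN new york times".split()
HYP = "THIS WEEK THE SAUDIS denied information published in the new york times".split()

# The one shift from the example, chosen by hand: "THIS WEEK" moves after "denied".
shifted = shift(HYP, 0, 2, 3)
num_shifts = 1
ter = (num_shifts + word_edit_distance(shifted, REF)) / len(REF)

print(word_edit_distance(HYP, REF))      # 6  (no shift allowed)
print(word_edit_distance(shifted, REF))  # 3  (2 substitutions + 1 insertion)
print(round(ter, 4))                     # 0.3077, i.e. 4/13
```

Without shifts the same pair would cost 6 edits (6/13), so counting the block move as a single edit is what brings TER down to 4/13.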
Mixture Modeling
Linear interpolation of IN with OD, and of IN with OD-FA0-IN, using uniform weights (0.5 each).
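The uniform mixture can be sketched as a per-entry linear interpolation $p(\tilde f \mid \tilde e) = 0.5\,p_{\mathrm{IN}}(\tilde f \mid \tilde e) + 0.5\,p_{\mathrm{OD}}(\tilde f \mid \tilde e)$. Treating a phrase pair that is missing from one table as having probability 0 there is an assumption of this sketch; the toy tables and the function name `interpolate` are hypothetical.

```python
from typing import Dict, Tuple

PhrasePair = Tuple[str, str]           # (source phrase, target phrase)
PhraseTable = Dict[PhrasePair, float]  # p(f | e) for each phrase pair


def interpolate(table_a: PhraseTable, table_b: PhraseTable,
                weight: float = 0.5) -> PhraseTable:
    """weight * p_a + (1 - weight) * p_b; a pair missing from a table counts as 0 there."""
    pairs = set(table_a) | set(table_b)
    return {
        pair: weight * table_a.get(pair, 0.0) + (1.0 - weight) * table_b.get(pair, 0.0)
        for pair in pairs
    }


# Hypothetical toy tables, both conditioned on the target phrase "the house".
p_in = {("das Haus", "the house"): 0.8, ("das Gebäude", "the house"): 0.2}
p_od = {("das Haus", "the house"): 0.5, ("das Gebäude", "the house"): 0.3,
        ("das Heim", "the house"): 0.2}
mixture = interpolate(p_in, p_od)  # uniform weights, as on the slide
```

Because both inputs are normalized over the same target phrase, a convex combination stays normalized; comparing IN+OD against IN+OD-FA0-IN only changes the second argument.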
Conclusion
Proposed a phrase training procedure for adaptation based on the FA method.
Performance improved on both the Arabic-to-English and German-to-English TED lecture translation tasks: BLEU improved by 0.6% on the development set, and TER decreased by 0.8% and 0.6% on the test and eval sets, respectively.
Finally, a comparison using mixture models showed that the adapted OD table performs better than the unadapted OD table.