A Comparison of Statistical Post-Editing on Chinese and Japanese
Midori Tatsumi and Yanli Sun
Under the supervision of: Sharon O’Brien; Minako O’Hagan; Fred Hollowood; Johann Roturier
Outline
1 Introduction
2 Analysis of the Modifications Made by SPE
3 Evaluation on Sentence Level
4 Conclusion
Introduction
Rule-Based Machine Translation (RBMT)
–Three stages:
  Analysis: analyze the source text into abstract lexical and structural representations
  Transfer: convert the source-language representations into target-language representations
  Generation: generate the target text
Statistical Machine Translation (SMT)
–Two stages:
  Training: automatically learn translation and language knowledge from a parallel corpus
  Decoding: translate new sentences using the knowledge learned in training
Post-Editing (PE)
–Human post-editing
–Automatic post-editing
–Statistical post-editing (SPE)
Introduction
Statistical Post-Editing (SPE) of Rule-Based Machine Translation (RBMT) Output
Knight & Chander (1994); Simard et al. (2007a, 2007b)
[Figure: Flowchart of RBMT. Source → RBMT → Output 1 → Human post-editor → Final output]
[Figure: Flowchart of SPE. Source → RBMT → Output 1 → SPE module → Output 2 → Human post-editor → Final output. The SPE module is an SMT system trained on RBMT output paired with the Reference.]
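The training data for an SPE module can be sketched as follows. This is a minimal illustration, assuming the setup in the flowchart: the SMT system is trained with RBMT output on the source side and the reference translation on the target side. The names `build_spe_corpus`, `write_parallel_files`, and `rbmt_translate`, and the `.mt` / `.ref` file suffixes, are hypothetical and do not come from the slides.

```python
from pathlib import Path
from typing import Callable, Iterable, List, Tuple


def build_spe_corpus(
    sources: Iterable[str],
    references: Iterable[str],
    rbmt_translate: Callable[[str], str],
) -> List[Tuple[str, str]]:
    """Pair each RBMT output with its reference translation.

    An SMT system trained on these pairs learns to map RBMT output to
    reference-like text, which is what lets it act as a post-editor.
    """
    pairs = []
    for src, ref in zip(sources, references):
        mt = rbmt_translate(src).strip()
        ref = ref.strip()
        if mt and ref:  # drop pairs with an empty side
            pairs.append((mt, ref))
    return pairs


def write_parallel_files(pairs: List[Tuple[str, str]], stem: str) -> None:
    """Write one sentence per line to <stem>.mt and <stem>.ref (hypothetical suffixes)."""
    Path(f"{stem}.mt").write_text(
        "".join(mt + "\n" for mt, _ in pairs), encoding="utf-8"
    )
    Path(f"{stem}.ref").write_text(
        "".join(ref + "\n" for _, ref in pairs), encoding="utf-8"
    )
```

In practice `rbmt_translate` would wrap whatever RBMT system is in use; any callable from source string to translated string works for the sketch.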
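Introduction
–Experimental setting
  Language pairs: English into Chinese (ZH) and Japanese (JA)
  RBMT: Systran, with user dictionaries (UD) of 8,832 entries (ZH) and 6,363 entries (JA)
  SMT (SPE module): Moses, trained on translation memory: 529,822 (ZH) and 143,742 (JA)
[Figure: SPE flowchart with nodes Source, RBMT, Output 1, SPE module (SMT, trained on RBMT output and Reference), Output 2, Human post-editor, Final output]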
Introduction
–Evaluating SPE: compare Output 2 (after SPE) with Output 1 (RBMT output)
[Figure: SPE flowchart with nodes Source, RBMT, Output 1, SPE module (SMT, trained on RBMT output and Reference), Output 2, Human post-editor, Final output]
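One way to surface what changed between Output 1 and Output 2 is a character-level diff. The sketch below is an assumption for illustration, not the procedure used in this study (the slides do not say how changes were located). It uses Python's standard `difflib`; the function name `spe_modifications` is hypothetical.

```python
from difflib import SequenceMatcher
from typing import List, Tuple


def spe_modifications(output1: str, output2: str) -> List[Tuple[str, str, str]]:
    """Return (operation, text in Output 1, text in Output 2) for each change.

    The comparison is character by character, since Chinese and Japanese
    are written without spaces between words.
    """
    matcher = SequenceMatcher(None, output1, output2, autojunk=False)
    changes = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":  # tag is 'replace', 'delete', or 'insert'
            changes.append((tag, output1[i1:i2], output2[j1:j2]))
    return changes
```

For example, `spe_modifications("恢复对", "恢复到")` yields a single `replace` of `对` by `到`. Deciding whether such a change counts as an improvement still requires human judgment.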
Analysis of the Modifications Made by SPE
Methodology
Pilot project
–Random selection of 100 sentences for each language
Classify and evaluate the changes
–Classification (Vilar et al.)
  Alteration, deletion, or addition of content/function words
  Form: tense/voice, imperative, formality (politeness)
  Fixed expression
  Reordering
  Punctuation
–Evaluation (Dugast et al.)
  Improvement
  Degradation
  Equivalent
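The classify-then-evaluate scheme above can be tallied with a small counter. This is a sketch under stated assumptions: annotations are taken to be (language, category, judgment) triples, and the names `tally` and `improvement_ratio` are hypothetical. No figures from the study are used.

```python
from collections import Counter
from typing import Iterable, Tuple

JUDGMENTS = ("Improvement", "Degradation", "Equivalent")


def tally(annotations: Iterable[Tuple[str, str, str]]) -> Counter:
    """Count annotations given as (language, category, judgment) triples."""
    counts: Counter = Counter()
    for language, category, judgment in annotations:
        if judgment not in JUDGMENTS:
            raise ValueError(f"unknown judgment: {judgment!r}")
        counts[(language, category, judgment)] += 1
    return counts


def improvement_ratio(counts: Counter, language: str) -> float:
    """Improvements divided by degradations for one language."""
    improved = sum(
        n for (lang, _, j), n in counts.items()
        if lang == language and j == "Improvement"
    )
    degraded = sum(
        n for (lang, _, j), n in counts.items()
        if lang == language and j == "Degradation"
    )
    if degraded == 0:
        return float("inf")  # no degradations recorded
    return improved / degraded
```

A ratio of 3.0 for a language would mean three improvements for every degradation.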
Analysis of the Modifications Made by SPE
Quantitative Evaluation
[Table: Distribution of modifications in Japanese and Chinese. Columns: Improvement, Degradation, Equivalent, each split into ZH and JA. Rows: Alteration (content words, function words); Deletion (content words, function words); Addition (content words, function words); Forms (tense or voice, formality, imperative); Fixed expression; Word/phrase reordering; Punctuation; Total]
Analysis of the Modifications Made by SPE
Qualitative Evaluation: Similarities

Deletion of function words
Source | MT output | SPE output
the actions that you specify for that rule | JA: あなたがその規則のために指定する処理 | そのルールに指定する処理
After you configure your … | ZH: 在您配置您的… | 配置…

Alteration of function words
Source | MT output | SPE output
To maintain … | JA: 保守するため… | 維持するには…
Reverts to … | ZH: 恢复对… | 恢复到...

Punctuation
Source | MT output | SPE output
MPE provides an option … | JA: オプションを提供します。 | オプションがあります.
while the synchronization is in progress… | ZH: ,当同步进展中时… | 同步处理….
Analysis of the Modifications Made by SPE
Qualitative Evaluation: Differences

Alteration of content words
Source | MT output | SPE output
console commands | JA: コンソールは命じます | console コマンド
number | JA: 番号 | 数
subdomains | ZH: subdomains | 子域

Addition of function words
Source | MT output | SPE output
A black dash indicates that it is disabled. | ZH: 黑色破折号表明它禁用。 | 黑色线表明它已禁用。
On the Spim tab… | ZH: 在 Spim 选项卡… | 在 Spim 选项卡上…
Analysis of the Modifications Made by SPE
Qualitative Evaluation: Differences

Reordering
Source | MT output | SPE output
These threats are then… | ZH: 这些威胁然后… | 然后, 这些威胁…

Imperative forms
Source | MT output | SPE output
(Imperative ending) | JA: して下さい | します

Fixed expression
Source | MT output | SPE output
In general,… | ZH: 一般情况下,… | 通常情况下,…
Evaluation on Sentence Level
Methodology
–Same 100 segments
–Effect of SPE on fluency, adequacy, and PE time
–Four evaluators per language
–Random distribution of MT output and SPE output
Kappa scores (inter-evaluator agreement level)
[Table: kappa scores by criterion (Fluency, Adequacy, Less PE time) and language (Chinese, Japanese)]
–Japanese: moderate to substantial agreement
–Chinese: generally fair agreement
Sample evaluation item
  Source_EN: Turns on or off the special meaning of metacharacters.
  Output 1: オン / オフ回転メタ文字の特別な意味。
  Output 2: 有効または無効にメタ文字の特別な意味します.
  Fluency, Adequacy, Less-PE time: 1 / 2 / E
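The slide does not say which kappa statistic was used. With four evaluators per language, Fleiss' kappa is one common choice, so the sketch below assumes it. The function names are hypothetical. The labels "fair", "moderate", and "substantial" correspond to bands of the Landis and Koch scale, which `landis_koch_label` reproduces.

```python
from collections import Counter
from typing import Sequence


def fleiss_kappa(ratings: Sequence[Sequence[str]]) -> float:
    """Fleiss' kappa for items that are each rated by the same number of evaluators."""
    n_items = len(ratings)
    n_raters = len(ratings[0])
    if n_raters < 2 or any(len(item) != n_raters for item in ratings):
        raise ValueError("every item needs the same number (>= 2) of ratings")

    category_totals: Counter = Counter()
    agreement_sum = 0.0
    for item in ratings:
        counts = Counter(item)
        category_totals.update(counts)
        # Share of evaluator pairs that agree on this item.
        agreement_sum += (sum(c * c for c in counts.values()) - n_raters) / (
            n_raters * (n_raters - 1)
        )

    observed = agreement_sum / n_items
    total = n_items * n_raters
    expected = sum((c / total) ** 2 for c in category_totals.values())
    if expected == 1.0:
        return float("nan")  # only one category was ever used: kappa is undefined
    return (observed - expected) / (1.0 - expected)


def landis_koch_label(kappa: float) -> str:
    """Map a kappa value to its Landis and Koch agreement band."""
    if kappa < 0:
        return "poor"
    for upper, label in [
        (0.20, "slight"),
        (0.40, "fair"),
        (0.60, "moderate"),
        (0.80, "substantial"),
    ]:
        if kappa <= upper:
            return label
    return "almost perfect"
```

Each inner sequence holds the four evaluators' choices for one segment and one criterion, for instance `["1", "2", "2", "E"]`.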
Evaluation on Sentence Level
Results and Analysis
Improvement by SPE:
–Chinese: fluency and adequacy ≈ 40%, PE time ≈ 50%
–Japanese: fluency, adequacy, and PE time ≈ 60%
[Table: for each language (Chinese, Japanese) and criterion (Fluency, Adequacy, Less PE time), rows MT, SPE, Equal, and Total (100)]
Conclusions
SPE generates more improvement than degradation: threefold for Japanese, sixfold for Chinese
Linguistic changes vary between ZH and JA
SPE changes are generally limited to the word level
SPE improves fluency and adequacy, and shortens PE time
Questions?