Published by Martin Bell. Modified over 7 years ago.
1
A CASE STUDY OF GERMAN INTO ENGLISH BY MACHINE TRANSLATION: MOSES EVALUATED USING MOSES FOR MERE MORTALS. Roger Haycock
2
Introduction
Freelance translators need personalised machine translation (MT) to provide a first draft for post-editing.
Used Moses for Mere Mortals (MMM) to build German to English MT engines.
Conducted experiments with different amounts of data.
Results.
Implications.
3
Equipment
The tutorial for MMM is very comprehensive.
PC used: 8 GB RAM, 4 processors, 148 GB hard disk.
Operating system: Ubuntu 14.04 LTS (64-bit).
4
Software Installation
Download a zipped archive of MMM files and unpack them.
Use the Ubuntu CLI to run the scripts: Install, Create.
Demo corpus.
5
Preparation of Corpora
The aligned German and English texts of Europarl were downloaded from the Internet.
The English Europarl text was used for training the language model (LM).
The 'Make-test-files' script extracts a 1,000-segment test file from the Europarl corpus before the corpus is used for training.
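The hold-out step can be illustrated with a short Python sketch. This is not the 'Make-test-files' script itself, whose internals are not described here; the function name `split_parallel_corpus`, the random sampling and the fixed seed are all assumptions made for illustration. The point it shows is that the test segments are removed from the aligned corpus as pairs, so that the engine is never trained on what it is later tested on.

```python
import random


def split_parallel_corpus(src_segments, tgt_segments, test_size=1000, seed=0):
    """Hold out `test_size` aligned segment pairs; return (train, test).

    Hypothetical helper: random sampling with a fixed seed is an assumption,
    not a description of how MMM selects its test segments.
    """
    if len(src_segments) != len(tgt_segments):
        raise ValueError("source and target must have the same number of segments")
    if test_size > len(src_segments):
        raise ValueError("test_size is larger than the corpus")

    rng = random.Random(seed)
    test_indices = set(rng.sample(range(len(src_segments)), test_size))

    train, test = [], []
    for i, pair in enumerate(zip(src_segments, tgt_segments)):
        # Each pair goes to exactly one side, so alignment is preserved
        # and no test segment leaks into the training data.
        (test if i in test_indices else train).append(pair)
    return train, test


if __name__ == "__main__":
    german = [f"Satz {i}" for i in range(5000)]
    english = [f"sentence {i}" for i in range(5000)]
    train_pairs, test_pairs = split_parallel_corpus(german, english)
    print(len(train_pairs), len(test_pairs))
```

Holding out pairs rather than single-language lines keeps the German and English sides in step, which is what the later scoring against a reference translation relies on.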
6
Training
Aligned texts are placed in the ‘corpora-for-training’ folder.
Run the ‘train’ script.
Four basic trainings were built and tested: the whole corpus, then 200, ,000 and 800,000 segments.
MMM generates reports for each training.
7
Translation script
8
Moses Features
The phrase translation table.
The language model, weight Wl.
The distortion model, weight Wd.
The word penalty, weight Ww.
Wl, Wd and Ww have default values of 1, 1 and 0.
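How the weights interact can be sketched as a weighted sum of feature scores. The Python below is an illustration only: the feature values for the two candidates are invented, the phrase-table weight `Wt` and the function names are assumptions, and the sketch is not Moses code. It uses the default values of Wl, Wd and Ww given above and then lowers Wl to 0.5 to show that a change in one weight can change which candidate translation wins.

```python
# Default weights from the slide; "Wt" (phrase table) is an assumed extra weight.
DEFAULT_WEIGHTS = {"Wt": 1.0, "Wl": 1.0, "Wd": 1.0, "Ww": 0.0}


def model_score(features, weights):
    """Weighted sum of feature scores (log-probabilities and penalties)."""
    return sum(weights[name] * value for name, value in features.items())


def best_candidate(candidates, weights):
    """Return the name of the candidate with the highest weighted score."""
    return max(candidates, key=lambda name: model_score(candidates[name], weights))


# Invented feature values for two hypothetical candidate translations.
# Ww is fed minus the number of output words (an assumed convention).
CANDIDATES = {
    "A": {"Wt": -2.0, "Wl": -6.0, "Wd": -1.0, "Ww": -5.0},
    "B": {"Wt": -1.0, "Wl": -8.5, "Wd": 0.0, "Ww": -6.0},
}

if __name__ == "__main__":
    print(best_candidate(CANDIDATES, DEFAULT_WEIGHTS))   # A scores -9.0, B scores -9.5
    lower_lm = dict(DEFAULT_WEIGHTS, Wl=0.5)
    print(best_candidate(CANDIDATES, lower_lm))          # A scores -6.0, B scores -5.25
```

With the default weights candidate A wins because the language model prefers it; halving Wl lets the phrase table and distortion scores dominate, and candidate B wins instead.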
9
Translating
The 1,000-segment test document was translated by each of the engines and given a BLEU score by the ‘score’ script.
A sample of 50 segments from each translation was post-edited and evaluated by me.
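For readers unfamiliar with BLEU, the following Python is a minimal sketch of a corpus-level score. It assumes one reference per segment, whitespace tokenisation and no smoothing, so it will not reproduce the figures reported by the ‘score’ script; the function names are illustrative.

```python
import math
from collections import Counter


def ngram_counts(tokens, n):
    """Count the n-grams of length n in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(hypotheses, references, max_n=4):
    """Corpus-level BLEU in the range 0..1 with a single reference per segment."""
    matches = [0] * max_n
    totals = [0] * max_n
    hyp_len = ref_len = 0

    for hyp, ref in zip(hypotheses, references):
        hyp_tokens, ref_tokens = hyp.split(), ref.split()
        hyp_len += len(hyp_tokens)
        ref_len += len(ref_tokens)
        for n in range(1, max_n + 1):
            hyp_counts = ngram_counts(hyp_tokens, n)
            ref_counts = ngram_counts(ref_tokens, n)
            # Clipped counts: an n-gram is credited at most as often
            # as it occurs in the reference.
            matches[n - 1] += sum(min(c, ref_counts[g]) for g, c in hyp_counts.items())
            totals[n - 1] += sum(hyp_counts.values())

    if min(matches) == 0:
        return 0.0  # no smoothing: any empty n-gram order gives a zero score

    log_precision = sum(math.log(m / t) for m, t in zip(matches, totals)) / max_n
    # Brevity penalty: output shorter than the reference is penalised.
    brevity = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return brevity * math.exp(log_precision)


if __name__ == "__main__":
    reference = ["the cat sat on the mat"]
    print(corpus_bleu(["the cat sat on the mat"], reference))
    print(corpus_bleu(["the cat sat on the mat today"], reference))
```

An automatic score of this kind only measures overlap with one reference translation, which is why the sample of 50 segments was also judged by hand.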
10
Five point scale
Bad: Many changes for an acceptable translation; no time saved.
So So: Quite a number of changes, but some time saved.
Good: Few changes; time saved.
Very Good: Only minor changes, a lot of time saved.
Fully correct: Could be used without any change, even if I would still change it if it were my own translation.
11
Adjusting tuning weights
Wd: distortion model. Ww: word penalty. Wl: language model. MBR: minimum Bayes risk.
12
Increasing training data
13
Using 5 point scale
14
If Wl=0.5
15
What does this mean for freelancer?
An average translator working full time will produce 50,000 translation units a year (Champollion, 2007, p. 2).
segments represent 16 years' work.
If use is permissible.
Starting with units and then increasing incrementally is not likely to work.
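The arithmetic behind the comparison can be checked with a few lines of Python. The only input taken from the slide is the rate of 50,000 translation units a year; the function names are illustrative. At that rate, 16 years of work comes to 800,000 units, the same size as the largest subset listed on the training slide.

```python
UNITS_PER_YEAR = 50_000  # full-time output per year (Champollion, 2007, p. 2)


def units_for_years(years, units_per_year=UNITS_PER_YEAR):
    """Translation units one translator produces in the given number of years."""
    return years * units_per_year


def years_of_work(corpus_units, units_per_year=UNITS_PER_YEAR):
    """Years one translator would need to produce a corpus of this size."""
    return corpus_units / units_per_year


if __name__ == "__main__":
    print(units_for_years(16))      # 800000
    print(years_of_work(800_000))   # 16.0
```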
16
The way forward
More specific MT engines in terms of genre and language pair.
Harvesting data.
Novel MT features incorporated in Moses.
17
Questions?