A CASE STUDY OF GERMAN INTO ENGLISH BY MACHINE TRANSLATION: MOSES EVALUATED USING MOSES FOR MERE MORTALS. Roger Haycock  roger.haycock@myport.ac.uk.


Introduction Freelance translators need personalised machine translation (MT) to provide a first draft for post-editing. Moses for Mere Mortals (MMM) was used to build German-to-English MT engines. Experiments were conducted with different amounts of training data. Results. Implications.

Equipment The MMM tutorial is very comprehensive. PC used: 8 GB RAM, 4 processors, 148 GB hard disk. Operating system: Ubuntu 14.04 LTS (64-bit).

Software Installation Download a zipped archive of the MMM files and unpack it. Use the Ubuntu CLI to run the scripts: Install, Create. Demo corpus.

Preparation of Corpora The aligned German and English texts of Europarl were downloaded from the Internet. The English Europarl text was used to train the language model (LM). The 'Make-test-files' script extracts a 1,000-segment test file from the Europarl corpus before the corpus is used for training.
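
The hold-out step can be illustrated with a minimal Python sketch. It is not the MMM 'Make-test-files' script itself: the function name `split_parallel_corpus` is hypothetical, and random sampling is an assumption, since the slide does not say how the 1,000 segments are chosen. The point it shows is that source and target segments must be removed together so the test pairs stay aligned and never appear in the training data.

```python
import random


def split_parallel_corpus(src_lines, tgt_lines, test_size=1000, seed=0):
    """Hold out `test_size` aligned segment pairs from a parallel corpus.

    Returns (train_pairs, test_pairs), each a list of (source, target) tuples.
    Random selection is an assumption; the real script may choose differently.
    """
    if len(src_lines) != len(tgt_lines):
        raise ValueError("source and target must have the same number of segments")
    if test_size > len(src_lines):
        raise ValueError("test_size is larger than the corpus")

    rng = random.Random(seed)  # fixed seed so the split is reproducible
    test_idx = set(rng.sample(range(len(src_lines)), test_size))

    train_pairs, test_pairs = [], []
    for i, pair in enumerate(zip(src_lines, tgt_lines)):
        # Each pair goes to exactly one side, so test data never leaks into training.
        (test_pairs if i in test_idx else train_pairs).append(pair)
    return train_pairs, test_pairs
```

Called on two equal-length lists of German and English segments, it returns the training pairs and the 1,000 held-out pairs, which would then be written to separate files.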

Training Aligned texts are placed in the ‘corpora-for-training’ folder. Run the ‘train’ script. Four basic engines were trained and tested: one on the whole corpus, then on 200,000, 400,000 and 800,000 segments. MMM generates a report for each training.

Translation script

Moses Features The phrase translation table. The language model (Wl). The distortion model (Wd). The word penalty (Ww). Wl, Wd and Ww have default values of 1, 1 and 0.
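
How these weights interact can be sketched in Python, assuming the usual log-linear formulation in which the decoder prefers the candidate with the highest weighted sum of feature scores. Everything here is illustrative: the `Hypothesis` class, the feature values and the sign conventions are assumptions, not Moses internals. The default weights follow the slide (Wl = 1, Wd = 1, Ww = 0).

```python
from dataclasses import dataclass


@dataclass
class Hypothesis:
    text: str
    log_tm: float    # log score from the phrase translation table (assumed value)
    log_lm: float    # log score from the language model (assumed value)
    distortion: int  # amount of reordering, treated as a cost
    n_words: int     # output length, penalised by the word penalty


def score(h, w_tm=1.0, w_l=1.0, w_d=1.0, w_w=0.0):
    """Weighted sum of features; higher is better."""
    return (w_tm * h.log_tm
            + w_l * h.log_lm
            - w_d * h.distortion
            - w_w * h.n_words)


def best(hypotheses, **weights):
    """Return the hypothesis with the highest weighted score."""
    return max(hypotheses, key=lambda h: score(h, **weights))
```

With two candidates whose translation-table and language-model scores disagree, halving the language-model weight can change which one is ranked first, which is the kind of effect weight tuning exploits.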

Translating The 1,000-segment test document was translated by each of the engines and given a BLEU score by the ‘score’ script. A sample of 50 segments from each translation was post-edited and evaluated by me.
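
For readers unfamiliar with BLEU, the following is a minimal Python sketch of a corpus-level score. It is not the MMM ‘score’ script: it assumes whitespace tokenisation, a single reference per segment, n-grams up to length 4 and no smoothing, so its values will not match those of a production scorer exactly.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count the n-grams of length n in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(hypotheses, references, max_n=4):
    """Corpus-level BLEU with one reference per segment and no smoothing."""
    matches = [0] * max_n
    totals = [0] * max_n
    hyp_len = ref_len = 0

    for hyp, ref in zip(hypotheses, references):
        h, r = hyp.split(), ref.split()
        hyp_len += len(h)
        ref_len += len(r)
        for n in range(1, max_n + 1):
            h_counts, r_counts = ngrams(h, n), ngrams(r, n)
            # Clipped counts: an n-gram is credited at most as often as it
            # occurs in the reference.
            matches[n - 1] += sum(min(c, r_counts[g]) for g, c in h_counts.items())
            totals[n - 1] += sum(h_counts.values())

    if min(matches) == 0:
        return 0.0  # without smoothing, one empty n-gram order gives zero

    # Geometric mean of the n-gram precisions, computed in log space.
    log_precision = sum(math.log(m / t) for m, t in zip(matches, totals)) / max_n
    # Brevity penalty: penalise output that is shorter than the reference.
    bp = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return bp * math.exp(log_precision)
```

The function returns a value between 0 and 1; scores are often reported multiplied by 100.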

Five-point scale Bad: Many changes for an acceptable translation; no time saved. So So: Quite a number of changes, but some time saved. Good: Few changes; time saved. Very Good: Only minor changes; a lot of time saved. Fully correct: Could be used without any change, even if I would still change it if it were my own translation.

Adjusting tuning weights Wd: distortion model. Ww: word penalty. Wl: language model. MBR: minimum Bayes risk.

Increasing training data

Using 5 point scale

If Wl=0.5

What does this mean for the freelancer? An average translator working full time will produce 50,000 translation units a year (Champollion, 2007, p. 2). 800,000 segments represent 16 years' work. Even if use is permissible, starting with 100,000 units and then increasing incrementally is not likely to work.
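
The arithmetic behind the 16-year figure can be checked with a short Python sketch. It assumes that one training segment corresponds to one translation unit, which the slide implies but does not state; the function name `years_of_work` is hypothetical.

```python
UNITS_PER_YEAR = 50_000  # output of an average full-time translator (Champollion, 2007)


def years_of_work(segments, units_per_year=UNITS_PER_YEAR):
    """Years a single translator would need to produce `segments` units."""
    return segments / units_per_year


if __name__ == "__main__":
    for size in (100_000, 200_000, 400_000, 800_000):
        print(f"{size:>7,} segments = {years_of_work(size):g} years of work")
```

On the same assumption, even the 100,000-unit starting point corresponds to two years of full-time output.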

The way forward More specific MT engines in terms of genre and language pair. Harvesting data. Novel MT features incorporated in Moses.

Questions?