Published by Mitchell Blizzard. Modified over 10 years ago.
1
Statistical Machine Translation with Moses
Hieu Hoang
Localization World 2013
2
Agenda
What is Statistical Machine Translation?
What is Moses?
  – Common misconceptions
Coming up
What can we do for you?
Moses by Hieu Hoang, University of Edinburgh
4
What is Statistical Machine Translation?
"It is very tempting to say that a book written in Chinese is simply a book written in English which was coded into the Chinese code. If we have useful methods for solving almost any cryptographic problem, may it not be that with proper interpretation we already have useful methods for translation?"
Warren Weaver, 1949
5
What is Statistical Machine Translation?
NLP application
  – search engines, text mining, etc.
Big data
  – bi-text from the Internet, e.g. multilingual websites, documents
  – large monolingual data
Learn to translate
  – from previous translations
  – models of language
6
What is Statistical Machine Translation?
Training
  – Training data: bi-text, monolingual data, dictionary
  – Linguistic tools
  – SMT system: translation model, language model, lots of numbers…
Using
  – Source text
  – SMT system: translation model, language model, lots of numbers…
7
What is a model?
Translation Model
Language Model
  – (of the target language)
thanks to Precision Translation Tools
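The slide does not say how the two models are combined. One standard formulation, given here as background rather than as the presenter's own description, is the noisy-channel model, where f is the source sentence and e is a candidate translation in the target language:

```latex
\hat{e} = \arg\max_{e} P(e \mid f) = \arg\max_{e} \underbrace{P(f \mid e)}_{\text{translation model}} \; \underbrace{P(e)}_{\text{language model}}
```

Moses generalizes this into a weighted log-linear combination of several feature scores, with the translation model and language model among them.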
8
What is a model?
Translation model
  – source → translation
  – probability

source          target           probability
den Vorschlag   the proposal     0.6227
                's proposal      0.1068
                a proposal       0.0341
                the idea         0.0250
                this proposal    0.0227
                proposal         0.0205
                ….
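As a rough illustration of how such a table can be used, the sketch below stores a few of the entries from the slide in a dictionary and picks the most probable target phrase. The names `PHRASE_TABLE` and `best_translation` are hypothetical and are not part of Moses; a real decoder also weighs the language model and other scores instead of taking the top entry.

```python
# Toy phrase table holding a few of the entries shown on the slide.
# PHRASE_TABLE and best_translation are illustrative names, not Moses APIs.
PHRASE_TABLE = {
    "den Vorschlag": [
        ("the proposal", 0.6227),
        ("a proposal", 0.0341),
        ("the idea", 0.0250),
        ("this proposal", 0.0227),
        ("proposal", 0.0205),
    ],
}


def best_translation(source, table=PHRASE_TABLE):
    """Return the (target, probability) pair with the highest probability,
    or None when the source phrase is not in the table."""
    candidates = table.get(source)
    if not candidates:
        return None
    return max(candidates, key=lambda pair: pair[1])


print(best_translation("den Vorschlag"))  # ('the proposal', 0.6227)
```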
9
What is a model?
Language model
  – Likelihood of sentence
  – in target language

text                     probability
I would like             0.489
would like to            0.905
like to commend          0.002
to commend the           0.472
commend the rapporteur   0.147
….
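A minimal sketch of how three-word entries like these could score a whole sentence: slide a window of three words over the sentence and multiply the probabilities, done here in log space. Treating each number as the probability contributed by that window is an assumption, since the slide does not say whether the values are conditional or joint. The `floor` parameter is a hypothetical stand-in for the smoothing a real language model applies to unseen n-grams.

```python
import math

# Three-word entries copied from the slide.
TRIGRAM_PROB = {
    ("I", "would", "like"): 0.489,
    ("would", "like", "to"): 0.905,
    ("like", "to", "commend"): 0.002,
    ("to", "commend", "the"): 0.472,
    ("commend", "the", "rapporteur"): 0.147,
}


def sentence_log_prob(words, table=TRIGRAM_PROB, floor=1e-6):
    """Sum the log-probabilities of every consecutive three-word window.

    `floor` is a hypothetical fallback for windows missing from the table.
    """
    total = 0.0
    for i in range(len(words) - 2):
        trigram = tuple(words[i:i + 3])
        total += math.log(table.get(trigram, floor))
    return total


sentence = "I would like to commend the rapporteur".split()
log_p = sentence_log_prob(sentence)
print(f"log P = {log_p:.3f}, P = {math.exp(log_p):.3e}")
```

The rare window "like to commend" (0.002) dominates the score, which is how a language model steers a decoder away from unusual word sequences.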
10
Agenda
What is Statistical Machine Translation?
What is Moses?
  – Common misconceptions
Coming up
What can we do for you?
11
What is Moses?
Replacement for Pharaoh
  – Academic software
  – Closed-source
Open source
Re-written, clean code
  – More features
Large developer community
  – Initiated by Hieu Hoang
  – Developed at NLP Workshop
12
Agenda
What is Statistical Machine Translation?
What is Moses?
  – Timeline
  – Common misconceptions
Coming up
What can we do for you?
13
What is Moses?
Common Misconceptions
Only for Linux
Difficult to use
Unreliable
Only phrase-based
Developed by one person
Slow
14
Only works on Linux
Tested on
  – Windows 7 (32-bit) with Cygwin 6.1
  – Mac OSX 10.7 with MacPorts
  – Ubuntu 12.10, 32 and 64-bit
  – Debian 6.0, 32 and 64-bit
  – Fedora 17, 32 and 64-bit
  – openSUSE 12.2, 32 and 64-bit
Project files for
  – Visual Studio
  – Eclipse on Linux and Mac OSX
15
Difficult to use
Easier compile and install
  – Boost bjam
  – No installation required
Binaries available for
  – Linux
  – Mac
  – Windows/Cygwin
  – Moses + Friends: IRSTLM, GIZA++ and MGIZA
Ready-made models trained on Europarl
16
Unreliable
Monitor check-ins
Unit tests
More regression tests
Nightly tests
  – Run end-to-end training
  – http://www.statmt.org/moses/cruise/
Tested on all major OSes
Train Europarl models
  – Phrase-based, hierarchical, factored
  – 8 language-pairs
  – http://www.statmt.org/moses/RELEASE-1.0/models/
17
Only phrase-based model
  – replacement for Pharaoh
  – extension of Pharaoh
From the beginning
  – Factored models
  – Lattice and confusion network input
  – Multiple LMs, multiple phrase-tables
Since 2009
  – Hierarchical model
  – Syntactic models
18
Developed by one person
ANYONE can contribute
  – 50 contributors
Figure: git blame of Moses repository
19
Slow
Decoding
thanks to Ken!!
20
Slow
Training
Multithreaded
Reduced disk IO
  – compress intermediate files
Reduce disk space requirement

Time (mins)    1-core   2-cores     4-cores     8-cores     Size (MB)
Phrase-based   60       47 (79%)    37 (63%)    33 (56%)    893
Hierarchical   1030     677 (65%)   473 (45%)   375 (36%)   8300
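The percentages in the table appear to express each multi-core training time as a fraction of the single-core time. The same figures can be read as speed-up factors, as in the sketch below. It assumes the table reads as 60, 47, 37, 33 minutes for phrase-based training and 1030, 677, 473, 375 minutes for hierarchical training; the names `TIMES`, `CORES` and `speedups` are illustrative only.

```python
# Training times in minutes for 1, 2, 4 and 8 cores, as read from the table.
CORES = [1, 2, 4, 8]
TIMES = {
    "Phrase-based": [60, 47, 37, 33],
    "Hierarchical": [1030, 677, 473, 375],
}


def speedups(times):
    """Speed-up of each run relative to the first (single-core) run."""
    base = times[0]
    return [base / t for t in times]


for model, times in TIMES.items():
    row = ", ".join(
        f"{cores} cores: {factor:.2f}x"
        for cores, factor in zip(CORES, speedups(times))
    )
    print(f"{model}: {row}")
```

On these numbers, hierarchical training gains more from extra cores than phrase-based training does, and neither scales linearly with the core count.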
21
What is Moses?
Common Misconceptions
Only for Linux
Difficult to use
Unreliable
Only phrase-based
Developed by one person
Slow
22
What is Moses?
Common Misconceptions
Only for Linux
  – Windows, Linux, Mac
Difficult to use
  – Easier compile and install
Unreliable
  – Multi-stage testing
Only phrase-based
  – Hierarchical, syntax model
Developed by one person
  – everyone
Slow
  – Fastest decoder, multithreaded training, less IO
23
Agenda
What is Statistical Machine Translation?
What is Moses?
  – Common misconceptions
Coming up
What can we do for you?
24
Coming up…
Code cleanup
Incremental Training
Better translation
  – smaller model
  – bigger data
  – faster training and decoding
Applications
  – CAT tools
  – Speech translation
25
Applications
Computer-Aided Translation
EU Project
  – CASMACAT
  – MATECAT
26
Agenda
What is Statistical Machine Translation?
What is Moses?
  – Common misconceptions
Coming up
What can we do for you?
27
What can we do for you?
  – simpler Moses
  – graphical interface
  – Windows compatibility
  – terminology and glossary
  – incremental training
What can you do for us?
  – code
  – data
  – funding