Moses, past, present, future Hieu Hoang XRCE 2013.


Timeline
2002: Pharaoh decoder, precursor to Moses
2005: Replacement for Pharaoh
2006: JHU Workshop extends Moses significantly
since late 2006: Funding by EU projects EuroMatrix, EuroMatrixPlus
2012: MosesCore

What is Moses?
Common Misconceptions
Only the decoder
Only for Linux
Difficult to use
Unreliable
Only phrase-based
No sparse features
Developed by one person
Slow

Only the decoder
  – replacement for Pharaoh
Training
Tuning
Decoder
Other
  – XML Server
  – Phrase-table pruning/filtering
  – Domain adaptation
  – Experiment management system

Only works on Linux
Tested on
  – Windows 7 (32-bit) with Cygwin 6.1
  – Mac OSX 10.7 with MacPorts
  – Ubuntu 12.10, 32 and 64-bit
  – Debian 6.0, 32 and 64-bit
  – Fedora 17, 32 and 64-bit
  – openSUSE 12.2, 32 and 64-bit
Project files for
  – Visual Studio
  – Eclipse on Linux and Mac OSX

Difficult to use
Easier compile and install
  – Boost bjam
  – No installation required
Binaries available for
  – Linux
  – Mac
  – Windows/Cygwin
  – Moses + Friends: IRSTLM, GIZA++ and MGIZA
Ready-made models trained on Europarl

Unreliable
Monitor check-ins
Unit tests
More regression tests
Nightly tests
  – Run end-to-end training
  – Tested on all major OSes
Train Europarl models
  – Phrase-based, hierarchical, factored
  – 8 language-pairs

Only phrase-based model
  – replacement for Pharaoh
  – extension of Pharaoh
From the beginning
  – Factored models
  – Lattice and confusion network input
  – Multiple LMs, multiple phrase-tables
Since 2009
  – Hierarchical model
  – Syntactic models

No Sparse Features
Large number of sparse features
  – 1+ million
  – Sparse AND dense features
Different tuning
  – MERT
  – MIRA
  – Batch MIRA (Cherry & Foster, 2012)
  – PRO (Hopkins and May, 2011)
Available sparse features
  Target Bigram          Target Ngram            Source Word Deletion
  Sparse Phrase Table    Phrase Boundary         Phrase Length
  Phrase Pair            Target Word Insertion   Global Lexical Model
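To illustrate what "sparse AND dense features" can mean in a single log-linear score, here is a minimal Python sketch. Only the feature type (Target Bigram) comes from the slide; the feature naming scheme (`tb_...`), the dense feature names `lm` and `tm`, all weights, and both helper functions are hypothetical and are not the Moses API.

```python
from collections import Counter

def target_bigram_features(target_words):
    """Emit one sparse indicator feature per target-side bigram.

    The "tb_<w1>_<w2>" naming is an assumption made for this sketch.
    """
    padded = ["<s>"] + list(target_words) + ["</s>"]
    return Counter(f"tb_{a}_{b}" for a, b in zip(padded, padded[1:]))

def score(dense, sparse, weights):
    """Log-linear score: sum of feature value times weight.

    Dense and sparse features share one weight map; a feature
    without a weight contributes 0.
    """
    total = sum(weights.get(name, 0.0) * value for name, value in dense.items())
    total += sum(weights.get(name, 0.0) * value for name, value in sparse.items())
    return total

# Hypothetical values: two dense log-scores plus sparse bigram indicators.
dense = {"lm": -4.0, "tm": -2.0}
sparse = target_bigram_features(["the", "house"])
weights = {"lm": 0.5, "tm": 1.0, "tb_the_house": 0.25}

print(score(dense, sparse, weights))  # -3.75
```

Only the sparse features that actually fire on a hypothesis are stored, which is what keeps a feature set with millions of possible entries manageable.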

Developed by one person
ANYONE can contribute
  – 50 contributors
[Figure: git blame of Moses repository]

Slow
Decoding
  – thanks to Ken!!

Slow
Training
Multithreaded
Reduced disk IO
  – compress intermediate files
Reduce disk space requirement

Time (mins)     1-core   2-cores    4-cores     8-cores     Size (MB)
Phrase-based    60       47 (79%)   37 (63%)    33 (56%)    893
Hierarchical    ?        ? (65%)    473 (45%)   375 (36%)   8300

What is Moses?
Common Misconceptions
Only the decoder
Only for Linux
Difficult to use
Unreliable
Only phrase-based
No sparse features
Developed by one person
Slow

What is Moses?
Common Misconceptions
Only the decoder
  – Decoding, training, tuning, server
Only for Linux
  – Windows, Linux, Mac
Difficult to use
  – Easier compile and install
Unreliable
  – Multi-stage testing
Only phrase-based
  – Hierarchical, syntax model
No sparse features
  – Sparse AND dense features
Developed by one person
  – everyone
Slow
  – Fastest decoder, multithreaded training, less IO

Future priorities
Code cleanup
MT applications
  – Computer-Aided Translation
  – Speech-to-speech
Incremental Training
Better translation
  – smaller model
  – bigger data
  – faster training and decoding

Code cleanup
Framework for feature functions
  – Easier to add new feature functions
Cleanup
  – Refactor
  – Delete old code
  – Documentation

MT Applications
Computer-Aided Translation
  – integration with front-ends
  – better use of user-feedback information

MT Applications
Speech-to-speech
  – ambiguous input: lattices and confusion networks
  – translate prosody: factored word representation
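As a rough picture of the ambiguous input mentioned above, a confusion network can be modelled as a sequence of columns, each holding alternative words with probabilities. The Python sketch below is illustrative only: the example words, the probabilities, the use of an empty string for an epsilon (skip) arc, and the `paths` helper are all assumptions, not the Moses input format.

```python
from itertools import product
from math import prod

# One list per column; each entry is (word, probability).
# "" marks an empty (epsilon) arc. All values are made up.
confusion_network = [
    [("I", 0.9), ("eye", 0.1)],
    [("scream", 0.6), ("", 0.4)],
]

def paths(network):
    """Yield (sentence, probability) for every path through the network."""
    for combo in product(*network):
        words = [word for word, _ in combo if word]
        probability = prod(p for _, p in combo)
        yield " ".join(words), probability

# Pick the most probable reading of the ambiguous input.
best_sentence, best_probability = max(paths(confusion_network), key=lambda x: x[1])
print(best_sentence)
```

Enumerating every path is exponential in the number of columns, so this only works for toy inputs; a decoder would search over the columns rather than expand them.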

Incremental Training
Incremental word alignment
Dynamic suffix array
Phrase-table update
Better integration with rest of Moses
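One way to picture a phrase-table update is to keep raw counts and recompute relative frequencies on demand, so new phrase pairs can be added without retraining from scratch. The sketch below is a toy under that assumption; the class `IncrementalPhraseTable`, its methods, and the example phrases are hypothetical and do not describe how Moses implements or plans to implement incremental training.

```python
from collections import defaultdict

class IncrementalPhraseTable:
    """Toy phrase table that stores counts so it can be updated in place."""

    def __init__(self):
        self.pair_counts = defaultdict(int)    # (source, target) -> count
        self.source_counts = defaultdict(int)  # source -> count

    def add(self, source, target, count=1):
        """Record newly extracted phrase pairs; no retraining step needed."""
        self.pair_counts[(source, target)] += count
        self.source_counts[source] += count

    def prob(self, source, target):
        """Relative-frequency estimate of p(target | source)."""
        total = self.source_counts.get(source, 0)
        if total == 0:
            return 0.0
        return self.pair_counts.get((source, target), 0) / total

table = IncrementalPhraseTable()
table.add("maison", "house", 2)
print(table.prob("maison", "house"))  # 1.0

# New data arrives later: the estimate shifts without rebuilding the table.
table.add("maison", "home")
print(table.prob("maison", "house"))
```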

Smaller files
Smaller binary
  – phrase-tables
  – language models
Mobile devices
Fits into memory: faster decoding!
Efficient data structures
  – suffix arrays
  – compressed file formats
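For readers unfamiliar with suffix arrays, the following Python sketch shows the basic idea: sort the start positions of all suffixes of a tokenised corpus, then use binary search to find every occurrence of a phrase. It is a deliberately naive illustration (the construction is far from efficient), and the function names and the example corpus are assumptions rather than anything taken from Moses.

```python
def build_suffix_array(tokens):
    """Return suffix start positions sorted by the suffix they begin.

    Naive construction for illustration; real implementations avoid
    materialising every suffix.
    """
    return sorted(range(len(tokens)), key=lambda i: tokens[i:])

def find_occurrences(tokens, suffix_array, phrase):
    """Return sorted start positions where `phrase` occurs in `tokens`."""
    n = len(phrase)

    def prefix(rank):
        start = suffix_array[rank]
        return tokens[start:start + n]

    # Lower bound: first suffix whose n-token prefix is >= phrase.
    lo, hi = 0, len(suffix_array)
    while lo < hi:
        mid = (lo + hi) // 2
        if prefix(mid) < phrase:
            lo = mid + 1
        else:
            hi = mid
    first = lo

    # Upper bound: first suffix whose n-token prefix is > phrase.
    hi = len(suffix_array)
    while lo < hi:
        mid = (lo + hi) // 2
        if prefix(mid) <= phrase:
            lo = mid + 1
        else:
            hi = mid

    return sorted(suffix_array[first:lo])

corpus = "the cat sat on the mat the cat".split()
suffix_array = build_suffix_array(corpus)
print(find_occurrences(corpus, suffix_array, ["the", "cat"]))  # [0, 6]
```

Because phrases are looked up in the indexed corpus on demand, a structure like this can stand in for a fully precomputed phrase table.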

Better Translations
Consistently beat phrase-based models for every language pair
[Table: Phrase-Based vs. Hierarchical scores for en-es, es-en, en-cs, cs-en, en-de, de-en, en-fr, fr-en, zh-en, ar-en]

The End