First Year Progress Hieu Hoang Luxembourg 2013. Achievements Cross-Platform Compatibility Ease of use / Installation Testing and Reliability Speed Language.

Slides:



Advertisements
Similar presentations
Information Retrieval in Practice
Advertisements

An Overview Of Virtual Machine Architectures Ross Rosemark.
Statistical Machine Translation with Moses Hieu Hoang Localization World
Intro to C#. Programming Coverage Methods, Classes, Arrays Iteration, Control Structures Variables, Expressions Data Types.
Moses, past, present, future Hieu Hoang XRCE 2013.
Software & Services Group, Developer Products Division Copyright© 2010, Intel Corporation. All rights reserved. *Other brands and names are the property.
Memory Address Decoding
Chapter 2 Machine Language.
Copyright © 2008 Pearson Education, Inc. Publishing as Pearson Addison-Wesley Starting Out with Programming Logic & Design First Edition by Tony Gaddis.
System Software Matt Bardsley Comm 165. System Software System software handles technical details Consists of four types of programs.
Statistical Phrase-Based Translation Authors: Koehn, Och, Marcu Presented by Albert Bertram Titles, charts, graphs, figures and tables were extracted from.
Development of Computer - Story of Steve. What is a computer A high intelligence machine A tool – make our life much convenient A very loyal servant Pretty.
SOFTWARE SYSTEMS SOFTWARE APPLICATIONS SOFTWARE PROGRAMMING LANGUAGES.
1.3 Executing Programs. How is Computer Code Transformed into an Executable? Interpreters Compilers Hybrid systems.
Virtual techdays INDIA │ august 2010 Building ASP.NET applications using SQL Server Compact Chaitanya Solapurkar │ Partner Technical Consultant,
WINDOWS 7 AND UBUNTU INSTALLING LINUX WITHIN WINDOWS.
UFCFX5-15-3Mobile Device Development UFCFX Mobile Device Development An Introduction to the Module.
Lesson 4 Computer Software
Parts of a Computer Why Use Binary Numbers? Source Code - Assembly - Machine Code.
StudioSysAdmins 2 nd Annual SIGGRAPH Birds-of-a-Feather John Hickson - 08/09/2011 StudioSysAdmins 2 nd Annual SIGGRAPH Birds-of-a-Feather John Hickson.
Chromium OS is an open-source project that aims to build an operating system that provides a fast, simple, and more secure computing experience for people.
Processor and Internal Stuff or the “guts” of the computer.
Chapter 4 Software Hardware matters little compared to software?
Computers Are Your Future Eleventh Edition Chapter 4: System Software Copyright © 2011 Pearson Education, Inc. Publishing as Prentice Hall1.
Trilinos User Group Meeting Thursday, November 8 th, 2007 Timothy M. Shead (1424) Danny Dunlavy (1415) SAND P Sandia is a multiprogram laboratory.
Programming Fundamentals 2: Background/ F II Objectives – –give a non-technical overview of Java Semester 2, Background.
WaveMaker Visual AJAX Studio 4.0 Training Installation.
Standard Grade Computing System Software & Operating Systems.
Linux in a Virtual Environment Nagarajan Prabakar School of Computing and Information Sciences Florida International University.
Selective Block Minimization for Faster Convergence of Limited Memory Large-scale Linear Models Kai-Wei Chang and Dan Roth Experiment Settings Block Minimization.
Developing C/C++ applications with the Eclipse CDT David Gallardo.
C31102 IT 2 Computer Center Pluakdaengpittayakom
Running Kuali: A Technical Perspective Ailish Byrne - Indiana University Jay Sissom - Indiana University Foundation.
Copyright © 2007 Addison-Wesley. All rights reserved.1-1 Reasons for Studying Concepts of Programming Languages Increased ability to express ideas Improved.
Developing software and hardware in parallel Vladimir Rubanov ISP RAS.
Why Not Grab a Free Lunch? Mining Large Corpora for Parallel Sentences to Improve Translation Modeling Ferhan Ture and Jimmy Lin University of Maryland,
CSE 232: C++ Programming in Visual Studio Graphical Development Environments for C++ Eclipse –Widely available open-source debugging environment Available.
SOFTWARE REQUIREMENTS FOR MY PLATFORMS BY DAVID MISCHAK.
Compilers: Overview/1 1 Compiler Structures Objective – –what are the main features (structures) in a compiler? , Semester 1,
Introduction to Computer Operating Systems
FEISGILTT Dublin 2014 Yves Savourel ENLASO Corporation QuEst Integration in Okapi This presentation was made possible by This project is sponsored by the.
Running Kuali: A Technical Perspective Ailish Byrne (Indiana University) Jonathan Keller (University of California, Davis)
The Million Point PI System – PI Server 3.4 The Million Point PI System PI Server 3.4 Jon Peterson Rulik Perla Denis Vacher.
© Paradigm Publishing, Inc. 4-1 Chapter 4 System Software Chapter 4 System Software.
MACHINE TRANSLATION PAPER 1 Daniel Montalvo, Chrysanthia Cheung-Lau, Jonny Wang CS159 Spring 2011.
Performance Tuning John Black CS 425 UNR, Fall 2000.
Introduction Why are virtual machines interesting?
The business logic engine for Microsoft IIS Speaker T.M. Arnett.
1 FREE SAS SOFTWARE. 2 FREE SOFTWARE Free SAS ® software. SAS STUDIO; An interactive, online community. Superior training and documentation. And the analytical.
Lecture Set 1 Part B: Understanding Visual Studio and.NET – Structure and Terminology 1/16/ :04 PM.
Introduction to Computer Programming Concepts M. Uyguroğlu R. Uyguroğlu.
IBM Worklight environment setup 1. Eclipse IDE Multi-purpose integrated development environment (IDE) Open source Supported for Windows, Mac OS X, Linux.
Build MT systems with Moses MT Marathon Americas 2016 Hieu Hoang.
GNU and Linux.
A CASE STUDY OF GERMAN INTO ENGLISH BY MACHINE TRANSLATION: MOSES EVALUATED USING MOSES FOR MERE MORTALS. Roger Haycock 
Computer Operations Part 2.
CMIT100 Chapter 14 - Programming.
LINUX WINDOWS Vs..
A Closer Look at Instruction Set Architectures
Microprocessor and Assembly Language
Open Source CAT Tool.
CPSC 315 – Programming Studio Spring 2012
Suggestions for Class Projects
Build MT systems with Moses
Microprocessor & Assembly Language
Learn. Imagine. Build. .NET Conf
CIS16 Application Development – Programming with Visual Basic
Microsoft Connect /1/2018 2:36 AM
Compiler Structures 1. Overview Objective
Running C# in the browser
Presentation transcript:

First Year Progress Hieu Hoang Luxembourg 2013

Achievements Cross-Platform Compatibility Ease of use / Installation Testing and Reliability Speed Language Model training with IRSTLM Sparse Features Making use of TMX Lots more by other developers…

Achievements Cross-Platform Compatibility Ease of use / Installation Testing and Reliability Speed Language Model training with IRSTLM Sparse Features Making use of TMX Lots more by other developers…

Achievements Cross-Platform Compatibility Ease of use / Installation Testing and Reliability Speed Language Model training with IRSTLM Sparse Features Making use of TMX Lots more by other developers…

Achievements Cross-Platform Compatibility Ease of use / Installation Testing and Reliability Speed Language Model training with IRSTLM Sparse Features Making use of TMX Lots more by other developers…

Cross-Platform Compatibility Tested on – Windows 7 (32-bit) with Cygwin 6.1 – Mac OSX 10.7 with MacPorts – Ubuntu 12.10, 32 and 64-bit – Debian 6.0, 32 and 64-bit – Fedora 17, 32 and 64-bit – openSUSE 12.2, 32 and 64-bit Project files for – Visual Studio – Eclipse on Linux and Mac OSX

Ease of use / Installation Easier compile and install – Boost bjam – No installation required Binaries available for – Linux – Mac – Windows/Cygwin – Moses + Friends IRSTLM GIZA++ and MGIZA Ready-made models trained on Europarl (and others) – 10 language pairs – phrase-based, hierarchical, factored models

Testing and Reliability Monitor check-ins Unit tests More regression tests Nightly tests – Run end-to-end training – Tested on all major OSes Train Europarl models – Phrase-based, hierarchical, factored – 8 language-pairs –

Speed Multithreaded Reduced disk IO – compress intermediate files Reduce disk space requirement Time (mins)1-core2-cores4-cores8-coresSize (MB) Phrase- based 6047 (79%) 37 (63%) 33 (56%) 893 Hierarchical (65%) 473 (45%) 375 (36%) 8300 Training

Speed Multithreaded Reduced disk IO – compress intermediate files Reduce disk space requirement Time (mins)1-core2-cores4-cores8-coresSize (MB) Phrase- based 6047 (79%) 37 (63%) 33 (56%) 893 Hierarchical (65%) 473 (45%) 375 (36%) 8300 Training

Speed Multithreaded Reduced disk IO – compress intermediate files Reduce disk space requirement Time (mins)1-core2-cores4-cores8-coresSize (MB) Phrase- based 6047 (79%) 37 (63%) 33 (56%) 893 Hierarchical (65%) 473 (45%) 375 (36%) 8300 Training

Speed thanks to Ken!! Decoding

Language Model training with IRSTLM IRSTLM v. SRILM – Faster Parallelization – Train larger language models Uses less memory – Open-Source Integrated into Moses training pipeline – every part of pipeline now Open-Source Competition for LM training – LM training with KenLM

Sparse Features Large number of sparse features – 1+ millions – Sparse AND dense features Available sparse features Different tuning – MERT – Mira – Batch Mira (Cherry & Foster, 2012) – PRO (Hopkins and May, 2011) Target BigramTarget NgramSource Word Deletion Sparse Phrase TablePhrase BoundaryPhrase Length Phrase PairTarget Word InsertionGlobal Lexical Model

Making use of TMX Translation Memories – created by translators – highly repetitive In-domain translation Better use of TM – ‘Convergence of Translation Memory and Statistical Machine Translation’ AMTA 2010 – Use TM as templates – Preferred over MT rules

Achievements Cross-Platform Compatibility – Windows, Mac OSX, Linux Ease of use / Installation – No installation, binaries for download Testing and Reliability – Unit tests & regression test, nightly tests, end-to-end experiments Sparse Features – Millions of features, MIRA & PRO tuning Speed – Faster training, fastest open-source syntax decoder Language Model training with IRSTLM – Faster, bigger, open-source Making use of TMX – Use TMX as templates

Year 2 Priorities Code cleanup Incremental Training Better translation – smaller model – bigger data – faster training and decoding

Code cleanup Framework for feature functions – Easier to add new feature functions Cleanup – Refactor – Delete old code – Documentation

Incremental Training Incremental word alignment Dynamic suffix array Phrase-table update Better integration with rest of Moses

Smaller files Smaller binary – phrase-tables – language models Mobile devices Fits into memory  faster decoding Efficient data structures – suffix arrays – compressed file formats

Better Translations Consistently beat phrase-based models for every language pair Phrase-BasedHierarchical en-es es-en en-cs cs-en en-de de-en en-fr fr-en

Questions…