Translating Subtitles using Machine Translation: Practices, Problems, Methodology
Elsa Sklavounou, Ph.D. (www.systransoft.com)

Presentation transcript:

1 Translating Subtitles using Machine Translation: Practices, Problems, Methodology
Elsa Sklavounou, Ph.D., Linguist, Co-funded Projects Technical Coordinator, SYSTRAN

2 SYSTRAN MT Customization Methodology: Overview
A customization project involves three customization levels, each providing incrementally higher translation quality:
- Basic Terminology
- Complex Terminology
- Linguistic Rules

3 SYSTRAN MT Customization Methodology: Overview
Basic Terminology: The first level entails creating a User Dictionary that covers most of the noun terminology in the corpus, along with various simple adjective and verb terms.
Complex Terminology: The second level concerns the coding of complex terminological entries, such as complex verbs coded with their complements (subject, object…) and their translations.
Linguistic Rules: The third level involves language-specific code modifications in the SYSTRAN linguistic modules.

4 SYSTRAN MT Customization Methodology: Level 1 & Level 2
Customization levels 1 and 2 focus on implementing the specialized terminology from the corpus in the system. Level 1 and 2 tasks include:
- extraction of simple and complex terms;
- translation of simple and complex terms;
- coding of simple and complex terms;
- review of simple and complex terms.

5 SYSTRAN MT Customization Methodology: Level 1 & Level 2
Step 1: Corpus installation and analysis (Prerequisite 1: a formatted corpus)
Step 2: Term extraction
- Simple terms (nouns and noun expressions)
- Complex terms (verb patterns)
- DNT (Do Not Translate) integration
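
As a rough illustration of Step 2, candidate simple terms can be ranked by frequency once function words are filtered out. The sketch below is a naive n-gram counter, not SYSTRAN's extractor: the stop-word list, `candidate_terms`, and its thresholds are hypothetical, and a real extractor would use part-of-speech tagging to keep only nouns and noun expressions and to detect verb patterns.

```python
import re
from collections import Counter

# Hypothetical stop-word list: candidate terms may not start or end with these.
STOP_WORDS = {
    "the", "a", "an", "of", "to", "and", "in", "is", "are", "it", "that",
    "for", "with", "they", "their", "them", "but", "can", "be",
}


def candidate_terms(sentences, max_len=3, min_freq=2):
    """Return (term, frequency) pairs for n-grams of 1..max_len words."""
    counts = Counter()
    for sentence in sentences:
        words = re.findall(r"[A-Za-z][A-Za-z'-]*", sentence.lower())
        for n in range(1, max_len + 1):
            for i in range(len(words) - n + 1):
                gram = words[i:i + n]
                # Skip n-grams that have a function word at either edge.
                if gram[0] in STOP_WORDS or gram[-1] in STOP_WORDS:
                    continue
                counts[" ".join(gram)] += 1
    # Keep only candidates frequent enough to be worth coding in a dictionary.
    return [(t, f) for t, f in counts.most_common() if f >= min_freq]


corpus = [
    "The triple vaccine protects against measles.",
    "Parents fear the triple vaccine.",
    "Measles can be dangerous.",
]
print(candidate_terms(corpus))
```

The resulting candidates would still need human translation, coding, and review, as listed for levels 1 and 2.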

6 SYSTRAN MT Customization Methodology: Level 3
Customization level 3 focuses on implementing linguistic rules specifically adapted to the language-specific syntactic and semantic issues found in translations of the corpus. Level 3 tasks include:
- detailed linguistic evaluations and the development of a comprehensive customization plan;
- implementation of customized rules;
- regression tests;
- correction of linguistic translation errors;
- acceptance testing before release.
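
Regression tests typically rerun a fixed test set after each rule change and compare the new output with previously approved translations. A minimal sketch, assuming translations are stored as dictionaries keyed by sentence ID (the function name and data layout are hypothetical, not part of the SYSTRAN tools):

```python
def regression_report(approved, new_run):
    """Compare a new build's output with approved translations.

    Both arguments map sentence IDs to translations. Returns the IDs whose
    translation changed and the IDs missing from the new run.
    """
    def norm(text):
        # Ignore differences in spacing only.
        return " ".join(text.split())

    changed = sorted(
        sid for sid, text in approved.items()
        if sid in new_run and norm(new_run[sid]) != norm(text)
    )
    missing = sorted(sid for sid in approved if sid not in new_run)
    return changed, missing


approved = {"114": "une injection", "120": "autisme"}
new_run = {"114": "un coup de feu"}
print(regression_report(approved, new_run))
```

Each changed ID is then inspected by a linguist to decide whether the change is an improvement or a regression.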

7 SYSTRAN MT Customization Methodology: Quality Levels
Estimate of the quality levels that may be achieved at each customization level.

8 SYSTRAN MT Customization Methodology: Software Tools
Coding simple and complex terms, together with the related dictionary maintenance, is managed by the SYSTRAN Linguistics Platform, which integrates the following two tools required to complete customization levels 1 and 2.

9 SYSTRAN MT Customization Methodology: Software Tools
SYSTRAN Dictionary Manager
The SYSTRAN Dictionary Manager (SDM) enables translators to build and manage multilingual dictionaries. SDM includes preparation steps for dictionary coding tasks, an online dictionary lookup (via an HTML interface), and a compiler for runtime machine translation dictionaries. It is composed of three main components: a database, HTML query forms (dictionary lookup, reports, logs, import and export), and a Windows client (interactive coding tool).

SYSTRAN Customization Methodology: Software Tools
SYSTRAN Review Manager
The SYSTRAN Review Manager (SRM) is a productivity tool for the review, quality assessment, and maintenance of the linguistic resources used with a SYSTRAN system.

SYSTRAN Customization Methodology
Prerequisite 1: a formatted grammatical corpus
Grammar writing rules: Using Articles; Avoiding Speech Ambiguity; Using Enumeration; Ensuring Subject-Verb Agreement; Using Prepositions; Using Infinitives at the Beginning of Sentences; Using Imperatives; Observing Punctuation Rules; Using Main Clauses; Using Subordinate Clauses; Using Relative Clauses; Avoiding Multiple Stacking; Using Compound Words; Using Capitalization; Using Spelling Variations
Lexical ambiguities: Disambiguation of Product Names and Menus; Avoiding Lexical Ambiguities; Using Compounds
Format and typographical issues: Segmentation

SYSTRAN Customization Methodology for MUSA
Corpus generated fully automatically by a two-step process: Speech Recognition (KU Leuven) and Automatic Sentence Compression (CNTS).
First priority: subtitle constraints.
Second priority: the least ambiguous content possible.
Lesson learned: no prerequisite.
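
Subtitle constraints usually cap the number of lines and the characters per line. The sketch below checks a translated segment against such limits; the values of 37 characters and 2 lines are assumptions for illustration only, since actual limits depend on the broadcaster's guidelines, and `fits_subtitle` is a hypothetical helper.

```python
import textwrap

# Assumed limits for illustration; real values come from the broadcaster.
MAX_CHARS_PER_LINE = 37
MAX_LINES = 2


def fits_subtitle(text, max_chars=MAX_CHARS_PER_LINE, max_lines=MAX_LINES):
    """Wrap text at word boundaries and check it against the line limit.

    Returns (ok, lines). textwrap.wrap never produces lines longer than
    max_chars (it splits over-long words), so only the line count can fail.
    """
    lines = textwrap.wrap(text, width=max_chars)
    return len(lines) <= max_lines, lines


ok, lines = fits_subtitle("nous n'avons pas su qu'il a eu la rougeole mais nous faisons.")
print(ok, lines)
```

A segment that fails the check would have to be shortened, which is what the automatic sentence compression step aims at upstream.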

SYSTRAN MT Customization Methodology: Upgraded Software Tools (Client Tools v5)

SYSTRAN Translation Project Manager: Terminology Review, Not Found Words Extraction
Reviewing terminology and sentences: The Terminology Review tab in the Review window lets you identify expressions such as Not Found Words or terminology extracted by the software.

SYSTRAN Translation Project Manager: Terminology Review, Not Found Words Extraction, Examples
SRC_Id: these parents know measles can be dangerous, but they don't want their child to have MMR, the triple vaccine which protects them from measles, mumps and rubella.
Raw MT: ces parents savent la rougeole peut être dangereuse, mais ils ne veulent pas que leur enfant a MMR, le vaccin triple qui les protège contre la rougeole, les oreillons et la rubéole.
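
Conceptually, Not Found Words extraction lists the source tokens that no dictionary covers, presumably such as MMR in the example above, which the raw MT leaves as is. A minimal sketch, assuming the dictionaries can be reduced to a set of lowercase headwords (a hypothetical simplification; `not_found_words` is not a SYSTRAN function, and real dictionary entries are far richer):

```python
import re


def not_found_words(sentence, dictionary):
    """Return tokens absent from the dictionary, in order of first appearance.

    `dictionary` is a set of lowercase headwords standing in for the system
    dictionaries plus any User Dictionaries. The tokenizer only handles
    unaccented English words and simple contractions such as "don't".
    """
    found = []
    for token in re.findall(r"[A-Za-z]+(?:'[a-z]+)?", sentence):
        if token.lower() not in dictionary and token not in found:
            found.append(token)
    return found


known = {"they", "don't", "want", "their", "child", "to", "have",
         "the", "triple", "vaccine"}
print(not_found_words("They don't want their child to have MMR, the triple vaccine", known))
```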

SYSTRAN Translation Project Manager: Alternative Meanings
Alternative Meanings shows alternative translations based on different meanings of a source word or expression. The Alternative Meanings tab in the Review window shows alternative meanings for expressions in the SYSTRAN or User Dictionaries.

SYSTRAN Translation Project Manager: Alternative Meanings, Examples
SRC_Id: they'd rather pay for single vaccines at 60 pounds a shot, even though the government insists MMR is safe.
Raw MT: ils payeraient plutôt les vaccins uniques à 60 livres un coup de feu, quoique le gouvernement exige que MMR est sûr.
Customized MT: ils payeraient plutôt les vaccins uniques à 60 livres une injection, quoique le gouvernement exige que MMR est sûr.

SYSTRAN Dictionary Manager: User Dictionaries (UDs)
User Dictionaries (UDs) let you improve the quality of source-language analysis, which in turn improves the translation output for all associated target languages. UDs can be used for a number of functions, including:
- automatically translating words not found in the SYSTRAN dictionary (Not Found Words);
- overriding the target-language meaning of a word or expression in the SYSTRAN dictionaries, which lets you customize translation output to fit specific needs;
- ensuring that an expression is always treated as a unit by the SYSTRAN analysis programs.
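
The override and multiword-unit behaviour can be pictured as a longest-match lookup in which User Dictionary entries take precedence over system entries. This is a toy word-substitution model, not how SYSTRAN's rule-based analysis works; `translate_terms` and the dictionary layout (tuples of lowercase words mapped to target strings) are hypothetical, and the sample entries reuse translations from the examples in this presentation.

```python
def translate_terms(tokens, system_dict, user_dict, max_len=3):
    """Greedy longest-match lookup over a list of tokens.

    Multiword keys (tuples of words) are matched as single units, and a
    User Dictionary entry overrides the system entry for the same key.
    Words found in neither dictionary are passed through unchanged.
    """
    merged = {**system_dict, **user_dict}  # user entries win on conflict
    out, i = [], 0
    while i < len(tokens):
        # Try the longest possible unit first, down to a single word.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            key = tuple(w.lower() for w in tokens[i:i + n])
            if key in merged:
                out.append(merged[key])
                i += n
                break
        else:
            out.append(tokens[i])  # Not Found Word: left as is
            i += 1
    return out


system = {("a",): "un", ("shot",): "coup de feu"}
user = {("shot",): "injection", ("autism",): "autisme",
        ("bowel", "disease"): "maladie d'entrailles"}
print(translate_terms(["a", "shot"], system, user))
```

Here the UD entry for shot replaces the system's coup de feu with injection, mirroring the Alternative Meanings example, while bowel disease is looked up as one unit.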

SYSTRAN Dictionary Manager: User Dictionaries (UDs), Metrics
Type of dictionary: ENFR / ENEL
- Do Not Translate Words: 3532 entries (enxx, common to both pairs)
- Proper Nouns: 1495 entries (enfr) / 1495 entries (enel)
- MUSA Terminology: 1443 entries (enfr) / 5228 entries (enel)

SYSTRAN Dictionary Manager: User Dictionaries (UDs), Examples
SRC_ID: Andrew Wakefield ignited the debate over MMR by announcing the findings of research into a group with autism and bowel disease.
Raw MT: Andrew Wakefield a enflammé la discussion au-dessus de MMR en annonçant les résultats de la recherche dans un groupe avec la maladie d'autism et d'entrailles.
Customized MT: Andrew Wakefield a enflammé la discussion au-dessus de MMR en annonçant les résultats de la recherche dans un groupe avec autisme et maladie d'entrailles.

SYSTRAN Translation Project Manager: Source Analysis, Interactive Disambiguation
The Source Analysis tab in the Review window shows how the software handled source ambiguities and lets you override the software's selections.

SYSTRAN Translation Project Manager: Source Analysis, Interactive Disambiguation, Examples
ID 523: At first we thought it was parts of the building but it was people, literally people falling all around us.
Raw MT: D'abord nous avons pensé que ce faisait partie du bâtiment mais c'était les gens, peuplent littéralement la chute tout autour de nous.
Customized MT: D'abord nous avons pensé que c'était des fragments du bâtiment, mais c'était des gens, littéralement des gens qui tombaient autour de nous.
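
A reviewer's override can be modelled as choosing among the readings the analyzer considered for each word. In the example above, the raw MT treats people as a verb (peuplent). A minimal sketch with hypothetical names (`AmbiguousWord`, `apply_overrides`) that are not part of the SYSTRAN tools:

```python
from dataclasses import dataclass


@dataclass
class AmbiguousWord:
    word: str
    readings: list       # candidate analyses, e.g. ["verb", "noun"]
    chosen: int = 0      # index of the reading picked by the analyzer


def apply_overrides(words, overrides):
    """Apply reviewer choices; overrides maps word position -> reading."""
    for pos, reading in overrides.items():
        w = words[pos]
        if reading not in w.readings:
            raise ValueError(f"{reading!r} is not a reading of {w.word!r}")
        w.chosen = w.readings.index(reading)
    return [(w.word, w.readings[w.chosen]) for w in words]


# The analyzer initially picked "verb" for "people" (index 0).
sentence = [AmbiguousWord("literally", ["adverb"]),
            AmbiguousWord("people", ["verb", "noun"])]
print(apply_overrides(sentence, {1: "noun"}))
```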

SYSTRAN Dictionary Manager: Normalization Dictionaries (NDs)
There are two types of Normalization Dictionaries (NDs): source normalization and target normalization. Source normalization normalizes the source document before translation. Target normalization adapts the translation output to user needs in terms of terminology consistency; it can also replace expressions chosen by the software's translation engine with user-defined expressions.

SYSTRAN Dictionary Manager: Normalization Dictionaries (NDs), Examples
SRC_IDs: we did n't know she had measles but we do. I mean I ca n't help...
Raw MT: nous avons fait le n't savons qu'il a eu la rougeole mais nous faisons. Je veux dire l'aide de n't d'I ca…
Customized MT via SRC normalization: nous n'avons pas su qu'il a eu la rougeole mais nous faisons. Je veux dire que je ne peux pas aider
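
Source normalization for this corpus can be approximated with ordered regular-expression rewrites that restore the tokenized contractions before translation. The rules below are hypothetical examples, not entries from an actual SYSTRAN Normalization Dictionary; order matters, so irregular forms such as ca n't are handled before the general n't rule.

```python
import re

# Hypothetical source-normalization rules, applied in order.
SOURCE_NORMALIZATION = [
    (r"\bca n't\b", "cannot"),     # irregular: "ca n't" -> "cannot"
    (r"\bwo n't\b", "will not"),   # irregular: "wo n't" -> "will not"
    (r"\b(\w+) n't\b", r"\1 not"), # general: "did n't" -> "did not"
]


def normalize_source(text):
    """Rewrite tokenized contractions so the analyzer sees full forms."""
    for pattern, replacement in SOURCE_NORMALIZATION:
        text = re.sub(pattern, replacement, text)
    return text


print(normalize_source("we did n't know she had measles but we do. I mean I ca n't help..."))
```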

SYSTRAN Translation Project Manager: Sentence Review for Translation Memory Construction
The Sentence Review tab in the Review window compares source and target sentences. You can then select the sentences to send to User Dictionaries, where you can post-edit them further and build Translation Memories.

SYSTRAN Dictionary Manager: Translation Memories (TMs)
Translation Memory (TM): a set of translated and validated sentences that can be integrated into the translation process. Translation Memories (TMs) are databases of aligned pre-translated sentences. Unlike dictionaries, TM entries can be formatted (for example, italic or bold) and are used by the translation engine to match full sentences in the source document. TMs are not usually created manually; they are built using SYSTRAN's Translation Project Export or imported from TMX files.
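
Full-sentence matching can be sketched as an exact-match index over source sentences, with MT as the fallback. This assumes case- and whitespace-insensitive matching and no fuzzy matches; `translate_with_tm` and the `mt` callback are hypothetical stand-ins, not the SYSTRAN engine.

```python
def normalize_key(sentence):
    # Case- and whitespace-insensitive lookup key (an assumed simplification).
    return " ".join(sentence.lower().split())


def translate_with_tm(sentences, tm, mt):
    """Use a TM entry when the whole source sentence matches, else call MT.

    `tm` is a list of (source, target) pairs; `mt` is any callable that
    translates one sentence. Returns (translation, origin) pairs.
    """
    index = {normalize_key(src): tgt for src, tgt in tm}
    results = []
    for s in sentences:
        hit = index.get(normalize_key(s))
        results.append((hit, "tm") if hit is not None else (mt(s), "mt"))
    return results


tm = [("Now people kind of started panicking and said we've got to leave no matter what.",
       "Les gens maintenant avaient l'air de paniquer disant qu'ils devaient à tout prix partir.")]
print(translate_with_tm(["now people kind of started panicking and said we've got to leave no matter what."],
                        tm, mt=lambda s: "<MT> " + s))
```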

SYSTRAN Dictionary Manager: Translation Memories (TMs), Examples
ID 370: Now people kind of started panicking and said we've got to leave no matter what.
Raw MT: Maintenant sorte de personnes de panique commencée et dite nous avons pour laisser n'importe ce que.
Customized MT: Les gens maintenant avaient l'air de paniquer disant qu'ils devaient à tout prix partir.

SYSTRAN Dictionary Manager: Translation Memories (TMs), Translation Memory Import/Export
Existing TMX (Translation Memory eXchange) standard files can be imported and exported via the SYSTRAN Dictionary Manager.
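
TMX is an XML format in which each translation unit (`<tu>`) holds one variant (`<tuv>`) per language, tagged with `xml:lang`, and each variant wraps its text in `<seg>`. The reader below is a minimal sketch using Python's standard `xml.etree.ElementTree`, not the SYSTRAN Dictionary Manager importer; `read_tmx` is hypothetical and is only suitable for plain-text segments.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes xml:lang under the predefined XML namespace.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def read_tmx(xml_text, src_lang, tgt_lang):
    """Return (source, target) segment pairs from TMX content."""
    root = ET.fromstring(xml_text)
    pairs = []
    for tu in root.iter("tu"):
        segs = {}
        for tuv in tu.findall("tuv"):
            # Fallback for files that use a plain "lang" attribute.
            lang = (tuv.get(XML_LANG) or tuv.get("lang") or "").lower()
            seg = tuv.find("seg")
            if seg is not None:
                # itertext() joins all text inside <seg>, including any
                # inline formatting codes, so rich segments are flattened.
                segs[lang] = "".join(seg.itertext())
        if src_lang in segs and tgt_lang in segs:
            pairs.append((segs[src_lang], segs[tgt_lang]))
    return pairs


sample = """<tmx version="1.4"><header srclang="en" segtype="sentence"/>
<body><tu><tuv xml:lang="en"><seg>a shot</seg></tuv>
<tuv xml:lang="fr"><seg>une injection</seg></tuv></tu></body></tmx>"""
print(read_tmx(sample, "en", "fr"))
```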