Can Controlled Language Rules increase the value of MT? Fred Hollowood & Johann Roturier, Symantec Dublin.

Localisation Challenge
- Databases filled with English content
- Large volumes
- Perishable
- Technical
- Fast delivery
- Cost effective

Goals
- Reduce cost of translation to 30%
- Implement CL within the authoring community
- Foster the use of editor software to police the CL rule set
- Identify the most efficient MT system for each target language
- Develop post-editing guidelines
- Refine Symantec glossaries to assist in dictionary preparation

Controlled Language and MT
- Controlled Language
- MT system
- Rule Sets
- Terminology
- Style
- Editors
- Language Pairs: Jp, De, Fr, It, Es
- Post Editing
- Assessment

Sequence of Events
1. Identify a corpus
2. Develop a test suite
3. Develop terminology
4. Work with MT engines
5. Assess results

Two Questions
- How effective are CL rules in terms of post-editing effort?
- Which CL rules provide the best results?

Corpus Selection
- Origin: stream of XML messages
- Volume: 30,000 words
- Process:
  - Use TM technology to pre-process raw XML to provide strings for MT
  - Use macros to tidy up untranslatable text
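The pre-processing step above could be sketched roughly as follows. This is only an illustration under stated assumptions, not the pipeline used in the study: the slides describe TM technology and macros rather than a script, and the element names (`messages`, `message`), the `%1` placeholder syntax, the `<keep>` wrapper and the sample strings are all hypothetical.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical sample data; the real message schema is not given in the slides.
RAW_XML = """<messages>
  <message id="1">The scan is complete.</message>
  <message id="2">Scan of %1 completed.</message>
  <message id="3">   </message>
</messages>"""

# Assumed placeholder syntax (%1, %2, ...) standing in for untranslatable text.
PLACEHOLDER = re.compile(r"%\d+")


def extract_strings(xml_text):
    """Return (id, text) pairs for every non-empty message element."""
    root = ET.fromstring(xml_text)
    pairs = []
    for msg in root.iter("message"):
        # Collapse runs of whitespace left over from the XML layout.
        text = " ".join((msg.text or "").split())
        if text:
            pairs.append((msg.get("id"), text))
    return pairs


def protect_placeholders(text):
    """Wrap placeholders in a marker so a later step can leave them untranslated."""
    return PLACEHOLDER.sub(lambda m: "<keep>" + m.group(0) + "</keep>", text)


strings = [(i, protect_placeholders(t)) for i, t in extract_strings(RAW_XML)]
```

Whether a given MT engine honours a marker such as `<keep>` depends on the engine; the wrapper here only shows where untranslatable text would be isolated.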

Terminology Extraction
- Extraction tools: WordSmith Tools 4
- Removal of duplicates
  - Spelling variants
  - Hyphenation variants
  - Capitalisation variants
  - Symbol/Plain
  - Abbreviation/Plain
- Removal of synonyms
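The removal of duplicate variants might look something like the sketch below. It only covers capitalisation, hyphenation and spacing variants; spelling variants, symbol or abbreviation forms and synonyms need human judgement or extra resources and are not handled. The function names and the candidate terms are hypothetical examples, not data from the study.

```python
from collections import defaultdict


def variant_key(term):
    """Collapse capitalisation, hyphenation and spacing differences."""
    return term.lower().replace("-", "").replace(" ", "")


def group_variants(terms):
    """Group candidate terms that differ only in case, hyphens or spaces."""
    groups = defaultdict(list)
    for term in terms:
        key = variant_key(term)
        if term not in groups[key]:  # drop exact duplicates
            groups[key].append(term)
    return dict(groups)


# Hypothetical candidate list, as a term extraction tool might produce it.
candidates = ["e-mail", "email", "Email", "fire wall", "firewall", "firewall"]
groups = group_variants(candidates)
```

Each group with more than one member is a place where a terminologist would pick one preferred form for the glossary and the MT dictionary.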

Custom Dictionaries
- Current MT systems
  - Systran Premium 4.0
  - Logomedia Translate Pro
  - Differing capabilities
  - Differing function
- Per target language
- Grammars
- Styles

Test Suite
- 59 rules examined, 17 of which are already encapsulated in Symantec’s writing guidelines
- Classification: 8 lexical, 40 syntactic, 11 textual

Controlled Language Sources

Testing the Rules
- Process
  1. Find an example sentence that does not conform to the rule
  2. Edit it to conform to all other rules under study
  3. Minimize the linguistic complexity (single test)
  4. Apply the CL rule
  5. Repeat the procedure to obtain 3 test examples
- Test suite: 59 rules expressed as 177 sentences
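The test-suite figures are consistent with each other, which a few lines of arithmetic confirm. The variable names are illustrative only; the counts come from the slides.

```python
# Rule counts per class, as given on the Test Suite slide.
RULES_BY_CLASS = {"lexical": 8, "syntactic": 40, "textual": 11}
EXAMPLES_PER_RULE = 3  # each rule is tested with 3 example sentences

total_rules = sum(RULES_BY_CLASS.values())          # 59
total_sentences = total_rules * EXAMPLES_PER_RULE   # 177
```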

Post Editing Guidelines
- Ensure information transfer
- Modify what is grammatically deviant from commercial quality
- Modify what is lexically essential for understanding in the target language
- Avoid the use of synonyms for the sake of originality
- Don’t forget that all the words are probably present in the output (possibly in the wrong order)
- Remember that style does not matter but information accuracy does
- Don’t dally: if an improvement is not obvious, move along

Metrics Generation
- Quality levels: Excellent (4), Good (3), Medium (2), Poor (1)
- Uncontrolled source generates output A
- Controlled source generates output B
- Focus is on usability
- Evaluation by native speakers
- Further study is being done to link into other systems of quality evaluation
  - Blackjack
  - SAE J2450
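One way to compare output A and output B on the four-point scale is sketched below. The slides do not say how ratings were aggregated, so the mean difference used here is an assumption, and the ratings are made-up values for illustration rather than results from the study.

```python
from statistics import mean

# Quality levels as defined on the slide.
SCALE = {"Excellent": 4, "Good": 3, "Medium": 2, "Poor": 1}


def rule_gain(pairs):
    """Mean score change from output A (uncontrolled) to output B (controlled).

    pairs: list of (rating_A, rating_B) labels, one pair per test sentence.
    """
    return mean(SCALE[b] - SCALE[a] for a, b in pairs)


# Hypothetical ratings for the 3 test sentences of a single rule.
ratings = [("Poor", "Good"), ("Medium", "Medium"), ("Medium", "Excellent")]
gain = rule_gain(ratings)
```

A positive gain means the controlled source produced better-rated output for that rule; a value near zero suggests the rule made little difference for that language.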

Overall evaluation (French)

Overall evaluation (Japanese)

Overall evaluation (German)

Preliminary Results
- CL has a significant impact
- Benefit varies by language
- Lots of scope for further study
- Some rules are more effective than others (score range: 0-17)
- Symantec’s implied rules have mixed effectiveness
- Recommend 7 additional rules

Additional rules
Rules with an impact in all languages:
- Do not omit words within lexical items, even when the term has already been used in the sentence. (12)
- Repeat the head noun with conjoined articles or prepositions. (15)
- Do not use slashes to list lexical items (except for product names). (14)
- Always write a verb next to its particle. (17)
- Only use the modal ‘could’ when the sentence contains ‘if’; otherwise use ‘can’. (10)
- Be very careful with -ing words:
  - If it is a gerund, use an article in front of it. (7)
  - If it is introducing a new clause, use ‘by’ in front of it. (8)
  - If it is modifying a noun in a non-finite clause, replace it with a relative clause. (5)
- Make sure that every segment can stand syntactically alone. (11)
- Avoid footnotes in the middle of a segment. Turn footnotes into independent segments. (11)
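Two of these rules are simple enough to approximate with pattern matching, as in the rough sketch below. This is not how a language checker such as Acrolinx works; the function names are hypothetical, the slash check does not implement the product-name exception (and would also flag paths or URLs), and the other rules need real syntactic analysis.

```python
import re


def check_slash(sentence):
    """Flag a slash placed directly between two word characters."""
    return bool(re.search(r"\w/\w", sentence))


def check_could(sentence):
    """Flag 'could' in a sentence that contains no 'if'."""
    words = re.findall(r"[a-z']+", sentence.lower())
    return "could" in words and "if" not in words


# Rule name -> check function; each check returns True on a violation.
CHECKS = {"slash": check_slash, "could-without-if": check_could}


def violations(sentence):
    """Return the names of the rules that the sentence appears to break."""
    return [name for name, check in CHECKS.items() if check(sentence)]


flagged = violations("You could restart the server/client.")
clean = violations("If the scan fails, you could restart the computer.")
```

Here `flagged` lists both rule names and `clean` is empty, since the second sentence contains ‘if’ and no slash.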

Next Steps
- Apply subsets of rules to a larger corpus
  - Language checker: Acrolinx
- Increase the number of MT engines studied
  - Comprendium/PROMT (European languages)
  - Fujitsu/Nova’s PC Transer (Japanese)
- Further refine post-editing guidelines
- Keep abreast of upgrades in current systems
  - Bugs fixed
  - New versions of software
- Move to a production pilot project