Modern MT Systems and the Myth of Human Translation: Real World Status Quo

Presentation transcript:

Modern MT Systems and the Myth of Human Translation: Real World Status Quo
● Intro
● MT & HT Definitions
● Comparison MT vs. HT
● Evaluation Methods
● FAE Framework
● Conclusion
● Discussion

Is This for Me?
This talk may be of benefit for:
● (Freelance) translators and agencies
● Developers and vendors of MT systems
● People concerned with MT evaluation
● People concerned with HT evaluation
Not for interpreters or speech/non-text-based issues

Introduction
● What is Machine Translation (MT)?
    „MT is the automatic translation of human language by computers.“
● What is [Human] Translation (HT)?
    „The process of transforming text from one language into another language.“
    „A written communication in a second language having the same meaning as the written communication in a first language.“

Introduction II
● Is there such a thing as HT?
    „Pure Human Translation“
    „Machine Aided Human Translation“
    „Human Aided Machine Translation“
● Is HT equal to HT?
    „Native Speaker“
    „Speaks Language X“
    „[Trained] Professional“
    „Trained Prof. specialized in X“

HT/MT Examples & Quizshow
Original: Einzigartiger Freizeitpark für Groß und Klein
T1: Singular recreational park for large and small
T2: Unique leisure time park for largely and small
T3: Ein Fantastische DinoPark ferrcoitung
T4: Unique Freizeitpark at big and little
T5: Unique amusement park for great and Klein
T6: Unique leisure park for big and little
T1: Babelfish/SYSTRAN
T2: SDL FreeTranslation.com
T3: Human
T4: InterTran
T5: Linguatex eTranslation
T6: PetaMem LangSuite MT

Summary HT Quality
● Not all HTs are equal
● Significant amount done by untrained people
● Better performance of good(!) MT systems on these examples suggests rising MT competitiveness

Issues with MT & HT Evaluation
● Evaluation vs. Similarity
    Ngram does work? Why?
● Reference Translations: Cost & Availability
    Multiples – which „Axiomatic Truth“
● Judging
    Expensive
    Questionable results
● Using MT-eval methods: limitations just mentioned
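The tension between evaluation and similarity can be illustrated with a minimal sketch of clipped n-gram precision, the kind of overlap count that reference-based MT metrics build on. The function names and the reference translation below are assumptions made for illustration; only the candidate (T6 from the quiz) comes from the slides.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return all contiguous n-grams of a token list as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def clipped_precision(candidate, reference, n):
    """Share of candidate n-grams that also occur in the reference.

    Each candidate n-gram is credited at most as often as it appears
    in the reference (clipping), so repeating a word earns nothing.
    """
    cand = Counter(ngrams(candidate.lower().split(), n))
    ref = Counter(ngrams(reference.lower().split(), n))
    total = sum(cand.values())
    if total == 0:
        return 0.0
    matched = sum(min(count, ref[gram]) for gram, count in cand.items())
    return matched / total


# T6 from the quiz slide.
candidate = "Unique leisure park for big and little"
# Hypothetical reference translation, NOT taken from the slides.
reference = "Unique amusement park for young and old"

unigram_precision = clipped_precision(candidate, reference, 1)
bigram_precision = clipped_precision(candidate, reference, 2)
```

Against this single assumed reference the candidate matches 4 of 7 unigrams and only 1 of 6 bigrams, although it is an acceptable rendering. The score measures similarity to one chosen reference, which is why the question of which reference counts as the „Axiomatic Truth“ matters.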

Mission Impossible?
● Fully automatic evaluation method for both MT & HT – with no human intervention?
● Purpose: Automatic QA of translations – at least safe rejection of bad results
● Part of an iterative process (with faith in the translator)

We need it – should we give up?

Let's Try Anyway!
● Text Metrics
    Length
    Word/Sentence/Paragraph count
● Statistics
    Character/Word occurrence
    Ngram
    Collocations
● Translator Parameters
● Monolingual Corpora for SL & TL
    Statistical reference
● Dictionaries & Thesauri
    Adequacy check
    Translation distance
    Sentence Alignment
● Parallel Corpora
    Translation Length Ratio
    Extract Information
    Reference Data
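Two of the ingredients listed above, text metrics and the translation length ratio, need no reference translation at all. The following is a minimal sketch, not the framework presented in the talk: the function names, the expected ratio of 1.0 and the tolerance of 0.5 are assumptions. In practice the expected ratio would presumably be estimated from a parallel corpus for the language pair.

```python
import re


def text_metrics(text):
    """Basic text metrics: character, word and sentence counts."""
    words = re.findall(r"\w+", text)
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    return {"chars": len(text), "words": len(words), "sentences": len(sentences)}


def length_ratio(source, target):
    """Target-to-source word count ratio."""
    src_words = text_metrics(source)["words"]
    tgt_words = text_metrics(target)["words"]
    return tgt_words / src_words if src_words else float("inf")


def reject_by_length(source, target, expected_ratio=1.0, tolerance=0.5):
    """Flag a translation whose length ratio deviates too far from the
    expected ratio. Both defaults are illustrative assumptions."""
    ratio = length_ratio(source, target)
    return abs(ratio - expected_ratio) > tolerance * expected_ratio


source = "Einzigartiger Freizeitpark für Groß und Klein"
plausible = "Unique leisure park for big and little"  # T6 from the quiz
truncated = "Unique park"                              # hypothetical bad output

plausible_rejected = reject_by_length(source, plausible)
truncated_rejected = reject_by_length(source, truncated)
```

A check like this can only reject: the truncated output is flagged, while the plausible one merely passes this one filter. That matches the stated goal of safe rejection of bad results rather than a full quality judgment.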

Workflow

Conclusion
● Translation results of the best contemporary MT systems can be considered on par with the average HT
● The presented evaluation framework is just the beginning of an automatic evaluation method for both MT & HT
● It is a robust and reliable validation method with safe rejection of invalid/bad translations
● In production Q1/2005

Thanks! Q & A