Evaluation of the Statistical Machine Translation Service for Croatian-English
Marija Brkić, Department of Informatics, University of Rijeka

Tomislav Vičić, freelance teacher of economics and translator
Sanja Seljan, Department of Information Sciences, Faculty of Humanities and Social Sciences, University of Zagreb

Outline
I. Machine translation
  - Syntactic transfer
  - Example-based translation
  - Statistically based translation
II. Evaluation
  - Manual
  - Automatic
III. Experimental study
  - Google Translate service (Croatian → English)
  - Comparison and analysis
  - Manual evaluation
IV. Conclusion

I. Machine translation basics
- Speeds up the translation process
- Limited human component
- Multilingual access to written material
- Limited capabilities
- Helps in grasping the general idea of a text
- For limited use only

Approaches
- Word-for-word
- Syntactic transfer*
- Interlingua
- Controlled language
- Example-based*
- Statistically based*
- Various combinations of the above

(* discussed in more detail in the following slides)

Syntactic transfer
- Relies on linguistic rules
- Analyzes the source text and translates it via intermediate linguistic representations
- Use is still limited to particular purposes (e.g. scientific, marketing)
- Examples: Systran, Eurotra, Metal
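To illustrate the analysis, transfer and generation stages, here is a toy sketch covering two systematic Croatian-English differences: Croatian can omit the subject pronoun (the verb ending marks person and number) and has no articles, whereas English needs both. The lexicon, feature names, function names and rules are hypothetical and deliberately tiny; real transfer systems rely on far larger grammars and lexicons. Unknown words simply raise a KeyError.

```python
# Hypothetical toy lexicon: Croatian word form -> morphological features + English form.
LEXICON = {
    "čitam": {"pos": "VERB", "person": 1, "number": "sg", "en": "read"},
    "čitamo": {"pos": "VERB", "person": 1, "number": "pl", "en": "read"},
    "pišem": {"pos": "VERB", "person": 1, "number": "sg", "en": "write"},
    "knjigu": {"pos": "NOUN", "number": "sg", "en": "book"},
    "pismo": {"pos": "NOUN", "number": "sg", "en": "letter"},
}
# English subject pronouns selected by the verb's person and number.
PRONOUNS = {(1, "sg"): "I", (1, "pl"): "we"}


def analyze(sentence):
    """Analysis: map each Croatian word form to a node with its lexical features."""
    words = sentence.lower().rstrip(".").split()
    return [dict(LEXICON[w], form=w) for w in words]


def transfer(nodes):
    """Transfer: apply structural rules for Croatian-English differences."""
    out = []
    for i, node in enumerate(nodes):
        # Toy rule: a sentence-initial finite verb means the subject was dropped,
        # so insert the English pronoun implied by the verb's person and number.
        if node["pos"] == "VERB" and i == 0:
            out.append({"pos": "PRON", "en": PRONOUNS[(node["person"], node["number"])]})
        # Croatian has no articles; insert an indefinite article before a singular noun
        # (all toy nouns start with a consonant, so "a" is always correct here).
        if node["pos"] == "NOUN" and node["number"] == "sg":
            out.append({"pos": "DET", "en": "a"})
        out.append(node)
    return out


def generate(nodes):
    """Generation: linearize the English forms and restore capitalization and the full stop."""
    text = " ".join(node["en"] for node in nodes)
    return text[0].upper() + text[1:] + "."


def translate(sentence):
    return generate(transfer(analyze(sentence)))


if __name__ == "__main__":
    for source in ["Čitam knjigu.", "Pišem pismo.", "Čitamo knjigu."]:
        print(source, "->", translate(source))
```

Keeping the three stages separate is what makes rule-based transfer maintainable: a new structural difference is handled by adding a rule in `transfer` without touching analysis or generation.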

Example-based
- Uses blocks of words (example sentences)
- Works on the principle of analogy
- Needs to be supplied with translation examples
- The system "learns" as examples are added
- Suitable for structurally very different languages
- Example: translation memories
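The retrieval side of example-based translation, as in a translation memory, can be sketched as a fuzzy lookup: find the stored source sentence most similar to the input and reuse its translation. This sketch uses Python's `difflib.SequenceMatcher` ratio as a stand-in similarity measure; the memory contents, the `best_match` name and the 0.7 threshold are hypothetical, and translation-memory tools compute their own fuzzy-match scores. Recombining fragments of several examples, which full example-based MT does, is not shown.

```python
from difflib import SequenceMatcher

# Hypothetical translation memory: Croatian source sentence -> stored English translation.
MEMORY = {
    "Otvorite vrata.": "Open the door.",
    "Zatvorite vrata.": "Close the door.",
    "Perilica rublja ne radi.": "The washing machine does not work.",
}


def best_match(sentence, memory=MEMORY, threshold=0.7):
    """Return (source, translation, similarity) for the closest example, or None."""
    scored = [
        (src, tgt, SequenceMatcher(None, sentence.lower(), src.lower()).ratio())
        for src, tgt in memory.items()
    ]
    src, tgt, score = max(scored, key=lambda item: item[2])
    # Below the (hypothetical) threshold, no example is considered close enough.
    return (src, tgt, score) if score >= threshold else None


if __name__ == "__main__":
    # A new sentence that is close to, but not identical with, a stored example.
    print(best_match("Otvorite vrata perilice."))
```

The match here is only partial: the retrieved translation lacks "of the washing machine", so a translator would still have to post-edit it.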

Statistically based (a.k.a. SMT)
- Uses statistical models
  - Parameters are derived from bilingual corpora
- Treats phrases as n-grams (n is the number of words in a phrase)
- Requires vast quantities of aligned bilingual text
- Outputs the most probable translation of the input
- Does not apply linguistic rules
- Attempts to match language patterns
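One common way to state the "most probable translation" idea is the noisy-channel decision rule: for a Croatian source sentence f, choose the English sentence e that maximizes P(e) · P(f|e), where P(e) comes from a language model (fluency) and P(f|e) from a translation model (adequacy). The sketch below applies that rule to a fixed list of candidates; the probability tables, candidate sentences and function name are hypothetical, it is not a description of the Google Translate service, and a real decoder searches a huge space of candidates instead of scoring a hand-written list.

```python
import math

# Hypothetical model scores for the source f = "Perilica rublja ne radi."
# P(e): language-model probability of each English candidate (rewards fluency).
LANGUAGE_MODEL = {
    "the washing machine does not work": 0.020,
    "washing machine not works": 0.001,
    "the laundry machine does not work": 0.005,
}
# P(f|e): translation-model probability of the source given the candidate (rewards adequacy).
TRANSLATION_MODEL = {
    "the washing machine does not work": 0.30,
    "washing machine not works": 0.40,
    "the laundry machine does not work": 0.25,
}


def noisy_channel_best(candidates):
    """Return argmax_e P(e) * P(f|e), computed in log space to avoid underflow."""
    def score(e):
        return math.log(LANGUAGE_MODEL[e]) + math.log(TRANSLATION_MODEL[e])
    return max(candidates, key=score)


if __name__ == "__main__":
    print(noisy_channel_best(list(LANGUAGE_MODEL)))
    # -> the washing machine does not work
```

The disfluent candidate has the highest translation-model score but loses on the language model, which shows how the two models split the work between adequacy and fluency.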

Statistically based (cont.)
- Problems: modeling, learning, decoding
- Approaches: word-based, phrase-based, syntax-based
- Example: Google Translate service
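The learning problem in phrase-based SMT is commonly approached by extracting aligned phrase pairs from a bilingual corpus and estimating translation probabilities by relative frequency, φ(e|f) = count(f, e) / Σ_e' count(f, e'). The sketch below assumes the phrase pairs have already been extracted (word alignment and phrase extraction are omitted) and uses a small hypothetical list of Croatian-English pairs; it illustrates only the estimation step, not the training procedure of any particular system.

```python
from collections import Counter, defaultdict


def estimate_phrase_table(phrase_pairs):
    """Relative-frequency estimate of phi(e|f) from (source, target) phrase pairs."""
    pair_counts = Counter(phrase_pairs)
    source_counts = Counter(src for src, _ in phrase_pairs)
    table = defaultdict(dict)
    for (src, tgt), count in pair_counts.items():
        table[src][tgt] = count / source_counts[src]
    return dict(table)


# Hypothetical phrase pairs, as if extracted from an aligned Croatian-English corpus.
PAIRS = [
    ("perilica rublja", "washing machine"),
    ("perilica rublja", "washing machine"),
    ("perilica rublja", "washing machine"),
    ("perilica rublja", "washer"),
    ("ne radi", "does not work"),
    ("ne radi", "does not work"),
]

if __name__ == "__main__":
    table = estimate_phrase_table(PAIRS)
    print(table["perilica rublja"])
    # -> {'washing machine': 0.75, 'washer': 0.25}
```

Swapping the roles of source and target in the counts gives the reverse direction, φ(f|e).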

II. Evaluation
- Manual
  - Human bilingual or monolingual evaluators score outputs for fluency (grammaticality) and adequacy (preservation of information)
  - Time-consuming, expensive and highly subjective
- Automatic (BLEU, METEOR, etc.)
  - Compares outputs against reference translations
  - Goal: a higher degree of correlation with human judgements
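To make the comparison against reference translations concrete, here is a minimal sentence-level sketch in the spirit of BLEU: clipped (modified) n-gram precisions for n = 1..4, combined by a geometric mean and multiplied by a brevity penalty. It is a simplified, unsmoothed illustration with lowercased whitespace tokenization and assumes at least one reference; it is not a reference implementation, and published BLEU scores are normally computed at corpus level with standard tools. The function names and example sentences are hypothetical.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """All contiguous n-grams of a token list, as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def sentence_bleu(candidate, references, max_n=4):
    """Unsmoothed sentence-level BLEU-style score with whitespace tokenization."""
    cand = candidate.lower().split()
    refs = [ref.lower().split() for ref in references]
    if not cand:
        return 0.0
    log_precision_sum = 0.0
    for n in range(1, max_n + 1):
        cand_counts = Counter(ngrams(cand, n))
        total = sum(cand_counts.values())
        # Clip each candidate n-gram count by its maximum count in any single reference.
        max_ref_counts = Counter()
        for ref in refs:
            for gram, count in Counter(ngrams(ref, n)).items():
                max_ref_counts[gram] = max(max_ref_counts[gram], count)
        clipped = sum(min(count, max_ref_counts[gram]) for gram, count in cand_counts.items())
        if total == 0 or clipped == 0:
            return 0.0  # no smoothing: one zero precision makes the score zero
        log_precision_sum += math.log(clipped / total)
    # Brevity penalty against the reference length closest to the candidate length.
    c = len(cand)
    r = min((len(ref) for ref in refs), key=lambda length: (abs(length - c), length))
    brevity_penalty = 1.0 if c > r else math.exp(1 - r / c)
    return brevity_penalty * math.exp(log_precision_sum / max_n)


if __name__ == "__main__":
    refs = ["the washing machine does not start"]
    print(sentence_bleu("the washing machine does not work", refs))  # about 0.76
```

Because there is no smoothing, any sentence without a matching 4-gram scores 0, which is one reason BLEU is usually reported over a whole test set rather than per sentence.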

III. Experimental study
- Croatian-English: a "very odd couple"
  - Many systematic, idiosyncratic and lexical differences
- Three types of texts:
  - Corpus linguistics, annotation and research methods
  - Enterprises and the Government's reform plan
  - Washing machine manual

[Screenshot: Google Translate web interface]

SMT
- Offers Croatian as both source and target language
- Statistically based, drawing on:
  - Monolingual target-language texts
  - Aligned texts (human translations)
- Fluency and adequacy depend heavily on the available corpora

Task
- Reference translations vs. candidate (machine) translations
- Levels of analysis:
  - Lexical (misuse of words, zerotones)
  - Morphological (wrong word forms)
  - Syntactic (word order)
  - Semantic (preservation of the original message)
  - Use of punctuation marks

Manual evaluation procedure
- 6 bilingual evaluators and 21 sentences (machine translation and reference translation)
- 1-5 scale:

  Score   Fluency              Adequacy
  1       Incomprehensible     None
  2       Disfluent English    Little meaning
  3       Non-native English   Much meaning
  4       Good English         Most meaning
  5       Flawless English     All meaning
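A minimal sketch of how judgements on this 1-5 scale could be aggregated, for example as mean fluency and mean adequacy per evaluator. The data layout (one (fluency, adequacy) pair per sentence for each evaluator), the function name and the scores are hypothetical placeholders, not the study's data.

```python
from statistics import mean

SCALE = range(1, 6)  # valid scores on the 1-5 fluency/adequacy scale


def average_per_evaluator(judgements):
    """Map evaluator -> (mean fluency, mean adequacy) over all judged sentences."""
    result = {}
    for evaluator, pairs in judgements.items():
        for fluency, adequacy in pairs:
            if fluency not in SCALE or adequacy not in SCALE:
                raise ValueError(f"score outside 1-5 from {evaluator}")
        result[evaluator] = (
            mean(f for f, _ in pairs),
            mean(a for _, a in pairs),
        )
    return result


# Hypothetical judgements for two evaluators on three sentences (not real data).
JUDGEMENTS = {
    "evaluator_1": [(3, 4), (5, 5), (4, 3)],
    "evaluator_2": [(2, 3), (3, 3), (4, 5)],
}

if __name__ == "__main__":
    for evaluator, (flu, ade) in average_per_evaluator(JUDGEMENTS).items():
        print(f"{evaluator}: fluency {flu:.2f}, adequacy {ade:.2f}")
```

Validating the range up front catches data-entry errors before they distort the averages.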

Chi-square test (χ²)
Hypotheses:
- There is no significant difference in assigning score 3 between the two criteria (fluency and adequacy).
- There are no significant differences in assigning score 3 to fluency and adequacy for any individual evaluator.
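The hypotheses above compare how often score 3 is given for fluency versus adequacy. A minimal sketch of one way to run such a test: build a 2×2 contingency table (criterion × score 3 / any other score) and compute Pearson's χ² statistic, then compare it with 3.841, the critical value for 1 degree of freedom at α = 0.05. The helper names and scores are hypothetical, and the slides do not say whether a continuity correction or a different table layout was used, so this is not a reconstruction of the authors' exact procedure. It assumes every row and column total is non-zero.

```python
CRITICAL_VALUE_DF1_ALPHA_005 = 3.841  # chi-square critical value, 1 degree of freedom, alpha = 0.05


def score_table(fluency, adequacy, score=3):
    """2x2 table: rows = fluency, adequacy; columns = given `score`, any other score."""
    return [
        [sum(s == score for s in fluency), sum(s != score for s in fluency)],
        [sum(s == score for s in adequacy), sum(s != score for s in adequacy)],
    ]


def pearson_chi_square(table):
    """Pearson chi-square statistic (no Yates continuity correction)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / grand_total
            chi2 += (observed - expected) ** 2 / expected
    return chi2


# Hypothetical scores from one evaluator on ten sentences (not the study's data).
FLUENCY = [3, 3, 4, 2, 3, 5, 3, 4, 3, 2]
ADEQUACY = [4, 3, 4, 4, 5, 5, 3, 4, 4, 3]

if __name__ == "__main__":
    chi2 = pearson_chi_square(score_table(FLUENCY, ADEQUACY))
    verdict = "significant" if chi2 > CRITICAL_VALUE_DF1_ALPHA_005 else "not significant"
    print(f"chi2 = {chi2:.3f} ({verdict} at alpha = 0.05)")
```

Running the same test once on pooled scores and once per evaluator corresponds to the two hypotheses. With small expected counts (below about 5 per cell), the χ² approximation becomes unreliable.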

Results
- Hypothesis 1 holds: there is no significant difference in assigning score 3 between the two criteria.
- Hypothesis 2 does not hold for every evaluator: there are significant differences in assigning score 3 to fluency and adequacy for half of the evaluators.

[Chart: fluency and adequacy per average judgements, by evaluator (1st to 6th)]

IV. Conclusion
- Usage:
  - Basic information transfer
  - Personal use only
- Improvements:
  - Integration with language-dependent modules
  - Human post-editing
  - A greater number of evaluators is needed

Thank you for your attention!