Evaluation of Free Online Machine Translations for Croatian-English and English-Croatian Language Pairs
Sanja Seljan, University of Zagreb, Faculty of Humanities and Social Sciences, Department of Information Sciences, Croatia
Marija Brkić, University of Rijeka, Department of Informatics, Croatia
Vlasta Kučiš, University of Maribor, Department of Translation Studies, Slovenia
FF Zagreb – Information Sciences
Aim
- Evaluation of texts from four domains (city description, law, football, monitors)
- Cro-En: translated by four free online translation services (Google Translate, Stars21, InterTran and Translation Guide)
- En-Cro: translated by Google Translate
- Measuring inter-rater agreement (Fleiss' kappa)
- Influence of error types on the criteria of fluency and adequacy (Pearson's correlation)
I. Introduction
II. MT evaluation
III. Experimental study
- Translation tools
- Test set description
- Evaluation
- Error analysis
- Correlations
IV. Conclusion
I. Introduction
- Increased use of online translation services in recent years, even among less widely spoken languages
- Desirable: moderate to good quality translations, evaluation from the user's perspective
- Tools and evaluation exist mainly for widely spoken languages
- Possible uses: gisting translations, information retrieval, i.e. question-answering systems
- 1976: Systran, the first MT system for the Commission of the European Communities, later followed by an online tool and different versions
- First online translation tool: Babel Fish, using Systran technology
- Important: realistic expectations
Studies for popular languages
- Considerable difference in translation quality depending on the language pair
- German-French (GT, ProMT, WorldLingo): three popular online tools
- Spanish-English (introductory textbook)
- 2008: 13 languages into English (6 tools: Babel Fish, Google Translate, ProMT, SDL free translator, Systran, WorldLingo)
II. MT evaluation
- Important in research and product design
- Measures system performance
- Identifies weak points and helps adjust parameter settings
- Language-independent algorithms (BLEU, NIST)
- A better metric is one that is closer to human evaluation
- Need for qualitative evaluation of different linguistic phenomena
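To illustrate what a language-independent metric such as BLEU measures, here is a minimal sketch of clipped n-gram precision combined with a brevity penalty. It is a simplified, single-reference illustration and not the full BLEU or NIST definition; the function names (`clipped_precision`, `simple_bleu`) and the choice of `max_n=2` are assumptions made for this example and were not part of the study.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return all n-grams of a token list as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def clipped_precision(candidate, reference, n):
    """Share of candidate n-grams found in the reference.

    Each n-gram is credited at most as often as it occurs in the reference.
    """
    cand_counts = Counter(ngrams(candidate, n))
    ref_counts = Counter(ngrams(reference, n))
    total = sum(cand_counts.values())
    if total == 0:
        return 0.0
    matched = sum(min(count, ref_counts[gram]) for gram, count in cand_counts.items())
    return matched / total


def simple_bleu(candidate, reference, max_n=2):
    """Geometric mean of clipped precisions times a brevity penalty.

    Hypothetical simplification: one reference, no smoothing.
    """
    precisions = [clipped_precision(candidate, reference, n) for n in range(1, max_n + 1)]
    if min(precisions) == 0.0:
        return 0.0
    log_avg = sum(math.log(p) for p in precisions) / max_n
    c, r = len(candidate), len(reference)
    # Candidates shorter than the reference are penalised.
    brevity_penalty = 1.0 if c > r else math.exp(1 - r / c)
    return brevity_penalty * math.exp(log_avg)
```

A score of 1.0 means the candidate matches the reference on every counted n-gram. Such scores capture surface overlap only, which is why the slide asks for qualitative evaluation as well.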
III. Experimental study
- Evaluation of free online translation services (FTS) from the user's perspective
- Evaluators: undergraduate and graduate students of languages, linguistics and information sciences attending courses on language technologies at the University of Zagreb, Faculty of Humanities and Social Sciences
Test set description
- Texts from 4 domains (city description, law, football, monitors)
- Approx. 7-9 sentences per domain (17.8 words per sentence)
- Cro-En, En-Cro
Evaluators
- Cro-En: 48 students, final year of undergraduate and graduate levels
- En-Cro: 50 students, native speakers
- 75% of students attended language technology course(s)
- Average grades for free language resources on the Internet
- Evaluation: before the pilot study
[Figure panels: Croatian tools/resources; Tools/resources in general]
[Figure: Desirable tools/resources of appropriate quality]
Evaluation
- Manual evaluation
  - fluency (indicating how fluent the translation is in the target language)
  - adequacy (indicating how much of the information is adequately transmitted)
- Evaluation enriched by translation error analysis
  - morphological errors
  - untranslated words
  - lexical errors and word omissions
  - syntactic errors
Tools
Cro-En translations
- Google Translate (GT)
- Stars21 (S21)
- InterTran (IT)
- Translation Guide (TG) - guide.com
En-Cro translations
- obtained from Google Translate
Google Translate (GT)
- Translation service provided by Google Inc.
- Statistical MT based on large corpora
- Supports 57 languages, Croatian since 2008
Stars21 (S21)
- Service powered by GT
- Translations are not always the same as GT's
InterTran (IT)
- Powered by NeuroTran and WordTran
- Sentence-by-sentence and word-by-word translation
Translation Guide (TG)
- Powered by IT
- Translations differ from IT's
Results - Cro-En
- Either low grades (TG and IT) or high grades (S21 and GT), in comparison to the average value (3.04)
- S21 (4.66) vs. GT (4.62): S21 ahead in city description and legal texts, GT ahead in football and monitors
- Best average result: legal domain, followed by monitors and football
- Lowest: city description (the freest in style)
Results - Cro-En, En-Cro
- En-Cro: lower average results than the reverse direction in football (3.75 : 4.84), law and monitors
- Higher average grade in city description (shorter sentences, mostly nominative constructions, frequent terms)
- Football domain: specific terms, non-nominative constructions
Error analysis
Cro-En
- Translations offered by GT and S21 are very similar, although not identical
- TG and IT differ in the number of untranslated words
- TG does not recognize words with diacritics
En-Cro
- Highest number of lexical errors, including errors in style (av. )
- Untranslated words (1.83), morphological errors (1.75), syntactic errors (1.38)
- Lowest score and highest number of errors: football domain (mostly lexical errors and untranslated words)
- Best score: city description domain (lexical errors)
- Lowest number of errors: legal domain (evenly distributed)
- Morphological errors: mostly in the monitors domain, fewest in city description (dominant value 1)
- Untranslated words: by far the most in the football domain
- Translation grades: mostly influenced by untranslated words
Dominant values
- Morphological errors: 1 in city description and monitors, 3 in the legal and football domains
- Lexical errors: 1 in city description, higher in the other domains
- Untranslated words: 1 in all domains
- Syntactic errors: 1 in all domains except football (2-3)
Pearson's correlation
- A smaller number of errors raises the average grade
- Correlation between error types and the criteria of fluency and adequacy
- Fluency is more affected by an increase in lexical and syntactic errors
- Adequacy is more affected by untranslated words
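The correlation between error counts and grades can be sketched as follows. This is a minimal illustration of Pearson's r, not the authors' actual analysis script. The function name `pearson_r` and the sample numbers are invented for the example and are not taken from the study.

```python
import math


def pearson_r(xs, ys):
    """Pearson's correlation coefficient between two equally long sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length (at least 2 items)")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return covariance / math.sqrt(var_x * var_y)


# Invented illustration data: untranslated words per sentence and adequacy grades.
untranslated_words = [0, 1, 2, 3, 4]
adequacy_grades = [5, 4, 3, 2, 1]
r = pearson_r(untranslated_words, adequacy_grades)
```

A value of r close to -1 would correspond to the pattern reported on the slide, where fewer errors go together with higher grades.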
Fleiss' kappa
- Assesses the reliability of agreement among raters when giving ratings to the sentences
- Indicates the extent to which the observed amount of agreement among raters exceeds what would be expected if all raters made their ratings completely randomly
- Score: at most 1 (perfect agreement); values of 0 or below mean agreement no better than chance
Interpretation scale
- slight agreement
- fair agreement
- moderate agreement
- substantial agreement
- almost perfect agreement
Symbols
- N: total number of subjects
- n: number of raters per subject
- i: subject index (the extent to which raters agree is computed for the i-th subject)
- j: category index
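The computation behind Fleiss' kappa can be sketched as below. This is a minimal sketch under stated assumptions: the input is a table with one row per subject (sentence) and one column per category (grade), where each cell counts the raters who chose that category. The names `fleiss_kappa` and `agreement_label` are hypothetical. The thresholds in `agreement_label` are the commonly cited Landis and Koch bands; that the slide used exactly these bands is an assumption, since its numeric ranges were not preserved.

```python
def fleiss_kappa(counts):
    """Fleiss' kappa for a table counts[i][j] = raters assigning subject i to category j."""
    N = len(counts)                      # total number of subjects
    if N == 0:
        raise ValueError("no subjects")
    n = sum(counts[0])                   # number of raters per subject
    if n < 2 or any(sum(row) != n for row in counts):
        raise ValueError("every subject needs the same number (>= 2) of ratings")
    k = len(counts[0])                   # number of categories

    # Proportion of all ratings that fall into category j.
    p_j = [sum(row[j] for row in counts) / (N * n) for j in range(k)]
    # Extent to which raters agree on the i-th subject.
    P_i = [(sum(c * c for c in row) - n) / (n * (n - 1)) for row in counts]

    P_bar = sum(P_i) / N                 # mean observed agreement
    P_e = sum(p * p for p in p_j)        # agreement expected by chance
    if P_e == 1:
        raise ValueError("kappa is undefined when all ratings fall into one category")
    return (P_bar - P_e) / (1 - P_e)


def agreement_label(kappa):
    """Map kappa to a verbal label (Landis and Koch bands, assumed here)."""
    if kappa < 0:
        return "poor"
    if kappa <= 0.20:
        return "slight"
    if kappa <= 0.40:
        return "fair"
    if kappa <= 0.60:
        return "moderate"
    if kappa <= 0.80:
        return "substantial"
    return "almost perfect"
```

For example, three raters who all give the first category to one sentence and all give the second category to another produce the table `[[3, 0], [0, 3]]`, for which the function returns 1.0.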
Inter-rater agreement
- Cro-En translations: relatively high level of agreement among raters per domain and per system
  - moderate agreement for the IT translation service
  - substantial agreement for S21 and GT
  - perfect agreement for TG, the worst tool
- En-Cro translations: inter-rater agreement per domain
  - lowest level of agreement in the football and law domains (ranging from fair to moderate), which contain longer and more complex sentences
  - substantial agreement ( ) in city description
  - level of inter-rater agreement is lower for En-Cro translations in all domains
IV. Conclusion
- Evaluation study of MT in 4 domains
  - Cro-En: 4 free online translation services
  - En-Cro: Google Translate
Evaluators' profile
- High interest in the use of translation resources and tools
- Critical evaluation
System evaluation
- Perfect agreement in ranking TG as the worst translation service
- Substantial agreement for the S21 and GT services
- Moderate agreement for IT, which performed slightly better than TG
Cro-En translations
- S21 and GT (4.63 to 4.84): football, law and monitors
- City description: Cro-En grades lower than En-Cro grades
En-Cro direction (by GT)
- Lower grades than in the opposite direction (specific terms, non-nominative constructions, multi-word units)
- Exception: the city description domain, which contains mostly nominative constructions and frequent words and no specific terms
Error analysis
- Translation grades are mostly influenced by untranslated words (especially the criterion of adequacy)
- Morphological and syntactic errors influence grades to a lesser extent (fluency)
Google Translate
- Service used in both translation directions
- Harvests data from the Web; seems to be well trained and suitable for translating frequent expressions
- Does not perform well where linguistic information is needed, e.g. gender agreement, multi-word (MW) expressions
Further research
- Better quantitative analysis per domain
- More detailed analysis of specific language phenomena