
1 Evaluation of Free Online Machine Translations for Croatian-English and English-Croatian Language Pairs
Sanja Seljan, sseljan@ffzg.hr – University of Zagreb, Faculty of Humanities and Social Sciences, Department of Information Sciences, Croatia
Marija Brkić, mbrkic@uniri.hr – University of Rijeka, Department of Informatics, Croatia
Vlasta Kučiš, asta.kucis@siol.net – University of Maribor, Department of Translation Studies, Slovenia
FF Zagreb – Information Sciences

2 Aim
 Evaluation of texts from four domains (city description, law, football, monitors)
 Cro-En – by four free online translation services (Google Translate, Stars21, InterTran and Translation Guide)
 En-Cro – by Google Translate
 Measurement of inter-rater agreement (Fleiss' kappa)
 Influence of error types on the criteria of fluency and adequacy (Pearson's correlation)

3 Outline
I. Introduction
II. MT evaluation
III. Experimental study:  Translation tools  Test set description  Evaluation  Error analysis  Correlations
IV. Conclusion

4 I INTRODUCTION
 Increased use of online translation services in recent years, even among less widely spoken languages
 Desirable: moderate- to good-quality translations
 Evaluation from the user's perspective
 Tools and evaluation exist mainly for widely spoken languages
 Possible uses: gisting translations, information retrieval, i.e. question-answering systems
 1976: Systran – first MT system for the Commission of the European Communities, later an online tool in different versions
 1997: Babel Fish – first online translation tool, using Systran technology
 Important: realistic expectations

5 I  Studies for popular languages  Considerable difference in the quality of translation dependent on the language pair  2010 - German-French (GT, ProMT, WorldLingo)  2011- three popular online tools  2006 - Spanish-English (introductory textbook)  2008 – 13 languages into English (6 tools: BabelFish, Google Translate, ProMT, SDL free translator, Systran, World Lingo)

6 I  MT evaluation – important in research and product design  measure system performance  identify weak points and adjust parameter settings  language independent algorithms (BLEU, NIST)  Better metric – closer to human evaluation  need for qualitative evaluation of different linguistic phenomena

7 III EXPERIMENTAL STUDY
 Evaluation of free online translation services (FTS) from the user's perspective
 Evaluators: undergraduate and graduate students of languages, linguistics and information sciences attending courses on language technologies at the University of Zagreb, Faculty of Humanities and Social Sciences
Test set description
 Texts from 4 domains (city description, law, football, monitors)
 Approx. 7-9 sentences per domain (17.8 words/sentence)
 Cro-En, En-Cro

8 Evaluators
 Cro-En: 48 students, final year of undergraduate and graduate levels
 En-Cro: 50 students, native speakers of Croatian
 75% of the students attended language technology course(s)
Evaluation before the pilot study: average grades for free language resources on the Internet

9 [Chart: average grades – Croatian tools/resources vs. tools/resources in general]

10 [Chart: desirable tools/resources of appropriate quality]

11 Evaluation
Manual evaluation
 Fluency (how fluent the translation is in the target language)
 Adequacy (how much of the information is adequately conveyed)
 Evaluation enriched by analysis of translation errors:
− morphological errors
− untranslated words
− lexical errors and word omissions
− syntactic errors

12 Tools
Cro-En translations
 Google Translate (GT) – http://translate.google.com
 Stars21 (S21) – http://stars21.com/translator
 InterTran (IT) – http://transdict.com/translators/intertran.html
 Translation Guide (TG) – http://www.translation-guide.com
En-Cro translations
 Obtained from Google Translate

13 Google Translate
 Translation service provided by Google Inc.
 Statistical MT trained on very large corpora
 Supports 57 languages; Croatian since 2008
S21 service
 Powered by GT
 Translations not always the same
InterTran
 Powered by NeuroTran and WordTran
 Translates sentence-by-sentence and word-by-word
Translation Guide
 Powered by IT
 Produces different translations

14 Results – Cro-En
 Either low grades (TG and IT) or high grades (S21 and GT), in comparison to the average value (3.04)
 S21 (4.66) vs. GT (4.62) – city description, legal
 GT – football, monitors
 Best average result – legal domain, followed by monitors and football
 Lowest – city description (the freest in style)

15 Results – En-Cro
 Lower average results than in the reverse direction: football (3.75 vs. 4.84), law, monitors
 Higher average grade in city description (shorter sentences, mostly nominative constructions, frequent terms)
 Football domain – specific terms, non-nominative constructions

16 Error analysis – Cro-En
 Translations offered by GT and S21 are very similar, although not identical
 TG and IT – differ in the number of untranslated words
 TG does not recognize words with diacritics
 The highest number of lexical errors, including errors in style (avg. 2.44)
 Untranslated words (1.83), morphological errors (1.75), syntactic errors (1.38)
 Lowest score, highest number of errors – football domain (mostly lexical errors and untranslated words)
 Best score – city description domain (lexical errors)
 Lowest number of errors – legal domain (evenly distributed)

17 I  Morphological errors – mostly in domain of monitors, the smallest no. in city desription (dominant value 1)  Untranslated words - by far mostly in the football  translation grades - mostly influenced by untranslated words Dominant values  Morphological errors: 1 in city description and monitors, 3 in the legal and football  Lexical errors: 1 in city description, others higher  untranslated words - 1 in all domains  syntactic errors - 1 in all domains but football (2-3)

18 Pearson's correlation
 A smaller number of errors corresponds to a higher average grade
 Correlation between error types and the criteria of fluency and adequacy
 Fluency – more affected by an increase in lexical and syntactic errors
 Adequacy – more affected by untranslated words
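The correlation between error counts and grades can be computed with Pearson's r; a minimal pure-Python sketch follows. The sample data are hypothetical, invented only to illustrate the negative error-grade relationship described above, not taken from the study.

```python
import math

def pearson_r(xs, ys):
    """Pearson's correlation coefficient between two equal-length samples."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Hypothetical data: error counts per sentence vs. fluency grades (1-5 scale).
errors = [0, 1, 2, 3, 5]
fluency = [5, 5, 4, 3, 1]
r = pearson_r(errors, fluency)  # strongly negative: more errors, lower fluency
```

A value of r near -1 would indicate, as the slide argues, that sentences with more errors systematically receive lower grades.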

19 Fleiss' kappa
 Assesses the reliability of agreement among raters when giving ratings to the sentences
 Indicates the extent to which the observed agreement among raters exceeds what would be expected if all the raters rated completely at random
 Score – between 0 and 1 (perfect agreement)
Interpretation scale:
 0.00-0.20 slight agreement
 0.21-0.40 fair agreement
 0.41-0.60 moderate agreement
 0.61-0.80 substantial agreement
 0.81-1.00 almost perfect agreement
Notation: N – total number of subjects; n – number of raters per subject; P_i – extent to which raters agree on the i-th subject; j – categories
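Using the notation above (N subjects, n raters per subject, categories j), Fleiss' kappa can be sketched in a few lines of Python. This follows the standard formulation; the matrix layout (one row per subject, one column per rating category, cells holding rater counts) is an assumption about how the ratings would be tabulated.

```python
def fleiss_kappa(ratings):
    """Fleiss' kappa for a matrix of N subjects x k categories, where
    ratings[i][j] is the number of raters who assigned subject i to
    category j. Assumes every subject is rated by the same n raters."""
    N = len(ratings)
    n = sum(ratings[0])   # raters per subject
    k = len(ratings[0])   # number of categories
    # p_j: proportion of all assignments falling into category j
    p_j = [sum(row[j] for row in ratings) / (N * n) for j in range(k)]
    # P_i: extent to which raters agree on the i-th subject
    P_i = [(sum(c * c for c in row) - n) / (n * (n - 1)) for row in ratings]
    P_bar = sum(P_i) / N                  # mean observed agreement
    P_e = sum(p * p for p in p_j)         # agreement expected by chance
    return (P_bar - P_e) / (1 - P_e)
```

When all raters agree on every subject, P_bar is 1 and kappa is 1 (perfect agreement); kappa falls toward 0 (or below) as the observed agreement approaches what random rating would produce.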

20 I  relatively high level of the agreement among raters per domain and per system in Cro-En translations  moderate 0.41-0.60 (for IT translation service),  substantial agreement 0.61-0.80 (S21 and GT)  perfect agreement 0.80-1.00 (TG – the worst tool)  En-Cro translations - inter-rater agreement per domain  lowest level of agreement has been detected in the domains of football and law (from 0.4-0.49 fair & moderate) – larger and more complex sentences  substantial agreement (0.61-0.80) – in city description  level of inter-rater agreement is lower for En-Cro translations in all domains

21 Conclusion
 Evaluation study of MT in 4 domains
 Cro-En – 4 free online translation services
 En-Cro – Google Translate
Evaluators' profile
 High interest in the use of translation resources and tools
 Critical evaluation
System evaluation
 Almost perfect agreement in ranking TG as the worst translation service
 Substantial agreement achieved for the S21 and GT services
 Moderate agreement for IT, which performed slightly better than TG

22 Cro-En translations
 S21 and GT (4.63 to 4.84) – football, law and monitors
 City description – Cro-En grades lower than En-Cro
En-Cro direction – by GT
 Lower grades than in the opposite direction (specific terms, non-nominative constructions, multi-word units)
 Exception: the city description domain – mostly nominative constructions, frequent words, no specific terms
Error analysis
 Translation grades are mostly influenced by untranslated words (especially the criterion of adequacy)
 Morphological and syntactic errors affect grades to a smaller degree (fluency)

23 Google Translate service
 Used in both translation directions
 Harvesting data from the Web, it seems to be well trained and suitable for the translation of frequent expressions
 Does not perform well where linguistic information is needed, e.g. gender agreement in multi-word expressions
Further research
 Better quantitative analysis per domain
 More detailed analysis of specific language phenomena

24 Evaluation of Free Online Machine Translations for Croatian-English and English-Croatian Language Pairs
Sanja Seljan, sseljan@ffzg.hr – University of Zagreb, Faculty of Humanities and Social Sciences, Department of Information Sciences, Croatia
Marija Brkić, mbrkic@uniri.hr – University of Rijeka, Department of Informatics, Croatia
Vlasta Kučiš, asta.kucis@siol.net – University of Maribor, Department of Translation Studies, Slovenia
FF Zagreb – Information Sciences

