1 Joke Daems joke.daems@ugent.be www.lt3.ugent.be/en/projects/robot Supervised by: Lieve Macken, Sonia Vandepitte, Robert Hartsuiker Two sides of the same coin: assessing translation quality through adequacy and acceptability error analysis

2 What makes error analysis so complicated? “There are some errors for all types of distinctions, but the most problematic distinctions were for adequacy/fluency and seriousness.” – Stymne & Ahrenberg, 2012 → Does a problem concern adequacy, fluency, both, or neither? → How do we determine the seriousness of an error?

3 Two types of quality “Whereas adherence to source norms determines a translation's adequacy as compared to the source text, subscription to norms originating in the target culture determines its acceptability.” – Toury, 1995 → Why mix the two?

4 2-step TQA approach: Acceptability = target norms; Adequacy = target vs. source → Quality Assessment

5 Subcategories
Acceptability: Grammar & Syntax, Lexicon, Spelling & Typos, Style & Register, Coherence
Adequacy: Contradiction, Deletion, Addition, Word Sense, Meaning Shift

6 Acceptability: fine-grained
Grammar & Syntax: article, comparative/superlative, singular/plural, verb form, article-noun agreement, noun-adj agreement, subject-verb agreement, reference, missing, superfluous, word order, structure, grammar – other
Lexicon: wrong preposition, wrong collocation, word nonexistent
Spelling & Typos: capitalization, spelling mistake, compound, punctuation, typo
Style & Register: register, untranslated, repetition, disfluent, short sentences, long sentence, text type, style – other
Coherence: conjunction, missing info, logical problem, paragraph, inconsistency, coherence – other

7 Adequacy: fine-grained
Contradiction
Meaning shift: caused by misplaced word, caused by punctuation, other meaning shift
Word sense disambiguation: hyponymy, hyperonymy, terminology, quantity, time
Deletion
Addition
Explicitation
Coherence: inconsistent terminology
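
For tooling purposes (annotation configuration, aggregation scripts), the two-step scheme of slides 5–7 can be captured as plain data. A sketch in Python, with the subcategory lists abridged; the exact encoding is an assumption, not the project's configuration:

# The two-step error taxonomy as a nested dict (subcategories abridged).
TAXONOMY = {
    "acceptability": {
        "grammar & syntax": ["article", "verb form", "word order", "structure"],
        "lexicon": ["wrong preposition", "wrong collocation", "word nonexistent"],
        "spelling & typos": ["capitalization", "spelling mistake", "punctuation", "typo"],
        "style & register": ["register", "untranslated", "repetition", "disfluent"],
        "coherence": ["conjunction", "missing info", "logical problem", "inconsistency"],
    },
    "adequacy": {
        "contradiction": [],
        "meaning shift": ["misplaced word", "punctuation", "other"],
        "word sense disambiguation": ["hyponymy", "hyperonymy", "terminology", "quantity", "time"],
        "deletion": [], "addition": [], "explicitation": [],
        "coherence": ["inconsistent terminology"],
    },
}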

8 How serious is an error? “Different thresholds exist for major, minor and critical errors. These should be flexible, depending on the content type, end-user profile and perishability of the content.” – TAUS, error typology guidelines, 2013 → Give different weights to error categories depending on text type & translation brief
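
The weighting idea is easy to operationalise. Below is a minimal Python sketch of brief-dependent scoring; the category names come from the taxonomy above, but the weight values and the user-manual brief are hypothetical, not the project's actual settings.

# Minimal sketch of brief-dependent error weighting (illustrative only;
# the weights below are hypothetical values for a user-manual brief).
from collections import Counter

BRIEF_WEIGHTS = {
    "word sense disambiguation": 3.0,  # critical: wrong terms mislead users
    "deletion": 3.0,
    "grammar & syntax": 1.0,
    "spelling & typos": 0.5,
    "style & register": 0.25,          # minor for this text type
}

def weighted_error_score(error_labels):
    """Sum the brief-specific weights over a list of annotated errors."""
    counts = Counter(error_labels)
    return sum(BRIEF_WEIGHTS.get(label, 1.0) * n for label, n in counts.items())

# Two typos and one word-sense error:
print(weighted_error_score(
    ["spelling & typos", "spelling & typos", "word sense disambiguation"]))  # 4.0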

9 Reducing subjectivity
– flexible error weights
– more than one annotator
– consolidation phase

10 TQA annotation (brat): 1) Acceptability, 2) Adequacy
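
brat keeps annotations in a standoff file (.ann) next to the text, one text-bound annotation per line: an ID, a tab, the label plus character offsets, a tab, and the covered text. A small reading sketch (the file name and label are made up for illustration; discontinuous spans are ignored):

# Read text-bound annotations from a brat standoff (.ann) file.
# A line looks like: "T1\tWordSense 12 18\tvegen"
def read_brat_ann(path):
    errors = []
    with open(path, encoding="utf-8") as f:
        for line in f:
            if not line.startswith("T"):  # skip relations, notes, events
                continue
            ann_id, label_span, surface = line.rstrip("\n").split("\t")
            label, start, end = label_span.split(" ")  # simple spans only
            errors.append((ann_id, label, int(start), int(end), surface))
    return errors

# e.g. read_brat_ann("pe1_adequacy.ann")
# -> [("T1", "WordSense", 12, 18, "vegen"), ...]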

11 Application example: comparative analysis

12 Next step: diagnostic & comparative evaluation
What makes a ST-passage problematic?
How problematic is this passage really (i.e., how many translators make errors)?
Which PE errors are caused by MT?
Which MT errors are hardest to solve?
→ Link all errors to the corresponding ST-passage
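
Linking every error to its ST-passage is mostly bookkeeping. A rough sketch of how the counts behind "how many translators make errors" could be kept (the data layout is an assumption):

# Group errors from several translations by the ST passage they trace back to.
from collections import defaultdict

def build_error_sets(annotations):
    """annotations: iterable of (st_passage_id, translation_id, error_label)."""
    by_passage = defaultdict(lambda: defaultdict(list))
    for passage, translation, label in annotations:
        by_passage[passage][translation].append(label)
    return by_passage

def problem_ratio(by_passage, passage, n_translations):
    """Share of translations with at least one error on this passage."""
    return len(by_passage[passage]) / n_translations

# With the "sweeping the planet" passage from the next slide annotated for
# MT, PE1 and PE2, problem_ratio(..., n_translations=3) would return 1.0.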

13 Source text-related error sets
ST: Changes in the environment that are sweeping the planet...
MT: Veranderingen in de omgeving die het vegen van de planeet tot stand brengen... (wrong word sense)
"Changes in the environment that bring about the brushing of the planet..."
PE1: Veranderingen in de omgeving die het evenwicht op de planeet verstoren... (other type of meaning shift)
"Changes in the environment that disturb the balance on the planet..."
PE2: Veranderingen in de omgeving die over de planeet rasen... (wrong collocation + spelling mistake)
"Changes in the environment that raige over the planet..."

14 Application example: impact of MT errors on PE

15 Summary
Improve error analysis by:
– judging acceptability and adequacy separately
– making error weights depend on the translation brief
– having more than one annotator
– introducing a consolidation phase
Improve diagnostic and comparative evaluation by:
– linking errors to ST-passages
– taking the number of translators into account

16 Open questions
How can we reduce annotation time?
– Ways of automating (part of) the process?
– Limit annotation to a subset of errors?
How can we better implement ST-related error sets?
– Ways of automatically aligning ST, MT, and various TTs at word level?

17 Thank you for listening. For more information, contact: joke.daems@ugent.be. Suggestions? Questions?

18 Quantification of ST-related error sets
ST
– MT (1)
  – MT1 (0.5): wrong word sense (0.5)
  – MT2 (0.5)
– PE (1)
  – PE1 (0.5): other meaning shift (0.5)
  – PE2 (0.5): wrong collocation (0.25) + spelling mistake (0.25)
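
Read as a tree: each error set has total weight 1, every version of the set shares that weight equally, and each error inside a version splits the version's share. A sketch that reproduces the PE numbers above (the division rule is inferred from the figures shown):

# Version weight = 1 / number of versions; error weight = version weight
# divided over the errors in that version.
def error_weights(versions):
    """versions: dict mapping version id -> list of error labels."""
    v_weight = 1.0 / len(versions)
    weights = {}
    for version, errors in versions.items():
        for label in errors:
            weights[(version, label)] = v_weight / len(errors)
    return weights

pe_set = {"PE1": ["other meaning shift"],
          "PE2": ["wrong collocation", "spelling mistake"]}
print(error_weights(pe_set))
# {('PE1', 'other meaning shift'): 0.5,
#  ('PE2', 'wrong collocation'): 0.25, ('PE2', 'spelling mistake'): 0.25}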

19 Inter-annotator agreement
Initial agreement:
– HT&PE acceptability: Exp1 39% (κ=0.32), Exp2 50% (κ=0.44)
– HT&PE adequacy: Exp1 42% (κ=0.31), Exp2 46% (κ=0.30)
– MT acceptability: Exp1 53% (κ=0.49), Exp2 79% (κ=0.77)
– MT adequacy: Exp1 57% (κ=0.46), Exp2 51% (κ=0.41)
Agreement after consolidation:
– HT&PE acceptability: Exp1 67% (κ=0.65), Exp2 81% (κ=0.80)
– HT&PE adequacy: Exp1 82% (κ=0.79), Exp2 94% (κ=0.92)
– MT acceptability: Exp1 84% (κ=0.83), Exp2 95% (κ=0.94)
– MT adequacy: Exp1 94% (κ=0.92), Exp2 86% (κ=0.83)
Correlation between annotators:
– HT&PE acceptability: Exp1 r=0.67 (n=38, p<0.001), Exp2 r=0.95 (n=34, p<0.001)
– HT&PE adequacy: Exp1 r=0.87 (n=38, p<0.001), Exp2 r=0.86 (n=34, p<0.001)
– MT acceptability & adequacy: n/a
Agreement on categories:
– HT&PE acceptability: Exp1 90% (κ=0.89), Exp2 89% (κ=0.88)
– HT&PE adequacy: Exp1 89% (κ=0.87), Exp2 88% (κ=0.83)
– MT acceptability: Exp1 83% (κ=0.81), Exp2 93% (κ=0.93)
– MT adequacy: Exp1 86% (κ=0.79), Exp2 86% (κ=0.82)
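
For reference, figures of this kind are standardly computed as raw agreement, Cohen's κ (chance-corrected agreement) and Pearson's r; a minimal sketch with scikit-learn and SciPy, on toy labels rather than the study's data:

# Toy agreement computation (not the study's data).
from sklearn.metrics import cohen_kappa_score
from scipy.stats import pearsonr

ann1 = ["error", "no error", "error", "error", "no error"]
ann2 = ["error", "no error", "no error", "error", "no error"]

raw = sum(a == b for a, b in zip(ann1, ann2)) / len(ann1)  # 0.8 -> "80%"
kappa = cohen_kappa_score(ann1, ann2)                      # chance-corrected

# Correlation between per-segment error counts of two annotators:
r, p = pearsonr([3, 1, 0, 2], [2, 1, 1, 2])
print(f"{raw:.0%}, kappa={kappa:.2f}, r={r:.2f} (p={p:.3f})")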

