Trying to Standardize Translation Quality – What Were They Thinking?

Alan K. Melby (FIT, LTAC Global, and BYU) Dace Dzeguze (TAUS) Arle Lommel (CSA Research)


Session Overview – They were thinking they should build on the MQM project from QT21 within ASTM International
What is ASTM WK46396?
Staying on the road to one error typology
Defining translation quality
DQF-MQM error categories and severities
Process for creating and applying MQM metrics
Data collection
Key takeaways
Note: Assessment and evaluation can be contrasted, but they are not contrasted here

PART 1 What is ASTM WK46396?

A (soon to be) standard that defines
A taxonomy of translation errors
A process to move from translation specifications to task-specific analytic metrics that share a common basis
A scoring method to produce relevant numeric indications of translation quality
Metrics relevant to any sort of translation as well as evaluation of source-text quality

It is not…
A single, one-size-fits-all metric
An automatic approach
A reference-based score (à la BLEU)
A holistic metric
A complete solution to translation quality evaluation
MQM metrics can be combined with holistic metrics in a procedure
Translation quality evaluation fits into a quality management system
WK46396 depends on ASTM F2575 and a W3C Community Group

PART 2 Staying on the Road

Staying on the road to one error typology with ASTM

Roads that went elsewhere
SAE J2450 (released in 2001)
  Pro: maintained by a standards body
  Con: too narrow
LISA QA (version 3.1 released in 2006)
  Pro: more flexible (perhaps too flexible)
  Con: LISA slid off the road and crashed in 2011

New Roads: Which to Take?
TAUS DQF (Dynamic Quality Framework)
  Based on many metrics (incl. LISA QA)
  Initial study 2011
DFKI MQM (Multidimensional Quality Metrics)
  Based on many metrics (incl. LISA QA)
  From QT Launchpad

Harmony!
DQF and MQM error typologies were harmonized under QT21 in 2015
Three-person team from TAUS, DFKI & LTAC
DQF-MQM is the short name for the TAUS subset of the large MQM error typology

A scare, but ASTM to the rescue
QT21 Project was scheduled to end by January 2018; parallel to LISA’s end?
TAUS and DFKI agreed to provide IP to ASTM International for long-term maintenance

Involvement in ASTM
DFKI rep: Aljoscha Burchardt
TAUS rep: David Koot
Working Group Chair: Arle Lommel (CSA)
Others on editorial team from government and industry
YOU can join ASTM as an individual or a company (

PART 3 Defining Translation Quality

Defining Translation Quality
A quality translation demonstrates accuracy and fluency required for the audience and purpose and complies with all other specifications negotiated between the requester and provider, taking into account requester and end-user needs.

What do we mean by “specifications”?
Defined in ASTM F2575 – 2014 – Section 8
Cover all aspects of translation projects
  Linguistic Work Product
  Process
  Project Environment
  Relationships
Focus today is on Linguistic Work Product
  Source-content information
  Target-text requirements

Source-Content Information
textual characteristics
  source language
  text type
  audience
  purpose
specialized language
  subject field
  terminology
volume
complexity
origin

Target-Content Requirements
target language requirements
  target language (locale)
  target terminology
audience
purpose
content correspondence
register
format
style
  style guide
  style relevance
layout

PART 4 DQF-MQM Error Typology

Error Typology
One of the core ideas in the DQF-MQM framework is the categorization of issues according to error types. DQF-MQM uses a categorization system with multiple layers of granularity. This model provides a method to identify an issue in a translation project as a certain error type. It thus allows the translation to be evaluated more analytically than a holistic approach to quality evaluation would. Without reference to specific errors and their type and severity, quality evaluation tends to be subjective and has only limited, anecdotal value.

Accuracy

Design

Fluency

Internationalization

Locale convention

Verity

Terminology … and zooming in even further, first to the left, and then to the right …

PART 5 DQF-MQM Process for creating and applying metrics

Creating and applying metrics
1 State specifications
  Including audience and purpose. See Translation Parameter and Overt-Covert handouts. MQM uses specifications instead of quality levels.
2 Select relevant dimensions for quality evaluation
  (Terminology, Fluency, Accuracy, etc. in blue in the DQF subset below)
3 Finish QE system
  Complete the metric by selecting fine-grained error types. See scoring card handout for evaluation stages. Train the evaluators to use the metric. Conduct evaluations. Then improve the QE system as needed.

Creating specifications
Not required every time a metric is used
Needed for new metrics (create templates for similar projects)
Provides a way to document and share requirements between stakeholders

Selecting relevant dimensions and error types
Draw from DQF-MQM where possible; otherwise use full MQM
Tie error types to specifications
  E.g., if style relevance is high, check Style
  E.g., for transcreation, you would need to check Verity, but probably would not for support content for a consumer device
For each error type, determine what it is checking in the specifications (may apply to multiple areas)
Aim for a minimal set of error types, but be granular enough to meet your needs
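
One way to picture tying error types to specifications is a small rule table that maps specification values to the error types worth checking. The sketch below is illustrative only: the specification keys (`style_relevance`, `text_type`, `target_terminology`), the rule set, and the choice of Accuracy and Fluency as a baseline are assumptions made for this example and are not taken from WK46396 or DQF-MQM.

```python
from typing import Callable, Dict, List, Tuple

# Assumed baseline for this sketch: always check Accuracy and Fluency.
BASELINE: List[str] = ["Accuracy", "Fluency"]

# Hypothetical rules: (condition on the specifications, error type to check).
RULES: List[Tuple[Callable[[Dict[str, str]], bool], str]] = [
    (lambda s: s.get("style_relevance") == "high", "Style"),
    (lambda s: s.get("text_type") == "transcreation", "Verity"),
    (lambda s: bool(s.get("target_terminology")), "Terminology"),
]


def select_error_types(specs: Dict[str, str]) -> List[str]:
    """Return a minimal list of error types justified by the specifications."""
    selected = list(BASELINE)
    for applies, error_type in RULES:
        if applies(specs) and error_type not in selected:
            selected.append(error_type)
    return selected


# Transcreation with high style relevance adds Style and Verity.
transcreation = select_error_types(
    {"text_type": "transcreation", "style_relevance": "high"}
)

# Support content for a consumer device keeps only the baseline here.
support = select_error_types({"text_type": "support content"})
```

Keeping the rules in one table makes it easy to see which specification each error type is checking, and to drop error types that no specification justifies.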

Training
Not enough to give names of error categories and definitions
Provide guidance (e.g., decision trees, sample tasks and results)
Testing of evaluators is essential
Ongoing as new issues arise

Validity
Validity is a property of a metric
Are you measuring what you think you are?
  Example: Measuring words/hour is not a valid quality metric
  Example: A quality measure that doesn’t account for the length of the document won’t help you evaluate quality
Requires verification against independent criteria (e.g., user satisfaction relative to requirements, diversion of support calls, sales conversions…)
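
The point about document length can be illustrated with a small calculation. The sketch below normalizes a severity-weighted error count to a penalty per 1,000 words. The severity names and weights are assumptions chosen for this example, not values prescribed by DQF-MQM or WK46396.

```python
from typing import Dict, List, Tuple

# Assumed severity weights, for illustration only.
SEVERITY_WEIGHTS: Dict[str, int] = {"minor": 1, "major": 5, "critical": 10}


def penalty_per_1000_words(errors: List[Tuple[str, str]], word_count: int) -> float:
    """Sum severity weights over (error_type, severity) pairs, scaled by length."""
    if word_count <= 0:
        raise ValueError("word_count must be positive")
    total = sum(SEVERITY_WEIGHTS[severity] for _error_type, severity in errors)
    return 1000 * total / word_count


# The same three errors found in a short and in a long document.
errors = [("Accuracy", "major"), ("Fluency", "minor"), ("Terminology", "minor")]
short_doc = penalty_per_1000_words(errors, word_count=500)   # 14.0
long_doc = penalty_per_1000_words(errors, word_count=5000)   # 1.4
```

A raw error count would rate both documents the same; the normalized penalty shows the short document is ten times worse per word.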

Reliability
Reliability is a property of an evaluation system, not a metric
Two types of reliability:
  Inter-rater reliability: Do multiple trained and competent evaluators obtain the same result within reasonable tolerance?
  Intra-rater reliability: Does the same evaluator consistently produce the same, expected result?
Testing based on evaluating translations that have been previously evaluated by experts (comparing results), to make sure a set of new evaluators aren’t all wrong in the same way
Reasonable tolerance: >0.7 (Cohen’s Kappa)
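
As a minimal sketch of how the inter-rater check might be computed, the function below implements Cohen's kappa for two evaluators who each assign one label per segment. The label lists are made-up data, and treating each segment as a single categorical judgment is a simplifying assumption for this example.

```python
from collections import Counter
from typing import Hashable, Sequence


def cohens_kappa(rater_a: Sequence[Hashable], rater_b: Sequence[Hashable]) -> float:
    """Cohen's kappa: agreement between two raters, corrected for chance."""
    if len(rater_a) != len(rater_b) or len(rater_a) == 0:
        raise ValueError("both raters must label the same non-empty set of items")
    n = len(rater_a)
    # Observed agreement: share of items with identical labels.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Expected chance agreement from each rater's label frequencies.
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used one identical label throughout
    return (observed - expected) / (1 - expected)


# Made-up labels for four segments from two evaluators.
evaluator_1 = ["Accuracy", "Accuracy", "Fluency", "Fluency"]
evaluator_2 = ["Accuracy", "Accuracy", "Fluency", "Accuracy"]

kappa = cohens_kappa(evaluator_1, evaluator_2)  # 0.5
meets_tolerance = kappa > 0.7                   # False for this sample
```

Here the evaluators agree on three of four segments (75%), yet kappa is only 0.5 because half of that agreement would be expected by chance, so this pair would not meet the 0.7 tolerance.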

PART 6 Data Collection

Benchmarking
Assumes reliable systems
Requires consistent use of metadata (DQF API provides this)
Needs stability of systems over time
Requires collective agreement
  On framework (DQF-MQM)
  On specific metrics (e.g., for medical information leaflets, automotive service manuals)
ASTM WK46396 will provide the framework for benchmarking
Proper evaluator training is needed to know if the results are comparable across organizations, that is, for benchmarking

PART 7 Key Takeaway: They (QT21) were thinking that MQM needed a home and found one: ASTM International (

Key takeaways (based on ASTM WK46396)
Translation quality is well-defined but not static
General quality management principles apply to translation
Translation specifications matter
MQM provides a standardized yet flexible way to evaluate translation quality analytically
Collect data and start benchmarking (use DQF API integration when feasible; online MQM scorecard also available)
Test for validity and reliability and train evaluators
Get involved in the ASTM effort and the TAUS DQF community
