Evaluating Translation Memory Software
  Francie Gow
  MA Translation, University of Ottawa
  Translator, Translation Bureau, Government of Canada
Motivation for Research
  Volume of translation work increasing
  Machine translation not yet ready to meet new demand in a significant way
  Translators increasingly turning to translation memory (TM) software to increase productivity
  Available tools are difficult to compare
What is Translation Memory?
  Translation support software
  Allows users to recycle repetitive translation material
  Compares segments of new text with source material in the database
  If matches are found, retrieves corresponding target text and inserts it into new document
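The compare-and-retrieve step can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the behaviour of any particular TM product: the `memory` dictionary, the `lookup` function, the sample segments, and the 0.75 threshold are all hypothetical, and similarity is scored with the standard library's `difflib.SequenceMatcher`.

```python
from difflib import SequenceMatcher

# Hypothetical translation memory: source segment -> stored target segment.
memory = {
    "Press the power button.": "Appuyez sur le bouton d'alimentation.",
    "Close the cover.": "Fermez le couvercle.",
}

def lookup(segment, memory, threshold=0.75):
    """Return (source, target, score) for the best match, or None.

    The threshold is an assumed cut-off below which a stored segment
    is considered too different to be worth proposing.
    """
    best = None
    for source, target in memory.items():
        # ratio() is 1.0 for identical strings and falls toward 0.0
        # as the two strings share fewer characters in order.
        score = SequenceMatcher(None, segment, source).ratio()
        if score >= threshold and (best is None or score > best[2]):
            best = (source, target, score)
    return best
```

An exact repetition scores 1.0, a near repetition (for example, the same sentence without its final period) still clears the threshold, and an unrelated segment returns `None`, leaving the translator to work from scratch.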
Automatic Search and Retrieval: Two Approaches
  Sentence-based approach
    – Example: TRADOS
  Character-string-within-a-bitext-based approach (CSB-based approach)
    – Example: MultiTrans
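The difference between the two approaches can be illustrated with a rough sketch. It does not reproduce the algorithms used by TRADOS or MultiTrans; the function names and sample texts are hypothetical. The first function treats the whole sentence as the unit of comparison, while the second looks for the longest character string that the new text shares with the source side of a stored bitext, regardless of sentence boundaries.

```python
from difflib import SequenceMatcher

def sentence_match(new_sentence, stored_sentences):
    """Sentence-based view: the sentence matches as a whole or not at all."""
    return new_sentence in stored_sentences

def longest_shared_string(new_text, bitext_source):
    """Character-string view: longest run of characters found in both texts."""
    matcher = SequenceMatcher(None, new_text, bitext_source, autojunk=False)
    match = matcher.find_longest_match(0, len(new_text), 0, len(bitext_source))
    return new_text[match.a:match.a + match.size]

# Hypothetical sample data.
stored = "The committee approved the annual report on Monday."
new = "Yesterday the committee approved the annual report without debate."
```

With these samples, `sentence_match(new, {stored})` is `False` because the sentences differ, yet `longest_shared_string(new, stored)` still recovers a long shared passage containing "committee approved the annual report".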
Two Approaches to TM Evaluation
  Primarily objective approach
    – Edit distance
  Primarily subjective approach
    – Human rating systems
Edit Distance
  Definition: smallest number of insertions, deletions, and substitutions required to change one string […] into another
    – National Institute of Standards and Technology
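One common reading of this definition is the Levenshtein distance, in which each insertion, deletion, or substitution costs 1. The sketch below is an illustration of that reading only; the function name `edit_distance` is an assumption, and other cost schemes are possible.

```python
def edit_distance(a, b):
    """Levenshtein distance between two sequences, unit cost per edit."""
    # prev[j] holds the distance between the processed prefix of a and b[:j].
    prev = list(range(len(b) + 1))
    for i, item_a in enumerate(a, start=1):
        curr = [i]  # turning a[:i] into an empty sequence takes i deletions
        for j, item_b in enumerate(b, start=1):
            cost = 0 if item_a == item_b else 1
            curr.append(min(
                prev[j] + 1,         # delete item_a
                curr[j - 1] + 1,     # insert item_b
                prev[j - 1] + cost,  # substitute, or keep if equal
            ))
        prev = curr
    return prev[-1]
```

For example, `edit_distance("kitten", "sitting")` is 3. Because the function accepts any sequence, the unit of comparison can be characters (pass strings) or words (pass lists produced by `str.split()`), and the two choices give different scores for the same pair of sentences.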
Edit Distance
  Advantages
    – Programmable
    – Once algorithm is developed, evaluation is fast and inexpensive
  Disadvantages
    – Loose approximation of usefulness
    – Definition of edit distance vague and variable
    – Assumes model translation
Human Rating Systems
  Example of a rating system (Sato, 1990)
    – (A) exact match
    – (B) “the example provides enough information about the translation of the whole input”
    – (C) “the example provides information about the translation of the whole input”
    – (F) “the example provides almost no information about the translation of the whole input”
Human Rating Systems
  Advantage
    – More valid than computer-generated results
  Disadvantages
    – Time consuming
    – Applicable to sentences, but not to mixed-language output of MultiTrans
    – Human is influenced by proposals
An evaluation system should be:
  reliable
  valid
  efficiently applicable
    – EAGLES Evaluation of Natural Language Processing Systems: Final Report
Measuring Usefulness
  Usefulness is a function of
    – Validity
    – Time Gain
    – Time Loss
New Evaluation Methodology
  Construction of a corpus in both tools
  Analysis and mark-up of new texts
  Processing of new texts in both tools
  Application of scores
  Analysis of scores
Conclusions
  Resulting methodology produces valid and reliable results
  Not as efficiently applicable as an edit distance algorithm, but highly customizable to a variety of translation contexts
  Applicable to any combination of tools; determines which is best for a given job