Published by Leonard Morris. Modified over 9 years ago.
1
A Syntactic Translation Memory
Vincent Vandeghinste
Centre for Computational Linguistics, K.U.Leuven
vincent@ccl.kuleuven.be
2
Translation
In a globalizing world there is a growing demand for translation
– multilingual websites (frequent updates, user content)
Shortage of human translators
Speed up the translation process by means of TECHNOLOGY
3
Tools of the traditional translator
Dictionaries
– general dictionaries
– domain-specific dictionaries
Grammar, spelling
Traditional filing system
Reference material
– similar documents
– previous versions of the same document
4
Tools of the contemporary translator
Similar to those of the traditional translator, but electronic
– online dictionaries, dictionaries on CD-ROM
– word processors with spelling and grammar checkers
– files with client-specific terminology (Excel)
– reference material (PDF)
5
Tools of the contemporary translator
CAT tools (Computer-Aided Translation)
Terminology tools
– keep track of, list and recognize terms
Concordance tools
– look up text in a corpus of translated documents
Translation memories
6
Translation Memories
A dictionary of sentences
Segmentation problem: what is a sentence?
– punctuation, capitalization, layout
Retrieval of sentences
Same sentence = 100% match
– sometimes even better: whole paragraphs
– what counts as a difference? spaces, capitals, punctuation, layout
Similar sentence = fuzzy match
– how much difference? "Edit distance"
– edit distance is measured in characters or words
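As a rough illustration of fuzzy matching by edit distance, the sketch below computes a Levenshtein distance over either words or characters and turns it into a similarity score. The slides do not say how the distance is normalized, so dividing by the length of the longer sequence is an assumption, and the function names `edit_distance` and `similarity` are made up for this example.

```python
def edit_distance(a, b):
    """Levenshtein distance between two sequences (lists of words or strings)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            cost = 0 if x == y else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def similarity(a, b):
    """Similarity in [0, 1]; normalizing by the longer length is an assumption."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - edit_distance(a, b) / longest


original = "A cat sat on the mat"
candidate = "A dog sat on the ground"

# Word-level comparison: split into tokens first.
word_sim = similarity(original.split(), candidate.split())
# Character-level comparison: compare the raw strings.
char_sim = similarity(original, candidate)

print(f"word similarity: {word_sim:.2f}")
print(f"char similarity: {char_sim:.2f}")
```

With this normalization, two substituted words out of six give a word-level similarity of 4/6. A real translation memory may tokenize, case-fold and normalize differently, so its scores need not agree with this sketch.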
7
Examples of fuzzy matches
A cat sat on the mat          = original
A dog sat on the ground       ED = 66% (w), 60% (c) => 65%
The cat sits on a mat         ED = 50% (w), 53% (c) => 50%
On the mat sat a cat          ED = 50% (w), 53% (c) => 50%
Cats are sitting on the mats  ED = 33% (w), 33% (c) => 33%
ED = 90% word diff + 10% char diff (w = words, c = characters)
Issues:
– no sentence reaches the threshold of 70%
– every word counts the same
– word variants are not recognized
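The weighting on this slide (90% word score plus 10% character score) can be checked with a few lines of arithmetic. This is only a sketch: the per-sentence word and character percentages are copied from the slide, and the function name `combined_score` and the constant `THRESHOLD` are made up for illustration.

```python
THRESHOLD = 70  # match threshold mentioned on the slide, in percent


def combined_score(word_pct, char_pct):
    """Weighted score: 90% word-level score plus 10% character-level score."""
    return 0.9 * word_pct + 0.1 * char_pct


# (sentence, word %, char %) as listed on the slide
examples = [
    ("A dog sat on the ground", 66, 60),
    ("The cat sits on a mat", 50, 53),
    ("On the mat sat a cat", 50, 53),
    ("Cats are sitting on the mats", 33, 33),
]

for sentence, w, c in examples:
    score = combined_score(w, c)
    verdict = "match" if score >= THRESHOLD else "below threshold"
    print(f"{sentence:30s} {score:5.1f}%  {verdict}")
```

Every candidate stays below 70%, which is the first issue the slide lists: none of these sentences would be offered to the translator as a fuzzy match.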
8
Machine Translation as a Tool
When is MT a useful tool for the translator?
– when it respects client-specific terminology
– when its translations are comparable with matches from a translation memory
– consistency
– speed
9
Types of MT
Rule-based
– using syntax / linguistic knowledge
– transformations on tree structures / interlingua
– hand-made rules vs. induced rules
Statistical
– using no syntax, no linguistics
– transformations on strings (flat)
– induced translation models and language models
10
Parse and Corpus-based MT
Syntactic machine translation
Rule-based
Data-driven: rules are induced from a parallel corpus
– general domain, publicly available (including Dutch): Proceedings of the European Parliament, Translation Memory of DG-Translation (EU), OPUS corpus (subtitles, open source manuals...)
– private translation memories
11
PaCo-MT
match source analysis with database of translation rules
– when a match is found, the translation is found
– when no match is found, recursively try smaller subtrees, and use back-off models to connect translations of found subtrees
back-off models have a lower accuracy
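The matching strategy described on this slide can be sketched as a recursive lookup. This is a toy under stated assumptions, not the PaCo-MT implementation: trees are nested tuples of the form `(label, child, ...)`, the rule database is a plain dictionary with invented English-to-Dutch entries, and the "back-off model" is reduced to joining the child translations in source order.

```python
def translate(tree, rules):
    """Return (translation, used_backoff) for a source parse tree.

    A leaf is a word (str); an inner node is a tuple (label, child, ...).
    """
    if tree in rules:
        # Whole subtree matches a stored rule: the translation is found.
        return rules[tree], False
    if isinstance(tree, str):
        # Unknown word: copy it unchanged (a deliberately crude back-off).
        return tree, True
    # No match: recursively try the smaller subtrees, then connect the parts.
    parts = [translate(child, rules)[0] for child in tree[1:]]
    # Hypothetical back-off: keep source order. A real model could reorder.
    return " ".join(parts), True


# Invented rule database, for illustration only.
rules = {
    ("NP", "the", "cat"): "de kat",
    ("PP", "on", ("NP", "the", "mat")): "op de mat",
    "sat": "zat",
}

source = ("S",
          ("NP", "the", "cat"),
          ("VP", "sat", ("PP", "on", ("NP", "the", "mat"))))

translation, used_backoff = translate(source, rules)
print(translation, "(back-off used)" if used_backoff else "(full match)")
```

The `used_backoff` flag marks where the less accurate path was taken, which is the point at which accuracy drops according to the slide.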
12
PaCo-TM
match source analysis with database of translation rules
– when a match is found, the translation is found
– when no match is found, recursively try smaller subtrees, and use human judgement to connect translations of found subtrees
human judgement = translator judgement
13
PaCo-TM
uses existing translations as source material (client specific)
– specific terminology
– the more translations RESEMBLE translations in the source material, the better the translation
syntactic analysis of source and target sentences
– categorization of words: verbs, adjectives, nouns...
– word clusters: noun phrase, infinitival phrase
– lemmatization
14
Examples of syntactic fuzziness
RESEMBLE = syntactical resemblance
Fuzzy: not in characters or words
A cat sat on the mat          = original
A dog sat on the ground       = same syntax / insert translation of 'dog'
The cat sits on a mat         = same syntax / present tense / indef. mat
On the mat sat a cat          = same syntax / different order
Cats are sitting on the mats  = different structure / different phrases / different number for subject and object
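One simplified way to picture syntactic resemblance is to compare word categories instead of the words themselves. In the sketch below the part-of-speech tags are assigned by hand (no tagger or parser is involved), the tag set is an assumption, and a flat tag sequence stands in for the parse trees the slides refer to. The helper names `same_syntax` and `differing_words` are hypothetical.

```python
def same_syntax(a, b):
    """True when two tagged sentences share the same sequence of categories."""
    return [tag for _, tag in a] == [tag for _, tag in b]


def differing_words(a, b):
    """Word pairs that differ at the same position (assumes same_syntax holds)."""
    return [(wa, wb) for (wa, _), (wb, _) in zip(a, b)
            if wa.lower() != wb.lower()]


# Hand-tagged sentences from the slide (tags are illustrative assumptions).
original = [("A", "DET"), ("cat", "NOUN"), ("sat", "VERB"),
            ("on", "PREP"), ("the", "DET"), ("mat", "NOUN")]
dog = [("A", "DET"), ("dog", "NOUN"), ("sat", "VERB"),
       ("on", "PREP"), ("the", "DET"), ("ground", "NOUN")]
cats = [("Cats", "NOUN"), ("are", "AUX"), ("sitting", "VERB"),
        ("on", "PREP"), ("the", "DET"), ("mats", "NOUN")]

if same_syntax(original, dog):
    # Only these slots need a new translation; the rest can be reused.
    print("reuse stored translation, replace:", differing_words(original, dog))

print("'Cats are sitting on the mats' has the same syntax:",
      same_syntax(original, cats))
```

A string-based fuzzy match scores the 'dog' sentence at 65%, below the 70% threshold, whereas a category-level comparison treats it as a full structural match with some word slots to fill.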
15
Translation memories vs. PaCo-MT
Translation memory
– offers the translator information
– translator makes choices
– inflexible fuzzy matching
PaCo-MT
– uses information
– PaCo-MT makes choices
– syntactic flexibility in fuzzy matching
– confidence metrics can be used as threshold
16
Conclusions
Flexibility of translation memories can be improved
– syntactic
– replace words (partial MT)
Translator can set thresholds
– depending on amount of domain data
– depending on language pair
Continuum: full manual translation, translation memory, full machine translation
17
Thank you