1 Session 1: Advantages and Disadvantages of Translation Technology (TT)
- Historical development of translation technology
- Focus on TM and MT (theory and practice)
Objectives:
- To understand the historical background to TT developments
- To develop a balanced view of the pros and cons of TT
- To understand the basic principle behind TM and how it differs from MT
2 (Japanese) Translator’s Life in New Zealand c.a
Life without:
- word-processing
- electronic dictionary
- Google
- translation memory
- Amazon.com
3 Translation Paradigm (80s into early 90s)
- after-thought
- word-processing
- asynchronous
- text for paper-based circulation
- no engineering input
- work in isolation
4 Historical Developments of Translation Technology (TT): 1990s
ICT Development:
- penetration of PCs; desktop publishing (DTP)
- speech recognition
- PCs connected via modem; telework
- Internet (Web)
- Sony PlayStation
- Google
- mobile phones; texting
TT Development:
- software localisation services
- localisation tools
- Translation Memory
- data-driven MT
- online term banks
- free WebMT (1997: Babelfish)
- web localisation services
5 Translation Technology Continuum (from automation to human involvement)
- Automatic Translation: translation process automated by use of Machine Translation
- Computer-aided Translation (CAT): translation process aided by electronic tools such as Translation Memory
- Unaided Translation: translation process not aided by any electronic tools
Adapted from Hutchins & Somers (1992)
6 Machine Translation (MT)
Rationale for Technology Applications to Translation
“A computer is a device that can be used to magnify human productivity. Properly used, it does not dehumanize by imposing its own Orwellian stamp on the products of human spirit … Translation is a fine and exacting art, but there is much about it that is mechanical and routine, if this were given over to a machine, the productivity of the translator would not only be magnified but this work would become more rewarding, more exciting, more human.” Martin Kay (1987)
7 Machine Translation (MT)
MT research began in the 1950s. Warren Weaver’s 1949 memo:
“When I look at an article in Russian, I say: This is really written in English, but it has been coded in some strange symbols. I will now proceed to decode.” (in Locke and Booth 1955:18)
8 Machine Translation
Initially based on some misconceptions about human translation:
- knowledge of two language systems suffices
- it is a matter of looking up dictionaries
- it is easy to define “a good translation”
- there is only one correct translation possible
9 Machine Translation (MT)
MT history milestones (pre-ALPAC)
- 1954: Georgetown system demo; successful translation of 49 Russian sentences into English
- $50m spent in 20 research centres in the USA
- 1966: Automatic Language Processing Advisory Committee (ALPAC) Report concludes:
  - MT was slower, less accurate and twice as expensive as human translation
  - there was no prospect of useful MT either immediately or in the future
10 Machine Translation (MT)
MT history milestones (post-ALPAC)
- 1969 – privately funded projects: Logos system (1969); Weidner-CAT (1977); ALPS (1980)
- 1975 – Météo project in Canada
- 1976 – European Commission acquires Systran
- 1979 – Eurotra project in Europe for a multilingual system
- 1980 – PC-based system
- 1990 – data-driven system; WebMT
11 Machine Translation (MT)
1975 Météo project in Canada
- Automatic translation of weather forecasts (En -> Fr)
- Sublanguage approach (domain-specific MT)
- Most successful MT application to date:
  - public broadcasting since
  - Fr -> En available since
  - only 4% of output needs post-editing
  - rapid translation staff turnover no longer a problem
12 Machine Translation (MT)
Renewed interest in MT in the 80s and 90s
- Technological factors
  - prevalence of PCs with improved processing power
- Translation market factors
  - official bilingualism/multilingualism creates institutional needs
  - globalisation creates huge commercial needs
- Advances in computational linguistics
- More realistic user expectations
- Internet creates casual access to multilingual information
13 Machine Translation (MT)
MT Design
Rule-based vs data-driven systems (SMT & EBMT)
- Rule-based systems by far the more common
Architecture for rule-based systems
- Direct: 1st generation MT systems
- Transfer and Interlingua: 2nd generation MT systems
14 Machine Translation (MT)
MT Design
Transfer-based systems
- based on systematic linguistic theory
- convenient way of categorising linguistic problems (monolingual or contrastive)
- modular
- need detailed coding of monolingual and bilingual dictionaries and grammars
- a dedicated transfer component is needed for each language pair, in each direction
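The last point has a simple arithmetic consequence, sketched below. The function names are hypothetical, and the interlingua count assumes one analysis module and one generation module per language, which is the usual textbook idealisation rather than a description of any particular system.

```python
def transfer_modules(n_languages: int) -> int:
    """Transfer components needed if every ordered language pair has its own module."""
    return n_languages * (n_languages - 1)


def interlingua_modules(n_languages: int) -> int:
    """Assumed idealisation: one analysis and one generation module per language."""
    return 2 * n_languages


if __name__ == "__main__":
    for n in (2, 3, 9):
        print(n, "languages:", transfer_modules(n), "transfer modules vs",
              interlingua_modules(n), "interlingua modules")
```

The count for transfer grows quadratically with the number of languages, which is one reason interlingua designs were attractive for multilingual settings.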
15 Machine Translation (MT)
[Diagram: routes from Source Text to Target Text. Labels: direct translation; analysis, transfer, generation; Interlingua]
16 Machine Translation (MT)
MT Design
Data-driven systems: Statistical MT (SMT)
- linguistic knowledge not encoded
- takes advantage of a bilingual parallel corpus to arrive at probable translations of each word
- corpus-dependent
- at run-time the best translation is searched for (Carl & Way, 2003)
e.g. IBM’s experiment with the Canadian Hansard corpus: Candid (1988)
17 Machine Translation (MT)
MT Design
Statistical MT: Candid
In Canadian Hansard (parliamentary debates of 40K sentences in each of en and fr):
- the -> le   p=.610 (i.e. 610 times out of 1000)
- the -> la   p=.178
- the -> l’   p=.083
- the -> les  p=.023
- the -> ce   p=
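A minimal sketch of how such a probability table can be read: each value is a relative frequency over aligned occurrences of "the". The counts 610, 178, 83 and 23 simply restate the figures above per 1000 occurrences; the "<other>" bucket is an assumption added so the counts sum to 1000, and the function names are hypothetical. Real systems estimate these tables from sentence-aligned text rather than from ready-made word alignments, so this is a simplification.

```python
from collections import Counter

# Counts per 1000 aligned occurrences of English "the".
# "<other>" is an assumed catch-all for every translation not listed above.
aligned_counts = Counter({"le": 610, "la": 178, "l'": 83, "les": 23, "<other>": 106})


def translation_probabilities(counts: Counter) -> dict[str, float]:
    """Turn alignment counts into relative-frequency probabilities."""
    total = sum(counts.values())
    return {target: count / total for target, count in counts.items()}


def most_probable(probabilities: dict[str, float]) -> str:
    """Pick the single most probable translation (no context is considered)."""
    return max(probabilities, key=probabilities.get)


probs = translation_probabilities(aligned_counts)
best = most_probable(probs)
print(best, probs[best])
```

Choosing the top entry in isolation ignores context; in a full system the search at run-time also weighs how well the candidate words fit together in the target sentence.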
18 Machine Translation (MT)
MT Design
Data-driven systems: Example-based MT (EBMT)
Inspired by Nagao (1984), who talked about translation by analogy:
“Man does not translate a simple sentence by doing deep linguistic analysis, rather, man does translation, first, by properly decomposing an input sentence into certain fragmental phrases, then by translating these phrases into other language phrases, and finally by properly composing these fragmental translations into one long sentence. The translation of each fragmental phrase will be done by the analogy translation principle with proper examples as its reference”
19 Machine Translation (MT)
MT Design
Example-based MT (EBMT)
- It operates on a bilingual corpus with alignments of translation units at word, phrase and sentence level
- At run-time, the system checks whether an adequate translation is stored in the corpus
- Best results are obtained if large coherent parts are found in the corpus
20 Machine Translation (MT)
MT Design
Example-based MT (EBMT)
1. He buys a book on international politics. [ST]
2. a. (E) He buys a notebook.
      (J) Kare wa noto o kau.
          He [topic] notebook [obj] buy.
   b. (E) I read a book on international politics.
      (J) Watashi wa kokusaiseiji nitsuite kakareta hon o yomu.
          I [topic] international politics about concerned book [obj] read.
3. Kare wa kokusaiseiji nitsuite kakareta hon o kau. [TT]
(Sato & Nagao, 1990)
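The recombination in this example can be imitated with a toy sketch. It assumes the example base already carries hand-marked aligned fragments (the `slot_en` and `slot_ja` fields are hypothetical), whereas a real EBMT system would have to find those alignments itself; it is not the method of Sato & Nagao.

```python
from typing import Optional

# Toy example base. The slot fields mark one aligned fragment per example
# and are hand-written assumptions for illustration.
EXAMPLES = [
    {"en": "He buys a notebook.",
     "ja": "Kare wa noto o kau.",
     "slot_en": "a notebook",
     "slot_ja": "noto"},
    {"en": "I read a book on international politics.",
     "ja": "Watashi wa kokusaiseiji nitsuite kakareta hon o yomu.",
     "slot_en": "a book on international politics",
     "slot_ja": "kokusaiseiji nitsuite kakareta hon"},
]


def translate_by_analogy(source: str) -> Optional[str]:
    """Match a sentence frame from one example and fill it from another."""
    for frame in EXAMPLES:
        # Split the stored English sentence around its marked fragment.
        prefix, _, suffix = frame["en"].partition(frame["slot_en"])
        if not (source.startswith(prefix) and source.endswith(suffix)):
            continue
        filler = source[len(prefix): len(source) - len(suffix)]
        # Look for an example whose marked fragment equals the filler.
        for donor in EXAMPLES:
            if donor["slot_en"] == filler:
                return frame["ja"].replace(frame["slot_ja"], donor["slot_ja"])
    return None


print(translate_by_analogy("He buys a book on international politics."))
```

The frame "He buys X." comes from example 2a and the fragment for X from example 2b, mirroring steps 1 to 3 above.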
21 EBMT Principle
[Diagram: routes from Source Text to Target Text. Classical labels: analysis, transfer, generation, direct translation. EBMT labels: matching, alignment, recombination, exact match] (Somers, 2003:8)
22 Translation Memory (TM)
A database of aligned SL and TL segments (translation units) that allows the translator to:
- propagate translations of internal repetitions in the source text through the target text
- recycle translations for previously encountered source text segments (exact matches, or fuzzy matches with some edits)
- analyse new source texts for repetitions and matches with already translated texts stored in a translation memory
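The analysis step for internal repetitions can be sketched as follows. The segmentation rule (split after ., ! or ?) and the sample text are assumptions for illustration; commercial TM tools use more elaborate segmentation rules.

```python
import re
from collections import Counter


def split_segments(text: str) -> list[str]:
    """Naive segmentation: split on whitespace that follows ., ! or ?."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [part for part in parts if part]


def repetition_report(text: str) -> dict[str, int]:
    """Count segments, unique segments and internal repetitions."""
    segments = split_segments(text)
    counts = Counter(segments)
    repeated = sum(count - 1 for count in counts.values())
    return {"segments": len(segments), "unique": len(counts), "repetitions": repeated}


# Hypothetical source text with one repeated segment.
sample = ("Press the power button. Wait for the light. "
          "Press the power button. Close the lid.")
report = repetition_report(sample)
print(report)
```

A segment that occurs twice needs translating only once; the second occurrence is the kind of internal repetition the TM propagates.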
23 Translation Memory (TM)
How it works:
1. The software segments the source language (SL) text
2. The human translator translates an SL segment
3. The software stores the SL segment and the corresponding TL segment as a translation unit
4. The software checks each incoming SL segment against the stored SL segments and brings up a relevant translation unit when there is a match
5. The translator decides whether to use or edit the previous translation called up by the software
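Steps 3 and 4 can be sketched in a few lines. This is a minimal illustration, not how any particular TM product is built: the class name, the 0.75 fuzzy threshold, the character-based similarity from Python's `difflib`, and the sample French translation are all assumptions.

```python
from difflib import SequenceMatcher
from typing import Optional


class TranslationMemory:
    """Toy TM: stores translation units and retrieves the closest one."""

    def __init__(self, fuzzy_threshold: float = 0.75) -> None:
        self.units: dict[str, str] = {}  # SL segment -> TL segment
        self.fuzzy_threshold = fuzzy_threshold  # assumed cut-off for fuzzy matches

    def store(self, source: str, target: str) -> None:
        self.units[source] = target

    def lookup(self, source: str) -> Optional[tuple[str, str, float]]:
        """Return (stored SL, stored TL, score) for the best match, or None."""
        best = None
        for stored_source, stored_target in self.units.items():
            score = SequenceMatcher(None, source, stored_source).ratio()
            if best is None or score > best[2]:
                best = (stored_source, stored_target, score)
        if best is not None and best[2] >= self.fuzzy_threshold:
            return best
        return None


tm = TranslationMemory()
tm.store("Press the power button.", "Appuyez sur le bouton d'alimentation.")

exact = tm.lookup("Press the power button.")      # score 1.0: exact match
fuzzy = tm.lookup("Press the red power button.")  # below 1.0: fuzzy match
miss = tm.lookup("Close the lid.")                # below threshold: no proposal
print(exact, fuzzy, miss)
```

Step 5 stays with the translator: the lookup only proposes a stored unit, and a fuzzy match still has to be edited before it fits the new segment.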
24 Translation Memory (TM)
Advantages:
- the translator can find out the degree of internal repetition within the SL text before translating
- sentence-level matches and similarities are automatically brought to the translator’s attention for re-use
- productivity is boosted when the text type is suitable (i.e. repetitive, frequent updates, sim-ship etc.)
- TM normally integrates concordance and terminology management units to assist consistent use of words and terminology
25 Translation Memory (TM)
Disadvantages:
- Previous errors contained in the TM are propagated when:
  - the translator forgets to update the TM
  - the translator is asked not to change the existing translation in the TM
- A ‘sentence salad’ phenomenon (Bédard, 2000), in which the text becomes less coherent or readable because of:
  - the translator being confined to working at sentence level
  - the translator trying to maximise recyclability
  - the TM consisting of varying texts translated by different translators (Bowker & Barlow, 2004)
- Similarities in form, rather than semantic similarities, are picked up
- Potential de-skilling of the translator (Kenny, 2004)
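The point about form versus meaning can be made concrete with a small sketch. It assumes a character-based similarity measure (Python's `difflib`), which is only a stand-in for whatever matching a given TM tool uses; the three sentences are invented for illustration.

```python
from difflib import SequenceMatcher


def surface_similarity(a: str, b: str) -> float:
    """Character-level similarity in [0, 1]; knows nothing about meaning."""
    return SequenceMatcher(None, a, b).ratio()


stored = "Switch the pump on."
opposite = "Switch the pump off."  # nearly identical form, opposite meaning
paraphrase = "Activate the pump."  # same meaning, different form

form_score = surface_similarity(stored, opposite)
paraphrase_score = surface_similarity(stored, paraphrase)
print(round(form_score, 2), round(paraphrase_score, 2))
```

A form-based matcher ranks the sentence with the opposite meaning above the paraphrase, so a high match score is a prompt for the translator to check the segment rather than a guarantee that the stored translation fits.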
26 Exercises on MT and TM
Machine Translation
- Reverse engineering with the Model Zero approach (Pérez-Ortiz & Forcada, 2001)
- This exercise is designed to allow you to make an intelligent guess about what goes on inside an MT system without looking inside (black box as opposed to glass box evaluation)
Translation Memory
- Understanding how internal and external repetitions are processed by the system