Machine Translation MÖSG vt 2004 Anna Sågvall Hein.

Presentation on theme: "Machine Translation MÖSG vt 2004 Anna Sågvall Hein."— Presentation transcript:

1 Machine Translation MÖSG vt 2004 Anna Sågvall Hein

2 @Anna Sågvall Hein, MÖSG 2004 Can computers translate? Not a simple yes or no. It depends on: the text; the purpose of the translation; the required quality.

3 @Anna Sågvall Hein, MÖSG 2004 Classical problems with MT: unrealistic expectations; bad translations; difficulties in integrating MT in the work flow (the Ericsson case).

4 @Anna Sågvall Hein, MÖSG 2004 What is MT proper? To be considered as MT, a system should provide minimally correct morphology, minimal syntactic processing and minimal semantic processing, and it should handle and produce full sentences. Hutchins, J., 2000, The IAMT Certification initiative and defining translation system categories (http://ourworld.compuserve.com/homepages/WJHutchins/IAMTcert.htm)

5 @Anna Sågvall Hein, MÖSG 2004 Basic translation strategies direct translation transfer-based translation statistical translation combined strategies

6 @Anna Sågvall Hein, MÖSG 2004 Direct translation, 1: no intermediary sentence structure; the most important language component is a translation dictionary; translation proceeds mostly word by word, or phrase by phrase; translation problems are handled more or less case by case by means of specific rules.

7 @Anna Sågvall Hein, MÖSG 2004 Direct translation, 2. Quality: typically browsing quality; depends on the quality of the translation dictionary and the coverage of the translation rules; editing quality may be achieved. Problems with: ambiguity; inflection; word order; structural differences.

8 @Anna Sågvall Hein, MÖSG 2004 Advanced classical approach (Tucker 1987): source text dictionary lookups and morphological analysis; identification of homographs; identification of compounds; identification of noun and verb phrases; processing of idioms.

9 @Anna Sågvall Hein, MÖSG 2004 Advanced approach, cont.: processing of prepositions; subject-predicate identification; syntactic ambiguity identification; synthesis and morphological processing of target text; rearrangement of words and phrases in target text.

10 @Anna Sågvall Hein, MÖSG 2004 Feasibility of the direct translation strategy Is it possible to carry out the direct translation steps as suggested by Tucker with sufficient precision without relying on a sentence grammar and an intermediary structure?

11 @Anna Sågvall Hein, MÖSG 2004 SYSTRAN (SYStem TRANslation): developed in the US by Peter Toma; first version 1969 (Ru-En); the EC bought the rights to Systran in 1976; Systran SA, France, is the current owner of the rights to Systran; currently 18 language pairs, excluding Swedish; Swedish to English is being introduced, starting in June 2004 (http://babelfish.altavista.com/).

12 @Anna Sågvall Hein, MÖSG 2004 Systran, cont.: more than 1,600,000 dictionary units; 20 domain dictionaries; daily use by EC translators and administrators of the European institutions; originally a direct translation strategy (see H&S); today more of a transfer-based strategy.

13 @Anna Sågvall Hein, MÖSG 2004 Ex. 1: fairly good translation / Systran sv-en "Enskilda företagare som inte bildat bolag klassificeras hit." "Individual entrepreneurs that have not formed companies are classified here." The system has recognised bildat as a perfect form and translates the tense correctly as have formed, with the negation not in the right place.

14 @Anna Sågvall Hein, MÖSG 2004 Ex. 2: word order problem / Systran sv-en "När byarna kontaktades hade de inte ens utsatts för influensa." "When the villages were contacted had they not even been exposed to flu." The system has not identified the subject and the predicate and therefore produces the wrong word order.

15 @Anna Sågvall Hein, MÖSG 2004 Ex. 3: ambiguity problem / Systran sv-en "Vad kan vi lära av Arrawetestammen?" "What can we faith of the Arawete?" The system does not find the connection between kan and lära and therefore does not see that lära is a verb here.

16 @Anna Sågvall Hein, MÖSG 2004 Ex. 4: ambiguity problem / Systran sv-en "Extrapoleringen går till så här." "The extrapolation goes to so here." The system does not know the particle verb gå till and therefore wrongly translates word for word.
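
The word-for-word behaviour seen in Ex. 4 can be reproduced with a toy direct translator. The sketch below is an illustration only: the dictionary entries, the phrase rule and the English rendering "works like this" are assumptions made for this example and are not taken from Systran. It tries the longest multi-word entry first and falls back to single-word substitution, which is one way to read "problems handled case by case by means of specific rules".

```python
# Toy direct translation: longest-match phrase lookup, then word-by-word fallback.
# All entries are illustrative assumptions, not Systran data.
WORDS = {
    "extrapoleringen": "the extrapolation",
    "går": "goes",
    "till": "to",
    "så": "so",
    "här": "here",
}
# A case-by-case rule: one multi-word entry covering the particle verb construction.
PHRASES = {
    ("går", "till", "så", "här"): "works like this",
}

def translate(sentence, phrases=None):
    phrases = phrases or {}
    tokens = sentence.rstrip(".").lower().split()
    out, i = [], 0
    while i < len(tokens):
        # Try the longest phrase starting at position i first.
        for n in range(len(tokens) - i, 1, -1):
            key = tuple(tokens[i:i + n])
            if key in phrases:
                out.append(phrases[key])
                i += n
                break
        else:
            # No phrase matched: substitute a single word, copy unknown words.
            out.append(WORDS.get(tokens[i], tokens[i]))
            i += 1
    text = " ".join(out) + "."
    return text[0].upper() + text[1:]

word_by_word = translate("Extrapoleringen går till så här.")
with_rule = translate("Extrapoleringen går till så här.", PHRASES)
print(word_by_word)  # The extrapolation goes to so here.
print(with_rule)     # The extrapolation works like this.
```

Each such rule repairs one construction only, which is why the coverage of the rule set matters so much for a direct system.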

17 @Anna Sågvall Hein, MÖSG 2004 Motivations for transfer-based translation lexical ambiguity structural differences See further Ingo 91 (6), Wikholm (89)

18 @Anna Sågvall Hein, MÖSG 2004 Transfer-based translation, 1: an intermediary sentence structure provides a basis for the systematic handling of grammatical problems and lexical choices. Basic processes: analysis; transfer; generation (synthesis).

19 @Anna Sågvall Hein, MÖSG 2004 Transfer-based translation, 2. Knowledge-intensive language modules: dictionary and grammar of the source language; transfer dictionary and transfer rules; dictionary and grammar of the target language.

20 @Anna Sågvall Hein, MÖSG 2004 Multra: transfer-based translation engine; high quality; focus on restricted domains; developed at Uppsala University.

22 Multra formalisms. Intermediary structure: feature structure (grammatical function & constituency). Analysis grammar: procedural. Transfer: unification-based (Beskow 93). Synthesis: PATR-like style (Beskow 93).

23 @Anna Sågvall Hein, MÖSG 2004 Simplistic approach sentence splitting tokenisation handling capital letters dictionary look-up and lexical substitution copying unknown words, digits, signs of punctuation etc. formal editing
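
The steps of the simplistic approach can be chained as in the sketch below. It is a minimal illustration under stated assumptions: the four-entry lexicon and the input text are hypothetical, and the regular expressions are one simple choice for sentence splitting and tokenisation, not the method of any particular system.

```python
import re

# Hypothetical mini-dictionary (lower-cased Swedish -> English).
LEXICON = {"byt": "replace", "filter": "filter", "och": "and", "olja": "oil"}

def split_sentences(text):
    # Sentence splitting: break after ., ! or ? followed by whitespace.
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]

def tokenise(sentence):
    # Tokenisation: words (including digits) and single punctuation marks.
    return re.findall(r"\w+|[^\w\s]", sentence)

def translate_sentence(sentence):
    out = []
    for tok in tokenise(sentence):
        # Capital letters: look up in lower case.
        # Unknown words, digits and punctuation are copied unchanged.
        out.append(LEXICON.get(tok.lower(), tok))
    # Formal editing: re-attach punctuation and restore the initial capital.
    text = re.sub(r"\s+([.,;:!?])", r"\1", " ".join(out))
    return text[:1].upper() + text[1:]

def translate(text):
    return " ".join(translate_sentence(s) for s in split_sentences(text))

result = translate("Byt filter och olja. Byt ZF 590.")
print(result)  # Replace filter and oil. Replace ZF 590.
```

Nothing in this chain looks at sentence structure, so it inherits the ambiguity, inflection and word order problems of direct translation.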

24 @Anna Sågvall Hein, MÖSG 2004 Ex. 1: Multra. Sv. I oljefilterhållaren sitter en överströmningsventil. → En. The oil filter retainer has an overflow valve. (from the Scania corpus) sitter → has; adv → subj; subj → obj

25 @Anna Sågvall Hein, MÖSG 2004 Ex. 2. Sv. Fyll på olja i växellådan. → En. Fill gearbox with oil. (from the Scania corpus) fyll på → fill; obj → adv; adv → obj

26 @Anna Sågvall Hein, MÖSG 2004 Ex. 3: Multra. Detta filter ska bytas med jämna mellanrum. → This filter must be renewed at regular intervals. Lexical choices in the context: ska – must; byta – renew; med – at; jämna – regular; mellanrum – interval

27 @Anna Sågvall Hein, MÖSG 2004 Ex. 4: Multra. Beskrivningen gäller för automatväxellådor med beteckning ZF 4/HP500, 590 och 600. → The description applies to automatic gearboxes with the designations ZF 4/5HP500, 590 and 600. gäller – applies to; beteckning – the designations
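
The change of grammatical functions in Ex. 1 (the Swedish adverbial becomes the English subject, the Swedish subject becomes the English object) can be sketched as a transfer rule over a small feature structure. This is a simplified illustration and not Multra's actual formalism: the hand-written analysis, the rule format and the dictionary entries are all assumptions made for the example.

```python
# Hand-written analysis of "I oljefilterhållaren sitter en överströmningsventil."
# A real system would obtain this from a parser; the structure is an assumption.
source = {
    "pred": "sitter",
    "adv": "oljefilterhållaren",
    "subj": "överströmningsventil",
}

# Transfer rule: a new predicate plus a mapping from source to target functions.
TRANSFER_RULES = {
    "sitter": {"pred": "have", "roles": {"adv": "subj", "subj": "obj"}},
}
# Hypothetical transfer dictionary for the noun phrases.
LEXICON = {
    "oljefilterhållaren": "the oil filter retainer",
    "överströmningsventil": "an overflow valve",
}

def transfer(src):
    rule = TRANSFER_RULES[src["pred"]]
    target = {"pred": rule["pred"]}
    for src_role, tgt_role in rule["roles"].items():
        target[tgt_role] = LEXICON[src[src_role]]
    return target

def generate(tgt):
    # Minimal English synthesis: subject, verb (3rd person singular), object.
    verb = {"have": "has"}[tgt["pred"]]
    text = f'{tgt["subj"]} {verb} {tgt["obj"]}.'
    return text[0].upper() + text[1:]

target = transfer(source)
english = generate(target)
print(english)  # The oil filter retainer has an overflow valve.
```

The three functions correspond to the analysis (here done by hand), transfer and generation steps; the word order of the output comes from the target grammar, not from the source sentence.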

28 @Anna Sågvall Hein, MÖSG 2004 Feasibility of machine translation: re-use of translations; quality in relation to purpose; sublanguage; spell-checked and grammar-checked SL; controlled language; human-machine interaction; evaluation data and criteria.

29 @Anna Sågvall Hein, MÖSG 2004 Re-use of previous translations translation memories translation dictionaries statistical machine translation

30 @Anna Sågvall Hein, MÖSG 2004 Re-use techniques,1 sentence alignment –linking source and target sentences pairwise –success rate close to 100 % –translation memories
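
Once sentences are aligned pairwise, a translation memory is essentially a lookup table with fuzzy matching. The sketch below assumes a two-entry memory built from the Scania examples in this deck and uses Python's standard `difflib.SequenceMatcher` as the similarity measure; the 0.8 threshold and the query sentences are arbitrary choices for illustration.

```python
from difflib import SequenceMatcher

# Hypothetical translation memory: aligned source/target sentence pairs.
MEMORY = [
    ("Fyll på olja i växellådan.", "Fill gearbox with oil."),
    ("Detta filter ska bytas med jämna mellanrum.",
     "This filter must be renewed at regular intervals."),
]

def lookup(sentence, memory, threshold=0.8):
    """Return (score, source, target) for the best match, or None below threshold."""
    best = None
    for src, tgt in memory:
        score = SequenceMatcher(None, sentence.lower(), src.lower()).ratio()
        if best is None or score > best[0]:
            best = (score, src, tgt)
    return best if best and best[0] >= threshold else None

exact = lookup("Fyll på olja i växellådan.", MEMORY)
fuzzy = lookup("Fyll på olja i växellådan igen.", MEMORY)
miss = lookup("Vad kan vi lära av Arawetestammen?", MEMORY)
```

An exact hit can be reused as it stands, a fuzzy hit is offered to the translator for editing, and a miss is passed on to a human or to an MT engine.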

31 @Anna Sågvall Hein, MÖSG 2004 Re-use techniques, 2 word alignment –linking sub-sentence segments, typically, source and target words and phrases pairwise –large-scale processing –success rate close to 80 % –translation dictionaries –statistical machine translation

32 @Anna Sågvall Hein, MÖSG 2004 A word alignment example. Jag tar mittplatsen, som jag inte tycker om. I take the middle seat, which I dislike. jag – I; tar – take; mittplatsen – the middle seat; som – which; jag – I; inte tycker om – dislike (from Tiedemann 2003)
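
One simple way to propose word links from sentence-aligned text is to score co-occurrence, for example with the Dice coefficient. The sketch below is an illustration under assumptions: the four-sentence corpus is invented for the example, and it links single words only, so it would not find many-to-one links such as inte tycker om – dislike.

```python
from collections import Counter
from itertools import product

# Hypothetical sentence-aligned mini-corpus (Swedish, English), lower-cased.
CORPUS = [
    ("jag tar boken", "i take the book"),
    ("jag ser boken", "i see the book"),
    ("du tar bilen", "you take the car"),
    ("jag tar bilen", "i take the car"),
]

def dice_lexicon(corpus):
    """Link each source word to the target word with the highest Dice score."""
    src_count, tgt_count, pair_count = Counter(), Counter(), Counter()
    for src, tgt in corpus:
        s_words, t_words = set(src.split()), set(tgt.split())
        src_count.update(s_words)
        tgt_count.update(t_words)
        # Count every source/target word pair that co-occurs in a sentence pair.
        pair_count.update(product(s_words, t_words))
    lexicon = {}
    for s in src_count:
        scores = {
            t: 2 * pair_count[(s, t)] / (src_count[s] + tgt_count[t])
            for t in tgt_count
            if pair_count[(s, t)]
        }
        lexicon[s] = max(scores, key=scores.get)
    return lexicon

lexicon = dice_lexicon(CORPUS)
print(lexicon["tar"], lexicon["boken"])  # take book
```

The resulting table is a raw translation dictionary of the kind that can feed a translation memory or a statistical system.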

33 @Anna Sågvall Hein, MÖSG 2004 Statistical machine translation: large-scale word alignment (raw translation dictionary); direct translation using the dictionary (no translation rules); smoothing the translation by means of a language model (statistically based decoding algorithm). Crucial: Arabic – English, Hindi – English
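
The role of the language model can be shown on the ambiguity from the Systran example, where lära came out as "faith" instead of "learn". The sketch below is a toy under assumptions: the dictionary options and the three-sentence English corpus are invented, the model is a bigram model with add-one smoothing, and decoding is brute-force enumeration instead of a real decoding algorithm.

```python
import math
from collections import Counter
from itertools import product

# Hypothetical raw dictionary: each source word with its candidate translations.
OPTIONS = {"vad": ["what"], "kan": ["can"], "vi": ["we"], "lära": ["faith", "learn"]}
# Hypothetical monolingual English text for the language model.
ENGLISH = ["we learn from them", "what can we do", "faith can help"]

def train_bigrams(sentences):
    unigrams, bigrams = Counter(), Counter()
    for s in sentences:
        words = ["<s>"] + s.split()
        unigrams.update(words)
        bigrams.update(zip(words, words[1:]))
    return unigrams, bigrams

def log_prob(words, unigrams, bigrams):
    # Bigram log-probability with add-one smoothing.
    vocab = len(unigrams)
    words = ["<s>"] + words
    return sum(
        math.log((bigrams[(a, b)] + 1) / (unigrams[a] + vocab))
        for a, b in zip(words, words[1:])
    )

def decode(source, options, unigrams, bigrams):
    # Word-by-word candidates in source order; the language model picks the best.
    candidates = product(*(options[w] for w in source.split()))
    return max(
        (list(c) for c in candidates),
        key=lambda c: log_prob(c, unigrams, bigrams),
    )

unigrams, bigrams = train_bigrams(ENGLISH)
best = decode("vad kan vi lära", OPTIONS, unigrams, bigrams)
print(" ".join(best))  # what can we learn
```

The model prefers "we learn" because that word pair occurs in the English text, without any rule stating that lära is a verb after kan.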

34 @Anna Sågvall Hein, MÖSG 2004 Quality publishing quality –high quality translation, good enough for publishing, typically, after inspection and minor editing browsing quality –low quality translation, comprehensible, typically, not good enough for editing and publishing, may contain grammatical errors, errors in word order, and wrong words

35 @Anna Sågvall Hein, MÖSG 2004 Translation purposes: translation (publishing quality); browsing (browsing quality); gisting (browsing quality); drafting (publishing/browsing quality?); cross-language information retrieval (browsing quality).

36 @Anna Sågvall Hein, MÖSG 2004 MT as a cross-language communication tool. MT is used not only for pure translation purposes but also for writing in a foreign language and for browsing (Hutchins 2001). Hutchins, J., 2001, Towards a new vision for MT, Introductory speech at MT Summit VIII conference, 18-22 September 2001 (http://ourworld.compuserve.com/homepages/WJHutchins/MTS-2001.htm)

37 @Anna Sågvall Hein, MÖSG 2004 Restrictions on the input language: sublanguage (text type, domain); controlled language; spell checked; grammar checked.

38 @Anna Sågvall Hein, MÖSG 2004 Typically general language – browsing quality restricted language – high quality

39 @Anna Sågvall Hein, MÖSG 2004 Spell checking and grammar checking. If there are spelling errors or typos in the SL, dictionary search will fail. If there are grammatical errors in the SL, grammatical analysis will fail. Where and how should spell and grammar checking be accounted for? Before or during the process?

40 @Anna Sågvall Hein, MÖSG 2004 Controlled language controlled vocabulary –full lexical coverage, e.g. Scania Swedish controlled grammar –full grammatical coverage language checker –e.g. Scania Checker
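
The vocabulary part of a language checker can be sketched as a set membership test. This is only an illustration and says nothing about how Scania Checker works: the approved word list and the test sentences are assumptions, and a real checker would also cover inflection and the controlled grammar.

```python
import re

# Hypothetical approved vocabulary of a controlled language (lower-cased word forms).
APPROVED = {"byt", "filter", "fyll", "på", "olja", "i", "växellådan"}

def check(sentence, vocabulary):
    """Return the words in the sentence that are not in the approved vocabulary."""
    words = re.findall(r"\w+", sentence.lower())
    return [w for w in words if w not in vocabulary]

ok = check("Fyll på olja i växellådan.", APPROVED)
flagged = check("Fyll på oljan i växellådan.", APPROVED)
print(ok)       # []
print(flagged)  # ['oljan']
```

Text that passes such a check is guaranteed to be covered by the translation dictionary, which is what "full lexical coverage" buys the MT system.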

41 @Anna Sågvall Hein, MÖSG 2004 Human intervention: before (language checking); during (e.g. ambiguity resolution); after (post-editing).

42 @Anna Sågvall Hein, MÖSG 2004 Evaluation of MT coverage (recall) quality (precision)
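
One possible reading of the two measures, stated here as an assumption since the slide does not define them: coverage is the share of source sentences for which the system delivers a translation, and quality is the share of delivered translations that a human judge accepts. The sample judgements below are invented for illustration.

```python
def evaluate(results):
    """results: list of (output, acceptable); output is None if nothing was produced."""
    attempted = [r for r in results if r[0] is not None]
    coverage = len(attempted) / len(results)
    accepted = sum(1 for _, acceptable in attempted if acceptable)
    quality = accepted / len(attempted) if attempted else 0.0
    return coverage, quality

# Hypothetical judgements for four source sentences.
sample = [
    ("Fill gearbox with oil.", True),
    ("What can we faith of the Arawete?", False),
    (None, False),
    ("The oil filter retainer has an overflow valve.", True),
]
coverage, quality = evaluate(sample)
print(coverage)  # 0.75
```

A system that only translates what it fully analyses tends to trade coverage for quality, while a system that always produces output does the opposite.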

43 @Anna Sågvall Hein, MÖSG 2004 Current trends in direct translation re-use of translations –translation memories of sentences and sub-sentence units such as words, phrases and larger units –example-based translation –statistical translation Will re-use of translations overcome the problems with the direct translation approach that were discussed above? If so, how can the problems be handled?

44 @Anna Sågvall Hein, MÖSG 2004 Why machine translation? Cheaper; faster; more consistent. When it succeeds...

45 @Anna Sågvall Hein, MÖSG 2004 Assignment: Hable Con Ella (en-sv). Make a general quality assessment of the translation. Suggest a possible use of a translation of this kind. Identify the steps that were taken in the translation. Specify the translation errors that were made and discuss them. Suggest improvements in the framework of the direct translation strategy. Justify them. Formalise them in a framework of your own choice. Discuss their general adequacy in the translation of Swedish to English.

