1
Machine Translation
  Prof. Alexandros Potamianos
  Dept. of Electrical & Computer Engineering
  Technical University of Crete, Greece
  May 2003
2
Outline
  Machine Translation
    Definition
    Applications
    Algorithms
    Language Types
  Algorithms
    Transfer method
    Inter-lingua
    Direct translation
    Statistical Machine Translation
3
Definitions
  Machine Translation
    Automatically translate text from one language to another
  Speech-to-Speech Translation
    Speech recognition, machine translation, text-to-speech
4
Machine Translation: Applications
  Rough translation, e.g., Systran Babel Fish
  Computer-aided human translation
    Post-editing by human
  Multilingual systems in limited domains, e.g.,
    Dialogue systems: high-quality speech-to-speech translation
    Information kiosks
  Cross-language information retrieval
5
Machine Translation: Algorithms
  Transfer Model
    Maps languages at the syntactic parse tree level
  Interlingua
    Maps languages via a common new virtual language
  Direct Translation
    Maps languages at the lexical, phrase and shallow syntactic level
  Statistical Machine Translation
    Statistical direct translation models
    Models fluency of generated text and faithfulness of translation
6
Language Types
  Universal features
    Syntactic classes: nouns, verbs
    Words have common meaning
  Language typology
    Morphological:
      isolating (one morpheme per word) to polysynthetic
      agglutinative (clean morpheme boundaries) to fusional languages
    Syntactic:
      SVO (subject-verb-object)
      SOV (e.g., Hindi, Japanese)
      VSO (e.g., Irish)
7
Other Language Differences
  Morphology richness:
    ENG The dog belongs to the tall child
    GR Ο σκύλος είναι του ψηλού παιδιού
  Use of articles:
    ENG I am running
    GR Τρέχω
  Lexical differences (aka lexical gap):
    ENG older, elder
    GR μεγαλύτερος
  Format differences, e.g., dates, numbers
8
Transfer Models
  Three stages
    Analysis: syntactic parse (ambiguity might not be a problem)
    Transfer: syntactic transformation rules
    Generation: lexical and syntactic transfer
  Lexical transfer
    Function words are syntactically transferred, i.e., part of the rules
    Content words are lexically transferred
9
Transfer Model: Example
  Example (English to French):
    ENG bad road
    adjective: bad, noun: road (analysis)
    noun: road, adjective: bad (transfer)
    noun: route, adjective: mauvaise (generation)
    FR route mauvaise
  Altavista Babel Fish (Systran): http://babelfish.altavista.com/
    ENG: bad road    FR: mauvaise route (wrong road)
    ENG: wrong road  FR: mauvaise route (wrong road)
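The three stages of the "bad road" example can be sketched in code. This is a minimal illustration, not how Systran or any real transfer system works: the two-word lexicon, the part-of-speech table, the single adjective-noun reordering rule, and all function names are assumptions made up for this example.

```python
# Toy resources covering only the slide's example (hypothetical).
POS = {"bad": "adjective", "road": "noun"}
LEXICON = {"bad": "mauvaise", "road": "route"}


def analyze(sentence):
    """Analysis: tag each English word with its syntactic class."""
    return [(POS[word], word) for word in sentence.split()]


def transfer(tagged):
    """Transfer: rewrite adjective-noun order as noun-adjective order."""
    out = list(tagged)
    i = 0
    while i < len(out) - 1:
        if out[i][0] == "adjective" and out[i + 1][0] == "noun":
            out[i], out[i + 1] = out[i + 1], out[i]
            i += 2  # skip past the pair that was just reordered
        else:
            i += 1
    return out


def generate(tagged):
    """Generation: replace each content word using the bilingual lexicon."""
    return " ".join(LEXICON[word] for _, word in tagged)


def translate(sentence):
    return generate(transfer(analyze(sentence)))


print(translate("bad road"))  # route mauvaise
```

Keeping the reordering rule separate from the lexicon mirrors the split between syntactic transfer and lexical transfer.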
10
Inter-lingua
  Motivation:
    Translation between n languages: n(n-1)/2 language pairs, each needing its own rules
    With inter-lingua: 2n rule sets (into and out of the inter-lingua for each language)
  Assertion: There is a common semantic representation across languages
  Algorithm:
    Perform syntactic analysis
    Perform semantic analysis in inter-lingua ontology representation
    Generate syntactic tree in new language
    Generate surface form using lexical transfer
  Problem: inter-lingua is often English!
  Useful for small domains
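The two counts in the motivation can be compared numerically. A minimal sketch, assuming one unit of work per unordered language pair for the pairwise approach and two per language (into and out of the inter-lingua) for the inter-lingua approach; the function names are illustrative.

```python
def pairwise_rule_sets(n: int) -> int:
    """Unordered language pairs among n languages: n(n-1)/2."""
    return n * (n - 1) // 2


def interlingua_rule_sets(n: int) -> int:
    """One analyzer and one generator per language: 2n."""
    return 2 * n


for n in (3, 5, 10, 20):
    print(n, pairwise_rule_sets(n), interlingua_rule_sets(n))
```

Under these assumptions the two counts are equal at n = 5, and the inter-lingua count is smaller for every larger n.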
11
Direct Translation
  Motivation
    Syntactic and (especially) semantic parsing often fails
    Inter-lingua and transfer models are hard to build
  Direct translation algorithm
    Morphological analysis
    Lexical transfer of content words
    Various work relating to prepositions
    SVO re-arrangements
    Miscellany
    Morphological generation
  Example: Japanese to English
12
Direct Translation: Overall
  A realistic approach
    Uses syntax/semantics as needed
    Robust (island) parsing
    Shallow parsing
  Works only for language pairs
    Can be extended with (e.g.) English as inter-lingua
13
Statistical Machine Translation
  Motivation
    Why write rules? Machine learning techniques can do the job for you
  Requirement
    Large bilingual (parallel) corpora
    Typically alignment required at the sentence level
  Bayesian Formulation (Brown et al., 1993)
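The Bayesian formulation of Brown et al. (1993) is usually written as a noisy-channel decision rule. As a sketch in this deck's notation, assuming translation from a Greek string G into an English string E:

```latex
\hat{E} \;=\; \arg\max_{E} \Pr(E \mid G)
        \;=\; \arg\max_{E} \frac{\Pr(E)\,\Pr(G \mid E)}{\Pr(G)}
        \;=\; \arg\max_{E} \Pr(E)\,\Pr(G \mid E)
```

The denominator Pr(G) is dropped because it does not depend on E. Pr(E) is the language model, which scores fluency, and Pr(G | E) is the translation model, which scores faithfulness.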
14
Statistical Machine Translation
  Step 1: Preprocessing (manual or semi-automatic)
    Clean parallel corpus
    Segment at the sentence level
  Step 2: Alignment (automatic)
    ENG: And the program has been implemented
    GR: Το πρόγραμμα τέθηκε σε εφαρμογή
    The(1) program(2) has(3) been(3) implemented(3,4,5)
    The(1) program(2) has(3,4,5) been(3,4,5) implemented(3,4,5)
  Step 3: Translation
    Models: Pr(G,A,E)
    G and E are the Greek and English strings
    A is a random alignment between them
15
Statistical Machine Translation
  Models (Brown Models 1, 2, 3, 4, 5):
    Greek g and English e strings
    Number of words in Greek string: m
    Alignment of the j-th word is a_j
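Model 1 is the simplest of the five models and can be trained with expectation-maximization (EM). The sketch below estimates a word-translation table t(g | e) from sentence-aligned pairs. It is an illustration under assumptions, not the slide's own formula: the three-sentence toy corpus, the "NULL" empty-word token, the iteration count, and the function name train_model1 are all made up for this example.

```python
from collections import defaultdict


def train_model1(pairs, iterations=10):
    """Estimate t(g | e) by EM from (english_tokens, greek_tokens) pairs."""
    greek_vocab = {g for _, greek in pairs for g in greek}
    # Uniform initialisation over the Greek vocabulary.
    t = defaultdict(lambda: 1.0 / len(greek_vocab))
    for _ in range(iterations):
        count = defaultdict(float)  # expected count of g aligned to e
        total = defaultdict(float)  # expected count of e aligned to anything
        for english, greek in pairs:
            # The empty word lets a Greek word align to no English word.
            english = ["NULL"] + english
            for g in greek:
                # E-step: share one count for g among the candidate e words.
                norm = sum(t[(g, e)] for e in english)
                for e in english:
                    p = t[(g, e)] / norm
                    count[(g, e)] += p
                    total[e] += p
        # M-step: renormalise the expected counts into probabilities.
        for (g, e), c in count.items():
            t[(g, e)] = c / total[e]
    return t


# Toy parallel corpus (hypothetical, for illustration only).
pairs = [
    (["the", "program"], ["το", "πρόγραμμα"]),
    (["the", "house"], ["το", "σπίτι"]),
    (["a", "house"], ["ένα", "σπίτι"]),
]
t = train_model1(pairs)
for g, e in [("το", "the"), ("σπίτι", "house"), ("σπίτι", "the")]:
    print(f"t({g} | {e}) = {t[(g, e)]:.3f}")
```

Even on this tiny corpus, co-occurrence alone pushes probability toward the intuitive pairs: "σπίτι" appears in both sentences containing "house", so t(σπίτι | house) ends up larger than t(σπίτι | the).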
16
Statistical Machine Translation
  Best translation
    Faithfulness
    Fluency
  Example-based machine translation
    Ability to store phrases in a bilingual dictionary
    Translation memory
    Systran and most translation houses use this
17
Evaluation
  Edit cost
    Distance between standard (human-produced) and machine-generated translation
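One common way to make "edit cost" concrete is word-level Levenshtein distance. The sketch below assumes whitespace tokenization and unit cost for each insertion, deletion, and substitution; the slide does not specify these choices, and the function name is illustrative.

```python
def edit_distance(reference, hypothesis):
    """Word-level Levenshtein distance between two translations."""
    ref, hyp = reference.split(), hypothesis.split()
    # prev[j] = cost of turning the ref words seen so far into hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # delete r
                            curr[j - 1] + 1,      # insert h
                            prev[j - 1] + cost))  # match or substitute
        prev = curr
    return prev[-1]


# Human reference vs. machine output for "bad road".
print(edit_distance("route mauvaise", "mauvaise route"))  # 2
```

A lower distance means the machine output needs fewer word edits to match the human-produced reference.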