Published by Beatrix Alexander. Modified over 9 years ago.
1
Machine translation: context-based approach
Lucia Otoyo
2
Machine translation
- The computerized task of translating from one natural language to another
- Human vs. machine translation
- Difficulties of MT
3
Brief history of MT
- 17th century: Descartes and Leibniz
- 1930s: bilingual dictionary plus rules
- After World War II: Warren Weaver treats translation as decoding a message
- 1954: first public demonstration of MT, by IBM (spawned research)
- 1966: ALPAC report finds MT less accurate and more costly than human translation
- 1980s: increasing demand; the rule-based approach is born
- 1990s: parallel corpora approach
4
MT approaches
- Rule based
- Parallel corpora based
- Context based
- Conclusion
5
Rule-based approach
- Dominant in the 1980s
- Resources: a set of rules and a bilingual dictionary
- Steps:
  - Syntax: grammar
  - Semantics: meaning
  - Pragmatics: differences between languages
- Disadvantages:
  - Language experts are needed to write the rules
  - Each new language pair needs new rules
  - It is not possible to include all the rules
  - Rules have exceptions
6
Parallel corpora based
- Variants:
  - Example based (word frequency and combination)
  - Statistical (phrase extraction and combination)
- Resources: parallel corpora (pre-translated), decoder, alignment software
- Steps: disassemble the text into phrases, search the corpora and match phrases, substitute, align phrases to form the text
- Advantages:
  - Easy to apply to a new language
  - More readable, because it uses human pre-translated text
- Disadvantages:
  - General translation vs. specific domain
  - Lexical ambiguity
7
Context Based MT
[Architecture diagram. Labels recovered from the figure:]
- Pipeline: Source Language, N-gram segmenter, N-gram builder, Flooder, N-gram candidates, Edge Locker, Synonym generator, N-gram Connector, Overlap-based decoder, Target Language
- Data flow: substitution request, stored n-gram pairs, approved n-gram pairs
- Stores: Cache database, Cross-language n-gram database
- Resources: Bilingual dictionary, Target corpora, Source corpora, Gazetteers
8
CBMT edge illustration (edge locking)
‘This context based machine translation approach looks very interesting’.
1. ‘This context based machine’
2. ‘context based machine translation’
3. ‘based machine translation approach’
4. ‘machine translation approach looks’
5. ‘translation approach looks very’
6. ‘approach looks very interesting’
9
CBMT n-grams
- Break the source text down into n-grams (n = 4 to 8)
- Example: ‘This context based machine translation approach looks very interesting’.
- If n = 4, the n-grams are as follows:
  1. ‘This context based machine’
  2. ‘context based machine translation’
  3. ‘based machine translation approach’
  4. ‘machine translation approach looks’
  5. ‘translation approach looks very’
  6. ‘approach looks very interesting’
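The segmentation step on this slide can be sketched as a sliding window over the words of the sentence. This is a minimal illustration, not the system's actual segmenter; the function name `sliding_ngrams` and whitespace tokenization are assumptions made for the example.

```python
def sliding_ngrams(text, n=4):
    """Return every run of n consecutive words in text, left to right."""
    words = text.split()  # assumption: plain whitespace tokenization
    if len(words) < n:
        # Text shorter than n words: return it as a single short n-gram.
        return [" ".join(words)] if words else []
    return [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]


sentence = "This context based machine translation approach looks very interesting"
ngrams = sliding_ngrams(sentence, n=4)
for number, ngram in enumerate(ngrams, start=1):
    print(number, ngram)
```

A nine-word sentence with n = 4 gives six overlapping n-grams, matching the list on the slide.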
15
CBMT Flooding
- Search the monolingual corpora with the translated n-grams
- Produces a large number of n-grams, with different translations for each word
- Words can be in any order, to take differences between languages into account
- Each n-gram yields 100-3000 high-density matches
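One way to picture flooding is as a bag-of-words search: translate each source word with the dictionary, then scan the target corpus for windows that contain a translation of as many source words as possible, in any order. The sketch below works under that reading. The function `flood`, the `min_density` threshold, the toy Spanish-English dictionary, and the one-sentence corpus are all hypothetical, chosen only to keep the example small.

```python
def flood(source_ngram, dictionary, corpus_tokens, window=None, min_density=0.75):
    """Return (density, target n-gram) pairs for corpus windows that cover
    enough of the source words, ignoring word order."""
    source_words = source_ngram.split()
    # One set of candidate translations per source word.
    candidate_sets = [set(dictionary.get(w, [])) for w in source_words]
    window = window or len(source_words)
    matches = []
    for start in range(len(corpus_tokens) - window + 1):
        span = corpus_tokens[start:start + window]
        span_set = set(span)
        # A source word counts as covered if any of its translations is in the span.
        covered = sum(1 for cands in candidate_sets if cands & span_set)
        density = covered / len(source_words)
        if density >= min_density:
            matches.append((density, " ".join(span)))
    return sorted(matches, reverse=True)


# Toy data (hypothetical): several translations per source word.
toy_dictionary = {
    "traducción": ["translation", "rendering"],
    "automática": ["automatic", "machine"],
    "parece": ["seems", "looks"],
    "interesante": ["interesting"],
}
toy_corpus = "machine translation looks interesting to many researchers".split()

matches = flood("traducción automática parece interesante", toy_dictionary, toy_corpus)
for density, target_ngram in matches:
    print(density, target_ngram)
```

Because the match ignores order, the Spanish noun-adjective sequence still finds the English adjective-noun sequence. A real corpus of the size mentioned later in the deck would need an index rather than a linear scan.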
16
CBMT Target language lattice overlap maximization
- Align all the n-grams with each other
- Choose the ones with the highest number of left-side and right-side overlaps
- Eliminate non-overlapping or only partially overlapping n-grams
- Example:
  1. n-gram ‘This approach for computer’
  2. n-gram ‘This context based machine’
  3. n-gram ‘based machine translation approach’
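The overlap test can be illustrated on the three example n-grams. The sketch assumes that an overlap means a suffix of one n-gram equals a prefix of another, and it scores each candidate by the total number of overlapping words on both sides. The slide does not specify the scoring, so the names `overlap_len`, `overlap_scores`, and `keep_overlapping` and the scoring rule are illustrative assumptions.

```python
def overlap_len(left, right):
    """Length in words of the longest proper suffix of `left` that is
    also a prefix of `right` (0 if there is none)."""
    a, b = left.split(), right.split()
    for k in range(min(len(a), len(b)) - 1, 0, -1):
        if a[-k:] == b[:k]:
            return k
    return 0


def overlap_scores(candidates):
    """Score each candidate by its left-side plus right-side overlaps."""
    scores = {}
    for c in candidates:
        left = sum(overlap_len(other, c) for other in candidates if other != c)
        right = sum(overlap_len(c, other) for other in candidates if other != c)
        scores[c] = left + right
    return scores


def keep_overlapping(candidates):
    """Drop candidates that overlap with nothing else."""
    scores = overlap_scores(candidates)
    return [c for c in candidates if scores[c] > 0]


candidates = [
    "This approach for computer",
    "This context based machine",
    "based machine translation approach",
]
kept = keep_overlapping(candidates)
print(kept)
```

Here ‘This approach for computer’ overlaps with neither of the other two and is eliminated, while the remaining pair shares ‘based machine’.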
17
CBMT Cross-language database
- Stores cross-language n-gram correspondences for later use, to speed up the translation process
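In its simplest form, such a store behaves like a cache in front of the expensive search. The sketch below uses an in-memory dictionary. The class `CrossLanguageCache` and the stand-in function `slow_translate` are hypothetical; a real system would presumably persist the pairs in a database.

```python
class CrossLanguageCache:
    """Remember source n-gram -> target n-gram candidates after the first lookup."""

    def __init__(self, translate_fn):
        self._translate_fn = translate_fn  # the expensive search to avoid repeating
        self._store = {}
        self.misses = 0  # how often the expensive search actually ran

    def lookup(self, source_ngram):
        if source_ngram not in self._store:
            self.misses += 1
            self._store[source_ngram] = self._translate_fn(source_ngram)
        return self._store[source_ngram]


def slow_translate(source_ngram):
    # Hypothetical stand-in for flooding plus overlap decoding.
    toy_results = {"traducción automática": ["machine translation"]}
    return toy_results.get(source_ngram, [])


cache = CrossLanguageCache(slow_translate)
first = cache.lookup("traducción automática")
second = cache.lookup("traducción automática")  # served from the store
print(first, cache.misses)
```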
18
CBMT target language
- Find the globally longest target-language overlap with the highest match density
  1. ‘This context based machine’
  2. ‘context based machine translation’
  3. ‘based machine translation approach’
  4. ‘machine translation approach looks’
  5. ‘translation approach looks very’
  6. ‘approach looks very interesting’
- Result: ‘This context based machine translation approach looks very interesting’.
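Once a chain of overlapping target n-grams has been chosen, producing the output text amounts to joining them so that each shared word appears only once. The sketch below shows only this joining step and assumes the chain is already in order; it does not search for the globally best chain. The function name `merge_chain` is an assumption.

```python
def merge_chain(ngrams):
    """Join consecutive n-grams, writing each overlapping word only once."""
    result = ngrams[0].split()
    for ngram in ngrams[1:]:
        words = ngram.split()
        # Find the longest suffix of the text so far that starts this n-gram.
        k = min(len(result), len(words))
        while k > 0 and result[-k:] != words[:k]:
            k -= 1
        # Append only the words that are not already covered by the overlap.
        result.extend(words[k:])
    return " ".join(result)


chain = [
    "This context based machine",
    "context based machine translation",
    "based machine translation approach",
    "machine translation approach looks",
    "translation approach looks very",
    "approach looks very interesting",
]
merged = merge_chain(chain)
print(merged)
```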
19
CBMT synonymy
Word and phrasal synonymy
- Increases accuracy when no overlaps, or only partial overlaps, are found
- Synonyms are generated dynamically; there are no predefined coded patterns
Stages:
1. Search for the word in the corpus (returns 1,000 to 100,000 context-related phrases)
   1. ‘This establishment was founded in the year’
   2. ‘The number of people working in the establishment is far greater than’
   3. ‘The establishment is the first hotel’, etc.
20
CBMT synonymy (cont.)
2. Search the corpus with the phrases only, leaving the word blank
   1. ‘This ________ was founded in the year’
   2. ‘The number of people working in the _______ is far greater than’
   3. ‘The ________ is the first hotel’, etc.
3. This may return:
   1. ‘This company was founded in the year’
   2. ‘The number of people working in the business is far greater than’
   3. ‘The institution is the first hotel’, etc.
4. Rank the synonyms according to various criteria and flood
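The stages above can be sketched on a toy corpus: collect the contexts around the word, blank the word out, find other words that fill the same contexts, and rank them. The ranking criterion here is simply how often a filler occurs, since the slide only says "various criteria". The function `find_synonym_candidates`, the two-word context width, and the corpus sentences are assumptions made for the illustration.

```python
from collections import Counter


def find_synonym_candidates(word, corpus_sentences, context=2):
    """Rank words that occur in the same left/right contexts as `word`."""
    # Stages 1-2: collect contexts of the word, with the word itself blanked out.
    templates = set()
    for sentence in corpus_sentences:
        tokens = sentence.split()
        for i, tok in enumerate(tokens):
            if tok == word:
                left = tuple(tokens[max(0, i - context):i])
                right = tuple(tokens[i + 1:i + 1 + context])
                templates.add((left, right))
    # Stage 3: find other words that fill the same blanks.
    counts = Counter()
    for sentence in corpus_sentences:
        tokens = sentence.split()
        for i, tok in enumerate(tokens):
            if tok == word:
                continue
            left = tuple(tokens[max(0, i - context):i])
            right = tuple(tokens[i + 1:i + 1 + context])
            if (left, right) in templates:
                counts[tok] += 1
    # Stage 4 (simplified): rank by how often each filler was seen.
    return counts.most_common()


toy_corpus = [
    "This establishment was founded in the year",
    "The establishment is the first hotel",
    "This company was founded in the year",
    "This company was founded by two brothers",
    "The institution is the first hotel",
    "This company was sold last year",
]
ranked = find_synonym_candidates("establishment", toy_corpus)
print(ranked)
```

With only two words of context on each side, unrelated words can fill the same blank, which is why a ranking stage is needed before the candidates are used for flooding.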
21
CBMT Edge locking
- The first and last words are confirmed by overlap only once or a few times
- Search for other source sentences in which the first and last words of the original n-gram also appear in the middle of a newly found n-gram
- This confirms their suitability within a particular context
- Also used for words around interior punctuation
22
CBMT Target corpora
- Monolingual
- Very large (50 GB to 1 TB)
- The bigger the corpus, the more accurate the translation
- Easy to obtain from the web
23
CBMT Bilingual dictionary
- Very large
- The bigger the dictionary, the more accurate the translation
- Usually widely available for most languages
- Used to translate the n-grams:
  - Yields a large number of n-grams, with different translations for each word
  - Words can be in any order, to take differences between languages into account
  - Each n-gram yields 100-3000 high-density matches
25
Conclusion
Can we?
- Create a universal foundation for all languages
- Eliminate the need for human translators
- Solve the biggest obstacle in MT: ambiguity
It does not seem so in the foreseeable future.