Lecture 9: Machine Translation (I) October 25, 2005 Dan Jurafsky


1 Lecture 9: Machine Translation (I) October 25, 2005 Dan Jurafsky
Thanks to Bonnie Dorr for some of these slides!! 11/18/2018

2 Outline for MT Week
Intro and a little history
Language Similarities and Divergences
Four main MT Approaches
  Transfer
  Interlingua
  Direct
  Statistical
Evaluation

3 What is MT?
Translating a text from one language to another automatically.

4 Machine Translation
Gloss: Dai-yu alone on bed top think-of-with-gratitude Bao-chai again listen to window outside bamboo tip plantain leaf of on-top rain sound sigh drop clear cold penetrate curtain not feeling again fall down tears come
Translation: As she lay there alone, Dai-yu’s thoughts turned to Bao-chai… Then she listened to the insistent rustle of the rain on the bamboos and plantains outside her window. The coldness penetrated the curtains of her bed. Almost without noticing it she had begun to cry.

5 Machine Translation
The Story of the Stone = The Dream of the Red Chamber (Cao Xueqin 1792)
Issues:
  Breaking up into words
  Breaking up into sentences
  Zero-anaphora
  Penetrate -> penetrated
  Bamboo tip plantain leaf -> bamboos and plantains
  Curtain -> curtains of her bed
  Rain sound sigh drop -> insistent rustle of the rain

6 What is MT not good for?
Really hard stuff
  Literature
  Natural spoken speech (meetings, court reporting)
Really important stuff
  Medical translation in hospitals, 911

7 What is MT good for?
Tasks for which a rough translation is fine
  Web pages
Tasks for which MT can be post-edited
  MT as first pass
  “Computer-aided human translation”
Tasks in sublanguage domains where high-quality MT is possible

8 Sublanguage domain
Weather forecasting
  “Cloudy with a chance of showers today and Thursday”
  “Low tonight 4”
Can be modeled completely enough to use raw MT output
Word classes and semantic features like MONTH, PLACE, DIRECTION, TIME POINT

9 MT History
1946 Booth and Weaver discuss MT at Rockefeller Foundation in New York; idea of dictionary-based direct translation
1949 Weaver memorandum popularized idea
1952 All 18 MT researchers in world meet at MIT
1954 IBM/Georgetown demo of Russian-English MT
Lots of labs take up MT

10 History of MT: Pessimism
1959/1960: Bar-Hillel “Report on the state of MT in US and GB”
  Argued FAHQT (fully automatic high-quality translation) too hard (semantic ambiguity, etc.)
  Should work on semi-automatic instead of automatic
  His argument: “Little John was looking for his toy box. Finally, he found it. The box was in the pen. John was very happy.”
  Only human knowledge lets us know that ‘playpens’ are bigger than boxes, but ‘writing pens’ are smaller
  His claim: we would have to encode all of human knowledge

11 History of MT: Pessimism
The ALPAC report
  Headed by John R. Pierce of Bell Labs
  Conclusions:
    Supply of human translators exceeds demand
    All the Soviet literature is already being translated
    MT has been a failure: all current MT work had to be post-edited
    Sponsored evaluations which showed that intelligibility and informativeness were worse than human translations
  Results:
    MT research suffered
      Funding loss
      Number of research labs declined
      Association for Machine Translation and Computational Linguistics dropped MT from its name

12 History of MT
1976 Meteo, weather forecasts from English to French
Systran (Babelfish) has been used for 40 years
1970’s: European focus in MT; mainly ignored in US
1980’s: ideas of using AI techniques in MT (KBMT, CMU)
1990’s:
  Commercial MT systems
  Statistical MT
  Speech-to-speech translation

13 Language Similarities and Divergences
Some aspects of human language are universal or near-universal, others diverge greatly.
Typology: the study of systematic cross-linguistic similarities and differences
What are the dimensions along which human languages vary?

14 Morphological Variation
Isolating languages
  Cantonese, Vietnamese: each word generally has one morpheme
Vs. Polysynthetic languages
  Siberian Yupik (‘Eskimo’): single word may have very many morphemes
Agglutinative languages
  Turkish: morphemes have clean boundaries
Vs. Fusion languages
  Russian: single affix may have many morphemes

15 Syntactic Variation
SVO (Subject-Verb-Object) languages
  English, German, French, Mandarin
SOV languages
  Japanese, Hindi
VSO languages
  Irish, Classical Arabic
SVO lgs generally prepositions: to Yuriko
SOV lgs generally postpositions: Yuriko ni

16 Segmentation Variation
Not every writing system has word boundaries marked
  Chinese, Japanese, Thai, Vietnamese
Some languages tend to have sentences that are quite long, closer to English paragraphs than sentences:
  Modern Standard Arabic, Chinese

17 Inferential Load
Some languages require the hearer to do more “figuring out” of who the various actors in the various events are:
  Japanese, Chinese
Other languages are pretty explicit about saying who did what to whom:
  English

18 Inferential Load (2)
All noun phrases in blue do not appear in Chinese text … But they are needed for a good translation

19 Lexical Divergences
Word to phrases:
  English “computer science” = French “informatique”
POS divergences
  Eng. ‘she likes/VERB to sing’
  Ger. ‘Sie singt gerne/ADV’
  Eng. ‘I’m hungry/ADJ’
  Sp. ‘tengo hambre/NOUN’

20 Lexical Divergences: Specificity
Grammatical constraints
  English has gender on pronouns, Mandarin not.
  So translating “3rd person” from Chinese to English, need to figure out gender of the person!
  Similarly from English “they” to French “ils/elles”
Semantic constraints
  English ‘brother’: Mandarin ‘gege’ (older) versus ‘didi’ (younger)
  English ‘wall’: German ‘Wand’ (inside), ‘Mauer’ (outside)
  German ‘Berg’: English ‘hill’ or ‘mountain’

21 Lexical Divergence: one-to-many

22 Lexical Divergence: lexical gaps
Japanese: no word for privacy
English: no word for Cantonese ‘haauseun’ or Japanese ‘oyakoko’ (something like ‘filial piety’)
English ‘cow’ versus ‘beef’, Cantonese ‘ngau’

23 Event-to-argument divergences
English: The bottle floated out.
Spanish: La botella salió flotando. (‘The bottle exited floating’)
Verb-framed lg: mark direction of motion on verb
  Spanish, French, Arabic, Hebrew, Japanese, Tamil, Polynesian, Mayan, Bantu families
Satellite-framed lg: mark direction of motion on satellite
  Crawl out, float off, jump down, walk over to, run after
  Rest of Indo-European, Hungarian, Finnish, Chinese

24 Structural divergences
G: Wir treffen uns am Mittwoch
E: We’ll meet on Wednesday

25 Head Swapping
E: X swim across Y
S: X cruzar Y nadando
E: I like to eat
G: Ich esse gern
E: I’d prefer vanilla
G: Mir wäre Vanille lieber

26 Thematic divergence
S: Y me gusta
E: I like Y
G: Mir fällt der Termin ein
E: I remember the date

27 Divergence counts from Bonnie Dorr
32% of sentences in UN Spanish/English Corpus (5K)
Type            Spanish               English            %
Categorial      X tener hambre        Y have hunger      98%
Conflational    X dar puñaladas a Z   X stab Z           83%
Structural      X entrar en Y         X enter Y          35%
Head Swapping   X cruzar Y nadando    X swim across Y    8%
Thematic        X gustar a Y          Y likes X          6%

28 MT on the web
Babelfish:

29 3 methods for MT
Direct
Transfer
Interlingua

30 Three MT Approaches: Direct, Transfer, Interlingual
This slide from Bonnie Dorr! Original metaphor due to Bernard Vauquois
[Figure: triangle diagram. Source side, bottom to top: Source Text, Morphological Analysis, Word Structure, Syntactic Analysis, Syntactic Structure, Semantic Analysis, Semantic Structure, Semantic Composition. Target side, top to bottom: Semantic Decomposition, Semantic Structure, Semantic Generation, Syntactic Structure, Syntactic Generation, Word Structure, Morphological Generation, Target Text. Links across the triangle: Direct (word structure level), Syntactic Transfer, Semantic Transfer.]

31 The Transfer Model
Idea: apply contrastive knowledge, i.e., knowledge about the difference between two languages
Steps:
  Analysis: Syntactically parse Source language
  Transfer: Rules to turn this parse into parse for Target language
  Generation: Generate Target sentence from parse tree

32 Transfer architecture

33 English to French
Generally
  English: Adjective Noun
  French: Noun Adjective
Note: not always true
  Route mauvaise ‘bad road, badly-paved road’
  Mauvaise route ‘wrong road’
But is a reasonable first approximation
Rule:
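As a rough sketch of the adjective-noun reordering idea on this slide, the following Python applies one transfer rule to a toy parse tree. The tuple-based tree format, the tag names, and the function names `swap_adj_noun` and `leaves` are assumptions made for illustration, not the lecture's notation, and the rule deliberately ignores exceptions such as "mauvaise route".

```python
# Toy transfer rule: English order "Adjective Noun" -> French order "Noun Adjective".
# Trees are (label, child, child, ...) tuples; a leaf is (POS, word).
# This representation is hypothetical, chosen only for the example.

def is_leaf(tree):
    return len(tree) == 2 and isinstance(tree[1], str)


def swap_adj_noun(tree):
    """Recursively swap adjacent ADJ, N children inside NP nodes."""
    if is_leaf(tree):
        return tree
    label, *children = tree
    children = [swap_adj_noun(c) for c in children]
    if label == "NP":
        reordered, i = [], 0
        while i < len(children):
            if (i + 1 < len(children)
                    and children[i][0] == "ADJ"
                    and children[i + 1][0] == "N"):
                # Emit the noun first, then the adjective.
                reordered.extend([children[i + 1], children[i]])
                i += 2
            else:
                reordered.append(children[i])
                i += 1
        children = reordered
    return (label, *children)


def leaves(tree):
    """Read the words off the tree from left to right."""
    if is_leaf(tree):
        return [tree[1]]
    return [w for c in tree[1:] for w in leaves(c)]


english = ("NP", ("DET", "the"), ("ADJ", "red"), ("N", "car"))
print(leaves(swap_adj_noun(english)))
```

The output keeps English words in the target-language order; lexical transfer would then replace the words themselves.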

34 Example: English to Japanese Transfer
Rule for Existential-there: delete “there” and convert 4th constituent to relative clause modifying the noun
Rule for relative clauses: reverse the order of the noun phrase and its relative clause
Syntax is done: apply lexical transfer.

35 English to Japanese Transfer
From “niwa no teire o suru ojiisan ita”
  Add “ga” to mark subject
  Choose verb to agree with subject
  Inflect verbs
Linearize tree:
  Niwa no teire o shite ita ojiisan ga ita
  Garden GEN upkeep OBJ do PASTPROG old man SUBJ was
  “There was an old man gardening”

36 E-to-J Transfer: rules used
Existential-There-Sentence:
  There1 Verb2 NP3 Postnominal4  ->  (NP -> NP3 Relative-Clause4) Verb2
Relative clause:
  NP -> NP1 Relative-Clause2  becomes  NP -> Relative-Clause2 NP1
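The two syntactic rules on this slide can be sketched as small tree rewrites. This is only an illustration under assumptions: the dict-based node format and the helper names (`node`, `existential_there`, `reverse_relative`, `words`) are hypothetical, and the example stops before lexical transfer, so the output is English words in Japanese-like order.

```python
# Toy versions of the two E-to-J syntactic transfer rules.
# A node is {"label": ..., "children": [...]}; leaves are plain strings.
# The format and helper names are assumptions made for this example.

def node(label, *children):
    return {"label": label, "children": list(children)}


def existential_there(sentence):
    """There1 Verb2 NP3 Postnominal4 -> (NP -> NP3 Relative-Clause4) Verb2."""
    _there, verb, np, postnominal = sentence["children"]
    rel = node("Relative-Clause", *postnominal["children"])
    return node("Sentence", node("NP", np, rel), verb)


def reverse_relative(tree):
    """Rewrite NP -> NP1 Relative-Clause2 as NP -> Relative-Clause2 NP1."""
    if isinstance(tree, str):
        return tree
    kids = [reverse_relative(k) for k in tree["children"]]
    if (tree["label"] == "NP" and len(kids) == 2
            and not isinstance(kids[1], str)
            and kids[1]["label"] == "Relative-Clause"):
        kids = [kids[1], kids[0]]
    return node(tree["label"], *kids)


def words(tree):
    """Read the words off the tree from left to right."""
    if isinstance(tree, str):
        return [tree]
    return [w for k in tree["children"] for w in words(k)]


source = node(
    "Existential-There-Sentence",
    node("There", "there"),
    node("Verb", "was"),
    node("NP", "an", "old", "man"),
    node("Postnominal", "gardening"),
)
transferred = reverse_relative(existential_there(source))
print(words(transferred))
```

After both rules the relative clause precedes the noun phrase and the verb is final, which is the order the later lexical transfer and generation steps start from.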

37 Lexical Transfer
Man:
  Ojiisan ‘old man’
  “Man is the only linguistic animal” -> Ningen ‘man, human being’
  Or Hito ‘person, persons’
Can treat like lexical ambiguity, disambiguate during parsing

38 Transfer: some problems
N² sets of transfer rules!
Grammar and lexicon full of language-specific stuff
Hard to build, hard to maintain

39 MT Method 2: Interlingua
Intuition: Instead of lg-lg knowledge rules, use the meaning of the sentence to help
Steps:
  1) translate source sentence into meaning representation
  2) generate target sentence from meaning.

40 Interlingua for “there was an old man gardening”
EVENT: GARDENING
  AGENT: MAN
    NUMBER: SG
    DEFINITENESS: INDEF
  ASPECT: PROGRESSIVE
  TENSE: PAST

41 Interlingua
Idea is that some of the MT work that we need to do is part of other NLP tasks
E.g., disambiguating E:book S:‘libro’ from E:book S:‘reservar’
So we could have concepts like BOOKVOLUME and RESERVE and solve this problem once for each language
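One way to picture "solving the problem once for each language" is a pair of per-language tables that meet at shared concept names. The sketch below is an assumption-laden toy: BOOKVOLUME and RESERVE come from the slide, but the tables, the use of part of speech as the disambiguating cue, and the function names are hypothetical.

```python
# Hypothetical per-language lexicons keyed on shared concept names.
# Analysis maps (word, part of speech) to a concept; generation maps a
# concept back to a word. Neither table mentions any other language.

ANALYSIS = {
    "en": {("book", "NOUN"): "BOOKVOLUME", ("book", "VERB"): "RESERVE"},
    "es": {("libro", "NOUN"): "BOOKVOLUME", ("reservar", "VERB"): "RESERVE"},
}

GENERATION = {
    "en": {"BOOKVOLUME": "book", "RESERVE": "book"},
    "es": {"BOOKVOLUME": "libro", "RESERVE": "reservar"},
}


def to_concept(word, pos, lang):
    """Source-side disambiguation: done once per language."""
    return ANALYSIS[lang][(word, pos)]


def from_concept(concept, lang):
    """Target-side lexical choice: also done once per language."""
    return GENERATION[lang][concept]


def translate_word(word, pos, src, tgt):
    return from_concept(to_concept(word, pos, src), tgt)


print(translate_word("book", "NOUN", "en", "es"))
print(translate_word("book", "VERB", "en", "es"))
```

Adding a third language under this scheme would mean adding one analysis table and one generation table, with no changes to the English or Spanish entries.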

42 Vauquois diagram

43 Direct Translation
Idea: more robust, word-specific models
Start with a Source language sentence
Write little transformations, directly on words, to turn it into a Target language sentence.

44 Direct MT J-to-E
Watashihatsukuenouenopenwojonniageta.
1) morphological analysis
   Watashi ha tsukue no ue no pen wo jon ni ageru PAST
2) lexical transfer of content words
   I ha desk no ue no pen wo John ni give PAST
3) various preposition work
   I ha pen on desk wo John to give PAST.
4) SVO rearrangements
   I give PAST pen on desk John to.
5) miscellany
   I give PAST the pen on the desk to John.
6) morphological generation
   I gave the pen on the desk to John.
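Stages 2 through 6 of this example can be sketched as a chain of small word-level transformations. The code below is a toy under stated assumptions: it starts from the already-analysed token sequence of stage 1, and every table and rule is a hypothetical, hand-written stand-in that covers only this one sentence, which is very much in the spirit of direct MT.

```python
# A toy direct-MT pipeline for the slide's example sentence.
# All lexicon entries and rules are illustrative assumptions.

LEXICON = {"watashi": "I", "tsukue": "desk", "pen": "pen",
           "jon": "John", "ageru": "give"}
PAST_FORMS = {"give": "gave"}


def lexical_transfer(tokens):
    """Stage 2: replace content words, leave particles untouched."""
    return [LEXICON.get(t, t) for t in tokens]


def preposition_work(tokens):
    """Stage 3: 'X no ue no Y' -> 'Y on X'; particle 'ni' -> 'to'."""
    out, i = [], 0
    while i < len(tokens):
        if tokens[i + 1:i + 4] == ["no", "ue", "no"] and i + 4 < len(tokens):
            out += [tokens[i + 4], "on", tokens[i]]
            i += 5
        else:
            out.append("to" if tokens[i] == "ni" else tokens[i])
            i += 1
    return out


def svo_rearrange(tokens):
    """Stage 4: move the clause-final verb and tense after the subject,
    dropping the topic marker 'ha' and the object marker 'wo'."""
    topic = tokens.index("ha")
    obj = tokens.index("wo")
    verb = tokens[-2:]  # e.g. ['give', 'PAST']
    return tokens[:topic] + verb + tokens[topic + 1:obj] + tokens[obj + 1:-2]


def miscellany(tokens):
    """Stage 5: insert articles and turn 'John to' into 'to John'."""
    out = []
    for t in tokens:
        if t in ("pen", "desk"):
            out += ["the", t]
        elif t == "to":
            out.insert(len(out) - 1, "to")  # place before the preceding noun
        else:
            out.append(t)
    return out


def morphological_generation(tokens):
    """Stage 6: realize 'verb PAST' as an inflected form."""
    out = []
    for t in tokens:
        if t == "PAST":
            out[-1] = PAST_FORMS[out[-1]]
        else:
            out.append(t)
    return " ".join(out) + "."


tokens = "watashi ha tsukue no ue no pen wo jon ni ageru PAST".split()
for stage in (lexical_transfer, preposition_work, svo_rearrange, miscellany):
    tokens = stage(tokens)
    print(" ".join(tokens))
result = morphological_generation(tokens)
print(result)
```

The intermediate lines printed by the loop correspond to the outputs of stages 2 to 5, and the final line is "I gave the pen on the desk to John." Each rule would break on almost any other sentence, which illustrates why direct systems accumulate many special cases.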

45 Direct MT stage 2 (ex. from Panov 1960 via Hutchins 1986)
Function direct-translate-much/many
  If preceding word is ‘how’
    Return ‘skol’ko’
  Else if preceding word is ‘as’
    Return ‘stol’ko zhe’
  Else if word is ‘much’
    If preceding word is ‘very’
      Return nil (not translated)
    Else if following word is a noun
      Return ‘mnogo’
  Else /* word is many */
    If preceding word is PREP and following is NOUN
      Return ‘mnogii’
    Else
      Return ‘mnogo’
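For readers who prefer runnable code, here is one possible Python rendering of the much/many procedure. The function signature and the part-of-speech tag strings are assumptions; the branching follows the slide. The "as" case returns stol'ko zhe, the standard transliteration of Russian "столько же", and the "much" case with no matching context returns None because the slide specifies no translation there.

```python
# A Python sketch of the word-specific much/many translation procedure.
# Arguments describe the local context of the English word being translated.

def direct_translate_much_many(word, prev_word=None, prev_pos=None, next_pos=None):
    """Return a transliterated Russian rendering, or None if untranslated."""
    if prev_word == "how":
        return "skol'ko"
    if prev_word == "as":
        return "stol'ko zhe"
    if word == "much":
        if prev_word == "very":
            return None  # 'very much': not translated
        if next_pos == "NOUN":
            return "mnogo"
        return None  # no rule given for other contexts of 'much'
    # Otherwise the word is 'many'.
    if prev_pos == "PREP" and next_pos == "NOUN":
        return "mnogii"
    return "mnogo"


print(direct_translate_much_many("much", prev_word="how"))
print(direct_translate_much_many("many", prev_word="of", prev_pos="PREP", next_pos="NOUN"))
print(direct_translate_much_many("many", next_pos="NOUN"))
```

Note how the whole procedure is keyed to a single English word pair; a direct system needs a routine like this for every word whose translation depends on context.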

46 Three MT Approaches: Direct, Transfer, Interlingual
This slide from Bonnie Dorr! Original metaphor due to Bernard Vauquois
[Figure: triangle diagram. Source side, bottom to top: Source Text, Morphological Analysis, Word Structure, Syntactic Analysis, Syntactic Structure, Semantic Analysis, Semantic Structure, Semantic Composition. Target side, top to bottom: Semantic Decomposition, Semantic Structure, Semantic Generation, Syntactic Structure, Syntactic Generation, Word Structure, Morphological Generation, Target Text. Links across the triangle: Direct (word structure level), Syntactic Transfer, Semantic Transfer.]

47 3 methods pros and cons
Thanks to Bonnie Dorr!

48 Direct MT: pros and cons (Bonnie Dorr)
Pros
  Fast
  Simple
  Cheap
  No translation rules hidden in lexicon
Cons
  Unreliable
  Not powerful
  Rule proliferation
  Requires lots of context
  Major restructuring after lexical substitution

49 Interlingual MT: pros and cons (B. Dorr)
Pros
  Avoids the N² problem
  Easier to write rules
Cons
  Semantics is HARD
  Useful information lost (paraphrase)
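The N² problem can be made concrete by counting components. The sketch below assumes one transfer rule set per ordered language pair and, for the interlingua, one analyzer plus one generator per language; this way of counting is a common convention rather than something stated on the slide, and the function names are hypothetical.

```python
# Counting the components needed to translate among n languages.

def transfer_modules(n):
    """One transfer rule set per ordered language pair: n * (n - 1)."""
    return n * (n - 1)


def interlingua_modules(n):
    """One analyzer and one generator per language: 2 * n."""
    return 2 * n


for n in (2, 5, 10, 20):
    print(n, transfer_modules(n), interlingua_modules(n))
```

Under these assumptions the transfer count grows quadratically while the interlingua count grows linearly, so for 10 languages the comparison is 90 rule sets against 20 modules.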

50 Summary
Intro and a little history
Language Similarities and Divergences
Four main MT Approaches
  Transfer
  Interlingua
  Direct
  Statistical
Evaluation

