Thanks to Dan Klein of UC Berkeley and Chris Manning of Stanford for many of the materials used in this lecture. CS 479, section 1: Natural Language Processing.


1 Thanks to Dan Klein of UC Berkeley and Chris Manning of Stanford for many of the materials used in this lecture.
CS 479, section 1: Natural Language Processing
Lecture #33: Intro to Machine Translation
This work is licensed under a Creative Commons Attribution-Share Alike 3.0 Unported License.

2 Announcements
• Reading Report #13: M&S ch. 13 on alignment and MT
  • Due now; we will discuss it at the end of lecture today or on the group
• Homework 0.3 feedback
  • Question one did not contribute to your grade
  • Compare with the key
• Homework 0.4
  • Posted Tuesday

3 Final Project
• Project #4
  • Note the updates to the tutorial with the flowchart slides from lecture #29
• Project #5
  • Instructions to be updated today
  • Help session: Tuesday
• Propose-your-own
  • Move forward
  • Feedback to be sent today
• Project Report:
  • Early: Wednesday after Thanksgiving
  • Due: Friday after Thanksgiving
  • Check the schedule
• Plan enough time to succeed!

4 Quiz – keep the ideas fresh
1. What are the four steps of the Expectation Maximization (EM) algorithm?
   • Think of the document clustering example, if that helps
2. What is the primary purpose of EM?

5 Objectives
• Introduce the problem of machine translation
• Appreciate the need for alignment in statistical approaches to translation

6 Machine Translation is Hard
REF: According to the data provided today by the Ministry of Foreign Trade and Economic Cooperation, as of November this year, China has actually utilized 46.959 billion US dollars of foreign capital, including 40.007 billion US dollars of direct investment from foreign businessmen.
IBM4: the Ministry of Foreign Trade and Economic Cooperation, including foreign direct investment 40.007 billion US dollars today provide data include that year to November china actually using foreign 46.959 billion US dollars and
Yamada & Knight: today’s available data of the Ministry of Foreign Trade and Economic Cooperation shows that china’s actual utilization of November this year will include 40.007 billion US dollars for the foreign direct investment among 46.959 billion US dollars in foreign capital

7 But MT is Real
• http://www.microsofttranslator.com/
• http://translate.google.com/

8 Why so hard?
• What makes translation so hard?

9 Problem: Non-Literal Translation
Un train s'est également arrêté sans qu'aucun passager ne soit blessé.
(Literally: "A train also stopped without any passenger being injured.")
Injuries were also avoided by the automatic shutdown of a train.

10 History
• 1950s: Intensive research activity in MT
  • Roll video …

11

12 History
• 1950s: Intensive research activity in MT
  • Roll video …
• 1960s: Direct word-for-word replacement
• 1966 (ALPAC): NRC report on MT
  • Conclusion: MT no longer worthy of serious scientific investigation.
• 1966-1975: “Recovery period”
• 1975-1985: Resurgence (Europe, Japan)
• 1985-present: Gradual resurgence (US)

13 How?
• How would you implement automatic translation on a computer?

14 Big Idea: Word Alignment
• Start with parallel corpora
• Learn word alignment
  • Hidden variable: alignment from foreign (target) word to source word.
• Use EM!
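One standard way to make "learn word alignment with EM" concrete is IBM Model 1. The slide does not say which alignment model it has in mind, so treat this as an illustrative choice. Below is a minimal Python sketch, assuming each sentence pair is given as a list of foreign tokens and a list of English tokens. Each foreign word's alignment to an English word (or to an added NULL token) is the hidden variable. The E-step computes each alignment's posterior under the current translation table t(f | e) and accumulates expected counts, and the M-step renormalizes those counts. The function name `train_ibm_model1` and the toy German-English corpus are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# A sentence pair is (foreign tokens, English tokens).
SentencePair = Tuple[List[str], List[str]]


def train_ibm_model1(
    corpus: List[SentencePair], iterations: int = 10
) -> Dict[Tuple[str, str], float]:
    """Estimate t(f | e) with EM; the alignment of each foreign word is hidden."""
    # Prepend a NULL token so a foreign word can align to "no English word".
    pairs = [(f_sent, ["NULL"] + e_sent) for f_sent, e_sent in corpus]
    f_vocab = {f for f_sent, _ in pairs for f in f_sent}

    # Initialize t(f | e) uniformly over the foreign vocabulary.
    t: Dict[Tuple[str, str], float] = defaultdict(lambda: 1.0 / len(f_vocab))

    for _ in range(iterations):
        count: Dict[Tuple[str, str], float] = defaultdict(float)  # expected c(f, e)
        total: Dict[str, float] = defaultdict(float)              # expected c(e)

        # E-step: under Model 1, the posterior that foreign word f aligns to
        # English word e is t(f | e) normalized over the English sentence.
        for f_sent, e_sent in pairs:
            for f in f_sent:
                norm = sum(t[(f, e)] for e in e_sent)
                for e in e_sent:
                    posterior = t[(f, e)] / norm
                    count[(f, e)] += posterior
                    total[e] += posterior

        # M-step: re-estimate t(f | e) from the expected counts.
        t = {(f, e): c / total[e] for (f, e), c in count.items()}

    return dict(t)


if __name__ == "__main__":
    toy_corpus = [
        (["das", "haus"], ["the", "house"]),
        (["das", "buch"], ["the", "book"]),
        (["ein", "buch"], ["a", "book"]),
    ]
    table = train_ibm_model1(toy_corpus, iterations=10)
    print(table[("haus", "house")], table[("das", "house")])
```

On the toy corpus, "das" and "haus" each co-occur with "house" exactly once, so raw co-occurrence counts cannot tell them apart. EM breaks the tie: "das" is better explained by "the" (they co-occur twice), so probability mass for "house" shifts toward "haus" over the iterations.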

15 How would you implement automatic translation on a computer?

16 Vauquois Triangle
[Figure: the Vauquois triangle. Analysis climbs the source side (Source Text → Morphological Analysis → Word Structure → Syntactic Analysis → Syntactic Structure → Semantic Analysis → Semantic Structure → Semantic Composition → Interlingua), and generation descends the target side (Interlingua → Semantic Decomposition → Semantic Structure → Semantic Generation → Syntactic Structure → Syntactic Generation → Word Structure → Morphological Generation → Target Text). Shortcuts across the triangle: Direct, Syntactic Transfer, Semantic Transfer.]

17 Approaches
[Figure: the same Vauquois triangle, labeling the approaches by the level at which source and target meet: Direct, Syntactic Transfer, Semantic Transfer, and Interlingua.]

18 Methods
• Rule-based methods
  • Expert-system-like rewrite systems
  • Lexicons constructed by people
  • Can be very fast, and can accumulate a lot of knowledge over time
  • e.g., SysTran – the engine behind the venerable Babelfish
• Statistical methods
  • Word-to-word translation
  • Phrase-based translation
  • Syntax-based translation (tree-to-tree, tree-to-string, etc.)
  • Trained on parallel corpora
  • Usually noisy-channel (at least in spirit), but increasingly direct
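For reference, the noisy-channel view mentioned on this slide can be written as follows. The sketch assumes the usual setup of translating a foreign sentence f into English e; the notation is ours, not the slides'. Bayes' rule splits the problem into a language model P(e), which scores fluency, and a translation model P(f | e), which scores adequacy. P(f) is fixed for a given input, so it drops out of the argmax.

```latex
\hat{e} = \arg\max_{e} P(e \mid f)
        = \arg\max_{e} \frac{P(f \mid e)\, P(e)}{P(f)}
        = \arg\max_{e} P(f \mid e)\, P(e)
```

"Direct" approaches instead model P(e | f) itself, without this decomposition.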

19 Your Questions
• Take the discussion online

20 To be continued …

