Published by Ralph Owen. Modified over 9 years ago.
1
Thanks to Dan Klein of UC Berkeley and Chris Manning of Stanford for many of the materials used in this lecture.
CS 479, section 1: Natural Language Processing
Lecture #33: Intro. to Machine Translation
This work is licensed under a Creative Commons Attribution-Share Alike 3.0 Unported License.
2
Announcements
  Reading Report #13: M&S ch. 13 on alignment and MT
    Due now; discussing at end of lecture today or on the group
  Homework 0.3
    Feedback
    Question one did not contribute to your grade
    Compare with the key
  Homework 0.4
    Posted Tuesday
3
Final Project
  Project #4
    Note the updates to the tutorial with the flowchart slides from lecture #29
  Project #5
    Instructions to be updated today
    Help session: Tuesday
  Propose-your-own
    Move forward
    Feedback to be sent today
  Project Report
    Early: Wednesday after Thanksgiving
    Due: Friday after Thanksgiving
  Check the schedule
  Plan enough time to succeed!
4
Quiz – keep the ideas fresh
  1. What are the four steps of the Expectation Maximization (EM) algorithm?
     Think of the document clustering example, if that helps.
  2. What is the primary purpose of EM?
5
Objectives
  Introduce the problem of machine translation
  Appreciate the need for alignment in statistical approaches to translation
6
Machine Translation is Hard
  REF: According to the data provided today by the Ministry of Foreign Trade and Economic Cooperation, as of November this year, China has actually utilized 46.959 billion US dollars of foreign capital, including 40.007 billion US dollars of direct investment from foreign businessmen.
  IBM4: the Ministry of Foreign Trade and Economic Cooperation, including foreign direct investment 40.007 billion US dollars today provide data include that year to November china actually using foreign 46.959 billion US dollars and
  Yamada & Knight: today’s available data of the Ministry of Foreign Trade and Economic Cooperation shows that china’s actual utilization of November this year will include 40.007 billion US dollars for the foreign direct investment among 46.959 billion US dollars in foreign capital
7
But MT is Real
  http://www.microsofttranslator.com/
  http://translate.google.com/
8
Why so hard? What makes translation so hard?
9
Problem: Non-Literal Translation
  French: Un train s'est également arrêté sans qu'aucun passager ne soit blessé.
  English: Injuries were also avoided by the automatic shutdown of a train.
  (Literally: "A train also stopped without any passenger being injured.")
12
History
  1950’s: Intensive research activity in MT
    Roll video …
  1960’s: Direct word-for-word replacement
  1966 (ALPAC): NRC Report on MT
    Conclusion: MT no longer worthy of serious scientific investigation.
  1966-1975: “Recovery period”
  1975-1985: Resurgence (Europe, Japan)
  1985-present: Gradual Resurgence (US)
13
How? How would you implement automatic translation on a computer?
14
Big Idea: Word Alignment
  Start with parallel corpora
  Learn word alignment
  Hidden variable: alignment from foreign (target) word to source word.
  Use EM!
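The idea of learning alignments from parallel text with EM can be sketched in a few lines. The code below is a minimal illustration in the style of IBM Model 1, not the lecture's own implementation: the function names `train_ibm_model1` and `align`, the three-sentence German-English corpus, and the omission of a NULL source word are all assumptions made to keep the example small.

```python
from collections import defaultdict

# Toy parallel corpus (hypothetical): (foreign tokens, English tokens).
corpus = [
    (["das", "Haus"], ["the", "house"]),
    (["das", "Buch"], ["the", "book"]),
    (["ein", "Buch"], ["a", "book"]),
]

def train_ibm_model1(corpus, iterations=20):
    """Estimate t[(f, e)] ~ P(f | e) with EM; the alignment is the hidden variable."""
    foreign_vocab = {f for fs, _ in corpus for f in fs}
    uniform = 1.0 / len(foreign_vocab)
    t = defaultdict(lambda: uniform)  # start from uniform translation probabilities
    for _ in range(iterations):
        count = defaultdict(float)  # expected count of f aligned to e
        total = defaultdict(float)  # expected count of e aligned to anything
        # E-step: distribute each foreign word over the English words in its sentence.
        for fs, es in corpus:
            for f in fs:
                norm = sum(t[(f, e)] for e in es)
                for e in es:
                    p = t[(f, e)] / norm
                    count[(f, e)] += p
                    total[e] += p
        # M-step: re-estimate P(f | e) from the expected counts.
        t = defaultdict(float, {(f, e): c / total[e] for (f, e), c in count.items()})
    return t

def align(fs, es, t):
    """Link each foreign word to its most probable English word."""
    return [(f, max(es, key=lambda e: t[(f, e)])) for f in fs]

if __name__ == "__main__":
    t = train_ibm_model1(corpus)
    print(align(["das", "Haus"], ["the", "house"], t))
```

No sentence pair tells the model which word goes with which, yet co-occurrence across sentences is enough: "das" appears with "the" twice, so EM shifts probability toward that pairing, which in turn frees "Haus" to pair with "house".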
15
How would you implement automatic translation on a computer?
16
Vauquois Triangle
[Figure: the Vauquois triangle. Analysis side, bottom to top: Source Text, Word Structure (via Morphological Analysis), Syntactic Structure (via Syntactic Analysis), Semantic Structure (via Semantic Analysis), and Interlingua at the apex (via Semantic Composition). Generation side, top to bottom: Semantic Structure (via Semantic Decomposition), Syntactic Structure (via Semantic Generation), Word Structure (via Syntactic Generation), Target Text (via Morphological Generation). Horizontal links between the two sides: Direct (word level), Syntactic Transfer, Semantic Transfer.]
17
Approaches (Vauquois triangle)
[Figure: the Vauquois triangle again. The approaches correspond to the levels at which one can cross from Source Text to Target Text: Direct, Syntactic Transfer, Semantic Transfer, Interlingua.]
18
Methods
  Rule-based Methods
    Expert system-like rewrite systems
    Lexicons constructed by people
    Can be very fast, and can accumulate a lot of knowledge over time
    e.g., SysTran – the engine behind the venerable Babelfish
  Statistical Methods
    Word-to-word translation
    Phrase-based translation
    Syntax-based translation (tree-to-tree, tree-to-string, etc.)
    Trained on parallel corpora
    Usually noisy-channel (at least in spirit), but increasingly direct
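As a rough illustration of what "noisy-channel" means here: the decoder picks the English sentence e that maximizes P(e) · P(f | e), combining a language model with a translation model. The sketch below assumes a fixed candidate list and uses invented probabilities; the tables `LM` and `TM` and the functions `score` and `decode` are hypothetical names, not part of any real system.

```python
import math

# Hypothetical toy probabilities, invented purely for illustration.
LM = {"the house": 0.6, "house the": 0.01}       # language model P(e)
TM = {("das Haus", "the house"): 0.3,            # translation model P(f | e)
      ("das Haus", "house the"): 0.3}

def score(foreign, english):
    """Noisy-channel score in log space: log P(e) + log P(f | e)."""
    return math.log(LM[english]) + math.log(TM[(foreign, english)])

def decode(foreign, candidates):
    """Return the candidate English sentence with the highest score."""
    return max(candidates, key=lambda e: score(foreign, e))

if __name__ == "__main__":
    print(decode("das Haus", ["house the", "the house"]))
```

In this toy setup the translation model cannot tell the two word orders apart, so the language model breaks the tie in favor of fluent English. A "direct" model would instead score P(e | f) without this factorization.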
19
Your Questions Take the discussion online
20
To be continued …