Thanks to Dan Klein of UC Berkeley and Chris Manning of Stanford for many of the materials used in this lecture. CS 479, section 1: Natural Language Processing.

Lecture #33: Intro. to Machine Translation
This work is licensed under a Creative Commons Attribution-Share Alike 3.0 Unported License.

Announcements  Reading Report #13: M&S ch. 13 on alignment and MT  Due now; discussing at end of lecture today or on the group  Homework 0.3 Feedback  Question one did not contribute to your grade  Compare with the key  Homework 0.4  Posted Tuesday

Final Project  Project #4  Note the updates to the tutorial with the flowchart slides from lecture #29  Project #5  Instructions to be updated today  Help session: Tuesday  Propose-your-own  Move forward  Feedback to be sent today  Project Report:  Early: Wednesday after Thanksgiving  Due: Friday after Thanksgiving  Check the schedule  Plan enough time to succeed!

Quiz: keep the ideas fresh
1. What are the four steps of the Expectation Maximization (EM) algorithm?
   - Think of the document clustering example, if that helps
2. What is the primary purpose of EM?

Objectives  Introduce the problem of machine translation  Appreciate the need for alignment in statistical approaches to translation

Machine Translation is Hard
REF: According to the data provided today by the Ministry of Foreign Trade and Economic Cooperation, as of November this year, China has actually utilized billion US dollars of foreign capital, including billion US dollars of direct investment from foreign businessmen.
IBM4: the Ministry of Foreign Trade and Economic Cooperation, including foreign direct investment billion US dollars today provide data include that year to November china actually using foreign billion US dollars and
Yamada & Knight: today’s available data of the Ministry of Foreign Trade and Economic Cooperation shows that china’s actual utilization of November this year will include billion US dollars for the foreign direct investment among billion US dollars in foreign capital

But MT is Real

Why so hard?  What makes translation so hard?

Problem: Non-Literal Translation
- French: Un train s'est également arrêté sans qu'aucun passager ne soit blessé.
- Literal gloss: "A train also stopped without any passenger being injured."
- English: Injuries were also avoided by the automatic shutdown of a train.
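To see why a translation like the English one above is hard to reach mechanically, here is a minimal sketch of naive direct word-for-word replacement applied to the French sentence. `LEXICON` and `word_for_word` are hypothetical, written only for this illustration; a real direct system would use a much larger lexicon plus some reordering rules.

```python
# A minimal sketch of direct word-for-word replacement.
# LEXICON is a hypothetical toy dictionary written for this example only.
LEXICON = {
    "un": "a",
    "train": "train",
    "s'est": "itself",
    "également": "also",
    "arrêté": "stopped",
    "sans": "without",
    "qu'aucun": "that no",
    "passager": "passenger",
    "ne": "",  # negation particle: no separate English word emitted
    "soit": "is",
    "blessé": "injured",
}


def word_for_word(sentence: str) -> str:
    """Replace each token by its dictionary entry, keeping source word order."""
    tokens = sentence.lower().rstrip(".").split()
    # Unknown words pass through unchanged; empty entries are dropped.
    out = [LEXICON.get(tok, tok) for tok in tokens]
    return " ".join(w for w in out if w)


if __name__ == "__main__":
    french = "Un train s'est également arrêté sans qu'aucun passager ne soit blessé."
    print(word_for_word(french))
    # a train itself also stopped without that no passenger is injured
```

The output is a stilted literal gloss. No word-level lexicon can produce the English version's restructuring around "injuries" and "automatic shutdown", which is exactly the non-literal translation problem.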

History  1950’s: Intensive research activity in MT  Roll video …

History  1950’s: Intensive research activity in MT  Roll video …  1960’s: Direct word-for-word replacement  1966 (ALPAC): NRC Report on MT  Conclusion: MT no longer worthy of serious scientific investigation.  : “Recovery period”  : Resurgence (Europe, Japan)  1985-present: Gradual Resurgence (US)

How?  How would you implement automatic translation on a computer?

Big Idea: Word Alignment  Start with parallel corpora  Learn word alignment  Hidden variable: alignment from foreign (target) word to source word.  Use EM!

Vauquois Triangle
[Figure: Vauquois triangle. Levels from bottom to apex: Source Text / Target Text, Word Structure, Syntactic Structure, Semantic Structure, Interlingua. Analysis climbs the source side (Morphological Analysis, Syntactic Analysis, Semantic Analysis, Semantic Composition); generation descends the target side (Semantic Decomposition, Semantic Generation, Syntactic Generation, Morphological Generation). Cross-over paths at lower levels: Direct, Syntactic Transfer, Semantic Transfer.]

Approaches
[Figure: simplified Vauquois triangle with the same levels (Source/Target Text, Word Structure, Syntactic Structure, Semantic Structure, Interlingua) and the same analysis, generation, and transfer paths: Direct, Syntactic Transfer, Semantic Transfer, or via Interlingua.]

Methods  Rule-based Methods  Expert system-like rewrite systems  Lexicons constructed by people  Can be very fast, and can accumulate a lot of knowledge over time  e.g., SysTran – the engine behind the venerable Babelfish  Statistical Methods  Word-to-word translation  Phrase-based translation  Syntax-based translation (tree-to-tree, tree-to-string, etc.)  Trained on parallel corpora  Usually noisy-channel (at least in spirit), but increasingly direct

Your Questions  Take the discussion online

To be continued …