Approaches to Machine Translation

Slides:



Advertisements
Similar presentations
Machine Translation II How MT works Modes of use.
Advertisements

Natural Language Understanding Difficulties: Large amount of human knowledge assumed – Context is key. Language is pattern-based. Patterns can restrict.
Proceedings of the Conference on Intelligent Text Processing and Computational Linguistics (CICLing-2007) Learning for Semantic Parsing Advisor: Hsin-His.
UNIT-III By Mr. M. V. Nikum (B.E.I.T). Programming Language Lexical and Syntactic features of a programming Language are specified by its grammar Language:-
C SC 620 Advanced Topics in Natural Language Processing Lecture 20 4/8.
Introduction to MT Ling 580 Fei Xia Week 1: 1/03/06.
Are Linguists Dinosaurs? 1.Statistical language processors seem to be doing away with the need for linguists. –Why do we need linguists when a machine.
Inducing Information Extraction Systems for New Languages via Cross-Language Projection Ellen Riloff University of Utah Charles Schafer, David Yarowksy.
Introduction to CL Session 1: 7/08/2011. What is computational linguistics? Processing natural language text by computers  for practical applications.
Resources Primary resources – Lexicons, structured vocabularies – Grammars (in widest sense) – Corpora – Treebanks Secondary resources – Designed for a.
Course Summary LING 575 Fei Xia 03/06/07. Outline Introduction to MT: 1 Major approaches –SMT: 3 –Transfer-based MT: 2 –Hybrid systems: 2 Other topics.
Semi-Automatic Learning of Transfer Rules for Machine Translation of Low-Density Languages Katharina Probst April 5, 2002.
Comments on Guillaume Pitel: “Using bilingual LSA for FrameNet annotation of French text from generic resources” Gerd Fliedner Computational Linguistics.
BİL744 Derleyici Gerçekleştirimi (Compiler Design)1.
Lecture 1, 7/21/2005Natural Language Processing1 CS60057 Speech &Natural Language Processing Autumn 2005 Lecture 1 21 July 2005.
Machine translation Context-based approach Lucia Otoyo.
Machine Translation Dr. Radhika Mamidi. What is Machine Translation? A sub-field of computational linguistics It investigates the use of computer software.
Lecture 12: 22/6/1435 Natural language processing Lecturer/ Kawther Abas 363CS – Artificial Intelligence.
Machine Translation  Machine translation is of one of the earliest uses of AI  Two approaches:  Traditional approach using grammars, rewrite rules,
Approaches to Machine Translation CSC 5930 Machine Translation Fall 2012 Dr. Tom Way.
What you have learned and how you can use it : Grammars and Lexicons Parts I-III.
Compiler design Lecture 1: Compiler Overview Sulaimany University 2 Oct
Topic #1: Introduction EE 456 – Compiling Techniques Prof. Carl Sable Fall 2003.
Introduction to Compiling
Introduction to MT CSE 415 Fei Xia Linguistics Dept 02/24/06.
Compiler Design Introduction 1. 2 Course Outline Introduction to Compiling Lexical Analysis Syntax Analysis –Context Free Grammars –Top-Down Parsing –Bottom-Up.
Jan 2005CSA4050 Machine Translation II1 CSA4050: Advanced Techniques in NLP Machine Translation II Direct MT Transfer MT Interlingual MT.
For Friday Finish chapter 23 Homework –Chapter 23, exercise 15.
Text segmentation Amany AlKhayat. Before any real processing is done, text needs to be segmented at least into linguistic units such as words, punctuation,
Machine Translation Course 2 Diana Trandab ă ţ Academic year:
Natural Language Processing Group Computer Sc. & Engg. Department JADAVPUR UNIVERSITY KOLKATA – , INDIA. Professor Sivaji Bandyopadhyay
October 10, 2003BLTS Kickoff Meeting1 Transfer with Strong Decoding Learning Module Transfer Rules {PP,4894} ;;Score: PP::PP [NP POSTP] -> [PREP.
Overview of Statistical NLP IR Group Meeting March 7, 2006.
A method to restrict the blow-up of hypotheses... A method to restrict the blow-up of hypotheses of a non-disambiguated shallow machine translation system.
A Simple English-to-Punjabi Translation System By : Shailendra Singh.
CS416 Compiler Design1. 2 Course Information Instructor : Dr. Ilyas Cicekli –Office: EA504, –Phone: , – Course Web.
Short Text Similarity with Word Embedding Date: 2016/03/28 Author: Tom Kenter, Maarten de Rijke Source: CIKM’15 Advisor: Jia-Ling Koh Speaker: Chih-Hsuan.
Jan 2012MT Architectures1 Human Language Technology Machine Translation Architectures Direct MT Transfer MT Interlingual MT.
Introduction to Machine Translation
LingWear Language Technology for the Information Warrior Alex Waibel, Lori Levin Alon Lavie, Robert Frederking Carnegie Mellon University.
10/31/00 1 Introduction to Cognitive Science Linguistics Component Topic: Formal Grammars: Generating and Parsing Lecturer: Dr Bodomo.
Compiler Design (40-414) Main Text Book:
Approaches to Machine Translation
Chapter 1 Introduction.
Introduction to Machine Translation
PRESENTED BY: PEAR A BHUIYAN
Lecture – VIII Monojit Choudhury RS, CSE, IIT Kharagpur
Statistical NLP: Lecture 3
Basic Parsing with Context Free Grammars Chapter 13
Urdu-to-English Stat-XFER system for NIST MT Eval 2008
Chapter 1 Introduction.
Natural Language Processing (NLP)
Compiler Lecture 1 CS510.
Statistical NLP: Lecture 13
CS416 Compiler Design lec00-outline September 19, 2018
--Mengxue Zhang, Qingyang Li
Lecture 2: General Structure of a Compiler
Introduction CI612 Compiler Design CI612 Compiler Design.
Lecture 7: Introduction to Parsing (Syntax Analysis)
R.Rajkumar Asst.Professor CSE
What do ELLs Need Differently in Math
CS4705 Natural Language Processing
Introduction to Machine Translation
Statistical Machine Translation Papers from COLING 2004
Linguistic Essentials
CS416 Compiler Design lec00-outline February 23, 2019
Natural Language Processing (NLP)
Lec00-outline May 18, 2019 Compiler Design CS416 Compiler Design.
Artificial Intelligence 2004 Speech & Natural Language Processing
Natural Language Processing (NLP)
Presentation transcript:

Approaches to Machine Translation CSC 4598 Machine Translation Dr. Tom Way

How humans do translation? Learn a foreign language: Memorize word translations Learn some patterns: Exercise: Passive activity: read, listen Active activity: write, speak Translation: Understand the sentence Clarify or ask for help (optional) Translate the sentence Training stage Translation lexicon Templates, transfer rules Reinforced learning? Reranking? Decoding stage Parsing, semantics analysis? Interactive MT? Word-level? Phrase-level? Generate from meaning?

What kinds of resources are available to MT? Translation lexicon: Bilingual dictionary Templates, transfer rules: Grammar books Parallel data, comparable data Thesaurus, WordNet, FrameNet, … NLP tools: tokenizer, morph analyzer, parser, …  More resources for major languages, less for “minor” languages.

Major approaches Transfer-based Interlingua Example-based (EBMT) Statistical MT (SMT) Hybrid approach

The MT triangle Synthesis Analysis Meaning (interlingua) word Word Transfer-based Also known as the Vauquois Triangle Represents levels of analysis Starting at the bottom level, translation is simplest but least precise words are replaced by other words, statistics could be used to pick best word (SMT) SMT = statistical machine translation EBMT = example based machine translation Next level works on phrases, using same basic ideas Top level is transfer-based, which tries to understand meaning in one language and translate to the words that express that meaning in the other As you go up, analysis and synthesis get more difficult Phrase-based SMT, EBMT Word-based SMT, EBMT word Word

Transfer-based MT Analysis, transfer, generation: Resources required: Parse the source sentence Transform the parse tree with transfer rules Translate source words Get the target sentence from the tree Resources required: Source parser A translation lexicon A set of transfer rules An example: Mary bought a book yesterday.

Transfer-based MT (cont) Parsing: linguistically motivated grammar or formal grammar? Transfer: context-free rules? A path on a dependency tree? Apply at most one rule at each level? How are rules created? Translating words: word-to-word translation? Generation: using LM or other additional knowledge? How to create the needed resources automatically?

Interlingua For n languages, we need n(n-1) MT systems. Interlingua uses a language-independent representation. Conceptually, Interlingua is elegant: we only need n analyzers, and n generators. Resource needed: A language-independent representation Sophisticated analyzers Sophisticated generators

Interlingua (cont) Questions: Does language-independent meaning representation really exist? If so, what does it look like? It requires deep analysis: how to get such an analyzer: e.g., semantic analysis It requires non-trivial generation: How is that done? It forces disambiguation at various levels: lexical, syntactic, semantic, discourse levels. It cannot take advantage of similarities between a particular language pair.

Example-based MT Basic idea: translate a sentence by using the closest match in parallel data. First proposed by Nagao (1981). Ex: Training data: w1 w2 w3 w4  w1’ w2’ w3’ w4’ w5 w6 w7  w5’ w6’ w7’ w8 w9  w8’ w9’ Test sent: w1 w2 w6 w7 w9  w1’ w2’ w6’ w7’ w9’

EMBT (cont) Types of EBMT: Types of data required by EBMT systems: Lexical (shallow) Morphological / POS analysis Parse-tree based (deep) Types of data required by EBMT systems: Parallel text Bilingual dictionary Thesaurus for computing semantic similarity Syntactic parser, dependency parser, etc.

EBMT (cont) Word alignment: using dictionary and heuristics  exact match Generalization: Clusters: dates, numbers, colors, shapes, etc. Clusters can be built by hand or learned automatically. Ex: Exact match: 12 players met in Paris last Tuesday  12 Spieler trafen sich letzen Dienstag in Paris Templates: $num players met in $city $time  $num Spieler trafen sich $time in $city

Statistical MT Basic idea: learn all the parameters from parallel data. Major types: Word-based Phrase-based Strengths: Easy to build, and it requires no human knowledge Good performance when a large amount of training data is available. Weaknesses: How to express linguistic generalization?

Comparison of resource requirement Transfer-based Interlingua EBMT SMT dictionary + Transfer rules parser + (?) semantic analyzer parallel data others Universal representation thesaurus

Hybrid MT Basic idea: combine strengths of different approaches: Syntax-based: generalization at syntactic level Interlingua: conceptually elegant EBMT: memorizing translation of n-grams; generalization at various level. SMT: fully automatic; using LM; optimizing some objective functions. Types of hybrid HT: Borrowing concepts/methods: SMT from EBMT: phrase-based SMT; Alignment templates EBMT from SMT: automatically learned translation lexicon Transfer-based from SMT: automatically learned translation lexicon, transfer rules; using LM … Using two MTs in a pipeline: Using transfer-based MT as a preprocessor of SMT Using multiple MTs in parallel, then adding a re-ranker.