Approaches to Machine Translation CSC 5930 Machine Translation Fall 2012 Dr. Tom Way.

How do humans translate?
Learn a foreign language (MT: training stage):
– Memorize word translations (MT: translation lexicon)
– Learn some patterns (MT: templates, transfer rules)
– Exercise (MT: reinforcement learning? reranking?):
   Passive activity: read, listen
   Active activity: write, speak
Translation (MT: decoding stage):
– Understand the sentence (MT: parsing, semantic analysis?)
– Clarify or ask for help, optional (MT: interactive MT?)
– Translate the sentence (MT: word-level? phrase-level? generate from meaning?)

What kinds of resources are available to MT?
Translation lexicon:
– Bilingual dictionary
Templates, transfer rules:
– Grammar books
Parallel data, comparable data
Thesaurus, WordNet, FrameNet, …
NLP tools: tokenizer, morphological analyzer, parser, …
→ More resources exist for major languages, fewer for “minor” languages.

Major approaches
– Transfer-based
– Interlingua
– Example-based (EBMT)
– Statistical MT (SMT)
– Hybrid approach

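The MT triangle
[Figure: the MT triangle. Analysis runs up the source side and synthesis runs down the target side. Levels from base to apex: word to word (word-based SMT, EBMT); phrase-based SMT, EBMT; transfer-based; meaning (interlingua).]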

Transfer-based MT
Analysis, transfer, generation:
1. Parse the source sentence
2. Transform the parse tree with transfer rules
3. Translate source words
4. Get the target sentence from the tree
Resources required:
– Source parser
– A translation lexicon
– A set of transfer rules
An example: Mary bought a book yesterday.
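The four steps above can be sketched on the example sentence. This is only an illustration under stated assumptions: the parse tree is written by hand instead of coming from a parser, the single reordering rule (verb moved to the end of the VP) is hypothetical, and the "target words" are placeholders marked with a prime rather than words of any real language.

```python
# Toy transfer-based pipeline: parse tree -> transfer rules -> word translation -> target string.
# The parse, the rule, and the primed placeholder words are illustrative assumptions.

# Step 1 (assumed to come from a source parser): a parse of "Mary bought a book yesterday".
# An internal node is (label, [children]); a leaf is (POS tag, word).
source_tree = (
    "S",
    [
        ("NP", [("N", "Mary")]),
        ("VP", [
            ("V", "bought"),
            ("NP", [("DET", "a"), ("N", "book")]),
            ("ADVP", [("ADV", "yesterday")]),
        ]),
    ],
)

# Step 2: a transfer rule maps (parent label, child labels) to a new child order.
# This hypothetical rule rewrites V NP ADVP as ADVP NP V (an SOV-style target).
TRANSFER_RULES = {
    ("VP", ("V", "NP", "ADVP")): (2, 1, 0),
}

# Step 3: a placeholder translation lexicon; x' stands for "the target word for x".
LEXICON = {
    "Mary": "Mary'",
    "bought": "bought'",
    "a": "a'",
    "book": "book'",
    "yesterday": "yesterday'",
}


def is_leaf(node):
    return isinstance(node[1], str)


def transfer(node):
    """Translate leaves and apply at most one reordering rule at each level."""
    label, body = node
    if is_leaf(node):
        return (label, LEXICON.get(body, body))  # unknown words pass through
    children = [transfer(child) for child in body]
    order = TRANSFER_RULES.get((label, tuple(child[0] for child in children)))
    if order is not None:
        children = [children[i] for i in order]
    return (label, children)


def yield_words(node):
    """Step 4: read the target sentence off the leaves, left to right."""
    if is_leaf(node):
        return [node[1]]
    return [word for child in node[1] for word in yield_words(child)]


target_sentence = " ".join(yield_words(transfer(source_tree)))
print(target_sentence)  # Mary' yesterday' a' book' bought'
```

Word-to-word lookup and a single rule per level are the simplest possible choices here; the questions on the next slide are exactly about what to use instead.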

Transfer-based MT (cont)
Parsing: linguistically motivated grammar or formal grammar?
Transfer:
– Context-free rules? A path on a dependency tree?
– Apply at most one rule at each level?
– How are rules created?
Translating words: word-to-word translation?
Generation: using an LM or other additional knowledge?
How to create the needed resources automatically?

Interlingua
Translating directly between n languages requires n(n-1) MT systems, one per ordered language pair.
Interlingua uses a language-independent representation.
Conceptually, interlingua is elegant: we need only n analyzers and n generators.
Resources needed:
– A language-independent representation
– Sophisticated analyzers
– Sophisticated generators
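The counting argument is easy to check numerically. The helper names below are made up for illustration; the formulas are the ones on the slide.

```python
def direct_systems(n: int) -> int:
    """One MT system per ordered (source, target) language pair."""
    return n * (n - 1)


def interlingua_components(n: int) -> int:
    """One analyzer plus one generator per language."""
    return 2 * n


for n in (2, 3, 10):
    print(n, direct_systems(n), interlingua_components(n))
```

For 10 languages this gives 90 direct systems against 20 interlingua components. The saving only appears for n greater than 3; with three languages both counts are 6.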

Interlingua (cont)
Questions:
– Does a language-independent meaning representation really exist? If so, what does it look like?
– It requires deep analysis (e.g., semantic analysis): how do we get such an analyzer?
– It requires non-trivial generation: how is that done?
– It forces disambiguation at various levels: lexical, syntactic, semantic, discourse.
– It cannot take advantage of similarities between a particular language pair.

Example-based MT
Basic idea: translate a sentence by using the closest match in parallel data.
First proposed by Nagao (1981).
Ex:
– Training data:
   w1 w2 w3 w4 → w1’ w2’ w3’ w4’
   w5 w6 w7 → w5’ w6’ w7’
   w8 w9 → w8’ w9’
– Test sent:
   w1 w2 w6 w7 w9 → w1’ w2’ w6’ w7’ w9’
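The example above can be reproduced with a very small fragment matcher. This is a sketch, not how any particular EBMT system works: it assumes every source word aligns to the target word in the same position (monotone, one-to-one), and it covers the input greedily with the longest fragment seen in training. The function names are invented for illustration.

```python
# Toy parallel data from the slide. Alignment is assumed to be position-by-position,
# which real parallel text rarely satisfies.
TRAINING = [
    ("w1 w2 w3 w4", "w1' w2' w3' w4'"),
    ("w5 w6 w7", "w5' w6' w7'"),
    ("w8 w9", "w8' w9'"),
]


def build_fragment_table(pairs):
    """Map every contiguous source fragment to its aligned target fragment."""
    table = {}
    for src, tgt in pairs:
        s, t = src.split(), tgt.split()
        assert len(s) == len(t), "toy alignment assumes equal lengths"
        for i in range(len(s)):
            for j in range(i + 1, len(s) + 1):
                table.setdefault(tuple(s[i:j]), t[i:j])
    return table


def translate(sentence, table):
    """Greedily cover the input with the longest fragments seen in training."""
    words, out, i = sentence.split(), [], 0
    while i < len(words):
        for j in range(len(words), i, -1):  # try the longest fragment first
            fragment = tuple(words[i:j])
            if fragment in table:
                out.extend(table[fragment])
                i = j
                break
        else:
            out.append(words[i])  # no example covers this word: copy it through
            i += 1
    return " ".join(out)


table = build_fragment_table(TRAINING)
print(translate("w1 w2 w6 w7 w9", table))  # w1' w2' w6' w7' w9'
```

The test sentence is covered by three fragments, (w1 w2), (w6 w7) and (w9), each taken from a different training example.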

EBMT (cont)
Types of EBMT:
– Lexical (shallow)
– Morphological / POS analysis
– Parse-tree based (deep)
Types of data required by EBMT systems:
– Parallel text
– Bilingual dictionary
– Thesaurus for computing semantic similarity
– Syntactic parser, dependency parser, etc.

EBMT (cont)
Word alignment: using dictionary and heuristics → exact match
Generalization:
– Clusters: dates, numbers, colors, shapes, etc.
– Clusters can be built by hand or learned automatically.
Ex:
– Exact match: 12 players met in Paris last Tuesday → 12 Spieler trafen sich letzten Dienstag in Paris
– Templates: $num players met in $city $time → $num Spieler trafen sich $time in $city
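The template in the example can be sketched with a regular expression whose named groups play the role of $num, $city and $time. The cluster dictionaries are hand-built assumptions that contain only the fillers seen on the slide; a real system would need far larger clusters and a way to translate their members.

```python
import re

# Hand-built clusters (assumed): source filler -> target rendering.
CITIES = {"Paris": "Paris"}
TIMES = {"last Tuesday": "letzten Dienstag"}

# Source side of the template; numbers are matched by pattern, not by a list.
SOURCE_TEMPLATE = re.compile(
    r"^(?P<num>\d+) players met in "
    r"(?P<city>" + "|".join(map(re.escape, CITIES)) + r") "
    r"(?P<time>" + "|".join(map(re.escape, TIMES)) + r")$"
)
# Target side of the template; note that $time and $city swap places.
TARGET_TEMPLATE = "{num} Spieler trafen sich {time} in {city}"


def translate_with_template(sentence):
    """Return the filled target template, or None if the sentence does not match."""
    m = SOURCE_TEMPLATE.match(sentence)
    if m is None:
        return None
    return TARGET_TEMPLATE.format(
        num=m["num"], time=TIMES[m["time"]], city=CITIES[m["city"]]
    )


print(translate_with_template("12 players met in Paris last Tuesday"))
print(translate_with_template("15 players met in Paris last Tuesday"))
```

The second call shows the gain from generalization: the sentence with 15 players never appeared as an exact example, but the template still covers it.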

Statistical MT
Basic idea: learn all the parameters from parallel data.
Major types:
– Word-based
– Phrase-based
Strengths:
– Easy to build, and it requires no hand-coded linguistic knowledge
– Good performance when a large amount of training data is available
Weaknesses:
– How to express linguistic generalization?
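As one concrete instance of learning parameters from parallel data, the sketch below trains word translation probabilities with the EM procedure of IBM Model 1. The slides do not name this model; it is used here only as a familiar word-based example. The three-sentence English-German corpus is a toy assumption, and the NULL word is omitted for brevity.

```python
from collections import defaultdict

# Toy parallel corpus (English source, German target); an illustrative assumption.
CORPUS = [
    ("the house", "das Haus"),
    ("the book", "das Buch"),
    ("a book", "ein Buch"),
]


def train_ibm_model1(corpus, iterations=10):
    """Estimate word translation probabilities t(f | e) with EM (no NULL word)."""
    pairs = [(e.split(), f.split()) for e, f in corpus]
    f_vocab = {f for _, fs in pairs for f in fs}
    # Start from a uniform distribution over target words.
    t = defaultdict(lambda: 1.0 / len(f_vocab))
    for _ in range(iterations):
        count = defaultdict(float)  # expected count of f aligned to e
        total = defaultdict(float)  # expected count of e being aligned at all
        for es, fs in pairs:
            for f in fs:
                # E-step: split one unit of f among the source words in this sentence.
                norm = sum(t[(f, e)] for e in es)
                for e in es:
                    frac = t[(f, e)] / norm
                    count[(f, e)] += frac
                    total[e] += frac
        # M-step: renormalize expected counts into probabilities.
        for (f, e), c in count.items():
            t[(f, e)] = c / total[e]
    return t


t = train_ibm_model1(CORPUS)
for e, f in [("the", "das"), ("house", "Haus"), ("book", "Buch"), ("a", "ein")]:
    print(f"t({f} | {e}) = {t[(f, e)]:.3f}")
```

No dictionary is supplied, yet the co-occurrence pattern alone pushes probability mass toward the pairs the/das, house/Haus, book/Buch and a/ein.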

Comparison of resource requirements

                    Transfer-based   Interlingua                EBMT        SMT
dictionary          +                +                          +
transfer rules      +
parser              +                +                          +           (?)
semantic analyzer                    +
parallel data                                                   +           +
others                               universal representation   thesaurus

Hybrid MT
Basic idea: combine strengths of different approaches:
– Syntax-based: generalization at the syntactic level
– Interlingua: conceptually elegant
– EBMT: memorizing translations of n-grams; generalization at various levels
– SMT: fully automatic; using an LM; optimizing some objective function
Types of hybrid MT:
– Borrowing concepts/methods:
   SMT from EBMT: phrase-based SMT; alignment templates
   EBMT from SMT: automatically learned translation lexicon
   Transfer-based from SMT: automatically learned translation lexicon, transfer rules; using an LM
   …
– Using two MTs in a pipeline: using transfer-based MT as a preprocessor for SMT
– Using multiple MTs in parallel, then adding a re-ranker
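The last option, several MT systems in parallel followed by a re-ranker, can be sketched as follows. Everything here is assumed for illustration: the candidate outputs are invented, and the re-ranker is just a bigram language model with add-one smoothing trained on three sentences. Real re-rankers combine many features, and the score below is not a properly normalized probability for words outside the sample.

```python
import math
from collections import Counter

# Tiny target-language sample for the toy language model (an assumption).
LM_TEXT = [
    "mary bought a book yesterday",
    "john bought a book",
    "mary read a book yesterday",
]


def train_bigram_lm(sentences):
    """Collect history and bigram counts, with sentence boundary markers."""
    histories, bigrams = Counter(), Counter()
    for s in sentences:
        tokens = ["<s>"] + s.split() + ["</s>"]
        histories.update(tokens[:-1])
        bigrams.update(zip(tokens, tokens[1:]))
    vocab = {w for s in sentences for w in s.split()} | {"</s>"}
    return histories, bigrams, len(vocab)


def log_prob(sentence, lm):
    """Add-one smoothed bigram log score of a sentence."""
    histories, bigrams, v = lm
    tokens = ["<s>"] + sentence.split() + ["</s>"]
    return sum(
        math.log((bigrams[(a, b)] + 1) / (histories[a] + v))
        for a, b in zip(tokens, tokens[1:])
    )


def rerank(candidates, lm):
    """Sort candidates by per-bigram LM score, best first."""
    def score(c):
        return log_prob(c, lm) / (len(c.split()) + 1)
    return sorted(candidates, key=score, reverse=True)


lm = train_bigram_lm(LM_TEXT)
# Hypothetical outputs of three different MT systems for the same source sentence.
candidates = [
    "mary yesterday a book bought",
    "mary buy book yesterday",
    "mary bought a book yesterday",
]
best = rerank(candidates, lm)[0]
print(best)  # mary bought a book yesterday
```

Dividing by the number of bigrams keeps the re-ranker from simply preferring the shortest candidate.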