Course Summary LING 575 Fei Xia 03/06/07. Outline Introduction to MT: 1 Major approaches –SMT: 3 –Transfer-based MT: 2 –Hybrid systems: 2 Other topics.

Presentation transcript:

Course Summary LING 575 Fei Xia 03/06/07

Outline
Introduction to MT: 1
Major approaches
–SMT: 3
–Transfer-based MT: 2
–Hybrid systems: 2
Other topics

Introduction to MT

Major challenges
Translation is hard.
Getting the right words:
–Choosing the correct root form
–Getting the correct inflected form
–Inserting “spontaneous” words
Putting the words in the correct order:
–Word order: SVO vs. SOV, …
–Unique constructions:
–Divergence

Lexical choice
Homonymy/Polysemy: bank, run
Concept gap: no corresponding concept in the other language: go Greek, go Dutch, feng shui, lame duck, …
Coding (concept → lexeme mapping) differences:
–More distinctions in one language: e.g., kinship vocabulary.
–Different division of conceptual space:

Major approaches
Transfer-based
Interlingua
Example-based (EBMT)
Statistical MT (SMT)
Hybrid approach

The MT triangle
[Figure: the MT triangle. Sides: Analysis and Synthesis. Levels from base to apex: word to word (word-based SMT, EBMT), phrase-based SMT and EBMT, transfer-based, meaning (interlingua).]

Comparison of resource requirements
(columns: Transfer-based, Interlingua, EBMT, SMT)
dictionary: + + +
transfer rules: +
parser: + + + (?)
semantic analyzer: +
parallel data: + +
others: universal representation, generator, thesaurus

Evaluation
Unlike many NLP tasks (e.g., tagging, chunking, parsing, IE, pronoun resolution), there is no single gold standard for MT.
Human evaluation: accuracy, fluency, …
–Problem: expensive, slow, subjective, non-reusable.
Automatic measures:
–Edit distance
–Word error rate (WER), Position-independent WER (PER)
–Simple string accuracy (SSA), Generation string accuracy (GSA)
–BLEU
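As a small illustration of two of the automatic measures, here is a minimal sketch of WER and PER against a single reference. The function names are hypothetical, WER is taken as word-level edit distance divided by reference length, and PER uses one common bag-of-words formulation; other definitions of PER exist.

```python
from collections import Counter

def edit_distance(ref, hyp):
    """Word-level Levenshtein distance (insertions, deletions, substitutions)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        cur = [i]
        for j, h in enumerate(hyp, 1):
            cur.append(min(prev[j] + 1,              # delete a reference word
                           cur[j - 1] + 1,           # insert a hypothesis word
                           prev[j - 1] + (r != h)))  # match or substitute
        prev = cur
    return prev[-1]

def wer(ref, hyp):
    """Word error rate: edit distance normalized by reference length."""
    return edit_distance(ref, hyp) / len(ref)

def per(ref, hyp):
    """Position-independent error rate: compares bags of words, ignoring order."""
    matches = sum((Counter(ref) & Counter(hyp)).values())
    return (max(len(ref), len(hyp)) - matches) / len(ref)

ref = "the house is small".split()
hyp = "the house small is".split()
wer_score = wer(ref, hyp)  # the word order error is penalized
per_score = per(ref, hyp)  # the same words are present, so no penalty
```

The pair above shows why both measures are reported: a pure reordering error raises WER but leaves PER at zero.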

Major approaches

Word-based SMT
IBM Models 1-5
Main concepts:
–Source channel model
–Hidden word alignment
–EM training

Source channel model for MT
Eng sent → Noisy channel → Fr sent
P(E), P(F | E)
Two types of parameters:
–Language model: P(E)
–Translation model: P(F | E)
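In equation form, decoding under the source channel model picks the English sentence with the highest posterior, which Bayes' rule factors into the language model and the translation model. This is the standard noisy-channel decomposition; the symbol \hat{E} for the decoder output is introduced here only for illustration.

```latex
\hat{E} = \arg\max_{E} P(E \mid F)
        = \arg\max_{E} \frac{P(E)\, P(F \mid E)}{P(F)}
        = \arg\max_{E} P(E)\, P(F \mid E)
```

P(F) can be dropped because it does not depend on E.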

Modeling p(F | E) with alignment
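A sketch of the usual formulation, assuming F = f_1 … f_m, E = e_1 … e_l, and a hidden alignment a = a_1 … a_m in which a_j ∈ {0, …, l} is the position of the English word that generates f_j (position 0 standing for the empty word). This indexing is an assumption, since the slide's own formula is not in the transcript.

```latex
P(F \mid E) = \sum_{a} P(F, a \mid E), \qquad a = a_1 \dots a_m, \quad a_j \in \{0, \dots, l\}
```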

Modeling
Parameters:
–Length prob: P(m | l)
–Translation prob: t(f_j | e_i)
–Distortion prob (for Model 2): d(i | j, m, l)
Model 1:
Model 2:
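For reference, the standard textbook forms of IBM Models 1 and 2, written with the parameter names listed on this slide. They assume m French words, l English words, and an empty word e_0; they are a reconstruction under those assumptions, not a copy of the slide.

```latex
% Model 1: every alignment position is equally likely
P(F \mid E) = \frac{P(m \mid l)}{(l+1)^{m}} \prod_{j=1}^{m} \sum_{i=0}^{l} t(f_j \mid e_i)

% Model 2: the uniform 1/(l+1) is replaced by the distortion probability
P(F \mid E) = P(m \mid l) \prod_{j=1}^{m} \sum_{i=0}^{l} t(f_j \mid e_i)\, d(i \mid j, m, l)
```

Model 1 is the special case of Model 2 with d(i | j, m, l) = 1/(l+1).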

Training Model 1:
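EM training for Model 1 can be sketched in a few lines. This is a minimal illustration, not the course's implementation: it omits the empty (NULL) word and the length probability, and the name `train_model1` and the toy corpus are hypothetical.

```python
from collections import defaultdict

def train_model1(corpus, iterations=10):
    """Estimate t(f | e) with EM. corpus: list of (french_words, english_words)."""
    f_vocab = {f for fs, _ in corpus for f in fs}
    # Uniform initialization of t(f | e), keyed as (f, e).
    t = defaultdict(lambda: 1.0 / len(f_vocab))
    for _ in range(iterations):
        count = defaultdict(float)  # expected count of (f, e) links
        total = defaultdict(float)  # expected count of e being linked at all
        # E-step: distribute each French word over the English words in its sentence.
        for fs, es in corpus:
            for f in fs:
                z = sum(t[(f, e)] for e in es)
                for e in es:
                    c = t[(f, e)] / z
                    count[(f, e)] += c
                    total[e] += c
        # M-step: renormalize the expected counts per English word.
        t = defaultdict(float, {(f, e): c / total[e] for (f, e), c in count.items()})
    return t

corpus = [
    ("das haus".split(), "the house".split()),
    ("das buch".split(), "the book".split()),
    ("ein buch".split(), "a book".split()),
]
t = train_model1(corpus)
```

On this toy corpus the probability mass for "house" shifts toward "haus" over the iterations, because "das" is better explained by "the", which it co-occurs with twice.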

Finding the best alignment
Given E and F, we are looking for
Model 1:
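In the usual formulation the target is the alignment a* = argmax_a P(F, a | E). Under Model 1 the positions are independent, so each French word can simply pick the English word with the highest t(f_j | e_i). The sketch below assumes a translation table keyed as (f, e) and, for brevity, no empty word; the function name and the table values are made up for illustration.

```python
def best_alignment(f_words, e_words, t):
    """Return, for each French word, the index of its most likely English word."""
    return [
        max(range(len(e_words)), key=lambda i: t.get((f, e_words[i]), 0.0))
        for f in f_words
    ]

# Hypothetical translation probabilities t(f | e), keyed as (f, e).
t_table = {
    ("das", "the"): 0.9, ("das", "house"): 0.1,
    ("haus", "the"): 0.2, ("haus", "house"): 0.8,
}
alignment = best_alignment(["das", "haus"], ["the", "house"], t_table)
```

For Model 2 the same per-position argmax applies, with the score t(f_j | e_i) multiplied by d(i | j, m, l).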

Clump-based SMT
The unit of translation is a clump.
Training stage:
–Word alignment
–Extracting clump pairs
Decoding stage:
–Try all segmentations of the src sent and all the allowed permutations
–For each src clump, try TopN tgt clumps
–Prune the hypotheses
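The "extracting clump pairs" step can be sketched with the consistency criterion commonly used in phrase-based SMT: a source span and a target span form a pair if no word inside either span is aligned to a word outside the other. Whether the course used exactly this criterion is an assumption; the function name, `max_len`, and the example are hypothetical, and the sketch does not extend pairs over unaligned words.

```python
def extract_clump_pairs(src, tgt, alignment, max_len=3):
    """alignment: set of (src_index, tgt_index) links from word alignment."""
    pairs = set()
    for i1 in range(len(src)):
        for i2 in range(i1, min(i1 + max_len, len(src))):
            # Target positions linked to the source span [i1, i2].
            tgt_pos = [j for (i, j) in alignment if i1 <= i <= i2]
            if not tgt_pos:
                continue
            j1, j2 = min(tgt_pos), max(tgt_pos)
            if j2 - j1 + 1 > max_len:
                continue
            # Consistency: nothing in the target span links outside the source span.
            if all(i1 <= i <= i2 for (i, j) in alignment if j1 <= j <= j2):
                pairs.add((" ".join(src[i1:i2 + 1]), " ".join(tgt[j1:j2 + 1])))
    return pairs

pairs = extract_clump_pairs(
    ["maison", "bleue"], ["blue", "house"], {(0, 1), (1, 0)}
)
```

The example includes a reordering: the two single-word pairs cross, and the two-word pair captures the swap as a unit, which is what lets a clump-based decoder handle local reordering without an explicit distortion step.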

Transfer-based MT
Analysis, transfer, generation:
–Example: (Quirk et al., 2005)
1. Parse the source sentence
2. Transform the parse tree with transfer rules
3. Translate source words
4. Get the target sentence from the tree
Translation as parsing:
–Example: (Wu, 1995)
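Steps 2 to 4 of the pipeline above can be sketched on a toy tree. This is only an illustration of the general idea, not the method of Quirk et al. (2005): the parse is assumed to be given, the single reordering rule and the lexicon are hypothetical, and the romanized Japanese output ignores particles and inflection.

```python
# A tree node is (label, children) for internal nodes or (label, word) for leaves.

def transfer(tree, rules):
    """Step 2: reorder children wherever a rule matches (label, child labels)."""
    label, body = tree
    if isinstance(body, str):
        return tree
    children = [transfer(child, rules) for child in body]
    key = (label, tuple(child[0] for child in children))
    order = rules.get(key, range(len(children)))
    return (label, [children[k] for k in order])

def translate_leaves(tree, lexicon):
    """Step 3: replace each source word using a word-for-word lexicon."""
    label, body = tree
    if isinstance(body, str):
        return (label, lexicon.get(body, body))
    return (label, [translate_leaves(child, lexicon) for child in body])

def yield_words(tree):
    """Step 4: read the target sentence off the leaves, left to right."""
    label, body = tree
    if isinstance(body, str):
        return [body]
    return [word for child in body for word in yield_words(child)]

parse = ("S", [("NP", "John"), ("VP", [("V", "eats"), ("NP", "apples")])])
rules = {("VP", ("V", "NP")): [1, 0]}            # hypothetical SVO -> SOV rule
lexicon = {"eats": "taberu", "apples": "ringo"}  # hypothetical toy lexicon
result = yield_words(translate_leaves(transfer(parse, rules), lexicon))
```

Keeping the reordering rules separate from the lexicon mirrors the split between steps 2 and 3: rules operate on syntactic labels only, so one rule covers every verb and object.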

Hybrid approaches
Preprocessing with transfer rules: (Xia and McCord, 2004), (Collins et al., 2005)
Postprocessing with taggers, parsers, etc.: JHU 2003 workshop
Hierarchical phrase-based model: (Chiang, 2005)
…

Other topics

Other issues
Resources
–MT for low-density languages
–Using comparable corpora and Wikipedia
Special translation modules
–Identifying and translating named entities and abbreviations
–…

To build an MT system (1)
Gather resources
–Parallel corpora, comparable corpora
–Grammars, dictionaries, …
Process data
–Document alignment, sentence alignment
–Tokenization, parsing, …

To build an MT system (2)
Modeling
Training
–Word alignment and extracting clump pairs
–Learning transfer rules
Decoding
–Identifying entities and translating them with special modules (optional)
–Translation as parsing, or parse + transfer + translation
–Segmenting the src sentence, replacing each src clump with a target clump, …

To build an MT system (3)
Post-processing
–System combination
–Reranking
Using the system for other applications:
–Cross-lingual IR
–Computer-assisted translation
–…

Misc
Grades
–Assignments (hw1-hw3): 30%
–Class participation: 20%
–Project:
Presentation: 25%
Final paper: 25%