A Statistical Approach to Machine Translation (Brown et al. 1990, Computational Linguistics)
POSTECH NLP lab, 김지협



POSTECH NLP LAB.

Contents
- Introduction
- Language Model
- Translation Model
- Searching
- Parameter Estimation
- Experiment
- Plans

Introduction
- In 1949, Warren Weaver proposed statistical methods (information theory) for translation.
  - Not pursued at the time: computers were slow and little machine-readable text existed.
- These days:
  - Fast computers and large machine-readable corpora are available.
  - The statistical approach has been successful in speech recognition.
  - So give it a chance in machine translation.
- Translation as a whole is an art:
  - We do not hope to reach that level; we aim only at the translation of individual sentences.
  - A given sentence has many translations, and the choice among them is a matter of taste.

Continued
- How is translation viewed in this paper?
  - Every sentence in one language is a possible translation of any sentence in the other.
  - Assign to every pair of sentences (S, T) a probability Pr(T|S): the probability of producing T in the target language, given S in the source language.
  - Example: Pr(Le matin je me brosse les dents | President Lincoln was a good lawyer) < Pr(Le president Lincoln etait un bon avocat | President Lincoln was a good lawyer)
- What is a translation?
  - Using Bayes' theorem: choose the S that maximizes Pr(S)Pr(T|S).
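The Bayes decomposition the slide alludes to can be written out explicitly; since Pr(T) does not depend on S, it drops out of the maximization:

```latex
\hat{S} \;=\; \operatorname*{arg\,max}_{S} \Pr(S \mid T)
        \;=\; \operatorname*{arg\,max}_{S} \frac{\Pr(S)\,\Pr(T \mid S)}{\Pr(T)}
        \;=\; \operatorname*{arg\,max}_{S} \Pr(S)\,\Pr(T \mid S)
```

Here Pr(S) is supplied by the language model and Pr(T|S) by the translation model.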

Continued
- Statistical translation system (figure): a Source generates S according to the Language Model; the Translation Model produces T from S; the Decoder recovers S from the observed T.

Language Model
- Modeling
  - Assign a probability to a given word string.
  - There are too many possible histories to model exactly, so an n-gram model is used.
- The power of a trigram model: the bag translation experiment
  - Scheme: scramble a sentence into a bag of words, then reconstruct it by choosing the arrangement s' with the highest probability.
  - Experiment (Fig. 2): 38 sentences with fewer than 11 words (longer sentences would have been too expensive to handle).
  - Example: for the bag {a, order, proper}, the model prefers the arrangement s' = "a proper order" over s = "order proper a", since P(s') > P(s).
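The n-gram idea on this slide can be sketched with a toy trigram model using maximum-likelihood counts (no smoothing); the corpus and class name here are invented for illustration, not from the paper:

```python
from collections import defaultdict

class TrigramLM:
    """Toy trigram language model with maximum-likelihood estimates."""

    def __init__(self, sentences):
        self.tri = defaultdict(int)  # counts of (w1, w2, w3)
        self.bi = defaultdict(int)   # counts of (w1, w2)
        for s in sentences:
            words = ["<s>", "<s>"] + s + ["</s>"]
            for w1, w2, w3 in zip(words, words[1:], words[2:]):
                self.tri[(w1, w2, w3)] += 1
                self.bi[(w1, w2)] += 1

    def prob(self, w3, w1, w2):
        # P(w3 | w1 w2) = count(w1 w2 w3) / count(w1 w2)
        if self.bi[(w1, w2)] == 0:
            return 0.0
        return self.tri[(w1, w2, w3)] / self.bi[(w1, w2)]

    def sentence_prob(self, s):
        # Product of trigram conditionals over the padded sentence.
        words = ["<s>", "<s>"] + s + ["</s>"]
        p = 1.0
        for w1, w2, w3 in zip(words, words[1:], words[2:]):
            p *= self.prob(w3, w1, w2)
        return p

# Bag translation in miniature: the model trained on the correct order
# scores "a proper order" above the scrambled "order proper a".
lm = TrigramLM([["a", "proper", "order"]])
good = lm.sentence_prob(["a", "proper", "order"])
bad = lm.sentence_prob(["order", "proper", "a"])
```

An unsmoothed model assigns zero to any unseen trigram, which is exactly why the paper-scale systems need large corpora and smoothing.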

Translation Model
- Terminology: alignment, fertility, distortion
- Alignment (which source words produce which target words)
  - For S = s1 s2 ... sl and T = t1 t2 ... tm, there are l × m possible connections.
  - The set of alignments of (S, T) therefore contains 2^(lm) alignments.
  - See p. 267, Fig. 1, 2, 3 for alignment examples; this paper considers only alignments of the kind in Fig. 1.

Continued
- Fertility: the number of target words that a source word produces.
- Distortion: how far a target word lies from the source word that produced it.
- Notation: ( Le chien est battu par Jean | John(6) does beat(3,4) the(1) dog(2) )
  - Each English word is followed by the positions of the French words it produces: John → Jean, beat → est battu, the → Le, dog → chien; does produces nothing (fertility 0).

Continued
- Without loss of generality, two models are used:
  - Model 1: all connections are assigned equal probability; its parameters are the translation probabilities P(t|s).
  - Model 2: adds fertility probabilities P(n|s) and distortion probabilities P(j|i, m).
- (Figure: alignment variables a1 ... aj ... am connect each target word tj to a source word si.)
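Under Model 1's assumption that every connection is equally likely, the sum over all alignments factors into a product over target positions (a sketch; the length-dependent normalizing constant is omitted):

```latex
\Pr(T \mid S) \;\propto\; \prod_{j=1}^{m} \sum_{i=1}^{l} P(t_j \mid s_i)
```

This factorization is what makes the expected counts in the EM training tractable despite the 2^(lm) alignments.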

Searching
- Algorithm
  - Search for the sentence S that maximizes Pr(S)Pr(T|S).
  - There are too many sentences to try, so a suboptimal search is used: a variant of stack search.
    - Maintain a list of partial alignment hypotheses; initially one entry: (Jean aime Marie | *).
    - The search proceeds by iterations, extending the most promising entries, e.g. (Jean aime Marie | John(1) *), (Jean aime Marie | * loves(2) *), (Jean aime Marie | Mary(3) *).
    - Prune some hypotheses using a threshold.
    - Terminate when a complete alignment on the list is significantly more promising than any incomplete alignment.
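The stack-search idea above can be sketched as a best-first beam over partial hypotheses; the translation table and scores here are invented toy values, not the paper's models:

```python
import heapq

def stack_search(french, translations, lm_score, beam=3):
    """Translate a French word list left-to-right, keeping the `beam`
    most promising partial hypotheses at each step (pruning the rest)."""
    # Each hypothesis: (negative score, english words so far).
    hyps = [(0.0, [])]
    for f in french:
        extended = []
        for neg, eng in hyps:
            for e, t_score in translations.get(f, [("<unk>", 1e-6)]):
                cand = eng + [e]
                score = -neg + t_score + lm_score(cand)
                extended.append((-score, cand))
        # Prune: keep only the beam-best extensions.
        hyps = heapq.nsmallest(beam, extended)
    _, best = min(hyps)
    return best

translations = {
    "Jean": [("John", 1.0)],
    "aime": [("loves", 0.9), ("likes", 0.5)],
    "Marie": [("Mary", 1.0)],
}
lm_score = lambda eng: 0.0  # uniform LM stand-in for the toy example
result = stack_search(["Jean", "aime", "Marie"], translations, lm_score)
```

The real decoder extends hypotheses in any order over alignments, not strictly left-to-right; this sketch only illustrates the extend-and-prune loop.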

Parameter Estimation
- Training corpus
  - The proceedings of the Canadian Parliament: 100 million words of English text and the corresponding French text.
  - 3 million sentence pairs extracted by a statistical algorithm based on sentence length (99% accurate).
- Language model: a bigram model estimated from the English text.
- Translation model: estimated from unaligned sentence pairs.
  - The alignments are hidden, so the counts cannot be taken directly; analogous to the situation in speech recognition, the EM algorithm is used.
- Estimation steps
  - Step 1: translation probabilities from Model 1.
  - Step 2: fertility and distortion probabilities from Model 2.
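Step 1 can be sketched as the standard EM recipe for Model 1 translation probabilities t(f|e) on a toy two-sentence corpus; this follows the textbook Model 1 update, not necessarily the paper's exact implementation:

```python
from collections import defaultdict

def em_model1(pairs, iterations=20):
    """Estimate t(f|e) by EM from unaligned sentence pairs (E-words, F-words)."""
    f_vocab = {f for _, fs in pairs for f in fs}
    e_vocab = {e for es, _ in pairs for e in es}
    # Start from uniform translation probabilities.
    t = {(f, e): 1.0 / len(f_vocab) for f in f_vocab for e in e_vocab}
    for _ in range(iterations):
        count = defaultdict(float)  # expected counts of (f, e)
        total = defaultdict(float)  # expected counts of e
        # E-step: collect expected counts over all hidden connections.
        for es, fs in pairs:
            for f in fs:
                norm = sum(t[(f, e)] for e in es)
                for e in es:
                    c = t[(f, e)] / norm  # posterior that e produced f
                    count[(f, e)] += c
                    total[e] += c
        # M-step: renormalize to get new translation probabilities.
        for (f, e) in t:
            if total[e] > 0:
                t[(f, e)] = count[(f, e)] / total[e]
    return t

pairs = [(["the", "house"], ["la", "maison"]),
         (["the", "book"], ["le", "livre"])]
t = em_model1(pairs)
```

Even on this tiny corpus, "house" concentrates its probability on the words it co-occurs with, while the ambiguous "the" spreads its mass.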

Experiment
- First experiment: estimate parameters for the translation model.
  - Vocabularies of 9,000 English and 9,000 French words, giving 81,000,000 parameters.
  - Training corpus: 40,000 sentence pairs (800,000 words on each side).
  - Results: Fig. 4, 5, 6.

Continued
- Second experiment: translate from French to English.
  - Translation model: vocabularies of 1,000 English and 1,700 French words; 17,000,000 parameters estimated from 117,000 sentence pairs.
  - Language model: a bigram model estimated from 570,000 English sentences (12,000,000 words), not restricted to sentences covered by the 1,000-word vocabulary.
  - Test set: 73 new French sentences from elsewhere in the Hansard data.
  - Results: translations judged in 5 categories (Exact, Alternate, Different, Wrong, Ungrammatical); see Fig. 8.
  - Post-editing cost: 776 keystrokes vs. 1,916 keystrokes from scratch (about 60% saved).

Plans
- Improvements to parameter estimation
  - Allow several source words to work together to produce a single target word.
  - Use a trigram language model.
- Morphological analysis for French and English.
