Expectation-Maximization Algorithm


Expectation-Maximization Algorithm
M. B. Chandak

Principle of the EM Algorithm: Maximum Likelihood Estimation. The algorithm operates on a parallel corpus, for example an English-Hindi sentence-aligned parallel corpus. Its goal is to find maximum likelihood estimates (MLE) of the word-to-word translation probabilities used in machine translation. In the following example, English and Hindi are the two languages; let Es denote the English side and Hs the Hindi side of the corpus.
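
For reference, in standard notation (not shown on the original slides), the model assigns a sentence pair with alignment a the probability below, and EM maximizes the corpus likelihood obtained by summing over alignments. The example that follows uses this simplified form, without a NULL word or length factor:

    P(a, e \mid h) = \prod_{j} t(e_j \mid h_{a_j}),
    \qquad
    P(e \mid h) = \sum_{a} P(a, e \mid h)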

Implementation: EM is an iterative algorithm that alternates two steps: the E-step, which uses the current translation probabilities to compute the expected (fractional) counts of each word alignment, and the M-step, which re-estimates the translation probabilities from those expected counts. Initially, all alignments are assigned uniform probability.
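
In symbols, the two steps are (a standard statement of the updates, added here for reference; tc denotes the expected counts):

    \text{E-step:}\quad P(a \mid e, h) = \frac{P(a, e \mid h)}{\sum_{a'} P(a', e \mid h)},
    \qquad tc(e_j \mid h_{a_j}) \mathrel{+}= P(a \mid e, h) \ \text{for each link } j

    \text{M-step:}\quad t(e \mid h) = \frac{tc(e \mid h)}{\sum_{e'} tc(e' \mid h)}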

Example: The training data consists of two English-Hindi sentence pairs: "Green House" / "हरा घर" and "The House" / "यह घर". The English vocabulary is {green, house, the} and the Hindi vocabulary is {हरा, घर, यह}, so each translation probability is initialized to 1/3.

Uniform probability table:
t(green|हरा) = 1/3   t(house|हरा) = 1/3   t(the|हरा) = 1/3
t(green|घर) = 1/3    t(house|घर) = 1/3    t(the|घर) = 1/3
t(green|यह) = 1/3    t(house|यह) = 1/3    t(the|यह) = 1/3
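
A minimal sketch of this initialization in Python (the variable names english_vocab, hindi_vocab, and t are illustrative, not from the slides):

    english_vocab = ["green", "house", "the"]
    hindi_vocab = ["हरा", "घर", "यह"]
    # Uniform initialization: every t(e|h) starts at 1/3.
    t = {(e, h): 1.0 / len(english_vocab) for e in english_vocab for h in hindi_vocab}
    print(t[("green", "हरा")])  # 0.333...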

Example: Compute P(a, e|h) for each alignment by multiplying the corresponding "t" probabilities. For "Green House" / "हरा घर" there are two possible one-to-one alignments, (green-हरा, house-घर) and (green-घर, house-हरा); likewise "The House" / "यह घर" has (the-यह, house-घर) and (the-घर, house-यह). With the uniform table, every alignment has the same probability:

P(a, e|h) = 1/3 * 1/3 = 1/9
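
The same products for the first sentence pair, as a short sketch (only the two one-to-one alignments from the slide are enumerated):

    t = {("green", "हरा"): 1/3, ("green", "घर"): 1/3,
         ("house", "हरा"): 1/3, ("house", "घर"): 1/3}
    # Alignment 1: green-हरा, house-घर.  Alignment 2: green-घर, house-हरा.
    p_a1 = t[("green", "हरा")] * t[("house", "घर")]  # 1/9
    p_a2 = t[("green", "घर")] * t[("house", "हरा")]  # 1/9
    print(p_a1, p_a2)  # 0.111... 0.111...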

Re-calculating values: Within each sentence pair the alignment probabilities are normalized, giving P(a|e, h) = P(a, e|h) / Σ P(a, e|h), where the sum runs over the alignments of that pair. For "Green House" / "हरा घर" each of the two alignments gets (1/9) / (2/9) = 1/2, and the same holds for the two alignments of "The House" / "यह घर".
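
Normalizing those two values gives the per-alignment weights used when counting (a sketch continuing the numbers above):

    p_a1 = p_a2 = 1 / 9
    z = p_a1 + p_a2              # total over both alignments of the sentence pair, 2/9
    w_a1, w_a2 = p_a1 / z, p_a2 / z
    print(w_a1, w_a2)            # 0.5 0.5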

Calculate the expected counts ("tcounts", tc) by adding each alignment's weight of 1/2 to every word pair it links:

tc(green|हरा) = 1/2   tc(house|हरा) = 1/2           tc(the|हरा) = 0     total(हरा) = 1
tc(green|घर) = 1/2    tc(house|घर) = 1/2 + 1/2 = 1   tc(the|घर) = 1/2    total(घर) = 2
tc(green|यह) = 0      tc(house|यह) = 1/2             tc(the|यह) = 1/2    total(यह) = 1
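
Collecting the expected counts for both sentence pairs, with every alignment link contributing its weight of 1/2 (a sketch; tc and total are illustrative names):

    from collections import defaultdict

    tc = defaultdict(float)     # expected counts tc(e|h)
    total = defaultdict(float)  # totals per Hindi word
    # (English word, Hindi word, weight) links from the four alignments above.
    links = [
        ("green", "हरा", 0.5), ("house", "घर", 0.5),  # Green House / हरा घर, alignment 1
        ("green", "घर", 0.5), ("house", "हरा", 0.5),  # Green House / हरा घर, alignment 2
        ("the", "यह", 0.5), ("house", "घर", 0.5),      # The House / यह घर, alignment 1
        ("the", "घर", 0.5), ("house", "यह", 0.5),      # The House / यह घर, alignment 2
    ]
    for e, h, w in links:
        tc[(e, h)] += w
        total[h] += w
    print(tc[("house", "घर")], total["घर"])  # 1.0 2.0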

M-Step: Re-estimate each translation probability by dividing the expected count by the total for the Hindi word, t(e|h) = tc(e|h) / total(h):

t(green|हरा) = (1/2)/1 = 1/2   t(house|हरा) = (1/2)/1 = 1/2   t(the|हरा) = 0/1 = 0
t(green|घर) = (1/2)/2 = 1/4    t(house|घर) = 1/2              t(the|घर) = (1/2)/2 = 1/4
t(green|यह) = 0/1 = 0          t(house|यह) = (1/2)/1 = 1/2    t(the|यह) = (1/2)/1 = 1/2
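
The re-estimation itself is a single division per entry (a sketch continuing the counts above):

    tc = {("green", "हरा"): 0.5, ("house", "हरा"): 0.5, ("the", "हरा"): 0.0,
          ("green", "घर"): 0.5, ("house", "घर"): 1.0, ("the", "घर"): 0.5,
          ("green", "यह"): 0.0, ("house", "यह"): 0.5, ("the", "यह"): 0.5}
    total = {"हरा": 1.0, "घर": 2.0, "यह": 1.0}
    t = {(e, h): tc[(e, h)] / total[h] for (e, h) in tc}
    print(t[("green", "घर")], t[("house", "घर")])  # 0.25 0.5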

E-step, second iteration: Identifying the higher-probability alignments. Compute P(a, e|h) again by multiplying the updated "t" probabilities. For "Green House" / "हरा घर", the alignment (green-हरा, house-घर) now has 1/2 * 1/2 = 1/4, while (green-घर, house-हरा) has 1/4 * 1/2 = 1/8. For "The House" / "यह घर", (the-यह, house-घर) has 1/2 * 1/2 = 1/4 and (the-घर, house-यह) has 1/4 * 1/2 = 1/8.

Further iterations: The process continues to alternate E-steps and M-steps. After one iteration, the probability of the alignments linking green-हरा, house-घर, and the-यह has grown from 1/9 to 1/4, while the competing alignments have only reached 1/8, so with each iteration the correct word pairs receive a larger share of the probability mass.
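
Putting the steps together, a compact end-to-end sketch for this toy corpus (illustrative only; as on the slides it considers only one-to-one alignments and ignores the NULL word and the Model 1 length factor):

    from collections import defaultdict
    from itertools import permutations

    # Toy parallel corpus from the slides: (English sentence, Hindi sentence).
    corpus = [
        (["green", "house"], ["हरा", "घर"]),
        (["the", "house"], ["यह", "घर"]),
    ]

    english_vocab = sorted({e for es, _ in corpus for e in es})
    hindi_vocab = sorted({h for _, hs in corpus for h in hs})
    # Uniform initialization: t(e|h) = 1/3.
    t = {(e, h): 1.0 / len(english_vocab) for e in english_vocab for h in hindi_vocab}

    for iteration in range(1, 4):
        tc = defaultdict(float)     # expected counts (E-step)
        total = defaultdict(float)
        for es, hs in corpus:
            alignments = list(permutations(hs, len(es)))
            # P(a, e|h) for each alignment = product of t probabilities along its links.
            probs = []
            for a in alignments:
                p = 1.0
                for e, h in zip(es, a):
                    p *= t[(e, h)]
                probs.append(p)
            z = sum(probs)          # normalizer: sum over the alignments of this pair
            for a, p in zip(alignments, probs):
                for e, h in zip(es, a):
                    tc[(e, h)] += p / z   # fractional count weighted by P(a|e,h)
                    total[h] += p / z
        # M-step: t(e|h) = tc(e|h) / total(h).
        t = {(e, h): tc[(e, h)] / total[h] for e in english_vocab for h in hindi_vocab}
        print(f"iteration {iteration}: t(house|घर) = {t[('house', 'घर')]:.3f}, "
              f"t(green|घर) = {t[('green', 'घर')]:.3f}")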