CS224N Section 2: EM
Nate Chambers, April 17, 2009
(Thanks to Bill MacCartney, Jenny Finkel, and Sushant Prakash for these materials)

Outline for today
- EM review
- EM examples: NLP example, spreadsheet
- MT PA2: EM alignment

EM Review
- Observed data x: a point cloud, sentences, feature vectors
- A model of how the data is generated, with parameters θ
- We want the MLE of θ:
    θ = arg max_θ L(x|θ) = arg max_θ ∏_i p(x_i|θ)^C(x_i)
- But this problem is typically very hard, so we introduce unobserved data y: class labels, clusters, speakers of sentences
- It is easier to perform: θ = arg max_θ L(x,y|θ)
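
To make the hard/easy contrast concrete, here is a minimal sketch (a hypothetical two-coin mixture, not from the slides) of the observed-data log-likelihood: the hidden component y is summed out inside each p(x_i|θ), so the log does not distribute over that sum and there is no closed-form maximizer.

import math

# Hypothetical toy model: a mixture of two biased coins.
# theta = (pi, p0, p1): probability of picking coin 0, and each coin's heads probability.
def log_likelihood(xs, pi, p0, p1):
    ll = 0.0
    for x in xs:  # each x is 1 (heads) or 0 (tails)
        # p(x | theta) = sum over the hidden coin y of p(y) * p(x | y, theta)
        px = pi * (p0 if x == 1 else 1.0 - p0) + (1.0 - pi) * (p1 if x == 1 else 1.0 - p1)
        ll += math.log(px)  # log of a sum over y: no closed-form arg max over theta
    return ll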

EM Review
Steps of EM:
- Initialize with some model parameters θ
- E-step: use the current θ to calculate completions of the unobserved data y
    - Compute y by p(y|x,θ) – soft counts!
    - Use the model parameters to fit the unobserved data
- M-step: use the completions y to maximize the model parameters
    - Compute θ = arg max_θ L(x,y|θ)
    - Use the completed data to fit the model parameters
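
As a concrete illustration of one E-step/M-step iteration, here is a minimal sketch for the same hypothetical two-coin mixture (the names and toy model are assumptions for illustration only, not the PA2 code): the E-step computes soft counts p(y|x,θ) under the current parameters, and the M-step re-estimates θ from those soft counts.

def em_step(xs, pi, p0, p1):
    # E-step: for each x, compute the soft count p(y=0 | x, theta).
    resp = []
    for x in xs:
        joint0 = pi * (p0 if x == 1 else 1.0 - p0)          # p(y=0, x | theta)
        joint1 = (1.0 - pi) * (p1 if x == 1 else 1.0 - p1)   # p(y=1, x | theta)
        resp.append(joint0 / (joint0 + joint1))              # soft count for coin 0
    # M-step: maximize L(x, y | theta) with y replaced by its soft counts.
    n0 = sum(resp)
    n1 = len(xs) - n0
    new_pi = n0 / len(xs)
    new_p0 = sum(r for r, x in zip(resp, xs) if x == 1) / n0
    new_p1 = sum(1.0 - r for r, x in zip(resp, xs) if x == 1) / n1
    return new_pi, new_p0, new_p1

# Usage: iterate until the parameters (or the log-likelihood) stop changing.
xs = [1, 1, 0, 1, 0, 0, 1, 1]
pi, p0, p1 = 0.6, 0.7, 0.4
for _ in range(50):
    pi, p0, p1 = em_step(xs, pi, p0, p1)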