CS224N Section 2: EM
Nate Chambers, April 17, 2009
(Thanks to Bill MacCartney, Jenny Finkel, and Sushant Prakash for these materials)

Outline for today
- EM review
- EM examples: NLP example, spreadsheet
- MT PA2: EM alignment

EM Review
- Observed data x: a point cloud, sentences, feature vectors
- A model of how the data is generated, with parameters θ
- We want the MLE of θ:
    θ = arg max_θ L(x|θ) = arg max_θ ∏_i p(x_i|θ)^C(x_i)
- But this problem is typically very hard, so we introduce unobserved data y: class labels, clusters, speakers of sentences
- It is easier to perform: θ = arg max_θ L(x,y|θ)
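
To make the hard/easy contrast concrete, here is a minimal sketch (a hypothetical two-coin mixture, not from the slides) of the observed-data log-likelihood: the hidden component y is summed out inside each p(x_i|θ), so the log does not distribute over that sum and there is no closed-form maximizer.

import math

# Hypothetical toy model: a mixture of two biased coins.
# theta = (pi, p0, p1): probability of picking coin 0, and each coin's heads probability.
def log_likelihood(xs, pi, p0, p1):
    ll = 0.0
    for x in xs:  # each x is 1 (heads) or 0 (tails)
        # p(x | theta) = sum over the hidden coin y of p(y) * p(x | y, theta)
        px = pi * (p0 if x == 1 else 1.0 - p0) + (1.0 - pi) * (p1 if x == 1 else 1.0 - p1)
        ll += math.log(px)  # log of a sum over y: no closed-form arg max over theta
    return ll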

EM Review
Steps of EM:
- Initialize with some model parameters θ
- E-step: use the current θ to calculate completions of the unobserved data y
    - Compute y by p(y|x,θ) – soft counts!
    - Use the model parameters to fit the unobserved data
- M-step: use the completions y to maximize the model parameters
    - Compute θ = arg max_θ L(x,y|θ)
    - Use the completed data to fit the model parameters
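
As a concrete illustration of one E-step/M-step iteration, here is a minimal sketch for the same hypothetical two-coin mixture (the names and toy model are assumptions for illustration only, not the PA2 code): the E-step computes soft counts p(y|x,θ) under the current parameters, and the M-step re-estimates θ from those soft counts.

def em_step(xs, pi, p0, p1):
    # E-step: for each x, compute the soft count p(y=0 | x, theta).
    resp = []
    for x in xs:
        joint0 = pi * (p0 if x == 1 else 1.0 - p0)          # p(y=0, x | theta)
        joint1 = (1.0 - pi) * (p1 if x == 1 else 1.0 - p1)   # p(y=1, x | theta)
        resp.append(joint0 / (joint0 + joint1))              # soft count for coin 0
    # M-step: maximize L(x, y | theta) with y replaced by its soft counts.
    n0 = sum(resp)
    n1 = len(xs) - n0
    new_pi = n0 / len(xs)
    new_p0 = sum(r for r, x in zip(resp, xs) if x == 1) / n0
    new_p1 = sum(1.0 - r for r, x in zip(resp, xs) if x == 1) / n1
    return new_pi, new_p0, new_p1

# Usage: iterate until the parameters (or the log-likelihood) stop changing.
xs = [1, 1, 0, 1, 0, 0, 1, 1]
pi, p0, p1 = 0.6, 0.7, 0.4
for _ in range(50):
    pi, p0, p1 = em_step(xs, pi, p0, p1)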