Valentin I. Spitkovsky April 16, 2010

Presentation transcript:

CS224N Section 2: EM
Valentin I. Spitkovsky
April 16, 2010
(Thanks to Nate Chambers, Bill MacCartney, Jenny Finkel, and Sushant Prakash for these materials!)

Outline for today
- Interactive Session!!
- EM review (just a few slides)
- EM examples
  - Weights for interpolating language models (PA1)
  - Speaker identification (related; another spreadsheet)
  - Word alignment for machine translation (PA2)

EM Review
- Observed data -- x
  - Point cloud, sentences, feature vectors
- Model of how data is generated -- θ
- Want to perform MLE estimation of θ:
  θ = arg max_θ L(x|θ) = arg max_θ ∏_i p(x_i|θ)^C(x_i)
- But this problem is typically very hard, so we introduce unobserved data -- y
  - Class labels, clusters, speakers of sentences
- Easier to perform: θ = arg max_θ L(x,y|θ)
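A minimal sketch (not from the slides) of why the marginal likelihood is hard to maximize directly: with a latent class y, each factor of L(x|θ) sums over y, so the product over data points no longer decomposes into per-class closed-form MLEs. The toy 1-D Gaussian mixture, its weights, means, and unit variance are illustrative assumptions.

```python
import numpy as np

def marginal_log_likelihood(x, weights, means, var=1.0):
    """log L(x|theta) for a toy 1-D Gaussian mixture; theta = (weights, means)."""
    x = np.asarray(x)[:, None]            # shape (N, 1)
    means = np.asarray(means)[None, :]    # shape (1, K)
    log_comp = -0.5 * (x - means) ** 2 / var - 0.5 * np.log(2 * np.pi * var)
    # log sum_y p(y|theta) p(x_i|y,theta), summed over data points i
    return np.logaddexp.reduce(np.log(weights) + log_comp, axis=1).sum()

print(marginal_log_likelihood([0.1, 0.2, 3.9, 4.1], weights=[0.5, 0.5], means=[0.0, 4.0]))
```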

EM Review
Steps of EM:
- Initialize with some model parameters θ
- E-step: use current θ to complete the unobserved data y:
  - Weight y by p(y|x,θ) -- soft counts!
  - Use model parameters to fit the unobserved data
- M-step: use completions y to maximize model parameters:
  - Compute θ = arg max_θ L(x,y|θ)
  - Use completed data to fit model parameters
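As a concrete instance of these two steps, here is a minimal sketch for the PA1-style task from the outline: learning interpolation weights λ for mixing two fixed unigram language models. The dictionaries p1 and p2, the 1e-12 floor for unseen words, and the iteration count are illustrative assumptions, not part of the original materials.

```python
def em_interpolation_weights(tokens, p1, p2, iters=20):
    lam = [0.5, 0.5]                                    # initial model parameters theta
    for _ in range(iters):
        # E-step: soft counts -- posterior p(y | x, theta) over which model produced each token
        counts = [0.0, 0.0]
        for w in tokens:
            joint = [lam[0] * p1.get(w, 1e-12), lam[1] * p2.get(w, 1e-12)]
            z = sum(joint)
            counts[0] += joint[0] / z
            counts[1] += joint[1] / z
        # M-step: theta = arg max L(x, y | theta); for mixture weights this is just renormalizing
        total = sum(counts)
        lam = [c / total for c in counts]
    return lam

# Toy usage: tokens drawn mostly from the first model should push lambda_1 up.
p1 = {"the": 0.5, "cat": 0.5}
p2 = {"the": 0.1, "dog": 0.9}
print(em_interpolation_weights(["the", "cat", "the", "cat", "dog"], p1, p2))
```

The E-step accumulates exactly the soft counts p(y|x,θ) described above; the M-step's closed form for mixture weights is simply their normalized sum.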