Valentin I. Spitkovsky April 16, 2010

Presentation transcript:

CS224N Section 2: EM
Valentin I. Spitkovsky
April 16, 2010
(Thanks to Nate Chambers, Bill MacCartney, Jenny Finkel, and Sushant Prakash for these materials!)

Outline for today
- Interactive Session!!
- EM review (just a few slides)
- EM examples
  - Weights for interpolating language models (PA1)
  - Speaker identification (related; another spreadsheet)
  - Word alignment for machine translation (PA2)

EM Review
- Observed data -- x
  - Point cloud, sentences, feature vectors
- Model of how data is generated -- θ
- Want to perform MLE estimation of θ:
  θ = arg max_θ L(x|θ) = arg max_θ ∏_i p(x_i|θ)^C(x_i)
- But this problem is typically very hard, so we introduce unobserved data -- y
  - Class labels, clusters, speakers of sentences
- Easier to perform: θ = arg max_θ L(x,y|θ)
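A minimal sketch (not from the slides) of why the marginal likelihood is hard to maximize directly: with a latent class y, each factor of L(x|θ) sums over y, so the product over data points no longer decomposes into per-class closed-form MLEs. The toy 1-D Gaussian mixture, its weights, means, and unit variance are illustrative assumptions.

```python
import numpy as np

def marginal_log_likelihood(x, weights, means, var=1.0):
    """log L(x|theta) for a toy 1-D Gaussian mixture; theta = (weights, means)."""
    x = np.asarray(x)[:, None]            # shape (N, 1)
    means = np.asarray(means)[None, :]    # shape (1, K)
    log_comp = -0.5 * (x - means) ** 2 / var - 0.5 * np.log(2 * np.pi * var)
    # log sum_y p(y|theta) p(x_i|y,theta), summed over data points i
    return np.logaddexp.reduce(np.log(weights) + log_comp, axis=1).sum()

print(marginal_log_likelihood([0.1, 0.2, 3.9, 4.1], weights=[0.5, 0.5], means=[0.0, 4.0]))
```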

EM Review
Steps of EM:
- Initialize with some model parameters θ
- E-step: use current θ to complete the unobserved data y:
  - Weight y by p(y|x,θ) -- soft counts!
  - Use model parameters to fit the unobserved data
- M-step: use completions y to maximize model parameters:
  - Compute θ = arg max_θ L(x,y|θ)
  - Use completed data to fit model parameters
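As a concrete instance of these two steps, here is a minimal sketch for the PA1-style task from the outline: learning interpolation weights λ for mixing two fixed unigram language models. The dictionaries p1 and p2, the 1e-12 floor for unseen words, and the iteration count are illustrative assumptions, not part of the original materials.

```python
def em_interpolation_weights(tokens, p1, p2, iters=20):
    lam = [0.5, 0.5]                                    # initial model parameters theta
    for _ in range(iters):
        # E-step: soft counts -- posterior p(y | x, theta) over which model produced each token
        counts = [0.0, 0.0]
        for w in tokens:
            joint = [lam[0] * p1.get(w, 1e-12), lam[1] * p2.get(w, 1e-12)]
            z = sum(joint)
            counts[0] += joint[0] / z
            counts[1] += joint[1] / z
        # M-step: theta = arg max L(x, y | theta); for mixture weights this is just renormalizing
        total = sum(counts)
        lam = [c / total for c in counts]
    return lam

# Toy usage: tokens drawn mostly from the first model should push lambda_1 up.
p1 = {"the": 0.5, "cat": 0.5}
p2 = {"the": 0.1, "dog": 0.9}
print(em_interpolation_weights(["the", "cat", "the", "cat", "dog"], p1, p2))
```

The E-step accumulates exactly the soft counts p(y|x,θ) described above; the M-step's closed form for mixture weights is simply their normalized sum.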