Posterior Regularization for Structured Latent Variable Models
Li Zhonghua, I2R SMT Reading Group

Outline
Motivation and Introduction
Posterior Regularization
Application
Implementation
Some Related Frameworks

Motivation and Introduction --Prior Knowledge
We possess a wealth of prior knowledge about most NLP tasks.

Motivation and Introduction --Prior Knowledge

Motivation and Introduction --Leveraging Prior Knowledge
Possible approaches and their limitations.

Motivation and Introduction --Limited Approach
Bayesian approach: encode prior knowledge with a prior on the parameters.
Limitation: our prior knowledge is not about parameters! Parameters are difficult to interpret, so it is hard to get the desired effect.

Motivation and Introduction --Limited Approach
Augmenting the model: encode prior knowledge with additional variables and dependencies.
Limitation: may make exact inference intractable.

Posterior Regularization
A declarative language for specifying prior knowledge: constraint features and expectations.
Methods for learning with knowledge in this language: an EM-style learning algorithm.
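
Concretely, in the notation of Ganchev et al. (2010), the "declarative language" is a constraint set of posterior distributions q over the latent variables, defined by bounds b on the expectations of constraint features φ; a standard form is:

\[
  \mathcal{Q} = \left\{ q(\mathbf{Y}) \;:\; \mathbf{E}_{q}\!\left[ \boldsymbol{\phi}(\mathbf{X}, \mathbf{Y}) \right] \le \mathbf{b} \right\}
\]

Slack-penalized variants of the framework soften the hard bound b into a penalty on its violation.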

Posterior Regularization

Original Objective:
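
The equation on this slide was lost in transcription. Based on Ganchev et al. (2010), the original objective is presumably the marginal log-likelihood maximized by standard EM, which posterior regularization then augments with a KL term measuring the distance from the model posterior to the constraint set Q:

\[
  \mathcal{L}(\theta) = \sum_{\mathbf{x}} \log \sum_{\mathbf{y}} p_{\theta}(\mathbf{x}, \mathbf{y}),
  \qquad
  J_{\mathcal{Q}}(\theta) = \mathcal{L}(\theta) - \min_{q \in \mathcal{Q}} \mathrm{KL}\!\left( q(\mathbf{Y}) \,\middle\|\, p_{\theta}(\mathbf{Y} \mid \mathbf{X}) \right)
\]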

Posterior Regularization EM style learning algorithm
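
A sketch of the algorithm, again following Ganchev et al. (2010): the M-step is unchanged from standard EM, while the E-step becomes a KL projection of the current model posterior onto Q:

\[
  \text{E}'\text{-step:}\; q^{t+1} = \operatorname*{arg\,min}_{q \in \mathcal{Q}} \mathrm{KL}\!\left( q \,\middle\|\, p_{\theta^{t}}(\mathbf{Y} \mid \mathbf{X}) \right),
  \qquad
  \text{M-step:}\; \theta^{t+1} = \operatorname*{arg\,max}_{\theta} \mathbf{E}_{q^{t+1}}\!\left[ \log p_{\theta}(\mathbf{X}, \mathbf{Y}) \right]
\]

When Q contains all distributions, the projection returns the model posterior itself and the procedure reduces to standard EM.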

Posterior Regularization Computing the Posterior Regularizer
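
The projection is solved in the dual, which has one nonnegative multiplier per constraint rather than one variable per latent configuration; the dual problem and the resulting projected posterior are (Ganchev et al., 2010):

\[
  \boldsymbol{\lambda}^{*} = \operatorname*{arg\,max}_{\boldsymbol{\lambda} \ge 0} \; -\mathbf{b}^{\top} \boldsymbol{\lambda} - \log \sum_{\mathbf{Y}} p_{\theta}(\mathbf{Y} \mid \mathbf{X}) \, e^{-\boldsymbol{\lambda}^{\top} \boldsymbol{\phi}(\mathbf{X}, \mathbf{Y})},
  \qquad
  q^{*}(\mathbf{Y}) \propto p_{\theta}(\mathbf{Y} \mid \mathbf{X}) \, e^{-\boldsymbol{\lambda}^{*\top} \boldsymbol{\phi}(\mathbf{X}, \mathbf{Y})}
\]

Since q* merely reweights the model posterior multiplicatively in the features, inference under q* remains as tractable as in the base model whenever the features decompose over its factors.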

Application --Statistical Word Alignment
Baseline models: IBM Model 1 and the HMM alignment model.
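
For reference, a sketch of the two generative models in one common notation (my conventions, which may differ from the slide's: e is the source sentence, x the target sentence of length n, and y_j the source position generating x_j):

\[
  \text{IBM Model 1:}\; p(\mathbf{x}, \mathbf{y} \mid \mathbf{e}) \propto \prod_{j=1}^{n} t(x_{j} \mid e_{y_{j}}),
  \qquad
  \text{HMM:}\; p(\mathbf{x}, \mathbf{y} \mid \mathbf{e}) = \prod_{j=1}^{n} p_{d}(y_{j} \mid y_{j-1}) \, t(x_{j} \mid e_{y_{j}})
\]

The only difference is the distortion component: uniform in Model 1, first-order (conditioned on the previous alignment) in the HMM.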

Application --Bijectivity Constraint
One feature for each source word m that counts how many times m is aligned to a target word in the alignment y.
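
A sketch of this feature in the notation above (my formalization): φ_m counts the target positions aligned to source position m, and bounding its expectation by one discourages many-to-one alignments, i.e. pushes toward bijectivity:

\[
  \phi_{m}(\mathbf{x}, \mathbf{y}) = \sum_{j=1}^{n} \mathbf{1}\!\left[ y_{j} = m \right],
  \qquad
  \mathbf{E}_{q}\!\left[ \phi_{m} \right] \le 1
\]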

Application --Symmetry Constraint
Define a feature for each target-source position pair (i, j). The feature has expectation zero exactly when the word pair (i, j) is aligned with equal probability in both directions.
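
One way to formalize this (my paraphrase of Graça et al., 2010; details may differ): let q range jointly over the forward and backward directional alignments, and give each pair feature opposite signs in the two directions, so that a zero expectation forces the two directional posteriors to agree:

\[
  \phi_{ij} =
  \begin{cases}
    +1 & \text{if the forward model aligns } i \text{ to } j \\
    -1 & \text{if the backward model aligns } j \text{ to } i
  \end{cases}
  \qquad
  \mathbf{E}_{q}\!\left[ \phi_{ij} \right] = 0
\]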

Application
These constraints come from "Learning Tractable Word Alignment Models with Complex Constraints" (Graça, Ganchev, and Taskar, Computational Linguistics, 2010).

Application --Results
Six language pairs.
Both types of constraints improve over the HMM in terms of both precision and recall, by 10% to 15%.
S-HMM (symmetry) performs slightly better than B-HMM (bijectivity): better in 10 out of 12 cases.
Improves over IBM Model 4 in 9 out of 12 cases.

Application

Implementation
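
The implementation details on this slide were lost in transcription. As an illustration only, here is a minimal NumPy sketch (all names are mine, not from the talk) of the core of any PR implementation: the E-step projection, solved by projected gradient ascent on the dual given earlier. It enumerates the latent space for clarity; in structured models such as the alignment HMM, the expectations E_q[φ] are computed with dynamic programming (e.g. forward-backward) instead.

import numpy as np

def pr_estep(log_posterior, phi, b, lr=0.5, n_iters=200):
    """PR E-step: project p_theta(y|x) onto Q = {q : E_q[phi] <= b}.

    Maximizes the dual g(lam) = -b.lam - log sum_y p(y|x) exp(-lam.phi(y))
    over lam >= 0, then returns q(y) proportional to p(y|x) exp(-lam.phi(y)).

    log_posterior: (n_y,)   log p_theta(y|x) for every latent configuration y
    phi:           (n_y, k) constraint feature values phi(x, y)
    b:             (k,)     expectation bounds
    """
    lam = np.zeros(phi.shape[1])

    def project(lam):
        # Reweight the posterior multiplicatively in the features.
        logq = log_posterior - phi @ lam
        logq -= logq.max()                       # stabilize the exponentiation
        q = np.exp(logq)
        return q / q.sum()

    for _ in range(n_iters):
        q = project(lam)
        grad = q @ phi - b                       # dual gradient: E_q[phi] - b
        lam = np.maximum(0.0, lam + lr * grad)   # keep multipliers nonnegative
    return project(lam), lam

# Toy usage: two latent outcomes, one feature, constrain E_q[phi] <= 0.5.
logp = np.log(np.array([0.9, 0.1]))              # model posterior p(y|x)
phi = np.array([[1.0], [0.0]])                   # feature fires on the first outcome
q, lam = pr_estep(logp, phi, b=np.array([0.5]))
print(q)                                         # q[0] is pulled from 0.9 down to ~0.5

The dual has only k variables (one per constraint), which is why Ganchev et al. optimize it with projected gradient methods rather than working in the exponentially large primal.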

Some Related Frameworks

More info: many of my slides are taken from there. Thanks!