Learning Markov Networks


- Learning parameters (weights)
  - Generatively
  - Discriminatively
- Learning with incomplete data
- Learning structure (features)

Generative Weight Learning
- Maximize the likelihood or posterior probability
- Numerical optimization (gradient or 2nd-order methods)
- No local maxima
- Requires inference at each step (slow!)

The gradient of the log-likelihood with respect to weight w_i is

    ∂/∂w_i log P_w(x) = n_i(x) − E_w[n_i(x)]

i.e., the number of times feature i is true in the data minus the expected number of times feature i is true according to the model.
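A minimal sketch of this gradient ascent, assuming a toy Markov network over three binary variables small enough for exact inference by brute-force enumeration; the features, data, and learning rate here are illustrative assumptions, not part of the original slides.

import itertools
import math

# Toy binary features f_i(x) over a state x = (x0, x1, x2).
features = [
    lambda x: float(x[0] == x[1]),   # f_0: the first two variables agree
    lambda x: float(x[1] == x[2]),   # f_1: the last two variables agree
]

def log_unnorm(x, w):
    """Unnormalized log-probability: sum_i w_i * f_i(x)."""
    return sum(wi * f(x) for wi, f in zip(w, features))

def expected_counts(w):
    """E_w[f_i(x)], computed exactly by enumerating all 2^3 states."""
    states = list(itertools.product([0, 1], repeat=3))
    log_z = math.log(sum(math.exp(log_unnorm(x, w)) for x in states))
    exp = [0.0] * len(features)
    for x in states:
        p = math.exp(log_unnorm(x, w) - log_z)
        for i, f in enumerate(features):
            exp[i] += p * f(x)
    return exp

# Gradient ascent: d/dw_i log P_w(x) = n_i(x) - E_w[n_i(x)]
data = [(0, 0, 0), (1, 1, 1), (1, 0, 1)]
w = [0.0, 0.0]
eta = 0.5
for step in range(100):
    empirical = [sum(f(x) for x in data) / len(data) for f in features]
    expected = expected_counts(w)
    w = [wi + eta * (emp - exp) for wi, emp, exp in zip(w, empirical, expected)]
print("learned weights:", w)

Because the log-likelihood is concave in w, this plain gradient ascent finds the global optimum; the expensive part in realistic models is expected_counts, which requires inference at every step.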

Pseudo-Likelihood

    PL(x) = ∏_j P(x_j | neighbors(x_j))

- Likelihood of each variable given its neighbors in the data
- Does not require inference at each step
- Consistent estimator
- Widely used in vision, spatial statistics, etc.
- But PL parameters may not work well for long inference chains
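A minimal sketch of the pseudo-likelihood computation under the same kind of toy setup (illustrative features and weights, not from the slides): each factor normalizes over just the two values of a single variable, so no global inference or partition function is needed.

import math

features = [
    lambda x: float(x[0] == x[1]),
    lambda x: float(x[1] == x[2]),
]

def score(x, w):
    """Unnormalized log-probability sum_i w_i * f_i(x)."""
    return sum(wi * f(x) for wi, f in zip(w, features))

def log_pseudo_likelihood(x, w):
    """log PL(x) = sum_j log P(x_j | x_{-j}); each term normalizes over
    the two values of x_j only."""
    total = 0.0
    for j in range(len(x)):
        flipped = list(x)
        flipped[j] = 1 - x[j]                # the other value of variable j
        s_keep = score(x, w)
        s_flip = score(tuple(flipped), w)
        m = max(s_keep, s_flip)              # log-sum-exp for stability
        total += s_keep - (m + math.log(math.exp(s_keep - m) + math.exp(s_flip - m)))
    return total

print(log_pseudo_likelihood((1, 1, 0), [1.0, 0.5]))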

Discriminative Weight Learning (a.k.a. Conditional Random Fields)
- Maximize the conditional likelihood of the query (y) given the evidence (x)
- Inference is easier, but still hard

The gradient of the conditional log-likelihood is

    ∂/∂w_i log P_w(y | x) = n_i(x, y) − E_w[n_i(x, y)]

i.e., the number of true groundings of clause i in the data minus the expected number of true groundings according to the model.
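A minimal sketch of the conditional gradient, assuming toy feature functions over binary x and y and exact enumeration over the query variables y only; every name here is an illustrative assumption. Conditioning on x shrinks the space the expectation ranges over, which is why inference is easier than in the generative case.

import itertools
import math

def feats(x, y):
    """Feature counts n_i(x, y) for binary x and y (both length 2 here)."""
    return [float(y[0] == x[0]), float(y[1] == x[1]), float(y[0] == y[1])]

def cond_gradient(x, y_obs, w):
    """grad_i = n_i(x, y_obs) - E_w[n_i(x, y) | x], enumerating all y."""
    ys = list(itertools.product([0, 1], repeat=len(y_obs)))
    scores = [sum(wi * ni for wi, ni in zip(w, feats(x, y))) for y in ys]
    m = max(scores)                          # log-sum-exp for stability
    z = sum(math.exp(s - m) for s in scores)
    expected = [0.0] * len(w)
    for y, s in zip(ys, scores):
        p = math.exp(s - m) / z              # P_w(y | x)
        for i, ni in enumerate(feats(x, y)):
            expected[i] += p * ni
    observed = feats(x, y_obs)
    return [o - e for o, e in zip(observed, expected)]

print(cond_gradient(x=(1, 0), y_obs=(1, 0), w=[0.0, 0.0, 0.0]))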

Voted Perceptron
- Approximate the expected counts by the counts in the MAP state of y given x
- Originally proposed for training HMMs discriminatively

    w_i ← 0
    for t ← 1 to T do
        y_MAP ← InferMAP(x)
        w_i ← w_i + η [count_i(y_data) − count_i(y_MAP)]
    return Σ_t w_i,t / T    (average of the weights over the T iterations)
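A sketch of the pseudocode above in Python, assuming caller-supplied infer_map(x, w) (MAP inference of y given x) and count(x, y) (the vector of feature counts); both are hypothetical stand-ins for application-specific code, and the loop over multiple training examples is a mild generalization of the single-example pseudocode.

def voted_perceptron(examples, count, infer_map, T, eta=1.0):
    """examples: list of (x, y_data) pairs; returns the averaged weights."""
    dim = len(count(*examples[0]))
    w = [0.0] * dim
    w_sum = [0.0] * dim                      # running sum for the final average
    for t in range(T):
        for x, y_data in examples:
            y_map = infer_map(x, w)          # MAP state under current weights
            c_data, c_map = count(x, y_data), count(x, y_map)
            w = [wi + eta * (cd - cm) for wi, cd, cm in zip(w, c_data, c_map)]
        w_sum = [ws + wi for ws, wi in zip(w_sum, w)]
    return [ws / T for ws in w_sum]          # sum_t w(t) / T, as in the slide

Returning the average of the weights across iterations, rather than the final w, is what the "voted" in the name refers to and tends to generalize better.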

Other Weight Learning Approaches
- Generative: iterative scaling
- Discriminative: max-margin methods

Learning with Missing Data
- The gradient of the likelihood is now a difference of expectations:

    ∂/∂w_i log P_w(x) = E_w[n_i | x] − E_w[n_i]

  where x is observed and y is missing: E_w[n_i | x] is the expected number of true groundings given the observed data x, and E_w[n_i] is the expected number of true groundings given no data.
- Can use gradient ascent or EM
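A minimal sketch of the two expectations, again using brute-force enumeration over a toy model (all names and features are illustrative): the clamped call fixes the observed variables x and sums over the missing y, while the unclamped call sums over everything.

import itertools
import math

def expectations(w, feats, n_vars, clamp=None):
    """E_w[f_i] over all binary states, optionally clamping observed variables.
    clamp maps variable index -> observed value; unclamped variables are
    enumerated (this is the inference step, exact only at toy sizes)."""
    clamp = clamp or {}
    states = [s for s in itertools.product([0, 1], repeat=n_vars)
              if all(s[j] == v for j, v in clamp.items())]
    scores = [sum(wi * f(s) for wi, f in zip(w, feats)) for s in states]
    m = max(scores)                          # log-sum-exp for stability
    z = sum(math.exp(sc - m) for sc in scores)
    exp = [0.0] * len(w)
    for s, sc in zip(states, scores):
        p = math.exp(sc - m) / z
        for i, f in enumerate(feats):
            exp[i] += p * f(s)
    return exp

feats = [lambda s: float(s[0] == s[1]), lambda s: float(s[1] == s[2])]
w = [0.3, -0.2]
# Variables 0 and 2 observed (the "x"), variable 1 missing (the "y").
clamped = expectations(w, feats, 3, clamp={0: 1, 2: 0})
free = expectations(w, feats, 3)
gradient = [c - f for c, f in zip(clamped, free)]
print("gradient:", gradient)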

Structure Learning
- Start with atomic features
- Greedily conjoin features to improve the score
- Problem: need to re-estimate the weights for each new candidate
- Approximation: keep the weights of previous features constant
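A sketch of the greedy search loop described above, with score() and learn_weight() as hypothetical placeholders for a real scoring function (e.g., pseudo-likelihood) and a weight fitter; per the slide's approximation, only the new conjunction's weight is estimated while the previous weights stay fixed.

def greedy_structure_learning(atomic, score, learn_weight):
    """Greedily grow a feature set by conjoining pairs of existing features."""
    features = list(atomic)                  # start with atomic features
    best = score(features)
    while True:
        best_cand = None
        for i in range(len(features)):
            for j in range(i + 1, len(features)):
                f, g = features[i], features[j]
                cand = lambda x, f=f, g=g: f(x) * g(x)   # conjunction: f AND g
                learn_weight(cand)           # fit only the new feature's weight
                s = score(features + [cand])
                if s > best:
                    best, best_cand = s, cand
        if best_cand is None:
            return features                  # no conjunction improves the score
        features.append(best_cand)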