Learning Markov Networks
- Learning parameters (weights)
  - Generatively
  - Discriminatively
- Learning with incomplete data
- Learning structure (features)
Generative Weight Learning
- Maximize likelihood or posterior probability
- Numerical optimization (gradient or 2nd order)
- No local maxima
- Requires inference at each step (slow!)

  ∂/∂wi log Pw(x) = ni(x) – Ew[ni(x)]

where ni(x) is the no. of times feature i is true in the data and Ew[ni(x)] is the expected no. of times feature i is true according to the model.
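Below is a minimal sketch of this gradient ascent, assuming a toy model over three binary variables with illustrative node and "agreement" features, so the expected counts Ew[ni(x)] (the inference step) can be computed by brute-force enumeration; real models need approximate inference here.

```python
import itertools
import numpy as np

def features(x):
    """Illustrative features: one per variable plus one 'agreement'
    feature per adjacent pair of variables."""
    x = np.asarray(x, dtype=float)
    pair = [float(x[j] == x[j + 1]) for j in range(len(x) - 1)]
    return np.concatenate([x, pair])

def expected_counts(w, n_vars=3):
    """E_w[n_i(x)] by summing over all 2^n_vars states (the inference step)."""
    states = list(itertools.product([0, 1], repeat=n_vars))
    scores = np.array([w @ features(s) for s in states])
    probs = np.exp(scores - scores.max())
    probs /= probs.sum()
    return sum(p * features(s) for p, s in zip(probs, states))

def generative_learning(data, n_features, steps=200, eta=0.05):
    w = np.zeros(n_features)
    emp = np.mean([features(x) for x in data], axis=0)  # n_i(x) averaged over data
    for _ in range(steps):
        w += eta * (emp - expected_counts(w))  # gradient of avg. log-likelihood
    return w

data = [(1, 1, 1), (1, 1, 0), (0, 0, 0), (1, 1, 1)]
print("weights:", np.round(generative_learning(data, n_features=5), 2))
```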
Pseudo-Likelihood
- Likelihood of each variable given its neighbors in the data:
  PL(x) = ∏j Pw(xj | neighbors(xj))
- Does not require inference at each step
- Consistent estimator
- Widely used in vision, spatial statistics, etc.
- But PL parameters may not work well for long inference chains
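A minimal sketch of pseudo-likelihood learning for the same kind of toy binary model (the `features` function is again an illustrative assumption): each conditional Pw(xj | x−j) only compares the two values of xj with the other variables fixed at their data values, so no joint inference is needed.

```python
import numpy as np

def features(x):
    """Illustrative features: one per variable plus one 'agreement'
    feature per adjacent pair of variables."""
    x = np.asarray(x, dtype=float)
    pair = [float(x[j] == x[j + 1]) for j in range(len(x) - 1)]
    return np.concatenate([x, pair])

def pl_gradient(w, x):
    """Gradient of the log pseudo-likelihood for one example x."""
    grad = np.zeros_like(w)
    for j in range(len(x)):
        # Enumerate the two values of x_j with the other variables held at data values
        cand = []
        for v in (0, 1):
            xv = list(x)
            xv[j] = v
            cand.append(features(xv))
        scores = np.array([w @ f for f in cand])
        probs = np.exp(scores - scores.max())
        probs /= probs.sum()
        expected = probs[0] * cand[0] + probs[1] * cand[1]
        grad += features(x) - expected
    return grad

def pseudo_likelihood_learning(data, n_features, steps=200, eta=0.05):
    w = np.zeros(n_features)
    for _ in range(steps):
        w += eta * np.mean([pl_gradient(w, x) for x in data], axis=0)
    return w

data = [(1, 1, 1), (1, 1, 0), (0, 0, 0), (1, 1, 1)]
print("weights:", np.round(pseudo_likelihood_learning(data, n_features=5), 2))
```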
Discriminative Weight Learning (a.k.a. Conditional Random Fields)
- Maximize conditional likelihood of query (y) given evidence (x)
- Inference is easier, but still hard

  ∂/∂wi log Pw(y|x) = ni(x,y) – Ew[ni(x,y)]

where ni(x,y) is the no. of true groundings of clause i in the data and Ew[ni(x,y)] is the expected no. of true groundings according to the model.
Voted Perceptron
- Approximate expected counts by counts in MAP state of y given x
- Originally proposed for training HMMs discriminatively

  wi ← 0
  for t ← 1 to T do
      yMAP ← InferMAP(x)
      wi ← wi + η [counti(yData) – counti(yMAP)]
  return ∑t wi,t / T

where η is the learning rate and wi,t is the value of wi after iteration t, so the returned weights are averaged over the T iterations.
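The sketch below implements this procedure (and thereby approximates the conditional-likelihood gradient from the previous slide, with expected counts replaced by MAP counts) for a tiny chain-structured model. The feature set and the brute-force `infer_map` routine are illustrative assumptions; any MAP inference method could be substituted.

```python
import itertools
import numpy as np

def feature_counts(x, y):
    """Illustrative counts n_i(x, y): one feature per (position, label) weighted
    by the evidence, plus one per adjacent label pair (chain features)."""
    n_labels = 2
    f = np.zeros(len(x) * n_labels + n_labels * n_labels)
    for j, (xj, yj) in enumerate(zip(x, y)):
        f[j * n_labels + yj] += xj            # evidence-label features
    off = len(x) * n_labels
    for a, b in zip(y, y[1:]):
        f[off + a * n_labels + b] += 1        # label-label features
    return f

def infer_map(w, x):
    """Brute-force MAP over all label sequences (fine only for tiny examples)."""
    best, best_score = None, -np.inf
    for y in itertools.product([0, 1], repeat=len(x)):
        s = w @ feature_counts(x, y)
        if s > best_score:
            best, best_score = y, s
    return best

def voted_perceptron(data, n_features, T=20, eta=0.1):
    """data: list of (x, y_data) pairs; returns the averaged weight vector."""
    w = np.zeros(n_features)
    w_sum = np.zeros(n_features)
    for _ in range(T):
        for x, y_data in data:
            y_map = infer_map(w, x)
            # true counts minus counts in the MAP state
            w += eta * (feature_counts(x, y_data) - feature_counts(x, y_map))
        w_sum += w                            # accumulate w_{i,t}
    return w_sum / T                          # average over the T iterations

data = [([1.0, 0.9, 0.1], (1, 1, 0)), ([0.2, 0.8, 0.7], (0, 1, 1))]
print("weights:", np.round(voted_perceptron(data, n_features=3 * 2 + 4), 2))
```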
Other Weight Learning Approaches
- Generative: iterative scaling
- Discriminative: max margin
Learning with Missing Data
- Gradient of the likelihood is now a difference of expectations
- Can use gradient descent or EM

  ∂/∂wi log Pw(x) = Ey|x[ni(x,y)] – Ex,y[ni(x,y)]

where x is observed and y is missing; the first term is the expected no. of true groundings given the observed data x, and the second is the expected no. of true groundings given no data.
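A minimal sketch of this gradient for the same toy binary model: each update is the difference between the expected counts conditioned on the observed variables and the expected counts under the model alone. Both expectations are computed here by brute-force enumeration; real models would use approximate inference or EM. The feature set is an illustrative assumption.

```python
import itertools
import numpy as np

def features(x):
    """Illustrative features: one per variable plus one 'agreement'
    feature per adjacent pair of variables."""
    x = np.asarray(x, dtype=float)
    pair = [float(x[j] == x[j + 1]) for j in range(len(x) - 1)]
    return np.concatenate([x, pair])

def expected_features(w, observed, n_vars=3):
    """E[n_i] over all completions consistent with `observed`
    (a dict variable index -> value; an empty dict means no data)."""
    feats = [features(s) for s in itertools.product([0, 1], repeat=n_vars)
             if all(s[j] == v for j, v in observed.items())]
    scores = np.array([w @ f for f in feats])
    probs = np.exp(scores - scores.max())
    probs /= probs.sum()
    return sum(p * f for p, f in zip(probs, feats))

def missing_data_learning(examples, n_features, steps=200, eta=0.05):
    """examples: list of dicts giving the observed variables of each example."""
    w = np.zeros(n_features)
    for _ in range(steps):
        grad = np.zeros(n_features)
        for obs in examples:
            # E[n_i | observed data]  minus  E[n_i | no data]
            grad += expected_features(w, obs) - expected_features(w, {})
        w += eta * grad / len(examples)
    return w

examples = [{0: 1, 2: 1}, {0: 0, 1: 0}, {1: 1, 2: 1}]  # one variable missing in each
print("weights:", np.round(missing_data_learning(examples, n_features=5), 2))
```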
Structure Learning
- Start with atomic features
- Greedily conjoin features to improve the score
- Problem: need to re-estimate weights for each new candidate
- Approximation: keep the weights of previous features constant
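A minimal sketch of this greedy procedure over a tiny binary model, under illustrative assumptions (all names here are hypothetical): features are conjunctions of (variable, value) literals, candidates are formed by conjoining an existing feature with an atomic one, and each candidate is scored after fitting only its own weight, keeping the previous weights constant as the slide suggests.

```python
import itertools
import numpy as np

N_VARS = 3
STATES = list(itertools.product([0, 1], repeat=N_VARS))

def holds(feature, x):
    """A feature is a frozenset of (variable, value) literals; it is true of x
    iff every literal holds."""
    return float(all(x[v] == val for v, val in feature))

def avg_log_likelihood(features, weights, data):
    scores = np.array([sum(wt * holds(f, s) for f, wt in zip(features, weights))
                       for s in STATES])
    log_z = scores.max() + np.log(np.exp(scores - scores.max()).sum())
    return np.mean([sum(wt * holds(f, x) for f, wt in zip(features, weights)) - log_z
                    for x in data])

def fit_last_weight(features, weights, data, steps=50, eta=0.5, eps=1e-4):
    """Crudely fit only the newest feature's weight (all others held fixed)
    by numeric gradient ascent on the average log-likelihood."""
    w = list(weights)
    for _ in range(steps):
        up, down = list(w), list(w)
        up[-1] += eps
        down[-1] -= eps
        g = (avg_log_likelihood(features, up, data)
             - avg_log_likelihood(features, down, data)) / (2 * eps)
        w[-1] += eta * g
    return w

def greedy_structure_learning(data, max_features=5):
    atomic = [frozenset([(v, 1)]) for v in range(N_VARS)]
    features, weights = [], []
    for a in atomic:                          # start with the atomic features
        features.append(a)
        weights = fit_last_weight(features, weights + [0.0], data)
    while len(features) < max_features:
        current = avg_log_likelihood(features, weights, data)
        best = None
        for f, a in itertools.product(features, atomic):
            cand = f | a                      # conjoin a feature with an atomic one
            if cand == f or cand in features:
                continue
            # Approximation: fit only the candidate's weight, freeze the rest
            cand_w = fit_last_weight(features + [cand], weights + [0.0], data)
            score = avg_log_likelihood(features + [cand], cand_w, data)
            if best is None or score > best[0]:
                best = (score, cand, cand_w)
        if best is None or best[0] <= current:
            break                             # no candidate improves the score
        features, weights = features + [best[1]], best[2]
    return features, weights

data = [(1, 1, 1), (1, 1, 0), (0, 0, 0), (1, 1, 1), (0, 0, 1)]
for f, wt in zip(*greedy_structure_learning(data)):
    print(sorted(f), round(wt, 2))
```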