Undirected Probabilistic Graphical Models (Markov Nets) (Slides from Sam Roweis)

Connection to MCMC
– MCMC requires sampling a node given its Markov blanket, so we need P(x | MB(x)).
– For Bayes nets, MB(x) contains more nodes than are mentioned in the local distribution CPT(x): the parents, the children, and the children's other parents.
– For Markov nets, MB(x) is just the set of neighbors of x.

Example: a network over A, B, C, D with pairwise potentials on A,B; B,C; C,D; and D,A
Qn: What is the most likely configuration of A and B?
– The A,B factor says a=b=0, but the marginal says a=0, b=1!
– Moral: factors are not marginals!
– Although A and B would like to agree, B and C need to agree, C and D need to disagree, and D and A need to agree, and the latter three have higher weights!
Okay, you convinced me that given any potentials we will have a consistent joint. But given any joint, will there be potentials I can provide?
– Hammersley-Clifford theorem…
We can have potentials on any cliques, not just the maximal ones. So, for example, we can have a potential on A in addition to the four pairwise potentials.
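The "factors are not marginals" point can be checked numerically. The sketch below is only illustrative: the potential values are an assumption (the slide's own tables are not in the transcript), chosen so that A,B mildly prefer to agree while the other three edges carry much larger weights.

```python
from itertools import product

# Assumed pairwise potentials on the loop A-B, B-C, C-D, D-A (illustrative values).
phi_ab = {(0, 0): 30, (0, 1): 5, (1, 0): 1, (1, 1): 10}    # A and B would like to agree
phi_bc = {(0, 0): 100, (0, 1): 1, (1, 0): 1, (1, 1): 100}  # B and C need to agree
phi_cd = {(0, 0): 1, (0, 1): 100, (1, 0): 100, (1, 1): 1}  # C and D need to disagree
phi_da = {(0, 0): 100, (0, 1): 1, (1, 0): 1, (1, 1): 100}  # D and A need to agree


def unnormalized(a, b, c, d):
    """Product of all four potentials for one full assignment."""
    return phi_ab[a, b] * phi_bc[b, c] * phi_cd[c, d] * phi_da[d, a]


# Partition function: sum of the unnormalized measure over all 16 assignments.
Z = sum(unnormalized(*x) for x in product((0, 1), repeat=4))

# Marginal over (A, B): sum out C and D, then normalize.
marginal_ab = {
    (a, b): sum(unnormalized(a, b, c, d) for c, d in product((0, 1), repeat=2)) / Z
    for a, b in product((0, 1), repeat=2)
}

best_by_factor = max(phi_ab, key=phi_ab.get)
best_by_marginal = max(marginal_ab, key=marginal_ab.get)
print("factor prefers:", best_by_factor)
print("marginal prefers:", best_by_marginal)
```

With these assumed numbers the A,B factor is largest at a=b=0, yet the marginal puts most of its mass on a=0, b=1, because the three heavier edges cannot all be satisfied when A and B agree.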

Markov Networks
– Undirected graphical models (graph nodes: Smoking, Cancer, Asthma, Cough)
– Potential functions defined over cliques
   Smoking   Cancer   Ф(S,C)
   False     False    4.5
   False     True     4.5
   True      False    2.7
   True      True     4.5
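To see how a potential table turns into probabilities, here is a minimal sketch that normalizes the Ф(S,C) values. It assumes, purely for illustration, that Smoking and Cancer are the whole network, so the partition function Z is just the sum of this one table; in the full network Z would sum over all four variables and all potentials.

```python
# Potential table from the slide; keys are (smoking, cancer).
phi = {
    (False, False): 4.5,
    (False, True): 4.5,
    (True, False): 2.7,
    (True, True): 4.5,
}

# Assumption: only this one clique exists, so Z is the sum of the table entries.
Z = sum(phi.values())

# Dividing by Z turns the arbitrary non-negative potentials into a distribution.
joint = {state: value / Z for state, value in phi.items()}

for state, p in joint.items():
    print(state, round(p, 4))
```

The potential values themselves are not probabilities; only their ratios matter, which is why the single lower entry (smoking without cancer) ends up as the least likely state.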

Log-Linear Models for Markov Nets
(Example network over A, B, C, D)
– Factors are “functions” over their domains
– A log-linear model consists of
   – Features f_i(D_i) (functions over domains)
   – Weights w_i for features, s.t. $P(X) = \frac{1}{Z} \exp\left(\sum_i w_i f_i(D_i)\right)$
– Without loss of generality!
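One way to read "without loss of generality": any strictly positive potential table can be rewritten as a log-linear factor by using one indicator feature per table row and setting its weight to the log of that row's value. The sketch below checks this on the Ф(S,C) values from the Markov Networks slide; the helper names are hypothetical.

```python
import math

# Potential table from the Smoking/Cancer slide; keys are (smoking, cancer).
phi = {
    (False, False): 4.5,
    (False, True): 4.5,
    (True, False): 2.7,
    (True, True): 4.5,
}

# One weight per table row: w = log(potential value). Requires positive entries.
weights = {row: math.log(value) for row, value in phi.items()}


def feature(row, assignment):
    """Indicator feature: 1 if the assignment matches this table row, else 0."""
    return 1.0 if assignment == row else 0.0


def log_linear_factor(assignment):
    """exp(sum_i w_i * f_i(assignment)); should reproduce the original table."""
    return math.exp(sum(w * feature(row, assignment) for row, w in weights.items()))


for assignment, value in phi.items():
    print(assignment, value, round(log_linear_factor(assignment), 6))
```

In practice features are usually chosen to be far fewer than table rows (for example a single "S and C agree" feature), which is where the compactness of the log-linear form comes from.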

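Markov Networks
– Undirected graphical models (graph nodes: Smoking, Cancer, Asthma, Cough)
– Log-linear model: $P(x) = \frac{1}{Z} \exp\left(\sum_i w_i f_i(x)\right)$, where w_i is the weight of feature i and f_i(x) is feature i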

Markov Nets vs. Bayes Nets
   Property          Markov Nets         Bayes Nets
   Form              Prod. potentials    Prod. potentials
   Potentials        Arbitrary           Cond. probabilities
   Cycles            Allowed             Forbidden
   Partition func.   Z = ? (global)      Z = 1 (local)
   Indep. check      Graph separation    D-separation
   Indep. props.     Some                Some
   Inference         MCMC, BP, etc.      Convert to Markov

Inference in Markov Networks
– Goal: compute marginals and conditionals of the joint distribution
– Exact inference is #P-complete
– Most BN inference approaches work for MNs too
   – Variable elimination uses factor multiplication and should work without change
– Conditioning on the Markov blanket is easy; Gibbs sampling exploits this
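As a rough illustration of why variable elimination carries over unchanged, the sketch below computes a marginal on a tiny chain A - B - C by multiplying factors and summing variables out, then checks the result against brute-force enumeration. The chain and its potential values are made up for the example.

```python
from itertools import product

# Hypothetical chain A - B - C with two pairwise potentials (values are made up).
phi_ab = {(0, 0): 3.0, (0, 1): 1.0, (1, 0): 1.0, (1, 1): 2.0}
phi_bc = {(0, 0): 4.0, (0, 1): 1.0, (1, 0): 1.0, (1, 1): 5.0}

# Eliminate C: tau_b(b) = sum_c phi_bc(b, c)
tau_b = {b: sum(phi_bc[b, c] for c in (0, 1)) for b in (0, 1)}

# Eliminate B: tau_a(a) = sum_b phi_ab(a, b) * tau_b(b)
tau_a = {a: sum(phi_ab[a, b] * tau_b[b] for b in (0, 1)) for a in (0, 1)}

# The last remaining factor is the unnormalized marginal of A; its sum is Z.
Z = sum(tau_a.values())
p_a = {a: tau_a[a] / Z for a in (0, 1)}


def score(a, b, c):
    """Unnormalized probability of a full assignment."""
    return phi_ab[a, b] * phi_bc[b, c]


# Brute-force check: enumerate all 8 assignments.
Z_brute = sum(score(*x) for x in product((0, 1), repeat=3))
p_a_brute = {
    a: sum(score(a, b, c) for b, c in product((0, 1), repeat=2)) / Z_brute
    for a in (0, 1)
}

print(p_a, p_a_brute)
```

The only difference from the Bayes net case is that the final factor is not automatically normalized, so Z has to be computed explicitly at the end.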

MCMC: Gibbs Sampling
   state ← random truth assignment
   for i ← 1 to num-samples do
       for each variable x
           sample x according to P(x | neighbors(x))
           state ← state with new value of x
   P(F) ← fraction of states in which F is true
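The Gibbs loop above can be sketched in Python as follows. This is a minimal version under stated assumptions: the model is a made-up chain A - B - C with invented potentials, one state is counted per full sweep over the variables, and the function and variable names are hypothetical.

```python
import random

# Hypothetical chain A - B - C; potential values are made up for illustration.
phi_ab = {(0, 0): 3.0, (0, 1): 1.0, (1, 0): 1.0, (1, 1): 2.0}
phi_bc = {(0, 0): 4.0, (0, 1): 1.0, (1, 0): 1.0, (1, 1): 5.0}
VARIABLES = ("A", "B", "C")


def score(state):
    """Unnormalized probability of a full assignment."""
    return phi_ab[state["A"], state["B"]] * phi_bc[state["B"], state["C"]]


def gibbs(num_samples, query, rng):
    """Estimate P(query is true) as the fraction of sampled states satisfying it."""
    state = {v: rng.randint(0, 1) for v in VARIABLES}  # random truth assignment
    hits = 0
    for _ in range(num_samples):
        for x in VARIABLES:
            # P(x | neighbors(x)): factors that do not mention x cancel in the
            # ratio, so using the full score here gives the same conditional.
            scores = [score({**state, x: value}) for value in (0, 1)]
            p_one = scores[1] / (scores[0] + scores[1])
            state[x] = 1 if rng.random() < p_one else 0
        hits += 1 if query(state) else 0
    return hits / num_samples


estimate = gibbs(20000, lambda s: s["A"] == 0, random.Random(0))
print(estimate)  # the exact value for these potentials is 21/38
```

A real implementation would evaluate only the factors touching x rather than the full score, which is what makes each Gibbs step cheap in a large network.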

Other Inference Methods
– Many variations of MCMC
– Belief propagation (sum-product)
– Variational approximation
– Exact methods

Overview
– Motivation
– Foundational areas
   – Probabilistic inference
   – Statistical learning
   – Logical inference
   – Inductive logic programming
– Putting the pieces together
– Applications

Learning Markov Networks
– Learning parameters (weights)
   – Generatively
   – Discriminatively
– Learning structure (features)
– Easy case: assume complete data (if not: EM versions of algorithms)

Entanglement in log likelihood… (figure: network over nodes a, b, c)

Learning for log-linear formulation
– Use gradient ascent
– Unimodal, because the Hessian of the log-likelihood is the negative of the covariance matrix of the features (so the log-likelihood is concave)
– The gradient asks: what is the expected value of the feature given the current parameterization of the network?
– Answering that requires inference (inference at every iteration, sort of like EM)
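For reference, the gradient and Hessian mentioned above can be written out as follows. This is a standard derivation for a log-linear model with a positive sign in the exponent, which is an assumption about the convention used on the slides; $E_w$ and $\mathrm{Cov}_w$ denote expectation and covariance under the current model.

```latex
\begin{align*}
\log P_w(x) &= \sum_i w_i f_i(x) - \log Z(w),
  \qquad Z(w) = \sum_{x'} \exp\Big(\sum_i w_i f_i(x')\Big) \\
\frac{\partial}{\partial w_i} \log P_w(x) &= f_i(x) - E_w[f_i] \\
\frac{\partial^2}{\partial w_i \, \partial w_j} \log P_w(x) &= -\,\mathrm{Cov}_w(f_i, f_j)
\end{align*}
```

Since a covariance matrix is positive semidefinite, the Hessian is negative semidefinite, so gradient ascent has no spurious local maxima to get stuck in; the cost is that $E_w[f_i]$ needs inference under the current weights.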

Why should we spend so much time computing the gradient?
Given that the gradient is used only in the gradient ascent iteration, it might look as if we should be able to approximate it in any which way. After all, we are going to take a step with some arbitrary step size anyway. But the thing to keep in mind is that the gradient is a vector: we are talking not just of magnitude but of direction. A mistake in the magnitude of individual components can change the direction of the vector and push the search in a completely wrong direction.

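Generative Weight Learning
– Maximize likelihood or posterior probability
– Numerical optimization (gradient or 2nd order)
– No local maxima
– Gradient: $\frac{\partial}{\partial w_i} \log P_w(x) = n_i(x) - E_w[n_i(x)]$, where n_i(x) is the no. of times feature i is true in the data and E_w[n_i(x)] is the expected no. of times feature i is true according to the model
– Requires inference at each step (slow!)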
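A minimal sketch of generative weight learning by gradient ascent, assuming a toy model over two binary variables with two hypothetical features and a small made-up data set. Inference is done by exact enumeration, which is only feasible because the model is tiny, and the gradient is averaged per data instance rather than summed as counts.

```python
import math
from itertools import product

STATES = list(product((0, 1), repeat=2))


def features(state):
    """Two hypothetical features: 'the variables agree' and 'the first variable is 1'."""
    a, b = state
    return (1.0 if a == b else 0.0, 1.0 if a == 1 else 0.0)


def model_expectations(weights):
    """Expected feature values under the current weights, by exact enumeration."""
    scores = [math.exp(sum(w * f for w, f in zip(weights, features(s)))) for s in STATES]
    Z = sum(scores)
    return [
        sum(sc * features(s)[i] for sc, s in zip(scores, STATES)) / Z
        for i in range(len(weights))
    ]


def fit(data, steps=2000, step_size=0.5):
    """Gradient ascent on the average log-likelihood."""
    empirical = [sum(features(x)[i] for x in data) / len(data) for i in range(2)]
    weights = [0.0, 0.0]
    for _ in range(steps):
        expected = model_expectations(weights)  # inference at every iteration
        # Gradient component i = (empirical average) - (model expectation).
        weights = [w + step_size * (e - m) for w, e, m in zip(weights, empirical, expected)]
    return weights, empirical


# Made-up data set of fully observed instances.
data = [(0, 0), (1, 1), (1, 1), (1, 0), (1, 1), (0, 0), (0, 1), (1, 1)]
weights, empirical = fit(data)
print(weights, empirical, model_expectations(weights))
```

At convergence the model's expected feature values match the empirical ones, which is exactly the condition that the gradient is zero.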

Alternative objectives to maximize
Since the log-likelihood requires network inference to compute the derivative, we might want to focus on other objectives whose gradients are easier to compute (and which, hopefully, also have optima at the same parameter values). Two options:
– Pseudo-likelihood: compute the likelihood of each possible data instance using just the Markov blanket of each variable (approximate chain rule).
– Contrastive divergence: given a single data instance, the log-likelihood is [log prob of data] minus [log prob of all other possible data instances] (w.r.t. the current parameters). Maximize the distance between the two (“increase the divergence”). Rather than using all other instances, pick a sample of typical other instances (this needs sampling from the current model; run MCMC initialized with the data).

Pseudo-Likelihood
– $PL(x) = \prod_i P(x_i \mid \text{neighbors}(x_i))$
– Likelihood of each variable given its neighbors in the data
– Does not require inference at each step
– Consistent estimator
– Widely used in vision, spatial statistics, etc.
– But PL parameters may not work well for long inference chains [which can lead to disastrous results]
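To make "does not require inference" concrete, here is a small sketch that evaluates the log pseudo-likelihood of a data set. It assumes a toy log-linear model over two binary variables with two hypothetical features; each conditional is normalized only over the values of one variable, so no global partition function is ever computed.

```python
import math


def features(state):
    """Two hypothetical features: 'the variables agree' and 'the first variable is 1'."""
    a, b = state
    return (1.0 if a == b else 0.0, 1.0 if a == 1 else 0.0)


def log_score(weights, state):
    """Unnormalized log-probability: sum_i w_i * f_i(state)."""
    return sum(w * f for w, f in zip(weights, features(state)))


def log_pseudo_likelihood(weights, data):
    """Sum over instances and variables of log P(x_i | all other variables)."""
    total = 0.0
    for x in data:
        for i in range(len(x)):
            # Vary only variable i; the rest stay at their observed values.
            candidates = [x[:i] + (v,) + x[i + 1:] for v in (0, 1)]
            log_scores = [log_score(weights, y) for y in candidates]
            # Local normalization over the two values of x_i only.
            log_norm = math.log(sum(math.exp(s) for s in log_scores))
            total += log_scores[x[i]] - log_norm
    return total


# Made-up data set of fully observed instances.
data = [(0, 0), (1, 1), (1, 1), (1, 0), (1, 1), (0, 0), (0, 1), (1, 1)]
print(log_pseudo_likelihood([0.0, 0.0], data))
print(log_pseudo_likelihood([math.log(3.0), math.log(5.0 / 3.0)], data))
```

For brevity the sketch scores the whole state when varying one variable; in a larger network only the features that mention x_i would need to be evaluated, since the others cancel in the conditional.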

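Discriminative Weight Learning
– Maximize conditional likelihood of query (y) given evidence (x)
– Gradient: $\frac{\partial}{\partial w_i} \log P_w(y \mid x) = n_i(x, y) - E_w[n_i(x, y)]$, where n_i(x, y) is the no. of true groundings of clause i in the data and E_w[n_i(x, y)] is the expected no. of true groundings according to the model
– Approximate expected counts by counts in the MAP state of y given x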

Structure Learning
– How to learn the structure of a Markov network?
   – Not too different from learning structure for a Bayes network: discrete search through the space of possible graphs, trying to maximize data probability.