Pattern Recognition and Machine Learning

Pattern Recognition and Machine Learning Chapter 8: Graphical Models

Bayesian Networks Directed Acyclic Graph (DAG): $p(a, b, c) = p(c \mid a, b)\, p(b \mid a)\, p(a)$

Bayesian Networks General Factorization: $p(\mathbf{x}) = \prod_{k=1}^{K} p(x_k \mid \mathrm{pa}_k)$
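The general factorization is easy to make concrete. Below is a minimal Python sketch for a hypothetical three-node chain a → b → c (the node names, CPT layout, and values are invented for illustration, not from the slides):

```python
# Hypothetical three-node network a -> b -> c with binary variables.
# Each CPT maps a tuple of parent values -> distribution over the node.
p_a = {(): [0.6, 0.4]}
p_b = {(0,): [0.9, 0.1], (1,): [0.3, 0.7]}
p_c = {(0,): [0.8, 0.2], (1,): [0.5, 0.5]}

def joint(a, b, c):
    """p(a, b, c) = p(a) * p(b | a) * p(c | b): the general
    factorization specialized to this chain."""
    return p_a[()][a] * p_b[(a,)][b] * p_c[(b,)][c]

# Sanity check: the joint sums to 1 over all configurations.
assert abs(sum(joint(a, b, c)
               for a in (0, 1) for b in (0, 1) for c in (0, 1)) - 1) < 1e-12
```

Each conditional table is keyed by the parent configuration, mirroring the $p(x_k \mid \mathrm{pa}_k)$ factors.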

Bayesian Curve Fitting (1) Polynomial

Bayesian Curve Fitting (2) Plate

Bayesian Curve Fitting (3) Input variables and explicit hyperparameters

Bayesian Curve Fitting — Learning: condition on data

Bayesian Curve Fitting — Prediction. Predictive distribution: $p(\hat{t} \mid \hat{x}, \mathbf{x}, \mathbf{t}) \propto \int p(\hat{t}, \mathbf{t}, \mathbf{w} \mid \hat{x}, \mathbf{x})\, \mathrm{d}\mathbf{w}$, where $p(\hat{t}, \mathbf{t}, \mathbf{w} \mid \hat{x}, \mathbf{x}) = \left[\prod_{n=1}^{N} p(t_n \mid x_n, \mathbf{w})\right] p(\mathbf{w})\, p(\hat{t} \mid \hat{x}, \mathbf{w})$.

Generative Models Causal process for generating images

Discrete Variables (1) General joint distribution: $K^2 - 1$ parameters. Independent joint distribution: $2(K - 1)$ parameters.

Discrete Variables (2) General joint distribution over $M$ variables: $K^M - 1$ parameters. $M$-node Markov chain: $K - 1 + (M - 1)K(K - 1)$ parameters.
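A quick numerical check of these two counts (the values of $K$ and $M$ are chosen arbitrarily):

```python
K, M = 3, 5                                      # 3-state variables, 5 nodes
full_joint = K**M - 1                            # general joint distribution
markov_chain = (K - 1) + (M - 1) * K * (K - 1)   # first node + M-1 transitions
print(full_joint, markov_chain)                  # 242 vs. 26
```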

Discrete Variables: Bayesian Parameters (1)

Discrete Variables: Bayesian Parameters (2) Shared prior

Parameterized Conditional Distributions If $x_1, \dots, x_M$ are discrete, $K$-state variables, then in general $p(y = 1 \mid x_1, \dots, x_M)$ has $O(K^M)$ parameters. The parameterized form $p(y = 1 \mid x_1, \dots, x_M) = \sigma\!\left(w_0 + \sum_{i=1}^{M} w_i x_i\right)$ requires only $M + 1$ parameters.

Linear-Gaussian Models Directed Graph Vector-valued Gaussian Nodes Each node is Gaussian; its mean is a linear function of its parents: $p(x_i \mid \mathrm{pa}_i) = \mathcal{N}\!\left(x_i \,\middle|\, \sum_{j \in \mathrm{pa}_i} w_{ij} x_j + b_i,\; v_i\right)$.
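A small ancestral-sampling sketch of such a network (the topology, weights, and biases below are made up for illustration). Sampling nodes in topological order yields draws from the joint, which is itself Gaussian:

```python
import numpy as np

rng = np.random.default_rng(0)

# Chain x1 -> x2 -> x3, each node a scalar Gaussian whose mean is
# linear in its parents: p(x_i | pa_i) = N(x_i | sum_j w_ij x_j + b_i, v_i).
w21, w32 = 0.5, -1.0         # illustrative edge weights
b = [0.0, 1.0, 0.5]          # biases b_1, b_2, b_3
v = [1.0, 0.5, 0.2]          # conditional variances v_1, v_2, v_3

def sample():
    x1 = b[0] + np.sqrt(v[0]) * rng.standard_normal()
    x2 = w21 * x1 + b[1] + np.sqrt(v[1]) * rng.standard_normal()
    x3 = w32 * x2 + b[2] + np.sqrt(v[2]) * rng.standard_normal()
    return x1, x2, x3

samples = np.array([sample() for _ in range(100_000)])
print(samples.mean(axis=0))  # approaches the analytic means (0.0, 1.0, -0.5)
```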

Conditional Independence $a$ is independent of $b$ given $c$: $p(a \mid b, c) = p(a \mid c)$. Equivalently, $p(a, b \mid c) = p(a \mid c)\, p(b \mid c)$. Notation: $a \perp\!\!\!\perp b \mid c$.

Conditional Independence: Example 1

Conditional Independence: Example 1

Conditional Independence: Example 2

Conditional Independence: Example 2

Conditional Independence: Example 3 Note: this is the opposite of Example 1, with c unobserved.

Conditional Independence: Example 3 Note: this is the opposite of Example 1, with c observed.

“Am I out of fuel?” B = Battery (0=flat, 1=fully charged), F = Fuel Tank (0=empty, 1=full), G = Fuel Gauge Reading (0=empty, 1=full)

“Am I out of fuel?” Probability of an empty tank increased by observing G = 0.

“Am I out of fuel?” Probability of an empty tank reduced by observing B = 0. This is referred to as “explaining away”.
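The effect can be reproduced numerically. The sketch below uses the CPT values from PRML §8.2.1 ($p(B{=}1) = p(F{=}1) = 0.9$ plus the gauge CPT), which are the book's illustrative numbers:

```python
pB = {0: 0.1, 1: 0.9}
pF = {0: 0.1, 1: 0.9}
# p(G=1 | B, F): the gauge tends to read full only when both are healthy.
pG1 = {(1, 1): 0.8, (1, 0): 0.2, (0, 1): 0.2, (0, 0): 0.1}

def p_F0_given(G=0, B=None):
    """Posterior p(F=0 | G, [B]) by direct enumeration."""
    num = den = 0.0
    for b in (0, 1):
        if B is not None and b != B:
            continue
        for f in (0, 1):
            pg = pG1[(b, f)] if G == 1 else 1 - pG1[(b, f)]
            w = pB[b] * pF[f] * pg
            den += w
            if f == 0:
                num += w
    return num / den

print(p_F0_given(G=0))        # ~0.257: empty tank more likely after G=0
print(p_F0_given(G=0, B=0))   # ~0.111: flat battery "explains away" G=0
```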

D-separation A, B, and C are non-intersecting subsets of nodes in a directed graph. A path from A to B is blocked if it contains a node such that either (a) the node is non-convergent on the path (the arrows meet head-to-tail or tail-to-tail) and the node is in C, or (b) the node is convergent on the path (the arrows meet head-to-head) and neither the node nor any of its descendants is in C. If all paths from A to B are blocked, A is d-separated from B by C, and the joint distribution over all variables in the graph then satisfies $A \perp\!\!\!\perp B \mid C$.
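A minimal sketch of the blocking test for a single path (the graph encoding and function names are assumptions, not from the slides):

```python
def path_blocked(path, parents, C):
    """Return True if `path` (a node sequence) is blocked by conditioning
    set C, per the two rules above. `parents` maps node -> set of parents."""
    children = {v: {u for u in parents if v in parents[u]} for v in parents}

    def descendants(v):
        out, stack = set(), [v]
        while stack:
            for ch in children[stack.pop()]:
                if ch not in out:
                    out.add(ch)
                    stack.append(ch)
        return out

    for prev, node, nxt in zip(path, path[1:], path[2:]):
        convergent = prev in parents[node] and nxt in parents[node]
        if not convergent and node in C:
            return True          # rule (a): observed serial/divergent node
        if convergent and node not in C and not (descendants(node) & C):
            return True          # rule (b): unobserved head-to-head node
    return False

# The fuel example: B -> G <- F. The path B-G-F is blocked a priori
# but becomes unblocked once G is observed.
parents = {"B": set(), "F": set(), "G": {"B", "F"}}
assert path_blocked(["B", "G", "F"], parents, C=set())
assert not path_blocked(["B", "G", "F"], parents, C={"G"})
```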

D-separation: Example

D-separation: I.I.D. Data

Directed Graphs as Distribution Filters

The Markov Blanket Factors independent of $x_i$ cancel between numerator and denominator.
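Spelling out the cancellation (the standard PRML derivation, reproduced here since the slide's formula does not survive transcription):

$$p(x_i \mid \mathbf{x}_{\setminus i})
  = \frac{p(\mathbf{x})}{\sum_{x_i} p(\mathbf{x})}
  = \frac{\prod_k p(x_k \mid \mathrm{pa}_k)}{\sum_{x_i} \prod_k p(x_k \mid \mathrm{pa}_k)}$$

Every factor $p(x_k \mid \mathrm{pa}_k)$ in which $x_i$ appears neither as $x_k$ nor among $\mathrm{pa}_k$ cancels, leaving only $p(x_i \mid \mathrm{pa}_i)$ and the conditionals of the children of $x_i$: the parents, children, and co-parents together form the Markov blanket.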

Markov Random Fields Markov Blanket

Cliques and Maximal Cliques

Joint Distribution $p(\mathbf{x}) = \frac{1}{Z} \prod_C \psi_C(\mathbf{x}_C)$, where $\psi_C(\mathbf{x}_C) \ge 0$ is the potential over clique $C$ and $Z = \sum_{\mathbf{x}} \prod_C \psi_C(\mathbf{x}_C)$ is the normalization coefficient; note: $M$ $K$-state variables give $K^M$ terms in $Z$. Energies and the Boltzmann distribution: $\psi_C(\mathbf{x}_C) = \exp\{-E(\mathbf{x}_C)\}$.
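A brute-force sketch of $Z$ for a toy pairwise MRF, to make the $K^M$ cost concrete (the sizes and potential tables are arbitrary placeholders):

```python
import itertools
import numpy as np

rng = np.random.default_rng(1)
K, M = 3, 4                                       # 3 states, 4 chain variables
psi = [rng.random((K, K)) for _ in range(M - 1)]  # one potential per edge

# Z = sum over all K**M joint configurations of the product of potentials.
Z = sum(np.prod([psi[i][x[i], x[i + 1]] for i in range(M - 1)])
        for x in itertools.product(range(K), repeat=M))
print(Z)  # exact, but the loop runs K**M = 81 times; infeasible for large M
```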

Illustration: Image De-Noising (1) Original Image Noisy Image

Illustration: Image De-Noising (2)

Illustration: Image De-Noising (3) Noisy Image Restored Image (ICM)

Illustration: Image De-Noising (4) Restored Image (ICM) Restored Image (Graph cuts)
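A compact ICM sketch for the binary-image model of PRML §8.3.3, with energy $E(\mathbf{x},\mathbf{y}) = h\sum_i x_i - \beta\sum_{\{i,j\}} x_i x_j - \eta \sum_i x_i y_i$ over $\{-1,+1\}$ pixels; the default parameter values below are assumptions in the spirit of the book's choices:

```python
import numpy as np

def icm_denoise(y, beta=1.0, eta=2.1, h=0.0, sweeps=10):
    """Iterated conditional modes on a grid MRF; y is a {-1,+1} noisy image.
    Each pixel is set to the state (+1 or -1) with lower local energy."""
    x = y.copy()
    H, W = y.shape
    for _ in range(sweeps):
        for i in range(H):
            for j in range(W):
                # Sum of the (up to) four grid neighbours.
                nb = sum(x[a, b]
                         for a, b in ((i-1, j), (i+1, j), (i, j-1), (i, j+1))
                         if 0 <= a < H and 0 <= b < W)
                # Local energy is x * (h - beta*nb - eta*y); pick the sign
                # that minimizes it.
                x[i, j] = 1 if (beta * nb + eta * y[i, j] - h) >= 0 else -1
    return x
```

As a quick test, flipping ~10% of the pixels of a binary image and running `icm_denoise` restores most of the flipped pixels, though ICM only finds a local minimum of the energy (hence the better graph-cuts result on the next slide).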

Converting Directed to Undirected Graphs (1)

Converting Directed to Undirected Graphs (2) Additional links

Directed vs. Undirected Graphs (1)

Directed vs. Undirected Graphs (2)

Inference in Graphical Models

Inference on a Chain

Inference on a Chain

Inference on a Chain

Inference on a Chain

Inference on a Chain To compute local marginals: (1) compute and store all forward messages $\mu_\alpha(x_n)$; (2) compute and store all backward messages $\mu_\beta(x_n)$; (3) compute $Z$ at any node $x_m$; (4) compute $p(x_n) = \frac{1}{Z}\,\mu_\alpha(x_n)\,\mu_\beta(x_n)$ for all variables required.
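A numpy sketch of the two message passes on a chain of pairwise potentials (the array layout is an assumption, not from the slides):

```python
import numpy as np

rng = np.random.default_rng(2)
K, N = 3, 5                                       # 3 states, 5 nodes
psi = [rng.random((K, K)) for _ in range(N - 1)]  # psi[n](x_n, x_{n+1})

# Forward messages: mu_alpha[n] summarizes everything left of node n.
mu_alpha = [np.ones(K)]
for n in range(N - 1):
    mu_alpha.append(psi[n].T @ mu_alpha[-1])      # sums out x_n

# Backward messages: mu_beta[n] summarizes everything right of node n.
mu_beta = [np.ones(K)]
for n in reversed(range(N - 1)):
    mu_beta.insert(0, psi[n] @ mu_beta[0])        # sums out x_{n+1}

# Z can be read off at any node; all N values agree.
Z = float(mu_alpha[0] @ mu_beta[0])
marginals = [mu_alpha[n] * mu_beta[n] / Z for n in range(N)]
assert all(abs(m.sum() - 1) < 1e-12 for m in marginals)
```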

Trees Undirected Tree Directed Tree Polytree

Factor Graphs

Factor Graphs from Directed Graphs

Factor Graphs from Undirected Graphs

The Sum-Product Algorithm (1) Objective: to obtain an efficient, exact inference algorithm for finding marginals; in situations where several marginals are required, to allow computations to be shared efficiently. Key idea: the distributive law, $ab + ac = a(b + c)$ (two operations instead of three).

The Sum-Product Algorithm (2)

The Sum-Product Algorithm (3)

The Sum-Product Algorithm (4)

The Sum-Product Algorithm (5)

The Sum-Product Algorithm (6)
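For reference, the two message updates these slides derive, in the standard forms from PRML §8.4.4 (reproduced here since the slide images do not survive transcription):

$$\mu_{f \to x}(x) = \sum_{x_1, \dots, x_M} f(x, x_1, \dots, x_M)
    \prod_{m \in \mathrm{ne}(f) \setminus x} \mu_{x_m \to f}(x_m), \qquad
\mu_{x \to f}(x) = \prod_{l \in \mathrm{ne}(x) \setminus f} \mu_{f_l \to x}(x)$$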

The Sum-Product Algorithm (7) Initialization

The Sum-Product Algorithm (8) To compute local marginals: (1) pick an arbitrary node as root; (2) compute and propagate messages from the leaf nodes to the root, storing received messages at every node; (3) compute and propagate messages from the root to the leaf nodes, storing received messages at every node; (4) compute the product of received messages at each node for which the marginal is required, and normalize if necessary.

Sum-Product: Example (1)

Sum-Product: Example (2)

Sum-Product: Example (3)

Sum-Product: Example (4)
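A numpy sketch of this example on the four-node factor graph $f_a(x_1,x_2)\,f_b(x_2,x_3)\,f_c(x_2,x_4)$, checking the message-passing marginal at the root $x_2$ against brute-force summation (the factor tables are random placeholders):

```python
import numpy as np

rng = np.random.default_rng(0)
K = 3
fa, fb, fc = rng.random((K, K)), rng.random((K, K)), rng.random((K, K))
# Unnormalized joint: p(x) ∝ fa[x1, x2] * fb[x2, x3] * fc[x2, x4]

# Messages toward x2 (chosen root); leaves x1, x3, x4 send unit messages,
# so each factor simply sums out its leaf variable.
m_fa_x2 = fa.sum(axis=0)              # sum over x1
m_fb_x2 = fb.sum(axis=1)              # sum over x3
m_fc_x2 = fc.sum(axis=1)              # sum over x4
p_x2 = m_fa_x2 * m_fb_x2 * m_fc_x2    # product of incoming messages
p_x2 /= p_x2.sum()

# Brute-force check against the full joint.
joint = np.einsum('ij,jk,jl->ijkl', fa, fb, fc)
check = joint.sum(axis=(0, 2, 3))
check /= check.sum()
assert np.allclose(p_x2, check)
```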

The Max-Sum Algorithm (1) Objective: an efficient algorithm for finding (i) the value $\mathbf{x}^{\max}$ that maximises $p(\mathbf{x})$, and (ii) the value of $p(\mathbf{x}^{\max})$. In general, the maximum of the marginals $\ne$ the joint maximum.

The Max-Sum Algorithm (2) Maximizing over a chain (max-product)

The Max-Sum Algorithm (3) Generalizes to tree-structured factor graphs, maximizing as close to the leaf nodes as possible.

The Max-Sum Algorithm (4) Max-Product → Max-Sum: for numerical reasons, work with $\ln p(\mathbf{x})$, since $\ln$ is monotonic and commutes with $\max$. Again, use the distributive law: $\max(a + b, a + c) = a + \max(b, c)$.

The Max-Sum Algorithm (5) Initialization (leaf nodes): $\mu_{x \to f}(x) = 0$ and $\mu_{f \to x}(x) = \ln f(x)$. Recursion: $\mu_{f \to x}(x) = \max_{x_1, \dots, x_M}\left[\ln f(x, x_1, \dots, x_M) + \sum_{m \in \mathrm{ne}(f) \setminus x} \mu_{x_m \to f}(x_m)\right]$ and $\mu_{x \to f}(x) = \sum_{l \in \mathrm{ne}(x) \setminus f} \mu_{f_l \to x}(x)$.

The Max-Sum Algorithm (6) Termination (root node): the maximum of $\ln p(\mathbf{x})$ is $\max_{x}\left[\sum_{l \in \mathrm{ne}(x)} \mu_{f_l \to x}(x)\right]$. Back-track, for all nodes $i$ with $l$ factor nodes to the root ($l = 0$), using the stored argmax messages.

The Max-Sum Algorithm (7) Example: Markov chain
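A minimal max-sum (Viterbi-style) sketch for exactly this chain case, working in log space with back-tracking (the array layout is an assumption):

```python
import numpy as np

rng = np.random.default_rng(3)
K, N = 3, 6
log_psi = np.log(rng.random((N - 1, K, K)))   # log psi[n](x_n, x_{n+1})

# Forward pass: mu[n, j] = max over x_0..x_{n-1} of the partial log-score.
mu = np.zeros((N, K))
back = np.zeros((N, K), dtype=int)            # phi: argmax back-pointers
for n in range(1, N):
    scores = mu[n - 1][:, None] + log_psi[n - 1]   # (K_prev, K_next)
    mu[n] = scores.max(axis=0)
    back[n] = scores.argmax(axis=0)

# Termination at the root, then back-track to recover x^max.
x = np.zeros(N, dtype=int)
x[-1] = mu[-1].argmax()
for n in range(N - 1, 0, -1):
    x[n - 1] = back[n, x[n]]
print(mu[-1].max(), x)   # max log-score (up to log Z) and its argmax
```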

The Junction Tree Algorithm Exact inference on general graphs. Works by turning the initial graph into a junction tree and then running a sum-product-like algorithm. Intractable on graphs with large cliques.

Loopy Belief Propagation Sum-Product on general graphs. Initial unit messages are passed across all links, after which messages are passed around until convergence (not guaranteed!). Approximate but tractable for large graphs. Sometimes works well, sometimes not at all.