Presentation transcript:

LAC group, 16/06/2011

So far...  Directed graphical models (Bayesian networks): useful because both the structure and the parameters provide a natural representation for many types of real-world domains.

This chapter...  Undirected graphical models Useful in modelling phenomena where we cannot determine the directionality of the interaction between the variables. Offer a different, simpler perspective on directed models (both independence structure & inference task)

This chapter...  Introduce a framework that allows both directed and undirected edges  Note: some of the results in this chapter require that we restrict attention to distribution over discrete state spaces.  Discrete vs. continuous = boolean or real numbers e.g

The 4 students example (the misconception example, ex. 3.8)  Four students get together in pairs to work on their homework for a class. The pairs that meet are shown via the edges (lines) of an undirected graph over the nodes A: Alice, B: Bobby, C: Charles, D: Debbie, with edges A-B, B-C, C-D and D-A.

The 4 students example We want to model a distribution with the following independencies: 1) A is independent of C given B and D, 2) B is independent of D given A and C.

The 4 students example PROBLEM 1: If we try to model these independencies with a Bayesian network, we will be in trouble:  Any Bayesian network I-map of such a distribution will have extraneous edges  At least one of the desired independence statements will not be captured (cont’d)

The 4 students example (cont’d)  Any Bayesian network will require us to specify the directionality of the influence. Also:  The interactions look symmetrical, and we would like to model this without representing a direction of influence.

The 4 students example SOLUTION 1: Undirected graph = (here) a Markov network structure  Nodes (circles) represent variables  Edges (lines) represent a notion of direct probabilistic interaction between the neighbouring variables, not mediated by any other variable in the network.

The 4 students example PROBLEM 2:  How do we parameterise this undirected graph?  A CPD (conditional probability distribution) is not useful, as the interaction is not directed  We would like to capture the affinities between the related variables, e.g. Alice and Bob are more likely to agree than to disagree.

The 4 students example SOLUTION 2:  Associate A and B with a general-purpose function: a factor

The 4 students example  Here we focus only on non-negative factors. Factor: Let D be a set of random variables. We define a factor φ to be a function from Val(D) to R. A factor is non- negative if all its entries are non-negative. Scope: The set of variables D is called the scope of the factor and is denoted as Scope[φ].

The 4 students example  Let’s calculate the factor of A and B i.e. the fact that Alice and Bob are more likely to agree than disagree: φ 1 (A,B) : Val(A,B) to R + The value associated with a particular assignment a,b denotes the affinity between the two values: the higher the value of φ 1 (A,B) the more compatible the two values are

The 4 students example  Fig 4.1/a shows one possible compatibility factor for A and B  Not normalised (see partial function later on how to do this)  0: right, 1:wrong/has the misconception φ 1 (A,B) a0a0 b0b0 30 a0a0 b1b1 5 a1a1 b0b0 1 a1a1 b1b1 10 0: right, 1:wrong/has the misconception

The 4 students example  φ 1 (A,B) asserts that:  it is more likely that Alice and Bob agree φ 1 (a 0, b 0 ), φ 1 (a 1, b 1 ) - they are more likely to be either both wrong or both right  If they disagree, Alice is more likely to be right (φ 1 (a 0, b 1 )) than Bob (φ 1 (a 1, b 0 )) φ 1 (A,B) a0a0 b0b0 30 a0a0 b1b1 5 a1a1 b0b0 1 a1a1 b1b1 10 0: right, 1:wrong/has the misconception

The 4 students example  φ 3 (C,D) asserts that:  Charles and Debbie argue all the time and they will end up disagreeing any way : φ 3 (c 0, d 1 ) and φ 3 (c 1, d 0 ) φ 3 (C,D) c0c0 d0d0 1 c0c0 d1d1 100 c1c1 d0d0 c1c1 d1d1 1 0: right, 1:wrong/has the misconception

The 4 students example So far:  we have defined the local interactions between variables/nodes Next step:  Define a global model: we need to combine these interactions, i.e. multiply the factors, as with a Bayesian network

The 4 students example A possible GLOBAL MODEL: P(a,b,c,d) = φ₁(a,b) ∙ φ₂(b,c) ∙ φ₃(c,d) ∙ φ₄(d,a) PROBLEM: Nothing guarantees that the result is a normalised distribution (see fig. 4.2, middle column)

The 4 students example SOLUTION: Take the product of the local factors and normalise it: P(a,b,c,d) = (1/Z) ∙ φ₁(a,b) ∙ φ₂(b,c) ∙ φ₃(c,d) ∙ φ₄(d,a), where Z = ∑_{a,b,c,d} φ₁(a,b) ∙ φ₂(b,c) ∙ φ₃(c,d) ∙ φ₄(d,a). Z is a normalising constant known as the partition function: “partition” as in Markov random fields in statistical physics; “function” because Z is a function of the parameters [important for machine learning].
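A brute-force sketch of this normalisation (my own illustration, not from the slides). φ₁ and φ₃ use the values shown earlier; φ₂ and φ₄ are not spelled out in this transcript, so the values below follow the textbook’s fig 4.1 and should be read as an assumption.

```python
from itertools import product

# Factor tables (0 = right, 1 = wrong). phi_2 and phi_4 are ASSUMED
# from the textbook's fig 4.1; they do not appear in this transcript.
phi_1 = {(0, 0): 30, (0, 1): 5, (1, 0): 1, (1, 1): 10}     # phi_1(A, B)
phi_2 = {(0, 0): 100, (0, 1): 1, (1, 0): 1, (1, 1): 100}   # phi_2(B, C), assumed
phi_3 = {(0, 0): 1, (0, 1): 100, (1, 0): 100, (1, 1): 1}   # phi_3(C, D)
phi_4 = {(0, 0): 100, (0, 1): 1, (1, 0): 1, (1, 1): 100}   # phi_4(D, A), assumed

def unnormalised(a, b, c, d):
    """Product of the local factors for one joint assignment."""
    return phi_1[(a, b)] * phi_2[(b, c)] * phi_3[(c, d)] * phi_4[(d, a)]

# Partition function: sum of the unnormalised measure over all 16 assignments.
Z = sum(unnormalised(a, b, c, d) for a, b, c, d in product((0, 1), repeat=4))

def joint(a, b, c, d):
    """Normalised joint distribution P(a, b, c, d)."""
    return unnormalised(a, b, c, d) / Z

print(Z)                  # 7201840 with the factor values above
print(joint(1, 1, 0, 1))  # ~0.014, the assignment asked about on the next slide
```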

The 4 students example  See figure 4.2 for the calculations of the joint distribution  Calculate the partition function of a 1,b 1,c 0,d 1

The 4 students example  We can use the partition function/joint probability to answer questions like:  How likely is Bob to have a misconception?  How likely is Bob to have the misconception, given that Charles doesn’t?

The 4 students example  How likely is Bob to have the misconception? P(b 1 ) ≈ P(b 0 ) ≈ Bob is 26% less ?? likely to have the misconception

The 4 students example  How likely is Bob to have the misconception, given that Charles doesn’t? P(b 1 |c 0 ) ≈ 0.06

The 4 students example Advantages of this approach:  It allows great flexibility in representing interactions between variables.  We can change the nature of the interaction between A and B simply by modifying the entries in the factor, without worrying about normalisation constraints or the interaction with other factors.

The 4 students example  Tight connection between factorisation of the distribution and its independence properties:  Factorisation:

The 4 students example  Using the formula in 3) we can decompose the distribution in several ways e.g. P(A,B,C,D) = [1/Z ∙ φ 1 (A, B) ∙ φ 2 (B, C)] ∙ φ 3 (C, D) ∙ φ 4 (A, D) and infer that