Lecture Slides for INTRODUCTION TO MACHINE LEARNING, 3RD EDITION, ETHEM ALPAYDIN © The MIT Press, 2014

CHAPTER 14: GRAPHICAL MODELS

Graphical Models  Also known as Bayesian networks or probabilistic networks  Nodes are hypotheses (random variables), and the probabilities correspond to our belief in the truth of each hypothesis  Arcs are direct influences between hypotheses  The structure is represented as a directed acyclic graph (DAG)  The parameters are the conditional probabilities on the arcs (Pearl, 1988, 2000; Jensen, 1996; Lauritzen, 1996)
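As a concrete illustration of nodes, arcs, and conditional probabilities, here is a minimal Python sketch of such a network. The node names mirror the rain/sprinkler example used in the following slides, but all probability values are hypothetical, not from the slides.

# A minimal sketch: a Bayesian network stored as parent lists plus
# conditional probability tables (CPTs). All CPT values are made up.
network = {
    "Cloudy":    {"parents": [], "cpt": {(): 0.5}},
    "Sprinkler": {"parents": ["Cloudy"], "cpt": {(True,): 0.1, (False,): 0.5}},
    "Rain":      {"parents": ["Cloudy"], "cpt": {(True,): 0.8, (False,): 0.1}},
    "WetGrass":  {"parents": ["Sprinkler", "Rain"],
                  "cpt": {(True, True): 0.95, (True, False): 0.9,
                          (False, True): 0.9, (False, False): 0.1}},
}

def prob(node, value, assignment):
    """P(node = value | its parents' values in assignment)."""
    key = tuple(assignment[p] for p in network[node]["parents"])
    p_true = network[node]["cpt"][key]
    return p_true if value else 1.0 - p_true

def joint(assignment):
    """The joint factorizes as the product of the local conditionals."""
    result = 1.0
    for node in network:
        result *= prob(node, assignment[node], assignment)
    return result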

Causes and Bayes’ Rule  Diagnostic inference: knowing that the grass is wet, what is the probability that rain is the cause?  Bayes’ rule inverts the causal direction P(W|R) into the diagnostic direction:
P(R|W) = P(W|R) P(R) / P(W) = P(W|R) P(R) / [P(W|R) P(R) + P(W|~R) P(~R)]
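This inversion is a one-line computation; a small sketch, with hypothetical numbers:

def diagnostic(p_w_given_r, p_w_given_not_r, p_r):
    """P(R|W) from the causal quantities via Bayes' rule."""
    p_w = p_w_given_r * p_r + p_w_given_not_r * (1.0 - p_r)
    return p_w_given_r * p_r / p_w

print(diagnostic(0.9, 0.2, 0.4))  # hypothetical inputs; prints 0.75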

Conditional Independence  X and Y are independent if P(X,Y) = P(X) P(Y)  X and Y are conditionally independent given Z if P(X,Y|Z) = P(X|Z) P(Y|Z), or equivalently P(X|Y,Z) = P(X|Z)  Three canonical cases: head-to-tail, tail-to-tail, and head-to-head

Case 1: Head-to-Tail  P(X,Y,Z) = P(X) P(Y|X) P(Z|Y): X and Z are independent given the intermediate node Y  Example (cloudy → rain → wet grass): P(W|C) = P(W|R) P(R|C) + P(W|~R) P(~R|C)

Case 2: Tail-to-Tail  P(X,Y,Z) = P(X) P(Y|X) P(Z|X): Y and Z are independent given the common parent X

Case 3: Head-to-Head  P(X,Y,Z) = P(X) P(Y) P(Z|X,Y): X and Y are marginally independent, but become dependent once Z (or one of its descendants) is observed
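A quick numerical check of the head-to-head case, as a sketch with made-up numbers: X and Y are independent a priori, but conditioning on Z couples them.

# Head-to-head network X -> Z <- Y with hypothetical parameters.
p_x, p_y = 0.3, 0.6
p_z = {(True, True): 0.99, (True, False): 0.8,
       (False, True): 0.7, (False, False): 0.05}

def joint_xyz(x, y, z):
    pz = p_z[(x, y)] if z else 1.0 - p_z[(x, y)]
    return (p_x if x else 1 - p_x) * (p_y if y else 1 - p_y) * pz

# P(X=1 | Z=1) versus P(X=1 | Y=1, Z=1): the two differ, so X and Y
# are dependent given Z even though they are independent marginally.
p_x_given_z = (sum(joint_xyz(True, y, True) for y in (True, False)) /
               sum(joint_xyz(x, y, True) for x in (True, False) for y in (True, False)))
p_x_given_yz = (joint_xyz(True, True, True) /
                sum(joint_xyz(x, True, True) for x in (True, False)))
print(p_x_given_z, p_x_given_yz)  # unequal: conditioning on Z couples X and Y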

Causal vs Diagnostic Inference  Causal inference: if the sprinkler is on, what is the probability that the grass is wet?
P(W|S) = P(W|R,S) P(R|S) + P(W|~R,S) P(~R|S)
= P(W|R,S) P(R) + P(W|~R,S) P(~R)   (R and S are independent a priori)
= 0.92
 Diagnostic inference: if the grass is wet, what is the probability that the sprinkler is on?
P(S|W) = 0.35 > 0.2 = P(S), but P(S|R,W) = 0.21
 Explaining away: knowing that it has rained decreases the probability that the sprinkler is on.
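These queries can be reproduced by brute-force enumeration over the hypothetical network sketched earlier (the exact numbers will differ from the slide’s, since those CPT values were made up):

import itertools

def query(target, evidence):
    """P(target = True | evidence), by summing the joint over all hidden
    variables. Exponential in the number of hidden nodes, which is fine
    for tiny networks."""
    hidden = [n for n in network if n != target and n not in evidence]
    num = den = 0.0
    for values in itertools.product([True, False], repeat=len(hidden)):
        a = dict(evidence)
        a.update(zip(hidden, values))
        for t in (True, False):
            a[target] = t
            p = joint(a)
            den += p
            if t:
                num += p
    return num / den

print(query("Sprinkler", {"WetGrass": True}))                # P(S|W)
print(query("Sprinkler", {"WetGrass": True, "Rain": True}))  # P(S|R,W) is smaller: explaining away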

Causes  Causal inference:
P(W|C) = P(W|R,S) P(R,S|C) + P(W|~R,S) P(~R,S|C) + P(W|R,~S) P(R,~S|C) + P(W|~R,~S) P(~R,~S|C)
using the fact that P(R,S|C) = P(R|C) P(S|C)
 Diagnostic inference: P(C|W) = ?

Exploiting the Local Structure  The joint factorizes over the graph, P(X1,...,Xd) = ∏i P(Xi | parents(Xi)), so far fewer parameters are needed than for the full joint
P(F|C) = ?

Classification  Bayes’ rule inverts the arc from class to input, giving the diagnostic probability:
P(C|x) = p(x|C) P(C) / p(x)

Naive Bayes’ Classifier  Given C, the inputs xj are independent:
p(x|C) = p(x1|C) p(x2|C) ... p(xd|C)
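A minimal sketch of the resulting classifier for binary features; the function names and the Laplace-smoothed maximum-likelihood estimates are illustrative choices, not from the slides:

import math
from collections import defaultdict

def fit(X, y):
    """Estimate P(C) and P(xj = 1 | C) from 0/1 feature vectors X and labels y."""
    class_counts = defaultdict(int)
    feature_counts = defaultdict(lambda: [0] * len(X[0]))
    for xi, c in zip(X, y):
        class_counts[c] += 1
        for j, v in enumerate(xi):
            feature_counts[c][j] += v
    priors = {c: n / len(y) for c, n in class_counts.items()}
    likelihoods = {c: [(fc + 1) / (class_counts[c] + 2)  # Laplace smoothing
                       for fc in feature_counts[c]]
                   for c in class_counts}
    return priors, likelihoods

def predict(x, priors, likelihoods):
    """Return argmax over C of log P(C) + sum_j log p(xj | C)."""
    best, best_score = None, -math.inf
    for c in priors:
        score = math.log(priors[c])
        for j, v in enumerate(x):
            p = likelihoods[c][j]
            score += math.log(p if v else 1.0 - p)
        if score > best_score:
            best, best_score = c, score
    return best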

Linear Regression

d-Separation  A path from node A to node B is blocked given C if:
(a) the directions of the edges on the path meet head-to-tail (case 1) or tail-to-tail (case 2) and the node is in C, or
(b) the directions of the edges meet head-to-head (case 3) and neither that node nor any of its descendants is in C.
 If all paths are blocked, A and B are d-separated (conditionally independent) given C.
 In the example graph: BCDF is blocked given C; BEFG is blocked by F; BEFD is blocked unless F (or G) is given.
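The criterion is mechanical enough to code directly. Below is a sketch that enumerates the undirected paths between two nodes and applies rules (a) and (b) to each node on the path; the {node: parents} representation is an assumption, and path enumeration is only practical for small graphs.

def d_separated(dag, a, b, given):
    """True if every path between a and b is blocked by the set `given`.
    dag maps each node to the list of its parents."""
    children = {n: [] for n in dag}
    for n, ps in dag.items():
        for p in ps:
            children[p].append(n)

    def descendants(n):
        out, stack = set(), [n]
        while stack:
            for c in children[stack.pop()]:
                if c not in out:
                    out.add(c)
                    stack.append(c)
        return out

    def paths(node, visited):
        if node == b:
            yield [node]
            return
        for nxt in dag[node] + children[node]:
            if nxt not in visited:
                for rest in paths(nxt, visited | {nxt}):
                    yield [node] + rest

    for path in paths(a, {a}):
        blocked = False
        for i in range(1, len(path) - 1):
            prev, mid, nxt = path[i - 1], path[i], path[i + 1]
            if prev in dag[mid] and nxt in dag[mid]:        # head-to-head at mid
                if mid not in given and not (descendants(mid) & given):
                    blocked = True                          # rule (b)
            elif mid in given:                              # head-to-tail or tail-to-tail
                blocked = True                              # rule (a)
        if not blocked:
            return False
    return True

# Chain A -> B -> C: A and C are d-separated given {B}
print(d_separated({"A": [], "B": ["A"], "C": ["B"]}, "A", "C", {"B"}))  # True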

Belief Propagation (Pearl, 1988)  Chain: the evidence E is split into the part E+ above node X and the part E- below it; X combines a causal message π(X) = P(X|E+) propagated down the chain with a diagnostic message λ(X) = P(E-|X) propagated up, so that P(X|E) ∝ λ(X) π(X)
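A sketch of these two passes for a chain of binary variables, with hypothetical prior and transition tables; observed nodes are handled by clamping them to their observed value:

prior = [0.6, 0.4]                    # hypothetical P(X1)
trans = [[0.7, 0.3], [0.2, 0.8]]      # hypothetical P(X_{t+1} = y | X_t = x)

def chain_marginals(n, evidence):
    """P(X_t | evidence) for t = 0..n-1; evidence maps positions to values."""
    def clamp(vec, t):
        return [v if x == evidence[t] else 0.0
                for x, v in enumerate(vec)] if t in evidence else vec

    pi = [prior[:]]                               # pi[t][x] ~ P(X_t = x, evidence before t)
    for t in range(1, n):
        prev = clamp(pi[-1], t - 1)
        pi.append([sum(prev[x] * trans[x][y] for x in (0, 1)) for y in (0, 1)])
    lam = [[1.0, 1.0] for _ in range(n)]          # lam[t][x] = P(evidence after t | X_t = x)
    for t in range(n - 2, -1, -1):
        nxt = clamp(lam[t + 1], t + 1)
        lam[t] = [sum(trans[x][y] * nxt[y] for y in (0, 1)) for x in (0, 1)]
    out = []
    for t in range(n):
        b = clamp([pi[t][x] * lam[t][x] for x in (0, 1)], t)
        z = sum(b)
        out.append([v / z for v in b])
    return out

print(chain_marginals(3, {2: 0}))  # marginals of a 3-node chain given X3 = 0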

Trees

Polytrees  How can we model P(X|U1,U2,...,Uk) cheaply?
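One common cheap parameterization, not spelled out in the transcript, is the noisy-OR, which needs only k numbers instead of a table of size 2^k. A sketch, with hypothetical inhibition probabilities:

def noisy_or(q, u):
    """P(X = 1 | U) under a noisy-OR: q[i] is the probability that an active
    cause U_i fails to turn X on, and each cause fails independently."""
    p_all_fail = 1.0
    for qi, ui in zip(q, u):
        if ui:
            p_all_fail *= qi
    return 1.0 - p_all_fail

print(noisy_or([0.2, 0.4], [1, 1]))  # 1 - 0.2 * 0.4 = 0.92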

Junction Trees  If X does not separate E+ and E-, we convert the graph into a junction tree, a tree whose nodes are the cliques of the moralized graph, and then apply the polytree algorithm
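The first step, moralization, is simple to state in code: marry all co-parents of each node and drop edge directions. A sketch (the {node: parents} representation is an assumption):

def moralize(dag):
    """Return the undirected edge set of the moral graph of a DAG {node: parents}."""
    edges = set()
    for node, parents in dag.items():
        for p in parents:
            edges.add(frozenset((node, p)))       # keep every original edge, undirected
        for i, p1 in enumerate(parents):          # marry parents that share a child
            for p2 in parents[i + 1:]:
                edges.add(frozenset((p1, p2)))
    return edges

# Head-to-head X -> Z <- Y: moralization adds the edge X - Y
print(moralize({"X": [], "Y": [], "Z": ["X", "Y"]}))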

Undirected Graphs: Markov Random Fields  In a Markov random field, dependencies are symmetric, for example between neighboring pixels in an image  In an undirected graph, A and B are independent given C if removing C leaves them unconnected  A potential function ψC(XC) shows how favorable the particular configuration XC is over the clique C  The joint is defined in terms of the clique potentials: p(X) = (1/Z) ∏C ψC(XC), where the partition function Z normalizes over all configurations
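A sketch of that definition over three binary variables, with hypothetical potentials; the partition function Z is computed by brute force, which is only feasible for tiny models:

import itertools

cliques = [
    (("A", "B"), {(0, 0): 2.0, (0, 1): 0.5, (1, 0): 0.5, (1, 1): 2.0}),
    (("B", "C"), {(0, 0): 1.0, (0, 1): 3.0, (1, 0): 3.0, (1, 1): 1.0}),
]
variables = ["A", "B", "C"]

def unnormalized(assignment):
    """Product of clique potentials for one configuration."""
    score = 1.0
    for vars_, psi in cliques:
        score *= psi[tuple(assignment[v] for v in vars_)]
    return score

# Partition function: sum over every joint configuration
Z = sum(unnormalized(dict(zip(variables, vals)))
        for vals in itertools.product([0, 1], repeat=len(variables)))

def mrf_joint(assignment):
    return unnormalized(assignment) / Z

print(mrf_joint({"A": 0, "B": 0, "C": 1}))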

Factor Graphs  Define new factor nodes and write the joint in terms of them: p(X) ∝ ∏S fS(XS), where each factor node fS connects to exactly the variables XS it depends on (see the sketch below)
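A sketch of this bipartite representation; factor names and values are hypothetical:

# Each factor lists the variables it touches plus a function of their values.
factors = {
    "f1": (("A",), lambda a: 0.3 if a else 0.7),
    "f2": (("A", "B"), lambda a, b: 0.9 if a == b else 0.1),
}

def factor_product(assignment):
    """Unnormalized joint: the product of all factor values."""
    score = 1.0
    for vars_, f in factors.values():
        score *= f(*(assignment[v] for v in vars_))
    return score

print(factor_product({"A": True, "B": True}))  # 0.3 * 0.9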

Learning a Graphical Model  Learning the conditional probabilities, either as tables (for the discrete case with a small number of parents) or as parametric functions  Learning the structure of the graph: a state-space search over a score function that combines goodness of fit to the data with some measure of complexity (see the sketch below)
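One common such score, assumed here rather than taken from the slides, is BIC: the maximized log-likelihood minus a penalty that grows with the number of parameters. A sketch for binary variables:

import math
from collections import defaultdict

def loglik(data, structure):
    """Maximum log-likelihood of binary data (a list of dicts) under a DAG {node: parents}."""
    total = 0.0
    for node, parents in structure.items():
        counts = defaultdict(lambda: [0, 0])      # parent configuration -> [#x=0, #x=1]
        for row in data:
            counts[tuple(row[p] for p in parents)][row[node]] += 1
        for n0, n1 in counts.values():
            for c in (n0, n1):
                if c:
                    total += c * math.log(c / (n0 + n1))
    return total

def bic(data, structure):
    """Goodness of fit minus a complexity penalty; higher is better."""
    n_params = sum(2 ** len(parents) for parents in structure.values())
    return loglik(data, structure) - 0.5 * n_params * math.log(len(data))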

Influence Diagrams  An influence diagram extends a graphical model with three node types: chance nodes (random variables), decision nodes (choices to be made), and utility nodes (the expected reward to be maximized)