INTRODUCTION TO Machine Learning 2nd Edition


Lecture Slides for INTRODUCTION TO Machine Learning 2nd Edition
ETHEM ALPAYDIN © The MIT Press, 2010
alpaydin@boun.edu.tr
http://www.cmpe.boun.edu.tr/~ethem/i2ml2e

CHAPTER 16: Graphical Models

Graphical Models
Aka Bayesian networks, probabilistic networks.
Nodes are hypotheses (random variables), and the probabilities correspond to our belief in the truth of the hypothesis.
Arcs are direct influences between hypotheses.
The structure is represented as a directed acyclic graph (DAG).
The parameters are the conditional probabilities on the arcs (Pearl, 1988, 2000; Jensen, 1996; Lauritzen, 1996).
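A minimal sketch of how such a model can be held in plain Python: the DAG as a parent list and the parameters as conditional probability tables. The rain/sprinkler/wet-grass variables and most of the numbers follow the later slides; the entries P(W|R,~S) = 0.90 and P(W|~R,~S) = 0.10 are standard textbook assumptions not shown here.

```python
# Structure: each node maps to its list of parents (arcs point into the node).
parents = {"R": [], "S": [], "W": ["R", "S"]}

# Parameters: a conditional probability table per node, keyed by parent values.
cpt = {
    "R": {(): 0.4},                  # P(R = 1)
    "S": {(): 0.2},                  # P(S = 1)
    "W": {                           # P(W = 1 | R, S)
        (1, 1): 0.95, (1, 0): 0.90,  # (1, 0) entry is an assumed value
        (0, 1): 0.90, (0, 0): 0.10,  # (0, 0) entry is an assumed value
    },
}
```

The brute-force inference sketch after the "Causal vs Diagnostic Inference" slide below queries exactly these numbers.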

Causes and Bayes’ Rule
Diagnostic inference: Knowing that the grass is wet, what is the probability that rain is the cause?
P(R | W) = P(W | R) P(R) / P(W), where P(W | R) is the causal term and P(R | W) the diagnostic one.
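A quick numeric sketch of this inversion for a two-node network R → W. The three probabilities below are illustrative assumptions, not values from this slide:

```python
p_r = 0.4           # prior P(rain) -- assumed
p_w_given_r = 0.9   # causal direction: P(wet | rain) -- assumed
p_w_given_not_r = 0.2  # assumed

# Marginal P(wet), then Bayes' rule for the diagnostic direction.
p_w = p_w_given_r * p_r + p_w_given_not_r * (1 - p_r)
p_r_given_w = p_w_given_r * p_r / p_w
print(f"P(R | W) = {p_r_given_w:.2f}")  # 0.36 / 0.48 = 0.75
```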

Conditional Independence
X and Y are independent if P(X,Y) = P(X) P(Y).
X and Y are conditionally independent given Z if P(X,Y|Z) = P(X|Z) P(Y|Z), or equivalently P(X|Y,Z) = P(X|Z).
Three canonical cases: head-to-tail, tail-to-tail, head-to-head.

Case 1: Head-to-Tail
P(X,Y,Z) = P(X) P(Y|X) P(Z|Y)
Example (the chain C → R → W): P(W|C) = P(W|R) P(R|C) + P(W|~R) P(~R|C)
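The example formula sums out the middle node R. A short sketch with assumed illustrative CPT values:

```python
p_r_given_c = {1: 0.8, 0: 0.1}   # P(R = 1 | C) -- assumed values
p_w_given_r = {1: 0.9, 0: 0.2}   # P(W = 1 | R) -- assumed values

def p_w_given(c):
    # P(W | C) = P(W|R) P(R|C) + P(W|~R) P(~R|C)
    pr = p_r_given_c[c]
    return p_w_given_r[1] * pr + p_w_given_r[0] * (1 - pr)

print(f"P(W | C=1) = {p_w_given(1):.2f}")  # 0.9*0.8 + 0.2*0.2 = 0.76
print(f"P(W | C=0) = {p_w_given(0):.2f}")  # 0.9*0.1 + 0.2*0.9 = 0.27
```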

Case 2: Tail-to-Tail
P(X,Y,Z) = P(X) P(Y|X) P(Z|X)

Case 3: Head-to-Head
P(X,Y,Z) = P(X) P(Y) P(Z|X,Y)

Causal vs Diagnostic Inference
Causal inference: If the sprinkler is on, what is the probability that the grass is wet?
P(W|S) = P(W|R,S) P(R|S) + P(W|~R,S) P(~R|S)
       = P(W|R,S) P(R) + P(W|~R,S) P(~R)
       = 0.95 × 0.4 + 0.90 × 0.6 = 0.92
Diagnostic inference: If the grass is wet, what is the probability that the sprinkler is on?
P(S|W) = 0.35 > 0.2 = P(S), and P(S|R,W) = 0.21.
Explaining away: Knowing that it has rained decreases the probability that the sprinkler is on.
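All three numbers can be checked by brute-force enumeration of the joint P(R,S,W) = P(R) P(S) P(W|R,S). This is only a sketch: the entries P(W|R,~S) = 0.90 and P(W|~R,~S) = 0.10 are assumptions not shown on the slide, chosen because they reproduce P(S|W) = 0.35 and P(S|R,W) = 0.21 exactly.

```python
from itertools import product

p_r, p_s = 0.4, 0.2
p_w = {(1, 1): 0.95, (1, 0): 0.90,   # (1, 0) and (0, 0) are assumed entries
       (0, 1): 0.90, (0, 0): 0.10}

def joint(r, s, w):
    # P(R, S, W) = P(R) P(S) P(W | R, S)
    pr = p_r if r else 1 - p_r
    ps = p_s if s else 1 - p_s
    pw = p_w[(r, s)] if w else 1 - p_w[(r, s)]
    return pr * ps * pw

def prob(query, evidence):
    """P(query | evidence); both are dicts like {'S': 1}."""
    def match(assign, cond):
        return all(assign[k] == v for k, v in cond.items())
    num = den = 0.0
    for r, s, w in product((0, 1), repeat=3):
        assign = {"R": r, "S": s, "W": w}
        if match(assign, evidence):
            den += joint(r, s, w)
            if match(assign, query):
                num += joint(r, s, w)
    return num / den

print(f"P(W | S)    = {prob({'W': 1}, {'S': 1}):.2f}")          # 0.92
print(f"P(S | W)    = {prob({'S': 1}, {'W': 1}):.2f}")          # 0.35
print(f"P(S | R, W) = {prob({'S': 1}, {'R': 1, 'W': 1}):.2f}")  # 0.21
```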

Causes
Causal inference:
P(W|C) = P(W|R,S) P(R,S|C) + P(W|R,~S) P(R,~S|C) + P(W|~R,S) P(~R,S|C) + P(W|~R,~S) P(~R,~S|C),
and use the fact that P(R,S|C) = P(R|C) P(S|C), since R and S are conditionally independent given C.
Diagnostic: P(C|W) = ?
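A sketch of this causal query for the network C → {R, S} → W, using the factorization P(R,S|C) = P(R|C) P(S|C). The C-conditional probabilities are assumed illustrative values; the P(W|R,S) table is the one hedged earlier:

```python
p_r_given_c = {1: 0.8, 0: 0.1}   # P(R = 1 | C) -- assumed
p_s_given_c = {1: 0.1, 0: 0.5}   # P(S = 1 | C) -- assumed
p_w = {(1, 1): 0.95, (1, 0): 0.90, (0, 1): 0.90, (0, 0): 0.10}

def p_w_given_c(c):
    total = 0.0
    for r in (0, 1):
        for s in (0, 1):
            pr = p_r_given_c[c] if r else 1 - p_r_given_c[c]
            ps = p_s_given_c[c] if s else 1 - p_s_given_c[c]
            total += p_w[(r, s)] * pr * ps   # P(W|R,S) P(R|C) P(S|C)
    return total

print(f"P(W | C=1) = {p_w_given_c(1):.3f}")
```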

Exploiting the Local Structure
P(F | C) = ?
With local structure, the full joint factorizes over the graph, so each node needs a table only over its own parents: P(X1, ..., Xd) = ∏i P(Xi | parents(Xi)).

Classification
Bayes’ rule inverts the arc C → x, turning the causal (generative) direction into the diagnostic one:
P(C | x) = p(x | C) P(C) / p(x)

Naive Bayes’ Classifier
Given C, the xj are independent: p(x|C) = p(x1|C) p(x2|C) ⋯ p(xd|C)
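A minimal sketch of this classifier with per-feature Gaussians, so that p(x|C) is literally the product ∏j p(xj|C) (computed in log space). The tiny dataset is fabricated for illustration:

```python
import numpy as np

X = np.array([[1.0, 2.1], [0.9, 1.9], [3.0, 4.2], [3.2, 3.8]])  # fabricated
y = np.array([0, 0, 1, 1])

classes = np.unique(y)
priors = {c: np.mean(y == c) for c in classes}
means = {c: X[y == c].mean(axis=0) for c in classes}
stds = {c: X[y == c].std(axis=0) + 1e-6 for c in classes}  # avoid zero variance

def log_posterior(x, c):
    # log P(C) + sum_j log p(x_j | C), each feature an independent Gaussian
    log_lik = -0.5 * (((x - means[c]) / stds[c]) ** 2
                      + np.log(2 * np.pi * stds[c] ** 2)).sum()
    return np.log(priors[c]) + log_lik

x_new = np.array([1.1, 2.0])
print(max(classes, key=lambda c: log_posterior(x_new, c)))  # -> 0
```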

Hidden Markov Model as a Graphical Model
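The figure for this slide (the HMM unrolled over time) is not reproduced here; the joint it encodes factorizes over hidden states q_t and observations O_t in the standard way:

```latex
P(Q, O) = P(q_1) \prod_{t=2}^{T} P(q_t \mid q_{t-1}) \prod_{t=1}^{T} p(O_t \mid q_t)
```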


Linear Regression
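This slide's figure is also not reproduced. The standard graphical-model reading of linear regression, which such a figure depicts, treats the response as Gaussian around a linear function of the input, with the weights and noise variance as parameter nodes; this rendering is my assumption about the figure's content:

```latex
p(r \mid x, \mathbf{w}, \sigma^2) = \mathcal{N}\left(\mathbf{w}^{\mathsf{T}} x,\; \sigma^2\right)
```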

d-Separation
A path from node A to node B is blocked, given the conditioning set C, if:
- the edges on the path meet head-to-tail (case 1) or tail-to-tail (case 2) at a node that is in C, or
- the edges meet head-to-head (case 3) at a node, and neither that node nor any of its descendants is in C.
If all paths between A and B are blocked, A and B are d-separated (conditionally independent) given C.
In the example graph: BCDF is blocked given C; BEFG is blocked by F; BEFD is blocked unless F (or its descendant G) is given.
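One way to mechanize these rules is the moralized-ancestral-graph test: A and B are d-separated given C exactly when they are disconnected after keeping only the ancestors of A ∪ B ∪ C, connecting each node to its parents, "marrying" co-parents, dropping edge directions, and removing C. A minimal sketch; the three-node graph at the end is a hypothetical example, not the slide's figure:

```python
def ancestors(parents, nodes):
    """All nodes in `nodes` plus their ancestors."""
    result, stack = set(), list(nodes)
    while stack:
        n = stack.pop()
        if n not in result:
            result.add(n)
            stack.extend(parents.get(n, []))
    return result

def d_separated(parents, a, b, given):
    keep = ancestors(parents, {a, b} | set(given))
    # Moral graph on `keep`: link nodes to parents, and co-parents to each other.
    adj = {n: set() for n in keep}
    for n in keep:
        ps = [p for p in parents.get(n, []) if p in keep]
        for p in ps:
            adj[n].add(p); adj[p].add(n)
        for i in range(len(ps)):
            for j in range(i + 1, len(ps)):
                adj[ps[i]].add(ps[j]); adj[ps[j]].add(ps[i])
    # Remove the conditioning set, then test connectivity from a to b.
    blocked, seen, stack = set(given), set(), [a]
    while stack:
        n = stack.pop()
        if n in blocked or n in seen:
            continue
        if n == b:
            return False  # a path survives: not d-separated
        seen.add(n)
        stack.extend(adj[n] - seen)
    return True

# Head-to-head example R -> W <- S: R and S are marginally independent,
# but observing W makes them dependent (explaining away).
parents = {"W": ["R", "S"]}
print(d_separated(parents, "R", "S", []))     # True
print(d_separated(parents, "R", "S", ["W"]))  # False
```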

Belief Propagation (Pearl, 1988)
Chain:
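On a chain, Pearl's π/λ message passing reduces to a forward and a backward sweep. A minimal sketch for computing node marginals, assuming no evidence (so the backward λ messages stay uniform); the two-node numbers are illustrative:

```python
import numpy as np

def chain_marginals(p1, trans):
    """p1: P(X1); trans[t]: matrix with rows P(X_{t+1} | X_t = i)."""
    n = len(trans) + 1
    fwd = [p1]                       # pi messages passed forward
    for T in trans:
        fwd.append(fwd[-1] @ T)
    bwd = [np.ones_like(p1)] * n     # lambda messages: uniform without evidence
    marginals = []
    for t in range(n):
        belief = fwd[t] * bwd[t]
        marginals.append(belief / belief.sum())
    return marginals

p1 = np.array([0.7, 0.3])          # P(X1) -- illustrative
T = np.array([[0.9, 0.1],          # P(X2 | X1 = 0)
              [0.2, 0.8]])         # P(X2 | X1 = 1)
for t, m in enumerate(chain_marginals(p1, [T]), 1):
    print(f"P(X{t}) = {m}")
```

Folding in evidence would amount to zeroing the incompatible entries of the relevant messages before normalizing; that step is omitted here for brevity.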

Trees

Polytrees
How can we model P(X|U1, U2, ..., Uk) cheaply?
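One standard cheap parameterization, not spelled out on this slide, is the noisy-OR: each parent Ui that is on independently fails to turn X on with inhibition probability q_i, so only k parameters are needed instead of 2^k. A minimal sketch:

```python
def noisy_or(q, u):
    """q[i]: inhibition prob. of parent i; u[i]: 1 if parent i is on.
    P(X = 0 | u) is the product of q_i over the parents that are on."""
    p_off = 1.0
    for qi, ui in zip(q, u):
        if ui:
            p_off *= qi
    return 1.0 - p_off  # P(X = 1 | u)

print(noisy_or([0.2, 0.5], [1, 1]))  # 1 - 0.2*0.5 = 0.9
```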

Junction Trees
If X does not separate E+ and E-, we convert the graph into a junction tree and then apply the polytree algorithm.
A junction tree is a tree whose nodes are cliques of the moralized graph.

Undirected Graphs: Markov Random Fields
In a Markov random field, dependencies are symmetric, for example, pixels in an image.
In an undirected graph, A and B are independent if removing C makes them unconnected.
The potential function ψC(XC) shows how favorable the particular configuration XC is over the clique C.
The joint is defined in terms of the clique potentials.
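The joint the slide refers to has the standard Gibbs form, normalized by the partition function Z, which sums the product of potentials over all configurations:

```latex
p(X) = \frac{1}{Z} \prod_{C} \psi_C(X_C),
\qquad
Z = \sum_{X} \prod_{C} \psi_C(X_C)
```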

Factor Graphs
Define new factor nodes and write the joint in terms of them.
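In symbols, with each factor node f_s attached to the subset X_s of variables it links to (both Bayes-net conditionals and MRF clique potentials can serve as the factors):

```latex
p(X) = \frac{1}{Z} \prod_{s} f_s(X_s)
```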

Learning a Graphical Model
Learning the conditional probabilities: either as tables (for the discrete case with a small number of parents) or as parametric functions.
Learning the structure of the graph: a state-space search over a score function that measures both goodness of fit to the data and some measure of complexity.
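A minimal sketch of the "tables" option: maximum-likelihood estimation of P(X | parent) by counting co-occurrences. The binary observations below are fabricated, and real code would add smoothing (e.g. Laplace) to avoid zero counts:

```python
from collections import Counter

data = [  # (parent value, child value) observations -- fabricated
    (1, 1), (1, 1), (1, 0), (0, 0), (0, 0), (0, 1), (1, 1), (0, 0),
]

pair_counts = Counter(data)
parent_counts = Counter(p for p, _ in data)

for p in (0, 1):
    for x in (0, 1):
        # ML estimate: P(X = x | parent = p) = count(p, x) / count(p)
        prob = pair_counts[(p, x)] / parent_counts[p]
        print(f"P(X={x} | parent={p}) = {prob:.2f}")
```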

Influence Diagrams
An influence diagram extends a graphical model with decision nodes and a utility node, in addition to the ordinary chance nodes.
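A toy evaluation of such a diagram: pick the decision that maximizes expected utility over the chance node. All names and numbers here are made up for illustration:

```python
p_rain = 0.4                                  # chance node
utility = {                                   # utility node: U(decision, rain?)
    ("take umbrella", True): 70, ("take umbrella", False): 80,
    ("leave it", True): 0,       ("leave it", False): 100,
}

def expected_utility(decision):
    return (p_rain * utility[(decision, True)]
            + (1 - p_rain) * utility[(decision, False)])

best = max(("take umbrella", "leave it"), key=expected_utility)
print(best, expected_utility(best))  # take umbrella: 0.4*70 + 0.6*80 = 76
```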