Bayesian Networks Alan Ritter

Problem: Non-IID Data
Most real-world data is not IID (unlike coin flips): it consists of multiple correlated variables. Examples: pixels in an image, words in a document, genes in a microarray. We have already seen one way to deal with this: Markov models and hidden Markov models.

Questions
How can we compactly represent the joint distribution p(x_1, ..., x_N)?
How can we use this distribution to infer one set of variables given another?
How can we learn the parameters with a reasonable amount of data?

The Chain Rule of Probability
We can represent any joint distribution this way, using any ordering of the variables. Problem: the final conditional p(x_N | x_1, ..., x_{N-1}) alone has 2^(N-1) parameters (for binary variables), so this representation is intractable in general.
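For reference, here is the chain rule factorization the slide refers to, written out for variables x_1, ..., x_N:

p(x_{1:N}) = p(x_1)\, p(x_2 \mid x_1)\, p(x_3 \mid x_1, x_2) \cdots p(x_N \mid x_1, \dots, x_{N-1})

Each factor conditions on all previously ordered variables, which is why the last factor needs a table that is exponential in N.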

Conditional Independence
This is the key to compactly representing large joint distributions. X and Y are conditionally independent given Z if and only if the conditional joint can be written as a product of the conditional marginals.
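In symbols, a standard statement of the definition above:

X \perp Y \mid Z \iff p(X, Y \mid Z) = p(X \mid Z)\, p(Y \mid Z)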

(non-hidden) Markov Models “The future is independent of the past given the present”
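The first-order Markov assumption quoted above can be written as

p(x_{t+1} \mid x_1, \dots, x_t) = p(x_{t+1} \mid x_t)

which gives the factorization p(x_{1:T}) = p(x_1) \prod_{t=2}^{T} p(x_t \mid x_{t-1}).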

Graphical Models
The first-order Markov assumption is useful for 1-D sequence data, such as sequences of words in a sentence or document. Q: What about 2-D images, 3-D video, or, in general, arbitrary collections of variables (gene pathways, etc.)?

Graphical Models
A way to represent a joint distribution by making conditional independence assumptions. Nodes represent variables; the lack of edges represents conditional independence assumptions. A better name would be “conditional independence diagrams”, but that doesn’t sound as cool.

Graph Terminology
A graph G = (V, E) consists of a set of nodes or vertices V = {1, ..., V} and a set of edges E, i.e., pairs (s, t) of nodes.
Useful terms: parent and child (for directed graphs), ancestors and descendants (for directed graphs), neighbors (for any graph), cycle (directed vs. undirected), tree (no cycles), clique / maximal clique.

Directed Graphical Models
A graphical model whose graph is a DAG (directed acyclic graph): no directed cycles! A.K.A. Bayesian networks, although there is nothing inherently Bayesian about them; they are just a way of defining conditional independencies (and the name just sounds cooler, I guess…).

Directed Graphical Models
Key property: nodes can be ordered so that parents come before children. Such a topological ordering can be constructed from any DAG. Ordered Markov property: a generalization of the first-order Markov property to general DAGs; a node depends only on its parents, not on its other predecessors.
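In symbols, with pred(v) denoting the predecessors of node v in the ordering and pa(v) its parents, the ordered Markov property says

x_v \perp x_{\mathrm{pred}(v) \setminus \mathrm{pa}(v)} \mid x_{\mathrm{pa}(v)}

which yields the DAG factorization p(x_{1:V}) = \prod_{v=1}^{V} p(x_v \mid x_{\mathrm{pa}(v)}).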

Example

Naïve Bayes (Same as Gaussian Mixture Model w/ Diagonal Covariance)
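As a concrete directed model, naive Bayes assumes the D features are conditionally independent given the class label y:

p(y, x_{1:D}) = p(y) \prod_{d=1}^{D} p(x_d \mid y)

With Gaussian class-conditionals, marginalizing over a latent y recovers the diagonal-covariance Gaussian mixture mentioned on the slide.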

Markov Models
First-order Markov model, second-order Markov model, hidden Markov model.

Example: Medical Diagnosis (the Alarm network)

Another medical diagnosis example: the QMR network, a bipartite graph with diseases at the top and symptoms below.

Probabilistic Inference
Graphical models provide a compact way to represent complex joint distributions. Q: Given a joint distribution, what can we do with it? A: The main use is probabilistic inference: estimating unknown variables from known ones.

Examples of Inference
- Predict the most likely cluster for a point X in R^n given a set of mixture components (this is what you did in HW #1).
- The Viterbi and forward/backward algorithms for HMMs: estimate words from a speech signal, or estimate parts of speech given a sequence of words in a text.

General Form of Inference
We have a correlated set of random variables with joint distribution p(x_1, ..., x_V | θ), and we assume the parameters θ are known. Partition the variables into visible variables x_v (observed) and hidden variables x_h (unobserved). Goal: compute the unknowns from the knowns.

General Form of Inference
Condition on the data by clamping the visible variables to their observed values, then normalize by the probability of the evidence.
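In equations, following the notation above (visible variables x_v, hidden variables x_h):

p(x_h \mid x_v, \theta) = \frac{p(x_h, x_v \mid \theta)}{p(x_v \mid \theta)}, \qquad p(x_v \mid \theta) = \sum_{x_h} p(x_h, x_v \mid \theta)

The denominator p(x_v | θ) is the probability of the evidence, i.e., the normalizing constant.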

Nuisance Variables
Partition the hidden variables into query variables x_q (the ones we care about) and nuisance variables x_n (the ones we do not care about and marginalize out).
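The query distribution is then obtained by summing out the nuisance variables:

p(x_q \mid x_v, \theta) = \sum_{x_n} p(x_q, x_n \mid x_v, \theta)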

Inference vs. Learning
Inference: compute p(x_h | x_v, θ); the parameters θ are assumed to be known.
Learning: compute a MAP estimate of the parameters from data.
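A common form of the learning problem mentioned above is the MAP estimate, e.g., with N i.i.d. training cases and a prior p(θ):

\hat{\theta} = \arg\max_{\theta} \; \sum_{i=1}^{N} \log p(x_{i,v} \mid \theta) + \log p(\theta)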

Bayesian Learning
Parameters are treated as hidden variables, so there is no distinction between inference and learning. (In the non-Bayesian view, the main distinction between inference and learning is that the number of hidden variables grows with the size of the dataset, while the number of parameters is fixed.)
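Treating θ as just another hidden variable, learning becomes inference over the parameters:

p(\theta \mid \mathcal{D}) = \frac{p(\mathcal{D} \mid \theta)\, p(\theta)}{p(\mathcal{D})}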

Conditional Independence Properties
A conditional independence statement has the form “A is independent of B given C”. I(G) is the set of all such conditional independence assumptions encoded by G. G is an I-map for P iff I(G) ⊆ I(P), where I(P) is the set of all CI statements that hold for P. In other words, G does not make any assertions that are not true of P.

Conditional Independence Properties (cont)
Note: a fully connected graph is an I-map for all distributions. G is a minimal I-map of P if (1) G is an I-map of P, and (2) there is no subgraph G' ⊊ G that is also an I-map of P. Question: how do we determine whether a given CI statement is in I(G)? This is easy for undirected graphs (we’ll see later), but somewhat complicated for DAGs (Bayesian nets).

D-separation
Definition: an undirected path P is d-separated by a set of nodes E (containing the evidence) iff at least one of the following conditions holds:
- P contains a chain s -> m -> t or s <- m <- t where m is in the evidence;
- P contains a fork s <- m -> t where m is in the evidence;
- P contains a v-structure s -> m <- t where m is not in the evidence, nor is any descendant of m.

D-separation (cont)
A set of nodes A is d-separated from a set of nodes B given a third set of nodes E iff each undirected path from every node in A to every node in B is d-separated by E. Finally, define the CI properties of a DAG as follows: x_A ⊥ x_B | x_E is in I(G) iff A is d-separated from B given E.

Bayes Ball Algorithm
A simple way to check whether A is d-separated from B given E:
1. Shade in all nodes in E.
2. Place “balls” at each node in A and let them “bounce around” according to a set of rules (note: balls can travel in either direction along an edge).
3. Check whether any ball from A reaches a node in B.
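Below is a minimal Python sketch of the d-separation test that Bayes ball performs more cleverly: it simply enumerates all undirected paths and applies the three blocking rules from the previous slide. The graph representation (a dict mapping each node to its set of parents) and the example node names are illustrative assumptions, not part of the original slides.

def children_of(parents):
    # Invert a parent map {node: set of parents} into a child map.
    children = {n: set() for n in parents}
    for node, ps in parents.items():
        for p in ps:
            children[p].add(node)
    return children

def descendants(node, children):
    # All nodes reachable from `node` by following directed edges.
    seen, stack = set(), [node]
    while stack:
        for c in children[stack.pop()]:
            if c not in seen:
                seen.add(c)
                stack.append(c)
    return seen

def path_blocked(path, children, evidence):
    # Apply the three d-separation rules at each interior node m of an undirected path.
    for i in range(1, len(path) - 1):
        s, m, t = path[i - 1], path[i], path[i + 1]
        into_m = m in children[s]      # the edge between s and m points s -> m
        out_of_m = t in children[m]    # the edge between m and t points m -> t
        if into_m and not out_of_m:
            # v-structure s -> m <- t: blocks unless m or a descendant of m is observed
            if m not in evidence and not (descendants(m, children) & evidence):
                return True
        elif m in evidence:
            # chain (s -> m -> t, s <- m <- t) or fork (s <- m -> t): blocks if m is observed
            return True
    return False

def d_separated(A, B, E, parents):
    # True iff every undirected path from A to B is blocked by the evidence set E.
    children = children_of(parents)
    E = set(E)
    neighbors = {n: parents[n] | children[n] for n in parents}

    def paths(node, goal, path):
        if node == goal:
            yield path
            return
        for nxt in neighbors[node]:
            if nxt not in path:
                yield from paths(nxt, goal, path + [nxt])

    return all(path_blocked(p, children, E)
               for a in A for b in B for p in paths(a, b, [a]))

# Hypothetical example: the v-structure Coin1 -> Bell <- Coin2
parents = {"Coin1": set(), "Coin2": set(), "Bell": {"Coin1", "Coin2"}}
print(d_separated({"Coin1"}, {"Coin2"}, set(), parents))     # True: independent a priori
print(d_separated({"Coin1"}, {"Coin2"}, {"Bell"}, parents))  # False: dependent given Bell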

Bayes Ball Rules

Explaining Away (inter-causal reasoning) Example: Toss two coins and observe their sum
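A tiny Python enumeration of the coin example above (the variable names are mine) makes the explaining-away effect concrete: the two coins are marginally independent, but once their sum is observed, learning one coin changes our belief about the other.

from itertools import product
from fractions import Fraction

# Joint distribution of two fair coins and their sum S = C1 + C2.
half = Fraction(1, 2)
joint = {(c1, c2, c1 + c2): half * half for c1, c2 in product([0, 1], repeat=2)}

def prob(pred):
    # Total probability of outcomes (c1, c2, s) satisfying the predicate.
    return sum(p for outcome, p in joint.items() if pred(*outcome))

# Marginally, C1 carries no information about C2:
print(prob(lambda c1, c2, s: c2 == 1))                                  # 1/2
print(prob(lambda c1, c2, s: c2 == 1 and c1 == 1) /
      prob(lambda c1, c2, s: c1 == 1))                                  # 1/2

# But after observing the sum S = 1, the coins become dependent:
print(prob(lambda c1, c2, s: c2 == 1 and s == 1) /
      prob(lambda c1, c2, s: s == 1))                                   # 1/2
print(prob(lambda c1, c2, s: c2 == 1 and s == 1 and c1 == 1) /
      prob(lambda c1, c2, s: s == 1 and c1 == 1))                       # 0: C1 = heads "explains" the sum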

Boundary Conditions

Other Independence Properties
Ordered Markov property, directed local Markov property, and d-separation (which we saw already). These three properties turn out to be equivalent; one direction of the implications is easy to see, the other is less obvious.

Markov Blanket
Definition: the Markov blanket of a node t in a DAG is the smallest set of nodes that renders t conditionally independent of all the other nodes in the graph. It consists of t's parents, its children, and its co-parents (the other nodes that are also parents of its children).

Q: Why are the co-parents in the Markov blanket? When we compute the full conditional p(x_t | x_{-t}), all terms in the joint that do not involve x_t cancel between the numerator and the denominator.
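Written out, the full conditional of x_t keeps exactly the factors of the DAG factorization that mention x_t:

p(x_t \mid x_{-t}) \propto p(x_t \mid x_{\mathrm{pa}(t)}) \prod_{c \in \mathrm{ch}(t)} p(x_c \mid x_{\mathrm{pa}(c)})

so it depends only on t's parents, children, and co-parents: \mathrm{mb}(t) = \mathrm{pa}(t) \cup \mathrm{ch}(t) \cup \mathrm{copa}(t).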