INTERVENTIONS AND INFERENCE / REASONING

Causal models
- Recall from yesterday:
  - Represent relevance using graphs
  - Causal relevance ⇒ DAGs
  - Quantitative component = joint probability distribution (jpd), and so clear definitions for independence & association
- Connect DAG & jpd with two assumptions:
  - Markov: No edge ⇒ independence (each variable is independent of its non-descendants given its direct parents)
  - Faithfulness: Conditional independence ⇒ No edge (every independence in the jpd is one implied by the graph)
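The Markov assumption is what licenses writing the jpd as a product of one term per variable, P(X | direct parents of X). A minimal Python sketch of that factorization, assuming a hypothetical three-variable chain A → B → C with made-up conditional probabilities (none of these numbers come from the lecture):

```python
from itertools import product
from math import isclose

# Hypothetical DAG A -> B -> C: each variable maps to its list of direct parents.
PARENTS = {"A": [], "B": ["A"], "C": ["B"]}

# Hypothetical CPTs: P(var = True | parent values), keyed by the tuple of parent values.
P_TRUE = {
    "A": {(): 0.3},
    "B": {(True,): 0.9, (False,): 0.2},
    "C": {(True,): 0.7, (False,): 0.1},
}

def joint(assignment):
    """Markov factorization: P(a, b, c) = P(a) * P(b | a) * P(c | b)."""
    p = 1.0
    for var, parents in PARENTS.items():
        p_true = P_TRUE[var][tuple(assignment[q] for q in parents)]
        p *= p_true if assignment[var] else 1.0 - p_true
    return p

def prob(query, given):
    """P(query | given), summing the factorized joint over all full assignments."""
    names = list(PARENTS)
    num = den = 0.0
    for values in product([True, False], repeat=len(names)):
        a = dict(zip(names, values))
        if all(a[k] == v for k, v in given.items()):
            p = joint(a)
            den += p
            if all(a[k] == v for k, v in query.items()):
                num += p
    return num / den

# C is independent of its non-descendant A given its parent B ...
assert isclose(prob({"C": True}, {"B": True, "A": True}), prob({"C": True}, {"B": True}))
# ... but A and C are still associated when B is not conditioned on.
assert not isclose(prob({"C": True}, {"A": True}), prob({"C": True}, {}))
```

The graph fixes which factors appear; the numbers only fill them in.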

Three uses of causal models
- Represent (and predict the effects of) interventions on variables
  - Causal models only, of course
- Efficiently determine independencies
  - I.e., which variables are informationally relevant for which other ones?
- Use those independencies to rapidly update beliefs in light of evidence

Representing interventions
- Central intuition: When we intervene, we control the state of the target variable
  - And so the direct causes of the target variable no longer matter
  - But the target still has its usual effects
- Example: directly applying current to the light bulb ⇒ the light switch doesn’t matter, but the plant still grows

Representing interventions
- Formal implementation:
  - Add a variable representing the intervention, and make it a direct cause of the target
  - When the intervention is “active,” remove all other edges into the target
  - Leave intact all edges directed out of the target, even when the intervention is “active”

Representing interventions
- Example: Light Switch → Light Bulb → Plant Growth
- Add a manipulation variable (Current) as a “cause” of Light Bulb that does not matter when it is inactive
- When it is active, break the incoming edges (Light Switch → Light Bulb), but leave the outgoing edges (Light Bulb → Plant Growth)
[Figure: the light example with the manipulation inactive and with it active]

Representing interventions
- Straightforward extension to more interesting types of interventions
  - Interventions away from the current state
  - Multivariate interventions
  - Etc.
- Key: For all of these, the “intervention operator” takes a causal graphical model (CGM) as input and yields a CGM as output
  - The “post-intervention CGM” is an ordinary CGM
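The intervention operator can be sketched as a function from graph to graph. This is a minimal Python illustration, assuming a graph stored as a map from each variable to the set of its direct causes. The `intervene` function and the `do(...)` naming for the manipulation variable are hypothetical choices for this sketch (the light example calls its manipulation variable Current). It covers single and multivariate interventions only, not interventions that shift a variable away from its current state.

```python
from typing import Dict, Iterable, Set

# A causal graph: each variable maps to the set of its direct causes (parents).
Graph = Dict[str, Set[str]]

def intervene(graph: Graph, targets: Iterable[str], active: bool = True) -> Graph:
    """Return a new post-intervention graph; the input graph is not modified.

    For each target T, add a manipulation variable "do(T)" as a direct cause of T.
    If the manipulation is active, drop every other edge into T.
    Edges out of T are never touched, so T keeps its usual effects.
    """
    new = {v: set(parents) for v, parents in graph.items()}
    for t in targets:
        m = f"do({t})"              # hypothetical name for the manipulation variable
        new[m] = set()              # the manipulation itself has no causes
        new[t] = {m} if active else new.get(t, set()) | {m}
    return new

# Light example: Light Switch -> Light Bulb -> Plant Growth
lights = {"Light Switch": set(), "Light Bulb": {"Light Switch"}, "Plant Growth": {"Light Bulb"}}
post = intervene(lights, ["Light Bulb"])
print(post["Light Bulb"])    # {'do(Light Bulb)'}
print(post["Plant Growth"])  # {'Light Bulb'}
```

Because the result has the same type as the input, it can be handed back to `intervene` or to any other function that accepts a causal graph. This mirrors the point that a post-intervention CGM is an ordinary CGM.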

Why randomize?
- Standard scientific practice: randomize Treatment to find its Effects
  - E.g., don’t let people decide on their own whether to take the drug or placebo
- What is the value of randomization?
  - Randomization is an intervention ⇒ All edges into T are broken, including those from any common causes of T and E!
  - ⇒ If T and E are associated after randomization, then we must have T → E

Why randomize?
- Graphically: is there an edge Treatment → Effect, given that Unobserved Factors may influence both?
[Figure: Treatment, Effect, and Unobserved Factors]
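A small simulation makes the point concrete. The Python sketch below assumes a hypothetical model in which an unobserved factor U causes both Treatment (T) and Effect (E), while T has no effect on E at all. The probabilities are invented for illustration. Under self-selection, T and E come out associated. Once T is set by a coin flip, the edge U → T is broken and the association disappears.

```python
import random

def corr(xs, ys):
    """Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5

def simulate(n, randomize, seed=0):
    """Hypothetical model: U -> T and U -> E, with NO edge T -> E."""
    rng = random.Random(seed)
    ts, es = [], []
    for _ in range(n):
        u = rng.random() < 0.5                      # unobserved common cause
        if randomize:
            t = rng.random() < 0.5                  # coin flip: edge U -> T is broken
        else:
            t = rng.random() < (0.8 if u else 0.2)  # people self-select based on U
        e = rng.random() < (0.7 if u else 0.3)      # E depends only on U
        ts.append(float(t))
        es.append(float(e))
    return corr(ts, es)

print(simulate(20000, randomize=False))  # clearly positive: confounded association
print(simulate(20000, randomize=True))   # near zero: no T -> E edge to find
```

With these numbers the observational correlation should sit near 0.24, even though T does nothing to E.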

Determining independence
- Markov & Faithfulness ⇒ DAG structure determines all statistical independencies and associations
- Graphical criterion: d-separation
  - X and Y are independent given S iff X and Y are d-separated given S iff X and Y are not d-connected given S
  - Intuition: X and Y are d-connected iff information can “flow” from X to Y along some path

d-separation
- C is a collider on a path iff A → C ← B
- Formally:
  - A path between X and Y is active given S iff
    - Every non-collider on the path is not in S; and
    - Every collider on the path is either in S, or else has a descendant in S
  - X and Y are d-connected given S iff there is an active path between X and Y given S

d-separation
- Surprising feature being exploited here:
  - Conditioning on a common effect induces an association between independent causes
- Motivating example: Gas Tank → Car Starts ← Spark Plugs
  - Gas and Plugs are independent, but if we know that the car doesn’t start, then they’re associated
  - In that case, learning Gas = Full changes the likelihood that Plugs = Bad
  - And similarly if Car Starts → Emits Exhaust and we condition only on Emits Exhaust (a descendant of the collider)
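The gas/spark-plug example can be checked by brute-force enumeration. A minimal Python sketch, assuming hypothetical numbers: each cause fails independently with probability 1/10, and the car starts exactly when the tank is full and the plugs are good. `Fraction` keeps the probabilities exact.

```python
from fractions import Fraction
from itertools import product

# Hypothetical numbers: each cause fails independently with probability 1/10.
P_GAS_EMPTY = Fraction(1, 10)
P_PLUGS_BAD = Fraction(1, 10)

def p_joint(gas_full, plugs_good, starts):
    """Gas Tank -> Car Starts <- Spark Plugs; the car starts iff gas is full and plugs are good."""
    p = (1 - P_GAS_EMPTY) if gas_full else P_GAS_EMPTY
    p *= (1 - P_PLUGS_BAD) if plugs_good else P_PLUGS_BAD
    return p if starts == (gas_full and plugs_good) else Fraction(0)

def p_plugs_bad(**evidence):
    """P(Plugs = Bad | evidence) by enumerating all 8 joint states."""
    num = den = Fraction(0)
    for gas_full, plugs_good, starts in product([True, False], repeat=3):
        state = {"gas_full": gas_full, "plugs_good": plugs_good, "starts": starts}
        if all(state[k] == v for k, v in evidence.items()):
            p = p_joint(gas_full, plugs_good, starts)
            den += p
            if not plugs_good:
                num += p
    return num / den

print(p_plugs_bad())                              # 1/10
print(p_plugs_bad(gas_full=True))                 # 1/10: Gas and Plugs independent
print(p_plugs_bad(starts=False))                  # 10/19
print(p_plugs_bad(starts=False, gas_full=True))   # 1
print(p_plugs_bad(starts=False, gas_full=False))  # 1/10
```

Once we know the car failed to start, learning Gas = Full raises P(Plugs = Bad) from 10/19 to 1, while learning Gas = Empty drops it back to 1/10. Conditioning on the common effect made the two independent causes informative about each other.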

d-separation
Algorithm to determine d-separation:
1. Write down every path between X and Y
   - Edge direction is irrelevant for this step
   - Just write down every sequence of edges that lies between X and Y
   - But don’t use a node twice in the same path
2. For each path, determine whether it is active by checking the status of each node N on the path
   - N is not active if either:
     - N is a collider and not in S (and no descendants of N are in S); or
     - N is not a collider and is in S
   - I.e., “multiply” the “not”s to get the node status
   - Any node not active ⇒ path not active
3. Any path active ⇒ d-connected ⇒ X & Y associated
   No path active ⇒ d-separated ⇒ X & Y independent
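The three steps translate almost directly into code. This Python sketch assumes the DAG is given as a map from each node to the set of its children; the function names are illustrative. Enumerating every path is exponential in general, so this mirrors the hand procedure rather than an efficient method (reachability-based algorithms such as Bayes-Ball avoid listing paths).

```python
from typing import Dict, List, Set

# DAG as a mapping from each node to the set of its children (direct effects).
Dag = Dict[str, Set[str]]

def descendants(dag: Dag, node: str) -> Set[str]:
    """All nodes reachable from `node` by following edges forward."""
    seen: Set[str] = set()
    stack = [node]
    while stack:
        for child in dag.get(stack.pop(), set()):
            if child not in seen:
                seen.add(child)
                stack.append(child)
    return seen

def all_paths(dag: Dag, x: str, y: str) -> List[List[str]]:
    """Step 1: every path between x and y, ignoring edge direction, no node used twice."""
    nbrs: Dict[str, Set[str]] = {}
    for u, kids in dag.items():
        for v in kids:
            nbrs.setdefault(u, set()).add(v)
            nbrs.setdefault(v, set()).add(u)
    paths: List[List[str]] = []

    def extend(path: List[str]) -> None:
        if path[-1] == y:
            paths.append(path)
            return
        for n in nbrs.get(path[-1], set()):
            if n not in path:
                extend(path + [n])

    extend([x])
    return paths

def is_active(dag: Dag, path: List[str], given: Set[str]) -> bool:
    """Step 2: the path is active iff every interior node on it is active."""
    for a, n, b in zip(path, path[1:], path[2:]):
        collider = n in dag.get(a, set()) and n in dag.get(b, set())  # a -> n <- b
        if collider and n not in given and not (descendants(dag, n) & given):
            return False  # collider, not in S, and no descendant in S
        if not collider and n in given:
            return False  # non-collider in S
    return True

def d_separated(dag: Dag, x: str, y: str, given: Set[str]) -> bool:
    """Step 3: d-separated iff no path between x and y is active given S."""
    return not any(is_active(dag, p, given) for p in all_paths(dag, x, y))

# Exercise example: E -> M -> W, E -> FE -> W
dag = {"E": {"M", "FE"}, "M": {"W"}, "FE": {"W"}, "W": set()}
print(d_separated(dag, "E", "W", {"M"}))   # False: E -> FE -> W is active
print(d_separated(dag, "M", "FE", {"E"}))  # True: both paths blocked
print(d_separated(dag, "M", "FE", {"W"}))  # False: W is an included collider
```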

d-separation
- Graph: Exercise (E) → Metabolism (M) → Weight (W); Exercise → Food Eaten (FE) → Weight
- Exercise and Weight given Metabolism?
  - E → M → W: Blocked! M is an included non-collider
  - E → FE → W: Unblocked! FE is a non-included non-collider
  - ⇒ E and W are dependent given M

d-separation
- Metabolism and FE given Exercise?
  - M → W ← FE: Blocked! W is a non-included collider
  - M ← E → FE: Blocked! E is an included non-collider
  - ⇒ M ⊥ FE | E (M and FE are independent given E)

d-separation
- Metabolism and FE given Weight?
  - M → W ← FE: Unblocked! W is an included collider
  - M ← E → FE: Unblocked! E is a non-included non-collider
  - ⇒ M and FE are dependent given W

Updating beliefs
- For both statistical and causal models, efficient computation of independencies ⇒ efficient prediction from observations
  - A specific instance of belief updating
- Typically, “just” compute conditional probabilities
  - Significantly easier if we have (conditional) independencies, since we can ignore variables

Bayes (and Bayesianism)
- Bayes’ Theorem: P(T | D) = P(D | T) P(T) / P(D)
  - The proof is trivial…
- Interpretation is the interesting part:
  - Let D be the observation and T be our target variable(s) of interest
  - ⇒ Bayes’ theorem says how to update our beliefs about T given some observation(s)

Bayes (and Bayesianism)
- Terminology, for P(T | D) = P(D | T) P(T) / P(D):
  - P(T | D): posterior distribution
  - P(D | T): likelihood function
  - P(T): prior distribution
  - P(D): data distribution
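For a discrete target, the update takes a few lines. A minimal Python sketch, assuming T takes finitely many values and the prior and likelihood are supplied as dictionaries; the `posterior` function and the numbers in the example are hypothetical.

```python
from fractions import Fraction

def posterior(prior, likelihood):
    """Bayes' theorem for a discrete target T.

    prior:      {t: P(T = t)}
    likelihood: {t: P(D | T = t)} for the observed data D
    returns     {t: P(T = t | D)}
    """
    unnormalized = {t: likelihood[t] * prior[t] for t in prior}  # P(D | t) P(t)
    p_data = sum(unnormalized.values())                          # P(D), summing over t
    return {t: p / p_data for t, p in unnormalized.items()}

# Hypothetical numbers: a 50/50 prior, and D is three times as likely under "Hi".
post = posterior({"Hi": Fraction(1, 2), "Lo": Fraction(1, 2)},
                 {"Hi": Fraction(3, 4), "Lo": Fraction(1, 4)})
print(post["Hi"], post["Lo"])  # 3/4 1/4
```

Feeding `post` back in as the prior for the next observation gives sequential updating, provided the observations are conditionally independent given T.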

Bayes and independence
- Knowing independencies can greatly speed Bayesian updating
  - P(C | E, F, G) = [complex mess]
  - Suppose C is independent of F, G given E
  - ⇒ P(C | E, F, G) = P(C | E) = [something simpler]

Updating beliefs
- Compute: P(M = Hi | E = Hi, FE = Lo)
- FE ⊥ M | E ⇒ P(M | E, FE) = P(M | E)
  - And P(M | E) is a term in the Markov factorization!
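To see the shortcut pay off, one can compare brute-force conditioning on the full joint with the single factor P(M | E). The Python sketch below assumes the graph E → M → W, E → FE → W with binary Hi/Lo variables and made-up conditional probability tables; none of the numbers come from the lecture.

```python
from fractions import Fraction as F
from itertools import product

# Hypothetical CPTs, each giving P(variable = "Hi" | parents).
P_E = F(1, 2)                                          # P(E = Hi)
P_M = {"Hi": F(4, 5), "Lo": F(3, 10)}                  # P(M = Hi | E)
P_FE = {"Hi": F(3, 5), "Lo": F(2, 5)}                  # P(FE = Hi | E)
P_W = {("Hi", "Hi"): F(1, 2), ("Hi", "Lo"): F(1, 10),  # P(W = Hi | M, FE)
       ("Lo", "Hi"): F(9, 10), ("Lo", "Lo"): F(1, 2)}

def pick(p_hi, value):
    """Turn P(X = Hi) into P(X = value)."""
    return p_hi if value == "Hi" else 1 - p_hi

def joint(e, m, fe, w):
    """Markov factorization: P(E) P(M | E) P(FE | E) P(W | M, FE)."""
    return pick(P_E, e) * pick(P_M[e], m) * pick(P_FE[e], fe) * pick(P_W[(m, fe)], w)

def cond(query, evidence):
    """P(query | evidence) by brute-force summation over all 16 joint states."""
    num = den = F(0)
    for e, m, fe, w in product(["Hi", "Lo"], repeat=4):
        s = {"E": e, "M": m, "FE": fe, "W": w}
        if all(s[k] == v for k, v in evidence.items()):
            p = joint(e, m, fe, w)
            den += p
            if all(s[k] == v for k, v in query.items()):
                num += p
    return num / den

brute = cond({"M": "Hi"}, {"E": "Hi", "FE": "Lo"})
shortcut = P_M["Hi"]  # FE ⊥ M | E, so just read off P(M = Hi | E = Hi)
print(brute, shortcut)  # 4/5 4/5
```

The brute-force sum touches every state of the joint, while the shortcut is a single table lookup. Adding W = Hi to the evidence would break the shortcut, since W is a collider between M and FE.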

Looking ahead…
- Have:
  - Basic formal representation for causation
  - Fundamental causal asymmetry (of intervention)
  - Inference & reasoning methods
- Need:
  - Search & causal discovery methods