Inferring Causal Graphs Computing 882 Simon Fraser University Spring 2002.


Applications of Bayes Nets (I) The Microsoft Office "Paper Clip" assistant. Bill Gates: "The competitive advantage of Microsoft lies in our expertise in Bayes Nets." UBC Intelligent Tutoring System (ASI X-change).

Applications of Bayes Nets (II) University drop-outs: the search program Tetrad II says that higher SAT scores would lead to a lower drop-out rate. Carnegie Mellon uses this to reduce its drop-out rates. Tetrad II recalibrates a mass spectrometer on an Earth satellite. Tetrad II predicts the relation between corn exports and exchange rates.

Bayes Nets: Basic Definitions Defn: A and B are independent iff P(A and B) = P(A) x P(B). Exercise: Prove that A and B are independent iff P(A|B) = P(A) (assuming P(B) > 0). Thus independence implies irrelevance.

Independence Among Variables Let X, Y, Z be random variables. X is independent of Y iff P(X=x|Y=y) = P(X=x) for all x, y s.t. P(Y=y) > 0. X is independent of Y given Z iff P(X=x|Y=y,Z=z) = P(X=x|Z=z) for all x, y, z s.t. P(Y=y and Z=z) > 0. Notation: (X ⊥ Y|Z). Intuitively: given information Z, Y is irrelevant to X.
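
To make the definition concrete, here is a minimal sketch (not from the slides) of checking conditional independence numerically from a full joint distribution table; the function name, table representation, and tolerance are illustrative assumptions.

```python
from itertools import product

def is_cond_independent(joint, variables, x, y, z, tol=1e-9):
    """Test (X ⊥ Y | Z) in a discrete joint distribution.

    joint: dict mapping full assignments (tuples of values, ordered as in
           `variables`) to probabilities.
    x, y:  variable names; z: a set of conditioning variable names.
    Uses the product form P(x, y | z) = P(x | z) * P(y | z), which is
    equivalent to the conditional form on the slide.
    """
    idx = {v: i for i, v in enumerate(variables)}

    def prob(assignment):
        """Marginal probability of a partial assignment {var: value}."""
        return sum(p for a, p in joint.items()
                   if all(a[idx[v]] == val for v, val in assignment.items()))

    values = {v: sorted({a[idx[v]] for a in joint}) for v in variables}
    zs = sorted(z)
    for zvals in product(*(values[v] for v in zs)):
        z_assign = dict(zip(zs, zvals))
        pz = prob(z_assign)
        if pz <= tol:
            continue
        for xv in values[x]:
            for yv in values[y]:
                pxyz = prob({**z_assign, x: xv, y: yv})
                pxz = prob({**z_assign, x: xv})
                pyz = prob({**z_assign, y: yv})
                if abs(pxyz / pz - (pxz / pz) * (pyz / pz)) > tol:
                    return False
    return True

# Example: X and Y are independent given Z in this tiny 3-variable table.
variables = ("X", "Y", "Z")
joint = {(0, 0, 0): 0.2, (0, 1, 0): 0.2, (1, 0, 0): 0.05, (1, 1, 0): 0.05,
         (0, 0, 1): 0.1, (0, 1, 1): 0.1, (1, 0, 1): 0.15, (1, 1, 1): 0.15}
print(is_cond_independent(joint, variables, "X", "Y", {"Z"}))  # True
```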

Axioms for Informational Relevance Pearl (2000), p. 11. It is possible to read the ⊥ symbol as "is irrelevant to". Then we can regard a number of axioms for ⊥ as an axiomatization of informational relevance, for example: Symmetry: if (X ⊥ Y|Z) then (Y ⊥ X|Z). Decomposition: if (X ⊥ YW|Z) then (X ⊥ Y|Z).

Markovian Parents In constructing a Bayes net, we look for "direct causes" – variables that "immediately determine" the value of another variable. Such direct causes "screen off" other variables. Formally: Let an ordering of variables X1, …, Xn be given. Consider Xj. Let PA be a subset of X1, …, Xj-1. Suppose that P(Xj|PA) = P(Xj|X1, …, Xj-1) and that no proper subset of PA has this property. Then PA forms the Markovian parents of Xj.

Markovian Parents and Bayes Nets Given an ordering of variables, we can construct a causal graph by drawing arrows from Markovian parents to their children. Note that graphs are suitable for drawing the distinction between "direct" and "intermediate" causes. Exercise: For the variables in figure 1.2, construct a Bayes net in the given ordering. Exercise: Construct a Bayes net along the ordering (X5, X1, X3, X2, X4).
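
As a complement to the exercises, here is a hedged sketch of constructing Markovian parents along a given ordering. It assumes access to a conditional-independence test `cond_indep` (e.g. a set-valued generalization of the check above); the names and the brute-force smallest-set search are illustrative, not part of the lecture.

```python
from itertools import combinations

def markovian_parents(variables, cond_indep):
    """Find the Markovian parents of each variable along the given ordering.

    For each Xj, search the predecessors X1, ..., X(j-1) for a smallest set PA
    with P(Xj | PA) = P(Xj | X1, ..., X(j-1)), i.e. Xj is independent of the
    remaining predecessors given PA.  Brute force, exponential in j.
    cond_indep(x, ys, zs) should report whether x ⊥ ys | zs holds in the
    distribution (ys and zs are sets of variable names).
    """
    parents = {}
    for j, xj in enumerate(variables):
        preds = list(variables[:j])
        pa = list(preds)                      # fall back: all predecessors
        for size in range(len(preds) + 1):    # smallest candidate sets first
            found = None
            for cand in combinations(preds, size):
                rest = set(preds) - set(cand)
                if not rest or cond_indep(xj, rest, set(cand)):
                    found = list(cand)
                    break
            if found is not None:
                pa = found
                break
        parents[xj] = pa
    return parents

# Hypothetical usage with an independence test derived from a joint table:
# parents = markovian_parents(("X1", "X2", "X3", "X4", "X5"), my_cond_indep)
```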

Independence in Bayes Nets Note how useful irrelevance information is – think of a Prolog-style logical database. A typical problem: Given some information Z, and a query about X, is Y relevant to X? For Bayes nets, the d-separation criterion is a powerful answer.

d-separation In principle, information can flow along any path between two variables X and Y. Provisos: a path is blocked by any collider on it. Conditioning on a node reverses its status: conditioning on a non-collider blocks the path at that node; conditioning on a collider, or on one of its descendants, unblocks it.
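
One way to implement the d-separation test is the standard moralization criterion (restrict to the ancestral graph, marry co-parents, drop directions, delete the conditioning set, and test connectivity). The sketch below assumes a DAG represented as a parent dictionary; it is an illustration, not the method used in the course.

```python
def d_separated(dag, xs, ys, zs):
    """Test whether node sets xs and ys are d-separated by zs in a DAG.

    dag: dict mapping each node to a list of its parents.
    Keeps only the ancestors of xs | ys | zs, moralizes (marries co-parents,
    drops directions), deletes zs, and checks whether xs and ys are still
    connected.
    """
    # 1. Ancestral closure of the query nodes.
    relevant, stack = set(), list(xs | ys | zs)
    while stack:
        v = stack.pop()
        if v not in relevant:
            relevant.add(v)
            stack.extend(dag.get(v, []))

    # 2. Moralized, undirected adjacency over the ancestral graph.
    adj = {v: set() for v in relevant}
    for v in relevant:
        parents = list(dag.get(v, []))
        for p in parents:                       # parent-child edges
            adj[v].add(p); adj[p].add(v)
        for i, p in enumerate(parents):         # marry co-parents
            for q in parents[i + 1:]:
                adj[p].add(q); adj[q].add(p)

    # 3. Delete the conditioning set and test reachability from xs to ys.
    seen, stack = set(), [v for v in xs if v not in zs]
    while stack:
        v = stack.pop()
        if v in seen:
            continue
        seen.add(v)
        stack.extend(adj[v] - zs)
    return seen.isdisjoint(ys)

# Sprinkler network: X1 (season) -> X2 (sprinkler), X1 -> X3 (rain),
# X2 -> X4 (wet), X3 -> X4, X4 -> X5 (slippery).
sprinkler = {"X1": [], "X2": ["X1"], "X3": ["X1"], "X4": ["X2", "X3"], "X5": ["X4"]}
print(d_separated(sprinkler, {"X2"}, {"X3"}, {"X1"}))        # True
print(d_separated(sprinkler, {"X2"}, {"X3"}, {"X1", "X5"}))  # False: X5 opens the collider
```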

d-separation characterizes independence If X, Y are d-separated by Z in a DAG G, then (X ⊥ Y|Z) holds in all probability distributions compatible with G. If X, Y are not d-separated by Z in G, then (X ⊥ Y|Z) fails in some probability distribution compatible with G.

Observational Equivalence Suppose we can observe the probabilities of various occurrences (rain vs. umbrellas, smoking vs. lung cancer, etc.). How does the probability distribution constrain the graph? Two causal graphs G1, G2 are compatible with the same probability distributions iff G1 has the same adjacencies (skeleton) as G2 and the same v-structures (colliders whose endpoints are not adjacent).
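
A small sketch of this equivalence test: compare the skeletons and v-structures of two DAGs given as parent dictionaries. The representation and the example graphs are illustrative assumptions, not material from the slides.

```python
def skeleton(dag):
    """Undirected adjacencies of a DAG given as {node: list of parents}."""
    return {frozenset((child, p)) for child, parents in dag.items() for p in parents}

def v_structures(dag):
    """Colliders a -> c <- b whose endpoints a, b are not adjacent."""
    skel = skeleton(dag)
    vs = set()
    for c, parents in dag.items():
        for i, a in enumerate(parents):
            for b in parents[i + 1:]:
                if frozenset((a, b)) not in skel:
                    vs.add((frozenset((a, b)), c))
    return vs

def observationally_equivalent(g1, g2):
    """Same skeleton and same v-structures."""
    return skeleton(g1) == skeleton(g2) and v_structures(g1) == v_structures(g2)

# Chains A -> B -> C and A <- B <- C are equivalent (same skeleton, no colliders);
# the collider A -> B <- C is not, because it adds a v-structure.
chain     = {"A": [], "B": ["A"], "C": ["B"]}
rev_chain = {"A": ["B"], "B": ["C"], "C": []}
collider  = {"A": [], "B": ["A", "C"], "C": []}
print(observationally_equivalent(chain, rev_chain))  # True
print(observationally_equivalent(chain, collider))   # False
```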

Observational Equivalence: Examples (I) In the sprinkler network, we cannot tell whether X1 → X2 or vice versa. But we can tell that X2 → X4 and X4 → X5. General note: in machine learning, you cannot always tell which hypothesis is correct even if you have all possible data; you then need further assumptions or other kinds of data.

Observational Equivalence: Examples (II) Vancouver Sun, March 29: "Adolescents … are more likely to turn to violence in their early twenties if they watch more than an hour of television a day… The team tracked more than 700 children and took into account the "chicken and egg" question: Does watching television cause aggression, or do people prone to aggression watch more television?" [Science, Dr. Johnson, Columbia U.]

Two Models of Aggressive Behaviour [Figure: two causal graphs over the variables Disposition to aggression, TV watching, and Violent behaviour, with different edge orientations.] Are these two graphs observationally distinguishable?

Minimal Graphs A graph G is minimal for a probability distribution P iff G is compatible with P, and no proper subgraph of G is compatible with P. Example: [Figure: a graph over the variables A, B, C, D] – not minimal if A ⊥ {B, C, D}.

Note on minimality Intuitively, minimality requires that you add an edge between A and B only if there is some dependence between A and B. In statistical tests, dependence is observable but independence is not. So minimality amounts to “assume independence until dependence is observed”. That is exactly the strategy for minimizing mind changes! (“assume reaction is impossible until observed”).

Stable Distributions A distribution P is stable iff there is a graph G such that (X ⊥ Y|Z) holds in P iff X and Y are d-separated given Z in G. Intuitively, stability rules out "exact counterbalance": two forces both having a causal effect but cancelling each other out exactly in every circumstance.

Inferring Causal Structure: The IC Algorithm Assume a stable probability distribution P. Find a minimal graph for P with as many edges directed as possible. General idea: First find variables that are “directly causally related”. Connect those. Add arrows as far as possible.

Inferring Causal Structure: The IC Algorithm 1. For each pair of variables X and Y, look for a "screening off" set S(X,Y) s.t. X ⊥ Y | S(X,Y) holds. If there is no such set, add an undirected edge between X and Y. 2. For each non-adjacent pair X, Y with a common neighbour Z, check whether Z is part of the screening-off set S(X,Y). If not, make Z a common consequence of X and Y (a collider X → Z ← Y). 3. Orient as many of the remaining edges as possible without creating cycles or new v-structures.
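
A hedged sketch of steps 1 and 2, assuming a conditional-independence oracle `indep` (in practice a statistical test); the brute-force search over candidate screening-off sets and all names are illustrative, not the official implementation.

```python
from itertools import combinations

def ic_skeleton_and_colliders(variables, indep):
    """Steps 1 and 2 of the IC algorithm, given an independence oracle.

    indep(x, y, s) should report whether X ⊥ Y | S holds in the (stable)
    distribution.  Step 1 builds the undirected skeleton and records a
    screening-off set for every non-adjacent pair; step 2 orients colliders.
    Brute force over candidate sets, so exponential in the worst case.
    """
    variables = list(variables)
    edges, sepset = set(), {}

    # Step 1: connect X - Y unless some set of other variables screens them off.
    for x, y in combinations(variables, 2):
        others = [v for v in variables if v not in (x, y)]
        found = None
        for size in range(len(others) + 1):
            for s in combinations(others, size):
                if indep(x, y, set(s)):
                    found = set(s)
                    break
            if found is not None:
                break
        if found is None:
            edges.add(frozenset((x, y)))
        else:
            sepset[frozenset((x, y))] = found

    # Step 2: for non-adjacent X, Y with a common neighbour Z, orient X -> Z <- Y
    # whenever Z does not belong to the recorded screening-off set S(X, Y).
    arrows = set()
    for x, y in combinations(variables, 2):
        if frozenset((x, y)) in edges:
            continue
        for z in variables:
            if (frozenset((x, z)) in edges and frozenset((y, z)) in edges
                    and z not in sepset.get(frozenset((x, y)), set())):
                arrows.add((x, z))
                arrows.add((y, z))
    return edges, arrows, sepset
```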

Rules for Orientation Given a → b and b – c, add b → c if a, c are not linked (no new collider). Given a → c → b and a – b, add a → b (no cycle). Given a – c → d and c → d → b and a – b, add a → b if c, b are not linked (no cycle + no new collider). Given a – c → b and a – d → b and a – b, add a → b if c, d are not linked (no cycle + no new collider).
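
The sketch below applies these four rules repeatedly to a partially directed graph; the side conditions follow Pearl's (2000) statement of the rules, and the representation (undirected edges as frozensets, arrows as ordered pairs) is an assumption for illustration, not a validated implementation.

```python
def apply_orientation_rules(nodes, undirected, directed):
    """Apply the four orientation rules until no more edges can be directed.

    nodes: iterable of variable names.
    undirected: set of frozenset({a, b}) edges; directed: set of (a, b) arrows a -> b.
    Returns updated (undirected, directed) sets.  A sketch: conflicts and
    bidirected edges (as in the full IC*/PC machinery) are not handled.
    """
    nodes = list(nodes)
    undirected, directed = set(undirected), set(directed)

    def adjacent(a, b):
        return (frozenset((a, b)) in undirected
                or (a, b) in directed or (b, a) in directed)

    changed = True
    while changed:
        changed = False
        for e in list(undirected):
            p, q = tuple(e)
            for x, y in ((p, q), (q, p)):          # try orienting x -> y
                # Rule 1: a -> x and x - y with a, y not linked  =>  x -> y.
                r1 = any((a, x) in directed and not adjacent(a, y)
                         for a in nodes if a not in (x, y))
                # Rule 2: x -> c -> y and x - y  =>  x -> y (avoid a cycle).
                r2 = any((x, c) in directed and (c, y) in directed
                         for c in nodes if c not in (x, y))
                # Rule 3: x - c -> y, x - d -> y, x - y, c and d not linked  =>  x -> y.
                r3 = any(frozenset((x, c)) in undirected and (c, y) in directed
                         and frozenset((x, d)) in undirected and (d, y) in directed
                         and not adjacent(c, d)
                         for c in nodes for d in nodes if len({x, y, c, d}) == 4)
                # Rule 4: x - c -> d, c -> d -> y, x - y, c and y not linked  =>  x -> y.
                r4 = any(frozenset((x, c)) in undirected and (c, d) in directed
                         and (d, y) in directed and not adjacent(c, y)
                         for c in nodes for d in nodes if len({x, y, c, d}) == 4)
                if r1 or r2 or r3 or r4:
                    undirected.discard(e)
                    directed.add((x, y))
                    changed = True
                    break
    return undirected, directed
```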