Structure Learning Using Causation Rules. Raanan Yehezkel, PAML Lab Journal Club, March 13, 2003.

Main References
Pearl, J., Verma, T., "A Theory of Inferred Causation," Proceedings of the Second International Conference on Principles of Knowledge Representation and Reasoning, San Francisco, 1991.
Spirtes, P., Glymour, C., Scheines, R., Causation, Prediction, and Search, second edition, MIT Press, 2000.

Simpson’s “Paradox” (taken from Judea Pearl's web site)
The sure-thing principle (Savage, 1954): Let a and b be two alternative acts of any sort, and let G be any event. If you would definitely prefer b to a, either knowing that the event G obtained or knowing that the event G did not obtain, then you definitely prefer b to a.

Simpson’s “Paradox” (taken from Judea Pearl's web site)
New treatment is preferred for the male group (G). New treatment is preferred for the female group (G'). => New treatment is preferred.

Local success rate:
              G = male patients     G' = female patients
  Old         5%  (50/1000)         50% (5000/10000)
  New         10% (1000/10000)      92% (95/100)

Global success rate (all patients):
  Old         46% (5050/11000)
  New         11% (1095/10100)
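The reversal can be checked directly from the table. A minimal sketch in Python (not from the original slides); the per-group rates and group sizes are taken from the table above:

```python
# Sanity check of the Simpson's-paradox table: the new treatment has the
# higher success rate within each group, yet the old treatment has the
# higher success rate pooled over all patients.
rates = {"old": {"male": 0.05, "female": 0.50},
         "new": {"male": 0.10, "female": 0.92}}
sizes = {"old": {"male": 1_000, "female": 10_000},
         "new": {"male": 10_000, "female": 100}}

for group in ("male", "female"):
    print(group, "- new better than old?",
          rates["new"][group] > rates["old"][group])          # True, True

for treatment in ("old", "new"):
    successes = sum(rates[treatment][g] * sizes[treatment][g]
                    for g in ("male", "female"))
    patients = sum(sizes[treatment].values())
    print("all patients,", treatment, f"{successes / patients:.0%}")  # 46%, 11%
```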

Simpson’s “Paradox”: the intuitive way of thinking
[Figure: DAG with G → S ← T; G and T are not connected.]
P(S,G,T) = P(G) · P(T) · P(S|G,T)
P(S=1 | T=new) = 0.51
P(S=1 | T=old) = 0.27

Simpson’s “Paradox”: the faithful DAG
[Figure: DAG with G → T, G → S, and T → S.]
P(S,G,T) = P(G) · P(T|G) · P(S|G,T)
P(S=1 | T=new) = 0.11
P(S=1 | T=old) = 0.46
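The two sets of figures can be reproduced (approximately) by marginalizing over gender under each factorization. A rough sketch; the prior P(G = male) = 0.5 is my assumption, chosen because it roughly reproduces the slide's numbers, and P(T|G) and P(S|G,T) are read off the earlier table:

```python
# P(S=1 | T) under the two factorizations shown above.
# Assumptions: P(G=male) = P(G=female) = 0.5; P(T|G) and P(S|G,T) follow the
# earlier table.  Results come out close to the slide's 0.51/0.27 and 0.11/0.46.
p_s = {("male", "old"): 0.05, ("female", "old"): 0.50,
       ("male", "new"): 0.10, ("female", "new"): 0.92}
p_t_given_g = {("old", "male"): 1000 / 11000,   ("new", "male"): 10000 / 11000,
               ("old", "female"): 10000 / 10100, ("new", "female"): 100 / 10100}
p_g = {"male": 0.5, "female": 0.5}   # assumed prior over gender

def p_s1_given_t(treatment, faithful):
    """P(S=1 | T=treatment) under the intuitive (T independent of G) or the
    faithful (T depends on G) factorization."""
    if not faithful:
        # Intuitive model: average the per-group rates with weights P(G).
        return sum(p_g[g] * p_s[(g, treatment)] for g in p_g)
    # Faithful model: weight each group by P(G | T) proportional to P(T|G)·P(G).
    w = {g: p_t_given_g[(treatment, g)] * p_g[g] for g in p_g}
    z = sum(w.values())
    return sum(w[g] / z * p_s[(g, treatment)] for g in p_g)

for faithful in (False, True):
    label = "faithful" if faithful else "intuitive"
    print(label, {t: round(p_s1_given_t(t, faithful), 2) for t in ("old", "new")})
```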

Assumptions:
The model is a directed acyclic graph (a Bayesian network).
All variables are observable.
There are no errors in the conditional independence (CI) test results.

Identifying cause and effect relations, from:
Statistical data alone.
Statistical data and temporal information.

Identifying cause and effect relations:
Potential cause.
Genuine cause.
Spurious association.

Intransitive Triplet
I(C1, C2), ~I(C1, E), ~I(C2, E)
[Figures: three diagrams over C1, C2, and E; two also include latent variables H1 and H2.]

Potential Cause
X has a potential causal influence on Y if there exist a variable Z and a context S_context such that:
1. X and Y are dependent in every context.
2. ~I(Z, Y | S_context)
3. I(X, Z | S_context)
[Figure: diagram over X, Y, Z.]

Genuine Cause
X has a genuine causal influence on Y if there exist a variable Z and a context S_context such that:
1. Z is a potential cause of X.
2. ~I(Z, Y | S_context)   (given the context S)
3. I(Z, Y | X, S_context)   (given X and the context S)
[Figure: Z is a potential cause of X, and X points to Y.]

Spurious Association
X and Y are spuriously associated if there exist variables Z1, Z2 and a context S_context such that:
1. ~I(X, Y | S_context)
2. ~I(Z1, X | S_context)
3. ~I(Z2, Y | S_context)
4. I(Z1, Y | S_context)
5. I(Z2, X | S_context)
[Figures: one diagram over Z1, X, Y (from conditions 1, 2, 4) and one over Z2, X, Y (from conditions 1, 3, 5).]
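The three definitions above can be written down as predicates over a conditional independence oracle. A minimal sketch, assuming a black-box function ci(a, b, s) that returns True when I(a, b | s) holds and a collection `contexts` of candidate conditioning sets; all names and signatures here are illustrative, not taken from the slides:

```python
from itertools import product

def potential_cause(x, y, variables, ci, contexts):
    """X is a potential cause of Y: X and Y are dependent in every context,
    and some Z and context S satisfy ~I(Z,Y|S) and I(X,Z|S)."""
    if any(ci(x, y, s) for s in contexts):
        return False                      # X and Y must be dependent in every context
    return any(not ci(z, y, s) and ci(x, z, s)
               for z, s in product(variables, contexts)
               if z not in (x, y))

def genuine_cause(x, y, variables, ci, contexts):
    """X is a genuine cause of Y: some potential cause Z of X satisfies
    ~I(Z,Y|S) and I(Z,Y|X,S) for some context S."""
    return any(potential_cause(z, x, variables, ci, contexts)
               and not ci(z, y, s)
               and ci(z, y, s | {x})
               for z, s in product(variables, contexts)
               if z not in (x, y))

def spurious_association(x, y, variables, ci, contexts):
    """X and Y are spuriously associated: conditions 1-5 of the slide hold
    for some Z1, Z2 and context S."""
    return any(not ci(x, y, s) and
               not ci(z1, x, s) and not ci(z2, y, s) and
               ci(z1, y, s) and ci(z2, x, s)
               for z1, z2, s in product(variables, variables, contexts)
               if len({x, y, z1, z2}) == 4)
```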

Genuine Cause with temporal information
X has a genuine causal influence on Y if there exist a variable Z and a context S_context such that:
1. Z and S_context precede X.
2. ~I(Z, Y | S_context)   (given the context S)
3. I(Z, Y | X, S_context)   (given X and the context S)
[Figures over Z, X, Y illustrating conditions 2 and 3.]

Spurious Association with temporal information
X and Y are spuriously associated if there exist a variable Z and a context S_context such that:
1. ~I(X, Y | S_context)
2. X precedes Y.
3. I(Z, Y | S_context)
4. ~I(Z, X | S_context)
[Figures over Z, X, Y illustrating conditions 1-2 and conditions 1, 3, 4.]

Algorithms
Inductive Causation (IC).
PC.
Others.

Inductive Causation (IC) (Pearl and Verma, 1991)
1. For each pair (X, Y), search for a set of nodes S_XY such that I(X, Y | S_XY). If no such set is found, place an undirected link between X and Y.
2. For each pair of non-adjacent nodes (X, Y) with a common neighbor C, if C is not in S_XY, then add arrowheads at C: X → C ← Y.

Inductive Causation (IC) (Pearl and Verma, 1991), continued
3. Recursively:
   (a) If X - Y and there is a strictly directed path from X to Y, then add an arrowhead at Y.
   (b) If X and Y are not adjacent, X → C, and C - Y, then direct the link C → Y.
4. Mark a uni-directed link X → Y if there is some link with an arrowhead at X.
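As a rough illustration of steps 1 and 2, here is a sketch under the same CI-oracle assumption as before; it is not the authors' implementation, and the data structures are my own choice:

```python
# IC steps 1-2: build the skeleton by searching for separating sets, then
# orient colliders X -> C <- Y at common neighbours not in S_XY.
from itertools import combinations

def ic_skeleton_and_colliders(variables, ci):
    adj = {v: set() for v in variables}
    sepset = {}
    # Step 1: link X-Y only if no subset of the remaining variables separates them.
    for x, y in combinations(variables, 2):
        rest = [v for v in variables if v not in (x, y)]
        s_xy = next((set(s) for n in range(len(rest) + 1)
                     for s in combinations(rest, n) if ci(x, y, set(s))), None)
        if s_xy is None:
            adj[x].add(y); adj[y].add(x)
        else:
            sepset[frozenset((x, y))] = s_xy
    # Step 2: orient colliders at common neighbours of non-adjacent pairs.
    arrows = set()                      # directed edges as (tail, head) pairs
    for x, y in combinations(variables, 2):
        if y in adj[x]:
            continue
        for c in adj[x] & adj[y]:
            if c not in sepset.get(frozenset((x, y)), set()):
                arrows.add((x, c)); arrows.add((y, c))
    # Steps 3-4 (the recursive orientation rules and link marking) would follow.
    return adj, arrows, sepset
```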

Example (IC): the true graph over X1, ..., X5.
[Figure: the true DAG.]

Example (IC), step 1: for each pair (X, Y), search for a set S_XY such that I(X, Y | S_XY); if no such set is found, place an undirected link between X and Y.
[Figure: the resulting undirected skeleton over X1, ..., X5.]

Example (IC), step 2: for each pair of non-adjacent nodes (X, Y) with a common neighbor C, if C is not in S_XY, then add arrowheads at C: X → C ← Y.
[Figure: colliders oriented over X1, ..., X5.]

Example (IC), step 3: recursively, (a) if X - Y and there is a strictly directed path from X to Y, add an arrowhead at Y; (b) if X and Y are not adjacent, X → C, and C - Y, direct the link C → Y.
[Figure: additional links oriented over X1, ..., X5.]

Example (IC), step 4: mark a uni-directed link X → Y if there is some link with an arrowhead at X.
[Figure: marked links over X1, ..., X5.]

PC (Spirtes and Glymour)
1. Form the complete undirected graph C on the vertex set V.

PC (Spirtes and Glymour), continued
2. n = 0.
3. Repeat:
     Repeat:
       Select an ordered pair (X, Y) such that |Adj(C, X) \ {Y}| ≥ n, and a subset S such that S ⊆ Adj(C, X) \ {Y} and |S| = n.
       If I(X, Y | S) is true, then delete edge (X, Y) (and record S_XY = S).
     Until all such pairs and subsets have been tested.
     n = n + 1.
   Until for all X, Y: |Adj(C, X) \ {Y}| < n.

PC (Spirtes and Glymour), continued
4. For each triple of vertices (X, Y, Z) such that edge (X, Z) and edge (Y, Z) exist, orient X → Z ← Y if and only if Z ∉ S_XY.
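Steps 1-4 fit in a short routine. A sketch under the same assumptions as the IC sketch above (ci is an assumed CI oracle; names and data structures are illustrative):

```python
# PC steps 1-4: start from the complete graph, remove edges using conditioning
# sets of growing size n drawn from current adjacencies, then orient colliders.
from itertools import combinations

def pc(variables, ci):
    # Step 1: complete undirected graph.
    adj = {v: set(variables) - {v} for v in variables}
    sepset = {}
    # Steps 2-3: level-wise edge removal.
    n = 0
    while any(len(adj[x] - {y}) >= n for x in variables for y in adj[x]):
        for x in variables:
            for y in list(adj[x]):
                if len(adj[x] - {y}) < n:
                    continue
                for s in combinations(adj[x] - {y}, n):
                    if ci(x, y, set(s)):
                        adj[x].discard(y); adj[y].discard(x)
                        sepset[frozenset((x, y))] = set(s)
                        break
        n += 1
    # Step 4: orient colliders X -> Z <- Y when Z is not in S_XY.
    arrows = set()
    for x, y in combinations(variables, 2):
        if y in adj[x]:
            continue
        for z in adj[x] & adj[y]:
            if z not in sepset.get(frozenset((x, y)), set()):
                arrows.add((x, z)); arrows.add((y, z))
    return adj, arrows, sepset
```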

Then apply the remaining orientation rules of Inductive Causation (IC) (Pearl and Verma, 1991):
Recursively:
(a) If X - Y and there is a strictly directed path from X to Y, then add an arrowhead at Y.
(b) If X and Y are not adjacent, X → C, and C - Y, then direct the link C → Y.
Mark a uni-directed link X → Y if there is some link with an arrowhead at X.

Example (PC) (Spirtes, Glymour and Scheines): the true graph over X1, ..., X5.
[Figure: the true DAG.]

Example (PC), step 1: form the complete undirected graph C on the vertex set V.
[Figure: the complete undirected graph over X1, ..., X5.]

Example (PC), n = 0 (|S_XY| = 0). Independencies found: none.
[Figure: the graph is unchanged.]

Example (PC), n = 1 (|S_XY| = 1). Independencies found: I(X1, X3 | X2), I(X1, X4 | X2), I(X1, X5 | X2), I(X3, X4 | X2).
[Figure: the graph after removing the corresponding edges.]

Example (PC), n = 2 (|S_XY| = 2). Independencies found: I(X2, X5 | X3, X4).
[Figure: the graph after removing the edge X2 - X5.]

Example (PC), step 4: for each triple of vertices (X, Y, Z) such that edge (X, Z) and edge (Y, Z) exist, orient X → Z ← Y if and only if Z ∉ S_XY. D-separation sets: S_3,4 = {X2}, S_1,3 = {X2}.
[Figure: colliders oriented.]

Possible PC improvements (2)
PC*: tests conditional independence between X and Y given a subset S, where S ⊆ [Adj(X) ∪ Adj(Y)] ∩ path(X, Y), i.e. S is drawn only from neighbors of X or Y that lie on paths between X and Y.
CI test prioritization: for a given variable X, first test those variables Y that are least dependent on X, conditional on those subsets of variables that are most dependent on X.

Markov Equivalence (Verma and Pearl, 1990)
Two causal models are equivalent if and only if their DAGs have the same links and the same set of uncoupled head-to-head nodes (colliders).
Example: the collider X → Z ← Y, with P = P(X)·P(Y)·P(Z|X,Y), is not equivalent to the fork X ← Z → Y or the chain Y → Z → X, both of which give P = P(Z)·P(X|Z)·P(Y|Z) = P(Y)·P(X|Z)·P(Z|Y).
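The criterion on this slide (same links, same colliders) is easy to check mechanically. A minimal sketch, with DAGs represented simply as sets of (parent, child) edges; the representation and function names are my own:

```python
from itertools import combinations

def skeleton(dag):
    """Undirected links of a DAG given as a set of (parent, child) edges."""
    return {frozenset(edge) for edge in dag}

def v_structures(dag):
    """Uncoupled head-to-head nodes: pairs of non-adjacent parents of a common child."""
    parents = {}
    for p, c in dag:
        parents.setdefault(c, set()).add(p)
    skel = skeleton(dag)
    return {(frozenset((a, b)), child)
            for child, ps in parents.items()
            for a, b in combinations(sorted(ps), 2)
            if frozenset((a, b)) not in skel}

def markov_equivalent(dag1, dag2):
    return skeleton(dag1) == skeleton(dag2) and v_structures(dag1) == v_structures(dag2)

collider = {("X", "Z"), ("Y", "Z")}   # X -> Z <- Y
fork     = {("Z", "X"), ("Z", "Y")}   # X <- Z -> Y
chain    = {("Y", "Z"), ("Z", "X")}   # Y -> Z -> X
print(markov_equivalent(collider, fork))   # False: different colliders
print(markov_equivalent(fork, chain))      # True: same links, no colliders
```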

Summary
Algorithms such as PC and IC produce a partially directed graph, which represents a family of Markov-equivalent graphs. The remaining undirected arcs can be oriented arbitrarily (under DAG restrictions) in order to construct a classifier.
The main flaw of the IC and PC algorithms is that they may be unstable in a noisy environment: an error in one CI test for an arc may lead to errors in other arcs, and one erroneous orientation may lead to other erroneous orientations.