Causal learning and modeling David Danks CMU Philosophy & Psychology 2014 NASSLLI.

High-level overview  Monday:  History of causal inference  Basic representation of causal structures  Tuesday:  Inference & reasoning using graphical models  Interventions in causal structures

High-level overview  Wednesday:  Basic principles of search & causal discovery  Thursday:  Challenges to causal discovery, and responses Both principled and real-world

High-level overview  Friday: One of two possibilities:  Singular / actual causation & counterfactuals (in the causal graphical model framework)  Recent advances in causal learning & inference  Decided by a vote at end-of-class tomorrow (Tues)

Structure & assumptions  Mix of lecture & (group) problem-solving, so if you have questions or uncertainty, ask!  If you’re confused, then someone else probably is too…  Assuming basic knowledge of probability  Focus is on conceptual/foundational issues, not the technical details  But ask if you want to know more about those details!

A BRIEF HISTORY OF CAUSAL DISCOVERY

“Big Picture” (very roughly)  Greeks: Unhelpful platitudes  …: Practical successes  …–present: Computers + Formal models = principled methods

Aristotle (384–322 BC)  Trying to answer: “Why does X have A?”  Four types of ‘cause’  Formal: Because of its structure  Material: Because of its composition  Efficient: Because of its development  Final: Because of its purpose  But no systematic theory of inference

Francis Bacon (1561–1626)  Novum Organum (1620)  For any phenomenon, construct: The table of presence (tabula praesentiae) The table of absence (tabula absentiae) The table of degrees (tabula graduum)  The cause of the phenomenon is the set of properties that explains every case on each of the three tables

John Stuart Mill (1806–1873)  System of Logic (1843)  Algorithmic form of Bacon’s method (though unattributed) Method of agreement Method of difference Method of concomitant variation

David Hume (1711–1776)  Causal inference cannot be done using deduction  It is always logically possible that future “causes” will not be followed by the effect  Actually a general argument about induction  But we do it by “custom or habit”  Had an evolutionary justification, but no framework in which to express it

Responses to Hume’s skepticism  Hume’s arguments were quite influential in philosophical circles  And still matter in present-day philosophy  But in the sciences, people were starting to find methods that (sometimes) gave answers that at least seemed right…

Regression (Least Squares)  18th-c. astronomy: find the “best” values for 6 unknowns given 75 observations  Euler (1748) Failed due to computational intractability  Legendre (1805) Developed the method of least squares  Gauss (1795 / 1809) Independent (earlier, unpublished) discovery & justification  Still the most common causal inference method…

Growth of statistics  Early theory of statistics emerges from probability theory throughout the 1800s: Laplace, Quetelet, Galton, Pearson, Spearman, Yule

Ronald A. Fisher (1890–1962)  Essentially the father of modern statistics, and developed:  An array of statistical tests  An analysis of various experimental designs  The standard statistical and methodological reference texts for a generation of scientists

Sewall Wright (1889–1988)  Path analysis  Graphs encode high-level structure, and then regression can be used to estimate parameters  By mid-20th c., it had been adopted by a number of economists and sociologists  But no search procedures were provided Have to know the high-level structure

Causal graphical models  Developed by statisticians, computer scientists, and philosophers  Dawid, Spiegelhalter, Wermuth, Cox, Lauritzen, Pearl, Spirtes, Glymour, Scheines  Represent both qualitative and quantitative aspects of causation

REPRESENTING CAUSAL STRUCTURES

Qualitative representation  We want a representation that captures many qualitative features of causality  Causation occurs among variables ⇒ One node per variable [Graph nodes: Exercise, Food Eaten, Metabolism, Weight]

Qualitative representation  We want a representation that captures many qualitative features of causality  Asymmetry of causation ⇒ Need an asymmetric connection in the graph [Graph: directed edges among Exercise, Food Eaten, Metabolism, Weight]

Qualitative representation  We want a representation that captures many qualitative features of causality  No (immediate) reciprocal causation ⇒ No cycles (without explicit temporal indexing) [Graph over Exercise, Food Eaten, Metabolism, Weight]

Qualitative representation  We want a representation that captures many qualitative features of causality  No (immediate) reciprocal causation ⇒ No cycles (without explicit temporal indexing) [Graphs over Exercise, Food Eaten, Metabolism, Weight at time t and at time t+1]

Directed Acyclic Graphs  More precisely: DAG G = ⟨V, E⟩  V = set of nodes (for variables)  E = set of edges (i.e., ordered pairs of nodes)  Path π = sequence of adjacent edges  Directed path = path with all edges in the same direction  Acyclicity: No directed path from node A to itself  In general: We use genealogical & topological language to describe graphical relationships
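These definitions are easy to sketch in code. The following is a minimal illustration (not from the slides): a directed graph as a node-to-children mapping, with acyclicity checked by depth-first search; the example graph uses the lecture's running Exercise/Food Eaten/Metabolism/Weight variables.

```python
def is_acyclic(graph):
    """Return True iff the directed graph (node -> list of children) has no directed cycle."""
    WHITE, GRAY, BLACK = 0, 1, 2          # unvisited / on current path / done
    color = {v: WHITE for v in graph}

    def visit(v):
        color[v] = GRAY
        for child in graph.get(v, []):
            if color.get(child, WHITE) == GRAY:     # back edge: directed cycle
                return False
            if color.get(child, WHITE) == WHITE and not visit(child):
                return False
        color[v] = BLACK
        return True

    return all(visit(v) for v in graph if color[v] == WHITE)

# The running example: Exercise -> {Food Eaten, Metabolism} -> Weight
causal_graph = {
    "Exercise":   ["Food Eaten", "Metabolism"],
    "Food Eaten": ["Weight"],
    "Metabolism": ["Weight"],
    "Weight":     [],
}
print(is_acyclic(causal_graph))              # True: a DAG
print(is_acyclic({"A": ["B"], "B": ["A"]}))  # False: A -> B -> A is a directed cycle
```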

Quantitative representation  DAGs alone can represent “A causes B”… but not the “strength” or “form” of causation  Need to represent the relationships between the various variables’ states  Exact quantitative representation will depend on the type of variables being represented

Bayesian networks  All variables are discrete/categorical  Represent quantitative causation using a joint probability distribution  I.e., a specification of the probability of any combination of variable values, such as: P(E=Hi & FE=Lo & M=Hi & W=Hi) = 0.001; P(E=Hi & FE=Lo & M=Hi & W=Lo) = 0.03; etc.  Note: Nothing inherently Bayesian about Bayes nets!
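To make "a specification of the probability of any combination of variable values" concrete, here is a small sketch with made-up numbers (only the form, not the values, comes from the slide): a full joint table over two binary variables, from which any marginal or conditional probability can be computed by summation.

```python
# Full joint distribution over Exercise (E) and Weight (W); numbers are
# hypothetical, chosen only so the table sums to 1.
joint = {
    ("Hi", "Hi"): 0.30, ("Hi", "Lo"): 0.30,
    ("Lo", "Hi"): 0.10, ("Lo", "Lo"): 0.30,
}
assert abs(sum(joint.values()) - 1.0) < 1e-9   # a proper distribution

def p(event):
    """Probability of a partial assignment, e.g. {'E': 'Hi'}: sum the matching cells."""
    total = 0.0
    for (e, w), prob in joint.items():
        if event.get("E", e) == e and event.get("W", w) == w:
            total += prob
    return total

print(p({"E": "Hi"}))                              # 0.6
print(p({"W": "Hi"}))                              # 0.4
print(p({"E": "Hi", "W": "Hi"}) / p({"E": "Hi"}))  # P(W=Hi | E=Hi) = 0.5
```

The same scheme extends to the four-variable example, at the cost of a table that grows exponentially in the number of variables — which is exactly the problem the Markov factorization below addresses.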

Structural Equation Models (SEMs)  All variables are continuous/real-valued  Represent quantitative causation using systems of linear equations  For example: Exercise = a1·FE + a2·M + a3·W + ε_E FE = b1·E + b2·M + b3·W + ε_FE etc.
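A linear SEM also tells you how to generate data. The sketch below (coefficients, noise scales, and sample size are all made up) draws samples from a DAG-structured SEM over the running example in causal order, then recovers the coefficient a1 by the least-squares slope cov(E, FE)/var(E).

```python
import random

random.seed(0)
a1, b1, c1, c2 = 0.5, 0.8, -0.4, -0.6   # hypothetical causal strengths

data = []
for _ in range(10000):
    e  = random.gauss(0, 1)                     # Exercise: exogenous noise only
    fe = a1 * e + random.gauss(0, 1)            # Food Eaten = a1*E + noise
    m  = b1 * e + random.gauss(0, 1)            # Metabolism = b1*E + noise
    w  = c1 * fe + c2 * m + random.gauss(0, 1)  # Weight = c1*FE + c2*M + noise
    data.append((e, fe, m, w))

n   = len(data)
me  = sum(d[0] for d in data) / n
mfe = sum(d[1] for d in data) / n
cov = sum((d[0] - me) * (d[1] - mfe) for d in data) / n
var = sum((d[0] - me) ** 2 for d in data) / n
print(round(cov / var, 1))   # ≈ 0.5: the regression slope recovers a1
```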

Connecting the pieces  DAG-based graphical model (qualitative)  ???  P(X) = P(X1) P(X2 | X1) P(X3 | X1) P(X4 | X1, X2) (quantitative)

Connecting the pieces  Causal Markov assumption:  Variables are independent of their non-effects conditional on their direct causes Use the qualitative graph to constrain the quantitative relationships  Encodes the intuition of “screening off” Given the values of the direct causes, learning the value of a non-effect doesn’t help me predict the variable’s value

Connecting the pieces  Markov assumption for Bayes nets ⇒ Markov factorization of P(X1, X2, …): P(X1, …, Xn) = ∏i P(Xi | parents(Xi))  Example: [Graph: Exercise → Food Eaten, Exercise → Metabolism, Food Eaten → Weight, Metabolism → Weight] P(E, FE, W, M) = P(E) · P(FE | E) · P(M | E) · P(W | M, FE)
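The example factorization can be implemented directly: specify one conditional probability table per variable given its parents, and multiply. The CPT numbers below are made up for illustration; only the factorization P(E, FE, M, W) = P(E)·P(FE | E)·P(M | E)·P(W | FE, M) comes from the slide.

```python
import itertools

p_e  = {"Hi": 0.4, "Lo": 0.6}                                        # P(E)
p_fe = {"Hi": {"Hi": 0.7, "Lo": 0.3}, "Lo": {"Hi": 0.5, "Lo": 0.5}}  # P(FE | E)
p_m  = {"Hi": {"Hi": 0.8, "Lo": 0.2}, "Lo": {"Hi": 0.3, "Lo": 0.7}}  # P(M | E)
p_w  = {                                                             # P(W | FE, M)
    ("Hi", "Hi"): {"Hi": 0.2, "Lo": 0.8},
    ("Hi", "Lo"): {"Hi": 0.6, "Lo": 0.4},
    ("Lo", "Hi"): {"Hi": 0.1, "Lo": 0.9},
    ("Lo", "Lo"): {"Hi": 0.5, "Lo": 0.5},
}

def joint(e, fe, m, w):
    """The Markov factorization: product of each variable's CPT entry."""
    return p_e[e] * p_fe[e][fe] * p_m[e][m] * p_w[(fe, m)][w]

total = sum(joint(e, fe, m, w)
            for e, fe, m, w in itertools.product(["Hi", "Lo"], repeat=4))
print(round(total, 10))   # 1.0: the factorization defines a proper joint distribution
```

Note the economy: four small tables (1 + 2 + 2 + 4 free parameters) replace the 15 free parameters of the full 16-cell joint table.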

Connecting the pieces  Markov assumption for SEMs ⇒ Markov factorization of the joint probability density: f(X1, …, Xn) = ∏i f(Xi | parents(Xi))  Example: [Graph: Exercise → Food Eaten, Exercise → Metabolism, Food Eaten → Weight, Metabolism → Weight] E = ε_E FE = a1·E + ε_FE M = b1·E + ε_M W = c1·FE + c2·M + ε_W

Connecting the pieces  Causal Faithfulness assumption  The only independencies are those predicted by the Markov assumption Uses the quantitative relations to constrain the qualitative graph Implication: No exactly counter-balancing causal paths Exercise → Food Eaten → Weight and Exercise → Metabolism → Weight do not exactly offset one another Implication: No perfectly deterministic relationships In particular, no variable is a mathematical function of others
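The "no exactly counter-balancing paths" implication can be demonstrated numerically. In the linear SEM of the running example, the path Exercise → Food Eaten → Weight contributes c1·a1 to cov(E, W) and Exercise → Metabolism → Weight contributes c2·b1; the sketch below (all coefficients made up) compares a faithful choice of coefficients with one where c1·a1 + c2·b1 = 0, so the two paths exactly cancel and E and W look independent despite being causally connected.

```python
import random

def sample_cov_E_W(c2, n=20000, a1=1.0, b1=1.0, c1=1.0):
    """Sample covariance of Exercise and Weight in the linear SEM."""
    es, ws = [], []
    for _ in range(n):
        e  = random.gauss(0, 1)
        fe = a1 * e + random.gauss(0, 1)
        m  = b1 * e + random.gauss(0, 1)
        ws.append(c1 * fe + c2 * m + random.gauss(0, 1))
        es.append(e)
    me, mw = sum(es) / n, sum(ws) / n
    return sum((e - me) * (w - mw) for e, w in zip(es, ws)) / n

random.seed(1)
print(round(sample_cov_E_W(c2=-0.5), 1))  # ≈ 0.5: faithful; dependence is visible
print(round(sample_cov_E_W(c2=-1.0), 1))  # ≈ 0.0: paths cancel; E, W look independent
```

Faithfulness rules out the second case: a search procedure that sees cov(E, W) ≈ 0 would (wrongly, here) conclude there is no causal connection.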

Causal vs. statistical models  Bayes nets and SEMs are not inherently causal models  Markov and Faithfulness assumptions can be expressed purely as graph-quant. constraints  Assuming a non-causal version of the assumptions ⇒ purely statistical model  I.e., a compact representation of statistical independencies among some set of variables

Causation and intervention  Causal claims support counterfactuals  In particular, those about interventions “If I had flipped the switch, the light would have turned on” “If she hadn’t dropped the plate, then it would not have broken” Etc.

Causation and intervention  One of the central causal asymmetries  Interventions on a cause lead to changes in the effect Flipping the switch turns off the light  In contrast, interventions on an effect do not lead to changes in the cause Breaking the light bulb doesn’t flip the switch  Some have argued that this is the paradigmatic feature of causation (Woodward, Hausman)
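The intervention asymmetry can be seen in a tiny simulated Switch → Light model (all probabilities made up): an intervention replaces a variable's own causal mechanism with the forced value, so intervening on the cause changes the effect, while intervening on the effect leaves the cause at its base rate.

```python
import random

random.seed(2)

def sample(do_switch=None, do_light=None):
    """One draw from the Switch -> Light model, with optional interventions."""
    switch = do_switch if do_switch is not None else (random.random() < 0.5)
    if do_light is not None:
        light = do_light               # do(Light): mechanism from Switch is severed
    else:
        light = switch if random.random() < 0.95 else (not switch)  # noisy wiring
    return switch, light

N = 20000
light_on  = sum(l for _, l in (sample(do_switch=True) for _ in range(N))) / N
switch_on = sum(s for s, _ in (sample(do_light=True) for _ in range(N))) / N
print(round(light_on, 2))   # ≈ 0.95: flipping the switch changes the light
print(round(switch_on, 2))  # ≈ 0.50: forcing the light on leaves the switch at base rate
```

Contrast this with *observing* Light = on, which would raise the probability that the switch is on; that observational/interventional contrast is the topic of tomorrow's session.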

Looking ahead…  Have: Basic formal representation for causation  Need:  Fundamental causal asymmetry (of intervention)  Inference & reasoning methods  Search & causal discovery methods
