The bumpy road of the search for a (good) cause


The bumpy road of the search for a (good) cause. Isabelle Guyon, Dominik Janzing, Bernhard Schölkopf

We know it’s important … we use it all the time! What affects … your health? … climate changes? … the economy? … and which actions will have beneficial effects? We constantly face questions about cause-effect relationships: what affects our health, the economy, and climate change, and which actions will have beneficial effects.

But we can’t even define it! Many definitions: Science, Philosophy, Law, Psychology, History, Religion, Engineering. There is, however, no definition of causality that encompasses all the notions it refers to in these fields. One of my favorite definitions comes from Hindu philosophy: “Cause is the effect concealed, effect is the cause revealed.” It conveys well that there is no effect without a cause, and vice versa.

Systemic causality: In engineering, there is a pretty well formalized notion of causality. [Figure: the causal system and the external agent.] The agent could learn!

Difficulties: variability, confounding factors, sample bias, attrition bias. In a perfect world in which we could observe and control everything, a single experiment would suffice to determine a cause-effect relationship. But the world is not perfect: there is a lot of variability we cannot control.

To deal with variability… … we need experimental design

Success stories 1. Vitamin C and scurvy, Lind 1750’s: A historical RCT. 2. Hygiene and infectious diseases, Semmelweis 1840’s; Pasteur 1860’s: Can you believe what you can’t see? 3. Planned experiments in agriculture, Fisher 1930’s: Mathematical foundations of experimental design. 4. Smoking and lung cancer: A long-lasting debate, but better err on the safe side! 5. NSAIDs, Aspirin, Phenacetin, Paracetamol, Vioxx: Drug efficacy vs. drug toxicity.

More difficulties. A lot of “observational” data: statistical dependencies. Correlation ≠ Causality! Experiments are often needed, but they can be costly, unethical, or infeasible.
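
As a hedged illustration of why a statistical dependency need not be causal, and why a randomized experiment helps, the sketch below simulates a toy model that is an assumption of this example and not taken from the slides: a hidden variable Z drives both X and Y, while X has no effect on Y. The helper names `pearson` and `simulate` are hypothetical.

```python
import random


def pearson(xs, ys):
    """Sample Pearson correlation of two equally long sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5


def simulate(n, randomize, seed=0):
    """Toy model (assumed): hidden Z causes both X and Y; X never affects Y.

    randomize=False: observational regime, X depends on Z.
    randomize=True: experimental regime, X is assigned independently of Z.
    """
    rng = random.Random(seed)
    xs, ys = [], []
    for _ in range(n):
        z = rng.gauss(0.0, 1.0)
        x = rng.gauss(0.0, 1.0) if randomize else z + rng.gauss(0.0, 1.0)
        y = z + rng.gauss(0.0, 1.0)
        xs.append(x)
        ys.append(y)
    return xs, ys


if __name__ == "__main__":
    print("observational corr:", round(pearson(*simulate(5000, False)), 2))
    print("randomized corr:   ", round(pearson(*simulate(5000, True)), 2))
```

In this toy model the observational correlation is clearly positive even though X does nothing to Y, and it drops to roughly zero once X is assigned at random.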

Learning from observational data Can we do it?

Not your usual ML problem! Non-i.i.d. data: the training set comes from the “natural” distribution, the test set from the “manipulated” distribution. No cross-validation for model selection.
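
A minimal sketch of this train/test mismatch, under an assumed toy chain C -> T -> E (cause, target, effect) that is not from the slides: a predictor of T built on the effect E looks best on natural data, but breaks once an external agent sets E, whereas a predictor built on the cause C is unaffected. All function names are hypothetical.

```python
import random


def sample(n, manipulated, seed):
    """Toy chain (assumed): cause C -> target T -> effect E.

    In the manipulated regime an external agent sets E at random,
    which cuts the link from T to E.
    """
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        c = rng.gauss(0.0, 1.0)
        t = c + rng.gauss(0.0, 1.0)
        if manipulated:
            e = rng.gauss(0.0, 1.0)
        else:
            e = t + 0.5 * rng.gauss(0.0, 1.0)
        rows.append((c, t, e))
    return rows


def fit_slope(xs, ys):
    """Least-squares slope of y on x without intercept (data are zero-mean)."""
    return sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)


def mse(slope, xs, ys):
    """Mean squared error of the one-feature linear predictor slope * x."""
    return sum((y - slope * x) ** 2 for x, y in zip(xs, ys)) / len(xs)


def evaluate(n=5000):
    """Train on natural data, then score on natural and manipulated test data."""
    c, t, e = zip(*sample(n, manipulated=False, seed=1))
    slope_cause, slope_effect = fit_slope(c, t), fit_slope(e, t)
    results = {}
    for name, manipulated, seed in (("natural", False, 2), ("manipulated", True, 3)):
        c, t, e = zip(*sample(n, manipulated, seed))
        results[name] = {
            "cause": mse(slope_cause, c, t),
            "effect": mse(slope_effect, e, t),
        }
    return results


if __name__ == "__main__":
    for regime, scores in evaluate().items():
        print(regime, {k: round(v, 2) for k, v in scores.items()})
```

Cross-validation on the natural data would pick the effect-based predictor here, which is one way to read the slide's warning about model selection.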

The good old DAG. [Figure: example DAG with nodes Lung Cancer, Smoking, Genetics, Coughing, Attention Disorder, Allergy, Anxiety, Peer Pressure, Yellow Fingers, Car Accident, Born an Even Day, Fatigue.] Wright, 1921; Haavelmo, 1943; Dawid, Spiegelhalter, Lauritzen, Speed, Cox, Wermuth, Pearl, Spirtes, Glymour, Scheines, Cooper, Neapolitan, Koller, Friedman.

Beware of the DAG! Unsuited for: feedback loops and equilibria; symmetric relationships; constrained systems. Intrinsic limitations: Markov equivalences; imperfect data (measurement errors, quantization, aggregation). Assumptions: causal sufficiency; causal Markov assumption; causal faithfulness; linearity & Gaussianity.
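
To make the Markov-equivalence limitation concrete, here is a hedged sketch on assumed linear-Gaussian toy data (not from the slides). The chain X -> Y -> Z and its reversal X <- Y <- Z produce the same pattern (X and Z dependent, but independent given Y), so independence tests alone cannot orient them; the collider X -> Y <- Z produces the opposite pattern. Partial correlation stands in for a conditional independence test, which is only justified under the linearity and Gaussianity assumptions listed above. Function names are hypothetical.

```python
import random


def pearson(xs, ys):
    """Sample Pearson correlation of two equally long sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5


def partial_corr(xs, zs, ys):
    """Correlation of X and Z after linearly removing Y."""
    rxz, rxy, ryz = pearson(xs, zs), pearson(xs, ys), pearson(ys, zs)
    return (rxz - rxy * ryz) / ((1 - rxy ** 2) * (1 - ryz ** 2)) ** 0.5


def generate(structure, n=5000, seed=0):
    """Draw (X, Y, Z) samples from one of three assumed toy structures."""
    rng = random.Random(seed)
    xs, ys, zs = [], [], []
    for _ in range(n):
        if structure == "chain":  # X -> Y -> Z
            x = rng.gauss(0.0, 1.0)
            y = x + rng.gauss(0.0, 1.0)
            z = y + rng.gauss(0.0, 1.0)
        elif structure == "reversed":  # X <- Y <- Z
            z = rng.gauss(0.0, 1.0)
            y = z + rng.gauss(0.0, 1.0)
            x = y + rng.gauss(0.0, 1.0)
        elif structure == "collider":  # X -> Y <- Z
            x = rng.gauss(0.0, 1.0)
            z = rng.gauss(0.0, 1.0)
            y = x + z + rng.gauss(0.0, 1.0)
        else:
            raise ValueError(f"unknown structure: {structure}")
        xs.append(x)
        ys.append(y)
        zs.append(z)
    return xs, ys, zs


def signature(structure):
    """Return (corr(X, Z), partial corr(X, Z | Y)) for a structure."""
    xs, ys, zs = generate(structure)
    return pearson(xs, zs), partial_corr(xs, zs, ys)


if __name__ == "__main__":
    for name in ("chain", "reversed", "collider"):
        marginal, conditional = signature(name)
        print(f"{name:9s} corr={marginal:+.2f} partial={conditional:+.2f}")
```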

Success stories 1. Genetic epidemiology: Towards personalized medicine. 2. Mendelian randomization: Resolving reverse causation & confounding. 3. System biology: Reverse engineering the cell. 4. Social sciences: Assisting policy-making.
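
As a rough illustration of how Mendelian randomization can sidestep confounding, the sketch below uses a simulated toy model that is entirely an assumption of this example: a binary genetic variant G shifts an exposure X, an unobserved confounder U affects both X and the outcome Y, and G influences Y only through X. The naive regression slope of Y on X is biased by U, while the ratio cov(G, Y) / cov(G, X) (often called the Wald ratio) recovers the assumed effect. Function names and parameter values are hypothetical.

```python
import random


def cov(xs, ys):
    """Sample covariance of two equally long sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / n


def simulate(n=20000, true_effect=0.5, seed=0):
    """Toy model (assumed): G -> X -> Y, with hidden U -> X and U -> Y."""
    rng = random.Random(seed)
    g, x, y = [], [], []
    for _ in range(n):
        gi = 1.0 if rng.random() < 0.5 else 0.0  # variant, independent of U
        u = rng.gauss(0.0, 1.0)                  # unobserved confounder
        xi = gi + u + rng.gauss(0.0, 1.0)        # exposure
        yi = true_effect * xi + u + rng.gauss(0.0, 1.0)  # outcome
        g.append(gi)
        x.append(xi)
        y.append(yi)
    return g, x, y


def naive_estimate(x, y):
    """Regression slope of Y on X; biased when U is ignored."""
    return cov(x, y) / cov(x, x)


def wald_ratio(g, x, y):
    """Instrumental-variable estimate of the effect of X on Y using G."""
    return cov(g, y) / cov(g, x)


if __name__ == "__main__":
    g, x, y = simulate()
    print("naive slope:", round(naive_estimate(x, y), 2))
    print("Wald ratio: ", round(wald_ratio(g, x, y), 2))
```

The estimate is only as good as the assumptions: if the variant affected the outcome through another path, or were itself associated with the confounder, the ratio would be biased too.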

Causality and time. Everyday notion of causality: the causes precede their effects. Is that always true? Delayed measurements; final cause (objective); reverse causation. Other difficulties: non-i.i.d. samples (redundant; correlation misleading); confounding is still a problem; seasonality; censored data.

CO2 and temperature. Over the last 650,000 years, CO2 has correlated with temperature, but … CO2 lags behind temperature by several hundred years.
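
A lead or lag between two series of this kind is typically read off a lagged cross-correlation. The sketch below shows the computation on synthetic data only (an assumed autoregressive series x and a copy y delayed by three steps plus noise); it uses no climate data, and the function names are hypothetical. As the slide hints, finding which series trails the other is descriptive and does not by itself settle the causal question.

```python
import random


def pearson(xs, ys):
    """Sample Pearson correlation of two equally long sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5


def lagged_corr(x, y, lag):
    """Correlation between x(t - lag) and y(t); a positive lag means y trails x."""
    if lag >= 0:
        return pearson(x[: len(x) - lag], y[lag:])
    return pearson(x[-lag:], y[: len(y) + lag])


def best_lag(x, y, max_lag=10):
    """Lag in [-max_lag, max_lag] with the highest lagged correlation."""
    return max(range(-max_lag, max_lag + 1), key=lambda k: lagged_corr(x, y, k))


def make_series(n=2000, delay=3, seed=0):
    """Synthetic pair (assumed): x is AR(1); y is x delayed by `delay` plus noise."""
    rng = random.Random(seed)
    x = [0.0]
    for _ in range(n - 1):
        x.append(0.8 * x[-1] + rng.gauss(0.0, 1.0))
    y = []
    for t in range(n):
        signal = x[t - delay] if t >= delay else 0.0
        y.append(signal + 0.5 * rng.gauss(0.0, 1.0))
    return x, y


if __name__ == "__main__":
    x, y = make_series()
    print("estimated lag of y behind x:", best_lag(x, y))
```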

Climate changes … Meehl et al. (2004). "Combinations of Natural and Anthropogenic Forcings in Twentieth-Century Climate". Journal of Climate 17: 3721-3727.

Other time series Japan

Success story: Granger causality (Nobel prize, 2003). Co-integration: elimination of spurious correlations in non-stationary time series (x(t) and y(t) non-stationary, but a x(t) + b y(t) stationary). Granger causality: x(t) → y(t) if f(past x, past y) predicts y(t) better than f(past y). Co-integrated time series are in a Granger causality relationship. [Does not eliminate confounding.]
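
The Granger criterion above can be sketched in a few lines. This is a simplified illustration under stated assumptions, not a full test: f is taken to be a linear model using only one past step, the data are synthetic (x drives y with a one-step delay, y does not drive x), and the score is the relative reduction in residual sum of squares, whereas a proper analysis would use a significance test and more lags. Function names are hypothetical.

```python
import random


def rss_one(target, a):
    """Residual sum of squares of target regressed on a (no intercept)."""
    beta = sum(t * u for t, u in zip(target, a)) / sum(u * u for u in a)
    return sum((t - beta * u) ** 2 for t, u in zip(target, a))


def rss_two(target, a, b):
    """Residual sum of squares of target regressed on a and b (no intercept)."""
    saa = sum(u * u for u in a)
    sbb = sum(v * v for v in b)
    sab = sum(u * v for u, v in zip(a, b))
    sat = sum(u * t for u, t in zip(a, target))
    sbt = sum(v * t for v, t in zip(b, target))
    det = saa * sbb - sab * sab
    beta_a = (sat * sbb - sab * sbt) / det  # 2x2 normal equations, Cramer's rule
    beta_b = (saa * sbt - sab * sat) / det
    return sum((t - beta_a * u - beta_b * v) ** 2 for t, u, v in zip(target, a, b))


def granger_gain(cause, effect):
    """Relative drop in prediction error of effect(t) when past cause is added."""
    target, past_effect, past_cause = effect[1:], effect[:-1], cause[:-1]
    restricted = rss_one(target, past_effect)           # f(past y)
    full = rss_two(target, past_effect, past_cause)     # f(past x, past y)
    return 1.0 - full / restricted


def make_series(n=3000, seed=0):
    """Synthetic pair (assumed): x(t-1) influences y(t); y never influences x."""
    rng = random.Random(seed)
    x, y = [0.0], [0.0]
    for _ in range(n - 1):
        x_next = 0.5 * x[-1] + rng.gauss(0.0, 1.0)
        y_next = 0.5 * y[-1] + 0.8 * x[-1] + rng.gauss(0.0, 1.0)
        x.append(x_next)
        y.append(y_next)
    return x, y


if __name__ == "__main__":
    x, y = make_series()
    print("gain x -> y:", round(granger_gain(x, y), 3))
    print("gain y -> x:", round(granger_gain(y, x), 3))
```

On this synthetic pair the gain is large in the x → y direction and close to zero in the other. Consistent with the bracketed caveat on the slide, a hidden common driver of both series could produce the same asymmetry.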

Open problems: final objective optimization; assessment methods; common assumptions; robust models; tradeoff efficiency/efficacy; data representation; data imperfections; heterogeneous information; mix observations and experiments; quantify uncertainty.

Advertisement

The causality workbench What is the causal question? Why should we care? What is hard about it? Is this solvable? Is this a good benchmark? http://clopinet.com/causality

Causation and Prediction challenge Challenge datasets Toy datasets

Pot-Luck challenge
Task                Participants (views)
CYTO                2 (609)
LOCANET             10 (1372)
PROMO               3 (862)
SIGNET              2 (918)
TIED                1 (551)
CauseEffectPairs    5 (580)
Stemmatology        0 (372)
Type (per-task assignment not recoverable): real, artif, self eval.
CYTO: Causal Protein-Signaling Networks in human T cells. Learn a protein signaling network from multicolor flow cytometry data. N=11 proteins, P~800 samples per experimental condition. E=9 conditions.
LOCANET: LOcal CAusal NETwork. Find the local causal structure around a given target variable (depth 3 network) in REGED, CINA, SIDO, MARTI.
PROMO: Simulated marketing task. Time series of 1000 promotion variables and 100 product sales. Predict a 1000x100 boolean influence matrix, indicating for each (i,j) element whether the ith promotion has a causal influence on the sales of the jth product. Data is provided as time series, with a daily value for each variable for three years.
SIGNET: Abscisic Acid Signaling Network. Determine the set of 43 boolean rules that describe the interactions of the nodes within a plant signaling network. 300 separate Boolean pseudodynamic simulations of the true rules. Model inspired by a true biological system.
TIED: Target Information Equivalent Dataset. Illustrates a case in which there are many equivalent Markov boundaries. Find them all.
CAUSEEFFECTPAIRS: Find the causal direction in eight pairs of variables.
STEMMATOLOGY: Reconstruct a family tree of documents derived from one another.

Other donated datasets
Task       Views
WebLogs    272
MIDS       232
NOISE      247
SECOM      297
SEFTI      280
Type (per-task assignment not recoverable): real, artif, self eval.
WebLogs: Recover the links from page to page from the number of daily hits. The network consists of 20 pages. The training data includes 512 days.
MIDS: Mixed Dynamic Systems. Simulated time-series data of 9 variables based on linear Gaussian models with no latent common causes, but with multiple dynamic processes.
NOISE: Real and simulated EEG data. The goal is to find which region of the brain influences which other region.
SECOM: Semiconductor manufacturing. Find the causes of failure in ~60 features corresponding to measurements in a fab line (classification problem).
SEFTI: Semiconductor manufacturing. Here the problem is a regression problem: find the tools that are guilty of performance degradation.
http://clopinet.com/causality

Lessons learnt. Causation and prediction challenge: Knowing the true causal relationships yields better models. Regular feature selection is hard to beat in practice. Cause-effect pairs task of the pot-luck challenge [problem posed by Mooij, Janzing, Schölkopf]: The winners identified 8/8 correct causal directions. Methodology: We need to stage our effort: address sub-problems (like ranking causes); mix observations and experiments.

Proceedings 1) JMLR W&CP Volume 3: Causation and Prediction Challenge (WCCI 2008) I. Guyon, C. Aliferis, G. Cooper, A. Elisseeff, J.-P. Pellet, P. Spirtes, A. Statnikov, Eds. http://jmlr.csail.mit.edu/proceedings/papers/v3/ 2) JMLR W&CP Volume 6 (in press): Objective and Assessment Workshop (NIPS 2008) I. Guyon, D. Janzing, B. Schölkopf, Eds. http://jmlr.csail.mit.edu/proceedings/papers/v6/

Coming soon … Virtual Laboratory. Workshops: NIPS09: Causality and time series analysis mini-symposium (http://clopinet.com/isabelle/Projects/NIPS2009/); WCCI09: Active learning. Challenges: End 2009/2010: Active learning. End 2010/2011: Experimental Design in Causal Modeling (ExpDeCo). 2012: Causal Model for System Identification and Control (CoMSIco).

Conclusion. Causal discovery from observational data is not an impossible task, but a very hard one. This points to the need for further research and benchmarks: combining experiments and observations; exploring both efficiency and efficacy; connecting with related disciplines (RL, control). Don’t miss the upcoming events!