
Causal Modeling of FMRI Data
Joe Ramsey, with enormous help from Clark Glymour as well as Russell Poldrack, Stephen and Catherine Hanson, Stephen Smith, Erich Kummerfeld, Ruben Sanchez, and others.

Goals
1. From imaging data, to extract as much information as we can, as accurately as we can, about which brain regions influence which others in the course of psychological tasks.
2. To generalize over tasks.
3. To specialize over groups of people.

What Are the Brain Variables?
In current studies, anywhere from 20,000+ voxels down to a few ROIs (ROI = Region of Interest).
Question: How sensitive are causal inferences to brain variable selection?

How are ROIs constructed (FSL)?
- Define an experimental variable (box function).
- Use a generalized linear model to determine which voxels “light up” in correlation with the experimental variable.
- Add a group-level step if voxels lighting up for the group are desired.
- Cluster the resulting voxels into connected clusters.
  - Small clusters are eliminated.
  - Remaining clusters become the ROIs.
  - Symmetry constraints may be imposed.
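As a toy illustration of the pipeline just listed (my own sketch, not FSL code: the 1-D "brain", the effect sizes, and the thresholds are invented for the example), the voxel-level GLM and clustering steps might look like:

```python
import numpy as np
from scipy import ndimage

rng = np.random.default_rng(0)

# Box-car experimental variable: 20 time points on / 20 off, repeated 5 times.
box = np.tile(np.r_[np.ones(20), np.zeros(20)], 5)
T = box.size

# Toy 1-D "brain" of 100 voxels; voxels 30-39 and 60-64 respond to the task.
n_vox = 100
data = rng.normal(0.0, 1.0, size=(T, n_vox))
data[:, 30:40] += 1.5 * box[:, None]
data[:, 60:65] += 1.5 * box[:, None]

# GLM: regress each voxel on [1, box]; keep the t-statistic of the box regressor.
X = np.column_stack([np.ones(T), box])
beta, res, _, _ = np.linalg.lstsq(X, data, rcond=None)
dof = T - 2
sigma2 = res / dof                          # residual variance per voxel
se = np.sqrt(sigma2 * np.linalg.inv(X.T @ X)[1, 1])
t = beta[1] / se

# Threshold, cluster connected voxels, and eliminate small clusters.
active = t > 3.0
labels, n_clusters = ndimage.label(active)
rois = [np.where(labels == k)[0] for k in range(1, n_clusters + 1)
        if (labels == k).sum() >= 3]        # remaining clusters become the ROIs
print(len(rois))
```

In practice FSL works on 3-D volumes, convolves the box function with a hemodynamic response, and adds the group-level step; none of that is modeled in this sketch.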

Simple point about latents…
Note that if variables are all picked out this way and the model is entirely cyclic, there can be no latent variables! (Figure: I is correlated with L.)
Probably not all cycles though…

Search Complexity: How Big is the Set of Possible Explanations?
(Figure: even for two variables X and Y there are several candidate graphs; for N variables, the number of possible graphical models grows super-exponentially.)
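The slide's figure did not survive extraction, but the point it makes can be illustrated with a count. A sketch (my own, under the assumption that the model class is labeled DAGs; the slide may have counted a richer class allowing cycles or latents, which is larger still) using Robinson's recursion:

```python
from functools import lru_cache
from math import comb

# Robinson's recursion for the number of labeled DAGs on n nodes:
#   a(n) = sum_{k=1}^{n} (-1)^(k+1) * C(n, k) * 2^(k*(n-k)) * a(n-k),   a(0) = 1
@lru_cache(maxsize=None)
def n_dags(n: int) -> int:
    if n == 0:
        return 1
    return sum((-1) ** (k + 1) * comb(n, k) * 2 ** (k * (n - k)) * n_dags(n - k)
               for k in range(1, n + 1))

for n in range(1, 7):
    print(n, n_dags(n))   # 1, 3, 25, 543, 29281, 3781503, ...
```

Already at 10 variables the count exceeds 10^18, which is why exhaustive search over explanations is hopeless and greedy or constraint-based search is needed.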

Statistical Complexity
- Graphical models are untestable unless parameterized into statistical models.
- Incomplete models of associations are likely to fail tests.
- Multiple testing problems.
- Multiple subjects / missing ROIs.
- No fast scoring method for mixed ancestral graphs that model feedback and latent common causes.
- Weak time-lag information.

Measurement Complexity
- Sampling rate is slower than causal interaction speed.
- Indirect measurement creates spurious associations among measured variables.
(Figure: neural variables N1 → N2 → N3, measured variables X1, X2, X3; regression of X3 on X1 and X2.)
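The second point can be demonstrated with a small simulation (my own construction, not from the talk; the coefficients and noise scales are arbitrary). N1 affects N3 only through N2, so conditioning on a perfect measurement of N2 would zero out the X1 coefficient; conditioning on the noisy measurement X2 does not:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200_000

# Neural chain N1 -> N2 -> N3; each X_i is an indirect (noisy) measurement of N_i.
N1 = rng.normal(size=n)
N2 = 0.8 * N1 + rng.normal(size=n)
N3 = 0.8 * N2 + rng.normal(size=n)
X1 = N1 + rng.normal(size=n)        # measurement noise on each channel
X2 = N2 + rng.normal(size=n)
X3 = N3 + rng.normal(size=n)

# Regress X3 on X1 and X2: the X1 coefficient is a spurious direct "effect"
# (population value ~0.14 for these parameters, not 0).
A = np.column_stack([np.ones(n), X1, X2])
beta = np.linalg.lstsq(A, X3, rcond=None)[0]
print(beta[1], beta[2])
```

So a search run on the measured X variables alone will tend to add an X1 to X3 edge that has no neural-level counterpart.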

Specification Strategies
1. Guess a model and test it.
2. Search the model space or some restriction of it.
   a. Search for the full parameterized structure.
   b. Search for graphical structure alone.
   c. Search for graphical features (e.g., adjacencies).

What Evidence of What Works, and Not?
- Theory: limiting correctness of algorithms (PC, FCI, GES, LiNGAM, etc.), under usually incorrect assumptions for fMRI.
- Prior knowledge: do automated search results conform with established relationships?
- Animal experiments (limited).
- Simulation studies.

Brief Review: Smith’s Simulation Study
- 5 to 50 variables.
- 28 simulation conditions, 50 subjects/condition.
- 38 search methods.
- Search one subject at a time.

Methods Tested by Smith
DCM and SEM excluded; no search. (Not completely true.)
- Full correlation in various frequency bands
- Partial correlation
- Lasso (ICOV)
- Mutual information, partial MI
- Granger causality
- Coherence
- Generalized synchronization
- Patel’s conditional dependence measures: P(x|y) vs. P(y|x)
- Bayes net methods: CCD, CPC, FCI, PC, GES
- LiNGAM

Smith’s Results
- Adjacencies: partial correlation methods (GLASSO) and several “Bayes net” methods from CMU get ~90% correct in most simulations.
- Edge directions: “None of the methods is very accurate, with Patel's τ performing best at estimating directionality, reaching nearly 65% d-accuracy, all other methods being close to chance.” (Smith, p. 883)
  - And most of the adjacencies for Patel’s τ are false.

Simulation conditions (see handout)…

Simulation 2 (10 variables, 11 edges)

Simulation 4 (50 variables, 61 edges)

Simulation 7: 250 minutes, 5 variables

Simulation 8: Shared Inputs

Simulation 14: 5-Cycle

Simulation 15: Stronger Connections

Simulation 16: More Connections

Simulation 22: Nonstationary Connection Strengths

Simulation 24: One Strong External Input


Take-Away Conclusion? Nothing works!
- Methods that get adjacencies (90%) cannot get directions of influence.
- Methods that get directions (60%-70%) for normal session lengths cannot tell true adjacencies from false adjacencies.
- Even with unrealistically long sessions (4 hours), the best method gets 90% accuracy for directions but finds very few adjacencies.

Idea…
If we could:
- increase sample size (effectively) by using data from multiple subjects,
- focus on a method with strong adjacencies, and
- combine this with a method with strong orientations,
we may be able to do better (Ramsey, Hanson, and Glymour, NeuroImage).
This is the strategy of the PC-LiNGAM algorithm of Hoyer and several of us, though there are other ways to pursue the same strategy.

Reminder: if noises are non-Gaussian, we can learn more than a pattern.
(1) Linear models, covariance data → pattern/CPDAG.
(2) Linear models, non-Gaussian noises (LiNG) → directed graph.

Are noises for fMRI models non-Gaussian? Yes.
This is controversial, but it shouldn’t be.
- For the word/pseudoword data of Xue and Poldrack (Task 3), kurtosis ranges up to 39.3 for residuals.
- There is a view in the literature that noises are distributed (empirically) as Gamma, say with shape 19 and scale …

Are connection functions linear for fMRI data?
You tell me: I’ve not done a thorough survey of studies.

Coefficients?
- One expects them to be positive.
  - Empirically, in linear models of fMRI data, there are very few negative coefficients (1 in 200, say).
  - They’re only slightly negative if so.
  - This is consistent with negative coefficients occurring due to small-sample regression estimation errors.
- For the most part, they need to be less than 1.
  - Brain activations are cyclic and evolve over time.
  - Empirically, in linear models of fMRI, most coefficients are less than 1.
  - To the extent that they’re greater than 1, one suspects nonlinearity.

The IMaGES Algorithm
- An adaptation of GES (a Bayes net method tested by Smith et al.) to multiple subjects.
- Iterative model construction using Bayesian scores computed separately on each subject at each step; the edge with the best average score is added.
- Tolerates ROIs missing in various subjects.
- Seeks feed-forward structure only.
- Finds adjacencies between variables with latent common causes.
- Forces sparsity via a penalized BIC score to avoid triangulated variables (see Measurement Complexity).
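The score-averaging idea can be sketched as follows. This is a minimal illustration, not the actual Tetrad/IMaGES implementation: the function names, the per-node BIC form, and the `penalty` parameter are my own, and the real algorithm embeds this score inside GES's edge-by-edge greedy search rather than scoring whole graphs.

```python
import numpy as np

def subject_bic(data: np.ndarray, graph: dict, penalty: float = 2.0) -> float:
    """BIC-style score of a DAG on one subject's data (higher is better).

    graph maps each variable index to a list of parent indices; penalty > 1
    strengthens the complexity term, forcing sparser (less triangulated) models.
    """
    n, _ = data.shape
    score = 0.0
    for v, parents in graph.items():
        y = data[:, v]
        X = np.column_stack([np.ones(n)] + [data[:, p] for p in parents])
        resid = y - X @ np.linalg.lstsq(X, y, rcond=None)[0]
        rss = resid @ resid
        k = len(parents) + 1
        score += -n * np.log(rss / n) - penalty * k * np.log(n)
    return score

def averaged_score(datasets, graph, penalty=2.0):
    # IMaGES-style idea: score the SAME graph on each subject separately and
    # average; subjects missing an ROI could simply be skipped for that node.
    return np.mean([subject_bic(d, graph, penalty) for d in datasets])

# Toy check: 5 subjects sharing the structure 0 -> 1.
rng = np.random.default_rng(2)
datasets = []
for _ in range(5):
    x0 = rng.normal(size=500)
    x1 = 0.7 * x0 + rng.normal(size=500)
    datasets.append(np.column_stack([x0, x1]))

true_g  = {0: [], 1: [0]}
empty_g = {0: [], 1: []}
print(averaged_score(datasets, true_g) > averaged_score(datasets, empty_g))
```

Averaging a per-subject score, rather than pooling the raw data, is what lets the method tolerate subjects with missing ROIs and differing noise levels.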

IMaGES/LOFS
- Smith (2011): “Future work might look to optimize the use of higher-order statistics specifically for the scenario of estimating directionality from fMRI data.”
- LiNGAM orients edges by the non-Normality of higher moments of the distributions of adjacent variables.
- LOFS uses the IMaGES adjacencies and the LiNGAM idea for directing edges (with a different score for non-Normality, and without independent components).
- Unlike IMaGES, LOFS can find cycles.
- LOFS (from our paper) is R1 and/or R2…

Procedure R1(S)
You don’t have to read these; I’ll describe them!

    G <- empty graph over the variables of S
    For each variable V:
        Find the combination C of adj(V, S) that maximizes NG(e_{V|C})
        For each W in C:
            Add W -> V to G
    Return G
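R1 can be sketched in a few lines. This is an illustrative reimplementation, not Tetrad's code: the names `ng`, `residual`, and `r1` are mine, adjacencies are assumed given (as LOFS takes them from IMaGES), and NG is the Anderson-Darling A² score the talk adopts.

```python
from itertools import chain, combinations

import numpy as np
from scipy.stats import anderson

def ng(x: np.ndarray) -> float:
    # Non-Gaussianity score: Anderson-Darling A^2 against the normal.
    return anderson(x, dist='norm').statistic

def residual(y: np.ndarray, preds: list) -> np.ndarray:
    # OLS residual of y on the predictors (intercept only if preds is empty).
    X = np.column_stack([np.ones(len(y))] + preds)
    return y - X @ np.linalg.lstsq(X, y, rcond=None)[0]

def r1(data: dict, adj: dict) -> dict:
    """Sketch of R1: for each variable V, pick the subset C of its adjacents
    that maximizes NG(e_{V|C}); every W in C becomes a parent of V."""
    parents = {}
    for v, neighbors in adj.items():
        subsets = chain.from_iterable(
            combinations(neighbors, r) for r in range(len(neighbors) + 1))
        best = max(subsets,
                   key=lambda c: ng(residual(data[v], [data[w] for w in c])))
        parents[v] = set(best)
    return parents

# Toy chain X -> Y -> Z with uniform (non-Gaussian) noises; adjacencies given.
rng = np.random.default_rng(3)
n = 20_000
X = rng.uniform(-1, 1, n)
Y = 0.7 * X + rng.uniform(-1, 1, n)
Z = 0.7 * Y + rng.uniform(-1, 1, n)
data = {'X': X, 'Y': Y, 'Z': Z}
adj = {'X': ['Y'], 'Y': ['X', 'Z'], 'Z': ['Y']}
result = r1(data, adj)
print(result)
```

The intuition: regressing a variable on its true parents leaves the pure (maximally non-Gaussian) noise term, while regressing on children or on nothing leaves a mixture of noises, which is closer to Gaussian.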

Procedure R2(S)

    G <- empty graph over the variables of S
    For each pair of variables X, Y:
        Scores <- empty
        For each combination of adjacents C for X and Y:
            If NG(e_{X|Y}) < NG(X) & NG(e_{Y|X}) > NG(Y):
                score <- NG(X) + NG(e_{Y|X})
                Add <X -> Y, score> to Scores
            If NG(e_{X|Y}) > NG(X) & NG(e_{Y|X}) < NG(Y):
                score <- NG(e_{X|Y}) + NG(Y)
                Add <Y -> X, score> to Scores
        If Scores is empty:
            Add X -- Y to G
        Else:
            Add to G the edge in Scores with the highest score
    Return G
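The pairwise comparisons R2 relies on can be checked numerically on a known X → Y pair (my own sketch; NG here is the Anderson-Darling A² score from the talk). In the true direction the residual keeps the pure non-Gaussian noise, so NG(e_{Y|X}) > NG(Y); in the false direction the residual is a mixture, so NG(e_{X|Y}) < NG(X):

```python
import numpy as np
from scipy.stats import anderson

def ng(x):
    # Anderson-Darling A^2 against the normal as the non-Gaussianity score NG.
    return anderson(x, dist='norm').statistic

def resid(y, x):
    # OLS residual of y regressed on x (with intercept).
    X = np.column_stack([np.ones(len(x)), x])
    return y - X @ np.linalg.lstsq(X, y, rcond=None)[0]

rng = np.random.default_rng(4)
n = 20_000
X = rng.uniform(-1, 1, n)            # non-Gaussian exogenous variable
Y = 0.7 * X + rng.uniform(-1, 1, n)  # X -> Y with non-Gaussian noise

# Both inequalities point at X -> Y, which R2 would score as NG(X) + NG(e_{Y|X}).
print(ng(resid(Y, X)) > ng(Y), ng(resid(X, Y)) < ng(X))
```

With Gaussian noises both residuals and both marginals would score near zero, and R2 would (correctly) leave the edge unoriented.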

Non-Gaussianity Scores
- Log cosh (used in ICA)
- Exp = -e^(-X^2/2) (used in ICA)
- Kurtosis (ICA; one of the first tried, not great)
- Mean absolute (PC LiNGAM)
- E(e^X), by cumulant arithmetic = e^(κ_1(X) + κ_2(X)/2! + κ_3(X)/3! + …)
- Anderson-Darling A^2 (LOFS): an Empirical Distribution Function (EDF) score with heavy weighting on the tails. We’re using this one!
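Three of the listed scores are easy to compare on data (my own sketch; the log-cosh contrast is the ICA negentropy approximation, with the Gaussian baseline E[G(ν)] estimated by Monte Carlo here rather than taken from tables, and all the implementation choices are mine). Each should rate a uniform sample as more non-Gaussian than a Gaussian sample:

```python
import numpy as np
from scipy.stats import anderson, kurtosis

rng = np.random.default_rng(5)
n = 50_000
gauss = rng.normal(size=n)
unif = rng.uniform(-1, 1, n)

def standardize(x):
    return (x - x.mean()) / x.std()

# Gaussian baseline E[log cosh(nu)], nu ~ N(0,1), estimated by Monte Carlo.
baseline = np.log(np.cosh(rng.normal(size=500_000))).mean()

def ng_logcosh(x):
    # Negentropy approximation J(x) ~ (E[G(x)] - E[G(nu)])^2 with G = log cosh.
    return (np.log(np.cosh(standardize(x))).mean() - baseline) ** 2

def ng_kurtosis(x):
    return abs(kurtosis(x))                      # |excess kurtosis|

def ng_ad(x):
    return anderson(x, dist='norm').statistic    # Anderson-Darling A^2 (LOFS)

for score in (ng_logcosh, ng_kurtosis, ng_ad):
    print(score.__name__, score(unif) > score(gauss))
```

The A² score's heavy tail weighting is what makes it sensitive to exactly the departures from normality that carry orientation information.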

Mixing Residuals
- We are assuming that residuals for ROIs from different subjects are drawn from the same population, so that they can be mixed.
- Sometimes we center residuals from different subjects before mixing, sometimes not.
- For the Smith study it doesn’t matter: the data are already centered!

Precision and Recall
- Precision = true positives / all positives: what fraction of the guys you found were correct?
- Recall = true positives / all true guys: what fraction of the correct guys did you find?
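Applied to search output, these are computed over sets of edges; a small sketch (the edge sets are invented for the example):

```python
def precision_recall(found: set, truth: set):
    """Precision and recall of a set of recovered edges against the true set."""
    tp = len(found & truth)
    precision = tp / len(found) if found else 0.0
    recall = tp / len(truth) if truth else 0.0
    return precision, recall

truth = {('A', 'B'), ('B', 'C'), ('C', 'D'), ('D', 'E')}
found = {('A', 'B'), ('B', 'C'), ('E', 'D')}   # (E, D) is reversed, so wrong
p, r = precision_recall(found, truth)
print(p, r)   # 0.666..., 0.5
```

Note that with directed edge tuples a reversed edge counts as a false positive, so adjacency precision/recall and orientation precision/recall must be computed separately, as in the Smith comparisons.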


LiNG, KPC
It has been suggested that LiNG (the cyclic version of LiNGAM) be tried on this type of data.
- LiNG typically does not come back with an answer, or if it does, it does not find a stable graph.
- Like LiNGAM, it cannot be scaled up to multiple subjects.
KPC?
- I’ve applied KPC to the first Smith simulation (5 variables, single subject).
- Results have been much less accurate than IMaGES/LOFS.
- On larger simulations it doesn’t come back.
- It cannot be scaled up to multiple subjects (?).

Some Further Problems
- Discovering nearly canceling 2-cycles is hard (but we will try anyway…).
- Identifying latents for acyclic models.
- Reliability of search may be worse with event designs than with block designs.
- Subjects that differ in causal structure will yield poor results for multi-subject methods.

S. M. Smith, K. L. Miller, G. Salimi-Khorshidi, M. Webster, C. F. Beckmann, T. E. Nichols, J. D. Ramsey, and M. W. Woolrich (2011), Network modelling methods for fMRI, NeuroImage.
J. D. Ramsey, S. J. Hanson, C. Hanson, Y. O. Halchenko, R. A. Poldrack, and C. Glymour (2010), Six problems for causal inference from fMRI, NeuroImage.
J. D. Ramsey, S. J. Hanson, and C. Glymour, Multi-subject search correctly identifies causal connections and most causal directions in the DCM models of the Smith et al. simulation study, NeuroImage.
G. Xue and R. A. Poldrack, The neural substrates of visual perceptual learning of words: implications for the visual word form area hypothesis, J. Cogn. Neurosci.

Thanks to the James S. McDonnell Foundation.
Thanks!