Describing High-Order Statistical Dependence Using “Concurrence Topology,” with Application to Functional MRI Brain Data

Steven P. Ellis, Columbia University, ellisst@nyspi.columbia.edu
Arno Klein, Sage Bionetworks

Abstract: We propose a new nonparametric method, “Concurrence Topology” (CT), for describing dependence among dichotomous variables. CT starts by translating the data into a “filtration,” i.e., a series of shapes. Holes in the filtration correspond to relatively weak or negative association among the variables. CT uses computational topology to describe the pattern of holes in the filtration. CT can describe high-order dependence while avoiding combinatorial explosion. We used CT to investigate brain functional connectivity based on dichotomized functional MRI data. The data set includes subjects diagnosed with ADHD and healthy controls. In an exploratory analysis, working in both the time and Fourier domains, CT found a number of differences between ADHD subjects and controls in the topology of their filtrations. This poster is based on the paper Ellis and Klein (2014).

CT AND NETWORK ANALYSIS: CT goes beyond network analysis in that Curto-Itskov filtrations often contain simplices that connect more than two variables (often 60 or more variables in our data). A figure like Figure 4 is impossible for a network.

HOLES: Holes in a filtration are like stairwells in a building. Holes come in different dimensions (0, 1, 2, …). Holes in a Curto-Itskov filtration indicate relatively weak or negative association among the variables. Holes of dimension d pertain to dependence of order at least d + 2.

Figure 4: Persistence plot for the same subject’s fMRI data in dimension 2.

PERSISTENCE: A stairwell might span several floors. Working down from the top floor, the floor where the stairwell begins is the floor of its “birth,” and the floor where it ends is the floor of its “death.”
In general, a hole in a filtration might “persist” through several frames. Birth minus death is the “lifespan” of the hole. The plot of death versus birth for the holes of a given dimension is a “persistence plot.” The lifespan of a hole is the vertical distance from the diagonal death = birth line to the point corresponding to the hole. Figure 2 shows the persistence plot in dimension 1 for the filtration shown in Figure 1.

DATA SET: We worked with publicly available resting-state fMRI data from 25 ADHD subjects and 41 healthy controls. For every subject, the data included “BOLD” values in 92 brain regions at 192 time points.

ANALYSIS STRATEGY: Dependence among regional BOLD series describes the “functional connectivity” of the brain. (Holes reflect brain function, not brain anatomy.) We performed CT analysis of the fMRI data for each subject separately. In the “time domain,” an observation is dichotomized BOLD in all regions at a single time point. In the “Fourier domain,” an observation is dichotomized power in all regions at a single Fourier frequency. For each subject, we performed CT analyses in both the time and Fourier domains in dimensions 0 to 5, and we used summary statistics of these analyses as subject-wise variables in conventional statistical analyses. Our main interest was in trying out CT, so our analyses are exploratory: significance at the 0.05 level was used only as a flag for identifying potentially interesting findings. We plan to test our findings in larger, independent data sets.

SOME FINDINGS BASED ON PERSISTENCE PLOTS: We observed differences between the ADHD and control groups in the Fourier domain in dimensions 1 and 2 (especially dimension 1) and in the time domain in dimensions 4 and 5 (especially dimension 4). For example, 64% of ADHD subjects had holes in dimension 4 in the time domain, compared with 93% of controls. (Holes in dimension 4 reflect dependence of order at least 6.)
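The ANALYSIS STRATEGY panel describes two ways of turning one subject’s regional BOLD series into dichotomous observations. The Python sketch below is a minimal, hypothetical illustration, not the authors’ R code. It assumes that each variable is dichotomized at its own median and that “power” means the periodogram (squared modulus of the discrete Fourier transform, mean removed, zero frequency dropped); the poster specifies neither choice, and all function names are hypothetical.

```python
"""Hypothetical sketch: binary observations from one subject's BOLD data.

ASSUMPTION: each variable is dichotomized at its own median, and "power" is
the periodogram. The poster does not state which threshold or spectral
estimate was used.
"""
import cmath
import statistics


def dichotomize_columns(rows):
    """Code each column 1 if the value exceeds that column's median, else 0.

    rows[i][j] is the value of variable j in observation i.
    """
    n_cols = len(rows[0])
    medians = [statistics.median(row[j] for row in rows) for j in range(n_cols)]
    return [[1 if row[j] > medians[j] else 0 for j in range(n_cols)] for row in rows]


def time_domain_observations(bold):
    """bold[t][r] is the BOLD value of region r at time point t.

    Each returned row is dichotomized BOLD in all regions at one time point.
    """
    return dichotomize_columns(bold)


def periodogram(series):
    """Squared DFT modulus (divided by n) at frequencies 1..n//2, mean removed."""
    n = len(series)
    mean = sum(series) / n
    powers = []
    for k in range(1, n // 2 + 1):
        coef = sum((x - mean) * cmath.exp(-2j * cmath.pi * k * t / n)
                   for t, x in enumerate(series))
        powers.append(abs(coef) ** 2 / n)
    return powers


def fourier_domain_observations(bold):
    """Each returned row is dichotomized power in all regions at one frequency."""
    n_regions = len(bold[0])
    # power[r][k]: power of region r at the k-th Fourier frequency
    power = [periodogram([row[r] for row in bold]) for r in range(n_regions)]
    # Transpose so that each row is one frequency across all regions
    by_freq = [list(col) for col in zip(*power)]
    return dichotomize_columns(by_freq)


if __name__ == "__main__":
    toy = [[1.0, 5.0], [2.0, 3.0], [3.0, 4.0], [4.0, 1.0]]  # 4 time points, 2 regions
    print(time_domain_observations(toy))     # [[0, 1], [0, 0], [1, 1], [1, 0]]
    print(fourier_domain_observations(toy))  # [[1, 0], [0, 1]]
```

With 192 time points, this construction gives 192 time-domain observations and 96 Fourier-domain observations per subject, each a 0/1 vector over the regions.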
LOCALIZATION: Holes involve all variables, but some variables are more directly involved than others. The variables in “short cycles” are the most directly implicated. Not all holes have short cycles, but most do, and short cycles make holes interpretable. In dimension d, a short cycle contains d + 2 variables (here, brain regions). The hole corresponding to the dot marked by an asterisk in Figure 3 (call it hole a) has a short cycle that appears in 13 subjects, which is nominally statistically significant. The 16 most common short cycles belonging to hole a distinguish ADHD subjects from controls: 76% of the ADHD subjects have at least one of the 16, but only 44% of controls do.

ORDER OF DEPENDENCE: A feature of a joint distribution that can be seen when looking at k variables at a time, but not when looking at fewer than k, reflects “kth-order dependence” among the variables. For example, a kth-order interaction in a log-linear model reflects kth-order dependence. “High-order” dependence means dependence of order 3 or higher.

PROBLEM: Describing high-order dependence among dozens or more variables “agnostically” (i.e., treating all variables the same a priori) often leads to a combinatorial explosion.

CONCURRENCE TOPOLOGY (CT): “Concurrence topology” is a new nonparametric method for describing dependence among dichotomous variables that solves this problem. (Other methods also solve the problem, but CT gives a very different view of the dependence structure. The initial inspiration for CT came from Curto and Itskov, 2008.)

FILTRATION: CT translates multivariate dichotomous data into a filtration, which is a series of shapes (“frames”), none larger than the preceding one. Analogy: a filtration is like a building, and the frames are like its floors. Figure 1 shows a low-dimensional filtration.

Figure 2: Persistence plot in dimension 1 for the filtration in Figure 1. The line is the death = birth line.
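The ORDER OF DEPENDENCE definition can be made concrete with a standard textbook example that is not taken from the poster’s data: let X1 and X2 be independent fair binary variables and let X3 = X1 XOR X2. Every pair of the three variables is independent, yet the triple is not, so this dependence is of third order and is invisible to any analysis that looks at only two variables at a time. The Python sketch below checks this with exact rational arithmetic; the helper names are hypothetical.

```python
"""Toy illustration of third-order dependence among binary variables.

X1 and X2 are independent fair bits and X3 = X1 XOR X2. Each pair is
independent, but the three variables together are not.
"""
from fractions import Fraction
from itertools import combinations, product

# Joint distribution as {(x1, x2, x3): probability}
joint = {}
for x1, x2 in product((0, 1), repeat=2):
    joint[(x1, x2, x1 ^ x2)] = Fraction(1, 4)


def marginal(dist, idx):
    """Marginal distribution of the variables at positions idx."""
    out = {}
    for outcome, p in dist.items():
        key = tuple(outcome[i] for i in idx)
        out[key] = out.get(key, Fraction(0)) + p
    return out


def independent(dist, idx):
    """True if the variables at positions idx are mutually independent."""
    joint_m = marginal(dist, idx)
    singles = [marginal(dist, (i,)) for i in idx]
    for values in product((0, 1), repeat=len(idx)):
        # Product of the one-variable marginal probabilities
        p_prod = Fraction(1)
        for m, v in zip(singles, values):
            p_prod *= m.get((v,), Fraction(0))
        if joint_m.get(values, Fraction(0)) != p_prod:
            return False
    return True


pairwise = all(independent(joint, pair) for pair in combinations(range(3), 2))
threeway = independent(joint, (0, 1, 2))
print(pairwise, threeway)  # True False
```

For instance, P(X1 = 0, X2 = 0, X3 = 1) = 0, whereas mutual independence would require 1/8; no pairwise margin reveals this.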
COMPUTATIONAL TOPOLOGY: Specialized computational topology software is needed to identify the holes in a Curto-Itskov filtration, along with their births and deaths. CT code that does this, written in R, is available from the authors.

LONG LIFESPANS: The longer the lifespan of a hole in a Curto-Itskov filtration, the more likely it is to be “real” rather than a product of “noise.”

REAL DATA PERSISTENCE PLOTS: Figures 3 and 4 are persistence plots of one research subject’s fMRI data in dimensions 1 and 2, respectively. The asterisk in Figure 3 marks a long-lived hole that proves to be interesting (see LOCALIZATION).

Figure 1: “Toy” filtration.

CONCURRENCES: Suppose binary variables are coded “0” and “1.” The “concurrence” corresponding to a multivariate binary observation is the list of variables that are “1” in that observation. CT represents concurrences as simplices and uses them to build the “Curto-Itskov filtration” for the data. The frames, indexed by “frequency level,” correspond to the frequencies with which the concurrences appear in the data.

REFERENCES:
C. Curto and V. Itskov (2008) “Cell groups reveal structure of stimulus space,” PLoS Computational Biology, 4.
S. P. Ellis and A. Klein (2014) “Describing high-order statistical dependence using ‘concurrence topology,’ with application to functional MRI brain data,” Homology, Homotopy and Applications, 16, 245–264.

Figure 3: Persistence plot for a subject’s fMRI data in dimension 1. The large dot indicates multiple holes with the same birth and death. The asterisk marks the interesting hole discussed under LOCALIZATION.
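The CONCURRENCES panel can be sketched in a few lines of Python. This is a minimal illustration under a stated assumption, not the authors’ R implementation: we assume that the frame at frequency level f contains every nonempty set of variables that are jointly “1” in at least f observations, so that each frame is a simplicial complex and frames shrink as f grows. The function names and toy data are hypothetical; see Ellis and Klein (2014) for the exact construction used in CT.

```python
"""Hypothetical sketch of concurrences and Curto-Itskov frames.

ASSUMPTION: the frame at frequency level f contains every nonempty set of
variables that are jointly 1 in at least f observations. This is our reading,
not a definition quoted from the poster.
"""
from itertools import combinations


def concurrence(observation, names):
    """Set of variables coded 1 in a single binary observation."""
    return frozenset(name for name, x in zip(names, observation) if x == 1)


def simplex_frequencies(observations, names):
    """For each set of variables, count the observations in which all are 1.

    Enumerates every face of every concurrence: fine for toy data, but
    exponential in concurrence size.
    """
    counts = {}
    for obs in observations:
        c = sorted(concurrence(obs, names))
        for k in range(1, len(c) + 1):
            for face in combinations(c, k):
                key = frozenset(face)
                counts[key] = counts.get(key, 0) + 1
    return counts


def frame(counts, level):
    """Simplices present in the frame at the given frequency level."""
    return {s for s, n in counts.items() if n >= level}


if __name__ == "__main__":
    names = ["A", "B", "C"]
    data = [[1, 1, 0], [0, 1, 1], [1, 0, 1], [1, 1, 1], [1, 1, 0]]
    counts = simplex_frequencies(data, names)
    level2 = frame(counts, 2)
    print(sorted("".join(sorted(s)) for s in level2))
    # ['A', 'AB', 'AC', 'B', 'BC', 'C']
```

In this toy data, the frame at level 2 contains the edges AB, AC and BC but not the triangle ABC: a hollow triangle, i.e., a hole in dimension 1 whose short cycle has d + 2 = 3 variables. Under the assumed construction, ABC enters at level 1, so working down the levels this hole is born at level 2 and dies at level 1, for a lifespan of 1. Because face enumeration grows exponentially, real analyses need dedicated software rather than this brute-force approach.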