1 UNC, Stat & OR SAMSI AOOD Opening Workshop Tutorial OODA of Tree Structured Objects J. S. Marron Dept. of Statistics and O. R., UNC February 15, 2014
2 UNC, Stat & OR Workshop Big Picture An investment by: Provided Funding to Bring Us Together Has Specific Goal: Generating Collaborative Research
3 UNC, Stat & OR Workshop Big Picture An investment by: Workshop Aim: Kickoff Ongoing Research (through whole program year)
4 UNC, Stat & OR Workshop Big Picture Thus different format: Fewer Main Talks Main Talks Aimed at Collaborations 2-Minute Madness Talks – Introductory Wed. Afternoon: Form Working Groups
5 UNC, Stat & OR Working Groups Usual Structure Conceived of at Opening Workshop Agreed upon on Wednesday Afternoon First Meeting: Thursday or Friday Followed by weekly meetings Can Skype or WebEx in remotely
6 UNC, Stat & OR Working Groups Goals: Collaborative Research Among unexpected partners Our hope: This group unusually well suited for this
7 UNC, Stat & OR Working Groups Program Areas of Emphasis: Functional Data Analysis Time Dynamics Image Analysis Trees as Data Shape and Manifold Data Where are potential (new) connections?
8 UNC, Stat & OR Working Groups Program Areas of Emphasis: Functional Data Analysis Time Dynamics Image Analysis Trees as Data Shape and Manifold Data fMRI Where are potential (new) connections?
9 UNC, Stat & OR Working Groups Program Areas of Emphasis: Functional Data Analysis Time Dynamics Image Analysis Trees as Data DTI Shape and Manifold Data Where are potential (new) connections?
10 UNC, Stat & OR Working Groups Program Areas of Emphasis: Functional Data Analysis Time Dynamics Image Analysis Brain Development Trees as Data Shape and Manifold Data Where are potential (new) connections?
11 UNC, Stat & OR Working Groups Program Areas of Emphasis: Functional Data Analysis Time Dynamics Atlas of Human Body Image Analysis Trees as Data Shape and Manifold Data Where are potential (new) connections?
12 UNC, Stat & OR Working Groups Where are potential (new) connections? Requests of you: Look for more of these Discuss with others Bring up on Wednesday Afternoon Join in on Thursday +
13 UNC, Stat & OR Object Oriented Data Analysis What is the atom of a statistical analysis? First Course: Numbers Multivariate Analysis: Vectors Functional Data Analysis: Curves OODA: More Complicated Objects Images Movies Shapes Tree Structured Objects
14 UNC, Stat & OR An Aside on Acronyms What is it? OODA or AOOD ???
15 UNC, Stat & OR SAMSI AOOD Opening Workshop Tutorial OODA of Tree Structured Objects J. S. Marron Dept. of Statistics and O. R., UNC February 15, 2014
18 UNC, Stat & OR Acronym History Original SAMSI Proposal: Object Oriented Data Analysis (OODA) SAMSI Directors Suggestion: Analysis of Object Oriented Data (AOOD) NISS Board Suggestion: Analysis Of Object Data (AOOD)
19 UNC, Stat & OR An Aside on Acronyms What is it? OODA or AOOD Suggestion: Treat these as synonyms
20 UNC, Stat & OR Object Oriented Data Analysis What is the atom of a statistical analysis? First Course: Numbers Multivariate Analysis: Vectors Functional Data Analysis: Curves OODA: More Complicated Objects Images Movies Shapes Tree Structured Objects
21 UNC, Stat & OR Euclidean Data Spaces Data are vectors in R^d Effective (and Traditional) Analysis: Linear Methods Mean Covariance Principal Component Analysis Gaussian Distribution
22 UNC, Stat & OR Euclidean Data Spaces Data are vectors in R^d Challenges: High Dimension, Low Sample Size (Classical Methods Fail) Visualization: Find Structure (Expected & Unknown) Understand range of normal cases Find anomalies
23 UNC, Stat & OR Non-Euclidean Data Spaces Simple Example: m-reps for shapes Data involve angles Thus lie in manifold i.e. curved feature space Typical Approach: Tangent Plane Approx. e.g. PGA Personal Terminology: Mildly non-Euclidean
24 UNC, Stat & OR PGA for m-reps, Bladder-Prostate-Rectum Bladder – Prostate – Rectum, 1 person, 17 days PG 1 PG 2 PG 3 (analysis by Ja Yeon Jeong)
27 UNC, Stat & OR Non-Euclidean Data Spaces What is Strongly Non-Euclidean Case? Trees as Data Special Challenge: No Tangent Plane Must Re-Invent Data Analysis
28 UNC, Stat & OR Strongly Non-Euclidean Spaces Trees as Data Objects From Graph Theory: Graph is set of nodes and edges Tree has root and direction Data Objects: set of trees
29 UNC, Stat & OR Strongly Non-Euclidean Spaces Motivating Example: From Dr. Elizabeth Bullitt Dept. of Neurosurgery, UNC Blood Vessel Trees in Brains Segmented from MRAs Study population of trees Forest of Trees
30 UNC, Stat & OR Blood vessel tree data Marron's brain: MRI view Single Slice From 3-d Image
31 UNC, Stat & OR Blood vessel tree data Marron's brain: MRA view A for Angiography Finds blood vessels (show up as white) Track through 3d
37 UNC, Stat & OR Blood vessel tree data Marron's brain: From MRA Segment tree of vessel segments Using tube tracking Bullitt and Aylward (2002)
38 UNC, Stat & OR Blood vessel tree data Marron's brain: From MRA Reconstruct trees in 3d Rotate to view
44 UNC, Stat & OR Blood vessel tree data Now look over many people (data objects) Structure of population (understand variation?) PCA in strongly non-Euclidean Space???
45 UNC, Stat & OR Blood vessel tree data Examples of Potential Specific Goals (not accessible by traditional methods) Predict Stroke Tendency (Collateral Circulation) Screen for Loci of Pathology Explore how age affects connectivity
46 UNC, Stat & OR Blood vessel tree data Big Picture: 3 Approaches 1. Purely Combinatorial 2. Folded Euclidean 3. Dyck Path
48 UNC, Stat & OR Blood vessel tree data Possible focus of analysis: Connectivity structure only (topology) Location, size, orientation of segments Structure within each vessel segment
49 UNC, Stat & OR Blood vessel tree data Present Focus: Topology only Already challenging Later address additional challenges By adding attributes (locations, thicknesses, curvature, …) To tree nodes And extend analysis
50 UNC, Stat & OR Blood vessel tree data Topological Representation: Each Vessel Segment (up to 1st Split) is a node Split Segments are child nodes Connecting lines show relationship
51 UNC, Stat & OR Graphical Concept: Support Tree The union of all trees in data set T Consists of every node that appears in at least one tree of T
52 UNC, Stat & OR Support Tree Example Data trees: Support tree:
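As a minimal sketch of the support tree, assume each binary tree is stored as a set of level-order node indices (root = 1, children of node k are 2k and 2k+1). That encoding, the function names, and the three small trees are illustrative assumptions, not the trees pictured on the slide.

```python
def is_tree(nodes):
    """True if the index set contains the root and the parent of every non-root node."""
    return 1 in nodes and all(k == 1 or k // 2 in nodes for k in nodes)

def support_tree(trees):
    """Union of all trees: every node that appears in at least one tree."""
    return set().union(*trees)

# Three hypothetical data trees (level-order indices).
trees = [{1, 2, 3}, {1, 2, 4}, {1, 3, 6}]
support = support_tree(trees)
print(sorted(support))
```

Because every data tree contains the parent of each of its nodes, the union is itself a valid tree, which `is_tree(support)` confirms here.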
53 UNC, Stat & OR Blood vessel tree data Recall from above: Marron's brain: Focus on back Connectivity (topology) only (also consider right & left)
54 UNC, Stat & OR Blood vessel tree data Present Focus: Topology only Raw data as trees Marron's reduced tree Back tree only
55 UNC, Stat & OR Blood vessel tree data Topology only E.g. Back Trees Full Population Study as movie Understand variation?
56 UNC, Stat & OR Strongly Non-Euclidean Spaces Statistics on Population of Tree-Structured Data Objects? Mean??? Analog of PCA??? Strongly non-Euclidean, since: Space of trees not a linear space Not even approximately linear (no tangent plane)
57 UNC, Stat & OR Mildly Non-Euclidean Spaces Useful View of Manifold Data: Tangent Space Center: Fréchet Mean Reason for terminology "mildly non-Euclidean"
58 UNC, Stat & OR Strongly Non-Euclidean Spaces Mean of Population of Tree-Structured Data Objects? Natural approach: Fréchet mean Requires a metric (distance) on tree space
59 UNC, Stat & OR Strongly Non-Euclidean Spaces Appropriate metrics on tree space: Wang and Marron (2007) For topology only (studied here): Use Hamming Distance Just number of nodes not in common Gives appropriate Fréchet mean
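One way to make the idea of a Fréchet-type center concrete is sketched below. It assumes trees are stored as sets of level-order node indices (root = 1, children of k are 2k and 2k+1) and that the center minimizes the sum of unsquared Hamming distances; the slides do not say whether distances are squared, so that choice is an assumption. Under it, a minimizer is the majority-rule tree, which keeps each node present in more than half of the data trees. The function names and the data are hypothetical.

```python
from collections import Counter

def hamming(s, t):
    """Number of nodes in exactly one of the two trees."""
    return len(s ^ t)

def central_tree(trees):
    """Majority-rule tree: nodes found in more than half of the data trees."""
    counts = Counter(node for tree in trees for node in tree)
    return {node for node, c in counts.items() if 2 * c > len(trees)}

# Three hypothetical data trees.
trees = [{1, 2, 3}, {1, 2, 4}, {1, 2}]
center = central_tree(trees)
total = sum(hamming(center, t) for t in trees)
print(sorted(center), total)
```

Each node can be decided independently, since including a node costs one unit for every tree that lacks it and saves one unit for every tree that has it. The result is still a tree because a parent occurs at least as often as any of its children.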
60 UNC, Stat & OR Hamming Distance The number of nodes in the symmetric difference of two trees. An example:
61 UNC, Stat & OR Hamming Distance The two trees drawn on top of each other: Common nodes: 2 Nodes only in blue tree: 4 Nodes only in red tree: 2 So, distance: 4+2=6
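The count above can be reproduced with a short sketch. The two trees here are hypothetical stand-ins chosen only to match the slide's tallies (2 common nodes, 4 only in blue, 2 only in red); they are not the pictured trees. Nodes are encoded as level-order indices (root = 1, children of k are 2k and 2k+1), which is an assumption of this example.

```python
def hamming_distance(s, t):
    """Size of the symmetric difference of two node sets."""
    return len(s ^ t)

# Hypothetical trees matching the counts on the slide.
blue = {1, 2, 4, 5, 8, 9}
red = {1, 2, 3, 6}

common = blue & red
only_blue = blue - red
only_red = red - blue
distance = hamming_distance(blue, red)
print(len(common), len(only_blue), len(only_red), distance)
```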
62 UNC, Stat & OR Strongly Non-Euclidean Spaces PCA on Tree Space? Recall Conventional PCA: Directions that explain structure in data Data are points in point cloud 1-d and 2-d projections allow insights about population structure
63 UNC, Stat & OR Illustration of PCA View: PC1 Projections
64 UNC, Stat & OR Illustration of PCA View: Projections on PC1,2 plane
65 UNC, Stat & OR PCA view: Lung Cancer Microarray Data
66 UNC, Stat & OR Strongly Non-Euclidean Spaces PCA on Tree Space? Key Idea (Jim Ramsay): Replace 1-d subspace that best approximates data By 1-d representation that best approximates data Wang and Marron (2007) define notion of Treeline (in structure space)
67 UNC, Stat & OR PCA on Combinatorial Tree Space? In Depth Discussion Tuesday Afternoon: Strongly Non-Euclidean Spaces
68 UNC, Stat & OR PCA for blood vessel tree data Individual (each PC separately) Scores Plot
69 UNC, Stat & OR PCA for blood vessel tree data Important Data Analytic Goals: Understand impact of age (colors) Understand impact of gender (symbols) Understand handedness (too few) Understand ethnicity (too few) See these in PCA?
70 UNC, Stat & OR PCA for blood vessel tree data Data Analytic Goals: Age, Gender See these? No…
71 UNC, Stat & OR PCA for blood vessel tree data Directly study age PC scores
72 UNC, Stat & OR PCA for blood vessel tree data Directly study age PC scores Take Deeper Look By Fitting Lines And doing Hypothesis test of H0: slope = 0 Show p-values to assess significance Compare Thickness & Descendants Corr.
73 UNC, Stat & OR PCA for blood vessel tree data Directly study age PC scores PC1 - Not Significant
74 UNC, Stat & OR PCA for blood vessel tree data Directly study age PC scores PC2 - Left Significant
75 UNC, Stat & OR PCA for blood vessel tree data Directly study age PC scores Conclusions: - No Strong Age Connection - Significant Connection for: - Descendants - Left - PC2
76 UNC, Stat & OR Strongly Non-Euclidean Spaces Overall Impression: Interesting OODA Area Much to be done: Refined PCA Alternate tree lines Attributes (i.e. go beyond topology) Classification / Discrimination (SVM, DWD) Other data types (e.g. lung airways…)
77 UNC, Stat & OR Smoothing in Tree Space Question: How does tree structure change with age? Approach: (Gaussian) Kernel Smoothing
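The slide names only the approach, so the following is a hedged sketch of one plausible reading rather than the method actually used: weight each person's tree by a Gaussian kernel in age, then keep the nodes whose weighted share exceeds one half (a kernel-weighted majority-rule tree). The tree encoding (sets of level-order node indices), the function name, the bandwidth, and the data are all assumptions made for illustration.

```python
import math

def smoothed_tree(trees, ages, target_age, bandwidth):
    """Kernel-weighted majority-rule tree at a given age."""
    # Gaussian kernel weight for each observation.
    weights = [math.exp(-0.5 * ((a - target_age) / bandwidth) ** 2) for a in ages]
    total = sum(weights)
    support = set().union(*trees)
    # Keep a node when trees containing it carry more than half the total weight.
    return {
        node
        for node in support
        if sum(w for t, w in zip(trees, weights) if node in t) > total / 2
    }

# Hypothetical data: younger subjects share one branch, older subjects another.
trees = [{1, 2}, {1, 2}, {1, 3}, {1, 3}]
ages = [20, 25, 70, 75]
young = smoothed_tree(trees, ages, target_age=22, bandwidth=10)
old = smoothed_tree(trees, ages, target_age=72, bandwidth=10)
print(sorted(young), sorted(old))
```

Evaluating `smoothed_tree` over a grid of ages gives a sequence of trees that can be viewed as a movie of structure changing with age; the bandwidth controls how many neighboring ages influence each frame.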
78 UNC, Stat & OR Smoothing in Tree Space
79 UNC, Stat & OR Strongly Non-Euclidean Spaces Smoothing on Tree Space? In Depth Discussion Tuesday Afternoon:
80 UNC, Stat & OR Blood vessel tree data Big Picture: 3 Approaches 1. Purely Combinatorial 2. Folded Euclidean 3. Dyck Path
81 UNC, Stat & OR Folded Euclidean Approach People: Scott Provan Sean Skwerer Megan Owen Martin Styner Ipek Oguz
82 UNC, Stat & OR Folded Euclidean Approach Setting: Connectivity & Length Background: Phylogenetic Trees Major Restriction: Need common leaves Big Payoff: Data space nearly Euclidean
83 UNC, Stat & OR Folded Euclidean Approach Big Payoff: Data space nearly Euclidean
86 UNC, Stat & OR Folded Euclidean Approach Major Restriction: Need common leaves Approach: Find common cortical landmarks (Oguz) corresponding across cases Treat as pseudo-leaves by projecting to points on tree (draw pic)
87 UNC, Stat & OR Blood vessel tree data Marron's brain: From MRA Reconstruct trees in 3d Rotate to view
88 UNC, Stat & OR Vessel Locations
94 UNC, Stat & OR Common Color
100 UNC, Stat & OR Cortical Surface & Landmarks
106 UNC, Stat & OR Landmarks and Vessels
112 UNC, Stat & OR Attach Landmarks & Subtrees
118 UNC, Stat & OR Highlight Orphans
124 UNC, Stat & OR Trim Orphans
130 UNC, Stat & OR Final Tree (common leaves)
136 UNC, Stat & OR Folded Euclidean Approach Next tasks: Statistical Analysis, e.g. Calculation of Mean Smoothing over time (wtd mean) PCA (Backwards approach???) Classification (linear method ???) Work in Progress Heavy & Specialized Optimization
137 UNC, Stat & OR Strongly Non-Euclidean Spaces Statistics on Folded Euclidean Tree Space? In Depth Discussion Tuesday Afternoon:
138 UNC, Stat & OR Blood vessel tree data Big Picture: 3 Approaches 1. Purely Combinatorial 2. Folded Euclidean 3. Dyck Path
139 UNC, Stat & OR Dyck Path Approach People: Shankar Bhamidi Dan Shen Haipeng Shen
140 UNC, Stat & OR Dyck Path Approach Setting: Start with connectivity only Second include lengths Should be generalizable
143 UNC, Stat & OR Dyck Path Approach Idea: Represent trees as functions Common device in probability theory Used for limiting distributions Gives access to Brownian Motion limits Use Functional Data Analysis Familiar, Euclidean space Many methods available
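A minimal sketch of representing a tree as a function: walk the tree depth-first and record the current depth after every step down or back up an edge, which yields a Dyck path. The sketch assumes binary trees stored as sets of level-order node indices (root = 1, children of k are 2k and 2k+1). The optional `support` argument is a further assumption about how curves might be put on a common domain: the walk follows the support tree and stays flat on edges the data tree lacks. The function name and example trees are hypothetical.

```python
def dyck_path(tree, support=None):
    """Heights recorded along a depth-first walk.

    tree: set of level-order node indices containing the root (1).
    support: optional larger tree to walk instead; edges missing from
             `tree` contribute flat steps, so all curves share a length.
    """
    support = tree if support is None else support
    heights = [0]

    def walk(node):
        for child in (2 * node, 2 * node + 1):
            if child in support:
                step = 1 if child in tree else 0
                heights.append(heights[-1] + step)  # move down the edge
                walk(child)
                heights.append(heights[-1] - step)  # move back up the edge

    walk(1)
    return heights

# Two hypothetical trees and their support tree.
tree1 = {1, 2, 3}
tree2 = {1, 2, 4}
support = tree1 | tree2

plain1 = dyck_path(tree1)
aligned1 = dyck_path(tree1, support)
aligned2 = dyck_path(tree2, support)
print(plain1, aligned1, aligned2)
```

With a shared support tree every curve has length 2(n - 1) + 1, where n is the number of support nodes, so the curves can be stacked as rows of a data matrix for standard functional data analysis.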
145 UNC, Stat & OR Dyck Path Example Example 1: Assume that we have the following three data trees: Tree 1 Tree 2 Tree 3
146 UNC, Stat & OR Support tree: union of trees Tree 1 Tree 2 Tree 3 Tree 1
147 UNC, Stat & OR Tree 1 Tree 2 Tree 3 Tree 1,2 Support tree: union of trees
148 UNC, Stat & OR Tree 1 Tree 2 Tree 3 Tree 1,2,3 Support tree: union of trees
149 UNC, Stat & OR Transform Tree to Curve Now, we show how to transform the first tree into a curve. Tree 1 / Support Tree
160 UNC, Stat & OR Transform Tree to Curve Now, we show how to transform the second tree into a curve. Tree 2 / Support Tree
171 UNC, Stat & OR Transform Tree to Curve Now, we show how to transform the third tree into a curve. Tree 3 / Support Tree
182 UNC, Stat & OR Some Brain Data Points (as corresponding trees)
188 UNC, Stat & OR Raw Brain Data (as curves)
189 UNC, Stat & OR Raw Brain Data - Zoomed
191 UNC, Stat & OR Strongly Non-Euclidean Spaces More on Dyck Path Tree Space? In Depth Discussion Tuesday Afternoon:
192 UNC, Stat & OR Working Groups Where are potential (new) connections? Requests of you: Look for more of these Discuss with others Bring up on Wednesday Afternoon Join in on Thursday +