Object Orie’d Data Analysis, Last Time Discrimination for manifold data (Sen) –Simple Tangent plane SVM –Iterated TANgent plane SVM –Manifold SVM Interesting.

Presentation transcript:

Object Orie’d Data Analysis, Last Time Discrimination for manifold data (Sen) –Simple Tangent plane SVM –Iterated TANgent plane SVM –Manifold SVM Interesting point: Analysis done really in the manifold –Not just in projected tangent plane –Deeper than Principal Geodesic Analysis? Manifold version of DWD?

Mildly Non-Euclidean Spaces Useful View of Manifold Data: Tangent Space Center: Fréchet Mean Reason for terminology “mildly non-Euclidean”

Strongly Non-Euclidean Spaces Trees as Data Objects From Graph Theory: Graph is set of nodes and edges Tree has root and direction Data Objects: set of trees
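The graph-theory definition on this slide can be sketched in code. The following is a minimal illustration, not taken from the slides: the function name `is_rooted_tree` and the toy node labels are hypothetical, and edges are assumed to be stored as directed (parent, child) pairs.

```python
from collections import deque


def is_rooted_tree(nodes, edges, root):
    """Return True if the directed (parent, child) edges form a tree rooted at `root`."""
    children = {n: [] for n in nodes}
    indegree = {n: 0 for n in nodes}
    for parent, child in edges:
        children[parent].append(child)
        indegree[child] += 1
    # The root has no parent; every other node has exactly one parent.
    if indegree[root] != 0:
        return False
    if any(indegree[n] != 1 for n in nodes if n != root):
        return False
    # Every node must be reachable from the root by following edge directions.
    seen = {root}
    queue = deque([root])
    while queue:
        for child in children[queue.popleft()]:
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return len(seen) == len(nodes)


# Hypothetical toy example: a root with two branches, one of which splits again.
nodes = {"root", "a", "b", "a1", "a2"}
edges = [("root", "a"), ("root", "b"), ("a", "a1"), ("a", "a2")]
print(is_rooted_tree(nodes, edges, "root"))
```

A data set in this setting is then a collection of such trees, one per data object.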

Strongly Non-Euclidean Spaces Motivating Example: Blood Vessel Trees in Brains From Dr. Elizabeth Bullitt –Dept. of Neurosurgery, UNC Segmented from MRAs Study population of trees (Forest?)

Blood vessel tree data Marron’s brain: MRI view, single slice from 3-d image

Blood vessel tree data Marron’s brain: MRA view (“A” for “Angiography”), finds blood vessels (show up as white), track through 3-d

Blood vessel tree data Marron’s brain: from MRA, segment tree of vessel segments using tube tracking, Bullitt and Aylward (2002)

Blood vessel tree data Marron’s brain: from MRA, reconstruct trees in 3-d, rotate to view

Blood vessel tree data Now look over many people (data objects) Structure of population (understand variation?) PCA in strongly non-Euclidean Space???

Blood vessel tree data The tree team: Very Interdisciplinary; Neurosurgery: Bullitt, Ladha; Statistics: Wang, Marron; Optimization: Aydin, Pataki

Blood vessel tree data Possible focus of analysis: Connectivity structure only (topology) Location, size, orientation of segments Structure within each vessel segment

Blood vessel tree data Present Focus: Topology only. Already challenging. Later address others, then add attributes to tree nodes and extend analysis

Blood vessel tree data Recall from above: Marron’s brain: focus on back, connectivity only

Blood vessel tree data Present Focus: topology only, raw data as trees, Marron’s reduced tree, back tree only

Blood vessel tree data Topology only E.g. Back Trees Full Population Study as movie Understand variation?

Strongly Non-Euclidean Spaces Statistics on Population of Tree-Structured Data Objects? Mean??? Analog of PCA??? Strongly non-Euclidean, since: Space of trees not a linear space Not even approximately linear (no tangent plane)

Mildly Non-Euclidean Spaces Useful View of Manifold Data: Tangent Space Center: Fréchet Mean Reason for terminology “mildly non-Euclidean”

Strongly Non-Euclidean Spaces Mean of Population of Tree-Structured Data Objects? Natural approach: Fréchet mean Requires a metric (distance) On tree space

Strongly Non-Euclidean Spaces Appropriate metrics on tree space: Wang and Marron (2007) Depends on: –Tree structure –And nodal attributes Won’t go further here But gives appropriate Fréchet mean

Strongly Non-Euclidean Spaces Appropriate metrics on tree space: Wang and Marron (2007) For topology only (studied here): –Use Hamming Distance –Just number of nodes not in common Gives appropriate Fréchet mean
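A minimal sketch of the topology-only metric, under assumptions not stated on the slide: each binary tree is stored as a set of level-order node indices (root = 1, children of node i are 2i and 2i + 1), and the center is taken to minimize the sum of unsquared Hamming distances. Under that assumption the minimizer is the majority-rule tree, which keeps every node present in more than half of the trees. The function names and the toy population are hypothetical.

```python
from collections import Counter


def hamming_distance(tree_a, tree_b):
    """Number of nodes that belong to exactly one of the two trees."""
    return len(set(tree_a) ^ set(tree_b))


def majority_tree(trees):
    """Nodes present in more than half of the trees.

    Because each tree contains the parent of every node it holds, the
    majority-rule node set is itself a valid tree.
    """
    counts = Counter(node for tree in trees for node in set(tree))
    return frozenset(n for n, c in counts.items() if 2 * c > len(trees))


def total_distance(candidate, trees):
    """Sum of Hamming distances from a candidate tree to the population."""
    return sum(hamming_distance(candidate, t) for t in trees)


# Hypothetical toy population of three small binary trees.
population = [{1, 2, 3}, {1, 2}, {1, 3, 6}]
center = majority_tree(population)
print(sorted(center), total_distance(center, population))
```

Each node contributes to the total distance independently, so deciding node by node whether a majority of trees contain it is enough to minimize the sum.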

Strongly Non-Euclidean Spaces PCA on Tree Space? Recall Conventional PCA: Directions that explain structure in data Data are points in point cloud 1-d and 2-d projections allow insights about population structure

Illust’n of PCA View: PC1 Projections

Illust’n of PCA View: Projections on PC1,2 plane

Source Batch Adj: PC 1-3 & DWD direction

Source Batch Adj: DWD Source Adjustment

Strongly Non-Euclidean Spaces PCA on Tree Space? Key Ideas: Replace 1-d subspace that best approximates data By 1-d representation that best approximates data Wang and Marron (2007) define notion of Treeline (in structure space)

Strongly Non-Euclidean Spaces PCA on Tree Space: Treeline Best 1-d representation of data Basic idea: From some starting tree Grow only in 1 “direction”
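One way to read “grow only in 1 direction” is sketched below. This is an assumption layered on the slide, not a definition taken from it: trees are sets of level-order indices (root = 1, children of node i are 2i and 2i + 1), each step adds exactly one node, and every added node after the first must be a child of the node added just before it. The function name `treeline` is hypothetical.

```python
def treeline(start, path):
    """Return the trees u_0, u_1, ..., u_m obtained by adding `path` nodes one at a time.

    Assumed rule: the first added node attaches to the starting tree, and each
    later node is a child of the previously added node (a single growth direction).
    """
    trees = [frozenset(start)]
    previous = None
    for node in path:
        current = trees[-1]
        if node in current:
            raise ValueError(f"node {node} is already in the tree")
        if node // 2 not in current:
            raise ValueError(f"parent of node {node} is missing")
        if previous is not None and node // 2 != previous:
            raise ValueError(f"node {node} is not a child of node {previous}")
        trees.append(current | {node})
        previous = node
    return trees


# Hypothetical example: start from the root alone and grow left, left, right.
line = treeline({1}, [2, 4, 9])
for k, tree in enumerate(line):
    print(k, sorted(tree))
```

The result is a nested sequence of trees, which plays the role that points along a 1-d subspace play in conventional PCA.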

Strongly Non-Euclidean Spaces PCA on Tree Space: Treeline Best 1-d representation of data Problem: Hard to compute In particular: to solve optimization problem Wang and Marron (2007) Maximum 4 vessel trees Hard to tackle serious trees (e.g. blood vessel trees)

Strongly Non-Euclidean Spaces PCA on Tree Space: Treeline Problem: Hard to compute Solution: Burcu Aydin & Gabor Pataki (linear time algorithm) (based on clever “reformulation” of problem) Description coming in Participant Presentation

PCA for blood vessel tree data PCA on Tree Space: Treelines Interesting to compare: Population of Left Trees Population of Right Trees Population of Back Trees And to study 1st, 2nd, 3rd & 4th treelines

PCA for blood vessel tree data Study “Directions” 1, 2, 3, 4 For sub-populations B, L, R (interpret later)

Strongly Non-Euclidean Spaces PCA on Tree Space: Treeline Next represent data as projections Define as closest point in tree line (same as Euclidean PCA) Have corresponding score (length of projection along line) And analog of residual (distance from data point to projection)
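The projection, score and residual on this slide can be sketched as follows. The assumptions are mine rather than the slide’s: trees are sets of level-order node indices, distance is the Hamming distance (number of nodes not in common), a treeline is a list of nested trees, and the score is taken to be the position of the projection along that list. The function name `project` and the example trees are hypothetical.

```python
def project(tree, line):
    """Project `tree` onto a treeline given as a list of node sets.

    Returns (projection, score, residual):
      projection: the member of the treeline closest to `tree`
      score:      its position along the treeline (number of growth steps)
      residual:   Hamming distance from `tree` to the projection
    """
    distances = [len(set(tree) ^ set(member)) for member in line]
    # min() returns the first minimizer, so ties go to the smaller score.
    score = min(range(len(line)), key=distances.__getitem__)
    return line[score], score, distances[score]


# Hypothetical treeline growing from the root: add node 2, then 4, then 9.
line = [{1}, {1, 2}, {1, 2, 4}, {1, 2, 4, 9}]
data_tree = {1, 2, 3, 4}
projection, score, residual = project(data_tree, line)
print(sorted(projection), score, residual)
```

Here the data tree agrees with the treeline for two growth steps and has one node (3) that the line never reaches, which shows up as the residual.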

PCA for blood vessel tree data Raw Data & Treelines, PC1, PC2, PC3:

PCA for blood vessel tree data Raw Data & Treelines, PC1, PC2, PC3: Projections, Scores, Residuals

PCA for blood vessel tree data Raw Data & Treelines, PC1, PC2, PC3: Cumulative Scores, Residuals

PCA for blood vessel tree data Now look deeper at “Directions” 1, 2, 3, 4 For sub-populations B, L, R

PCA for blood vessel tree data Notes on Treeline Directions: PC1 always to left BACK has most variation to right (PC2) LEFT has more varia’n to 2nd level (PC2) RIGHT has more var’n to 1st level (PC2) See these in the data?

PCA for blood vessel tree data Notes: PC1 - left BACK - right LEFT 2nd lev RIGHT 1st lev See these??

PCA for blood vessel tree data Individual (each PC separately) Scores Plot

PCA for blood vessel tree data Identify this person

PCA for blood vessel tree data Identify this person PC Scores: 8,9,3,5

PCA for blood vessel tree data Identify this person (PC Scores: 8,9,3,5): Red = older 0 = Female Note: color ~ age

PCA for blood vessel tree data Identify this person (PC scores 1,10,1,1)

PCA for blood vessel tree data Explain strange (low score) correlation?

PCA for blood vessel tree data Explain strange (low score) correlation? Revisit Treelines: Small PC1 Score → Small PC3 Score; Small PC1 Score → Small PC4 Score

PCA for blood vessel tree data Individual (each PC sep’ly) Residuals Plot

PCA for blood vessel tree data Individual (each PC sep’ly) Residuals Plot Very strongly correlated Shows much variation not explained by PCs (data are very rich) Note age coloring useful Younger (bluer) → more variation Older (redder) → less variation

PCA for blood vessel tree data Important Data Analytic Goals: Understand impact of age (colors) Understand impact of gender (symbols) Understand handedness (too few) Understand ethnicity (too few) See these in PCA?

PCA for blood vessel tree data Data Analytic Goals: Age, Gender See these? No…

PCA for blood vessel tree data Alternate View: Cumulative Scores

PCA for blood vessel tree data Alternate View: Cumulative Scores Always below 45 degree line Better separation of age or gender? (doesn’t seem like it) This makes it easy to find: Best represented case Worst represented case

PCA for blood vessel tree data Cum. Scores: Best repr’ed case (10,19,20)

PCA for blood vessel tree data Cum. Scores: Best repr’ed case (10,19,20) PC1 & PC2 Scores Very Large

PCA for blood vessel tree data Cum. Scores: Worst repr’ed case (3,5,6)

PCA for blood vessel tree data Cum. Scores: Worst repr’ed case (3,5,6) Fairly small tree Growth in unusual directions

PCA for blood vessel tree data Directly study age vs. PC scores

PCA for blood vessel tree data Directly study age vs. PC scores Graphic highlights potential connections But no strong correlations PC3 is strongest of weak lot (less young & old for large PC3 score)

Strongly Non-Euclidean Spaces Overall Impression: Interesting new OODA Area Much to be done: Refined PCA Alternate tree lines Attributes (i.e. go beyond topology) Classification / Discrimination (SVM, DWD) Other data types (e.g. lung airways…)