Functional Data Analysis


Functional Data Analysis: Insightful Decomposition into Vertical Variation and Horizontal Variation

More Data Objects (Data Objects I). Final curve warps: warp each data curve $f_1, \dots, f_n$ to the template mean $\mu_n$, and denote the warp functions $\gamma_1, \dots, \gamma_n$. This gives (roughly speaking): vertical components $f_1 \circ \gamma_1, \dots, f_n \circ \gamma_n$ (the aligned curves), and horizontal components $\gamma_1, \dots, \gamma_n$.

More Data Objects (Data Objects II). The same final curve warp decomposition, with the horizontal components $\gamma_1, \dots, \gamma_n$ viewed as analogous to Kendall's shapes.

More Data Objects (Data Objects III). The same decomposition, with the horizontal components also viewed as analogous to Chang's transformations.
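A minimal numpy sketch of this decomposition, using shift-only warps as a toy stand-in for full elastic warping (the curves, template, and shift estimator here are illustrative assumptions, not the lecture's construction):

```python
import numpy as np

# Toy stand-in for curve registration: shift-only warps gamma_i(t) = t + s_i.
# Full elastic warping allows general increasing gamma_i mapping [0,1] to [0,1].
rng = np.random.default_rng(0)
t = np.linspace(0, 1, 200)
n = 10
shifts = rng.normal(0, 0.05, n)                      # true horizontal variation
curves = [np.exp(-(t - 0.5 - s) ** 2 / 0.005) for s in shifts]

template = np.exp(-(t - 0.5) ** 2 / 0.005)           # template mean mu_n (known here)

aligned, warps = [], []
for f in curves:
    # Estimate the shift by maximizing cross-correlation with the template
    lag = np.argmax(np.correlate(f, template, mode="full")) - (len(t) - 1)
    s_hat = lag * (t[1] - t[0])
    gamma = t + s_hat                                # warp function gamma_i
    # Vertical component: f_i composed with gamma_i, via interpolation
    aligned.append(np.interp(gamma, t, f, left=0.0, right=0.0))
    warps.append(gamma)

# aligned: vertical components (registered curves); warps: horizontal components
```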

Toy Example: Conventional PCA Projections. Power is spread across the spectrum.

Toy Example: Conventional PCA Scores. Views of a 1-d curve bending through 4 dimensions.

Toy Example: Aligned Curve PCA Projections. All variation is in the 1st component.

Toy Example: Warps, PC Projections. Mostly the 1st PC, but the 2nd helps some.

TIC Testbed. Special feature: an answer key of known peaks. Goal: find warps to align these.

TIC Testbed: Fisher-Rao Alignment.
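Fisher-Rao alignment is usually computed through the square-root velocity function (SRVF), $q(t) = \dot f(t) / \sqrt{|\dot f(t)|}$, under which the Fisher-Rao metric becomes the ordinary $L^2$ metric. A minimal sketch of the transform (the grid and curve are illustrative):

```python
import numpy as np

def srvf(f, t):
    # Square-root velocity function: q = f' / sqrt(|f'|).
    # Under this transform the Fisher-Rao metric becomes plain L2,
    # which is what makes elastic alignment computationally tractable.
    df = np.gradient(f, t)                      # numerical derivative f'(t)
    return np.sign(df) * np.sqrt(np.abs(df))

t = np.linspace(0, 1, 200)
f = np.exp(-(t - 0.5) ** 2 / 0.005)             # illustrative curve
q = srvf(f, t)

# Unit-norm SRVFs lie on a sphere in L2 -- the "SRVF sphere" used for PNS below.
q_unit = q / np.sqrt(np.sum(q ** 2) * (t[1] - t[0]))
```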

PNS on SRVF Sphere: Toy Example. View the curves as points on the sphere. (Figure shows: the tangent plane, PC 1, PNS 1, and the boundary of the nonnegative orthant.)

PNS on SRVF Sphere: Real Data Analysis of Blood Glucose Curves.

Juggling Data: Clustering in Phase Variation Space.

Probability Distributions as Data Objects. Interesting question: what is the "best" representation? (Which function should represent a distribution?) Candidates: the density function (very interpretable), the cumulative distribution function, and the quantile function (recall: the inverse of the CDF).

Probability Distributions as Data Objects: Recall Representations of Distributions.

Probability Distributions as Data Objects. PCA of random densities: power is spread across the spectrum.

Probability Distributions as Data Objects. Now try the quantile representation (same example).

Probability Distributions as Data Objects. PCA of the quantile representations: only 2 modes, shift and tilt!

Probability Distributions as Data Objects. Conclusion: the quantile representation is best for the typical 2 "first" modes of variation (essentially linear modes); the density and CDF are generally much worse (their natural modes are non-linear).
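A quick simulation check of this conclusion (a sketch with synthetic normal distributions, not the lecture's data): representing each distribution by its quantile function on a probability grid, PCA finds essentially all variation in two modes, shift and tilt.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(1)
p = np.linspace(0.01, 0.99, 99)                 # probability grid
# Each data object: the quantile function of a N(mu_i, sigma_i^2) distribution
mus = rng.normal(0, 1, 100)
sigmas = rng.uniform(0.5, 2, 100)
Q = np.array([norm.ppf(p, loc=m, scale=s) for m, s in zip(mus, sigmas)])

Qc = Q - Q.mean(axis=0)                         # center across data objects
evals = np.linalg.svd(Qc, compute_uv=False) ** 2 / len(Q)
print(evals[:4] / evals.sum())                  # first two modes carry ~all variation
```

The reason is that the normal quantile function is affine in its parameters, $Q(p) = \mu + \sigma \Phi^{-1}(p)$, so the quantile objects lie in a 2-d affine subspace (shift mode: constant; tilt mode: $\Phi^{-1}(p)$); the density and CDF representations are non-linear in the same parameters.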

Probability Distributions as Data Objects. Point 1: mean changes are nicely represented by quantiles.

Probability Distributions as Data Objects. Point 2: spread changes are nicely represented by quantiles.

Random Matrix Theory. Main idea: the distribution of PCA eigenvalues under pure noise. Usefulness: interpretation of scree plots. For eigenvalues $\lambda_j$ of the sample covariance $\hat{\Sigma}$, plot $\lambda_j$ vs. $j$.

PCA Redistribution of Energy (cont.). Note: some of these useful plots have already been considered: the power spectrum (as percentages) and the cumulative power spectrum (%). Common terminology: the power spectrum is called the "scree plot"; see Kruskal (1964) and Cattell (1966) (all but the name "scree"; first appearance of the name?). Large values reflect important structure; zoom in to characterize the noise.
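A sketch of these two plots for a generic data matrix (the data here are placeholder noise):

```python
import numpy as np
import matplotlib.pyplot as plt

X = np.random.default_rng(2).normal(size=(50, 300))   # placeholder data, d x n
Xc = X - X.mean(axis=1, keepdims=True)                # mean-center the rows
lam = np.sort(np.linalg.eigvalsh(Xc @ Xc.T / X.shape[1]))[::-1]

pct = 100 * lam / lam.sum()
fig, ax = plt.subplots(1, 2, figsize=(8, 3))
ax[0].plot(pct, "o-")
ax[0].set_title("Power spectrum (%), i.e. scree plot")
ax[1].plot(np.cumsum(pct), "o-")
ax[1].set_title("Cumulative power spectrum (%)")
plt.show()
```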

Random Matrix Theory. Pure noise data matrix: $X$ is $d \times n$, with entries i.i.d. $N(0,1)$, thinking of the columns as data objects.

Random Matrix Theory. Clean notation version of the covariance matrix: $\hat{\Sigma} = \frac{1}{n} X X^t$, of size $d \times d$. Simplified by: no mean centering (using $N(0,1)$, which is roughly OK relative to the usual mean centering), and standardizing by $\frac{1}{n}$, not $\frac{1}{n-1}$ (easy and sensible when there is no mean centering).

Random Matrix Theory. The eigenvalues are $\lambda_j$, the diagonal entries of $\Lambda$ in the eigen-analysis $\hat{\Sigma} = U \Lambda U^t$. What is the distribution of the $\lambda_j$?
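A direct simulation of this setup (a sketch; the $d$ and $n$ values match the next slide):

```python
import numpy as np

rng = np.random.default_rng(3)
d, n = 100, 1000
X = rng.normal(size=(d, n))          # entries i.i.d. N(0,1), columns = data objects
Sigma_hat = X @ X.T / n              # no mean centering, 1/n normalization
lam = np.linalg.eigvalsh(Sigma_hat)
print(lam.min(), lam.max())          # all eigenvalues near 1, with chance variation
```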

Random Matrix Theory. For $d = 100$, $n = 1000$: eigenvalues $\approx 1$, but there is (chance) variation.

Random Matrix Theory. Smaller $n = 500$ boosts variation (more uncertainty).

Random Matrix Theory. Smaller $n = 200$ boosts variation further.

Random Matrix Theory. Smaller $n = 100$ boosts variation still more; the eigenvalues can't go negative, although they can get large.

Random Matrix Theory. Larger $n = 10{,}000$ reduces variation.

Random Matrix Theory. Larger $n = 100{,}000$ reduces variation.

Random Matrix Theory. Fix $y = d/n$ and let $d$ and $n$ grow: essentially the same shape, but less sampling noise. What is that shape?

Random Matrix Theory: Empirical Spectral Density. The shape is captured by the empirical spectral density, i.e. the "density" of these eigenvalues.

Random Matrix Theory: Limiting Spectral Density (in the limit as $n, d \to \infty$ with $y = d/n$ fixed). References: Marčenko and Pastur (1967), Yao et al. (2015), Dobriban (2015).

Random Matrix Theory: Limiting Spectral Density (in the limit as $n, d \to \infty$ with $y = d/n$). The limit exists; there is no closed form in general, but it can be implicitly defined (using integral equations) and numerically approximated.
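For the pure-noise (identity population covariance) case simulated above, the limit is in fact available in closed form, the Marčenko-Pastur density; the implicit integral-equation characterization is what is needed for general population covariances. A minimal plotting sketch:

```python
import numpy as np
import matplotlib.pyplot as plt

def marchenko_pastur_pdf(x, y):
    # Marchenko-Pastur density for identity covariance, aspect ratio y = d/n.
    a, b = (1 - np.sqrt(y)) ** 2, (1 + np.sqrt(y)) ** 2
    out = np.zeros_like(x)
    inside = (x > a) & (x < b)
    out[inside] = np.sqrt((b - x[inside]) * (x[inside] - a)) / (2 * np.pi * y * x[inside])
    return out  # for y > 1 there is an extra point mass of 1 - 1/y at 0

x = np.linspace(0.01, 3, 500)
for y in (0.1, 0.5, 1.0):
    plt.plot(x, marchenko_pastur_pdf(x, y), label=f"y = {y}")
plt.legend()
plt.title("Limiting spectral densities")
plt.show()
```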

Random Matrix Theory. Limiting spectral density, for a given $y = d/n$: a convenient visualization interface by Hyo Young Choi.

Random Matrix Theory. LSD for the above case: $n = 200$, $d = 100$, so $y = 0.5$ and $\log_{10} y = -0.301$.

Random Matrix Theory. LSD note: these have finite support $\subset (0, \infty)$.

Random Matrix Theory. LSD: now try smaller values of $y = d/n$ (more negative $\log_{10} y$). Note: the support points head toward 1, with increasing symmetry.

Random Matrix Theory. Recall the previous large-$n$ case ($n = 100{,}000$, reduced variation); the LSD is zooming in on this.

Random Matrix Theory. Limiting case: $\lim_{d \to \infty} \lim_{n \to \infty}$, called Medium Dimension High Sample Size. The resulting density is a "semi-circle": $f(x) = \frac{2}{\pi R^2} \sqrt{R^2 - x^2} \, \mathbf{1}_{[-R,R]}(x)$, called the "Wigner semi-circle distribution".
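A generic illustration of the semi-circle limit (a sketch using a standard symmetric Wigner matrix, not the lecture's exact construction): with off-diagonal entries of variance $1/n$, the eigenvalues follow the semi-circle with $R = 2$.

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(4)
n = 2000
A = rng.normal(size=(n, n))
W = (A + A.T) / np.sqrt(2 * n)       # symmetric Wigner matrix, off-diag variance 1/n
lam = np.linalg.eigvalsh(W)

R = 2.0
x = np.linspace(-R, R, 400)
semi = (2 / (np.pi * R ** 2)) * np.sqrt(R ** 2 - x ** 2)   # semi-circle density
plt.hist(lam, bins=60, density=True, alpha=0.5)
plt.plot(x, semi)
plt.title("Wigner eigenvalues vs. semi-circle density")
plt.show()
```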

Random Matrix Theory. Summary: we have studied data matrix shapes and observed convergence to 1 and increasing symmetry. What about the other direction (larger $d$)?

Random Matrix Theory. Consider growing $d$. Challenge: there are only $n$ columns in $X$ (so rank $= n$), yet $\hat{\Sigma}$ is $d \times d$, so there are $d - n$ eigenvalues equal to 0.

Random Matrix Theory. LSD: start with the $y = d/n = 1$ case.

Random Matrix Theory. LSD: now try larger values of $y = d/n$. A bar shows the proportion of 0 eigenvalues; the curve shows the spectral density of the non-0 eigenvalues. As $y$ grows, the density again heads towards a semi-circle (though carrying only a small proportion of the mass), and the shapes seem similar to those above.

Random Matrix Theory. LSD, dual covariance variation. Idea: replace $\hat{\Sigma} = \frac{1}{n} X X^t$ by $\frac{1}{d} X^t X$. Recall: rows as data objects; this is the inner product matrix of $X$ with a different normalization ($d$, not $n$), and using $N(0,1)$ avoids messy centering issues.
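Since $X X^t$ and $X^t X$ share their nonzero eigenvalues, the nonzero primal and dual spectra coincide up to the factor $y = d/n$; a quick numerical check (a sketch):

```python
import numpy as np

rng = np.random.default_rng(5)
d, n = 100, 400
X = rng.normal(size=(d, n))
primal = np.sort(np.linalg.eigvalsh(X @ X.T / n))[::-1]   # d eigenvalues
dual = np.sort(np.linalg.eigvalsh(X.T @ X / d))[::-1]     # n eigenvalues, n - d are ~0

y = d / n
# Nonzero spectra agree after rescaling: primal = y * dual
print(np.allclose(primal[:d], y * dual[:d]))
```

This factor of $y$ is exactly what drives the rescaled primal/dual overlays a few slides below.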

Random Matrix Theory. LSD, dual covariance variation: for $y = d/n = 100$ the density is close to a semi-circle, and other values of $y$ seem to follow a similar pattern to the primal case. For $d < n$ we now get 0 eigenvalues (the $n \times n$ dual matrix has rank $d$), and the density again heads to a semi-circle.

Random Matrix Theory. LSD: primal and dual overlaid for direct comparison. Notes: total area = 1; the continuous curve alone has area 1 minus the bar (the point mass at 0).

Random Matrix Theory. LSD: primal and dual overlaid across a range of $y$ values; the two are very close for $d \approx n$.

Random Matrix Theory. LSD: rescaled primal and dual: $y \times$ LSD (underneath), overlaid with $\frac{1}{y} \times$ dual LSD.

Random Matrix Theory. LSD: rescaled primal and dual, shown across values of $y$.

Random Matrix Theory. Conclusion: the family of Marčenko-Pastur distributions has several interesting symmetries.

Random Matrix Theory. Important parallel theory: the distribution of the largest eigenvalue (assuming a matrix of i.i.d. $N(0,1)$ entries), due to Tracy and Widom (1994). A good discussion of the statistical implications is Johnstone (2008).

Participant Presentations: Zhengling Qi, classification in personalized medicine; Zhiyuan Liu, CPNS visualization in Pablo; Fuhui Fang, DiProPerm analysis of osteoarthritis data.