Participant Presentations See Course Web Site (10 Minute Talks)
Object Oriented Data Analysis Three Major Parts of OODA Applications: I. Object Definition: “What Are the Data Objects?” II. Exploratory Analysis: “What Is the Data Structure / What Are the Drivers?” III. Confirmatory Analysis / Validation: “Is It Really There (vs. a Noise Artifact)?”
Course Background I: Linear Algebra Please Check Familiarity If Not: Read Up in a Linear Algebra Text, or on Wikipedia
Review of Linear Algebra (Cont.) SVD Full Representation: X (d×n) = U (d×d) S (d×n) Vᵗ (n×n) Intuition: For X as a Linear Operator, Represent as: Isometry (~Rotation) Vᵗ, then Coordinate Rescaling S, then Isometry (~Rotation) U
Review of Linear Algebra (Cont.) SVD Reduced Representation (for d > n): X (d×n) = U (d×n) S (n×n) Vᵗ (n×n)
Review of Linear Algebra (Cont.) SVD Compact Representation: X (d×n) = U (d×r) S (r×r) Vᵗ (r×n), where r = rank(X) For Reduced Rank Approximation Can Further Reduce r Key to Dimension Reduction
Review of Multivar. Prob. (Cont.) Outer Product Representation: Σ = X̃ X̃ᵗ, Where: X̃ = (1/√(n−1)) (X_1 − X̄, ⋯, X_n − X̄) is the Centered, Scaled Data Matrix
PCA as an Optimization Problem Find Direction of Greatest Variability:
PCA as Optimization (Cont.) Variability in the Direction v: vᵗ Σ v, i.e. (Proportional to) a Quadratic Form in the Covariance Matrix Simple Solution Comes from the Eigenvalue Representation of Σ: Σ = B Λ Bᵗ, with B Orthonormal and Λ = diag(λ_1, ⋯, λ_d)
PCA as Optimization (Cont.) Now since B is an Orthonormal Basis Matrix, vᵗ Σ v = (Bᵗ v)ᵗ Λ (Bᵗ v) So the Rotation Bᵗ v Gives a Decomposition of the Energy of v in the Eigen-directions of Σ And vᵗ Σ v is Max’d (Over Unit Vectors v) by Putting Maximal Energy in the “Largest Direction”, i.e. Taking v = v_1, Where “Eigenvalues are Ordered”: λ_1 ≥ λ_2 ≥ ⋯ ≥ λ_d
PCA as Optimization (Cont.) Notes: Projecting onto Subspace ⊥ to 𝑣 1 , Gives 𝑣 2 as Next Direction Continue Through 𝑣 3 ,⋯, 𝑣 𝑑
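Aside (not from the slides): a minimal numpy sketch on toy 2-d data, checking that the leading eigenvector of the sample covariance (the PC1 direction) maximizes the variability of the 1-d projected data, as argued above. The data, seed, and helper name are illustrative only.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n = 2, 200
X = rng.multivariate_normal([0, 0], [[3, 1], [1, 1]], size=n).T   # d x n, columns are data objects

Xbar = X.mean(axis=1, keepdims=True)
Xc = X - Xbar                                  # mean-centered data
Sigma = Xc @ Xc.T / (n - 1)                    # sample covariance, d x d

lam, B = np.linalg.eigh(Sigma)                 # eigenvalues ascending, columns of B orthonormal
v1 = B[:, -1]                                  # PC1 direction = eigenvector of largest eigenvalue

def proj_var(v):
    v = v / np.linalg.norm(v)
    return (v @ Xc) @ (v @ Xc) / (n - 1)       # variability of data projected on direction v

# PC1 beats (up to round-off) any other unit direction, e.g. random ones:
assert all(proj_var(v1) + 1e-12 >= proj_var(rng.standard_normal(d)) for _ in range(1000))
print("lambda_1:", lam[-1], " projected SS/(n-1) along v1:", proj_var(v1))
```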
Connect Math to Graphics 2-d Toy Example 2-d Curves as Data In Object Space Simple, Visualizable Descriptor Space From Much Earlier Class Meeting
PCA Redistribution of Energy Now for Scree Plots (Upper Right of FDA Anal.) Carefully Look At: Intuition Relation to Eigenanalysis Numerical Calculation
PCA Redist’n of Energy (Cont.) ANOVA Mean Decomposition: Total Variation = Mean Variation + Mean Residual Variation: ∑_{i=1}^{n} ‖X_i‖² = ∑_{i=1}^{n} ‖X̄‖² + ∑_{i=1}^{n} ‖X_i − X̄‖² Mathematics: Pythagorean Theorem Intuition Quantified via Sums of Squares (Squares More Intuitive Than Absolutes)
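Aside (not from the slides): a tiny numerical check of the Pythagorean sum-of-squares decomposition above, on made-up data.

```python
import numpy as np

rng = np.random.default_rng(1)
d, n = 5, 30
X = rng.normal(size=(d, n)) + 2.0              # d x n, columns are data objects

Xbar = X.mean(axis=1, keepdims=True)
total    = np.sum(X**2)                        # sum_i ||X_i||^2
mean_var = n * np.sum(Xbar**2)                 # sum_i ||Xbar||^2
resid    = np.sum((X - Xbar)**2)               # sum_i ||X_i - Xbar||^2

print(total, mean_var + resid)                 # equal, up to round-off
```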
PCA Redist’n of Energy (Cont.) Eigenvalues Provide Atoms of SS Decompos’n Useful Plots are: Power Spectrum: λ_j vs. j log Power Spectrum: log λ_j vs. j Cumulative Power Spectrum: ∑_{k=1}^{j} λ_k vs. j Note PCA Gives SS’s for Free (As Eigenval’s), But Watch Factors of n−1
PCA Redist’n of Energy (Cont.) Note, have already considered some of these Useful Plots: Power Spectrum (as %s) Cumulative Power Spectrum (%) Common Terminology: Power Spectrum is Called a “Scree Plot” Kruskal (1964) (all but the name “scree”) Cattell (1966) (1st appearance of the name?)
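Aside (not from the slides): a short numpy sketch computing the power spectrum (scree plot heights), its percentage version, and the cumulative power spectrum from the covariance eigenvalues of toy data.

```python
import numpy as np

rng = np.random.default_rng(2)
d, n = 10, 100
X = rng.normal(size=(d, n)) * np.arange(1, d + 1)[:, None]   # unequal variances across features

Xc = X - X.mean(axis=1, keepdims=True)
lam = np.linalg.eigvalsh(Xc @ Xc.T / (n - 1))[::-1]          # eigenvalues, decreasing

power_pct = 100 * lam / lam.sum()                 # power spectrum as % of total variation
cum_pct   = np.cumsum(power_pct)                  # cumulative power spectrum

for j, (p, c) in enumerate(zip(power_pct, cum_pct), start=1):
    print(f"PC{j}: {p:5.1f}%   cumulative {c:5.1f}%")
```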
PCA vs. SVD Sometimes “SVD Analysis of Data” = Uncentered PCA
PCA vs. SVD Sometimes “SVD Analysis of Data” = Uncentered PCA Consequence: Skip this step
PCA vs. SVD Sometimes “SVD Analysis of Data” = Uncentered PCA Useful view point: For Data Matrix X, Ignore the Scaled, Centered X̃ = (1/√(n−1)) (X_1 − X̄, ⋯, X_n − X̄) Instead do eigen-analysis of X Xᵗ (in contrast to Σ = X̃ X̃ᵗ)
PCA vs. SVD Sometimes “SVD Analysis of Data” = Uncentered PCA Eigen-analysis of X Xᵗ Intuition: Find Directions of Maximal Variation From the Origin
PCA vs. SVD Sometimes “SVD Analysis of Data” = Uncentered PCA Investigate with Similar Toy Example
PCA vs. SVD 2-d Toy Example Direction of “Maximal Variation”???
PCA vs. SVD 2-d Toy Example Direction of “Maximal Variation”??? PC1 Solution (Mean Centered) Very Good!
PCA vs. SVD 2-d Toy Example Direction of “Maximal Variation”??? SV1 Solution (Origin Centered) Poor Rep’n
PCA vs. SVD 2-d Toy Example Look in Orthogonal Direction: PC2 Solution (Mean Centered) Very Good!
PCA vs. SVD 2-d Toy Example Look in Orthogonal Direction: SV2 Solution (Origin Centered) Off Map!
PCA vs. SVD 2-d Toy Example SV2 Solution Larger Scale View: Not Representative of Data
PCA vs. SVD Sometimes “SVD Analysis of Data” = Uncentered PCA Investigate with Similar Toy Example: Conclusions: PCA Generally Better Unless “Origin Is Important” Deeper Look: Zhang et al (2007)
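Aside (not from the slides): a minimal numpy sketch, on toy data whose point cloud sits far from the origin, contrasting the PC1 direction (mean centered) with the SV1 direction (origin centered). Data and seed are illustrative only.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 100
# Toy data: cloud centered near (10, 10), elongated along the (1, -1) direction
X = (np.array([[10.0], [10.0]]) +
     np.outer([1.0, -1.0], rng.normal(size=n)) * 2 +
     rng.normal(size=(2, n)) * 0.3)                    # 2 x n, columns are data objects

Xc = X - X.mean(axis=1, keepdims=True)
pc1 = np.linalg.svd(Xc)[0][:, 0]     # left singular vector of centered data = PC1 direction
sv1 = np.linalg.svd(X)[0][:, 0]      # left singular vector of raw data = SV1 direction

print("PC1 (mean centered):", pc1)   # roughly +/-(1, -1)/sqrt(2): variation about the mean
print("SV1 (origin centered):", sv1) # roughly +/-(1, 1)/sqrt(2): direction from origin to the cloud
```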
Different Views of PCA Solves several optimization problems: Direction to maximize SS of 1-d proj’d data
Different Views of PCA 2-d Toy Example Max SS of Projected Data
Different Views of PCA Solves several optimization problems: Direction to maximize SS of 1-d proj’d data Direction to minimize SS of residuals
Different Views of PCA 2-d Toy Example Max SS of Projected Data Min SS of Residuals
Different Views of PCA Solves several optimization problems: Direction to maximize SS of 1-d proj’d data Direction to minimize SS of residuals (same, by Pythagorean Theorem) “Best fit line” to data in “orthogonal sense” (vs. regression of Y on X = vertical sense & regression of X on Y = horizontal sense)
Different Views of PCA 2-d Toy Example Max SS of Projected Data Min SS of Residuals Best Fit Line
Different Views of PCA Toy Example Comparison of Fit Lines: PC1 Regression of Y on X Regression of X on Y
Different Views of PCA Normal Data ρ = 0.3
Different Views of PCA Projected Residuals
Different Views of PCA Vertical Residuals (X predicts Y)
Different Views of PCA Horizontal Residuals (Y predicts X)
Different Views of PCA Projected Residuals (Balanced Treatment)
Different Views of PCA Toy Example Comparison of Fit Lines: PC1 Regression of Y on X Regression of X on Y Note: Big Difference Prediction Matters
Different Views of PCA Solves several optimization problems: Direction to maximize SS of 1-d proj’d data Direction to minimize SS of residuals (same, by Pythagorean Theorem) “Best fit line” to data in “orthogonal sense” (vs. regression of Y on X = vertical sense & regression of X on Y = horizontal sense) Use the one that makes sense…
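Aside (not from the slides): a minimal numpy sketch comparing the three fit lines on toy correlated data: regression of Y on X (vertical residuals), regression of X on Y (horizontal residuals), and the PC1 “orthogonal” fit (projected residuals). Data and seed are illustrative only.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 200
x = rng.normal(size=n)
y = 0.3 * x + rng.normal(size=n) * np.sqrt(1 - 0.3**2)   # correlation ~ 0.3
x, y = x - x.mean(), y - y.mean()

# Regression of Y on X (minimizes vertical residuals): slope = Sxy / Sxx
b_yx = np.sum(x * y) / np.sum(x * x)
# Regression of X on Y (minimizes horizontal residuals), re-expressed as a slope in (x, y):
b_xy = np.sum(y * y) / np.sum(x * y)
# PC1 "orthogonal" fit (minimizes perpendicular / projected residuals):
v1 = np.linalg.svd(np.vstack([x, y]))[0][:, 0]
b_pc = v1[1] / v1[0]

print("Y on X slope:", b_yx, " X on Y slope:", b_xy, " PC1 slope:", b_pc)
# Here the PC1 slope lies between the two regression slopes: a balanced treatment of x and y.
```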
PCA Data Representation Idea: Expand Data Matrix in Terms of Inner Prod’ts & Eigenvectors Recall Notation: X̃ (d×n) = (1/√(n−1)) (X_1 − X̄, ⋯, X_n − X̄) (Mean Centered, Scaled Data)
PCA Data Representation Idea: Expand Data Matrix in Terms of Inner Prod’ts & Eigenvectors Recall Notation: X̃ (d×n) = (1/√(n−1)) (X_1 − X̄, ⋯, X_n − X̄) Spectral Representation (centered data): X̃ (d×n) = ∑_{j=1}^{d} v_j v_jᵗ X̃
PCA Data Represent’n (Cont.) Now Using: X = X̄ + √(n−1) X̃ (here X̄ denotes the d×n matrix with the mean vector in every column) Spectral Representation (Raw Data): X (d×n) = X̄ + ∑_{j=1}^{d} v_j √(n−1) v_jᵗ X̃ = X̄ + ∑_{j=1}^{d} v_j c_j Where: Entries of v_j (d×1) are Loadings Entries of c_j (1×n) are Scores
PCA Data Represent’n (Cont.) Can Focus on Individual Data Vectors: X_i = X̄ + ∑_{j=1}^{d} v_j c_{ij} (Part of Above Full Matrix Rep’n) Terminology: the c_{ij} are Called “PCs” and are also Called Scores
PCA Data Represent’n (Cont.) More Terminology: Scores c_{ij} are Coefficients in the Spectral Representation: X_i = X̄ + ∑_{j=1}^{d} v_j c_{ij} Loadings are the Entries v_{ij} of the Eigenvectors: v_j = (v_{1j}, ⋯, v_{dj})ᵗ
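Aside (not from the slides): a minimal numpy sketch computing loadings (eigenvectors) and scores on toy data, and checking that the full spectral representation reassembles the raw data matrix exactly.

```python
import numpy as np

rng = np.random.default_rng(5)
d, n = 4, 50
X = rng.normal(size=(d, n)) + rng.normal(size=(d, 1))     # d x n raw data

Xbar = X.mean(axis=1, keepdims=True)
Xc = X - Xbar
Sigma = Xc @ Xc.T / (n - 1)

lam, V = np.linalg.eigh(Sigma)
V = V[:, ::-1]                          # columns v_1, ..., v_d: loadings (decreasing eigenvalues)
C = V.T @ Xc                            # d x n matrix of scores; row j holds c_{1j}, ..., c_{nj}

# Spectral representation: X_i = Xbar + sum_j v_j c_{ij}, reassembled for all i at once
X_rebuilt = Xbar + V @ C
print(np.allclose(X, X_rebuilt))        # True: the full representation recovers the data exactly
```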
PCA Data Represent’n (Cont.) Note: PCA Scatterplot Matrix Views Provide a Rotation of Data, Where Axes Are Directions of Max. Variation By Plotting 𝑐 1𝑗 ,⋯, 𝑐 𝑛𝑗 on axis 𝑗
PCA Data Represent’n (Cont.) E.g. Recall Raw Data, Slightly Mean Shifted Gaussian Data
PCA Data Represent’n (Cont.) PCA Rotation: Scatterplot Matrix View of (c_{11}, ⋯, c_{n1}) vs. (c_{12}, ⋯, c_{n2})
PCA Data Represent’n (Cont.) PCA Rotates to Directions of Max. Variation
PCA Data Represent’n (Cont.) PCA Rotates to Directions of Max. Variation Will Use This Later
PCA Data Represent’n (Cont.) Reduced Rank Representation: X_i ≈ X̄ + ∑_{j=1}^{k} v_j c_{ij} Reconstruct Using Only k (≪ d) Terms (Assuming Decreasing Eigenvalues)
PCA Data Represent’n (Cont.) Reduced Rank Representation: X_i ≈ X̄ + ∑_{j=1}^{k} v_j c_{ij} Reconstruct Using Only k (≪ d) Terms (Assuming Decreasing Eigenvalues) Gives: Rank k Approximation of Data Key to PCA Dimension Reduction And PCA for Data Compression (~ .jpeg)
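Aside (not from the slides): a numpy sketch of the rank-k reconstruction on synthetic low-rank-plus-noise data, with a note on why this is also a compression idea. Sizes and the noise level are illustrative.

```python
import numpy as np

rng = np.random.default_rng(6)
d, n, k = 100, 60, 5
# Low-dimensional signal plus noise, so a small k captures most of the variation
X = rng.normal(size=(d, k)) @ rng.normal(size=(k, n)) + 0.1 * rng.normal(size=(d, n))

Xbar = X.mean(axis=1, keepdims=True)
Xc = X - Xbar
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)

Vk = U[:, :k]                           # first k eigenvector directions (loadings)
Ck = Vk.T @ Xc                          # first k rows of scores
X_k = Xbar + Vk @ Ck                    # rank-k approximation of the data

rel_err = np.linalg.norm(X - X_k) / np.linalg.norm(Xc)
print(f"relative reconstruction error with k={k}: {rel_err:.3f}")
# Storage: d*k loadings + k*n scores + d mean entries, vs. d*n raw entries (compression idea)
```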
PCA Data Represent’n (Cont.) Choice of k in Reduced Rank Represent’n: Generally Very Slippery Problem Not Recommended: Arbitrary Choice, E.g. % Variation Explained: 90%? 95%?
PCA Data Represent’n (Cont.) Choice of k in Reduced Rank Represent’n: Generally Very Slippery Problem SCREE Plot (Kruskal 1964): Find Knee in Power Spectrum
PCA Data Represent’n (Cont.) SCREE Plot Drawbacks: What is a Knee? What if There are Several? Knees Depend on Scaling (Power? log?) Personal Suggestions: Find Auxiliary Cutoffs (Inter-Rater Variation) Use the Full Range
PCA Simulation Idea: given Mean Vector Eigenvectors Eigenvalues Simulate data from Corresponding Normal Distribution
PCA Simulation Idea: given a Mean Vector μ, Eigenvectors v_1, ⋯, v_k and Eigenvalues λ_1, ⋯, λ_k, Simulate data from the Corresponding Normal Distribution Approach: Invert the PCA Data Represent’n: X_i = μ + ∑_{j=1}^{k} c_{ij} v_j, where the c_{ij} ~ N(0, λ_j) are independent
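Aside (not from the slides): a minimal numpy sketch of this simulation approach, with an arbitrary made-up mean, orthonormal eigenvectors, and eigenvalues; the sample covariance of the simulated data is checked against V diag(λ) Vᵗ.

```python
import numpy as np

rng = np.random.default_rng(7)
d, n = 3, 1000

mu = np.array([1.0, -2.0, 0.5])                        # given mean vector
V, _ = np.linalg.qr(rng.normal(size=(d, d)))           # given orthonormal eigenvectors (columns)
lam = np.array([4.0, 1.0, 0.25])                       # given eigenvalues (decreasing)

# Invert the PCA representation: X_i = mu + sum_j c_ij v_j with c_ij ~ N(0, lambda_j)
C = np.sqrt(lam)[:, None] * rng.normal(size=(d, n))    # scores, one row per component
X = mu[:, None] + V @ C                                # d x n simulated Gaussian data

# Check: sample covariance is close to V diag(lam) V^t
Sigma_hat = np.cov(X)                                  # rows = variables, columns = observations
print(np.round(Sigma_hat - V @ np.diag(lam) @ V.T, 2))
```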
PCA & Graphical Displays Small caution on PC directions & plotting: PCA directions (may) have sign flip Mathematically no difference Numerically caused artifact of round off Can have large graphical impact
PCA & Graphical Displays Toy Example (2 colored “clusters” in data)
PCA & Graphical Displays Toy Example (1 point moved)
PCA & Graphical Displays Toy Example (1 point moved) Important Point: Constant Axes
PCA & Graphical Displays Original Data (arbitrary PC flip)
PCA & Graphical Displays Point Moved Data (arbitrary PC flip) Much Harder To See Moving Point
PCA & Graphical Displays How to “fix directions”? One Option: Use the ±1 flip that gives: max_{i=1,⋯,n} Proj X_i > |min_{i=1,⋯,n} Proj X_i| (assumes 0 centered)
PCA & Graphical Displays How to “fix directions”? Personal Current Favorite: Use the ±1 flip that makes the projection vector v = (v_1, ⋯, v_d)ᵗ “point most towards” (1, ⋯, 1)ᵗ, i.e. makes ∑_{j=1}^{d} v_j > 0
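Aside (not from the slides): a small Python sketch of a deterministic sign-fixing helper following the second convention above; the function name and interface are made up for illustration.

```python
import numpy as np

def fix_sign(v, scores=None):
    """Pick the +/-1 flip of an eigenvector (and optionally its scores) deterministically.

    Convention sketched above: flip so the loadings "point most towards" the
    all-ones vector, i.e. so that sum_j v_j > 0.
    """
    flip = 1.0 if v.sum() > 0 else -1.0
    if scores is None:
        return flip * v
    return flip * v, flip * scores

rng = np.random.default_rng(8)
v = rng.normal(size=10)
v = v / np.linalg.norm(v)
print(fix_sign(v).sum() > 0)      # True after the flip, regardless of the original sign
```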
Alternate PCA Computation Issue: for HDLSS data (recall 𝑑>𝑛) Σ May be Quite Large, 𝑑×𝑑 Thus Slow to Work with, and to Compute What About a Shortcut? Approach: Singular Value Decomposition (of (centered, scaled) Data Matrix 𝑋 )
Review of Linear Algebra (Cont.) Recall SVD Full Representation: X (d×n) = U (d×d) S (d×n) Vᵗ (n×n) Graphics Display Assumes d > n
Review of Linear Algebra (Cont.) Recall SVD Reduced Representation: X (d×n) = U (d×n) S (n×n) Vᵗ (n×n)
Review of Linear Algebra (Cont.) Recall SVD Compact Representation: X (d×n) = U (d×r) S (r×r) Vᵗ (r×n), where r = rank(X)
Alternate PCA Computation Singular Value Decomposition: X̃ = U S Vᵗ Computational Advantage (for Rank r): Use Compact Form, only need to find U (d×r) (e-vec’s), S (r×r) (s-val’s), Vᵗ (r×n) (scores) Other Components not Useful So can be much faster for d ≫ n
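Aside (not from the slides): a numpy sketch of this HDLSS shortcut on made-up data with d ≫ n: the reduced (compact) SVD of the d×n centered, scaled matrix gives the eigenvalues (squared singular values) without ever forming the d×d covariance matrix.

```python
import numpy as np
import time

rng = np.random.default_rng(9)
d, n = 20000, 50                                      # HDLSS: d >> n
X = rng.normal(size=(d, n))
Xtil = (X - X.mean(axis=1, keepdims=True)) / np.sqrt(n - 1)   # centered, scaled data matrix

t0 = time.time()
U, s, Vt = np.linalg.svd(Xtil, full_matrices=False)   # U: d x n e-vec's, s: s-val's, Vt: n x n
lam_svd = s**2                                        # eigenvalues of Sigma = Xtil Xtil^t
t_svd = time.time() - t0
print("eigenvalues via SVD (top 3):", lam_svd[:3], " time:", round(t_svd, 3), "s")

# Forming and eigen-decomposing the d x d covariance would need ~d^2 memory and far more
# time; the compact SVD only ever touches the d x n matrix.
```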
Alternate PCA Computation Another Variation: Dual PCA Recall Data Matrix Views: 𝑋= 𝑋 11 ⋯ 𝑋 1𝑛 ⋮ ⋱ ⋮ 𝑋 𝑑1 ⋯ 𝑋 𝑑𝑛 𝑑×𝑛 Recall: Matlab & This Course Columns as Data Objects
Alternate PCA Computation Another Variation: Dual PCA Recall Data Matrix Views: 𝑋= 𝑋 11 ⋯ 𝑋 1𝑛 ⋮ ⋱ ⋮ 𝑋 𝑑1 ⋯ 𝑋 𝑑𝑛 𝑑×𝑛 Columns as Data Objects Rows as Data Objects Recall: R & SAS
Alternate PCA Computation Another Variation: Dual PCA Recall Data Matrix Views: 𝑋= 𝑋 11 ⋯ 𝑋 1𝑛 ⋮ ⋱ ⋮ 𝑋 𝑑1 ⋯ 𝑋 𝑑𝑛 𝑑×𝑛 Idea: Keep Both in Mind Columns as Data Objects Rows as Data Objects
Alternate PCA Computation Dual PCA Computation: Same as above, but replace X̃ with X̃ᵗ So can almost replace Σ = X̃ X̃ᵗ with Σ_D = X̃ᵗ X̃ Then use the SVD X̃ = U S Vᵗ to get: Σ_D = X̃ᵗ X̃ = (U S Vᵗ)ᵗ (U S Vᵗ) = V S Uᵗ U S Vᵗ = V S² Vᵗ Note: Same Eigenvalues
Alternate PCA Computation Appears to be cool symmetry: Primal ↔ Dual, Loadings ↔ Scores But, care is needed with the means and the n−1 normalization …
Alternate PCA Computation Terminology: The Dual Covariance Matrix Σ_D = X̃ᵗ X̃ Is Sometimes Called the Gram Matrix
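Aside (not from the slides): a one-check numpy sketch that the d×d covariance-type matrix and the n×n Gram (dual) matrix share the same nonzero eigenvalues, on a made-up stand-in for the centered, scaled data matrix.

```python
import numpy as np

rng = np.random.default_rng(10)
d, n = 500, 40
Xtil = rng.normal(size=(d, n))                     # stand-in for the centered, scaled data matrix

lam_primal = np.linalg.eigvalsh(Xtil @ Xtil.T)[::-1][:n]   # top eigenvalues of d x d Sigma
lam_dual   = np.linalg.eigvalsh(Xtil.T @ Xtil)[::-1]       # eigenvalues of n x n Gram matrix

print(np.allclose(lam_primal, lam_dual))           # True: same nonzero eigenvalues
```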
Functional Data Analysis Recall from Early Class Meeting: Spanish Mortality Data
Functional Data Analysis Interesting Data Set: Mortality Data For Spanish Males (thus can relate to history) Each curve is a single year x coordinate is age Note: Choice made of Data Object (could also study age as curves, x coordinate = time)
Functional Data Analysis Important Issue: What are the Data Objects? Curves (years) : Mortality vs. Age Curves (Ages) : Mortality vs. Year Note: Rows vs. Columns of Data Matrix
Mortality Time Series Recall Improved Coloring: Rainbow Representing Year: Magenta = 1908 Red = 2002
Mortality Time Series Object Space View of Projections Onto PC1 Direction Main Mode Of Variation: Constant Across Ages
Mortality Time Series Shows Major Improvement Over Time (medical technology, etc.) And Change In Age Rounding Blips
Mortality Time Series Object Space View of Projections Onto PC2 Direction 2nd Mode Of Variation: Difference Between 20-45 & Rest
Mortality Time Series Scores Plot Feature (Point Cloud) Space View Connecting Lines Highlight Time Order Good View of Historical Effects
Demography Data Dual PCA Idea: Rows and Columns trade places Terminology: from optimization Insights come from studying “primal” & “dual” problems Machine Learning Terminology: Gram Matrix PCA
Primal / Dual PCA Consider “Data Matrix”
Primal / Dual PCA Consider “Data Matrix” Primal Analysis: Columns are data vectors
Primal / Dual PCA Consider “Data Matrix” Dual Analysis: Rows are data vectors
Demography Data Recall Primal - Raw Data Rainbow Color Scheme Allowed Good Interpretation
Demography Data Dual PCA - Raw Data Hot Metal Color Scheme To Help Keep Primal & Dual Separate
Demography Data Color Code (Ages)
Demography Data Dual PCA - Raw Data Note: Flu Pandemic
Demography Data Dual PCA - Raw Data Note: Flu Pandemic & Spanish Civil War
Demography Data Dual PCA - Raw Data Curves Indexed By Ages 1-95
Demography Data Dual PCA - Raw Data 1st Year of Life Is Dangerous
Demography Data Dual PCA - Raw Data 1st Year of Life Is Dangerous Later Childhood Years Much Improved
Demography Data Dual PCA
Demography Data Dual PCA Years 1908-2002 on Horizontal Axes
Demography Data Dual PCA Note: Hard To See / Interpret Smaller Effects (Lost in Scaling)
Demography Data Dual PCA Choose Axis Limits To Maximize Visible Variation
Demography Data Dual PCA Mean Shows Some History Flu Pandemic Civil War
Demography Data Dual PCA PC1 Shows Mortality Increases With Age
Demography Data Dual PCA PC2 Shows Improvements Strongest For Young
Demography Data Dual PCA This Shows Improvements For All
Demography Data Dual PCA PC3 Shows Automobile Effects Contrast of 20-45 & Rest
Alternate PCA Computation Appears to be cool symmetry: Primal ↔ Dual, Loadings ↔ Scores But, care is needed with the means and the n−1 normalization …
Demography Data Dual PCA Scores Linear Connections Highlight Age Ordering
Demography Data Dual PCA Scores Note PC2 & PC1 Together Show Mortality vs. Age
Demography Data Dual PCA Scores PC2 Captures “Age Rounding”
Demography Data Important Observation: Effects in Primal Scores (resp. Loadings) Appear in Dual Loadings (resp. Scores) (Would Be Exactly True, Except for Centering) (Auto Effects in PC2 & PC3 Show This is Serious)
Primal / Dual PCA Which is “Better”? Same Info, Displayed Differently Here: Prefer Primal, As Indicated by Graphics Quality
Primal / Dual PCA Which is “Better”? In General: Either Can Be Best Try Both and Choose Or Use “Natural Choice” of Data Object
Primal / Dual PCA Important Early Version: BiPlot Display Overlay Primal & Dual PCAs Not Easy to Interpret Gabriel, K. R. (1971)
Cornea Data Early Example: OODA Beyond FDA Recall Interplay: Object Space ↔ Descriptor Space
Cornea Data Cornea: Outer surface of the eye Driver of Vision: Curvature of Cornea Data Objects: Images on the unit disk Radial Curvature as “Heat Map” Special Thanks to K. L. Cohen, N. Tripoli, UNC Ophthalmology
Cornea Data Cornea Data: Raw Data Decompose Into Modes of Variation?
Cornea Data Reference: Locantore, et al (1999) Visualization (generally true for images): More challenging than for curves (since can’t overlay) Instead view sequence of images Harder to see “population structure” (than for curves) So PCA type decomposition of variation is more important
Cornea Data Nature of images (on the unit disk, not usual rectangle) Color is “curvature” Along radii of circle (direction with most effect on vision) Hotter (red, yellow) for “more curvature” Cooler (blue, green) for “less curvature” Descriptor vector is coefficients of Zernike expansion Zernike basis: ~ Fourier basis, on disk Conveniently represented in polar coord’s
Cornea Data Data Representation - Zernike Basis Pixels as features is large and wasteful Natural to find more efficient represent’n Polar Coordinate Tensor Product of: Fourier basis (angular) Special Jacobi (radial, to avoid singularities) See: Schwiegerling, Greivenkamp & Miller (1995) Born & Wolf (1980)
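Aside (not from the slides): a heavily simplified numpy sketch of the descriptor-space idea, fitting least-squares coefficients of a few low-order polar basis functions (a stand-in for the actual Zernike / radial-Jacobi basis, without its exact normalization) to a made-up “curvature image” on the unit disk.

```python
import numpy as np

rng = np.random.default_rng(11)

# Pixel grid on the unit disk (polar coordinates)
r = np.linspace(0.05, 1.0, 40)
theta = np.linspace(0, 2 * np.pi, 80, endpoint=False)
R, T = np.meshgrid(r, theta)

# A few low-order polar basis functions: radial polynomials times Fourier terms in the angle
basis = np.stack([
    np.ones_like(R),          # constant ("piston")
    R * np.cos(T),            # tilt
    R * np.sin(T),            # tilt
    2 * R**2 - 1,             # defocus-like radial term
    R**2 * np.cos(2 * T),     # astigmatism-like term
    R**2 * np.sin(2 * T),     # astigmatism-like term
], axis=-1).reshape(-1, 6)                     # (n_pixels, 6) design matrix

# Fake "curvature image" = a combination of these modes plus pixel noise
true_coef = np.array([44.0, 0.5, -0.2, 1.5, 2.0, 0.3])
image = basis @ true_coef + 0.1 * rng.normal(size=basis.shape[0])

# Descriptor vector = least-squares basis coefficients (the feature vector PCA would then use)
coef, *_ = np.linalg.lstsq(basis, image, rcond=None)
print(np.round(coef, 2))                       # close to true_coef
```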
Cornea Data Data Representation - Zernike Basis Choice of Basis Dimension: Based on Collaborator’s Expertise Large Enough for Important Features Not Too Large to Eliminate Noise
Cornea Data Data Representation - Zernike Basis Descriptor Space is Vector Space of Zernike Coefficients So Perform PCA There Then Visualize in Image (Object) Space
PCA of Cornea Data Recall: PCA can find (often insightful) direction of greatest variability Main problem: display of result (no overlays for images) Solution: show movie of “marching along the direction vector”
PCA of Cornea Data PC1 Movie:
PCA of Cornea Data PC1 Summary: Mean (1st image): mild vert’l astigmatism known pop’n structure called “with the rule” Main dir’n: “more curved” & “less curved” Corresponds to first optometric measure (89% of variat’n, in Mean Resid. SS sense) Also: “stronger astig’m” & “no astig’m” Found corr’n between astig’m and curv’re Scores (cyan): Apparent Gaussian dist’n
PCA of Cornea Data PC2 Movie:
PCA of Cornea Data PC2 Movie: Mean: same as above Common centerpoint of point cloud Are studying “directions from mean” Images along direction vector: Looks terrible??? Why?
PCA of Cornea Data PC2 Movie: Reason made clear in Scores Plot (cyan): Single outlying data object drives PC dir’n A known problem with PCA Recall finds direction with “max variation” In sense of variance Easily dominated by single large observat’n
PCA of Cornea Data Toy Example: Single Outlier Driving PCA
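Aside (not from the slides): a tiny numpy version of such a toy example, showing a single far-away point grabbing the PC1 direction.

```python
import numpy as np

rng = np.random.default_rng(12)
n = 50
X = np.vstack([rng.normal(size=n) * 3, rng.normal(size=n) * 0.5])   # 2 x n; PC1 ~ x-axis

def pc1(X):
    Xc = X - X.mean(axis=1, keepdims=True)
    return np.linalg.svd(Xc)[0][:, 0]          # leading left singular vector = PC1 direction

print("PC1 without outlier:", np.round(pc1(X), 2))

X_out = np.column_stack([X, [0.0, 100.0]])     # add one far-away point in the y direction
print("PC1 with one outlier:", np.round(pc1(X_out), 2))
# The single outlier dominates the variance, so PC1 rotates toward it
```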
PCA of Cornea Data PC2 Affected by Outlier: How bad is this problem? View 1: Statistician: Arrggghh!!!! Outliers are very dangerous Can give arbitrary and meaningless dir’ns
PCA of Cornea Data PC2 Affected by Outlier: How bad is this problem? View 2: Ophthalmologist: No Problem Driven by “edge effects” (see raw data) Artifact of “light reflection” data gathering (“eyelid blocking”, and drying effects) Routinely “visually ignore” those anyway Found interesting (& well known) dir’n: steeper superior vs steeper inferior
Cornea Data Cornea Data: Raw Data Which one is the outlier? Will say more later …
PCA of Cornea Data PC3 Movie
PCA of Cornea Data PC3 Movie (ophthalmologist’s view): Edge Effect Outlier is present But focusing on “central region” shows changing dir’n of astig’m (3% of MR SS) “with the rule” (vertical) vs. “against the rule” (horizontal) most astigmatism is “with the rule” most of rest is “against the rule” (known folklore)
PCA of Cornea Data PC4 movie
PCA of Cornea Data Continue with ophthalmologists view… PC4 movie version: Other direction of astigmatism??? Location (i.e. “registration”) effect??? Harder to interpret … OK, since only 1.7% of MR SS Substantially less than for PC2 & PC3
PCA of Cornea Data Ophthalmologists View (cont.) Overall Impressions / Conclusions: Useful decomposition of population variation Useful insight into population structure
PCA of Cornea Data Now return to Statistician’s View: How can we handle these outliers? Even though not fatal here, can be for other examples… Simple Toy Example (in 2d):
Outliers in PCA Deeper Toy Example:
Outliers in PCA Deeper Toy Example: Why is green curve an outlier? Never leaves range of other data But Euclidean distance to others very large relative to other distances Also major difference in terms of shape And even smoothness Important lesson: ∃ many directions in ℝ 𝑑
Outliers in PCA Much like earlier Parabolas Example But with an outlier thrown in
Outliers in PCA PCA for Deeper Toy E.g. Data:
Outliers in PCA Deeper Toy Example: At first glance, mean and PC1 look similar to no outlier version PC2 clearly driven completely by outlier PC2 scores plot (on right) gives clear outlier diagnostic Outlier does not appear in other directions Previous PC2, now appears as PC3 Total Power (upper right plot) now “spread farther”
Outliers in PCA Closer Look at Deeper Toy Example: Mean “influenced” a little, by the outlier Appearance of “corners” at every other coordinate PC1 substantially “influenced” by the outlier Clear “wiggles”
Outliers in PCA What can (should?) be done about outliers? Context 1: Outliers are important aspects of the population They need to be highlighted in the analysis Although could separate into subpopulations Context 2: Outliers are “bad data”, of no interest recording errors? Other mistakes? Then should avoid distorted view of PCA
Outliers in PCA Two Differing Goals for Outliers: Avoid Major Influence on Analysis Find Interesting Data Points (e.g. In-liers) Wilkinson (2017)
Outliers in PCA Standard Statistical Approaches to Dealing with Influential Outliers: Outlier Deletion: Kick out “bad data” Robust Statistical methods: Work with full data set, but downweight “bad data” Reduce influence, instead of “deleting” (Think Median)
Outliers in PCA Example Cornea Data: Can find PC2 outlier (by looking through data (careful!)) Problem: after removal, another point dominates PC2 Could delete that, but then another appears After 4th step have eliminated 10% of data (𝑛=43)
Outliers in PCA Example Cornea Data
Outliers in PCA Motivates alternate approach: Robust Statistical Methods Recall main idea: Downweight (instead of delete) outliers ∃ a large literature. Good intro’s (from different viewpoints) are: Huber (2011) Hampel, et al (2011) Staudte & Sheather (2011)
Outliers in PCA Simple robustness concept: breakdown point: how much of the data “moved to ∞” will “destroy the estimate”? Usual mean has breakdown 0 Median has breakdown ½ (best possible) Conclude: Median much more robust than mean Median uses all data Median gets good breakdown from “equal vote”
Outliers in PCA Mean has breakdown 0 Single Outlier Pulls Mean Outside range of data
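Aside (not from the slides): a two-line numpy illustration of the breakdown contrast: one wild observation drags the mean far outside the range of the good data, while the median barely moves.

```python
import numpy as np

x = np.concatenate([np.random.default_rng(13).normal(size=99), [1e6]])   # one wild observation

print("mean  :", x.mean())      # dragged far outside the range of the good data
print("median:", np.median(x))  # essentially unchanged: breakdown 1/2 vs. 0 for the mean
```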
Outliers in PCA Controversy: Is median’s “equal vote” scheme good or bad? Huber: Outliers contain some information, So should only control “influence” (e.g. median) Hampel, et al.: Outliers contain no useful information Should be assigned weight 0 (not done by median) Using “proper robust method” (not simply deleted)
Outliers in PCA Robustness Controversy (cont.): Both are “right” (depending on context) Source of major (unfortunately bitter) debate! Application to Cornea data: Huber’s model more sensible Already know ∃ some useful info in each data point Thus “median type” methods are sensible
Robust PCA What is multivariate median? There are several! (“median” generalizes in different ways) i. Coordinate-wise median Often worst Not rotation invariant (2-d data uniform on “L”) Can lie on convex hull of data (same example) Thus poor notion of “center”
Robust PCA Coordinate-wise median Not rotation invariant Thus poor notion of “center”
Robust PCA Coordinate-wise median Can lie on convex hull of data Thus poor notion of “center”
Robust PCA What is multivariate median (cont.)? ii. Simplicial depth (a. k. a. “data depth”): Liu (1990) “Paint Thickness” of 𝑑+1 dim “simplices” with corners at data Nice idea Good invariance properties Slow to compute
Robust PCA What is multivariate median (cont.)? iii. Huber’s L_p M-estimate: Given data X_1, ⋯, X_n ∈ ℝ^d, Estimate “center of population” by θ̂ = argmin_θ ∑_{i=1}^{n} ‖X_i − θ‖_2^p, Where ‖·‖_2 is the usual Euclidean norm Here: use only p = 1 (minimal impact by outliers)
Robust PCA Huber’s L_p M-estimate (cont.): Estimate “center of population” by θ̂ = argmin_θ ∑_{i=1}^{n} ‖X_i − θ‖_2^p Case p = 2: Can show θ̂ = X̄ (sample mean) (also called “Fréchet Mean”, …) Again Here: use only p = 1 (minimal impact by outliers)
Robust PCA L_1 M-estimate (cont.): A view of the minimizer: solution of 0 = (∂/∂θ) ∑_{i=1}^{n} ‖X_i − θ‖_2, i.e. 0 = ∑_{i=1}^{n} (X_i − θ) / ‖X_i − θ‖_2 A useful viewpoint is based on: P_{Sph(θ,1)} = “Proj’n of data onto the sphere centered at θ with radius 1” And the representation: P_{Sph(θ,1)} X_i = θ + (X_i − θ) / ‖X_i − θ‖_2
Robust PCA L_1 M-estimate (cont.): Thus the solution of 0 = ∑_{i=1}^{n} (X_i − θ) / ‖X_i − θ‖_2 = ∑_{i=1}^{n} (P_{Sph(θ,1)} X_i − θ) is the solution of: 0 = avg{ P_{Sph(θ,1)} X_i − θ : i = 1, ⋯, n } So θ̂ is the location where the projected data are centered: “Slide the sphere around until the mean (of the projected data) is at its center”
Robust PCA 𝐿 1 M-estimate (cont.): Data are + signs
Robust PCA M-estimate (cont.): Data are + signs Sample Mean, 𝑋 outside “hot dog” of data
Robust PCA M-estimate (cont.): Candidate Sphere Center, 𝜃
Robust PCA M-estimate (cont.): Candidate Sphere Center, 𝜃 Projections Of Data
Robust PCA M-estimate (cont.): Candidate Sphere Center, θ Projections Of Data Mean of Projected Data
Robust PCA M-estimate (cont.): “Slide sphere around until mean (of projected data) is at center”
Robust PCA M-estimate (cont.): Additional literature: Called “geometric median” (long before Huber) by: Haldane (1948) Shown unique for d > 1 by: Milasevic and Ducharme (1987) Useful iterative algorithm: Gower (1974) (see also Sec. 3.2 of Huber (2011)) Cornea Data experience: works well for d = 66
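Aside (not from the slides): a numpy sketch of a Weiszfeld-type fixed-point iteration for the geometric median, in the spirit of the “slide the sphere until the projected data are centered” description above; this is an illustrative sketch, not the specific algorithm of Gower (1974).

```python
import numpy as np

def geometric_median(X, n_iter=200, tol=1e-8):
    """L1 M-estimate of center (geometric median) of the columns of X (d x n).

    Fixed-point iteration on the condition that the data projected onto the unit
    sphere around theta average to zero (a Weiszfeld-type update).
    """
    theta = np.median(X, axis=1)                       # robust starting point
    for _ in range(n_iter):
        diff = X - theta[:, None]
        dist = np.linalg.norm(diff, axis=0)
        dist = np.maximum(dist, 1e-12)                 # guard against hitting a data point exactly
        w = 1.0 / dist
        theta_new = (X * w).sum(axis=1) / w.sum()      # weighted mean = Weiszfeld update
        if np.linalg.norm(theta_new - theta) < tol:
            return theta_new
        theta = theta_new
    return theta

rng = np.random.default_rng(14)
X = rng.normal(size=(2, 100))
X[:, 0] = [500.0, 500.0]                               # one gross outlier
print("mean            :", np.round(X.mean(axis=1), 2))
print("geometric median:", np.round(geometric_median(X), 2))   # barely moved by the outlier
```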
Robust PCA M-estimate for Cornea Data: Sample Mean M-estimate Definite improvement But outliers still have some influence Improvement? (will suggest one soon)
Robust PCA Now have robust measure of “center”, how about “spread”? I.e. how can we do robust PCA?
Robust PCA Now have robust measure of “center”, how about “spread”? Parabs e.g. from above With an “outlier” (???) Added in
Robust PCA Now have robust measure of “center”, how about “spread”? Small Impact on Mean
Robust PCA Now have robust measure of “center”, how about “spread”? Small Impact on Mean More on PC1 Dir’n
Robust PCA Now have robust measure of “center”, how about “spread”? Small Impact on Mean More on PC1 Dir’n Dominates Residuals Thus PC2 Dir’n & PC2 scores
Robust PCA Now have robust measure of “center”, how about “spread”? Small Impact on Mean More on PC1 Dir’n Dominates Residuals Thus PC2 Dir’n & PC2 scores Tilt now in PC3 Visualization is very Useful diagnostic
Robust PCA Now have robust measure of “center”, how about “spread”? can we do robust PCA?