Object Orie’d Data Analysis, Last Time PCA Redistribution of Energy - ANOVA PCA Data Representation PCA Simulation Alternate PCA Computation Primal – Dual.


Object Orie’d Data Analysis, Last Time
- PCA Redistribution of Energy (ANOVA)
- PCA Data Representation
- PCA Simulation
- Alternate PCA Computation
- Primal – Dual PCA vs. SVD (centering by means is key)

Primal - Dual PCA
Toy Example 3: Random Curves, all in Dual Space:
- 1 × Constant Shift
- 2 × Linear
- 4 × Quadratic
- 8 × Cubic
(chosen to be orthonormal)
Plus (small) i.i.d. Gaussian noise; d = 40, n = 20
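A data set like this toy example can be simulated in a few lines of NumPy; the grid, the noise standard deviation (0.1), and the random seed are assumed values not given on the slide:

```python
import numpy as np

rng = np.random.default_rng(0)
d, n = 40, 20  # d grid points per curve, n curves (as on the slide)

# Orthonormal polynomial basis on the grid: constant, linear, quadratic, cubic
t = np.linspace(-1, 1, d)
basis, _ = np.linalg.qr(np.vstack([t**0, t, t**2, t**3]).T)  # shape (d, 4)

# Random coefficients with strengths 1, 2, 4, 8 (constant .. cubic),
# plus small i.i.d. Gaussian noise (sd 0.1 is an assumed value)
scales = np.array([1.0, 2.0, 4.0, 8.0])
X = basis @ (scales[:, None] * rng.standard_normal((4, n))) \
    + 0.1 * rng.standard_normal((d, n))  # each column is one random curve
```

Each column of X is then one "curve in dual space", a random combination of the four orthonormal polynomials.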

Primal - Dual PCA Toy Example 3: Raw Data

Primal - Dual PCA
Toy Example 3: Raw Data
- Similar structure to Example 1
- But rows and columns trade places
- And now cubics are visually dominant (as expected)

Primal - Dual PCA
Toy Example 3: Primal PCA
- Column curves as data
- Gaussian noise
- Only 3 components
- Poly scores (as expected)

Primal - Dual PCA
Toy Example 3: Dual PCA
- Row curves as data
- Components as expected
- No Gram-Schmidt (since stronger signal)

Primal - Dual PCA Toy Example 3: SVD – Matrix-Image

Primal - Dual PCA Toy Example 4: Mystery #1

Primal - Dual PCA Toy Example 4: SVD – Curves View

Primal - Dual PCA Toy Example 4: SVD – Matrix-Image

Primal - Dual PCA
Toy Example 4: Mystery #1 Structure:
    Primal      Dual
    Constant    Gaussian
    Gaussian    Linear
    Parabola    Gaussian
    Gaussian    Cubic
Nicely revealed by full matrix decomposition and views

Primal - Dual PCA Toy Example 5: Mystery #2

Primal - Dual PCA Toy Example 5: SVD – Curves View

Primal - Dual PCA Toy Example 5: SVD – Matrix-Image

Primal - Dual PCA
Toy Example 5: Mystery #2 Structure:
    Primal      Dual
    Constant    Linear
    Parabola    Cubic
    Gaussian    Gaussian
Visible via either curves, or matrices …

Primal - Dual PCA
Is SVD (i.e. no mean centering) always “better”?
What does “better” mean???
A definition: provides the most useful insights into the data.
Others???

Primal - Dual PCA
Toy Example where SVD is less informative:
- Simple, two dimensional
- Key is that skipping the subtraction of the mean is bad
- I.e. the mean direction is different from the PC directions
- And the mean is less informative

Primal - Dual PCA Toy Example where SVD is less informative: Raw Data

Primal - Dual PCA PC1 mode of variation (centered at mean): Yields useful major mode of variation

Primal - Dual PCA PC2 mode of variation (centered at mean): Informative second mode of variation

Primal - Dual PCA SV1 mode of variation (centered at 0): Unintuitive major mode of variation

Primal - Dual PCA SV2 mode of variation (centered at 0): Unintuitive second mode of variation

Primal - Dual PCA
Summary of SVD:
- Does give a decomposition (i.e. the sum of the two pieces is the data)
- But not good insights about data structure
- Since the center point of the analysis is far from the center point of the data
- So the mean strongly influences the impression of variation
- Maybe better to keep these separate???
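The contrast between centered PCA and uncentered SVD can be sketched with a small NumPy experiment; the two-dimensional data, the size of the mean offset, and the noise model are hypothetical choices for illustration, not the exact example in the slides:

```python
import numpy as np

rng = np.random.default_rng(1)
# Hypothetical 2-D data: large mean offset, variation along a direction
# orthogonal to the mean (the bad case for uncentered SVD)
n = 200
mean = np.array([10.0, 10.0])
direction = np.array([1.0, -1.0]) / np.sqrt(2)
X = mean + rng.standard_normal((n, 1)) * direction  # shape (n, 2)

# PCA: SVD of the mean-centered data
_, _, Vt_pca = np.linalg.svd(X - X.mean(axis=0), full_matrices=False)
pc1 = Vt_pca[0]

# "SVD analysis": SVD of the raw, uncentered data
_, _, Vt_svd = np.linalg.svd(X, full_matrices=False)
sv1 = Vt_svd[0]

# PC1 recovers the true mode of variation ...
cos_pc1 = abs(pc1 @ direction)
# ... while SV1 mostly points at the mean, not at the variation
cos_sv1 = abs(sv1 @ (mean / np.linalg.norm(mean)))
```

Here cos_pc1 is essentially 1 (PC1 aligns with the variation), while SV1 aligns with the mean direction instead, matching the "unintuitive" SV modes shown above.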

Primal - Dual PCA
Bottom line on Primal PCA vs. SVD vs. Dual PCA:
- These are not comparable
- Each has situations where it is “best”, and where it is “worst”
- Generally should consider all, and choose on the basis of insights
- See the work of Lingsong Zhang on this …

Vectors vs. Functions
Recall overall structure:
    Object Space          Feature Space
    Curves (functions)    Vectors
Connection 1: Digitization / Parallel Coordinates
Connection 2: Basis Representation

Vectors vs. Functions
Connection 1: Digitization:
Given a function f, define the vector X = (f(t_1), …, f(t_d)), where t_1, …, t_d is a suitable grid, e.g. equally spaced.
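Digitization is a one-liner in NumPy; the grid size and interval below are illustrative choices:

```python
import numpy as np

# Digitization: turn a function into a vector by sampling on a grid.
def digitize(f, d=40, a=0.0, b=1.0):
    """Return the vector (f(t_1), ..., f(t_d)) on an equally spaced grid."""
    return f(np.linspace(a, b, d))

x = digitize(np.sin, d=40, a=0.0, b=2 * np.pi)  # a d=40 vector version of sin
```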

Vectors vs. Functions
Connection 1: Parallel Coordinates:
Given a vector X = (x_1, …, x_d), define a function f with f(j) = x_j, and linearly interpolate to “connect the dots”.
Proposed as a high dimensional visualization method by Inselberg (1985).

Vectors vs. Functions
Parallel Coordinates:
Given X = (x_1, …, x_d), define f(t_j) = x_j. Now can “rescale argument”, e.g. t_j = (j − 1)/(d − 1), to get a function on [0,1], evaluated at an equally spaced grid.
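The parallel-coordinates view of a vector can be sketched as a small function factory:

```python
import numpy as np

# Parallel coordinates: view a vector x as the function with f(t_j) = x_j
# on the rescaled grid t_j = (j - 1)/(d - 1) in [0, 1], with linear
# interpolation to "connect the dots".
def parallel_coordinates(x):
    x = np.asarray(x, dtype=float)
    t = np.arange(len(x)) / (len(x) - 1)  # equally spaced grid on [0, 1]
    return lambda s: np.interp(s, t, x)

f = parallel_coordinates([3.0, 1.0, 2.0])  # f(0)=3, f(0.5)=1, f(1)=2
```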

Vectors vs. Functions
Bridge between vectors & functions: Vectors ↔ Functions
Isometry follows from the convergence of inner products, by Riemann summation:
(1/d) Σ_j f(t_j) g(t_j) → ∫ f(t) g(t) dt
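The Riemann-summation bridge can be checked numerically; the test function and grid size here are illustrative:

```python
import numpy as np

# Riemann summation: the scaled Euclidean inner product of digitized
# functions approximates the L2 inner product of the functions.
def discrete_inner(f, g, d):
    t = (np.arange(d) + 0.5) / d      # midpoint grid on [0, 1]
    return np.mean(f(t) * g(t))       # (1/d) * sum_j f(t_j) g(t_j)

# integral of sin(2 pi t)^2 over [0, 1] is 1/2
s = lambda t: np.sin(2 * np.pi * t)
approx = discrete_inner(s, s, 1000)
```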

Vectors vs. Functions
Main lesson:
- OK to think about functions
- But actually work with vectors
For me, there is little difference. But there is a statistical theory, and a mathematical statistical literature, on this. Start with Ramsay & Silverman (2005).

Vectors vs. Functions
Recall overall structure:
    Object Space          Feature Space
    Curves (functions)    Vectors
Connection 1: Digitization / Parallel Coordinates
Connection 2: Basis Representation

Vectors vs. Functions
Connection 2: Basis Representations:
Given an orthonormal basis {φ_1, φ_2, …} (in function space), e.g.:
– Fourier
– B-spline
– Wavelet
Represent functions as: f = Σ_j c_j φ_j

Vectors vs. Functions
Connection 2: Basis Representations:
Represent functions as: f = Σ_j c_j φ_j, where c_j = ⟨f, φ_j⟩
Bridge between discrete and continuous: the coefficient vector (c_1, c_2, …) ↔ the function f

Vectors vs. Functions
Connection 2: Basis Representations:
Represent functions as: f = Σ_j c_j φ_j
Finite dimensional approximation: f ≈ Σ_{j=1}^{d} c_j φ_j
Again there is mathematical statistical theory for this (same ref.)
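A sketch of the basis-representation bridge in NumPy, using a Fourier basis on [0,1]; the example function, grid size, and truncation level d = 9 are illustrative assumptions:

```python
import numpy as np

# Orthonormal Fourier basis on [0, 1]: 1, sqrt(2) cos(2 pi k t),
# sqrt(2) sin(2 pi k t), ...  evaluated on a fine grid.
def fourier_basis(t, d):
    cols = [np.ones_like(t)]
    k = 1
    while len(cols) < d:
        cols.append(np.sqrt(2) * np.cos(2 * np.pi * k * t))
        if len(cols) < d:
            cols.append(np.sqrt(2) * np.sin(2 * np.pi * k * t))
        k += 1
    return np.column_stack(cols)

m = 2000                          # fine grid for the Riemann sums
t = (np.arange(m) + 0.5) / m
f = t * (1 - t)                   # example function on [0, 1]
Phi = fourier_basis(t, d=9)
c = Phi.T @ f / m                 # c_j = <f, phi_j> by Riemann summation
f_hat = Phi @ c                   # finite dimensional approximation
err = np.max(np.abs(f - f_hat))
```

The coefficient vector c is the finite dimensional stand-in for the function f, and f_hat shows how little is lost by truncating.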

Vectors vs. Functions
Repeat of main lesson:
- OK to think about functions
- But actually work with vectors
For me, there is little difference (but that is only personal taste).

PCA for shapes
New Data Set: Corpus Callosum Data
- “Window” between the right and left halves of the brain
- From a vertical-slice MR image of the head
- “Segmented” (i.e. found the boundary)
- Shape is the resulting closed curve
- Have a sample from n = 71 people
- Feature vector of d = 80 coefficients from a Fourier boundary representation (closed curve)
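The Fourier boundary representation can be sketched as follows; the slides do not spell out the exact parameterization, so this uses a common complex Fourier-descriptor scheme, with an ellipse standing in for a segmented boundary:

```python
import numpy as np

# Complex Fourier descriptors of a closed planar curve: treat the boundary
# as z(t) = x(t) + i y(t), periodic in t, and keep the lowest frequencies.
# Which, and how many, coefficients to keep is an assumption here.
def fourier_boundary_features(x, y, n_coef=80):
    z = np.asarray(x) + 1j * np.asarray(y)
    coef = np.fft.fft(z) / len(z)
    half = n_coef // 2
    kept = np.concatenate([coef[:half], coef[-half:]])  # low frequencies
    return np.concatenate([kept.real, kept.imag])       # real feature vector

# ellipse standing in for one segmented boundary curve
theta = np.linspace(0, 2 * np.pi, 256, endpoint=False)
feat = fourier_boundary_features(2 * np.cos(theta), np.sin(theta), n_coef=40)
```

Stacking one such feature vector per person gives the n × d data matrix that the PCA below operates on.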

PCA for shapes
Raw Data:
- Special thanks to Sean Ho
- View curves as movie
- Modes of variation?

PCA for shapes
PC1:
- Movie shows evolution along the eigenvector
- Projections in bottom plot
- 2 data subclasses: Schizophrenics and Controls

PCA for shapes
PC1 Summary (Corpus Callosum Data):
- Direction is “overall bending”
- Colors studied later (subpopulations)
- An outlier??? Find it in the data?
- Case 2: could delete & repeat (will study outliers in more detail)
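The “movie along an eigenvector” shown above is just the mean shape plus multiples of a PC direction. A minimal sketch with random stand-in data (the real corpus callosum feature vectors are not reproduced here):

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.standard_normal((71, 80))       # stand-in for n = 71 vectors, d = 80

mean = X.mean(axis=0)
# principal directions from the SVD of the centered data matrix
_, s, Vt = np.linalg.svd(X - mean, full_matrices=False)
v1 = Vt[0]                              # PC1 direction (unit vector)
sd1 = s[0] / np.sqrt(X.shape[0] - 1)    # s.d. of the PC1 scores

# frames of the PC1 "movie": mean swept from -2 to +2 s.d. along v1
frames = [mean + t * sd1 * v1 for t in np.linspace(-2, 2, 9)]
```

Rendering each frame back as a boundary curve gives the mode-of-variation movie.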

PCA for shapes
Raw Data: this time with numbers, so can identify the outlier.

PCA for shapes
PC2:
- Movie shows evolution along the eigenvector
- Projections in bottom plot

PCA for shapes
PC2 Summary (Corpus Callosum Data):
- Rotation of right end
- “Sharpening” of left end
- “Location” of left end
- These are correlated with each other, but independent of PC1

PCA for shapes
PC3: Thin vs. fat. An important mode of variation?

PCA for shapes
Raw Data: revisit to look for the 3 modes:
- Bending
- Endpoints
- Thinning

PCA for shapes
Raw Data: Medial Representation
- Heart is the medial atoms
- Spokes imply the boundary
- Modes of variation?

PCA for shapes
PC1 Summary (medial representation):
- From the same data as the Fourier boundary representation above
- But they look different, since a different type of fitting was done
- Also, the worst outlier was deleted
- Modes of variation?

PCA for shapes
PC1: Overall bending
- Same as for Fourier above
- Correlated with right-end fattening

PCA for shapes
PC2: Rotation of ends. Similar to PC2 of the Fourier representation above.

PCA for shapes
PC3: Distortion of curvature. Different from PC2 of the Fourier representation above.

PCA for shapes
PC3 Summary (medial representation):
- Systematic “distortion of curvature”
- This time different from the Fourier boundary PC3 above
- Lesson: different representations focus on different aspects of the data
- I.e. not just differences in fitting, but in the features that are emphasized
- Thus the choice of “features” is very important

PCA for shapes
PC4: Fattening and thinning? Relate to the Fourier representation???

PCA for shapes
PC4 Summary (medial representation):
- More like fattening and thinning
- I.e. similar to Fourier boundary PC3 (view again below)
- But “more local” in nature
- An important property of M-reps

PCA for shapes
PC3: Review this for comparison with PC4 from M-reps.

Continuous vs. discrete: need to say something about this.