Object Orie’d Data Analysis, Last Time

Object Orie’d Data Analysis, Last Time
Classification / Discrimination
– Try to Separate Classes +1 & -1
– Statistics & EECS viewpoints
Introduced Simple Methods
– Mean Difference
– Naïve Bayes
– Fisher Linear Discrimination (nonparametric view)
– Gaussian Likelihood Ratio
Started Comparing

Classification - Discrimination
Important Distinction: Classification vs. Clustering
Useful terminology:
– Classification: supervised learning
– Clustering: unsupervised learning

Fisher Linear Discrimination
Graphical Introduction (non-Gaussian) [figure: HDLSSod1egFLD.ps]

Classical Discrimination
FLD for Tilted Point Clouds – Works well [figure: PEod1FLDe1.ps]

Classical Discrimination
GLR for Tilted Point Clouds – Works well [figure: PEod1GLRe1.ps]

Classical Discrimination
FLD for Donut – Poor, no plane can work [figure: PEdonFLDe1.ps]

Classical Discrimination
GLR for Donut – Works well (good quadratic) [figure: PEdonGLRe1.ps]

Classical Discrimination
FLD for X – Poor, no plane can work [figure: PExd3FLDe1.ps]

Classical Discrimination
GLR for X – Better, but not great [figure: PExd3GLRe1.ps]

Classical Discrimination
Summary of FLD vs. GLR:
– Tilted Point Clouds Data: FLD good, GLR good
– Donut Data: FLD bad, GLR good
– X Data: FLD bad, GLR OK (not great)
Classical Conclusion: GLR generally better
(will see a different answer for HDLSS data)
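The contrast above can be reproduced with off-the-shelf tools: under Gaussian class models, the likelihood ratio with a shared covariance gives a linear rule (FLD/LDA), while unequal covariances give a quadratic rule (QDA). Below is a minimal Python sketch on a synthetic "donut"; the data-generating choices (radii, noise levels, scikit-learn as the implementation) are illustrative, not taken from the slides.

```python
# Sketch: compare a linear rule (FLD/LDA) with a quadratic rule (Gaussian
# likelihood ratio with unequal covariances, i.e. QDA) on a toy "donut".
# The data-generating details here are illustrative only.
import numpy as np
from sklearn.discriminant_analysis import (
    LinearDiscriminantAnalysis, QuadraticDiscriminantAnalysis)

rng = np.random.default_rng(0)
n = 200

# Class -1: tight Gaussian blob at the origin (the "hole" of the donut)
X_neg = rng.normal(scale=0.5, size=(n, 2))
# Class +1: noisy ring of radius 3 around the blob
theta = rng.uniform(0.0, 2.0 * np.pi, n)
X_pos = np.column_stack([3.0 * np.cos(theta), 3.0 * np.sin(theta)])
X_pos += rng.normal(scale=0.3, size=(n, 2))

X = np.vstack([X_neg, X_pos])
y = np.repeat([-1, 1], n)

for clf in (LinearDiscriminantAnalysis(), QuadraticDiscriminantAnalysis()):
    clf.fit(X, y)
    print(type(clf).__name__, "training accuracy:", clf.score(X, y))
# Expect the linear rule (FLD-style) to be near chance, since no separating
# plane exists, and the quadratic rule (Gaussian likelihood ratio) to be
# close to perfect.
```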

Classical Discrimination
FLD Generalization II (Gen. I was GLR):
Different prior probabilities
Main idea: give different weights to the 2 classes
– I.e. assume they are not a priori equally likely
Development is "straightforward":
– Modified likelihood
– Change intercept in FLD
Won't explore further here
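As a rough illustration of the "change intercept" point: for Gaussian classes with a shared covariance, the class priors enter the rule only through a log-odds shift of the cutoff, leaving the FLD direction unchanged. The sketch below assumes that model; the prior values and toy data are arbitrary.

```python
# Sketch: unequal class priors only move the FLD/LDA cutoff, not the
# direction.  Assumes Gaussian classes with a shared covariance matrix;
# the priors (0.8 / 0.2) below are arbitrary illustrative values.
import numpy as np

def fld_rule(X_pos, X_neg, prior_pos=0.5):
    """Return (direction w, cutoff c): classify x as +1 when w @ x > c."""
    mu_pos, mu_neg = X_pos.mean(axis=0), X_neg.mean(axis=0)
    # Pooled within-class covariance (equal class sizes assumed here)
    Sigma = (np.cov(X_pos, rowvar=False) + np.cov(X_neg, rowvar=False)) / 2
    w = np.linalg.solve(Sigma, mu_pos - mu_neg)    # FLD direction
    c = w @ (mu_pos + mu_neg) / 2                  # equal-prior cutoff (midpoint)
    c -= np.log(prior_pos / (1 - prior_pos))       # prior adjustment of intercept
    return w, c

rng = np.random.default_rng(1)
X_pos = rng.normal(loc=+1.0, size=(100, 3))
X_neg = rng.normal(loc=-1.0, size=(100, 3))
w_eq, c_eq = fld_rule(X_pos, X_neg, prior_pos=0.5)
w_sk, c_sk = fld_rule(X_pos, X_neg, prior_pos=0.8)
print(np.allclose(w_eq, w_sk), c_eq, c_sk)  # same direction, shifted cutoff
```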

Classical Discrimination
FLD Generalization III: Principal Discriminant Analysis
Idea: FLD-like approach to more than two classes
Assumption: class covariance matrices are the same (or similar)
(Gaussianity not assumed, same situation as for FLD)
Main idea: quantify "location of classes" by their means

Classical Discrimination
Principal Discriminant Analysis (cont.)
Simple way to find "interesting directions" among the means:
PCA on the set of class means,
i.e. eigen-analysis of the "between-class covariance matrix"
Sigma_B = sum_j (n_j / n) (xbar_j - xbar)(xbar_j - xbar)^T,
where xbar_j is the mean of class j and xbar is the overall mean
Aside: can show the overall covariance decomposes as
Sigma = Sigma_B + Sigma_W (between-class + within-class)

Classical Discrimination
Principal Discriminant Analysis (cont.)
But PCA on the means only works like Mean Difference;
expect we can improve by taking the covariance into account.
Blind application of the above ideas suggests eigen-analysis of:
Sigma_W^{-1} Sigma_B (within-class inverse times between-class covariance)

Classical Discrimination
Principal Discriminant Analysis (cont.)
There are:
– smarter ways to compute (a "generalized eigenvalue" problem)
– other representations (this solves optimization problems)
Special case: for 2 classes, reduces to standard FLD
Good reference for more: Section 3.8 of Duda, Hart & Stork (2001)
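For concreteness, a small sketch of the "generalized eigenvalue" route: solve Sigma_B v = lambda Sigma_W v for the between- and within-class covariances rather than forming Sigma_W^{-1} Sigma_B explicitly. The weighting convention, variable names, and toy data below are illustrative choices, not the slides' own code.

```python
# Sketch: discriminant directions for K classes via the generalized
# eigenproblem  Sigma_B v = lambda Sigma_W v.
import numpy as np
from scipy.linalg import eigh

def discriminant_directions(X, y):
    classes = np.unique(y)
    n, d = X.shape
    xbar = X.mean(axis=0)
    Sigma_B = np.zeros((d, d))   # between-class covariance
    Sigma_W = np.zeros((d, d))   # within-class covariance
    for k in classes:
        Xk = X[y == k]
        diff = Xk.mean(axis=0) - xbar
        Sigma_B += (len(Xk) / n) * np.outer(diff, diff)
        Sigma_W += (len(Xk) / n) * np.cov(Xk, rowvar=False, bias=True)
    # Symmetric-definite generalized eigenproblem; eigh sorts eigenvalues ascending
    evals, evecs = eigh(Sigma_B, Sigma_W)
    order = np.argsort(evals)[::-1]              # largest between/within ratio first
    return evecs[:, order[: len(classes) - 1]]   # at most K - 1 useful directions

# Tiny usage example: 3 Gaussian classes in 5 dimensions
rng = np.random.default_rng(2)
X = np.vstack([rng.normal(loc=m, size=(50, 5)) for m in (-2.0, 0.0, 2.0)])
y = np.repeat([0, 1, 2], 50)
print(discriminant_directions(X, y).shape)       # (5, 2)
```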

Classical Discrimination
Summary of Classical Ideas:
Among "Simple Methods":
– MD and FLD sometimes similar
– Sometimes FLD better
– So FLD is preferred
Among complicated methods:
– GLR is best
– So always use that
Caution: the story changes for HDLSS settings

HDLSS Discrimination
Recall main HDLSS issues:
– Sample Size n < Dimension d
– Singular covariance matrix
– So can't use the matrix inverse
– I.e. can't standardize (sphere) the data (requires root inverse covariance)
– Can't do classical multivariate analysis

HDLSS Discrimination
An approach to non-invertible covariances:
Replace the inverse by a generalized inverse
– Sometimes called a pseudo-inverse
– Note: there are several; here use the Moore-Penrose inverse
– As used by Matlab (pinv.m)
Often provides useful results (but not always)
Recall Linear Algebra Review…
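Before the linear algebra review, a quick concrete sketch of the same idea in Python, where np.linalg.pinv plays the role of Matlab's pinv.m; the sizes n = 20, d = 50 are arbitrary illustrative choices.

```python
# Sketch: in an HDLSS setting (n < d) the sample covariance is singular,
# so np.linalg.inv fails, but the Moore-Penrose pseudo-inverse
# (np.linalg.pinv, the numpy analogue of Matlab's pinv.m) is still defined.
import numpy as np

rng = np.random.default_rng(0)
n, d = 20, 50
X = rng.normal(size=(n, d))
Sigma_hat = np.cov(X, rowvar=False)        # d x d, rank at most n - 1 < d

print(np.linalg.matrix_rank(Sigma_hat))    # 19, far below d = 50
Sigma_pinv = np.linalg.pinv(Sigma_hat)     # works despite singularity
# Pseudo-inverse property: Sigma @ Sigma^- @ Sigma == Sigma
print(np.allclose(Sigma_hat @ Sigma_pinv @ Sigma_hat, Sigma_hat))
```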

Recall Linear Algebra
Eigenvalue Decomposition:
For a (symmetric) square matrix A,
find a diagonal matrix D = diag(d_1, ..., d_d)
and an orthonormal matrix B (i.e. B B^T = B^T B = I),
so that A B = B D, i.e. A = B D B^T

Recall Linear Algebra (Cont.)
Eigenvalue decomposition solves matrix problems:
– Inversion: A^{-1} = B D^{-1} B^T
– Square Root: A^{1/2} = B D^{1/2} B^T
– A is positive (nonnegative, i.e. semi-) definite <=> all d_i > 0 (>= 0)
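A small numerical check of the inversion and square-root identities (a sketch; the 4x4 test matrix is arbitrary):

```python
# Sketch: the matrix inverse and square root computed directly from the
# eigenvalue decomposition A = B D B^T, for a symmetric positive definite A.
import numpy as np

rng = np.random.default_rng(0)
M = rng.normal(size=(4, 4))
A = M @ M.T + np.eye(4)                  # symmetric positive definite

d, B = np.linalg.eigh(A)                 # eigenvalues d, orthonormal columns B
A_inv = B @ np.diag(1.0 / d) @ B.T       # inversion:   A^{-1} = B D^{-1} B^T
A_root = B @ np.diag(np.sqrt(d)) @ B.T   # square root: A^{1/2} = B D^{1/2} B^T

print(np.allclose(A_inv, np.linalg.inv(A)))   # True
print(np.allclose(A_root @ A_root, A))        # True
```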

Recall Linear Algebra (Cont.)
Moore-Penrose Generalized Inverse:
For A = B D B^T with D = diag(d_1, ..., d_r, 0, ..., 0) (rank r),
define A^- = B D^- B^T, where D^- = diag(1/d_1, ..., 1/d_r, 0, ..., 0)

Recall Linear Algebra (Cont.)
Easy to see this satisfies the definition of a Generalized (Pseudo-) Inverse:
– A A^- A = A
– A^- A A^- = A^-
– A A^- symmetric
– A^- A symmetric

Recall Linear Algebra (Cont.)
Moore-Penrose Generalized Inverse:
Idea: matrix inverse on the non-null space of the linear transformation
Reduces to the ordinary inverse in the full-rank case, i.e. for r = d,
so could just always use this
Tricky aspect: "> 0 vs. = 0" & floating point arithmetic
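A sketch of the construction above, with an explicit tolerance for the "> 0 vs. = 0" decision; the particular tolerance rule (relative to the largest eigenvalue) is one common convention, not the only one:

```python
# Sketch: Moore-Penrose inverse of a symmetric matrix built from its
# eigendecomposition, with an explicit tolerance handling the
# "> 0 vs. = 0" issue under floating point arithmetic.
import numpy as np

def sym_pinv(A, rtol=1e-12):
    """Moore-Penrose inverse of a symmetric matrix via eigendecomposition."""
    d, B = np.linalg.eigh(A)              # A = B diag(d) B^T
    tol = rtol * np.max(np.abs(d))        # eigenvalues below tol treated as 0
    d_inv = np.zeros_like(d)
    keep = np.abs(d) > tol                # the "> 0 vs. = 0" decision
    d_inv[keep] = 1.0 / d[keep]
    return B @ np.diag(d_inv) @ B.T

# Rank-deficient example: covariance of n = 5 points in d = 10 dimensions
rng = np.random.default_rng(0)
S = np.cov(rng.normal(size=(5, 10)), rowvar=False)
print(np.allclose(sym_pinv(S), np.linalg.pinv(S)))   # agrees with library pinv
```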

HDLSS Discrimination
Application of Generalized Inverse to FLD:
– Direction (Normal) Vector: w = Sigma_W^- (xbar_{+1} - xbar_{-1})
– Intercept: the midpoint of the class means, (xbar_{+1} + xbar_{-1}) / 2
Have replaced Sigma_W^{-1} by the generalized inverse Sigma_W^-
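Putting the pieces together, a minimal sketch of FLD with the pseudo-inverse in an HDLSS setting; the sample sizes, dimension, and mean offset are illustrative only.

```python
# Sketch: FLD in an HDLSS setting, with the within-class covariance
# inverted via the Moore-Penrose pseudo-inverse.
import numpy as np

def fld_pinv(X_pos, X_neg):
    mu_pos, mu_neg = X_pos.mean(axis=0), X_neg.mean(axis=0)
    # Pooled within-class covariance (singular when n < d)
    Sigma_W = (np.cov(X_pos, rowvar=False) + np.cov(X_neg, rowvar=False)) / 2
    w = np.linalg.pinv(Sigma_W) @ (mu_pos - mu_neg)   # direction (normal) vector
    midpoint = (mu_pos + mu_neg) / 2                  # intercept through midpoint
    return w, midpoint

def classify(x, w, midpoint):
    return 1 if w @ (x - midpoint) > 0 else -1

# HDLSS toy data: n = 10 per class, d = 100 (illustrative sizes)
rng = np.random.default_rng(3)
d = 100
X_pos = rng.normal(size=(10, d)); X_pos[:, 0] += 2.0
X_neg = rng.normal(size=(10, d)); X_neg[:, 0] -= 2.0
w, mid = fld_pinv(X_pos, X_neg)
print(classify(X_pos[0], w, mid), classify(X_neg[0], w, mid))   # 1 -1
```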

HDLSS Discrimination
Toy Example: Increasing Dimension
Data vectors:
– Entry 1: Class +1 vs. Class -1 (means differ)
– Other Entries: noise
– All Entries Independent
Look through increasing dimensions
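A sketch of this toy setup: entry 1 carries the class signal, the remaining entries are independent noise, and we track the angle between the estimated FLD direction and the optimal (first-coordinate) direction as d grows. The mean offset of +/- 2 is an assumed stand-in, and n = 20 per class is chosen to match the d = 38 boundary (40 - 2) discussed below.

```python
# Sketch of the increasing-dimension toy setup: entry 1 carries the class
# signal, all remaining entries are independent noise.  The +/- 2 offset
# and n = 20 per class are assumed illustrative values.
import numpy as np

def simulate(d, n=20, offset=2.0, rng=None):
    rng = rng or np.random.default_rng(0)
    X_pos = rng.normal(size=(n, d)); X_pos[:, 0] += offset   # Class +1
    X_neg = rng.normal(size=(n, d)); X_neg[:, 0] -= offset   # Class -1
    return X_pos, X_neg

def fld_direction(X_pos, X_neg):
    Sigma_W = (np.cov(X_pos, rowvar=False) + np.cov(X_neg, rowvar=False)) / 2
    return np.linalg.pinv(Sigma_W) @ (X_pos.mean(axis=0) - X_neg.mean(axis=0))

def angle_to_optimal(w):
    # The optimal direction is the first coordinate axis in this setup
    cosine = abs(w[0]) / np.linalg.norm(w)
    return np.degrees(np.arccos(np.clip(cosine, -1.0, 1.0)))

for d in (2, 10, 40, 200):
    X_pos, X_neg = simulate(d)
    print(d, round(angle_to_optimal(fld_direction(X_pos, X_neg)), 1))
# Expect the angle to the optimal direction to generally worsen as d grows,
# with the non-monotone behavior near the HDLSS boundary discussed in the
# following slides.
```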

HDLSS Discrimination
Increasing Dimension Example
[figure: projections on the Optimal Direction, the FLD Direction, and both Directions]

HDLSS Discrimination
Add a 2nd Dimension (noise)
– Same projection on Optimal Direction
– Axes same as directions
– Now see 2 dimensions

HDLSS Discrimination
Add a 3rd Dimension (noise)
Project onto the 2-d subspace generated by the optimal direction & the FLD direction

HDLSS Discrimination Movie Through Increasing Dimensions

HDLSS Discrimination
FLD in Increasing Dimensions:
Low dimensions (d = 2-9):
– Visually good separation
– Small angle between FLD and Optimal
– Good generalizability
Medium dimensions (d = 10-26):
– Visual separation too good?!?
– Larger angle between FLD and Optimal
– Worse generalizability
– Feel effect of sampling noise

HDLSS Discrimination
FLD in Increasing Dimensions:
High dimensions (d = 27-37):
– Much worse angle
– Very poor generalizability
– But very small within-class variation
– Poor separation between classes
– Yet large separation / variation ratio

HDLSS Discrimination
FLD in Increasing Dimensions:
At the HDLSS boundary (d = 38):
– 38 = degrees of freedom (need to estimate 2 class means)
– Within-class variation = 0 ?!?
– Data pile up on just two points
– Perfect separation / variation ratio?
– But only feels microscopic noise aspects
– So likely not generalizable
– Angle to optimal very large

HDLSS Discrimination
FLD in Increasing Dimensions:
Just beyond the HDLSS boundary (d = 39-70):
– Improves with higher dimension?!?
– Angle gets better
– Improving generalizability?
– More noise helps classification?!?

HDLSS Discrimination
FLD in Increasing Dimensions:
Far beyond the HDLSS boundary (d = 70-1000):
– Quality degrades
– Projections look terrible (populations overlap)
– And generalizability falls apart as well
– Math worked out by Bickel & Levina (2004)
– Problem is estimation of the d x d covariance matrix