Object Orie’d Data Analysis, Last Time


Object Orie’d Data Analysis, Last Time
- Mildly Non-Euclidean Spaces
- Strongly Non-Euclidean Spaces
  - Tree spaces
  - No Tangent Plane
- Classification - Discrimination
  - Mean Difference (Centroid method)
  - Fisher Linear Discrimination
    - Graphical Viewpoint (nonparametric)
    - Maximum Likelihood Derivation (Gaussian based)

Classification - Discrimination
Background: Two Class (Binary) version:
- Using “training data” from Class +1 and Class -1
- Develop a “rule” for assigning new data to a Class
Canonical Example: Disease Diagnosis
- New Patients are “Healthy” or “Ill”
- Determined based on measurements
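A minimal sketch of the simplest such rule, the Mean Difference (Centroid) method listed above: assign a new observation to the class whose training mean is closest. The data and function names below are illustrative, not from the original slides.

```python
import numpy as np

def centroid_rule(X_train, y_train, x_new):
    """Mean Difference (Centroid) classifier: assign x_new to the
    class (+1 or -1) whose training-sample mean is nearest."""
    mean_pos = X_train[y_train == +1].mean(axis=0)
    mean_neg = X_train[y_train == -1].mean(axis=0)
    return +1 if (np.linalg.norm(x_new - mean_pos)
                  < np.linalg.norm(x_new - mean_neg)) else -1

# Toy "disease diagnosis" example: 2 measurements per patient
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (20, 2)),    # class -1 ("Healthy")
               rng.normal(2, 1, (20, 2))])   # class +1 ("Ill")
y = np.array([-1] * 20 + [+1] * 20)
print(centroid_rule(X, y, np.array([1.8, 2.1])))   # expected output: +1
```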

Classification - Discrimination
Important Distinction: Classification vs. Clustering
- Classification: class labels are known; goal: understand the differences between classes
- Clustering: goal: find class labels (group similar data)
Both are about clumps of similar data, but with very different goals.

Classification - Discrimination
Important Distinction: Classification vs. Clustering
Useful terminology:
- Classification: supervised learning
- Clustering: unsupervised learning

Fisher Linear Discrimination
Graphical Introduction (non-Gaussian): figure HDLSSod1egFLD.ps

Classical Discrimination
Above derivation of FLD was:
- Nonstandard
- Not in any textbooks(?)
- Nonparametric (don’t need Gaussian data)
- I.e. used no probability distributions
- More Machine Learning than Statistics
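For concreteness, a sketch of the classical FLD rule in the notation assumed here (not shown on the slides): direction vector $w = \Sigma_w^{-1}(\bar{x}_{+1}-\bar{x}_{-1})$ with $\Sigma_w$ the pooled within-class covariance, and a new point is classified by which side of the projected midpoint of the class means it falls on.

```python
import numpy as np

def fld_fit(X, y):
    """Classical Fisher Linear Discrimination, two classes labeled +1 / -1.
    Returns the direction vector w and the cutoff (projected midpoint)."""
    Xp, Xn = X[y == +1], X[y == -1]
    mp, mn = Xp.mean(axis=0), Xn.mean(axis=0)
    # Pooled within-class covariance: each class centered at its own mean
    Sw = ((Xp - mp).T @ (Xp - mp) + (Xn - mn).T @ (Xn - mn)) / (len(X) - 2)
    w = np.linalg.solve(Sw, mp - mn)     # w = Sw^{-1} (mean_+ - mean_-)
    cutoff = w @ (mp + mn) / 2           # project midpoint of the two means
    return w, cutoff

def fld_predict(X_new, w, cutoff):
    return np.where(X_new @ w > cutoff, +1, -1)
```

Note this uses only sample means and covariances, with no distributional (Gaussian) assumption, matching the nonparametric viewpoint above.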

Classical Discrimination
Summary of FLD vs. GLR:
- Tilted Point Clouds Data: FLD good, GLR good
- Donut Data: FLD bad
- X Data: GLR OK, not great
Classical Conclusion: GLR generally better (will see a different answer for HDLSS data)
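Reading GLR as the Gaussian Likelihood Ratio rule (consistent with the Gaussian-based maximum likelihood derivation reviewed above), here is a minimal sketch of it, not taken from the slides: fit a Gaussian to each class and compare likelihoods, which gives a quadratic decision boundary.

```python
import numpy as np
from scipy.stats import multivariate_normal

def glr_predict(X_train, y_train, X_new):
    """Gaussian Likelihood Ratio rule: fit a Gaussian to each class
    (its own mean and covariance) and assign each new point to the class
    with the larger likelihood (equal priors assumed)."""
    Xp, Xn = X_train[y_train == +1], X_train[y_train == -1]
    rv_p = multivariate_normal(Xp.mean(axis=0), np.cov(Xp, rowvar=False))
    rv_n = multivariate_normal(Xn.mean(axis=0), np.cov(Xn, rowvar=False))
    return np.where(rv_p.logpdf(X_new) > rv_n.logpdf(X_new), +1, -1)
```

Unlike FLD, each class keeps its own covariance, which allows curved decision boundaries (relevant for examples like the donut data).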

Classical Discrimination
FLD Generalization II (Gen. I was GLR): different prior probabilities
- Main idea: give different weights to the 2 classes
- I.e. assume the classes are not a priori equally likely
- Development is “straightforward”: a modified likelihood changes the intercept in FLD
- Won’t explore further here
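A hedged sketch of that intercept change, using the standard Gaussian, equal-covariance derivation (the formula is mine, not displayed on the slide): with prior probabilities $\pi_{+1}$ and $\pi_{-1}$, the direction $w$ is unchanged and the cutoff shifts by $\log(\pi_{-1}/\pi_{+1})$.

```python
import numpy as np

def fld_cutoff_with_priors(w, mean_pos, mean_neg, prior_pos, prior_neg):
    """Modified FLD cutoff for unequal prior class probabilities: the
    direction w stays the same, only the intercept moves by the
    log prior ratio (equal priors recover the ordinary midpoint rule)."""
    return w @ (mean_pos + mean_neg) / 2 + np.log(prior_neg / prior_pos)
```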

Classical Discrimination
FLD Generalization III: Principal Discriminant Analysis
- Idea: FLD-like approach to more than two classes
- Assumption: class covariance matrices are the same (similar)
  (not necessarily Gaussian, same situation as for FLD)
- Main idea: quantify “location of classes” by their means

Classical Discrimination
Principal Discriminant Analysis (cont.)
Simple way to find “interesting directions” among the means: PCA on the set of class means,
i.e. eigen-analysis of the “between class covariance matrix” $\Sigma_B$, the covariance of the class mean vectors $\bar{x}_1, \dots, \bar{x}_K$ about the overall mean (one common normalization: $\Sigma_B = \frac{1}{K}\sum_{j=1}^{K}(\bar{x}_j - \bar{x})(\bar{x}_j - \bar{x})^t$).
Aside: can show that the overall covariance decomposes into between-class plus within-class pieces, $\Sigma = \Sigma_B + \Sigma_w$ (with matching normalizations).
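A short sketch of this computation; the $1/K$ (equal class weights) normalization of $\Sigma_B$ is an assumption for illustration, and the variable names are mine.

```python
import numpy as np

def between_class_covariance(X, y):
    """Between-class covariance Sigma_B: covariance of the class mean
    vectors about the overall mean of the means (equal class weights)."""
    means = np.array([X[y == c].mean(axis=0) for c in np.unique(y)])
    centered = means - means.mean(axis=0)
    return centered.T @ centered / len(means)

# "Interesting directions" among the means = leading eigenvectors of Sigma_B:
# eigvals, eigvecs = np.linalg.eigh(between_class_covariance(X, y))
# directions = eigvecs[:, ::-1]   # largest eigenvalues first
```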

Classical Discrimination
Principal Discriminant Analysis (cont.)
But PCA on the means only works like the Mean Difference method; expect to improve by taking the (within-class) covariance into account.
Blind application of the above ideas suggests eigen-analysis of $\Sigma_w^{-1}\Sigma_B$.

Classical Discrimination
Principal Discriminant Analysis (cont.)
There are:
- smarter ways to compute this (a “generalized eigenvalue” problem)
- other representations (this solves optimization problems)
Special case: for 2 classes, reduces to standard FLD
Good reference for more: Section 3.8 of Duda, Hart & Stork (2001)
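A sketch (my own, under the assumption that $\Sigma_w$ is positive definite, i.e. the classical non-HDLSS case) of the “smarter” generalized-eigenvalue route: rather than forming $\Sigma_w^{-1}\Sigma_B$ explicitly, solve the symmetric generalized eigenproblem $\Sigma_B v = \lambda \Sigma_w v$.

```python
import numpy as np
from scipy.linalg import eigh

def discriminant_directions(Sb, Sw, n_dirs):
    """Principal Discriminant Analysis directions via the symmetric
    generalized eigenproblem  Sb v = lambda Sw v, equivalent to
    eigen-analysis of Sw^{-1} Sb but numerically better behaved."""
    eigvals, eigvecs = eigh(Sb, Sw)        # ascending eigenvalues
    order = np.argsort(eigvals)[::-1]      # largest (most discriminating) first
    return eigvals[order[:n_dirs]], eigvecs[:, order[:n_dirs]]
```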

Classical Discrimination
Summary of Classical Ideas:
- Among “Simple Methods”:
  - MD and FLD sometimes similar
  - Sometimes FLD better
  - So FLD is preferred
- Among Complicated Methods:
  - GLR is best
  - So always use that
- Caution: story changes for HDLSS settings

HDLSS Discrimination
Recall main HDLSS issues:
- Sample Size, n < Dimension, d
- Singular covariance matrix
- So can’t use the matrix inverse
- I.e. can’t standardize (“sphere”) the data (requires root inverse covariance)
- Can’t do classical multivariate analysis

HDLSS Discrimination
An approach to non-invertible covariances: replace the inverse by a generalized inverse
- Sometimes called pseudo inverses
- Note: there are several; here use the Moore-Penrose inverse, as used by Matlab (pinv.m)
- Often provides useful results (but not always)
Recall Linear Algebra Review…
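A small illustration (assumed toy dimensions, not from the slides) of why the generalized inverse is needed, and how the Moore-Penrose version is obtained in numpy, the analogue of Matlab’s pinv.m:

```python
import numpy as np

# HDLSS toy setting: n = 20 samples in d = 100 dimensions,
# so the sample covariance matrix is singular (rank at most n - 1).
rng = np.random.default_rng(1)
n, d = 20, 100
X = rng.standard_normal((n, d))
S = np.cov(X, rowvar=False)                 # d x d sample covariance

print(np.linalg.matrix_rank(S))             # 19, far less than d = 100
# The ordinary inverse np.linalg.inv(S) is meaningless here;
# use the Moore-Penrose generalized inverse instead:
S_pinv = np.linalg.pinv(S)
```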

Recall Linear Algebra
Eigenvalue Decomposition: for a (symmetric) square matrix $X$:
- Find a diagonal matrix $\Lambda = \mathrm{diag}(\lambda_1, \dots, \lambda_d)$
- And an orthonormal matrix $B$ (i.e. $B B^t = B^t B = I$)
So that: $X B = B \Lambda$, i.e. $X = B \Lambda B^t$

Recall Linear Algebra (Cont.)
Eigenvalue Decomposition solves matrix problems:
- Inversion: $X^{-1} = B \Lambda^{-1} B^t$, with $\Lambda^{-1} = \mathrm{diag}(1/\lambda_1, \dots, 1/\lambda_d)$
- Square Root: $X^{1/2} = B \Lambda^{1/2} B^t$, with $\Lambda^{1/2} = \mathrm{diag}(\lambda_1^{1/2}, \dots, \lambda_d^{1/2})$
- $X$ is positive (nonnegative, i.e. semi-) definite $\iff$ all $\lambda_i > 0$ (respectively $\lambda_i \ge 0$)
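A small numerical check of these identities (a sketch, not from the slides) using numpy’s symmetric eigen-solver:

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((5, 5))
X = A @ A.T + 5 * np.eye(5)          # symmetric, positive definite test matrix

lam, B = np.linalg.eigh(X)           # X = B diag(lam) B^t, with B orthonormal

X_inv  = B @ np.diag(1.0 / lam) @ B.T        # inversion via eigenvalues
X_half = B @ np.diag(np.sqrt(lam)) @ B.T     # square root via eigenvalues

print(np.allclose(B @ np.diag(lam) @ B.T, X))   # True: reconstruction
print(np.allclose(X_inv, np.linalg.inv(X)))     # True: matches direct inverse
print(np.allclose(X_half @ X_half, X))          # True: square root property
```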

Recall Linear Algebra (Cont.)
Moore-Penrose Generalized Inverse:
For $X = B \Lambda B^t$ with $\Lambda = \mathrm{diag}(\lambda_1, \dots, \lambda_r, 0, \dots, 0)$, $\lambda_1 \ge \dots \ge \lambda_r > 0$ (rank $r$), define
$X^{-} = B \Lambda^{-} B^t$, where $\Lambda^{-} = \mathrm{diag}(1/\lambda_1, \dots, 1/\lambda_r, 0, \dots, 0)$

Recall Linear Algebra (Cont.)
Easy to see this satisfies the definition of Generalized (Pseudo) Inverse:
- $X X^{-} X = X$
- $X^{-} X X^{-} = X^{-}$
- $X X^{-}$ symmetric
- $X^{-} X$ symmetric

Recall Linear Algebra (Cont.)
Moore-Penrose Generalized Inverse:
- Idea: matrix inverse on the non-null space of the linear transformation
- Reduces to the ordinary inverse in the full rank case, i.e. for $r = d$, so could just always use this
- Tricky aspect: “$\lambda > 0$ vs. $\lambda = 0$” & floating point arithmetic
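A sketch of this construction (the relative tolerance is an illustrative choice, not from the slides), including a check of the four generalized-inverse conditions above:

```python
import numpy as np

def pinv_via_eigh(X, tol=1e-10):
    """Moore-Penrose inverse of a symmetric PSD matrix from its eigenvalue
    decomposition: invert eigenvalues on the non-null space, zero otherwise.
    The tolerance handles the tricky '> 0 vs. = 0' decision under
    floating point arithmetic."""
    lam, B = np.linalg.eigh(X)
    lam_inv = np.zeros_like(lam)
    keep = lam > tol * lam.max()
    lam_inv[keep] = 1.0 / lam[keep]
    return B @ np.diag(lam_inv) @ B.T

# Rank-deficient example (10 x 10 matrix of rank 4)
A = np.random.default_rng(3).standard_normal((10, 4))
X = A @ A.T
Xg = pinv_via_eigh(X)
print(np.allclose(X @ Xg @ X, X),          # X X- X  = X
      np.allclose(Xg @ X @ Xg, Xg),        # X- X X- = X-
      np.allclose(X @ Xg, (X @ Xg).T),     # X X-  symmetric
      np.allclose(Xg @ X, (Xg @ X).T))     # X- X  symmetric
```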

HDLSS Discrimination
Application of Generalized Inverse to FLD:
- Direction (Normal) Vector: $w = \Sigma_w^{-}(\bar{x}_{+1} - \bar{x}_{-1})$
- Intercept: the midpoint of the class means, $(\bar{x}_{+1} + \bar{x}_{-1})/2$
- Have replaced $\Sigma_w^{-1}$ by the generalized inverse $\Sigma_w^{-}$
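Putting the pieces together, a sketch (my code, following the formulas above) of FLD for HDLSS data, with the ordinary inverse of the pooled within-class covariance replaced by its Moore-Penrose generalized inverse:

```python
import numpy as np

def fld_hdlss(X, y):
    """FLD direction and intercept when n < d: the pooled within-class
    covariance Sw is singular, so Sw^{-1} is replaced by the
    Moore-Penrose inverse (np.linalg.pinv, analogue of Matlab's pinv.m)."""
    Xp, Xn = X[y == +1], X[y == -1]
    mp, mn = Xp.mean(axis=0), Xn.mean(axis=0)
    Sw = ((Xp - mp).T @ (Xp - mp) + (Xn - mn).T @ (Xn - mn)) / (len(X) - 2)
    direction = np.linalg.pinv(Sw) @ (mp - mn)   # Sw^- (mean_+ - mean_-)
    intercept = (mp + mn) / 2                    # midpoint of the class means
    return direction, intercept

def fld_classify(X_new, direction, intercept):
    return np.where((X_new - intercept) @ direction > 0, +1, -1)
```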

HDLSS Discrimination Toy Example: Increasing Dimension

Next Topics:
- HDLSS Properties of FLD
- Generalized inverses???
- Think about HDLSS, as on 04/03/02
- Maximal Data Piling
- Embedding and Kernel Spaces