Background on Classification
What is Classification?
Objects of Different Types: Classes
Sensing and Digitizing
Calculating Properties: Features
Mapping Features to Classes
Features (term means different things to different people)
A Feature is a quantity used by a classifier
A Feature Vector is an ordered list of features
Examples: a full spectrum, a dimensionality-reduced spectrum, a differentiated spectrum, vegetation indices
Features (term means different things to different people)
For the math behind classifiers, feature vectors are thought of as points in n-dimensional space
Spectra live in 426-dimensional (426D) space
(NDVI, Nitrogen, ChlorA, ChlorB) lives in 4D space
Dimensionality reduction is used to visualize in 2D or 3D and to mitigate the Curse of Dimensionality
What is a Classifier? A bit more formal:
x = feature vector (x1, x2, …, xB)'
L = set of class labels: L = {L1, L2, …, LC}, e.g. {Pinus palustris, Quercus laevis, …}
A (discrete) classifier is a function f that maps each feature vector to a label: f(x) is a member of L
What is a Classifier? A bit more formal:
x = feature vector (x1, x2, …, xB)'
L = set of class labels: L = {L1, L2, …, LC}, e.g. {Pinus palustris, Quercus laevis, …}
A (continuous) classifier is a function f that maps a feature vector to one score per class (e.g. a probability or discriminant value); the label is taken as the class with the largest score
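A minimal sketch of the two kinds of classifier in Python; the weights W and the 3-band "spectrum" x are made up for illustration, only the two label names come from the slide.

import numpy as np

labels = ["Pinus palustris", "Quercus laevis"]    # L = {L1, L2}
W = np.array([[0.8, -0.2, 0.1],                   # hypothetical per-class weight vectors (rows)
              [0.1,  0.5, 0.3]])

def continuous_classifier(x):
    """Map a feature vector x to one score per class."""
    return W @ x

def discrete_classifier(x):
    """Map a feature vector x directly to a class label."""
    return labels[int(np.argmax(continuous_classifier(x)))]

x = np.array([0.9, 0.2, 0.4])                     # toy 3-band "spectrum"
print(continuous_classifier(x))                   # one number per class
print(discrete_classifier(x))                     # a single label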
Linear Discriminants
Discriminant classifiers are designed to discriminate between classes
Generative classifiers, in contrast, model the classes themselves
Linear Discriminant or Linear Discriminant Analysis
There are many different types. Here are some:
Ordinary Least Squares
Ridge Regression
Lasso
Canonical (Fisher)
Perceptron
Support Vector Machine (without kernels)
Relevance Vector Machine
Linear Discriminants
Points on the left side of a line are in the blue class; points on the right side are in the red class
Which line is best? What does "best" mean?
[Figure: candidate separating lines between the Blue Class and the Red Class]
Linear Discriminants – 2 Classes
[Diagram: features x1, …, xm are multiplied by weights wk1, …, wkm and summed, together with a bias wk0 on a constant input x0 = 1; the sum is a BIG number for Class 1 and a small number for Class 2]
yk = wk0 + wk1*x1 + wk2*x2 + … + wkm*xm
Example of "Best": Support Vector Machine
Pairwise (2 classes at a time)
Maximizes the margin between classes
Minimizes an objective function by solving a quadratic program
Back to Classifiers
Definition: Training
Given a data set X = {x1, x2, …, xN},
corresponding desired, or target, outputs Y = {y1, y2, …, yN},
and a user-defined functional form of a classifier, e.g. yn = w0 + w1*xn,1 + w2*xn,2 + … + wB*xn,B,
estimate the parameters {w0, w1, …, wB}
X is called the training set
Linear Classifiers: Ordinary Least Squares
Continuous classifier; target outputs usually {0, 1} or {-1, 1}
Minimize the squared error between the model output and the target: E(w) = sum over n of (tn - yn)^2, where yn = w0 + w1*xn,1 + … + wB*xn,B
In matrix form: E(w) = ||t - Xw||^2, where each row of X is a feature vector with a leading 1 for the bias
Linear Classifiers – Least Squares
How do we minimize? Take the derivative with respect to w and set it to zero:
w = (X'X)^(-1) X't, or equivalently w = pinv(X)*t (the pseudoinverse solution)
Example (rows of X are spectra)
X = matrix of training spectra (one per row, with a leading 1 for the bias), t = vector of targets
w = pinv(X)*t gives the weights; X*w gives the predicted outputs
(the slide displays the numerical values of X, t, w, pinv(X)*t, and X*w)
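A small runnable sketch of this example; the numbers on the original slide are not reproduced, so the "spectra" and targets below are simulated.

import numpy as np

rng = np.random.default_rng(0)

N, B = 6, 3                                       # 6 toy "spectra", 3 bands each
spectra = rng.normal(size=(N, B))
X = np.hstack([np.ones((N, 1)), spectra])         # rows ~ spectra, plus a bias column of ones
t = np.array([1, 1, 1, -1, -1, -1], dtype=float)  # targets in {-1, +1}

w = np.linalg.pinv(X) @ t                         # w = pinv(X)*t
y = X @ w                                         # predictions X*w
print(w)
print(np.sign(y))                                 # thresholded class assignments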
Ridge Regression
Ordinary Least Squares + Regularization
Diagonal Loading: replace X'X with X'X + lambda*I (lambda > 0)
Solution: w = (X'X + lambda*I)^(-1) X't
Ridge Regression
Ordinary Least Squares solution: w = (X'X)^(-1) X't
Ridge Regression solution: w = (X'X + lambda*I)^(-1) X't
Diagonal Loading (the lambda*I term) can be crucial for numerical stability
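A hedged sketch of the two solutions side by side; the helper names and the value of lam are only for illustration, and lam would normally be tuned.

import numpy as np

def ols_weights(X, t):
    # w = (X'X)^(-1) X't; can be numerically unstable when X'X is near-singular
    return np.linalg.solve(X.T @ X, X.T @ t)

def ridge_weights(X, t, lam=0.1):
    # Diagonal loading: w = (X'X + lam*I)^(-1) X't
    return np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ t)

rng = np.random.default_rng(1)
X = rng.normal(size=(20, 5))
t = rng.normal(size=20)
print(ols_weights(X, t))
print(ridge_weights(X, t))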
Illustrative Example
We'll see its value later
Notes on Methodology
When developing a "Machine Learning" algorithm, one should test it on simulated data first
This test is necessary but not sufficient:
Necessary: if it doesn't work on simulated data, then it almost certainly will not work on real data
Sufficient: if it works on simulated data, then it may or may not work on real data
Question: how do we simulate?
Simulating Data
Usually use Gaussians because they are often assumed (even though the assumption is accurate far less often)
A multivariate Gaussian is completely determined by its mean and covariance matrix
Some Single Gaussians in One Dimension
Fisher Canonical LDA Gaussians: the trickery of displays
[Figure: Gaussians plotted with the same x-axis and the same y-axis]
Fisher Canonical LDA Gaussians: the trickery of displays
[Figures: Gaussians plotted with the same x-axis but different y-axes]
Some Single Gaussians in Two Dimensions
Formulas for Gaussians
To generate simulated data, we need to draw samples from these distributions
Univariate Gaussian: p(x) = (1 / sqrt(2*pi*sigma^2)) * exp(-(x - mu)^2 / (2*sigma^2))
Multivariate Gaussian (e.g. x is a spectrum): p(x) = (1 / ((2*pi)^(B/2) * |S|^(1/2))) * exp(-(1/2)*(x - mu)' S^(-1) (x - mu))
S is the covariance matrix
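A minimal sketch of drawing simulated samples from a multivariate Gaussian; the mean and covariance below are made up, and the sample statistics are printed as a sanity check.

import numpy as np

rng = np.random.default_rng(0)

mean = np.array([1.0, 2.0])
cov = np.array([[2.0, 0.8],
                [0.8, 1.0]])                      # symmetric, positive semi-definite

samples = rng.multivariate_normal(mean, cov, size=500)
print(samples.mean(axis=0))                       # should be close to mean
print(np.cov(samples, rowvar=False))              # should be close to cov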
Sample Covariance
Definition: S = (1/(N-1)) * sum over n of (xn - xbar)(xn - xbar)'
(xn - xbar)(xn - xbar)' is called an outer product: a column vector times a row vector, giving a B x B matrix
Example outer product: for v = (1, 2)', v*v' = [[1, 2], [2, 4]]
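A short sketch of the sample covariance as a sum of outer products of mean-centered samples, checked against numpy's built-in estimator (which also divides by N-1); the data here are random for illustration.

import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 4))                     # 100 samples, 4 features

xbar = X.mean(axis=0)
S = np.zeros((4, 4))
for x in X:
    d = (x - xbar)[:, None]                       # mean-centered sample as a column vector
    S += d @ d.T                                  # outer product
S /= len(X) - 1

print(np.allclose(S, np.cov(X, rowvar=False)))    # True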
Covariance Matrices
If S is a covariance matrix, then people who need to know can calculate matrices U and D with the properties that:
S = U'DU (S is diagonalized)
U is orthogonal (like a rotation)
D is diagonal
Generating Covariance Matrices
(1) Any matrix of the form A'A is a covariance matrix for some distribution, so we can do the following:
Set A = a random square matrix
Set S = A'A
(2) We can also do the following:
Make a diagonal matrix D
Make a rotation matrix U
Make a covariance matrix by setting S = U'DU
We will generate covariance matrices S using Python; a sketch follows
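Both constructions, sketched in Python; the matrix size and the random draws are arbitrary, and the rotation U here comes from a QR decomposition of a random matrix.

import numpy as np

rng = np.random.default_rng(0)
B = 4

# (1) S = A'A for a random square matrix A
A = rng.normal(size=(B, B))
S1 = A.T @ A

# (2) S = U'DU for a diagonal D and an orthogonal (rotation-like) U
D = np.diag(rng.uniform(0.5, 3.0, size=B))
U, _ = np.linalg.qr(rng.normal(size=(B, B)))      # QR gives an orthogonal U
S2 = U.T @ D @ U

print(np.allclose(S1, S1.T), np.linalg.eigvalsh(S1).min() >= 0)  # symmetric, positive semi-definite
print(np.allclose(S2, S2.T), np.allclose(np.linalg.eigvalsh(S2), np.sort(np.diag(D))))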
Go To Python
Linear Dimensionality Reduction
PCA: Principal Components Analysis
Maximizes the amount of variance captured in the first k bands, compared to all other linear (orthogonal) transforms
MNF: Minimum Noise Fraction
Minimizes an estimate of Noise/Signal (equivalently, maximizes an estimate of Signal/Noise)
PCA
Start with a data set of spectra or other samples, implicitly assumed drawn from the same distribution
Compute the sample mean over all spectra: xbar = (1/N) * sum over n of xn
Compute the sample covariance: S = (1/(N-1)) * sum over n of (xn - xbar)(xn - xbar)'
Diagonalize S: S = U'DU, with U orthogonal and D diagonal, eigenvalues sorted big to little
PCA is defined to be: y = U(x - xbar), keeping only the first k bands of y
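A compact sketch of these steps, assuming one spectrum per row of the data matrix; the pca helper is only for illustration, and note that numpy's eigh returns the eigenvectors as columns, so the rotation is applied as a matrix product with the mean-centered data.

import numpy as np

def pca(X, k):
    """X: (N, B) array, one spectrum per row. Return the first k PCA bands."""
    xbar = X.mean(axis=0)                         # sample mean over all spectra
    S = np.cov(X, rowvar=False)                   # sample covariance
    evals, evecs = np.linalg.eigh(S)              # diagonalize S (eigh assumes S symmetric)
    order = np.argsort(evals)[::-1]               # sort eigenvalues big to little
    evecs = evecs[:, order]
    return (X - xbar) @ evecs[:, :k]              # rotate and shift, keep the first k bands

rng = np.random.default_rng(0)
X = rng.multivariate_normal([0.0, 0.0, 0.0], np.diag([5.0, 1.0, 0.1]), size=200)
Y = pca(X, k=2)
print(Y.shape)                                    # (200, 2)
print(np.var(Y, axis=0))                          # most of the variance lands in the first band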
PCA – Easy Examples
Eigenvectors determine the major and minor axes; eigenvalues determine the lengths of the major and minor axes
The new coordinate system is a rotation (U) and shift (x - xbar) of the original coordinate system
This picture assumes elliptical contours, i.e. Gaussian-like data
PCA, "Dark Points", and BRDF
[Figure: the black points are from Oak Trees; the red points are from Soil in a Baseball Field]
Go To Python
MNF
Assumption: Observed Spectrum = Signal + Noise, i.e. x = s + n
We want to transform x so that the ratio Noise Variance / Signal Variance is minimized
How do we represent this ratio?
MNF
Assume the signal and noise are both random vectors with multivariate Gaussian distributions
Assume the noise is zero mean: it is equally likely to add or subtract by the same amounts
The noise variance uniquely determines how much the signal is modified by noise
Therefore, we should try to minimize the ratio Noise Variance / Signal Variance
MNF – Noise/Signal Ratio
How do we compute it for spectra? 426 bands -> 426 variances and 426*425/2 covariances
Dividing element-wise won't work. What should we do? Diagonalize!
Covariance of n is diagonalizable: S_n = U_n' D_n U_n
Covariance of x is diagonalizable: S_x = U_x' D_x U_x
MNF – Noise/Signal Ratio
Covariance of n is diagonalizable: S_n = U_n' D_n U_n
Covariance of x is diagonalizable: S_x = U_x' D_x U_x
GOOD NEWS! They can be simultaneously diagonalized
It's a little complicated, but basically looks like this: there is a single matrix V such that V' S_n V and V' S_x V are both diagonal
So, in the transformed coordinates, the noise/signal ratio of each band is just a ratio of diagonal entries
MNF: Algorithm
Estimate n
Calculate the covariance of n
Calculate the covariance of x
Calculate the (left) eigenvectors and eigenvalues of S_n^(-1) S_x (equivalently, solve the generalized eigenvalue problem S_x v = lambda S_n v)
Make sure the eigenvalues are sorted in order of big to little if maximizing Signal/Noise, or little to big if minimizing Noise/Signal
Only keep the eigenvectors that come early in the sort
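One way to sketch this algorithm in Python; estimating the noise by differencing adjacent samples is a common but not the only choice, the mnf helper is only for illustration, and scipy handles the simultaneous diagonalization by solving the generalized eigenvalue problem S_x v = lambda S_n v.

import numpy as np
from scipy.linalg import eigh

def mnf(X, k):
    """X: (N, B) array, one spectrum per row. Keep k MNF bands."""
    noise_est = np.diff(X, axis=0) / np.sqrt(2.0)  # estimate n by differencing adjacent samples
    S_n = np.cov(noise_est, rowvar=False)          # covariance of n
    S_x = np.cov(X, rowvar=False)                  # covariance of x
    evals, V = eigh(S_x, S_n)                      # generalized eigenproblem: S_x v = lambda S_n v
    order = np.argsort(evals)[::-1]                # big to little: maximizing Signal/Noise
    V = V[:, order]
    return (X - X.mean(axis=0)) @ V[:, :k]         # keep the eigenvectors that come early in the sort

rng = np.random.default_rng(0)
signal = rng.multivariate_normal(np.zeros(4), np.diag([4.0, 2.0, 1.0, 0.5]), size=300)
noisy = signal + rng.normal(scale=0.3, size=signal.shape)
print(mnf(noisy, k=2).shape)                       # (300, 2)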