Background on Classification
What is Classification?
- Objects of Different Types : Classes
- Sensing and Digitizing
- Calculating Properties : Features
- Mapping Features to Classes
Features (term means different things to different people)
- A Feature is a quantity used by a classifier
- A Feature Vector is an ordered list of features
- Examples:
  - A full spectrum
  - A dimensionality-reduced spectrum
  - A differentiated spectrum
  - Vegetation indices
Features (term means different things to different people)
- For the math behind Classifiers, feature vectors are thought of as points in n-dimensional space
  - Spectra are points in 426-dimensional (or 426D) space
  - (NDVI, Nitrogen, ChlorA, ChlorB) is a point in 4D space
- Dimensionality reduction is used to
  - Visualize in 2D or 3D
  - Mitigate the Curse of Dimensionality
What is a Classifier? A bit more formal
- x = Feature Vector: x = (x1, x2, …, xB)'
- L = set of class labels: L = {L1, L2, …, LC}, e.g. {Pinus palustris, Quercus laevis, …}
- A (Discrete) classifier is a function f: R^B -> L that assigns a class label to each feature vector
What is a Classifier? A bit more formal
- x = Feature Vector: x = (x1, x2, …, xB)'
- L = set of class labels: L = {L1, L2, …, LC}, e.g. {Pinus palustris, Quercus laevis, …}
- A (Continuous) classifier is a function f: R^B -> R^C that assigns a real-valued score to each class; a label can then be chosen from the scores (e.g. the largest)
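To make the two definitions concrete, here is a minimal Python sketch. The weights, bias, and three-band "spectrum" are made-up numbers; only the two label names come from the slide.

```python
import numpy as np

labels = ["Pinus palustris", "Quercus laevis"]   # L = {L1, ..., LC}
W = np.array([[0.2, -0.1, 0.05],                 # one row of weights per class (made up)
              [-0.3, 0.4, 0.10]])
b = np.array([0.1, -0.2])                        # one bias per class (made up)

def continuous_classifier(x):
    """Continuous classifier: R^B -> R^C, one real-valued score per class."""
    return W @ x + b

def discrete_classifier(x):
    """Discrete classifier: R^B -> L, returns a single class label."""
    return labels[int(np.argmax(continuous_classifier(x)))]

x = np.array([0.8, 0.1, 0.3])                    # a toy 3-band feature vector
print(continuous_classifier(x))                  # [0.265, -0.37]
print(discrete_classifier(x))                    # 'Pinus palustris'
```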
Linear Discriminants
- Discriminant classifiers are designed to discriminate between classes
- Generative classifiers model the classes themselves
Linear Discriminant or Linear Discriminant Analysis
There are many different types. Here are some:
- Ordinary Least Squares
- Ridge Regression
- Lasso
- Canonical
- Perceptron
- Support Vector Machine (without kernels)
- Relevance Vector Machine
Linear Discriminants
- Points on the left side of the lines are in the blue class
- Points on the right side of the lines are in the red class
- Which line is best? What does "best" mean?
(Figure: Blue Class and Red Class point clouds separated by several candidate lines)
Linear Discriminants – 2 Classes
(Diagram: features x1, …, xm and a constant x0 = 1 are multiplied by weights wk1, …, wkm and the bias wk0, then summed; the sum is a BIG number for Class 1 and a small number for Class 2.)
Example of “Best”: Support Vector Machine
- Pairwise (2 classes at a time)
- Maximizes the Margin Between Classes
- Minimizes its Objective Function by Solving a Quadratic Program
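A hedged illustration of the margin-maximizing idea, using scikit-learn's SVC (an assumed tool here, not something the slides prescribe); its fit call solves the quadratic program internally. The two toy classes are generated on the fly.

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
blue = rng.normal(loc=[-2.0, 0.0], scale=0.5, size=(50, 2))   # toy blue class
red  = rng.normal(loc=[+2.0, 0.0], scale=0.5, size=(50, 2))   # toy red class
X = np.vstack([blue, red])
y = np.array([0] * 50 + [1] * 50)

# Linear (no kernel trick) SVM: maximizes the margin between the two classes
svm = SVC(kernel="linear", C=1.0).fit(X, y)
print(svm.coef_, svm.intercept_)                  # the separating line's weights and bias
print(svm.predict([[-1.5, 0.2], [1.8, -0.3]]))    # [0 1]
```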
Back to Classifiers – Definition: Training
- Given a data set X = {x1, x2, …, xN}
- Corresponding desired, or target, outputs Y = {y1, y2, …, yN}
- A user-defined functional form of a classifier, e.g. yn = w0 + w1 xn,1 + w2 xn,2 + … + wB xn,B
- Estimate the parameters {w0, w1, …, wB}
- X is called the training set
Linear Classifiers – Ordinary Least Squares
- Continuous Classifier
- Target Outputs are usually {0,1} or {-1,1}
- Minimize the squared error: E(w) = sum over n of (tn − yn)^2, where yn = w0 + w1 xn,1 + … + wB xn,B
Linear Classifiers – Least Squares
- How do we minimize? Take the derivative and set it to zero: X'X w = X't, so w = inv(X'X) X't, or equivalently w = pinv(X) t
Example (Rows ~ Spectra)
- X = matrix whose rows are the training spectra
- t = vector of target outputs
- w = pinv(X)*t
- X*w = predicted outputs (compare to t; see the numerical sketch below)
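The slide's actual matrices are not reproduced above, so here is a small made-up stand-in showing the same computation: rows of X act as spectra, t holds the targets, and the pseudoinverse gives the least-squares weights.

```python
import numpy as np

X = np.array([[0.10, 0.40, 0.35],       # hypothetical "spectra", one per row
              [0.12, 0.38, 0.33],
              [0.80, 0.20, 0.05],
              [0.78, 0.22, 0.07]])
t = np.array([1.0, 1.0, -1.0, -1.0])    # targets: +1 for class 1, -1 for class 2

w = np.linalg.pinv(X) @ t               # w = pinv(X)*t
print(w)
print(X @ w)                            # X*w: predictions, close to the targets
```

A column of ones could be appended to X if an explicit bias term w0 is wanted.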
Ridge Regression – Ordinary Least Squares + Regularization
- Diagonal Loading: add λI to X'X
- Solution: w = inv(X'X + λI) X't
Ridge Regression
- Ordinary Least Squares Solution: w = inv(X'X) X't
- Ridge Regression Solution: w = inv(X'X + λI) X't
- Diagonal Loading: the λI term added to X'X
- Diagonal Loading can be crucial for Numerical Stability
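A sketch of why the diagonal loading matters numerically, with invented data: two nearly collinear bands make X'X close to singular, and adding λI keeps the solution well behaved.

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(20, 5))
X[:, 4] = X[:, 3] + 1e-6 * rng.normal(size=20)   # nearly collinear bands -> X'X nearly singular
t = rng.normal(size=20)

lam = 1e-3                                        # loading strength (made up)
w_ols   = np.linalg.solve(X.T @ X, X.T @ t)                      # plain normal equations
w_ridge = np.linalg.solve(X.T @ X + lam * np.eye(5), X.T @ t)    # diagonally loaded (ridge)
print(w_ols)                                      # the collinear weights can blow up
print(w_ridge)                                    # loaded solution stays modest
```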
Illustrative Example – We’ll see the value later
Notes on Methodology
- When developing a “Machine Learning” algorithm, one should test it on simulated data first
- This is necessary but not sufficient
  - Necessary: if it doesn’t work on simulated data, then it almost certainly will not work on real data
  - Sufficient: if it works on simulated data, then it may or may not work on real data
- Question: How do we simulate?
Simulating Data
- Usually use Gaussians because they are often assumed (although not accurate nearly as often)
- Multivariate Gaussians are completely determined by their
  - Mean
  - Covariance Matrix
Some Single Gaussians in One Dimension
Fisher Canonical LDA Gaussians – Trickery of Displays (Same X-Axis, Same Y-Axis)
Fisher Canonical LDA Gaussians – Trickery of Displays (Same X-Axis, Different Y-Axis)
Fisher Canonical LDA Gaussians – Trickery of Displays (Same X-Axis, Different Y-Axis)
Some Single Gaussians in Two Dimensions
Formulas for Gaussians
- To generate simulated data, we need to draw samples from these distributions
- Univariate Gaussian: p(x) = (1 / (sqrt(2π) σ)) exp(−(x − μ)^2 / (2σ^2))
- Multivariate Gaussian, e.g. x is a spectrum: p(x) = (1 / ((2π)^(B/2) |Σ|^(1/2))) exp(−(1/2)(x − μ)' Σ⁻¹ (x − μ)), where Σ is the Covariance Matrix
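A minimal sketch of drawing simulated samples from these distributions with NumPy; the mean and covariance values here are invented purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(42)

# Univariate Gaussian: mean mu, standard deviation sigma
u = rng.normal(loc=0.5, scale=0.1, size=1000)

# Multivariate Gaussian, e.g. a toy 3-band "spectrum": mean vector + covariance matrix
mean = np.array([0.2, 0.5, 0.3])
cov  = np.array([[0.010, 0.004, 0.002],
                 [0.004, 0.020, 0.005],
                 [0.002, 0.005, 0.015]])
Xsim = rng.multivariate_normal(mean, cov, size=500)

print(u.mean(), u.std())                 # close to 0.5 and 0.1
print(Xsim.shape)                        # (500, 3): 500 simulated spectra
print(np.cov(Xsim, rowvar=False))        # sample covariance, close to cov
```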
Sample Covariance Definition
- S = (1 / (N − 1)) * sum over n of (xn − x̄)(xn − x̄)'   (some texts normalize by N instead of N − 1)
- Each term (xn − x̄)(xn − x̄)' is called an Outer Product
- Example Outer Product: see the sketch below
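A quick numerical check of the definition on made-up data: accumulate the outer products of the mean-subtracted samples and compare against np.cov.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 4))            # 100 samples, 4 features (made up)
xbar = X.mean(axis=0)                    # sample mean

S = np.zeros((4, 4))
for x in X:
    d = (x - xbar)[:, None]              # column vector
    S += d @ d.T                         # one outer product per sample
S /= X.shape[0] - 1                      # normalize (np.cov also uses N - 1)

print(np.allclose(S, np.cov(X, rowvar=False)))   # True
```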
Covariance Matrices
- If S is a covariance matrix, then people who need to know can calculate matrices U and D with the properties that
  - S is Diagonalized: S = U D U'
  - U is Orthogonal (like a Rotation)
  - D is Diagonal
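A quick check of those properties on a small made-up covariance matrix, using NumPy's symmetric eigendecomposition.

```python
import numpy as np

S = np.array([[2.0, 0.8],
              [0.8, 1.0]])                  # a small made-up covariance matrix
evals, U = np.linalg.eigh(S)                # S = U D U'
D = np.diag(evals)

print(np.allclose(U @ D @ U.T, S))          # True: S is diagonalized by U
print(np.allclose(U.T @ U, np.eye(2)))      # True: U is orthogonal (a rotation/reflection)
print(D)                                    # D is diagonal
```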
Generating Covariance Matrices
- (1) Any matrix of the form A'A is a covariance for some distribution, so we can do the following:
  - Set A = a random square matrix
  - S = A'A
- (2) So we also can do the following:
  - Make a diagonal matrix D
  - Make a rotation matrix U
  - Make a covariance matrix by setting S = U'DU
- We will generate covariance matrices S using Python
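A sketch of both recipes in NumPy (the lecture's own Python may differ); a QR factorization of a random matrix supplies the rotation U.

```python
import numpy as np

rng = np.random.default_rng(3)

# (1) S = A'A for a random square matrix A
A = rng.normal(size=(4, 4))
S1 = A.T @ A

# (2) S = U'DU from a chosen diagonal D and a rotation (orthogonal) matrix U
D = np.diag([4.0, 2.0, 1.0, 0.5])               # chosen variances
U, _ = np.linalg.qr(rng.normal(size=(4, 4)))    # orthogonal matrix from QR
S2 = U.T @ D @ U

# Both are symmetric with non-negative eigenvalues, so both are valid covariances
print(np.linalg.eigvalsh(S1))
print(np.linalg.eigvalsh(S2))                   # 0.5, 1.0, 2.0, 4.0
```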
Go To Python
Linear Dimensionality Reduction
- PCA: Principal Components Analysis
  - Maximizes the amount of variance in the first k bands compared to all other linear (orthogonal) transforms
- MNF: Minimum Noise Fraction
  - Minimizes an estimate of Noise/Signal, or equivalently maximizes an estimate of Signal/Noise
PCA
- Start with a data set of spectra or other samples, X = {x1, x2, …, xN}, implicitly assumed drawn from the same distribution
- Compute the sample mean over all spectra: x̄ = (1/N) * sum over n of xn
- Compute the Sample Covariance: S = (1/(N − 1)) * sum over n of (xn − x̄)(xn − x̄)'
- Diagonalize S: S = U D U'
- PCA is defined to be: zn = U'(xn − x̄)
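A minimal sketch of that recipe on simulated 2D data (the data and covariance are made up); np.linalg.eigh performs the diagonalization.

```python
import numpy as np

rng = np.random.default_rng(7)
X = rng.multivariate_normal([0.0, 0.0], [[3.0, 1.2], [1.2, 1.0]], size=500)

xbar = X.mean(axis=0)                        # sample mean over all samples
S = np.cov(X, rowvar=False)                  # sample covariance
evals, U = np.linalg.eigh(S)                 # diagonalize: S = U D U'
order = np.argsort(evals)[::-1]              # sort eigenvalues big to little
evals, U = evals[order], U[:, order]

Z = (X - xbar) @ U                           # PCA scores: rotate the shifted data
print(evals)                                 # variance along major, then minor axis
print(np.cov(Z, rowvar=False))               # ~diagonal: PCA bands are decorrelated
```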
PCA – Easy Examples
- Eigenvectors (the columns of U) determine the major and minor axes
- The new coordinate system is a rotation (U) and shift (x − x̄) of the original coordinate system
- Assumes elliptical contours, which is Gaussian
- Eigenvalues determine the lengths of the major and minor axes
PCA, “Dark Points”, and BRDF
(Figure: PCA scatter plot. The black points are from Oak Trees; the red points are from Soil in a Baseball Field.)
Go To Python
MNF
- Assumption: Observed Spectrum = Signal + Noise, i.e. x = s + n
- Want to transform x so that Noise Variance / Signal Variance is minimized
- How do we represent this ratio?
MNF
- Assume the signal and noise are both random vectors with multivariate Gaussian distributions
- Assume the noise is zero mean: it is equally likely to add or subtract by the same amounts
- The noise variance uniquely determines how much the signal is modified by noise
- Therefore, we should try to minimize the ratio Noise Variance / Signal Variance
MNF – Noise/Signal Ratio
- How do we compute it for spectra? 426 bands -> 426 variances and 426*425/2 covariances
- Dividing element-wise won’t work. What should we do? Diagonalize!
- Covariance of n is diagonalizable: Cov(n) = Un Dn Un'
- Covariance of x is diagonalizable: Cov(x) = Ux Dx Ux'
MNF – Noise/Signal Ratio
- Covariance of n is diagonalizable: Cov(n) = Un Dn Un'
- Covariance of x is diagonalizable: Cov(x) = Ux Dx Ux'
- GOOD NEWS! They can be simultaneously diagonalized
- It’s a little complicated, but basically looks like this: there is a single matrix V such that V' Cov(x) V = I and V' Cov(n) V = D
- So the Noise Variance / Signal Variance in each transformed band is just the corresponding diagonal entry of D
MNF: Algorithm
- Estimate the noise n
- Calculate the Covariance of n: Cov(n)
- Calculate the Covariance of x: Cov(x)
- Calculate the Left Eigenvectors and Eigenvalues of inv(Cov(x))*Cov(n) (Noise/Signal) or inv(Cov(n))*Cov(x) (Signal/Noise)
- Make sure the eigenvalues are sorted in order of
  - Big to Little if maximizing
  - Little to Big if minimizing
- Only keep the Eigenvectors that come early in the sort (a sketch follows below)
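A hedged end-to-end sketch of the algorithm on simulated data. The noise estimate (differences of neighboring samples of a smooth signal) and SciPy's generalized symmetric eigensolver are illustrative choices, not necessarily the implementation used in the lecture.

```python
import numpy as np
from scipy.linalg import eigh

rng = np.random.default_rng(11)
t = np.linspace(0, 4 * np.pi, 2000)
signal = np.column_stack([np.sin(k * t) for k in range(1, 6)])   # smooth 5-band "signal"
x = signal + rng.normal(scale=0.2, size=signal.shape)            # observed = signal + noise

n_hat = np.diff(x, axis=0) / np.sqrt(2)      # estimate n: neighbor differences are mostly noise
Sn = np.cov(n_hat, rowvar=False)             # covariance of n
Sx = np.cov(x, rowvar=False)                 # covariance of x

# Generalized eigenproblem Sn v = lambda Sx v: lambda is the noise/signal ratio in each
# transformed band. eigh sorts eigenvalues little to big (minimizing), so the leading
# eigenvectors give the cleanest MNF bands.
noise_fraction, V = eigh(Sn, Sx)
k = 3                                        # keep only the eigenvectors early in the sort
mnf_bands = x @ V[:, :k]
print(noise_fraction)                        # smallest noise/signal ratios come first
```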