Classification and its Application in Remote Sensing
2 Overview
- introduction to the classification problem
- an application of classification in remote sensing: vegetation classification
  - band selection
  - multi-class classification
3 Introduction
Make a program that automatically recognizes handwritten numbers:
4 Introduction: the classification problem
- from raw data to decisions
- learn from examples and generalize
Given: training examples (x, f(x)) for some unknown function f.
Find: a good approximation to f.
5 Examples
Handwriting recognition
- x: data from pen motion
- f(x): letter of the alphabet
Disease diagnosis
- x: properties of the patient (symptoms, lab tests)
- f(x): disease (or perhaps a recommended therapy)
Face recognition
- x: bitmap picture of a person’s face
- f(x): name of the person
Spam detection
- x: message
- f(x): spam or not spam
6 Steps for building a classifier
- data acquisition / labeling (ground truth)
- preprocessing
- feature selection / feature extraction
- classification (learning/testing)
- post-processing
- decision
7 Data acquisition
Acquiring the data and labeling it. The data are sampled independently at random according to an unknown distribution P(x, y).
8 Pre-processing
e.g. image processing:
- histogram equalization
- filtering
- segmentation
Data normalization.
9 Pre-processing: example
10 Feature selection/extraction
This is generally the most important step: conveying the information in the data to the classifier.
The number of features:
- should be high: more information is better
- should be low: curse of dimensionality
This step includes prior knowledge of the problem and is in part manual, in part automatic.
11 Feature selection/extraction
User knowledge.
Automatic (a PCA sketch follows below):
- PCA: reduce the number of features by decorrelation
- check which features give the best classification result
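A minimal sketch of the automatic route, assuming scikit-learn is available; the correlated data are synthetic and purely illustrative:

import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
# Hypothetical correlated features: 500 samples, 50 features driven by 5 latent factors.
latent = rng.normal(size=(500, 5))
X = latent @ rng.normal(size=(5, 50)) + 0.1 * rng.normal(size=(500, 50))

# Decorrelate and keep only the components explaining 95% of the variance.
pca = PCA(n_components=0.95)
X_reduced = pca.fit_transform(X)
print(X.shape, "->", X_reduced.shape)

For the second strategy one would instead score candidate feature subsets by their cross-validated classification accuracy.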
12 Feature extraction: example
13 Feature scatterplot
[scatterplot of classes A, B and C (K = 3); axes: value of feature 1 vs. value of feature 2]
14 Classification
Learn from the features and generalize: the learning algorithm analyzes the examples and produces a classifier f. Given a new data point (x, y), the classifier is given x and predicts ŷ = f(x); the loss L(ŷ, y) is then measured. Goal of the learning algorithm: find the f that minimizes the expected loss.
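In symbols (a standard formulation of this goal, added here for completeness): the learner seeks

  f^* = \arg\min_f \; \mathbb{E}_{(x,y) \sim P} \big[ L(f(x), y) \big],

and since P(x, y) is unknown, it minimizes the empirical loss \frac{1}{n} \sum_{i=1}^{n} L(f(x_i), y_i) over the training examples instead.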
15 Classification: Bayesian decision theory
The fundamental statistical approach to the problem of pattern classification, assuming that the decision problem is posed in probabilistic terms. Classify using the posterior probability P(y|x): maximum a posteriori (MAP) classification.
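The MAP rule written out (standard Bayesian decision theory, added for completeness):

  \hat{y} = \arg\max_y P(y \mid x) = \arg\max_y \frac{p(x \mid y)\, P(y)}{p(x)} = \arg\max_y p(x \mid y)\, P(y),

since p(x) does not depend on y.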
16 Classification: density estimation
We need to estimate p(y) and p(x|y), the prior and the class-conditional probability density. Using only the data, this is density estimation, which is often not feasible: too little data in too high-dimensional a space. Alternatives:
- assume a simple parametric probability model (normal)
- non-parametric methods
- directly find a discriminant function
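A minimal sketch of the parametric route, assuming normal class-conditional densities; scikit-learn's QuadraticDiscriminantAnalysis fits exactly this model (the two-class toy data are invented for illustration):

import numpy as np
from sklearn.discriminant_analysis import QuadraticDiscriminantAnalysis

rng = np.random.default_rng(0)
# Two hypothetical classes with different Gaussian class-conditionals.
X = np.vstack([rng.normal(0.0, 1.0, size=(100, 2)),
               rng.normal(2.0, 1.5, size=(100, 2))])
y = np.repeat([0, 1], 100)

qda = QuadraticDiscriminantAnalysis().fit(X, y)
# Posterior P(y|x), computed from the fitted priors and Gaussian densities.
print(qda.predict_proba([[1.0, 1.0]]))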
17 Example
18 Example
19 Post-processing
- include context, e.g. in images and signals
- integrate multiple classifiers
20 Decision
Minimize the risk, taking the cost of misclassification into account: when unsure, select the class with the minimal cost of error.
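Formally (the standard conditional-risk formulation, e.g. Duda and Hart; added for completeness): with loss L(\alpha_i, \omega_j) for deciding \alpha_i when the true class is \omega_j, choose the decision minimizing

  R(\alpha_i \mid x) = \sum_j L(\alpha_i, \omega_j)\, P(\omega_j \mid x).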
21 No free lunch theorem
Don’t wait for a “generic” best classifier to arrive!
22 Applications in Remote Sensing
23 Remote sensing: acquisition
Images are acquired from the air or from space.
24 Spectral response
25 Spectral response
27 Brugge and Westhoek
Hyperspectral sensor: AISA Eagle (July 2004); resolution: [details in figure]
28 Labeling
29 Labeling: spectral class mean
30 Feature extraction
Here: exploratory use. Automatically look for relevant features:
- which spectral bands (wavelengths) should be measured, and at what spectral resolution (width), for my application
- results can be used for classification, sensor design or interpretation
31 Feature extraction: band selection with a spectral response function:
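A minimal sketch of what the spectral response function does here, assuming a Gaussian response; all names and the toy spectrum are invented for illustration. Each simulated band is the response-weighted average of the hyperspectral signal, with the band center and width as the free parameters that band selection tunes:

import numpy as np

def band_value(wavelengths, spectrum, center, fwhm):
    # Gaussian spectral response function with the given center and width (FWHM).
    sigma = fwhm / (2.0 * np.sqrt(2.0 * np.log(2.0)))
    response = np.exp(-0.5 * ((wavelengths - center) / sigma) ** 2)
    # Response-weighted average of the spectrum over the band.
    return np.sum(response * spectrum) / np.sum(response)

wl = np.linspace(400.0, 1000.0, 601)         # nm, hypothetical sampling
spectrum = 0.3 + 0.1 * np.sin(wl / 50.0)     # toy reflectance spectrum
centers = np.linspace(450.0, 950.0, 12)      # a hypothetical 12-band layout
bands = [band_value(wl, spectrum, c, fwhm=30.0) for c in centers]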
32 Hypothetical 12-band sensor
33 Class distribution: Normal
34 Class separation criterion
- two-class: Bhattacharyya bound
- multi-class criterion
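For normal class-conditional densities, the two-class Bhattacharyya bound on the error takes the standard form (added for completeness; the multi-class criterion is not spelled out on the slide, but a common choice sums the pairwise bounds):

  \varepsilon \le \sqrt{P_1 P_2}\; e^{-B}, \qquad
  B = \frac{1}{8} (\mu_2 - \mu_1)^\top \left[ \frac{\Sigma_1 + \Sigma_2}{2} \right]^{-1} (\mu_2 - \mu_1)
    + \frac{1}{2} \ln \frac{\left| \frac{\Sigma_1 + \Sigma_2}{2} \right|}{\sqrt{|\Sigma_1|\, |\Sigma_2|}}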
35 Optimization
Minimize the criterion. Gradient descent is possible, but local minima prevent it from reaching good optima. Therefore we use global optimization: simulated annealing (a generic sketch follows below).
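A generic simulated-annealing sketch, not the authors' exact implementation; here the cost function would be the (negated) class-separation criterion evaluated for a candidate set of band centers and widths:

import numpy as np

def simulated_annealing(cost, x0, step=0.1, t0=1.0, cooling=0.995, n_iter=20000, seed=0):
    # Random perturbations; always accept improvements, accept worsenings
    # with probability exp(-delta/T); cool the temperature geometrically.
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float)
    fx, t = cost(x), t0
    best_x, best_f = x.copy(), fx
    for _ in range(n_iter):
        cand = x + rng.normal(scale=step, size=x.shape)
        fc = cost(cand)
        if fc < fx or rng.random() < np.exp(-(fc - fx) / t):
            x, fx = cand, fc
            if fx < best_f:
                best_x, best_f = x.copy(), fx
        t *= cooling
    return best_x, best_f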
42 Remote sensing: classification
43 Multi-class classification
- linear multi-class classifier
- combining binary classifiers (a sketch follows below):
  - one against all: K-1 classifiers
  - one against one: K(K-1)/2 classifiers
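A minimal sketch of both combination schemes using scikit-learn's wrappers, on synthetic data (note that scikit-learn's one-vs-rest trains K binary classifiers, one per class):

from sklearn.datasets import make_classification
from sklearn.multiclass import OneVsOneClassifier, OneVsRestClassifier
from sklearn.svm import LinearSVC

# Hypothetical 4-class problem standing in for the pixel features.
X, y = make_classification(n_samples=400, n_features=10, n_informative=6,
                           n_classes=4, random_state=0)

ovo = OneVsOneClassifier(LinearSVC()).fit(X, y)   # K(K-1)/2 = 6 binary classifiers
ovr = OneVsRestClassifier(LinearSVC()).fit(X, y)  # one binary classifier per class
print(len(ovo.estimators_), len(ovr.estimators_))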
44 Combining linear multi-class classifiers
[diagram: classes A, B and C (K = 3) with the pairwise decision boundaries AB, AC and BC]
45 Combining binary classifiers: maximum voting
Four-class example of binary classifier results; vote tally: class 1: 0, class 2: 2, class 3: 1, class 4: 3 (winner).
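The same tally in code (a toy sketch; the pairwise winners are chosen to reproduce the slide's example, with classes numbered 0-3 instead of 1-4):

import numpy as np
from itertools import combinations

K = 4
pairs = list(combinations(range(K), 2))   # the K(K-1)/2 = 6 binary problems
# Hypothetical winner of each binary duel, ordered as `pairs`:
pairwise_winners = [1, 2, 3, 1, 3, 3]

votes = np.bincount(pairwise_winners, minlength=K)
print(votes, "-> predicted class:", votes.argmax())   # [0 2 1 3] -> 3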
46 Problem with max voting
No probabilities, just class labels: hard classification. Probabilities are useful for:
- spectral unmixing
- post-processing
47 Combining binary classifiers: coupling probabilities
Look for class probabilities p_i such that p_i / (p_i + p_j) ≈ r_ij, with r_ij the probability of class ω_i estimated by the binary classifier for the pair i-j.
- K-1 free parameters and K(K-1)/2 constraints!
Hastie and Tibshirani: find approximations by minimizing the Kullback-Leibler distance (a sketch of their iterative scheme follows below).
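A sketch of the iterative scheme from Hastie and Tibshirani's pairwise-coupling paper, simplified with uniform pair weights (my transcription, not the authors' code):

import numpy as np

def couple_probabilities(r, n_iter=1000, tol=1e-8):
    # r[i, j] ~ estimate of P(class i | class i or j) from binary classifier i-j.
    # Find p with p[i] / (p[i] + p[j]) approximating r[i, j].
    K = r.shape[0]
    p = np.full(K, 1.0 / K)
    for _ in range(n_iter):
        mu = p[:, None] / (p[:, None] + p[None, :])
        p_new = np.empty(K)
        for i in range(K):
            num = sum(r[i, j] for j in range(K) if j != i)
            den = sum(mu[i, j] for j in range(K) if j != i)
            p_new[i] = p[i] * num / den
        p_new /= p_new.sum()
        if np.max(np.abs(p_new - p)) < tol:
            break
        p = p_new
    return p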
48 Classification result
49 Single-pixel classes: not wanted
50 Remote sensing: post-processing
Use contextual information to “adjust” the classification: look at the classes and probabilities of neighboring pixels and, if necessary, adjust the pixel's class.
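One simple version of this idea is a majority filter over the label image; this ignores the class probabilities, so it is only a sketch of the contextual step (assuming SciPy):

import numpy as np
from scipy.ndimage import generic_filter

def majority(window):
    # Most frequent class label in the neighborhood.
    return np.bincount(window.astype(int)).argmax()

# Hypothetical classified image: one class label per pixel.
labels = np.random.default_rng(0).integers(0, 4, size=(64, 64))

# Isolated single-pixel classes get absorbed by their 3x3 neighborhood.
smoothed = generic_filter(labels, majority, size=3)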
51 Post-processed classification result
52 Pixel mixing
[figure legend: sand, moss, dry grass, green grass]
53 Pixel mixing
54 Unmixing with: sand, moss, sparse moss, grass, sparse grass, marram, sparse marram
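A minimal linear-unmixing sketch: model each pixel spectrum as a non-negative combination of endmember spectra, with the sum-to-one constraint enforced softly by an extra equation (all spectra here are invented; SciPy's nnls does the constrained solve):

import numpy as np
from scipy.optimize import nnls

def unmix(pixel, endmembers):
    # Solve min ||E a - x|| with a >= 0; the appended row of ones
    # pushes the abundances a towards summing to one.
    E = np.vstack([endmembers.T, np.ones(endmembers.shape[0])])
    x = np.append(pixel, 1.0)
    abundances, _ = nnls(E, x)
    return abundances

rng = np.random.default_rng(0)
endmembers = rng.uniform(0.1, 0.6, size=(3, 12))   # e.g. sand, moss, marram; 12 bands
pixel = np.array([0.5, 0.3, 0.2]) @ endmembers     # a mixed pixel
print(unmix(pixel, endmembers))                    # ~ [0.5, 0.3, 0.2]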
55 The End