
2 www.geo.utep.edu/pub/hurtado/5336

3 Image Classification

5 Image Classification
Introduction
Hard Classification: Supervised
  Training Classes & Class Statistics
  Feature Selection
  Class Separability Algorithms
  Classification Algorithms
Hard Classification: Unsupervised
  Class Determination and Classification Algorithms
Fuzzy Classification
Accuracy Assessment


13 Image Classification
Introduction
Hard Classification: Supervised
  Training Classes & Class Statistics
  Feature Selection
  Class Separability Algorithms
  Classification Algorithms
Hard Classification: Unsupervised
  Class Determination and Classification Algorithms
Fuzzy Classification
Accuracy Assessment

15 Unsupervised Classification
Requires minimal initial input. Statistical analysis is used to determine the natural clustering in the data; these clusters become the classes. The data are then automatically classified into those classes. Requires extensive post-processing: a posteriori assignment of spectral classes to information classes.

16 Unsupervised Classification
Information classes: classes defined by humans. Spectral classes: classes defined by the inherent statistical properties of the data. I will refer to the classes in unsupervised classification as "clusters". They are analogous to the "training classes" of supervised classification, except that they are automatically defined. Training classes are information classes; clusters are spectral classes.

18 Unsupervised Classification
Some spectral classes determined by unsupervised classification may be meaningless, e.g. mixtures of other classes. Interpreting unsupervised classification results requires an understanding of the general spectral characteristics of the terrain in the image.

19 The Chain Method
Two-pass methodology…
First pass: cluster building and determination of mean measurement vectors (mean spectra) using some statistical/separability measure.
Second pass: cluster assignment using the minimum-distance-to-means classification algorithm (see supervised classification notes).

20 Cluster Building
Analyst needs to specify values for the following parameters:
R: radius in spectral space defining the minimum spacing between clusters during cluster accumulation.
N: the number of pixels to be evaluated during cluster accumulation between major merging steps.
C: a distance in spectral space used as a threshold when merging clusters in major merging steps.
Cmax: the maximum number of clusters allowed.

21 Cluster Building Image is evaluated pixel-by-pixel, line-by-line (row-by-row), “typewriter style”. Jensen gives an example using just the first 3 pixels and only two image bands… Extend this to all the pixels in the image and all n bands…

23 Cluster Building: Cluster Accumulation
Consider pixel 1 as the mean vector for cluster 1 (M1). Consider pixel 2 as the mean vector for cluster 2 (M2). Determine the Euclidean distance between M1 and M2. If it is < R, merge cluster 1 and cluster 2 into a new cluster 1 with a "weight" of 2; the new cluster mean vector is calculated as the average of M1 and M2. If it is ≥ R, leave cluster 1 and cluster 2 separate.

25 Cluster Building
Consider pixel 3 as the mean vector for cluster 3 (M3). Determine the Euclidean distance from M3 to the previously defined cluster means, etc. Continue evaluating individual pixels until you have looked at N pixels (or have defined Cmax clusters, whichever comes first)…

27 Cluster Building: Major Merging Step
Evaluate the clusters defined so far for the N pixels. Calculate the distance between each cluster and every other cluster. If any two clusters are less than C apart, merge them together: the new cluster mean vector is calculated as the weighted average of the two original clusters, and the new weight is the sum of the two individual weights. Keep going until all remaining clusters are at least C apart from one another. Then go back to pixel-by-pixel cluster accumulation…

29 Cluster Building Iterate between cluster accumulation and major merging steps until the entire image has been examined and Cmax clusters have been defined. Gradually, the location in spectral space of the cluster means will “stabilize” as more pixels are added (see figure in Jensen). Ending point is the spectral location of the mean measurement vector used in the actual classification step (the 2nd pass).

31 Unsupervised Classification
The second pass involves applying some classification algorithm to the image using the Cmax clusters defined in pass 1. Jensen describes a minimum distance algorithm for cluster assignment…

32 Minimum Distance Classification Algorithm
Algorithm calculates the Euclidean distance between the cluster mean measurement vector and the vector of a given pixel…

34 Minimum Distance Classification Algorithm
The pixel is classified into the cluster whose mean measurement vector is “closest” to the pixel’s measurement vector. All pixels will be classified unless a user-specified threshold value is set.
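A minimal sketch of the minimum distance rule, including the optional threshold (the function name is hypothetical):

```python
import numpy as np

def min_distance_classify(pixel, cluster_means, threshold=None):
    """Assign a pixel vector to the cluster whose mean measurement
    vector is nearest in Euclidean distance. If a threshold is given,
    pixels farther than that from every mean are left unclassified
    (returned as -1)."""
    pixel = np.asarray(pixel, dtype=float)
    dists = [np.linalg.norm(pixel - np.asarray(m, dtype=float))
             for m in cluster_means]
    j = int(np.argmin(dists))
    if threshold is not None and dists[j] > threshold:
        return -1  # too far from every cluster mean: unclassified
    return j
```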

35 Cluster Assignment
Classify each pixel into one of the Cmax clusters. Produce spectral scatter plots to visualize the distribution of data and clusters. Evaluate the clusters, label them, and combine them as necessary. Spectral class → information class.

36 Cluster Assignment
Jensen shows an example of a Landsat TM dataset that was analyzed with an unsupervised classification… There is a lot more information in these results than in the supervised classification of the same image. Why? The supervised classification didn't sample the full range of clusters in the training process. Supervised classification is a "lumper"; unsupervised classification is a "splitter".


41 ISODATA
Iterative Self-Organizing Data Analysis Technique (I don't know where that last "A" comes from). Widely used clustering algorithm. Implemented in ENVI. Comprises a set of "rules of thumb" determined by trial and error…

42 ISODATA
User needs to specify:
Cmax: maximum number of clusters.
T: maximum percentage of pixels whose class values are allowed to go unchanged between iterations. If exceeded (may never happen), ISODATA stops.
M: maximum number of times pixels are classified and their cluster mean vectors recalculated. If exceeded, ISODATA stops.
Min%: minimum number of pixels (as a percentage of the entire image) in a cluster. If not satisfied, the cluster is deleted and its members are reassigned.
smax: maximum cluster standard deviation. If exceeded and the cluster has at least 2× Min% pixels, the cluster is split; the new cluster means are the old class centers ±1s.
Split separation value: can be specified for use instead of s in determining new cluster means for split clusters.
Minimum distance between cluster means: clusters with a weighted distance less than this are merged.

43 ISODATA
Instead of just two passes, ISODATA makes an arbitrarily large number of passes and stops only when the specified results are obtained. It starts with an arbitrary assignment of Cmax clusters, with means distributed along the n-dimensional vector whose endpoints are defined by the mean and standard deviation of each band…

45 ISODATA First pass after initial definition of clusters begins at upper left corner of image… Each pixel is compared to each cluster mean and assigned to cluster it is closest to (in a Euclidean distance sense).

46 ISODATA
Second iteration and beyond… A new mean for each cluster is calculated; for a given cluster, the new mean is based on the statistics of the pixels assigned to it in the previous iteration. Reanalyze all the image pixels, assigning each pixel to the nearest cluster mean. Pixels may or may not change cluster assignment. Repeat the process until the T threshold is reached or the maximum number of iterations (M) is reached. It requires MANY iterations to produce a classification that partitions the spectral space effectively. Label the resulting final clusters to create information classes.
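The iterative core of ISODATA can be sketched as follows. This shows only the assign/recompute loop with the T and M stopping rules; the split, merge, and delete rules (smax, Min%, minimum distance between cluster means) are omitted for brevity, and the function name is mine.

```python
import numpy as np

def isodata_core(pixels, init_means, M=20, T=2.0):
    """Core ISODATA iteration (sketch): repeatedly (1) assign every
    pixel to its nearest cluster mean and (2) recompute each mean from
    its assigned pixels, stopping when at most T percent of pixels
    change cluster between iterations or after M iterations."""
    X = np.asarray(pixels, dtype=float)
    means = np.asarray(init_means, dtype=float)
    labels = np.full(len(X), -1)
    for _ in range(M):
        # squared Euclidean distance from every pixel to every mean
        d = ((X[:, None, :] - means[None, :, :]) ** 2).sum(axis=2)
        new_labels = d.argmin(axis=1)
        changed = 100.0 * np.mean(new_labels != labels)
        labels = new_labels
        # recompute each cluster mean from its member pixels
        for k in range(len(means)):
            if np.any(labels == k):
                means[k] = X[labels == k].mean(axis=0)
        if changed <= T:
            break
    return labels, means
```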


49 Unsupervised Classification
We may generate n clusters but usually find that q of them are difficult to label. These may be pixels affected by mixing, etc. We can attempt to reclassify them by performing "cluster busting": mask out everything but those pixels; run a new classification on the resulting masked image; repeat as necessary. Combine the new clusters with the original ones to create the final information classes and classification map.

51 Fuzzy Classification
Fuzzy classification: classification logic that recognizes that the radiance measured for a given pixel is the result of mixing (inherent inhomogeneity) and that land cover types have gradational boundaries. Real life is messy and imprecise. Instead of classifying an image pixel into one of m discrete classes, fuzzy logic gives m membership grades, each indicating the degree to which the pixel belongs to that class.

53 Fuzzy Classification In reality, boundaries between land cover types are fuzzy, there are heterogeneities within land cover types, etc. Also, our data is imprecise and discretized. Classical set theory is restrictive when dealing with problems like this. Fuzzy set theory (“fuzzy logic”) is better.


56 Fuzzy Classification
Consider a "universe" X with elements x such that X = {x}… Two set theories…
Classical: membership in a set A (class) of X is a binary function (either you are or you aren't), e.g. f_A(x) ∈ {0, 1}.
Fuzzy: membership in sets is a continuous function, e.g. f_A(x) ∈ [0, 1], and x can have a non-zero membership function in more than one set (class).

58 Fuzzy Classification With fuzzy logic, hard boundaries between classes are replaced with gradational ones. Instead of a threshold, there is a continuous gradation from one class to the next. Membership function describes the relative amounts of the various classes that are found in a given mixed pixel.

60 Fuzzy Classification Selection of training classes. It may be useful to pick heterogeneous as well as homogeneous training sites. Partitioning of spectral space. Pixels no longer belong to only one class. Membership function denotes a membership grade value indicating how close the vector of a pixel is to the vectors of the various training classes.

62 Fuzzy Classification
Partitioning of spectral space results in a family of fuzzy sets (classes) such that each pixel's membership grades lie between 0 and 1 and sum to 1 over all classes: 0 ≤ f_c(x) ≤ 1 and Σ_c f_c(x) = 1.

63 Fuzzy Classification For each fuzzy class, we can define statistics as before (mean, standard deviation, variance/covariance, etc.) They are modified from the usual formulations, however…

64
Fuzzy mean: μ*_c = Σ_{i=1}^{n} f_c(x_i) x_i / Σ_{i=1}^{n} f_c(x_i)
Non-fuzzy mean: μ = (1/n) Σ_{i=1}^{n} x_i
Fuzzy covariance: V*_c = Σ_{i=1}^{n} f_c(x_i)(x_i − μ*_c)(x_i − μ*_c)^T / Σ_{i=1}^{n} f_c(x_i)
Non-fuzzy covariance: V = Σ_{i=1}^{n} (x_i − μ)(x_i − μ)^T / (n − 1)
where n = the number of pixels, x_i = the measurement vector of pixel i, and f_c = the membership function of fuzzy class c.

65 Fuzzy Classification Membership functions can be defined using any of the traditional classification algorithms. Jensen shows a definition based on the maximum likelihood classification method…

67 The output is similar to what MF (Matched Filtering) produces.
You can make a "hard classification" map by setting the highest membership grade for a given pixel to 1 and the others to 0.

69 Accuracy Assessment How do we know how well the classification performed? How do we fix or post-process our results?

70 Accuracy Assessment
We can improve classifications by using ancillary data (extra data): elevation data, geology, hydrology, vegetation, etc. We can apply it as part of the ground truth, as constraints in "stratification", as part of the classification algorithm, or in post-classification processing. You can get very sophisticated with ancillary data and develop expert systems (see Jensen).

71 Ancillary Data
Problems:
May not be directly applicable.
May be inaccurate or incomplete.
May be analog and need to be digitized.
You have already used ancillary data as ground truth in your labs.

72 Stratification Ancillary data can be used to subdivide an image prior to classification. The sub-images are classified separately. Why? Better results… What if you are mapping a certain type of tree that only grows above a certain elevation? Use a DEM to mask out those parts of the image that are below that elevation. Classify only those parts that are above that elevation. Result is a classification map that reduces errors of commission. You have done so by a priori excluding parts of the image based on prior knowledge.

73 Classifier Operations
Ancillary data can be used during the classification computation itself. For example, you might be classifying an image that is a mixture of spectral bands and a co-registered DEM. The DEM is ancillary data that is treated as just another “feature” by the classification algorithm.

74 Classifier Operations
You might also consider grouped-pixel, contextual properties like texture as an added "feature" to be considered in the classification. You might also include historical probabilities, e.g. "I know last year there was 80% of x and 20% of y in the scene". Adding ancillary data will improve performance, but at increased cost.

75 Post-Processing Involves GIS analysis of the classification results and ancillary data. For example, perform an if-then (Boolean) analysis of your classification map compared to a DEM (or whatever other data is applicable). Also includes clump, sieve, combine, etc. functions provided in ENVI (see ENVI help).

76 Layered Classification
Hierarchical process where ancillary data and image statistics are used in more than one decision step. Example: Automatically segment the image into homogeneous areas. Attempt to classify those pixels into training classes. Use minimum distance to classify remaining, unclassified pixels. Use ancillary data to classify any remaining pixels.

77 Accuracy Assessment Need to take into account not only the amount of a category that has been classified, but also where it was classified. Need to compare two sources of information… Classification map Reference test information Relationship between the two is an error matrix (confusion matrix in ENVI)…

78 Error Matrix
Square array of numbers laid out in rows and columns that expresses the number of pixels assigned to a particular class relative to the actual class as identified in the field. Columns are the reference data; rows are the classification results. Illuminates the overall accuracy of the classification, errors of commission, and errors of omission.

80 Error Matrix
How is it constructed? Need to figure out the following first…
Get training and reference information.
Determine sample size.
Define sampling strategy.
Specify statistics.

81 Training & Reference Information
Can’t really use your training classes because they aren’t random – you chose them to begin with, so they bias the error assessment to higher accuracy. You need to find other reference test pixels that weren’t part of the training classes. Best bet is to get ground truth…

82 Sample Size
How many pixels do you need to look at to determine the accuracy of the classification? All of them? Some subset? One way to estimate the number you need is to assume some statistical distribution for your data, e.g. the normal approximation to the binomial. Equations have been derived based on the proportion of correctly classified pixels and some allowable error…

83 Normal approximation to the binomial distribution:
N = Z² p q / E²
where:
p = expected percent accuracy
q = 100 − p
E = allowable error
Z = 2 (for a 2σ confidence level)
The greater the allowable error, the fewer points needed to evaluate.
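The sample-size formula in code (assuming Z = 2, as above; the function name is mine):

```python
def sample_size(p, E, Z=2.0):
    """Number of test pixels needed, from the normal approximation to
    the binomial distribution: N = Z^2 * p * q / E^2, where p is the
    expected percent accuracy, q = 100 - p, and E is the allowable
    error in percent. Z = 2 corresponds to the 2-sigma level."""
    q = 100.0 - p
    return (Z ** 2) * p * q / (E ** 2)
```

For example, with an expected accuracy of p = 85% and an allowable error of E = 5%, N = 4 × 85 × 15 / 25 = 204 test pixels; doubling the allowable error to 10% cuts that to 51.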

85 Sample Size
The problem with this is that there are usually many, many pixels in a remotely sensed image. For example, 0.5% of an ASTER VNIR scene is still ca. 80,000 pixels! Not practical! Compromise: collect a minimum number of test pixels for each class in the error matrix.
Larger area → more pixels.
Smaller area → fewer pixels.
More homogeneous → fewer pixels.
More variability → more pixels.

86 Sampling Strategy Simple random sampling may undersample small, but important classes unless the sample size is very large. Stratified random sampling is preferred. A minimum number of samples are selected from each class. When the test pixels are determined, go to the field and ground-truth them or cross-check them with another dataset.

87 Error Matrix Calculation
Finally, after doing the classification and collecting our test reference information from an appropriate number of randomly selected pixels, we can populate the error matrix. We compare the ground truth for the test pixels with the classification results for the same test pixels on a pixel-by-pixel basis. The error matrix is evaluated using Descriptive statistics Discrete multivariate statistics

89 Descriptive Statistics
Overall accuracy is computed by dividing the number of correctly classified pixels by the total number of pixels in the error matrix. The accuracy of individual classes is more complicated. We can compute two accuracy measures…
Producer's accuracy (related to errors of omission)
User's accuracy (related to errors of commission)
All three tell you different things. All three should be reported.

90 Overall: 94% of the pixels were correctly classified.
Omission (producer's accuracy): 96% of the residential reference pixels were correctly identified as residential.
Commission (user's accuracy): only 80% of the pixels identified as residential are actually residential.

91 Discrete Multivariate Statistics
Appropriate since digital images are discrete (not continuous) and therefore are binomially distributed rather than actually normally distributed.

92 Discrete Multivariate Statistics
One thing to do is to normalize the error matrix by forcing each row and column to sum to 1. This eliminates differences in sample size and bias thereof. Normalization also effectively includes the omission and commission errors into the matrix values themselves, making the matrix better represent the accuracy. Finally, normalization allows comparison between results of two different classifications/accuracy assessment strategies.

93 Discrete Multivariate Statistics
KAPPA analysis – yields a discrete, multivariate statistic that is a measure of agreement or accuracy…. Unlike overall accuracy, KAPPA takes into account omission and commission errors.

95 K̂ = (N Σ_{i=1}^{r} x_ii − Σ_{i=1}^{r} x_{i+} x_{+i}) / (N² − Σ_{i=1}^{r} x_{i+} x_{+i})
where:
r = number of rows in the matrix
x_ii = number of observations in row i, column i (the diagonal of the error matrix)
x_{i+} and x_{+i} = marginal totals for row i and column i, respectively
N = total number of observations
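The descriptive statistics and KAPPA can be computed together from an error matrix. This sketch uses a made-up 2-class matrix, not Jensen's example; the function name is mine.

```python
import numpy as np

def error_matrix_stats(m):
    """Statistics for an error matrix whose rows are classification
    results and columns are reference data. Returns overall accuracy,
    per-class producer's accuracy (1 - omission error), per-class
    user's accuracy (1 - commission error), and the KAPPA statistic."""
    m = np.asarray(m, dtype=float)
    N = m.sum()
    diag = np.diag(m)
    row_tot = m.sum(axis=1)      # x_i+ : classified-as totals
    col_tot = m.sum(axis=0)      # x_+i : reference totals
    overall = diag.sum() / N
    producers = diag / col_tot   # correct / reference total per class
    users = diag / row_tot       # correct / classified total per class
    chance = (row_tot * col_tot).sum()
    kappa = (N * diag.sum() - chance) / (N ** 2 - chance)
    return overall, producers, users, kappa
```

For a hypothetical matrix [[45, 5], [5, 45]], overall accuracy is 0.9 and KAPPA is 0.8: KAPPA is lower because it discounts the agreement expected by chance.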

96 www.geo.utep.edu/pub/hurtado/5336

