Unsupervised Classification

Unsupervised Classification (Geography, KHU, Jinmu Choi). Topics: the Chain clustering method and the ISODATA method.

Unsupervised Classification Unsupervised classification (commonly referred to as clustering) is an effective method of partitioning remote sensor image data in multispectral feature space and extracting land-cover information. Compared to supervised classification, unsupervised classification normally requires only a minimal amount of initial input from the analyst. This is because clustering does not normally require training data.

Unsupervised Classification Unsupervised classification is the process where numerical operations are performed that search for natural groupings of the spectral properties of pixels, as examined in multispectral feature space. The clustering process results in a classification map consisting of m spectral classes. The analyst then attempts a posteriori (after the fact) to assign or transform the spectral classes into thematic information classes of interest (e.g., forest, agriculture).

Unsupervised Classification Hundreds of clustering algorithms have been developed. Two examples of conceptually simple but not necessarily efficient clustering algorithms will be used to demonstrate the fundamental logic of unsupervised classification of remote sensor data: clustering using the Chain Method, and clustering using the Iterative Self-Organizing Data Analysis Technique (ISODATA).

Clustering Using the Chain Method The Chain Method clustering algorithm operates in a two-pass mode (i.e., it passes through the multispectral dataset two times). Pass #1: The program reads through the dataset and sequentially builds clusters (groups of points in spectral space). A mean vector is then associated with each cluster. Pass #2: A minimum distance to means classification algorithm is applied to the whole dataset on a pixel-by-pixel basis, whereby each pixel is assigned to one of the mean vectors created in Pass #1. The first pass, therefore, automatically creates the cluster signatures (class mean vectors) to be used by the minimum distance to means classifier.

Clustering Using the Chain Method Phase 1: Cluster Building. The analyst specifies four parameters:
R, a radius distance in spectral space used to determine when a new cluster should be formed (e.g., when raw remote sensor data are used, it might be set at 15 brightness value units).
C, a spectral space distance parameter used when merging clusters (e.g., 30 units) when N is reached.
N, the number of pixels to be evaluated between each major merging of the clusters (e.g., 2,000 pixels).
Cmax, the maximum number of clusters to be identified by the clustering algorithm (e.g., 20 clusters).
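
To make the role of these parameters concrete, the following is a minimal Python sketch of the Pass 1 cluster-building logic, assuming the pixels are supplied as a NumPy array read line by line. The function name, the size-weighted merging step, and the handling of pixels once Cmax clusters exist are illustrative assumptions, not Jensen's exact procedure.

```python
import numpy as np

def chain_pass1(pixels, R=15.0, C=30.0, N=2000, Cmax=20):
    """Pass 1 of the chain method: build cluster mean vectors by scanning
    pixels sequentially. `pixels` is a (num_pixels, num_bands) array read
    line by line from the image."""
    means, counts = [], []   # cluster mean vectors and their pixel counts

    def merge_close_clusters():
        # Merge any pair of cluster means closer than C in spectral space,
        # weighting the combined mean by cluster size.
        i = 0
        while i < len(means):
            j = i + 1
            while j < len(means):
                if np.linalg.norm(means[i] - means[j]) < C:
                    total = counts[i] + counts[j]
                    means[i] = (means[i] * counts[i] + means[j] * counts[j]) / total
                    counts[i] = total
                    del means[j], counts[j]
                else:
                    j += 1
            i += 1

    for k, px in enumerate(pixels, start=1):
        px = np.asarray(px, dtype=float)
        if not means:
            means.append(px.copy())
            counts.append(1)
        else:
            dists = [np.linalg.norm(px - m) for m in means]
            nearest = int(np.argmin(dists))
            if dists[nearest] <= R:
                # Pixel lies within R of an existing cluster: update its mean.
                counts[nearest] += 1
                means[nearest] = means[nearest] + (px - means[nearest]) / counts[nearest]
            elif len(means) < Cmax:
                # Pixel is farther than R from every cluster: start a new one.
                means.append(px.copy())
                counts.append(1)
            # If Cmax clusters already exist, the pixel is simply left for
            # Pass 2 assignment in this simplified sketch.
        if k % N == 0:
            merge_close_clusters()   # major merging of clusters every N pixels

    return np.array(means)
```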

(source: Jensen, 2011)

Clustering Using the Chain Method Phase 2: Assignment of pixels to one of the Cmax clusters using minimum distance to means classification logic Starting at the origin of the multispectral dataset (i.e., line 1, column 1), pixels are evaluated sequentially from left to right as if in a chain. After one line is processed, the next line of data is considered. We will analyze the clustering of only the first three pixels in a hypothetical image and label them pixels 1, 2, and 3.

(source: Jensen, 2011)

Pass 2: Assignment of Pixels to One of the Cmax Clusters Using Minimum Distance Classification Logic The final cluster mean data vectors are used in a minimum distance to means classification algorithm to classify all the pixels in the image into one of the Cmax clusters. The analyst usually produces a co-spectral plot display to document where the clusters reside in 3-dimensional feature space. It is then necessary to evaluate the location of the clusters in the image, label them if possible, and see if any clusters should be combined. It is usually necessary to combine some clusters. This is where an intimate knowledge of the terrain is critical.
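
A minimal sketch of the minimum distance to means assignment used in Pass 2, assuming the image is held in a NumPy array; the function and variable names are illustrative.

```python
import numpy as np

def min_distance_classify(image, cluster_means):
    """Assign every pixel to the nearest cluster mean (Euclidean distance).

    image: (rows, cols, bands) array; cluster_means: (k, bands) array.
    Returns a (rows, cols) array of cluster labels 0 .. k-1."""
    rows, cols, bands = image.shape
    flat = image.reshape(-1, bands).astype(float)
    means = np.asarray(cluster_means, dtype=float)
    # Distance from every pixel to every cluster mean: shape (num_pixels, k).
    dists = np.linalg.norm(flat[:, None, :] - means[None, :, :], axis=2)
    return np.argmin(dists, axis=1).reshape(rows, cols)

# Example with random data standing in for a 3-band image:
# image = np.random.randint(0, 255, size=(100, 100, 3))
# cluster_map = min_distance_classify(image, chain_pass1(image.reshape(-1, 3)))
```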

(source: Jensen, 2011)

Results of Clustering on Thematic Mapper Bands 2, 3, and 4 of the Charleston, SC, Landsat TM scene. (source: Jensen, 2011)

(source: Jensen, 2011) Grouping (labeling) of the original 20 spectral clusters into information classes. The labeling was performed by analyzing the mean vector locations in bands 3 and 4.

ISODATA Clustering The Iterative Self-Organizing Data Analysis Technique (ISODATA) represents a comprehensive set of heuristic (rules of thumb) procedures that have been incorporated into an iterative classification algorithm. Many of the steps incorporated into the algorithm are a result of experience gained through experimentation. The ISODATA algorithm is a modification of the k-means clustering algorithm, which includes a) merging clusters if their separation distance in multispectral feature space is below a user-specified threshold, and b) rules for splitting a single cluster into two clusters.

ISODATA Clustering ISODATA is iterative because it makes a large number of passes through the remote sensing dataset until specified results are obtained, instead of just two passes. ISODATA does not allocate its initial mean vectors based on the analysis of pixels in the first line of data the way the two-pass chain algorithm does. Rather, an initial arbitrary assignment of all Cmax clusters takes place along an n-dimensional vector that runs between very specific points in feature space. The region in feature space is defined using the mean, µk, and standard deviation, σk, of each band in the analysis. This method of automatically seeding the original Cmax vectors ensures that the first few lines of data do not bias the creation of clusters.
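
The seeding idea can be sketched as follows, assuming a NumPy image array. Placing the Cmax seeds evenly along the line from (µ − σ) to (µ + σ) is one common reading of this description; the exact spacing used by a given software package may differ.

```python
import numpy as np

def isodata_seed_means(image, c_max=20):
    """Place c_max initial mean vectors evenly along the n-dimensional line
    from (mean - std) to (mean + std) of the bands, so the seeds do not
    depend on the order in which pixels are read."""
    flat = image.reshape(-1, image.shape[-1]).astype(float)
    mu = flat.mean(axis=0)      # per-band mean
    sigma = flat.std(axis=0)    # per-band standard deviation
    t = np.linspace(0.0, 1.0, c_max)[:, None]   # fractions along the line
    return (mu - sigma) + t * (2.0 * sigma)     # shape (c_max, bands)
```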

ISODATA Clustering ISODATA is self-organizing because it requires relatively little human input. Typical ISODATA algorithms normally require the analyst to specify the following criteria:
Cmax: the maximum number of clusters to be identified by the algorithm (e.g., 20 clusters).
T: the maximum percentage of pixels whose class values are allowed to be unchanged between iterations. When this percentage is reached, the ISODATA algorithm terminates.
M: the maximum number of times ISODATA is to classify pixels and recalculate cluster mean vectors. The algorithm terminates when this number is reached.
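
A minimal sketch of how the two stopping criteria might be checked after each iteration, assuming T is expressed as a percentage of unchanged pixels (e.g., 95); the function name and default values are assumptions.

```python
import numpy as np

def isodata_should_stop(prev_labels, labels, iteration, T=95.0, M=20):
    """Terminate when the percentage of pixels whose cluster label is
    unchanged between iterations reaches T, or after M iterations."""
    unchanged = 100.0 * np.mean(prev_labels == labels)
    return unchanged >= T or iteration >= M
```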

ISODATA Clustering Minimum members in a cluster (%): if a cluster contains less than the minimum percentage of members, it is deleted and its members are assigned to an alternative cluster. This also affects whether a class is going to be split (see maximum standard deviation). The default minimum percentage of members is often set to 0.01. Maximum standard deviation (σmax): when the standard deviation for a cluster exceeds the specified maximum standard deviation and the number of members in the class is greater than twice the specified minimum members in a class, the cluster is split into two clusters. The mean vectors for the two new clusters are the old class centers ±1σ. Maximum standard deviation values between 4.5 and 7 are typical.
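
A small sketch of the splitting rule, assuming `members` is a NumPy array of the pixels currently assigned to the cluster; an absolute member count stands in for the percentage described above, and the per-band test and default values are illustrative.

```python
import numpy as np

def maybe_split(members, mean, sigma_max=5.0, min_members=50):
    """Split a cluster whose per-band standard deviation exceeds sigma_max,
    provided it holds more than twice the minimum number of members.
    The two new mean vectors are the old class center plus/minus one
    standard deviation."""
    std = members.std(axis=0)
    if std.max() > sigma_max and len(members) > 2 * min_members:
        return [mean + std, mean - std]   # the cluster becomes two clusters
    return [mean]                         # the cluster is left unchanged
```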

ISODATA Clustering Split separation value: if this value is changed from 0.0, it replaces the standard deviation when locating the two new mean vectors; the new means are the old class center plus and minus the split separation value. Minimum distance between cluster means (C): clusters with a weighted distance less than this value are merged. A default of 3.0 is often used.
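
A small sketch of the merging rule. Here `means` is assumed to be a Python list of NumPy mean vectors and `counts` a list of cluster sizes, plain Euclidean distance stands in for the weighted distance mentioned above, and merging only the single closest pair per call is an assumption; the caller repeats until no pair is closer than C.

```python
import numpy as np

def merge_closest_pair(means, counts, C=3.0):
    """Find the two closest cluster mean vectors; if they are separated by
    less than C, merge them in place (weighting by cluster size) and return
    True. Repeat until this returns False."""
    best_d, best_i, best_j = np.inf, -1, -1
    for i in range(len(means)):
        for j in range(i + 1, len(means)):
            d = np.linalg.norm(means[i] - means[j])
            if d < best_d:
                best_d, best_i, best_j = d, i, j
    if best_d >= C:
        return False
    total = counts[best_i] + counts[best_j]
    means[best_i] = (means[best_i] * counts[best_i] +
                     means[best_j] * counts[best_j]) / total
    counts[best_i] = total
    del means[best_j], counts[best_j]
    return True
```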

ISODATA Clustering: a) Initial distribution of five (5) hypothetical ISODATA mean vectors. b) In the first iteration, each candidate pixel is assigned to the cluster whose mean vector is closest in Euclidean distance. c) During the second iteration, a new mean is calculated for each cluster; every pixel in the scene is then assigned to one of the new clusters. d) This split–merge–assign process continues until there is little change in class assignment between iterations (the T threshold is reached) or the maximum number of iterations (M) is reached. (source: Jensen, 2011)

a) Distribution of 20 ISODATA mean vectors after just one iteration using Landsat TM band 3 and 4 data of Charleston, SC. b) Distribution of 20 ISODATA mean vectors after 20 iterations. The bulk of the important feature space (the gray background) is partitioned rather well after just 20 iterations. (source: Jensen, 2011)

Unsupervised Cluster Busting After running the chain algorithm or ISODATA, the analyst may have no confidence in assigning some number (q) of the resulting clusters to an appropriate information class. The pixels belonging to these confused clusters can be isolated and clustered again in one or more cluster-busting passes. In this hypothetical example, the final cluster map would be composed of: 70 good clusters from the initial classification, 15 good clusters from the first cluster-busting pass (recoded as values 71 to 85), and 5 good clusters from the second cluster-busting pass (recoded as values 86 to 90). The final cluster map file may be put together using a simple GIS maximum dominate function. The final cluster map is then recoded to create the final classification map.
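
A minimal illustration of how the three cluster maps in this hypothetical example might be recoded and combined; the array names, the offsets, and the simple NumPy overlay used in place of a GIS maximum dominate function are assumptions.

```python
import numpy as np

def combine_cluster_maps(initial, bust1, bust2, good_initial):
    """Overlay the good clusters from the initial classification (e.g.,
    values 1-70) with the recoded results of two cluster-busting passes
    (71-85 and 86-90). bust1 and bust2 hold labels 1-15 and 1-5
    respectively, and 0 where a pixel was not re-clustered."""
    final = np.where(np.isin(initial, good_initial), initial, 0)
    final = np.where((final == 0) & (bust1 > 0), bust1 + 70, final)
    final = np.where((final == 0) & (bust2 > 0), bust2 + 85, final)
    return final
```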

Next Lab Exercise: Unsupervised Classification. Next Lecture: Accuracy Assessment. Source: Jensen, 2011, Introductory Digital Image Processing, 4th ed., Prentice Hall.