www.Remote-Sensing.info Ch 4. Image Quality Assessment and Statistical Evaluation
Many remote sensing datasets contain high-quality, accurate data. Unfortunately, error (or noise) is sometimes introduced into the remote sensor data by: the environment (e.g., atmospheric scattering); random or systematic malfunction of the remote sensing system (e.g., an uncalibrated detector creates striping); or improper airborne or ground processing of the remote sensor data prior to actual data analysis (e.g., inaccurate analog-to-digital conversion).
Therefore, the person responsible for analyzing the digital remote sensor data should first assess its quality and statistical characteristics. This is normally accomplished by: looking at the frequency of occurrence of individual brightness values in the image, displayed in a histogram; viewing on a computer monitor individual pixel brightness values at specific locations or within a geographic area; computing univariate descriptive statistics to determine if there are unusual anomalies in the image data; and computing multivariate statistics to determine the amount of between-band correlation (e.g., to identify redundancy).
Remote Sensing Sampling Theory
A population is an infinite or finite set of elements. An infinite population could be all possible images that might be acquired of the Earth. All Landsat 7 ETM+ images of Charleston, S.C. in 2004 is a finite population. A sample is a subset of the elements taken from a population and used to make inferences about certain characteristics of the population. For example, we might decide to analyze a June 1, 2004, Landsat image of Charleston. If observations with certain characteristics are systematically excluded from the sample, either deliberately or inadvertently (such as by selecting images obtained only in the spring of the year), it is a biased sample. Sampling error is the difference between the true value of a population characteristic and the value of that characteristic inferred from a sample.
Large samples drawn randomly from natural populations usually produce a symmetrical frequency distribution. Most values are clustered around some central value, and the frequency of occurrence declines away from this central point. A graph of the distribution appears bell shaped and is called a normal distribution.
Many statistical tests used in the analysis of remotely sensed data assume that the brightness values recorded in a scene are normally distributed. Unfortunately, remotely sensed data may not be normally distributed, and the analyst must be careful to identify such conditions. In such instances, nonparametric statistical theory may be preferred.
Common Symmetric and Skewed Distributions in Remotely Sensed Data
The histogram is a useful graphic representation of the information content of a remotely sensed image. It is instructive to review how a histogram of a single band of imagery, $k$, composed of $i$ rows and $j$ columns with a brightness value $BV_{ijk}$ at each pixel location, is constructed.
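The construction can be sketched in a few lines of Python. This is a minimal illustration, not the procedure of any particular image processing package: the function name `band_histogram`, the 256-level assumption (8-bit data), and the small test band are all made up for the example.

```python
from collections import Counter


def band_histogram(band, levels=256):
    """Count how often each brightness value occurs in one band.

    `band` is a list of rows; each row is a list of integer brightness
    values. Returns a list in which index v holds the number of pixels
    whose brightness value equals v. Values outside 0..levels-1 are
    rejected rather than silently dropped.
    """
    counts = Counter(bv for row in band for bv in row)
    if any(bv < 0 or bv >= levels for bv in counts):
        raise ValueError("brightness value outside the expected range")
    return [counts.get(v, 0) for v in range(levels)]


# Illustrative 3 x 4 band (made-up values, not taken from the text).
band_k = [
    [10, 10, 12, 15],
    [10, 12, 12, 15],
    [11, 12, 15, 15],
]

hist = band_histogram(band_k)
# Print only the brightness values that actually occur.
print([(v, f) for v, f in enumerate(hist) if f > 0])
```

The frequencies always sum to the number of pixels (rows times columns), which is a quick sanity check on any histogram routine.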
Histogram of A Single Band of Landsat Thematic Mapper Data of Charleston, SC
Histogram of Thermal Infrared Imagery of a Thermal Plume in the Savannah River
Cursor and Raster Display of Brightness Values
Two- and Three-Dimensional Evaluation of Pixel Brightness Values within a Geographic Area
Univariate Descriptive Image Statistics
Measures of Central Tendency in Remote Sensor Data
The mode is the value that occurs most frequently in a distribution and is usually the highest point on the curve (histogram). It is common, however, to encounter more than one mode in a remote sensing dataset. The histograms of the Landsat TM image of Charleston, SC and the predawn thermal infrared image of the Savannah River have multiple modes. They are nonsymmetrical (skewed) distributions.
The median is the value midway in the frequency distribution. One-half of the area below the distribution curve is to the right of the median, and one-half is to the left.
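As a small illustration of these two measures, the sketch below uses Python's standard `statistics` module on a made-up list of brightness values (the values are hypothetical and chosen so that two modes occur).

```python
from statistics import median, multimode

# Illustrative brightness values for one band (made-up, not from the text).
bvs = [10, 10, 10, 11, 12, 12, 15, 15, 15, 40]

# multimode returns every value tied for the highest frequency,
# so a bimodal distribution yields two entries.
modes = multimode(bvs)

# With an even number of observations, median averages the two middle values.
med = median(bvs)

print("modes:", modes)
print("median:", med)
```

Reporting all modes rather than a single one matters for imagery, because a scene with two dominant cover types (for example water and land) often produces two distinct peaks.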
The mean is the arithmetic average and is defined as the sum of all brightness value observations divided by the number of observations. It is the most commonly used measure of central tendency. The mean ($\mu_k$) of a single band of imagery composed of $n$ brightness values ($BV_{ik}$) is computed using the formula:
$$\mu_k = \frac{\sum_{i=1}^{n} BV_{ik}}{n}$$
The sample mean, $\mu_k$, is an unbiased estimate of the population mean. For symmetrical distributions, the sample mean tends to be closer to the population mean than any other unbiased estimate (such as the median or mode).
Remote Sensing Univariate Statistics - Variance
Measures of Dispersion
Measures of the dispersion about the mean of a distribution provide valuable information about the image. For example, the range of a band of imagery ($range_k$) is computed as the difference between the maximum ($max_k$) and minimum ($min_k$) values; that is,
$$range_k = max_k - min_k$$
Unfortunately, when the minimum or maximum values are extreme or unusual observations (i.e., possibly data blunders), the range could be a misleading measure of dispersion. Such extreme values are not uncommon because remote sensor data are often collected by detector systems with delicate electronics that can experience spikes in voltage and other malfunctions. When unusual values are not encountered, the range is a very important statistic often used in image enhancement functions such as min–max contrast stretching.
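The sensitivity of the range to a single extreme value can be seen in a short sketch. The linear min–max stretch below is one common formulation, assumed here for illustration (output scaled to 0 to 255); the function names and the data are hypothetical.

```python
def band_range(bvs):
    """Range of a band: maximum minus minimum brightness value."""
    return max(bvs) - min(bvs)


def min_max_stretch(bvs, out_max=255):
    """Linearly rescale brightness values so min -> 0 and max -> out_max."""
    lo, hi = min(bvs), max(bvs)
    if hi == lo:
        # A constant band has no contrast to stretch.
        return [0 for _ in bvs]
    return [round((bv - lo) / (hi - lo) * out_max) for bv in bvs]


# Made-up values: a well-behaved band, then the same band with one
# spurious spike such as a voltage spike might produce.
clean = [40, 50, 60, 70, 80]
spiked = clean + [255]

print(band_range(clean), band_range(spiked))
print(min_max_stretch(clean))
print(min_max_stretch(spiked))
```

In the spiked case the legitimate values are squeezed into the dark end of the output, which is why the range should be inspected for outliers before it drives a contrast stretch.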
The variance of a sample is the average squared deviation of all possible observations from the sample mean. The variance of a band of imagery, $var_k$, is computed using the equation:
$$var_k = \frac{\sum_{i=1}^{n} (BV_{ik} - \mu_k)^2}{n}$$
The numerator of the expression is the corrected sum of squares (SS). If the sample mean ($\mu_k$) were actually the population mean, this would be an accurate measurement of the variance.
Unfortunately, there is some underestimation because the sample mean was calculated in a manner that minimized the squared deviations about it. Therefore, the denominator of the variance equation is reduced to $n - 1$, producing a larger, unbiased estimate (the sample variance):
$$var_k = \frac{\sum_{i=1}^{n} (BV_{ik} - \mu_k)^2}{n - 1}$$
The standard deviation is the positive square root of the variance. The standard deviation of the pixel brightness values in a band of imagery, $s_k$, is computed as
$$s_k = \sqrt{var_k}$$
Hypothetical Dataset of Brightness Values
Table columns: Pixel; Band 1 (green); Band 2 (red); Band 3 (near-infrared); Band 4 (near-infrared). Table rows: pixels (1,1), (1,2), (1,3), (1,4), (1,5).
Univariate Statistics for the Hypothetical Example Dataset (Jensen, 2004)
Table columns: Band 1 (green); Band 2 (red); Band 3 (near-infrared); Band 4 (near-infrared). Table rows: Mean ($\mu_k$); Variance ($var_k$); Standard deviation ($s_k$); Minimum ($min_k$); Maximum ($max_k$); Range ($BV_r$).
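The univariate measures defined above (mean, minimum, maximum, range, variance, and standard deviation) can be sketched for one band as follows. The function name `univariate_stats` and the five brightness values are assumptions made for the example; they are not the values of the hypothetical dataset in the table.

```python
import math


def univariate_stats(bvs):
    """Univariate statistics for one band of brightness values."""
    n = len(bvs)
    if n < 2:
        raise ValueError("need at least two observations")
    mean = sum(bvs) / n
    # Corrected sum of squares (SS): squared deviations about the mean.
    ss = sum((bv - mean) ** 2 for bv in bvs)
    var_unbiased = ss / (n - 1)
    return {
        "mean": mean,
        "min": min(bvs),
        "max": max(bvs),
        "range": max(bvs) - min(bvs),
        "var_n": ss / n,              # divides by n (underestimates)
        "var_n_minus_1": var_unbiased,  # divides by n - 1 (unbiased)
        "std": math.sqrt(var_unbiased),
    }


# Illustrative brightness values for a single band (made-up).
band = [120, 140, 100, 130, 160]
stats = univariate_stats(band)
for name, value in stats.items():
    print(f"{name}: {value:.2f}")
```

With only five observations the two variance estimates differ noticeably; for a full scene with millions of pixels the difference between dividing by n and by n - 1 becomes negligible.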
Measures of Distribution (Histogram) Asymmetry and Peak Sharpness
Skewness is a measure of the asymmetry of a histogram and is computed using the formula:
$$skewness_k = \frac{\sum_{i=1}^{n} \left( \frac{BV_{ik} - \mu_k}{s_k} \right)^3}{n}$$
A perfectly symmetric histogram has a skewness value of zero.
A histogram may be symmetric but have a peak that is very sharp or one that is subdued when compared with a perfectly normal distribution. A perfectly normal distribution (histogram) has zero kurtosis. The greater the positive kurtosis value, the sharper the peak in the distribution when compared with a normal histogram. Conversely, a negative kurtosis value suggests that the peak in the histogram is less sharp than that of a normal distribution.
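Both measures can be sketched with standardized deviations. The code below assumes the common moment-based definitions: skewness as the mean cubed standardized deviation, and kurtosis as the mean fourth power of the standardized deviation minus 3 (so that a normal distribution scores zero). Using the n - 1 standard deviation for standardizing is also an assumption; the data are made up.

```python
import math


def _standardized(bvs):
    """Deviations from the mean divided by the (n - 1) standard deviation."""
    n = len(bvs)
    mean = sum(bvs) / n
    s = math.sqrt(sum((bv - mean) ** 2 for bv in bvs) / (n - 1))
    if s == 0:
        raise ValueError("constant band: skewness and kurtosis undefined")
    return [(bv - mean) / s for bv in bvs]


def skewness(bvs):
    """Mean cubed standardized deviation; zero for a symmetric histogram."""
    z = _standardized(bvs)
    return sum(v ** 3 for v in z) / len(z)


def kurtosis(bvs):
    """Mean fourth-power standardized deviation minus 3 (excess kurtosis)."""
    z = _standardized(bvs)
    return sum(v ** 4 for v in z) / len(z) - 3


symmetric = [1, 2, 3, 4, 5]       # evenly spread, no tail
right_skewed = [1, 1, 1, 2, 10]   # long tail toward high values

print(skewness(symmetric), kurtosis(symmetric))
print(skewness(right_skewed), kurtosis(right_skewed))
```

A positive skewness indicates a tail toward high brightness values, and the flat, evenly spread sample yields a negative kurtosis, consistent with a peak less sharp than that of a normal distribution.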
Remote Sensing Multivariate Statistics
Remote sensing research is often concerned with the measurement of how much radiant flux is reflected or emitted from an object in more than one band (e.g., in red and near-infrared bands). It is useful to compute multivariate statistical measures such as covariance and correlation among the several bands to determine how the measurements covary. Later it will be shown that variance–covariance and correlation matrices are used in remote sensing principal components analysis (PCA), feature selection, classification, and accuracy assessment.
To calculate covariance, we first compute the corrected sum of products (SP) defined by the equation:
$$SP_{kl} = \sum_{i=1}^{n} (BV_{ik} - \mu_k)(BV_{il} - \mu_l)$$
Just as simple variance was calculated by dividing the corrected sum of squares (SS) by ($n - 1$), covariance is calculated by dividing SP by ($n - 1$). Therefore, the covariance between brightness values in bands $k$ and $l$, $cov_{kl}$, is equal to:
$$cov_{kl} = \frac{SP_{kl}}{n - 1}$$
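Format of a Variance-Covariance Matrix

|        | Band 1 (green) | Band 2 (red) | Band 3 (near-infrared) | Band 4 (near-infrared) |
|--------|----------------|--------------|------------------------|------------------------|
| Band 1 | $SS_1$         | $cov_{1,2}$  | $cov_{1,3}$            | $cov_{1,4}$            |
| Band 2 | $cov_{2,1}$    | $SS_2$       | $cov_{2,3}$            | $cov_{2,4}$            |
| Band 3 | $cov_{3,1}$    | $cov_{3,2}$  | $SS_3$                 | $cov_{3,4}$            |
| Band 4 | $cov_{4,1}$    | $cov_{4,2}$  | $cov_{4,3}$            | $SS_4$                 |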
Variance-Covariance Matrix of the Sample Data (Jensen, 2004)
Table rows and columns: Band 1 (green); Band 2 (red); Band 3 (near-infrared); Band 4 (near-infrared).
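A variance–covariance matrix of this form can be sketched by computing the corrected sum of products for every pair of bands and dividing by n - 1. The function names and the two short bands below are assumptions for illustration; here the diagonal holds each band's variance (its covariance with itself).

```python
def covariance(band_k, band_l):
    """Covariance of two bands: corrected sum of products over (n - 1)."""
    n = len(band_k)
    if n != len(band_l) or n < 2:
        raise ValueError("bands must have the same length (at least 2)")
    mean_k = sum(band_k) / n
    mean_l = sum(band_l) / n
    # Corrected sum of products (SP).
    sp = sum((a - mean_k) * (b - mean_l) for a, b in zip(band_k, band_l))
    return sp / (n - 1)


def covariance_matrix(bands):
    """Square matrix with entry [k][l] equal to the covariance of bands k and l."""
    return [[covariance(bk, bl) for bl in bands] for bk in bands]


# Two illustrative bands sampled at the same five pixels (made-up values).
band_a = [120, 140, 100, 130, 160]
band_b = [50, 42, 60, 45, 28]

matrix = covariance_matrix([band_a, band_b])
for row in matrix:
    print(row)
```

The matrix is symmetric because the product of deviations does not depend on the order of the two bands, so only the upper or lower triangle needs to be computed in practice.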
Correlation between Multiple Bands of Remotely Sensed Data
To estimate the degree of interrelation between variables in a manner not influenced by measurement units, the correlation coefficient, $r$, is commonly used. The correlation between two bands of remotely sensed data, $r_{kl}$, is the ratio of their covariance ($cov_{kl}$) to the product of their standard deviations ($s_k s_l$); thus:
$$r_{kl} = \frac{cov_{kl}}{s_k s_l}$$
If we square the correlation coefficient ($r_{kl}$), we obtain the sample coefficient of determination ($r^2$), which expresses the proportion of the total variation in the values of "band l" that can be accounted for or explained by a linear relationship with the values of the random variable "band k." Thus a correlation coefficient ($r_{kl}$) of 0.70 results in an $r^2$ value of 0.49, meaning that 49% of the total variation of the values of "band l" in the sample is accounted for by a linear relationship with values of "band k."
Correlation Matrix of the Sample Data
Table rows and columns: Band 1 (green); Band 2 (red); Band 3 (near-infrared); Band 4 (near-infrared).
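The correlation coefficient and the coefficient of determination can be sketched directly from the definitions above. The function name `correlation` and the two bands are hypothetical; both the covariance and the standard deviations use the n - 1 denominator, so that factor cancels in the ratio.

```python
import math


def correlation(band_k, band_l):
    """Correlation: covariance divided by the product of standard deviations."""
    n = len(band_k)
    if n != len(band_l) or n < 2:
        raise ValueError("bands must have the same length (at least 2)")
    mean_k = sum(band_k) / n
    mean_l = sum(band_l) / n
    cov = sum((a - mean_k) * (b - mean_l)
              for a, b in zip(band_k, band_l)) / (n - 1)
    s_k = math.sqrt(sum((a - mean_k) ** 2 for a in band_k) / (n - 1))
    s_l = math.sqrt(sum((b - mean_l) ** 2 for b in band_l) / (n - 1))
    if s_k == 0 or s_l == 0:
        raise ValueError("correlation undefined for a constant band")
    return cov / (s_k * s_l)


# Two illustrative bands sampled at the same five pixels (made-up values).
band_a = [120, 140, 100, 130, 160]
band_b = [50, 42, 60, 45, 28]

r = correlation(band_a, band_b)
r_squared = r ** 2
print(f"r = {r:.4f}, r^2 = {r_squared:.4f}")
```

A strongly negative r such as this one means the two bands carry largely redundant information (one falls as the other rises), which is the kind of between-band redundancy the correlation matrix is meant to expose.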
Univariate and Multivariate Statistics of Landsat TM Data of Charleston, SC
Table contents: for each band, the minimum, maximum, mean, and standard deviation; followed by the between-band covariance matrix and the between-band correlation matrix.
Feature Space Plots
The univariate and multivariate statistics discussed provide accurate, fundamental information about the individual band statistics, including how the bands covary and correlate. Sometimes, however, it is useful to examine statistical relationships graphically. Individual bands of remotely sensed data are often referred to as features in the pattern recognition literature. To truly appreciate how two bands (features) in a remote sensing dataset covary and whether they are correlated, it is often useful to produce a two-band feature space plot.
A two-dimensional feature space plot extracts the brightness value for every pixel in the scene in two bands and plots the frequency of occurrence in a 256 by 256 feature space (assuming 8-bit data, with values from 0 to 255). The greater the frequency of occurrence of unique pairs of values, the brighter the feature space pixel.
Two-dimensional Feature Space Plot of Landsat Thematic Mapper Band 3 and 4 Data of Charleston, SC obtained on November 11, 1982
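The counting step behind a feature space plot can be sketched as a two-dimensional histogram. This is a minimal illustration that assumes 8-bit data and bands supplied as flat lists of pixels in the same order; the function name `feature_space` and the pixel values are made up, and rendering the counts as image brightness is left out.

```python
def feature_space(band_x, band_y, levels=256):
    """Count how often each (x, y) brightness pair occurs.

    Returns a levels-by-levels grid in which grid[y][x] is the number of
    pixels with value x in the first band and value y in the second.
    """
    if len(band_x) != len(band_y):
        raise ValueError("bands must contain the same number of pixels")
    grid = [[0] * levels for _ in range(levels)]
    for x, y in zip(band_x, band_y):
        if not (0 <= x < levels and 0 <= y < levels):
            raise ValueError("brightness value outside the expected range")
        grid[y][x] += 1
    return grid


# Illustrative pixel values for two bands (made-up, same pixel order).
band_3 = [30, 30, 45, 60]
band_4 = [40, 40, 80, 20]

grid = feature_space(band_3, band_4)
print(grid[40][30])  # frequency of the pair (band 3 = 30, band 4 = 40)
```

To display the plot, each cell's count would be mapped to a display brightness, so that frequently occurring pairs appear brighter.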