A methodology for translating positional error into measures of attribute error, and combining the two error sources
Yohay Carmel (1), Curtis Flather (2) and Denis Dean (3)
(1) The Technion, Haifa, Israel; (2) USDA Forest Service; (3) Colorado State University
Part 1: bridging the gap between positional error and classification error
Classification error is the difference in pixel class between the map and a reference
Positional error (misregistration, location error) is the gap between the true location of an item and its location on the map or image
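Positional error is commonly summarized as a root mean square error (RMSE) over control points. The following is a minimal sketch, assuming pairs of true and mapped coordinates in metres; the function name `positional_rmse` and the sample coordinates are illustrative and not taken from the study.

```python
import math

def positional_rmse(true_points, mapped_points):
    """RMSE of the Euclidean distances between true and mapped locations."""
    squared = [
        (xt - xm) ** 2 + (yt - ym) ** 2
        for (xt, yt), (xm, ym) in zip(true_points, mapped_points)
    ]
    return math.sqrt(sum(squared) / len(squared))

# Hypothetical control points (metres): one displaced by (3, 4), one exact.
true_pts = [(0.0, 0.0), (10.0, 10.0)]
mapped_pts = [(3.0, 4.0), (10.0, 10.0)]
rmse = positional_rmse(true_pts, mapped_pts)  # sqrt((25 + 0) / 2)
```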
Positional error may translate to thematic error
Positional error strongly affects overall thematic error, often more than classification error does (Townshend et al. 1992; Dai and Khorram 1998)
Positional error: RMSE = 2.51 m
Classification error: accuracy matrix [matrix not preserved]
Goal 1: find a common denominator for both error types
Goal 2: combine the two error types to get an overall estimate of error (in the context of temporal change): the probability that an observed transition is correct
positional error affects thematic error
Expressing positional error in terms of thematic error
[Figure panels: Shift = 1, 0; Shift = 15, 7; Shift = 2, 3]
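One way to express a shift as thematic error is to displace the map by that shift and cross-tabulate the displaced copy against the original. The sketch below is an illustration under stated assumptions, not the authors' code: the function name `positional_error_matrix`, the toy map, and the orientation (rows are the class at the shifted location, columns are the class at the true location) are all assumed.

```python
import numpy as np

def positional_error_matrix(class_map, dx, dy, n_classes):
    """Cross-tabulate a map against a copy of itself shifted by (dx, dy) pixels.

    Rows: class found at the shifted location; columns: class at the true
    location. Only the region where both copies overlap is counted.
    Assumes 0 <= dx < n_cols and 0 <= dy < n_rows.
    """
    rows, cols = class_map.shape
    original = class_map[: rows - dy, : cols - dx]
    shifted = class_map[dy:, dx:]
    matrix = np.zeros((n_classes, n_classes), dtype=int)
    # Unbuffered accumulation so repeated (row, column) pairs are all counted.
    np.add.at(matrix, (shifted.ravel(), original.ravel()), 1)
    return matrix

# Toy two-class map with a single vertical boundary (hypothetical data).
toy_map = np.array([[0, 0, 1, 1],
                    [0, 0, 1, 1],
                    [0, 0, 1, 1]])
a_loc = positional_error_matrix(toy_map, dx=1, dy=0, n_classes=2)
unaffected = np.trace(a_loc) / a_loc.sum()  # share of pixels whose class is unchanged
```

In this toy case only the pixels next to the class boundary change class, which is why the effect of a shift depends on how fragmented the map is.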
Error model: steps 1 and 2 (of 5)
[Diagram labels: RMSE; Positional Error; Classification Error]
Error model: step 3 (of 5). Combining the two accuracy matrices
Positional accuracy matrix: A^LOC; classification accuracy matrix: A^CLASS; combined matrix: A^BOTH
$$A^{BOTH}_{2,1} = \frac{A^{LOC}_{1,1}\,A^{CLASS}_{2,1}}{n_{+1}} + \frac{A^{LOC}_{2,1}\,A^{CLASS}_{2,2}}{n_{+2}} + \frac{A^{LOC}_{3,1}\,A^{CLASS}_{2,3}}{n_{+3}}$$
[Matrix axis label: Classified]
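The step 3 example for cell (2, 1) follows the pattern A_BOTH[i, j] = sum over k of A_LOC[k, j] * A_CLASS[i, k] / n_{+k}. A minimal sketch of that pattern for the whole matrix is given here, assuming that n_{+k} is the total of column k of A_CLASS; the slide does not define it, so this is an assumption, as are the function name `combine_error_matrices` and the example counts.

```python
import numpy as np

def combine_error_matrices(a_loc, a_class):
    """Combine positional and classification accuracy matrices.

    Implements A_BOTH[i, j] = sum_k A_LOC[k, j] * A_CLASS[i, k] / n_{+k},
    with n_{+k} taken (by assumption) as the column-k total of A_CLASS.
    """
    a_loc = np.asarray(a_loc, dtype=float)
    a_class = np.asarray(a_class, dtype=float)
    col_totals = a_class.sum(axis=0)          # n_{+k}
    # Broadcasting divides column k of A_CLASS by n_{+k}.
    return (a_class / col_totals) @ a_loc

# Hypothetical two-class matrices (pixel counts).
a_loc = [[80, 10],
         [20, 90]]
a_class = [[90, 20],
           [10, 80]]
a_both = combine_error_matrices(a_loc, a_class)
```

Under this reading each column of A_CLASS is converted to conditional proportions, so the combined matrix keeps the same pixel total as A_LOC.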
Error model: step 3. Combined error matrix
[Matrix figure with axes labelled Classified and Reference; values not preserved]
Error model: step 4. Calculating the combined PCC and the combined user accuracy, p(C)
p(C) is the probability that an observed state is correct
[Combined matrix with axes labelled Classified and Reference: PCC = 0.70]
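Step 4 can be sketched with the standard definitions: PCC is the diagonal total divided by the grand total, and user accuracy is each diagonal entry divided by its classified-class total. The orientation (rows are classified, columns are reference), the function names, and the example matrix are assumptions made for illustration.

```python
import numpy as np

def pcc(matrix):
    """Proportion correctly classified: diagonal sum over grand total."""
    matrix = np.asarray(matrix, dtype=float)
    return np.trace(matrix) / matrix.sum()

def user_accuracy(matrix):
    """Per-class user accuracy, assuming rows hold the classified classes."""
    matrix = np.asarray(matrix, dtype=float)
    return np.diag(matrix) / matrix.sum(axis=1)

# Hypothetical combined error matrix (pixel counts).
combined = [[76, 27],
            [24, 73]]
overall = pcc(combined)            # 149 / 200
p_c = user_accuracy(combined)      # [76 / 103, 73 / 97]
```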
Error model: step 5. Calculating multi-temporal indices
The context of this model is temporal change, and the goal is to provide indices for the reliability of an observed change. One such index is the probability that an observed transition is correct.
Example: vegetation changes in Hastings Nature Reserve, California
Example: vegetation changes in Hastings, California
Positional accuracy (RMSE): 1939 = 3.53 m; 1995 = 2.51 m
Classification accuracy (user accuracy): grass in 1939 = 0.92; trees in 1995 = 0.91
[Timeline labels: C_1, C_4]
Error model: step 5
The probability that an observed transition is correct:
$$p(C_1 C_2 \ldots C_n) = p(C_1) \cdot p(C_2) \cdots p(C_n)$$
This probability may be calculated as the product of the user-accuracy values for the respective years and classes.
[Timeline labels: C_1, C_2, C_3, C_4]
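The product rule can be illustrated with the two classification user accuracies quoted for Hastings (grass in 1939 = 0.92, trees in 1995 = 0.91). This is a sketch only: it uses classification accuracy alone for two dates, not the combined accuracies or the four dates of the full example, and the function name `transition_probability` is hypothetical.

```python
from math import prod

def transition_probability(user_accuracies):
    """Probability that an observed transition is correct.

    Assumes errors at different time steps are independent, so the joint
    probability is the product of the per-date user accuracies.
    """
    return prod(user_accuracies)

# Classification-only user accuracies for the two dates quoted in the slides.
p_grass_to_tree = transition_probability([0.92, 0.91])  # approximately 0.837
```

Because every factor is at most 1, the probability can only fall as more time steps are added to a transition.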
Example: vegetation changes in Hastings, California
$$p(\mathrm{GRASS}_{1939}\,\mathrm{TREE}_{1995}) = p(G_{1939}\,G_{1956}\,G_{1971}\,T_{1995})$$
[Slide labels: GT; GGGT; value shown: 0.53]
Indices of accuracy of multi-temporal datasets
Table columns: Transition type | Nature of transition | Proportion in 1939 | Probability of being correct, given: positional error | classification error | combined error
Table rows (values not preserved):
GGGG | Grassland does not change
CGGG | Chaparral burnt in 1955 fire
All 69 transitions involving forest
Averaged across the entire study area
A simulation study was conducted to evaluate how robust the model is in general and, in particular, to violations of two assumptions:
(1) errors in each time step are independent of errors in other time steps
(2) positional and classification errors are independent of one another
Maps for simulations (Int. J. Rem. Sens. 2004)
[Figure panels: high autocorrelation, equal class proportions; low autocorrelation, unequal class proportions]
Simulation
[Figure panels: Original map; Spatial error; Classification error; Both error types]
Model simulations were conducted under a range of values for: number of map categories (2-4); class proportions in the original map; autocorrelation in the original map; autocorrelation in classification error; classification error rate; positional error rate; correlation in error structure between time steps; correlation between the two error types
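A heavily simplified sketch of one simulation run is shown here to make the setup concrete. It is not the authors' simulation: the map has no spatial autocorrelation, classes are equally likely, the two error types are applied independently, the shift wraps around the map edges, and all function names and parameter values are assumptions.

```python
import numpy as np

def add_classification_error(class_map, error_rate, n_classes, rng):
    """Reassign a random fraction of pixels to a different class."""
    noisy = class_map.copy()
    flip = rng.random(class_map.shape) < error_rate
    # An offset in 1..n_classes-1 (mod n_classes) always yields a new class.
    offsets = rng.integers(1, n_classes, size=class_map.shape)
    noisy[flip] = (class_map[flip] + offsets[flip]) % n_classes
    return noisy

def add_positional_error(class_map, dx, dy):
    """Shift the map by (dx, dy) pixels, wrapping at the edges for simplicity."""
    return np.roll(class_map, shift=(dy, dx), axis=(0, 1))

rng = np.random.default_rng(0)
n_classes = 3
original = rng.integers(0, n_classes, size=(200, 200))

spatial = add_positional_error(original, dx=1, dy=0)
classified = add_classification_error(original, 0.10, n_classes, rng)
both = add_classification_error(spatial, 0.10, n_classes, rng)

# Proportion of pixels whose class differs from the original map.
spatial_err = np.mean(spatial != original)
class_err = np.mean(classified != original)
both_err = np.mean(both != original)
```

With an uncorrelated map a one-pixel shift changes most pixels; an autocorrelated map, as in the slides, would be affected far less, which is one reason autocorrelation is among the varied parameters.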
Some results of simulation runs
When model assumptions are not met, model fit decreases
[Plot annotation: maximum correlation found in real datasets (IEEE GRSL 2004)]
PART 2: Controlling data uncertainty via aggregation
Pixel size = 0.3 m; grid cell size = 15 m (50 × 50 = 2500 pixels)
Map aggregation = image degradation
Overlay a grid of cells on the image (cell >> pixel) and define the larger cell as the basic unit
Two variants: ‘soft’ aggregation and ‘hard’ aggregation
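The two aggregation variants can be sketched as follows, assuming that 'soft' aggregation keeps the proportion of each class within a cell and 'hard' aggregation assigns each cell its majority class. The slides do not define the terms, so both readings are assumptions, as are the function names and the toy map; the map dimensions must be multiples of the cell size.

```python
import numpy as np

def soft_aggregate(class_map, cell, n_classes):
    """Per-cell class proportions, shape (rows // cell, cols // cell, n_classes)."""
    rows, cols = class_map.shape
    # Split the map into non-overlapping cell x cell blocks.
    blocks = class_map.reshape(rows // cell, cell, cols // cell, cell)
    return np.stack(
        [(blocks == k).mean(axis=(1, 3)) for k in range(n_classes)], axis=-1
    )

def hard_aggregate(class_map, cell, n_classes):
    """Majority class per cell (ties go to the lowest class index)."""
    return soft_aggregate(class_map, cell, n_classes).argmax(axis=-1)

# Toy 4 x 4 two-class map aggregated into 2 x 2 cells (hypothetical data).
toy_map = np.array([[0, 0, 1, 1],
                    [0, 1, 1, 1],
                    [1, 1, 0, 0],
                    [1, 1, 0, 1]])
soft = soft_aggregate(toy_map, cell=2, n_classes=2)
hard = hard_aggregate(toy_map, cell=2, n_classes=2)
```

With the sizes given in the slides, `cell` would be 15 m / 0.3 m = 50 pixels per side, that is 2500 pixels per grid cell.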
Impact of positional error is largely reduced when cells are aggregated
[Figure panels a, b, c]
Impact of positional error is largely reduced when cells are aggregated
At the pixel level: only 55% of the pixels remained unaffected by a minor shift
At the grid cell level: [figure not preserved]
Aggregation: gain in accuracy BUT loss of information
This trade-off calls for a model that quantifies the process to aid decisions on the optimal level of aggregation
A geometric approach to the impact of positional error
Effective positional error at the grid cell level is the proportion of pixels that transgress into neighboring cells (RMSE units)
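For a rigid shift of a square cell, the share of its area that crosses into neighbouring cells follows from simple geometry: 1 - (s - |dx|)(s - |dy|) / s^2 for a cell of side s. The sketch below computes that share; it is a geometric illustration only and may differ from the authors' formulation, which is expressed in RMSE units. The function name and example values are assumptions.

```python
def transgressing_fraction(cell_size, dx, dy):
    """Share of a square cell's area moved into neighbouring cells by a shift.

    cell_size, dx and dy are in the same length units. A shift at least as
    large as the cell in either direction moves the whole cell out.
    """
    stay_x = max(cell_size - abs(dx), 0.0)
    stay_y = max(cell_size - abs(dy), 0.0)
    return 1.0 - (stay_x * stay_y) / cell_size ** 2

# Hypothetical shift of 1.5 m along x for a 15 m grid cell.
fraction = transgressing_fraction(15.0, 1.5, 0.0)  # 1 - (13.5 * 15) / 225
```

The fraction shrinks as the cell grows relative to the shift, which is the geometric basis for reducing positional error through aggregation.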
Positional error at the GRID CELL level
p(loc) is the probability that positional error translates into attribute error
p_A(loc) is the same probability in the context of a larger grid cell
The impact of aggregation on thematic accuracy
[Figure or table residue: labels p_A(loc), cell size, error; surviving values: 0.23, 0.6 m, 0.01]
Conclusions
Positional error has a large impact on thematic accuracy, particularly in the context of change, but it can be easily mitigated: increase the MMU to > 10 × [positional error] and do not worry about it.
Within overall thematic error at the pixel level, the classification error component is typically smaller than the positional error component, but it is more difficult to remove by aggregation.
TODA THANK YOU