URBDP 422 URBAN AND REGIONAL GEO-SPATIAL ANALYSIS Lecture 16: Exploring Complexity, Uncertainty, and Error in the Urban Landscape Exercise 10: Accuracy Assessment March 6, 2014
Accuracy and Precision Accuracy is the degree to which information on a map or in a digital database matches true or accepted values. Accuracy is an issue pertaining to the quality of data and the number of errors contained in a data set or map. Precision refers to the level of measurement and exactness of description in a GIS database. Precise location data may measure position to a fraction of a unit. Precise attribute information may specify the characteristics of features in great detail.
Accuracy and Precision Accuracy: closeness of results, computations, or estimates to true values; difficult to assess in a digital environment; accuracy of data may not relate to accuracy of analysis using the data. Precision: number of decimal places or significant digits in a measurement; not the same as accuracy; GIS operates at high precision. URBDP 422 Urban and Regional Spatial Analysis - Alberti
Accuracy and Precision
Types of Error Positional accuracy Attribute accuracy Conceptual accuracy Logical accuracy
Positional Accuracy Closeness of locational information to its true position; uncertainty; resolution. Maps are roughly accurate to one line width (0.5 mm).
Positional Accuracy and Precision Applies to both horizontal and vertical positions - x, y, z Function of the scale at which spatial database was created The mapping standards employed by the United States Geological Survey specify that: "requirements for meeting horizontal accuracy as 90% of all measurable points must be within 1/30th of an inch for maps at a scale of 1:20,000 or larger, and 1/50th of an inch for maps at scales smaller than 1:20,000"
r = 3.3 Foote and Huebner 1995
Accuracy standards 1:1200 +/- 3.33 feet 1:2400 +/- 6.67 feet Means a point, line, etc. is in a “probable” location
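The tolerances above follow from multiplying the allowed error on the map sheet by the scale denominator. A minimal Python sketch, assuming the 1/30-inch and 1/50-inch thresholds as quoted from the USGS standard in this lecture; the function name is hypothetical:

```python
def ground_tolerance_feet(scale_denominator: int) -> float:
    """Ground distance (feet) that matches the map accuracy threshold.

    Uses the thresholds as quoted in the lecture: 1/30 inch for scales
    of 1:20,000 or larger, 1/50 inch for smaller scales.
    """
    # A larger scale has a smaller denominator (1:1,200 is larger than 1:24,000).
    map_inches = 1 / 30 if scale_denominator <= 20_000 else 1 / 50
    ground_inches = map_inches * scale_denominator
    return ground_inches / 12  # 12 inches per foot


for denominator in (1_200, 2_400, 24_000):
    print(f"1:{denominator}  +/- {ground_tolerance_feet(denominator):.2f} feet")
```

For 1:1200 and 1:2400 this reproduces the 3.33 and 6.67 feet listed above.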
Spatial objects are in "probable" locations within a certain area. Foote and Huebner 1995
False Accuracy and False Precision Result from interpreting spatial information beyond the levels of accuracy and precision in which they were created
Attribute Accuracy and Precision The non-spatial data linked to location may be inaccurate or imprecise. Attribute inaccuracy may result from inaccuracy of data or mistakes in entering the data. The non-spatial data itself can also vary greatly in precision. For example, a precise description of a person living at a particular address might include gender, age, income, occupation, level of education, and many other characteristics.
Attribute Accuracy Closeness of attribute values to their true values. Location may not change with time, but attributes often do. Assessing attribute accuracy may vary depending upon nature of data: continuous attributes (surfaces) versus categorical attributes (objects).
Conceptual Accuracy and Precision Inaccuracies and imprecision may be inherent in the conceptual design of the database. Users may use inappropriate categories or misclassify information. For example, classifying cities by voting behavior would probably be an ineffective way to study atmospheric pollution.
Logical Accuracy and Precision Information stored in a database can be employed illogically. An example could include performing mathematical operations on categorical data. GIS systems are typically unable to warn the user if inappropriate comparisons are being made or if data are being used incorrectly
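As an illustration of arithmetic on categorical data, the sketch below averages land-cover codes. The codes and cell values are hypothetical; the point is only that the software computes the mean without complaint, even though the result names a category that never occurs in the data.

```python
from statistics import mean, mode

# Hypothetical land-cover codes; the numbers are labels, not quantities.
LAND_COVER = {1: "water", 2: "forest", 5: "urban"}

cells = [1, 1, 1, 5]  # three water cells and one urban cell

mean_code = mean(cells)      # arithmetic on labels: (1 + 1 + 1 + 5) / 4 = 2
majority_code = mode(cells)  # counting labels: the most frequent code

print("mean code:", mean_code, "->", LAND_COVER[mean_code])
print("majority code:", majority_code, "->", LAND_COVER[majority_code])
```

The mean lands on the code for forest although no forest cell is present, while the majority (mode) gives a category that actually occurs.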
Logical Consistency Rules related to: connection, homogeneity, level of generalization. Examples: intersections where intended; areas closed; overshoots and undershoots.
Completeness Degree to which data exhausts the universe of possible items Are all possible objects included in database? Affected by rules of selection, generalization and scale
Lineage Record of the data sources and operations that created the database Source documents Data capture method Data producer/collector Treatment method Steps used to process data, transformation algorithms, etc. Useful indicator of accuracy
Sources of inaccuracy Burrough (1986) divides sources of error into three main categories: 1. Obvious sources of error. 2. Errors resulting from natural variations or from original measurements. 3. Errors arising through processing.
Obvious Sources of Error Age of data Areal Cover Map Scale Density of Observations Relevance Format Accessibility Cost
Age of Data Data sources may simply be too old to be useful or relevant to current GIS projects. Past collection standards may be unknown, non-existent, or not currently acceptable. Much of the information base may have changed over time.
Areal Cover Data on a given area may be completely lacking, or only partially available Uniform, accurate coverage may not be available User must decide whether further collection of data is required
Map Scale The ability to show detail in a map is determined by its scale Scale restricts type, quantity, and quality of data One must match the appropriate scale to the level of detail required in the project
Density of observation The number of observations within an area is a guide to data reliability and should be known by the map user. If the contour line interval on a map is 40 feet, elevation differences smaller than this interval cannot be accurately resolved.
Relevance When the desired data regarding a site or area cannot be obtained, "surrogate" data may have to be used instead. A valid relationship must exist between the surrogate and the phenomenon it is used to study. An example of surrogate data is the electronic signals from remote sensing that are used to estimate land cover. The data is being obtained by an indirect method. Sensors on the satellite do not "see" trees, but only certain digital signatures typical of trees and vegetation. Sometimes these signatures are recorded by satellites even when trees and vegetation are not present.
Format Methods of formatting digital information for transmission, storage, and processing may introduce error in the data. Examples are: Rasterizing a vector map Vectorizing a raster map Digitizing & scanning
Accessibility Accessibility to data is not equal. What is open and readily available in one country or agency may be restricted, classified, or unobtainable in another. The quality of the data that can be accessed may also vary across agencies and data sets.
Costs and Copyrights Extensive and reliable data is often quite expensive to obtain or convert. Copyrights also may limit data access and quality control.
Errors Resulting from Natural Variation or from Original Measurements Positional accuracy Accuracy of content Sources of variation in data
Positional Accuracy Spatial analysts can accurately place well-defined objects and features such as roads, buildings, and boundary lines. Less discrete boundaries such as vegetation or soil type may reflect the estimates of the surveyor. Many entities lack sharp boundaries in nature and are subject to interpretation. Faulty or biased field work, map digitizing and conversion errors, and scanning errors can all result in inaccurate maps for GIS projects.
Sources of Variation in Data Variations in data may be due to measurement error introduced by: - faulty observation - biased observers - mis-calibrated or inappropriate equipment
Errors Arising Through Processing Numerical Errors Errors in Topological Analysis Classification and Generalization Problems Digitizing and Geocoding Errors
Accuracy Assessment Overall Accuracy: total number of correctly classified elements divided by the total number of reference elements. Accuracies of Individual Categories: Producer’s Accuracy: number of correctly classified elements in a category divided by the number of reference elements for that category (measures omission error). User’s Accuracy: number of correctly classified elements in a category divided by the total number of elements that were classified in that category (measures commission error).
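The three measures can be read off an error (confusion) matrix. The following is a minimal sketch with made-up counts; it assumes rows hold the classified (map) category and columns hold the reference category, and the names `CLASSES`, `MATRIX`, and `accuracy_report` are hypothetical.

```python
# Hypothetical error matrix: rows = classified (map) category,
# columns = reference (ground truth) category.
CLASSES = ["forest", "urban", "water"]
MATRIX = [
    [50, 5, 0],
    [10, 20, 5],
    [0, 5, 5],
]


def accuracy_report(matrix):
    """Overall, producer's, and user's accuracy from a square error matrix."""
    n = len(matrix)
    total = sum(sum(row) for row in matrix)
    correct = [matrix[i][i] for i in range(n)]  # diagonal cells
    row_totals = [sum(matrix[i]) for i in range(n)]  # elements classified as i
    col_totals = [sum(matrix[i][j] for i in range(n)) for j in range(n)]  # reference elements in j
    return {
        "overall": sum(correct) / total,
        "producers": [correct[i] / col_totals[i] for i in range(n)],
        "users": [correct[i] / row_totals[i] for i in range(n)],
    }


report = accuracy_report(MATRIX)
print(f"overall accuracy: {report['overall']:.2f}")
for name, prod_acc, user_acc in zip(CLASSES, report["producers"], report["users"]):
    print(f"{name}: producer's {prod_acc:.2f}, user's {user_acc:.2f}")
```

If the matrix is laid out the other way round (rows = reference), the producer's and user's values swap.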
What is Cohen’s Kappa? A measure of agreement that compares the observed agreement to the agreement expected by chance if the observer ratings were independent. It expresses the proportionate reduction in error generated by a classification process, compared with the error of a completely random classification. For perfect agreement, kappa = 1. A value of 0.82 would imply that the classification process was avoiding 82% of the errors that a completely random classification would generate.
Kappa Coefficient
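Kappa is commonly computed as (observed agreement - chance agreement) / (1 - chance agreement), where chance agreement is estimated from the row and column totals of the error matrix. The sketch below uses that standard form with made-up counts; it is an illustration, not necessarily the exact expression shown on the original slide, and the function name is hypothetical.

```python
def cohens_kappa(matrix):
    """Cohen's kappa for a square error matrix (rows = classified, columns = reference)."""
    n = len(matrix)
    total = sum(sum(row) for row in matrix)
    row_totals = [sum(row) for row in matrix]
    col_totals = [sum(matrix[i][j] for i in range(n)) for j in range(n)]
    # Observed agreement: share of elements on the diagonal.
    observed = sum(matrix[i][i] for i in range(n)) / total
    # Chance agreement: what independent ratings with the same marginal totals would give.
    expected = sum(row_totals[i] * col_totals[i] for i in range(n)) / total ** 2
    return (observed - expected) / (1 - expected)


# Hypothetical counts for three categories.
example = [
    [50, 5, 0],
    [10, 20, 5],
    [0, 5, 5],
]
print(f"kappa = {cohens_kappa(example):.2f}")
```

Here 75% of elements agree, but kappa is lower because part of that agreement would be expected by chance alone.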
The Problems of Propagation and Cascading GIS analyses usually involve operations on many sets of data. Inaccuracy, imprecision, and error may be compounded in GIS that employ many data sources, in two ways. Propagation: one error leads to another. Cascading: erroneous, imprecise, and inaccurate information will skew a GIS solution.
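One simple way to see how error compounds across data sources is to treat each layer's accuracy as the probability that it is correct at a location. Under the strong assumption that errors in the layers are independent, the chance that an overlay is correct everywhere it draws on is the product of the layer accuracies. The layer names and values below are hypothetical.

```python
from math import prod


def combined_accuracy(layer_accuracies):
    """Probability that every layer is correct at a location, assuming independent errors."""
    return prod(layer_accuracies)


# Hypothetical per-layer accuracies for a three-layer overlay.
layers = {"land use": 0.90, "soils": 0.85, "slope": 0.80}
print(f"combined accuracy: {combined_accuracy(layers.values()):.3f}")
```

Each layer looks acceptable on its own, yet the combined result is noticeably less reliable than the weakest input. Correlated errors would change the figure, so this is only a rough guide.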
Foote and Huebner 1995
Components of Data Quality recognized by National Standard for Digital Data Quality 1. Lineage: narrative of source materials used and procedures applied; parameters of projections and transformations. 2. Positional accuracy: usually the component identified with "accurate" maps; National Map Accuracy Standards: 90% of well-defined points within .02. 3. Attribute accuracy: error in attribute value; categories: reported as misclassification matrix. 4. Logical consistency: amount that the data fits into the expected structure; tests based on internal evidence within database. 5. Completeness: exhaustiveness of coverage.
Data Documentation What is the age of the data? Where did it come from? In what medium was it originally produced? What is the areal coverage of the data? To what map scale was the data digitized? What projection, coordinate system, and datum were used in maps? What was the density of observations used for its compilation?
Data Quality How accurate are positional and attribute features? Does the data seem logical and consistent? Do cartographic representations look "clean?" Is the data relevant to the project at hand? In what format is the data kept? Why was the data compiled? How was the data accuracy assessed? What is the reliability of the data source?
Uncertainty is Inevitable - use metadata to document the uncertainty - use sensitivity analysis to find the impacts of input uncertainty on output - rely on multiple sources of data - report the uncertainty in results of GIS analysis - US Federal Geographic Data Committee lists five components of data quality: attribute accuracy, positional accuracy, logical consistency, completeness, and lineage (for details, see www.fgdc.gov)
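A sensitivity analysis of the kind listed above can be sketched as a small Monte Carlo simulation: perturb the input coordinates by their assumed positional error, recompute the output many times, and look at the spread. The parcel, the error size (`sigma`), the seed, and the function names are all hypothetical choices for illustration.

```python
import random
from statistics import mean, stdev


def polygon_area(vertices):
    """Shoelace formula for a simple polygon given as (x, y) pairs."""
    total = 0.0
    for (x1, y1), (x2, y2) in zip(vertices, vertices[1:] + vertices[:1]):
        total += x1 * y2 - x2 * y1
    return abs(total) / 2


def simulate_areas(vertices, sigma, runs=2000, seed=422):
    """Recompute the area after adding Gaussian noise (std. dev. sigma) to each coordinate."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    areas = []
    for _ in range(runs):
        noisy = [(x + rng.gauss(0, sigma), y + rng.gauss(0, sigma)) for x, y in vertices]
        areas.append(polygon_area(noisy))
    return areas


parcel = [(0, 0), (100, 0), (100, 100), (0, 100)]  # hypothetical 100 x 100 ft parcel
areas = simulate_areas(parcel, sigma=1.0)           # assumed 1 ft positional error
print(f"nominal area: {polygon_area(parcel):.0f} sq ft")
print(f"simulated mean: {mean(areas):.0f} sq ft, standard deviation: {stdev(areas):.0f} sq ft")
```

The standard deviation of the simulated areas is one way to report the uncertainty of the result alongside the nominal value.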