Measurement-Based GIS
Michael F. Goodchild
University of California, Santa Barbara
The GIS design legacy
• CGIS
• CAD
• McHarg and overlay
• Remote sensing
• Early efforts in the US Bureau of the Census
• ARC/INFO
Error-sensitive GIS
• Storing characterizations of uncertainty
• Propagation through GIS operations
• Visualization
• Confidence limits on products
How to build one?
• Augmentation of existing data models
  – new attributes of objects, object classes, data sets
  – metadata
  – the five-fold way
  – Lanter and Veregin, GeoLineus
  – inheritance, object orientation
Overlay and combine values according to the Boolean rule:
If (A.EQ.1).AND.(B.GT.2) then C=1 else C=0
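The rule above is deterministic. In an error-sensitive GIS each cell might instead carry the probability that A = 1 and the probability that B > 2. The minimal sketch below (function names hypothetical) applies the rule cell by cell and, assuming those two probabilities are known and that errors in A and B are independent, propagates them to P(C = 1) = P(A = 1)·P(B > 2). Independence is an assumption the later slides on error correlation caution against.

```python
def overlay_boolean(a, b):
    """Deterministic rule, cell by cell: C = 1 if (A == 1) and (B > 2), else 0."""
    return [[1 if (ai == 1 and bi > 2) else 0 for ai, bi in zip(ra, rb)]
            for ra, rb in zip(a, b)]


def overlay_probability(p_a_eq_1, p_b_gt_2):
    """P(C = 1) per cell, ASSUMING errors in layers A and B are independent.

    If they are dependent, the joint probability P(A = 1 and B > 2) is needed,
    and per-layer (marginal) probabilities alone cannot supply it.
    """
    return [[pa * pb for pa, pb in zip(ra, rb)]
            for ra, rb in zip(p_a_eq_1, p_b_gt_2)]


if __name__ == "__main__":
    A = [[1, 0], [1, 1]]
    B = [[3, 5], [1, 4]]
    print(overlay_boolean(A, B))  # [[1, 0], [0, 1]]
    pA = [[0.9, 0.1], [0.8, 0.6]]  # hypothetical P(A = 1) per cell
    pB = [[0.7, 0.9], [0.2, 0.5]]  # hypothetical P(B > 2) per cell
    print(overlay_probability(pA, pB))
```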
An implicit assertion
• Uncertainty can be represented by certain additions to the data model
  – no change of data model is required
  – no change of representation
• GIS data models evolved before concerns about uncertainty
  – is a fundamental change of representation needed by an error-sensitive GIS?
An example
• The area class map
An uncertain version
• At every point $(x,y)$ there exists a vector of memberships in each of the classes, $\{p_1(x,y), p_2(x,y), p_3(x,y), \ldots\}$
• Switch to a field representation
  – raster, sample point values, TIN, contours
• A GIS that uses strictly polygons cannot be made error-sensitive by adding attributes to existing objects (polygons, arcs)
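A minimal sketch of the field representation, assuming a small hypothetical raster in which each cell stores a membership vector $\{p_1, p_2, p_3\}$ that is non-negative and sums to 1. Collapsing it to a crisp, polygon-style area class map keeps only the most likely class per cell and discards the rest of the vector. Function and variable names are illustrative only.

```python
from typing import List

Membership = List[float]

# Hypothetical 2 x 2 raster: each cell holds memberships {p1, p2, p3}.
grid: List[List[Membership]] = [
    [[0.8, 0.15, 0.05], [0.4, 0.35, 0.25]],
    [[0.1, 0.1, 0.8], [0.34, 0.33, 0.33]],
]


def validate(field, tol=1e-9):
    """True if every cell's membership vector is non-negative and sums to 1."""
    for row in field:
        for p in row:
            if any(v < 0 for v in p) or abs(sum(p) - 1.0) > tol:
                return False
    return True


def most_likely_class(field):
    """Crisp area class map: 1-based index of the largest membership per cell."""
    return [[max(range(len(p)), key=lambda k: p[k]) + 1 for p in row]
            for row in field]


def max_membership(field):
    """A simple per-cell confidence surface: the largest membership value."""
    return [[max(p) for p in row] for row in field]
```

In this example the two right-hand cells both become class 1 in the crisp map, although their membership vectors are far less decisive than the top-left cell's; only the field keeps that distinction.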
Positional uncertainty
• The fundamental item of geographic information
  – uncertainty in $x$
• Geographic location
  – absolute
  – stored at point, object, data set level
  – return absolute position
Measurement of position
• Position measured
  – $x = f(m)$
• Position interpolated
  – between measured locations
  – surveyed straight lines
  – registered images
• The inverse function
  – $m = f^{-1}(x)$
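As a concrete instance of $x = f(m)$ and $m = f^{-1}(x)$, the sketch below assumes one particular kind of measurement, a distance and azimuth taken from a known origin, which the slides do not specify. The functions `f` and `f_inv` are hypothetical illustrations of a forward derivation and its inverse.

```python
import math
from typing import Tuple

Point = Tuple[float, float]


def f(m: Tuple[float, float], origin: Point) -> Point:
    """x = f(m): position from distance d and azimuth theta (radians,
    clockwise from grid north) measured at a known origin."""
    d, theta = m
    return (origin[0] + d * math.sin(theta), origin[1] + d * math.cos(theta))


def f_inv(x: Point, origin: Point) -> Tuple[float, float]:
    """m = f^-1(x): recover distance and azimuth in [0, 2*pi) from a position."""
    dx, dy = x[0] - origin[0], x[1] - origin[1]
    return (math.hypot(dx, dy), math.atan2(dx, dy) % (2 * math.pi))


if __name__ == "__main__":
    origin = (100.0, 200.0)
    x = f((50.0, math.pi / 2), origin)  # 50 units due east of the origin
    print(x, f_inv(x, origin))
```

Keeping `f_inv` alongside `f` is what lets a later correction to `x` be traced back to the measurement that produced it.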
Theory of measurement error
• Measured value = true value + distortion
  – $x' = x + \delta x$
  – some derived value $y = x^2$
  – $y + \delta y = (x + \delta x)^2$
  – expanding and ignoring terms in $(\delta x)^2$
  – $\delta y = 2x\,\delta x$
  – more generally, if $y = f(x)$, then $\delta y = \frac{df}{dx}\,\delta x$
  – generalizes to several variables, using variance-covariance matrices
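A small sketch of first-order propagation. For one variable it applies $\delta y = \frac{df}{dx}\,\delta x$. For several variables it uses the standard form $\Sigma_y \approx J\,\Sigma_x\,J^{\mathsf T}$, where $J$ is the Jacobian of $f$. The example function $y = x_1 x_2$ (a rectangle's area) and all numbers are hypothetical.

```python
from typing import List

Matrix = List[List[float]]


def propagate_scalar(df_dx: float, dx: float) -> float:
    """First-order rule: delta_y = (df/dx) * delta_x (terms in delta_x^2 dropped)."""
    return df_dx * dx


def matmul(a: Matrix, b: Matrix) -> Matrix:
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(a: Matrix) -> Matrix:
    return [list(col) for col in zip(*a)]


def propagate_cov(jacobian: Matrix, cov_x: Matrix) -> Matrix:
    """Several variables: Sigma_y = J Sigma_x J^T (first-order approximation)."""
    return matmul(matmul(jacobian, cov_x), transpose(jacobian))


if __name__ == "__main__":
    # y = x^2 at x = 10 with delta_x = 0.1: linear estimate vs exact change.
    print(propagate_scalar(2 * 10.0, 0.1), (10.0 + 0.1) ** 2 - 10.0 ** 2)
    # y = x1 * x2 at (20, 10): J = [[x2, x1]]; covariance of (x1, x2) assumed.
    J = [[10.0, 20.0]]
    print(propagate_cov(J, [[0.04, 0.0], [0.0, 0.09]]))    # independent errors
    print(propagate_cov(J, [[0.04, 0.03], [0.03, 0.09]]))  # correlated errors
```

The linear estimate of 2.0 differs from the exact 2.01 only by the dropped $(\delta x)^2$ term. In the two-variable case the off-diagonal covariance raises the variance of $y$ from 40 to 52, which is why the covariance terms cannot be ignored.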
Errors of position
• Location distorted by a vector field
  – $x' = x + \varepsilon(x)$
  – $\varepsilon(x)$ varies smoothly
• Database with objects of mixed lineage
  – different vector fields for each group of objects
  – lineage may not be apparent
    · e.g. not all houses share the same lineage
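To make "distorted by a smoothly varying vector field" concrete, the sketch below uses long-wavelength sinusoids as a stand-in for $\varepsilon(x)$ and gives each lineage group its own field. The functional form, amplitudes, wavelengths and group names are all hypothetical. The point is that nearby objects from the same lineage are displaced almost identically.

```python
import math
from typing import Callable, Tuple

Point = Tuple[float, float]
Field = Callable[[Point], Point]


def smooth_field(amplitude: float, wavelength: float, phase: float = 0.0) -> Field:
    """Hypothetical smooth distortion eps(x): sinusoids with a long wavelength."""
    def eps(p: Point) -> Point:
        x, y = p
        return (amplitude * math.sin(2 * math.pi * x / wavelength + phase),
                amplitude * math.cos(2 * math.pi * y / wavelength + phase))
    return eps


def distort(p: Point, eps: Field) -> Point:
    """x' = x + eps(x)."""
    e = eps(p)
    return (p[0] + e[0], p[1] + e[1])


# Two lineage groups (e.g. houses captured from different sources) get different fields.
lineage_fields = {
    "survey_1985": smooth_field(amplitude=2.0, wavelength=5000.0),
    "scan_2001": smooth_field(amplitude=5.0, wavelength=2000.0, phase=1.0),
}

if __name__ == "__main__":
    eps = lineage_fields["survey_1985"]
    print(eps((1000.0, 1000.0)), eps((1010.0, 1000.0)))  # nearly equal displacements
```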
Absolute and relative error
• Two points $x_1$, $x_2$
  – perfect correlation of errors, $\varepsilon(x_1) = \varepsilon(x_2)$
    · no error in distance
  – zero correlation of errors
    · maximum error in distance, provided errors are never negatively correlated
• Absolute error for a single location
  – measured by $\varepsilon(x)$
• Relative error for pairs of locations
  – value depends on error correlations
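The dependence of relative error on correlation can be written down directly for the one-dimensional case. If the two positional errors have standard deviations $\sigma_1$, $\sigma_2$ and correlation $\rho$, the error in their separation has variance $\sigma_1^2 + \sigma_2^2 - 2\rho\sigma_1\sigma_2$. The helper name below is hypothetical. Note that a negative $\rho$ would give a larger error still than $\rho = 0$.

```python
import math


def distance_error_sd(sigma1: float, sigma2: float, rho: float) -> float:
    """SD of the error in the separation x2 - x1 (one dimension) when the
    positional errors have SDs sigma1, sigma2 and correlation rho:
    var = sigma1^2 + sigma2^2 - 2 * rho * sigma1 * sigma2."""
    if not -1.0 <= rho <= 1.0:
        raise ValueError("rho must lie in [-1, 1]")
    var = sigma1 ** 2 + sigma2 ** 2 - 2.0 * rho * sigma1 * sigma2
    return math.sqrt(max(var, 0.0))  # guard against tiny negative rounding


if __name__ == "__main__":
    for rho in (1.0, 0.9, 0.5, 0.0):
        print(rho, distance_error_sd(1.0, 1.0, rho))
```

With equal absolute errors of 1 unit, perfectly correlated errors give no distance error, while uncorrelated errors give $\sqrt{2}$ units.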
Implications
• Most GIS operations involve more than one point
  – e.g. distance, area measurement, optimum routing
  – knowledge of error correlations is essential if error is to be propagated into products
  – joint distributions are needed
  – statistics such as the confusion matrix provide only marginal distributions
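To see why error correlations matter for area measurement, here is a small Monte Carlo sketch. It uses a 100 m square with 1 m positional error per coordinate and compares two extreme error models. In the first, every vertex shares one random shift (perfect correlation). In the second, each vertex is perturbed independently (zero correlation). The common shift leaves the area unchanged. First-order reasoning puts the independent case at roughly 140 m² standard deviation. Function names and parameters are hypothetical, chosen only for illustration.

```python
import random
import statistics
from typing import List, Tuple

Point = Tuple[float, float]


def shoelace_area(poly: List[Point]) -> float:
    """Polygon area by the shoelace formula (vertices in order, not repeated)."""
    n = len(poly)
    s = 0.0
    for i in range(n):
        x1, y1 = poly[i]
        x2, y2 = poly[(i + 1) % n]
        s += x1 * y2 - x2 * y1
    return abs(s) / 2.0


def area_error_sd(poly: List[Point], sigma: float, correlated: bool,
                  trials: int = 2000, seed: int = 1) -> float:
    """Monte Carlo SD of measured area under two extreme error models:
    correlated=True  -> every vertex shares one random shift (rho = 1);
    correlated=False -> each vertex gets its own independent error (rho = 0)."""
    rng = random.Random(seed)
    areas = []
    for _ in range(trials):
        if correlated:
            dx, dy = rng.gauss(0, sigma), rng.gauss(0, sigma)
            moved = [(x + dx, y + dy) for x, y in poly]
        else:
            moved = [(x + rng.gauss(0, sigma), y + rng.gauss(0, sigma))
                     for x, y in poly]
        areas.append(shoelace_area(moved))
    return statistics.stdev(areas)


square = [(0.0, 0.0), (100.0, 0.0), (100.0, 100.0), (0.0, 100.0)]

if __name__ == "__main__":
    print(area_error_sd(square, 1.0, correlated=True))
    print(area_error_sd(square, 1.0, correlated=False))
```

Both runs assume the same absolute (per-vertex) error. Only the joint behaviour differs, and that joint behaviour is exactly what marginal statistics do not record.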
The inverse $f^{-1}$
• An error is discovered in $x$
  – the error at $x_1$ is correlated with the error at $x_2$
  – both errors are attributed to some erroneous measurement $m$
  – to determine the effects of correcting $x_1$ on the value of $x_2$, it is necessary to know $f$ and its inverse $f^{-1}$
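A minimal sketch of this reasoning, assuming the simplest possible $f$: two locations, each derived from one shared measured control point plus its own measured offset. When $x_1$ is found to be wrong, the error is attributed to the control measurement. The control is re-estimated through $f^{-1}$, and $x_2$ is then recomputed through $f$. All names and coordinates are hypothetical.

```python
from typing import Dict, Tuple

Point = Tuple[float, float]


def f(control: Point, offset: Point) -> Point:
    """x = f(m) with m = (control, offset): a measured control point plus a measured offset."""
    return (control[0] + offset[0], control[1] + offset[1])


def f_inv_control(x: Point, offset: Point) -> Point:
    """Invert f for the control-point part of m, holding the offset fixed."""
    return (x[0] - offset[0], x[1] - offset[1])


def correct_from(key: str, true_x: Point, offsets: Dict[str, Point]):
    """Attribute the error found at location `key` to the shared control
    measurement, re-estimate it through f^-1, then recompute every
    location that depends on it through f."""
    control = f_inv_control(true_x, offsets[key])
    return control, {k: f(control, o) for k, o in offsets.items()}


if __name__ == "__main__":
    offsets = {"x1": (10.0, 0.0), "x2": (0.0, 25.0)}
    stored = {k: f((1000.0, 2000.0), o) for k, o in offsets.items()}
    control, corrected = correct_from("x1", (1013.0, 1998.0), offsets)
    print(stored["x2"], "->", corrected["x2"])  # (1000.0, 2025.0) -> (1003.0, 2023.0)
```

A database holding only the stored coordinates could move $x_1$, but it would have no principled way to decide how far $x_2$ should move.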
Definitions
• Coordinate-based GIS
  – locations represented by $x$
  – $f$, $f^{-1}$ and $m$ are lost during database creation
• Measurement-based GIS
  – $f$ and $m$ available
  – $x$ may be determined on the fly
  – $f^{-1}$ may be available
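The two definitions can be contrasted as record types. The sketch below is a hypothetical illustration, not a prescribed schema. A coordinate-based record holds only $x$. A measurement-based record holds $m$ and $f$ and derives $x$ whenever it is asked for, so editing $m$ changes $x$ without any coordinate being rewritten.

```python
from dataclasses import dataclass
from typing import Callable, Tuple

Point = Tuple[float, float]


@dataclass
class CoordinateRecord:
    """Coordinate-based: only x is kept; f, f^-1 and m are gone."""
    x: Point


@dataclass
class MeasurementRecord:
    """Measurement-based: m and f are kept; x is derived on demand."""
    m: Tuple[float, ...]
    f: Callable[[Tuple[float, ...]], Point]

    @property
    def x(self) -> Point:
        return self.f(self.m)  # determined on the fly from the stored measurement


if __name__ == "__main__":
    # Hypothetical f: m = (x0, y0, dx, dy), a known point plus a measured offset.
    rec = MeasurementRecord(m=(100.0, 200.0, 3.0, 4.0),
                            f=lambda m: (m[0] + m[2], m[1] + m[3]))
    frozen = CoordinateRecord(x=rec.x)
    rec.m = (100.0, 200.0, 2.5, 4.0)  # revise the measurement
    print(rec.x, frozen.x)
```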
Partial correction
• The ability to propagate the effects of correcting one location to others
  – preserving the shapes of buildings and other objects
  – avoiding sharp displacements in roads and other linear features
• Partial correction is impossible in coordinate-based GIS
  – major expense for large databases
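A sketch of why shapes survive partial correction only when measurements are kept. It assumes a hypothetical 10 × 20 building whose corners were measured as offsets from one anchor corner. A coordinate-based fix can move only the corner whose true position has been found, which changes the building's edge lengths. A measurement-based fix re-evaluates every corner from the corrected anchor, so the shape is preserved.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def edge_lengths(poly: List[Point]) -> List[float]:
    n = len(poly)
    return [math.dist(poly[i], poly[(i + 1) % n]) for i in range(n)]


def coordinate_based_fix(poly: List[Point], i: int, new_xy: Point) -> List[Point]:
    """Only the stored coordinate of vertex i can be changed; the rest stay put."""
    fixed = list(poly)
    fixed[i] = new_xy
    return fixed


def measurement_based_fix(offsets: List[Point], new_anchor: Point) -> List[Point]:
    """Corners are stored as measured offsets from one anchor corner, so a
    corrected anchor moves every corner and the shape is preserved."""
    ax, ay = new_anchor
    return [(ax + dx, ay + dy) for dx, dy in offsets]


if __name__ == "__main__":
    offsets = [(0.0, 0.0), (10.0, 0.0), (10.0, 20.0), (0.0, 20.0)]
    old = measurement_based_fix(offsets, (500.0, 500.0))
    print(edge_lengths(coordinate_based_fix(old, 0, (501.5, 499.0))))
    print(edge_lengths(measurement_based_fix(offsets, (501.5, 499.0))))
```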
The geodetic model
• Equator, Poles, Greenwich
• Sparse, high-accuracy points
  – First-order network
• Dense, lower-accuracy points
  – Second-order network
• Interpolated positions of even lower accuracy
• Locations at each level inherit the errors of their parents
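Inheritance of error down the geodetic hierarchy can be sketched under one simplifying assumption: each level adds its own independent error. Then the absolute variance at a level is the sum of the variances of all its ancestors plus its own. Two points derived from the same parent share the parent's error, so that shared part cancels in their relative position. The per-level standard deviations in the usage example are hypothetical.

```python
import math
from itertools import accumulate
from typing import List


def inherited_sd(level_sds: List[float]) -> List[float]:
    """Absolute SD at each level when every level adds an independent error:
    var(level k) = sum of the variances of levels 0..k."""
    return [math.sqrt(v) for v in accumulate(s * s for s in level_sds)]


def sibling_relative_sd(own_sd: float) -> float:
    """Two points derived from the same parent share its error, which cancels
    in their difference; only their own independent errors remain."""
    return math.sqrt(2.0) * own_sd


if __name__ == "__main__":
    # Hypothetical SDs (m): first-order, second-order, interpolated.
    print(inherited_sd([0.01, 0.05, 0.5]))
    print(sibling_relative_sd(0.5))
```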
Formalizing measurement-based GIS
• Structured as a hierarchy
  – levels indexed by $i$
  – locations at level $i$ denoted by $x^{(i)}$
  – locations at level $i+1$ derived through equations of the form $x^{(i+1)} = f(m, x^{(i)})$
  – locations at level 0 anchor the tree
  – locations established independently (GPS but not DGPS) are at level 0
An example
• A utility database
• Pipe's location is measured at 3 ft from a property boundary
  – $m = \{3.0, L\}$
  – property at level 3, pipe at level 4
• Property location is later revised or resurveyed
  – new $m = \{2.9, L\}$
  – effects are propagated to the dependent object
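The hierarchy $x^{(i+1)} = f(m, x^{(i)})$ and the utility example can be sketched together as a tree of nodes whose positions are evaluated on the fly from the anchor down. Several details are hypothetical. The intermediate levels, the coordinates (in feet) and the function names are illustrative. The reading of $m = \{3.0, L\}$ as "3.0 ft to the left of a boundary assumed to run due east" is also an assumption. Revising any measurement, at the boundary or at the pipe, changes the pipe's derived position without editing stored coordinates.

```python
from dataclasses import dataclass
from typing import Any, Callable, Optional, Tuple

Point = Tuple[float, float]


@dataclass
class Node:
    """One location in a measurement-based hierarchy.

    Level-0 nodes are anchors: f ignores the parent and returns absolute
    coordinates. Every other node obeys x(i+1) = f(m, x(i))."""
    name: str
    m: Any
    f: Callable[[Any, Optional[Point]], Point]
    parent: Optional["Node"] = None

    @property
    def level(self) -> int:
        return 0 if self.parent is None else self.parent.level + 1

    @property
    def x(self) -> Point:
        """Derived on the fly, so revisions anywhere up the tree are inherited."""
        parent_x = None if self.parent is None else self.parent.x
        return self.f(self.m, parent_x)


def anchor(m, _parent_x):
    return m  # absolute coordinates, e.g. a monument or an independent GPS fix


def offset(m, parent_x):
    return (parent_x[0] + m[0], parent_x[1] + m[1])


def side_offset(m, parent_x):
    """Hypothetical reading of m = {3.0, L}: distance to the Left of a boundary
    assumed to run due east, i.e. that many feet north of the boundary point."""
    dist, side = m
    sign = 1.0 if side == "L" else -1.0
    return (parent_x[0], parent_x[1] + sign * dist)


if __name__ == "__main__":
    monument = Node("monument", (0.0, 0.0), anchor)                  # level 0
    control = Node("control", (1000.0, 500.0), offset, monument)     # level 1
    corner = Node("block corner", (120.0, 80.0), offset, control)    # level 2
    boundary = Node("property boundary", (40.0, 0.0), offset, corner)  # level 3
    pipe = Node("pipe", (3.0, "L"), side_offset, boundary)           # level 4
    print(pipe.level, pipe.x)
    pipe.m = (2.9, "L")            # revised measurement for the pipe
    boundary.m = (40.0, -0.5)      # resurveyed boundary
    print(pipe.x)                  # both revisions reach the pipe
```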
Beyond the geodetic model
• National database of major highways
  – 100 m uncertainty in position
    · sufficient for the agency
    · relative accuracies likely higher, e.g. highways are comparatively straight, with no sudden 100 m offsets
• Local agency database
  – 1 m accuracy required
  – two trees with different anchors
Merging trees
• Link with a pseudo-measurement
  – displacement of 0
  – standard error of 100 m
  – revisions of the more accurate anchor can now be inherited by the less accurate tree
    · but will normally be inconsequential
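One way to read the pseudo-measurement is sketched below under stated assumptions. The root of the less accurate tree is attached to the more accurate anchor through the nominal vector between the two anchors at the time of merging, plus a measured displacement of 0 with a standard error of 100 m. Standard errors are combined in quadrature, which assumes independence. A sub-metre revision of the accurate anchor then moves the national tree by the same sub-metre amount, which is negligible against its 100 m uncertainty. The class, its methods and the coordinates are hypothetical.

```python
import math
from typing import Tuple

Point = Tuple[float, float]


class PseudoLink:
    """Hypothetical link attaching the root of a less accurate tree to a more
    accurate anchor: displacement 0 from the nominal relationship, SE 100 m."""

    def __init__(self, accurate_anchor: Point, loose_anchor: Point, se: float = 100.0):
        # Nominal vector between the two anchors at the moment of merging.
        self.nominal = (loose_anchor[0] - accurate_anchor[0],
                        loose_anchor[1] - accurate_anchor[1])
        self.displacement = (0.0, 0.0)  # the pseudo-measurement itself
        self.se = se

    def loose_anchor_x(self, accurate_anchor_now: Point) -> Point:
        """Position of the less accurate root, inheriting any revision of the
        accurate anchor."""
        return (accurate_anchor_now[0] + self.nominal[0] + self.displacement[0],
                accurate_anchor_now[1] + self.nominal[1] + self.displacement[1])

    def loose_anchor_se(self, accurate_se: float) -> float:
        """Standard errors combined in quadrature (independence assumed)."""
        return math.hypot(accurate_se, self.se)


if __name__ == "__main__":
    link = PseudoLink(accurate_anchor=(5000.0, 7000.0), loose_anchor=(5200.0, 6900.0))
    print(link.loose_anchor_x((5000.4, 6999.8)))  # follows a 0.4 m / -0.2 m revision
    print(link.loose_anchor_se(0.02))              # still essentially 100 m
```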
Conclusions
• Almost universal adoption of coordinate-based GIS
  – assumes it is possible to know location exactly
  – design precision greatly exceeds actual accuracy
  – in practice exact location is not knowable
  – attempts at partial correction lead to unacceptable topological and geometrical distortions
Measurement-based GIS
• Retains measurements and derivation functions
  – may obtain absolute locations on the fly
• Supports incremental update and correction
• Supports merger of databases with different inheritance hierarchies
• Legacy GIS designs are not optimal
Implementation
• Design from the ground up
• Accept a model that includes necessary features
  – hierarchical databases
  – object-oriented databases
    · but support for complex functions, variance-covariance matrices?