Beyond Metadata: Towards User-Centric Description of Data Quality
Michael F. Goodchild
University of California, Santa Barbara
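Metadata
• Data about data
  – handling instructions
  – catalog entry
  – fitness for use
• What is known about data quality
  – a measure of the success of spatial data quality research
  – much progress has been made
  – FGDC Content Standard for Digital Geospatial Metadata (CSDGM), 1994
  – ISO
  – Data Documentation Initiative (DDI)
  – Ecological Metadata Language (EML)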
Two tests of success
• Geobrowsers
  – Google Earth
  – geotagging
  – Wikimapia
  – Where 2.0
CSDGM, ISO
• Do they match the state of research?
  – early 1990s
  – Spatial Data Transfer Standard (SDTS) discussions of the 1980s
  – the five-fold way: positional accuracy, attribute accuracy, logical consistency, completeness, lineage
• Do they represent a user perspective?
  – committees staffed by data producers
  – production control mechanisms?
Producer or user?
• Producer-centric
  – details of the production process: the measurement and compilation systems used
  – tests of data quality conducted under carefully controlled conditions
  – formal specifications of data set contents
• User-centric
  – effects of uncertainties on specific uses of the data, from simple queries to complex analyses
  – simple descriptions of quality that are readily understood by non-expert users
  – tools to enable the user to determine the effects of quality on results
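As a sketch of the user-centric view, the example below reports the effect of positional uncertainty on one simple query instead of a producer's accuracy statistic. It assumes, purely for illustration, that the true distance is normally distributed around the measured distance; the function name and the numbers are hypothetical.

```python
from statistics import NormalDist


def prob_within(measured_distance_m: float, error_sd_m: float, threshold_m: float) -> float:
    """Probability that the true distance is below threshold_m.

    Assumes (for illustration only) that the true distance is normally
    distributed around the measured value with standard deviation error_sd_m.
    """
    return NormalDist(mu=measured_distance_m, sigma=error_sd_m).cdf(threshold_m)


# Hypothetical query: "is this well within 100 m of the road?"
# The data say 95 m, so the crisp answer is "yes"; the user-centric answer
# also states how likely that crisp answer is to be correct.
p = prob_within(measured_distance_m=95.0, error_sd_m=10.0, threshold_m=100.0)
print(f"P(within 100 m) = {p:.2f}")
```

With a measured distance of 95 m and an error SD of 10 m, the crisp "yes" holds with a probability of only about 0.69, which is the kind of statement a non-expert user can act on.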
Increasing complexity
• Self-documentation
  – notes to oneself
• A colleague
  – brief description
• Another discipline, language, culture
  – ideal metadata/data ratio?
[Figure: complexity of metadata plotted against social distance]
Seven issues
• Areas in which research has moved beyond the standards
  – Accuracy of Spatial Databases (1989)
  – Measurements from Maps (1989)
  – 15 books
  – 1000 journal articles
1. Decoupling the representative fraction
• Representative fraction (RF): ratio of distance on the map to distance on the ground
  – no flat map of a curved surface can have a constant RF
• RF as a surrogate for
  – positional accuracy
  – spatial resolution
  – map content
• RF undefined for digital data
  – inherited from source maps
  – extended by convention to aerial photographs (RF of the photographic plate) and digital orthoimagery (positional accuracy)
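To illustrate why the RF could stand in for accuracy and resolution, the sketch converts a tolerance measured on the map into a ground distance. The 0.5 mm map tolerance is an assumed rule of thumb (roughly the width of a drawn line), not a value from the slide, and the function names are hypothetical.

```python
def ground_distance_m(map_distance_mm: float, rf_denominator: float) -> float:
    """Ground distance (metres) for a distance measured on a map of RF 1:rf_denominator."""
    # RF = map distance / ground distance, so ground = map distance * denominator.
    return map_distance_mm * rf_denominator / 1000.0  # mm -> m


# Assumed rule of thumb: detail or error smaller than about 0.5 mm on the map
# cannot be shown, so 0.5 mm at map scale acts as a proxy for resolution/accuracy.
ASSUMED_MAP_TOLERANCE_MM = 0.5


def implied_ground_tolerance_m(rf_denominator: float) -> float:
    """Ground-distance equivalent of the assumed map tolerance."""
    return ground_distance_m(ASSUMED_MAP_TOLERANCE_MM, rf_denominator)


for denom in (24_000, 100_000, 1_000_000):
    print(f"1:{denom:,} -> about {implied_ground_tolerance_m(denom):g} m on the ground")
```

For digital data no map exists, so this conversion has no starting point; the ground-distance figure has to be reported directly as positional accuracy or resolution.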
2. Accuracy or uncertainty?
• Accuracy
  – a true value z exists
  – a measured value z*
  – error z* − z
  – root mean square error (RMSE)
  – theory of measurement error
  – error propagation
• Uncertainty
  – vagueness in definitions: no truth, perhaps a consensus?
  – lack of replicability
• Change of paradigm around 1992
  Occurrences of each term:
                  CSDGM   ISO
    accuracy        85      7
    uncertainty      0      0
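Under the accuracy paradigm, each step in the list above (true value, measured value, error, RMSE, propagation) can be computed. A minimal sketch follows; the rectangle-area example and the assumption of independent errors in width and height are illustrative choices. Under the uncertainty paradigm there is no true z, so `rmse` has nothing to compare against.

```python
import math
from typing import Sequence


def rmse(measured: Sequence[float], true: Sequence[float]) -> float:
    """Root mean square error of measured values z* against true values z."""
    if not measured or len(measured) != len(true):
        raise ValueError("need equal-length, non-empty sequences")
    # error = z* - z for each observation
    squared = [(z_star - z) ** 2 for z_star, z in zip(measured, true)]
    return math.sqrt(sum(squared) / len(squared))


def rectangle_area_sd(w: float, h: float, sd_w: float, sd_h: float) -> float:
    """First-order propagation of independent errors in w and h to the area A = w * h.

    Var(A) is approximately (dA/dw)^2 Var(w) + (dA/dh)^2 Var(h) = h^2 sd_w^2 + w^2 sd_h^2.
    """
    return math.sqrt((h * sd_w) ** 2 + (w * sd_h) ** 2)


print(rmse([103.0, 97.0, 103.0, 97.0], [100.0] * 4))
print(rectangle_area_sd(100.0, 50.0, 1.0, 1.0))
```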
3. Objects and fields
• A fundamental distinction
  – 1992
  – appears nowhere in the standards
• Discrete object conceptualization
  – an empty table top
  – occupied by discrete, countable objects
  – points, lines, areas, volumes
• Continuous field conceptualization
  – a mapping from location x to value z
  – a single-valued function of location
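The two conceptualizations map onto different data structures. In the sketch below an object is a countable record with geometry and attributes, while a field is any single-valued function of location; the class names and the tilted-plane elevation function are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

Point = Tuple[float, float]


@dataclass
class DiscreteObject:
    """Discrete-object view: a countable thing on an otherwise empty table top."""
    kind: str                 # "point", "line", "area" or "volume"
    coordinates: List[Point]  # vertices defining the geometry
    attributes: Dict[str, object] = field(default_factory=dict)


# Continuous-field view: a single-valued function from location x to value z.
Field = Callable[[Point], float]


def elevation(x: Point) -> float:
    """Hypothetical elevation field (a tilted plane), defined at every location."""
    return 100.0 + 0.05 * x[0] - 0.02 * x[1]


objects = [
    DiscreteObject("point", [(10.0, 20.0)], {"name": "well"}),
    DiscreteObject("line", [(0.0, 0.0), (50.0, 40.0)], {"name": "road"}),
]
surface: Field = elevation
print(len(objects))               # objects can be counted
print(surface((10.0, 20.0)))      # a field yields exactly one value at any location
```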
z'(x) = z(x) + δz(x)   (measured field z' = true field z plus error field δz)
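The error model can be made concrete on a one-dimensional transect: a true field z, one realization of an error field δz, and their sum z'. A minimal sketch; the moving-average smoothing that makes nearby errors similar is an assumed, simple way to generate spatial dependence, not a method taken from the slides.

```python
import random
from typing import List


def error_field(n: int, sd: float, window: int, seed: int = 0) -> List[float]:
    """One realization of a hypothetical error field dz along a transect of n cells.

    White noise with standard deviation sd is smoothed by a moving average of
    length window, so nearby errors are similar; the smoothed errors have an SD
    of about sd / sqrt(window).
    """
    rng = random.Random(seed)
    noise = [rng.gauss(0.0, sd) for _ in range(n + window - 1)]
    return [sum(noise[i:i + window]) / window for i in range(n)]


true_z = [100.0 + 0.5 * i for i in range(20)]     # z(x): true field along the transect
dz = error_field(len(true_z), sd=2.0, window=5)   # dz(x): one realization of the error field
observed_z = [z + e for z, e in zip(true_z, dz)]  # z'(x) = z(x) + dz(x)
```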
Separability
• Phenomenon conceptualized as a field
  – impossible to separate positional and attribute accuracy
  – interval/ratio (elevation)
  – nominal (land cover class)
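A short numerical check of the separability point, using assumed toy fields: for a field with a uniform slope, displacing the whole layer by 5 m gives exactly the same result as a constant attribute error, and for a nominal field the same displacement appears as misclassified cells near a class boundary. Both field functions are hypothetical.

```python
def elevation_1d(x: float) -> float:
    """Hypothetical 1-D elevation field with a uniform slope of 0.1 m per metre."""
    return 50.0 + 0.1 * x


def land_cover(x: float) -> str:
    """Hypothetical nominal field with a class boundary at x = 100 m."""
    return "forest" if x < 100.0 else "grassland"


shift = 5.0  # assumed positional error: the whole layer is displaced by 5 m

# Interval/ratio field: the 5 m shift is indistinguishable from a constant
# attribute error of -0.1 * 5 = -0.5 m at every location.
for x in (0.0, 40.0, 200.0):
    print(x, elevation_1d(x - shift) - elevation_1d(x))

# Nominal field: the same shift shows up as cells with the wrong class
# near the boundary, i.e. as classification (attribute) error.
wrong = [x for x in range(90, 111) if land_cover(x - shift) != land_cover(x)]
print(wrong)
```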
4. Granularity
• Metadata definable at any level
  – individual vertex
  – point, line, area
  – layer
  – geodatabase
• Metadata as a form of generalization
  – economies of scale
• Spatial non-stationarity
• Multiple lineages
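One way to picture metadata at several levels of granularity is a tree in which a value recorded at a coarse level (the generalization, with its economies of scale) applies to everything below it unless a finer level overrides it, which is one way non-stationarity or multiple lineages could be recorded. The class, the key name and the example hierarchy are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class MetadataNode:
    """Hypothetical metadata container at one level (geodatabase, layer, feature, vertex)."""
    name: str
    values: Dict[str, float] = field(default_factory=dict)
    parent: Optional["MetadataNode"] = None

    def lookup(self, key: str) -> Optional[float]:
        """Return the most specific value for key, falling back to coarser levels."""
        node: Optional[MetadataNode] = self
        while node is not None:
            if key in node.values:
                return node.values[key]
            node = node.parent
        return None


gdb = MetadataNode("geodatabase", {"horizontal_accuracy_m": 10.0})
roads = MetadataNode("roads layer", parent=gdb)
resurveyed = MetadataNode("resurveyed road", {"horizontal_accuracy_m": 1.0}, parent=roads)
unrevised = MetadataNode("unrevised road", parent=roads)
print(resurveyed.lookup("horizontal_accuracy_m"), unrevised.lookup("horizontal_accuracy_m"))
```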
5. Collection-level metadata
• Describing the properties of entire collections
• The Geospatial One-Stop
• There will always be more than one one-stop
  – how to know where to look?
[Figure: Geospatial One-Stop (GOS) coverage, 1/06]
6. Spatial dependence
• Tobler’s First Law
  – nearby things are more similar than distant things
  – applies to errors
  – relative accuracy almost always better than absolute accuracy
  – covariances as important as variances
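A Monte Carlo sketch of why relative accuracy is better than absolute accuracy when errors are spatially dependent: two nearby vertices each have a positional error with the same standard deviation, but correlated errors largely cancel in the distance between them. The correlation value is an assumed stand-in for Tobler's First Law applied to errors; analytically, the SD of the relative error is sd·√(2(1 − ρ)).

```python
import math
import random
from typing import List, Tuple


def _rms(values: List[float]) -> float:
    """Root mean square; estimates the SD for zero-mean errors."""
    return math.sqrt(sum(v * v for v in values) / len(values))


def simulate_vertex_errors(sd: float, rho: float, trials: int = 20000,
                           seed: int = 1) -> Tuple[float, float]:
    """Return (absolute, relative) RMS error for two nearby vertices.

    Each vertex has a 1-D positional error with standard deviation sd; the two
    errors are correlated with coefficient rho (an assumed value).
    """
    rng = random.Random(seed)
    absolute, relative = [], []
    for _ in range(trials):
        z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        e1 = sd * z1                                           # error at vertex 1
        e2 = sd * (rho * z1 + math.sqrt(1.0 - rho ** 2) * z2)  # correlated error at vertex 2
        absolute.append(e1)
        relative.append(e2 - e1)                               # error in their separation
    return _rms(absolute), _rms(relative)


abs_err, rel_err = simulate_vertex_errors(sd=5.0, rho=0.9)
analytic_rel = 5.0 * math.sqrt(2.0 * (1.0 - 0.9))  # SD of e2 - e1 = sd * sqrt(2(1 - rho))
print(round(abs_err, 2), round(rel_err, 2), round(analytic_rel, 2))
```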
Marginal or joint properties?
• Visualization of marginal properties
• Analytic functions respond to joint properties
  – slope
  – area
• Joint properties must be described at a higher level
  – relative errors of vertex positions
  – described at the level of the vertex collection
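Area is a joint property: it depends on how vertex errors co-vary, not only on each vertex's marginal error. In the sketch below every vertex has the same marginal error SD in both cases, yet a common shift of all vertices leaves the polygon's area unchanged while independent errors alter it. The square polygon and the 2 m SD are assumed values.

```python
import random
from typing import List, Tuple

Point = Tuple[float, float]


def shoelace_area(poly: List[Point]) -> float:
    """Area of a simple polygon from its vertex list (shoelace formula)."""
    n = len(poly)
    total = 0.0
    for i in range(n):
        x1, y1 = poly[i]
        x2, y2 = poly[(i + 1) % n]
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


square = [(0.0, 0.0), (100.0, 0.0), (100.0, 100.0), (0.0, 100.0)]
rng = random.Random(42)
sd = 2.0  # assumed positional error SD per coordinate (metres)

# Perfectly correlated errors: every vertex is displaced by the same vector.
dx, dy = rng.gauss(0.0, sd), rng.gauss(0.0, sd)
shifted = [(x + dx, y + dy) for x, y in square]

# Independent errors: each vertex is displaced on its own.
jittered = [(x + rng.gauss(0.0, sd), y + rng.gauss(0.0, sd)) for x, y in square]

print(shoelace_area(square), shoelace_area(shifted), shoelace_area(jittered))
```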
Cross-correlation
• How are errors on Layer 1 related to errors on Layer 2?
• Error as an issue in interoperability
  – what happens if I superimpose these layers?
• Two layers will almost never fit exactly
  – depends on the lineage of each
  – how bad is the misfit?
  – will it affect my analysis?
• Binary metadata
  – the ability of a pair of data sets to interoperate
  – not derivable from the unary metadata of either data set
• If GIS is about overlay
  – then binary metadata are essential
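Binary metadata can be sketched as a record attached to a pair of layers rather than to either layer. Assuming one-dimensional positional errors with known SDs and a correlation between the two layers' errors, the expected misfit on overlay follows from the variance of a difference. The class names, the correlation field and the example layers are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class LayerMetadata:
    """Unary metadata: describes one layer on its own."""
    name: str
    positional_sd_m: float


@dataclass
class PairMetadata:
    """Hypothetical binary metadata: describes how two layers relate.

    error_correlation is the correlation between the two layers' positional
    errors at the same location (e.g. high if both share a source map); it
    cannot be derived from either layer's unary metadata.
    """
    a: LayerMetadata
    b: LayerMetadata
    error_correlation: float

    def expected_misfit_sd(self) -> float:
        """SD of the apparent displacement between the layers when overlaid (1-D)."""
        sa, sb, r = self.a.positional_sd_m, self.b.positional_sd_m, self.error_correlation
        # Var(ea - eb) = Var(ea) + Var(eb) - 2 Cov(ea, eb)
        return math.sqrt(max(sa ** 2 + sb ** 2 - 2.0 * r * sa * sb, 0.0))


soils = LayerMetadata("soils", 5.0)
parcels = LayerMetadata("parcels", 5.0)
print(PairMetadata(soils, parcels, 0.0).expected_misfit_sd())  # independent lineages
print(PairMetadata(soils, parcels, 0.9).expected_misfit_sd())  # shared lineage
```

With equal 5 m SDs, independent lineages give a misfit SD of about 7.1 m, while a shared lineage (correlation 0.9) gives about 2.2 m; neither figure can be computed from the two `LayerMetadata` records alone.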
The way forward
• Reopen the metadata debate
  – an unpopular move
  – it’s hard enough to persuade people to provide metadata
  – a standard before its time
  – standards should emerge only after research is complete
• It’s our responsibility
  – the research task does not end with journal publication
  – metadata standards express the state of our research
• Many other issues not related to data quality
  – possible allies