DATA QUALITY AND ERROR  Terminology, types and sources  Importance  Handling error and uncertainty.


1 DATA QUALITY AND ERROR  Terminology, types and sources  Importance  Handling error and uncertainty

2 DATA QUALITY GIGO: garbage in, garbage out. Just because it's in the computer doesn't mean it's right. Accept that there will always be errors in GIS.

3 INTRODUCTION GIS is a great tool for spatial data analysis and display. Question: what about error? Topics: data quality, error and uncertainty; error propagation; confidence in GIS outputs. Be careful, be aware, be upfront.

4 TERMINOLOGY Various (often confused) terms are in use: error, uncertainty, accuracy, precision, data quality.

5 ERROR AND UNCERTAINTY Error: something wrong or mistaken; the degree of inaccuracy in a calculation (e.g. 2% error). Uncertainty: lack of knowledge about the level of error; unreliability.

6 Accuracy and Precision Accuracy: extent of system-wide bias in the measurement process. Precision: level of exactness associated with a measurement. [Figure: four numbered panels (1-4) arranged by imprecise/precise and inaccurate/accurate]
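The distinction on this slide can be made concrete with a small calculation. The sketch below is an illustration only, not part of the original slides: it assumes accuracy is summarised as the mean error against a known true value (bias) and precision as the spread of repeated measurements (population standard deviation). The function name `bias_and_spread` is hypothetical.

```python
import statistics


def bias_and_spread(measurements, true_value):
    """Return (bias, spread) for repeated measurements of one quantity.

    bias   : mean of (measurement - true_value); near 0 means accurate.
    spread : population standard deviation; near 0 means precise.
    Both summaries are assumptions chosen for illustration.
    """
    errors = [m - true_value for m in measurements]
    bias = statistics.mean(errors)
    spread = statistics.pstdev(measurements)
    return bias, spread


# Precise but inaccurate: tightly clustered, but offset from the truth.
precise_inaccurate = bias_and_spread([12.0, 12.0, 12.0], true_value=10.0)

# Accurate but imprecise: centred on the truth, but widely scattered.
accurate_imprecise = bias_and_spread([8.0, 12.0], true_value=10.0)

print(precise_inaccurate, accurate_imprecise)
```

A data set can score well on one measure and badly on the other, which is why the two terms should not be used interchangeably.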

7 DATA QUALITY Degree of excellence in data; a general term for how good the data is. Takes all other definitions into account: error, uncertainty, precision, accuracy.

8 DATA QUALITY based on the following elements:  positional accuracy  attribute accuracy  logical consistency  data completeness

9 POSITIONAL ACCURACY Spatial: deviation from true position (horizontal or vertical). General rule: be within the best possible data resolution, e.g. for a scale of 1:50,000, error can be no more than 25 m. Can be measured as root mean square error (RMSE), a measure of the average distance between the true and estimated location. Temporal: difference from actual time and/or date.
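A minimal sketch of how positional RMSE might be computed for a set of check points, assuming planar coordinates in metres and paired true/estimated locations. Strictly, RMSE is the square root of the mean squared distance, so it weights large errors more heavily than a plain average would. The function names are hypothetical, and the 0.5 mm map tolerance is an assumption inferred from the slide's figure of 25 m at 1:50,000.

```python
import math


def positional_rmse(true_points, estimated_points):
    """Root mean square error of horizontal position, in coordinate units."""
    if not true_points or len(true_points) != len(estimated_points):
        raise ValueError("need two non-empty point lists of equal length")
    squared_distances = [
        (tx - ex) ** 2 + (ty - ey) ** 2
        for (tx, ty), (ex, ey) in zip(true_points, estimated_points)
    ]
    return math.sqrt(sum(squared_distances) / len(squared_distances))


def max_allowed_error_m(scale_denominator, map_tolerance_mm=0.5):
    """Ground distance (m) covered by a map tolerance at a given scale.

    The 0.5 mm default is an assumption, not stated on the slide.
    """
    return map_tolerance_mm * scale_denominator / 1000.0


true_pts = [(0.0, 0.0), (100.0, 100.0)]
est_pts = [(3.0, 4.0), (103.0, 104.0)]  # each check point is 5 m off
rmse = positional_rmse(true_pts, est_pts)
print(rmse, rmse <= max_allowed_error_m(50_000))
```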

10 ATTRIBUTE ACCURACY Classification and measurement accuracy: a feature is what the GIS thinks it to be, e.g. a railroad is a railroad and not a road, or a soil sample agrees with the type mapped. Rated in terms of % correct. In a database, forest types are grouped and placed within a boundary; in reality there is no solid boundary where only pine trees grow on one side and spruce on the other.
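The "% correct" rating can be sketched as a comparison between mapped classes and field-checked reference classes at the same sample sites. This is an illustrative sketch; the function name `percent_correct` and the sample data are made up for the example.

```python
def percent_correct(mapped_classes, reference_classes):
    """Percentage of sample sites where the mapped class matches the reference."""
    if not reference_classes or len(mapped_classes) != len(reference_classes):
        raise ValueError("need two non-empty class lists of equal length")
    matches = sum(
        1 for mapped, ref in zip(mapped_classes, reference_classes) if mapped == ref
    )
    return 100.0 * matches / len(reference_classes)


# Hypothetical sample sites: what the GIS says vs. what a field check found.
mapped = ["pine", "spruce", "pine", "pine"]
reference = ["pine", "spruce", "spruce", "pine"]
print(percent_correct(mapped, reference))
```

A single percentage hides which classes are being confused with each other, so it is best read as a summary rather than a full accuracy assessment.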

11 ATTRIBUTE ACCURACY

12 LOGICAL CONSISTENCY Concerns the presence of contradictory relationships in the database. Non-spatial examples: some crimes recorded at the place of occurrence, others at the place where the report was taken; data for one country is for 2000, for another 2001; data uses a different source or estimation technique for different years.

13 LOGICAL CONSISTENCY Spatial: overshoots and gaps in road networks or parcel polygons. [Figure: good logical consistency]

14 COMPLETENESS Reliability concept: are all instances of a feature the GIS claims to include in fact there? Partially a function of the criteria for including features: when does a road become a track? Simply put, how much data is missing?
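One way to put a number on "how much data is missing" is to compare the feature identifiers the database actually holds against a reference list of what it claims to include. The sketch below assumes such a reference list exists; the function name and the identifiers are hypothetical.

```python
def completeness_percent(expected_ids, present_ids):
    """Percentage of expected features that are actually in the database.

    Features present but not expected are ignored here; they would be
    errors of commission rather than omission.
    """
    expected = set(expected_ids)
    if not expected:
        raise ValueError("expected_ids must not be empty")
    found = expected & set(present_ids)
    return 100.0 * len(found) / len(expected)


# Hypothetical road segments: segment 3 is missing, segment 9 is unexpected.
print(completeness_percent([1, 2, 3, 4], [1, 2, 4, 9]))
```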

15 SOURCES OF ERROR sources of error:  data collection and input  human processing  actual changes  data manipulation  data output

16 DATA COLLECTION AND INPUT Inherent instability of the phenomenon itself: random variation of most phenomena (e.g. leaf size); edges may not be sharp boundaries (e.g. forest edges). Description of source data: data source (name, date of collection, method of collection, date of last modification, producer, reference, scale, projection); inclusion of metadata.

17 DATA COLLECTION AND INPUT Instrument inaccuracies: satellite/air photo/GPS/spatial surveying, e.g. resolution and/or accuracy of digitizing equipment. Thinnest visible line: 0.1-0.2 mm, which at a scale of 1:20,000 covers 2-4 m (about 6.6-13.1 feet) on the ground; anything smaller cannot be captured. Attribute measuring instruments.
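The line-width figure follows from simple scale arithmetic: a distance on the map multiplied by the scale denominator gives the distance on the ground. A minimal sketch, with hypothetical function names:

```python
METRES_PER_FOOT = 0.3048  # exact, by definition of the international foot


def ground_distance_m(map_distance_mm, scale_denominator):
    """Ground distance in metres for a map distance in millimetres."""
    return map_distance_mm * scale_denominator / 1000.0


def metres_to_feet(metres):
    return metres / METRES_PER_FOOT


# Thinnest visible line (0.1 mm to 0.2 mm) on a 1:20,000 map.
for width_mm in (0.1, 0.2):
    metres = ground_distance_m(width_mm, 20_000)
    print(f"{width_mm} mm -> {metres:.1f} m ({metres_to_feet(metres):.1f} ft)")
```

Any feature narrower than this on the ground is thinner than the drawn line itself, so its position cannot be recovered from the map by digitizing.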

18 DATA COLLECTION AND INPUT Model used to represent data, e.g. choice of datum, classification system. Data encoding and entry, e.g. keying or digitizing errors. [Figure: original vs. digitised]

19 DATA COLLECTION AND INPUT Attribute uncertainty: uncertainty regarding characteristics (descriptors, attributes, etc.) of geographical entities. Types: imprecise or vague, mixed up, plain wrong. Sources: source document, misinterpretation, database error. [Figure: example attribute value pairs: 505.9, 238.4; 500, 240; 500-510, 230-240; 238.4, 505.9]

20 HUMAN PROCESSING Misinterpretation (e.g. of photos), spatial and attribute. Effects of classification (nominal/ordinal/interval). Effects of scale change and generalization. [Figure: scale of data: global DEM, European DEM, national DEM, local DEM]

21 HUMAN PROCESSING Generalization: simplification of reality by the cartographer to meet restrictions of map scale and physical size, effective communication and message. Can result in reduction, alteration, omission and simplification of map elements. [Figure: City of Sapporo, Japan at 1:500,000, 1:25,000 and 1:10,000]

22 ACTUAL CHANGES gradual natural changes: river courses, glacier recession catastrophic changes: fires, floods, landslides seasonal and daily changes: lake/sea/river levels man-made: urban development, new roads attribute change: forest growth (height), discontinued trail/roads, road surfacing

23 ACTUAL CHANGES Age of data. [Figure: Northallerton circa 1867 and Northallerton circa 1999]

24 DATA MANIPULATION Vector to raster conversion errors. Coding and topological mismatch errors: cell size (majority class and central point).
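The two coding rules named on this slide can give different answers for the same cell. The sketch below is an illustration under simplifying assumptions: the area covered by one output cell is represented as a small grid of fine-resolution class labels, and the function names are hypothetical. With the majority rule, ties go to whichever class is encountered first.

```python
from collections import Counter


def majority_class(block):
    """Class covering the largest share of the cell (ties: first encountered)."""
    labels = [label for row in block for label in row]
    return Counter(labels).most_common(1)[0][0]


def central_point_class(block):
    """Class found at the centre of the cell."""
    return block[len(block) // 2][len(block[0]) // 2]


# One output cell: mostly forest ("F") with a small pond ("W") at its centre.
cell = [
    ["F", "F", "F"],
    ["F", "W", "F"],
    ["F", "F", "F"],
]
print(majority_class(cell), central_point_class(cell))
```

The larger the cell, the more small or narrow features are lost under the majority rule, or over-represented under the central point rule.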

25 DATA MANIPULATION Vector to raster conversion errors. Coding and topological mismatch errors: grid orientation.

26 DATA MANIPULATION Compounding effects of processing and analysis of multiple layers: if two layers each have a correctness of 90%, the accuracy of the resulting overlay is around 81% (0.9 × 0.9, assuming the errors in the two layers are independent). Density of observations: TIN modeling and interpolation. Inappropriate or inadequate class intervals or inputs for models.
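The 81% figure comes from multiplying the per-layer accuracies. A minimal sketch, assuming the errors in each layer are independent of one another (real layers often share error sources, so this is a rough guide rather than a guarantee); the function name is hypothetical.

```python
import math


def overlay_accuracy(layer_accuracies):
    """Expected proportion correct after overlaying layers with independent errors."""
    for accuracy in layer_accuracies:
        if not 0.0 <= accuracy <= 1.0:
            raise ValueError("accuracies must be proportions between 0 and 1")
    return math.prod(layer_accuracies)


print(f"{overlay_accuracy([0.9, 0.9]):.2f}")            # two layers
print(f"{overlay_accuracy([0.9, 0.9, 0.9, 0.9]):.2f}")  # four layers
```

Each additional layer lowers the expected accuracy further, so error compounds quickly in multi-layer analysis.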

27 DATA OUTPUT Scaling accuracy: detail on scale bar and scale type. Error caused by inaccuracy of the output devices: resolution of computer screen or printer; colour palettes (intended colours don't match from screen to printer).

28 DATA OUTPUT Use: information may be incorrectly understood; information may be inappropriately used.

29 HANDLING ERROR We must learn to cope with error and uncertainty in GIS applications: minimise risk of erroneous results; minimise risk to life/property/environment. More research needed: mathematical models; procedures for handling data error and propagation; empirical investigation of data error and effects; procedures for using output data uncertainty estimates; incorporation as standard GIS tools.

30 HANDLING ERROR Awareness: knowledge of types, sources and effects. Minimization: use of best available data; correct choices of data model/method. Communication: to the end user!

