Geographic data validation

Index
- Basic concepts
- Why do we need validation?
- How to assess geographic data
  o Initial checks
  o Intermediate checks
  o Advanced checks
- Some final considerations

Basic concepts: Quality
- Faithful representation of a feature
- Quality of data is related to quality of output: the GIGO (garbage in, garbage out) principle
- Data have the potential to be used in ways unforeseen when collected
- The value of the data is directly related to their fitness for a variety of uses

Basic concepts: Fitness-for-use
- The suitability of a set of data for a specific purpose
- Also known as usability
- Should not be confused with quality
  o Quality: abstract
  o Usability: specific
- A low-quality dataset may still have high usability for a given purpose

Basic concepts
- Precision
  o Closeness of repeated measurements to a given value, either correct or not
- Accuracy
  o Closeness of a measurement to the true value

Precision vs Accuracy

Basic concepts: Precision and accuracy (continued)
- Precision is an intrinsic value
- Accuracy depends on knowing the true value of the variable
- Data validation: assessing the accuracy
  o Compare against a reference value
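
The distinction can be made concrete with a small calculation. The sketch below is only an illustration: the readings and the reference value are invented, and using the standard deviation for precision and the offset of the mean for accuracy is one common choice, not something the slides prescribe.

```python
import statistics

def precision_and_accuracy(measurements, reference):
    """Return (spread, bias) for repeated measurements of one quantity.

    spread: sample standard deviation, a measure of precision
            (computed without knowing the true value).
    bias:   absolute difference between the mean and the reference,
            a measure of accuracy (requires a trusted reference value).
    """
    spread = statistics.stdev(measurements)
    bias = abs(statistics.mean(measurements) - reference)
    return spread, bias

# Hypothetical repeated latitude readings (decimal degrees) for one site
readings = [40.4172, 40.4171, 40.4173, 40.4172]
reference_latitude = 40.4168  # hypothetical trusted value

spread, bias = precision_and_accuracy(readings, reference_latitude)
print(f"spread={spread:.6f}  bias={bias:.6f}")
```

In this invented case the readings agree closely with each other (high precision) but sit consistently away from the reference (lower accuracy), which only becomes visible once a reference value is available.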

Why do we need validation?

- This was a striking example, but more subtle issues can (and actually do) happen
- We need to develop techniques and methodologies to explore the data
- In other words, we need to validate the data
- Validating gives a sense of the reliability of the records, and clues on how to improve it

How to assess?
- Different techniques apply, depending on the aim of the assessment
- Remember that high-quality datasets are more likely to show high fitness-for-use
- Ideally, check for quality
- If we know the purpose, check for fitness for that purpose

How to assess?
- Work with geographic information à la Darwin Core
- Work with individual records as well as collections of data
- Start with the most basic pieces of information
- Look for coherence with other pieces of information
  o If they are not coherent, why?
  o Modify the information to see whether it then fits
- At more advanced levels, make use of available taxonomic or temporal information

How to assess? Tools
- Spreadsheets: Microsoft Excel, LibreOffice Calc…
  o Well-known environment
  o Visually easy
- OpenRefine
  o Spreadsheet-like, but with some enhanced features
- Scripts
  o Database scripts: work directly at the source
  o Other programming languages: enhanced capabilities
- GIS software
  o Often linked with other tools, such as spreadsheets or scripts

Visualizations
- Visual exploration of the record set
- Useful for a first-level assessment
- Primary visualization for geographic data: maps
- The next picture has several issues that can be detected using a map…

Coordinate transposition
- This happens when latitude is stored in the longitude field and vice versa
- Usually difficult to detect on a one-by-one basis
- But when we look at the whole picture…
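
One case of transposition can be caught record by record: a latitude outside the range -90 to 90 that becomes valid once the two values are swapped. The following is a minimal sketch of that single check; the function name is hypothetical, and most transposed pairs stay within valid ranges, so they still need a map or a country check to be spotted.

```python
def looks_transposed(lat, lon):
    """True when the pair is invalid as given but valid once swapped."""
    valid_as_given = -90 <= lat <= 90 and -180 <= lon <= 180
    valid_if_swapped = -90 <= lon <= 90 and -180 <= lat <= 180
    return (not valid_as_given) and valid_if_swapped

# Illustrative values: a latitude of -122.4 is impossible,
# but the swapped pair (37.8, -122.4) is a valid coordinate.
print(looks_transposed(-122.4, 37.8))
print(looks_transposed(37.8, -122.4))
```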

Zero vs Null
- One of the most common issues
- Storing 0 (zero) instead of leaving the field empty
- This happens with some data management systems
- Latitude 0 and longitude 0 are stored meaning “unknown coordinates”
- But we cannot know that; it is not what the standard says
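
A first pass for this issue is simply to list the records sitting at exactly (0, 0) so a person can review them. The sketch below assumes records are plain dictionaries keyed by the Darwin Core terms decimalLatitude and decimalLongitude; the sample records and the function name are made up for illustration.

```python
def find_zero_coordinates(records):
    """Return the records whose latitude and longitude are both exactly 0.

    Empty (None) coordinates are not returned: they are honest nulls.
    """
    return [
        r for r in records
        if r.get("decimalLatitude") == 0 and r.get("decimalLongitude") == 0
    ]

# Hypothetical sample records
sample = [
    {"id": "a", "decimalLatitude": 0, "decimalLongitude": 0},
    {"id": "b", "decimalLatitude": None, "decimalLongitude": None},
    {"id": "c", "decimalLatitude": 0.0, "decimalLongitude": 12.5},
]

suspects = find_zero_coordinates(sample)
print([r["id"] for r in suspects])
```

A point at (0, 0) is not necessarily wrong, since it is a real location in the Gulf of Guinea, so the result is a list of candidates for review rather than a list of errors.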

Negation
- Forgetting or altering the positive/negative sign of the coordinates
- Usually, forgetting the minus sign
- The most common source: transforming from DMS (degrees, minutes, seconds) to DD (decimal degrees) without taking “W” or “S” into account
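
The conversion itself is short, which is probably why the hemisphere letter gets dropped so easily. This is a minimal sketch of a DMS-to-DD conversion that makes the sign depend on the hemisphere; the function name is hypothetical and the input values are illustrative.

```python
def dms_to_dd(degrees, minutes, seconds, hemisphere):
    """Convert degrees, minutes, seconds plus N/S/E/W to decimal degrees.

    South and West produce negative values; North and East positive ones.
    """
    hemisphere = hemisphere.strip().upper()
    if hemisphere not in {"N", "S", "E", "W"}:
        raise ValueError(f"unknown hemisphere: {hemisphere!r}")
    dd = abs(degrees) + minutes / 60 + seconds / 3600
    return -dd if hemisphere in {"S", "W"} else dd

# Illustrative values: 78° 47' 52" S and 35° 50' 31" E
lat = dms_to_dd(78, 47, 52, "S")
lon = dms_to_dd(35, 50, 31, "E")
print(round(lat, 6), round(lon, 6))
```

Requiring the hemisphere as an explicit argument, and rejecting anything that is not N, S, E or W, makes it harder to produce an unsigned value by accident.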

Check against country
- The easiest way to detect these issues is to check whether the coordinates fall inside the specified country…
- Of course, only if we have a country value to check against
- Two ways
  o Use GIS software
  o Use web services like GeoNames (we will see this in the OpenRefine session)
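
Without GIS software or a web service, the idea can still be sketched with a point-in-polygon test. The example below is an assumption-laden illustration: the outline is a crude rectangle roughly covering the Iberian Peninsula, standing in for a real country polygon, and the function names are hypothetical. It also tries the negated and transposed variants of a coordinate to suggest which error may have occurred.

```python
def point_in_polygon(lon, lat, polygon):
    """Ray-casting test; polygon is a list of (lon, lat) vertices."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does a horizontal ray starting at the point cross this edge?
        if (y1 > lat) != (y2 > lat):
            x_cross = x1 + (lat - y1) * (x2 - x1) / (y2 - y1)
            if lon < x_cross:
                inside = not inside
    return inside

def diagnose(lat, lon, polygon):
    """Return which variants of (lat, lon) fall inside the polygon."""
    candidates = {
        "as given": (lat, lon),
        "latitude negated": (-lat, lon),
        "longitude negated": (lat, -lon),
        "both negated": (-lat, -lon),
        "transposed": (lon, lat),
    }
    return [
        name for name, (y, x) in candidates.items()
        if point_in_polygon(x, y, polygon)
    ]

# Hypothetical, heavily simplified outline as (lon, lat) vertices
ROUGH_OUTLINE = [(-9.5, 36.0), (3.5, 36.0), (3.5, 43.8), (-9.5, 43.8)]

print(diagnose(40.4, 3.7, ROUGH_OUTLINE))   # minus sign missing on longitude
print(diagnose(-3.7, 40.4, ROUGH_OUTLINE))  # latitude and longitude swapped
```

A variant that falls inside the country is only a hint about what went wrong, not proof; the record still needs a human decision.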

Georeferencing
- Intermediate check
- If we have locality information and coordinates, we can check whether they match
- Georeferencing is a tough task and prone to uncertainties, so some level of imprecision is to be expected
- Make good use of the “uncertainty” fields in Darwin Core!
- But still…
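
One way to make "do they match" testable is to compare the distance between the record's coordinates and a georeferenced point for the locality against the stated uncertainty (in Darwin Core, coordinateUncertaintyInMeters). The sketch below assumes the locality has already been georeferenced to a single point, uses the haversine formula on a spherical Earth, and uses hypothetical function names and coordinates.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius, spherical approximation

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two points in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))

def coordinates_match_locality(record_point, locality_point, uncertainty_m):
    """True when the two (lat, lon) points are within uncertainty_m of each other."""
    return haversine_m(*record_point, *locality_point) <= uncertainty_m

# Hypothetical record and georeferenced locality, roughly 85 m apart
record = (40.0, -3.0)
locality = (40.0, -3.001)
print(coordinates_match_locality(record, locality, 1000))
print(coordinates_match_locality(record, locality, 50))
```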

, Anahuac NWR (UTC 049) Grandville POINT( ) Marine Nature Study Area 78º 47’ 52” S; 35º 50’ 31” E Stewart Park POINT( ) Backyard My Habitat , Wilderness Park, north of 14th St Delaney Conservation Area 57.3, 11.9

Multi-domain checks
- Using information from different sources to check quality
- In particular, use taxonomic information to improve geospatial data
- Most basic example: check the data against a range map
  o If the point falls inside the range map of the specified species, OK
- Sometimes, temporal information is useful

Considerations
- NEVER modify the original data
  o Data cleaning is a human task, and thus it is not error-free
  o Information we believe is wrong may be right
- Make an “improved copy” of the data
- Or “flag” the records as inaccurate
- Re-share the improvements
  o With the community: so that others don’t have to reinvent the wheel
  o With the original owners of the data: so that they can correct the errors at the source
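
In a script, these two options can be combined: work on a copy and add flags to it, leaving the source untouched. This is a minimal sketch under assumed conventions; the "flags" field, the flag name, and the function name are hypothetical and not part of any standard.

```python
import copy

def flag_records(records, check, flag_name):
    """Return a flagged deep copy of records; the input is left unmodified.

    Every copied record gets a 'flags' list, and flag_name is appended
    to it whenever check(record) is true.
    """
    improved = copy.deepcopy(records)
    for record in improved:
        flags = record.setdefault("flags", [])
        if check(record):
            flags.append(flag_name)
    return improved

# Hypothetical original data
original = [
    {"id": "a", "decimalLatitude": 0, "decimalLongitude": 0},
    {"id": "b", "decimalLatitude": 40.4, "decimalLongitude": -3.7},
]

def is_zero_zero(record):
    return record["decimalLatitude"] == 0 and record["decimalLongitude"] == 0

flagged = flag_records(original, is_zero_zero, "ZERO_COORDINATES")
print([(r["id"], r["flags"]) for r in flagged])
```

Because the flags live only in the copy, the flagged version can be shared with the community or with the data owners while the original remains available for comparison.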