Data Quality Resources in Species Occurrence Digitization
Allan Koch Veiga, Etienne Americo Cartolano Jr, Antonio Mauro Saraiva
Agricultural Automation Laboratory – LAA
Computing Engineering Dept., Engineering School
Universidade de São Paulo, Brazil
Outline
- Background
- Biodiversity Data Digitizer (BDD) & IABIN
- Data Quality Methodology
- Data Quality Tools
  - BDD Geo Tool
  - BDD Taxon Tool
- Conclusion
Background
- Importance of species occurrence data
  - GBIF Portal
  - IABIN Portal
- Data quality impacts the uses of data
- Georeferencing (location data domain) and identification (taxonomic data domain) are two major causes of error in species occurrence data
- Need to improve Data Quality (DQ)
Data Quality & IABIN-PTN
- Inter-American Biodiversity Information Network (IABIN), Pollinators Thematic Network (PTN)
- GEF-funded project ( ) (~$180k)
- 11 countries in Latin America
- ~400,000 records
- Responsibilities
  - Development of tools for data digitization and integration
  - Data digitization
  - Training and support
  - Reviewing proposals, reports, data
  - Close contact with data owners / providers
Data Quality & IABIN-PTN
- Opportunities & needs
  - Discuss digitization issues with the grantees
  - Standards: importance and role (TDWG)
  - Data quality: concepts
  - Improve data quality
  - Provide mechanisms integrated into the digitization tools rather than isolated tools
Biodiversity Data Digitizer (BDD)
- Designed for easy:
  - Digitization
  - Manipulation
  - Publication
- Rich data content
- FAO-GEF pollinator project
- Demo: Thu
Location Data Domain DQ Assessment Methodology What is Data Quality?
DQ Management Methodology
- How to improve the DQ?
- Error prevention is considered superior to error detection
Resources to Improve DQ on BDD
- Tools to prevent errors during occurrence data digitization
- Integrated into the BDD species occurrence data-entry interface
- BDD Geo Tool: prevents location data digitization errors
- BDD Taxon Tool: prevents taxonomic data digitization errors
BDD Geo Tool Step 1 of 3 – Primary Data
BDD Geo Tool Step 2 of 3 – Data Source
BDD Geo Tool Step 3 of 3 – Uncertainty
BDD Geo Tool Location data form is filled
BDD Geo Tool Improved
- Completeness: adds data not available before (e.g. lat/long, municipality)
- Consistency: consistent data obtained from a consistent source (avoiding errors like lat: 0, long: 0, municipality: New Orleans)
- Credibility: associates data with a credible source (BioGeomancer, Google, GeoNames)
- Accuracy: coordinates are better than the center of mass of a region
- Precision: an uncertainty indicator increases the data's fitness for use
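The dimensions listed for the Geo Tool can be read as checks applied before a location record is saved. The sketch below is only an illustration in Python, not BDD's implementation: the `LocationEntry` class, its field names, and the `check_location` function are hypothetical, and the rules are limited to what the slide mentions (missing coordinates, the lat 0 / long 0 pattern, a recorded source, an uncertainty value).

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class LocationEntry:
    """Hypothetical location record; field names are illustrative, not BDD's schema."""
    latitude: Optional[float]
    longitude: Optional[float]
    municipality: Optional[str]
    source: Optional[str]            # e.g. "GeoNames"
    uncertainty_m: Optional[float]   # radius of uncertainty, in meters


def check_location(entry: LocationEntry) -> List[str]:
    """Return a list of problems; an empty list means no problem was found."""
    problems = []
    if entry.latitude is None or entry.longitude is None:
        problems.append("completeness: coordinates missing")
    else:
        if not -90 <= entry.latitude <= 90:
            problems.append("consistency: latitude out of range")
        if not -180 <= entry.longitude <= 180:
            problems.append("consistency: longitude out of range")
        if entry.latitude == 0 and entry.longitude == 0:
            problems.append("consistency: (0, 0) is usually a placeholder")
    if not entry.source:
        problems.append("credibility: no data source recorded")
    if entry.uncertainty_m is None or entry.uncertainty_m < 0:
        problems.append("precision: uncertainty missing or negative")
    return problems


if __name__ == "__main__":
    # The inconsistent example from the slide: lat 0, long 0, New Orleans.
    suspicious = LocationEntry(0.0, 0.0, "New Orleans", None, None)
    for problem in check_location(suspicious):
        print(problem)
```

Running such checks inside the data-entry form, rather than on stored data, is what makes this prevention instead of detection.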
BDD Taxon Tool Step 1 of 2 – Taxonomic Name Selection
BDD Taxon Tool Step 2 of 2 – Taxonomic Hierarchy Selection
BDD Taxon Tool Taxonomic data form filled
BDD Taxon Tool Improved
- Completeness: the taxonomic hierarchy is filled in from a taxon name
- Consistency: consistent data are obtained from a consistent source (Catalogue of Life)
- Credibility: data are associated with a credible source (Catalogue of Life)
- Accuracy: avoids spelling mistakes and incorrect taxonomic hierarchies
- Precision: suggests complete scientific names
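The two Taxon Tool steps, picking a name and then filling the hierarchy from it, can be sketched in Python as follows. This is an assumption-laden illustration, not how BDD queries its taxonomic source: the three-species `CHECKLIST` dictionary stands in for a real checklist, `suggest_names` and `fill_hierarchy` are hypothetical helpers, and fuzzy matching with the standard library's `difflib` is a swapped-in technique chosen only because it is easy to run.

```python
import difflib
from typing import Dict, List, Optional

# Tiny in-memory checklist standing in for a real taxonomic source (assumption).
CHECKLIST: Dict[str, Dict[str, str]] = {
    "Apis mellifera": {
        "kingdom": "Animalia", "order": "Hymenoptera",
        "family": "Apidae", "genus": "Apis",
    },
    "Bombus terrestris": {
        "kingdom": "Animalia", "order": "Hymenoptera",
        "family": "Apidae", "genus": "Bombus",
    },
    "Melipona quadrifasciata": {
        "kingdom": "Animalia", "order": "Hymenoptera",
        "family": "Apidae", "genus": "Melipona",
    },
}


def suggest_names(typed: str, limit: int = 3) -> List[str]:
    """Step 1: suggest complete scientific names close to what was typed."""
    return difflib.get_close_matches(typed, list(CHECKLIST), n=limit, cutoff=0.6)


def fill_hierarchy(name: str) -> Optional[Dict[str, str]]:
    """Step 2: return the higher ranks for a chosen name, or None if unknown."""
    ranks = CHECKLIST.get(name)
    if ranks is None:
        return None
    return dict(ranks, scientificName=name)


if __name__ == "__main__":
    chosen = suggest_names("Apis melifera")[0]   # misspelled on purpose
    print(chosen)
    print(fill_hierarchy(chosen))
```

Because the user selects a suggested name instead of typing the full hierarchy, spelling mistakes and mismatched ranks are less likely to reach the stored record.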
Conclusion
- Integrated existing techniques, tools, and credible data sources into a species occurrence data-entry tool
- Improved completeness, consistency, accuracy and precision of species occurrence data
- Error prevention in taxonomic and location data
- Tools available to an audience with little experience in data digitization and DQ
Conclusion
- Next steps
  - Other tools, techniques, dimensions, error patterns and domains of data quality in biodiversity are yet to be explored and added
  - Work on error correction in existing data
  - Spreadsheet-based data correction
- Suggestions and collaboration are welcome!
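As a rough idea of what a first pass over a spreadsheet could look like, the Python sketch below flags rows of a CSV export that leave required columns blank, so they can be reviewed and corrected. It is not a planned BDD feature as described by the authors: the `flag_rows` function, the `REQUIRED` list, and the sample data are hypothetical, and the column names are borrowed from Darwin Core terms as an assumption about the spreadsheet layout.

```python
import csv
import io
from typing import Dict, List

# Columns assumed to be mandatory; names follow Darwin Core terms (assumption).
REQUIRED = ["scientificName", "decimalLatitude", "decimalLongitude"]


def flag_rows(csv_text: str) -> Dict[int, List[str]]:
    """Map each spreadsheet row number (header is row 1) to its blank required columns."""
    flagged: Dict[int, List[str]] = {}
    reader = csv.DictReader(io.StringIO(csv_text))
    for row_number, row in enumerate(reader, start=2):
        missing = [col for col in REQUIRED if not (row.get(col) or "").strip()]
        if missing:
            flagged[row_number] = missing
    return flagged


# Made-up sample rows, for illustration only.
SAMPLE = (
    "scientificName,decimalLatitude,decimalLongitude\n"
    "Apis mellifera,-23.55,-46.63\n"
    "Bombus terrestris,,\n"
    ",-15.78,-47.93\n"
)

if __name__ == "__main__":
    for row_number, columns in flag_rows(SAMPLE).items():
        print(f"row {row_number}: missing {', '.join(columns)}")
```

This only detects gaps; proposing corrected values would still need the same credible sources used at data-entry time.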
Acknowledgements
- IABIN – PTN
  - Laurie Adams (P2), Mike Ruggiero (ITIS), Mike Frame, Liz Sellers and Ben Wheeler (USGS)
  - Pedro Correa (University of São Paulo)
  - All data grantees
- FAO-UNEP-GEF Pollinator project in Brazil
  - Barbara Gemmill-Herren (FAO)
  - Ministry of the Environment - Brazil
  - All data grantees
Thank you