Data Quality Resources in Species Occurrence Digitization


1 Data Quality Resources in Species Occurrence Digitization
Allan Koch Veiga, Etienne Americo Cartolano Jr, Antonio Mauro Saraiva
Agricultural Automation Laboratory (LAA), Computing Engineering Dept., Engineering School
Universidade de São Paulo, Brazil

2 Outline
- Background
- Biodiversity Data Digitizer (BDD) & IABIN
- Data Quality Methodology
- Data Quality Tools
  - BDD Geo Tool
  - BDD Taxon Tool
- Conclusion

3 Background
- Importance of species occurrence data
  - GBIF Portal
  - IABIN Portal
- Data quality affects how the data can be used
  - Location and taxonomic data domains
  - Georeferencing and identification are two major causes of error in species occurrence data
- Need to improve Data Quality (DQ)

4 Data Quality & IABIN-PTN
- Inter-American Biodiversity Information Network (IABIN)
- Pollinators Thematic Network (PTN)
  - GEF-funded project (2006-2011) (~$180k)
  - 11 countries in Latin America
  - ~400,000 records
- Responsibilities
  - Development of tools for data digitization and integration
  - Data digitization training and support
  - Reviewing proposals, reports, data
  - Close contact with data owners / providers

5 Data Quality & IABIN-PTN
- Opportunities & needs
  - Discuss digitization issues with the grantees
  - Standards: importance and role (TDWG)
  - Data quality: concepts
- Improve data quality
  - Provide mechanisms integrated into the digitization tools, rather than isolated tools

6 Biodiversity Data Digitizer (BDD)
- Designed for easy:
  - Digitization
  - Manipulation
  - Publication
- Rich data content
- FAO-GEF pollinator project
- Demo: Thu

7 Location Data Domain DQ Assessment Methodology What is Data Quality?

8 DQ Management Methodology
How to improve the DQ?
- Error prevention is considered superior to error detection

9 Resources to Improve DQ on BDD
- Tools to prevent errors during occurrence data digitization
- Integrated into the BDD species occurrence data-entry interface
- BDD Geo Tool: prevents location data digitization errors
- BDD Taxon Tool: prevents taxonomic data digitization errors

10 BDD Geo Tool Step 1 of 3 – Primary Data

11 BDD Geo Tool Step 2 of 3 – Data Source

12 BDD Geo Tool Step 3 of 3 – Uncertainty

13 BDD Geo Tool Location data form is filled

14 BDD Geo Tool
Improved:
- Completeness: adds data not available before (e.g. lat/long, municipality)
- Consistency: consistent data are obtained from a consistent source (avoiding errors like lat: 0, long: 0, municipality: New Orleans)
- Credibility: data are associated with a credible source (BioGeomancer, Google, GeoNames)
- Accuracy: coordinates are better than the center of mass of a region
- Precision: an uncertainty indicator increases the data's fitness for use
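
The dimensions on this slide can be read as checks applied to a location record before it is accepted. The following is only a minimal sketch under that reading, not the BDD implementation: the `LocationRecord` fields and the `check_location` function are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class LocationRecord:
    # Hypothetical record layout, not the BDD data model.
    latitude: Optional[float]
    longitude: Optional[float]
    municipality: Optional[str]
    source: Optional[str]           # e.g. "GeoNames"
    uncertainty_m: Optional[float]  # uncertainty radius in meters


def check_location(rec: LocationRecord) -> List[str]:
    """Return one message per data quality problem found in the record."""
    problems = []
    if rec.latitude is None or rec.longitude is None:
        problems.append("completeness: missing coordinates")
    elif not (-90 <= rec.latitude <= 90 and -180 <= rec.longitude <= 180):
        problems.append("consistency: coordinates out of range")
    elif rec.latitude == 0 and rec.longitude == 0:
        # lat 0, long 0 is a common placeholder rather than a real locality.
        problems.append("consistency: lat 0, long 0 is a likely placeholder")
    if not rec.municipality:
        problems.append("completeness: missing municipality")
    if not rec.source:
        problems.append("credibility: no data source recorded")
    if rec.uncertainty_m is None or rec.uncertainty_m < 0:
        problems.append("precision: missing or negative uncertainty")
    return problems
```

A data-entry form could refuse to save a record while `check_location` returns any message, which is one way to prevent an error rather than detect it later.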

15 BDD Taxon Tool Step 1 of 2 – Taxonomic Name Selection

16 BDD Taxon Tool Step 2 of 2 – Taxonomic Hierarchy Selection

17 BDD Taxon Tool Taxonomic data form filled

18 BDD Taxon Tool
Improved:
- Completeness: the taxonomic hierarchy is filled in from a taxon name
- Consistency: consistent data are obtained from a consistent source (Catalog of Life)
- Credibility: data are associated with a credible source (Catalog of Life)
- Accuracy: avoids spelling mistakes and incorrect taxonomic hierarchies
- Precision: suggestions of complete scientific names
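
The two steps of the Taxon Tool (pick a name, then fill the hierarchy) can be illustrated with a small sketch. It is an assumption-laden example, not the BDD code: it uses a tiny in-memory `CHECKLIST` in place of the Catalog of Life, and fuzzy matching from Python's standard `difflib` module as a stand-in for whatever name lookup the tool performs. `suggest_names` and `fill_hierarchy` are hypothetical names.

```python
import difflib
from typing import Dict, List, Optional

# Tiny illustrative checklist; a real tool would query a taxonomic authority.
_BEE_RANKS = {
    "kingdom": "Animalia",
    "phylum": "Arthropoda",
    "class": "Insecta",
    "order": "Hymenoptera",
    "family": "Apidae",
}
CHECKLIST: Dict[str, Dict[str, str]] = {
    "Apis mellifera": {**_BEE_RANKS, "genus": "Apis"},
    "Bombus terrestris": {**_BEE_RANKS, "genus": "Bombus"},
    "Melipona quadrifasciata": {**_BEE_RANKS, "genus": "Melipona"},
}


def suggest_names(typed: str, limit: int = 3) -> List[str]:
    """Step 1: suggest complete scientific names close to what was typed."""
    by_lower = {name.lower(): name for name in CHECKLIST}
    matches = difflib.get_close_matches(
        typed.lower(), list(by_lower), n=limit, cutoff=0.6
    )
    return [by_lower[m] for m in matches]


def fill_hierarchy(name: str) -> Optional[Dict[str, str]]:
    """Step 2: return the higher taxonomy for an accepted name, if known."""
    return CHECKLIST.get(name)
```

With this sketch, a misspelled entry such as "Apis melifera" yields "Apis mellifera" as its first suggestion, and selecting it fills the family and genus without further typing.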

19 Conclusion
- Integrated existing techniques, tools, and credible data sources into a species occurrence data-entry tool
- Improved completeness, consistency, accuracy and precision of species occurrence data
- Error prevention in taxonomic and location data
- Tools usable by an audience with limited experience in data digitization and DQ

20 Conclusion
Next steps
- Other tools, techniques, dimensions, error patterns and domains of data quality in biodiversity are yet to be explored and added
- Work on error correction for existing data
  - Spreadsheet-based data correction
Suggestions and collaboration are welcome!
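
As a rough illustration of what a first pass over existing spreadsheet data might look like, the sketch below scans CSV rows and reports the ones that need attention. It is not part of BDD. It assumes the spreadsheet is exported as CSV with the Darwin Core column names `scientificName`, `decimalLatitude` and `decimalLongitude`; `find_suspect_rows` is a hypothetical name.

```python
import csv
from typing import Iterable, List, Tuple


def find_suspect_rows(lines: Iterable[str]) -> List[Tuple[int, str]]:
    """Return (row number, problem) pairs; row 1 is the header row."""
    issues = []
    reader = csv.DictReader(lines)
    for row_number, row in enumerate(reader, start=2):
        name = (row.get("scientificName") or "").strip()
        if not name:
            issues.append((row_number, "missing scientificName"))
        try:
            lat = float(row.get("decimalLatitude") or "")
            lon = float(row.get("decimalLongitude") or "")
        except ValueError:
            issues.append((row_number, "coordinates missing or not numeric"))
            continue
        if not (-90 <= lat <= 90 and -180 <= lon <= 180):
            issues.append((row_number, "coordinates out of range"))
    return issues
```

The function accepts any iterable of text lines, so it works with an open file (`open(path, newline="")`) as well as with an in-memory list of strings. It only flags rows; deciding on the correction is left to the data owner.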

21 Acknowledgements
- IABIN-PTN
  - Laurie Adams (P2), Mike Ruggiero (ITIS), Mike Frame, Liz Sellers and Ben Wheeler (USGS)
  - Pedro Correa (University of São Paulo)
  - All data grantees
- FAO-UNEP-GEF Pollinator project in Brazil
  - Barbara Gemmill-Herren (FAO)
  - Ministry of the Environment, Brazil
  - All data grantees

22 Thank you
Allan Koch Veiga, allan.kv@gmail.com
Etienne Americo Cartolano Jr, etienne.cartolano@gmail.com
Antonio Mauro Saraiva, saraiva@usp.br
Agricultural Automation Laboratory (LAA), Computing Engineering Dept., Engineering School
Universidade de São Paulo, Brazil

