Data Quality Resources in Species Occurrence Digitization


Data Quality Resources in Species Occurrence Digitization
Allan Koch Veiga, Etienne Americo Cartolano Jr, Antonio Mauro Saraiva
Agricultural Automation Laboratory – LAA, Computing Engineering Dept., Engineering School, Universidade de São Paulo, Brazil

Outline
- Background
- Biodiversity Data Digitizer (BDD) & IABIN
- Data Quality Methodology
- Data Quality Tools
  - BDD Geo Tool
  - BDD Taxon Tool
- Conclusion

Background
- Importance of species occurrence data
  - GBIF Portal
  - IABIN Portal
- Data quality affects how the data can be used
  - Location and taxonomic data domains
  - Georeferencing and identification are two major causes of error in species occurrence data
- Need to improve Data Quality (DQ)

Data Quality & IABIN-PTN
- Inter-American Biodiversity Information Network (IABIN)
- Pollinators Thematic Network (PTN)
  - GEF-funded project ( ) (~$180k)
  - 11 countries in Latin America
  - ~400,000 records
- Responsibilities
  - Development of tools for data digitization and integration
  - Data digitization training and support
  - Reviewing proposals, reports, data
  - Close contact with data owners / providers

Data Quality & IABIN-PTN
Opportunities & needs
- Discuss digitization issues with the grantees
- Standards: importance and role (TDWG)
- Data quality: concepts
- Improve data quality
  - Provide mechanisms integrated into the digitization tools, rather than isolated tools

Biodiversity Data Digitizer (BDD)
- Designed for easy:
  - Digitization
  - Manipulation
  - Publication
- Rich data content
- FAO-GEF pollinator project
Demo: Thu

DQ Assessment Methodology
What is Data Quality?
Location Data Domain

DQ Management Methodology
How to improve the DQ?
- Error prevention is considered superior to error detection

Resources to Improve DQ on BDD
- Tools to prevent errors during occurrence data digitization
- Integrated into the BDD species occurrence data-entry interface
- BDD Geo Tool: prevents location data digitization errors
- BDD Taxon Tool: prevents taxonomic data digitization errors

BDD Geo Tool Step 1 of 3 – Primary Data

BDD Geo Tool Step 2 of 3 – Data Source

BDD Geo Tool Step 3 of 3 – Uncertainty

BDD Geo Tool Location data form is filled

BDD Geo Tool
Improved
- Completeness: adds data not available before (e.g. lat/long, municipality)
- Consistency: consistent data obtained from a consistent source (avoiding errors like lat: 0, long: 0, municipality: New Orleans)
- Credibility: associates data with a credible source (BioGeomancer, Google, GeoNames)
- Accuracy: better than the center of mass of a region
- Precision: an uncertainty indicator increases data fitness for use
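The dimensions listed on this slide can be illustrated with a small data-entry check. This is only a sketch under stated assumptions, not the BDD Geo Tool's actual code: `LocationRecord`, `location_issues`, and the field names are hypothetical, and the rules cover just the error patterns the slide mentions (missing coordinates, the lat 0 / long 0 placeholder, no recorded source, no uncertainty value).

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class LocationRecord:
    """Hypothetical location fields of a species occurrence record."""
    latitude: Optional[float]
    longitude: Optional[float]
    municipality: Optional[str] = None
    source: Optional[str] = None            # e.g. "GeoNames" (assumed free-text label)
    uncertainty_m: Optional[float] = None   # uncertainty radius in meters (assumed unit)


def location_issues(rec: LocationRecord) -> List[str]:
    """Return one message per data quality problem found; empty list if none."""
    issues = []
    if rec.latitude is None or rec.longitude is None:
        issues.append("completeness: missing coordinates")
    elif not (-90 <= rec.latitude <= 90 and -180 <= rec.longitude <= 180):
        issues.append("consistency: coordinates out of range")
    elif rec.latitude == 0 and rec.longitude == 0:
        # lat 0, long 0 is far more often an empty-field default than a real locality
        issues.append("consistency: lat 0, long 0 looks like a placeholder")
    if not rec.source:
        issues.append("credibility: no data source recorded")
    if rec.uncertainty_m is None or rec.uncertainty_m < 0:
        issues.append("precision: missing or negative uncertainty")
    return issues
```

Running such a check when the form is submitted, rather than on stored data, follows the error-prevention approach the talk favors: the record is rejected or flagged before it enters the database.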

BDD Taxon Tool Step 1 of 2 – Taxonomic Name Selection

BDD Taxon Tool Step 2 of 2 – Taxonomic Hierarchy Selection

BDD Taxon Tool Taxonomic data form filled

BDD Taxon Tool
Improved
- Completeness: the taxonomic hierarchy is filled in from a taxon name
- Consistency: consistent data are obtained from a consistent source (Catalogue of Life)
- Credibility: data are associated with a credible source (Catalogue of Life)
- Accuracy: avoids spelling mistakes and incorrect taxonomic hierarchies
- Precision: suggests complete scientific names
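The two steps of the Taxon Tool (pick a name from suggestions, then fill the hierarchy from it) can be sketched as a lookup against a reference list. This is an illustration only: the tiny in-memory `REFERENCE` table stands in for a real source such as the Catalogue of Life, and `suggest_names` and `fill_hierarchy` are hypothetical names, not the tool's API.

```python
from typing import Dict, List, Optional

# Hypothetical stand-in for a credible taxonomic source.
_BEE = {"kingdom": "Animalia", "phylum": "Arthropoda", "class": "Insecta",
        "order": "Hymenoptera", "family": "Apidae"}
REFERENCE: Dict[str, Dict[str, str]] = {
    "Apis mellifera": {**_BEE, "genus": "Apis"},
    "Bombus terrestris": {**_BEE, "genus": "Bombus"},
    "Melipona quadrifasciata": {**_BEE, "genus": "Melipona"},
}


def suggest_names(prefix: str, limit: int = 5) -> List[str]:
    """Step 1: offer complete scientific names that start with the typed text."""
    p = prefix.strip().lower()
    if not p:
        return []
    return sorted(n for n in REFERENCE if n.lower().startswith(p))[:limit]


def fill_hierarchy(name: str) -> Optional[Dict[str, str]]:
    """Step 2: return the full hierarchy for a selected name, or None if unknown."""
    ranks = REFERENCE.get(name)
    if ranks is None:
        return None  # a misspelled or unlisted name cannot be auto-filled
    return {"scientificName": name, **ranks}
```

Because the user selects a suggested name instead of typing the whole hierarchy, a misspelling such as "Apis melifera" finds no match and never reaches the stored record.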

Conclusion
- Integrated existing techniques, tools, and credible data sources into a species occurrence data-entry tool
- Improved completeness, consistency, accuracy and precision of species occurrence data
- Error prevention in taxonomic and location data
- Tools available for an audience with limited familiarity with data digitization and DQ

Conclusion
Next steps
- Other tools, techniques, dimensions, error patterns and domains of data quality in biodiversity are yet to be explored and added
- Work on error correction in existing data
  - Spreadsheet-based data correction
Suggestions and collaboration are welcome!

Acknowledgements
- IABIN – PTN
  - Laurie Adams (P2), Mike Ruggiero (ITIS), Mike Frame, Liz Sellers and Ben Wheeler (USGS)
  - Pedro Correa (University of São Paulo)
  - All data grantees
- FAO-UNEP-GEF Pollinator project in Brazil
  - Barbara Gemmill-Herren (FAO)
  - Ministry of the Environment - Brazil
  - All data grantees

Thank you
Allan Koch Veiga, Etienne Americo Cartolano Jr, Antonio Mauro Saraiva
Agricultural Automation Laboratory – LAA, Computing Engineering Dept., Engineering School, Universidade de São Paulo, Brazil