BIS TDWG Conference 28 October 2013, Florence Documenting data quality in a global network: the challenge for GBIF Éamonn Ó Tuama, Andrea Hahn, Markus.

Presentation transcript:

BIS TDWG Conference 28 October 2013, Florence Documenting data quality in a global network: the challenge for GBIF Éamonn Ó Tuama, Andrea Hahn, Markus Döring Global Biodiversity Information Facility (GBIF)

Outline
1. The GBIF network and the Data Quality challenge
2. Current DQ processes in GBIF Portal
3. DQ and GBIF Nodes
4. Addressing DQ in GBIF work programme

GBIF is …
- a connected community
- an informatics infrastructure
- a window on biodiversity
- a tool for science and society

Addressing data quality
Meeting the challenge of documenting data quality as the network and volume of data grow …

Current GBIF Network Data Coverage
As of August 2013: >405,720,500 indexed records from 10,139 datasets from 493 publishers, spanning a wide range of geospatial, temporal and taxonomic coverages.

DQ processes in GBIF portal
- Minimum obligatory metadata
- Check geographic values
- Check taxonomic values

Packaging metadata with data

Geographic attributes
Verbatim data asserted to originate in USA, as shared on the network

Geographic attributes
Data following quality check
- Coastal regions recognised
- Offshore islands recognised
- 85% (355/417 mil) georeferenced records
- 2.7% (9.4 million) georeferenced with issues
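A geographic check of the kind summarised above can be sketched as follows. This is a minimal illustration, not the GBIF portal's implementation: the bounding boxes, the issue labels and the function name `check_coordinates` are all assumptions made for the example. Recognising coastal regions and offshore islands, as the slide describes, would need real country polygons with a buffer rather than the crude boxes used here.

```python
from typing import List, Optional

# Hypothetical, very rough bounding boxes (min_lon, min_lat, max_lon, max_lat).
# Illustrative values only; a real check would use country polygons.
COUNTRY_BBOX = {
    "US": (-179.2, 18.9, -66.9, 71.4),
    "IT": (6.6, 35.4, 18.6, 47.1),
}


def check_coordinates(lat: Optional[float], lon: Optional[float],
                      country_code: Optional[str]) -> List[str]:
    """Return a list of issue labels (hypothetical names) for one record."""
    if lat is None or lon is None:
        return ["COORDINATES_MISSING"]
    if not (-90 <= lat <= 90 and -180 <= lon <= 180):
        return ["COORDINATES_OUT_OF_RANGE"]

    issues = []
    # 0/0 is usually a placeholder for "no coordinates" rather than a real point.
    if lat == 0 and lon == 0:
        issues.append("ZERO_COORDINATES")

    # Compare the point with the country the record asserts, when we know its box.
    bbox = COUNTRY_BBOX.get(country_code) if country_code else None
    if bbox is not None:
        min_lon, min_lat, max_lon, max_lat = bbox
        inside = min_lon <= lon <= max_lon and min_lat <= lat <= max_lat
        if not inside:
            issues.append("COUNTRY_MISMATCH")
    return issues


if __name__ == "__main__":
    print(check_coordinates(41.9, 12.5, "IT"))  # Rome, asserted Italy
    print(check_coordinates(41.9, 12.5, "US"))  # Rome, asserted USA
```

Records that return an empty list pass these checks; records with one or more labels would count towards the "georeferenced with issues" figure.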

Taxonomic attributes
Trochilidae (Hummingbirds): using verbatim higher classification

Trochilidae (Hummingbirds): classification based on authoritative sources
56% of name usages also found in CoL

Authoritative checklists
- Fill gaps in the GBIF taxonomic backbone
- Increase list of known synonyms
- Increase the number of common names known to GBIF

New improved algorithm for GBIF backbone taxonomy
- Some taxa (mainly autonyms) do not have stable IDs
- Too many accepted species created because of lack of a good database of taxonomic synonyms

Working with Catalogue of Life
[Diagram: GBIF backbone taxonomy, Catalogue of Life, Global Species Databases, GBIF ChecklistBank, DwC-A Checklists]

Working with Catalogue of Life
[Diagram: GBIF backbone taxonomy, Catalogue of Life, Global Species Databases, GBIF ChecklistBank, DwC-A Checklists]
The first two GSDs have already provided annotations:
- International Legume Database & Information Service (ILDIS): 8188 names annotated; 6825 rejected names; 541 placed names (added to ILDIS); remaining have syntactical problems (CoL issue, not ILDIS)
- Scarabs: World Scarabaeidae Database: 1339 names annotated; 0 rejected names
First backbone based on CoL feedback loop expected around December 2013

Data Quality issues: non-standardised values
Example: dwc:country
- 29,052 distinct values for country names
- Of these, 18,704 (concerning 2.2 mil records) could not be mapped to an ISO country code
Typical issues:
- Variants: 126 different values for “Italy”
- Mismappings: taxon names instead of country names
- Incorrect level of detail: sub-national units, non-country geographical entities
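The country-name problem above is essentially a lookup problem: fold spelling and language variants onto one key, then map that key to an ISO code. The sketch below assumes a small hand-made lookup table (`ISO_LOOKUP`) and hypothetical helper names; it is not the interpretation routine used by GBIF, and a real table would be far larger.

```python
import unicodedata
from typing import Optional

# Hypothetical lookup of normalised variants to ISO 3166-1 alpha-2 codes.
ISO_LOOKUP = {
    "italy": "IT",
    "italia": "IT",
    "italie": "IT",
    "italien": "IT",
    "united states": "US",
    "usa": "US",
}


def normalise(value: str) -> str:
    """Fold case, strip accents and punctuation, collapse whitespace."""
    text = unicodedata.normalize("NFKD", value)
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    text = "".join(ch if ch.isalnum() else " " for ch in text)
    return " ".join(text.lower().split())


def to_iso(value: Optional[str]) -> Optional[str]:
    """Return an ISO code, or None when the value cannot be mapped."""
    if not value:
        return None
    return ISO_LOOKUP.get(normalise(value))


if __name__ == "__main__":
    for raw in ["  ITALIA ", "Italië", "Trochilidae", "Lombardia"]:
        print(repr(raw), "->", to_iso(raw))
```

Values such as a taxon name or a sub-national unit fall through to `None`, which corresponds to the "could not be mapped" group on the slide.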

Data Quality issues: non-standardised values
Example: dwc:basisOfRecord
- Values that cannot be interpreted at all (accounting for 13.3 mil records)
- 30 mil records with no value or “unknown”
- Interpretable values quite varied, e.g. 31 values mapped to “observation”, 146 to “specimen”
Typical issues:
- Spelling variants / language variants
- Mismappings
- Misunderstanding definition
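The three groups reported for dwc:basisOfRecord (interpretable, empty or "unknown", and uninterpretable) can be reproduced with a simple vocabulary mapping. The variant table and category labels below are assumptions for illustration only, not the vocabulary GBIF actually applies.

```python
from collections import Counter
from typing import Iterable, Optional

# Hypothetical variant table: many verbatim spellings map to one concept.
VOCAB = {
    "observation": "OBSERVATION",
    "obs": "OBSERVATION",
    "human observation": "OBSERVATION",
    "specimen": "SPECIMEN",
    "preservedspecimen": "SPECIMEN",
    "preserved specimen": "SPECIMEN",
    "herbarium sheet": "SPECIMEN",
}


def interpret(value: Optional[str]) -> str:
    """Map a verbatim basisOfRecord value to a concept or a failure bucket."""
    if value is None:
        return "UNKNOWN"
    key = value.strip().lower()
    if key in ("", "unknown"):
        return "UNKNOWN"
    return VOCAB.get(key, "UNINTERPRETABLE")


def summarise(values: Iterable[Optional[str]]) -> Counter:
    """Count how many records fall into each interpreted bucket."""
    return Counter(interpret(v) for v in values)


if __name__ == "__main__":
    sample = ["Specimen", "obs", "", None, "Aves", "PreservedSpecimen"]
    print(summarise(sample))
```

A summary like this, run per dataset, gives publishers a concrete list of values to fix at source.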

DQ and GBIF Nodes: desirable improvements
- Better metadata
- Persistent IDs
- Controlled vocabularies
- Annotations
- Independently validated datasets
- Genetic validation of taxonomy

DQ and GBIF Nodes: implementing improvements
- Collate experiences of all Nodes and share best practices
- Build reusable DQ components (e.g., tools, vocabularies, workflows)

DQ and GBIF Nodes: next steps
- Expand Data Quality Interest Group
- Establish a collaboration platform

Addressing Data Quality in GBIF Work Programme

GBIF Work Programme: essential infrastructure to support Data Quality
- Ensure stable identifiers for datasets and records
- Provide a method for citation of data sets
- Enable annotation of data

GBIF Work Programme
Engagement of expert communities to form fitness-for-use working groups:
- enhancements to data standards and classes of data in use in GBIF
- criteria and algorithms for evaluating data quality, fitness-for-use, coverage and completeness
- content mobilisation priorities (inc. improving already mobilised data)
- identification and curation of reference data sets

GBIF Work Programme: criteria from fitness-for-use working groups
- Guidelines and supporting tools to assess and improve metadata completeness for all data
- Evaluation and reporting on metadata completeness and quality
- Seeking to ensure that the basis of record is clear for each data record
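One simple way to evaluate and report metadata completeness is to score each dataset against a list of expected fields. The sketch below is only an illustration: the field names in `EXPECTED_FIELDS` and the function name are hypothetical, and the actual criteria would come from the fitness-for-use working groups.

```python
from typing import Dict, List, Tuple

# Hypothetical list of metadata fields a dataset is expected to fill in.
EXPECTED_FIELDS = (
    "title",
    "description",
    "contact",
    "license",
    "geographic_coverage",
    "taxonomic_coverage",
)


def metadata_completeness(metadata: Dict[str, object]) -> Tuple[float, List[str]]:
    """Return (fraction of expected fields filled, list of missing fields)."""
    missing = [
        field for field in EXPECTED_FIELDS
        # Treat absent, None and whitespace-only values as missing.
        if not str(metadata.get(field) or "").strip()
    ]
    score = (len(EXPECTED_FIELDS) - len(missing)) / len(EXPECTED_FIELDS)
    return score, missing


if __name__ == "__main__":
    example = {"title": "Lepidoptera collection", "description": " ", "contact": "curator"}
    score, missing = metadata_completeness(example)
    print(f"{score:.0%} complete; missing: {', '.join(missing)}")
```

Returning the list of missing fields alongside the score keeps the report actionable for the publisher rather than just a number.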

GBIF Work Programme: criteria from fitness-for-use working groups
GBIF portal upgrades to report data quality and fitness-for-use for each data set and species:
- Standards compliance
- Metadata completeness
- Presence of key data elements
- Automated checks for issues and outliers
- Endorsements of data publishers and data sets by Nodes, fitness-for-use working groups and other stakeholders

Thank you
GBIF Secretariat, Universitetsparken 15, DK-2100 Copenhagen Ø, Denmark