GBIF Publishing Platform May 2011
Core publishing focus
- Primary Biodiversity Data (Specimens & Observations, Ecological Data). Core data type is an occurrence of a taxon.
- Taxonomic Catalogues* and Annotated Species Checklists. Core data type is a taxon.
- Enriched resource metadata, primarily focused on occurrence and taxon datasets.
* To distinguish our efforts from COL: GBIF provides the means, not the ends.
Core publishing targets
- Sufficient coverage of fit-for-use primary biodiversity data to meet identified requirements.
- A Primary Biodiversity Data clearinghouse.
- A comprehensive “catalog of catalogues”* that provides core taxonomic dictionaries and an organisational framework for biodiversity data.
- A comprehensive inventory of Primary Biodiversity Data collections (digital and undigitised) and nationally or thematically relevant species checklists.
* To distinguish our efforts from COL: GBIF provides the means, not the ends.
Data Publishing Platform for whom?
- Institutional publishers in developed countries. A large proportion of current publishers fall into this category.
- Smaller institutions with less technical capacity. Many are in highly biodiverse regions.
- Small (individual scientist) data holders.
- “Disenfranchised” potential publishers who currently don’t recognise GBIF as a publishing option.
Data publishing strategy: How?
Consolidate, Strengthen, Simplify, Accelerate, Extend
Data publishing today
- (Often) unsupported software on dedicated servers
- Misused protocols
- Multiple data formats
- Complex, rigid and difficult to maintain, even for those with capacity
- Requires system administrators and programmers
- Re-indexing the CURRENT set of resources = 1 MONTH!!
[Figure labels: "these are toothpicks"; TAPIR]
Consolidated data standards
- Primary Biodiversity Data and Taxonomic Data: Darwin Core (172 terms, ratified in 2009, text files, extensible)
- Metadata: Ecological Metadata Language (EML) (rich dataset descriptions, GBIF Profile)
Darwin Core Archive: Primary Biodiversity Data, Taxonomic Data, Metadata
Accelerate network performance: Darwin Core Archives
- Darwin Core Archives are self-contained packages of data
- Published as a URL!
- Consolidated: one format for Primary and Taxonomic data
- From months to daily (for some users, faster)
- Published data appears much faster!
- Simplified harvesting (Priority #3, Participants Report)
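To make "self-contained packages of data" concrete, here is a minimal sketch of what a harvester might do with one archive. It assumes the usual Darwin Core Archive layout: a zip file holding a meta.xml descriptor plus a delimited text file of core records. The file name occurrence.txt, the sample records and the helper names build_sample_archive and read_core are illustrative assumptions, and the archive is built in memory rather than fetched from a published URL.

```python
import csv
import io
import xml.etree.ElementTree as ET
import zipfile

# Namespace used by the meta.xml descriptor of a Darwin Core Archive.
DWC_TEXT_NS = "{http://rs.tdwg.org/dwc/text/}"

# Illustrative descriptor: one tab-delimited core file of occurrence records.
META_XML = """<archive xmlns="http://rs.tdwg.org/dwc/text/">
  <core rowType="http://rs.tdwg.org/dwc/terms/Occurrence"
        fieldsTerminatedBy="\\t" linesTerminatedBy="\\n" ignoreHeaderLines="1">
    <files><location>occurrence.txt</location></files>
    <id index="0"/>
    <field index="1" term="http://rs.tdwg.org/dwc/terms/scientificName"/>
    <field index="2" term="http://rs.tdwg.org/dwc/terms/country"/>
  </core>
</archive>
"""

# Made-up sample data, for illustration only.
OCCURRENCES = (
    "id\tscientificName\tcountry\n"
    "occ-1\tPuma concolor\tArgentina\n"
    "occ-2\tQuercus robur\tDenmark\n"
)


def build_sample_archive() -> bytes:
    """Package the descriptor and the data file into one zip (hypothetical helper)."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("meta.xml", META_XML)
        zf.writestr("occurrence.txt", OCCURRENCES)
    return buf.getvalue()


def read_core(archive_bytes: bytes) -> list[dict[str, str]]:
    """Read the core records, naming columns by the terms declared in meta.xml."""
    with zipfile.ZipFile(io.BytesIO(archive_bytes)) as zf:
        core = ET.fromstring(zf.read("meta.xml")).find(f"{DWC_TEXT_NS}core")
        location = core.find(f"{DWC_TEXT_NS}files/{DWC_TEXT_NS}location").text
        # meta.xml writes the delimiter as an escape sequence such as \t.
        delimiter = core.get("fieldsTerminatedBy", "\\t").encode().decode("unicode_escape")
        skip = int(core.get("ignoreHeaderLines", "0"))
        columns = {int(core.find(f"{DWC_TEXT_NS}id").get("index")): "id"}
        for fld in core.findall(f"{DWC_TEXT_NS}field"):
            # Use the last path segment of the term URI as a short column name.
            columns[int(fld.get("index"))] = fld.get("term").rsplit("/", 1)[-1]
        text = zf.read(location).decode("utf-8")
    rows = list(csv.reader(io.StringIO(text), delimiter=delimiter))[skip:]
    return [{name: row[i] for i, name in columns.items()} for row in rows]


records = read_core(build_sample_archive())
```

Because the descriptor travels inside the package, the reader needs no publisher-specific configuration. This is one reading of why a single archive format simplifies harvesting.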
Strengthened publishing tool: Integrated Publishing Toolkit 2.0
- Rebuilt following community consultation
- Provide a supported, evolving publishing tool
- Supports all three CORE data types (Primary/Taxon/Metadata)
- Establish a Steering Committee (SC) to guide product direction
- Seek guidance from the SC on directions too!
- A platform for offering community services
Integrated Publishing Toolkit: Metadata Authoring, Primary Biodiversity Data, Species Checklists
Data Hosting Centers: Coming soon… Endangered Wildlife Trust SABIF EIA Data Center INBIF EIA Data Center
Publish with spreadsheets
- Metadata
- Primary Biodiversity Data
- Species Checklists
Publishing via
For biologists and database managers
No special software required
Suite of publishing options
Data Publishing documentation
- Full documentation for all aspects of data publishing
- Living documents
Improve Data Quality: new roles for engaging Participants
Enable Data Quality Assessment & Improvement as part of the Network
- Today: done in Copenhagen
- Tomorrow: through the network
- Improvements made BEFORE data is published
- New and increased roles for participating in GBIF
Extend the standard
Darwin Core Archives are extensible: Simple, Extensible, Internationalised, Standards-based
Global Names Architecture
- A Darwin Core-based profile using the GBIF network to share taxonomic information.
- Evaluation underway: 16 reviewers / 39 checklists
GBIF Schema Repository
A schema repository for developers and trainers:
- Darwin Core Terms
- List of Extensions
- Vocabularies
Ensure local uptake of technology
Make data publishing worthwhile
- Improved attribution through provenance improvements in the Registry
- Improved relevance through extensibility
- Darwin Core Archives support multiple uses of data. Not all roads lead to Copenhagen.
For Organisations / For Individuals
- Data publishing = scholarly publishing
- Increased visibility due to simplified and consistent citation methods
- Data is easier to consume!
For Both
- Make good data even better!
Data Paper: Recognising Data Discovery
[Diagram labels: Persistent Identifiers; Journal System (Submission, Acceptance, Revision, Peer Review, Publication); Registry (GBRDS); DOI; Distributed Metadata Catalogues; Metadata Authors; auto conversion to manuscript; GBIF Metadata Repository; ZooKeys, PhytoKeys, BioRisk]
Reward data publishing: Data Paper, Metadata document
Deep Data Citation Mechanism & Service
Deep data citation mechanism
- Recognise ALL with their roles
- Multilayer citation: publisher determined and user driven
- Cascading citation: citations within citations
- Assign GUIDs for citation text
Data Citation Service
- Register citations
- Resolve citation GUIDs
Status: working with CODATA Data Citation Task Group & DataCite
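The register and resolve operations, together with cascading citation, can be illustrated with a small in-memory sketch. This is not the design of the planned Data Citation Service, which the slide does not specify. The names Citation, CitationRegistry and expand, and the choice of random UUIDs as GUIDs, are all assumptions made for illustration.

```python
import uuid
from dataclasses import dataclass, field


@dataclass
class Citation:
    """Hypothetical citation record: text, contributors with roles, nested citations."""
    text: str
    contributors: dict[str, str] = field(default_factory=dict)  # name -> role
    cites: list[str] = field(default_factory=list)  # GUIDs of citations cited within


class CitationRegistry:
    """Toy in-memory stand-in for a citation service (an assumption, not a real API)."""

    def __init__(self) -> None:
        self._by_guid: dict[str, Citation] = {}

    def register(self, citation: Citation) -> str:
        """Store a citation and return the GUID assigned to its text."""
        for nested in citation.cites:
            if nested not in self._by_guid:
                raise KeyError(f"unknown nested citation GUID: {nested}")
        guid = str(uuid.uuid4())
        self._by_guid[guid] = citation
        return guid

    def resolve(self, guid: str) -> Citation:
        """Look up the citation registered under a GUID."""
        return self._by_guid[guid]

    def expand(self, guid: str) -> list[str]:
        """Cascading citation: this citation's text, then every nested one, depth-first."""
        citation = self.resolve(guid)
        texts = [citation.text]
        for nested in citation.cites:
            texts.extend(self.expand(nested))
        return texts


registry = CitationRegistry()
dataset_guid = registry.register(
    Citation("Occurrence dataset A", {"Alice": "collector", "Carol": "data manager"})
)
checklist_guid = registry.register(
    Citation("Annotated checklist B", {"Bob": "compiler"}, cites=[dataset_guid])
)
cascade = registry.expand(checklist_guid)
```

Requiring nested GUIDs to exist before registration keeps the citation graph free of cycles, so the cascade always terminates. Recording a role per contributor is one way to "recognise ALL with their roles".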
Anticipated Impact of Deep Data Citation: Data Citation, Data Discovery, Data Publishing, Data Preservation, Data Use
Best Practice Guide on Data Publishing & Use Declaration
- Survey of existing ‘Data Sharing’ & ‘Data Use’ agreements (35 agreements)
- Issues identified:
  - Terminologies used: Provider vs. Publisher, Sharing vs. Publishing, Agreement vs. Declaration
  - Lack of adequate information: e.g. data citation, fitness-for-use, license, etc.
- Best Practice Guide on Publishing & Use Declaration (Q3 2011)
Catalogue of tools & services for discovery, digitisation & publishing
- Tools and services for:
  - data/metadata capture/collection
  - data digitisation
  - quality assessment
  - quality enhancement
  - discovery and publishing
- Community-driven
- Frequent updates
- Included in Welcome Box and Online Resource Centre
Questions: METADATA
Mobilising more metadata
In order to get more metadata catalogues connected, we need greater uptake by GBIF Participants and other organisations. How?
- Act opportunistically: just accept what is offered?
- Just target some of the key external networks (e.g., KNB)?
- Undertake a focused campaign among Participants using the Participants Report feedback?
- Is there a case for a second round of the incentivisation scheme for metadata catalogues, but with a focus on “bigger catches” like national catalogues?
Questions: Checklists
Mobilising more sources
Which strategies should be employed for mobilising species lists?
- Example: harmonising national and state-relevant species lists to ITIS in the USA
- Priority lists: National Species Inventories
- Priority subject: National Clearinghouse Mechanisms
Questions: GEO BON / GEOSS
How can the GBIF community get itself better represented in GEOSS / GEO BON?
- Leave it to the GBIF Secretariat?
- Just need to be kept informed, e.g., via community site and GBIF news items?
- Active participation: have resources to contribute infrastructure at national level?
- Participate in project consortia to take development forward?