GB22 TRAINING EVENT FOR NODES – 4 OCTOBER 2015 Session 02: 2015 Data Publishing Landscape Laura Russell.

Presentation transcript:

GB22 TRAINING EVENT FOR NODES – 4 OCTOBER 2015 Session 02: 2015 Data Publishing Landscape Laura Russell

INDEX
Data publishing landscape
Biodiversity data publishing
Data types
Data standards
Data normalization and data quality
Data publishing methods
Promotion of data publishing
Use cases

DATA PUBLISHING LANDSCAPE
DiGIR/TAPIR in high use to publish biodiversity data
Idea for simple, compressed text-based file for publishing introduced at TDWG
GBIF introduces IPT 1.0
GBIF redevelops IPT
GBIF introduces IPT 2.0
Data Publishing taught at Nodes training
Nodes and aggregators begin to install and use IPTs
Occurrence and checklist type datasets along with IPT installations show continued growth
2011

DATA PUBLISHING LANDSCAPE - STATISTICS

DATA PUBLISHING LANDSCAPE 2015
The continued GBIF commitment to improving access to biodiversity data
Refinement and expansion of standards and publishing software
Evolving social norms
Most data still published with simple occurrence core
Portals do not contain the features to support richer data
Many institutions still need convincing to publish biodiversity data

WHAT IS BIODIVERSITY DATA? A digital text or multimedia data record detailing facts about an instance of occurrence of an organism, i.e. the what, where, when, how and by whom of the occurrence and the recording.

WHAT IS DATA PUBLISHING? “Publishing” refers to making biodiversity datasets publicly accessible and discoverable, in a standardized form, via an access point, typically a web address (a URL).

BIODIVERSITY DATA TYPES
Checklists
Occurrences
Metadata

BIODIVERSITY DATA TYPES – SAMPLE DATA Samples

DATA STANDARDS
ABCD: Access to Biological Collection Data (2005)
DwC: Darwin Core (2009)
AC: Audubon Core Multimedia Resources Metadata Schema (2013)
NCD: Natural Collection Descriptions (Draft)

DARWIN CORE
recordedBy: A list (concatenated and separated) of names of people, groups, or organizations responsible for recording the original Occurrence. The primary collector or observer, especially one who applies a personal identifier (recordNumber), should be listed first.
Examples: "José E. Crespo", "Oliver P. Pearson | Anita K. Pearson"
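
A minimal sketch of how a data consumer might split a recordedBy value back into individual names, assuming the " | " separator shown in the second example. The function name split_recorded_by is hypothetical and is not part of Darwin Core.

```python
def split_recorded_by(value: str, separator: str = "|") -> list[str]:
    """Split a concatenated recordedBy string into individual names.

    Assumes the pipe separator used in the example above. Because the
    primary collector or observer should be listed first, the first
    element of the result is treated as the primary one.
    """
    return [name.strip() for name in value.split(separator) if name.strip()]


names = split_recorded_by("Oliver P. Pearson | Anita K. Pearson")
primary_collector = names[0]
print(names)
print(primary_collector)
```

A value holding a single name, such as "José E. Crespo", comes back as a one-element list.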

SIMPLE DARWIN CORE Simple Darwin Core is a specification for one particular way to use the Darwin Core terms: to share data about taxa and their occurrences in a simply structured way. It is probably what is meant if someone suggests to "format your data according to the Darwin Core".
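
As an illustration of the "simply structured" idea, the sketch below writes a flat table whose column headers are Darwin Core term names. The record values and the helper name to_simple_dwc_csv are invented for this example, and the choice of CSV is an assumption about one common way such a table is shared.

```python
import csv
import io

# Column names are Darwin Core terms; the record values are invented.
records = [
    {
        "occurrenceID": "urn:example:occ:1",
        "basisOfRecord": "PreservedSpecimen",
        "scientificName": "Puma concolor",
        "eventDate": "2015-10-04",
        "countryCode": "AR",
        "recordedBy": "Oliver P. Pearson | Anita K. Pearson",
    },
]


def to_simple_dwc_csv(rows):
    """Serialize rows as one flat CSV table: one header line, one line per record."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(rows[0].keys()), lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


csv_text = to_simple_dwc_csv(records)
print(csv_text)
```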

DARWIN CORE ARCHIVE A Darwin Core Archive (DwCA) is the text representation of data formatted to Darwin Core. A DwCA is a compressed file containing a minimum of three files.
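
A rough sketch of assembling such an archive with Python's standard zipfile module. It assumes the three files are a core data file, a meta.xml descriptor and an eml.xml metadata document; the file name occurrence.txt and the record values are invented, and the eml.xml body is only a placeholder, not a valid EML document.

```python
import io
import zipfile

# Core data file: a header line plus one invented record.
CORE_DATA = "occurrenceID,scientificName\nurn:example:occ:1,Puma concolor\n"

# Descriptor that says how the core file is laid out and which terms its columns hold.
META_XML = """<?xml version="1.0" encoding="UTF-8"?>
<archive xmlns="http://rs.tdwg.org/dwc/text/" metadata="eml.xml">
  <core encoding="UTF-8" fieldsTerminatedBy="," linesTerminatedBy="\\n"
        ignoreHeaderLines="1" rowType="http://rs.tdwg.org/dwc/terms/Occurrence">
    <files><location>occurrence.txt</location></files>
    <id index="0"/>
    <field index="0" term="http://rs.tdwg.org/dwc/terms/occurrenceID"/>
    <field index="1" term="http://rs.tdwg.org/dwc/terms/scientificName"/>
  </core>
</archive>
"""

# Placeholder only: a real archive needs a complete EML metadata document here.
EML_XML = '<?xml version="1.0" encoding="UTF-8"?>\n<eml><!-- placeholder --></eml>\n'


def build_archive() -> bytes:
    """Return the bytes of a zip file holding the three files."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        archive.writestr("occurrence.txt", CORE_DATA)
        archive.writestr("meta.xml", META_XML)
        archive.writestr("eml.xml", EML_XML)
    return buffer.getvalue()


archive_bytes = build_archive()
with zipfile.ZipFile(io.BytesIO(archive_bytes)) as archive:
    file_names = sorted(archive.namelist())
print(file_names)
```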

STAR SCHEMA [Diagram: a DwC Archive made up of a Core surrounded by extensions (Ext 1, Ext 2, Ext 3, Ext 4, Ext 5), plus meta.xml and EML.xml]
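
In a star layout, each extension row points back to exactly one core record through a shared identifier. The sketch below shows that relationship with plain dictionaries; the field names id and coreid, the helper attach_extension and all record values are assumptions made for illustration.

```python
from collections import defaultdict

# Core records (the centre of the star); values are invented.
core = [
    {"id": "occ-1", "scientificName": "Puma concolor"},
    {"id": "occ-2", "scientificName": "Panthera onca"},
]

# Extension records; each one references a core record through "coreid".
media_extension = [
    {"coreid": "occ-1", "identifier": "https://example.org/img/1.jpg"},
    {"coreid": "occ-1", "identifier": "https://example.org/img/2.jpg"},
]


def attach_extension(core_rows, extension_rows, name):
    """Group extension rows by core id and attach them to each core record."""
    by_core_id = defaultdict(list)
    for row in extension_rows:
        by_core_id[row["coreid"]].append(row)
    # A core record with no matching extension rows gets an empty list.
    return [{**record, name: by_core_id.get(record["id"], [])} for record in core_rows]


joined = attach_extension(core, media_extension, "media")
for record in joined:
    print(record["id"], len(record["media"]))
```

One core record can carry many extension rows (here, two images for occ-1) or none at all (occ-2).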

MAPPING CORES
Taxon Core: The category of information pertaining to taxonomic names, taxon name usages, or taxon concepts. Released April 2015, this version removes terms dcterms:source and dcterms:rights, and adds dcterms:license. 43 terms.
Occurrence Core: The category of information pertaining to evidence of an occurrence in nature, in a collection, or in a dataset (specimen, observation, etc.). Released July 2015, this version removes terms dcterms:source, dcterms:rights, dwc:individualID, dwc:occurrenceDetails, and adds dcterms:license, dwc:organismQuantity, dwc:organismQuantityType, dwc:organismID, dwc:organismName, dwc:organismScope, dwc:associatedOrganisms, dwc:organismRemarks, dwc:parentEventID, dwc:sampleSizeValue, dwc:sampleSizeUnit. 169 terms.
Event: The category of information pertaining to a sampling event. Issued 29 May terms

EXTENSIONS
Darwin Core does not provide terms for every possible type of data.
22 registered
25 under development
Examples:
Audubon Media Description (aka Audubon Core)
Darwin Core Identification History
Darwin Core Measurement or Facts

STAR SCHEMA EXAMPLE - OCCURRENCE [Diagram: DwC Archive (Occurrence): Occurrence Core with extensions labelled Media, Geographical Determination, Germplasm; plus meta.xml and EML.xml]

STAR SCHEMA EXAMPLE - CHECKLIST [Diagram: DwC Archive (Checklist): Taxon Core with extensions labelled Literature, Description, Occurrences, Vernacular, Distribution, Types; plus meta.xml and EML.xml]

STAR SCHEMA EXAMPLE - SAMPLE [Diagram: DwC Archive (Samples): Event Core with extensions labelled Occurrences, Measurement/Fact, Relevé; plus meta.xml and EML.xml]

DATA NORMALIZATION
What is data normalization?
Reasons to normalize a database
Normal forms
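
The slide only lists the topics. As one illustrative sketch of what normalizing can mean in practice, the code below takes a flat table in which institution details are repeated on every row and splits it into a specimen table and an institution lookup table. The data, the institutionName column and the helper name normalize are all hypothetical.

```python
# Hypothetical flat export: institution details are repeated on every row.
flat_rows = [
    {"catalogNumber": "A1", "scientificName": "Puma concolor",
     "institutionCode": "EXM", "institutionName": "Example Museum"},
    {"catalogNumber": "A2", "scientificName": "Panthera onca",
     "institutionCode": "EXM", "institutionName": "Example Museum"},
]


def normalize(rows):
    """Split flat rows into a specimen table and an institution lookup table."""
    institutions = {}
    specimens = []
    for row in rows:
        # Store each institution once, keyed by its code.
        institutions[row["institutionCode"]] = row["institutionName"]
        # Specimen rows keep only the key that points at the institution.
        specimens.append({
            "catalogNumber": row["catalogNumber"],
            "scientificName": row["scientificName"],
            "institutionCode": row["institutionCode"],
        })
    return specimens, institutions


specimens, institutions = normalize(flat_rows)
print(institutions)
print(specimens)
```

After the split, a change to the institution name is made in one place instead of on every specimen row, which is one of the usual reasons given for normalizing.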

DATA QUALITY
Tools
Should you work on improving the data?
Importance of feedback

DATA PUBLISHING METHODS

DATA PUBLISHING METHODS – POLLS To be explained in the live session…

PROMOTION OF DATA PUBLISHING
Topic of discussion at the Nodes Training in Berlin.
Core element in the day-to-day work of Node Managers.

PROMOTION OF DATA PUBLISHING - BARRIERS
1. Lack of knowledge
2. Lack of understanding
3. Lack of will
4. Perceived data value
5. Privacy concerns
6. Lack of authorization
7. Lack of time / planning
8. Lack of capacity
9. Lack of funding
10. Lack of infrastructure
Barrier categories: Psychological & cultural barriers, Institutional barriers, Capacity barriers, Practical barriers

PROMOTION OF DATA PUBLISHING - RESTRICTIONS
1. Refuse to share.
2. Refuse to share until they have exhausted the planned use of the data.
3. Will only share their data for a fee.
4. Will only share data under specific restrictions.
5. Agree to share data openly.

PROMOTION OF DATA PUBLISHING - STRATEGIES
1. Facilitate access to financial support.
2. Call upon commitments or legal mandates.
3. Call upon open access / moral principles.
4. Show the benefits of better data management.
5. Show the benefit for their scientific careers.
6. Peer pressure.
7. Start / support big digitization programmes.
8. Start / support data repatriation efforts.

PROMOTION OF DATA PUBLISHING – DISCUSSION
Challenges:
Not wanting to publish and/or not wanting to publish all the data
Technical threshold of an IPT
Restrictive licensing of data
Strategies:
Start smaller: metadata only
Promote one-off publishing with multiple exposures
Provide hosted IPTs to eliminate the technical threshold
Illustrate licensing with telling examples
Promote and organize trainings to bring reluctant publishers in with an easier “sell” like data papers

USE CASES - INTRODUCTION
Explore four use cases based on current publishing practices:
1. Literature
2. Observation data
3. Natural history collections
4. Checklists
Complete two exercises:
1. Definition of publishing strategies
2. Publish datasets

USE CASE 1: DATA FROM LITERATURE Blue Group

USE CASE 2: OBSERVATIONAL DATA Green Group Red Group

USE CASE 3: NATURAL HISTORY COLLECTION DATA Yellow Group

USE CASE 4: TAXONOMIC CHECKLISTS Purple Group
