BIS TDWG Conference 29 October 2014, Jönköping, Sweden Publishing sample-based data using Darwin Core Archives Éamonn Ó Tuama, Markus Döring, Kyle Braak, Tim Robertson, Olaf Bánki Global Biodiversity Information Facility (GBIF)
Why do this? Long-perceived need by GBIF to enable publishing of abundance (sample) data; Requirement within the EU project EU BON; Meeting the needs of the GEO Biodiversity Observation Network (GEO BON).
Sample-based data Output of monitoring programmes; Quantitative, calibrated; Using standard protocols; Repeatable, comparable. Detect changes and trends in populations
Constraints: Be available for testing in 2015; Build on existing widely used standards: Darwin Core; Work within the existing tools ecosystem: IPT; … while acknowledging the promise of ontologies (BCO, OBOE …)
Caveat Aim: demonstrate one way data can be exposed to maximize discoverability and reuse. Not in scope: establishing how data should be captured or modelled.
A use case: Enabling the flow of sample-based data in support of GEO BON Essential Biodiversity Variables (EBVs).
Essential Biodiversity Variables: Intermediate layer between raw data and indicators; GEO BON has identified six EBV classes; A measurement required for study, reporting and management of biodiversity change.
EBV Class: Species populations
Building on the Darwin Core vocabulary
Darwin Core – a glossary of terms. Example terms: taxonRank, higherClassification, taxonConceptID, collectionCode, geodeticDatum, specificEpithet, coordinatePosition. collectionCode: The name, acronym, coden, or initialism identifying the collection or data set from which the record was derived. Examples: "Mammals", "Hildebrandt", "eBird".
7 essential terms for encoding sample data: 1. eventID 2. projectID (new) 3. samplingProtocol 4. sampleSize (new) 5. sampleSizeUnit (new) 6. quantity (new) 7. quantityType (new)
New terms required eventID: an identifier for the set of information associated with an Event; may be a global unique identifier or an identifier specific to the data set. projectID: an identifier for a project with which the data is associated; use to link related data sets, e.g., a monitoring series; may be a global unique identifier or an identifier specific to the series.
New terms required sampleSize: a numeric value for the time duration, length, area or volume involved in the sampling. sampleSizeUnit: the unit of measurement used for sampling, e.g., minute, hour, day, metre, metre^2, metre^3. Examples (sampleSize sampleSizeUnit): 2 hour; 3 m2; 17 km; 1 litre.
Unit of measurement vocabulary
Unit of measurement vocabulary: used in the IPT as a controlled list for sampleSizeUnit.
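A controlled list of this kind could be applied when checking records before publication. The sketch below is illustrative only: the set of allowed units is copied from the examples given for sampleSizeUnit in this talk and is not the actual IPT vocabulary, and the function name `check_sample_size` is hypothetical.

```python
# Hypothetical subset of a unit vocabulary, taken from the examples in this talk;
# the real controlled list used by the IPT may differ.
ALLOWED_UNITS = {"minute", "hour", "day", "metre", "metre^2", "metre^3"}


def check_sample_size(sample_size, sample_size_unit):
    """Return a list of problems found in a sampleSize / sampleSizeUnit pair."""
    problems = []
    try:
        value = float(sample_size)
        if value <= 0:
            problems.append("sampleSize must be positive")
    except (TypeError, ValueError):
        problems.append("sampleSize is not numeric")
    if sample_size_unit not in ALLOWED_UNITS:
        problems.append(
            f"sampleSizeUnit {sample_size_unit!r} is not in the controlled list"
        )
    return problems


# A pair that passes, and one whose unit spelling is not in the assumed list.
print(check_sample_size("2", "hour"))
print(check_sample_size("1.25", "m2"))
```

An empty list means the pair passed both checks; keeping the unit check separate from the numeric check lets a publisher see every problem with a record at once.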
New terms required quantity: the number or enumeration value of the entity or category being quantified in the sample. As such it is paired with quantityType. quantityType: the entity being referred to by quantity, e.g., individuals, a percentage (e.g., species, biomass, biovolume), a scale type. Examples (quantity quantityType): 14 Individuals; r BraunBlanquetScale; 0.4 %Species; 31 %Biomass.
Publishing sample data using the IPT
Event Core An event core is the logical way of organising a sampling event; Related environmental measurements can be included in an extension; Vegetation plot data (coverages) can be included separately from “occurrences”.
Darwin Core Archive components: Event core; Occurrence extension; Measurement-or-fact extension; Relevé extension; …; meta.xml; EML.xml; … (bundled as a DwC Archive)
Placing the terms in a Darwin Core Archive. Event Core (Event, Location, Geological Context): eventID, projectID (n), samplingProtocol, sampleSize (n), sampleSizeUnit (n); Occurrence Extension (Occurrence, Taxon, Identification): eventID, quantity (n), quantityType (n); (n) = proposed new term. For term definitions, see
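To make the placement concrete, the following sketch generates a minimal meta.xml descriptor for an archive with an event core and an occurrence extension. It is an illustration under stated assumptions, not the IPT's output: the file names `event.txt` and `occurrence.txt`, the helper `add_table`, and the `http://example.org/proposed-terms/` namespace used for the proposed (n) terms are all placeholders, since the proposed terms have no ratified URI here.

```python
import xml.etree.ElementTree as ET

DWC = "http://rs.tdwg.org/dwc/terms/"
# Placeholder namespace for the proposed new terms (assumption, not a real URI).
PROPOSED = "http://example.org/proposed-terms/"

EVENT_FIELDS = [
    DWC + "eventID", PROPOSED + "projectID", DWC + "samplingProtocol",
    PROPOSED + "sampleSize", PROPOSED + "sampleSizeUnit",
]
OCCURRENCE_FIELDS = [
    DWC + "eventID", DWC + "scientificName",
    PROPOSED + "quantity", PROPOSED + "quantityType",
]


def add_table(parent, tag, row_type, filename, id_tag, terms):
    """Append a <core> or <extension> element describing one data file."""
    table = ET.SubElement(parent, tag, {
        "rowType": row_type,
        "encoding": "UTF-8",
        "fieldsTerminatedBy": "\\t",   # literal backslash-t, as written in meta.xml
        "linesTerminatedBy": "\\n",
        "ignoreHeaderLines": "1",
    })
    files = ET.SubElement(table, "files")
    ET.SubElement(files, "location").text = filename
    # Column 0 holds eventID in both files: the core's id and the extension's coreid.
    ET.SubElement(table, id_tag, {"index": "0"})
    for index, term in enumerate(terms):
        ET.SubElement(table, "field", {"index": str(index), "term": term})
    return table


archive = ET.Element(
    "archive", {"xmlns": "http://rs.tdwg.org/dwc/text/", "metadata": "eml.xml"}
)
add_table(archive, "core", DWC + "Event", "event.txt", "id", EVENT_FIELDS)
add_table(archive, "extension", DWC + "Occurrence", "occurrence.txt",
          "coreid", OCCURRENCE_FIELDS)

meta_xml = ET.tostring(archive, encoding="unicode")
print(meta_xml)
```

The point of the sketch is the linkage: the extension's `coreid` column carries the same eventID as the core's `id` column, which is what ties each occurrence row to its sampling event.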
A sampling event uses a particular samplingProtocol with sampleSize and sampleSizeUnit, etc., and can record one or more taxa, each of which has a measurement (quantity and quantityType) associated with it.

Event core:
eventID | projectID | samplingProtocol | sampleSize | sampleSizeUnit | eventDate | location | decimalLatitude | decimalLongitude | …
C_1428 | RM065 | AQEM | 1.25 | m2 | m Kinzig O3 Rothenbergen …
C_1538 | RM065 | AQEM | 1.25 | m2 | m Kinzig W1 Bulau …

Occurrence extension:
eventID | scientificName | quantity | quantityType | …
C_1428 | Baetis rhodani | 14 | individuals | …
C_1428 | Ephemera danica | 15 | individuals | …
C_1428 | Gyraulus albus | 2 | individuals | …
C_1538 | Serratella ignita | 318 | individuals | …
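The relationship between the two tables can be sketched in a few lines of Python. The example joins occurrence rows to their event on eventID and divides quantity by sampleSize to obtain a count per unit of sampling effort. That normalisation is an assumption about how a consumer might use the data, and it is only meaningful when quantityType is a count such as individuals. The function names are hypothetical and only the columns recoverable from the slide are included.

```python
import csv
import io

# Tab-delimited extracts of the two tables shown on the slide.
EVENTS = (
    "eventID\tprojectID\tsamplingProtocol\tsampleSize\tsampleSizeUnit\n"
    "C_1428\tRM065\tAQEM\t1.25\tm2\n"
    "C_1538\tRM065\tAQEM\t1.25\tm2\n"
)
OCCURRENCES = (
    "eventID\tscientificName\tquantity\tquantityType\n"
    "C_1428\tBaetis rhodani\t14\tindividuals\n"
    "C_1428\tEphemera danica\t15\tindividuals\n"
    "C_1428\tGyraulus albus\t2\tindividuals\n"
    "C_1538\tSerratella ignita\t318\tindividuals\n"
)


def read_rows(text):
    """Parse tab-delimited text with a header row into a list of dicts."""
    return list(csv.DictReader(io.StringIO(text), delimiter="\t"))


def densities(events, occurrences):
    """Join occurrences to events on eventID and normalise by sampleSize."""
    events_by_id = {event["eventID"]: event for event in events}
    results = []
    for occ in occurrences:
        event = events_by_id[occ["eventID"]]
        density = float(occ["quantity"]) / float(event["sampleSize"])
        unit = f'{occ["quantityType"]} per {event["sampleSizeUnit"]}'
        results.append((occ["eventID"], occ["scientificName"], density, unit))
    return results


for row in densities(read_rows(EVENTS), read_rows(OCCURRENCES)):
    print(row)
```

Because sampleSize and sampleSizeUnit live on the event, they are stated once per sampling event rather than repeated on every occurrence row.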
Adapting the IPT Now with Event Core
Acknowledgement EU BON and GEO BON partners, TDWG mailing list contributors and GBIF sample data workshop participants informed this work and are gratefully acknowledged. This project has received funding from the European Union’s Seventh Programme for research, technological development and demonstration under grant agreement No
Thank you GBIF Secretariat Universitetsparken 15 DK-2100 Copenhagen Ø Denmark Phone: Fax: