Training course on biodiversity data publishing and fitness-for-use in the GBIF Network, 2011 edition How Darwin Core Archives have changed the landscape of biodiversity data publishing John Wieczorek Information Architect Museum of Vertebrate Zoology, UC Berkeley Buenos Aires (Argentina) 28 September 2011
Background: Data Exchange
  ABCD (TDWG Standard): > 1200 concepts; XML; shared via BioCase, Tapir
  Darwin Core (pre-standard v. 1.2, 47 versions): 48 concepts, specimens; XML; shared via DiGIR
  Darwin Core (pre-standard v. 1.4): 46 concepts (plus extensions), specimens; XML; shared via Tapir
  Darwin Core (TDWG Standard): 172 concepts (156 in Simple Darwin Core), biodiversity data; CSV, XML, RDF, JSON, …; shared via text files, Tapir, Darwin Core Archive…
Darwin Core Archive: Primary Biodiversity Data, Taxonomic Data, Metadata
Darwin Core Archive: Complete Package. Standard Darwin Core terms in a single, self-contained dataset. Taxon records or occurrence records. Data set metadata in EML.
Darwin Core Archive: Benefits
  Simple format (text files)
  Efficient harvesting (single file)
  Efficient storage (compressed)
  Easy access (no special software required)
  Extensible (related files in one archive)
  Preferred format for publishing data in the GBIF network
Darwin Core Archive: Anatomy. Archives always have a metadata file as EML.
Ecological Metadata Language (EML): for describing data sets, even unpublished ones. Title and Abstract; Citation and Attribution; Contact and Authors; Geographic Scope; Sampling Methods; Bibliography; and more…
Darwin Core Archive: Anatomy. Archives always have a core data file as text.
Core data file types: records based on taxa (one species per row) OR records based on species occurrences (one occurrence per row).
Darwin Core Archive: Anatomy. The core contains a “core ID” column, unique for every record in the file.
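The uniqueness requirement on the core ID column can be checked before packaging. The following is a minimal sketch, assuming a tab-delimited core file with one header row and the core ID in the first column; the function name and the sample rows are illustrative, not part of any GBIF tool.

```python
import csv
import io
from collections import Counter


def duplicate_core_ids(text, id_column=0, delimiter="\t"):
    """Return the core ID values that occur more than once, sorted."""
    reader = csv.reader(io.StringIO(text), delimiter=delimiter)
    next(reader, None)  # skip the header row
    counts = Counter(row[id_column] for row in reader if row)
    return sorted(value for value, n in counts.items() if n > 1)


# Illustrative core file content: the ID "2" is used twice.
sample = "ID\tscientificName\n1\tPuma concolor\n2\tLynx rufus\n2\tLynx lynx\n"
print(duplicate_core_ids(sample))  # → ['2']
```

An empty result means every record has its own core ID.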
Darwin Core Archive: Anatomy. Columns are matched to Darwin Core terms.
Darwin Core Archive: Anatomy. Columns that do not match a Darwin Core term may be included, but are ignored. For example, “Wingspan” is not a Darwin Core term.
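To see which columns would be matched and which would be ignored, a header row can be compared against a list of term names. This is a sketch only: KNOWN_TERMS below is a small, hand-picked subset of Darwin Core term names chosen for illustration, not the full standard, and the function name is hypothetical.

```python
# Illustrative subset of Darwin Core term names; the standard defines many more.
KNOWN_TERMS = {"taxonID", "scientificName", "kingdom", "family", "genus", "taxonRank"}


def split_columns(header):
    """Split column names into (matched, ignored) against KNOWN_TERMS."""
    matched = [name for name in header if name in KNOWN_TERMS]
    ignored = [name for name in header if name not in KNOWN_TERMS]
    return matched, ignored


matched, ignored = split_columns(["taxonID", "scientificName", "family", "Wingspan"])
print("matched:", matched)
print("ignored:", ignored)
```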
Darwin Core Archive: Anatomy. Two ways to match columns to Darwin Core terms: 1) Rename columns in the text file.
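The first option, renaming columns, amounts to rewriting the header row. A minimal sketch, assuming the first line of the text file is the header; the local column names ("Species", "Family name") and the function name are hypothetical.

```python
def rename_header(text, mapping, delimiter="\t"):
    """Replace header names found in `mapping`; leave all data rows untouched."""
    header, newline, body = text.partition("\n")
    names = [mapping.get(name, name) for name in header.split(delimiter)]
    return delimiter.join(names) + newline + body


# Hypothetical local column names mapped to Darwin Core term names.
mapping = {"Species": "scientificName", "Family name": "family"}
original = "ID\tSpecies\tFamily name\n1\tPuma concolor\tFelidae\n"
print(rename_header(original, mapping))
```

Columns without an entry in the mapping keep their original names.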
Darwin Core Archive: Anatomy. Two ways to match columns to Darwin Core terms: 2) Match columns to terms in a separate meta.xml file.
Darwin Core Archive: Anatomy. meta.xml matches the columns in the core data file (species.txt). More on how to make the meta.xml file later…
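As a rough illustration of what such a column-matching file contains, the sketch below generates a small meta.xml for a taxon core file. The element and attribute names follow the Darwin Core text guidelines as commonly used; the metadata file name eml.xml, the Taxon row type, the tab-delimited layout, and the function name build_meta are assumptions made for this example.

```python
import xml.etree.ElementTree as ET

DWC_TEXT_NS = "http://rs.tdwg.org/dwc/text/"  # namespace of the meta.xml elements
DWC_TERMS = "http://rs.tdwg.org/dwc/terms/"   # base URI of Darwin Core terms


def build_meta(core_file, columns, id_index=0):
    """Build meta.xml text. `columns` lists a term name per column, None if unmatched."""
    ET.register_namespace("", DWC_TEXT_NS)
    tag = lambda name: f"{{{DWC_TEXT_NS}}}{name}"
    # Assumption: the data set metadata is stored as eml.xml in the same folder.
    archive = ET.Element(tag("archive"), {"metadata": "eml.xml"})
    core = ET.SubElement(archive, tag("core"), {
        "rowType": DWC_TERMS + "Taxon",  # assumption: taxon-based core
        "fieldsTerminatedBy": "\\t",     # written literally as \t in the file
        "linesTerminatedBy": "\\n",
        "ignoreHeaderLines": "1",
        "encoding": "UTF-8",
    })
    files = ET.SubElement(core, tag("files"))
    ET.SubElement(files, tag("location")).text = core_file
    ET.SubElement(core, tag("id"), {"index": str(id_index)})
    for index, term in enumerate(columns):
        # Unmatched columns (None) get no <field>; in this sketch the ID column
        # is declared only through <id>.
        if term is not None and index != id_index:
            ET.SubElement(core, tag("field"),
                          {"index": str(index), "term": DWC_TERMS + term})
    return ET.tostring(archive, encoding="unicode")


print(build_meta("species.txt", ["taxonID", "scientificName", None]))
```

The third column (for example “Wingspan”) is passed as None, so it receives no field entry.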
Darwin Core Archive: Anatomy. Archives can include extension files (Species.txt with Common_names.txt). Extensions allow multiple records to be linked to a core record. Extensions link to the core through the core ID.
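The link between extension records and core records can be pictured as grouping extension rows by core ID. A minimal sketch with in-memory rows; the sample data and the function name are illustrative, and the core ID is assumed to sit in the first column of both files.

```python
from collections import defaultdict


def group_by_core_id(extension_rows, coreid_index=0):
    """Group extension rows by the core ID they point to."""
    linked = defaultdict(list)
    for row in extension_rows:
        linked[row[coreid_index]].append(row)
    return dict(linked)


# Illustrative rows: core (Species.txt) and extension (Common_names.txt).
core_rows = [["1", "Puma concolor"], ["2", "Lynx rufus"]]
name_rows = [["1", "cougar"], ["1", "mountain lion"], ["2", "bobcat"]]

linked = group_by_core_id(name_rows)
for core_id, scientific_name in core_rows:
    common_names = [row[1] for row in linked.get(core_id, [])]
    print(scientific_name, common_names)
```

One core record can carry any number of extension rows, including none.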
GBIF hosts extension definitions
Darwin Core Archive: Anatomy. Multiple extension files can be linked to the core.
Darwin Core Archive: Anatomy. All files are stored in a single folder.
Darwin Core Archive: Anatomy. The folder is zipped: data files, column matching file, data set documentation. This is a Darwin Core Archive.
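Zipping the folder needs nothing beyond a standard library. A minimal sketch, assuming all archive files sit directly in one folder; the function name and the example paths are hypothetical.

```python
import zipfile
from pathlib import Path


def zip_archive(folder, zip_path):
    """Compress every file directly inside `folder` into a single zip file."""
    folder = Path(folder)
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for path in sorted(folder.iterdir()):
            if path.is_file():
                # Store files at the top level of the zip, without the folder name.
                archive.write(path, arcname=path.name)
    return zip_path


# Example call (hypothetical paths):
# zip_archive("my_data", "my_data.zip")
```

The resulting zip holds the data files, the column matching file, and the metadata file side by side.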
Darwin Core Archive: Publishing. Archives on a web server can be accessed by a URL, for example one ending in /my_data.zip. Share this URL to “publish” your data!
Darwin Core Archive: Publishing Options
GBIF Spreadsheet Templates
Integrated Publishing Toolkit
Data Hosting Centers
Darwin Core Mapping Assistant Metafile
Darwin Core Mapping Assistant
Darwin Core Archive: Publishing Options
  GBIF Darwin Core Archive Spreadsheet Templates: data in a spreadsheet already; simple archive authoring
  IPT: creating/managing archives for multiple data sets; managing archives for multiple organisations; metadata as GBIF Metadata Profile of EML
  Make Your Own: automating archive generation; customisation
  Hosting center: economy of scale; infrastructure and support
  Combinations…