Training course on biodiversity data publishing and fitness-for-use in the GBIF Network, 2011 edition How Darwin Core Archives have changed the landscape of biodiversity data publishing Presenter (email) Role Organization Buenos Aires (Argentina) 28 September 2011
Background: Data Exchange (reminder about the existing standards)
- ABCD (TDWG Standard): > 1200 concepts; XML; shared via BioCase, Tapir
- Darwin Core (pre-standard v. 1.2, 47 versions): 48 concepts; specimens; shared via DiGIR
- Darwin Core (pre-standard v. 1.4): 46 concepts (plus extensions); specimens; shared via Tapir
- Darwin Core (TDWG Standard): 172 concepts (156 in Simple Darwin Core); biodiversity data; CSV, XML, RDF, JSON, …; shared via text files, Tapir, Darwin Core Archive…
Darwin Core Archive: Primary Biodiversity Data, Taxonomic Data, Metadata. http://www.someplace.org/data.zip
Darwin Core Archive: Complete Package. Standard Darwin Core terms in a single, self-contained data set. Taxon records or occurrence records. Data set metadata in EML. Notes: explain the 3 components of an archive: core data file + optional extension files, metafile, descriptor file.
Darwin Core Archive: Benefits
- Simple format (text files)
- Efficient harvesting (single file)
- Efficient storage (compressed)
- Easy access (no special software required)
- Extensible (related files in one archive)
- Simpler data transfer: what takes 500 MB of data transfer in Tapir takes 3 MB in DwC-A
- Extensible format: a more flexible way to map data
- Preferred format for publishing data in the GBIF network
Darwin Core Archive: Anatomy. Archives always have a metadata file as EML.
Ecological Metadata Language (EML): for describing data sets, even unpublished ones
- Title and Abstract
- Citation and Attribution
- Contact and Authors
- Geographic Scope
- Sampling Methods
- Bibliography
- and more…
Darwin Core Archive: Anatomy. Archives always have a core data file as text.
Core data file types: records based on taxa (one species per row) OR records based on species occurrences (one per row).
Darwin Core Archive: Anatomy. The core contains a “core ID” column, unique for every record in the file.
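The uniqueness requirement on the core ID can be checked before packaging. The following is a minimal sketch, assuming a tab-delimited core file with a header row and an ID column named `taxonID`; the function name and sample rows are hypothetical.

```python
import csv
import io
from collections import Counter


def duplicate_core_ids(text, id_column="taxonID", delimiter="\t"):
    """Return the core ID values that occur more than once, sorted."""
    reader = csv.DictReader(io.StringIO(text), delimiter=delimiter)
    counts = Counter(row[id_column] for row in reader)
    return sorted(value for value, n in counts.items() if n > 1)


# Hypothetical core data file content (tab-delimited, header row first).
sample = (
    "taxonID\tscientificName\n"
    "1\tPuma concolor\n"
    "2\tPanthera onca\n"
    "2\tPanthera leo\n"
)

# The ID "2" is used by two rows, so it is reported as a duplicate.
print(duplicate_core_ids(sample))
```

An empty result means every record in the core file has its own ID.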
Darwin Core Archive: Anatomy. Columns are matched to Darwin Core terms.
Darwin Core Archive: Anatomy. Columns that do not match a Darwin Core term may be included, but are ignored. Example: “Wingspan” is not a Darwin Core term.
Darwin Core Archive: Anatomy. Two ways to match columns to Darwin Core terms: 1) Rename columns in the text file.
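Option 1 can be sketched as a header rewrite. This assumes a tab-delimited file whose first row is the header; the local column names (`ID`, `Species`, `Family`) and the mapping are hypothetical, and columns without a mapping (such as `Wingspan`) are left unchanged.

```python
import csv
import io

# Hypothetical mapping from local column names to Darwin Core term names.
COLUMN_TO_TERM = {"ID": "taxonID", "Species": "scientificName", "Family": "family"}


def rename_header(text, mapping, delimiter="\t"):
    """Rewrite the header row so that mapped columns carry Darwin Core names."""
    rows = list(csv.reader(io.StringIO(text), delimiter=delimiter))
    rows[0] = [mapping.get(name, name) for name in rows[0]]
    out = io.StringIO()
    csv.writer(out, delimiter=delimiter, lineterminator="\n").writerows(rows)
    return out.getvalue()


# Hypothetical input file content.
sample = "ID\tSpecies\tFamily\tWingspan\n1\tDanaus plexippus\tNymphalidae\t10\n"
renamed = rename_header(sample, COLUMN_TO_TERM)
print(renamed.splitlines()[0])
```

Only the header row changes; the data rows are written back as they were read.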
Darwin Core Archive: Anatomy. Two ways to match columns to Darwin Core terms: 2) Match columns to terms in a separate meta.xml file.
Darwin Core Archive: Anatomy. meta.xml matches the columns in the core data file (species.txt). More on how to make the meta.xml file later…
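As a rough illustration of what such a file contains, the sketch below generates a small meta.xml for a taxon core. It follows the element and attribute names of the Darwin Core text guidelines as far as the author of this sketch knows them; the helper name `build_meta_xml`, the column layout, and the choice of tab-delimited text with one header line are assumptions, not something taken from the slides.

```python
import xml.etree.ElementTree as ET

DWC_TEXT_NS = "http://rs.tdwg.org/dwc/text/"
DWC_TERMS = "http://rs.tdwg.org/dwc/terms/"


def build_meta_xml(core_file, row_type, id_index, fields):
    """Build a minimal meta.xml string; fields maps column index -> term name."""
    archive = ET.Element("archive", {"xmlns": DWC_TEXT_NS})
    core = ET.SubElement(archive, "core", {
        "encoding": "UTF-8",
        "fieldsTerminatedBy": "\\t",   # written literally as \t in the XML
        "linesTerminatedBy": "\\n",    # written literally as \n in the XML
        "ignoreHeaderLines": "1",
        "rowType": DWC_TERMS + row_type,
    })
    files = ET.SubElement(core, "files")
    ET.SubElement(files, "location").text = core_file
    # The <id> element points at the core ID column.
    ET.SubElement(core, "id", {"index": str(id_index)})
    # Each <field> matches one column index to one Darwin Core term.
    for index, term in sorted(fields.items()):
        ET.SubElement(core, "field", {"index": str(index), "term": DWC_TERMS + term})
    return ET.tostring(archive, encoding="unicode")


# Hypothetical layout: column 0 = core ID, 1 = scientificName, 2 = family.
meta_xml = build_meta_xml("species.txt", "Taxon", 0, {1: "scientificName", 2: "family"})
print(meta_xml)
```

Columns that are not listed as a field (for example a “Wingspan” column) are simply not matched to any term.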
Darwin Core Archive: Anatomy. Archives can include extension files. Extensions (Common_names.txt) link to the core (Species.txt) through the core ID. Extensions allow multiple records to be linked to a core record.
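The link between an extension and the core is a one-to-many join on the core ID. A minimal sketch, assuming both files are tab-delimited with a header row and that the ID column is called `taxonID` in both; the function name and sample data are hypothetical.

```python
import csv
import io
from collections import defaultdict


def link_extension(core_text, extension_text, id_column="taxonID", delimiter="\t"):
    """Group extension rows under the core record that shares their core ID."""
    linked = defaultdict(list)
    for row in csv.DictReader(io.StringIO(extension_text), delimiter=delimiter):
        linked[row[id_column]].append(row)
    result = {}
    for core_row in csv.DictReader(io.StringIO(core_text), delimiter=delimiter):
        result[core_row[id_column]] = {
            "core": core_row,
            "extensions": linked.get(core_row[id_column], []),
        }
    return result


# Hypothetical content of Species.txt (core) and Common_names.txt (extension).
species = "taxonID\tscientificName\n1\tPuma concolor\n2\tPanthera onca\n"
common_names = (
    "taxonID\tvernacularName\n"
    "1\tcougar\n"
    "1\tmountain lion\n"
    "2\tjaguar\n"
)

records = link_extension(species, common_names)
for record in records.values():
    names = [row["vernacularName"] for row in record["extensions"]]
    print(record["core"]["scientificName"], names)
```

One core record can carry any number of extension rows, including none.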
GBIF hosts extension definitions http://rs.gbif.org/extension/
Darwin Core Archive: Anatomy. Multiple extension files can be linked to the core.
Darwin Core Archive: Anatomy All files are stored in a single folder
Darwin Core Archive: Anatomy. The folder is zipped: data files, column matching file, data set documentation. This is a Darwin Core Archive.
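The zipping step can be sketched with Python's standard `zipfile` module. The file names `meta.xml` and `eml.xml` and the stub contents below are placeholders for illustration; the archive is built in memory here so the example leaves no files behind.

```python
import io
import zipfile


def build_archive(files):
    """Zip the given {file name: text content} mapping and return the bytes."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        for name, content in files.items():
            # All files sit at the top level of the archive (a single folder).
            archive.writestr(name, content)
    return buffer.getvalue()


# Placeholder contents standing in for the three kinds of file in the archive.
files = {
    "species.txt": "taxonID\tscientificName\n1\tPuma concolor\n",  # data file
    "meta.xml": '<archive xmlns="http://rs.tdwg.org/dwc/text/"/>',  # column matching
    "eml.xml": "<!-- data set documentation goes here -->",         # metadata
}

data = build_archive(files)
with zipfile.ZipFile(io.BytesIO(data)) as archive:
    print(sorted(archive.namelist()))
```

Writing `data` to a file such as `my_data.zip` would give a single file that can be placed on a web server.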
Darwin Core Archive: Publishing. Archives on a web server can be accessed by a URL, e.g. http://www.organisation.org/my_data.zip. Share this URL to “publish” your data!
Darwin Core Archive: Publishing Options
GBIF Spreadsheet Templates
Integrated Publishing Toolkit
Data Hosting Centers
Darwin Core Mapping Assistant Metafile http://tools.gbif.org/dwca-assistant/
Darwin Core Mapping Assistant
Darwin Core Archive: Publishing Options
- GBIF Darwin Core Archive Spreadsheet Templates: data already in a spreadsheet; simple archive authoring
- IPT: creating/managing archives for multiple data sets; managing archives for multiple organisations; metadata as GBIF Metadata Profile of EML
- Make Your Own: automating archive generation; customisation
- Hosting center: economy of scale; infrastructure and support
- Combinations…