GLOBAL BIODIVERSITY INFORMATION FACILITY. David Remsen, ECAT Program Officer. October. Darwin Core Archives: Simplified Format for publishing biodiversity information. Thanks: Peter Desmet, Canadensys (graphics).
Primary Biodiversity Data. One record equals a single occurrence of a taxon, collected or observed somewhere in the world (WHERE), at a specific time (WHEN), identified by a person (WHO), and residing in a particular place (VOUCHER).
TDWG provided Data Formats: Darwin Core; Access to Biological Collections Data (ABCD). Example record values: PhysicalObject; T12:43:31; Museum of Vertebrate Zoology; Creative Commons License; MVZ; Mammals; urn:catalog:MVZ:Mammals:14523; PreservedSpecimen; Richard Sage; 2000; Ctenomys dorbignyi; Ctenomys.
TDWG provided Protocols: Distributed Generic Information Retrieval (DiGIR); TDWG Access Protocol for Information Retrieval (TAPIR); Biological Collections Access Service for Europe (BioCASE). Send requests (in XML) to a URL; get a response (in XML).
“Wrapper” Software: PyWrapper (Python), TAPIR Link (PHP), DiGIR (PHP). Install one of these ‘wrappers’ on your biodiversity database (Insect Collection, Bird Observations, Herbarium). Data now accessible via: ABCD format here, Darwin Core here.
The promise of federation: GBIF Data Portal as a Gateway. Providers: Insect Collection, Herbarium, Bird Observations. Query: “Any data records from Thailand?” GBIF Data Portal: “I will ask!” Providers: “I do!” “Nope!”
Missing the point of federation: the Curator manages the “Live” Database (Insect Collection) inside, which updates a “Public” Database (Insect Collection Copy) outside. Query: “Any data records from Thailand?” GBIF Data Portal: “If you are going to make a copy, send it to me.”
The failure of federation. Providers: Insect Collection, Herbarium, Bird Observations. Exchanges with the GBIF Data Portal: “Hello?” “Server Not Available” “Hi!”
The rise of Indexing: GBIF Data Portal as a Data Index. Providers: Insect Collection, Herbarium, Bird Observations. Query: “Any data records from Thailand?” GBIF Data Portal (now with Data!): “Send me an index of your data once per month.”
The wrong tools for the job. Providers: Insect Collection, Herbarium, Bird Observations. Query: “Any data records from Thailand?” GBIF Data Portal (now with Data!): “Send me an index of your data once per month.” Providers: “Here is page one. If I go offline, start again.” “Not too fast!” “You ask the same questions every time.”
A Refined Approach. Providers: Insect Collection, Herbarium, Bird Observations. Query: “Any data records from Thailand?” GBIF Data Portal (now with Data!): “I’ll take a copy of the file whenever it’s updated.” Providers (file at a URL): “This is easy.”
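The slide does not say how the portal notices that a file has been updated. One possible mechanism, sketched here purely as an assumption, is an HTTP conditional request: the harvester remembers the ETag of its last copy and asks the server to send the file only if it has changed. The function names, the URL, and the ETag value below are hypothetical.

```python
import urllib.error
import urllib.request


def build_conditional_request(url, last_etag=None):
    """Build a GET request that lets the server skip the body if nothing changed."""
    request = urllib.request.Request(url)
    if last_etag:
        # Assumption: the provider's web server supports ETag validation.
        request.add_header("If-None-Match", last_etag)
    return request


def fetch_if_updated(url, last_etag=None):
    """Return (data, etag), or (None, last_etag) when the server answers 304."""
    request = build_conditional_request(url, last_etag)
    try:
        with urllib.request.urlopen(request) as response:
            return response.read(), response.headers.get("ETag")
    except urllib.error.HTTPError as err:
        if err.code == 304:  # Not Modified: keep the copy we already have
            return None, last_etag
        raise


# Hypothetical archive URL; the request is only built here, not sent.
request = build_conditional_request("http://example.org/dwca/insects.zip", '"abc123"')
```

With this approach the provider only has to place a file on a web server; the harvester carries the burden of checking for changes.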
Darwin Core Archives A text-based solution to publishing biodiversity data
Darwin Core: ratified in 2009, with significant additions/refinements. Set of terms; Simple Darwin Core (subset); expressed as text. x.htm x.htm
Two Core Content Types (currently). Basis of Record: OCCURRENCE of a taxon. Basis of Record: TAXON.
Core components: a single file. Basis of Record: Occurrence or Taxon. Simple to export, simple to manage: a comma-separated values (CSV) text file.
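As a rough illustration of how little is needed to export a core file, the sketch below writes one occurrence record to CSV with Python's standard csv module. The column names are Darwin Core terms; the pairing of those terms with the Museum of Vertebrate Zoology values quoted earlier in the deck is an assumption made for this example, not something the slides state.

```python
import csv
import io

# Darwin Core terms used as column headers (a small, assumed selection).
FIELDS = [
    "occurrenceID", "basisOfRecord", "institutionCode", "collectionCode",
    "recordedBy", "scientificName", "genus",
]

# One occurrence record; the term-to-value mapping is illustrative only.
records = [{
    "occurrenceID": "urn:catalog:MVZ:Mammals:14523",
    "basisOfRecord": "PreservedSpecimen",
    "institutionCode": "MVZ",
    "collectionCode": "Mammals",
    "recordedBy": "Richard Sage",
    "scientificName": "Ctenomys dorbignyi",
    "genus": "Ctenomys",
}]

# Write to an in-memory buffer; a real export would open a file instead.
output = io.StringIO()
writer = csv.DictWriter(output, fieldnames=FIELDS)
writer.writeheader()
writer.writerows(records)
csv_text = output.getvalue()
```

The same pattern would apply to a Taxon core; only the set of column terms changes.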
Schema Repository: operational now, more coming.
Extending Darwin Core. Core: Taxon. Extensions (one-to-many): Types and Specimens, Bibliography. Extensions are defined via a simple schema, use Darwin Core or other terms, and are linked to controlled vocabularies. One taxon, many extension records. Simple to export, simple to manage: comma-separated values (CSV) text files.
Extensions Example
Species.csv
taxonID  Class     scientificName
1001     Mammalia  Panthera leo
1002     Amphibia  Rana pipiens
1003     Aves      Francolinus afer (Müller)
Common_names.csv
taxonID  vernacularName        language
1003     Red-necked Francolin  eng
1003     La Perdrix d’Afrique  fre
1003     アカノドシャコ        jpn
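The one-to-many link between a core file and an extension file is just a shared taxonID. A minimal sketch of how a consumer might join the two files follows; the in-memory strings stand in for Species.csv and Common_names.csv, and the variable names are assumptions for illustration.

```python
import csv
import io
from collections import defaultdict

# Stand-ins for Species.csv (core) and Common_names.csv (extension).
SPECIES_CSV = """taxonID,class,scientificName
1001,Mammalia,Panthera leo
1002,Amphibia,Rana pipiens
1003,Aves,Francolinus afer (Müller)
"""

COMMON_NAMES_CSV = """taxonID,vernacularName,language
1003,Red-necked Francolin,eng
1003,La Perdrix d’Afrique,fre
1003,アカノドシャコ,jpn
"""

# Group extension rows by the core record they point to.
names_by_taxon = defaultdict(list)
for row in csv.DictReader(io.StringIO(COMMON_NAMES_CSV)):
    names_by_taxon[row["taxonID"]].append((row["language"], row["vernacularName"]))

# Attach zero or more vernacular names to each core taxon.
taxa = []
for row in csv.DictReader(io.StringIO(SPECIES_CSV)):
    taxa.append({**row, "vernacularNames": names_by_taxon.get(row["taxonID"], [])})
```

Taxa with no extension rows simply end up with an empty list, so the core file remains usable on its own.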
Metafile describes the set. The Metafile describes the Core and its one-to-many extensions (Types and Specimens, Bibliography).
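To make the role of the metafile concrete, here is a small sketch that parses a metafile and lists which file is the core, which files are extensions, and which term each column holds. The XML is a hand-written illustration following the layout of the Darwin Core text guidelines; treat the attribute selection, file names, and the `describe` helper as assumptions and check them against the current specification before use.

```python
import xml.etree.ElementTree as ET

# Illustrative metafile for a Taxon core with one vernacular names extension.
META_XML = """<archive xmlns="http://rs.tdwg.org/dwc/text/">
  <core rowType="http://rs.tdwg.org/dwc/terms/Taxon"
        fieldsTerminatedBy="," ignoreHeaderLines="1">
    <files><location>Species.csv</location></files>
    <id index="0"/>
    <field index="1" term="http://rs.tdwg.org/dwc/terms/class"/>
    <field index="2" term="http://rs.tdwg.org/dwc/terms/scientificName"/>
  </core>
  <extension rowType="http://rs.gbif.org/terms/1.0/VernacularName"
             fieldsTerminatedBy="," ignoreHeaderLines="1">
    <files><location>Common_names.csv</location></files>
    <coreid index="0"/>
    <field index="1" term="http://rs.tdwg.org/dwc/terms/vernacularName"/>
    <field index="2" term="http://purl.org/dc/terms/language"/>
  </extension>
</archive>"""

NS = {"dwca": "http://rs.tdwg.org/dwc/text/"}


def describe(element):
    """Return (file location, {column index: short term name}) for a core or extension."""
    location = element.find("dwca:files/dwca:location", NS).text
    columns = {
        int(field.get("index")): field.get("term").rsplit("/", 1)[-1]
        for field in element.findall("dwca:field", NS)
    }
    return location, columns


root = ET.fromstring(META_XML)
core_file, core_columns = describe(root.find("dwca:core", NS))
extension_files = [describe(ext)[0] for ext in root.findall("dwca:extension", NS)]
```

Because columns are mapped to term URIs by index, the data files themselves can stay plain CSV with whatever headers the publisher prefers.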
Core + Set of Extensions: “GNA Simple Exchange Format”. The Metafile describes the Taxa core and its one-to-many extensions: Types and Specimens, Bibliography, Vernacular Names, Distribution.
Metadata documents resource. The Metafile describes the Taxa core and its one-to-many extensions (Types and Specimens, Bibliography, Vernacular Names, Distribution); a GBIF EML profile documents the resource.
A Darwin Core Archive
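In practice a Darwin Core Archive is distributed as a single zip file that bundles the data files with the metafile and the metadata document. The sketch below assembles such a bundle in memory with Python's zipfile module. The file names meta.xml and eml.xml follow common convention, and all file contents are placeholders assumed for illustration.

```python
import io
import zipfile

# Placeholder contents; a real archive would carry full data and descriptors.
files = {
    "Species.csv": "taxonID,class,scientificName\n1001,Mammalia,Panthera leo\n",
    "Common_names.csv": "taxonID,vernacularName,language\n",
    "meta.xml": '<archive xmlns="http://rs.tdwg.org/dwc/text/"/>',
    "eml.xml": "<!-- resource metadata placeholder -->",
}

# Write every component into one compressed archive held in memory.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
    for name, content in files.items():
        archive.writestr(name, content)

# Re-open the bytes as a consumer would and list what the archive holds.
with zipfile.ZipFile(io.BytesIO(buffer.getvalue())) as archive:
    names = archive.namelist()
```

Writing `buffer.getvalue()` to disk would give a single file that can be published at a URL.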
Validator Status: Under Evaluation
Vernacular Names
Term             Description
vernacularName   The common name
source           Bibliographic reference
language         ISO language code
temporal         When the name is/was used
locationID       Location by ID
locality         Location by description
countryCode      Countries where name used
sex              Name related to gender
lifeStage        Name related to life stage
isPlural         Name is a plural form
isPreferredName  Preferred by source in language
organismPart     Name related to part of organism
taxonRemarks     Other remarks related to common name
References
Term                   Description
identifier             DOI, ISBN, URI, etc.
bibliographicCitation  Unparsed full citation
title                  Title of book or article
creator                Author or authors
date                   Publication date
source                 If part of a larger work
description            Abstract, remarks, notes
subject                Keywords
language               Source language
rights                 Copyright info
taxonRemarks           Taxon-specific annotations
type                   Taxonomic/nomenclatural categories (new species)
Species Distribution
Term               Description
locationID         Location by ID (polygon, locality, etc.)
locality           Locality description
countryCode        ISO country list where species occurs
lifeStage          Distribution pertains to specific life stage
occurrenceStatus   Rare, frequent, absent, etc.
threatStatus       As defined by the IUCN Species group
establishmentMeans Taxon is native, introduced, etc.
eventDate          Relevant temporal context for this distribution
startDayOfYear     Seasonal temporal subcontext within the eventDate
endDayOfYear       Seasonal temporal subcontext within the eventDate
source             Publication citation, a webpage URL
occurrenceRemarks  Comments or notes about the distribution
Identifiers
Term        Description
identifier  Other known identifier used for the same taxon (URL, DOI, LSID, etc.)
format      MIME type of resolvable content returned by identifier
Summary. Biodiversity data publishing is: Simplified (basic text files); Extensible (describe what YOU need to share); Supported (tools exist and are in development); Structured (a framework for defining terms, extensions, and vocabularies); Scalable (lessons have been learned). Next Session: a look at some publishing tools?