
Laura Russell (VertNet), Meherzad Romer (NatureServe Canada), John Wieczorek

Presentation transcript:

1 Introduction to the new ways of data publishing
Laura Russell (larussell@vertnet.org), VertNet
Meherzad Romer (mromer@natureserve.ca), NatureServe Canada
John Wieczorek (tuco@berkeley.edu), Museum of Vertebrate Zoology, UC Berkeley
Training course on biodiversity data publishing and fitness-for-use in the GBIF Network, 2011 edition
Buenos Aires (Argentina), 28 September 2011

2 Data Publishing Options

3 Terminology
- Data publisher, provider
- Data resource, data set
- Data resource type (e.g., Metadata, Occurrence, Taxon)
- Data record
- Data record element, also called a term, field, column, property, attribute, or concept (e.g., basisOfRecord, scientificName)
- Data value
- Standards, vocabularies
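The terminology nests naturally: a publisher offers data resources, a resource holds records, and each record pairs elements (terms) with values. A minimal sketch in Python, assuming plain dicts for records; the DataResource class, the publisher name, and the record values are hypothetical illustrations, not part of any GBIF tool or of Darwin Core itself.

```python
# Hypothetical model of the terminology: resource -> records -> (term, value).
# Names and values are illustrative only, not a GBIF or Darwin Core API.
from dataclasses import dataclass, field


@dataclass
class DataResource:
    """A data set published by a data publisher."""
    publisher: str
    resource_type: str  # e.g. "Occurrence" or "Taxon"
    records: list = field(default_factory=list)  # each record maps term -> value


# One data record: keys are data record elements (Darwin Core terms),
# values are the data values.
record = {
    "basisOfRecord": "PreservedSpecimen",
    "scientificName": "Puma concolor",
}

resource = DataResource(publisher="Example Museum", resource_type="Occurrence")
resource.records.append(record)

for rec in resource.records:
    for term, value in rec.items():
        print(f"{term} = {value}")
```

Using dicts keyed by term name keeps the record open-ended: a resource can carry any subset of terms without changing the container type.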

4 Data Publishers
- Institutions with multiple organisational units, each with multiple data resources
- Institutions, groups, or individuals with multiple data resources
- Institutions or individuals with a single data resource

5 Data Resource Types
- Primary biodiversity data (specimens and observations, ecological data). The core data type is an Occurrence of an organism.
- Taxonomic catalogues* and annotated species checklists. The core data type is a Taxon.
- Enriched resource metadata, primarily focused on Occurrence and Taxon data sets.
* To distinguish our efforts from the Catalogue of Life (COL): GBIF provides the means, not the ends.

6 Data Records: Taxon resource type, Occurrence resource type

7 Data Fields: Taxon resource type, Occurrence resource type

8 Data Values: Taxon resource type, Occurrence resource type
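Records, fields, and values for both resource types can be written as delimited text, with one column per field and one row per record. A minimal sketch using Python's csv module, assuming tab-delimited output; the Darwin Core term names are real, but the helper to_tsv and all record values are invented for illustration.

```python
# Hypothetical sketch: Occurrence and Taxon records as tab-delimited text,
# one column per Darwin Core term, one row per record. Values are invented.
import csv
import io

occurrences = [
    {"occurrenceID": "urn:example:occ:1", "basisOfRecord": "PreservedSpecimen",
     "scientificName": "Puma concolor", "eventDate": "1998-05-14"},
]
taxa = [
    {"taxonID": "t1", "scientificName": "Puma", "taxonRank": "genus",
     "parentNameUsageID": ""},
    {"taxonID": "t2", "scientificName": "Puma concolor", "taxonRank": "species",
     "parentNameUsageID": "t1"},
]


def to_tsv(records):
    """Return records as tab-delimited text with a header row of term names."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=list(records[0]), delimiter="\t",
                            lineterminator="\n")
    writer.writeheader()
    writer.writerows(records)
    return buf.getvalue()


print(to_tsv(occurrences))
print(to_tsv(taxa))
```

The Taxon rows link to each other through parentNameUsageID, while Occurrence rows stand alone; that difference in core data type is why the two resource types carry different fields.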

9 Data Standards
- Primary biodiversity data and taxonomic data: Darwin Core (172 terms, ratified in 2009), text files, extensible
- Metadata: Ecological Metadata Language (EML), rich data set descriptions, GBIF profile

10 Data Publishing Options

11

12 TAPIR Example
Suppose TAPIR allows 1000 records per request. For a data set of 260 000 records:
- 260 data exchanges / 500 MB total data transfer
- 2 hours to harvest
- Only 32 MB of the transferred data are "used" for the GBIF network
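The numbers on this slide follow from paging: a harvester must issue one request per page of results. A minimal sketch of that arithmetic, assuming a fixed page size of 1000 records; the helpers requests_needed and page_offsets and the (start, limit) pairing are hypothetical, not the TAPIR specification's exact parameters.

```python
# Hypothetical sketch of paged harvesting arithmetic (not a TAPIR client).
import math


def requests_needed(total_records: int, page_size: int) -> int:
    """Number of paged requests (data exchanges) needed to harvest everything."""
    return math.ceil(total_records / page_size)


def page_offsets(total_records: int, page_size: int):
    """Yield the (start, limit) pair a harvester would send for each request."""
    for start in range(0, total_records, page_size):
        yield start, min(page_size, total_records - start)


total, page = 260_000, 1000
print(requests_needed(total, page))  # 260

# Share of the transferred volume that is actually used (figures from the slide).
transferred_mb, used_mb = 500, 32
print(f"{used_mb / transferred_mb:.1%}")  # 6.4%
```

Every extra request adds protocol overhead (request and response envelopes), which is why most of the 500 MB transferred is not usable record data.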

13 Data Publishing Options

14 Darwin Core Archive Example
Darwin Core Archive. For a data set of 260 000 records:
- 1 data exchange / 3 MB total data transfer
- Seconds to harvest

15 Darwin Core Archive Example
For a data set of 260 000 records:
- 1 data exchange / 3 MB total data transfer
- Seconds to harvest
Compare to TAPIR/DiGIR/BioCASE:
- 260 data exchanges / 500 MB total data transfer
- 2 hours to harvest
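Putting the two scenarios side by side makes the difference concrete. A small sketch that only reuses the figures from these slides; the dictionary layout is an illustrative choice.

```python
# Compare the paged-protocol scenario with the single-archive scenario,
# using only the figures quoted on the slides.
paged = {"exchanges": 260, "transfer_mb": 500}  # TAPIR/DiGIR/BioCASE
dwca = {"exchanges": 1, "transfer_mb": 3}       # Darwin Core Archive

exchange_ratio = paged["exchanges"] / dwca["exchanges"]
transfer_ratio = paged["transfer_mb"] / dwca["transfer_mb"]

print(f"{exchange_ratio:.0f}x fewer data exchanges")  # 260x fewer data exchanges
print(f"{transfer_ratio:.0f}x less data transferred")  # 167x less data transferred
```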

16 Darwin Core Archive: Benefits
- Simple format (text files)
- Efficient storage (compressed)
- Efficient harvesting (single file)
- Easy access (no special software required)
- Extensible (related files in one archive)
Preferred format for publishing data in the GBIF network
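To illustrate "no special software required": a Darwin Core Archive is a zip file containing delimited text files plus a meta.xml descriptor that maps columns to Darwin Core terms. A minimal sketch using only the Python standard library, assuming a single tab-delimited core file; the file name occurrence.txt, the helper functions, and the record are hypothetical, and the reader does not honour every meta.xml attribute.

```python
# Hypothetical sketch: build a tiny Darwin Core Archive in memory, then read it.
# Assumes tab-delimited text with no field enclosure; stdlib only.
import csv
import io
import xml.etree.ElementTree as ET
import zipfile

DWC_TEXT_NS = "http://rs.tdwg.org/dwc/text/"

META_XML = """<?xml version="1.0" encoding="UTF-8"?>
<archive xmlns="http://rs.tdwg.org/dwc/text/">
  <core encoding="UTF-8" fieldsTerminatedBy="\\t" linesTerminatedBy="\\n"
        fieldsEnclosedBy="" ignoreHeaderLines="1"
        rowType="http://rs.tdwg.org/dwc/terms/Occurrence">
    <files><location>occurrence.txt</location></files>
    <id index="0"/>
    <field index="1" term="http://rs.tdwg.org/dwc/terms/basisOfRecord"/>
    <field index="2" term="http://rs.tdwg.org/dwc/terms/scientificName"/>
  </core>
</archive>
"""


def build_archive() -> bytes:
    """Zip the meta.xml descriptor together with one compressed core data file."""
    rows = "id\tbasisOfRecord\tscientificName\n1\tPreservedSpecimen\tPuma concolor\n"
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        zf.writestr("meta.xml", META_XML)
        zf.writestr("occurrence.txt", rows)
    return buf.getvalue()


def read_core(archive: bytes) -> list[dict]:
    """Return core records as dicts keyed by short Darwin Core term names."""
    ns = {"dwc": DWC_TEXT_NS}
    with zipfile.ZipFile(io.BytesIO(archive)) as zf:
        core = ET.fromstring(zf.read("meta.xml")).find("dwc:core", ns)
        location = core.find("dwc:files/dwc:location", ns).text
        skip = int(core.get("ignoreHeaderLines", "0"))
        # Map column index -> last path segment of the term URI (the id column is not mapped).
        columns = {int(f.get("index")): f.get("term").rsplit("/", 1)[-1]
                   for f in core.findall("dwc:field", ns)}
        text = zf.read(location).decode(core.get("encoding", "UTF-8"))
    # Tab delimiter is assumed here rather than parsed from fieldsTerminatedBy.
    rows = list(csv.reader(io.StringIO(text), delimiter="\t",
                           quoting=csv.QUOTE_NONE))[skip:]
    return [{name: row[i] for i, name in columns.items()} for row in rows]


records = read_core(build_archive())
print(records)  # [{'basisOfRecord': 'PreservedSpecimen', 'scientificName': 'Puma concolor'}]
```

The whole data set travels as one compressed file, and the descriptor lets a reader map columns to terms without any protocol-specific client.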

17 Data Discovery

18 GBIF Registry

19 GBIF Data Portal

20 Data Publishing Documentation
GBIF Online Resource Centre (http://www.gbif.org/orc/)

21 References
- IPT v2 User Manual: http://code.google.com/p/gbif-providertoolkit/wiki/IPT2ManualNotes
- Publishing Using Dropbox: http://www.youtube.com/user/gbiffrance

