How to publish (data set) metadata
Laura Russell, Programmer, VertNet
Buenos Aires (Argentina), 28 September 2011
Training course on biodiversity data publishing and fitness-for-use in the GBIF Network, 2011 edition
Data publishing process
Outline
- Why metadata?
- The GBIF EML profile
- Metadata standards
- Preparation of metadata
- Where does the metadata go?
- Preparing metadata (examples)
Why metadata?
- "Data Intensive Science"; "Fourth Science Paradigm"; "Digital Data Deluge"
- Sources: e-Infrastructure Reflection Group (European Strategy Forum on Research Infrastructures), Report on Data Management, November; The Fourth Paradigm: Data-Intensive Scientific Discovery
- Key requirement: high-quality metadata for long-term curation and use of data sets
Information about data sets deteriorates over time!
Source: William K. Michener, Meta-information concepts for ecological data management, Ecological Informatics, Volume 1, Issue 1, January 2006, Pages 3-7, DOI: /j.ecoinf
Why metadata?
Metadata supports:
- Discovery
- Interpretation/Evaluation
- Provenance
- Quality
- Fitness-for-use
- Analytical re-use
Metadata Standards
- Ecological Metadata Language (EML)
- Dublin Core
- Directory Interchange Format (DIF)
- ISO 19115/19139 Geographic Metadata
Metadata Standards
- Natural Collections Descriptions (NCD)
- Federal Geographic Data Committee (FGDC) Biological Profile*
- Multimedia Resources Metadata Schema
*An extension of the FGDC CSDGM (Content Standard for Digital Geospatial Metadata)
ISO 19115/19139
- North American Profile of ISO: projects/NAP-Metadata/napMetadataProfileV101.pdf/view
- Several resources available for crosswalk, transform, and view:
  - EML to FGDC Biological Profile
  - FGDC CSDGM to ISO Transform
  - FGDC CSDGM to ISO Crosswalk
  - ISO XML to HTML View
  - FGDC BIO to ISO Transform
  - FGDC BIO to ISO Crosswalk
  - EML to ISO
- Open source INSPIRE-compliant metadata editor (multilingual functionality)
Metadata and Languages
Source: A Multilingual Metadata Catalog for the ILTER: Issues and Approaches. Vanderbilt, K.L., et al., Ecological Informatics, Volume 5, Issue 3, May 2010, doi: /j.ecoinf
- Adopt a lingua franca, e.g., English
  - Data publishers provide discovery-level metadata in English
  - Full metadata in the local language
- Just use the local language with keywords from multilingual thesauri, e.g., GEMET, AGROVOC
  - GEMET, the GEneral Multilingual Environmental Thesaurus; 27 languages
  - AGROVOC; agriculture, forestry, fisheries, food and related domains; 20 languages
- Long-term solution: multilingual ontologies
- Issues?
  - Additional burden; tools, metadata standards
GBIF EML Profile
- Requirements gathering
  - GBIF Metadata Task Group
  - EML; ISO 19115; NCD; INSPIRE Directive
- GBIF EML schema
- GBIF community site: metadata network
- GBIF profile documentation
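To give a feel for what an EML record looks like, here is a minimal sketch in Python that builds a skeleton document with the standard library. It is an illustration rather than the GBIF profile itself. The namespace (EML 2.1.1), the `system` value, the function name `build_minimal_eml`, and the choice of elements (title, creator, abstract, contact) are assumptions made for the example. The GBIF profile documentation remains the authority on which elements are required.

```python
import xml.etree.ElementTree as ET

# Assumption: EML 2.1.1 namespace; check the GBIF profile documentation
# for the version the profile actually builds on.
EML_NS = "eml://ecoinformatics.org/eml-2.1.1"
ET.register_namespace("eml", EML_NS)


def build_minimal_eml(package_id, title, surname, abstract):
    """Return a skeleton EML document as a string (hypothetical helper)."""
    # Only the root element is namespace-qualified; children are unqualified.
    root = ET.Element(
        f"{{{EML_NS}}}eml",
        {"packageId": package_id, "system": "example"},  # illustrative values
    )
    dataset = ET.SubElement(root, "dataset")
    ET.SubElement(dataset, "title").text = title

    # Creator: the party responsible for the data set.
    creator = ET.SubElement(dataset, "creator")
    creator_name = ET.SubElement(creator, "individualName")
    ET.SubElement(creator_name, "surName").text = surname

    # Abstract: free-text description held in one or more paragraphs.
    abstract_el = ET.SubElement(dataset, "abstract")
    ET.SubElement(abstract_el, "para").text = abstract

    # Contact: reuses the creator's name here for brevity.
    contact = ET.SubElement(dataset, "contact")
    contact_name = ET.SubElement(contact, "individualName")
    ET.SubElement(contact_name, "surName").text = surname

    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(build_minimal_eml("demo-1", "Demo bird survey", "Russell",
                            "Point counts from a demo survey."))
```

A real record would add coverage, keywords, rights, and methods, which correspond to the sections listed for the IPT metadata editor.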
Preparing metadata
- Metadata editors, e.g., IPT; spreadsheet template; Morpho; EUOSME
- Scripting
  - Output directly from an existing metadata database
  - Transform from another metadata specification to EML
- Editing XML directly
  - Validation essential
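When XML is edited by hand or produced by a script, even a lightweight check catches many mistakes before publication. The sketch below is a minimal pre-flight check in Python, not a schema validation: it tests well-formedness and the presence of a few elements. The list `REQUIRED_PATHS` and the function name `check_metadata` are hypothetical. Full validation would mean checking the file against the GBIF EML schema with a schema-aware validator.

```python
import xml.etree.ElementTree as ET

# Hypothetical checklist; the real required elements are defined by the schema.
REQUIRED_PATHS = ("dataset/title", "dataset/creator", "dataset/contact")


def check_metadata(xml_text):
    """Return a list of problems; an empty list means the checks passed."""
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError as err:
        # A file that is not well-formed cannot be checked any further.
        return [f"not well-formed: {err}"]

    problems = []
    for path in REQUIRED_PATHS:
        node = root.find(path)
        if node is None:
            problems.append(f"missing element: {path}")
        elif not "".join(node.itertext()).strip():
            # Element exists but carries no text anywhere beneath it.
            problems.append(f"empty element: {path}")
    return problems


if __name__ == "__main__":
    sample = (
        "<eml><dataset><title>Demo survey</title>"
        "<creator><individualName><surName>Russell</surName></individualName></creator>"
        "</dataset></eml>"
    )
    for problem in check_metadata(sample):
        print(problem)
```

Running the script on the sample reports the missing contact element. The same function can be called on the output of a database export or a transform before the file is handed to a publishing tool.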
Where does the metadata go?
Sources of Metadata
- GBIF Data Cache
- GBIF Participants
- Registered IPT installations
- National/regional/organisation-level catalogues
- Thematic catalogues, e.g., OBIS
- External networks, e.g., Knowledge Network for Biocomplexity (KNB)
GBIF approach:
- No imposed metadata standard or preferred catalogue implementation for participants
- Avoidance of lossy conversions in submitting metadata
GBIF metadata architecture
[Diagram. Components: GBIF Catalogue; GBIF Registry; EuroGEOSS Catalogue; Catalogue, e.g., GBIF Node; IPT Instance; Catalogue, e.g., KNB; GBIF Data Cache. Connections labelled: OAI-PMH; Direct payload.]
GBIF metadata catalogue specification:
OAI-PMH
Open Archives Initiative Protocol for Metadata Harvesting
- Provides a low-barrier mechanism for interoperability across distributed metadata repositories
- Data providers expose metadata
- Service providers consume metadata through a client application, known as a harvester, that issues OAI-PMH service requests over HTTP
- GBIF: role as harvester and provider
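The harvester side of OAI-PMH comes down to issuing HTTP requests with a `verb` parameter and reading the XML that comes back. The sketch below, in Python, builds a `ListIdentifiers` request URL and parses a canned response, so it runs without network access. The base URL `http://example.org/oai`, the record identifiers, and the function names are hypothetical. The verb, the `metadataPrefix` argument, the `oai_dc` prefix, the `resumptionToken` element, and the namespace come from the OAI-PMH 2.0 specification.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}


def build_request(base_url, verb, **arguments):
    """Build the URL for one OAI-PMH request."""
    return f"{base_url}?{urlencode({'verb': verb, **arguments})}"


def parse_list_identifiers(response_xml):
    """Return (identifiers, resumption_token) from a ListIdentifiers response."""
    root = ET.fromstring(response_xml)
    identifiers = [
        element.text
        for element in root.findall(".//oai:header/oai:identifier", OAI_NS)
    ]
    # A non-empty token means more pages are available from the provider.
    token = root.findtext(".//oai:resumptionToken", default="", namespaces=OAI_NS)
    return identifiers, (token or None)


# Canned response standing in for what a data provider might return.
SAMPLE_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListIdentifiers>
    <header>
      <identifier>oai:example.org:dataset-1</identifier>
      <datestamp>2011-09-01</datestamp>
    </header>
    <header>
      <identifier>oai:example.org:dataset-2</identifier>
      <datestamp>2011-09-15</datestamp>
    </header>
    <resumptionToken>page-2</resumptionToken>
  </ListIdentifiers>
</OAI-PMH>"""

if __name__ == "__main__":
    print(build_request("http://example.org/oai", "ListIdentifiers",
                        metadataPrefix="oai_dc"))
    print(parse_list_identifiers(SAMPLE_RESPONSE))
```

A harvester would repeat the request with `resumptionToken=<token>` as its only argument besides the verb until no token is returned, then fetch full records with `GetRecord` or `ListRecords`.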
Spreadsheet processor
1. Download the spreadsheet from the site (1_v1.xls).
2. Complete the spreadsheet.
3. Transform it into a GBIF metadata profile file by using the spreadsheet processor.
Note: the processor doesn't publish a file to GBIF; it provides a publication-ready file.
IPT metadata editor
1. Create a new resource in the IPT.
2. Complete the metadata:
  - Dataset (Resource)
  - Project
  - People and Organisations
  - Keyword Set (General Keywords)
  - Coverage
  - Taxonomic Coverage
  - Geographic Coverage
  - Temporal Coverage
  - Intellectual Property Rights
  - Methods
  - Additional Metadata and Natural Collections Descriptions Data
3. Publish it:
  - Metadata for published and unpublished data sets
  - Output as part of DwC-A zip file (EML.xml)
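Because the published metadata travels inside the DwC-A zip file, it can be read back with ordinary zip and XML tools. The Python sketch below pulls the data set title out of an archive. The helper names, the demo archive contents, and the case-insensitive match on the file name `eml.xml` are assumptions for illustration. A real archive produced by the IPT contains further files and a much fuller metadata record.

```python
import io
import zipfile
import xml.etree.ElementTree as ET


def read_dataset_title(archive_bytes):
    """Return the dataset title from the EML file inside a zipped archive."""
    with zipfile.ZipFile(io.BytesIO(archive_bytes)) as archive:
        # Match the metadata file regardless of letter case (assumption).
        eml_name = next(
            (name for name in archive.namelist() if name.lower() == "eml.xml"),
            None,
        )
        if eml_name is None:
            raise FileNotFoundError("no eml.xml found in archive")
        root = ET.fromstring(archive.read(eml_name))
    return root.findtext("dataset/title")


def make_demo_archive():
    """Build a tiny in-memory archive standing in for a real DwC-A file."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr(
            "eml.xml",
            "<eml:eml xmlns:eml='eml://ecoinformatics.org/eml-2.1.1'>"
            "<dataset><title>Demo bird survey</title></dataset></eml:eml>",
        )
        archive.writestr("occurrence.txt", "id\tscientificName\n1\tPuma concolor\n")
    return buffer.getvalue()


if __name__ == "__main__":
    print(read_dataset_title(make_demo_archive()))
```

To use it on a downloaded archive, read the file in binary mode and pass the bytes to `read_dataset_title`.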