Biodiversity Informatics

Presentation transcript:

Biodiversity Informatics: Metadata standards and GUIDs in Biological Collections
Anne Fuchs (ANBG/CANBR) and Margaret Cawsey (ANWC)
May 2017
National Research Collections Australia

Biological Collections
Manage:
Preserved specimens and/or parts thereof
Living organisms (plants, seeds, algae, bacteria)
Genetic samples
Sounds/images/videos

Biological collections have been managing their collection objects for not as many millennia as libraries, but for a long time. These are examples of the types of 'collection objects' in our collections. Some are dried, e.g. plant specimens and animal skins and bones; others are preserved in other ways, e.g. in ethanol or frozen. How each is managed physically depends on the preservation method involved.
http://www.cpbr.gov.au/cpbr/herbarium/specimen/index.html
Biodiversity Informatics in Australian National History Collections; Fuchs & Cawsey

Accessioning in Biological Collections
Long history of cataloguing collections, which included registration of unique institution codes:
CANB: Australian National Herbarium
ANIC: Australian National Insect Collection
In the herbarium community: Index Herbariorum. "Each institution is assigned a permanent unique identifier in the form of a one to eight letter code, a practice that dates from the founding of IH in 1935." (1)
More recently: the Global Registry of Biodiversity Repositories
Institutions allocate accession/catalogue numbers internally
For delivery to national/international datasets these are combined as institutionCode:collectionCode:catalogNumber, e.g. ANIC:Hymenoptera:31-035454-384 or CANB:ANH:CANB 621770.1

As part of the cataloguing process, specimens are allocated a unique accession or catalogue number. It was recognised that institutions (or large collections) also needed to be identifiable, so systems were put in place to 'register' them, such as Index Herbariorum for herbaria and the Global Registry of Biodiversity Repositories. In addition, institutions or collections allocate their own catalogue numbers. When these are combined they uniquely identify a specimen.
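The combined identifier described above can be sketched in a few lines of Python. This is only an illustration, assuming the three parts are joined with colons exactly as in the slide's examples; the names Triplet, build_triplet and parse_triplet are hypothetical and do not come from any collection management system.

```python
from typing import NamedTuple


class Triplet(NamedTuple):
    """The three parts that together identify a specimen (names are illustrative)."""
    institution_code: str
    collection_code: str
    catalog_number: str


def build_triplet(institution_code: str, collection_code: str, catalog_number: str) -> str:
    """Join the parts as institutionCode:collectionCode:catalogNumber."""
    return ":".join((institution_code, collection_code, catalog_number))


def parse_triplet(text: str) -> Triplet:
    """Split a combined identifier back into its three parts.

    Assumption: the first two parts never contain a colon, so splitting at
    most twice leaves any colons inside the catalogue number untouched.
    """
    parts = text.split(":", 2)
    if len(parts) != 3:
        raise ValueError(f"expected three colon-separated parts, got: {text!r}")
    return Triplet(*parts)


if __name__ == "__main__":
    print(parse_triplet("CANB:ANH:CANB 621770.1"))
    print(build_triplet("ANIC", "Hymenoptera", "31-035454-384"))
```

Such a combined string is only unique as long as the institution and collection codes are themselves registered and stable, which is the role of registries such as Index Herbariorum.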

Metadata in Biological Collections
Extensive metadata is collected and held:
Who made the collection
When the collection was made
Where it was collected (locality, co-ordinates)
Type of material collected (bird, egg, leaves, fruit etc.)
Taxon collected
Possibly additional data such as habitat
Institutions often hold this metadata in digital repositories

In the process of collecting specimens, additional metadata is collected. The trend towards managing metadata in specially designed collection management systems has gained momentum since the 1980s. As part of the storage of collection items, labels are produced from this metadata. These collection management systems hold all of the information which lets us curate and track our specimens inside collections and around the world, assisting in their use and re-use through specimen loans, tissue grants etc. (As an aside, CSIRO NRCA will be migrating from our national collections to an enterprise CMS solution in the near future.)

Data sharing and discoverability
Exchange of metadata between institutions for duplicate and loaned specimens
Supply of data to national and international aggregators:
Australian Virtual Herbarium (AVH)
Online Collections of Australian Museums (OZCAM)
Atlas of Living Australia (ALA)
Global Biodiversity Information Facility (GBIF)
Therefore, need standards

Even prior to the aggregation tools we see today, institutions had established practices for depositing material as "backup" in other institutions and for loaning specimens for taxonomic work. More recently this metadata has underpinned the data in national and international aggregators, meeting the needs of research, land management, policy, education etc. To deliver both institution-to-institution exchange of data and data to aggregated systems, we need to talk a common language; therefore we need standards.

Data sharing and discoverability: Standards
Introducing TDWG, or the Taxonomic Databases Working Group
"The TDWG community's priority is the development of standards for the exchange of biological/biodiversity data."
Established 1985

The natural history collections community has been working with data standards for a long time. The Biodiversity Information Standards Working Group, still affectionately known as TDWG (Taxonomic Databases Working Group), was established in 1985. If you visit their website, the work and standards they address are listed.

Data sharing and discoverability: Standards
Darwin Core (DwC)
Access to Biological Collection Databases (ABCD)
Extensions, e.g. Audubon Core (multimedia), Global Genome Biodiversity Network Data Standard
Herbarium Information Standards and Protocols for Interchange of Data (HISPID)

The standards which Australian institutions work with:

Darwin Core: a TDWG standard, which is a body of standards. It includes a glossary of terms (in other contexts these might be called properties, elements, fields, columns, attributes, or concepts) intended to facilitate the sharing of information about biological diversity by providing reference definitions, examples, and commentaries.

ABCD: a TDWG standard; a comprehensive and commented XML-based schema for biological collection records (ABCD Schema).

Audubon Core: a set of vocabularies designed to represent metadata for biodiversity multimedia resources and collections.

GGBN: the GGBN Data Standard is based on ABCDDNA. The current GGBN Data Standard is a result of further reviews of ABCDDNA done with the GGBN community. It is intended to be used with ABCD or Darwin Core and is not a stand-alone solution!

HISPID: an example of a domain-specific standard. It started specifically for the herbarium community and has evolved through various iterations to the current standard, which follows and maps to the international standards. Additional terms are minted for attributes which are not covered, vocabularies are provided where applicable, and terms are described in a domain-friendly manner.

FCIG: Faunal Collections Informatics Group
HISCOM: Herbarium Information Systems Committee

In Australia, the natural history collections have peak councils which represent the interests of their members. They are served by informatics committees which provide technical advice and actualise decisions. The Director of the SA Herbarium is the current chair of CHAH (Council of Heads of Australasian Herbaria), and Anne is a member of HISCOM. The Director of the ANWC is the current chair of CHAFC (Council of Heads of Australian Faunal Collections), and Margaret is a member of the Faunal Collections Informatics Group, or FCIG.

Discoverability: a brief history
1999: first iteration of AVH
2001: GBIF
2003: first iteration of OZCAM
2007: the ALA

Discoverability of taxon occurrence information has been on the agenda for a long time in the biodiversity data space, with the development of data aggregators. 1999 saw the first version of the Australian Virtual Herbarium. In 2001 GBIF was initiated in Europe to share taxon occurrence data globally. In 2003 the first iteration of OZCAM made its online debut. In 2007 the Atlas of Living Australia was initiated; it came online in 2010 and now powers the engine for OZCAM and the AVH. The Atlas specified the Darwin Core as the required data standard. Which brings us to the vexed history of GUIDs in the biodiversity informatics space.

GUIDs: a vexed history
2006: TDWG GUID Task Group
The LSID: Life Science Identifier
URN technology
Each collection can mint its own, e.g. urn:lsid:ozcam.taxonomy.org.au:ANWC:Birds:B56401
But ...
Don't resolve
Not used
ALA mints its own record identifiers, as does GBIF, so why bother?

In 2006 the TDWG GUID Task Group recommended the use of Life Science Identifiers to uniquely identify objects in the biodiversity domain (brandish document); the TDWG document is available from GitHub. At the time there were relatively simple requirements: identifiers had to be persistent and globally unique. Like the IGSN, their technology is that of a URN (Uniform Resource Name), which makes them harder to implement than simple URLs because you need resolution technologies. However, once you've chosen the namespace and the accepted format, each collection can mint its own LSIDs, as in the example on the slide. LSIDs were actually implemented by members of CHAFC, including the Wildlife Collection; some LSID providers and services exist, and the GUID technology was tested. However, the tests to date have identified a variety of issues.

TDWG GUID applicability statement 2010
14 recommendations
Yes, a GUID is a good idea
GUID technologies:
*HTTP URI (used as a basis for some of the following options)
*LSID: Life Science Identifier (URN)
DOI: Digital Object Identifier
PURL: Persistent URL
UUID: Universally Unique Identifier
Handle System
http://bioguid.info/urn:lsid:ozcam.taxonomy.org.au:ANWC:Birds:B56401

In 2010 the TDWG Globally Unique Identifiers Task Group produced a GUID applicability statement (wave document 2 about; also available from GitHub), from which I've derived much of the information in this talk. This document makes 14 recommendations. One of them is that, yes, a GUID is a good idea. Among them is a list of potential technologies, of which the LSID is one. However, because it is URN technology, an LSID cannot function in a linked data environment without being represented as an HTTP URI, for example something like the URI shown on the slide. It should be noted that the document (wave it again) presents reservations about ALL of them.
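Representing an LSID as an HTTP URI, as in the bioguid.info example above, amounts to prefixing the URN with the address of a resolving proxy. The following Python sketch shows that idea only. The function name lsid_to_http_uri is hypothetical, the default proxy address is simply the one shown on the slide, and nothing here checks whether the proxy actually resolves the identifier.

```python
LSID_PREFIX = "urn:lsid:"


def lsid_to_http_uri(lsid: str, proxy: str = "http://bioguid.info/") -> str:
    """Wrap an LSID in an HTTP URI by prefixing it with a proxy address.

    Assumption: the proxy accepts the whole URN as the path, as in the
    slide's example. The check below only looks at the urn:lsid: prefix;
    it is not a full validation of LSID syntax.
    """
    if not lsid.lower().startswith(LSID_PREFIX):
        raise ValueError(f"not an LSID: {lsid!r}")
    return proxy.rstrip("/") + "/" + lsid


if __name__ == "__main__":
    print(lsid_to_http_uri("urn:lsid:ozcam.taxonomy.org.au:ANWC:Birds:B56401"))
```

The resulting URI is only as persistent as the proxy that serves it, which is one reason the choice of resolution service matters as much as the identifier format.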

TDWG GUID applicability statement 2010
Apply to objects, e.g.:
scientific names
datasets
collections
specimens
genetic samples
images, videos, sound recordings etc.
geological samples?

The applicability statement is not prescriptive about which objects GUIDs may apply to, but it is prescriptive about HOW they should NOT be applied, e.g. more than one GUID of the same technology applied to the same sample. In the past LSIDs have been applied to all of these objects and then some. It is not inconceivable that they might be applied to geological objects, much as it is being suggested that IGSNs could apply to biological objects.

Current situation for the natural history collections...
GBIF DOIs for downloads:
dataset persistence?
dataset content changes possible?
Conclusions:
No consensus reached ...
It is unlikely that any particular GUID technology will be successfully implemented until TDWG achieves consensus

Recently we've heard that GBIF has decided to apply DOIs to data downloads. However, we don't yet know how persistent these datasets will be; the implication is that the datasets may not be kept for more than 12 months, so the DOIs won't resolve beyond that time. There is also the implication that the content of the dataset itself might change: if the download is for all records of a particular species at time X, and more records for that species arrive in the GBIF database at time X+n, then at time X+n the DOI will have a different complement of records than it did at time X. This makes the usefulness of the DOI arguable.

As a conclusion: there is as yet no consensus within the TDWG community on which GUID technology is acceptable, and it is unlikely that any will be successfully implemented until that consensus is reached. But we all recognise that whatever technology is adopted will have to be compatible with the use of linked data, which means that URN technology is not likely to be the one which hits the jackpot.

Future - Linked Data and the Semantic Web?
"Principally, the Semantic Web is a Web 3.0 web technology - a way of linking data between systems or entities that allows for rich, self-describing interrelations of data available across the globe on the web." (2)
W3C Best Practices for Publishing Linked Data (3):
Data is explicitly connected to a license
URI design (HTTP based, machine readable, unchangeable, opaque)
URIs are persistent
Vocabulary based on existing standards wherever possible
Machine accessible (RDF/SPARQL, RESTful API)

Linked data is a tool of the semantic web in which data and its relationships are machine readable, opening up possibilities for an environment where applications can query that data and draw inferences using vocabularies (https://www.w3.org/standards/semanticweb/data#summary).

National Species List: Linked data
URI for the name Acacia dealbata Link: https://id.biodiversity.org.au/name/apni/61294
Content negotiation resolves via web services to HTML, JSON, XML or CSV, e.g. https://biodiversity.org.au/nsl/services/name/apni/61294.xml
Used in exports and data delivery as the identifier:
ICNAFP  APNI  scientific  http://id.biodiversity.org.au/name/apni/61294  Acacia dealbata Link  Acacia dealbata Link  …..
URIs are also used for taxon concepts (instances), publications/references, authors, taxonomic classifications, etc.
SPARQL service
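Content negotiation means that the same identifier URI is requested with an HTTP Accept header naming the wanted representation. The Python sketch below only builds such a request with the standard library and does not send it. The media-type mapping and the function name negotiated_request are assumptions for illustration; which Accept values the National Species List services actually honour is not stated in the slides.

```python
from urllib.request import Request

# Assumed mapping from the formats named on the slide to common media types.
MEDIA_TYPES = {
    "html": "text/html",
    "json": "application/json",
    "xml": "application/xml",
    "csv": "text/csv",
}


def negotiated_request(uri: str, fmt: str) -> Request:
    """Build (but do not send) a GET request asking for one representation."""
    if fmt not in MEDIA_TYPES:
        raise ValueError(f"unsupported format: {fmt!r}")
    return Request(uri, headers={"Accept": MEDIA_TYPES[fmt]})


if __name__ == "__main__":
    req = negotiated_request("https://id.biodiversity.org.au/name/apni/61294", "json")
    print(req.full_url, req.get_header("Accept"))
```

Passing the request to urllib.request.urlopen would perform the actual lookup; that step is left out here so the sketch runs without network access.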

List of Resources
Resource: Link
Global Registry of Biodiversity Repositories: http://grbio.org/
Index Herbariorum (1): http://sciweb.nybg.org/science2/IndexHerbariorum.asp
Australian Virtual Herbarium: http://avh.chah.org.au/
OZCAM: http://ozcam.org.au/
Global Biodiversity Information Facility: http://www.gbif.org/
Taxonomic Data Working Group: http://www.tdwg.org/
Darwin Core: http://rs.tdwg.org/dwc/terms/index.htm
ABCD: http://rs.tdwg.org/abcd/2.06
HISPID: https://github.com/hiscom/hispid
Audubon Core: https://terms.tdwg.org/wiki/Audubon_Core
Global Genome Biodiversity Network Data Standard: https://terms.tdwg.org/wiki/GGBN_Data_Standard
Atlas of Living Australia: http://ala.org.au
LSID: https://github.com/tdwg/guid-as/tree/master/lsid
LSID applicability: https://github.com/tdwg/guid-as/tree/master/guid
Linked data tools (2): http://www.linkeddatatools.com/semantic-web-basics
Data.gov.au statement: https://github.com/AGLDWG/TR/blob/master/guidelines/URI-Guidelines-for-publishing-linked-datasets-on-data.gov.au-latest.md
W3C Best Practices for Publishing Linked Data (3): https://www.w3.org/TR/ld-bp/

Thank you
Presenter details:
Anne Fuchs (Centre for Australian National Biodiversity Research)
Margaret Cawsey (Australian National Wildlife Collection)
National Facilities and Collections, National Research Collections Australia