Biodiversity Informatics

Presentation on theme: "Biodiversity Informatics"— Presentation transcript:

1 Biodiversity Informatics
Metadata standards and GUIDs in Biological Collections. Anne Fuchs (ANBG/CANBR) and Margaret Cawsey (ANWC), May 2017. National Research Collections Australia

2 Biological Collections
Manage:
Preserved specimens and/or parts thereof
Living organisms (plants, seeds, algae, bacteria)
Genetic samples
Sounds/images/videos
Biological collections have been managing their collection objects for a long time, though not for as many millennia as libraries. These are examples of the types of ‘collection objects’ in our collections. Some are dried, e.g. plant specimens and animal skins and bones; others are preserved in other ways, e.g. in ethanol or frozen. How each is managed physically depends on the preservation method involved.
Biodiversity Informatics in Australian National History Collections; Fuchs & Cawsey

3 Accessioning in Biological Collections
Long history of cataloguing collections, which included registration of unique institution codes
CANB: Australian National Herbarium
ANIC: Australian National Insect Collection
In the herbarium community: Index Herbariorum. “Each institution is assigned a permanent unique identifier in the form of a one to eight letter code, a practice that dates from the founding of IH in ” (1)
More recently: the Global Registry of Biodiversity Repositories
Institutions allocate accession/catalogue numbers internally
For delivery to national/international datasets these are combined as institutionCode:collectionCode:catalogNumber: ANIC:Hymenoptera: or CANB:ANH:CANB
As part of the cataloguing process, specimens are allocated a unique accession or catalogue number. It was recognised that institutions (or large collections) also needed to be identifiable, so (CLICK) systems were put in place to ‘register’ these, like the Index Herbariorum for herbaria and the Global Registry of Biodiversity Repositories. (CLICK) In addition, institutions or collections allocate their own catalogue numbers. When these are combined they uniquely identify a specimen.
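The combination of institution code, collection code and catalogue number can be sketched in a few lines of Python. This is only an illustration: the `SpecimenId` type, the two helper functions and the catalogue number `0001234` are hypothetical, and the rule that a component may not itself contain a colon is an assumption made so that the string can be split again.

```python
from typing import NamedTuple


class SpecimenId(NamedTuple):
    """Hypothetical container for the three parts named on the slide."""
    institution_code: str  # registered code, e.g. "CANB" or "ANIC"
    collection_code: str   # collection within the institution
    catalog_number: str    # allocated internally by the institution


def to_triplet(spec: SpecimenId) -> str:
    """Join the parts as institutionCode:collectionCode:catalogNumber."""
    parts = [p.strip() for p in spec]
    # Assumption: empty parts and embedded colons are rejected so the
    # triplet stays unambiguous when it is split again.
    if any(not p or ":" in p for p in parts):
        raise ValueError("each part must be non-empty and contain no ':'")
    return ":".join(parts)


def from_triplet(text: str) -> SpecimenId:
    """Split a triplet string back into its three parts."""
    parts = text.split(":")
    if len(parts) != 3 or any(not p for p in parts):
        raise ValueError(f"not a three-part identifier: {text!r}")
    return SpecimenId(*parts)


# Made-up catalogue number, for illustration only.
example = to_triplet(SpecimenId("ANIC", "Hymenoptera", "0001234"))
```

A triplet like this is unique only as long as the institution code is registered and the institution never reuses a catalogue number, which is the point of the registries mentioned above.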

4 Metadata in Biological Collections
Extensive metadata is collected and held:
Who made the collection
When the collection was made
Where collected: locality, co-ordinates
Type of material collected (Bird, Egg, Leaves, Fruit etc.)
Taxon collected
Possibly additional data such as habitat
Institutions often hold this metadata in digital repositories
(CLICK) In the process of collecting specimens additional metadata is collected. (CLICK) The trend towards managing metadata in specially designed collection management systems has gained momentum (since the 1980s). (CLICK) As part of the storage of collection items, labels are produced from this metadata. These collection management systems hold all of the information which lets us curate and track our specimens inside collections and around the world, assisting in their use and re-use through specimen loans, tissue grants etc. (As an aside, CSIRO NRCA will be migrating from our national collections to an enterprise CMS solution in the near future.)

5 Data sharing and discoverability
Exchange of metadata between institutions for duplicate and loaned specimens
Supply of data to national and international aggregators:
Australian Virtual Herbarium (AVH)
Online Zoological Collections of Australian Museums (OZCAM)
Atlas of Living Australia (ALA)
Global Biodiversity Information Facility (GBIF)
Therefore, need standards
(CLICK) Even prior to the aggregation tools we see today, institutions had established practices for depositing material as “backup” in other institutions and for loaning specimens for taxonomic work. (CLICK) More recently this metadata has underpinned the data in national and international aggregators, meeting the needs of research, land management, policy, education etc. (CLICK) To deliver data both from institution to institution and to aggregated systems we need to talk a common language, hence the need for standards.

6 Introducing TDWG or the Taxonomic Database Working Group
Data sharing and discoverability: Standards
“The TDWG community's priority is the development of standards for the exchange of biological/biodiversity data.”
Established 1985
The natural history collections community has been working with data standards for a long time. The Biodiversity Information Standards Working Group, still affectionately known as TDWG (Taxonomic Database Working Group), was established in 1985 and is ... If you visit their website, the work and standards they address are listed.

7 Data sharing and discoverability: Standards
Darwin Core (DwC)
Access to Biological Collection Data (ABCD)
Extensions, e.g.:
Audubon Core (multimedia)
Global Genome Biodiversity Network Data Standard
Herbarium Information Standards and Protocols for Interchange of Data (HISPID)
The standards which Australian institutions work with:
Darwin Core: a TDWG standard, and itself a body of standards. It includes a glossary of terms (in other contexts these might be called properties, elements, fields, columns, attributes, or concepts) intended to facilitate the sharing of information about biological diversity by providing reference definitions, examples, and commentaries.
ABCD: a TDWG standard; a comprehensive and commented, XML-based schema for biological collection records (ABCD Schema).
Audubon Core: a set of vocabularies designed to represent metadata for biodiversity multimedia resources and collections.
GGBN: the GGBN Data Standard is based on ABCDDNA, which was developed within the ... The current GGBN Data Standard is the result of further reviews of ABCDDNA done with the GGBN community. It is intended to be used with ABCD or Darwin Core and is not a stand-alone solution!
HISPID: an example of a domain-specific standard, which started specifically for the herbarium community and has evolved through various iterations to the current standard, which follows and maps to the international standards. Additional terms are minted for attributes which are not covered, vocabularies are provided where applicable, and terms are described in a domain-friendly manner.
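To make the idea of a shared glossary of terms concrete, here is a minimal Python sketch of a specimen record keyed by Darwin Core term names and flattened to CSV for exchange. The term names are taken from Darwin Core; the choice of which terms to include, the helper functions and all the values in `example_record` (other than the codes and taxon name mentioned in this talk) are assumptions made for illustration, not a profile that any aggregator prescribes.

```python
import csv
import io

# Darwin Core term names; this particular selection is an assumption that
# mirrors the who / when / where / what metadata listed earlier in the talk.
DWC_TERMS = [
    "institutionCode", "collectionCode", "catalogNumber", "basisOfRecord",
    "recordedBy", "eventDate", "locality",
    "decimalLatitude", "decimalLongitude", "scientificName", "habitat",
]


def missing_terms(record: dict) -> list:
    """Return the selected terms that are absent or empty in a record."""
    return [term for term in DWC_TERMS if not record.get(term)]


def to_csv(records: list) -> str:
    """Write records as CSV with one column per selected term."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=DWC_TERMS, extrasaction="ignore", lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(records)  # missing terms are written as empty cells
    return buffer.getvalue()


# Hypothetical record: collector, date, place and number are made up.
example_record = {
    "institutionCode": "CANB",
    "collectionCode": "ANH",
    "catalogNumber": "0001234",
    "basisOfRecord": "PreservedSpecimen",
    "recordedBy": "A. Collector",
    "eventDate": "2017-05-01",
    "locality": "Black Mountain",
    "decimalLatitude": "-35.27",
    "decimalLongitude": "149.10",
    "scientificName": "Acacia dealbata",
}
```

Because both sender and receiver agree on the column names, a record exported this way can be read by another institution or an aggregator without a bespoke mapping; `missing_terms(example_record)` reports that only `habitat` was left out.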

8 FCIG: Faunal Collections Informatics Group; HISCOM: Herbarium Information Systems Committee
In Australia, the natural history collections have peak councils which represent the interests of their members. They are served by informatics committees which provide technical advice and actualise decisions. The Director of the SA Herbarium is the current chair of CHAH, and Anne is a member of HISCOM. The Director of the ANWC is the current chair of CHAFC, and Margaret is a member of the Faunal Collections Informatics Group, or FCIG.

9 Discoverability – a brief history
1999: first iteration of AVH
2001: GBIF
2003: first iteration of OZCAM
2007: the ALA
Discoverability of taxon occurrence information has been on the agenda for a long time in the biodiversity data space, with the development of data aggregators. 1999 saw the first version of the Australian Virtual Herbarium. In 2001 GBIF was initiated in Europe to share taxon occurrence data globally. In 2003 the first iteration of OZCAM made its online debut. In 2007 the Atlas of Living Australia was initiated; it came online in 2010 and now powers the engine for OZCAM and the AVH. The Atlas specified Darwin Core as the required data standard. Which brings us to the vexed history of GUIDs in the biodiversity informatics space.

10 GUIDs – a vexed history
2006: TDWG GUID Task Group
The LSID: Life Science Identifier
URN technology
Each collection can mint its own: urn:lsid:ozcam.taxonomy.org.au:ANWC:Birds:B56401
But ...
Don’t resolve
Not used
ALA mints its own record identifiers
As does GBIF
so why bother?
In 2006 the TDWG GUID task group recommended the use of Life Science Identifiers to uniquely identify objects in the biodiversity domain (brandish the TDWG document, available from GitHub). At the time there were relatively simple requirements: they had to be persistent and globally unique. Like the IGSN, their technology is that of a URN (Uniform Resource Name), which makes them harder to implement than simple URLs because you need resolution technologies. However, once you’ve chosen the namespace and the accepted format, each collection can mint its own LSIDs, for example (CLICK). As you can see here, LSIDs were actually implemented by members of CHAFC, including the Wildlife collection. Some LSID providers and services exist, and the GUID technology was tested. However, the tests to date have identified a variety of issues.
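The structure of the example LSID above can be sketched in Python as follows. This is an illustration under assumptions: the function names are hypothetical, and reading the parts after the authority as institution, collection and catalogue number is inferred from the example on the slide (the LSID format itself names them namespace, object and optional revision).

```python
def mint_lsid(authority: str, *local_parts: str) -> str:
    """Build an LSID string: urn:lsid:<authority>:<part>:<part>[:<part>]."""
    if not authority or len(local_parts) < 2:
        raise ValueError("an LSID needs an authority and at least two parts")
    if any(not part or ":" in part for part in (authority, *local_parts)):
        raise ValueError("parts must be non-empty and contain no ':'")
    return ":".join(("urn", "lsid", authority, *local_parts))


def parse_lsid(lsid: str) -> dict:
    """Split an LSID into its authority and the parts that follow it."""
    parts = lsid.split(":")
    well_formed = (
        len(parts) >= 5
        and parts[0].lower() == "urn"
        and parts[1].lower() == "lsid"
        and all(parts[2:])
    )
    if not well_formed:
        raise ValueError(f"not an LSID: {lsid!r}")
    return {"authority": parts[2], "local_parts": parts[3:]}


# The example from the slide, rebuilt from its components.
slide_lsid = mint_lsid("ozcam.taxonomy.org.au", "ANWC", "Birds", "B56401")
```

Minting is the easy part, since it is only string construction. Nothing in the string tells a client where to fetch the record, which is why URN-based identifiers need separate resolution technology.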

11 TDWG GUID applicability statement 2010
14 recommendations
Yes, a GUID is a good idea
GUID technologies:
HTTP URI (used as a basis for some of the following options)
URN: LSID (Life Science Identifier)
DOI: Digital Object Identifier
PURL: Persistent URL
UUID: Universally Unique Identifier
Handle System
In 2010 the TDWG Globally Unique Identifiers Task Group produced a GUID applicability statement (wave document 2 about; also available from GitHub), from which I’ve derived much of the info in this talk. This document makes 14 recommendations. One of them is that, yes, a GUID is a good idea. Among them is a list of potential technologies, of which the LSID is one, although, because it is URN technology, LSIDs cannot function in a linked data environment without being represented as HTTP URIs. It should be noted that the document (wave it again) presents reservations about ALL of them.
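One way to picture an LSID represented as an HTTP URI is to append it to the address of a resolver service. The Python sketch below does only that; the resolver address `https://resolver.example.org/` is a placeholder and the function name is hypothetical, so this shows the general shape of such a URI rather than the form used by any particular service.

```python
from urllib.parse import quote


def lsid_to_http_uri(
    lsid: str, resolver_base: str = "https://resolver.example.org/"
) -> str:
    """Wrap an LSID in an HTTP URI under a (placeholder) resolver address."""
    if not lsid.lower().startswith("urn:lsid:"):
        raise ValueError(f"not an LSID: {lsid!r}")
    # Keep ':' readable; anything else unsafe in a URL path is percent-encoded.
    return resolver_base.rstrip("/") + "/" + quote(lsid, safe=":")


http_form = lsid_to_http_uri("urn:lsid:ozcam.taxonomy.org.au:ANWC:Birds:B56401")
```

The identifier is unchanged, but it now sits inside something a web client can dereference, provided someone commits to keeping the resolver running.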

12 TDWG GUID applicability statement 2010
Apply to objects, e.g.:
scientific names
datasets
collections
specimens
genetic samples
images, videos, sound recordings etc.
geological samples?
The applicability statement is not prescriptive on which objects GUIDs may apply to, but is prescriptive on HOW they should NOT be applied, e.g. more than one identifier of the same GUID technology applied to the same sample. In the past LSIDs have been applied to all of these objects and then some. It is not inconceivable that they might be applied to geological objects, much as it is being suggested that IGSNs could apply to biological objects.

13 Current situation for the natural history collections...
GBIF DOIs for downloads:
dataset persistence?
dataset content changes possible?
Conclusions:
No consensus reached ...
It is unlikely that any particular GUID technology will be successfully implemented until TDWG achieves consensus
Recently we’ve heard that GBIF has decided to apply DOIs to data downloads. However, we don’t know yet how persistent these datasets will be; the implication is that the datasets may not be kept for more than 12 months, so the DOIs won’t resolve beyond that time. There is also the implication that the content of the dataset itself might change: if the download is for all records of a particular species at time X, and more records for that species arrive in the GBIF database by time X+n, then at time X+n the DOI will point to a different complement of records than it did at time X. This makes the usefulness of the DOI arguable. As a conclusion: there is as yet no consensus within the TDWG community on which GUID technology is acceptable, and it is unlikely that any will be successfully implemented until that consensus is reached. But we all recognise that whatever technology is adopted will have to be compatible with the use of linked data, which means that URN technology is not likely to be the one which hits the jackpot.

14 Future - Linked Data and the Semantic Web?
Principally, the Semantic Web is a Web 3.0 web technology - a way of linking data between systems or entities that allows for rich, self-describing interrelations of data available across the globe on the web. (2)
W3C Best Practices for Publishing Linked Data (3):
Data is explicitly connected to a license
URI design (HTTP based, machine readable, unchangeable, opaque)
URIs are persistent
Vocabulary: based on existing standards wherever possible
Machine accessible (RDF/SPARQL, RESTful API)
Linked data is a tool of the semantic web in which data and its relationships are machine readable, thus opening up possibilities for an environment where applications can query that data and draw inferences using vocabularies.

15 National Species List – Linked data
URI for the name Acacia dealbata: Link.
Content negotiation resolves via web services to HTML, JSON, XML or CSV
Used in exports and data delivery as the identifier
ICNAFP APNI scientific Acacia dealbata Link Acacia dealbata Link …..
URIs also used for taxon concepts (instances), publications/references, authors, taxonomic classifications, etc.
SPARQL service
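Content negotiation, as mentioned on this slide, means that one URI can return different representations depending on the HTTP `Accept` header the client sends. The Python sketch below builds such requests without sending them. The URI `https://id.example.org/name/apni/12345` is a placeholder, not the real identifier for Acacia dealbata, and the media types chosen for each format are the common ones, assumed rather than taken from the service's documentation.

```python
from urllib.request import Request

# Common media types for the four formats named on the slide (assumed).
MEDIA_TYPES = {
    "html": "text/html",
    "json": "application/json",
    "xml": "application/xml",
    "csv": "text/csv",
}


def negotiated_request(uri: str, fmt: str) -> Request:
    """Build a GET request asking for one representation of a URI."""
    try:
        media_type = MEDIA_TYPES[fmt]
    except KeyError:
        raise ValueError(f"unsupported format: {fmt!r}") from None
    return Request(uri, headers={"Accept": media_type})


# Placeholder URI; the same identifier is used for every format.
NAME_URI = "https://id.example.org/name/apni/12345"
json_request = negotiated_request(NAME_URI, "json")
# Sending it would look like: urllib.request.urlopen(json_request)
```

The design point is that the identifier stays the same whether a person opens it in a browser or a program asks for JSON, which is what lets it be used in exports and data delivery.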

16 List of Resources
Global Registry of Biodiversity Repositories
Index Herbariorum (1)
Australian Virtual Herbarium
OZCAM
Global Biodiversity Information Facility
Taxonomic Database Working Group (TDWG)
Darwin Core
ABCD: http://rs.tdwg.org/abcd/2.06
HISPID
Audubon Core
Global Genome Biodiversity Network Data Standard
Atlas of Living Australia
LSID
LSID applicability statement
Linked data tools, Data.gov.au (2)
W3C Best Practices for Publishing Linked Data (3)

17 Thank you Presenter details
Anne Fuchs (Centre for Australian National Biodiversity Research) Margaret Cawsey (Australian National Wildlife Collection) National Facilities and Collections, National Research Collections Australia

