GLOBAL BIODIVERSITY INFORMATION FACILITY David Remsen ECAT Program Officer August 2010 WWW.GBIF.ORG Informatics Infrastructure and Portal (IIP)

Contents
Publishing – Current developments in Darwin Core, its extensions, and publishing solutions (incl. the Integrated Publishing Toolkit)
Integration and discovery – Status of tools for harvesting, interpretation through controlled vocabularies, and plans for the Data Portal
Communications – Evaluation of the communication platforms, update on staffing changes and resources

Informatics Overview

Publishing Objectives
– Simplify the publication of primary biodiversity data
– Support the publication of species-level data
– Improve data quality & dataset documentation
– Reduce the latency between publishing and discovery through portals
– Support the capacity to extend the published content
– Expand data publishing configuration options

Publishing – What?
Species Occurrence Data
– Primary Biodiversity Data
– Observations / Natural History Collections
Species-level Data
– Taxonomic Catalogues
– Annotated Species Checklists: floral and faunal lists, thematically defined lists (Red List, invasive species, etc.)
Dataset (Resource) Metadata

Standards and Protocols
Primary biodiversity data
– Darwin Core via the DiGIR protocol
– ABCD (Access to Biological Collections Data) via the BioCASe protocol
– TAPIR protocol – multiple output formats
Taxonomic data
– Taxon Concept Schema (TCS): few tools, low uptake
Protocols impact harvesting latency
– Schemas are complex and constrain data scope

Darwin Core
– Ratified in 2009, with significant additions/refinements
– A set of terms, expressed via XML
– Simple Darwin Core (a subset) can be expressed as plain text
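For illustration only (the values below are invented, but the term names are standard Darwin Core), a single occurrence expressed as a Simple Darwin Core text record could be produced like this:

    # Minimal sketch: one occurrence record expressed with Simple Darwin Core
    # terms and written as tab-delimited text. Values are invented examples.
    import csv, sys

    terms = ["occurrenceID", "basisOfRecord", "scientificName",
             "eventDate", "decimalLatitude", "decimalLongitude", "country"]

    record = {
        "occurrenceID": "urn:example:occ:0001",   # hypothetical identifier
        "basisOfRecord": "PreservedSpecimen",
        "scientificName": "Puma concolor",
        "eventDate": "2009-07-14",
        "decimalLatitude": "55.6761",
        "decimalLongitude": "12.5683",
        "country": "Denmark",
    }

    writer = csv.DictWriter(sys.stdout, fieldnames=terms, delimiter="\t")
    writer.writeheader()
    writer.writerow(record)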

Darwin Core Archives (DwC-A)

Extensions are plain text files, linked to the core data file by the core record identifier
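As a rough sketch of that layout, assume a tab-delimited taxon core file (taxa.txt) and a vernacular names extension (vernacular.txt), each carrying the core taxon identifier in its first column; the file names and columns here are illustrative, and in a real archive the mapping is declared in the meta.xml descriptor shown on a later slide.

    # Sketch: core and extension files are flat delimited text, joined on the
    # identifier in the first column of the core. Columns are assumed here.
    import csv
    from collections import defaultdict

    vernaculars = defaultdict(list)
    with open("vernacular.txt", encoding="utf-8") as ext:
        for row in csv.reader(ext, delimiter="\t"):
            core_id, name, language = row[0], row[1], row[2]
            vernaculars[core_id].append((name, language))

    with open("taxa.txt", encoding="utf-8") as core:
        for row in csv.reader(core, delimiter="\t"):
            taxon_id, scientific_name = row[0], row[1]
            print(taxon_id, scientific_name, vernaculars.get(taxon_id, []))

Because everything is flat text, an archive can be produced with nothing more sophisticated than a database export, which is the point of the simplified format.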

DwC-A Case Study: Ireland
– National Biodiversity Data Centre (Ireland); Ireland joined GBIF in 2009
– Selected DwC-A as the easiest integration route
– Incorporated into internal systems in under 2 weeks of development
– Automatic registration through the Registry API
– Collections today: 450,000 records harvested

Publish via: Direct Export of DwC-A
Requires basic DBA skills and documentation:
– Darwin Core Terms
– Darwin Core Archive Format
– Publishing Taxonomic Catalogues & Annotated Checklists via DwC-A
– Publishing Occurrence Data via DwC-A
Access to the list of terms, supported extensions, and schemas (schema repository)
Status: documentation release September 2010 via the GBIF website
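A sketch of what such a direct export can amount to; the database name, table and column names below are invented placeholders for a provider's own schema.

    # Minimal sketch of exporting occurrence rows to a Darwin Core text file.
    # collections.db and the specimens table/columns are illustrative only.
    import csv
    import sqlite3

    DWC_HEADER = ["occurrenceID", "scientificName", "eventDate",
                  "decimalLatitude", "decimalLongitude", "basisOfRecord"]

    conn = sqlite3.connect("collections.db")        # hypothetical provider database
    rows = conn.execute(
        "SELECT specimen_guid, accepted_name, collection_date, lat, lon, "
        "'PreservedSpecimen' FROM specimens"
    )

    with open("occurrence.txt", "w", newline="", encoding="utf-8") as out:
        writer = csv.writer(out, delimiter="\t")
        writer.writerow(DWC_HEADER)     # Darwin Core terms as column headers
        writer.writerows(rows)

    conn.close()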

XML Descriptor file (meta.xml)
<archive xmlns="http://rs.tdwg.org/dwc/text/"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://rs.tdwg.org/dwc/text/ http://rs.tdwg.org/dwc/text/tdwg_dwc_text.xsd"
         metadata="eml.xml">
  <!-- <core> declares the core data file: taxa.txt -->
  <!-- <extension> declares the extension data file: vernacular.txt -->
</archive>
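A minimal sketch of consuming such a descriptor with the Python standard library, assuming the archive is zipped as checklist-dwca.zip and the core file is tab-delimited; in a real archive the delimiters and column-to-term mapping are read from meta.xml rather than assumed.

    # Sketch of using the descriptor: open the archive, read meta.xml, find the
    # core data file it points to, and iterate over its rows.
    import csv
    import io
    import xml.etree.ElementTree as ET
    import zipfile

    NS = "{http://rs.tdwg.org/dwc/text/}"               # Darwin Core text namespace

    with zipfile.ZipFile("checklist-dwca.zip") as archive:   # hypothetical archive name
        meta = ET.fromstring(archive.read("meta.xml"))
        core_location = meta.find(NS + "core/" + NS + "files/" + NS + "location").text
        with archive.open(core_location) as raw:
            for row in csv.reader(io.TextIOWrapper(raw, encoding="utf-8"), delimiter="\t"):
                print(row)                              # one core record (a taxon) per row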

Authoring meta.xml Status: Beta release Sept. 1

Excel Spreadsheet Templates Status: Beta release September (by TDWG)

Excel Spreadsheet Templates

Integrated Publishing Toolkit
A supported platform for publication of:
– Occurrence-level content
– Species checklist content
– Dataset metadata (sampling methods, bibliographic citations, temporal coverage)
DwC-A compatible
– Reduced latency between publishing and discovery

Integrated Publishing Toolkit
GBIF Review 2009, “…with regards to software and tool development…:
– Lack of rigorous technical documentation; open source software must be documented and annotated meticulously in order to take advantage of improvements made by users.
– Release of unstable code that is being worked on still by its initiators to a community who are not made aware that it is not finalised.”

Integrated Publishing Toolkit
Received good feedback in its first year of use
Primary request: simplify and “lighten up” the product
RC4 testing is starting now:
– Enhanced metadata (still EML)
– Darwin Core Archive import
– Multiple organisation association (for hosting centres)
– Bug fixing

Integrated Publishing Toolkit
RC4 will not address this feedback, but will be a more stable version for new users
RC5 development is underway to address the feedback:
– Simplification all around (intuitiveness)
– Performance improvements
– Enriched documentation/examples/webcasts
– Server requirements dropping significantly (target of 256 MB of memory)

Integrated Publishing Toolkit
Following RC5, we will initiate user testing and bug fixing, with no unnecessary functionality changes, to move towards the target of a stable, robust platform by the end of the year

Vocabulary server
Drupal implementation developed as a proof of concept
– The IPT uses extensions, vocabularies and schemas for its operation
– No well-defined workflows yet for community ownership of vocabularies
– To be discussed at TDWG ’10
– ViBRANT funding to operationalise it

Vocabulary server
– Draft new extensions
– Draft new vocabularies
– Publish them
– Internationalise them

Indexing and Discovery Objectives
– Extend the classes of content that can be discovered
– Improve the means to discover (flexible indexes)
– Better determination of fitness for use, through dataset metadata
– Annotation / feedback brokerage
– Accurate citation
– Reduce the latency between publishing and discovery through portals

GBIF Registry (GBRDS)
An index of the technical access points of the datasets comprising the GBIF network
Captures basic metadata about institutions, datasets, nodes and their relationships
Enhanced features under development:
– Improved attribution
– Better data provenance declaration
– More accurate reporting on total participation within the GBIF network
– Dynamic definitions of thematic networks
– API / web app for automating registration (see the sketch below)
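A hypothetical sketch of what automated registration could look like: the endpoint URL, field names and payload below are illustrative assumptions, not the actual GBRDS API.

    # Hypothetical sketch of registering a dataset against a registry REST API.
    # URL, parameters and the lack of authentication are all assumptions.
    import json
    import urllib.request

    REGISTRY_URL = "http://registry.example.org/resource"   # hypothetical endpoint

    payload = {
        "name": "Example Herbarium Specimens",
        "primaryContactEmail": "curator@example.org",
        "serviceType": "DWC-ARCHIVE-OCCURRENCE",
        "accessPointURL": "http://data.example.org/dwca.zip",
    }

    request = urllib.request.Request(
        REGISTRY_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        print(response.status, response.read().decode("utf-8"))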

GBIF Registry (GBRDS)

Registry: GBIF is complex…

Metadata catalogue
A collection of XML-based dataset metadata documents (ISO, FGDC, EML, DIF formats)
– Associated with entities known to the GBIF Registry
– Common search across content
– Currently using Metacat; will review this following prototyping
Goal: enriched documentation, discovery of unpublished datasets
Status: under development; promoting publication of data documents through “small grant awards”
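For orientation, a skeletal dataset metadata document in the EML style could be assembled as below; the namespace/version and the element subset are assumptions for illustration, not a complete or validated EML record.

    # Illustrative sketch of a minimal EML-style dataset metadata document.
    import xml.etree.ElementTree as ET

    EML_NS = "eml://ecoinformatics.org/eml-2.1.1"   # assumed EML 2.1.1 namespace
    ET.register_namespace("eml", EML_NS)

    eml = ET.Element("{%s}eml" % EML_NS, {"packageId": "example.1.1", "system": "example"})
    dataset = ET.SubElement(eml, "dataset")
    ET.SubElement(dataset, "title").text = "Example checklist of vascular plants"
    creator = ET.SubElement(dataset, "creator")
    ET.SubElement(ET.SubElement(creator, "individualName"), "surName").text = "Example"
    ET.SubElement(dataset, "abstract").text = "Illustrative abstract describing scope and methods."

    print(ET.tostring(eml, encoding="unicode"))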

Harvesting and Indexing Toolkit
The GBIF harvesting software:
– Foundations to harvest DiGIR, BioCASe, TAPIR and DwC-A
– Synchronisation with the GBIF Registry
– User interface for controlling and scheduling harvesting operations
– Metrics for the success of harvest runs
– Access to logs for diagnostics
– Synchronisation against the GBIF Portal database
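The workflow the toolkit automates can be sketched as follows; this is illustrative Python, not the toolkit itself (which is a Java web application), and harvest()/index() are hypothetical placeholders.

    # Illustrative workflow sketch: take access points from the Registry,
    # harvest each one, record metrics, keep a diagnostic log per endpoint.
    import time

    def harvest(endpoint):
        # placeholder: fetch and parse records from a DiGIR/BioCASe/TAPIR/DwC-A access point
        return []

    def index(records):
        # placeholder: write interpreted records to the portal database/index
        pass

    def harvest_all(endpoints):
        metrics = {}
        for endpoint in endpoints:
            started = time.time()
            try:
                records = harvest(endpoint)
                index(records)
                metrics[endpoint] = {"records": len(records), "status": "ok"}
            except Exception as exc:
                print("harvest of %s failed: %s" % (endpoint, exc))   # diagnostics log
                metrics[endpoint] = {"records": 0, "status": "failed"}
            metrics[endpoint]["seconds"] = round(time.time() - started, 1)
        return metrics

    print(harvest_all(["http://data.example.org/digir", "http://data.example.org/dwca.zip"]))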

Harvesting and Indexing Toolkit

In production use at the GBIF Secretariat (GBIFS) only
Some external users are testing – collecting feedback now
In light of the GBIF review comments, we need to assess:
– The need for such a tool – requirements are being sought from the community
– The resources needed to meet community expectations (versioning, bug fixing, support, manuals)
Is it a library to aid developers rather than a product per se?
Remember: a homogeneous network does not require multi-protocol support and can be handled far more simply!

Data Portal
– Little functional development in recent months; bug-fixing activities only
– Continues to grow in content: Jan 2010: 196 million records; Aug 2010: 203 million records
– 2,500 to 3,000 visitors per day (plus web service use)
– US visitors account for approx. 22% of 2010 traffic; 2nd is the UK at 5%

Data Portal Evolutions
The Portal will evolve by end 2011:
– Improved taxonomic services and content, achieved through the Global Names Architecture
– Improved attribution and provenance, achieved by enhancing the Registry
– Improved occurrence indexing: scalable solution, richer fields, reduced latency, etc.
– Improved map visualisations
– Custom information feeds: abstracts, repatriation, records modified
– Improved dataset metadata: determining fitness for use

Portal evolution – Occurrence content (Currently → Roadmap)
– Limited to 250,000 records for download → Access to unlimited volume of export formats
– 23 Darwin Core properties available for search → Ability to support multiple indexes (common, marine, terrestrial, Plantae, etc.)
– 30 fields available on record detail → Full record detail visible
– Limited ability to determine fitness for use → Improved access to metadata where available; improvements in automated determination of fitness for use (spatial resolution)
– Poor understanding of the basis of record → Improvements in determining point- versus grid-based content, ex situ versus in situ records, etc.
– Limited spatial search → Provide means to access content through user-defined polygons
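The user-defined polygons item can be illustrated with a simple ray-casting point-in-polygon filter over occurrence coordinates; a production index would rely on proper spatial tooling, so this is only a sketch of the idea (the polygon and points below are arbitrary).

    # Illustrative point-in-polygon filter (standard ray-casting test).
    def point_in_polygon(lon, lat, polygon):
        # polygon is a list of (lon, lat) vertices
        inside = False
        j = len(polygon) - 1
        for i, (xi, yi) in enumerate(polygon):
            xj, yj = polygon[j]
            crosses = (yi > lat) != (yj > lat)
            if crosses and lon < (xj - xi) * (lat - yi) / (yj - yi) + xi:
                inside = not inside
            j = i
        return inside

    # Arbitrary triangular search area and hypothetical occurrence points (lon, lat).
    area = [(-5.0, 45.0), (15.0, 45.0), (5.0, 60.0)]
    occurrences = [(-3.7, 40.4), (12.6, 55.7), (2.3, 48.9)]
    print([p for p in occurrences if point_in_polygon(p[0], p[1], area)])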

Portal evolution – Taxonomic content and organisation (Currently → Roadmap)
– Synthesised taxonomy assembled from all content → Multiple taxonomic organisations
– Assembly methods of the synthesised taxonomy poorly documented → Rigorous documentation for taxonomic organisation
– Common name search limited → Many name sources used to enable common name search
– Limited comparison between taxonomies → Services to enable taxonomic comparison (overlap and contradiction)
– Limited services for external integration → Improved APIs for connecting external systems
– Few checklist sources included (4-5?) in the current data portal → 100s of checklists accessible NOW in the dev version, integrated into the new Data Portal

Portal evolution – Metadata, attribution and feedback (Currently → Roadmap)
– Metadata limited to contact information → Ability to use rich dataset metadata where available
– Only datasets with digitised records discoverable → Datasets described through metadata discoverable
– Attribution of data limited to provider and dataset → Better attribution of all parties involved in data publication
– Citation in exported data limited to datasets only → Prototype citation services
– Feedback delivered on a per-record basis → Annotation brokerage services

Checklist Bank – Status: in development; in use by the ALA (Atlas of Living Australia)

Data Portal Evolution
The Portal is more than just a discovery system. The Portal will be a hub that allows:
a) Data custodians to
– Register the existence of biodiversity data sources
– Publish their content and, in addition, rich information about the content (e.g. metadata documenting assembly methods)
– Subscribe to annotations made against their content
– Subscribe to information about the usage of their content
– Access services of interest to them (e.g. quality control)

Data Portal Evolution
b) Users to
– Search content in real time, through various customised search options (e.g. terrestrial plants, marine mammals, natural history collections, protected areas)
– Browse content taxonomically, temporally, geographically, etc.
– Define and run reports (not real time) to extract a data subset or derive metrics
– Subscribe to customised information feeds (e.g. modified Pinaceae specimens in Australia; see the sketch below)
– Publish annotations related to record quality, or assertions about a record (e.g. confirmed suitable for 100 km modelling)
– Build better information systems that utilise services offered by the Portal
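A toy sketch of such a customised feed filter; the record structure, field names and values are invented for illustration.

    # Sketch: filter newly modified records that match a user's subscription.
    from datetime import date

    subscription = {"family": "Pinaceae", "country": "Australia"}
    last_checked = date(2010, 7, 1)

    records = [
        {"family": "Pinaceae", "country": "Australia", "modified": date(2010, 8, 2),
         "scientificName": "Pinus radiata"},
        {"family": "Myrtaceae", "country": "Australia", "modified": date(2010, 8, 3),
         "scientificName": "Eucalyptus regnans"},
    ]

    feed = [r for r in records
            if r["modified"] > last_checked
            and all(r.get(k) == v for k, v in subscription.items())]

    for item in feed:
        print(item["scientificName"], item["modified"].isoformat())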

Nodes Portal Toolkit
A customisable toolkit for deploying a national/regional/thematic discovery portal
The technical advisory group for the NPT recommends:
– Fully engaging the NODES community in the design, development, testing and deployment of the NPT
– Ensuring tight integration of the NPT with the GBIF informatics infrastructure, while benefiting from a wide array of additional biodiversity-related web services
– Adopting an open source content management platform such as Drupal, upon which to build and develop specific NPT modules (specifically those for integration, visualisation and access of biodiversity-related data and information)
A call for an NPT coordinator is currently in draft; ViBRANT funds will support it

Communications
Participant forums – launch 13 August

Communications
– Consolidate technical documentation
– RSS feeds
– More updates

Secretariat Tech Capacity
Resources – 3 current openings for Java developers:
– Vocabularies/Ontology Developer (30 months) – ViBRANT
– Taxonomic Publishing Developer (18 months) – i4Life / Catalogue of Life

Summary: Informatics Targets
Data flow: data custodians → Registry → harvesters → processing → indexes → APIs (user / machine) → clients
Refined end-to-end workflows for:
– Point-based occurrences
– Grid-based occurrences
– Checklists
– Dataset metadata