GLOBAL BIODIVERSITY INFORMATION FACILITY
Hannu Saarenmaa
Norwegian GBIF meeting, Oslo, 25 September 2003
WWW.GBIF.ORG

The GBIF Information System

GBIF is a global integrator

Outline
1. Data and its use
2. Software architecture
3. Building the network
4. Status – where are we?

GBIF is concerned with primary biodiversity data
The pyramid of information
- Policy and decisions can benefit from
- Knowledge and
- Information, which depend on
- Primary data
[Diagram labels: Refinement, analysis, synthesis; GBIF area of responsibility]

What are GBIF’s primary data?
- Associated notes, recordings, observational databases, etc.
- These data must be digitised in order to be shared and fully utilised
- Modern observation data is often digital when created
- Point data is the basis of analysis and synthesis
- Label data on ~1 billion specimens in natural history collections

How can the primary point data becoming available through the GBIF network be used?
Some examples (based on data in REMIB and the Species Analyst network; courtesy of J. Soberon & T. Peterson)

Building Maps of Species Diversity (with Daniel A. Kluza)
[Map: Reserve Locations in Southwestern Mexican Dry Forest. Legend: primary concentration of endemic species (12); secondary concentration (4 species)]

Predicting Species Invasions: Asian Longhorn Beetle

Invasive Species and Endangered Species
Barred Owls invading the range of Spotted Owls

Predicting the Effects of Global Climate Change
Ortalis poliocephala: before (green) vs. after (red)

Canada Butterflies – Current Species Richness

HSDX 2020 prediction

Compare Maximum Species Richness: the present compared with the HSDX 2020 prediction
[Map panels: Present; 2020]

Software architecture
GBIF is building a distributed network of databases using a web services approach.

Information model
[Diagram: GBIF Portal with a Services Registry and a Biodiversity Data Index]
- The Services Registry holds metadata for Nodes (Participant Nodes, Data Nodes) and for Services (Taxonomic Name Service, Specimen/Observation Service, General Resource Service, Name List Service, …)
- The Biodiversity Data Index provides an index of Records (Taxonomic Names, Specimen/Observation Records, HTML Pages, Images, …)
- Nodes provide Services; Services supply Records
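One reading of the information model above can be sketched as a small set of Python data classes. This is only an illustration: the class names, field names, and the two helper functions are assumptions made for this sketch and are not part of any GBIF software.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Record:
    """One item supplied by a service, e.g. a specimen record or a name."""
    record_id: str
    kind: str  # e.g. "specimen", "observation", "name"


@dataclass
class Service:
    """A service offered by a node, e.g. a specimen/observation service."""
    name: str
    records: List[Record] = field(default_factory=list)


@dataclass
class Node:
    """A participant node or data node that provides services."""
    name: str
    services: List[Service] = field(default_factory=list)


def registry_metadata(nodes: List[Node]) -> Dict[str, List[str]]:
    """Registry view: which node offers which services (metadata only)."""
    return {node.name: [s.name for s in node.services] for node in nodes}


def build_index(nodes: List[Node]) -> Dict[str, Tuple[str, str]]:
    """Index view: record id -> (node name, service name)."""
    index = {}
    for node in nodes:
        for service in node.services:
            for record in service.records:
                index[record.record_id] = (node.name, service.name)
    return index
```

The split mirrors the slide: the registry only describes nodes and services, while the index points at individual records.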

Data exchange standards are the key
Data description in XML
- Specimen, observation
- Name, taxon
- Institutions, providers, collections, and persons in various roles
Standards process
- GBIF works with TDWG
- Discussion, documentation
- Open source: digir.sourceforge.net
Standards for protocols and data exchange
- DiGIR/Darwin Core
- ABCD/BioCASE
- Dublin Core
- SOAP
- Grid OGSA

The DiGIR protocol
- XML messaging on top of HTTP
- Enables a single point of access (portal/search) to distributed information resources
- Enables search and retrieval of structured data
- Makes the location and technical characteristics of the native resource transparent to the user
- The Distributed Generic Information Retrieval protocol was created by the TDWG/CODATA subgroup on biological collection data
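The idea of XML messaging for search and retrieval can be illustrated with a minimal sketch using Python's standard `xml.etree.ElementTree`. The element and attribute names (`request`, `resource`, `search`, `equals`, `record`) are invented for this example and are not the actual DiGIR schema; no network call is made.

```python
import xml.etree.ElementTree as ET


def build_search_request(resource, field_name, value):
    """Build an illustrative XML search message (not the real DiGIR schema)."""
    request = ET.Element("request")
    ET.SubElement(request, "resource").text = resource
    search = ET.SubElement(request, "search")
    equals = ET.SubElement(search, "equals", {"field": field_name})
    equals.text = value
    return ET.tostring(request, encoding="unicode")


def parse_search_response(xml_text):
    """Turn each <record> element of a response into a dictionary."""
    root = ET.fromstring(xml_text)
    return [
        {child.tag: child.text for child in record}
        for record in root.iter("record")
    ]
```

In a real deployment the request string would be sent to a provider over HTTP and the response parsed the same way; the portal never needs to know what database sits behind the provider.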

A simple DiGIR architecture
[Diagram: databases; DiGIR providers; portals, search engines, and applications]

GBIF DiGIR Architecture
[Diagram]
Components: User; Portal (Request Marshaller, Query Engine, Available providers); Registry (Institutions, Providers, Services; UDDI); Index (Cache, Metadata, Accounting); Data provider (Provider Services, Resource Metadata); Name provider (Provider Services, Resource Metadata)
Message flows: Provider query; Metadata and name query; Metadata response; Full data query; Full data response; Metadata and logs; Synonyms, GUIDs; Publish availability
Protocols: SOAP, DiGIR

How does the GBIF UDDI registry work?
[Diagram: GBIF UDDI Registry holding Services Registrations and Provider Registrations]
1) GBIF Secretariat and other developers create and populate the registry with descriptions of standards (tModels)
2) Museums and other data providers install data provider packages which are automatically registered
3) The GBIF Participant is notified of a new provider in their domain, for endorsement as a GBIF data provider
4) A global index queries the registry, caches metadata, and creates a unique identifier for each record (and name)
5) Specialised portals and search engines can be built to query the registry and the index
6) Scientists, decision-makers, and others use portals to build data sets for analysis and synthesis
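Steps 2 to 4 of this workflow can be sketched in a few lines of Python. This is a toy stand-in, not a UDDI implementation: the `Registry` class, its method names, the `provider:local_id` identifier format, and the choice to index only endorsed providers are all assumptions made for illustration.

```python
class Registry:
    """Toy stand-in for the services registry (not a UDDI implementation)."""

    def __init__(self):
        # provider name -> {"participant": ..., "endorsed": bool}
        self.providers = {}

    def register(self, provider, participant):
        # Step 2: a newly installed provider package registers itself.
        self.providers[provider] = {"participant": participant, "endorsed": False}

    def pending_for(self, participant):
        # Step 3: providers that a Participant still has to review.
        return sorted(
            name for name, info in self.providers.items()
            if info["participant"] == participant and not info["endorsed"]
        )

    def endorse(self, provider):
        # Step 3: the Participant endorses the provider.
        self.providers[provider]["endorsed"] = True

    def endorsed(self):
        return sorted(n for n, info in self.providers.items() if info["endorsed"])


def index_records(registry, local_ids_by_provider):
    # Step 4: walk the registry and give every record a global identifier.
    # Assumption of this sketch: only endorsed providers are indexed.
    index = {}
    for provider in registry.endorsed():
        for local_id in local_ids_by_provider.get(provider, []):
            index[f"{provider}:{local_id}"] = provider
    return index
```

Combining the provider name with the provider's own record number is one simple way to keep identifiers unique across the network while preserving the source of each record.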

Metadata and names index
- Closely paired with the services registry will be a global index of the available data
- Retrieves metadata about the datasets/resources available from the registered providers
- Indexes on scope and coverage of datasets/resources (Dublin Core registry)
  - Taxonomic, spatial, temporal, ...
- Maintains a cache of key data in case a provider goes off-line
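The caching behaviour in the last bullet can be sketched as follows. The `CachingIndex` class and the use of `ConnectionError` to signal an off-line provider are assumptions of this example; the real index design is not described in the slides.

```python
class CachingIndex:
    """Keeps the last metadata seen from each provider as a fallback."""

    def __init__(self):
        self._cache = {}

    def fetch(self, provider_name, fetch_metadata):
        """Return (metadata, is_fresh).

        `fetch_metadata` is a callable that contacts the provider and
        raises ConnectionError when the provider cannot be reached.
        """
        try:
            metadata = fetch_metadata()
        except ConnectionError:
            if provider_name in self._cache:
                # Provider is off-line: serve the cached copy instead.
                return self._cache[provider_name], False
            raise  # nothing cached yet, so the failure is passed on
        self._cache[provider_name] = metadata
        return metadata, True
```

Returning the `is_fresh` flag lets a portal tell users when they are looking at cached rather than live data.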

Logging and accounting
- Track the usage of the network and document the data provided by the nodes.
- Why?
  - Recognise the efforts of the data providers
  - Help the users to acknowledge the sources of the data they are using
  - Report back to the Participants whether the GBIF network is really used
  - Optimise network performance and services
- How?
  - Central accounting service provides statistics of usage to each data provider
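A central accounting service of this kind could, in its simplest form, be a pair of counters per provider. The class and method names below are hypothetical and only illustrate the idea of logging each query and reporting totals back to the provider.

```python
from collections import Counter


class AccountingService:
    """Counts queries answered and records served, per data provider."""

    def __init__(self):
        self._queries = Counter()
        self._records = Counter()

    def log(self, provider, records_returned):
        # Called once for every query a provider has answered.
        self._queries[provider] += 1
        self._records[provider] += records_returned

    def report(self, provider):
        # Usage statistics that can be sent back to the provider.
        return {
            "queries": self._queries[provider],
            "records": self._records[provider],
        }
```

A provider that has never been queried simply gets a report of zeros, because `Counter` returns 0 for missing keys.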

Name Service is a major component of the global index
[Diagram]
GBIF central services: Biodiversity Data Portal; Index; Index Manager; Taxonomic Name Service (ECAT), linked to the Catalogue of Life; Indexing of usage
GBIF Data Nodes: Specimen Data; Observation Data; Name Lists; Unstructured Data (URLs, HTML/XML); Data Access

Data provider software
- Each system entails
  - Provider software
  - Communication with the DiGIR protocol
  - Data standards: Darwin Core, Dublin Core
  - Configuration for each resource (local existing database)
  - Registration with the GBIF UDDI registry
- Turn-key package for easy installation
  - Based on PHP and digir.sourceforge.net code
  - Packaged and supported by GBIF
  - Available now for Linux and Windows
  - Installs automatically

Sharing of biodiversity data is not always easy...
- Taxonomists often record their data in a spreadsheet, word processor, etc.
- Data sets become orphans
- Giving data to somebody else to manage is not an easy decision, and updates are problematic
- Management of an online database requires resources and knowledge
- Goal: make available a simple tool for sharing data without a database

Data is commonly entered in spreadsheet files...

GBIF Data Repository Tool
- A simple tool to enable sharing of small, scattered datasets
- Users upload and manage datasets in a document format such as a) spreadsheet, b) embedded Darwin Core, or c) ABCD
- The system parses the data into an embedded MySQL database that becomes available to the public as a DiGIR resource
- The user can revoke a release (the data is deleted from the database)
- Stand-alone package or module of the GBIF Portal Toolkit
- For Linux and Windows, based on Python and Zope
- Includes automatic registration in the GBIF registry
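The upload, parse, and revoke cycle can be sketched with Python's standard library. This is a minimal illustration under stated assumptions, not the tool itself: it uses SQLite (via `sqlite3`) in place of the embedded MySQL database named on the slide, and the table name, column headers, and function names are made up for the example.

```python
import csv
import io
import sqlite3


def publish(connection, dataset, csv_text):
    """Parse spreadsheet data exported as CSV and store it; return row count."""
    connection.execute(
        "CREATE TABLE IF NOT EXISTS occurrence "
        "(dataset TEXT, scientific_name TEXT, latitude REAL, longitude REAL)"
    )
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = [
        (dataset, r["ScientificName"], float(r["Latitude"]), float(r["Longitude"]))
        for r in reader
    ]
    connection.executemany("INSERT INTO occurrence VALUES (?, ?, ?, ?)", rows)
    connection.commit()
    return len(rows)


def revoke(connection, dataset):
    """Withdraw a release: delete the dataset's rows; return rows deleted."""
    cursor = connection.execute(
        "DELETE FROM occurrence WHERE dataset = ?", (dataset,)
    )
    connection.commit()
    return cursor.rowcount
```

Tagging every row with its dataset name is what makes revocation a single `DELETE`; the sample coordinates used when trying this out can be any made-up values.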

GBIF Data Repository Tool

Portals
- Portals are gateways to distributed information resources
- You do not need your own portal in order to become a data provider
  - Just access to one that talks to a registry
- Anybody can write their own specialised portal/search tool that uses the registry and the index through their open interfaces (DiGIR, SOAP)
- The MANIS portal is available now (Java)
- GBIF Portal Toolkit v2, which can be used to access data, is planned for availability in Q1/2004
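The core of such a specialised portal is a fan-out search: one query goes to every known provider and the answers are merged. The sketch below shows that pattern only; the function name and the representation of providers as Python callables are assumptions, and a real portal would obtain the provider list from the registry and make HTTP requests instead.

```python
def federated_search(providers, query):
    """Send one query to every provider and merge the answers.

    `providers` maps a provider name to a callable that takes the query
    and returns a list of record dictionaries. A callable raises
    ConnectionError when its provider cannot be reached.
    """
    merged, failed = [], []
    for name, search in providers.items():
        try:
            records = search(query)
        except ConnectionError:
            failed.append(name)  # provider off-line: skip it and carry on
            continue
        for record in records:
            # Tag each record with its source so it can be acknowledged.
            merged.append({**record, "source": name})
    return merged, failed
```

Returning the list of failed providers alongside the results lets the portal report which sources were unavailable instead of silently presenting an incomplete answer.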

GBIF Portal Toolkit
Communications portal (version 1), released at the end of 2002, and as a portal toolkit (PTK) for use by nodes
- News syndication with RSS/RDF
- Events, calendar of calendars, projects
- Articles, documents, images, audio and video content
- Search within the site, across the GBIF network
- Download area
- Getting started service and how to become a node
- About GBIF
- Integration with CIRCA-based group collaboration services
- Integration with directory services (CIRCA-based open LDAP)
- Suggestions and feedback from users
- Prototype data repository tool
Data access portal (version 2), Q1/2004
- Registry
- Access to primary biodiversity data derived from the central index
- Accounting service of use of data
- Links to Participant nodes and their content

Building the network
Building a data network also requires building a human network of collaboration. Data is served by providers through the nodes, which act as conduits.

GBIF node responsibilities
[Diagram with boxes: GBIF Registry, Index, and Portal; Participant Node; Portal; Data Node. The numbered lists attached to the boxes read:]
- 1. Network 2. Registry 3. Standards 4. Tools
- 1. Encourage participation 2. Manage registration of Data Nodes
- 1. Coordination 2. Network 3. Registry 4. Standards 5. Tools 6. Consolidated Data
- 1. Metadata 2. Data
- 1. Identify Data Nodes 2. Endorse and quality-assure Data Nodes 3. National Language Interfaces

Each Participant Node coordinates its Provider Network
- Participant Nodes are in the key position to promote and assist in including new data providers and data sets
- Building a data network requires building a human network
- The NODES Committee
  - Comprises the managers of the Participant nodes
  - Works with the Information and Communications Technology (ICT) staff of the Secretariat to develop the network of nodes
  - Maintains a global directory of people, roles, and data providers
  - Shares best practices, experiences and ideas
  - Shares software tools

Participants may choose different architectures
[Diagram labels: Decentralised; Centralised; Participant Portal A; Participant Portal B; Participant Portal C; Data Warehouse (twice); GBIF Portal; GBIF Registry; GBIF Index]

Decentralised model: Pros & cons
Pros
- Contributors in full control of the data they choose to publish
- Most current and accurate version of data likely to be available
- Contributors develop a sense of ownership of the process
- Contributors develop a commitment to principles of long-term data management
- Potential number of data nodes unlimited
Cons
- Requires more human and material resources
- Requires stable network connections; in reality it is impossible to keep a large number of providers online at all times
- Security requirements
- Requires strong integrative services

Centralised model: Pros & cons
Pros
- Cost-effective
- Performance and availability are more controlled
- Short-term solution for rapid start-up
- Management of orphaned data sets is easier
Cons
- Risk of losing local control of data
- Less buy-in from data providers
- Difficulty keeping information current
- Requires extensive design and planning

Possible tools for Participant nodes
- Registry tools to endorse institutions and data providers
- Access to the central UDDI registry
- Local directory server or UDDI server
- Directory of people, collections, institutions and related communication tools
- Portal server for domain-specific website
- National language support as needed
- Data warehouse to host data from those nodes willing to share but unable to do so
- Tools for quality assurance

Training
- Training programme is being shaped
- 7 regional workshops in 2003 on "Becoming a GBIF data provider"
- Stockholm, Ottawa, Tsukuba, Lisbon, San José, Africa, "francophonie"
- Secretariat works through the Participant nodes, therefore:
- "Train the trainer" concept
- Certification of a cadre of trainers
- Standardised tools and materials

Helpdesk
- For all operational services
- Ticket handling, follow-up
- Will be geographically distributed
- For "GBIF-approved packages"

Why share data through GBIF?
- The value of data is in its use
- Data that potential users are not aware of or cannot access is of little or no value.
- Currently, a significant proportion of biodiversity data is under-utilised because potential users are not aware of its existence or cannot access it
- Increased awareness and utilisation of existing species-level biodiversity data highlights the importance of natural history collections and observational data.
- This in turn increases the recognition of the importance of the associated work and will in the longer term increase funding opportunities.
- Synergistic effects in combining data: 1+1>2
- Exposing information leads to improved quality
- Feedback and data cleansing

Why you can be comfortable with sharing data through the GBIF network
- GBIF IPR principles keep you in control
- The identity of each record will be maintained and will highlight the source of the data
- User and provider agreements
- Usage will be logged and statistics provided
- The efforts of data providers will be recognised
- Users are required to acknowledge the sources of the data they are using
- Providers will be informed about where their data is used

Conclusion

GBIF network status
- NODES committee set its goal to have a DiGIR network up and running by the end of 2003, with integration of the BioCASE network to follow
- Seven regional workshops and training events
- Two DiGIR provider implementations available August 2003
- UDDI registry up and running July 2003
- Global index Q4/2003
- Portal to browse and search data Q4/2003, toolkit Q1/2004