Published by Brice Chase. Modified over 9 years ago.
1
Global Biodiversity Information Facility
The GBIF Information System
Hannu Saarenmaa
Norwegian GBIF meeting, Oslo, 25 September 2003
WWW.GBIF.ORG
2
GBIF is a global integrator
3
Outline
1. Data and its use
2. Software architecture
3. Building the network
4. Status – where are we?
4
GBIF is concerned with primary biodiversity data
The pyramid of information
- Policy and decisions can benefit from
- Knowledge and
- Information, which depend on
- Primary data
[Diagram labels: Refinement, analysis, synthesis; GBIF area of responsibility]
5
What are GBIF’s primary data?
- Label data on ~1 billion specimens in natural history collections
- Associated notes, recordings, observational databases, etc.
- These data must be digitised in order to be shared and fully utilised
- Modern observation data is often digital when created
- Point data is the basis of analysis and synthesis
6
How can the primary point data becoming available through the GBIF network be used?
Some examples (based on data in REMIB and the Species Analyst network; courtesy of J. Soberon & T. Peterson)
7
Building Maps of Species Diversity (with Daniel A. Kluza)
Reserve Locations in Southwestern Mexican Dry Forest
[Map legend: primary concentration of endemic species (12); secondary concentration (4 species)]
8
Predicting Species Invasions - Asian Longhorn Beetle
9
Invasive Species and Endangered Species
Barred Owls invading the range of Spotted Owls
10
Predicting the Effects of Global Climate Change
Ortalis poliocephala: before (green) vs. after (red)
11
Canada Butterflies – Current Species Richness
12
HSDX 2020 prediction
13
Compare Maximum Species Richness: the present compared with the HSDX 2020 prediction
[Map panels: Present; 2020]
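The richness maps on these slides are built from primary point data. A minimal sketch of that idea, assuming records arrive as (species, latitude, longitude) tuples and that a one-degree grid is acceptable; the function names, the grid resolution, and the sample points are illustrative assumptions, not part of the presentation:

```python
from collections import defaultdict
from math import floor


def species_richness(records, cell_size=1.0):
    """Count distinct species per grid cell.

    records: iterable of (species, latitude, longitude) tuples.
    cell_size: cell edge length in decimal degrees (assumed resolution).
    """
    cells = defaultdict(set)
    for species, lat, lon in records:
        # Index of the cell containing the point; floor also handles negatives.
        key = (floor(lat / cell_size), floor(lon / cell_size))
        cells[key].add(species)
    return {key: len(names) for key, names in cells.items()}


def richness_change(present, predicted):
    """Per-cell difference (predicted minus present) over the union of cells."""
    keys = set(present) | set(predicted)
    return {k: predicted.get(k, 0) - present.get(k, 0) for k in keys}


# Made-up points, for illustration only.
sample = [
    ("Papilio canadensis", 45.2, -75.7),
    ("Danaus plexippus", 45.9, -75.1),
    ("Danaus plexippus", 45.5, -75.3),
    ("Vanessa cardui", 49.3, -123.1),
]
richness = species_richness(sample)
```

Comparing a present-day grid with a predicted one then reduces to `richness_change(present, predicted)`; how the predicted grid is obtained (the climate model on the slides) is outside this sketch.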
14
Software architecture
GBIF is building a distributed network of databases using a web services approach
15
Information model
[Diagram labels]
- GBIF Portal: Services Registry; Biodiversity Data Index
- Nodes: Participant Nodes; Data Nodes
- Services: Taxonomic Name Service; Specimen/Observation Service; General Resource Service; Name List Service; …
- Records: Taxonomic Names; Specimen/Observation Records; HTML Pages; Images; …
- Relation labels: holds metadata for; provides index of; holds metadata for; provide; supply
16
Data exchange standards are the key
Data description in XML
- Specimen, observation
- Name, taxon
- Institutions, providers, collections, and persons in various roles
Standards process
- GBIF works with TDWG
- Discussion, documentation
- Open source: digir.sourceforge.net
Standards for protocols and data exchange
- DiGIR/Darwin Core
- ABCD/BioCASE
- Dublin Core
- SOAP
- Grid OGSA
17
The DiGIR protocol
- XML messaging on top of HTTP
- Enables single point of access (portal/search) to distributed information resources
- Enables search & retrieval of structured data
- Makes location and technical characteristics of the native resource transparent to the user
- The Distributed Generic Information Retrieval (DiGIR) protocol was created by the TDWG/CODATA subgroup on biological collection data
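To make "XML messaging on top of HTTP" concrete, here is a rough sketch of building a search message and reading a reply with Python's standard library. The element names (`request`, `filter`, `equals`, `field`, `record`), the endpoint URL, and the reply are assumptions made up for illustration; they are not the actual DiGIR schema, and no network call is made:

```python
import xml.etree.ElementTree as ET


def build_search_request(destination, filters, fields):
    """Build an illustrative XML search message (not the real DiGIR schema)."""
    request = ET.Element("request")
    header = ET.SubElement(request, "header")
    ET.SubElement(header, "destination").text = destination
    search = ET.SubElement(request, "search")
    flt = ET.SubElement(search, "filter")
    for concept, value in filters.items():
        # One equality condition per requested concept.
        ET.SubElement(flt, "equals", {"concept": concept}).text = value
    structure = ET.SubElement(search, "structure")
    for concept in fields:
        # Fields the portal wants back for every matching record.
        ET.SubElement(structure, "field", {"concept": concept})
    return ET.tostring(request, encoding="unicode")


def parse_response(xml_text):
    """Turn an illustrative XML response into a list of dicts."""
    root = ET.fromstring(xml_text)
    return [{child.tag: child.text for child in rec} for rec in root.iter("record")]


message = build_search_request(
    "http://provider.example.org/digir",  # hypothetical endpoint
    {"ScientificName": "Ortalis poliocephala"},
    ["ScientificName", "Latitude", "Longitude"],
)
reply = (  # made-up reply, standing in for what a provider might send back
    "<response><record><ScientificName>Ortalis poliocephala</ScientificName>"
    "<Latitude>18.5</Latitude><Longitude>-99.2</Longitude></record></response>"
)
records = parse_response(reply)
```

In a real deployment the message would travel as the body or query string of an HTTP request; the portal only ever sees the structured reply, which is what keeps the native database transparent to the user.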
18
A simple DiGIR architecture
[Diagram labels: Portals, search engines, and applications; DiGIR providers; Databases]
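The three tiers in this diagram can be sketched as a portal that sends one query to several providers, each wrapping its own database, and merges the answers. The classes below are in-memory stand-ins invented for illustration (no DiGIR code is involved), and the sample rows are made up:

```python
class Provider:
    """Stand-in for a DiGIR provider wrapping one local database."""

    def __init__(self, name, rows, online=True):
        self.name = name
        self.rows = rows        # the "database": a list of dict records
        self.online = online

    def search(self, scientific_name):
        if not self.online:
            raise ConnectionError(f"{self.name} is unreachable")
        return [r for r in self.rows if r["ScientificName"] == scientific_name]


class Portal:
    """Single point of access that fans a query out to every provider."""

    def __init__(self, providers):
        self.providers = providers

    def search(self, scientific_name):
        hits, failed = [], []
        for provider in self.providers:
            try:
                for row in provider.search(scientific_name):
                    # Tag each record with its source so it can be acknowledged.
                    hits.append({**row, "source": provider.name})
            except ConnectionError:
                failed.append(provider.name)
        return hits, failed


museum_a = Provider("museum_a", [{"ScientificName": "Strix varia", "Country": "US"}])
museum_b = Provider("museum_b", [
    {"ScientificName": "Strix varia", "Country": "CA"},
    {"ScientificName": "Strix occidentalis", "Country": "US"},
])
museum_c = Provider("museum_c", [], online=False)
hits, failed = Portal([museum_a, museum_b, museum_c]).search("Strix varia")
```

Returning the list of unreachable providers alongside the hits is a design choice of this sketch: it lets the portal tell the user that the result set may be incomplete.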
19
GBIF DiGIR Architecture
[Diagram labels]
- Components: User; Portal; Request Marshaller; Query Engine; Available providers; Registry (UDDI): Institutions, Providers, Services; Index; Cache; Metadata; Accounting; Data provider; Name provider; Provider Services; Resource Metadata
- Message flows: Provider query; Metadata and name query; Metadata response; Full data query; Full data response; Metadata and logs; Synonyms, GUIDs; Publish availability
- Protocols: SOAP; DiGIR
20
How does the GBIF UDDI registry work?
[Diagram labels: GBIF UDDI Registry; Services Registrations; Provider Registrations]
1) GBIF Secretariat and other developers create and populate the registry with descriptions of standards (tModels)
2) Museums and other data providers install data provider packages which are automatically registered
3) GBIF Participant is notified of new provider in their domain, for endorsement as a GBIF data provider
4) A global index queries the registry, caches metadata, and creates a unique identifier for each record (and name)
5) Specialised portals and search engines can be built to query the registry and the index
6) Scientists, decision-makers, and others use portals to build data sets for analysis and synthesis
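Steps 1 to 4 of this workflow can be sketched with two small in-memory classes. This is not UDDI: the class and method names are hypothetical, the notification in step 3 is reduced to a returned string, and deriving identifiers with `uuid.uuid5` is an assumption of the sketch, since the slides do not say how GBIF generates them:

```python
import uuid


class Registry:
    def __init__(self):
        self.tmodels = {}     # step 1: descriptions of standards
        self.providers = {}   # step 2: provider registrations

    def add_tmodel(self, name, description):
        self.tmodels[name] = description

    def register_provider(self, name, participant, tmodel):
        if tmodel not in self.tmodels:
            raise ValueError(f"unknown standard: {tmodel}")
        self.providers[name] = {
            "participant": participant, "tmodel": tmodel, "endorsed": False,
        }
        # Step 3: this notice stands in for notifying the Participant.
        return f"{participant}: provider '{name}' awaits endorsement"

    def endorse(self, name):
        self.providers[name]["endorsed"] = True

    def endorsed_providers(self):
        return [n for n, p in self.providers.items() if p["endorsed"]]


class GlobalIndex:
    """Step 4: reads the registry and gives every record an identifier."""

    def __init__(self, registry):
        self.registry = registry
        self.records = {}     # identifier -> (provider, local id)

    def ingest(self, provider, local_ids):
        if provider not in self.registry.providers:
            raise KeyError(provider)
        ids = []
        for local_id in local_ids:
            # Deterministic identifier: re-indexing a record yields the same id.
            guid = str(uuid.uuid5(uuid.NAMESPACE_URL, f"{provider}/{local_id}"))
            self.records[guid] = (provider, local_id)
            ids.append(guid)
        return ids


registry = Registry()
registry.add_tmodel("DiGIR", "Distributed Generic Information Retrieval")
notice = registry.register_provider("oslo_museum", "Norway", "DiGIR")
registry.endorse("oslo_museum")
index = GlobalIndex(registry)
ids = index.ingest("oslo_museum", ["A1", "A2"])
```

Portals and search engines (steps 5 and 6) would then read `registry.endorsed_providers()` and `index.records` rather than contacting every museum themselves.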
21
GBIF DiGIR Architecture (diagram repeated from slide 19)
22
Metadata and names index
- Closely paired with the services registry will be a global index of the available data
- Retrieves metadata about the datasets/resources available from the registered providers
- Indexes on scope and coverage of datasets/resources (Dublin Core registry)
  - Taxonomic, spatial, temporal, ...
- Maintains a cache of key data in case a provider goes off-line
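A minimal sketch of such an index, assuming each resource is described by its taxonomic, spatial, and temporal coverage. The field names, the bounding-box representation, and the `fetch` callback are illustrative assumptions; the point is that a failed harvest leaves the previously cached entry in place:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ResourceMetadata:
    provider: str
    taxa: frozenset    # taxonomic coverage
    bbox: tuple        # spatial coverage: (min_lat, min_lon, max_lat, max_lon)
    years: tuple       # temporal coverage: (first_year, last_year)


class MetadataIndex:
    def __init__(self):
        self._cache = {}   # provider -> last metadata successfully harvested

    def harvest(self, provider, fetch):
        """Refresh one entry; keep the cached copy if the provider is off-line."""
        try:
            self._cache[provider] = fetch()
            return True
        except ConnectionError:
            return False

    def find(self, taxon=None, point=None, year=None):
        """Return resources whose coverage matches every criterion given."""
        results = []
        for meta in self._cache.values():
            if taxon is not None and taxon not in meta.taxa:
                continue
            if point is not None:
                lat, lon = point
                min_lat, min_lon, max_lat, max_lon = meta.bbox
                if not (min_lat <= lat <= max_lat and min_lon <= lon <= max_lon):
                    continue
            if year is not None and not (meta.years[0] <= year <= meta.years[1]):
                continue
            results.append(meta)
        return results
```

A portal can call `find` first to decide which providers are worth querying at all, which is one way the index could reduce load on the network.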
23
Logging and accounting
- Track the usage of the network and document the data provided by the nodes
- Why?
  - Recognise the efforts of the data providers
  - Help the users to acknowledge the sources of the data they are using
  - Report back to the Participants whether the GBIF network is really used
  - Optimise network performance and services
- How?
  - Central accounting service provides statistics of usage to each data provider
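One way to picture the central accounting service is as a set of counters keyed by provider. The sketch below is an assumption about what such a service might record (queries, records served, distinct users); the class and method names are hypothetical:

```python
from collections import Counter, defaultdict


class AccountingService:
    def __init__(self):
        self._queries = Counter()          # provider -> number of queries
        self._records_served = Counter()   # provider -> records returned
        self._users = defaultdict(set)     # provider -> users who got data

    def log(self, user, provider, record_count):
        """Record one query answered by a provider for a user."""
        self._queries[provider] += 1
        self._records_served[provider] += record_count
        self._users[provider].add(user)

    def report(self, provider):
        """Usage statistics to send back to one data provider."""
        return {
            "queries": self._queries[provider],
            "records_served": self._records_served[provider],
            "distinct_users": len(self._users.get(provider, ())),
        }

    def sources_used_by(self, user):
        """Providers a user should acknowledge."""
        return sorted(p for p, users in self._users.items() if user in users)


accounting = AccountingService()
accounting.log("alice", "oslo_museum", 120)
accounting.log("bob", "oslo_museum", 30)
accounting.log("alice", "bergen_museum", 5)
```

The same log serves both purposes named on the slide: `report` feeds statistics back to providers, and `sources_used_by` helps users cite the data they used.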
24
Name Service is a major component of the global index
[Diagram labels: Catalogue of Life; Biodiversity Data Portal; Index; Taxonomic Name Service (ECAT); Specimen Data; Observation Data; Name Lists; Unstructured Data; URLs; HTML/XML; Data Access; GBIF central services; Indexing of usage; Index Manager; GBIF Data Nodes]
25
GBIF DiGIR Architecture (diagram repeated from slide 19)
26
Data provider software
- Each system entails
  - Provider software
  - Communication with the DiGIR protocol
  - Data standards: Darwin Core, Dublin Core
  - Configuration for each resource (local existing database)
  - Registration with GBIF UDDI registry
- Turn-key package for easy installation
  - Based on PHP and digir.sourceforge.net code
  - Packaged and supported by GBIF
  - Available now for Linux and Windows
  - Installs automatically
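The "configuration for each resource" step essentially maps columns of an existing local database onto shared concept names. A hedged sketch of that mapping follows; the configuration layout, the resource and column names, and the helper functions are invented for illustration and do not reproduce the format used by the actual provider package:

```python
# Hypothetical configuration for one resource (one local database table).
RESOURCE_CONFIG = {
    "resource": "herbarium_specimens",   # made-up resource code
    "table": "specimens",                # made-up local table name
    "mapping": {                         # shared concept -> local column
        "ScientificName": "taxon_name",
        "Collector": "leg",
        "Country": "country",
    },
}


def missing_columns(local_row, config):
    """List mapped local columns that the row does not provide."""
    return sorted(c for c in config["mapping"].values() if c not in local_row)


def to_standard_record(local_row, config):
    """Rename local columns to the shared concept names."""
    absent = missing_columns(local_row, config)
    if absent:
        raise KeyError(f"row lacks mapped columns: {absent}")
    return {concept: local_row[column]
            for concept, column in config["mapping"].items()}


row = {"taxon_name": "Betula pubescens", "leg": "A. Hansen", "country": "Norway",
       "shelf": "B12"}   # unmapped local columns are simply not published
record = to_standard_record(row, RESOURCE_CONFIG)
```

Keeping the mapping in configuration rather than code is what lets one provider package serve many differently structured local databases.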
27
Sharing of biodiversity data is not always easy...
- Taxonomists often record their data in spreadsheets, word processors, etc.
- Data sets become orphans
- Giving data to somebody else to manage is not an easy decision, and updates are problematic
- Management of an online database requires resources and knowledge
- Goal: make available a simple tool for sharing data without a database
28
Data is commonly entered in spreadsheet files...
29
GBIF Data Repository Tool
- A simple tool to enable sharing of small, scattered datasets
- Users upload and manage datasets in a document format such as a) spreadsheet, b) embedded Darwin Core, or c) ABCD
- System parses the data into an embedded MySQL database that becomes available to the public as a DiGIR resource
- User can revoke release (data is deleted from database)
- Stand-alone package or module of GBIF Portal Toolkit
- For Linux and Windows, based on Python and Zope
- Includes automatic registration in GBIF registry
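The upload, publish, and revoke cycle described here can be sketched in a few lines. This is not the tool's code: an in-memory SQLite database stands in for the embedded MySQL database, a CSV string stands in for an uploaded spreadsheet, and the class, table, and column names are assumptions:

```python
import csv
import io
import sqlite3


class DataRepository:
    def __init__(self):
        # SQLite in memory is a stand-in for the embedded MySQL database.
        self.db = sqlite3.connect(":memory:")
        self.db.execute(
            "CREATE TABLE records ("
            "dataset TEXT, scientific_name TEXT, latitude REAL, longitude REAL)"
        )

    def upload(self, dataset, csv_text):
        """Parse an uploaded sheet and load its rows; returns the row count."""
        reader = csv.DictReader(io.StringIO(csv_text))
        rows = [
            (dataset, r["ScientificName"], float(r["Latitude"]), float(r["Longitude"]))
            for r in reader
        ]
        with self.db:   # commit on success, roll back on error
            self.db.executemany("INSERT INTO records VALUES (?, ?, ?, ?)", rows)
        return len(rows)

    def revoke(self, dataset):
        """Withdraw a dataset: its rows are deleted from the database."""
        with self.db:
            cursor = self.db.execute(
                "DELETE FROM records WHERE dataset = ?", (dataset,)
            )
        return cursor.rowcount

    def search(self, scientific_name):
        cursor = self.db.execute(
            "SELECT dataset, latitude, longitude FROM records "
            "WHERE scientific_name = ? ORDER BY latitude",
            (scientific_name,),
        )
        return cursor.fetchall()


sheet = (  # made-up spreadsheet content
    "ScientificName,Latitude,Longitude\n"
    "Parus major,59.91,10.75\n"
    "Parus major,60.39,5.32\n"
)
repo = DataRepository()
loaded = repo.upload("garden_birds", sheet)
```

Tagging every row with its dataset name is what makes revocation a single `DELETE`, matching the slide's statement that revoked data is removed from the database.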
30
GBIF Data Repository Tool
31
GBIF DiGIR Architecture (diagram repeated from slide 19)
32
Portals
- Portals are gateways to distributed information resources
- You do not need your own portal in order to become a data provider
  - Just access to one that talks to a registry
- Anybody can write their own specialised portal/search tool that uses the registry and the index through their open interfaces (DiGIR, SOAP)
- The MANIS portal is available now (Java)
- GBIF Portal Toolkit v2, which can be used to access data, is planned for availability in Q1/2004
33
GBIF Portal Toolkit
Communications portal (version 1) released at the end of 2002, and as portal toolkit (PTK) for use by nodes
- News syndication with RSS/RDF
- Events, calendar of calendars, projects
- Articles, documents, images, audio and video content
- Search within the site, across the GBIF network
- Download area
- Getting started service and how to become a node
- About GBIF
- Integration with CIRCA-based group collaboration services
- Integration with directory services (CIRCA-based open LDAP)
- Suggestions and feedback from users
- Prototype data repository tool
Data access portal (version 2), Q1/2004
- Registry
- Access to primary biodiversity data derived from the central index
- Accounting service of use of data
- Links to Participant nodes and their content
34
Building the network
Building a data network also requires building a human network of collaboration. Data is served by providers through the nodes, which act as conduits.
35
GBIF node responsibilities
[Diagram labels: GBIF Registry, Index, and Portal; Data Node; Participant Node; Portal]
Numbered lists shown in the diagram:
- 1. Network 2. Registry 3. Standards 4. Tools
- 1. Encourage participation 2. Manage registration of Data Nodes
- 1. Coordination 2. Network 3. Registry 4. Standards 5. Tools 6. Consolidated Data
- 1. Metadata 2. Data
- 1. Identify Data Nodes 2. Endorse and quality assure data nodes 3. National Language Interfaces
36
Each Participant Node coordinates its Provider Network
- Participant Nodes are in the key position to promote and assist in including new data providers and data sets
- Building a data network requires building a human network
- The NODES Committee
  - Comprises the managers of the Participant nodes
  - Works with the Information and Communications Technology (ICT) staff of the Secretariat to develop the network of nodes
  - Maintains global directory of people, roles, data providers
  - Shares best practices, experiences and ideas
  - Shares software tools
37
Participants may choose different architectures
[Diagram labels: Decentralised; Centralised; Participant Portal A; Participant Portal B; Participant Portal C; Data Warehouse (shown twice); GBIF Portal; GBIF Registry; GBIF Index]
38
Decentralised model: Pros & cons
Pros
- Contributors in full control of the data they choose to publish
- Most current and accurate version of data likely to be available
- Contributors develop a sense of ownership of the process
- Contributors develop a commitment to principles of long-term data management
- Potential number of data nodes unlimited
Cons
- Requires more human and material resources
- Requires stable network connections; in reality it is impossible to keep a large number of providers online at all times
- Security requirements
- Requires strong integrative services
39
Centralised model: Pros & cons
Pros
- Cost-effective
- Performance and availability are more controlled
- Short-term solution for rapid start-up
- Management of orphaned data sets is easier
Cons
- Risk of losing local control of data
- Less buy-in from data providers
- Difficulty keeping information current
- Requires extensive design and planning
40
Possible tools for Participant nodes
- Registry tools to endorse institutions and data providers
- Access to the central UDDI registry
- Local directory server or UDDI server
- Directory of people, collections, institutions and related communication tools
- Portal server for domain-specific website
- National language support as needed
- Data warehouse to host data from those nodes willing to share but unable to do so
- Tools for quality assurance
41
Training
- Training programme is being shaped
- 7 regional workshops in 2003 on “Becoming a GBIF data provider”
  - Stockholm, Ottawa, Tsukuba, Lisbon, San José, Africa, “francophonie”
- Secretariat works through the Participant nodes, therefore:
  - “Train the trainer” concept
  - Certification of a cadre of trainers
  - Standardised tools and materials
42
Helpdesk
- For all operational services
- Ticket handling, follow-up
- Will be geographically distributed
- For “GBIF-approved packages”
43
Why share data through GBIF?
- The value of data is in its use
- Data that potential users are not aware of or cannot access is of little or no value
- Currently, a significant proportion of biodiversity data is under-utilised because potential users are not aware of its existence or cannot access it
- Increased awareness and utilisation of existing species-level biodiversity data highlights the importance of natural history collections and observational data
- This in turn increases recognition of the importance of the associated work and will in the longer term increase funding opportunities
- Synergistic effects in combining data: 1+1>2
- Exposing information leads to improved quality
- Feedback and data cleansing
44
Why you can be comfortable with sharing data through the GBIF network
- GBIF IPR principles keep you in control
- Identity of each record will be maintained and will highlight the source of the data
- User and provider agreements
- Usage will be logged and statistics provided
- The efforts of data providers will be recognised
- Users are required to acknowledge the sources of the data they are using
- Providers will be informed about where their data is used
45
Conclusion
46
GBIF network status
- NODES committee set its goal to have a DiGIR network up and running by end of 2003, integration of the BioCASE network to follow
- Seven regional workshops and training events
- Two DiGIR provider implementations available August 2003
- UDDI registry up and running July 2003
- Global index Q4/2003
- Portal to browse and search data Q4/2003, toolkit Q1/2004