BIS TDWG Conference, New Orleans, 2011
GBIF: Issues in providing federated access to digital information related to biological specimens
David Remsen, Senior Programme Officer, Global Biodiversity Information Facility (GBIF)

3 issues:
- Issue #1: The consequences of scale
- Issue #2: Geospatial integration
- Issue #3: Taxonomic integration

Issue #1: The consequences of scale
Goal: provide timely access to a large federated network of biodiversity databases.

About GBIF: 341 publishers, 9,290 datasets, 310M records; 57 countries, 45 organisations.
The mission of the Global Biodiversity Information Facility (GBIF) is to facilitate free and open access to biodiversity data worldwide via the Internet to underpin sustainable development.

“Wrapper” software
Install one of these wrappers on your database (an insect collection, bird observations, herbarium data): PyWrapper (Python), TapirLink (PHP) or DiGIR (PHP). The wrapper publishes your records in a shared standard such as ABCD or Darwin Core.
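At its core, a wrapper translates a provider's local schema into a shared standard. The sketch below is a minimal, hypothetical illustration of that mapping step only: the local column names (`cat_no`, `genus`, `species`, `country`), the mapping table and the fixed `basisOfRecord` value are invented, and the real wrappers listed above are configured differently and answer requests with XML over protocols such as TAPIR or DiGIR.

```python
# Hypothetical mapping from local column names to Darwin Core term names.
LOCAL_TO_DWC = {
    "cat_no": "catalogNumber",
    "country": "country",
}


def to_darwin_core(row: dict) -> dict:
    """Map one local database row to a record keyed by Darwin Core terms."""
    record = {dwc: row[local] for local, dwc in LOCAL_TO_DWC.items() if row.get(local)}
    # Derived term: join genus and specific epithet into a scientific name.
    if row.get("genus") and row.get("species"):
        record["scientificName"] = f"{row['genus']} {row['species']}"
    record["basisOfRecord"] = "PreservedSpecimen"  # assumption: a specimen collection
    return record


row = {"cat_no": "INS-00042", "genus": "Apis", "species": "mellifera", "country": "Thailand"}
print(to_darwin_core(row))
```

The mapping table is the part a data manager would edit per database; the rest of the wrapper stays generic.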

The promise of federation: the GBIF Data Portal as a gateway
A user asks the portal, “Any specimens from Thailand?” The portal replies, “I will ask!” and forwards the question to each provider (insect collection, herbarium, bird observations). Providers answer “I do!” or “Nope!”

The challenge of federation
The GBIF Data Portal calls each provider (“Hello?”). Some reply (“Hi!”), but others return “Server Not Available”, so the answer to a query depends on which servers happen to be online.
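The gateway model, and its weakness, can be sketched as a parallel fan-out. The providers below are invented stand-ins (plain Python functions rather than remote TAPIR or DiGIR endpoints) and the timeout value is an arbitrary assumption; the point is that every search is only as complete as the set of servers online at that moment.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable

# A hypothetical provider: takes a country name, returns matching record IDs.
Provider = Callable[[str], list]


def insect_collection(country: str) -> list:
    return ["INS-1", "INS-7"] if country == "Thailand" else []


def herbarium(country: str) -> list:
    raise ConnectionError("Server Not Available")  # simulates an offline provider


def bird_observations(country: str) -> list:
    return []


def federated_search(providers: dict[str, Provider], country: str, timeout: float = 5.0):
    """Send the query to every provider in parallel; collect answers and failures."""
    hits, unavailable = {}, []
    pool = ThreadPoolExecutor(max_workers=len(providers) or 1)
    try:
        futures = {name: pool.submit(fn, country) for name, fn in providers.items()}
        for name, future in futures.items():
            try:
                # The timeout applies to each wait in turn, not to the whole search.
                hits[name] = future.result(timeout=timeout)
            except Exception:  # connection errors and timeouts are treated alike
                unavailable.append(name)
    finally:
        # Return without blocking on slow providers that are still running.
        pool.shutdown(wait=False, cancel_futures=True)
    return hits, unavailable


hits, unavailable = federated_search(
    {"insect": insect_collection, "herbarium": herbarium, "birds": bird_observations},
    "Thailand",
)
print(hits)         # {'insect': ['INS-1', 'INS-7'], 'birds': []}
print(unavailable)  # ['herbarium']
```

The user sees results from two providers and silently misses whatever the herbarium holds.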

The rise of indexing: the GBIF Data Portal as a data index
Instead of relaying each query, the portal asks every provider, “Send me a copy of your data.” A question such as “Any data records from Thailand?” is then answered from the portal's own copy (the portal now holds data).
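Indexing moves the work from query time to harvest time. A minimal, hypothetical sketch of the idea: the class, its methods and the record layout are invented for illustration, and a single in-memory dictionary stands in for the portal's database.

```python
from collections import defaultdict


class RecordIndex:
    """A hypothetical central index holding harvested copies of provider data."""

    def __init__(self) -> None:
        self.by_country = defaultdict(list)  # country -> [(provider, record_id)]

    def harvest(self, provider: str, records: list) -> None:
        # Append-only for brevity; a real harvester would also apply updates and deletions.
        for rec in records:
            self.by_country[rec["country"]].append((provider, rec["id"]))

    def search(self, country: str) -> list:
        # Answered from the index alone, so no provider needs to be online.
        return list(self.by_country.get(country, []))


index = RecordIndex()
index.harvest("insect", [{"id": "INS-1", "country": "Thailand"}, {"id": "INS-2", "country": "Laos"}])
index.harvest("herbarium", [{"id": "HB-9", "country": "Thailand"}])
print(index.search("Thailand"))  # [('insect', 'INS-1'), ('herbarium', 'HB-9')]
```

The trade-off is freshness: results are only as current as the last harvest.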

The wrong tools for the job
The portal asks each provider, “Send me a copy of your data once per month.” The query protocols are a poor fit for bulk copying; the providers' replies amount to “Here is page one. If I go offline, start again”, “Not too fast!” and “You ask the same questions every time.”
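The “start again” problem is easiest to see in code. In this sketch, `fetch_page(start, limit)` is a hypothetical stand-in for one paged search request (not the actual TAPIR API), and the simulated provider is invented; because there is no way to resume, a single dropped connection discards every page already fetched.

```python
from typing import Callable

# Hypothetical paged request: returns up to `limit` records beginning at `start`.
FetchPage = Callable[[int, int], list]


def harvest_all(fetch_page: FetchPage, page_size: int = 200, max_attempts: int = 3):
    """Copy a whole dataset page by page, restarting from page one after a failure."""
    requests = 0  # successful request/response pairs, across all attempts
    for _ in range(max_attempts):
        records, start = [], 0
        try:
            while True:
                page = fetch_page(start, page_size)
                requests += 1
                records.extend(page)
                if len(page) < page_size:  # a short (or empty) page marks the end
                    return records, requests
                start += page_size
        except ConnectionError:
            continue  # no resumption point: everything fetched so far is thrown away
    raise RuntimeError("harvest failed after repeated restarts")


dataset = list(range(1000))
went_offline = False


def flaky_fetch(start: int, limit: int) -> list:
    """Simulated provider that drops the connection once, at record 600."""
    global went_offline
    if start == 600 and not went_offline:
        went_offline = True
        raise ConnectionError("provider went offline")
    return dataset[start:start + limit]


records, n_requests = harvest_all(flaky_fetch)
print(len(records), n_requests)  # 1000 9
```

Without the interruption the harvest would take 6 requests (five full pages plus an empty one that signals the end); the restart raises that to 9.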

TAPIR request example
Harvesting a dataset of 260,000 specimens at 200 records per request requires 1,300 request/response pairs over 9 hours. About 500 MB of XML is transferred; on the GBIF server this becomes a 32 MB text file, which compresses to a 3 MB zip file.
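The slide's figures can be cross-checked with simple arithmetic. The derived values (time per round trip and compression factors) are computed from the quoted numbers and rounded to whole units; they are not stated in the talk itself.

```python
import math


def harvest_figures(records=260_000, page_size=200, hours=9,
                    xml_mb=500, text_mb=32, zip_mb=3) -> dict:
    """Derive rates and ratios from the figures quoted on the slide."""
    requests = math.ceil(records / page_size)
    return {
        "requests": requests,                                   # 1,300 round trips
        "seconds_per_request": round(hours * 3600 / requests),  # about 25 s each
        "xml_to_text": round(xml_mb / text_mb),                 # XML about 16x larger
        "text_to_zip": round(text_mb / zip_mb),                 # zip about 11x smaller
        "xml_to_zip": round(xml_mb / zip_mb),                   # about 167x overall
    }


print(harvest_figures())
```

In other words, each request costs roughly 25 seconds, and the XML envelope carries around 167 times more bytes than the zipped text that ends up being stored.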

Darwin Core Archives A text-based solution to publishing biodiversity data
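A Darwin Core Archive is a zip file holding delimited text files plus a `meta.xml` descriptor that maps each column to a Darwin Core term URI. The sketch below is a minimal reader, assuming a single core file and only the descriptor attributes shown (extensions, multiple files and quoted fields are ignored); the sample archive is invented, and a maintained library is the safer choice for real archives.

```python
import csv
import io
import xml.etree.ElementTree as ET
import zipfile

NS = {"dwc": "http://rs.tdwg.org/dwc/text/"}  # namespace of the archive descriptor

# Invented sample archive: a descriptor (meta.xml) plus one tab-delimited core file.
META = r"""<archive xmlns="http://rs.tdwg.org/dwc/text/">
  <core encoding="UTF-8" fieldsTerminatedBy="\t" linesTerminatedBy="\n"
        fieldsEnclosedBy="" ignoreHeaderLines="1"
        rowType="http://rs.tdwg.org/dwc/terms/Occurrence">
    <files><location>occurrence.txt</location></files>
    <id index="0"/>
    <field index="1" term="http://rs.tdwg.org/dwc/terms/scientificName"/>
    <field index="2" term="http://rs.tdwg.org/dwc/terms/country"/>
  </core>
</archive>"""
CORE = "id\tscientificName\tcountry\nocc-1\tOenanthe oenanthe\tNorway\nocc-2\tApis mellifera\tThailand\n"


def read_core(archive: bytes) -> list:
    """Return the rows of the archive's core file as dicts keyed by term name."""
    with zipfile.ZipFile(io.BytesIO(archive)) as zf:
        core = ET.fromstring(zf.read("meta.xml")).find("dwc:core", NS)
        location = core.find("dwc:files/dwc:location", NS).text
        text = zf.read(location).decode(core.get("encoding", "UTF-8"))
    # meta.xml writes the delimiter as an escape sequence such as "\t".
    delimiter = core.get("fieldsTerminatedBy", "\\t").encode().decode("unicode_escape")
    skip = int(core.get("ignoreHeaderLines", "0"))
    id_index = int(core.find("dwc:id", NS).get("index"))
    # Column index -> short term name, e.g. 1 -> "scientificName".
    terms = {int(f.get("index")): f.get("term").rsplit("/", 1)[-1]
             for f in core.findall("dwc:field", NS)}
    rows = list(csv.reader(io.StringIO(text), delimiter=delimiter,
                           quoting=csv.QUOTE_NONE))[skip:]
    return [{"id": r[id_index], **{t: r[i] for i, t in terms.items()}} for r in rows]


buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as zf:
    zf.writestr("meta.xml", META)
    zf.writestr("occurrence.txt", CORE)

records = read_core(buffer.getvalue())
print(records)
```

Because the whole dataset travels as one compressed file, a harvester needs a single download instead of hundreds of paged requests.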

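A refined approach
The GBIF Data Portal (now with data!) still answers “Any data records from Thailand?” from its index, but each provider now publishes its dataset at a single URL (“This is easy”), and the portal downloads it in one step (“This is fast!”). This makes it possible to index very large datasets and reduces latency.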

Growth chart, 2007 to today: 70 million, 180 million, 201 million, 302 million. Annotation: “Need for a new standard identified”.

Issue #2: Geospatial integration
Goal: provide accurate reporting of nationally bound data.
Challenge: inaccurate recording of geospatial coordinates.

Geo-referenced USA data: verbatim data as shared on the network

Issue #2: Geospatial integration
Remediation includes:
- using country boundary shapefiles, including Exclusive Economic Zone (EEZ) boundaries and islands, to verify that coordinates fall within the stated country
- identifying outliers
- qualifying the nature of each error (e.g., “coordinates inverted”)
- marking offending records and omitting them from display
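The boundary check is a point-in-polygon test. The sketch below uses the classic ray-casting algorithm against a crude, hypothetical box standing in for a country shapefile; the rule for qualifying an error (try swapped or sign-flipped coordinates and see which lands inside) is an illustrative assumption, not GBIF's documented procedure.

```python
Polygon = list  # list of (longitude, latitude) vertices

# Hypothetical stand-in for a country shapefile: a crude box around the
# contiguous USA. Real boundaries are multi-polygons that include islands and EEZs.
USA_BOX = [(-125.0, 24.0), (-66.0, 24.0), (-66.0, 50.0), (-125.0, 50.0)]


def contains(poly: Polygon, lon: float, lat: float) -> bool:
    """Ray casting: count polygon edges crossed by a ray running east from the point."""
    inside = False
    for (x1, y1), (x2, y2) in zip(poly, poly[1:] + poly[:1]):
        if (y1 > lat) != (y2 > lat):  # edge spans the point's latitude
            x_cross = x1 + (lat - y1) * (x2 - x1) / (y2 - y1)
            if lon < x_cross:
                inside = not inside
    return inside


def qualify(poly: Polygon, lat: float, lon: float) -> str:
    """Check a (latitude, longitude) record and, if it fails, guess why."""
    if contains(poly, lon, lat):
        return "ok"
    # Try plausible recording errors; the first transformation that lands inside wins.
    candidates = {
        "latitude and longitude swapped": (lat, lon),  # read as (lon, lat)
        "longitude sign inverted": (-lon, lat),
        "latitude sign inverted": (lon, -lat),
    }
    for label, (x, y) in candidates.items():
        if contains(poly, x, y):
            return label
    return "outside country"


print(qualify(USA_BOX, 38.9, -77.0))  # ok
print(qualify(USA_BOX, 38.9, 77.0))   # longitude sign inverted
print(qualify(USA_BOX, -77.0, 38.9))  # latitude and longitude swapped
```

A record that matches none of the candidate errors is simply flagged as lying outside the country.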

Geo-referenced USA data following interpretation: coastal regions and offshore islands are recognised.

Issue #3: Taxonomic integration
Goal: provide access to biodiversity data according to taxonomic groups and concepts.
Challenges:
- heterogeneous and sometimes inaccurate classifications, with the same taxon appearing in different classifications
- homonyms, which complicate reconciling those classifications
- misspellings
- a wide range of orthographies for the same name
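Part of the orthography problem can be handled by normalizing names before comparing them. This hypothetical normalizer expands ligatures, strips diacritics (for example “Aloë” becomes “Aloe”), removes punctuation and fixes capitalization. It is a sketch only: it does not strip author citations or handle hybrid formulae, and misspellings still need approximate matching on top.

```python
import re
import unicodedata

# Ligatures that Unicode normalization does not decompose.
LIGATURES = {"æ": "ae", "Æ": "Ae", "œ": "oe", "Œ": "Oe"}


def normalize_name(name: str) -> str:
    """Reduce orthographic variants of a scientific name to one comparable form."""
    for ligature, replacement in LIGATURES.items():
        name = name.replace(ligature, replacement)
    # NFKD splits accented letters into base letter + combining mark; drop the marks.
    name = "".join(c for c in unicodedata.normalize("NFKD", name)
                   if not unicodedata.combining(c))
    name = re.sub(r"[^A-Za-z\- ]", " ", name)  # punctuation and digits become spaces
    words = name.split()                         # also collapses repeated whitespace
    if not words:
        return ""
    return " ".join([words[0].capitalize()] + [w.lower() for w in words[1:]])


for variant in ["Cæsalpinia pulcherrima", "Aloë vera", "  oenanthe   OENANTHE "]:
    print(repr(normalize_name(variant)))
```

Normalized forms are used only as comparison keys; the verbatim name should be kept alongside them.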

Enabling authoritative taxonomic data to be published through GBIF

Trochilidae (hummingbirds), today: misinterpreted records appear outside the Americas, although hummingbirds are restricted to the Americas.

Trochilidae (hummingbirds), next month: improved interpretation.

Search for Oenanthe (a genus name shared by the water dropwort plants and the wheatear birds)
Today: results that are difficult for the user to interpret.
Next month: accurate search results through resolution of homonyms.
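Homonyms such as Oenanthe can only be separated with context from outside the name itself. A minimal sketch, assuming records carry some higher classification (kingdom or family) and using an invented two-entry authority excerpt: candidates that conflict with the record's hints are discarded, and a name with no hints stays ambiguous rather than being guessed.

```python
# Hypothetical authority excerpt: one genus name, two unrelated taxa.
AUTHORITY = {
    "Oenanthe": [
        {"kingdom": "Plantae", "family": "Apiaceae", "common": "water dropworts"},
        {"kingdom": "Animalia", "family": "Muscicapidae", "common": "wheatears"},
    ],
}


def resolve_genus(genus: str, hints: dict) -> list:
    """Return the authority entries consistent with a record's higher classification."""
    candidates = AUTHORITY.get(genus, [])
    for rank in ("kingdom", "family"):
        if hints.get(rank):
            candidates = [c for c in candidates if c[rank] == hints[rank]]
    return candidates  # one entry: resolved; several: still ambiguous; none: conflict


print([c["common"] for c in resolve_genus("Oenanthe", {"kingdom": "Animalia"})])  # ['wheatears']
print(len(resolve_genus("Oenanthe", {})))  # 2
```

Returning every surviving candidate, rather than picking one, lets a search interface show both meanings when the record gives no clue.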

Improved means to match names to authority files
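Misspellings can be caught by approximate string matching against an authority list. This sketch uses Python's standard `difflib.get_close_matches`, an assumption for illustration rather than the matching method GBIF uses; the authority list and the 0.9 cutoff are invented. A high cutoff matters because related names, such as two species in one genus, can differ by only a few letters.

```python
import difflib
from typing import Optional, Tuple

# Hypothetical excerpt of an authority file of accepted names.
AUTHORITY_NAMES = ["Oenanthe oenanthe", "Oenanthe crocata", "Apis mellifera", "Trochilus polytmus"]


def match_name(name: str, authority: list, cutoff: float = 0.9) -> Tuple[Optional[str], str]:
    """Match a verbatim name to the authority list: exact first, then fuzzy."""
    if name in authority:
        return name, "exact"
    # Similarity ratios come from difflib.SequenceMatcher; only the best match is kept.
    close = difflib.get_close_matches(name, authority, n=1, cutoff=cutoff)
    if close:
        return close[0], "fuzzy"
    return None, "none"


for verbatim in ["Apis mellifera", "Oenanthe oenante", "Homo sapiens"]:
    print(verbatim, "->", match_name(verbatim, AUTHORITY_NAMES))
```

Recording whether a match was exact or fuzzy keeps the uncertain cases visible for later review.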

In summary
- GBIF has had to deploy different data access strategies in order to scale effectively.
- Darwin Core Archives offer a scalable solution that has led to rapid growth in the data published through GBIF.
- Geospatial filtering via shapefiles provides a basis for more accurate national reporting, and for additional services later (e.g., ecosystem shapefiles, protected areas).
- The heterogeneous taxonomy inherent in collections data is nearly impossible to consolidate into a taxonomically accurate structure; comprehensive, authoritative taxonomic data is a key organisational component of collections data.

Thank you