Published by Gervais Berry. Modified over 9 years ago.
1
BIS TDWG Conference, New Orleans, 2011
GBIF: Issues in providing federated access to digital information related to biological specimens
David Remsen, Senior Programme Officer, Global Biodiversity Information Facility (GBIF)
2
3 issues
Issue #1: The consequences of scale
Issue #2: Geospatial integration
Issue #3: Taxonomic integration
3
Issue #1: The consequences of scale
Goal – Provide timely access to a large federated network of biodiversity databases
4
About GBIF
The mission of the Global Biodiversity Information Facility (GBIF) is to facilitate free and open access to biodiversity data worldwide via the Internet to underpin sustainable development.
341 publishers, 9290 datasets, 310M records
57 countries, 45 organisations
5
“Wrapper” Software
Your database: Insect Collection, Bird Observations, Herbarium Data
Install one of these ‘wrappers’: PyWrapper (Python), TAPIR Link (PHP), DiGIR (PHP)
Data standards: ABCD, DarwinCore
6
The promise of federation: GBIF Data Portal as a Gateway
Insect Collection, Herbarium, Bird Observations, Herbarium
GBIF Data Portal
"Any specimens from Thailand?"
"I will ask!"
"I do!"
"Nope!"
7
The challenge of federation
Insect Collection, Herbarium, Bird Observations, Herbarium
GBIF Data Portal
"Hello?"
"Hi!"
"Server Not Available"
8
The rise of Indexing: GBIF Data Portal as a Data Index
Insect Collection, Herbarium, Bird Observations, Herbarium
GBIF Data Portal (now with Data!)
"Any data records from Thailand?"
"Send me a copy of your data"
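The difference between the gateway and the index can be sketched in a few lines of Python. This is only an illustration, not GBIF's implementation: the `Provider` class, its `online` flag, and the record layout are all hypothetical. It shows that a federated query loses results whenever a provider is unreachable at query time, while an index built earlier still answers.

```python
class Provider:
    """A hypothetical data provider holding simple occurrence records."""

    def __init__(self, name, records, online=True):
        self.name = name
        self.records = records
        self.online = online

    def search(self, country):
        if not self.online:
            raise ConnectionError(f"{self.name}: server not available")
        return [r for r in self.records if r["country"] == country]


def federated_search(providers, country):
    """Gateway model: ask every provider at query time."""
    hits, failures = [], []
    for provider in providers:
        try:
            hits.extend(provider.search(country))
        except ConnectionError:
            failures.append(provider.name)
    return hits, failures


def build_index(providers):
    """Index model: copy records from every provider reachable at harvest time."""
    index = []
    for provider in providers:
        if provider.online:
            index.extend(provider.records)
    return index


def indexed_search(index, country):
    """Answer from the local copy; no provider needs to be online."""
    return [r for r in index if r["country"] == country]
```

If the index is harvested while all providers are up and one of them later goes offline, `federated_search` reports that provider as a failure, whereas `indexed_search` still returns its records (at the cost of possibly stale data).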
9
The wrong tools for the job
Insect Collection, Herbarium, Bird Observations, Herbarium
GBIF Data Portal (now with Data!)
"Any data records from Thailand?"
"Send me a copy of your data once per month"
"Here is page one. If I go offline, start again"
"Not too fast!"
"You ask the same questions every time"
10
TAPIR request example
– dataset of 260,000 specimens
– 200 records retrieved per request
– requires 1300 request/response pairs over 9 hours to complete
– 500 MB of XML data is transferred
– becomes 32 MB text file in the GBIF server
– 32 MB is compressible to 3 MB zip file
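The figures on this slide can be checked with a short calculation. The sketch below uses only the numbers quoted above; the function names are illustrative, and the per-request time is a derived average, not a measured value.

```python
import math


def harvest_cost(total_records, page_size, total_hours):
    """Return (number of requests, average seconds per request)."""
    requests = math.ceil(total_records / page_size)
    seconds_per_request = total_hours * 3600 / requests
    return requests, seconds_per_request


def size_ratio(before_mb, after_mb):
    """How many times larger the 'before' representation is."""
    return before_mb / after_mb


requests, seconds_each = harvest_cost(260_000, 200, 9)
print(requests)                        # 1300 request/response pairs
print(round(seconds_each, 1))          # 24.9 seconds per request on average
print(round(size_ratio(500, 32), 1))   # XML is about 15.6 times the text file
print(round(size_ratio(32, 3), 1))     # text is about 10.7 times the zip file
print(round(size_ratio(500, 3), 1))    # XML is about 166.7 times the zip file
```

In other words, the paged XML transfer moves well over a hundred times more bytes than a zipped text file carrying the same records.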
11
Darwin Core Archives: a text-based solution to publishing biodiversity data
12
A Refined Approach
Insect Collection, Herbarium, Bird Observations, Herbarium
GBIF Data Portal (now with Data!)
"Any data records from Thailand?"
"This is fast!"
"This is easy"
URL
- index very large data sets
- reduce latency
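A Darwin Core Archive is a zip file containing delimited text data plus a `meta.xml` descriptor. The sketch below is a simplified illustration of why this is cheap to publish and to index: it writes and reads a single tab-delimited file inside a zip. It deliberately skips `meta.xml`, and the file name `occurrence.txt` and the header-row layout are assumptions made for the example, so it does not produce a complete, valid archive.

```python
import csv
import io
import zipfile

DATA_FILE = "occurrence.txt"  # assumed file name for this sketch


def make_archive(rows):
    """Pack a list of dicts into a zip holding one tab-delimited text file."""
    text = io.StringIO()
    writer = csv.DictWriter(
        text, fieldnames=list(rows[0]), delimiter="\t", lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(rows)
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        zf.writestr(DATA_FILE, text.getvalue())
    return buf.getvalue()


def read_archive(data):
    """Read every record back from the zipped text file in one pass."""
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        with zf.open(DATA_FILE) as fh:
            stream = io.TextIOWrapper(fh, encoding="utf-8", newline="")
            return list(csv.DictReader(stream, delimiter="\t"))


rows = [
    {"scientificName": "Oenanthe oenanthe", "country": "Thailand"},
    {"scientificName": "Trochilus polytmus", "country": "Jamaica"},
]
archive = make_archive(rows)
records = read_archive(archive)
print(len(records))
```

The whole dataset travels as one file fetched from one URL, so the indexer needs a single download instead of many paged request/response pairs.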
13
Growth
[Chart. Axis labels: 2007, 2008, 2009, 2010, Today. Values shown: 70 million, 147 million, 180 million, 201 million, 302 million. Annotation: Need for a new standard identified.]
14
Issue #2: Geospatial Integration
Goal – Provide accurate reporting of nationally-bound data
Challenge – Inaccurate recording of geospatial coordinates
15
Geo-referenced USA data
Verbatim data as shared on the network
16
Issue #2: Geospatial Integration
Remediation includes:
Use of country boundary shapefiles to verify that coordinates fall within them
– Including EEZ boundaries
– Including islands
Outliers identified
Nature of the error qualified (e.g., “coordinates inverted”)
Offending records marked and omitted from display
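The remediation steps above can be sketched as a point-in-polygon check followed by a test for one common error. This is a minimal illustration, not GBIF's pipeline: `USA_BOX` is a hypothetical, very rough rectangle standing in for a real country shapefile (which would also include EEZ boundaries and islands), and only the "coordinates inverted" case from the slide is diagnosed.

```python
# Hypothetical stand-in for a country boundary: (longitude, latitude) vertices.
USA_BOX = [(-125.0, 24.0), (-66.0, 24.0), (-66.0, 50.0), (-125.0, 50.0)]


def point_in_polygon(lon, lat, polygon):
    """Ray-casting test: count boundary crossings of a ray going east."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        if (y1 > lat) != (y2 > lat):  # edge spans the point's latitude
            x_cross = x1 + (lat - y1) * (x2 - x1) / (y2 - y1)
            if lon < x_cross:
                inside = not inside
    return inside


def classify(lon, lat, polygon=USA_BOX):
    """Label a record as 'ok', 'coordinates inverted', or 'outlier'."""
    if point_in_polygon(lon, lat, polygon):
        return "ok"
    if point_in_polygon(lat, lon, polygon):  # try swapping the two values
        return "coordinates inverted"
    return "outlier"


print(classify(-90.0, 40.0))   # ok
print(classify(40.0, -90.0))   # coordinates inverted
print(classify(10.0, 10.0))    # outlier
```

Records labelled anything other than `ok` would be the ones marked and omitted from display.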
17
Geo-referenced USA data
Data following interpretation
- Coastal regions recognised
- Offshore islands recognised
18
Issue #3: Taxonomic Integration
Goal – Provide access to biodiversity data according to taxonomic groups and concepts
Challenge
– Heterogeneous and sometimes inaccurate classification
– Same taxon appearing in different classifications
– Presence of homonyms that complicate reconciling the above
– Misspellings
– Wide range of orthographies for the same name
19
Enabling authoritative taxonomic data to be published through GBIF
20
Trochilidae (Hummingbirds) (today)
Misinterpretations (Hummingbirds are restricted to the Americas)
21
Trochilidae (Hummingbirds) (next month)
Improved interpretation
22
Search for Oenanthe (water dropwort plant or wheatear bird)
Today: difficult for user to interpret
Next month: accurate search results, resolution of homonyms
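One way to picture homonym resolution is a lookup that returns every taxon sharing a name and narrows the result when extra context, such as kingdom, is available. The sketch below is an assumption about how this could work, not a description of GBIF's method; `AUTHORITY` is a tiny hypothetical authority list.

```python
# Hypothetical authority entries; Oenanthe is both a plant and a bird genus.
AUTHORITY = [
    {"name": "Oenanthe", "kingdom": "Plantae", "family": "Apiaceae"},
    {"name": "Oenanthe", "kingdom": "Animalia", "family": "Muscicapidae"},
    {"name": "Trochilus", "kingdom": "Animalia", "family": "Trochilidae"},
]


def resolve(name, kingdom=None):
    """Return all authority entries for a name, filtered by kingdom if given."""
    candidates = [entry for entry in AUTHORITY if entry["name"] == name]
    if kingdom is not None:
        candidates = [e for e in candidates if e["kingdom"] == kingdom]
    return candidates


print(len(resolve("Oenanthe")))                      # 2: ambiguous without context
print(resolve("Oenanthe", "Animalia")[0]["family"])  # Muscicapidae
```

Without the kingdom the search mixes plant and bird records; with it, each record can be attached to exactly one taxon.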
23
Improved means to match names to authority files
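Matching supplied names against an authority file typically has to tolerate the misspellings and orthographic variants mentioned earlier. The following is a minimal sketch using Python's standard `difflib`; the authority list, the normalisation rule, and the 0.85 cutoff are all assumptions chosen for illustration, not GBIF's matching algorithm.

```python
import difflib

# Hypothetical authority file of accepted names.
AUTHORITY_NAMES = [
    "Oenanthe oenanthe",
    "Trochilus polytmus",
    "Archilochus colubris",
]


def normalise(raw):
    """Collapse whitespace and use 'Genus epithet' capitalisation."""
    return " ".join(raw.split()).capitalize()


def match_name(raw, cutoff=0.85):
    """Return (matched name or None, 'exact' | 'fuzzy' | 'unmatched')."""
    name = normalise(raw)
    if name in AUTHORITY_NAMES:
        return name, "exact"
    close = difflib.get_close_matches(name, AUTHORITY_NAMES, n=1, cutoff=cutoff)
    if close:
        return close[0], "fuzzy"
    return None, "unmatched"


print(match_name("oenanthe  OENANTHE"))    # ('Oenanthe oenanthe', 'exact')
print(match_name("Trochilus polytmuss"))   # ('Trochilus polytmus', 'fuzzy')
print(match_name("Quercus robur"))         # (None, 'unmatched')
```

A fuzzy match is only a candidate; in practice it would still need review or additional evidence (author, classification) before a record is reassigned.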
24
In summary
GBIF has had to deploy different data access strategies in order to scale effectively
Darwin Core Archive offers a scalable solution that has led to rapid growth in data published through GBIF
Geospatial filtering via shapefiles provides a basis for more accurate national reporting
– Basis for additional services later (e.g., ecosystem shapefiles, protected areas, etc.)
Heterogeneous taxonomy inherent to collections data is nearly impossible to consolidate into a taxonomically accurate structure.
– Comprehensive authoritative taxonomic data is a key organisational component of collections data
25
Thank you