GLOBAL BIODIVERSITY INFORMATION FACILITY
Larry Speers, Global Biodiversity Information Facility
Biodiversiteitsinformatie in Nederland (Biodiversity Information in the Netherlands), Wednesday 14 January 2004
“... there will be winners and there will be losers ... The next century will be the ‘Age of Biology’, just as this one has been the age of physics and astronomy. Specifically, those countries who best know how to correlate, analyze, and communicate biological information will be in the leading position to achieve economic and scientific advances.”
Sir Robert May, Chief Scientist, U.K., July 1998
What is GBIF?
A distributed megascience facility aimed at:
• Making the world’s biodiversity data freely and universally available via the Internet
• Sharing primary scientific biodiversity data to benefit society, science and a sustainable future
MEGASCIENCE FORUM of the OECD
(became the Global Science Forum after the GBIF recommendation was adopted)
Examples of Working Groups:
• Neutron Sources
• Nuclear Physics
• Radio Astronomy
• Biological Informatics (1996–1999)
  • Subgroup: Biodiversity Informatics
  • Subgroup: Neuroinformatics
Recommendation: that the Megascience Forum endorse development of the Global Biodiversity Information Facility
When was GBIF started?
• The Memorandum of Understanding (MoU) resulted from the recommendations of an international working group / steering committee
• The group met several times between June 1996 and December 2000, when the MoU was opened for signature
• GBIF came into existence on 1 March 2001, when the first 10 countries signed the MoU and pledged a total of US$2M
GBIF Mission: making the world’s biodiversity data freely and universally available via the Internet.
GBIF Voting Participants (24)
• Australia
• Belgium
• Canada
• Costa Rica
• Denmark
• Estonia
• Finland
• France
• Germany
• Iceland
• Japan
• Republic of Korea
• Mexico
• Netherlands
• New Zealand
• Nicaragua
• Portugal
• Peru
• Slovenia
• South Africa
• Spain
• Sweden
• UK
• USA
GBIF Associate Participants
• Argentina
• Austria
• Bulgaria
• Czech Republic
• Ghana
• Madagascar
• Morocco
• Pakistan
• Poland
• Slovak Republic
• Switzerland
• Taiwan
• Tanzania
• European Commission
• ALL Species Foundation
• ASEANET
• BioNET
• BIOSIS
• CABI Bioscience
• EASIANET
• Expert Centre for Taxonomic Identification
• Inter-American Biodiversity Information Network
• Integrated Taxonomic Information System
• NatureServe
• Ocean Biogeographic Information System
• Société de Bactériologie Systématique et Vétérinaire
• Species 2000
• Taxonomic Databases Working Group
• UNESCO Man and the Biosphere Program
• UNEP (World Conservation Monitoring Centre)
• World Federation for Culture Collections
• Wildscreen Trust
Why was GBIF established?
• Demand for Biological Information:
  • Biotechnology, biodiversity, climate change, environmental problems, invasive species, human health, sustainable development
Nature is so complex
We know so little
Why was GBIF established?
• Demand for Biological Information:
  • Biotechnology, biodiversity, climate change, environmental problems, invasive species, human health, sustainable development
• Bioinformatics
• Computing Power:
  • Moore’s Law
“With $2500 desktop PCs now delivering more raw computing power than the first Cray, bioinformatics is rapidly becoming the critical technology for the 21st century biology”
R. Robbins, Fred Hutchinson Cancer Research Center
Definition
Biodiversity informatics is the application of information technology to biodiversity, with the emphasis on persistent data stores.
Modified from R. Robbins, Fred Hutchinson Cancer Research Center
[Figure: "Fundamental Dogma". Levels shown: DNA, Phenotypes, Proteins, Populations, Species, Ecosystems, Abiotic Factors. Adapted from R. Robbins]
[Figure: "Bioinformatics", over the same levels. Adapted from R. Robbins]
[Figure: "Bioinformatics: Persistent Primary Data Stores", over the same levels. Data stores named: Map Databases, GenBank, EMBL, DDBJ, PDB, SwissPROT, PIR. Adapted from R. Robbins]
[Figure: "Biodiversity Informatics", over the same levels. Adapted from R. Robbins]
[Figure: "Persistent Primary Data Stores", over the same levels. Data stores named: Living Collections + Museum Collections, Literature, Observational Databases. Adapted from R. Robbins]
Biodiversity Informatics as a Megascience Activity
Molecular Biological Informatics (“bioinformatics”)
• Age of “molecular biology” virtually equals age of computers (ca. 50 yr); > 95% of all data are digitized
• Many of the data automatically share a common language (i.e., ATGC, amino acids, etc.)
• Minimum of $1B spent per year on “bioinformatics”
Biodiversity Informatics
• Knowledge base is 5X older than computers (ca. 250 yr); < 5% is digitized
• Data languages are immensely complex on biological and sociological levels (no standardization)
• $50M per year spent on biodiversity informatics, even though a minimum of $1B is spent per year on environmental observations globally
Why was GBIF established?
• Demand for Biological Information:
  • Biotechnology, biodiversity, climate change, environmental problems, invasive species, human health, sustainable development
• Bioinformatics
• Computing Power:
  • Moore’s Law
• Electronic Connectivity
  • Internet
  • Distributed Information Systems
Where is GBIF located?
• Unlike CERN, the megascience instrumentation facility for particle physics that is located in Switzerland, GBIF is a megascience facility that is distributed all over the world, with its many parts connected by the Internet
• The small, non-bureaucratic GBIF Secretariat is hosted by the Zoological Museum of the University of Copenhagen, Denmark
What does GBIF do?
• To promote the sharing and use of scientific biodiversity data by everyone, it focuses on four areas of activity:
  • Data Access and Database Interoperability (DADI)
  • Electronic Catalog of Names of Known Organisms (ECAT)
  • Outreach and Capacity Building (OCB)
  • Digitisation of Natural History Collections (DIGIT)
How does GBIF work?
• NODES Committee
  • Comprises the managers of the Participant nodes
  • Works with the Information and Communications Technology (ICT) staff of the Secretariat to develop the network of nodes
• Participant nodes share software and ideas with each other and with data providers
• Secretariat ICT staff advise, coordinate and provide software toolkits
Network Structure
[Diagram labels: GBIF Portal; Participant Portals or Nodes; Data-rich local sources; Distributed, local or regional, specialised databases]
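The network structure named on this slide (a GBIF portal, participant nodes, and local data sources) could be sketched roughly as follows. This is only an illustrative in-memory model in Python, not GBIF software: the class names `DataProvider`, `ParticipantNode` and `Portal`, the record fields, and the sample records are all assumptions made for the example. It shows one possible reading of the diagram, in which a query enters at the portal, is forwarded to every node, and each node asks its own providers, so the data stay with the providers.

```python
from dataclasses import dataclass, field
from typing import Dict, List

Record = Dict[str, str]  # hypothetical record shape, e.g. {"taxon": ..., "kind": ...}


@dataclass
class DataProvider:
    """A local or specialised database; its records stay under its control."""
    name: str
    records: List[Record] = field(default_factory=list)

    def search(self, taxon: str) -> List[Record]:
        # Return tagged copies so callers cannot modify the provider's own data.
        return [dict(r, provider=self.name) for r in self.records
                if r.get("taxon") == taxon]


@dataclass
class ParticipantNode:
    """A participant portal or node that fronts several data providers."""
    name: str
    providers: List[DataProvider] = field(default_factory=list)

    def search(self, taxon: str) -> List[Record]:
        hits: List[Record] = []
        for provider in self.providers:
            hits.extend(provider.search(taxon))
        return [dict(hit, node=self.name) for hit in hits]


@dataclass
class Portal:
    """A single entry point that forwards each query to every node."""
    nodes: List[ParticipantNode] = field(default_factory=list)

    def search(self, taxon: str) -> List[Record]:
        hits: List[Record] = []
        for node in self.nodes:
            hits.extend(node.search(taxon))
        return hits


if __name__ == "__main__":
    # Made-up sample data, for illustration only.
    museum = DataProvider("museum-a", [{"taxon": "Quercus robur", "kind": "specimen"}])
    survey = DataProvider("survey-b", [{"taxon": "Quercus robur", "kind": "observation"},
                                       {"taxon": "Fagus sylvatica", "kind": "observation"}])
    portal = Portal([ParticipantNode("node-1", [museum]),
                     ParticipantNode("node-2", [survey])])
    for hit in portal.search("Quercus robur"):
        print(hit)
```

Each result carries the name of the provider and node it came from, which is one simple way to keep the source of every record visible to the user.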
GBIF Principles
• Equitable sharing of data
• Data providers retain control
• Protection of intellectual property rights
• Distributed network architecture
• Common standards and protocols
• Partnership with other networks
• Avoidance of duplication of effort
• Promotion of technical developments to deal with complexity of biodiversity data
The following is a simple classification of the biodiversity data for which GBIF is responsible:
• Taxonomic data, including:
  • Scientific names, including data on synonymy
  • Vernacular names
  • Taxonomic descriptions, including diagnostic keys
• Taxon occurrence information (primarily species-level, but including data for taxa at different ranks where appropriate):
  • Specimen records (from natural history collections)
  • Observation records
• Links to other taxon-level information, including:
  • Information on taxon biology and life history
  • Ecological interactions
  • Genetic data
  • Sound and image resources
Characteristics of the Species-Level Biodiversity Data Domain
• Data developers are numerous, specialized and widely distributed
  • Government labs
  • Universities
  • Museums
  • Private individuals
• Quality data are critical to environmental decision making
• Legacy data are extremely valuable
• Data are dynamic
  • Legacy data continually being updated and enhanced
  • New data continually being added
• Primary data have common core attributes
Primary species occurrence core data includes but is not limited to the following essential details:
• Name of the taxon to which the organism has been assigned
• Location where the specimen was collected or the observation made
• Date on which the specimen was collected or the observation made
• Where the specimen or record is held and how to access more information
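As an illustration only, the core details listed on this slide could be held in a small record type. The sketch below is in Python; the class name `OccurrenceRecord`, its field names, and the sample values are assumptions made for this example and do not represent a GBIF schema or standard.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass(frozen=True)
class OccurrenceRecord:
    """Hypothetical core record for one specimen or observation."""
    taxon_name: str              # name of the taxon the organism was assigned to
    locality: str                # where it was collected or observed
    event_date: Optional[date]   # when it was collected or observed, if known
    holder: str                  # where the specimen or record is held
    access_info: str             # how to obtain more information

    def missing_core_fields(self) -> List[str]:
        """Return the names of core details that are empty or unknown."""
        missing = []
        for name in ("taxon_name", "locality", "holder", "access_info"):
            if not getattr(self, name).strip():
                missing.append(name)
        if self.event_date is None:
            missing.append("event_date")
        return missing


if __name__ == "__main__":
    # Made-up sample values, for illustration only.
    record = OccurrenceRecord(
        taxon_name="Quercus robur",
        locality="",
        event_date=None,
        holder="Example Herbarium",
        access_info="",
    )
    print(record.missing_core_fields())
```

A check like `missing_core_fields` is one simple way a data provider might flag legacy records that still need work before they are shared.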
GBIF-DIGIT: Mission
To facilitate the expansion of biodiversity knowledge by having legacy and newly acquired primary species occurrence data digitised and dynamically accessible.
What are GBIF’s primary data? (GBIF-DIGIT)
• Label data on ~ billion specimens in natural history collections
• Species-level observational data sets
• Associated notes, recordings, metadata, etc.
• These data must be digitised in order to be shared and fully utilised
> 2 billion specimens worldwide
Natural History Collections Data
• Strengths
  • Identification of specimens auditable
  • Potential for DNA analysis
  • Often long time series
  • Broad taxonomic coverage
  • Type specimens
• Weaknesses
  • Presence-only data
  • Often poorly curated
  • Locality data often lack precision
  • Seldom collected in a systematic way
  • Often not in digital format
  • Any one collection has limited taxonomic, spatial and temporal coverage
Observational Data Sets
• Strengths
  • Often presence-absence data
  • Often collected in a systematic way
  • Usually precise locality information
  • Usually in digital format
• Weaknesses
  • Individual identifications NOT auditable
  • Generally short time series
  • Limited taxonomic coverage
  • Any one data set has limited taxonomic, spatial and temporal coverage
The Virtual Herbarium of Mexico
700,000 records from 25 herbaria in Mexico and the United States, including:
• TEX (University of Texas at Austin)
• UADY (University of Yucatan)
• ARIZ (University of Arizona)
• CIDIIR (Center of Scientific Research of Durango)
• XAL (Institute of Ecology, Xalapa)
• CAS (California Academy of Sciences)
• MEXU (National University of Mexico)
• CICY (Center of Scientific Research of Yucatan)
“Taken collectively, the plant and animal specimens in the U.S. museum collections provide our most complete picture of the biological diversity of the entire nation.”
U.S. Dept. of the Interior, Electronic National Museum Proposal
Characteristics of a Megascience Effort
• Something that cannot be undertaken by only one country
  • Expense
  • No one country has access to all the data
• Some components of the research can be done at the national or regional levels, but some must be truly global
• Usually infrastructural in nature (e.g. CERN)
• Involves collaboration among many scientists and others
• The topic is hugely inclusive and affects many disciplines
“Interoperability must be perceived as the sharing of information.”
Eliminating Legal and Policy Barriers to Interoperable Government Systems - Electronic Commerce, Law, and Information Policy Strategies Report, June 1999
“The value of data lies in their use.”
Bits of Power: Issues in Global Access to Scientific Data, National Academy Press, 1997
[Diagram labels: Desktop Applications; Analysis API; Prediction Algorithms; Environmental Resources; Points to Distributions Server; Information Retrieval API; Specimen Databases]
[Diagram: environmental niche modelling workflow. Labels: Point Data; Current Environmental Coverages; Prediction Algorithm (GARP); Environmental Niche Model; Prediction Tools; Distribution Predicted for Native Region; Distribution Predicted in Non-native Region; Distribution After Climate Change; Climate; Region]
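The workflow in this diagram (point data plus environmental coverages go into a prediction algorithm, which yields a niche model that is then projected onto a region or a changed climate) can be illustrated with a toy example. The diagram names GARP as the prediction algorithm; the sketch below does not implement GARP. It swaps in a far simpler rectangular climate envelope (the minimum and maximum of each variable at the occurrence points), purely to show the data flow. All function names, variable names and numbers are assumptions invented for the example.

```python
from typing import Dict, List, Tuple

Env = Dict[str, float]                    # environmental values at one location
Envelope = Dict[str, Tuple[float, float]]  # (min, max) per environmental variable


def fit_envelope(point_envs: List[Env]) -> Envelope:
    """Build a niche model: the range of each variable across occurrence points."""
    if not point_envs:
        raise ValueError("at least one occurrence point is required")
    variables = list(point_envs[0].keys())
    return {v: (min(e[v] for e in point_envs), max(e[v] for e in point_envs))
            for v in variables}


def predict(envelope: Envelope, cells: Dict[str, Env]) -> List[str]:
    """Return the names of grid cells whose values all fall inside the envelope."""
    return [name for name, env in cells.items()
            if all(low <= env[v] <= high for v, (low, high) in envelope.items())]


def shift_climate(cells: Dict[str, Env], variable: str, delta: float) -> Dict[str, Env]:
    """Return a copy of the cells with one variable shifted by a fixed amount."""
    return {name: {**env, variable: env[variable] + delta}
            for name, env in cells.items()}


if __name__ == "__main__":
    # Made-up environmental values at two occurrence points.
    occurrences = [{"temp": 10.0, "rain": 800.0}, {"temp": 14.0, "rain": 1000.0}]
    # Made-up grid cells in the region of interest.
    region = {
        "A": {"temp": 12.0, "rain": 900.0},
        "B": {"temp": 9.0, "rain": 900.0},
        "C": {"temp": 13.0, "rain": 950.0},
    }
    model = fit_envelope(occurrences)
    print("current:", predict(model, region))
    print("after +2 warming:", predict(model, shift_climate(region, "temp", 2.0)))
```

The same fitted model is applied twice, once to current conditions and once to a shifted climate, which mirrors the diagram's separation between building the niche model and projecting it onto different regions or scenarios.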
Why share data?
• Advantages of sharing core collection data for individual curators
  • Increased use of collections
  • Increased justification for funding, collection development, staffing etc. (Use it or lose it)
• Advantages of sharing core collection data for individual biodiversity scientists
  • Making available high quality data for use by others
  • Helps improve quality of data by making it visible
  • Increased visibility and relevance of the biodiversity community will result in increased funding
• Advantages of sharing data for individual institutions
  • Increased value of collections through increased access and use
  • Increased use will increase relevance and result in increased funding
  • Decrease in staff time spent answering queries
“The most profound barriers to interoperability are the soft ‘human technologies’ implied in fundamental policy and organizational design.”
Eliminating Legal and Policy Barriers to Interoperable Government Systems - Electronic Commerce, Law, and Information Policy Strategies Report, June 1999