Introduction to biological data management


Introduction to biological data management Leen Vandepitte – Flanders Marine Institute

SDN and biological data Marine biological data system in Europe International data flow Types of biological data Data formats Taxonomy Definition Spelling and other errors or variations Synonyms & homonyms World Register of Marine Species What? Users How to use it Exercises

SeaDataNet & biological data Marine observing systems highly fragmented; measurement of physical, geophysical, geological, biological and chemical parameters, biological species, … SDN: standardized system to manage this kind of data Now: mostly oceanographic data Near future: also biological data from SDN partners available According to standard data format Public data will become part of international data flow Now: short introduction to biological data; more elaborate training in future

Marine biological data system in Europe EurOBIS – European Ocean Biogeographic Information System Biogeographic data on European marine species: taxon name – position (lat-long) – time Freely available online, quality controlled data Developed within MarBEF (FP6) (2004-2009) Further maintenance by VLIZ One of the 14 regional nodes (RoNs) of OBIS Backbone of EMODnet Standards: Darwin Core & World Register of Marine Species (WoRMS) Currently available in EurOBIS (July 2012): 386 datasets, 15.2 million distribution records International data flow: EurOBIS <=> OBIS <=> GBIF
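
As a minimal sketch of the "taxon name – position – time" record described on this slide, the example below models one distribution record and runs a few basic quality checks. The field names follow Darwin Core terms (scientificName, decimalLatitude, decimalLongitude, eventDate); the class, the checks and the sample coordinates are illustrative assumptions, not the EurOBIS implementation.

```python
from dataclasses import dataclass
from datetime import date
from typing import List


@dataclass(frozen=True)
class Occurrence:
    """One distribution record: taxon name, position, time."""
    scientificName: str       # Darwin Core term
    decimalLatitude: float    # Darwin Core term, degrees north
    decimalLongitude: float   # Darwin Core term, degrees east
    eventDate: date           # Darwin Core term (an ISO 8601 string in real data)

    def problems(self) -> List[str]:
        """Return basic quality-control issues; an empty list means none found."""
        issues = []
        if not self.scientificName.strip():
            issues.append("missing scientificName")
        if not -90 <= self.decimalLatitude <= 90:
            issues.append("latitude out of range")
        if not -180 <= self.decimalLongitude <= 180:
            issues.append("longitude out of range")
        if self.eventDate > date.today():
            issues.append("eventDate in the future")
        return issues


# Made-up sample record, for illustration only.
record = Occurrence("Clupea harengus", 51.23, 2.93, date(2012, 7, 2))
print(record.problems())  # prints []
```

Checks like these only catch formal errors; whether the name itself is valid is the taxonomy question covered in the following slides.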

Data flow for public data Marine data from Europe, and from outside Europe when collected by European institutes Data delivery: Through email: Excel, Access, CSV, … Through servers: DiGIR, IPT (Integrated Publishing Toolkit) EurOBIS: one of the 14 regional nodes of the Ocean Biogeographic Information System (OBIS) OBIS: marine thematic sub-network of the Global Biodiversity Information Facility (GBIF)

Types of biological data Observation and results Occurrence Density Biomass Body morphology Condition Substance concentrations or ratios Sequencing material Biological components (=non-taxonomic groups) benthos, plankton, fish, birds, mammals, ... manatee

Geometry and sampling protocol Point soft-bottom grabs & cores (depth layer separation possible) vertical net and water samples (multiple depths possible) static net samples hard-bottom sampling (scraping or visual) static observations/underwater photography Curve net trawl, dredge or sledge transect observations/underwater video Surface surface observations

Data formats

“Common denominator” = taxonomy from Ancient Greek: τάξις taxis "arrangement" and Ancient Greek: νομία nomia "method" = the academic discipline of defining groups of biological organisms on the basis of shared characteristics and giving names to those groups. To bring order into the “chaos” of species, to help scientists in how to deal with species, so they know they are talking about the same creature and to classify them in larger groups. International Code of Zoological Nomenclature (ICZN) International Code of Nomenclature for algae, fungi, and plants Kingdom > Phylum > Class > Order > Family > Genus > Species http://www.sharky-jones.com/Sharkyjones/Slow/what%20page/whatmain1bii.html

Taxonomy: spelling errors … some names are harder to spell than others … Actinobacillus actimomycetemcomitans Actinobacillus actimycetemcomitans Actinobacillus actinmycetemcomitans Actinobacillus actinomicetemcomitans Actinobacillus actinomy Actinobacillus actinomyce Actinobacillus actinomycemcomitans Actinobacillus actinomyceremcomitans Actinobacillus actinomycetam Actinobacillus actinomycetamcomitans Actinobacillus actinomycetecomitans Actinobacillus actinomycetemcmitans Actinobacillus actinomycetemcomintans Actinobacillus actinomycetemcomitance Actinobacillus actinomycetemcomitans Actinobacillus actinomycetemcomitants Actinobacillus actinomycetemcommitans Actinobacillus actinomycetemocimitans Actinobacillus actinomycetencomitans Actinobacillus actinomycetum Actinobacillus actinomyctemcomitans Actinobacillus actinomyectomcomitans Actinobacillus actinomyetemcomitans Actinobacillus actinonmycetemcomitans Actinobacillus actionomycetemcomitans Actinobacillus actynomicetemcomitans Actinobacillus antinomycetemcomitans Difficulties with Latinized Names Transcription errors Which one is correct? Controlled taxonomic vocabulary necessary!
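
To illustrate why a controlled vocabulary helps with spelling variants like these, here is a small sketch that matches a delivered name against a reference list using Python's standard difflib module. The three-name vocabulary and the 0.9 cutoff are assumptions chosen for the example; this is generic string similarity, not the TAXAMATCH algorithm used by WoRMS.

```python
import difflib
from typing import List, Optional

# Hypothetical controlled vocabulary; a real one would come from a register
# such as WoRMS.
VOCABULARY = [
    "Actinobacillus actinomycetemcomitans",
    "Clupea harengus",
    "Halichondria panicea",
]


def match_name(name: str, vocabulary: List[str], cutoff: float = 0.9) -> Optional[str]:
    """Return the closest vocabulary entry, or None if nothing is similar enough."""
    matches = difflib.get_close_matches(name, vocabulary, n=1, cutoff=cutoff)
    return matches[0] if matches else None


# A near miss is resolved to the vocabulary spelling.
print(match_name("Actinobacillus actinomycetemcomitants", VOCABULARY))
# A heavily truncated name falls below the cutoff and is left for manual review.
print(match_name("Actinobacillus actinomy", VOCABULARY))
```

A strict cutoff keeps truncated or very different strings out of the automatic matches, so they can be checked by hand instead.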

Dataset Before tax. check After tax. check 1 Amphiura sunderali 2 Amphiura sundevali Amphiura sundevalli 3 Amphiura sundvali 4

Taxonomy: computer errors … automatic ‘series filling’ is not always recommended … Clupea harengus Linnaeus, 1758 Clupea harengus Linnaeus, 1759 Clupea harengus Linnaeus, 1760 Clupea harengus Linnaeus, 1761 Clupea harengus Linnaeus, 1762 … Clupea harengus Linnaeus, 2254 Clupea harengus Linnaeus, 2255
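
The series-filling error on this slide has a recognisable signature: the name stays the same while the year in the authority goes up by one on every row. The sketch below flags such rows; the function name and the regular expression are illustrative assumptions and only handle authorities that end in ", YYYY".

```python
import re
from typing import List

# Splits "Name Author, 1758" into the part before the comma and the year.
AUTHORITY_YEAR = re.compile(r"^(?P<stem>.+?),\s*(?P<year>\d{4})$")


def find_series_fill(names: List[str]) -> List[int]:
    """Return indexes of rows whose year is the previous row's year plus one
    while the rest of the name is identical."""
    suspicious = []
    previous = None  # (stem, year) of the previous row, if it parsed
    for index, name in enumerate(names):
        match = AUTHORITY_YEAR.match(name.strip())
        current = (match["stem"], int(match["year"])) if match else None
        if previous and current:
            same_stem = current[0] == previous[0]
            next_year = current[1] == previous[1] + 1
            if same_stem and next_year:
                suspicious.append(index)
        previous = current
    return suspicious


column = [
    "Clupea harengus Linnaeus, 1758",
    "Clupea harengus Linnaeus, 1759",
    "Clupea harengus Linnaeus, 1760",
    "Gadus morhua Linnaeus, 1758",
]
print(find_series_fill(column))  # prints [1, 2]
```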

Taxonomy: many ways to (correctly) spell a name Agalinus paupercula borealis Agalinus pauperculum borealis Agalinis paupercula var. Borealis Agalinus pauperculum var. borealis Agalinus paupercula var. borealis Agalinus paupercula var. borealis Pennell Agalinus paupercula Britton var. borealis Pennell Agalinus paupercula (Gray) Britt. var. borealis Pennell Agalinis paupercula (A.Gray) Britton var. borealis Pennell Agalinus paupercula (Gray) Britton var. borealis (Pennell) Zenkert 1934 Gerardia paupercula borealis Gerardia paupercula var. borealis Gerardia paupercula var. borealis (Pennell) Deam Gerardia paupercula (Gray) Britt. var. borealis (Pennell) Deam Gerardia paupercula (A. Gray) Britton var. borealis (Pennell) Deam Gerardia paupercula (A. Gray) Britton subsp. borealis (Pennell) Pennell Gerardia paupercula (Gray) Britt. ssp. borealis (Pennell) Pennell Gerardia paupercula Britton ssp. borealis Pennell

Taxonomy: variation in author name Ceratium hirudinella Ceratium hirudinella (Muller 1773) Ceratium hirundienella Ceratium hirundinella Ceratium hirundinella (Mull ) Ceratium hirundinella (Mull) Ceratium hirundinella (Muller ) Ceratium hirundinella (Muller 1773) Ceratium hirundinella (Muller) Ceratium hirundinella (O. F. Müller) Bergh Ceratium hirundinella (O.F. Müller) Bergh Ceratium hirundinella (O.F. Müller) Bergh Ceratium hirundinella (O.F. Müller, 1773) Dujardin, 1841 Ceratium hirundinella (O.F. M�ller) Bergh Ceratium hirundinella Dujardin Ceratium hirundinella O. F. M. Ceratium hirundinella O. F. Muller (Example from the Global Names Index - GNI)
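
One way to see past the variation in author strings is to group name strings by their genus and species epithet only. The sketch below does this naively by keeping the first two words; it is an assumption-laden illustration and not a real name parser (it ignores subgenera, infraspecific ranks and hybrids).

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def canonical_binomial(name_string: str) -> str:
    """Naively keep only the first two words (genus and specific epithet)."""
    return " ".join(name_string.split()[:2])


def group_by_binomial(name_strings: Iterable[str]) -> Dict[str, List[str]]:
    """Group name strings that share the same first two words."""
    groups = defaultdict(list)
    for name_string in name_strings:
        groups[canonical_binomial(name_string)].append(name_string)
    return dict(groups)


variants = [
    "Ceratium hirundinella (Muller 1773)",
    "Ceratium hirundinella (O.F. Müller) Bergh",
    "Ceratium hirundinella Dujardin",
    "Ceratium hirudinella",
]
groups = group_by_binomial(variants)
print(len(groups["Ceratium hirundinella"]))  # prints 3
print(sorted(groups))
```

The misspelt "Ceratium hirudinella" still ends up in its own group, so author normalisation and spelling checks have to be combined.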

Taxonomy: synonymy Halichondria (Halichondria) panicea (Pallas, 1766) Bread-crumb sponge (> 60 synonyms)

Taxonomy: homonymy Homonym A name for a taxon that is identical in spelling to another such name, that belongs to a different taxon. Only one of the two names can stay “valid”, the other becomes “invalid”. Scientific name: Alebion Alebion Krøyer, 1863 => Animalia, Crustacea, parasitic copepods Alebion Gray, 1867 => Animalia, Porifera => Accepted as Iophon Gray, 1867
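
Homonyms mean that a name on its own can be ambiguous, so a lookup needs the authority as well. The sketch below uses a tiny hypothetical register built from the Alebion example on this slide; the dictionary layout and function name are assumptions made for illustration.

```python
from typing import Optional

# Hypothetical mini-register keyed by (name, authority), filled with the
# Alebion example from the slide.
REGISTER = {
    ("Alebion", "Krøyer, 1863"): "Alebion Krøyer, 1863",  # stays valid
    ("Alebion", "Gray, 1867"): "Iophon Gray, 1867",       # accepted under another name
}


def resolve(name: str, authority: Optional[str] = None) -> Optional[str]:
    """Return the accepted name, or None if the name is unknown or ambiguous."""
    if authority is not None:
        return REGISTER.get((name, authority))
    hits = [accepted for (key_name, _), accepted in REGISTER.items() if key_name == name]
    # Without an authority, only an unambiguous name can be resolved.
    return hits[0] if len(hits) == 1 else None


print(resolve("Alebion", "Gray, 1867"))  # prints Iophon Gray, 1867
print(resolve("Alebion"))                # prints None (two homonyms)
```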

How to deal with all this variation? Link taxon names with World Register of Marine Species (WoRMS) WoRMS: Standard list of marine taxon names First authoritative list of names of all marine & brackish water taxa worldwide Managed by VLIZ, directed by taxonomic experts Open access Follows international standards & serves permanent Globally Unique IDs (LSIDs) http://www.tdwg.org/standards/150/download/ Up-to-date and (near) complete (incl. synonyms & commonly used spelling mistakes) If no link possible: Consult with data provider(s) Consult with taxonomic expert(s) at info@marinespecies.org Originally delivered name is always safeguarded! www.marinespecies.org
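
The rule that the originally delivered name is always safeguarded can be sketched as a record that stores the match result next to the original name instead of overwriting it. The class below is hypothetical; the LSID pattern (urn:lsid:marinespecies.org:taxname: followed by the AphiaID) is an assumption about how WoRMS identifiers are formed, and the ID 123456 is a placeholder, not a real AphiaID.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class MatchedName:
    original_name: str                  # exactly as delivered; never overwritten
    matched_name: Optional[str] = None  # name in the register, once linked
    aphia_id: Optional[int] = None      # register identifier, once linked

    @property
    def lsid(self) -> Optional[str]:
        """Build an LSID from the AphiaID (assumed pattern)."""
        if self.aphia_id is None:
            return None
        return f"urn:lsid:marinespecies.org:taxname:{self.aphia_id}"

    @property
    def needs_review(self) -> bool:
        """True when no link was possible: consult provider or taxonomic expert."""
        return self.aphia_id is None


linked = MatchedName("Clupea harengus L.", "Clupea harengus", 123456)  # placeholder ID
unlinked = MatchedName("Clupea sp. A")
print(linked.original_name, "->", linked.lsid)
print(unlinked.needs_review)  # prints True
```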

WoRMS content 455 642 taxa 213 997 accepted species, of which 91% have been checked by a taxonomic editor 20 573 images 47 275 vernacular names 155 076 key literature references (=sources) 45 000 specimen details 345 914 published distributions > 500 000 web links Notes, feeding type & habitat information, host-parasite relationships …

integrates over 100 global, regional and thematic species databases into a common IT platform, which means every species occurs in the system only once 9 regional species databases (RSDs) 71 global species databases (GSDs) 4 thematic species databases (TSDs) 9 external GSDs

WoRMS users As standard taxonomic reference for organizations and programmes => e.g. GBIF, OBIS, CoL, EoL, ICES, NODCs, … Quality control purposes => through webservices & taxon match tool Website: 4 000 visitors per day 3 million hits per month > 600 citations of “World Register of Marine Species” through Google Scholar

How to use WoRMS? For single name: ‘search taxa’ www.marinespecies.org

For a batch of names: ‘match taxa’ (online ‘taxon match’ tool) This tool uses the following components: TAXAMATCH fuzzy matching algorithm by Tony Rees PHP/MySQL port of TAXAMATCH by Michael Giddens Scientific Names Parser by Dmitry Mozzherin

Prepare your own file (Plain text [TXT], Comma Separated [CSV] & Excel Sheet [XLS, XLSX]) Upload onto website
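
Preparing a name list for upload can be as simple as writing a one-column CSV file. The sketch below does this with Python's csv module, collapsing stray whitespace and dropping empty rows and exact duplicates first. The column header "ScientificName" is an assumption; check the online manual for the layout the tool expects.

```python
import csv
import io
from typing import Iterable, TextIO


def write_name_list(names: Iterable[str], handle: TextIO) -> None:
    """Write a one-column CSV of cleaned, de-duplicated names."""
    writer = csv.writer(handle)
    writer.writerow(["ScientificName"])  # hypothetical header
    seen = set()
    for name in names:
        cleaned = " ".join(name.split())  # collapse stray whitespace
        if cleaned and cleaned not in seen:
            seen.add(cleaned)
            writer.writerow([cleaned])


# Written to an in-memory buffer here; pass an open file to create a real CSV.
buffer = io.StringIO()
write_name_list(
    ["Clupea harengus", "  Clupea  harengus ", "", "Halichondria panicea"],
    buffer,
)
print(buffer.getvalue())
```

To write to disk, open the file with newline="" as the csv module documentation recommends, for example open("names.csv", "w", newline="", encoding="utf-8").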

Some exercises Online taxon match Check providers page of EurOBIS (www.eurobis.org => providers) Datasets from your country/institute already present? Additional (large monitoring) datasets missing?

Online taxon match Download exercise files from training server (3 Excel files) Go to www.marinespecies.org => match taxa (check online manual) Match files with online taxon match tool (mind the data format!) Double check possible doubtful (dubious) entries Any questions? Just ask

SOAP web services Allows you to dynamically link your own applications to the register Allows you to match a locally stored species list to the register Allows you to extract taxonomic and other information from this register A few examples of possible applications: get the AphiaID for your taxon check the spelling of your taxa get the authority for your taxa get the full classification for your taxa resolve your unaccepted names to accepted ones get all synonyms for a taxon match your species list resolve a common name/vernacular to a scientific name get the sources/references for a taxon get the WoRMS citation for a taxon As a user or developer you can use this service to feed your own application with standard taxonomy. We currently use the platform-independent SOAP/WSDL standard. SOAP (Simple Object Access Protocol) is a way for a program running on one kind of operating system (such as Windows 2000) to communicate with a program on the same or another kind of operating system (such as Linux), using the World Wide Web's Hypertext Transfer Protocol (HTTP) and Extensible Markup Language (XML) as the mechanisms for information exchange. The service itself is described in the Web Services Description Language (WSDL).
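
To make the SOAP description concrete, the sketch below builds the XML envelope that a client would send over HTTP for a "get the AphiaID for your taxon" call. Nothing is sent over the network. The SOAP 1.1 envelope namespace is standard, but the service namespace, the operation name getAphiaID and the parameter name scientificname are assumptions to be verified against the WSDL published by the service.

```python
import xml.etree.ElementTree as ET

SOAP_ENV = "http://schemas.xmlsoap.org/soap/envelope/"  # SOAP 1.1 envelope namespace
SERVICE_NS = "http://aphia/v1.0"                        # assumed service namespace


def build_request(operation: str, **params: str) -> str:
    """Return a SOAP envelope (as text) calling `operation` with `params`."""
    envelope = ET.Element(f"{{{SOAP_ENV}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_ENV}}}Body")
    call = ET.SubElement(body, f"{{{SERVICE_NS}}}{operation}")
    for key, value in params.items():
        ET.SubElement(call, key).text = value
    return ET.tostring(envelope, encoding="unicode")


# Operation and parameter names are assumptions; check them in the WSDL.
request_xml = build_request("getAphiaID", scientificname="Clupea harengus")
print(request_xml)
```

In practice a SOAP client library that reads the WSDL would generate such envelopes automatically; building one by hand only shows what travels over HTTP.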