Published by Madlyn Strickland. Modified over 6 years ago.
3
What needs to be checked?
4
Quality control procedures for OBIS data - background
Aim: help data providers & data managers
- Check quality
- Check completeness
- Detect (possible) errors
Assign quality flags to each available record
- Evaluation of fitness for purpose & use (NOT: good or bad)
- Filter out records with a certain quality standard
Example: Abra alba at latitude 24.53 & longitude 67.94 in 1983
- Record suitable for general distribution analysis (species occurrence)
- Record suitable for general temporal analysis (yearly trends)
- Record not suitable for seasonal analysis
- Record not suitable for abundance-related analyses (presence only)
Speaker note: this slide explains why we do QC. The example shows that we do not want to label a record as good or bad, but rather to indicate what it is and is not suitable for.
=> Communication with the provider can improve the quality of the contributed data
=> Users can take direct action based on the results
5
Quality control procedures for OBIS data - background
Approach:
- Automated process, within the database: allows creation of filters; allows feedback to data providers
- Online web services: freely available for use to everyone; allow direct feedback to the user (result reports)
=> Communication with the provider can improve the quality of the contributed data
=> Users can take direct action based on the results
Speaker note: the individual checks are not covered in detail here; they are the subject of the second presentation.
6
Technical: 18 quality control steps, on individual record level
10 outlier checks, on dataset or species level
Each QC step = yes (1) / no (0) question
Creation of a bit-sequence (2^(x-1)) => stored as an integer value for the QC => unique value for each possible combination

Example 1: QC steps 1, 2 and 4 evaluated positive
QC step   Value   Bit-seq.
1         1       2^(1-1) = 1
2         1       2^(2-1) = 2
3         0       0
4         1       2^(4-1) = 8
5         0       0
TOTAL             11

Example 2: all five QC steps evaluated positive
QC step   Value   Bit-seq.
1         1       2^(1-1) = 1
2         1       2^(2-1) = 2
3         1       2^(3-1) = 4
4         1       2^(4-1) = 8
5         1       2^(5-1) = 16
TOTAL             31

If a record is evaluated positive for a certain QC step, then two to the power of (the unique number of this QC step minus one) is added to the field QC in the table ‘Eurobis’. For example, if a record is evaluated positive for QC number 3, the value 2^(3-1) = 4 is added to the field QC. This way, a bit sequence is created and stored as an integer value in the field QC. This makes it possible to store the results of the different QC steps in one (integer) field.
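The bit-sequence idea can be sketched in a few lines of Python. This is only an illustration of the arithmetic described on the slide, not the code that runs inside the database; the function names `encode_qc`, `decode_qc` and `passes` are hypothetical.

```python
def encode_qc(passed_steps):
    """Return the integer QC value for the QC step numbers evaluated positive."""
    value = 0
    for step in passed_steps:
        if step < 1:
            raise ValueError("QC step numbers start at 1")
        # Setting bit (step - 1) is the same as adding 2^(step-1) once.
        value |= 1 << (step - 1)
    return value


def decode_qc(value):
    """Return the sorted list of QC step numbers stored in an integer QC value."""
    steps = []
    step = 1
    while value:
        if value & 1:
            steps.append(step)
        value >>= 1
        step += 1
    return steps


def passes(value, required_steps):
    """True when every required QC step is set in the stored QC value."""
    mask = encode_qc(required_steps)
    return (value & mask) == mask


print(encode_qc([1, 2, 4]))     # 11, as in the first example table
print(encode_qc(range(1, 6)))   # 31, as in the second example table
print(decode_qc(11))            # [1, 2, 4]
print(passes(11, [1, 4]))       # True
print(passes(11, [3]))          # False
```

Because every combination of steps maps to a different integer, a filter such as "only records that passed steps 1 and 4" becomes a single bit-mask comparison.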
7
LifeWatch: home for a multitude of web services
Part of the European Strategy Forum on Research Infrastructures (ESFRI)
Distributed virtual laboratory:
- Biodiversity research
- Climatological & environmental impact studies
- Support development of ecosystem services
- Provide information for policy makers
Biodiversity observatories, databases, web services and modelling tools
Integration of existing systems, upgrades, new systems
LifeWatch wants:
- Standardization of species data
- Integration of distributed biodiversity data repositories & operating facilities
LifeWatch needs:
- Species information services
Speaker notes: LifeWatch was established as part of the European Strategy Forum on Research Infrastructures (ESFRI). LifeWatch is a distributed virtual laboratory and will be used for biodiversity research, for climatological and environmental impact studies, to support the development of ecosystem services and to provide information for policy makers in Europe. This large European research infrastructure will consist of several biodiversity observatories, databases, web services and modelling tools. It will integrate the existing systems, upgrading them where possible and developing new systems where needed.
Flemish contribution: Lifewatch.eu will build a distributed infrastructure for the study and monitoring of biodiversity in Europe, with participation from many institutes in EU countries and from existing European networks (MarBEF, LTER, PESI, GBIF, SPECIES2000, 4D4Life, …). VLIZ and INBO have several existing systems to offer and are already participating as partners (LifeWatch preparatory project) or through networks (all of the above). The price is reasonable and the chances of success are high because it is an upgrade of existing systems, and the expertise of both partners is proven.
But the project is also innovative: integration on a very large scale, novel web technologies, development of new biosensors, and it is unprecedented: such an infrastructure is not yet available. The infrastructure will benefit many scientists because it is available online. Marine observatory; taxonomic backbone. Note: ‘central’ does not mean 1 server or 1 database, but that it is central in the workflow mechanisms. Similarly, ‘integrating’ does not imply creating a new database.
8
LifeWatch offers compilation and combination of several web services
These services = taxonomic backbone
- Taxonomy access services
- Taxonomic editing environment
- Species occurrence services
- Catalogue services
LifeWatch infrastructure:
- Identify, analyze and design online data services, models and applications
- Make use of all LifeWatch data
= interactive part of LifeWatch
9
LifeWatch web services
Login / password required System keeps track of all your “jobs”
11
Taxonomic quality control
WoRMS taxon match:
- World Register of Marine Species (WoRMS)
LifeWatch taxon match:
- World Register of Marine Species (WoRMS)
- Integrated Taxonomic Information System (ITIS)
- Catalogue of Life (CoL)
- International Plant Names Index (IPNI)
- Index Fungorum (IF)
- PalaeoBiology Database (Palaeo-DB)
- Pan-European Species Infrastructure (PESI)
12
Taxonomic QC step by step
TAXON NAME X
Match with WoRMS?
- yes: document LSID; check habitat (marine/non-marine); check taxonomic level (genus/species)
- no: match with other registers? (yes / no)
  - Contact the data provider for a secondary check
  - Go through the matches again
  - Is the taxon marine?
World Register of Marine Species
- 2004: MarBEF EU FP6 => European Register of Marine Species (ERMS)
- 2007: further development to World Register
- Also: Index Fungorum, Interim Register of Marine & Non-Marine Genera (IRMNG)
- Not just a name index, but an expert-based taxonomic database
- > 200 taxonomic editors, Steering Committee & data management team
- Permanently hosted at VLIZ
- Web-based system, including web services
- International standards, permanent LSIDs
Questions asked in the taxonomic QC:
- Is the taxon name available in WoRMS?
- Is the taxon equal to or more detailed than genus/species?
- Is the taxon marine or not?
Other taxonomic databases: Catalogue of Life (CoL), International Plant Names Index (IPNI), Integrated Taxonomic Information System (ITIS), …
- yes: contact taxonomic editor: add taxon to WoRMS
- no: add taxon to annotated list
13
WoRMS Taxon Match Tool Freely available, no password/login required
This tool uses the following components:
- TAXAMATCH fuzzy matching algorithm by Tony Rees
- PHP/MySQL port of TAXAMATCH by Michael Giddens
- Scientific Names Parser by Dmitry Mozzherin
14
Prepare your own file (Plain text [TXT], Comma Separated [CSV] & Excel Sheet [XLS, XLSX])
For convenience => column “scientific_name”
Upload onto website
17
WoRMS taxon match results:
Match types:
- Exact match
- Phonetic match
- Near_1 match
- Near_2 match
- No match
Check and verify everything that is not an exact match…
Some examples:
- Phonetic: Fragilaria aurivillii => Fragilaria aurivilii
- Near_1: Chaetoceros seychellarum => Chaetoceros seychellarus
- Near_2: Gammarus finnmarchius => Gammarus finmarchicus; Syllis armoricanus => Syllis armoricana
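To get a feel for how close these near matches are, the sketch below counts the single-character edits between the submitted and the matched names from the examples. It uses plain Levenshtein distance purely as an illustration; it is not the TAXAMATCH algorithm and does not reproduce its match categories. The function name `edit_distance` is hypothetical.

```python
def edit_distance(a, b):
    """Levenshtein distance: minimum number of single-character
    insertions, deletions or substitutions turning a into b."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # delete char_a
                current[j - 1] + 1,      # insert char_b
                previous[j - 1] + cost,  # substitute (or keep) the character
            ))
        previous = current
    return previous[-1]


examples = [
    ("Fragilaria aurivillii", "Fragilaria aurivilii"),
    ("Chaetoceros seychellarum", "Chaetoceros seychellarus"),
    ("Gammarus finnmarchius", "Gammarus finmarchicus"),
    ("Syllis armoricanus", "Syllis armoricana"),
]
for submitted, matched in examples:
    print(submitted, "=>", matched, ":", edit_distance(submitted, matched))
# The four pairs differ by 1, 1, 2 and 2 edits respectively.
```

A small edit count is a reason to look at a candidate, not proof that the names are the same taxon, which is why every non-exact match still needs to be verified.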
18
LifeWatch taxon match tool
19
If a taxon is not in WoRMS: Send email to info@marinespecies.org
Currently available taxon services
If a taxon is not in WoRMS: let us know whether it is available in any of the other registers
21
Use this report as feedback to your provider / WoRMS
22
Taxonomic quality control – ambiguous matches
Scientific name: Chondracanthus, unknown species
- Kingdom Plantae (Rhodophyta)
- Kingdom Animalia (Crustacea)
Scientific name: Alebion
- Alebion Krøyer, 1863 => Animalia, Crustacea, parasitic copepods
- Alebion Gray, 1867 => Animalia, Porifera => accepted as Iophon Gray, 1867
23
Taxonomic quality control – its importance illustrated…
“… In total, 6,172 unique taxon names were submitted …. After a thorough QC, however, this number was reduced to 4,525, mostly due to spelling variations and synonymy.”
“… Such [taxonomic] quality control is highly needed, since a misspelled or obsolete name could be compared to the introduction of a rare species, with adverse effects on further (biodiversity) calculations…”
Source: Vandepitte et al. (2010). Data integration for European marine biodiversity research: creating a database on benthos and plankton to study large-scale patterns and long-term changes. Hydrobiologia 644: 1-13
24
Geographic quality control
LifeWatch: Show on map
LifeWatch: Marine Regions Gazetteer services
- Get lat-lon by MrgID
- Get lat-lon by name
- Get Gazetteer name by lat-lon
- Get lat-lon by accepted name
25
Geographic QC – the concept
Communication with provider
Before quality control: 18°30’25’’N – 5°15’E; 54,23N – 16.5S
After quality control: decimal degrees in WGS84 (for example, 5°15’E becomes 5.25)
WGS84 = World Geodetic System 1984; the most used geographical reference system
Decimal degrees => easy to work with
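The conversion from degrees, minutes and seconds to decimal degrees is simple arithmetic: degrees + minutes/60 + seconds/3600, negative for the southern and western hemispheres. A minimal sketch, assuming coordinates arrive as text with a trailing hemisphere letter; the function name `dms_to_decimal` and the accepted input pattern are assumptions made for this illustration.

```python
import re

# Degrees are required; minutes and seconds are optional.
_DMS = re.compile(
    r"^\s*(?P<deg>\d+)°\s*"
    r"(?:(?P<min>\d+)'\s*)?"
    r'(?:(?P<sec>\d+(?:\.\d+)?)"\s*)?'
    r"(?P<hemi>[NSEW])\s*$"
)


def dms_to_decimal(text):
    """Convert e.g. 18°30'25''N to signed decimal degrees."""
    # Normalise typographic quotes and the doubled apostrophe used for seconds.
    cleaned = text.replace("’", "'").replace("”", '"').replace("''", '"')
    match = _DMS.match(cleaned)
    if match is None:
        raise ValueError(f"unrecognised coordinate: {text!r}")
    degrees = int(match["deg"])
    minutes = int(match["min"] or 0)
    seconds = float(match["sec"] or 0.0)
    if minutes >= 60 or seconds >= 60:
        raise ValueError(f"minutes/seconds out of range: {text!r}")
    value = degrees + minutes / 60 + seconds / 3600
    return -value if match["hemi"] in "SW" else value


print(round(dms_to_decimal("18°30'25''N"), 4))  # 18.5069
print(dms_to_decimal("5°15'E"))                 # 5.25
print(dms_to_decimal("16°30'S"))                # -16.5
```

Values that do not fit the pattern raise an error instead of being guessed, so they can be taken back to the data provider.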
26
Coordinates are indispensable
Coordinates = basis of a biogeographic information system
When no coordinates are provided: check with the data provider / the source
- When they exist: complete the file & run QC
- When they do not exist: derive them from a provided map, or check Marine Regions to assign coordinates
27
Marine Regions = Standard, relational list of geographic names
Coupled with information and maps of the geographic location Improve access and clarity of the different geographic, mainly marine names such as seas, sandbanks, ridges and bays
28
Fish species “A” present in Kenya
Marine species on land? Link with the adjacent sea area: EEZ
Indicate precision!
31
The importance of geographical QC
Some examples:
- “Monitoring in Belgian part of the North Sea”
- “Monitoring in Kongsfjorden area”
Errors found: “+” & “-” signs switched; latitude & longitude switched
32
Sightings and strandings of marine turtles around the coast of UK and Ireland
Left: coordinates as received; right: corrected. Errors due to missing minus sign
33
What else to check…? Use common sense…
34
Dates
OBIS data format check includes a check on the date format:
- Year: “1972” vs “72” vs “972”
- Month: between 1-12
- Day: between 1-31; the check takes into account the given month
but… a dataset from 1990, with a few records in 1909…
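A format check of this kind could look like the sketch below. It is an assumption-laden illustration, not the OBIS implementation: the function names `date_problems` and `unexpected_years` are hypothetical, and the inputs are assumed to arrive as text fields.

```python
import calendar


def date_problems(year, month, day):
    """Return a list of format problems for one record's date fields (strings)."""
    problems = []
    year_ok = len(year) == 4 and year.isdecimal() and int(year) > 0
    if not year_ok:
        problems.append("year must have four digits")
    month_ok = month.isdecimal() and 1 <= int(month) <= 12
    if not month_ok:
        problems.append("month must be between 1 and 12")
    if not day.isdecimal():
        problems.append("day must be a number")
    elif month_ok:
        # Take the given month into account; fall back to a leap year
        # when the year itself is unusable, so 29 February is not rejected.
        reference_year = int(year) if year_ok else 2000
        days_in_month = calendar.monthrange(reference_year, int(month))[1]
        if not 1 <= int(day) <= days_in_month:
            problems.append(f"day {day} is not valid for month {month}")
    elif not 1 <= int(day) <= 31:
        problems.append("day must be between 1 and 31")
    return problems


def unexpected_years(years, first_year, last_year):
    """Return the years that fall outside the period the dataset claims to cover."""
    return [y for y in years if not first_year <= y <= last_year]


print(date_problems("72", "6", "15"))                # ['year must have four digits']
print(date_problems("1972", "2", "30"))              # ['day 30 is not valid for month 2']
print(unexpected_years([1990, 1990, 1909], 1990, 1990))  # [1909]
```

The second function covers the "but…" case: 1909 is a perfectly valid year, so only a comparison with the known sampling period of the dataset reveals it as suspicious.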
35
Units
OBIS can capture: counts, biomass, depth
Are units defined?
- Counts: individuals per m², cm², liter, m³
- Biomass: wet weight, dry weight, ash-free dry weight
- Depth: meter, centimeter
“I collected 4 individuals of species X from location Z” => Sample size? 10 cm² - 50 cm² - 1 m² - …?
36
Significance: Needs thorough documenting
Know what you are dealing with
Comparison
Convert to OBIS standards:
- Depth: in meters, positive values
- Abundance: NULL versus 0 (absence); positive values
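A minimal sketch of such a conversion step, assuming depth arrives with a unit label of "m" or "cm" and abundance arrives as text. The function names and the choice to reject negative values rather than silently flip their sign are assumptions made for this illustration.

```python
def depth_in_metres(value, unit):
    """Convert a depth to metres and insist on a positive (non-negative) value."""
    divisors = {"m": 1, "cm": 100}  # only the two units named on the slide
    if unit not in divisors:
        raise ValueError(f"undefined depth unit: {unit!r}")
    metres = value / divisors[unit]
    if metres < 0:
        # A negative depth may be a sign convention or an error: ask the provider.
        raise ValueError("depth must be a positive value")
    return metres


def parse_abundance(raw):
    """Keep 'not recorded' (None) distinct from a recorded absence (0)."""
    if raw is None or raw.strip() == "":
        return None  # stored as NULL: abundance unknown, not zero
    value = float(raw)
    if value < 0:
        raise ValueError("abundance must be a positive value")
    return value


print(depth_in_metres(250, "cm"))  # 2.5
print(parse_abundance(""))         # None
print(parse_abundance("0"))        # 0.0
```

Treating an empty field as 0 would turn "we do not know" into "the species was absent", which changes the outcome of any abundance-related analysis.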
37
Quality control procedures for OBIS data - automated
Remember - technical:
- 18 quality control steps, on individual record level
- 10 outlier checks, on dataset or species level
- Each QC step = yes (1) / no (0) question
- Creation of a bit-sequence (2^(x-1)) => stored as an integer value for the QC => unique value for each possible combination
Some steps are not (yet) available through web services
- These will be identified once the data are in OBIS
- The harvest report will give an indication of possibly erroneous records
38
Geographic quality control – 3-dimensional check
Latitude – longitude <> 0
Latitude – longitude between -90/+90 and -180/+180
Latitude – longitude within sea area (20 km buffer)
Depth value possible?
- Plot corresponding latitude-longitude on the General Bathymetric Chart of the Oceans (GEBCO)
- Compare GEBCO depth with actual sampling depth
- Take into account 100 m margin

Taxon                      Given depth (m)   GEBCO depth (m)   Difference (m)
Desmoscolex                2080              510               +1570
Halieutichthys aculeatus   110               1140              -1030

Desmoscolex: negative evaluation in QC.
Halieutichthys aculeatus: needs to be looked at… Pancake batfish => usually bottom-dwelling…
Speaker notes: this slide can be left out if need be. We also run a check on depth: if the given depth deviates too strongly from the GEBCO depth, the record is flagged. People also need to think about it logically. This fish is a bottom-dwelling fish, so the chance of actually finding it just below the surface is rather small…
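The record-level checks listed above can be sketched as follows. This is an illustration under stated assumptions, not the OBIS code: the function names are hypothetical, "<> 0" is read here as "latitude and longitude are not both exactly 0", the GEBCO depth is assumed to have been looked up already, and the sea-area test (20 km buffer) is left out because it needs a coastline dataset.

```python
def position_flags(lat, lon):
    """Basic latitude/longitude checks for one record."""
    return {
        "non_zero": not (lat == 0 and lon == 0),
        "in_range": -90 <= lat <= 90 and -180 <= lon <= 180,
    }


def depth_difference(given_depth_m, gebco_depth_m):
    """Given sampling depth minus GEBCO depth at the same position."""
    return given_depth_m - gebco_depth_m


def depth_is_possible(given_depth_m, gebco_depth_m, margin_m=100):
    """A sample cannot come from deeper than the seabed, allowing a margin."""
    return given_depth_m <= gebco_depth_m + margin_m


records = [
    ("Desmoscolex", 2080, 510),
    ("Halieutichthys aculeatus", 110, 1140),
]
for taxon, given, gebco in records:
    print(taxon, depth_difference(given, gebco), depth_is_possible(given, gebco))
# Desmoscolex 1570 False
# Halieutichthys aculeatus -1030 True
```

The second record passes the automated depth test, yet the large negative difference for a bottom-dwelling fish is exactly the kind of result that still needs a human look.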
39
Quality control procedures for OBIS data - outlier analyses
Only performed on the OBIS database (= global coverage)
Geographic outliers – dataset level
- Analysis on dataset level
- Possible location outlier(s)
- Methodology based on centroid calculations and assuming a normal distribution => not applicable for strongly asymmetric datasets…
- Communication with provider on results
Map legend: centroid; no outlier; possible outlier
Dataset: “ICES Biological Community” (DOME)
- Also identified as incorrect in the record-level check of lat-lon (= land)
- Not identified through the record-level check of lat-lon (= sea), but seen as a potential outlier through the geographic outlier check
Provider communication:
- Antarctic locations are incorrect (data error)
- Northern locations are correct (sampling bias)
Speaker notes: outliers at dataset level. Geographic: we look at which coordinates deviate statistically from the norm within the dataset. This can be difficult for datasets with a global spread. The outlier analyses are best combined with other QC steps (cf. the red circles). Such cases are communicated to the provider in order to arrive at the correct solutions.
Vandepitte et al. (2015). Fishing for data and sorting the catch […]. Database. DOI: /database/bau125
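The slide only states that the method is based on centroid calculations and assumes a normal distribution. One possible reading, sketched below purely as an assumption, is to flag positions whose distance to the dataset centroid exceeds the mean distance by more than three standard deviations. The threshold, the plain averaging of coordinates and the function names are all choices made for this illustration, not the published methodology.

```python
import math
from statistics import mean, stdev


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two positions in kilometres."""
    earth_radius_km = 6371.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = (math.sin(d_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2)
    return 2 * earth_radius_km * math.asin(math.sqrt(a))


def possible_outliers(points, n_sd=3.0):
    """Return the (lat, lon) points lying unusually far from the dataset centroid.

    Needs at least two points; the centroid is a simple coordinate average,
    which is only reasonable for datasets that do not span the date line.
    """
    centroid_lat = mean(lat for lat, _ in points)
    centroid_lon = mean(lon for _, lon in points)
    distances = [haversine_km(lat, lon, centroid_lat, centroid_lon)
                 for lat, lon in points]
    threshold = mean(distances) + n_sd * stdev(distances)
    return [p for p, d in zip(points, distances) if d > threshold]


# Thirty North Sea positions plus one Antarctic position.
points = [(51 + 0.05 * i, 2 + 0.05 * i) for i in range(30)] + [(-65.0, 2.0)]
print(possible_outliers(points))  # [(-65.0, 2.0)]
```

As the slide warns, a rule like this breaks down for strongly asymmetric or globally spread datasets, so flagged positions are candidates for discussion with the provider rather than confirmed errors.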
40
Verruca stroemia (Crustacea: Cirripedia)
Environmental outliers – species level
=> Check for outliers within the available distribution records of a species
=> Geography, depth, sea surface salinity (SSS), sea surface temperature (SST)
Map panels: Geography; Depth. Legend: centroid; no outlier; possible outlier
Dubious results => additional verification:
- World Register of Marine Species: literature & expert-based species distribution information => not in the Mediterranean, but present in the north
- Expert: ecological information => Mediterranean records are due to erroneous identifications in the field => depth range of the species varies between intertidal and 548 metres (available data only to 300 metres depth)
Speaker notes: environmental outliers. Here geography, depth, SSS and SST are all taken into account. This map mainly shows that different analyses can give different results, and that users must use their common sense when using these data. QC flags are a tool that helps users decide which data they do or do not use in their analyses. The ambiguous results (orange circles) were submitted to an expert. The Mediterranean data are errors; the northern data are OK.
Verruca stroemia (Crustacea: Cirripedia). Vandepitte et al. (2015)
41
Questions?
Vandepitte L., Hernandez F., Claus S., Vanhoorne B., De Hauwere N., Deneudt K., Appeltans W., Mees J. (2011). Analysing the content of the European Ocean Biogeographic Information System (EurOBIS): available data, limitations, prospects and a look at the future. Hydrobiologia 667(1): 1-14
Vandepitte L., Bosch S., Tyberghein L., Waumans F., Vanhoorne B., Hernandez F., De Clerck O. & Mees J. (2015). Finding what you need in a sea of data: Assessing the data quality, completeness and fitness for use of data in marine biogeographic databases. Database (2015), 1-14 (doi: /database/bau125)