Laura Russell Programmer VertNet Buenos Aires (Argentina) 30 September 2011 Training course on biodiversity data publishing and.


1 Laura Russell (larussell@vertnet.org) Programmer VertNet Buenos Aires (Argentina) 30 September 2011 Training course on biodiversity data publishing and fitness-for-use in the GBIF Network, 2011 edition Introduction to fitness-for-use

2 Overview 1. Value of data 2. Defining fitness for use 3. Fitness for use in biological occurrence data: metadata, taxonomic data, spatial data, collector and collection data, descriptive data

3 Are we living in the "data century"? Source: http://willscullypower.wordpress.com/2011/07/15/infographic-the-world-of-data/

4 Value of data Are we living in the "data century"? Available data is increasing exponentially. The GBIF community is part of this movement! These data have the potential to dramatically increase our knowledge and capabilities.

5 Data and Politics http://dirtyenergymoney.org

6 Data and Advertising Advertising agencies are important consumers of data and statistics.

7 Data and Maps

8 2010 OpenStreetMap response to Haiti earthquake Before...

9 2010 OpenStreetMap response to Haiti earthquake...and a few days later

10 Climate change & "crop wild relatives" Inputs: crop wild relatives data from GBIF (343 species); global climate change models. Maps: current richness, future predicted richness, predicted change.

11 Turning data to understanding Oceans of data...

12 ...rivers of information...

13 ...streams of knowledge...

14 ... and drops of understanding.

15

16 Uses of biodiversity data Taxonomic research, species distribution modelling / predicting, invasive species, habitat loss, species inter-relations,... But also... Conservation planning, water resources management, antivenoms, ecotourism, history of sciences, hunting and fishing, data repatriation, nature photography and film-making,...

17 Fitness for use Definition "The general intent of describing the quality of a particular dataset or record is to describe the fitness of that dataset or record for a particular use that one may have in mind for the data." Chrisman, 1991

18 Fitness for use in action - Does species 'A' occur in Tasmania? - Does species 'A' occur in National Park 'X'?

19 Loss of data quality can occur at every step Collection time During digitization During documentation During storage/archiving During analysis and manipulation At time of presentation Through the use to which they are put

20 Data quality information chain Assign responsibility for the quality of data as close to data creation as possible.

21 Quality Assurance and Quality Control Judgment of quality based on internal or external standards, processes and tools. Both should be done when data quality is a concern!

22 It's important for organizations to have: A vision for providing good quality data A policy to implement that vision A strategy for implementation Considerations - Don’t reinvent the wheel; use standards - Look for inefficiencies (in data collection and QC procedures) and reduce duplication - Share data, information and tools - Look beyond immediate use - Take care of user needs - Invest in good documentation and metadata

23 Data responsibility is shared between Collectors : primary responsibility Label information is correct, as accurate as possible and readable Collection methodologies are fully documented Notes are clear and unambiguous Difficult (or impossible) to correct later

24 Data responsibility is shared between Curator/custodian: long-term responsibility. Quality of data transcription into the database. Validation checks are carried out (routinely) and documented. Data are stored and archived. Earlier versions are systematically stored. Ensure respect for privacy, IP, copyright, sensitivities of indigenous owners,... Provide good documentation (including known errors). User feedback about data quality is taken into account. Responsibility for maintenance, but also for stewardship of the data for use by future generations.

25 Data responsibility is shared between Users: provide feedback to custodians on errors and omissions in data and documentation, and on setting future priorities. User responsibility: determine the fitness of the data for their use and do not use the data in inappropriate ways.

26 Accuracy and precision Accuracy = correctness. Precision: statistical = "repetition"; numerical = "digits". Figure panels: low accuracy, high precision; high accuracy, low precision; high accuracy, high precision.
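
The distinction between accuracy and statistical precision can be illustrated with a small sketch. It assumes we have repeated measurements of a quantity whose true value is known; the function name, the elevation figures and the use of mean offset and standard deviation as the two measures are illustrative assumptions, not part of the original slides.

```python
import statistics

def accuracy_and_precision(measurements, true_value):
    """Return (bias, spread) for repeated measurements of one quantity.

    bias   = mean - true value   (small |bias| means high accuracy)
    spread = sample std. dev.    (small spread means high statistical precision)
    """
    bias = statistics.fmean(measurements) - true_value
    spread = statistics.stdev(measurements)
    return bias, spread

# Made-up elevation readings (metres) for a site whose true elevation is 100 m.
true_elevation = 100.0
precise_but_inaccurate = [110.1, 110.0, 109.9, 110.0]  # tight cluster, wrong place
accurate_but_imprecise = [95.0, 105.0, 98.0, 102.0]    # centred, but scattered

bias_a, spread_a = accuracy_and_precision(precise_but_inaccurate, true_elevation)
bias_b, spread_b = accuracy_and_precision(accurate_but_imprecise, true_elevation)
```

The first series has a small spread but a bias of about 10 m (low accuracy, high precision); the second is centred on the true value but scattered (high accuracy, low precision).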

27 Errors & uncertainty Errors: both imprecision and inaccuracies. Random or systematic. Don't just try to avoid them: measure, calculate, record and document them. Uncertainty: always present (the difficulty is to understand, record and describe it). Says more about the observer than about the data!

28 Fitness-for-use and metadata "Data about data(set)": content, accessibility, completeness,... Dataset-level or record-level. Document errors. Document data validation and cleaning/error correction. The data must be documented with sufficiently detailed metadata to enable its use by third parties without reference to the originator of the data.

29 Taxonomic data Names are often the first point of entry to biodiversity databases. => Risk of error propagation Possible errors: Wrong identification Wrong format Spelling errors

30 Taxonomic data Taxonomic data consists of: -Name -Nomenclatural status -Reference -Determination -Quality fields

31 Taxonomic data Error checking: - Missing values - Incorrect values - Non-atomic values - Domain schizophrenia - Duplicates - Inconsistent data
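
Three of the checks listed above (missing values, non-atomic values, duplicates) can be sketched in a few lines. This is a minimal illustration only: the field names `genus` and `species`, the sample records and the rule "a space inside a field means it is non-atomic" are assumptions made for the example, not a prescribed GBIF procedure.

```python
def check_taxon_records(records):
    """Return a list of (record index, issue) pairs for simple name checks."""
    issues = []
    seen = set()
    for i, rec in enumerate(records):
        genus = (rec.get("genus") or "").strip()
        species = (rec.get("species") or "").strip()
        # Missing value: a required field is empty.
        if not genus or not species:
            issues.append((i, "missing value"))
        # Non-atomic value: more than one piece of information in one field.
        if " " in genus or " " in species:
            issues.append((i, "non-atomic value"))
        # Duplicate: same name as an earlier record once case is ignored.
        key = (genus.lower(), species.lower())
        if key in seen:
            issues.append((i, "duplicate"))
        seen.add(key)
    return issues

# Made-up sample records.
records = [
    {"genus": "Eucalyptus", "species": "globulus"},
    {"genus": "Eucalyptus", "species": ""},
    {"genus": "Eucalyptus", "species": "globulus subsp. bicostata"},
    {"genus": "eucalyptus ", "species": "Globulus"},
]
issues = check_taxon_records(records)
```

Checks such as "incorrect values" or "inconsistent data" usually need a reference list of accepted names, so they are left out of this sketch.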

32 Spatial data Is one of the most crucial aspects in being able to determine the fitness-for-use of biodiversity data: - species distribution modeling - reserve selection - environmental planning and management

33 Spatial data What is it? Point records as lat/lon? => Area represented as: - Point/radius - Bounding box - Polyline - Grid reference
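
A point/radius record describes an area, not an exact spot, and that affects which questions it can answer. The sketch below tests whether a location of interest falls inside a record's uncertainty circle. It assumes a spherical Earth (haversine distance with a mean radius of 6,371 km); the record layout, the function names and the coordinates are made up for illustration.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius; spherical approximation (assumption)

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two lat/lon points (degrees)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))

def may_contain(record, lat, lon):
    """True if (lat, lon) lies inside the record's point/radius circle."""
    return haversine_m(record["lat"], record["lon"], lat, lon) <= record["radius_m"]

# Hypothetical occurrence record with a 5 km uncertainty radius.
record = {"lat": -42.0, "lon": 147.0, "radius_m": 5_000}

near = may_contain(record, -42.01, 147.0)  # roughly 1.1 km away
far = may_contain(record, -42.1, 147.0)    # roughly 11 km away
```

A record with a 5 km radius may be perfectly adequate for a question at the scale of a whole region and still be too coarse to place an occurrence inside a small protected area.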

34 Example of grid-based data (checklists)

35 Spatial data definitions Georeference: the code that records a position on the surface of the earth, according to a spatial reference system (SRS). It's often a latitude/longitude pair. Synonym: coordinates. Georeferencing: the process of assigning geographic coordinates to a record. Synonym: geocoding.

36 (Geodetic) datum

37 Things to know about GPS GPS positioning relies on trilateration (often loosely called triangulation); signals from a minimum of 4 satellites are needed. Since each satellite's position in space and time is known, the receiver's position on earth can be calculated. Historically, the number of receivable satellites was not always sufficient. Prior to May 2000, selective availability limited accuracy to 100 m or worse with most devices. Now, accuracy is generally about 10 meters in open areas with 4 satellites. Averaging repeated readings gives better precision (some devices do that automatically). Differential GPS, WAAS, LAAS and real-time differential GPS are different techniques that make use of base stations at well-known positions to apply appropriate corrections, with precision up to 1 cm. GPS height relates to the earth ellipsoid in use, not to mean sea level.
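
The averaging mentioned above can be sketched as a plain arithmetic mean of repeated fixes taken at one spot. This is an illustration under simplifying assumptions: the fixes are close together, do not straddle the antimeridian, and their errors are roughly random rather than systematic. The function name and the coordinates are made up.

```python
import statistics

def average_fix(fixes):
    """Average repeated (lat, lon) GPS readings taken at the same spot."""
    lats = [lat for lat, _ in fixes]
    lons = [lon for _, lon in fixes]
    return statistics.fmean(lats), statistics.fmean(lons)

# Made-up readings scattered around one location.
fixes = [
    (-34.60370, -58.38160),
    (-34.60374, -58.38154),
    (-34.60366, -58.38166),
    (-34.60370, -58.38160),
]
mean_lat, mean_lon = average_fix(fixes)
```

Averaging reduces random scatter only; a systematic offset, such as a wrong datum setting, is carried straight through to the averaged position.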

38 Spatial data Common errors lat./lon. inversion zero value (one or both) no recorded datum wrong choice of SRS false sense of precision/conversion issues
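
Some of these errors can be flagged automatically. The sketch below checks a single coordinate pair for zero values, out-of-range values that suggest a lat/lon inversion, and a missing datum. The function name, the issue labels and the sample coordinates are assumptions made for the example; an inversion where both values fall within ±90 cannot be detected by a range test alone.

```python
def flag_coordinate_issues(lat, lon, datum=None):
    """Return a list of suspected problems for one decimal-degree coordinate pair."""
    issues = []
    # Zero in one or both fields often means "unknown" rather than a real position.
    if lat == 0 or lon == 0:
        issues.append("zero value")
    # Latitude cannot exceed 90 degrees; if the longitude would be a valid
    # latitude, the two values were probably swapped.
    if abs(lat) > 90:
        if abs(lon) <= 90:
            issues.append("possible lat/lon inversion")
        else:
            issues.append("latitude out of range")
    if abs(lon) > 180:
        issues.append("longitude out of range")
    # Without a datum the same numbers can refer to different places on the ground.
    if not datum:
        issues.append("no recorded datum")
    return issues

swapped = flag_coordinate_issues(-122.27, 37.87, "WGS84")
empty = flag_coordinate_issues(0, 0)
clean = flag_coordinate_issues(37.87, -122.27, "WGS84")
```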

39 Original GBIF data about USA

40 Collector and collection data: collector, date of collection, additional information (habitat, soil, weather conditions...). Importance varies with the type of data collected. Static collection for a museum: collector name and number, date, habitat, capture method... Observational data: + length of observation, area of observation, time of day, activity, sex of observed animal... Survey data: + survey method and size (grid), frequency, whether vouchers were collected (+ collection number)

41 Collector and collection data Accuracy: of collector names, dates,... Consistency: use of a consistent terminology in data fields such as habitat, soils, associated species... Completeness: rarely achieved for fields such as habitat, flowering... This makes a study of habitat from collections alone difficult.
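
Completeness can be quantified per field before deciding whether a dataset suits a given study. The sketch below computes the fraction of records with a non-empty value for each field; the field names and sample records are hypothetical.

```python
def field_completeness(records, fields):
    """Return {field: fraction of records with a non-empty value}."""
    total = len(records)
    result = {}
    for field in fields:
        filled = sum(1 for rec in records if str(rec.get(field) or "").strip())
        result[field] = filled / total if total else 0.0
    return result

# Made-up specimen records: collector is always present, habitat rarely.
records = [
    {"collector": "A. Smith", "habitat": "riparian forest"},
    {"collector": "A. Smith", "habitat": ""},
    {"collector": "B. Jones"},
    {"collector": "B. Jones", "habitat": None},
]
completeness = field_completeness(records, ["collector", "habitat"])
```

Here `collector` is filled in every record but `habitat` in only one of four, which illustrates why a habitat study based on collections alone is difficult.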

42 Descriptive data Morphological, physiological, phenological,... Increasing use. Quality and accuracy variable: data unobservable (historic), impractical to observe (too costly), perceived rather than real (abundance, color,...). In many cases, stored at taxonomic level rather than specimen level. Completeness: generally not possible at specimen level (e.g. fruit vs. flower characteristics). Consistency: inconsistent representation of the same attribute: o FLOWER_COLOUR = Carmine o FLOWER_COLOUR = crimson
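
One common way to handle inconsistent values like the FLOWER_COLOUR example is to map free-text entries onto a controlled vocabulary. The sketch below is illustrative only: the vocabulary, the choice of "red" as the preferred term and the function name are assumptions, and unknown values are returned as `None` so they can be reviewed by hand instead of being guessed.

```python
# Hypothetical controlled vocabulary: free-text value -> preferred term.
COLOUR_VOCAB = {
    "carmine": "red",
    "crimson": "red",
    "scarlet": "red",
}

def normalise_colour(value):
    """Map a free-text colour to its preferred term, or None if unknown."""
    key = value.strip().lower()
    return COLOUR_VOCAB.get(key)

a = normalise_colour("Carmine")
b = normalise_colour(" crimson")
c = normalise_colour("mauve")  # not in the vocabulary: flag for manual review
```

Keeping the original value alongside the normalised one preserves what was actually recorded on the label.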

43 Credits Based on Arthur Chapman's documents, mainly the presentation "Principles of data quality". Crop Wild Relatives: Andy Jarvis (1), Samy Gaiji (2), Julian Ramirez (1) and Emmanuel Zapata (1). 1. The International Center for Tropical Agriculture (CIAT) 2. The Global Biodiversity Information Facility Secretariat (GBIF). Accuracy vs. precision slide: http://www.mathsisfun.com/accuracy-precision.html Beach picture by Lali Masrieta: www.visualpanic.net River: Johan J. Ingles-Le Nobel. Stream: bterrycompton. Reference: Chapman, A.D. and J. Wieczorek (eds). 2006. Guide to Best Practices for Georeferencing. Copenhagen: Global Biodiversity Information Facility. Available online from http://www2.gbif.org/BioGeomancerGuide.pdf or in French as Chapman, A.D. and J. Wieczorek (eds). 2006. Principes de la bonne pratique sur le géoréférencement, version 1.0. Transl. Chenin, C. Copenhagen: Global Biodiversity Information Facility, 95 pp. Available online at http://links.gbif.org/gbif_georeferencement_manual_fr_v1.pdf


