Science Environment for Ecological Knowledge
Jessie Kennedy, School of Computing, Napier University, Edinburgh


1 Science Environment for Ecological Knowledge Jessie Kennedy School of Computing, Napier University, Edinburgh

2 The SEEK Prototype: Ecological Niche Modeling
[Diagram: Geographic Space and Ecological Space]
- Inputs: biodiversity information (e.g. data from museum specimens, ecological surveys); geospatial and remotely sensed data
- Geographic space: occurrence points on native distribution
- Ecological niche modeling: model of niche in ecological dimensions (axes: temperature, precipitation)
- Projection back onto geography: native range prediction, invaded range prediction
- Results taken to integrate with other data realms (e.g., human populations, public health, etc.)

3 Species prediction map Predicted Distribution: Amur snakehead (Channa argus) Image from http://www.lifemapper.org

4 SEEK Overview
- Analysis and Modelling System (Kepler): modelling scientific workflows
- EcoGrid: making diverse environmental data systems interoperate
- Semantic Mediation System: “smart” data discovery and integration
- Taxon WG: taxonomic name/concept resolution server

5 Scientific workflows
- EML provides semi-automated data binding
- Scientific workflows represent knowledge about the process; AMS captures this knowledge

6 Kepler: Ecological Niche Model

7 Metadata-driven data ingestion
- Key information needed to read and machine-process a data file is in the metadata
  - Physical descriptors (CSV, Excel, RDBMS, etc.)
  - Logical Entity (table, image, etc.) and Attribute (column) descriptions
    - Name
    - Type (integer, float, string, etc.)
    - Codes (missing values, nulls, etc.)
  - Integrity constraints
  - Semantic descriptions (ontology-based type systems)
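
A minimal sketch of the idea on this slide, assuming a hand-written metadata record rather than a real EML document. The `METADATA` layout, the `read_table` helper, and the sample data are all hypothetical; the point is only that column names, types, and missing-value codes come from the metadata, not from the reader code.

```python
import csv
import io

# Hypothetical metadata record (not real EML): one entry per attribute,
# giving its name, its type, and the codes that stand for missing values.
METADATA = {
    "delimiter": ",",
    "attributes": [
        {"name": "site", "type": str, "missing": []},
        {"name": "count", "type": int, "missing": ["-999", ""]},
        {"name": "area_m2", "type": float, "missing": ["NA"]},
    ],
}


def read_table(text, metadata):
    """Parse delimited text into typed rows; missing-value codes become None."""
    reader = csv.reader(io.StringIO(text), delimiter=metadata["delimiter"])
    next(reader)  # skip the header row; column names come from the metadata
    rows = []
    for raw in reader:
        row = {}
        for attr, value in zip(metadata["attributes"], raw):
            if value in attr["missing"]:
                row[attr["name"]] = None
            else:
                row[attr["name"]] = attr["type"](value)
        rows.append(row)
    return rows


SAMPLE = "site,count,area_m2\nA,12,2.5\nB,-999,NA\n"
rows = read_table(SAMPLE, METADATA)
```

Changing the file layout then means editing the metadata record only; `read_table` stays the same.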

8 Ecological ontologies
- What was measured (e.g., biomass)
- Type of quantity measured (e.g., Energy)
- Context of measurement (e.g., Psychotria limonensis)
- How it was measured (e.g., dry weight)

9 Semantic Mediation
- Label data with semantic types
- Label inputs and outputs of analytical components with semantic types
- Use reasoning engines to generate transformation steps
- Use reasoning engines to discover relevant components
[Diagram labels: Data, Ontology, Workflow Components]
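
Component discovery by semantic type could look roughly like the sketch below. It assumes a toy is-a hierarchy and a toy component registry; the type names, component names, and the `is_a` and `relevant_components` helpers are hypothetical and stand in for a real ontology and reasoning engine.

```python
# Hypothetical is-a hierarchy: child semantic type -> parent semantic type.
PARENT = {
    "DryWeightBiomass": "Biomass",
    "Biomass": "Measurement",
    "SpeciesCount": "Count",
    "Count": "Measurement",
}

# Hypothetical registry: component name -> semantic type its input accepts.
COMPONENTS = {
    "summarize_biomass": "Biomass",
    "compute_density": "Count",
    "plot_measurement": "Measurement",
}


def is_a(semantic_type, ancestor):
    """Walk up the hierarchy to test whether semantic_type is a kind of ancestor."""
    current = semantic_type
    while current is not None:
        if current == ancestor:
            return True
        current = PARENT.get(current)
    return False


def relevant_components(data_type):
    """List components whose declared input type subsumes the data's type."""
    return sorted(
        name for name, accepts in COMPONENTS.items() if is_a(data_type, accepts)
    )


biomass_matches = relevant_components("DryWeightBiomass")
count_matches = relevant_components("SpeciesCount")
```

A dataset labelled `DryWeightBiomass` is offered both the biomass-specific component and the generic plotting component, because both accept an ancestor of its type.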

10 Data integration
- Homogeneous data integration
  - Integration of homogeneous data via EML metadata is relatively straightforward
- Heterogeneous data integration
  - Requires advanced metadata and processing
    - Attributes must be semantically typed
    - Collection protocols must be known
    - Units and measurement scale must be known
    - Measurement relationships must be known, e.g., that ArealDensity = Count/Area
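
To illustrate why units and measurement relationships must be known, here is a small sketch that merges two hypothetical datasets recording counts over areas in different units. The record layout, the `UNIT_TO_M2` table, and the `integrate` helper are assumptions made for the example; only the relationship ArealDensity = Count/Area comes from the slide.

```python
# Conversion factors to a common unit (square metres).
UNIT_TO_M2 = {"m2": 1.0, "ha": 10_000.0}


def integrate(records):
    """Convert each record to a common areal density (individuals per m2)."""
    merged = []
    for rec in records:
        area_m2 = rec["area"] * UNIT_TO_M2[rec["area_unit"]]
        # Measurement relationship from the slide: ArealDensity = Count / Area.
        merged.append({"site": rec["site"], "density_per_m2": rec["count"] / area_m2})
    return merged


# Two hypothetical source datasets that record area in different units.
records = [
    {"site": "A", "count": 50, "area": 25.0, "area_unit": "m2"},
    {"site": "B", "count": 20000, "area": 1.0, "area_unit": "ha"},
]
densities = integrate(records)
```

Without the unit metadata the two sites would look four orders of magnitude apart in area; with it they resolve to the same density.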

11 Life Sciences Data
- Much of the data gathered in ecological studies and used in ecological data analysis is bio-referenced data
  - typically organisms are referenced by a Latin name
- Many analyses require integrating data originating in many locations and at various points in time
  - for most bio-referenced data, integration involves matching on organism name

12 Biological (scientific) Names
- Used for communicating information about known organisms and groups of organisms – taxa
- Framework for all biologists to communicate with…
- Taxonomists apply scientific names to species and higher taxa in their classifications
- Formalized and validated according to strict codes of nomenclature
  - (different depending on kingdom)
- Latin name is a polynomial for species and below; monomial for genus and above
- Quoted as: LatinName NameAuthors Year
  - Example: Carya floridana Sarg. 1913
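
The quoting convention "LatinName NameAuthors Year" can be sketched as a simple parser. This is an illustration under simplifying assumptions, not a nomenclaturally complete parser: the regular expression and the `parse_name` helper are hypothetical, and author strings that begin with a lowercase particle would be misread as part of the Latin name.

```python
import re

# Assumed layout: "LatinName NameAuthors Year", e.g. "Carya floridana Sarg. 1913".
# The Latin name is a capitalised word followed by zero or more lowercase epithets.
NAME_RE = re.compile(
    r"^(?P<latin>[A-Z][a-z]+(?: [a-z]+)*) (?P<authors>.+?) (?P<year>\d{4})$"
)


def parse_name(text):
    """Split a fully quoted name into parts; return None if author or year is missing."""
    match = NAME_RE.match(text.strip())
    if match is None:
        return None
    latin = match.group("latin")
    return {
        "latin": latin,
        "authors": match.group("authors"),
        "year": int(match.group("year")),
        # One word means genus or above; more than one means species or below.
        "form": "monomial" if " " not in latin else "polynomial",
    }


species = parse_name("Carya floridana Sarg. 1913")
genus = parse_name("Aus L. 1758")
incomplete = parse_name("Carya floridana")
```

Names without an author or year, or with an abbreviated genus such as "C. floridana", are rejected and return `None`.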

13 Classification, Concepts & Names
[Diagram labels: Pile of specimens; classify; Taxon_concept (_a, _b, _c, _d); Taxonomic Hierarchy (Genus, Species)]

14 Classification, Concepts & Names
[Diagram labels: Pile of specimens; classify]

15 Taxonomic history of Aus L. 1758
[Diagram legend: publication, specimen, type specimen, genus name, species name, Genus concept, Species concept]
Publications of Taxonomic Revisions
- (i) In Linnaeus 1758: Aus L.1758; Aus aus L.1758
- (ii) In Archer 1965: Aus L.1758; Aus aus L.1758; Aus bea Archer 1965. Archer splits Aus aus L. 1758 into two species, retains the name for one and creates a new one.
- (iii) In Fry 1989: Aus L.1758; Aus aus L.1758; Aus bea Archer 1965; Aus cea BFry 1989. Fry splits Aus bea Archer 1965 into two species, retains the name for one and creates a new one.
- (iv) In Tucker 1991: Aus L.1758; Aus aus L.1758; Aus cea BFry 1989. Tucker finds new specimens and combines Aus aus L. 1758 and Aus bea Archer 1965 into one species, retains the name. Tucker publishes his revision without noting Pyle’s corrigendum of the name of Aus cea.
- (v) In Pargiter 2003: Aus L.1758; Aus aus L. 1758; Aus ceus BFry 1989; Xus Pargiter 2003; Xus beus (Archer) Pargiter 2003. Pargiter decides to resplit Aus aus but believes bea (beus) is in a new genus Xus. Pargiter publishes his revision using Pyle’s corrigendum of the epithet bea to beus and Aus cea to Aus ceus.
Publications of Purely Nomenclatural Observation
- In Pyle 1990: bea and cea noted as invalid names and replaced with beus and ceus. A diligent nomenclaturist, Pyle (1990), notes that the species epithets of Aus bea and Aus cea are of the wrong gender and publishes the corrected names Aus beus corrig. Archer 1965 and Aus ceus corrig. BFry 1989.

16 Problems with Scientific Names
- Often recorded inappropriately in datasets
  - No author and/or year (e.g. Carya floridana)
  - Abbreviated (e.g. C. floridana)
  - Internal code (e.g. PicRub for Picea rubens)
  - Vernacular used (e.g. Scrub Hickory)
  - Misspelled
- Are not unique
  - “Re-use” of names with changed definition
  - Name is ambiguous without definition
- Subject to name alterations and 'corrections' over time
  - (e.g. Code changes its rules)

17 Concepts ……
- Full Scientific name + “according to” (Author + Publication + Date) + Definition
  - Carya floridana Sarg. (1913) “according to” Charles Sprague Sargent, Trees & Shrubs 2:193 plate 177 (1913) [+Definition]
- Original concept
  - 1st use of name as described by the taxonomist
  - same author + date in scientific name and the “according to”
  - same publication for original concepts and name
- Revised concept
  - Re-classification of a group
  - different author + date in “according to”
  - Carya floridana Sarg. (1913) “according to” Stone FNA 3:424 (1997) [+Definition]
- Should be used for communicating about groups of organisms
  - Full Scientific name + “according to” (Author + Publication + Date)
  - definition clear – can get the definition
  - comparing or integrating data based on concepts is more accurate
- Can GUIDs help?

18 Concepts
- Concepts are described in many ways
  - Created by someone - an Author
  - Described in a Publication
  - Given a Name
    - May or may not be valid in terms of the nomenclatural codes
- Depending on the taxonomist's working practice, defined by
  - the set of Specimens examined
    - (type specimens and others)
  - Common set of Characters
    - data recorded by taxonomists to describe specimens and taxa
    - context dependent; differentiate taxa rather than fully describe them
    - use natural language with all its ambiguities
  - Relationships to other Taxon Concepts
    - Taxon circumscription
      - the lower level taxa
    - Congruence, overlap etc. to taxa in other classifications

19 Legacy Data …
- In legacy data names often appear in place of concepts
- Names are imprecise
  - are inappropriate for referring to information regarding a taxon, e.g. observational/collection data
- BUT…sometimes that’s all we have
- How do we interpret names?…..
  - potentially multiple definitions
  - the sum of all definitions that exist for the name
    - would that make any sense – conflicts?
  - one of the existing definitions
    - how can we choose?
  - the “attributes” in common to all the definitions
    - would that leave any?
  - represented by the type specimen
    - but what does that mean? – very subjective…..

20 Legacy Names as Concepts…
- Nominal concepts
  - Sub-set of TaxonConcepts
  - Name but no AccordingTo
    - non-unique (concept) identifier elements
    - can have a unique concept GUID
  - No definition
  - Explicitly saying it’s something with this name but not really sure what is/was meant
- Encourage people to understand and address the issue of names
  - Allowing mark-up of collections with names allows people to believe names are really good enough
- Important problem - needs to be tackled sooner rather than later
  - will improve long term usefulness of scientific data
  - ease integration
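
The distinction between original, revised, and nominal concepts drawn in the preceding slides can be sketched as a small data structure. This is an illustration only and does not follow the TCS schema: the `TaxonConcept` class, its field names, and the `kind` and `key` methods are hypothetical, and the example compares the abbreviated author string "Sarg." on both sides, where a real system would have to reconcile "Sarg." with "Charles Sprague Sargent".

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class TaxonConcept:
    latin_name: str
    name_author: str
    name_year: int
    # "According to" part; left empty for a nominal (name-only) concept.
    acc_author: Optional[str] = None
    acc_publication: Optional[str] = None
    acc_year: Optional[int] = None

    def kind(self) -> str:
        """Classify the concept using the rules stated on the slides."""
        if self.acc_author is None:
            return "nominal"  # name but no AccordingTo
        if (self.acc_author, self.acc_year) == (self.name_author, self.name_year):
            return "original"  # same author + date in name and "according to"
        return "revised"  # different author + date in "according to"

    def key(self) -> str:
        """Build a user-readable key: full scientific name plus "according to"."""
        base = f"{self.latin_name} {self.name_author} ({self.name_year})"
        if self.acc_author is None:
            return base
        return f"{base} according to {self.acc_author}, {self.acc_publication} ({self.acc_year})"


original = TaxonConcept("Carya floridana", "Sarg.", 1913, "Sarg.", "Trees & Shrubs 2:193", 1913)
revised = TaxonConcept("Carya floridana", "Sarg.", 1913, "Stone", "FNA 3:424", 1997)
nominal = TaxonConcept("Carya floridana", "Sarg.", 1913)
```

All three records share one name string, which is why matching on the name alone cannot say which definition a dataset meant.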

21 SEEK Taxon
- Build a Name/Concept resolution server
  - TOS (Kansas)
- Taxonomic Concept Schema
  - TCS (Napier)
  - Exchange of taxonomic info
  - TDWG/GBIF standard
  - Basis for TOS
- GUIDs
  - GBIF/SEEK etc.
- Tools to relate and compare concepts
  - Taxonomy Comparison Visualisation Tool (Napier)
  - Concept Mapper Tool (UNC)

22 Concept Comparison Visualisation

23 Taxon Concept Schema
- TCS developed to allow exchange of taxonomic name/concept data
- Based on consultation with range of users
  - understand users’ notions of taxonomic concept
  - what information they consider part of a concept
- Presentations at meetings including 2 TDWG
  - Agreement that concepts are important and necessary
  - Taxon Names are independent from Taxon concepts
  - Agreement that observations/identifications etc. should record concepts not names

24 TCS
- XML based exchange schema
- Not designed as the “correct way” to model a Taxon Concept
- No “rules” as to what a taxon must have
  - certain things needed to be useful
- Designed to accommodate the different ways concepts are described
- Lots of optionality or flexibility in elements
  - to address different work practices in the community
- Includes Taxon Names
  - are more constrained as they are governed by the codes of nomenclature

25 TCS
- Considerable debate on what should be top level elements
- Related closely to the question
  - What gets a GUID?
    - Taxon concepts
    - Taxon Names
    - Specimens
    - Publications
    - Taxon Relationship Assertions
- Concepts refer to Names
  - Names must not change
  - Can’t record original taxon concept

26 Exchange of Data
- Exchange of definitional data
  - name definition
    - information on history of name and type specimen and publication details
  - taxon concept definition
    - Name, publication details for the defining source, characters, specimens, related taxa etc.
- Exchange of usage data
  - for observations/lists (should only use taxon concepts)
    - need only exchange references to existing taxon concepts
      - user readable keys, e.g. Full Scientific name “according to” Author + Publication
      - GUIDs
  - for name checking purposes
    - need only exchange name without history or typification
      - user readable keys, e.g. Full Scientific name
      - GUIDs

27 Issues of GUIDs for integration
- What gets a GUID?
  - TCS top level elements??
  - The “physical thing” or “electronic record of the thing”
  - What is data and what is metadata associated with the GUID?
    - Depends on your perspective on life…..
  - Stability of data associated with a GUID
- Who issues GUIDs?
  - Centralised authority of some sort – peer review??
    - (+) One GUID per concept or name (no duplicates)
    - (+) ensure business rules are applied to new names/concepts created
    - (-) bottleneck?
    - (-) too restrictive in what the business rules might be
  - Distributed free for all
    - (+) Anyone can publish their own name/concept and get a GUID
    - (-) Mess of GUIDs to sort out
- Which technology?
  - LSIDs, DOI etc.

28 TCS and SEEK and…
- Taxon Object Server
  - Core of concept/name resolution service
  - Kansas team has been implementing the TOS
    - Schema based on the TCS model
    - Tool to import data from TCS documents
- EML
  - Proposed modifications to EML to accommodate SEEK's taxonomic resolution services in the future
- User interface tools
  - Uses cut down TCS as input format
- Inform other biology meta-data standards on taxonomic issues
  - Cataloguing the complete genome standard

29 Taxonomic Object Server
- TOS allows
  - registration, retrieval, integration of datasets
- Matches concepts given names, other concepts and taxonomies
- Allows taxonomists to
  - Author new ideas
  - Make new relationships between concepts
- Allows researchers to
  - Easily see previous taxonomic opinions
  - Use a stable identification system to reference concepts (LSIDs)
  - Find concepts…
- Integration with Kepler

30 TOS operations
- Via TCS document
  - addConcept
  - addRelationship
- Public APIs
  - getConcept – on GUID
  - getBestConcept – on name string
  - getHigherTaxon – on GUID and authority – up tree
  - getAuthoritativeList – down tree
  - findConcepts – on any property(s)
  - findRelatedConcepts – on GUID and relationships
  - getSynonymousNames – returns name strings
    - getHigherTaxon
    - getAuthoritativeList
- Dictionary for name-concept matching
  - N-gram matching algorithm
    - getBestConcept
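
The slide names an n-gram matching algorithm behind getBestConcept but gives no details. The sketch below shows one common form of n-gram matching, a Dice coefficient over character trigrams, as an assumption about how such a lookup might work; it is not the TOS implementation. The `ngrams`, `similarity`, and `best_match` helpers, the 0.5 threshold, and the small name dictionary are all hypothetical.

```python
def ngrams(text, n=3):
    """Return the set of character n-grams of text, padded so word edges count."""
    pad = " " * (n - 1)
    padded = f"{pad}{text.lower()}{pad}"
    return {padded[i:i + n] for i in range(len(padded) - n + 1)}


def similarity(a, b, n=3):
    """Dice coefficient over the two strings' n-gram sets (0.0 to 1.0)."""
    grams_a, grams_b = ngrams(a, n), ngrams(b, n)
    if not grams_a or not grams_b:
        return 0.0
    return 2 * len(grams_a & grams_b) / (len(grams_a) + len(grams_b))


def best_match(query, names, threshold=0.5):
    """Return the dictionary name most similar to query, or None if none is close enough."""
    if not names:
        return None
    score, name = max((similarity(query, candidate), candidate) for candidate in names)
    return name if score >= threshold else None


# Hypothetical name dictionary built from names that appear in these slides.
DICTIONARY = ["Carya floridana", "Picea rubens", "Channa argus"]
misspelled = best_match("Carya floridanna", DICTIONARY)
unknown = best_match("Quercus alba", DICTIONARY)
```

Trigram overlap tolerates small misspellings such as a doubled letter, while a name that shares almost no trigrams with the dictionary falls below the threshold and is reported as unmatched.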

