Published by Kathryn Fowler. Modified over 9 years ago.
1
Science Environment for Ecological Knowledge (SEEK)
Jessie Kennedy
School of Computing, Napier University, Edinburgh
2
The SEEK Prototype: Ecological Niche Modeling
[Figure: workflow from geographic space to ecological space and back]
- Inputs: biodiversity information (e.g. data from museum specimens, ecological surveys) and geospatial and remotely sensed data
- Occurrence points on the native distribution (geographic space) are placed in ecological space (axes: temperature, precipitation)
- Ecological niche modeling produces a model of the niche in ecological dimensions
- Projection back onto geography gives a native range prediction and an invaded range prediction
- Results are taken on to integrate with other data realms (e.g., human populations, public health, etc.)
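The slide does not say which niche-modeling algorithm the prototype uses. As a stand-in, the Python sketch below fits the simplest kind of climatic envelope (a per-variable min/max box, the idea behind BIOCLIM-style envelope models) to occurrence points and then projects it back onto a grid of cells. The variable names, grid cells and values are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Envelope:
    """Per-variable (min, max) bounds learned from occurrence points."""
    bounds: dict[str, tuple[float, float]]

    def contains(self, env: dict[str, float]) -> bool:
        # A cell is suitable only if every variable lies inside its bound.
        return all(lo <= env[var] <= hi for var, (lo, hi) in self.bounds.items())


def fit_envelope(occurrences: list[dict[str, float]]) -> Envelope:
    """Step 1: describe each occurrence point by its environmental values
    (ecological space) and summarise the niche as a box."""
    variables = occurrences[0].keys()
    return Envelope({
        var: (min(o[var] for o in occurrences), max(o[var] for o in occurrences))
        for var in variables
    })


def project(envelope: Envelope,
            grid: dict[tuple[int, int], dict[str, float]]) -> set[tuple[int, int]]:
    """Step 2: project the niche back onto geography by testing every grid cell."""
    return {cell for cell, env in grid.items() if envelope.contains(env)}
```

For example, two occurrences at 12 °C/600 mm and 18 °C/900 mm give a box that accepts a cell at 15 °C/700 mm but rejects one at 25 °C/800 mm. Real niche models replace the box with a fitted statistical or machine-learning model, but the geography → ecology → geography round trip is the same.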
3
Species prediction map
[Figure: Predicted distribution of the Amur snakehead (Channa argus). Image from http://www.lifemapper.org]
4
SEEK Overview
- Analysis and Modelling System (Kepler): modelling scientific workflows
- EcoGrid: making diverse environmental data systems interoperate
- Semantic Mediation System: “smart” data discovery and integration
- Taxon WG: taxonomic name/concept resolution server
5
Scientific workflows
- EML (Ecological Metadata Language) provides semi-automated data binding
- Scientific workflows represent knowledge about the process; the Analysis and Modelling System (AMS) captures this knowledge
6
Kepler: Ecological Niche Model
7
Metadata-driven data ingestion
- The key information needed to read and machine-process a data file is in the metadata:
  - Physical descriptors (CSV, Excel, RDBMS, etc.)
  - Logical descriptions of each entity (table, image, etc.) and attribute (column):
    - Name
    - Type (integer, float, string, etc.)
    - Codes (missing values, nulls, etc.)
  - Integrity constraints
  - Semantic descriptions (ontology-based type systems)
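To show how a reader can be driven entirely by metadata, here is a minimal Python sketch. The metadata is a hypothetical plain dict loosely modelled on what an EML description records (delimiter, attribute names, types, missing-value codes); it is not the EML XML schema, and the attribute names are invented for illustration.

```python
import csv
import io

# Hypothetical metadata: the physical format (delimiter) plus, per attribute,
# a name, a storage type and missing-value codes. Not the EML schema itself.
METADATA = {
    "delimiter": ",",
    "attributes": [
        {"name": "plot", "type": "string"},
        {"name": "count", "type": "integer", "missing": ["-999"]},
        {"name": "area_m2", "type": "float", "missing": ["NA"]},
    ],
}

# Map declared storage types onto Python constructors.
CASTS = {"string": str, "integer": int, "float": float}


def ingest(text: str, metadata: dict) -> list[dict]:
    """Parse a delimited file using only the information in its metadata."""
    attrs = metadata["attributes"]
    names = [a["name"] for a in attrs]
    reader = csv.reader(io.StringIO(text), delimiter=metadata["delimiter"])
    header = next(reader)
    if header != names:  # simple integrity check against the metadata
        raise ValueError(f"header {header} does not match metadata {names}")
    rows = []
    for raw in reader:
        if len(raw) != len(attrs):
            raise ValueError(f"expected {len(attrs)} fields, got {len(raw)}")
        row = {}
        for attr, value in zip(attrs, raw):
            # Declared missing-value codes become None instead of being cast.
            if value in attr.get("missing", []):
                row[attr["name"]] = None
            else:
                row[attr["name"]] = CASTS[attr["type"]](value)
        rows.append(row)
    return rows
```

Calling `ingest("plot,count,area_m2\nA,12,4.0\nB,-999,NA\n", METADATA)` yields typed rows, with `-999` and `NA` turned into `None` because the metadata declares them as missing-value codes. No parsing rule is hard-coded for this particular file.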
8
Ecological ontologies
- What was measured (e.g., biomass)
- Type of quantity measured (e.g., energy)
- Context of measurement (e.g., Psychotria limonensis)
- How it was measured (e.g., dry weight)
9
Semantic Mediation
- Label data with semantic types
- Label inputs and outputs of analytical components with semantic types
- Use reasoning engines to generate transformation steps
- Use reasoning engines to discover relevant components
[Figure: semantic mediation linking Data, Ontology and Workflow Components]
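A reasoning engine over a full ontology language is beyond a slide, but the core step of component discovery, "does this data's semantic type satisfy that component's input type?", reduces to subsumption. The Python sketch below assumes a tiny hypothetical type hierarchy and hypothetical component names; it is an illustration, not SEEK's semantic mediation system.

```python
# Hypothetical toy ontology: each semantic type points to its parent type.
ONTOLOGY = {
    "DryWeightBiomass": "Biomass",
    "Biomass": "Mass",
    "Mass": "Quantity",
    "ArealDensity": "Density",
    "Density": "Quantity",
}


def ancestors(semantic_type: str) -> list[str]:
    """Return the type followed by all of its super-types, most specific first."""
    chain = [semantic_type]
    while chain[-1] in ONTOLOGY:
        chain.append(ONTOLOGY[chain[-1]])
    return chain


def is_subtype(candidate: str, required: str) -> bool:
    """True if `candidate` is `required` or one of its descendants."""
    return required in ancestors(candidate)


def discover_components(data_type: str, components: dict[str, str]) -> list[str]:
    """Find components whose declared input type subsumes the data's type."""
    return sorted(name for name, input_type in components.items()
                  if is_subtype(data_type, input_type))
```

With components `{"BiomassSummary": "Biomass", "DensityMap": "Density", "AnyQuantityStats": "Quantity"}`, data typed as `DryWeightBiomass` is matched to `AnyQuantityStats` and `BiomassSummary` but not to `DensityMap`.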
10
Data integration
- Homogeneous data integration
  - Integration of homogeneous data via EML metadata is relatively straightforward
- Heterogeneous data integration
  - Requires advanced metadata and processing
  - Attributes must be semantically typed
  - Collection protocols must be known
  - Units and measurement scale must be known
  - Measurement relationships must be known, e.g., that ArealDensity = Count/Area
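The relationship ArealDensity = Count/Area only becomes usable for integration once the units are known. The Python sketch below assumes a small hypothetical unit vocabulary (`m2`, `ha`, `km2`) and converts heterogeneous (count, area, unit) records onto a common scale of individuals per square metre.

```python
# Conversion factors to square metres (assumed unit vocabulary for this sketch).
AREA_TO_M2 = {"m2": 1.0, "ha": 10_000.0, "km2": 1_000_000.0}


def areal_density(count: int, area: float, area_unit: str) -> float:
    """ArealDensity = Count / Area, returned in individuals per square metre."""
    if area <= 0:
        raise ValueError("area must be positive")
    return count / (area * AREA_TO_M2[area_unit])


def integrate(records: list[tuple[int, float, str]]) -> list[float]:
    """Bring (count, area, unit) records from different sources onto one scale."""
    return [areal_density(count, area, unit) for count, area, unit in records]
```

For example, 50 individuals in 100 m² and 2000 individuals in 2 ha become 0.5 and 0.1 individuals/m², which can now be compared directly. Without the unit metadata the raw counts would be misleading.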
11
Life Sciences Data
- Much of the data gathered in ecological studies and used in ecological data analysis is bio-referenced: organisms are typically referenced by a Latin name
- Many analyses require integrating data originating in many locations and at various points in time
- For most bio-referenced data, integration involves matching on organism name
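Matching on organism name, in its most naive form, means grouping records by the exact name string. This Python sketch (record fields and survey data are hypothetical) makes the weakness visible: the same species recorded as `C. floridana` ends up in a separate group.

```python
from collections import defaultdict


def join_on_name(*datasets: list[dict]) -> dict[str, list[dict]]:
    """Group records from several datasets by the exact organism-name string."""
    merged: dict[str, list[dict]] = defaultdict(list)
    for records in datasets:
        for record in records:
            merged[record["name"]].append(record)
    return dict(merged)
```

Joining `[{"name": "Carya floridana", "site": "A"}]` with `[{"name": "Carya floridana", "site": "B"}, {"name": "C. floridana", "site": "C"}]` puts sites A and B together but leaves site C on its own, even though all three records very likely refer to the same organisms.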
12
Biological (scientific) Names
- Used for communicating information about known organisms and groups of organisms (taxa)
- A framework for all biologists to communicate with…
- Taxonomists apply scientific names to species and higher taxa in their classifications
- Formalized and validated according to strict codes of nomenclature (different codes depending on kingdom)
- The Latin name is a polynomial for species and below, and a monomial for genus and above
- Quoted as: LatinName NameAuthors Year
  - Example: Carya floridana Sarg. 1913
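The "LatinName NameAuthors Year" pattern can be split into parts with a regular expression. The Python sketch below is deliberately simplified and hypothetical: it handles a genus, an optional species epithet, optional authors and an optional four-digit year, but not infraspecific ranks, hybrids or other cases the codes of nomenclature allow.

```python
import re
from typing import NamedTuple, Optional


class ScientificName(NamedTuple):
    genus: str
    epithet: Optional[str]
    authors: Optional[str]
    year: Optional[int]


# Capitalised genus, optional lower-case epithet, optional authors (no digits),
# optional four-digit year, possibly in parentheses and possibly unspaced.
NAME_PATTERN = re.compile(
    r"^(?P<genus>[A-Z][a-z]+)"
    r"(?:\s+(?P<epithet>[a-z][a-z-]+))?"
    r"(?:\s+(?P<authors>[^\d]+?))?"
    r"(?:\s*\(?(?P<year>\d{4})\)?)?$"
)


def parse_name(text: str) -> ScientificName:
    """Split a full scientific name string into its components."""
    match = NAME_PATTERN.match(text.strip())
    if match is None:
        raise ValueError(f"not a recognisable scientific name: {text!r}")
    year = match["year"]
    return ScientificName(match["genus"], match["epithet"], match["authors"],
                          int(year) if year else None)
```

`parse_name("Carya floridana Sarg. 1913")` gives genus `Carya`, epithet `floridana`, authors `Sarg.` and year 1913; `parse_name("Carya floridana")` parses too, but with no authors and no year, which is exactly the incomplete form that later causes ambiguity.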
13
Classification, Concepts & Names
[Figure: a pile of specimens is classified into a taxonomic hierarchy (genus, species); the resulting groups are taxon concepts, labelled _a, _b, _c, _d]
14
Classification, Concepts & Names
[Figure: a pile of specimens being classified]
15
Taxonomic history of Aus L. 1758
[Figure legend: publication, specimen, type specimen, genus name, genus concept, species name, species concept]

Publications of taxonomic revisions
- (i) In Linnaeus 1758: genus Aus L. 1758, containing Aus aus L. 1758
- (ii) In Archer 1965: Aus L. 1758, containing Aus aus L. 1758 and Aus bea Archer 1965. Archer splits Aus aus L. 1758 into two species, retains the name for one and creates a new one
- (iii) In Fry 1989: Aus L. 1758, containing Aus aus L. 1758, Aus bea Archer 1965 and Aus cea BFry 1989. Fry splits Aus bea Archer 1965 into two species, retains the name for one and creates a new one
- (iv) In Tucker 1991: Aus L. 1758, containing Aus aus L. 1758 and Aus cea BFry 1989. Tucker finds new specimens and combines Aus aus L. 1758 and Aus bea Archer 1965 into one species, retaining the name Aus aus. Tucker publishes his revision without noting Pyle's corrigendum of the name Aus cea
- (v) In Pargiter 2003: Aus L. 1758, containing Aus aus L. 1758 and Aus ceus BFry 1989; and Xus Pargiter 2003, containing Xus beus (Archer) Pargiter 2003. Pargiter decides to re-split Aus aus but believes bea (beus) belongs in a new genus, Xus. He publishes his revision using Pyle's corrigenda of the epithet bea to beus and of Aus cea to Aus ceus

Publications of purely nomenclatural observation
- In Pyle 1990: bea and cea are noted as invalid names and replaced with beus and ceus. A diligent nomenclaturist, Pyle (1990), notes that the species epithets of Aus bea and Aus cea are of the wrong gender and publishes the corrected names Aus beus corrig. Archer 1965 and Aus ceus corrig. BFry 1989
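The history above can be tabulated to show why a name alone is ambiguous. The Python sketch below records, for each revision, the species names it uses (taken from the example), then indexes names to the publications whose concepts they label. The data structure is an illustrative choice, not TCS.

```python
from collections import defaultdict

# Species names used in each revision of Aus L. 1758, keyed by the
# "according to" publication (from the worked example above).
REVISIONS = {
    "Linnaeus 1758": ["Aus aus L. 1758"],
    "Archer 1965": ["Aus aus L. 1758", "Aus bea Archer 1965"],
    "Fry 1989": ["Aus aus L. 1758", "Aus bea Archer 1965", "Aus cea BFry 1989"],
    "Tucker 1991": ["Aus aus L. 1758", "Aus cea BFry 1989"],
    "Pargiter 2003": ["Aus aus L. 1758", "Xus beus (Archer) Pargiter 2003",
                      "Aus ceus BFry 1989"],
}


def concepts_by_name(revisions: dict[str, list[str]]) -> dict[str, list[str]]:
    """Map each name string to every publication that defines a concept under it."""
    index: dict[str, list[str]] = defaultdict(list)
    for according_to, names in revisions.items():
        for name in names:
            index[name].append(according_to)
    return dict(index)
```

The name `Aus aus L. 1758` maps to five different concepts (one per revision), each with a different circumscription. A dataset that records only the name cannot say which of the five was meant.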
16
Problems with Scientific Names
- Often recorded inappropriately in datasets:
  - No author and/or year (e.g. Carya floridana)
  - Abbreviated (e.g. C. floridana)
  - Internal code (e.g. PicRub for Picea rubens)
  - Vernacular name used (e.g. Scrub Hickory)
  - Misspelled
- Are not unique
  - "Re-use" of names with a changed definition
  - A name is ambiguous without its definition
- Subject to name alterations and 'corrections' over time (e.g. when a Code changes its rules)
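Several of the recording problems listed above can be flagged mechanically. The Python sketch below uses simple heuristics; the lists of vernacular names and internal codes are hypothetical inputs a data manager would supply, and misspellings are not checked because that needs a reference dictionary.

```python
import re


def name_issues(text: str, vernacular: set[str], internal_codes: set[str]) -> list[str]:
    """Flag common recording problems in one name string (heuristics only)."""
    value = text.strip()
    if value in internal_codes:
        return ["internal code"]
    if value.lower() in vernacular:
        return ["vernacular name"]
    issues = []
    if re.match(r"^[A-Z]\.\s", value):                # e.g. "C. floridana"
        issues.append("abbreviated genus")
    if not re.search(r"\d{4}\)?$", value):            # no trailing year
        issues.append("no year")
    if re.fullmatch(r"[A-Z][a-z.]*\s+[a-z]+", value):  # genus + epithet only
        issues.append("no author")
    return issues
```

`name_issues("C. floridana", set(), set())` reports an abbreviated genus, no year and no author, while `"Carya floridana Sarg. 1913"` reports nothing. Even a "clean" name, though, is still not unique, so these checks do not remove the need for concepts.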
17
Concepts ……
- Full scientific name + “according to” (Author + Publication + Date) + definition
  - Carya floridana Sarg. (1913) “according to” Charles Sprague Sargent, Trees & Shrubs 2:193 plate 177 (1913) [+ definition]
- Original concept
  - First use of the name, as described by the taxonomist
  - Same author + date in the scientific name and in the “according to”
  - Same publication for the original concept and the name
- Revised concept
  - Re-classification of a group
  - Different author + date in the “according to”
  - Carya floridana Sarg. (1913) “according to” Stone, FNA 3:424 (1997) [+ definition]
- Concepts should be used for communicating about groups of organisms
  - With full scientific name + “according to” (Author + Publication + Date), the definition is clear: it can be retrieved
  - Comparing or integrating data based on concepts is more accurate
- Can GUIDs help?
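The original-versus-revised rule on this slide is easy to state in code. In this hypothetical Python sketch a concept carries the authors and year cited in its name plus the author, publication and year of its "according to". For simplicity it assumes both author fields use the same abbreviation (e.g. "Sarg."), which real data would need to normalise first.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Concept:
    name: str            # e.g. "Carya floridana"
    name_authors: str    # authors cited in the scientific name
    name_year: int
    according_to: str    # author of the defining publication (same abbreviation style assumed)
    publication: str
    year: int

    def key(self) -> str:
        """User-readable key: full scientific name 'according to' author, publication, date."""
        return (f"{self.name} {self.name_authors} ({self.name_year}) according to "
                f"{self.according_to}, {self.publication} ({self.year})")

    def is_original(self) -> bool:
        """Original concept: same author and date in the name and in the 'according to'."""
        return (self.name_authors, self.name_year) == (self.according_to, self.year)
```

Sargent's 1913 concept comes out as original, while Stone's 1997 concept under the same name comes out as revised, and their keys differ even though the names are identical.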
18
Concepts
- Concepts are described in many ways:
  - Created by someone: an Author
  - Described in a Publication
  - Given a Name, which may or may not be valid in terms of the nomenclatural codes
  - Depending on the taxonomist's working practice, defined by the set of Specimens examined (type specimens and others)
  - A common set of Characters: data recorded by taxonomists to describe specimens and taxa; context dependent; they differentiate taxa rather than fully describe them, and use natural language with all its ambiguities
  - Relationships to other Taxon Concepts:
    - Taxon circumscription: the lower-level taxa
    - Congruence, overlap, etc. with taxa in other classifications
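If two concepts are each defined by the set of specimens examined, one simple (and admittedly partial) way to relate them is to compare those sets. The Python sketch below does exactly that; the specimen identifiers are hypothetical, and the labels "includes", "is included in" and "excludes" are this sketch's own terms alongside the slide's congruence and overlap. Taxonomists also weigh characters and type specimens, which this ignores.

```python
def compare_circumscriptions(a: set[str], b: set[str]) -> str:
    """Relate concept `a` to concept `b` by the specimens each includes."""
    if not a or not b:
        raise ValueError("both concepts need at least one specimen")
    if a == b:
        return "congruent"
    if a > b:          # proper superset
        return "includes"
    if a < b:          # proper subset
        return "is included in"
    if a & b:
        return "overlaps"
    return "excludes"
```

For instance, a broad concept covering specimens `{"s1", "s2", "s3"}` "includes" a narrower concept covering `{"s1", "s2"}`, as when a species is later split and the old name is retained for part of it.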
19
Legacy Data …
- In legacy data, names often appear in place of concepts
- Names are imprecise and inappropriate for referring to information regarding a taxon, e.g. observational/collection data
- BUT… sometimes that's all we have
- How do we interpret names? A name potentially has multiple definitions; it could mean:
  - the sum of all definitions that exist for the name: would that make any sense, given conflicts?
  - one of the existing definitions: how can we choose?
  - the "attributes" common to all the definitions: would that leave any?
  - whatever is represented by the type specimen: but what does that mean? Very subjective…
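Two of the readings above, "the sum of all definitions" and "the attributes common to all definitions", are just set union and intersection over the characters in each definition. The Python sketch below uses hypothetical character strings to show both failure modes: the union can contain conflicting characters, and the intersection can shrink towards nothing.

```python
def interpret_name(definitions: dict[str, set[str]]) -> dict[str, set[str]]:
    """Union and intersection of the character sets of all definitions of a name."""
    sets = list(definitions.values())
    return {
        # "Sum of all definitions": may contain mutually conflicting characters.
        "sum_of_all": set().union(*sets),
        # "Common to all definitions": may end up (nearly) empty.
        "common_to_all": set.intersection(*sets) if sets else set(),
    }
```

With one definition giving `{"wings red", "antennae long"}` and another `{"wings brown", "antennae long"}`, the union claims the wings are both red and brown, while the intersection keeps only `antennae long`.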
20
Legacy Names as Concepts…
- Nominal concepts
  - A subset of TaxonConcepts
  - Name but no AccordingTo
    - non-unique (concept) identifier elements
    - can have a unique concept GUID
  - No definition
  - Explicitly saying: it's something with this name, but we are not really sure what is/was meant
- Encourage people to understand and address the issue of names
  - Allowing mark-up of collections with names lets people believe that names are really good enough
- An important problem that needs to be tackled sooner rather than later; doing so will
  - improve the long-term usefulness of scientific data
  - ease integration
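A nominal concept can be represented as an ordinary concept record whose "according to" and definition are simply absent, while still receiving its own unique identifier. In this Python sketch the field names are hypothetical, and `urn:uuid:` identifiers are used only for brevity; the slides point to technologies such as LSIDs for real GUIDs.

```python
import uuid
from dataclasses import dataclass, field
from typing import Optional


@dataclass(frozen=True)
class TaxonConcept:
    name: str
    according_to: Optional[str] = None   # None: nobody vouches for a definition
    definition: Optional[str] = None
    # Every concept, nominal or not, gets its own unique identifier.
    guid: str = field(default_factory=lambda: f"urn:uuid:{uuid.uuid4()}")

    @property
    def is_nominal(self) -> bool:
        """Nominal concept: a name with no AccordingTo and no definition."""
        return self.according_to is None and self.definition is None
```

Two legacy records both labelled `Carya floridana` become two nominal concepts with the same (non-unique) name but distinct GUIDs, which states explicitly that nobody yet knows whether they mean the same thing.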
21
SEEK Taxon
- Build a name/concept resolution server
  - TOS (Kansas)
- Taxonomic Concept Schema: TCS (Napier)
  - Exchange of taxonomic information
  - TDWG/GBIF standard
  - Basis for TOS
- GUIDs
  - GBIF/SEEK etc.
- Tools to relate and compare concepts
  - Taxonomy Comparison Visualisation Tool (Napier)
  - Concept Mapper Tool (UNC)
22
Concept Comparison Visualisation
23
Taxon Concept Schema
- TCS was developed to allow the exchange of taxonomic name/concept data
- Based on consultation with a range of users, to
  - understand users' notions of a taxonomic concept
  - find out what information they consider part of a concept
- Presentations at meetings, including two TDWG meetings
  - Agreement that concepts are important and necessary
  - Taxon Names are independent of Taxon Concepts
  - Agreement that observations, identifications, etc. should record concepts, not names
24
TCS
- XML-based exchange schema
- Not designed as the "correct way" to model a Taxon Concept
  - No "rules" as to what a taxon must have, though certain things are needed for it to be useful
- Designed to accommodate the different ways concepts are described
  - Lots of optionality or flexibility in elements, to address different work practices in the community
- Includes Taxon Names, which are more constrained as they are governed by the codes of nomenclature
25
TCS
- Considerable debate on what should be the top-level elements
- Closely related to the question: what gets a GUID?
  - Taxon Concepts
  - Taxon Names
  - Specimens
  - Publications
  - Taxon Relationship Assertions
- Concepts refer to Names
  - Names must not change
  - Can't record the original taxon concept
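"Concepts refer to Names" and "Names must not change" map naturally onto a reference by identifier plus an immutable name record. This Python sketch is illustrative only: the class names, fields and identifiers such as `name:1` are hypothetical and are not the TCS element names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TaxonName:
    """A name record; frozen because a name must not change once issued."""
    guid: str
    full_name: str


@dataclass
class TaxonConcept:
    """A concept points at its name by identifier rather than embedding it."""
    guid: str
    name_guid: str
    according_to: str


def concepts_using(name_guid: str, concepts: list[TaxonConcept]) -> list[str]:
    """All concepts that use a given name."""
    return [c.guid for c in concepts if c.name_guid == name_guid]
```

Because each concept holds only `name_guid`, many concepts (for example Linnaeus's and Archer's `Aus aus`) can share one name record, and attempting to edit that record raises `dataclasses.FrozenInstanceError`.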
26
Exchange of Data
- Exchange of definitional data
  - Name definition: information on the history of the name, the type specimen and publication details
  - Taxon concept definition: name, publication details for the defining source, characters, specimens, related taxa, etc.
- Exchange of usage data
  - For observations/lists (which should only use taxon concepts), we need only exchange references to existing taxon concepts:
    - user-readable keys, e.g. full scientific name "according to" Author + Publication
    - GUIDs
  - For name-checking purposes, we need only exchange the name, without history or typification:
    - user-readable keys, e.g. full scientific name
    - GUIDs
27
Issues of GUIDs for integration
- What gets a GUID?
  - TCS top-level elements??
  - The "physical thing" or the "electronic record of the thing"?
- What is data and what is metadata associated with the GUID?
  - Depends on your perspective on life…
- Stability of the data associated with a GUID
- Who issues GUIDs?
  - A centralised authority of some sort (peer review??)
    - Pro: one GUID per concept or name (no duplicates)
    - Pro: ensures business rules are applied to new names/concepts created
    - Con: a bottleneck?
    - Con: may be too restrictive in what the business rules might be
  - A distributed free-for-all
    - Pro: anyone can publish their own name/concept and get a GUID
    - Con: a mess of GUIDs to sort out
- Which technology? LSIDs, DOI, etc.
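Of the candidate technologies, LSIDs have a simple URN syntax, `urn:lsid:<authority>:<namespace>:<object>[:<revision>]`, where the authority identifies who issued the identifier. The Python sketch below splits an LSID into those parts; the example authority `example.org` and namespace `taxonconcepts` are hypothetical, and resolving an LSID (fetching its data and metadata) is out of scope here.

```python
from typing import NamedTuple, Optional


class LSID(NamedTuple):
    authority: str
    namespace: str
    object_id: str
    revision: Optional[str] = None


def parse_lsid(text: str) -> LSID:
    """Split 'urn:lsid:<authority>:<namespace>:<object>[:<revision>]'."""
    parts = text.strip().split(":")
    if len(parts) not in (5, 6) or [p.lower() for p in parts[:2]] != ["urn", "lsid"]:
        raise ValueError(f"not an LSID: {text!r}")
    if not all(parts[2:]):
        raise ValueError(f"empty LSID component in {text!r}")
    return LSID(*parts[2:])
```

The optional revision component is one way an issuing authority can cope with the "stability of data associated with a GUID" issue: the object identifier stays fixed while revised data gets a new revision.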
28
TCS and SEEK and…
- Taxon Object Server
  - Core of the concept/name resolution service
  - The Kansas team has been implementing the TOS schema based on the TCS model
  - Tool to import data from TCS documents
- EML
  - Proposed modifications to EML to accommodate SEEK's taxonomic resolution services in the future
- User interface tools
  - Use a cut-down TCS as input format
- Inform other biology metadata standards on taxonomic issues
  - Cataloguing the complete genome standard
29
Taxonomic Object Server (TOS)
- Allows registration, retrieval and integration of datasets
- Matches concepts given names, other concepts and taxonomies
- Allows taxonomists to
  - author new ideas
  - make new relationships between concepts
- Allows researchers to
  - easily see previous taxonomic opinions
  - use a stable identification system to reference concepts (LSIDs)
  - find concepts…
- Integration with Kepler
30
TOS operations
- Via TCS document
  - addConcept
  - addRelationship
- Public APIs
  - getConcept: on GUID
  - getBestConcept: on name string
  - getHigherTaxon: on GUID and authority (up the tree)
  - getAuthoritativeList: down the tree
  - findConcepts: on any property or properties
  - findRelatedConcepts: on GUID and relationships
  - getSynonymousNames: returns name strings
[Figure: getHigherTaxon and getAuthoritativeList navigate the taxonomy; getBestConcept uses a dictionary for name-concept matching with an N-gram matching algorithm]