Science Environment for Ecological Knowledge Jessie Kennedy School of Computing, Napier University, Edinburgh.


The SEEK Prototype: Ecological Niche Modeling
[Figure: occurrence points on the native distribution (Geographic Space) are taken into Ecological Space (axes: temperature, precipitation), where ecological niche modeling builds a model of the niche in ecological dimensions; projection back onto geography gives the native range prediction and the invaded range prediction.]
- Inputs: biodiversity information (e.g. data from museum specimens, ecological surveys); geospatial and remotely sensed data
- Results taken to integrate with other data realms (e.g., human populations, public health, etc.)

Species prediction map Predicted Distribution: Amur snakehead (Channa argus) Image from

SEEK Overview
- Analysis and Modelling System (Kepler): modelling scientific workflows
- EcoGrid: making diverse environmental data systems interoperate
- Semantic Mediation System: “smart” data discovery and integration
- Taxon WG: taxonomic name/concept resolution server

Scientific workflows
- EML provides semi-automated data binding
- Scientific workflows represent knowledge about the process; AMS captures this knowledge

Kepler: Ecological Niche Model

Metadata driven data ingestion
- Key information needed to read and machine process a data file is in the metadata
  - Physical descriptors (CSV, Excel, RDBMS, etc.)
  - Logical Entity (table, image, etc.) and Attribute (column) descriptions
    - Name
    - Type (integer, float, string, etc.)
    - Codes (missing values, nulls, etc.)
    - Integrity constraints
  - Semantic descriptions (ontology-based type systems)
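The idea on this slide can be sketched in a few lines of Python. This is only an illustration: the `METADATA` dictionary layout, the column names and the `ingest` function are hypothetical and are not EML or any SEEK interface. The sketch shows a reader that takes its delimiter, column types, missing-value codes and one simple integrity constraint entirely from the metadata.

```python
import csv
import io

# Hypothetical metadata record (NOT actual EML): physical descriptors plus
# per-attribute name, type, missing-value codes and a lower-bound constraint.
METADATA = {
    "physical": {"format": "CSV", "delimiter": ",", "header_lines": 1},
    "attributes": [
        {"name": "site", "type": "string", "missing": []},
        {"name": "count", "type": "integer", "missing": ["-999"], "min": 0},
        {"name": "area_m2", "type": "float", "missing": ["NA"], "min": 0.0},
    ],
}

# Map the declared attribute types onto Python constructors.
CASTS = {"string": str, "integer": int, "float": float}


def ingest(text, metadata):
    """Parse delimited text into typed records using only the metadata."""
    physical = metadata["physical"]
    reader = csv.reader(io.StringIO(text), delimiter=physical["delimiter"])
    rows = list(reader)[physical["header_lines"]:]
    records = []
    for row in rows:
        record = {}
        for attr, raw in zip(metadata["attributes"], row):
            if raw in attr["missing"]:
                value = None  # declared missing-value code
            else:
                value = CASTS[attr["type"]](raw)
                # Integrity constraint: enforce the declared lower bound.
                if "min" in attr and value < attr["min"]:
                    raise ValueError(f"{attr['name']}={value} is below the minimum")
            record[attr["name"]] = value
        records.append(record)
    return records


SAMPLE = "site,count,area_m2\nA,12,4.0\nB,-999,NA\n"
records = ingest(SAMPLE, METADATA)
```

Because nothing about the file is hard-coded in `ingest`, the same function could read a differently shaped file if it were given a different metadata record.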

Ecological ontologies
- What was measured (e.g., biomass)
- Type of quantity measured (e.g., Energy)
- Context of measurement (e.g., Psychotria limonensis)
- How it was measured (e.g., dry weight)

Semantic Mediation
- Label data with semantic types
- Label inputs and outputs of analytical components with semantic types
- Use reasoning engines to generate transformation steps
- Use reasoning engine to discover relevant components
[Diagram labels: Data, Ontology, Workflow Components]

Data integration
- Homogeneous data integration
  - Integration of homogeneous data via EML metadata is relatively straightforward
- Heterogeneous data integration
  - Requires advanced metadata and processing
    - Attributes must be semantically typed
    - Collection protocols must be known
    - Units and measurement scale must be known
    - Measurement relationships must be known
      - e.g., that ArealDensity=Count/Area
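A minimal sketch of why semantic types, units and measurement relationships matter, assuming a made-up annotation scheme: each column is tagged with a `(semantic type, unit)` pair. The type names, dataset layout and the `areal_density_per_m2` function are hypothetical and are not taken from a SEEK ontology. One dataset records counts and plot areas, the other records density directly in a different unit, and the relationship ArealDensity=Count/Area is what lets the two be combined.

```python
# Hypothetical semantic annotations: column name -> (semantic type, unit).
dataset_a = {
    "columns": {"n": ("Count", "individuals"), "plot": ("Area", "m2")},
    "rows": [{"n": 12, "plot": 4.0}, {"n": 30, "plot": 10.0}],
}
dataset_b = {
    "columns": {"dens": ("ArealDensity", "individuals/ha")},
    "rows": [{"dens": 25000.0}],
}

M2_PER_HA = 10_000.0


def column_of(dataset, semantic_type):
    """Return the name of the column carrying the given semantic type, if any."""
    for name, (stype, _unit) in dataset["columns"].items():
        if stype == semantic_type:
            return name
    return None


def areal_density_per_m2(dataset):
    """Return areal density in individuals per m2 for every row."""
    dens_col = column_of(dataset, "ArealDensity")
    if dens_col is not None:
        # Density is recorded directly; only a unit conversion is needed.
        unit = dataset["columns"][dens_col][1]
        divisor = M2_PER_HA if unit == "individuals/ha" else 1.0
        return [row[dens_col] / divisor for row in dataset["rows"]]
    count_col = column_of(dataset, "Count")
    area_col = column_of(dataset, "Area")
    if count_col is None or area_col is None:
        raise ValueError("no known way to derive ArealDensity")
    if dataset["columns"][area_col][1] != "m2":
        raise ValueError("this sketch only handles areas recorded in m2")
    # Measurement relationship: ArealDensity = Count / Area.
    return [row[count_col] / row[area_col] for row in dataset["rows"]]


integrated = areal_density_per_m2(dataset_a) + areal_density_per_m2(dataset_b)
```

Without the annotations, nothing in the raw column names `n`, `plot` and `dens` would tell a program that the two datasets describe comparable quantities.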

Life Sciences Data
- Much of the data gathered in ecological studies and used in ecological data analysis is bio-referenced data
  - typically organisms are referenced by a Latin name
- Many analyses require integrating data originating in many locations and at various points in time
  - for most bio-referenced data, integration involves matching on organism name

Biological (scientific) Names
- Used for communicating information about known organisms and groups of organisms – taxa
  - Framework for all biologists to communicate with…
- Taxonomists apply scientific names to species and higher taxa in their classifications
- Formalized and validated according to strict codes of nomenclature
  - (different depending on kingdom)
- Latin name is a polynomial for species and below; monomial for genus and above
- Quoted as: LatinName NameAuthors Year
  - Example: Carya floridana Sarg. 1913

Classification, Concepts & Names
[Figure labels: Pile of specimens; classify; Taxon_concept; Taxonomic Hierarchy (Genus, Species); _a, _b, _c, _d]

Taxonomic history of Aus L. 1758
[Figure: lineage diagram of the revisions listed below. Legend: publication, specimen, type specimen, genus name, species name, genus concept, species concept.]
Publications of Taxonomic Revisions
- (i) In Linnaeus 1758: Aus L.1758; Aus aus L.1758
- (ii) In Archer 1965: Aus L.1758; Aus aus L.1758; Aus bea Archer 1965
- (iii) In Fry 1989: Aus L.1758; Aus aus L.1758; Aus bea Archer 1965; Aus cea BFry 1989
- (iv) In Tucker 1991: Aus L.1758; Aus aus L.1758; Aus cea BFry 1989
- (v) In Pargiter 2003: Aus L.1758; Aus aus L; Aus ceus BFry 1989; Xus Pargiter 2003; Xus beus (Archer) Pargiter
Publications of Purely Nomenclatural Observation
- In Pyle 1990: bea and cea noted as invalid names and replaced with beus and ceus.
Annotations
- Archer splits Aus aus L into two species, retains the name for one and creates a new one.
- Fry splits Aus bea Archer into two species, retains the name for one and creates a new one.
- A diligent nomenclaturist, Pyle (1990), notes that the species epithets of Aus bea and Aus cea are of the wrong gender and publishes the corrected names Aus beus corrig. Archer 1965 and Aus ceus corrig. BFry 1989.
- Tucker finds new specimens and combines Aus aus L and Aus bea Archer into one species, retains the name.
- Tucker publishes his revision without noting Pyle’s corrigendum of the name of Aus cea.
- Pargiter decides to resplit Aus aus but believes bea (beus) is in a new genus Xus.
- Pargiter publishes his revision using Pyle’s corrigendum of the epithet bea to beus and Aus cea to Aus ceus.

Problems with Scientific Names
- Often recorded inappropriately in datasets
  - No author and/or year (e.g. Carya floridana)
  - Abbreviated (e.g. C. floridana)
  - Internal code (e.g. PicRub for Picea rubens)
  - Vernacular used (e.g. Scrub Hickory)
  - Misspelled
- Are not unique
  - “Re-use” of names with changed definition
  - Name is ambiguous without definition
- Subject to name alterations and 'corrections' over time
  - (e.g. Code changes its rules)

Concepts ……
- Full Scientific name + “according to” (Author + Publication + Date) + Definition
  - Carya floridana Sarg. (1913) “according to” Charles Sprague Sargent, Trees & Shrubs 2:193 plate 177 (1913) [+Definition]
- Original concept
  - 1st use of name as described by the taxonomist
  - same author + date in scientific name and the “according to”
  - same publication for original concepts and name
- Revised concept
  - Re-classification of a group
  - different author + date in “according to”
  - Carya floridana Sarg. (1913) “according to” Stone FNA 3:424 (1997) [+Definition]
- Should be used for communicating about groups of organisms
  - Full Scientific name + “according to” (Author + Publication + Date)
  - definition clear – can get the definition
  - comparing or integrating data based on concepts is more accurate
- Can GUIDs help?
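The distinction between a name and a concept can be illustrated with a small data structure. The class and field names below (`ScientificName`, `TaxonConcept`, `according_to`) are hypothetical and are not taken from the TCS schema; the sketch only shows that one name can take part in several distinct concepts, each identified by its “according to” reference.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScientificName:
    latin_name: str
    authors: str
    year: int

    def label(self):
        return f"{self.latin_name} {self.authors} ({self.year})"


@dataclass(frozen=True)
class TaxonConcept:
    name: ScientificName
    according_to: str  # author + publication + date of the defining source

    def label(self):
        return f'{self.name.label()} "according to" {self.according_to}'


name = ScientificName("Carya floridana", "Sarg.", 1913)

# Original concept: the first use of the name, as described by its author.
original = TaxonConcept(
    name, "Charles Sprague Sargent, Trees & Shrubs 2:193 plate 177 (1913)"
)
# Revised concept: the same name, re-circumscribed in a later publication.
revised = TaxonConcept(name, "Stone FNA 3:424 (1997)")

same_name = original.name == revised.name   # the names are identical
same_concept = original == revised          # the concepts are not
```

Data matched on `name` alone would silently merge the two concepts; matching on the full concept keeps them apart.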

Concepts
- Concepts are described in many ways
  - Created by someone - an Author
  - Described in a Publication
  - Given a Name
    - May or may not be valid in terms of the nomenclatural codes
- Depending on the taxonomist's working practice, defined by
  - the set of Specimens examined
    - (type specimens and others)
  - Common set of Characters
    - data recorded by taxonomists to describe specimens and taxa
    - context dependent; differentiate taxa rather than fully describe them
    - use natural language with all its ambiguities
  - Relationships to other Taxon Concepts
    - Taxon circumscription
      - the lower level taxa
    - Congruence, overlap etc. to taxa in other classifications

Legacy Data …
- In legacy data names often appear in place of concepts
- Names are imprecise
  - are inappropriate for referring to information regarding taxon e.g. observational/collection data
- BUT… sometimes that’s all we have
- How do we interpret names?…..
  - potentially multiple definitions
    - the sum of all definitions that exist for the name
      - would that make any sense – conflicts?
    - one of the existing definitions
      - how can we choose?
    - the “attributes” in common to all the definitions
      - would that leave any?
    - represented by the type specimen
      - but what does that mean? – very subjective…..

Legacy Names as Concepts…
- Nominal concepts
  - Sub-set of TaxonConcepts
  - Name but no AccordingTo
    - non-unique (concept) identifier elements
    - can have a unique concept GUID
  - No definition
  - Explicitly saying it’s something with this name but not really sure what is/was meant
- Encourage people to understand and address the issue of names
  - Allowing mark-up of collections with names allows people to believe names are really good enough
- Important problem - needs to be tackled sooner rather than later
  - will improve long term usefulness of scientific data
  - ease integration

SEEK Taxon
- Build a Name/Concept resolution server
  - TOS (Kansas)
- Taxonomic Concept Schema
  - TCS (Napier)
  - Exchange of taxonomic Info
  - TDWG/GBIF standard
  - Basis for TOS
- GUIDs
  - GBIF/SEEK etc.
- Tools to relate and compare concepts
  - Taxonomy Comparison Visualisation Tool (Napier)
  - Concept Mapper Tool (UNC)

Concept Comparison Visualisation

Taxon Concept Schema
- TCS developed to allow exchange of taxonomic names/concept data
- Based on consultation with range of users
  - understand users’ notions of taxonomic concept
  - what information they consider part of a concept
- Presentations at meetings including 2 TDWG
  - Agreement that concepts are important and necessary
  - Taxon Names are independent from Taxon concepts
  - Agreement that observations/identifications etc. should record concepts not names

TCS
- XML based exchange schema
- Not designed as the “correct way” to model a Taxon Concept
  - No “rules” as to what a taxon must have
  - certain things needed to be useful
- Designed to accommodate the different ways concepts are described
  - Lots of optionality or flexibility in elements
    - to address different work practices in the community
- Includes Taxon Names
  - are more constrained as they are governed by the codes of nomenclature

TCS
- Considerable debate on what should be top level elements
- Related closely to the question
  - What gets a GUID?
- Taxon concepts
- Taxon Names
- Specimens
- Publications
- Taxon Relationship Assertions
- Concepts refer to Names
  - Names must not change
  - Can’t record original taxon concept

Exchange of Data
- Exchange of definitional data
  - name definition
    - information on history of name and type specimen and publication details
  - taxon concept definition
    - Name, publication details for the defining source, characters, specimens, related taxa etc.
- Exchange of usage data
  - for observations/lists (should only use taxon concepts)
    - need only exchange references to existing taxon concepts
      - user readable keys, e.g. Full Scientific name “according to” Author + Publication
      - GUIDs
  - for name checking purposes
    - need only exchange name without history or typification
      - user readable keys, e.g. Full Scientific name
      - GUIDs

Issues of GUIDs for integration
- What gets a GUID?
  - TCS top level elements??
  - The “physical thing” or “electronic record of the thing”
  - What is data and what is metadata associated with the GUID?
    - Depends on your perspective on life…..
  - Stability of data associated with a GUID
- Who issues GUIDs?
  - Centralised authority of some sort – peer review??
    + One GUID per concept or name (no duplicates)
    + ensure business rules are applied to new names/concepts created
    - bottleneck?
    - too restrictive in what the business rules might be
  - Distributed free for all
    + Anyone can publish their own name/concept and get a GUID
    - Mess of GUIDs to sort out
- Which technology?
  - LSIDs, DOI etc.

TCS and SEEK and…
- Taxon Object Server
  - Core of concept/name resolution service
  - Kansas team has been implementing the TOS
    - Schema based on the TCS model
    - Tool to import data from TCS documents
- EML
  - Proposed modifications to EML to accommodate SEEK's taxonomic resolution services in the future
- User interface tools
  - Uses cut down TCS as input format
- Inform other biology meta-data standards on taxonomic issues
  - Cataloguing the complete genome standard

Taxonomic Object Server
- TOS allows
  - registration, retrieval, integration of datasets
- Matches concepts given names, other concepts and taxonomies
- Allows taxonomists to
  - Author new ideas
  - Make new relationships between concepts
- Allows researchers to
  - Easily see previous taxonomic opinions
  - Use a stable identification system to reference concepts (LSIDs)
  - Find concepts…
- Integration with Kepler

TOS operations
- Via TCS document
  - addConcept
  - addRelationship
- Public APIs
  - getConcept – on GUID
  - getBestConcept – on name string
  - getHigherTaxon – on GUID and authority – up tree
  - getAuthoritativeList – down tree
  - findConcepts – on any property(s)
  - findRelatedConcepts – on GUID and relationships
  - getSynonymousNames – returns name strings
- getHigherTaxon
- getAuthoritativeList
- Dictionary for name-concept matching
  - N-gram matching algorithm
  - getBestConcept
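The slide mentions an N-gram matching algorithm for name-concept matching but gives no details, so the following is only a generic sketch of one common approach (character trigrams scored with the Dice coefficient), not the TOS implementation. The function names `ngrams`, `similarity` and `best_match`, the threshold of 0.5 and the small name dictionary are all assumptions made for illustration.

```python
def ngrams(text, n=3):
    """Return the set of character n-grams of a lower-cased, padded string."""
    pad = " " * (n - 1)
    padded = f"{pad}{text.lower()}{pad}"
    return {padded[i:i + n] for i in range(len(padded) - n + 1)}


def similarity(a, b, n=3):
    """Dice coefficient over the two n-gram sets: 1.0 means identical sets."""
    grams_a, grams_b = ngrams(a, n), ngrams(b, n)
    return 2 * len(grams_a & grams_b) / (len(grams_a) + len(grams_b))


def best_match(query, dictionary, threshold=0.5):
    """Return the dictionary name most similar to the query, or None."""
    if not dictionary:
        return None
    score, name = max((similarity(query, name), name) for name in dictionary)
    return name if score >= threshold else None


# Hypothetical name dictionary built from names that appear in these slides.
NAMES = ["Carya floridana", "Picea rubens", "Channa argus"]

match = best_match("Carya floridanna", NAMES)   # misspelled query
no_match = best_match("Quercus alba", NAMES)    # name not in the dictionary
```

N-gram scoring tolerates the misspellings listed under "Problems with Scientific Names", but it only resolves a string to a name; choosing among the concepts that share that name still needs the "according to" information.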