1
Data Integration Issues in Biodiversity Research
Jessie Kennedy
Shawn Bowers, Matthew Jones, Josh Madin, Robert Peet, Deana Pennington, Mark Schildhauer, Aimee Stewart
2
Visual Tools for Managing Taxonomic Concepts
SEEK: Science Environment for Ecological Knowledge
Research and develop information technology to radically improve the type and scale of ecological science that can be addressed
3
Science and Scientific Data are Complex
Biochemistry, Climatology, Taxonomy, Meteorology, Nomenclature, Paleontology, Genomics, Proteomics, Hydrology, Morphology, Geology, Oceanography, Geography, Ecology
4
Biochemistry, Climatology, Taxonomy, Meteorology, Nomenclature, Paleontology, Genomics, Proteomics, Hydrology, Morphology, Geology, Oceanography, Ecology, Geography
Organism, Name, Taxon concept, Gene sequence, Pathway, Protein, Location, Temperature, Depth
5
Scientific Community: complex
Individual Scientist
Scientific Laboratory
Small Scientific Community
Large Scientific Community
6
Biochemistry, Climatology, Taxonomy, Meteorology, Nomenclature, Paleontology, Genomics, Proteomics, Hydrology, Morphology, Geology, Oceanography, Ecology, Geography
Organism, Name, Taxon concept, Gene sequence, Pathway, Protein, Location, Temperature, Depth
(Diagram: the same set of disciplines and data items is repeated four times on this slide.)
7
Science & Scientific Data are Continually Changing
Conclusions become foundations for new hypotheses
New experiments invalidate existing knowledge
Knowledge is open to interpretation
  Different opinions
Need to build this into our technological solutions
(Diagram labels: observation, experiment, hypothesis, conclusion)
8
Exploiting Scientific Data
To support scientists in
  Discovery
  Access
  Sharing
  Integration/Linking
  Analysis
Scientists can then improve their potential for new scientific discovery
9
Data Integration/Linking: approaches
Metadata: to describe the data sets and know how to interpret the data sets
Ontologies: to define the terminology used, know how data might be related, and aid automatic transformation of the data
Standardisation of formats: for exchange of data and to ease integration
LSIDs: to uniquely identify things; know when 2 things are the same
Workflows: to enable specification, refinement and repetition of integration/analysis
Provenance of data: to record where the data has come from and what has happened to it en route
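The LSID point ("know when 2 things are the same") can be illustrated with a small sketch. It assumes the general LSID shape `urn:lsid:<authority>:<namespace>:<object>[:<revision>]` and assumes that the `urn:lsid` prefix and the authority compare case-insensitively while the remaining parts are case-sensitive. The identifiers and the `parse_lsid` helper are hypothetical and are not taken from the SEEK software.

```python
from typing import NamedTuple, Optional


class LSID(NamedTuple):
    authority: str
    namespace: str
    object_id: str
    revision: Optional[str] = None


def parse_lsid(text: str) -> LSID:
    """Split an LSID string into its parts (hypothetical helper)."""
    parts = text.strip().split(":")
    if len(parts) not in (5, 6):
        raise ValueError(f"not an LSID: {text!r}")
    if parts[0].lower() != "urn" or parts[1].lower() != "lsid":
        raise ValueError(f"not an LSID: {text!r}")
    if any(part == "" for part in parts[2:5]):
        raise ValueError(f"empty LSID component in {text!r}")
    revision = parts[5] if len(parts) == 6 else None
    # Assumption: authority is compared case-insensitively, so normalise it.
    return LSID(parts[2].lower(), parts[3], parts[4], revision)


# Hypothetical identifiers for two taxon concepts.
first = parse_lsid("URN:LSID:Example.org:concepts:C1.3")
second = parse_lsid("urn:lsid:example.org:concepts:C1.3")
other = parse_lsid("urn:lsid:example.org:concepts:C1.5")
same_thing = first == second
```

Comparing the parsed tuples rather than the raw strings keeps the identity test independent of how a data provider happened to capitalise the prefix.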
10
Projects in most sciences: ESG
11
Ecological Science - Analysis
Ecological niche modeling of species distributions
Where do species occur now?
Where will they occur in the future?
Image from http://www.lifemapper.org
12
Ecological Niche Modeling
Known Species Locations
Environmental Characteristics from gridded GIS layers (Temperature layer, many other layers)
Develop Model: Multidimensional Ecological Space (D1 = Temperature, D2, ..., Dn)
Native Distribution Prediction: Environmental Characteristics of Surrounding Geographic Area
Invasion Area Prediction: Environmental Characteristics of Different Geographic Area
Environmental Change Prediction: Future Scenarios of Environmental Characteristics
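As a rough illustration of "develop a model in multidimensional ecological space", the sketch below fits a rectangular environmental envelope: for each layer it records the range observed at the known species locations, then tests whether another grid cell falls inside every range. This is a deliberately simple stand-in in the spirit of envelope-style niche models, not the algorithm used by SEEK or Lifemapper. The layer names and all values are invented for the example.

```python
from typing import Dict, List, Tuple

Environment = Dict[str, float]             # one value per environmental layer
Envelope = Dict[str, Tuple[float, float]]  # (min, max) per layer


def fit_envelope(occurrences: List[Environment]) -> Envelope:
    """Record the min and max of every layer over the known locations."""
    if not occurrences:
        raise ValueError("need at least one known species location")
    layers = occurrences[0].keys()
    return {
        layer: (
            min(o[layer] for o in occurrences),
            max(o[layer] for o in occurrences),
        )
        for layer in layers
    }


def is_suitable(envelope: Envelope, cell: Environment) -> bool:
    """A cell is predicted suitable if it lies inside the range on every layer."""
    return all(low <= cell[layer] <= high for layer, (low, high) in envelope.items())


# Invented environmental values at three known species locations.
known_locations = [
    {"temperature": 10.0, "precipitation": 800.0},
    {"temperature": 14.0, "precipitation": 1200.0},
    {"temperature": 12.0, "precipitation": 950.0},
]
envelope = fit_envelope(known_locations)

# The same cell under present conditions and under a warmer future scenario.
current_cell = {"temperature": 12.5, "precipitation": 1000.0}
future_cell = {"temperature": 16.0, "precipitation": 1000.0}
suitable_now = is_suitable(envelope, current_cell)
suitable_later = is_suitable(envelope, future_cell)
```

The three prediction types on the slide differ only in which cells are passed to `is_suitable`: surrounding cells, cells in a different geographic area, or cells with future-scenario values.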
13
Sources of Scientific Data
Data are massively dispersed
  Ecological field stations and research centers (100’s)
  Natural history museums and biocollection facilities (100’s)
  Agency data collections (10’s to 100’s)
  Individual scientists (1000’s)
Data are heterogeneous
  Syntax (format)
  Schema (model)
  Semantics (meaning)
14
Challenge: Data Integration
15
SEEK Components
16
Semantic Annotation: SEEK ontologies
Integration/merge
  Concept mapping
  Units conversion
  Spatial & temporal scaling
Data discovery
  Finding relevant data sets
  Understanding data set content
17
Smart (Data) Integration: Merge
Discover data of interest … connect to merge actor … “compute merge”
18
Smart Merge …
Semantic type annotations and ontology definitions are used to find mappings between sources
Executing the merge actor results in an integrated data product (via “outer union”)
Source 1, columns a1 a2 a3 a4; rows as extracted: (a, 5, 10), (b, 6, 11)
Source 2, columns a5 a6 a7 a8; rows as extracted: (0.1, a), (0.2, c), (0.3, d)
Mappings found: a3 with a6, a1 with a8; a4 has no counterpart (semantic types shown: Biomass, Site)
Merge Result, columns a1 a3 a4:
  a   5.0   10
  b   6.0   11
  a   0.1
  c   0.2
  d   0.3
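The merge step on this slide can be sketched as an outer union: rows from both sources are stacked, columns that carry the same semantic type are aligned into one output column, and missing values are left empty. The sketch is a minimal illustration, not the SEEK merge actor. The `outer_union` function is hypothetical, and the assignment of "Site" to a1/a8 and "Biomass" to a3/a6 is an assumption read off the example values.

```python
from typing import Dict, List, Optional

Row = Dict[str, Optional[object]]


def outer_union(sources: List[List[Row]], semantic_type: Dict[str, str]) -> List[Row]:
    """Stack rows from all sources, aligning columns by semantic type."""

    def target(column: str) -> str:
        # Annotated columns are renamed to their semantic type;
        # unannotated columns keep their own name.
        return semantic_type.get(column, column)

    # Collect output columns in first-seen order.
    columns: List[str] = []
    for rows in sources:
        for row in rows:
            for column in row:
                if target(column) not in columns:
                    columns.append(target(column))

    merged: List[Row] = []
    for rows in sources:
        for row in rows:
            out: Row = {column: None for column in columns}
            for column, value in row.items():
                out[target(column)] = value
            merged.append(out)
    return merged


# Values from the slide; the column-to-type assignment is assumed.
source1 = [{"a1": "a", "a3": 5.0, "a4": 10}, {"a1": "b", "a3": 6.0, "a4": 11}]
source2 = [{"a8": "a", "a6": 0.1}, {"a8": "c", "a6": 0.2}, {"a8": "d", "a6": 0.3}]
annotations = {"a1": "Site", "a8": "Site", "a3": "Biomass", "a6": "Biomass"}

result = outer_union([source1, source2], annotations)
```

Rows from the second source end up with `None` in the `a4` column, which matches the blank cells in the merge result on the slide.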
19
Challenges of Taxonomic Data
Scientific names change in meaning over time and across geographical regions
This affects the conclusions drawn from analysis of data integrated on names
20
What is Abies lasiocarpa? (SubAlpine Fir)
Flora North America: Abies lasiocarpa, Abies bifolia
USDA Plants & ITIS: Abies lasiocarpa var. arizonica, var. lasiocarpa
21
Taxonomic history of imaginary genus Aus L. 1758
Changes in meaning of names: 5 revisions of Aus, 1 name spelling change
Linnaeus 1758: Aus L.1758 with Aus aus L.1758
Archer 1965: Aus L.1758 with Aus aus L.1758, Aus bea Archer 1965
Fry 1989: Aus L.1758 with Aus aus L.1758, Aus bea Archer 1965, Aus cea BFry 1989
Pyle 1990: Aus bea and Aus cea noted as invalid names and replaced with Aus beus and Aus ceus
Tucker 1991: Aus L.1758 with Aus aus L.1758, Aus cea BFry 1989
Pargiter 2003: Aus L.1758 with Aus aus L. 1758, Aus ceus BFry 1989; Xus Pargiter 2003 with Xus beus (Archer) Pargiter 2003
22
Changes in meaning of names
8 names: 2 genus, 6 species
(Diagram: the five revisions of Aus by Linnaeus 1758, Archer 1965, Fry 1989, Tucker 1991 and Pargiter 2003, plus the Pyle 1990 note that Aus bea and Aus cea are invalid names replaced with Aus beus and Aus ceus.)
23
8 Names, 17 Concepts
Each name has many concepts or meanings
N0 - Aus L.1758
  C0.1 - Aus L.1758 sec. Linnaeus 1758
  C0.2 - Aus L.1758 sec. Archer 1965
  C0.3 - Aus L.1758 sec. Fry 1989
  C0.4 - Aus L.1758 sec. Tucker 1991
  C0.5 - Aus L.1758 sec. Pargiter 2003
N1 - Aus aus L.1758
  C1.1 - Aus aus L.1758 sec. Linnaeus 1758
  C1.2 - Aus aus L.1758 sec. Archer 1965
  C1.3 - Aus aus L.1758 sec. Fry 1989
  C1.4 - Aus aus L.1758 sec. Tucker 1991
  C1.5 - Aus aus L.1758 sec. Pargiter 2003
N2 - Aus bea Archer 1965
  C2.2 - Aus bea Archer 1965 sec. Archer 1965
  C2.3 - Aus bea Archer 1965 sec. Fry 1989
N3 - Aus cea Fry 1989
  C3.3 - Aus cea Fry 1989 sec. Fry 1989
  C3.4 - Aus cea Fry 1989 sec. Tucker 1991
N4 - Aus beus Archer 1965
N5 - Aus ceus Fry 1989
  C5.5 - Aus ceus Fry 1989 sec. Fry 1989
N6 - Xus beus Pargiter 2003
  C6.5 - Xus beus Pargiter 2003 sec. Pargiter 2003
N7 - Xus Pargiter 2003
  C7.5 - Xus Pargiter 2003 sec. Pargiter 2003
24
Find data sets containing Aus aus
Many possible interpretations of Aus aus (N1, Aus aus L.1758, with concepts C1.1 to C1.5)
  Original concept: C1.1
  Most recent concept: C1.5
  Preferred authority (e.g. Fry 1989): C1.3
  Everything ever named N1: Union(C1.1,C1.2,C1.3,C1.4,C1.5)
  Best fit according to some matching algorithm: Best(C1.1,C1.2,C1.3,C1.4,C1.5)
  New concept containing only those features common to all concepts with the name N1: Intersection(C1.1,C1.2,C1.3,C1.4,C1.5)
Is it appropriate to link or merge data sets returned on the scientific names?
  Depends on the user’s purpose
  Level of precision required
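The interpretations listed on this slide can be made concrete with a small sketch in which each concept is modelled as a set of member specimens. The specimen identifiers and memberships are invented for illustration, and the `Concept` class is hypothetical. `Best(...)` is left out because the slide does not say which matching algorithm it would use.

```python
from dataclasses import dataclass
from typing import FrozenSet, List


@dataclass(frozen=True)
class Concept:
    concept_id: str
    sec: str                  # the "according to" reference
    year: int
    members: FrozenSet[str]   # hypothetical specimen identifiers


# Invented memberships for the five concepts that share the name N1.
n1_concepts: List[Concept] = [
    Concept("C1.1", "Linnaeus 1758", 1758, frozenset({"s1", "s2", "s3", "s4"})),
    Concept("C1.2", "Archer 1965", 1965, frozenset({"s1", "s2", "s3"})),
    Concept("C1.3", "Fry 1989", 1989, frozenset({"s1", "s2"})),
    Concept("C1.4", "Tucker 1991", 1991, frozenset({"s1", "s2", "s4"})),
    Concept("C1.5", "Pargiter 2003", 2003, frozenset({"s1", "s2"})),
]


def original(concepts: List[Concept]) -> Concept:
    return min(concepts, key=lambda c: c.year)


def most_recent(concepts: List[Concept]) -> Concept:
    return max(concepts, key=lambda c: c.year)


def by_authority(concepts: List[Concept], sec: str) -> Concept:
    return next(c for c in concepts if c.sec == sec)


def union(concepts: List[Concept]) -> FrozenSet[str]:
    """Everything ever placed under the name."""
    return frozenset().union(*(c.members for c in concepts))


def intersection(concepts: List[Concept]) -> FrozenSet[str]:
    """Only what every concept with the name agrees on."""
    return frozenset.intersection(*(c.members for c in concepts))


everything = union(n1_concepts)
common_core = intersection(n1_concepts)
```

Which function a query should use is the slide's point: a broad survey might accept `union`, while a precise analysis might insist on `intersection` or a single preferred authority.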
25
Information from literature on synonymy
Names for each of the concepts
Parent child relationships in 5 revisions
Taxonomists record which names their concepts are synonymous with and any name changes
(Diagram: concepts C0.1 to C0.5, C1.1 to C1.5, C2.2, C2.3, C3.3, C3.4, C5.5, C6.5, C7.5 linked to names N0 to N7)
26
Find data sets with Aus aus (N1)
Selected so far: name N1; concepts C1.1, C1.2, C1.3, C1.4, C1.5
27
Find data sets with Aus aus (N1)
Selected so far: names N1, N2; concepts C1.1, C1.2, C1.3, C1.4, C1.5, C2.2, C2.3
28
Find data sets with Aus aus (N1)
Selected so far: names N1, N2, N3; concepts C1.1, C1.2, C1.3, C1.4, C1.5, C2.2, C2.3, C3.3, C3.4
29
Find data sets with Aus aus (N1)
Selected so far: names N1, N2, N3, N4, N6; concepts C1.1, C1.2, C1.3, C1.4, C1.5, C2.2, C2.3, C3.3, C3.4, C6.5
30
Find data sets with Aus aus (N1)
Selected: names N1, N2, N3, N4, N5, N6; concepts C1.1, C1.2, C1.3, C1.4, C1.5, C2.2, C2.3, C3.3, C3.4, C5.5, C6.5
Results in everything returned for Aus aus by traversing the synonymy and name links
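The build-up across the preceding slides amounts to a graph traversal: start at the name N1 and keep following name-to-concept, synonymy and name-change links until nothing new is reached. The sketch below shows this with a breadth-first search. The transcript does not preserve the individual links in the diagram, so the link set here is an assumption chosen only to reproduce the final selection; the `reachable` function is hypothetical.

```python
from collections import deque
from typing import Dict, List, Set


def reachable(start: str, links: Dict[str, List[str]]) -> Set[str]:
    """Return every name or concept reachable from start over undirected links."""
    graph: Dict[str, Set[str]] = {}
    for node, neighbours in links.items():
        for neighbour in neighbours:
            graph.setdefault(node, set()).add(neighbour)
            graph.setdefault(neighbour, set()).add(node)

    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbour in graph.get(node, ()):
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append(neighbour)
    return seen


# Assumed links: names to their concepts, plus illustrative synonymy
# (concept to concept) and name-change (name to name) links.
links = {
    "N0": ["C0.1", "C0.2", "C0.3", "C0.4", "C0.5"],
    "N1": ["C1.1", "C1.2", "C1.3", "C1.4", "C1.5"],
    "N2": ["C2.2", "C2.3", "N4"],   # N2 -> N4: name change (assumed)
    "N3": ["C3.3", "C3.4", "N5"],   # N3 -> N5: name change (assumed)
    "N5": ["C5.5"],
    "N6": ["C6.5"],
    "N7": ["C7.5"],
    "C1.1": ["C2.2"],               # synonymy (assumed)
    "C2.2": ["C3.3"],               # synonymy (assumed)
    "C2.3": ["C6.5"],               # synonymy (assumed)
}

found = reachable("N1", links)
names_found = sorted(node for node in found if node.startswith("N"))
```

Because every link is followed without regard to its kind, a query on one name pulls in six names, which is why the talk goes on to ask for set relationships between concepts rather than bare links.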
31
Information to improve data sets returned
Minimally, what we need are set relationships from concepts in any taxonomy to earlier concepts, and name changes related to earlier names
We can build systems to return data fit for purpose
32
Real Biological Taxonomies
Larger and change more frequently than the Aus example
German mosses
  14 classifications in 73 years covering 1548 taxa
  Only 35% thought to be stable concepts
  65% of names used in legacy data sets are ambiguous
Taxonomic revisions of genus Alteromonas
  34 years: from 1972 to 2006
  At the species level
    18 “emendations”
    19 species reassigned to 4 genera
    3 new combinations
    6 synonyms
    2 species to subspecies
    2 subspecies to species
    21 new species
33
SEEK Taxon Approach
Use Taxon Concepts for referring to organisms
  Aus aus L. 1758 sec. Tucker 1991
  Abies lasiocarpa (Hook) Nutt. sec FNA 1997
Taxon Concept/Name Resolution
  International data exchange schema: TCS (Taxonomic Concept Schema)
  Concept Repository and Resolution web service
  Linked to Kepler workflow system
  Globally unique identifiers (LSIDs)
Visualization software for comparing Taxonomies and Asserting Concept Relationships
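The "name sec. reference" convention shown on this slide can be captured in a small data structure: a taxon concept is a name plus the publication according to which the name is being used. The sketch below is an illustration only and does not follow the TCS schema. The `TaxonConcept` class and `parse_concept` function are hypothetical.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class TaxonConcept:
    name: str          # scientific name with its author citation
    according_to: str  # the "sec." reference that defines the meaning

    def label(self) -> str:
        return f"{self.name} sec. {self.according_to}"


def parse_concept(label: str) -> TaxonConcept:
    """Split a 'name sec. reference' label at the first 'sec' or 'sec.'."""
    parts = re.split(r"\s+sec\.?\s+", label.strip(), maxsplit=1)
    if len(parts) != 2:
        raise ValueError(f"no 'sec.' reference in {label!r}")
    return TaxonConcept(name=parts[0], according_to=parts[1])


tucker = parse_concept("Aus aus L. 1758 sec. Tucker 1991")
fna = parse_concept("Abies lasiocarpa (Hook) Nutt. sec FNA 1997")

# The same name under a different authority is a different concept.
fry = TaxonConcept("Aus aus L. 1758", "Fry 1989")
same_concept = tucker == fry
```

Keeping the reference as part of the identity is what lets two data sets that both say "Aus aus" be recognised as possibly referring to different things.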
34
Taxon Object Server
(Diagram components: Taxonomic Data Providers, Mammal Species of the World, Taxonomic Literature, Database to TCS Mapping Tool, Concept Extraction Tool, TCS, TOS SEEK Cache, Concept Mapper)
35
Taxonomic Object Service: SEEK
http://seek.nhm.ku.edu/TaxObjServ/services
Service operations: Find All Concepts, Get Synonymous Concepts, Get Best Concept
(Diagram components: Concept Mapper, TCS, TOS SEEK Cache, LSID Authority, Morpho, Data Analysis, EML Datasets; labels: Identify species, EML(TCS), Mark up datasets)
36
Recap…
Re-emphasised the problems with taxonomic names
  Not good identifiers for organisms
  Problem extends to most areas: characters, countries, habitats, vegetation types, genes…
Shown that taxonomic concepts are better for referring to organisms, specimens, observations…
but
  Need better systems for resolving taxonomic names/concepts
  Which require better information
37
Provide better tools for users
To help taxonomists create better quality data
  Better access to reference/legacy data
  Explore differences/similarities in existing taxonomies
  To create relationships between concepts
  Improved data can be made available to the general biology community for incorporating into bio-referenced databases
To help end users understand and use the data and its limitations
  Biologists can use tools to understand the impact of using particular data on their analysis
38
Conclusion
Science is complex (and therefore split into specialisms)
  Identify the overlaps/linkages in the different domains
  Need useful approximations of things to simplify linked domain
  Need to understand the approximations or linking points well
  Support re-composition, linking or building on the components
Science is inherently changing
Science is full of legacy data
  Today’s scientific research is tomorrow’s legacy data
  Track the changes in the data
  Know when components or links have changed
  Provide long-term persistent storage
Any published scientific discovery should store the data as evidence
  Data needs to be accurately annotated
  Sufficient to repeat analyses to test hypotheses
39
Acknowledgements
Colleagues on the SEEK project
NSF and EPSRC funding
e-Science Centre funding
Colleagues in TDWG
40
Thank You Questions…