Names are not sufficient: the challenge of documenting organism identity R.K. Peet, J.B.Kennedy, and N.M. Franz and The Ecological Society of America Vegetation.

Presentation transcript:

Names are not sufficient: the challenge of documenting organism identity. R.K. Peet, J.B. Kennedy, and N.M. Franz, with The Ecological Society of America Vegetation Panel and the SEEK development team.

Accurate identification and labelling of organisms is a critical part of collecting, recording and reporting biological data. Increasingly, research in biodiversity and ecology is based on the integration (and re-use) of multiple datasets. What was a minor annoyance with a few tens of records becomes intractable with a million records.

The taxonomic database challenge: standardizing organisms and communities. The problem: integration of data potentially representing different times, places, investigators, and taxonomic standards. The traditional solution: a standard list of organisms or communities.

Three concepts of shagbark hickory. Splitting one species into two illustrates the ambiguity often associated with scientific names. Figure labels: Carya ovata (Miller) K. Koch; Carya carolinae-septentrionalis (Ashe) Engler & Graebner; Carya ovata (Miller) K. Koch; sec. FNA 1997; sec. USDA 2005.

High-elevation fir trees of western North America. Figure: distribution across AZ, NM, CO, WY, MT, AB, eBC, wBC, WA, and OR. USDA - ITIS: Abies lasiocarpa var. arizonica, Abies lasiocarpa var. lasiocarpa. Flora North America: Abies bifolia, Abies lasiocarpa.

Multiple concepts of Rhynchospora plumosa s.l. Figure: treatments by Elliot 1816, Gray 1834, Chapman 1860, Kral 2003, and Peet 2004?. Taxon labels: R. plumosa; R. plumosa v. intermedia; R. plumosa v. plumosa; R. intermedia; R. plumosa v. interrupta; R. pineticola; R. plumosa; R. sp. 1; R. plumosa v. plumosa; R. plumosa v. pineticola.

Timeline showing taxonomic history (revisions and nomenclatural changes) pertaining to species comprising the imaginary genus Aus.
Revisions shown: (i) Linnaeus 1758; (ii) Archer 1965; (iii) Fry 1989; (iv) Tucker 1991; (v) Pargiter 2003.
Names shown: Aus L.1758; Aus aus L.1758; Aus bea Archer 1965; Aus cea BFry 1989; Aus ceus BFry 1989; Xus Pargiter 2003; Xus beus (Archer) Pargiter.
Annotations: A diligent nomenclaturist, Pyle (1990), notes that the species epithets of Aus bea and Aus cea are of the wrong gender and publishes the corrected names Aus beus corrig. Archer 1965 and Aus ceus corrig. BFry 1989. Tucker publishes his revision without noting Pyle’s corrigendum of the name of Aus cea. Pargiter publishes his revision using Pyle’s corrigenda of the epithet bea to beus and of Aus cea to Aus ceus.

Standardized taxon lists fail to allow dataset integration. The reasons include: taxonomic concepts are not defined (just lists); multiple party perspectives on taxonomic concepts and names cannot be supported or reconciled; the user cannot reconstruct the database as viewed at an arbitrary time in the past. This is the single largest impediment to large-scale synthesis in ecology.

Taxonomic theory (Name, Reference, Concept). A taxon concept represents a unique combination of a name and a reference. Report it as: name sec. reference.
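The name-plus-reference pairing can be sketched as a small record type. This is only an illustration: the class name `TaxonConcept` and its `label` method are hypothetical, and the example values are the Carya ovata labels from the shagbark hickory slide.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TaxonConcept:
    """A name as circumscribed in one particular reference (hypothetical type)."""
    name: str       # full scientific name
    reference: str  # publication that supplies the circumscription

    def label(self) -> str:
        # "sec." (secundum) reads as "according to"
        return f"{self.name} sec. {self.reference}"


fna = TaxonConcept("Carya ovata (Miller) K. Koch", "FNA 1997")
usda = TaxonConcept("Carya ovata (Miller) K. Koch", "USDA 2005")

print(fna.label())
print(fna == usda)  # False: same name, different reference, different concept
```

Because equality covers both fields, two records that share a name but cite different references never collapse into one entry.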

Name, Concept, Usage. A usage represents an association of a concept with a name. The name used in defining the concept need not be the same name used in your work, e.g. Carya alba = Carya tomentosa sec. Gleason & Cronquist. Usage can be used to apply multiple name systems to a concept.
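A usage can be sketched as a separate record that points at a concept, so several names can attach to one concept. The types `Concept` and `Usage`, the `name_system` field, and the helper `names_for` are all hypothetical. The example assumes the slide's Carya case means that the name Carya alba is applied to the concept Carya tomentosa sec. Gleason & Cronquist.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Concept:
    name: str
    reference: str


@dataclass(frozen=True)
class Usage:
    """Associates an applied name with a concept (hypothetical type)."""
    applied_name: str
    concept: Concept
    name_system: str  # hypothetical label for the party applying the name


def names_for(concept, usages):
    """All names that the given usages attach to one concept, sorted."""
    return sorted({u.applied_name for u in usages if u.concept == concept})


concept = Concept("Carya tomentosa", "Gleason & Cronquist")
usages = [
    Usage("Carya tomentosa", concept, "Gleason & Cronquist"),
    Usage("Carya alba", concept, "my dataset"),
]
print(names_for(concept, usages))  # ['Carya alba', 'Carya tomentosa']
```

Keeping usages apart from concepts means a new name system adds rows without touching the concept itself.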

Data models and data exchange standards. Numerous data models incorporate concepts. The IOPI, VegBank, and Taxonomer models are optimized for different uses. SEEK, GBIF, and TDWG are seeking a consensus model, to be voted on by TDWG in August 2005.

Relationships among concepts: exactly equal (identification); congruent, equal (=); includes (>); included in (<); overlaps (><); disjunct (|).
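One way to make these relationship symbols concrete is to treat each concept as a set of members and compare the sets. That set representation is an assumption made for illustration, not something the slides prescribe, and the specimen identifiers below are invented. The "exactly equal (identification)" case is not modelled here.

```python
from enum import Enum


class Relationship(Enum):
    CONGRUENT = "="
    INCLUDES = ">"
    INCLUDED_IN = "<"
    OVERLAPS = "><"
    DISJUNCT = "|"


def relate(a: frozenset, b: frozenset) -> Relationship:
    """Relationship of concept a to concept b, each given as a set of members."""
    if a == b:
        return Relationship.CONGRUENT
    if a > b:  # a is a proper superset of b
        return Relationship.INCLUDES
    if a < b:  # a is a proper subset of b
        return Relationship.INCLUDED_IN
    if a & b:  # some members shared, but neither contains the other
        return Relationship.OVERLAPS
    return Relationship.DISJUNCT


# Invented specimen identifiers standing in for circumscriptions.
broad = frozenset({"s1", "s2", "s3"})
narrow = frozenset({"s1", "s2"})
other = frozenset({"s3", "s4"})

print(relate(broad, narrow).value)  # >
print(relate(narrow, broad).value)  # <
print(relate(broad, other).value)   # ><
print(relate(narrow, other).value)  # |
```

In practice circumscriptions are rarely available as explicit member sets, so relationships are usually asserted by an expert rather than computed.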

Best Practices

1. When reporting identity of organisms in publications or data, provide not only the full scientific name of each kind of organism recognized, but also the reference that formed the basis of the taxonomic concept, e.g., Abies lasiocarpa sec. Flora North America.

2. Reference high-quality sources for taxon concepts, such as a major compendium that provides its own defined concepts or a source that references the concepts of others.

3. Avoid comprehensive, synonymized checklists (e.g. ITIS), as they typically lack true taxonomic descriptions or circumscriptions; they can be considered if they contain taxonomic concepts sufficient for documenting organism identity.

4. Identifications of organisms should be by reference to credible, authoritatively published taxonomic concepts, rather than merely references to other identifications.

5. Identifications should include linkage to at least one concept, but need not be limited to a single concept, e.g., < Potentilla sec. Cronquist; ~ Potentilla simplex sec. Cronquist; ~ Potentilla canadensis sec. Cronquist 1991.

6. Where appropriate, recorded identifications should be modified by supplemental information. Metadata is good, but is hard to use.

7. Use Internet-based taxonomic resources that document concepts only if they archive old versions and enable tracking of concepts over time.
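Archiving old versions amounts to never overwriting an entry, so that the list can be read back as of any date. The sketch below is a minimal illustration of that idea. The class `ConceptArchive`, its methods, and the effective dates are hypothetical placeholders, and only the two concept labels come from the shagbark hickory slide.

```python
from bisect import bisect_right
from datetime import date


class ConceptArchive:
    """Keeps every dated version of a checklist entry instead of overwriting it."""

    def __init__(self):
        self._dates = {}   # name -> sorted list of effective dates
        self._labels = {}  # name -> concept labels, parallel to _dates

    def record(self, name, effective, concept_label):
        dates = self._dates.setdefault(name, [])
        labels = self._labels.setdefault(name, [])
        i = bisect_right(dates, effective)  # keep both lists date-ordered
        dates.insert(i, effective)
        labels.insert(i, concept_label)

    def as_of(self, name, when):
        """Concept in force for `name` on date `when`, or None if none yet."""
        dates = self._dates.get(name, [])
        i = bisect_right(dates, when)
        return self._labels[name][i - 1] if i else None


archive = ConceptArchive()
# Placeholder effective dates, chosen only to order the two versions.
archive.record("Carya ovata", date(1997, 1, 1), "Carya ovata sec. FNA 1997")
archive.record("Carya ovata", date(2005, 1, 1), "Carya ovata sec. USDA 2005")

print(archive.as_of("Carya ovata", date(2000, 6, 1)))  # Carya ovata sec. FNA 1997
print(archive.as_of("Carya ovata", date(1990, 1, 1)))  # None
```

A resource that only stores the latest version cannot answer the second kind of query, which is why item 7 rules it out.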

Distributed information systems and the way ahead. Step 1: Adoption of minimum standards and best practices by high-quality journals, funding agencies, and professional organizations.

The way ahead. Step 2: Creation, availability, and maintenance of databases that document core sets of taxonomic concepts and the relationships of these concepts to each other.

Registration system and standard identifiers for names, references, and concepts. Essential for data exchange. SEEK is in the early design stages for an identifier system and central database.

True concept-based checklists. The equivalent of ITIS, but with concept documentation and including how other concepts map onto the concepts accepted by the party. Several are operative or in development, including EuroMed, IOPI-GPC, Biotics, and VegBank. Concept documentation is planned for ITIS/USDA.

The way ahead. Step 3: Development and provision of tools to facilitate mark-up of data and manuscripts with taxonomic concepts.

The way ahead. Step 4: Development and availability of a full information infrastructure to exploit the potential of concept-enriched data and publications for information discovery and analysis.

Publishers, curators and data managers need to tag taxon interpretations with concepts. Precedent exists in the tagging of literature citations and GenBank accessions. Presses are linking scientific names in many e-journals to ITIS (e.g. Evolution, Ecology).

Tools to develop and map concepts. Taxonomists need mapping and visualization tools for relating concepts of various authors. SEEK will build prototypes for review and possible adoption. Aggregators need tools for mapping relationships among concepts. Users need tools for entering legacy concepts. Several are in development.

SEEK high-level approach. Diagram elements: ecological data set providers (data sets); Ecological Metadata Language (EML), containing the collector’s taxonomic concept(s); EML repository with taxon coverage; taxonomic concept providers (Concept Provider 1, e.g. Fishbase; Concept Provider 2, e.g. ITIS; Concept Provider 3, e.g. Prometheus); taxonomy transfer schema (TML); name/concept repository; semantic mediation system (concept matching/expansion, weighted concepts); input: the user’s taxonomic concept plus a quality measure; output: a returned list of data sets.
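The concept matching and expansion step named in the SEEK approach might look roughly like the sketch below. It is not the SEEK design: the function `find_data_sets`, the weight values, the data set names, and the relationships among concepts of the imaginary genus Aus are all invented for illustration.

```python
# Hypothetical weights: how fully a data set tagged with the related concept
# can be assumed to hold organisms of the user's concept.
WEIGHTS = {"=": 1.0, ">": 1.0, "<": 0.5, "><": 0.5, "|": 0.0}


def find_data_sets(user_concept, relationships, data_sets, min_quality=0.5):
    """Return (data set, quality) pairs, best first.

    relationships: {(user_concept, other_concept): relationship symbol}
    data_sets:     {data set name: concept label it is tagged with}
    """
    hits = []
    for ds_name, concept in data_sets.items():
        if concept == user_concept:
            symbol = "="
        else:
            # Assumption: an undocumented relationship counts as disjunct.
            symbol = relationships.get((user_concept, concept), "|")
        quality = WEIGHTS[symbol]
        if quality > 0 and quality >= min_quality:
            hits.append((ds_name, quality))
    return sorted(hits, key=lambda hit: (-hit[1], hit[0]))


user = "Aus aus sec. Linnaeus 1758"
relationships = {  # invented mappings for the imaginary genus
    (user, "Aus aus sec. Archer 1965"): ">",
    (user, "Aus sec. Fry 1989"): "<",
}
data_sets = {
    "plots_a": "Aus aus sec. Archer 1965",
    "plots_b": "Aus sec. Fry 1989",
    "plots_c": "Xus beus sec. Pargiter 2003",
}
print(find_data_sets(user, relationships, data_sets))
# [('plots_a', 1.0), ('plots_b', 0.5)]
```

Returning a quality score with each data set lets the user decide how much ambiguity to accept, rather than hiding partial matches.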