North American initiatives in Ecoinformatics: Vegbank and SEEK Robert K. Peet and The Ecological Society of America Vegetation Panel The SEEK development.

Presentation transcript:

North American initiatives in Ecoinformatics: Vegbank and SEEK Robert K. Peet and The Ecological Society of America Vegetation Panel The SEEK development team

Case Studies

Mean species richness of native and exotic species. Upland: 1090 plots (268 plots with exotics). Riparian: 121 plots (110 plots with exotics). Kruskal-Wallis: Native Richness Chisq = 353.2, df = 1, P < ; Exotic Richness Chisq = 127.7, df = 1, P <

Type A: “plots that keep on giving”. Unusual: supply-side driven, influenced little by competition. Little spatial structure; disturbance important. (↓ competitive exclusion) (↑ species pool)

Traditional Community Ecology The questions: How are communities structured? How do taxa interact? The solutions: Simple observations. Simple experiments. The scale: Stand or landscape.

Major data types Site data: climate, soils, topography, etc. Taxon attribute data: identification, phylogeny, distribution, life-history, functional attributes, etc. Occurrence data: attributes of individuals (e.g., size, age, growth rate) and taxa (e.g., cover, biomass) that co-occur at a site.

EcoInformatics ? Massive plot data have the potential to create new disciplines and allow critical syntheses. Theoretical community ecology. Which taxa occur together, and where, and following what rules? Remote sensing. What is really on the ground? Monitoring. What changes are really taking place in the vegetation? Restoration. What should be our restoration targets? Vegetation & species modeling. Where should we expect species & communities to occur after environmental changes?

Conclusions? Standard data structures. Standards for data exchange. Public data archives (functions for deposit, discovery, withdrawal, citation, annotation). Standards for data archiving. Standards for reference to taxonomic data. Standard software tools.

Background The ESA Vegetation Classification Panel was established in 1993 with a mandate to support the emerging U.S. Vegetation Classification.

I am pleased to acknowledge the support and cooperation of: Ecological Society of America, Gap Analysis Program, National Center for Ecological Analysis and Synthesis, National Biological Information Infrastructure, Federal Geographic Data Committee, National Science Foundation.

Guidelines for Vegetation Classification The ESA Vegetation Panel and its partners have collaborated to develop guidelines for the floristic levels of the classification covering: Requirements for vegetation field plots. Documentation & description of floristic types. Submission & peer review of proposed types. Management, citation, & archiving of vegetation data.

Overview of online resources vegbank.org: stores plots and makes them publicly accessible. natureserve.org: stores current communities in the NVC. plants.usda.gov: stores current plant taxonomy. TBA: allows people to change and update NVC and plants.

VegBank The ESA Vegetation Panel is developing a public archive for vegetation plots known as VegBank. VegBank is expected to function for vegetation plot data in a manner analogous to GenBank. Primary data will be deposited for reference, novel synthesis, and reanalysis. The database architecture is generalizable to most types of species co-occurrence data.

VegBank data are open access All data placed in VegBank are available to the public at no charge (unless the plot contributor places restrictions to protect location information for rare and endangered species or private lands). Key data can be viewed by a simple web link. The following link shows information for two VegBank plots:

Biodiversity data structure Databases: taxonomic databases, plot/inventory databases, specimen databases, community type databases. Entities: Locality, Observation/Collection Event, Object or specimen, BioTaxon, SynTaxon.

Core elements of VegBank: Project, Plot, Observation, Taxon / Individual Observation, Taxon Interpretation, Plot Interpretation.

Taxon/community interpretation Multiple concepts can be linked simultaneously Degree of fit for each can be indicated Subsequent interpretations supported.
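The interpretation model on this slide can be illustrated with a minimal Python sketch. This is not the VegBank schema: the class names, field names, cover value, interpreters, and dates are all hypothetical and chosen only to show multiple concepts attached to one observation, each with a degree of fit, with later interpretations added rather than overwriting earlier ones.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class TaxonInterpretation:
    """One party's reading of a taxon observation (hypothetical fields)."""
    concept: str      # name plus reference, e.g. "Carya ovata sec. Gleason 1952"
    fit: str          # degree of fit, e.g. "good" or "poor"
    interpreter: str
    date: str         # ISO date, so string comparison orders correctly


@dataclass
class TaxonObservation:
    """A taxon as recorded in the field, plus every interpretation of it."""
    recorded_name: str
    cover_percent: float
    interpretations: List[TaxonInterpretation] = field(default_factory=list)

    def interpret(self, concept, fit, interpreter, date):
        # Append only: earlier interpretations stay available.
        self.interpretations.append(
            TaxonInterpretation(concept, fit, interpreter, date))

    def latest(self):
        # Most recent interpretation by ISO date string.
        return max(self.interpretations, key=lambda i: i.date)


# Illustrative data only; none of these values come from VegBank.
obs = TaxonObservation("Carya ovata", 12.5)
obs.interpret("Carya ovata sec. Gleason 1952", "good",
              "original author", "1995-06-01")
obs.interpret("Carya ovata sec. Radford et al.", "poor",
              "later reviewer", "2004-03-15")
print(len(obs.interpretations), obs.latest().concept)
```

Keeping interpretations as an append-only list is one simple way to support subsequent interpretations without losing the original determination.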

VegBank Interface Tools Desktop client (VegBranch) for data preparation and local use. Flexible XML data import supporting VegBranch (& TurboVeg) formats. Flexible data export. Easy web access to central archive

VegBranch can be used for converting legacy data, entering data, and maintaining a local plot database.

Challenges Data ownership, intellectual property rights, & confidentiality. Multiple classifications of organisms and communities. Multiple plot types (relevés & Hubbell plots). Data entry & submission tools. Perfect archiving. Plot and taxon interpretation.

The Taxonomic database challenge: Standardizing organisms and communities The problem: Integration of data potentially representing different times, places, investigators and taxonomic standards. The traditional solution: A standard list of organisms / communities.

Standardized taxon lists fail to allow dataset integration The reasons include: The user cannot reconstruct the database as viewed at an arbitrary time in the past, Taxonomic concepts are not defined (just lists), Multiple party perspectives on taxonomic concepts and names cannot be supported or reconciled. This is the single largest impediment to large-scale synthesis in ecology

High-elevation fir trees of western North America Distribution across AZ, NM, CO, WY, MT, AB, eBC, wBC, WA, OR. USDA - ITIS: Abies lasiocarpa var. arizonica, Abies lasiocarpa var. lasiocarpa. Flora North America: Abies bifolia, Abies lasiocarpa.

Three concepts of shagbark hickory Carya ovata (Miller) K. Koch sec. Gleason 1952; Carya ovata (Miller) K. Koch and Carya carolinae-sept. (Ashe) Engler & Graebner, both sec. Radford et al. Splitting one species into two illustrates the ambiguity often associated with scientific names. If you encounter the name “Carya ovata (Miller) K. Koch” in a database, you cannot be sure which of two meanings applies.

Six shagbark hickory assertions (possible taxonomic synonyms are listed together) Names: Carya ovata; Carya carolinae-septentrionalis; Carya ovata v. ovata; Carya ovata v. australis. Taxon concepts: (One shagbark) C. ovata sec Gleason ’52, C. ovata sec FNA ’97. (Southern shagbark) C. carolinae-s. sec Radford ’68, C. ovata v. australis sec FNA ’97. (Northern shagbark) C. ovata sec Radford ’68, C. ovata (v. ovata) sec FNA ’97. References: Gleason 1952, Britton & Brown; Radford et al, Flora Carolinas; Stone 1997, Flora North America.
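The six assertions can be written down as name-plus-reference pairs to show why a bare name is ambiguous. The sketch below is only an illustration: the dictionary layout and the function `concepts_for` are hypothetical, and the two-digit years on the slide are expanded to four digits.

```python
# Each assertion pairs a name with the reference ("sec.") that defines it.
# Assertions grouped under one key are possible taxonomic synonyms.
CONCEPTS = {
    "one shagbark": [
        ("Carya ovata", "Gleason 1952"),
        ("Carya ovata", "FNA 1997"),
    ],
    "southern shagbark": [
        ("Carya carolinae-septentrionalis", "Radford 1968"),
        ("Carya ovata v. australis", "FNA 1997"),
    ],
    "northern shagbark": [
        ("Carya ovata", "Radford 1968"),
        ("Carya ovata v. ovata", "FNA 1997"),
    ],
}


def concepts_for(name, reference=None):
    """Return the concept groups a name may denote.

    Without a reference, every group using the name is returned;
    with a reference, only groups containing that exact assertion.
    """
    return sorted(
        group
        for group, assertions in CONCEPTS.items()
        if any(n == name and (reference is None or r == reference)
               for n, r in assertions)
    )


print(concepts_for("Carya ovata"))
print(concepts_for("Carya ovata", "Radford 1968"))
```

The bare name resolves to two concept groups, while the name together with its reference resolves to one, which is the point of recording "sec." with every identification.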

Party Perspective The Party Perspective on a Concept includes: Status (Standard, Nonstandard, Undetermined). Correlation with other concepts (Equal, Greater, Lesser, Overlap, Undetermined). Start & Stop dates.
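A party perspective with start and stop dates is what lets a user reconstruct the taxonomy as a party viewed it on an arbitrary past date. The following Python sketch assumes a simple record layout; the class and function names, the party "Party A", and all dates are hypothetical and are not taken from VegBank.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

STATUSES = {"standard", "nonstandard", "undetermined"}
CORRELATIONS = {"equal", "greater", "lesser", "overlap", "undetermined"}


@dataclass(frozen=True)
class Perspective:
    """A party's status for one concept over a period (hypothetical layout)."""
    party: str
    concept: str
    status: str
    start: date
    stop: Optional[date] = None  # None means the perspective is still in force

    def __post_init__(self):
        if self.status not in STATUSES:
            raise ValueError(f"unknown status: {self.status}")

    def in_force(self, when: date) -> bool:
        return self.start <= when and (self.stop is None or when < self.stop)


@dataclass(frozen=True)
class Correlation:
    """How a party relates one concept to another."""
    party: str
    concept_a: str
    concept_b: str
    relation: str

    def __post_init__(self):
        if self.relation not in CORRELATIONS:
            raise ValueError(f"unknown relation: {self.relation}")


def standard_concepts(perspectives, party, when):
    """Concepts the party treated as standard on the given date."""
    return sorted(p.concept for p in perspectives
                  if p.party == party
                  and p.status == "standard"
                  and p.in_force(when))


# Illustrative records only; the party and dates are invented.
perspectives = [
    Perspective("Party A", "Carya ovata sec. Gleason 1952", "standard",
                date(1990, 1, 1), date(2000, 1, 1)),
    Perspective("Party A", "Carya ovata sec. FNA 1997", "standard",
                date(2000, 1, 1)),
]
print(standard_concepts(perspectives, "Party A", date(1995, 1, 1)))
```

Closing a perspective with a stop date instead of deleting it keeps the history queryable, which a flat standard list cannot offer.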

VegBank is populating USDA concepts & relationships Reference list: –USDA PLANTS / ITIS –1999, 2005 Standard treatments –Flora North America (8 volumes) –Isley – Legumes –Rollins – Brassicaceae –Selected treatments

Best practices When reporting identity of organisms, provide not only the full scientific name of each kind of organism, but also the reference that formed the basis of the taxonomic concept. Reference high quality sources for taxon concepts such as major compendia that provide their own defined concepts. Comprehensive checklists typically lack true taxonomic circumscriptions, but might be considered to contain taxonomic concepts sufficient for documenting organism identity. Identifications should include linkage to at least one concept, but in some cases should be linked to multiple concepts.

NatureServe provides access to the NVC and supporting documentation

Simple searches allow information on communities to be located.

Key descriptive data are available online, but the classification process is not yet open to the full scientific community.

Coming soon – direct links to views of typal and occurrence plots in VegBank

The ESA Panel and VegBank staff are developing an open peer-review system to allow anyone to contribute proposed revisions for the NVC.

The results of the peer-review process will be published in an online journal linked to VegBank

Concluding remarks Much of what we are doing with the US National Vegetation Classification is common to the vegetation classification enterprise worldwide, but much is also novel. Public plot archives, initially driven by the classification enterprise, have the potential to radically change the development of ecology and biodiversity management in general.

Highlights from the Science Environment for Ecological Knowledge (SEEK)

What is SEEK? Science Environment for Ecological Knowledge Multidisciplinary project to create: Scientific-workflow system (Kepler) –Design, reuse, and execute scientific analyses Distributed data network (EcoGrid) –Environmental, ecological, and systematics data KR & Semantic Mediation –Discover, integrate, and compose hard-to-relate data and services via ontologies Taxonomic concept services –Resolve taxon ambiguities Collaborators (the SEEK team): NCEAS, UNM, SDSC/UCSD, U Kansas, Vermont, Napier, ASU, UNC

Kepler: Scientific Workflows Model the way scientists work with their data now –Mentally coordinate export and import of data among software systems Workflows emphasize data flow Metadata-driven data ingestion Output generation includes creating appropriate metadata Query EcoGrid to find data Archive output to EcoGrid with workflow metadata

SEEK EcoGrid Goal: allow diverse environmental data systems to interoperate –Hides complexity of underlying systems using lightweight interfaces –Integrate diverse data networks from ecology, biodiversity, and environmental sciences Data systems –Any system can implement these interfaces –Prototyping using: Metacat, DiGIR, etc. Supports multiple metadata standards –EML, Darwin Core as foci

EcoGrid client interactions Modes of interaction –Client-server –Fully distributed –Peer-to-peer EcoGrid Registry –Node discovery –Service discovery Aggregation services –Centralized access –Reliability –Data preservation

Knowledge Representation Current Ontologies –Ecological Concepts, Models, Networks –Measurements –Properties –Statistical Analyses –Time and Space –Taxonomic Identifiers –Units –Symbiosis Recent Developments –Biodiversity (measured traits, computation of traits) –Descriptive Terminology for Plant Communities –Ontology documentation …

Data Procurement “Find all datasets that contain abundance measurements of ‘Manica bradleyi’ inter-ant parasites observed within California”

SEEK High-Level Approach Ecological data set providers: data sets described in Ecological Metadata Language (EML), containing the collector’s taxonomic concept(s), held in an EML repository with taxon coverage. Taxonomic concept providers: Concept Provider 1 (e.g. Fishbase), Concept Provider 2 (e.g. ITIS), Concept Provider 3 (e.g. Prometheus), feeding a name/concept repository via a taxonomy transfer schema (TML). Semantic Mediation System: takes the user’s taxonomic concept plus a quality measure, performs concept matching/expansion with weighted concepts, and returns a list of data sets.
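The mediation step (expand the user's concept into weighted related concepts, then match against each data set's taxon coverage) might look roughly like the Python sketch below. It is a toy illustration, not SEEK code: the relationship table, the weights, the dataset identifiers, and the function names are all assumptions made for the example.

```python
# Hypothetical concept relationships: concept -> [(related concept, weight)].
RELATED = {
    "Carya ovata sec. Gleason 1952": [
        ("Carya ovata sec. Radford 1968", 0.5),
        ("Carya carolinae-septentrionalis sec. Radford 1968", 0.5),
    ],
}

# Hypothetical metadata: dataset id -> concepts listed in its taxon coverage.
DATASETS = {
    "dataset-1": {"Carya ovata sec. Gleason 1952"},
    "dataset-2": {"Carya ovata sec. Radford 1968"},
    "dataset-3": {"Abies lasiocarpa sec. FNA"},
}


def expand(concept):
    """Map the queried concept and its related concepts to match weights."""
    weights = {concept: 1.0}  # an exact match gets full weight
    for related, weight in RELATED.get(concept, []):
        weights[related] = max(weight, weights.get(related, 0.0))
    return weights


def find_datasets(concept):
    """Return (dataset id, score) pairs, best match first."""
    weights = expand(concept)
    scored = []
    for dataset_id, coverage in DATASETS.items():
        score = max((weights[c] for c in coverage if c in weights),
                    default=0.0)
        if score > 0:
            scored.append((dataset_id, score))
    # Sort by descending score, then by id for a stable order.
    return sorted(scored, key=lambda pair: (-pair[1], pair[0]))


print(find_datasets("Carya ovata sec. Gleason 1952"))
```

Returning a score with each data set lets the user judge how closely the collector's concept matches the one queried, instead of getting a plain yes or no on the name.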

Acknowledgements This material is based upon work supported by: The National Science Foundation under Grant Numbers , , , , , and Collaborators: NCEAS (UC Santa Barbara), University of New Mexico (Long Term Ecological Research Network Office), San Diego Supercomputer Center, University of Kansas (Center for Biodiversity Research), University of Vermont, University of North Carolina, Napier University, Arizona State University, UC Davis The National Center for Ecological Analysis and Synthesis, a Center funded by NSF (Grant Number ), the University of California, and the UC Santa Barbara campus. The Andrew W. Mellon Foundation. Kepler contributors: SEEK, Ptolemy II, SDM/SciDAC, GEON
