Bringing Organism Observations Into Bioinformatics Networks

Bringing Organism Observations Into Bioinformatics Networks. Steve Kelling, Cornell Lab of Ornithology. As the types of data included in biodiversity clearinghouses expand beyond the traditional realm of natural history collections, both opportunities and challenges arise. More data provide a greater opportunity for synthetic analysis across broad spatial and temporal landscapes, but because these data are collected in different ways, more care is required in how they are repurposed.

Data to Knowledge. Digital data are not only the output of research, but the foundation for new scientific insights (NSF 2007). Observations are not only the output of research, but the foundation for new scientific insights. There is much discussion about the need for synthesis of biodiversity data: 1. Observations of nature are the foundation of ecological studies. 2. Organizing these data requires methods for data sharing and interoperability. A caveat on data synthesis: most traditional data synthesis approaches to understanding species occurrence involve tens if not hundreds of potentially important predictors, with species data either gathered during a specific study or reduced to a level at which most of the important information that was collected has been removed. Organizing observational data from a variety of projects, and enabling the analysis of primary occurrence data from them, poses several challenges.

Data about the occurrence of an organism: What, Where, When, How, and By Whom. Primary Biodiversity Data. What constitutes an observation of a species' occurrence?

Rhipidura leucophrys Willy Wagtail

Natural History Collections, Broad-scale Surveys, Directed Surveys. Natural history collections are zoological, botanical, and paleontological specimens in museums, living collections in botanical or zoological gardens, or microbial strain and tissue collections. They are the foundation for taxonomy and for knowledge of the historic occurrence of species. While most use of specimen collections has been for taxon-oriented research, they have also been used for predictive modeling of species occurrence. Broad-scale surveys generate probabilistic estimates of species occurrence. They do not provide direct evidence, but allow inferences about the causes of species occurrence. Broad-scale surveys gather tens of millions of observations annually and provide the bulk of the non-specimen observational data available. Directed surveys are used when a priori knowledge of a given system or biological mechanism already exists. The design attempts to control for known sources of variation while sampling one or a few well-defined variables. As such, directed surveys are the form of observational data collection that most closely resembles experimental studies.

Organize mountains of observations in standardized structures for access, analysis, and visualization of biodiversity. The mission of TDWG is to develop the structures, standards, and processes that allow biodiversity data to be ingested, organized, and accessed. These processes are beginning to structure primary occurrence data at large scales.

Not only do we need information on the occurrence of an organism, but we also need to better understand how those occurrences were gathered. We need more than just mountains of data. The goal of the Biodiversity Information Standards organization is to take an expansive, not parochial, view of what is needed for biodiversity informatics and to act upon it. No more statements that "these data are crap," or "not enough data," or whatever.

Data Gathering Information: Project Code, Sampling Event Identifier, Protocol Identifier. Data gathering information must be included in any biodiversity data management architecture. Project Code: allows species' occurrence records to be linked to a "project". Sampling Event Identifier: allows single observations to be grouped. The identifier must be unique within each project. A sampling event is typically defined as a series of observations made during a determined amount of time at a given location (e.g., a checklist of birds or other organisms, or marine mammals counted along a transect). Protocol Identifier: identifies the methods used to collect the species' occurrence data, using domain-specific standards.
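As a rough illustration of how these three fields might sit alongside an occurrence record, here is a minimal Python sketch. The class name, field names, and example values are hypothetical and are not taken from any particular standard or from the presentation. It only shows why the project code has to be part of the grouping key when the sampling event identifier is unique only within a project.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class OccurrenceRecord:
    """One species occurrence plus its data gathering information.

    All field names here are illustrative assumptions.
    """
    species: str
    project_code: str        # links the record to a "project"
    sampling_event_id: str   # groups observations; unique within a project
    protocol_id: str         # identifies the method used to collect the data


def group_by_sampling_event(records):
    """Group records by (project_code, sampling_event_id).

    The event identifier is only unique within a project, so the
    project code must be part of the key.
    """
    events = defaultdict(list)
    for rec in records:
        events[(rec.project_code, rec.sampling_event_id)].append(rec)
    return dict(events)


# Hypothetical data: two projects happen to reuse the event identifier "E1".
records = [
    OccurrenceRecord("Rhipidura leucophrys", "PROJ_A", "E1", "checklist"),
    OccurrenceRecord("Corvus corax", "PROJ_A", "E1", "checklist"),
    OccurrenceRecord("Rhipidura leucophrys", "PROJ_B", "E1", "transect"),
]
events = group_by_sampling_event(records)
for (project, event_id), recs in sorted(events.items()):
    print(project, event_id, [r.species for r in recs])
```

Grouping on the event identifier alone would merge the checklist from one project with the transect from the other, which is the kind of information loss the slide warns about.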

High-level processing workflow for integrative, data-intensive biodiversity research. Physical events and objects are recorded through sensor, observer, and survey networks. These data are stored in heterogeneous repositories. Informatics processes allow the heterogeneous data to be synthesized for processing. Exploratory analyses (analyses useful for generating hypotheses) can drive confirmatory, summative analyses. A variety of visualization tools allow these data to be viewed by a broad public.

New exploratory data analysis tools emerging from the fields of machine learning, data mining, and statistics can automatically identify patterns in large and complex biodiversity data sources. For example, bagged decision trees have been used to accurately identify the patterns of winter bird distributions across North America. These techniques share an ability to adapt automatically to patterns in data, making them especially well suited for exploratory analysis.
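To make the idea of bagging concrete, the following is a minimal, standard-library-only Python sketch. It is not the method used in the bird distribution work mentioned above. It assumes the simplest possible tree (a depth-1 "stump"), and the features (latitude, winter temperature) and presence labels are invented for illustration. Each tree is fit to a bootstrap resample of the data, and predictions are made by majority vote.

```python
import random
from collections import Counter


def fit_stump(X, y):
    """Fit a depth-1 decision tree: the single split with the fewest errors."""
    majority = Counter(y).most_common(1)[0][0]
    # (errors, feature index, threshold, left label, right label)
    best = (len(y) + 1, None, None, majority, majority)
    for f in range(len(X[0])):
        for t in sorted({row[f] for row in X}):
            left = [lab for row, lab in zip(X, y) if row[f] <= t]
            right = [lab for row, lab in zip(X, y) if row[f] > t]
            if not left or not right:
                continue  # not a real split
            l_lab = Counter(left).most_common(1)[0][0]
            r_lab = Counter(right).most_common(1)[0][0]
            errors = (sum(lab != l_lab for lab in left)
                      + sum(lab != r_lab for lab in right))
            if errors < best[0]:
                best = (errors, f, t, l_lab, r_lab)
    return best[1:]


def predict_stump(stump, row):
    f, t, l_lab, r_lab = stump
    if f is None:  # no usable split was found; fall back to the majority label
        return l_lab
    return l_lab if row[f] <= t else r_lab


def fit_bagged(X, y, n_trees=25, seed=0):
    """Bagging: fit each tree on a bootstrap sample drawn with replacement."""
    rng = random.Random(seed)
    n = len(X)
    trees = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]
        trees.append(fit_stump([X[i] for i in idx], [y[i] for i in idx]))
    return trees


def predict_bagged(trees, row):
    """Aggregate the individual tree predictions by majority vote."""
    votes = Counter(predict_stump(tree, row) for tree in trees)
    return votes.most_common(1)[0][0]


# Invented data: (latitude, mean winter temperature) -> species present (1) or absent (0).
X = [(30.0, 12.0), (33.0, 8.0), (35.0, 5.0), (38.0, 2.0),
     (42.0, -3.0), (45.0, -6.0), (48.0, -10.0), (50.0, -12.0)]
y = [1, 1, 1, 1, 0, 0, 0, 0]

trees = fit_bagged(X, y)
print(predict_bagged(trees, (31.0, 10.0)), predict_bagged(trees, (49.0, -11.0)))
```

In practice the trees would be grown much deeper and fit to many more predictors; the stump is used here only to keep the sketch short.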

Data In and Garbage Out. Ecoinformatics initiatives must ensure that the data being organized do not lose much of the information that was gathered. This information includes not only data on the organism, but also information on how the organism data were collected. Without this information, the data lose their significance.