Introduction to Ecoinformatics: Past, Present & Future William Michener LTER Network Office, University of New Mexico January 2007.

Slides:



Advertisements
Similar presentations
1 A Case Study in E- Science: Building Ecological Informatics Solutions for Multi-Decadal Research ARL/CNI 2008 Conference Washington, DC 16 October 2008.
Advertisements

FT228/4 Knowledge Based Decision Support Systems Knowledge Engineering Ref: Artificial Intelligence A Guide to Intelligent Systems, Michael Negnevitsky.
V Alyssa Rosemartin 1, Lee Marsh 1, Ellen Denny 1, Bruce Wilson USA National Phenology Network, Tucson, AZ; 2 - Oak Ridge National Laboratory, Oak.
Database Systems: Design, Implementation, and Management Tenth Edition
Local Data Management: Building understandable spreadsheets Jeff Arnfield National Climatic Data Center Version 1.0 Review Date.
1 ORNL DAAC: Data and Services Robert Cook and Suresh SanthanaVannan Environmental Sciences Division Oak Ridge National Laboratory Oak Ridge, TN Presentation.
Databases Chapter 11.
Professor Michael J. Losacco CIS 1150 – Introduction to Computer Information Systems Databases Chapter 11.
Fundamental Practices for Preparing Data Sets Bob Cook Environmental Sciences Division Oak Ridge National Laboratory 5 th NACP Principal Investigator’s.
Introduction to the course January 9, Points to Cover  What is GIS?  GIS and Geographic Information Science  Components of GIS Spatial data.
PHASE 3: SYSTEMS DESIGN Chapter 7 Data Design.
Cyber-Infrastructure Challenges & Ecoinformatics: An Ecologist’s Perspective William Michener LTER Network Office Department of Biology University of New.
SAFARI 2000 Data Activities at the ORNL DAAC Bob Cook, Les Hook, Stan Attenberger, Dick Olson, and Tim Rhyne Oak Ridge National Laboratory.
Overview of the Database Development Process
Fundamental Practices for Preparing Data Sets Robert Cook ORNL Distributed Active Archive Center Environmental Sciences Division Oak Ridge National Laboratory.
U.S. Department of the Interior U.S. Geological Survey Best Practices for Preparing Science Data to Share.
Data Management Plans Bill Michener University Libraries and Biology Dept. University of New Mexico.
1 Introduction: Context for the Week William Michener LTER Network Office Department of Biology University of New Mexico.
ITEC224 Database Programming
Best Practices for Preparing Data Sets Non-CO2 Synthesis Workshop Boulder, Colorado October 2008 Compiled by: A. Dayalu, Harvard University Adapted.
1 INTRODUCTION TO DATABASE MANAGEMENT SYSTEM L E C T U R E
Methodology - Conceptual Database Design Transparencies
Software School of Hunan University Database Systems Design Part III Section 5 Design Methodology.
Database Systems: Design, Implementation, and Management Ninth Edition
Methodology Conceptual Databases Design
9/14/2012ISC329 Isabelle Bichindaritz1 Database System Life Cycle.
1 Chapter 15 Methodology Conceptual Databases Design Transparencies Last Updated: April 2011 By M. Arief
Chapter 7: Database Systems Succeeding with Technology: Second Edition.
Elements of a Data Management Plan Bill Michener University Libraries University of New Mexico Data Management Practices for.
Workshop on QC in Derived Data Products, Las Cruces, NM, 31 January 2007 ClimDB/HydroDB Objectives Don Henshaw Improve access to long-term collections.
Preserving the Scientific Record: Preserving a Record of Environmental Change Matthew Mayernik National Center for Atmospheric Research Version 1.0 [Review.
Quality Assurance & Quality Control Kristin Vanderbilt, Ph.D. Sevilleta LTER.
CC&E Best Data Management Practices, April 19, 2015 Please take the Workshop Survey 1.
Fundamental Practices for Preparing Data Sets Bob Cook Environmental Sciences Division Oak Ridge National Laboratory.
Discovering Computers Fundamentals Fifth Edition Chapter 9 Database Management.
Data-Model Assimilation in Ecology History, present, and future Yiqi Luo University of Oklahoma.
Translation to the New TCO Panel Beverly Law Prof. Global Change Forest Science Science Chair, AmeriFlux Network Oregon State University.
Module 6. Data Management Plans  Definitions ◦ Quality assurance ◦ Quality control ◦ Data contamination ◦ Error Types ◦ Error Handling  QA/QC best practices.
Data Management Practices for Early Career Scientists: Closing Robert Cook Environmental Sciences Division Oak Ridge National Laboratory Oak Ridge, TN.
Methodology - Conceptual Database Design. 2 Design Methodology u Structured approach that uses procedures, techniques, tools, and documentation aids to.
1/26/2004TCSS545A Isabelle Bichindaritz1 Database Management Systems Design Methodology.
Copyright © 2015 McGraw-Hill Education. All rights reserved. No reproduction or distribution without the prior written consent of McGraw-Hill Education.
DATABASES Southern Region CEO Wednesday 13 th October 2010.
1 Enviromatics Environmental sampling Environmental sampling Вонр. проф. д-р Александар Маркоски Технички факултет – Битола 2008 год.
C6 Databases. 2 Traditional file environment Data Redundancy and Inconsistency: –Data redundancy: The presence of duplicate data in multiple data files.
Methodology - Conceptual Database Design
Fundamental Practices for Preparing Data Sets Bob Cook Environmental Sciences Division Oak Ridge National Laboratory.
Databases Shortfalls of file management systems Structure of a database Database administration Database Management system Hierarchical Databases Network.
Managing the Impacts of Change on Archiving Research Data A Presentation for “International Workshop on Strategies for Preservation of and Open Access.
Geosciences - Observations (Bob Wilhelmson) The geosciences in NSF’s world consists of atmospheric science, ocean science, and earth science Many of the.
Ecoinformatics Workshop Summary SEEK, LTER Network Main Office University of New Mexico Aluquerque, NM.
Data Management 101 for Earth Scientists Managing Your Data Robert Cook Environmental Sciences Division Oak Ridge National Laboratory.
Fanny Widadie, S.P, M.Agr 1 Database Management Systems.
Managing Your Data: Assign Descriptive File Names Robert Cook Oak Ridge National Laboratory Section: Local Data Management Version 1.0 October 2012.
Quality Assurance & Quality Control University of New Mexico Kristin Vanderbilt William Michener James Brunt Troy Maddux University of South Carolina Don.
IDigBio is funded by a grant from the National Science Foundation’s Advancing Digitization of Biodiversity Collections Program (Cooperative Agreement EF ).
Surface Water Quality Monitoring Information System (SWQMIS) Cindi Atwood Tetra Tech, Inc. (703) Nancy Ragland TCEQ.
Cyberinfrastructure to promote Model - Data Integration Robert Cook, Yaxing Wei, and Suresh S. Vannan Oak Ridge National Laboratory Presented at the Model-Data.
The US Long Term Ecological Research (LTER) Network: Site and Network Level Information Management Kristin Vanderbilt Department of Biology University.
Quality Assurance & Quality Control
1 Chapter 12 Configuration management This chapter is extracted from Sommerville’s slides. Text book chapter 29 1.
Data Organization Quality Assurance and Transformations.
11 Researcher practice in data management Margaret Henty.
Global Change Master Directory (GCMD) Mission “To assist the scientific community in the discovery of Earth science data, related services, and ancillary.
Morpho – metadata management software SEEK Training January 2004.
Data Management System to Collect, Quality Control, Distribute, and Archive Near Real-time Marine Data Jeremy J. Rolph, Jacob T. Rettig, Mark A. Bourassa,
Data Management Practices for Early Career Scientists: Closing Robert Cook Environmental Sciences Division Oak Ridge National Laboratory Oak Ridge, TN.
Data Management: Data Processing Types of Data Processing at USGS There are several ways to classify Data Processing activities at USGS, and here are some.
Standardization Promotes Biogeochemical Data Management and Use in Multidisciplinary Environmental Research Yaxing Wei, Suresh Vannan, Robert B. Cook,
Course Instructor: Supriya Gupta Asstt. Prof
Presentation transcript:

Introduction to Ecoinformatics: Past, Present & Future William Michener LTER Network Office, University of New Mexico January 2007

Outline  Ecoinformatics: a definition  A science vision  Information challenges  Ecoinformatics “solutions”

Outline  Ecoinformatics: a definition  A science vision  Information challenges  Ecoinformatics “solutions”

Ecoinformatics A broad S&T discipline that incorporates both concepts and practical tools for the understanding, generation, processing, and propagation of ecological data, information and knowledge.

Outline  Ecoinformatics: a definition  A science vision  Information challenges  Ecoinformatics “solutions”

Many studies employ a restricted scale of observation -- Commonly 1 m 2 The literature is biased toward single and small scale results

Space Parameters Time Thinking Outside the “Box” LTER Biocomplexity NEON, WATERS, OOI, …. Increase in breadth and depth of understanding.....

Grand environmental challenges

More and more of the ecological questions that confront society are national, continental and global in scope Source: CDC Drought Source: Drought Monitor

LTER 26 NSF LTER Sites in the U.S. and the Antarctic: > 1,600 Scientists; 6,000+ Data Sets— different themes, methods, units, structure, ….

NEON Climate Domains Northeast Mid Atlantic Southeast Atlantic Neotropical Great Lakes Prairie Peninsula Appalachians / Cumberland Plateau Ozarks Complex Northern Plains Central Plains Southern Plains Northern Rockies Southern Rockies / Colorado Plateau Desert Southwest Great Basin Pacific Northwest Pacific Southwest Tundra Taiga Pacific Tropical

Aquatic Arrays BioMesonet Tower and Sensor Arrays

Soil Sensor Arrays Micron-scale nitrate ISE

Small-Organism Tracking: Mobile animals as bio- sentinels for environmental change, forecasting biological invasions, emerging disease spread

Outline  Ecoinformatics: a definition  A science vision  Information challenges  Ecoinformatics “solutions”

Characteristics of Ecological Data Complexity/Metadata Requirements Satellite ImagesDataVolume(perdataset) Low High High Soil Cores PrimaryProductivity GIS Population Data BiodiversitySurveys Gene Sequences Business Data Weather Stations Most Ecological Data MostSoftware

Information Content Time Time of publication Specific details General details Accident Retirement or career change Death (Michener et al. 1997) Data Entropy

A B Schema transform Coding transform Taxon Lookup Semantic transform Imagine scaling!! C B C Semantics

Semantics—Linking Taxonomic Semantics to Ecological Data Rhynchospora plumosa s.l. Elliot 1816 Gray 1834 Kral 1998 Peet 2002? Chapman 1860 R. plumosa R. Plumosa v. intermedia R. plumosa v. plumosa R. Plumosa v. interrupta R. plumosa R. intermedia R. pineticola R. plumosa v. plumosa R. plumosa v. pinetcola R. sp. 1  Taxon concepts change over time (and space)  Multiple competing concepts coexist  Names are re-used for multiple concepts from R. Peet ABC

What Users Really Want…

Outline  Ecoinformatics: a definition  A science vision  Information challenges  Ecoinformatics “solutions”

Experimental Design Methods Data Design Data Forms Data Entry Field Computer Entry Electronically Interfaced Field Equipment Electronically Interfaced Lab Equipment Raw Data File Quality Assurance Checks Data Contamination Data verified? Data Validated Archive Data File Archival Mass Storage Magnetic Tape / Optical Disk / Printouts Access Interface Off-site Storage Secondary Users Publication Synthesis Investigators Summary Analyses Quality Control Metadata Research Program Investigators Studies yes no Standard Operating Procedures Policies Data sharing Computer use Archive storage

Ecoinformatics solutions  Data design  Data acquisition  QA/QC  Data documentation (metadata)  Data archival

 Data design  Data acquisition  QA/QC  Data documentation (metadata)  Data archival Ecoinformatics solutions

Data Design  Conceptualize and implement a logical structure within and among data sets that will facilitate data acquisition, entry, storage, retrieval and manipulation.

Database Types  File-system based  Hierarchical  Relational  Object-oriented  Hybrid (e.g., combination of relational and object-oriented schema) Porter 2000

Data Design: 7 Best Practices  Assign descriptive file names  Use consistent and stable file formats  Define the parameters  Use consistent data organization  Perform basic quality assurance  Assign descriptive data set titles  Provide documentation (metadata) from Cook et al. 2000

1. Assign descriptive file names  File names should be unique and reflect the file contents  Bad file names Mydata 2001_data  A better file name Sevilleta_LTER_NM_2001_NPP.asc  Sevilleta_LTER is the project name  NM is the state abbreviation  2001 is the calendar year  NPP represents Net Primary Productivity data  asc stands for the file type--ASCII

2. Use consistent and stable file formats  Use ASCII file formats – avoid proprietary formats  Be consistent in formatting don’t change or re-arrange columns include header rows (first row should contain file name, data set title, author, date, and companion file names) column headings should describe content of each column, including one row for parameter names and one for parameter units within the ASCII file, delimit fields using commas, pipes (|), tabs, or semicolons (in order of preference)

3. Define the parameters  Use commonly accepted parameter names that describe the contents e.g., precip for precipitation  Use consistent capitalization e.g., not temp, Temp, and TEMP in same file  Explicitly state units of reported parameters in the data file and the metadata SI units are recommended  Choose a format for each parameter, explain the format in the metadata, and use that format throughout the file e.g., use yyyymmdd; January 2, 1999 is

4. Use consistent data organization (one good approach) StationDateTempPrecip Units YYYYMMDDCmm HOGI HOGI HOGI Note: is a missing value code

4. Use consistent data organization (a second good approach) StationDateParameterValueUnit HOGI Temp12C HOGI Temp14C HOGI Precip0mm HOGI Precip3mm

5. Perform basic quality assurance  Assure that data are delimited and line up in proper columns  Check that there no missing values for key parameters  Scan for impossible and anomalous values  Perform and review statistical summaries  Map location data (lat/long) and assess errors  Verify automated data transfers e.g. check-sum techniques  For manual data transfers, consider double keying data and comparing 2 data sets

6. Assign descriptive data set titles  Data set titles should ideally describe the type of data, time period, location, and instruments used (e.g., Landsat 7).  Data set title should be similar to names of data files Good: “Shrub Net Primary Productivity at the Sevilleta LTER, New Mexico, ” Bad: “Productivity Data”

7. Provide documentation (metadata)

Ecoinformatics solutions  Data design  Data acquisition  QA/QC  Data documentation (metadata)  Data archival

High-quality data depend on:  Proficiency of the data collector(s)  Instrument precision and accuracy  Consistency (e.g., standard methods and approaches) Design and ease of data entry  Sound QA/QC  Comprehensive metadata (e.g., documentation of anomalies, etc.)

PlantLife Stage ______________ _______________ What’s wrong with this data sheet?

Important questions  How well does the data sheet reflect the data set design?  How well does the data entry screen (if available) reflect the data sheet?

PlantLife Stage ardiP/G V B FL FR M S D NP arpuP/G V B FL FR M S D NP atcaP/G V B FL FR M S D NP bamuP/G V B FL FR M S D NP zigrP/G V B FL FR M S D NP P/G V B FL FR M S D NP PHENOLOGY DATA SHEET Rio Salado - Transect 1 Collectors:_________________________________ Date:___________________ Time:_________ Notes: _________________________________________ _______________________________________________ P/G = perennating or germinatingM = dispersing V = vegetatingS = senescing B = buddingD = dead FL = floweringNP = not present FR = fruiting

PHENOLOGY DATA SHEET Rio Salado - Transect 1 Collectors Troy Maddux Date: 16 May 1991 Time: 13:12 Notes: Cloudy day, 3 gopher burrows on transect ardiP/G V B FL FR M S D NP Y N arpuP/G V B FL FR M S D NP Y N asbrP/G V B FL FR M S D NP Y N deobP/G V B FL FR M S D NP Y N

Ecoinformatics solutions  Data design  Data acquisition  QA/QC  Data documentation (metadata)  Data archival

Experimental Design Methods Data Design Data Forms Data Entry Field Computer Entry Electronically Interfaced Field Equipment Electronically Interfaced Lab Equipment Raw Data File Quality Assurance Checks Data Contamination Data verified? Data Validated Archive Data File Archival Mass Storage Magnetic Tape / Optical Disk / Printouts Access Interface Off-site Storage Secondary Users Publication Synthesis Investigators Summary Analyses Quality Control Metadata Research Program Investigators Studies yes no Brunt 2000 Generic Data Processing

Ecoinformatics solutions  Project / experimental design  Data design  Data acquisition  QA/QC  Data documentation (metadata) – to be addressed  Data archival

Ecoinformatics solutions  Project / experimental design  Data design  Data acquisition  QA/QC  Data documentation (metadata)  Data archival

Planning Problem Analysis and modeling Cycles of Research “A Conventional View” Collection Publication s Data

Cycles of Research “A New View” Planning Problem Definition (Research Objectives) Analysis and modeling Planning Collection Selection and extraction Archive of Data Original Observations Secondary Observations Publication s

Data Archive  A collection of data sets, usually electronic, stored in such a way that a variety of users can locate, acquire, understand and use the data.  Examples: ESA’s Ecological Archive NASA’s DAACs (Distributed Active Archive Centers)

Brunt (2000) Ch. 2 in Michener and Brunt (2000) Porter (2000) Ch. 3 in Michener and Brunt (2000) Edwards (2000) Ch. 4 in Michener and Brunt (2000) Michener (2000) Ch. 7 in Michener and Brunt (2000) Cook, R.B., R.J. Olson, P. Kanciruk, and L.A. Hook Best practices for preparing ecological and ground-based data sets to share and archive. (online at bin/MDE/S2K/bestprac.html) References