Cyber-Infrastructure Challenges & Ecoinformatics: An Ecologist’s Perspective William Michener LTER Network Office Department of Biology University of New.

Slides:



Advertisements
Similar presentations
Relational Database and Data Modeling
Advertisements

Organising and Documenting Data Stuart Macdonald EDINA & Data Library DIY Research Data Management Training Kit for Librarians.
C6 Databases.
Database Systems: Design, Implementation, and Management Tenth Edition
Local Data Management: Building understandable spreadsheets Jeff Arnfield National Climatic Data Center Version 1.0 Review Date.
By Mary Anne Poatsy, Keith Mulbery, Eric Cameron, Jason Davidson, Rebecca Lawson, Linda Lau, Jerri Williams Chapter 9 Fine-Tuning the Database 1 Copyright.
Access - Project 1 l What Is a Database? –A Collection of Data –Organized in a manner to allow: »Access »Retrieval »Use of That Data.
John Porter Why this presentation? The forms data take for analysis are often different than the forms data take for archival storage Spreadsheets are.
PowerPoint Presentation for Dennis, Wixom & Tegarden Systems Analysis and Design Copyright 2001 © John Wiley & Sons, Inc. All rights reserved. Slide 1.
Database Management: Getting Data Together Chapter 14.
Research Integrity: Collaborative Research Michelle Stickler, DEd Office for Research Protections
1 ORNL DAAC: Data and Services Robert Cook and Suresh SanthanaVannan Environmental Sciences Division Oak Ridge National Laboratory Oak Ridge, TN Presentation.
Introduction to Communication Research
Professor Michael J. Losacco CIS 1150 – Introduction to Computer Information Systems Databases Chapter 11.
Database Design IST 7-10 Presented by Miss Egan and Miss Richards.
Improving Data Discovery in Metadata Repositories through Semantic Search Chad Berkley 1, Shawn Bowers 2, Matt Jones 1, Mark Schildhauer 1, Josh Madin.
Data quality control, Data formats and preservation, Versioning and authenticity, Data storage Managing research data well workshop London, 30 June 2009.
Introduction to Ecoinformatics: Past, Present & Future William Michener LTER Network Office, University of New Mexico January 2007.
ORGANIZING AND STRUCTURING DATA FOR DIGITAL PROJECTS Suzanne Huffman Digital Resources Librarian Simpson Library.
Topics Covered: Data preparation Data preparation Data capturing Data capturing Data verification and validation Data verification and validation Data.
U.S. Department of the Interior U.S. Geological Survey Best Practices for Preparing Science Data to Share.
Inter-American Workshop on Environmental Data Access Panel discussion on scientific and technical issues Merilyn Gentry, LBA-ECO Data Coordinator NASA.
1Copyright © 2011 Pearson Education, Inc. Publishing as Prentice Hall. Exploring Microsoft Office Access 2010 by Robert Grauer, Keith Mast, and Mary Anne.
Data Management Plans Bill Michener University Libraries and Biology Dept. University of New Mexico.
1 Introduction: Context for the Week William Michener LTER Network Office Department of Biology University of New Mexico.
Recordkeeping for Good Governance Toolkit Digital Recordkeeping Guidance Funafuti, Tuvalu – June 2013.
Best Practices for Preparing Data Sets Non-CO2 Synthesis Workshop Boulder, Colorado October 2008 Compiled by: A. Dayalu, Harvard University Adapted.
9/14/2012ISC329 Isabelle Bichindaritz1 Database System Life Cycle.
Chapter 7: Database Systems Succeeding with Technology: Second Edition.
Elements of a Data Management Plan Bill Michener University Libraries University of New Mexico Data Management Practices for.
Workshop on QC in Derived Data Products, Las Cruces, NM, 31 January 2007 ClimDB/HydroDB Objectives Don Henshaw Improve access to long-term collections.
Preserving the Scientific Record: Preserving a Record of Environmental Change Matthew Mayernik National Center for Atmospheric Research Version 1.0 [Review.
Quality Assurance & Quality Control Kristin Vanderbilt, Ph.D. Sevilleta LTER.
CC&E Best Data Management Practices, April 19, 2015 Please take the Workshop Survey 1.
Enhancing Linkages Between Projects and Datasets: Examples from LBA-ECO for NACP Lisa Wilcox, Amy L. Morrell,
Fundamental Practices for Preparing Data Sets Bob Cook Environmental Sciences Division Oak Ridge National Laboratory.
Discovering Computers Fundamentals Fifth Edition Chapter 9 Database Management.
Module 6. Data Management Plans  Definitions ◦ Quality assurance ◦ Quality control ◦ Data contamination ◦ Error Types ◦ Error Handling  QA/QC best practices.
File Systems (1). Readings r Reading: Disks, disk scheduling (3.7 of textbook; “How Stuff Works”) r Reading: File System Implementation ( of textbook)
Data Management Practices for Early Career Scientists: Closing Robert Cook Environmental Sciences Division Oak Ridge National Laboratory Oak Ridge, TN.
Data Models for Ecological Databases John Porter Department of Environmental Sciences University of Virginia.
C6 Databases. 2 Traditional file environment Data Redundancy and Inconsistency: –Data redundancy: The presence of duplicate data in multiple data files.
Research Design for Collaborative Computational Approaches and Scientific Workflows Deana Pennington January 8, 2007.
5 - 1 Copyright © 2006, The McGraw-Hill Companies, Inc. All rights reserved.
Fundamental Practices for Preparing Data Sets Bob Cook Environmental Sciences Division Oak Ridge National Laboratory.
Developing Policy and Procedure Management System إعداد برنامج سياسات وإجراءات العمل 8 Safar February 2007 HERA GENERAL HOSPITAL.
Managing the Impacts of Change on Archiving Research Data A Presentation for “International Workshop on Strategies for Preservation of and Open Access.
Data Management 101 for Earth Scientists Managing Your Data Robert Cook Environmental Sciences Division Oak Ridge National Laboratory.
6.1 © 2010 by Prentice Hall 6 Chapter Foundations of Business Intelligence: Databases and Information Management.
Managing Your Data: Assign Descriptive File Names Robert Cook Oak Ridge National Laboratory Section: Local Data Management Version 1.0 October 2012.
Quality Assurance & Quality Control University of New Mexico Kristin Vanderbilt William Michener James Brunt Troy Maddux University of South Carolina Don.
IDigBio is funded by a grant from the National Science Foundation’s Advancing Digitization of Biodiversity Collections Program (Cooperative Agreement EF ).
Elements of a Data Management Plan Bill Michener University of New Mexico
Breakout # 1 – Data Collecting and Making It Available Data definition “ Any information that [environmental] researchers need to accomplish their tasks”
Data Models for Ecological Databases John Porter Department of Environmental Sciences University of Virginia.
The US Long Term Ecological Research (LTER) Network: Site and Network Level Information Management Kristin Vanderbilt Department of Biology University.
0 / Database Management. 1 / Identify file maintenance techniques Discuss the terms character, field, record, and table Describe characteristics.
Research for Nurses: Methods and Interpretation Chapter 1 What is research? What is nursing research? What are the goals of Nursing research?
Quality Assurance & Quality Control
DOE Data Management Plan Requirements
Metadata ESA Workshop. In this session we will discuss…  Metadata: what are they? and why should they be created?  Metadata standards  Creating metadata.
Learning Objectives Understand the concepts of Information systems.
Global Change Master Directory (GCMD) Mission “To assist the scientific community in the discovery of Earth science data, related services, and ancillary.
Morpho – metadata management software SEEK Training January 2004.
Data Management Practices for Early Career Scientists: Closing Robert Cook Environmental Sciences Division Oak Ridge National Laboratory Oak Ridge, TN.
Data Management: Data Processing Types of Data Processing at USGS There are several ways to classify Data Processing activities at USGS, and here are some.
Cushing – EIM Integrating Ecological Data Notes from the Grasslands ANPP Data Integration Project evergreen.edu LTER Network Office,
NASA Earth Science Data Stewardship
Data Management: Documentation & Metadata
Social Research Methodology and Supplementary Documentation John Kallas University of the Aegean, Department of Sociology.
Presentation transcript:

Cyber-Infrastructure Challenges & Ecoinformatics: An Ecologist’s Perspective William Michener LTER Network Office Department of Biology University of New Mexico

Today’s Road Map Science Cyber-Infrastructure Challenges Ecoinformatics

Today’s Road Map Science Cyber-Infrastructure Challenges Ecoinformatics

Most studies use a single scale of observation -- Commonly 1 m 2 The literature is biased toward single and small scale results

Time (yrs) Variable Change transition from one state or condition to another

Space Parameters Time Thinking Outside the “Box” LTER Biocomplexity ???? Increase in breadth and depth of understanding.....

LTER 24 NSF LTER Sites in the U.S. and the Antarctic: > 1500 Scientists; 6,000+ Data Sets— different themes, methods, units, structure, ….

Today’s Road Map Science Cyber-Infrastructure Challenges Ecoinformatics

Knowledge Data Information Phenomena

Knowledge Data Information Phenomena

Abstraction of phenomena

Knowledge Data Information Phenomena

Yon Hither Hunter-gatherers

A Paradigm Shift Taxon 1 Taxon 2 Taxon 3 Taxon 4 Abioticfactors Integrated,InterdisciplinaryDatabases Harvesters

Information Content Time Time of publication Specific details General details Accident Retirement or career change Death (Michener et al. 1997) Data Entropy

A B Schema transform Coding transform Taxon Lookup Semantic transform Imagine scaling!! C A B C Semantics

Semantics—Linking Taxonomic Semantics to Ecological Data Rhynchospora plumosa s.l. Elliot 1816 Gray 1834 Kral 1998 Peet 2002? Chapman 1860 R. plumosa R. Plumosa v. intermedia R. plumosa v. plumosa R. Plumosa v. interrupta R. plumosa R. intermedia R. pineticola R. plumosa v. plumosa R. plumosa v. pinetcola R. sp. 1  Taxon concepts change over time (and space)  Multiple competing concepts coexist  Names are re-used for multiple concepts from R. Peet ABC

Knowledge Data Information Phenomena

Characteristics of Ecological Data Complexity/Metadata Requirements Satellite ImagesDataVolume(perdataset) Low High High Soil Cores PrimaryProductivity GIS Population Data BiodiversitySurveys Gene Sequences Business Data Weather Stations Most Ecological Data MostSoftware

What Users Really Want…

Data Collection Analysis Translation Use By Non-Scientists Publish For Other Scientists

Today’s Road Map Science Cyber-Infrastructure Challenges Ecoinformatics

Ecological Informatics A broad interdisciplinary science that incorporates both conceptual and practical tools for the understanding, generation, processing, and propagation of ecological data and information.

Data Design and Metadata Data Acquisition and Quality Control Access and Archiving Analysis and Interpretation Data Manipulation and Quality Assurance Project Initiation Publication

Experimental Design Methods Data Design Data Forms Data Entry Field Computer Entry Electronically Interfaced Field Equipment Electronically Interfaced Lab Equipment Raw Data File Quality Assurance Checks Data Contamination Data verified? Data Validated Archive Data File Archival Mass Storage Magnetic Tape / Optical Disk / Printouts Access Interface Off-site Storage Secondary Users Publication Synthesis Investigators Summary Analyses Quality Control Metadata Research Program Investigators Studies yes no

“Ecological Informatics” Activities Project / experimental design Data design [Porter & McCartney] Data acquisition QA/QC [Vanderbilt] Data documentation (metadata) [Michener] Data archival

“Ecological Informatics” Activities Project / experimental design Data design Data acquisition QA/QC Data documentation (metadata) Data archival

Project / Experimental Design ExperimentalDesign Analyses Data / Database Design

Project / Experimental Design Some Classic References –Green, R.H Sampling Design and Statistical Methods for Environmental Biologists. John Wiley & Sons, Inc., New York. –Resetarits, Jr., W.J. and J. Bernardo (eds.) Experimental Ecology. Oxford University Press, New York. –Scheiner, S.M. and J. Gurevitch (eds.) Design and Analysis of Ecological Experiments. Chapman & Hall, New York. –Sokal, R.R. and F.J. Rohlf Biometry. W.H. Freeman & Company, New York. –Underwood, A.J Experiments in Ecology: Their Logical Design and Interpretation Using Analysis of Variance. Cambridge University Press, Cambridge, UK.

“Ecological Informatics” Activities Project / experimental design Data design Data acquisition QA/QC Data documentation (metadata) Data archival

Data Design Conceptualize and implement a logical structure within and among data sets that will facilitate data acquisition, entry, storage, retrieval and manipulation.

Data Set Design: Best Practices Assign descriptive file names Use consistent and stable file formats Define the parameters Use consistent data organization Perform basic quality assurance Assign descriptive data set titles Provide documentation (metadata) from Cook et al. 2000

1. Assign descriptive file names File names should be unique and reflect the file contents Bad file names –Mydata –2001_data A better file name –Sevilleta_LTER_NM_2001_NPP.asc Sevilleta_LTER is the project name NM is the state abbreviation 2001 is the calendar year NPP represents Net Primary Productivity data asc stands for the file type--ASCII

2. Use consistent and stable file formats Use ASCII file formats – avoid proprietary formats Be consistent in formatting –don’t change or re-arrange columns –include header rows (first row should contain file name, data set title, author, date, and companion file names) –column headings should describe content of each column, including one row for parameter names and one for parameter units –within the ASCII file, delimit fields using commas, pipes (|), tabs, or semicolons (in order of preference)

3. Define the parameters Use commonly accepted parameter names that describe the contents (e.g., precip for precipitation) Use consistent capitalization (e.g., not temp, Temp, and TEMP in same file) Explicitly state units of reported parameters in the data file and the metadata (SI units are recommended) Choose a format for each parameter, explain the format in the metadata, and use that format throughout the file –e.g., use yyyymmdd; January 2, 1999 is

4. Use consistent data organization (one good approach) StationDateTempPrecip Units YYYYMMDDCmm HOGI HOGI HOGI Note: is a missing value code for the data set

4. Use consistent data organization (a second good approach) StationDateParameterValueUnit HOGI Temp12C HOGI Temp14C HOGI Precip0mm HOGI Precip3mm

5. Perform basic quality assurance Assure that data are delimited and line up in proper columns Check that there no missing values for key parameters Scan for impossible and anomalous values Perform and review statistical summaries Map location data (lat/long) and assess errors Verify automated data transfers For manual data transfers, consider double keying data and comparing 2 data sets

6. Assign descriptive data set titles Data set titles should ideally describe the type of data, time period, location, and instruments used (e.g., Landsat 7). Titles should be restricted to 80 characters. Data set title should be similar to names of data files –Good: “Shrub Net Primary Productivity at the Sevilleta LTER, New Mexico, ” –Bad: “Productivity Data”

7. Provide documentation (metadata) To be discussed in detail

Database Types File-system based Hierarchical Relational Object-oriented Hybrid (e.g., combination of relational and object-oriented schema) Porter 2000

File-system-based Database Directory Files Porter 2000

Hierarchical Database Project Data setsInvestigators VariablesLocations CodesMethods Porter 2000

Relational Database Projects Data sets Locations Location_idData_id Location_id Porter 2000

“Ecological Informatics” Activities Project / experimental design Data design Data acquisition QA/QC Data documentation (metadata) Data archival

High-quality data depend on: Proficiency of the data collector(s) Instrument precision and accuracy Consistency (e.g., standard methods and approaches) –Design and ease of data entry Sound QA/QC Comprehensive metadata (e.g., documentation of anomalies, etc.)

How are data to be acquired? Automatic Collection ? Tape Recorder Data Sheet Field entry into hand-held computer

PlantLife Stage ______________ _______________ What’s wrong with this data sheet?

Important questions How well does the data sheet reflect the data set design? How well does the data entry screen (if available) reflect the data sheet?

PlantLife Stage ardiP/G V B FL FR M S D NP arpuP/G V B FL FR M S D NP atcaP/G V B FL FR M S D NP bamuP/G V B FL FR M S D NP zigrP/G V B FL FR M S D NP P/G V B FL FR M S D NP PHENOLOGY DATA SHEET Rio Salado - Transect 1 Collectors:_________________________________ Date:___________________ Time:_________ Notes: _________________________________________ _______________________________________________ P/G = perennating or germinatingM = dispersing V = vegetatingS = senescing B = buddingD = dead FL = floweringNP = not present FR = fruiting

PHENOLOGY DATA SHEET Rio Salado - Transect 1 Collectors Troy Maddux Date: 16 May 1991 Time: 13:12 Notes: Cloudy day, 3 gopher burrows on transect ardiP/G V B FL FR M S D NP Y N arpuP/G V B FL FR M S D NP Y N asbrP/G V B FL FR M S D NP Y N deobP/G V B FL FR M S D NP Y N

“Ecological Informatics” Activities Project / experimental design Data design Data acquisition QA/QC Data documentation (metadata) Data archival

Experimental Design Methods Data Design Data Forms Data Entry Field Computer Entry Electronically Interfaced Field Equipment Electronically Interfaced Lab Equipment Raw Data File Quality Assurance Checks Data Contamination Data verified? Data Validated Archive Data File Archival Mass Storage Magnetic Tape / Optical Disk / Printouts Access Interface Off-site Storage Secondary Users Publication Synthesis Investigators Summary Analyses Quality Control Metadata Research Program Investigators Studies yes no Brunt 2000 Generic Data Processing

“Ecological Informatics” Activities Project / experimental design Data design Data acquisition QA/QC Data documentation (metadata) Data archival

“Ecological Informatics” Activities Project / experimental design Data design Data acquisition QA/QC Data documentation (metadata) Data archival

Traditional Fates of Data Post-Publication Paper to filing cabinets Data to floppy disks or tape Data and information lost over time entropy

Data Archive A collection of data sets, usually electronic, stored in such a way that a variety of users can locate, acquire, understand and use the data. Examples: –ESA’s Ecological Archive –NASA’s DAACs (Distributed Active Archive Centers)

Planning Problem Analysis and modeling Cycles of Research “A Conventional View” Collection Publication s Data

Cycles of Research “A New View” Planning Problem Definition (Research Objectives) Analysis and modeling Planning Collection Selection and extraction Archive of Data Original Observations Secondary Observations Publication s

Start small and keep it simple – building on simple successes is much easier than failing on large inclusive attempts.  Involve scientists - ecological data management is a scientific endeavor that touches every aspect of the research program. Scientists should be involved in the planning and operation of a data management system.  Support science – data management must be driven by the research and not the other way around, a data management system must produce the products and services that are needed by the community. Keys to Success

Data Value Time Serendipitous Discovery Inter-site Synthesis Gradual Increase In Data Equity Methodological Flaws, Instrumentation Obsolescence Non-scientific Monitoring Increasing value of data over time

LTER Data Access Policy 1)There are two types of data: Type I (data that is freely available within 2-3 years) with minimum restrictions and, Type II (Exceptional data sets that are available only with written permission from the PI/investigator(s)). Implied in this timetable, is the assumption that some data sets require more effort to get on-line and that no "blanket policy" is going to cover all data sets at all sites. However, each site would pursue getting all of their data on-line in the most expedient fashion possible. 2) The number of data sets that are assigned TYPE II status should be rare in occurrence and that the justification for exceptions must be well documented and approved by the lead PI and site data manager. Some examples of Type II data may include: locations of rare or endangered species, data that are covered by copyright laws (e.g. TM and/or SPOT satellite data) or some types of census data involving human subjects.

Reasons to Not Share Data: Fear of getting scooped Number of publications will decrease People will find errors Someone will misinterpret my data

Benefits of Data Sharing Publicity, accolades, media attention Renewed or increased funding Teaching: long-term data sets adapted for teaching & texts Archival: back-up copy of critical data sets Research: new synthetic studies peer-reviewed publications Document global and regional change Conservation and resource management: species and natural areas protection new environmental laws

Brunt (2000) Ch. 2 in Michener and Brunt (2000) Porter (2000) Ch. 3 in Michener and Brunt (2000) Edwards (2000) Ch. 4 in Michener and Brunt (2000) Michener (2000) Ch. 7 in Michener and Brunt (2000) Cook, R.B., R.J. Olson, P. Kanciruk, and L.A. Hook Best practices for preparing ecological and ground-based data sets to share and archive. (online at bin/MDE/S2K/bestprac.html)

Thanks !!!