Special Considerations for Archiving Data from Field Observations A Presentation for “International Workshop on Strategies for Preservation of and Open.


Special Considerations for Archiving Data from Field Observations
A Presentation for “International Workshop on Strategies for Preservation of and Open Access to Scientific Data”
June 24, 2004, Beijing, China
Raymond McCord
Oak Ridge National Laboratory*
Oak Ridge, Tennessee, USA
*Oak Ridge National Laboratory is operated by UT-Battelle, LLC, for the U.S. Department of Energy under contract DE-AC05-00OR22725

Presumptions
Archives depend on logical rules for information structures and consistent codes for metadata.
“Field Observations” will contain unexpected variations that will challenge these rules.
This problem can be contained with some planning.

Presentation Strategy
This presentation focuses on special issues for “data management planning”.
Archives must:
  – Determine how these special issues were resolved in the original data management plan.
  OR
  – Resolve these issues by further processing or documentation as data are added to the Archive.

Challenges from Field Data
Multiple schemes for location information
Temporary changes in methods
Unmeasurable events will occur
Evolving reference lists
Find a containment strategy for exceptions

Location Information - Coordinates
Multiple geographic coordinate systems
  – Local / engineering systems
      Unprojected
      Rotational differences
  – Global systems
      Which projection?
      Which projection parameters?
Conversions may be “mathematically irreversible”
Be careful and test changes before large scale conversions are made
  – Visualization capability is essential!!
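One way to "test changes before large scale conversions" is a round-trip check: convert a few sample points, convert them back, and measure how far they moved. The sketch below is illustrative only. It assumes a hypothetical local engineering grid that is simply rotated and offset from a global planar grid; the rotation angle, origin, sample points, and all function names are invented for the example and are not from the presentation.

```python
import math

# Hypothetical local engineering grid: rotated 30 degrees and offset from a
# projected "global" grid. All parameter values are invented for illustration.
ROTATION_DEG = 30.0
ORIGIN_E, ORIGIN_N = 500000.0, 4000000.0


def local_to_global(x, y):
    """Rotate, then translate local (x, y) into global easting/northing."""
    t = math.radians(ROTATION_DEG)
    e = ORIGIN_E + x * math.cos(t) - y * math.sin(t)
    n = ORIGIN_N + x * math.sin(t) + y * math.cos(t)
    return e, n


def global_to_local(e, n):
    """Inverse of local_to_global: translate back, then rotate by -angle."""
    t = math.radians(ROTATION_DEG)
    de, dn = e - ORIGIN_E, n - ORIGIN_N
    x = de * math.cos(t) + dn * math.sin(t)
    y = -de * math.sin(t) + dn * math.cos(t)
    return x, y


def max_round_trip_error(points, stored_decimals=None):
    """Largest distance between each point and its converted-and-restored copy.

    stored_decimals mimics an archive that rounds converted coordinates
    before storing them, which is one way a conversion loses information.
    """
    worst = 0.0
    for x, y in points:
        e, n = local_to_global(x, y)
        if stored_decimals is not None:
            e, n = round(e, stored_decimals), round(n, stored_decimals)
        x2, y2 = global_to_local(e, n)
        worst = max(worst, math.hypot(x2 - x, y2 - y))
    return worst


sample = [(0.0, 0.0), (123.456, 789.012), (-50.25, 1000.75)]
exact_error = max_round_trip_error(sample)
rounded_error = max_round_trip_error(sample, stored_decimals=0)
```

With full precision the round trip is essentially lossless, while rounding the stored coordinates to whole units leaves an error that cannot be recovered afterwards. Running a check like this on a small sample, and plotting the points, is cheaper than discovering the loss after the whole archive has been converted.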

Location Information – Place Names
Multiple naming schemes
  – “Folk” names – unofficial
  – Divergence in official naming schemes at local, regional, and national scales
  – Connecting historical name changes
Avoid including a measurable parameter in a “location code”
  – Stream mile story
      Names for sampling stations included “stream mile”.
      Names were changed for part of the data after a higher resolution mapping occurred and the “stream mile” changed.
      Using the entire collection of information was complicated.
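The stream mile story suggests keeping the station identifier opaque and storing the measured stream mile as an ordinary attribute that can be revised. A minimal sketch, assuming a hypothetical record layout; the field names, station code, and mile values are invented for illustration:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StationRecord:
    station_id: str     # stable, opaque identifier that is never re-issued
    stream_mile: float  # measurable attribute, free to change with remapping
    map_version: str    # which mapping effort produced this stream mile


# Hypothetical history for one station, listed oldest first. The stream mile
# changed after a higher resolution mapping, but the identifier did not.
history = [
    StationRecord("ST-0007", 12.4, "survey-A"),
    StationRecord("ST-0007", 12.9, "survey-B"),
]


def current_stream_mile(records, station_id):
    """Return the stream mile from the latest record for a station, or None."""
    matches = [r for r in records if r.station_id == station_id]
    return matches[-1].stream_mile if matches else None


latest = current_stream_mile(history, "ST-0007")
```

Because every measurement refers to `station_id`, data collected before and after the remapping can still be joined without renaming anything.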

Temporary Changes in Methods
The field sampling protocol is judged to be insufficient (too much or too little).
  – Need to structure the metadata to record the temporary change
Field observations are ongoing at a remote site
  – Some of the instruments break down
  – Remaining instruments continue operations
  – Need a robust scheme for missing value representation
Data analyses must correctly exclude missing values
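A small sketch of what a "robust scheme for missing value representation" might look like, and of what goes wrong when an analysis fails to exclude missing values. It assumes one documented marker (NaN) plus a separate reason code; the readings, the reason code, and the legacy sentinel value of -9999 are all invented for illustration.

```python
import math
from statistics import fmean

MISSING = math.nan  # one documented marker for "no measurement"

# Hypothetical readings from a site where an instrument was down part of the
# time. The reason code is kept separately from the value.
readings = [
    {"value": 14.2, "reason": None},
    {"value": MISSING, "reason": "INSTRUMENT_DOWN"},
    {"value": 15.0, "reason": None},
    {"value": MISSING, "reason": "INSTRUMENT_DOWN"},
    {"value": 13.6, "reason": None},
]


def valid_values(records):
    """Keep only the values that were actually measured."""
    return [r["value"] for r in records if not math.isnan(r["value"])]


def mean_excluding_missing(records):
    """Mean of measured values; MISSING if nothing was measured."""
    values = valid_values(records)
    return fmean(values) if values else MISSING


clean_mean = mean_excluding_missing(readings)

# The same series with a numeric sentinel left in the data by mistake.
sentinel_mean = fmean([14.2, -9999.0, 15.0, -9999.0, 13.6])
```

Here `clean_mean` is close to 14.27, while `sentinel_mean` is a large negative number with no physical meaning. A marker that cannot be mistaken for a measurement makes this kind of error harder to commit silently.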

Unmeasurable Events (too little)
How do you code the results from well water when the well is dry at the sampling time?
  – Need a robust scheme for missing values – be consistent!!
  – Need a decision rule that skips the entire record
How do you record values that are below the detection limit (but not zero)?
  – Set all values to the minimum detection limit or
  – Set all values to the midpoint between the detection limit and zero or
  – Set all values to zero or
  – Retain estimated value, but include a quality flag.
Select one of these strategies and document the choice.
  – The choice can have significant impacts on summary statistics.
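The effect of the below-detection-limit choice on summary statistics can be shown with a few numbers. This sketch applies each of the four strategies to the same small series and compares the means. The detection limit, the measurements, and the strategy names are assumptions made up for the example.

```python
from statistics import fmean

DETECTION_LIMIT = 0.5  # hypothetical value, in arbitrary concentration units

# Each measurement is (estimated value, below-detection flag). Invented data.
measurements = [(2.0, False), (0.2, True), (1.5, False), (0.1, True)]

STRATEGIES = ("limit", "midpoint", "zero", "flagged_estimate")


def apply_strategy(data, strategy):
    """Return the values an analysis would see under one recording strategy."""
    if strategy not in STRATEGIES:
        raise ValueError(f"unknown strategy: {strategy}")
    out = []
    for value, below in data:
        if not below:
            out.append(value)
        elif strategy == "limit":
            out.append(DETECTION_LIMIT)
        elif strategy == "midpoint":
            out.append(DETECTION_LIMIT / 2)
        elif strategy == "zero":
            out.append(0.0)
        else:  # flagged_estimate: keep the estimate, flag stored elsewhere
            out.append(value)
    return out


means = {s: fmean(apply_strategy(measurements, s)) for s in STRATEGIES}
```

For this series the mean ranges from 0.875 (zeros) to 1.125 (detection limit), a spread of more than 25% that comes entirely from the recording convention. That is why the chosen strategy needs to be documented with the data.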

Unmeasurable Events (too much)
How do you record the biological population when there are too many individuals to count?
  – Record some arbitrary large number or
  – Flag as unmeasurable
Similar problems can occur with wide ranges in chemical concentrations.
Different schemes may have impacts on:
  – Results from statistical analyses
  – Setting quality assurance limits

Evolving Reference Lists
Taxonomic lists*
  – Infrequent individuals may not be fully identifiable
      Need other lifecycle stages
  – Later samples enable fuller identification
  – Need to recode earlier records to match newer identification (??)
  – How do you analyze or summarize the entire data collection?
Chemical constituents*
  – Chemicals with low concentrations may be measured as a group
  – Additional and later locations contain higher concentrations, and fuller chemical speciation is determined.
  – How do you analyze or summarize the entire series of measurements?
*(Assumes agreement on a single, accepted classification scheme; may not be true!!)
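One possible answer to the recoding question, offered only as a sketch: leave the name as recorded untouched, and keep a separate roll-up table that maps every recorded name to a level at which the whole collection can be compared. The taxon names below are deliberate placeholders, and the table and function names are hypothetical.

```python
from collections import Counter

# Invented records: early samples were identified only to genus, later
# samples to species. The recorded name is never overwritten.
records = [
    {"sample": 1, "taxon_as_recorded": "Genus_A sp."},
    {"sample": 2, "taxon_as_recorded": "Genus_A sp."},
    {"sample": 3, "taxon_as_recorded": "Genus_A species_1"},
    {"sample": 4, "taxon_as_recorded": "Genus_A species_2"},
]

# Hypothetical roll-up table: maps each recorded name to the coarsest level
# shared by the whole collection. It can be versioned as the list evolves.
ROLLUP_TO_GENUS = {
    "Genus_A sp.": "Genus_A",
    "Genus_A species_1": "Genus_A",
    "Genus_A species_2": "Genus_A",
}


def summarize(rows, rollup):
    """Count samples per reporting taxon without altering the source rows."""
    return Counter(rollup[row["taxon_as_recorded"]] for row in rows)


genus_counts = summarize(records, ROLLUP_TO_GENUS)
```

The same pattern could apply to chemical constituents, with individual species rolled up to the group that was measured at the earlier locations. Summaries then cover the entire series, and the finer identifications remain available where they exist.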

Containment Strategy for Exceptions
“90 / 10” rule
  – ~90% of the data can be described by a few logical rules.
  – ~10% of the data cannot be described by rules and contains numerous and isolated exceptions.
Guidance for decisions
  – Consider how many rules can be explained to future data users.
  – Put the information that cannot be described by rules in an alternative structure that can:
      be labeled as “user beware”.
      support detailed and varied documentation.
      accommodate and communicate numerous exceptions.
      (Query logic vs. cataloged directory tree)
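The containment idea can be sketched as a partition step: records that satisfy a small set of rules go to the structured (queryable) store, and everything else goes to a separate collection that carries a "user beware" label and free-form notes. The two rules, the record fields, and the sample data below are assumptions chosen for illustration, not the archive's actual rules.

```python
def check_rules(record):
    """Return a list of rule violations; an empty list means the record fits."""
    problems = []
    if not isinstance(record.get("station_id"), str):
        problems.append("station_id missing")
    if not isinstance(record.get("value"), (int, float)):
        problems.append("value is not numeric")
    return problems


def partition(records):
    """Split records into rule-conforming rows and documented exceptions."""
    conforming, exceptions = [], []
    for rec in records:
        problems = check_rules(rec)
        if problems:
            exceptions.append({
                "record": rec,                  # original content, unchanged
                "user_beware": problems,        # why it did not fit the rules
                "notes": rec.get("notes", ""),  # room for varied documentation
            })
        else:
            conforming.append(rec)
    return conforming, exceptions


# Invented sample data.
incoming = [
    {"station_id": "ST-0001", "value": 3.2},
    {"station_id": "ST-0002", "value": "too many to count",
     "notes": "Dense bloom; count abandoned in the field."},
    {"value": 1.0},
]

conforming, exceptions = partition(incoming)
```

Keeping the rule set this short is the point: each rule has to be explainable to a future data user, and anything that would need a special-case rule is documented in the exceptions collection instead.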

Evaluation of Containment Strategy
More guidance for rule decisions
  – When the logical rules are “too many”, the archiving process will become too inefficient and tedious.
  – Adjust as needed.
Eliminate spurious variation in codes and logic
  – For example: inconsistent abbreviations, punctuation, and capitalization
  – This minimizes the number of exceptions that need containment.
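Spurious variation in codes can often be removed mechanically before any record is treated as an exception. A minimal sketch, assuming a hypothetical synonym table; the code values and function names are invented for the example:

```python
import re


def normalize_code(raw):
    """Uppercase, drop punctuation, and collapse whitespace in a code."""
    text = raw.strip().upper()
    text = re.sub(r"[^\w\s]", "", text)  # remove punctuation such as "."
    text = re.sub(r"\s+", " ", text)     # collapse runs of whitespace
    return text


# Hypothetical synonym table mapping normalized abbreviations to one code.
CANONICAL = {"TEMP": "TEMPERATURE", "TEMPERATURE": "TEMPERATURE"}


def canonical_code(raw):
    """Normalize a code, then map known abbreviations to the canonical form."""
    key = normalize_code(raw)
    return CANONICAL.get(key, key)


variants = ["Temp.", "temp", " TEMP ", "Temperature"]
codes = {canonical_code(v) for v in variants}
```

All four spellings collapse to a single code, so none of them needs to be handled as an exception. Codes that are not in the synonym table pass through in normalized form and can be reviewed separately.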

Comments and Questions…