
1 Special Considerations for Archiving Data from Field Observations A Presentation for “International Workshop on Strategies for Preservation of and Open Access to Scientific Data” June 24, 2004 Beijing, China Raymond McCord Oak Ridge National Laboratory* Oak Ridge, Tennessee, USA *Oak Ridge National Laboratory is operated by UT-Battelle, LLC, for the U.S. Department of Energy under contract DE-AC05-00OR22725

2 Presumptions Archives depend on logical rules for information structures and consistent codes for metadata. “Field Observations” will contain unexpected variations that will challenge rules. Containment of this problem can be accomplished with some planning.

3 Presentation Strategy This presentation focuses on special issues for “data management planning”. Archives must: –Determine how these special issues were resolved in the original data management plan. OR –Resolve these issues by further processing or documentation as data are added to the Archive.

4 Challenges from Field Data Multiple schemes for location information Temporary changes in methods Unmeasurable events will occur Evolving reference lists Find a containment strategy for exceptions

5 Location Information - Coordinates Multiple geographic coordinate systems –Local / engineering systems Unprojected Rotational differences –Global systems Which projection? Which projection parameters? Conversions may be “mathematically irreversible” Be careful and test changes before large-scale conversions are made –Visualization capability is essential!!
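The advice to test changes before large-scale conversions can be sketched as a round-trip check. This is a minimal illustration, assuming a local engineering grid that differs from the global system only by an origin shift and a rotation; the function names, origin, and rotation angle are hypothetical, and real projections need a dedicated projection library.

```python
import math

def local_to_global(x, y, origin_e, origin_n, rotation_deg):
    """Rotate a local grid point counterclockwise, then shift it to the global origin."""
    t = math.radians(rotation_deg)
    e = origin_e + x * math.cos(t) - y * math.sin(t)
    n = origin_n + x * math.sin(t) + y * math.cos(t)
    return e, n

def global_to_local(e, n, origin_e, origin_n, rotation_deg):
    """Inverse of local_to_global: remove the shift, then rotate back."""
    t = math.radians(rotation_deg)
    de, dn = e - origin_e, n - origin_n
    x = de * math.cos(t) + dn * math.sin(t)
    y = -de * math.sin(t) + dn * math.cos(t)
    return x, y

def max_round_trip_error(points, origin_e, origin_n, rotation_deg):
    """Convert each point out and back; return the largest positional error."""
    worst = 0.0
    for x, y in points:
        e, n = local_to_global(x, y, origin_e, origin_n, rotation_deg)
        x2, y2 = global_to_local(e, n, origin_e, origin_n, rotation_deg)
        worst = max(worst, math.hypot(x2 - x, y2 - y))
    return worst

# Hypothetical test points and grid parameters (assumptions for illustration).
test_points = [(0.0, 0.0), (100.0, 250.0), (-40.5, 12.25)]
error = max_round_trip_error(test_points, 500000.0, 4000000.0, 12.5)
print("round trip acceptable:", error < 1e-6)
```

Running a check like this on a small sample, and plotting the converted points, shows whether a conversion is reversible to the required precision before it is applied to the whole archive.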

6 Location Information – Place Names Multiple naming schemes –“Folk” names – unofficial –Divergence in official naming schemes at local, regional, and national scales –Connecting historical name changes Avoid including a measurable parameter in a “location code” –Stream mile story Names for sampling stations included “stream mile”. Names were changed for part of the data after a higher resolution mapping occurred and the “stream mile” changed. Using the entire collection of information was complicated.
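One way to avoid the stream mile problem is to give each station an opaque identifier and store the measurable parameter as a dated attribute. The sketch below is illustrative only; the class, field names, station code, and mileages are hypothetical.

```python
from dataclasses import dataclass, field

@dataclass
class Station:
    station_id: str   # opaque code; never encodes a measured quantity
    stream: str
    # Each entry is (survey_label, stream_mile); older entries are kept.
    stream_mile_history: list = field(default_factory=list)

    def record_stream_mile(self, survey_label, miles):
        """Add a new stream-mile estimate without renaming the station."""
        self.stream_mile_history.append((survey_label, miles))

    def current_stream_mile(self):
        """Return the most recently recorded stream mile, or None."""
        if not self.stream_mile_history:
            return None
        return self.stream_mile_history[-1][1]

# Hypothetical station: the remapping changes an attribute, not the name.
station = Station("ST-0001", "Example Creek")
station.record_stream_mile("original survey", 12.4)
station.record_stream_mile("high-resolution remap", 12.9)
print(station.station_id, station.current_stream_mile())
```

Because the identifier never changes, records collected before and after the remapping can still be joined on `station_id`.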

7 Temporary Changes in Methods The field sampling protocol is judged to be inadequate (too much or too little sampling). –Need to structure the metadata to record the temporary change Field observations are ongoing at a remote site –Some of the instruments break down –Remaining instruments continue operations –Need a robust scheme for missing value representation Data analyses must correctly exclude missing values
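A minimal sketch of one missing-value scheme follows. It assumes that raw files mark gaps either with an empty value or with a legacy sentinel of -9999 (a hypothetical choice, not one stated in the slides), converts both to NaN on ingest, and excludes NaN from summaries.

```python
import math
from statistics import mean

def decode(raw, sentinels=(-9999.0,)):
    """Map empty values and legacy sentinel codes to NaN; pass real readings through."""
    if raw is None or raw in sentinels:
        return math.nan
    return float(raw)

def mean_excluding_missing(readings):
    """Average only the valid readings; return NaN if none are valid."""
    valid = [r for r in readings if not math.isnan(r)]
    return mean(valid) if valid else math.nan

# Hypothetical series with one sentinel and one empty value.
raw_series = [10, -9999, 14, None]
decoded = [decode(v) for v in raw_series]
print(mean_excluding_missing(decoded))
```

Averaging the raw series without decoding would treat -9999 as a real reading, which is the failure the slide warns about.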

8 Unmeasurable Events (too little) How do you code the results from well water when the well is dry at the sampling time? –Need a robust scheme for missing values – be consistent!! –Need a decision rule that skips the entire record How do you record values that are below the detection limit (but not zero)? –Set all values to the minimum detection limit or –Set all values to the midpoint between detection and zero or –Set all values to zero or –Retain estimated value, but include a quality flag. Select one of these strategies and document the choice. –The choice can have significant impacts on summary statistics.
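The effect of the four below-detection-limit strategies on a summary statistic can be compared directly. The sketch below uses made-up concentrations and a made-up detection limit of 0.1; the strategy names are labels chosen for this example.

```python
from statistics import mean

def apply_strategy(measurements, detection_limit, strategy):
    """measurements: list of (reported_value, below_detection_limit) pairs."""
    result = []
    for value, below in measurements:
        if not below:
            result.append(value)
        elif strategy == "limit":
            result.append(detection_limit)       # set to the detection limit
        elif strategy == "midpoint":
            result.append(detection_limit / 2)   # midpoint between limit and zero
        elif strategy == "zero":
            result.append(0.0)
        elif strategy == "estimate":
            result.append(value)                 # keep the estimate; the flag stays in the record
        else:
            raise ValueError(f"unknown strategy: {strategy}")
    return result

# Hypothetical samples: two values fall below the detection limit.
samples = [(0.8, False), (0.02, True), (1.2, False), (0.04, True)]
for name in ("limit", "midpoint", "zero", "estimate"):
    print(name, round(mean(apply_strategy(samples, 0.1, name)), 4))
```

The four means differ even for this small sample, which is why the chosen strategy needs to be documented alongside the data.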

9 Unmeasurable Events (too much) How do you record the biological population when there are too many individuals to count? –Record some arbitrarily large number or –Flag as unmeasurable Similar problems can occur with wide ranges in chemical concentrations. Different schemes may have impacts on: –Results from statistical analyses –Setting quality assurance limits
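The flagging option can be sketched as a count stored together with a qualifier, so that an uncountable sample is never mistaken for a real number. The class name and the qualifier codes ("" for an exact count, ">" for a lower bound, "TNTC" for too numerous to count) are assumptions made for this example.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class Count:
    value: Optional[int]   # None when no number could be recorded
    qualifier: str = ""    # "" exact, ">" at least this many, "TNTC" too numerous to count

def exact_values(counts):
    """Return only the counts that are safe to use in ordinary statistics."""
    return [c.value for c in counts if c.qualifier == ""]

def qualified(counts):
    """Return the counts that need special handling or documentation."""
    return [c for c in counts if c.qualifier != ""]

# Hypothetical survey results.
survey = [Count(12), Count(340), Count(1000, ">"), Count(None, "TNTC")]
print(exact_values(survey))
print(len(qualified(survey)))
```

Keeping the qualifier separate lets an analyst decide later how to treat the flagged samples, instead of inheriting an arbitrary large number.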

10 Evolving Reference Lists Taxonomic lists* –Infrequent individuals may not be fully identifiable Need other lifecycle stages –Later samples enable fuller identification –Need to recode earlier records to match newer identification (??) –How do you analyze or summarize the entire data collection? Chemical constituents* –Chemicals with low concentrations may be measured as a group –Additional, later locations contain higher concentrations, and fuller chemical speciation is determined. –How do you analyze or summarize the entire series of measurements? *(Assumes agreement on a single, accepted classification scheme; may not be true!!)
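One possible answer to the recoding question is to keep the identification as originally recorded and store any later revision in a separate field, so that summaries can use either view. The sketch below is only one option; the field names are hypothetical and the taxon assignments are illustrative.

```python
from collections import Counter

def current_identification(record):
    """Prefer a later revision when one exists; otherwise use the original entry."""
    return record.get("revised_taxon") or record["recorded_taxon"]

def summarize(records, use_revisions=True):
    """Total the counts per taxon using revised or as-recorded identifications."""
    totals = Counter()
    for rec in records:
        taxon = current_identification(rec) if use_revisions else rec["recorded_taxon"]
        totals[taxon] += rec["count"]
    return totals

# Hypothetical records: the first was identified more fully after later sampling.
records = [
    {"recorded_taxon": "Baetis sp.", "revised_taxon": "Baetis tricaudatus", "count": 3},
    {"recorded_taxon": "Baetis tricaudatus", "count": 5},
    {"recorded_taxon": "Ephemeroptera (unidentified)", "count": 2},
]
print(summarize(records))
print(summarize(records, use_revisions=False))
```

Because the original entry is never overwritten, a revision that later proves wrong can be withdrawn without losing what was observed in the field.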

11 Containment Strategy for Exceptions “90 / 10” rule –~90% of the data can be described by a few logical rules. –~10% of the data cannot be described by rules and contains numerous and isolated exceptions. Guidance for decisions –Consider how many rules can be explained to future data users. –Put the information that cannot be described by rules in an alternative structure that can: be labeled as “user beware”. support detailed and varied documentation. accommodate and communicate numerous exceptions. (Query logic vs. cataloged directory tree)
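The containment idea can be sketched as a partition step: records that satisfy a small set of rules go to the main structure, and the rest go to a separate "user beware" collection with room for free-text notes. The rule names, record fields, and sample records below are hypothetical.

```python
def partition(records, rules):
    """Split records into those that pass every rule and those that need documentation."""
    conforming, exceptions = [], []
    for rec in records:
        failed = [name for name, check in rules.items() if not check(rec)]
        if failed:
            # Exceptions keep the record plus space for detailed notes.
            exceptions.append({"record": rec, "failed_rules": failed, "note": ""})
        else:
            conforming.append(rec)
    return conforming, exceptions

# A deliberately short rule list that can be explained to future users.
rules = {
    "has_station": lambda r: bool(r.get("station_id")),
    "depth_is_number": lambda r: isinstance(r.get("depth_m"), (int, float)),
}

records = [
    {"station_id": "ST-0001", "depth_m": 2.5},
    {"station_id": "", "depth_m": 1.0},
    {"station_id": "ST-0002", "depth_m": "well dry"},
]
conforming, exceptions = partition(records, rules)
print(len(conforming), len(exceptions))
```

The conforming records can then be queried with simple logic, while the exceptions are cataloged with their notes rather than forced into the rules.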

12 Evaluation of Containment Strategy More guidance for rule decisions –When the logical rules are “too many”, the archiving process becomes inefficient and tedious. –Adjust as needed. Eliminate spurious variation in codes and logic –For example: inconsistent abbreviations, punctuation, and capitalization –This minimizes the number of exceptions that must be contained.
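Spurious variation in punctuation, spacing, and capitalization can often be removed with a small normalization step before codes are compared. The sketch below is a minimal example; the function name and the sample place names are hypothetical, and inconsistent abbreviations would still need a lookup table.

```python
import re

def normalize_code(raw):
    """Uppercase the code and collapse punctuation and whitespace to single spaces."""
    code = raw.strip().upper()
    code = re.sub(r"[.\-_/\s]+", " ", code)
    return code.strip()

# Hypothetical variants of one name as they might appear in field sheets.
variants = ["E. Fork Creek", "e fork-creek", "E.Fork  Creek"]
print({normalize_code(v) for v in variants})
```

All three variants reduce to the same code, so they no longer appear as separate entries that must be handled as exceptions.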

13 Comments and Questions…

