Managing the Impacts of Change on Archiving Research Data

1 Managing the Impacts of Change on Archiving Research Data
A Presentation for the “International Workshop on Strategies for Preservation of and Open Access to Scientific Data”
June 23, 2004, Beijing, China
Raymond McCord, Oak Ridge National Laboratory*, Oak Ridge, Tennessee, USA
*Oak Ridge National Laboratory is operated by UT-Battelle, LLC, for the U.S. Department of Energy under contract DE-AC05-00OR22725

2 Presentation Strategy
- Change is part of Science
- Accommodating change
- Integration with good practices

3 Research Implies Change
- Research, Discovery, New questions, New information requirements, … repeat…
- Not always true for other information systems

4 Minimize Changes / Maximize Documentation
- Unpredicted variation in data during research is:
  - No excuse for loose management of changes!!
  - Often used as an excuse to avoid standards.
  - Unavoidable in all cases, but try…
  - Missing values will occur; plan ahead
  - Avoid this complexity: “Temp, temp, t, T, temperature…”
  - A source of ambiguity; be clear.
  - Consider the view of future users
- Minimal observational intensity is:
  - No excuse (!!) for skipping documentation!!
  - Quick study = no documentation?? {NO}
  - The unexpected are rare and most valuable??

5 Management Issues to Consider
- What will change?
- Which changes can be controlled?
- How are changes approved?
- How are users notified about changes?
- How and when can changes be “smoothed” in the cumulative view?

6 Things that will Change
- Access expectations
  - Removal or addition of access restrictions
- The scope and logical hierarchy of the information.
  - New parameters
  - New disciplines
  - New study sites
  - New data sources or methods
- Revisions and additions to metadata codes for parameters, sites, and measurements.
- Updates of hardware and software

7 Design Considerations (1)
- Create “extensible standards” for metadata
  - Have a process for proposing and implementing new standard metadata codes.
  - Record the effective dates of changes.
- Build databases and applications software “for change”
  - Put labels in “lookup” tables (outside the software code)
  - DO NOT let the flexibility needed to store the information become constrained by software that is too complex to be changed!!
  - Ask developers, before software and databases are built: “How hard will this design be to change in the future?”
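
The two suggestions on this slide, labels kept in a lookup table outside the code and effective dates recorded for every change, can be combined in one structure. The following is only a minimal sketch: the `CodeEntry` type, the `PARAMETER_CODES` table, the `label_for` helper, and the sample codes and dates are all hypothetical and do not come from the presentation.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional, Sequence


@dataclass(frozen=True)
class CodeEntry:
    """One row of a metadata lookup table (hypothetical layout)."""
    code: str
    label: str
    effective_from: date
    effective_to: Optional[date] = None  # None means "still in effect"


# Illustrative rows only. In practice this table would live in a database
# or a data file so that labels can change without touching the software.
PARAMETER_CODES = [
    CodeEntry("TEMP", "Air temperature (deg F)", date(1995, 1, 1), date(1999, 12, 31)),
    CodeEntry("TEMP", "Air temperature (deg C)", date(2000, 1, 1)),
    CodeEntry("PRECIP", "Precipitation (mm)", date(1995, 1, 1)),
]


def label_for(code: str, on: date, table: Sequence[CodeEntry] = PARAMETER_CODES) -> str:
    """Return the label that was in effect for `code` on the date `on`."""
    for entry in table:
        if entry.code != code:
            continue
        started = entry.effective_from <= on
        not_ended = entry.effective_to is None or on <= entry.effective_to
        if started and not_ended:
            return entry.label
    raise KeyError(f"no label for code {code!r} in effect on {on.isoformat()}")
```

With this layout, adding or revising a code is a change to the table rows, and older data can still be interpreted with the label that applied when they were collected.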

8 Design Considerations (2)
- Include notification procedures to data users about changes
  - Process is simple: distribute information to previous data users.
  - Records about previous data access are required.
  - The description of the change may be difficult to acquire and manage.
- Allocate resources for reprocessing
  - Some changes over time may be very difficult (and irritating) for the data users.
  - Reprocessing can “smooth over” some changes.
  - Reprocessing may be limited by available documentation.
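
The notification process described above depends on keeping records of previous data access. A rough sketch of how such records might be used follows; the `AccessRecord` and `ChangeNotice` types, the helper functions, and the sample log entries are assumptions made for illustration, not part of the presentation.

```python
from dataclasses import dataclass
from datetime import date
from typing import Iterable, List


@dataclass(frozen=True)
class AccessRecord:
    """One entry in a (hypothetical) log of who obtained which dataset."""
    user_email: str
    dataset_id: str
    accessed_on: date


@dataclass(frozen=True)
class ChangeNotice:
    """A description of a change, which may itself be hard to acquire."""
    dataset_id: str
    changed_on: date
    description: str


def users_to_notify(log: Iterable[AccessRecord], notice: ChangeNotice) -> List[str]:
    """Return the unique users who obtained the dataset on or before the change date."""
    recipients = {
        record.user_email
        for record in log
        if record.dataset_id == notice.dataset_id
        and record.accessed_on <= notice.changed_on
    }
    return sorted(recipients)


def format_notice(notice: ChangeNotice) -> str:
    """Build the one-line message that would be distributed."""
    return (
        f"Dataset {notice.dataset_id} changed on "
        f"{notice.changed_on.isoformat()}: {notice.description}"
    )


# Illustrative data only.
ACCESS_LOG = [
    AccessRecord("a@example.org", "soil-moisture", date(2003, 5, 1)),
    AccessRecord("b@example.org", "soil-moisture", date(2004, 2, 10)),
    AccessRecord("a@example.org", "soil-moisture", date(2004, 3, 3)),
    AccessRecord("c@example.org", "air-temp", date(2003, 9, 9)),
]
NOTICE = ChangeNotice("soil-moisture", date(2004, 1, 15), "sensor depth codes revised")
```

Here only `a@example.org` obtained the dataset before the change, so only that user is selected.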

9 Change and Dataset Design
- The following series of slides presents:
  - Basic “principles” for good dataset design AND
  - How the “principles” need to be adapted to accommodate changes and future data archiving.

10 Rules for Creating Datasets for Archiving (1)
- Unique Occurrences
  - Each type of measurement is represented in a consistent way.
  - Each measurement event is represented by only one value.
- If multiple versions of datasets accumulate:
  - Provide version information
  - Explain version differences
  - Document effective date range for each version
    - When was it “done this way” (observation date range)
    - When was it “distributed this way” (distribution date range)
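
The version information called for above can be kept as a small record per version, with the observation date range and the distribution date range stored separately. This is a sketch under assumed names: `DatasetVersion`, `version_distributed_on`, and the sample versions are hypothetical.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional, Sequence


@dataclass(frozen=True)
class DatasetVersion:
    """Version record with the two date ranges kept apart (hypothetical layout)."""
    version: str
    differences: str                        # explanation of what changed
    observed_from: date                     # when it was "done this way"
    observed_to: date
    distributed_from: date                  # when it was "distributed this way"
    distributed_to: Optional[date] = None   # None means "currently distributed"


def version_distributed_on(versions: Sequence[DatasetVersion], day: date) -> DatasetVersion:
    """Find the version that users would have received on a given day."""
    for v in versions:
        started = v.distributed_from <= day
        not_ended = v.distributed_to is None or day <= v.distributed_to
        if started and not_ended:
            return v
    raise LookupError(f"no version was distributed on {day.isoformat()}")


# Illustrative versions only.
VERSIONS = [
    DatasetVersion("1.0", "initial release",
                   date(1995, 1, 1), date(2000, 12, 31),
                   date(2001, 1, 15), date(2002, 6, 30)),
    DatasetVersion("2.0", "recalibrated sensor offsets; same observation period",
                   date(1995, 1, 1), date(2000, 12, 31),
                   date(2002, 7, 1)),
]
```

Both sample versions cover the same observation period but were distributed at different times, which is the distinction the two date ranges are meant to preserve.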

11 Rules for Creating Datasets for Archiving (2)
- Identifiers
  - Each value is associated with a parameter name.
  - Each measurement value has a quality indicator and link to a method description.
- When possible, remove multiple aliases for the same identifier (sample ID, site ID or name, measurement name, etc.).
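
Removing aliases can be done with a reviewed alias table that maps every spelling in use to one canonical identifier. The sketch below reuses the “Temp, temp, t, T, temperature” example from an earlier slide; the `ALIASES` table and `canonical_name` helper are hypothetical, and the case-insensitive matching is an assumption that would be wrong for a dataset where `t` and `T` mean different things.

```python
# Reviewed alias table: every known spelling maps to one canonical name.
# Keys are stored case-folded. This assumes case carries no meaning, which
# must be checked per dataset (for example, "t" might mean time elsewhere).
ALIASES = {
    "temp": "temperature",
    "t": "temperature",
    "temperature": "temperature",
}


def canonical_name(raw: str) -> str:
    """Map a raw parameter name to its canonical identifier.

    Unknown names raise an error instead of passing through, so that a new
    alias gets reviewed rather than silently becoming another identifier.
    """
    key = raw.strip().casefold()
    try:
        return ALIASES[key]
    except KeyError:
        raise KeyError(
            f"unrecognized parameter name {raw!r}; add it to the alias table after review"
        ) from None
```

The same pattern would apply to sample IDs and site names, each with its own table.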

12 Rules for Creating Datasets for Archiving (3)
- Place and Time
  - Each value is associated with a unique place name with a quantitatively defined location (geographic coordinates).
  - Each value is associated with a date and time.
- Do not confuse date and time for measurements with:
  - Date and time for storage or revisions.
  - Date and time ranges for measurement or encoding methods.
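
One way to keep these dates from being confused is to give each its own named field, and to look up the method by measurement time only. The following is a sketch; the `Measurement` and `MethodPeriod` types, the `method_for` helper, and the sample site, methods, and timestamps are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional, Sequence


@dataclass(frozen=True)
class MethodPeriod:
    """Date range during which a measurement method was in use."""
    method_id: str
    used_from: datetime
    used_to: Optional[datetime] = None  # None means "still in use"


@dataclass(frozen=True)
class Measurement:
    site_id: str
    parameter: str
    value: float
    measured_at: datetime                  # when the observation was made
    stored_at: datetime                    # when it entered the archive
    revised_at: Optional[datetime] = None  # when it was last corrected


def method_for(m: Measurement, periods: Sequence[MethodPeriod]) -> str:
    """Select the method by measurement time, never by storage or revision time."""
    for p in periods:
        if p.used_from <= m.measured_at and (p.used_to is None or m.measured_at < p.used_to):
            return p.method_id
    raise LookupError(f"no method recorded for {m.measured_at.isoformat()}")


UTC = timezone.utc
# Illustrative data only.
METHODS = [
    MethodPeriod("thermistor-A", datetime(1995, 1, 1, tzinfo=UTC), datetime(2000, 1, 1, tzinfo=UTC)),
    MethodPeriod("thermistor-B", datetime(2000, 1, 1, tzinfo=UTC)),
]
READING = Measurement(
    site_id="SITE-01",
    parameter="temperature",
    value=21.4,
    measured_at=datetime(1999, 7, 1, 12, 0, tzinfo=UTC),
    stored_at=datetime(2003, 3, 10, 9, 30, tzinfo=UTC),
)
```

In this example the reading was archived in 2003, after the second method came into use, so a lookup by `stored_at` would attribute it to the wrong method.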

13 Rules for Creating Datasets for Archiving (4)
- Data Storage and Transport
  - Data are stored or managed with a database management system or self-documenting data format.
  - NetCDF is an example of a non-proprietary data format that is self-documented.
    - Developed by the atmospheric sciences research community.
    - Main documentation and software libraries are openly available: http://my.unidata.ucar.edu/content/software/netcdf/index.html
    - Some commercial data analysis software includes interfaces to this open format.
- Include data analysis software in data management suite
  - Useful for comparing versions of data that accumulate over time
- Include data format conversion software in data management suite
  - Useful for migrating data from one storage technology to another
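
Comparing versions of data that accumulate over time can start from something very small. The sketch below assumes each version has already been loaded into a dictionary keyed by (site, timestamp, parameter); that keying scheme, the `compare_versions` helper, and the sample values are hypothetical, and no particular analysis package or file format is implied.

```python
from typing import Dict, List, Tuple

# (site ID, ISO timestamp, parameter name): assumed to identify one measurement event
Key = Tuple[str, str, str]


def compare_versions(old: Dict[Key, float], new: Dict[Key, float]) -> Dict[str, List[Key]]:
    """List the measurement events that were added, removed, or changed.

    Values are compared for exact equality; a real comparison of
    floating-point data would probably need a tolerance.
    """
    return {
        "added": sorted(k for k in new if k not in old),
        "removed": sorted(k for k in old if k not in new),
        "changed": sorted(k for k in old if k in new and old[k] != new[k]),
    }


# Illustrative data only.
V1 = {
    ("SITE-01", "1999-07-01T12:00Z", "temperature"): 21.4,
    ("SITE-01", "1999-07-02T12:00Z", "temperature"): 22.0,
}
V2 = {
    ("SITE-01", "1999-07-01T12:00Z", "temperature"): 21.9,  # revised value
    ("SITE-01", "1999-07-03T12:00Z", "temperature"): 20.5,  # new observation
}
```

The resulting lists are also a starting point for the description of the change that has to be sent to previous data users.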

14 Best Practices for Preparing Ecological and Ground-Based Data Sets to Share and Archive
- Best Practices Include:
  - Assign descriptive file names
  - Use consistent and stable file formats
  - Define the parameters
  - Use consistent data organization
  - Perform basic quality assurance
  - Assign descriptive data set titles
  - Provide documentation
- Published: Cook et al. 2001. Bulletin of the Ecological Society of America
- http://www.daac.ornl.gov/DAAC/PI/bestprac.html

15 A Future Scientist’s View
- Three years ago:
  - I told my college-age daughter about the Japanese announcement of 1 TB of optical memory in 1 cubic centimeter.
- Her reply was:
  - “…We need to know how to think critically and select what kinds of projects and data we need to keep because the limiting factor will be our minds, not the technology.”

16 Comments and Questions…

