
Managing the Impacts of Change on Archiving Research Data
A Presentation for “International Workshop on Strategies for Preservation of and Open Access to Scientific Data”
June 23, 2004, Beijing, China
Raymond McCord, Oak Ridge National Laboratory*, Oak Ridge, Tennessee, USA
*Oak Ridge National Laboratory is operated by UT-Battelle, LLC, for the U.S. Department of Energy under contract DE-AC05-00OR22725

Presentation Strategy
- Change is part of Science
- Accommodating change
- Integration with good practices

Research Implies Change
[Cycle diagram: Research; Discovery; New questions; New information requirements; … repeat…]
Not always true for other information systems

Minimize Changes / Maximize Documentation
- Unpredicted variation in data during research is:
  - No excuse for loose management of changes!!
  - Often used as an excuse to avoid standards.
  - Unavoidable in all cases, but try…
  - Missing values will occur; plan ahead
  - Avoid this complexity: “Temp, temp, t, T, temperature…”
  - A source of ambiguity; be clear.
  - Consider the view of future users
- Minimal observational intensity is:
  - No excuse (!!) for skipping documentation!!
  - Quick study = no documentation?? {NO}
  - The unexpected are rare and most valuable??
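One way to "plan ahead" for missing values is to agree on a single documented missing-value code and convert it explicitly when data are read. The sketch below is only an illustration: the sentinel `-9999` and the function name `parse_value` are assumptions, not something the presentation specifies.

```python
# Hypothetical sentinel; use and document whichever code the project adopts.
MISSING_CODE = "-9999"


def parse_value(raw):
    """Convert a raw text field to float, or None for a missing value.

    Both the documented sentinel and an empty field are treated as missing
    (an assumption made for this sketch).
    """
    text = raw.strip()
    if text == "" or text == MISSING_CODE:
        return None
    return float(text)


readings = [parse_value(v) for v in ["21.4", "-9999", " 19.8 ", ""]]
```

Keeping the conversion in one place means future users never have to guess whether a value such as -9999 is a measurement or a placeholder.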

Management Issues to Consider
- What will change?
- Which changes can be controlled?
- How are changes approved?
- How are users notified about changes?
- How and when can changes be “smoothed” in the cumulative view?

Things that will Change
- Access expectations
  - Removal or addition of access restrictions
- The scope and logical hierarchy of the information.
  - New parameters
  - New disciplines
  - New study sites
  - New data sources or methods
- Revisions and additions to metadata codes for parameters, sites, and measurements.
- Updates of hardware and software

Design Considerations (1)
- Create “extensible standards” for metadata
  - Have a process for proposing and implementing new standard metadata codes.
  - Record the effective dates of changes.
- Build databases and applications software “for change”
  - Put labels in “lookup” tables (outside the software code)
  - DO NOT let the flexibility needed to store the information become constrained by software that is too complex to be changed!!
  - Ask developers: “How hard will this design be to change in the future?” before software and databases are built.
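The two ideas above, labels kept in lookup tables and effective dates recorded for each change, can be sketched together. This is a minimal illustration using Python's built-in `sqlite3` module; the table name `parameter_codes`, its columns, and the sample rows are all hypothetical.

```python
import sqlite3

# Hypothetical schema: parameter labels and units live in a lookup table,
# so a new or revised code is a data change, not a software change.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE parameter_codes (
        code           TEXT NOT NULL,
        label          TEXT NOT NULL,
        units          TEXT NOT NULL,
        effective_from TEXT NOT NULL,  -- ISO 8601 date, inclusive
        effective_to   TEXT            -- ISO 8601 date, exclusive; NULL = current
    )
    """
)
# Illustrative rows only: the same code changes units on a recorded date.
conn.executemany(
    "INSERT INTO parameter_codes VALUES (?, ?, ?, ?, ?)",
    [
        ("TAIR", "Air temperature", "degF", "1998-01-01", "2001-01-01"),
        ("TAIR", "Air temperature", "degC", "2001-01-01", None),
    ],
)


def lookup_parameter(conn, code, on_date):
    """Return (label, units) for a code as defined on an ISO date, or None."""
    return conn.execute(
        """
        SELECT label, units FROM parameter_codes
        WHERE code = ?
          AND effective_from <= ?
          AND (effective_to IS NULL OR ? < effective_to)
        """,
        (code, on_date, on_date),
    ).fetchone()
```

Because ISO 8601 dates sort correctly as text, plain string comparison is enough to find the definition in effect on a given day.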

Design Considerations (2)
- Include notification procedures to data users about changes
  - Process is simple: distribute information to previous data users.
  - Records about previous data access are required.
  - The description of the change may be difficult to acquire and manage.
- Allocate resources for reprocessing
  - Some changes over time may be very difficult (and irritating) for the data users.
  - Reprocessing can “smooth over” some changes.
  - Reprocessing may be limited by available documentation.
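The notification step depends on having records of previous data access. A minimal sketch of that lookup follows; the `AccessRecord` fields and the function `users_to_notify` are hypothetical names chosen for illustration, and a real archive would draw these records from its own access logs.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class AccessRecord:
    """One entry in a hypothetical data-access log."""
    user_email: str
    dataset_id: str
    accessed_on: date


def users_to_notify(access_log, dataset_id, change_date):
    """Return sorted unique emails of users who obtained the dataset before the change."""
    return sorted(
        {
            record.user_email
            for record in access_log
            if record.dataset_id == dataset_id and record.accessed_on < change_date
        }
    )


# Illustrative log entries only.
log = [
    AccessRecord("a@example.org", "met-01", date(2003, 5, 1)),
    AccessRecord("b@example.org", "met-01", date(2004, 2, 10)),
    AccessRecord("a@example.org", "met-01", date(2003, 9, 9)),
    AccessRecord("c@example.org", "soil-07", date(2003, 1, 1)),
]
recipients = users_to_notify(log, "met-01", date(2004, 1, 1))
```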

Change and Dataset Design
- The following series of slides presents:
  - Basic “principles” for good dataset design AND
  - How the “principles” need to be adapted to accommodate changes and future data archiving.

Rules for Creating Datasets for Archiving (1)
- Unique Occurrences
  - Each type of measurement is represented in a consistent way.
  - Each measurement event is represented by only one value.
- If multiple versions of datasets accumulate:
  - Provide version information
  - Explain version differences
  - Document effective date range for each version
    - When was “it done this way” (observation date range)
    - When was “it distributed this way” (distribution date range)
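The version information listed above can be held in a small record that keeps the observation date range separate from the distribution date range. The sketch below is one possible shape; the class `DatasetVersion`, its field names, and the sample entries are assumptions made for illustration.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class DatasetVersion:
    """Hypothetical version record with the two date ranges kept apart."""
    version: str
    differences: str                  # explanation of what changed
    observed_from: date               # when "it was done this way"
    observed_to: Optional[date]       # None = method still in use
    distributed_from: date            # when "it was distributed this way"
    distributed_to: Optional[date]    # None = currently distributed


def version_distributed_on(versions, day):
    """Return the version that was being distributed on a given day, or None."""
    for v in versions:
        if v.distributed_from <= day and (
            v.distributed_to is None or day < v.distributed_to
        ):
            return v
    return None


# Illustrative entries only.
history = [
    DatasetVersion("1.0", "Initial release", date(1998, 1, 1), date(2000, 12, 31),
                   date(2001, 3, 1), date(2003, 6, 1)),
    DatasetVersion("2.0", "Recalibrated sensor offsets", date(1998, 1, 1), None,
                   date(2003, 6, 1), None),
]
```

A user who downloaded data on a known date can then be told exactly which version they hold.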

Rules for Creating Datasets for Archiving (2)
- Identifiers
  - Each value is associated with a parameter name.
  - Each measurement value has a quality indicator and link to a method description.
- When possible, remove multiple aliases for the same identifier (sample ID, site ID or name, measurement name, etc.).
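Removing aliases can be as simple as mapping every known spelling to one canonical name and refusing anything unrecognized. The following is a minimal sketch; the alias table and the function `canonical_name` are hypothetical, and the mapping would normally live in a maintained lookup table rather than in code.

```python
# Hypothetical alias table: every known spelling maps to one canonical name.
ALIASES = {
    "temp": "temperature",
    "t": "temperature",
    "temperature": "temperature",
}


def canonical_name(raw):
    """Map a raw parameter name to its canonical form.

    Matching ignores case and surrounding whitespace. Unknown names raise
    KeyError so that new aliases are reviewed instead of silently accepted.
    """
    key = raw.strip().lower()
    if key not in ALIASES:
        raise KeyError(f"Unrecognized parameter name: {raw!r}")
    return ALIASES[key]


names = {canonical_name(n) for n in ["Temp", "temp", "t", "T", "temperature"]}
```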

Rules for Creating Datasets for Archiving (3)
- Place and Time
  - Each value is associated with a unique place name with a quantitatively defined location (geographic coordinates).
  - Each value is associated with a date and time.
- Do not confuse date and time for measurements with:
  - Date and time for storage or revisions.
  - Date and time ranges for measurement or encoding methods.
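One way to keep these timestamps from being confused is to give each its own named field. The record below is a sketch under assumed names (`Measurement`, `measured_at`, `stored_at`, `revised_at`); the site name and coordinates are placeholders.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass(frozen=True)
class Measurement:
    """Hypothetical record that separates measurement time from storage time."""
    place_name: str
    latitude: float                        # decimal degrees
    longitude: float                       # decimal degrees
    value: float
    measured_at: datetime                  # when the observation was made
    stored_at: datetime                    # when the record entered the archive
    revised_at: Optional[datetime] = None  # when the record was last revised

    def __post_init__(self):
        if not -90.0 <= self.latitude <= 90.0:
            raise ValueError("latitude must be between -90 and 90")
        if not -180.0 <= self.longitude <= 180.0:
            raise ValueError("longitude must be between -180 and 180")
        if self.stored_at < self.measured_at:
            raise ValueError("a record cannot be stored before it is measured")


# Placeholder values for illustration.
m = Measurement(
    place_name="Site A",
    latitude=35.93,
    longitude=-84.31,
    value=21.4,
    measured_at=datetime(2004, 6, 1, 14, 0, tzinfo=timezone.utc),
    stored_at=datetime(2004, 6, 3, 9, 30, tzinfo=timezone.utc),
)
```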

Rules for Creating Datasets for Archiving (4)
- Data Storage and Transport
  - Data are stored or managed with a database management system or self-documenting data format.
  - NetCDF is an example of a non-proprietary data format that is self-documented.
    - Developed by the atmospheric sciences research community.
    - Main documentation and software libraries are openly available.
    - Some commercial data analysis software includes interfaces to this open format.
- Include data analysis software in data management suite
  - Useful for comparing versions of data that accumulate over time
- Include data format conversion software in data management suite
  - Useful for migrating data from one storage technology to another
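Comparing accumulated versions of a dataset can start with a simple keyed difference. The sketch below assumes each version has already been loaded into a dictionary keyed by a unique identifier such as (site, timestamp); the function `compare_versions` and the sample values are hypothetical.

```python
def compare_versions(old, new):
    """Summarize differences between two versions of a keyed dataset.

    Each argument maps a unique key, for example (site, timestamp), to a value.
    """
    added = {k: new[k] for k in new.keys() - old.keys()}
    removed = {k: old[k] for k in old.keys() - new.keys()}
    changed = {
        k: (old[k], new[k])
        for k in old.keys() & new.keys()
        if old[k] != new[k]
    }
    return {"added": added, "removed": removed, "changed": changed}


# Illustrative values only.
v1 = {("A", "2004-06-01T14:00"): 21.4, ("A", "2004-06-01T15:00"): 22.0}
v2 = {("A", "2004-06-01T14:00"): 21.4, ("A", "2004-06-01T15:00"): 22.3,
      ("A", "2004-06-01T16:00"): 22.9}
diff = compare_versions(v1, v2)
```

A summary like this gives the archive the raw material for explaining version differences to data users.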

Best Practices for Preparing Ecological and Ground-Based Data Sets to Share and Archive
- Best Practices Include:
  - Assign descriptive file names
  - Use consistent and stable file formats
  - Define the parameters
  - Use consistent data organization
  - Perform basic quality assurance
  - Assign descriptive data set titles
  - Provide documentation
- Published: Cook et al. Bulletin of the Ecological Society of America

A Future Scientist’s View
- Three years ago:
  - I told my college-age daughter about the Japanese announcement of 1 TB of optical memory in 1 cubic centimeter.
- Her reply was:
  - “…We need to know how to think critically and select what kinds of projects and data we need to keep because the limiting factor will be our minds, not the technology.”

Comments and Questions…