Managing the Impacts of Programmatic Scale and Enhancing Incentives for Data Archiving

Presentation transcript:

Managing the Impacts of Programmatic Scale and Enhancing Incentives for Data Archiving
A Presentation for “International Workshop on Strategies for Preservation of and Open Access to Scientific Data”
June 22, 2004, Beijing, China
Raymond McCord
Oak Ridge National Laboratory*, Oak Ridge, Tennessee, USA
*Oak Ridge National Laboratory is operated by UT-Battelle, LLC, for the U.S. Department of Energy under contract DE-AC05-00OR22725

Credits
- Concepts presented here are derived from 25+ years of managing data for environmental projects.
- Variations of the concepts have been observed in these disciplines:
  - plant community research
  - impact assessment in marine systems
  - acid rain surveys
  - environmental monitoring and cleanup projects at DOE facilities
  - land use assessment
  - climate change research (atmospheric research)
- These concepts are believed to extend to other scientific disciplines.

Presentation Strategy
- Archiving and science
- Making connections
- Enhancing incentives for archiving
- Impacts of scale
  - Volume (files and bytes)
  - Diversity
  - Timing
  - Longevity

“You can’t keep running in here and demanding data every two years.” (Source: American Scientist, Vol 886 p 525.)
Challenge: engage scientists in the process of archiving their data and provide the mechanism for archiving.

Quotes from Raymond
- “Storing data is easy. Finding and using data later is not.”
- “Systematically and consistently organized data does not occur without cost. Consider the results from previous science projects with no extra effort for data archiving.”
- “The natural tendency over time for data and information is chaos. Effort must be exerted to overcome this.”
- “Successfully managed data by projects may not be ready to be archived.”

Archive Functions
- Store data
  - Submitted by others
  - Build a catalog and structure
  - Maintain storage across technology generations
  - Review new data (QA, metadata)
- “Advertise” contents
- Find data for users
  - Query and browse logic
- Distribute data
  - Provide access to data
  - References to documentation
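The archive functions listed above (store, review, find, distribute) can be sketched as a toy in-memory catalog. This is only an illustration, not the design of any real archive: the names `Dataset`, `Archive`, `submit`, `find`, and `distribute`, and the rule that a submission must carry a title and keywords, are all assumptions made for this sketch.

```python
from dataclasses import dataclass, field


@dataclass
class Dataset:
    """One submitted dataset plus the metadata the archive reviews (hypothetical shape)."""
    dataset_id: str
    title: str
    keywords: set[str]
    documentation: list[str] = field(default_factory=list)  # references to documentation


class Archive:
    """Toy in-memory archive covering store, review, find, and distribute."""

    def __init__(self) -> None:
        self._catalog: dict[str, Dataset] = {}   # catalog entries keyed by dataset ID
        self._payloads: dict[str, bytes] = {}    # stored data keyed by dataset ID

    def submit(self, dataset: Dataset, payload: bytes) -> None:
        # Review new data: this sketch rejects submissions with no title or keywords.
        if not dataset.title or not dataset.keywords:
            raise ValueError("metadata incomplete: title and keywords are required")
        self._catalog[dataset.dataset_id] = dataset
        self._payloads[dataset.dataset_id] = payload

    def find(self, keyword: str) -> list[str]:
        # Query logic: case-insensitive keyword match against the catalog.
        wanted = keyword.lower()
        return sorted(
            ds.dataset_id
            for ds in self._catalog.values()
            if wanted in {k.lower() for k in ds.keywords}
        )

    def distribute(self, dataset_id: str) -> tuple[bytes, list[str]]:
        # Provide access to the data together with its documentation references.
        return self._payloads[dataset_id], self._catalog[dataset_id].documentation


archive = Archive()
archive.submit(
    Dataset("ds-001", "Acid rain survey", {"acid rain", "pH"}, ["readme.txt"]),
    b"site,ph\nA,4.6\n",
)
print(archive.find("PH"))
```

The point of the sketch is that finding data depends entirely on what was recorded at submission time, which is why the review step sits in front of storage.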

Presumptions about Archiving
- Information sharing is important.
- Multidisciplinary data access will foster more robust scientific discoveries.
- Archiving can be improved.
- The “neurons” of archives are metadata.
- The limited number of permanent data archives will increase.
- An expectation from “the Internet”

Why Archive?? “I am doing Science. Trust me.”

Cycles of Research: “An Information View”
[Figure: research cycle diagram. Labels: Planning; Automation and review; Information review; Problem Definition (Research Objectives); Analysis and modeling; Planning; Measurement; Collection; Selection and extraction; Archive of Data; Publications; Original Observations; Secondary Observations. Time scales shown: 200 yrs and 25 yrs.]

“Why Don’t I Archive My Data?”
- No incentives - What’s in it for me?
- No acknowledgment - Does a dataset = a paper?
- Give up publication rights - Will somebody scoop me?
- Poor planning - It was not in “the Plan”.
- No resources - Who’s going to pay for it?
- No future - Who will support this later?
- Lack of training - What do I do first?
- Unsure about metadata content - How much is enough?

“Why Should I Archive My Data?” (management hints!)
- Career advancement (give them credit)
  - Scientists need to get some recognition for archiving.
  - Consider scientific journals that also provide companion “data publications”.
  - “It may help me do science with a broader view.”
- Good scientific practice (create peer pressure)
- Professional development (give them training)
  - Provide daily interactions between scientific and information specialists.
  - Allow a reasonable time for initial discovery.
  - Provide support for long-term “stewardship”. (Who will answer the questions after the project is completed?)

“Why Should I Archive My Data?” (more management hints!!)
- Institutional incentives (Have plans AND expectations)
  - Archiving should be required by the sponsor.
  - Data archiving is “in the plan” and resources are available to support it.
  - Interweave archiving with the planning and publication processes.
- Technological advances (Give them hardware and software)
  - It is technically easier now and there are more options.
  - Consistent “self-discipline” is still challenging.

“Why Should I Archive My Data?” (still more management hints!!!)
- “Change” will be managed. (Have standards AND flexibility!!??)
  - Change is inherent in research.
  - Managing change without prior planning can become consumptive.
  - Changes may cause confusion and diminish data usefulness.
  - A BIG issue - more details during tomorrow’s panel discussion on “Management and Technical Issues”

Archiving Supports Better Science
- The metadata required for archiving will improve data quality.
- Archiving extends data usefulness.
- Archived data increases your information base for doing research:
  - More data volume and diversity
- Proper archives permit the replication of results.
  - A KEY concept of Science

The Effects of Project Scale on Archives “Metadata are archive neurons??”

Metadata Depends on Your “World View”
- Investigator
  - Doesn’t need extensive formal metadata
- Project
  - Metadata needed for project integration and modeling activities may be limited
  - Project data manager may help write metadata
- Data archive
  - More detailed metadata (e.g., spatial coordinates)
  - More standardization (e.g., keywords) to communicate clearly with future users
- Who writes the metadata?

An Initial View of Data…
[Diagram: Measurement]

Single Experiment View
[Diagram: Measurement described by date, sample ID, parameter name, location]

Research Project View
[Diagram: Measurement described by QA flag, media, date, sample ID, parameter name, location]

Long-term or Multidisciplinary View
[Diagram: Measurement described by QA flag, media, generator, method, date, sample ID, parameter name, location, records, Units]
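The three views above attach progressively more descriptive fields to each measurement as the audience widens. A minimal sketch of that idea follows; the field lists are copied from the slides, while the names `VIEWS` and `missing_fields` and the dictionary record layout are assumptions made for illustration.

```python
# Field lists taken from the three views; each wider view adds to the previous one.
SINGLE_EXPERIMENT = ["date", "sample ID", "parameter name", "location"]
RESEARCH_PROJECT = SINGLE_EXPERIMENT + ["QA flag", "media"]
LONG_TERM = RESEARCH_PROJECT + ["generator", "method", "records", "units"]

# Hypothetical lookup table keyed by view name.
VIEWS = {
    "single experiment": SINGLE_EXPERIMENT,
    "research project": RESEARCH_PROJECT,
    "long-term": LONG_TERM,
}


def missing_fields(record: dict, view: str) -> list[str]:
    """Return the fields a record still lacks before it satisfies the given view."""
    return [name for name in VIEWS[view] if record.get(name) in (None, "")]


# A record that is complete for a single experiment (values are made up).
record = {
    "measurement": 4.6,
    "date": "2004-06-22",
    "sample ID": "S-17",
    "parameter name": "pH",
    "location": "Site A",
}

for view in VIEWS:
    print(view, "->", missing_fields(record, view))
```

A record that is complete for its investigator can still be missing most of what a long-term user needs, which is the gap the later slides on scale describe.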

Integrated System & Archive View
[Diagram: Measurement described by QA flag, media, generator, method, date, sample ID, parameter name, location, records, Units, linked to supporting definition tables: Sample def., Method def., Parameter def., Record system, QA def., Units def., and GIS. Other labels in the diagram: type, date, location, generator, lab, field, words, units, method, org. type, name, custodian, address etc., coord., elev., depth.]

Another View of Scale

Scale and Recorded Metadata
[Figure: recorded metadata plotted against increasing user scope. Scope labels: PI, Group, Project, Program, Archive. Metadata labels: Units, Method, QA flag, Media, Parameter name, Measurement, Date, Sample ID, Location, Generator, Records.]

Data Maturation and Scale
- Individual Investigators
  - collect data, quality assure, document, analyze, publish
- Groups or Science Teams
  - collate data, enhance, synthesize, model, publish
- Project Information System
  - collate data, review completeness, maintain data for project
- Data Distribution and Archive Center
  - long-term archive, distribute freely to users
- Master Data Directory
  - searchable index with pointers to data

Preparing for Archiving I will not wait. I will not …

Generic Environmental Data Model (Which Piece Is First…?)
[Diagram: Measurement described by QA flag, media, generator, method, date, sample ID, parameter name, location, records, Units, linked to supporting definition tables: Sample def., Method def., Parameter def., Record system, QA def., Units def., and GIS.]

Sequence of Information Birth
[Diagram: Measurement described by QA flag, media, generator, method, date, sample ID, parameter name, location, records, Units, linked to supporting definition tables: Sample def., Method def., Parameter def., Record system, QA def., Units def., and GIS.]

Research ~ Publishing ~ Metadata
- Metadata design can be a “checklist” for research planning.
- Metadata preparation can be integrated with publication process.
- Metadata are an investment in current and future science.

Summary Points
- Incentives to archive data are a “management responsibility”.
- “Management” should understand the “Big Picture”
  - The impacts of scale on archiving.
  - Archives need structure and standards.
- Solutions include more than additional technology.
  - New behavior is also VERY important.
- Metadata are the “neurons” of Archives.
  - Early metadata are better than later.
- The planning and decisions about archiving need to be intentional and not accidental.

Future Thoughts
- Will we be able to know “Where are we?” as the capacity of information technology continues to expand?
  - How many 30 KB files are on a 100 GB tape cartridge?
- The future limits will not be technology
  - But our minds…
- We need to plan NOW about how to best leverage the future.
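The question about 30 KB files on a 100 GB cartridge can be answered with simple division. The sketch below assumes the whole cartridge is usable for file data and ignores any tape formatting or file system overhead; the function name `files_per_cartridge` is illustrative. It shows both the decimal and the binary reading of KB and GB, since the slide does not say which is meant.

```python
def files_per_cartridge(cartridge_bytes: int, file_bytes: int) -> int:
    """Whole files that fit, assuming no per-file or formatting overhead."""
    return cartridge_bytes // file_bytes


# Decimal units: 1 KB = 10**3 bytes, 1 GB = 10**9 bytes.
decimal_count = files_per_cartridge(100 * 10**9, 30 * 10**3)

# Binary units: 1 KB = 2**10 bytes, 1 GB = 2**30 bytes.
binary_count = files_per_cartridge(100 * 2**30, 30 * 2**10)

print(f"{decimal_count:,}")  # 3,333,333
print(f"{binary_count:,}")   # 3,495,253
```

Either way, a single cartridge holds more than three million such files, far more than anyone can track without a catalog.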

Looking Forward to a Future With Archives!!