Long-Lived Data Collections

Presentation transcript:

Long-Lived Data Collections
Outline
- Data Archiving
- Data Maintenance
- Data Migration
7 May 2019 Steven Worley, NCAR/SCD

Data Archiving
At NCAR
- Research data archive (RDA): 95% long-lived data collections (LLDC)
  - Meteorological and physical oceanographic data
  - Built over 35+ years
  - 500+ datasets, 25 TB, growing daily
  - Nine data stewards (graduate degrees in meteorology/oceanography)

Data Archiving
One Example: ERA-40
- Global atmospheric reanalysis, 1957-2002
- Many reference frames: pressure surfaces, isentropic, …
- Many resolutions: 2.5°, spectral, N80, …
- Expect O(1000) users
[Figure: Monthly Mean Air Temperature at 2m]

Data Archiving
Practices and Policies
- Save 2x copies
  - Offsite backup under a different management system
- Time-stable attributes
  - No proprietary data formats
  - Access software in basic languages: Fortran, C, …
  - Minimize software dependence on complex libraries, e.g. netCDF, HDF

Data Archiving
P&P, continued
- Shared responsibility, cross-agency
  - For large collections, e.g. ERA-40, 35 TB
- Two-step archive plan
  - 1st: Data stays with the PI, who distributes it
    - Applies stewardship, QC, analysis, documentation
  - 2nd: Mature data transferred to an archive center
    - Long-term preservation and continued access
- Should an archive plan be part of an NSF proposal?
  - Fits into “broader impacts”
  - Include data formats and metadata

Data Archiving
P&P, continued
- Data compression
  - Important for efficient storage and transport
  - Use open standards
- Submission to a data center
  - Early submission advantages
    - Data captured before funding runs out
    - Unburdens PI from data management
    - Greater sharing = more science knowledge gains
  - Disadvantages
    - PI first-evaluation rights
    - Not a problem now: authorization and authentication
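The compression point above can be illustrated with a minimal sketch. It assumes gzip (an openly specified format, RFC 1952) through Python's standard library; the slides do not name a specific compression standard, and the function names and sample payload here are hypothetical.

```python
import gzip
import hashlib


def compress_record(payload: bytes) -> bytes:
    """Compress bytes with gzip, an openly specified format (RFC 1952)."""
    return gzip.compress(payload, compresslevel=9)


def verify_round_trip(payload: bytes, compressed: bytes) -> bool:
    """Confirm the compressed copy decompresses to the identical bytes."""
    restored = gzip.decompress(compressed)
    return hashlib.sha256(restored).digest() == hashlib.sha256(payload).digest()


# Hypothetical sample: repetitive text stands in for archived data.
sample = b"2m air temperature, monthly mean\n" * 1000
packed = compress_record(sample)
round_trip_ok = verify_round_trip(sample, packed)
```

Verifying the round trip before discarding the uncompressed original is one way to keep compression from becoming a preservation risk.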

Data Maintenance
Practices and Policies
- Use a change control system for all transactions
  - Creation
  - File additions, fixes, replacement
  - Metadata updates
  - The data and metadata remain tightly linked.
  - Note: this system itself must be viable for decades
    - Same principles as the archive
- Employ science data stewards
  - Additional insurance for accurate data preservation
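A change control system of the kind described above could be sketched as an append-only transaction log. This is an illustration only, not the NCAR system: the class names, the action vocabulary, and the dataset identifier are all hypothetical. The log is serialized as plain JSON text, in keeping with the advice to avoid proprietary formats.

```python
import json
from dataclasses import dataclass, field, asdict

# Hypothetical action vocabulary covering the transaction types on the slide.
ALLOWED_ACTIONS = {"create", "add_file", "fix_file", "replace_file", "update_metadata"}


@dataclass
class Transaction:
    dataset_id: str   # hypothetical identifier
    action: str       # one of ALLOWED_ACTIONS
    target: str       # file name or metadata field affected
    note: str         # human-readable reason for the change
    timestamp: str    # ISO 8601 text supplied by the caller


@dataclass
class ChangeLog:
    entries: list = field(default_factory=list)

    def record(self, tx: Transaction) -> None:
        """Append a transaction; entries are never edited or removed."""
        if tx.action not in ALLOWED_ACTIONS:
            raise ValueError(f"unknown action: {tx.action}")
        self.entries.append(tx)

    def history(self, dataset_id: str) -> list:
        """Return every transaction for one dataset, in recorded order."""
        return [tx for tx in self.entries if tx.dataset_id == dataset_id]

    def to_json_lines(self) -> str:
        """Serialize as one JSON object per line of plain text."""
        return "\n".join(json.dumps(asdict(tx), sort_keys=True) for tx in self.entries)


log = ChangeLog()
log.record(Transaction("example-dataset", "create", "dataset",
                       "initial ingest", "2004-01-21T00:00:00Z"))
log.record(Transaction("example-dataset", "update_metadata", "title",
                       "corrected title", "2004-02-01T00:00:00Z"))
```

Because data changes and metadata changes go through the same log, the two stay linked by construction.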

Data Maintenance
P&P
- Do data integrity checks
  - Monitor all network transfers for faults
  - Receipt and reconciliation reports
  - Many checks: byte counts, test files, comparisons
- Keep user information current
  - Changes trigger web page updates
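The receipt and reconciliation idea above can be sketched as follows. The slides mention byte counts and comparisons; using a SHA-256 checksum as the comparison is an assumption made for this example, and the names `FileReceipt`, `make_receipt`, and `reconcile` are hypothetical.

```python
import hashlib
from dataclasses import dataclass


@dataclass
class FileReceipt:
    name: str
    byte_count: int
    sha256: str


def make_receipt(name: str, content: bytes) -> FileReceipt:
    """Summarize one file by its byte count and SHA-256 digest."""
    return FileReceipt(name, len(content), hashlib.sha256(content).hexdigest())


def reconcile(sent: list, received: list) -> dict:
    """Compare sender and receiver receipts; map each faulty file name to its problem."""
    got = {r.name: r for r in received}
    report = {}
    for s in sent:
        r = got.get(s.name)
        if r is None:
            report[s.name] = "missing"
        elif r.byte_count != s.byte_count:
            report[s.name] = "byte count mismatch"
        elif r.sha256 != s.sha256:
            report[s.name] = "checksum mismatch"
    return report


# Hypothetical transfer: the second file arrives with one corrupted byte.
sent = [make_receipt("a.dat", b"12345"), make_receipt("b.dat", b"abcde")]
received = [make_receipt("a.dat", b"12345"), make_receipt("b.dat", b"abcdX")]
problems = reconcile(sent, received)
```

The byte count is checked first because it is cheap; the checksum catches corruption that leaves the length unchanged, as in the `b.dat` case here.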

Data Maintenance
P&P: Concerns
- Fact: huge collections of web-based documentation
  - Text, images, links
  - Embedded scripting (e.g. JavaScript, …)
  - HOW DO YOU ARCHIVE WEB SITES?
  - Access content 20 years from now?
- Data in DBMSs
  - Software dependent
  - Not viable for LLDCs: a technology trap

Data Maintenance
P&P
- Use standard metadata
  - Version control
  - Lineage documentation
  - Publication documentation
  - Preservation status
- LLDCs are seldom static
  - New metadata, data corrections, new links
  - Need flexible maintenance methods
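As a rough illustration of the metadata elements listed above, the sketch below keeps version, lineage, publications, and preservation status in one record and creates a new version for every change. The field names are hypothetical and do not follow any particular metadata standard; a real archive would map them onto whichever standard it adopts.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class DatasetRecord:
    title: str
    version: int
    lineage: tuple              # ordered notes on where the data came from and how it changed
    publications: tuple         # references that describe or use the dataset
    preservation_status: str    # free-text status, e.g. number and location of copies


def revise(record: DatasetRecord, change_note: str) -> DatasetRecord:
    """Return a new record with the version bumped and the change appended to lineage."""
    return replace(
        record,
        version=record.version + 1,
        lineage=record.lineage + (change_note,),
    )


# Hypothetical dataset: a correction produces version 2 and leaves version 1 untouched.
v1 = DatasetRecord("Example reanalysis subset", 1, ("initial ingest",), (),
                   "two copies, one offsite")
v2 = revise(v1, "data correction applied")
```

Making the record immutable means a correction never overwrites earlier metadata, which suits collections that are seldom static.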

Data Migration
Example: SCD/NCAR Mass Storage System
- RDA plus MUCH MORE
  - NCAR supercomputers
  - NCAR data analysis machines
  - Other NCAR/UCAR divisions and programs
- How much data is an LLDC?
  - Ongoing debate with our users/scientists
  - Data storage policies?

Data Migration
Scales of the problem (reference date: 01/21/2004)
- 21.5 million files, 1.7 PB
- Growth: 50 TB/month total
- 1 million file moves per month
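A few derived quantities help put the figures above in perspective. The sketch below is plain arithmetic on the slide's numbers, assuming decimal units (1 TB = 10^12 bytes, 1 PB = 10^15 bytes) and a constant growth rate; neither assumption comes from the presentation.

```python
TB = 10**12   # assumed decimal terabyte
PB = 10**15   # assumed decimal petabyte

total_bytes = 1.7 * PB        # holdings on the reference date
file_count = 21.5e6           # number of files
growth_per_month = 50 * TB    # total monthly growth
moves_per_month = 1e6         # file moves per month

# Average file size in megabytes (10^6 bytes).
mean_file_size_mb = total_bytes / file_count / 10**6

# Months for new data to equal the current holdings, if growth stays constant.
months_to_double = total_bytes / growth_per_month

# Share of the file population touched by moves each month.
moved_fraction_per_month = moves_per_month / file_count
```

Under these assumptions the mean file is about 79 MB, the holdings would double in roughly 34 months, and monthly moves touch a little under 5% of the files.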

Data Migration
NCAR MSS 1986-2003

Data Migration
History
- Since 1986, 5 migrations
  - All tape media
  - NCAR MSS software, scalable
  - Software and system changes may trigger migrations
Future
- Media replacement
  - 20 and 60 GB/cartridge to 200 GB/cartridge
  - Multi-phased plan, 2 years
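To give a feel for what such a media replacement involves, the sketch below estimates cartridge counts and the sustained copy rate. It assumes the full 1.7 PB reported for January 2004 is migrated, that cartridges are filled to their nominal capacity, that units are decimal, and that the work is spread evenly over two 365-day years. None of these assumptions is stated in the presentation, and the function names are hypothetical.

```python
GB = 10**9
TB = 10**12

HOLDINGS_BYTES = 1_700 * TB   # assumed: all of the 1.7 PB is migrated


def cartridges_needed(total_bytes: int, capacity_gb: int) -> int:
    """Whole cartridges required, rounding up, if each is filled to capacity."""
    capacity = capacity_gb * GB
    return -(-total_bytes // capacity)   # integer ceiling division


def sustained_rate_mb_s(total_bytes: int, years: float) -> float:
    """Average copy rate in MB/s (10^6 bytes/s) to finish within the given time."""
    seconds = years * 365 * 24 * 3600
    return total_bytes / seconds / 10**6


old_small = cartridges_needed(HOLDINGS_BYTES, 20)    # 85000
old_large = cartridges_needed(HOLDINGS_BYTES, 60)    # 28334
new_media = cartridges_needed(HOLDINGS_BYTES, 200)   # 8500
rate = sustained_rate_mb_s(HOLDINGS_BYTES, 2)        # roughly 27 MB/s
```

The estimate ignores growth during the migration and the overhead of interleaving with normal operations, so the real requirement would be higher.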

Data Migration
Migration factors
- Done interleaved with normal operations
  - Almost continuous now
- Tape life cycle is “probably” 6-10 years
  - BUT nominal service may be 3-5 years, to allow for migration time
- Option: extend nominal service
  - Deploy a dedicated migration system
  - Unlikely: too expensive

Data Migration
Practices and policies
- Need to define the data life cycle at creation time
  - E.g. if retention = 5 years, no migration is necessary
  - Recognize that this is a difficult decision for the scientist
    - May not be known a priori
    - Allow for an adjustable retention period
    - Allow for peer review
  - Advantage: uses the full life cycle of the media
  - Disadvantage: complex storage systems
    - Various media types and end-of-life dates
- Recognize that LLDCs (if irreplaceable data) must be migrated
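The retention rule above can be expressed as a small decision function. This is a sketch under stated assumptions: a single media service life applies to every copy, a retention period no longer than that service life needs no migration, and irreplaceable data always does. The names `RetentionPolicy`, `needs_migration`, and `migration_count` are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class RetentionPolicy:
    retention_years: float      # defined at creation time, adjustable later
    irreplaceable: bool = False


def needs_migration(policy: RetentionPolicy, media_service_years: float) -> bool:
    """Irreplaceable data always migrates; other data only if it outlives the media."""
    return policy.irreplaceable or policy.retention_years > media_service_years


def migration_count(policy: RetentionPolicy, media_service_years: float) -> int:
    """Migrations needed over a finite retention period, one per exhausted medium."""
    if policy.irreplaceable:
        raise ValueError("irreplaceable data has no finite migration count")
    return max(0, math.ceil(policy.retention_years / media_service_years) - 1)


# The slide's example: 5-year retention on media with a 5-year service life.
short_lived = RetentionPolicy(retention_years=5)
long_lived = RetentionPolicy(retention_years=12)
forever = RetentionPolicy(retention_years=5, irreplaceable=True)
```

With a 5-year service life, `short_lived` needs no migration, `long_lived` needs two, and `forever` must be migrated regardless of its nominal retention period.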

Conclusions:
- Need an archive plan for LLDCs
- Maintain LLDCs with data stewards and curation experts
- Need integrated data migration plans and data retention policies
- If LLDCs are irreplaceable data, preserve in perpetuity