Physical Media WG Report
Physical Media Working Group
August 13, 2007

2 PDS MC Policies on Media, Integrity and Backup

Data Delivery Policy
– Data producers shall deliver one copy of each archival volume to the appropriate Discipline Node using means/media that are mutually acceptable to the two parties. The Discipline Node shall declare the volume delivery complete when the contents have been validated against PDS Standards and the transfer has been certified error free.
– The receiving Discipline Node is then responsible for ensuring that three copies of the volume are preserved within PDS. Several options for "local back-up" are allowed, including use of RAID or other fault-tolerant storage, a copy on separate backup media at the Discipline Node, or a separate copy elsewhere within PDS. The third copy is delivered to the deep archive at NSSDC by means/media that are mutually acceptable to the two parties. (Adopted by PDS MC October 2005)

Archive Integrity Policy
– Each node is responsible for periodically verifying the integrity of its archival holdings based on a schedule approved by the Management Council. Verification includes confirming that all files are accounted for, are not corrupted, and can be accessed regardless of the medium on which they are stored. Each node will report on its verification to the PDS Program Manager, who will report the results to the Management Council. (Adopted by MC November 2006)
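The Archive Integrity Policy above requires each node to periodically confirm that every archived file is present and uncorrupted. Below is a minimal sketch of how such a check might be scripted, assuming a per-volume manifest of "<md5 checksum>  <relative path>" lines; the manifest layout and paths are illustrative assumptions, not part of the adopted policy or of PDS standards.

```python
import hashlib
from pathlib import Path

def md5sum(path, chunk_size=1 << 20):
    """Compute the MD5 checksum of a file, reading it in 1 MB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def verify_volume(volume_root, manifest="checksums.md5"):
    """Report missing and corrupted files for one archival volume.

    Assumes a manifest with one '<md5>  <relative path>' entry per line;
    this layout is illustrative, not a PDS standard.
    """
    root = Path(volume_root)
    missing, corrupted, ok = [], [], 0
    for line in (root / manifest).read_text().splitlines():
        if not line.strip():
            continue
        expected, rel_path = line.split(maxsplit=1)
        target = root / rel_path
        if not target.exists():
            missing.append(rel_path)
        elif md5sum(target) != expected:
            corrupted.append(rel_path)
        else:
            ok += 1
    return {"ok": ok, "missing": missing, "corrupted": corrupted}

if __name__ == "__main__":
    report = verify_volume("/archive/volumes/example_volume")  # hypothetical path
    print(f"{report['ok']} files verified, {len(report['missing'])} missing, "
          f"{len(report['corrupted'])} corrupted")
```

A run of this kind of check, on the MC-approved schedule, would also produce the material each node reports to the PDS Program Manager.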

3 Question presented to MC in March

Given ongoing DVD reliability problems observed in PDS and by the industry, should the WG evaluate moving to mass storage for archiving of volumes vs. DVD?
– A question was then raised by the MC as to what the node capabilities were in terms of online RAID distribution and whether movement away from DVD would be feasible for PDS.

Actions
– Presented to the MC; recommended that data integrity checking be done on a regular schedule for both optical and RAID.
– Presented results at the Archives in the 21st Century meeting.
– Performed a survey to understand node capabilities.
– Developed use cases and requirements for archive availability (primary, secondary/backup, deep archive).
– Defined options for the secondary/backup copy for consideration by the MC and the nodes.

4 Process

– Define requirements for data availability (media/technology independent).
– Conduct a survey of the PDS nodes to determine current capability for online repository management (capacity, operating system, etc.):
  - Determine data holdings across PDS that are online vs. offline.
  - Determine data holdings by media/storage technology.
– Develop options for the "3 copies" of data:
  - Compare media management costs (optical, RAID, tape).
  - Understand migration challenges and scope.
  - Deliver electronic volumes to NSSDC.

5 Survey Respondents

The following PDS Nodes, Subnodes, and Data Nodes responded to the Repository Survey:
– (1) RS [RS] Richard Simpson
– (2) SBN [AR] Anne Raugh
– (3) NAIF [BS] Chuck Acton / Boris Semenov
– (4) ATMOS [LH] Lyle Huber
– (5) GEO [TS] Thomas Stein
– (6) IMG-Flagstaff [CI] Chris Isbell
– (7) IMG-HiRISE [EE] Eric Eliason
– (8) IMG-JPL [MM] Myche McAuley
– (9) IMG-LROC [EB] Ernest Bowman
– (10) IMG-THEMIS [KM] Kimberly Murray
– (11) SBN-PSI [CN] Carol Neese
– (12) PPI-UCLA [TK] Todd King
– (13) RINGS [MG] Mitch Gordon

6 Survey Purpose and Findings

The PMWG distributed a survey to:
– Characterize the storage infrastructure of PDS nodes, subnodes, and data nodes
– Identify the amount of PDS data held on various media technologies (and online vs. offline)
– Understand current backup capabilities at the nodes

High-Level Findings
– More than 95% of PDS data appears to be held in RAID; much has already been migrated.
– 7 of 13 respondents identified a backup source.
– 12 of 13 serve data electronically via a primary online RAID repository; many of the backup sources are CD or DVD, with two sites using tape.
– Three copies of the data may not exist (or may not exist in any organized way): some data sets have not been delivered to NSSDC, and data managed online may or may not be backed up at a suitable site.
– The cost to manage storage varies across nodes. For RAID, the cost is clearly non-linear (i.e., per-TB cost drops as volume increases), while costs for CD, DVD, and tape appear more linear.
– Large data increases are expected in the out years (particularly in imaging).

7 Data Volumes Matrix from Survey (Primary Repository)

NODE            CURRENT ONLINE CAPACITY (TB)   % ARCHIVE RAID (FY07)   % ARCHIVE OPTICAL/OTHER (FY07)
IMG (HiRISE)    6                              100% (NAS)              –
IMG (JPL)       33                             95% (RAID)              5% (OTHER)
PPI             4                              100% (RAID)             –
IMG (THEMIS)    25.5                           100% (RAID)             –
IMG (USGS)      9.5                            100% (RAID)             –
RINGS           –                              100% (RAID)             –
GEO             18                             100% (RAID)             –
ATMOS           1.5                            85% (RAID)              15% (TAPE)
NAIF            0.1                            100% (RAID)             –
SBN (PSI)       0.4                            100% (RAID)             –
SBN (Maryland)  3.5                            100% (RAID)             –
RS              –                              0%                      100% (CD/DVD)

Totals: current data volume ~54+ TB; current online capacity ~100+ TB

8 Preliminary* Survey Costing Results for Archival Storage

* Costs vary widely from node to node; this is an initial average based on four years with a few samples of data. A later MC question: do we want to compare against industry numbers?

NOTE (1): When data integrity checking is factored in, all media carry ongoing costs to verify usability.
NOTE (2): Cost-wise, optical media makes more sense for smaller volumes of data.
NOTE (3): The PMWG estimates (based on PDS input) that costs level out at roughly $4.5K/TB/year as data grows to larger volumes.

9 Backup Options*

SOLUTION       TYPE               MEDIA TYPE                                       COST
SDSC           External           Online RAID                                      $1000/TB/year
SDSC           External           Near-line tape (served by robotic tape device)   $400/TB/year
Commercial     External           Varies                                           $12000/TB/year
NSSDC          Node independent   Replicate node systems                           Cost of hardware + $20K to $50K labor
Node to node   Internal           LTO tape                                         Cost of media + labor

* The PMWG noted that one of the key areas to be addressed is a plan for backing up the archive (e.g., an operational copy of data for recovery of a node).
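To put the recurring rates in the table on a common footing, the short calculation below compares annual costs for the externally hosted options at a hypothetical archive size of 100 TB (roughly the total online capacity reported in the survey). The NSSDC and node-to-node options are dominated by hardware, media, and labor costs rather than a per-TB rate, so they are left out of this comparison.

```python
# Recurring cost per TB per year for the externally hosted options in the table.
RATES_PER_TB_YEAR = {
    "SDSC online RAID": 1_000,
    "SDSC near-line tape": 400,
    "Commercial": 12_000,
}

def annual_cost(option, terabytes):
    """Annual recurring cost, in dollars, for one backup option at a given size."""
    return RATES_PER_TB_YEAR[option] * terabytes

archive_tb = 100  # hypothetical size, close to the survey's ~100 TB online capacity
for option in RATES_PER_TB_YEAR:
    print(f"{option}: ${annual_cost(option, archive_tb):,.0f}/year")
# Prints: SDSC online RAID: $100,000/year
#         SDSC near-line tape: $40,000/year
#         Commercial: $1,200,000/year
```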

10 San Diego Supercomputer Center (SDSC)

Presented by Reagan Moore at the Archives in the 21st Century meeting:
– SRB has been "extremely successful".
– SDSC has expressed interest in being a backup site for PDS.

SDSC provides storage services and software for managing data collections online, near-line, etc.
– The flagship product is the Storage Resource Broker (SRB), which provides (a usage sketch follows below):
  - Persistent naming of distributed data
  - Management of data stored in multiple types of storage systems
  - Organization of data as a shared collection with descriptive metadata, access controls, and audit trails
– SRB manages 2 PB of data in internationally shared collections.
– SDSC offers a storage service, storing data at San Diego under a Service Level Agreement:
  - Data collections for NSF, NARA, NASA, DOE, DOD, NIH, LC, NHPRC, and IMLS
  - The goal has been generic infrastructure for distributed data.
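As a rough illustration of what "putting data at SDSC" through SRB could look like for a node, the sketch below drives the standard SRB client utilities (the Scommands: Sinit, Smkdir, Sput, Sls) from Python to copy one archival volume into an SRB collection. The collection path, the local volume path, and the assumption that the SRB client is installed and configured for the SDSC zone are all hypothetical; the real workflow would be defined by the SDSC service agreement and the PMWG tests.

```python
import subprocess
from pathlib import Path

# Hypothetical SRB collection reserved for PDS backup copies at SDSC.
SRB_COLLECTION = "/pdsZone/home/pds.backup/volumes"

def run(cmd):
    """Run one Scommand, echoing it, and stop if it returns a non-zero status."""
    print("+", " ".join(cmd))
    subprocess.run(cmd, check=True)

def replicate_volume(volume_root):
    """Copy every file under a local volume directory into the SRB collection."""
    root = Path(volume_root)
    run(["Sinit"])  # authenticate to the SRB zone using the local client configuration
    # Recreate the volume's directory tree as SRB collections, then upload the files.
    run(["Smkdir", f"{SRB_COLLECTION}/{root.name}"])
    for directory in sorted(p for p in root.rglob("*") if p.is_dir()):
        run(["Smkdir", f"{SRB_COLLECTION}/{root.name}/{directory.relative_to(root)}"])
    for path in sorted(p for p in root.rglob("*") if p.is_file()):
        run(["Sput", str(path), f"{SRB_COLLECTION}/{root.name}/{path.relative_to(root)}"])
    run(["Sls", f"{SRB_COLLECTION}/{root.name}"])  # list the collection to confirm the copy

if __name__ == "__main__":
    replicate_volume("/archive/volumes/example_volume")  # hypothetical local volume path
```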

12 PMWG Recommendations to PDS MC

● Develop a "framework" report for nodes to provide information to PDS Management regarding their backup and integrity plans (and local implementation); roll these into a PDS plan and post it on the MC website (an illustrative per-node template is sketched after this list).
● Explore use of storage services at SDSC as a PDS near-line backup source:
  - PMWG would explore use of SRB clients.
  - PMWG would test putting data at SDSC.
  - PMWG will report back to the MC with results for a full recommendation.
● Given that the NSSDC-PDS interface is effectively ready, start transferring electronic volumes to NSSDC:
  - Transfer those data sets where the NSSDC MPGA client will run (Unix-based).
  - Work with NSSDC on beta testing the Windows version of the software.
● Continue to develop PDS storage projections and define how they map to media solutions (e.g., produce a high-level PDS storage architecture diagram from the survey).
● Determine whether to migrate volumes that are not online. The survey showed that a small percentage is not online; some are old volumes (e.g., Magellan DODR and FBIDR, several hundred SDDPT volumes at EN).
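The first recommendation, the per-node backup and integrity "framework" report, is still to be defined; the snippet below is purely an illustrative sketch of the kind of information it might collect for each node (field names and values are assumptions drawn loosely from the survey, not an adopted PDS format).

```python
# Illustrative per-node entry for a backup/integrity framework report;
# the schema and the sample values are assumptions, not an adopted PDS format.
node_plan = {
    "node": "ATMOS",
    "primary_repository": {"media": "RAID", "capacity_tb": 1.5},
    "backup_copy": {"location": "on-site tape", "refresh": "monthly"},
    "deep_archive": {"location": "NSSDC", "delivery": "electronic volumes"},
    "integrity_check": {"method": "checksum verification", "schedule": "annual"},
}

def summarize(plan):
    """One-line summary suitable for rolling node inputs into a PDS-wide plan."""
    return (f"{plan['node']}: primary on {plan['primary_repository']['media']}, "
            f"backup at {plan['backup_copy']['location']}, "
            f"integrity checks {plan['integrity_check']['schedule']}")

print(summarize(node_plan))
# ATMOS: primary on RAID, backup at on-site tape, integrity checks annual
```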

13 Backup

14 Survey Cost Results

Data System Capacity and Costs (excluding LROC)
– Current capacity is 100 TB; storage used is 50 TB.
– Total system cost (including FY07 labor) is on the order of $940K, or about $9.5K per TB of storage capacity, but varies widely between nodes.
– Labor cost is in the ballpark of $200K per year.
– Data acquisitions are on the order of 200 TB over the next 3 years, and replenishment costs are on the order of $1000K per year ($5K per TB). Most acquisitions are at the IMG node.

15 Preliminary Hardware Architecture Diagram