Packard Campus for Audio Visual Conservation Formerly NAVCC Architecting for Data Integrity.

Presentation transcript:


The Challenge

“This is an Archive. We can’t afford to lose anything!”
- Our customers are custodians of the history of the United States and do not want to consider the potential loss of content that is statistically likely to happen at some point.

Solutions
- Generate a SHA1 digest at the source and verify content after each copy
- Store 2 copies of everything digital
- Test and monitor for failures: Archive Integrity Checker and fix_damaged
- Refresh the damaged copy from the good copy
- Automate as much as possible
- Acknowledge that someday we’re going to lose something
  - What’s that likelihood?
  - What costs are reasonable to reduce that?
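The first item under Solutions, digest at the source and verify after each copy, can be sketched in a few lines. This is a minimal illustration, not the archive's actual tooling: the function names and file names are hypothetical, and the demonstration runs on a throwaway temporary file.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def sha1_of(path, chunk_size=1024 * 1024):
    """Stream a file through SHA-1 so large files never sit fully in memory."""
    digest = hashlib.sha1()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def copy_and_verify(source, destination, expected_sha1):
    """Copy the file, re-read the copy, and compare it with the source digest."""
    shutil.copyfile(source, destination)
    return sha1_of(destination) == expected_sha1


with tempfile.TemporaryDirectory() as workdir:
    source = Path(workdir) / "source.bin"        # hypothetical file names
    first_copy = Path(workdir) / "copy1.bin"
    source.write_bytes(b"example essence data" * 1000)

    digest_at_source = sha1_of(source)
    copy_ok = copy_and_verify(source, first_copy, digest_at_source)

    # Simulate damage by flipping the bits of the first byte of the copy.
    damaged = bytearray(first_copy.read_bytes())
    damaged[0] ^= 0xFF
    first_copy.write_bytes(bytes(damaged))
    damaged_ok = sha1_of(first_copy) == digest_at_source
```

Here `copy_ok` is true and `damaged_ok` is false. In a real deployment the re-read would need to come from the destination medium rather than an operating-system cache, which this sketch does not attempt to guarantee.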

Real World Example

About 2 years ago, another video archive with similar hardware and software reported errors in content staged from about 1 out of every 5 of their T10000B tapes.
- We had seen no unrecoverable errors in staging, but that is not enough assurance
- Given that we store a SHA1 digest for each file, we developed a simple Perl program to stage content from our oldest tapes and compare the SHA1 values. No errors were found in the initial test of 600 tapes.
- Something else in the environment or processes must be contributing to content loss.
- We cannot afford to wait for problems to present.

Current Solutions

Archive Integrity Checker
- Contracted with a developer to write code to systematically check all the content and keep track of the status of each verification in a database
- Groups the files to be staged by tape and then sorts them by location on tape to improve efficiency
- Requires some management, as it must stage the content to our ingest storage, thereby affecting our ingest speeds
- Uses at least 1 tape drive
- Generates 2X file size in throughput: write to disk and read for SHA1

Fix_damaged.pl
- Uses sfind -damaged to identify files and then stages them to test whether the tape is bad or if there was a transient issue
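The grouping step described for the Archive Integrity Checker (by tape, then by location on tape) can be sketched as follows. The record type, field names, volume labels and paths are all hypothetical; the slides do not describe how the real checker represents its catalog.

```python
from collections import defaultdict
from typing import NamedTuple


class ArchivedFile(NamedTuple):
    path: str
    tape: str       # hypothetical volume label
    position: int   # hypothetical offset of the file on that tape


def staging_order(files):
    """Group files by tape, then sort each group by position on the tape.

    Each tape then needs to be mounted once and read front to back.
    """
    by_tape = defaultdict(list)
    for item in files:
        by_tape[item.tape].append(item)
    return {
        tape: sorted(items, key=lambda f: f.position)
        for tape, items in sorted(by_tape.items())
    }


catalog = [
    ArchivedFile("/archive/a.mxf", "T00002", 900),
    ArchivedFile("/archive/b.mxf", "T00001", 500),
    ArchivedFile("/archive/c.mxf", "T00002", 100),
    ArchivedFile("/archive/d.mxf", "T00001", 20),
]
plan = staging_order(catalog)
```

With this ordering each tape is visited once, and within a tape the files are requested in increasing position, which avoids seeking back and forth.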

Alternative Solution

Verification flag in SAM (ssum). Similar in HPSS
- Enabled via software for a collection of files (filesystem, directory, file(s))
- Generates a 32-bit running checksum for the file and stores it in SAM’s metadata (viewable via sls -D)
- File is staged back to disk from tape and verified
- Takes extra time and generates 2X additional throughput requirements
- Make sure to architect it in
- We turned it off
- Can be turned on to generate only, with the checksum used when files are staged back from tape
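To illustrate what a 32-bit running checksum means in practice, the sketch below accumulates a checksum chunk by chunk as data streams past. It uses CRC-32 from Python's `zlib` module purely as a stand-in: the slides do not say which algorithm SAM uses, and the function name is hypothetical.

```python
import zlib


def running_checksum(chunks):
    """Fold each chunk into a 32-bit value as it streams past.

    CRC-32 is a stand-in here, not necessarily SAM's algorithm.
    """
    value = 0
    for chunk in chunks:
        value = zlib.crc32(chunk, value)
    return value & 0xFFFFFFFF


whole = b"archive payload " * 4096
pieces = [whole[i:i + 1000] for i in range(0, len(whole), 1000)]

at_archive_time = running_checksum(pieces)   # computed while writing to tape
after_stage = running_checksum([whole])      # recomputed after staging back
corrupted = running_checksum([b"X" + whole[1:]])
```

The checksum is independent of how the data is chunked, so the value computed while archiving matches the value recomputed after staging, while the corrupted stream produces a different value.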

Bigger Problems

File sizes are increasing. We will soon be processing 1 TB files
- Audio: 1 GB (96/24)
- SD Video: 30 GB (MXF-wrapped JPEG 2000)
- HD Video: 100s of GB
- 4K scan of film: 1 TB

This means that a failed SHA1 will require retransmission
- Costing hours of time and wasted resources
- Typical individual PC disk speeds are 25 MB/s. Typical times to retransmit:
  - Audio: < 1 min
  - SD Video: < 30 min
  - HD Video: < 3 hours
  - 4K scan of film: < 4 hours (assuming faster drives in this environment)

Ensuring data integrity becomes more crucial as file sizes increase, and monitoring for marginal errors greatly improves the reliability and cost effectiveness of technological investments
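The retransmission times above follow from size divided by rate. The sketch below reproduces the arithmetic in decimal units (1 GB = 1000 MB). The 100 GB figure for HD video is an assumed low end of "100s of GB", and the rate for the 4K case is derived from the 4-hour bound rather than stated in the slides.

```python
def transfer_seconds(size_gb, rate_mb_per_s):
    """Time to move size_gb gigabytes at rate_mb_per_s, in decimal units."""
    return size_gb * 1000 / rate_mb_per_s


PC_DISK_MB_PER_S = 25  # typical PC disk speed quoted in the slide

audio_s = transfer_seconds(1, PC_DISK_MB_PER_S)      # 1 GB audio file
sd_s = transfer_seconds(30, PC_DISK_MB_PER_S)        # 30 GB SD video file
hd_s = transfer_seconds(100, PC_DISK_MB_PER_S)       # assumed 100 GB HD file
film_s = transfer_seconds(1000, PC_DISK_MB_PER_S)    # 1 TB 4K film scan

# Sustained rate needed to move 1 TB in under 4 hours.
rate_for_4k_in_4h = 1000 * 1000 / (4 * 3600)

print(f"audio: {audio_s:.0f} s, SD: {sd_s / 60:.0f} min, HD: {hd_s / 3600:.1f} h")
print(f"1 TB at 25 MB/s: {film_s / 3600:.1f} h; need {rate_for_4k_in_4h:.1f} MB/s for 4 h")
```

At 25 MB/s a 1 TB file would take about 11 hours, so the 4-hour figure implies drives sustaining roughly 70 MB/s, consistent with the "faster drives" note.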

Datapath Integrity Field (T10-DIF)

Oracle T10000C Solution

T10-DIV
- A variation of the T10-DIF (Data Integrity Field) that adds a CRC to each data block from the tape kernel driver to the tape drive (off/on/verify)
- Verifies the FC path to the tape drive and verifies each write to tape by reading afterward
- Requires Solaris 11.1 and SAM 5.3
- Can be used to verify the tape content without staging back to disk
- We look forward to a tape-to-tape migration that will use this information to validate the content read from tape during the migration. This assumes very few files are deleted and repacking will provide little value. True for our archive.

Oracle T10000C DIV Performance

T10-DIV
- How does DIV scale? There’s a computation involved. Where does that happen?
- Oracle created a test bed and shared the results of their testing. In the process, several interesting things were discovered and improvements added
- There are three states for DIV: off, on, verify. These results are for the on setting
- The server used here is a T4-4 with a limited domain of 32 of 64 cores
- 8 FC8 ports, 3 drives zoned per port, 4 ports for tape, 4 ports for disk

Oracle T10000C DIV Performance
- Individual drive throughput
- Average performance per drive not impacted by additional drives
- Maintain optimal transfer rates as drives are added
- Recommend 4 tape drives per HBA port

Key Architecture Points
- Know the amount of data you need to move over a given time period: 5 TB/12 hours
- Know the speeds. Test the configuration
  - Disk
  - Tape
  - Ports
  - Protocol (FC/Ethernet)
  - Server (backplane)
- We built our cache for 6X our throughput needs because
  - Daily: Write once, read three times. Once for the SHA1 check, then once more for each of the two tape copies
  - Testing: Write once, read once
- We built a separate set of LUNs for migration
  - Migration: Write once, read three times. Once for the SHA1 check, then once more for each of the two tape copies
- Test and Monitor
  - Standard Operating Procedure for checking network, systems, storage and tape
  - Syslog on all devices that support it. Build pattern recognition for issues whenever they present.
  - HSM filesystem for damaged files, errors
  - Storage Tape Analytics
  - Service Delivery Platform/Automatic Service Request
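The 6X cache figure can be reproduced from the pass counts listed above. The sketch below assumes decimal units (1 TB = 1,000,000 MB) and reads "6X our throughput needs" as the sum of the daily and testing passes multiplied by the base ingest rate; that reading is an interpretation of the slide, not a stated formula.

```python
INGEST_TB = 5        # data to move per window, from the slide
WINDOW_HOURS = 12    # length of the window, from the slide

# Base rate needed just to land the data once.
base_mb_per_s = INGEST_TB * 1_000_000 / (WINDOW_HOURS * 3600)

# Passes over the cache: writes plus reads for each workload.
passes = {
    "daily": 1 + 3,    # write once; read for SHA1 check and two tape copies
    "testing": 1 + 1,  # write once, read once
}
multiplier = sum(passes.values())
cache_mb_per_s = base_mb_per_s * multiplier

print(f"base rate: {base_mb_per_s:.1f} MB/s")
print(f"cache must sustain about {cache_mb_per_s:.1f} MB/s ({multiplier}X)")
```

Under these assumptions the base rate is about 116 MB/s and the cache has to sustain roughly 694 MB/s. Migration traffic adds another four passes, which is why the slide places it on a separate set of LUNs.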

Future
- Testing HPSS implementation
- Oracle Tape Analytics to monitor marginal issues with tape drives/library
  - Drive code upgrade; required ACSLS version not available for Solaris 11
  - Older libraries require an HBT upgrade for memory
- Migrating 3.6 PB of T10Kb tapes/data to T10Kc tapes
- Migrating current 2 GB/s infrastructure to 6.5 GB/s
  - DDN SFA10K with 150 TB of 600 GB disks with 16 FC8 connections
  - T4-4 as archive server with 8 FC8 connections
  - X4470 as data mover/web server with 8 FC8 connections
  - NAS for user accessible filesystem (from FC/server based)
- Evaluating next generation infrastructure at 15 GB/s
- Continue to monitor HSM solutions for technical excellence
- Evaluate underlying software and hardware technologies
  - Data integrity
  - Scaling metadata
  - Scaling data throughput
  - Migration strategies

Architecting for Data Integrity

Questions?
- AXF format
- Oracle provides end-to-end data path integrity in some of its appliances
- Red Hat is in discussions to improve data path integrity in Linux
- SHA1 versus SHA2-256/512

Scott Rife
srif at loc dot gov