Published by Clemence McKinney. Modified over 9 years ago.
1
Designing Storage Architectures for Preservation Collections
Library of Congress, September 17-18, 2007
Preservation and Access Repository Storage Architecture
Stephen Abrams
Harvard University Library
stephen_abrams@harvard.edu
2
Digital preservation at Harvard
Obligation to ensure the ongoing usability of library digital assets over time
Digital Repository Service (DRS)
– Managed preservation and access repository
– Seven years of production operation
– 6.7 million assets (27 TB)
Primary strategy: redundancy and heterogeneity
Primary challenge: scaling
3
Scaling: linear or exponential?
4
Storage classification
All managed assets are assigned a storage classification
– Public use (U): high availability, fast response
– Archival storage (A): high capacity, low cost
Use assets are optimized for web-friendly delivery
Archival assets are optimized for longevity
Asset classification is known at the point of acquisition
5
Architectural requirements
Each asset is stored:
– In at least 3 physical locations
– On at least 2 storage media
– With at least 2 on-line copies (U) / 1 on-line copy (A)
– With at least 1 off-line copy
Ongoing auditing for bit-level error detection and correction
Virtualization layer with uniform interface to all assets, regardless of physical medium
Application interface exposed as NFS-mountable file systems
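The four per-asset storage requirements above can be expressed as a small policy check. The following is a minimal sketch, not part of the DRS: the names Copy, check_policy, USE_COPIES and ARCHIVAL_COPIES are hypothetical. The sample layouts follow the copy placement described on the storage architecture slide later in this deck; treating the tape library as "nearline" (neither on-line nor off-line), and treating the on-campus disk array and tape library as one physical location, are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Copy:
    location: str  # physical site holding the copy
    medium: str    # "disk" or "tape"
    state: str     # "online", "nearline" (assumed category), or "offline"


# Minimum number of on-line copies per storage classification (from the slide).
MIN_ONLINE = {"U": 2, "A": 1}


def check_policy(classification, copies):
    """Return the list of violated requirements (empty if compliant)."""
    problems = []
    if len({c.location for c in copies}) < 3:
        problems.append("fewer than 3 physical locations")
    if len({c.medium for c in copies}) < 2:
        problems.append("fewer than 2 storage media")
    online = sum(c.state == "online" for c in copies)
    if online < MIN_ONLINE[classification]:
        problems.append(f"fewer than {MIN_ONLINE[classification]} on-line copies")
    if not any(c.state == "offline" for c in copies):
        problems.append("no off-line copy")
    return problems


# Copy layout for public use (U) assets: two disk archives, two tape archives.
USE_COPIES = [
    Copy("on-campus", "disk", "online"),
    Copy("off-campus data center", "disk", "online"),
    Copy("on-campus", "tape", "nearline"),
    Copy("off-campus storage facility", "tape", "offline"),
]

# Copy layout for archival (A) assets: one disk archive, two tape archives.
ARCHIVAL_COPIES = [
    Copy("off-campus data center", "disk", "online"),
    Copy("on-campus", "tape", "nearline"),
    Copy("off-campus storage facility", "tape", "offline"),
]

if __name__ == "__main__":
    print(check_policy("U", USE_COPIES))
    print(check_policy("A", ARCHIVAL_COPIES))
    # An archival layout applied to a U asset lacks a second on-line copy.
    print(check_policy("U", ARCHIVAL_COPIES))
```

Under these assumptions both layouts satisfy all four requirements, while storing a U asset with only the archival layout would fail the on-line copy requirement.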
6
Storage architecture
7
QFS cache and primary U disk archive on EMC CX3-40 (FC / SATA, RAID-1 / RAID-5) at on-campus data center
Redundant switched FC data paths to primary / fail-over Sun T2000 / Solaris file servers running SAM-QFS
Primary A / secondary U disk archive on EMC CX3-80 (FC / SATA, RAID-1 / RAID-5) at off-campus data center
Redundant FC data paths to T2000 file server running SAM-QFS
Secondary A / tertiary U tape archive on StorageTek SL500 (LTO-3) FC-attached to primary on-campus T2000
Tertiary A / quaternary U tape archive on LTO-3 media at off-campus managed storage facility
Disk archives are UFS file systems containing Tar files; even with the loss of the SAM infrastructure they remain fully recoverable (if slowly) with standard Unix / Linux tools
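Because the disk archives hold ordinary Tar files, any tool that reads the Tar format can retrieve assets without SAM. The sketch below illustrates that point using Python's standard tarfile module rather than the tar command itself. The function names list_assets and recover_asset are hypothetical, and the slides do not describe how SAM names or lays out its archive files, so any path passed in is an assumption.

```python
import tarfile
from pathlib import Path


def list_assets(tar_path):
    """Return the names of the regular files stored in a Tar archive."""
    with tarfile.open(tar_path, "r") as tar:
        return [m.name for m in tar.getmembers() if m.isfile()]


def recover_asset(tar_path, member_name, dest_dir):
    """Copy one file out of a Tar archive into dest_dir and return its path."""
    dest = Path(dest_dir) / Path(member_name).name
    with tarfile.open(tar_path, "r") as tar:
        # extractfile() raises KeyError for an unknown member and returns
        # None for members that are not regular files (e.g. directories).
        source = tar.extractfile(member_name)
        if source is None:
            raise ValueError(f"{member_name} is not a regular file")
        with source, open(dest, "wb") as out:
            # Stream in 1 MiB chunks so large assets need not fit in memory.
            while chunk := source.read(1 << 20):
                out.write(chunk)
    return dest
```

Recovery of a whole archive would repeat this for every name returned by list_assets, which is why the process is complete but slow.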
8
Storage virtualization
SAM-QFS reader / writer on primary on-campus T2000 file server
SAM-QFS reader on fail-over on-campus / off-campus T2000 file servers
All U and A assets written to QFS cache on CX3-40
Immediate creation of all UFS disk and LTO-3 tape archive copies
Immediate release from cache with “stage never”
SAM manages all copies of all assets; externally each asset appears as a single file in an NFS-mountable file system
Application access requests are initiated by NFS reads and are fulfilled directly from primary disk archive copy without staging to cache
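The single-file view over several physical copies can be pictured with a toy model. This sketch does not reproduce SAM-QFS logic: the names ArchiveCopy and resolve_read are hypothetical, and falling back to the next copy in copy order when the primary is unavailable is an assumption. The slide states only that reads are fulfilled from the primary disk archive copy.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ArchiveCopy:
    number: int            # copy order: 1 = primary, 2 = secondary, ...
    medium: str            # "disk" or "tape"
    available: bool = True


def resolve_read(copies) -> Optional[ArchiveCopy]:
    """Pick the lowest-numbered available copy to serve a read.

    Returns None when no copy is available.
    """
    candidates = [c for c in copies if c.available]
    return min(candidates, key=lambda c: c.number, default=None)


if __name__ == "__main__":
    copies = [
        ArchiveCopy(1, "disk"),
        ArchiveCopy(2, "disk"),
        ArchiveCopy(3, "tape"),
    ]
    print(resolve_read(copies))
    copies[0].available = False  # simulate loss of the primary disk archive
    print(resolve_read(copies))
```

The caller sees one logical asset throughout; which physical copy answers the read is a detail hidden behind the virtualization layer.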
9
Issues
Disk vs tape
LTO-3 vs LTO-4
Tape archive media pooling
All hardware / software installed; currently engaged in configuration and preliminary unit / integration testing
Need to establish benchmarks for system performance
Planning for migration from existing storage solution
Automated data classification
Response to an anticipated escalating rate of asset acquisition
– Google mass digitization
– Web archiving
– Audio / video content
– Scientific data sets