Mairéad Martin, Penn State University
Common Solutions Group (CSG) Storage Workshop, May 2010

Designing and implementing storage architectures and systems to support data curation and preservation needs
  ◦ What does this entail?
  ◦ Who’s thinking about this?
  ◦ Who’s doing anything about this?

Digital preservation
  ◦ Managed activities to ensure long-term retention of, retrieval of, and access to data
Digital curation
  ◦ Maintaining, preserving, and enhancing data throughout its lifecycle
Archival storage
  ◦ Definition depends on whom you ask
Information Lifecycle Management (ILM)
  ◦ Storage industry term for the above
Object-based storage
  ◦ Data and its metadata stored together in a single “container”

eScience/eResearch data management needs
NSF requirement for data management plans
Compliance
  ◦ e-Discovery, FERPA, HIPAA, Sarbanes-Oxley
  ◦ Institutional record retention regulations and policies
Storage services for libraries, archives, and cultural heritage organizations
Greater efficiencies

Storage is cheap
Storage is smart
Stuff on the Internet is persistent
Digital is safer than analog
Storage providers are curators and preservation experts
Repositories take care of preservation
Metadata will take care of it
Libraries will take care of it
The Cloud will take care of it

New roles, responsibilities, collaborations, practices, and workflows
Intellectual capital requirements: determining and implementing digital preservation/curation policy
The bar for trust is rising
Is the Cloud antithetical to preservation?
Increased storage management requirements
Scaling issues with preservation requirements

We are more likely to meet these needs today at the system level: disaster recovery (DR) and business continuity (BC) practices, and tiered storage architectures
Immutable storage
Data integrity checking
  ◦ Mitigation of bit rot
  ◦ Auditing function
Mitigation of obsolescence
  ◦ File format migration
Deposit as important as retention
Need for storage management metadata
  ◦ Technical: file size, name, location, ACL, date, time, versioning
Biggest need: system independence

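The data integrity checking and auditing function listed above is often implemented as fixity checking. A checksum is recorded for every file at deposit, and the stored files are later re-hashed and compared against those values to detect bit rot or loss. The following is a minimal Python sketch. It assumes SHA-256 as the fixity algorithm and a plain dictionary as the manifest. The function names `sha256_of`, `build_manifest`, and `audit` are hypothetical and are not taken from any of the systems named in this talk.

```python
import hashlib
from pathlib import Path

# Hypothetical helper names; SHA-256 is an assumed choice of fixity algorithm.


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: Path) -> dict[str, str]:
    """Record a fixity value for every file under root (done at deposit)."""
    return {
        str(p.relative_to(root)): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def audit(root: Path, manifest: dict[str, str]) -> dict[str, list[str]]:
    """Re-hash stored files and compare them with the deposit-time manifest."""
    report: dict[str, list[str]] = {"ok": [], "altered": [], "missing": []}
    for rel, expected in sorted(manifest.items()):
        path = root / rel
        if not path.is_file():
            report["missing"].append(rel)  # file lost since deposit
        elif sha256_of(path) != expected:
            report["altered"].append(rel)  # content changed: possible bit rot
        else:
            report["ok"].append(rel)
    return report
```

The manifest itself must be stored and protected separately from the data it describes. Otherwise, a single failure could corrupt both the data and the values used to check it.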
iRODS (integrated Rule-Oriented Data System)
Storage Resource Broker (SRB)
Content Addressable Storage (CAS)
  ◦ Fixed-content storage; retrieval based on content rather than location
eXtensible Access Method (XAM)
  ◦ Emerging SNIA standard API for content-addressable storage objects

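The CAS idea can be illustrated with a toy in-memory store. In CAS, an object's address is derived from its content. As a result, identical content is stored once, and any change to the content yields a new address rather than overwriting the old object. The following is a minimal Python sketch that assumes SHA-256 as the content hash. The class and method names are hypothetical and do not reproduce the XAM API or any vendor's CAS interface.

```python
import hashlib


class ContentAddressableStore:
    """Hypothetical toy in-memory CAS, for illustration only (not XAM)."""

    def __init__(self) -> None:
        self._objects: dict[str, bytes] = {}

    def put(self, data: bytes) -> str:
        """Store data and return its address: the SHA-256 of the content."""
        address = hashlib.sha256(data).hexdigest()
        # Identical content maps to the same address, so it is stored once;
        # an existing object is never overwritten (fixed content).
        self._objects.setdefault(address, data)
        return address

    def get(self, address: str) -> bytes:
        """Retrieve by content address, re-verifying integrity on read."""
        data = self._objects[address]  # raises KeyError for unknown addresses
        if hashlib.sha256(data).hexdigest() != address:
            raise ValueError(f"integrity check failed for {address}")
        return data

    def __len__(self) -> int:
        return len(self._objects)
```

Because the address is a digest of the stored bytes, every retrieval doubles as an integrity check. This is one reason fixed-content storage suits preservation use cases.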
NSF DataNet Program
  ◦ Data Conservancy project: JHU-led, with 23 institutions, to create a curation, discovery, and preservation network
Chronopolis
  ◦ SDSC, UCSD, UMIACS, NCAR: federated data grid using SRB/iRODS
LOCKSS (Lots of Copies Keep Stuff Safe)
  ◦ Replication of licensed journals and other content
MetaArchive
  ◦ A private LOCKSS network
Internet Archive

National Digital Information Infrastructure and Preservation Program (NDIIPP)
  ◦ Library of Congress program “to develop a national strategy to collect, preserve and make available significant digital content via a preservation network of over 130 partners.”

California Digital Library
  ◦ Curation Micro-services
DuraSpace
  ◦ DuraCloud project to implement a preservation-oriented cloud storage service
HathiTrust
  ◦ Repository and storage infrastructure initiated for the CIC Google Books project
Sun Preservation and Archiving Special Interest Group (PASIG)
Storage Networking Industry Association (SNIA)

Content Stewardship Program: a strategic collaboration between University Libraries and Information Technology Services (ITS)
Goal: a suite of services to support the lifecycle of the digital object (creation, discovery, access, storage, preservation, and archiving)
Hired a Digital Library Architect and a Digital Collections Curator
Governance in place

Anchor projects/activities:
  ◦ Storage and preservation strategy development
      ◦ Prototyped the XAM standard for archival storage
  ◦ Institutional record repository
  ◦ Research data prototype
  ◦ Best practices for data management
  ◦ Electronic theses and dissertations (ETD) platform replacement
Sponsoring a curation technology workshop in August
LOCKSS member; recently joined MetaArchive
Exploring the California Digital Library’s curation micro-services
Applying service management principles and processes to all of the above

What are CSG member institutions doing in this space?