Chronopolis: Preserving Our Digital Heritage David Minor UC San Diego San Diego Supercomputer Center.

Presentation transcript:

Chronopolis: Preserving Our Digital Heritage David Minor UC San Diego San Diego Supercomputer Center

What is Chronopolis?
A digital preservation network developed by a national consortium, with initial funding from the Library of Congress National Digital Information Infrastructure and Preservation Program (NDIIPP).
Chronopolis partners are:
– San Diego Supercomputer Center (SDSC) and the UC San Diego (UCSD) Libraries
– University of Maryland Institute for Advanced Computer Studies (UMIACS)
– National Center for Atmospheric Research (NCAR) in Boulder, Colorado

Chronopolis Fast Facts
Digital preservation environment using a data grid framework
Designed to leverage capabilities at multiple institutions
Emphasizes heterogeneous and redundant data storage systems
Has a current storage capacity of 150 TB (50 TB at each of 3 nodes)
Has geographically distributed copies of all data
Includes detailed monitoring and monthly auditing of all data

Institutional Roles
All partners provide:
– Storage, network support
– Complete copy of all data
– SRB support
UCSD Libraries:
– Metadata expertise
SDSC:
– Project management
– Finances, contracts, etc.
UMIACS:
– Preservation tool development
– Storage technology testing
NCAR:
– Data portal development
http://chronopolis.sdsc.edu UCSD/SDSC/UMIACS/NCAR

Data Providers
California Digital Library
– 12 TB of data
– Crawls of political and government web sites
– ARC files, uniform size
– BagIt protocol for data transfer
Inter-university Consortium for Political and Social Research (ICPSR)
– 10 TB of data
– 40+ years of social science research
– Millions of files
– Already using SRB

Data Providers (continued)
North Carolina State University Libraries
– 6 TB of data
– State and local geospatial data
– BagIt protocol for data transfer
Scripps Institution of Oceanography
– 1 TB of data
– 50 years of data from SIO research cruises
– Already using SRB

Core Chronopolis Tools
Storage Resource Broker (SRB)
BagIt
SRB Replication Monitor
Auditing Control Environment (ACE)
Chronopolis Web Portal

Storage Resource Broker
The underlying infrastructure of Chronopolis
Each site is a separate zone with its own MCAT (metadata catalog) and management
Data is replicated at each zone
Will be moving to iRODS in the next few months

BagIt
BagIt is a hierarchical file packaging format for the exchange of generalized digital content.
There is no software to install
Consists of a base directory with a manifest file and a subdirectory with content
Manifest file has a row for each content file with:
– Full path in content directory
– A checksum for the file
Holey Bags
Have an additional 'fetch.txt' file in the base directory and an empty content directory
URLs for each content file are listed in the fetch.txt file
Can reduce transfer time by fetching content in parallel
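The manifest check described on this slide can be sketched in a few lines. This is a minimal illustration, not the tooling Chronopolis used: the function names `file_digest` and `verify_bag` are hypothetical, and the sketch assumes the common BagIt layout in which the manifest is named `manifest-<algorithm>.txt` and each row holds a checksum followed by a path relative to the base directory.

```python
import hashlib
from pathlib import Path


def file_digest(path, algorithm="sha256", chunk_size=1 << 20):
    """Hash a file in chunks so large content files need not fit in memory."""
    h = hashlib.new(algorithm)
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def verify_bag(bag_dir, algorithm="sha256"):
    """Return {path: problem} for every manifest row that fails to verify.

    Assumes a manifest named manifest-<algorithm>.txt in the base directory
    whose rows look like "<checksum> <path relative to the base directory>".
    """
    bag = Path(bag_dir)
    problems = {}
    manifest = bag / f"manifest-{algorithm}.txt"
    for line in manifest.read_text(encoding="utf-8").splitlines():
        if not line.strip():
            continue  # skip blank rows
        expected, rel_path = line.split(None, 1)
        rel_path = rel_path.strip()
        target = bag / rel_path
        if not target.is_file():
            problems[rel_path] = "missing"
        elif file_digest(target, algorithm) != expected.lower():
            problems[rel_path] = "checksum mismatch"
    return problems
```

An empty result means every listed file is present and matches its checksum. For a holey bag, the same check would be run only after the files listed in fetch.txt have been retrieved.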

SRB Replication Monitor
Product of UMIACS
A webapp that watches registered directories and ensures that copies exist at designated mirrors.
The monitor stores enough information to know whether files have been added to or removed from the master site, and when each file was last seen.
Any action that the webapp takes on files is logged.
The monitor does NOT do any type of integrity checking; that is the responsibility of other components (e.g., ACE).
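The bookkeeping this slide describes can be illustrated with a small in-memory sketch. It is not the UMIACS implementation: the class name `ReplicationMonitor`, its `scan` method, and the plain lists of file names standing in for SRB directory listings are all assumptions made for illustration. Like the real monitor, it only compares inventories and does no integrity checking.

```python
from datetime import datetime, timezone


class ReplicationMonitor:
    """Hypothetical sketch: track a master inventory and the mirrors' gaps."""

    def __init__(self, mirrors):
        self.mirrors = list(mirrors)
        self.known = set()    # files present on the master at the last scan
        self.last_seen = {}   # path -> time the file was last seen on the master
        self.log = []         # (time, action, mirror, path) for every action

    def scan(self, master_files, mirror_files, now=None):
        """Compare the master listing with each mirror listing."""
        now = now or datetime.now(timezone.utc)
        master = set(master_files)
        added = sorted(master - self.known)
        removed = sorted(self.known - master)
        for path in master:
            self.last_seen[path] = now
        self.known = master

        to_copy = {}
        for mirror in self.mirrors:
            missing = sorted(master - set(mirror_files.get(mirror, ())))
            to_copy[mirror] = missing
            for path in missing:
                # A real monitor would start the copy here; this sketch only logs it.
                self.log.append((now, "replicate", mirror, path))
        return {"added": added, "removed": removed, "to_copy": to_copy}
```

Each call to `scan` reports which files appeared or disappeared on the master since the previous scan and which files each mirror still lacks, while `last_seen` keeps the timestamp for files that have since been removed.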

Replication Process
[Diagram: Replication Monitor]

Auditing Control Environment (ACE)
Product of UMIACS
Software to protect the integrity of digital assets in the long term
Underpinnings are based on rigorous cryptographic techniques
Scalable, cost-effective, can interoperate with any archiving architecture

ACE – Overview
[Diagram labels: Client, object, Hash(obj), Integrity Token, ACE-AM (Audit Manager), ACE-IMS (Integrity Management Service), 3rd Party Auditor]

ACE Audit
Can audit millions of files and TBs of data
Two types of audit:
– File audit: checks files in registered directories against stored hashes to ensure files have not been corrupted
– Token audit: checks the stored hashes against a remote Integrity Management Server to ensure nobody has tampered with the stored hashes
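The two audit types can be contrasted in a short sketch. It is a simplification under stated assumptions, not ACE's code: the function names are hypothetical, SHA-256 is an assumed hash, objects are held in memory as bytes, and the remote Integrity Management Server is stood in for by a plain dictionary of reference hashes.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Hex digest used for both audits (the choice of SHA-256 is an assumption)."""
    return hashlib.sha256(data).hexdigest()


def file_audit(objects, stored_hashes):
    """File audit: report objects that are missing or no longer match their stored hash.

    objects maps name -> current bytes; stored_hashes maps name -> hex digest.
    """
    return sorted(
        name
        for name, expected in stored_hashes.items()
        if name not in objects or sha256_hex(objects[name]) != expected
    )


def token_audit(stored_hashes, ims_hashes):
    """Token audit: report stored hashes that disagree with the remote reference.

    ims_hashes is a stand-in for what the Integrity Management Server holds.
    """
    return sorted(
        name
        for name, local in stored_hashes.items()
        if ims_hashes.get(name) != local
    )
```

The split matters because a file audit alone cannot detect an attacker who alters a file and its locally stored hash together; the token audit catches that case by comparing against a copy held elsewhere.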

ACE Audit
[Diagram labels: Object, Integrity Token, Cryptographic Summary Information, Witness]
1. Each digital object is audited locally using the integrity token, according to the policy set by the local manager.
2. The integrity management system periodically audits the integrity tokens according to its policies.
3. Cryptographic summaries are audited as necessary using the published witness values.
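The three audit levels form a chain in which each level vouches for the one below it. The slide does not say how tokens, summaries, or witness values are computed, so the sketch below makes loud assumptions: a token is taken to be a hash of the object, a summary a hash over all tokens issued in one round, and a witness a hash over a set of summaries. The names `digest` and `audit_chain` are hypothetical.

```python
import hashlib


def digest(*parts: bytes) -> bytes:
    """SHA-256 over length-prefixed parts, so part boundaries are unambiguous."""
    h = hashlib.sha256()
    for part in parts:
        h.update(len(part).to_bytes(8, "big"))
        h.update(part)
    return h.digest()


def audit_chain(obj, token, round_tokens, summary, summaries, witness):
    """Run the three checks and return (object_ok, token_ok, summary_ok).

    Assumed model, not ACE's actual construction:
      token   = digest(obj)
      summary = digest(*round_tokens)   # all tokens issued in one round
      witness = digest(*summaries)      # published value covering the summaries
    """
    # Step 1: the object is audited locally against its integrity token.
    object_ok = digest(obj) == token
    # Step 2: the token is audited against the round's cryptographic summary.
    token_ok = token in round_tokens and digest(*round_tokens) == summary
    # Step 3: the summary is audited against the published witness value.
    summary_ok = summary in summaries and digest(*summaries) == witness
    return object_ok, token_ok, summary_ok
```

Under this model, tampering at any level shows up as a failed check at that level, provided the witness values are published somewhere the archive cannot quietly rewrite them.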

Web Portal
Designed to give data providers an in-depth look at their holdings
Shows where data is in all locations
Unifies information from SRB, ACE and the Replication Monitor

Chronopolis Metadata
Working with team from UCSD Libraries
What technical metadata is system tracking?
What descriptive metadata is present?
What are the significant events?

[Diagram labels: DP, Data, Manifest, MCAT, ACE, Replication Monitor, Node 1, Node 2, Node 3]
ET-1 Service Level Agreement
ET-2 Acquisition Transfer
ET-3 Acquisition Validation
ET-4 Acquisition Registration to SRB
ET-5 Acquisition Registration into ACE
ET-6 Inter-Node Inventory Check
ET-7 Acquisition Replication
ET-8 File Integrity Check

Future directions
Update auditing procedures
Updated portal
Automation of collection ingest
New collections and storage nodes
Fully-fledged business model
TRAC certification
