Published by David Riley. Modified over 9 years ago.
1
Chronopolis: Preserving Our Digital Heritage
David Minor
UC San Diego
San Diego Supercomputer Center
2
What is Chronopolis?
A digital preservation network developed by a national consortium, with initial funding from the Library of Congress / National Digital Information Infrastructure and Preservation Program (NDIIPP).
Chronopolis partners are:
– San Diego Supercomputer Center (SDSC) and the UC San Diego (UCSD) Libraries
– University of Maryland Institute for Advanced Computer Studies (UMIACS)
– National Center for Atmospheric Research (NCAR) in Boulder, Colorado
http://chronopolis.sdsc.edu UCSD/SDSC/UMIACS/NCAR
3
Chronopolis Fast Facts
Digital preservation environment using a data grid framework
Designed to leverage capabilities at multiple institutions
Emphasizes heterogeneous and redundant data storage systems
Has a current storage capacity of 150 TB (50 TB at each of 3 nodes)
Has geographically distributed copies of all data
Includes detailed monitoring and monthly auditing of all data
4
Institutional Roles
All partners provide:
– Storage, network support
– Complete copy of all data
– SRB support
UCSD Libraries:
– Metadata expertise
SDSC:
– Project management
– Finances, contracts, etc.
UMIACS:
– Preservation tool development
– Storage technology testing
NCAR:
– Data portal development
5
Data Providers
California Digital Library
– 12 TB of data
– Crawls of political and government web sites
– ARC files, uniform size
– BagIt protocol for data transfer
Inter-university Consortium for Political and Social Research (ICPSR)
– 10 TB of data
– 40+ years of social science research
– Millions of files
– Already using SRB
6
Data Providers
North Carolina State University Libraries
– 6 TB of data
– State and local geospatial data
– BagIt protocol for data transfer
Scripps Institution of Oceanography
– 1 TB of data
– 50 years of data from SIO research cruises
– Already using SRB
8
Core Chronopolis Tools
Storage Resource Broker (SRB)
BagIt
SRB Replication Monitor
Auditing Control Environment (ACE)
Chronopolis Web Portal
9
Storage Resource Broker
The underlying infrastructure of Chronopolis
Each site is a separate zone with its own MCAT and management
Data is replicated at each zone
Will be moving to iRODS in the next few months
10
BagIt
BagIt is a hierarchical file packaging format for the exchange of generalized digital content.
There is no software to install
Consists of a base directory with a manifest file and a subdirectory with content
Manifest file has a row for each content file with:
– Full path in content directory
– A checksum for the file
Holey Bags
– Have an additional ‘fetch.txt’ file in the base directory and an empty content directory
– URLs for each content file are listed in the fetch.txt file
– Can reduce transfer time by fetching content in parallel
http://www.digitalpreservation.gov/library/resources/tools/docs/bagitspec.pdf
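The manifest check described on this slide can be sketched in a few lines of Python. This is a minimal illustration, not Chronopolis code: it assumes a manifest named `manifest-sha256.txt` whose rows read `<checksum> <path>`, with content under a `data/` subdirectory. The helper names `file_checksum`, `verify_bag` and `make_demo_bag` are hypothetical.

```python
import hashlib
from pathlib import Path


def file_checksum(path: Path, algorithm: str = "sha256") -> str:
    """Hash a file in chunks so large content files do not fill memory."""
    digest = hashlib.new(algorithm)
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_bag(bag_dir: Path, algorithm: str = "sha256") -> dict:
    """Check every manifest row; return {path: 'ok' | 'missing' | 'corrupt'}."""
    manifest = bag_dir / f"manifest-{algorithm}.txt"  # assumed manifest name
    results = {}
    for line in manifest.read_text(encoding="utf-8").splitlines():
        if not line.strip():
            continue
        # Assumed row layout: checksum, whitespace, path relative to the bag.
        expected, rel_path = line.split(maxsplit=1)
        target = bag_dir / rel_path
        if not target.is_file():
            results[rel_path] = "missing"
        elif file_checksum(target, algorithm) != expected.lower():
            results[rel_path] = "corrupt"
        else:
            results[rel_path] = "ok"
    return results


def make_demo_bag(root: Path) -> Path:
    """Build a tiny two-file bag so the check can be tried end to end."""
    bag = root / "demo_bag"
    (bag / "data").mkdir(parents=True)
    (bag / "data" / "a.txt").write_bytes(b"hello")
    (bag / "data" / "b.txt").write_bytes(b"world")
    rows = [
        f"{file_checksum(bag / 'data' / name)}  data/{name}"
        for name in ("a.txt", "b.txt")
    ]
    (bag / "manifest-sha256.txt").write_text("\n".join(rows) + "\n", encoding="utf-8")
    return bag
```

Calling `verify_bag(make_demo_bag(Path(some_temp_dir)))` reports every file as `ok`; editing or deleting a content file afterwards changes its entry to `corrupt` or `missing`. A holey bag would need one extra step before this check: fetching each URL listed in fetch.txt into the content directory.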
11
BagIt
12
SRB Replication Monitor
Product of UMIACS
A webapp that watches registered directories and ensures that copies exist at designated mirrors.
The monitor stores enough information to know whether files have been added to or removed from the master site, and when each file was last seen.
Any action that the webapp takes on files is logged.
The monitor does NOT do any type of integrity checking; this is the responsibility of other components (e.g., ACE).
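The bookkeeping described here can be illustrated with a small in-memory sketch. It is not the UMIACS webapp: the class `ReplicationMonitor`, its fields and the `scan` method are hypothetical names, and directory listings are stood in for by plain sets of paths. As on the slide, it compares presence only and does no integrity checking.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class ReplicationMonitor:
    """Hypothetical sketch: tracks file presence only, never file contents."""

    previous_master: set = field(default_factory=set)  # listing from the last scan
    last_seen: dict = field(default_factory=dict)      # path -> time last seen on master
    log: list = field(default_factory=list)            # (time, action, path) tuples

    def scan(self, master_files: set, mirror_files: set, now=None) -> dict:
        now = now or datetime.now(timezone.utc)
        added = master_files - self.previous_master
        removed = self.previous_master - master_files
        to_copy = master_files - mirror_files

        # Record when each file was last seen; removed files keep their old time.
        for path in master_files:
            self.last_seen[path] = now

        # Every action the monitor would take is logged.
        for path in sorted(to_copy):
            self.log.append((now, "replicate", path))
        for path in sorted(removed):
            self.log.append((now, "removed-from-master", path))

        self.previous_master = set(master_files)
        return {
            "added": sorted(added),
            "removed": sorted(removed),
            "to_copy": sorted(to_copy),
        }
```

For example, a first `scan({"a", "b"}, {"a"})` reports both files as added and schedules `b` for replication; a later `scan({"a", "c"}, {"a", "b"})` reports `c` as added and `b` as removed, while `last_seen["b"]` still holds the time of the earlier scan.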
13
Replication Process
[Diagram: Replication Monitor]
16
Auditing Control Environment (ACE)
Product of UMIACS
Software to protect the integrity of digital assets in the long term
Underpinnings are based on rigorous cryptographic techniques
Scalable, cost-effective, can interoperate with any archiving architecture
17
ACE – Overview
[Diagram labels: Client; object; Hash (obj); Integrity Token; ACE-AM (Audit Manager); ACE-IMS (Integrity Management Service); 3rd Party Auditor]
18
ACE Audit
Can audit millions of files and TBs of data
Two types of audit:
– File audit: checks files in registered directories against stored hashes to ensure files have not been corrupted
– Token audit: checks the stored hashes against a remote Integrity Management Server to ensure nobody has tampered with the stored hashes
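The difference between the two audits can be shown with a simplified Python sketch. It is not ACE code: files, stored hashes and the remote Integrity Management Server are all stood in for by plain dictionaries, and the function names are hypothetical. Real ACE tokens carry more than a bare hash.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def file_audit(files: dict[str, bytes], stored_hashes: dict[str, str]) -> list[str]:
    """Return names whose current content is missing or no longer matches the stored hash."""
    return sorted(
        name
        for name, expected in stored_hashes.items()
        if name not in files or sha256_hex(files[name]) != expected
    )


def token_audit(stored_hashes: dict[str, str], ims_hashes: dict[str, str]) -> list[str]:
    """Return names whose locally stored hash disagrees with the remote record."""
    return sorted(
        name
        for name, local in stored_hashes.items()
        if ims_hashes.get(name) != local
    )


# Registration: hashes are stored locally and (in this sketch) copied to the remote service.
files = {"report.pdf": b"original bytes"}
stored_hashes = {name: sha256_hex(data) for name, data in files.items()}
ims_hashes = dict(stored_hashes)

# An attacker rewrites the file AND the local hash so the two still agree.
files["report.pdf"] = b"forged bytes"
stored_hashes["report.pdf"] = sha256_hex(files["report.pdf"])

file_problems = file_audit(files, stored_hashes)         # the file audit alone sees nothing wrong
token_problems = token_audit(stored_hashes, ims_hashes)  # the token audit flags the altered hash
```

The scenario at the bottom is why both audits exist: a file audit catches corruption of content, but only the comparison against an independently held record catches tampering with the stored hashes themselves.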
19
ACE Audit
[Diagram labels: Object; Integrity Token; Cryptographic Summary Information; Witness]
1. Each digital object is audited locally using the integrity token, according to the policy set by the local manager.
2. The integrity management system periodically audits the integrity tokens according to its policies.
3. Cryptographic summaries are audited as necessary using the published witness values.
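One way to picture the three tiers is a hash tree. The sketch below is an assumption for illustration, since the slides do not say how tokens, summaries and witnesses are built: it treats the cryptographic summary as the root of a Merkle tree over object hashes, the integrity token as the sibling path from one object to that root, and the witness as a single hash over several summaries. All function names are hypothetical.

```python
import hashlib


def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def build_levels(leaves: list[bytes]) -> list[list[bytes]]:
    """Build the tree bottom-up; the last level holds the single root (the summary)."""
    levels = [leaves]
    while len(levels[-1]) > 1:
        current = levels[-1]
        if len(current) % 2:
            current = current + [current[-1]]  # duplicate the last node on odd levels
        levels.append([h(current[i] + current[i + 1]) for i in range(0, len(current), 2)])
    return levels


def make_token(levels: list[list[bytes]], index: int) -> list[tuple[bytes, bool]]:
    """Collect (sibling_hash, sibling_is_left) pairs on the path from a leaf to the root."""
    proof = []
    for level in levels[:-1]:
        padded = level + [level[-1]] if len(level) % 2 else level
        sibling = index ^ 1
        proof.append((padded[sibling], sibling < index))
        index //= 2
    return proof


def summary_from_token(leaf: bytes, proof: list[tuple[bytes, bool]]) -> bytes:
    """Step 1: recompute the summary from an object's hash and its token."""
    node = leaf
    for sibling, sibling_is_left in proof:
        node = h(sibling + node) if sibling_is_left else h(node + sibling)
    return node


def make_witness(summaries: list[bytes]) -> bytes:
    """Step 3 (simplified): one published value that commits to several summaries."""
    return h(b"".join(summaries))


objects = [b"obj-a", b"obj-b", b"obj-c"]
levels = build_levels([h(obj) for obj in objects])
summary = levels[-1][0]
witness = make_witness([summary, h(b"summary of another round")])
```

Under these assumptions, auditing an object means recomputing the summary from its current hash and its token and comparing the result with the recorded summary; auditing a summary means recomputing the witness and comparing it with the published value.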
23
Web Portal
Designed to give data providers an in-depth look at their holdings
Shows where data resides across all locations
Unifies information from SRB, ACE and the Replication Monitor
27
Chronopolis Metadata
Working with team from UCSD Libraries
What technical metadata is the system tracking?
What descriptive metadata is present?
What are the significant events?
28
[Diagram labels: DP; Data; Data Manifest; MCAT; ACE; Replication Monitor; Node 1; Node 2; Node 3]
ET-1 Service Level Agreement
ET-2 Acquisition Transfer
ET-3 Acquisition Validation
ET-4 Acquisition Registration to SRB
ET-5 Acquisition Registration into ACE
ET-6 Inter-Node Inventory Check
ET-7 Acquisition Replication
ET-8 File Integrity Check
29
Future directions
Update auditing procedures
Updated portal
Automation of collection ingest
New collections and storage nodes
Fully-fledged business model
TRAC certification
30
http://chronopolis.sdsc.edu
minor@sdsc.edu