Chronopolis: Preserving Our Digital Heritage David Minor UC San Diego San Diego Supercomputer Center.

1 Chronopolis: Preserving Our Digital Heritage
David Minor, UC San Diego, San Diego Supercomputer Center

2 What is Chronopolis?
A digital preservation network developed by a national consortium, with initial funding from the Library of Congress / National Digital Information Infrastructure and Preservation Program (NDIIPP).
Chronopolis partners are:
– San Diego Supercomputer Center (SDSC) and the UC San Diego (UCSD) Libraries
– University of Maryland Institute for Advanced Computer Studies (UMIACS)
– National Center for Atmospheric Research (NCAR) in Boulder, Colorado
http://chronopolis.sdsc.edu UCSD/SDSC/UMIACS/NCAR

3 Chronopolis Fast Facts
Digital preservation environment using a data grid framework
Designed to leverage capabilities at multiple institutions
Emphasizes heterogeneous and redundant data storage systems
Has a current storage capacity of 150 TB (50 TB at each of 3 nodes)
Has geographically distributed copies of all data
Includes detailed monitoring and monthly auditing of all data

4 Institutional Roles
All partners provide:
– Storage, network support
– Complete copy of all data
– SRB support
UCSD Libraries:
– Metadata expertise
SDSC:
– Project management
– Finances, contracts, etc.
UMIACS:
– Preservation tool development
– Storage technology testing
NCAR:
– Data portal development

5 Data Providers
California Digital Library
– 12 TB of data
– Crawls of political and government web sites
– ARC files, uniform size
– BagIt protocol for data transfer
Inter-university Consortium for Political and Social Research (ICPSR)
– 10 TB of data
– 40+ years of social science research
– Millions of files
– Already using SRB

6 Data Providers
North Carolina State University Libraries
– 6 TB of data
– State and local geospatial data
– BagIt protocol for data transfer
Scripps Institution of Oceanography
– 1 TB of data
– 50 years of data from SIO research cruises
– Already using SRB

8 Core Chronopolis Tools
Storage Resource Broker (SRB)
BagIt
SRB Replication Monitor
Auditing Control Environment (ACE)
Chronopolis Web Portal

9 Storage Resource Broker
The underlying infrastructure of Chronopolis
Each site is a separate zone with its own MCAT and management
Data is replicated at each zone
Will be moving to iRODS in the next few months

10 BagIt
BagIt is a hierarchical file packaging format for the exchange of generalized digital content.
There is no software to install.
A bag consists of a base directory with a manifest file and a subdirectory with content.
The manifest file has a row for each content file with:
– Full path in the content directory
– A checksum for the file
Holey Bags
– Have an additional ‘fetch.txt’ file in the base directory and an empty content directory
– URLs for each content file are listed in the fetch.txt file
– Can reduce transfer time by fetching content in parallel
http://www.digitalpreservation.gov/library/resources/tools/docs/bagitspec.pdf
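The manifest idea on this slide can be sketched in a few lines. This is a minimal illustration using only the Python standard library, not the tooling Chronopolis used. The `data` directory and `manifest-sha256.txt` file name follow BagIt conventions; the function names are hypothetical, and the choice of SHA-256 is an assumption, since the slide does not name a checksum algorithm.

```python
import hashlib
from pathlib import Path
from typing import List


def file_sha256(path: Path) -> str:
    """Return the hex SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def write_manifest(bag_dir: Path) -> Path:
    """Write one 'checksum  path' row per file under bag_dir/data."""
    rows = []
    for path in sorted((bag_dir / "data").rglob("*")):
        if path.is_file():
            relative = path.relative_to(bag_dir).as_posix()
            rows.append(f"{file_sha256(path)}  {relative}")
    manifest = bag_dir / "manifest-sha256.txt"
    manifest.write_text("\n".join(rows) + "\n", encoding="utf-8")
    return manifest


def verify_manifest(bag_dir: Path) -> List[str]:
    """Return the listed paths that are missing or whose checksum changed."""
    failures = []
    text = (bag_dir / "manifest-sha256.txt").read_text(encoding="utf-8")
    for line in text.splitlines():
        if not line.strip():
            continue
        checksum, relative = line.split(maxsplit=1)
        target = bag_dir / relative
        if not target.is_file() or file_sha256(target) != checksum:
            failures.append(relative)
    return failures
```

A receiver would run `verify_manifest` after transfer; an empty list means every listed file arrived with the expected checksum.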

11 BagIt
[Diagram; not captured in the transcript]
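The holey-bag idea (content listed by URL in fetch.txt and retrieved in parallel) might look roughly like the sketch below. It assumes the `URL LENGTH FILENAME` row layout from the BagIt specification. The function names, the injectable `download` callable, and the thread count are hypothetical choices for illustration.

```python
import urllib.request
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path
from typing import Callable, List, Tuple


def parse_fetch(text: str) -> List[Tuple[str, str]]:
    """Parse 'URL LENGTH FILENAME' rows into (url, filename) pairs."""
    entries = []
    for line in text.splitlines():
        if line.strip():
            url, _length, filename = line.split(maxsplit=2)
            entries.append((url, filename))
    return entries


def http_get(url: str) -> bytes:
    """Default downloader: read the whole response body into memory."""
    with urllib.request.urlopen(url) as response:
        return response.read()


def fill_holey_bag(bag_dir: Path,
                   download: Callable[[str], bytes] = http_get,
                   workers: int = 4) -> List[str]:
    """Fetch every file listed in fetch.txt in parallel; return the names written."""
    entries = parse_fetch((bag_dir / "fetch.txt").read_text(encoding="utf-8"))

    def fetch_one(entry: Tuple[str, str]) -> str:
        url, filename = entry
        target = bag_dir / filename
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_bytes(download(url))
        return filename

    # pool.map runs the downloads concurrently and keeps the input order.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(fetch_one, entries))
```

A production version would also check that each filename stays inside the bag directory and would verify the fetched files against the manifest afterwards.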

12 SRB Replication Monitor
Product of UMIACS
A webapp that watches registered directories and ensures that copies exist at designated mirrors.
The monitor stores enough information to know whether files have been added to or removed from the master site and when each file was last seen.
Any action that the webapp takes on files is logged.
The monitor does NOT do any type of integrity checking; this is the responsibility of other components (e.g., ACE).
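The bookkeeping described on this slide can be sketched as a single scan function. This is not the UMIACS webapp: it works on plain sets of path names rather than SRB collections, and `MonitorState` and `scan` are hypothetical names. Like the monitor, it does no integrity checking.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Dict, List, Set


@dataclass
class MonitorState:
    """What this hypothetical monitor remembers between scans."""
    present: Set[str] = field(default_factory=set)        # master files at last scan
    last_seen: Dict[str, datetime] = field(default_factory=dict)
    log: List[str] = field(default_factory=list)


def scan(state: MonitorState, master: Set[str], mirror: Set[str],
         now: datetime) -> Set[str]:
    """Compare a master listing with a mirror listing.

    Logs additions, removals, and pending copies, and returns the paths
    that exist on the master but not yet on the mirror.
    """
    stamp = now.isoformat()
    for path in sorted(master - state.present):
        state.log.append(f"{stamp} added on master: {path}")
    for path in sorted(state.present - master):
        state.log.append(f"{stamp} removed from master: {path}")
    for path in master:
        state.last_seen[path] = now  # record when each file was last seen
    state.present = set(master)
    missing = master - mirror
    for path in sorted(missing):
        state.log.append(f"{stamp} copy to mirror: {path}")
    return missing
```

Keeping `last_seen` for files that have since disappeared is what lets the monitor report when a removed file was last observed on the master.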

13 Replication Process
[Diagram: Replication Monitor]

16 Auditing Control Environment (ACE)
Product of UMIACS
Software to protect the integrity of digital assets in the long term
Underpinnings are based on rigorous cryptographic techniques
Scalable, cost-effective, can interoperate with any archiving architecture

17 ACE – Overview
[Diagram labels: Client, object, Hash(obj), Integrity Token, ACE-AM (Audit Manager), ACE-IMS (Integrity Management Service), 3rd Party Auditor]

18 ACE Audit
Can audit millions of files and TBs of data
Two types of audit:
– File audit: checks files in registered directories against stored hashes to ensure files have not been corrupted
– Token audit: checks the stored hashes against a remote Integrity Management Server to ensure nobody has tampered with the stored hashes
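The difference between the two audit types can be shown with a small sketch. It is a simplification under stated assumptions: files are in-memory byte strings, SHA-256 is assumed as the hash, and the remote Integrity Management Server is stood in for by a plain dictionary (`remote_hashes`). The function names are hypothetical and are not ACE's API.

```python
import hashlib
from typing import Dict, List


def sha256_hex(data: bytes) -> str:
    """Hex SHA-256 digest of a byte string."""
    return hashlib.sha256(data).hexdigest()


def file_audit(files: Dict[str, bytes], stored_hashes: Dict[str, str]) -> List[str]:
    """File audit: names whose content is missing or no longer matches the stored hash."""
    return sorted(
        name for name, expected in stored_hashes.items()
        if name not in files or sha256_hex(files[name]) != expected
    )


def token_audit(stored_hashes: Dict[str, str],
                remote_hashes: Dict[str, str]) -> List[str]:
    """Token audit: names whose locally stored hash disagrees with the remote record."""
    return sorted(
        name for name, local in stored_hashes.items()
        if remote_hashes.get(name) != local
    )
```

The file audit catches corrupted content; the token audit catches a stored hash that was altered to hide such corruption, which is why the two are run against different sources of truth.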

19 ACE Audit
[Diagram labels: Object, Integrity Token, Cryptographic Summary Information, Witness]
1. Each digital object is audited locally using the integrity token, according to the policy set by the local manager.
2. The integrity management system periodically audits the integrity tokens according to its policies.
3. Cryptographic summaries are audited as necessary using the published witness values.
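The slide does not say how tokens, summaries, and witness values are built. One common way to get this three-level structure is a Merkle hash tree, and the sketch below assumes that construction purely for illustration: the tree root plays the role of the cryptographic summary, each object's sibling path plays the role of its integrity token, and the witness is simplified to a hash over a list of summaries. All names are hypothetical.

```python
import hashlib
from typing import List, Tuple

# A proof is a list of (side, sibling_hash); side is where the sibling sits: "L" or "R".
Proof = List[Tuple[str, bytes]]


def h(data: bytes) -> bytes:
    """SHA-256 digest (assumed hash function)."""
    return hashlib.sha256(data).digest()


def build_round(leaves: List[bytes]) -> Tuple[bytes, List[Proof]]:
    """Aggregate object hashes into one summary plus one proof per object."""
    if not leaves:
        raise ValueError("at least one hash is required")
    level = list(leaves)
    # groups[i] lists the leaf indexes that sit beneath node i of the current level.
    groups = [[i] for i in range(len(leaves))]
    proofs: List[Proof] = [[] for _ in leaves]
    while len(level) > 1:
        next_level, next_groups = [], []
        for i in range(0, len(level), 2):
            if i + 1 == len(level):
                # Odd node out: promote it unchanged to the next level.
                next_level.append(level[i])
                next_groups.append(groups[i])
                continue
            left, right = level[i], level[i + 1]
            for leaf in groups[i]:
                proofs[leaf].append(("R", right))
            for leaf in groups[i + 1]:
                proofs[leaf].append(("L", left))
            next_level.append(h(left + right))
            next_groups.append(groups[i] + groups[i + 1])
        level, groups = next_level, next_groups
    return level[0], proofs


def verify(object_hash: bytes, proof: Proof, summary: bytes) -> bool:
    """Step 1: recompute the summary from an object's hash and its proof."""
    current = object_hash
    for side, sibling in proof:
        current = h(sibling + current) if side == "L" else h(current + sibling)
    return current == summary


def witness(summaries: List[bytes]) -> bytes:
    """Step 3 (simplified): one published value covering several summaries."""
    return h(b"".join(summaries))
```

Under this assumption, an auditor needs only the object, its proof, and a trusted summary to detect a change, and needs only the small published witness to trust the summaries.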

23 Web Portal
Designed to give data providers an in-depth look at their holdings
Shows where data is in all locations
Unifies information from SRB, ACE and the Replication Monitor

27 Chronopolis Metadata
Working with team from UCSD Libraries
What technical metadata is the system tracking?
What descriptive metadata is present?
What are the significant events?

28 [Diagram labels]
Components: DP, Data, Data Manifest, MCAT, ACE, Replication Monitor, Node 1, Node 2, Node 3
Events:
– ET-1 Service Level Agreement
– ET-2 Acquisition Transfer
– ET-3 Acquisition Validation
– ET-4 Acquisition Registration to SRB
– ET-5 Acquisition Registration into ACE
– ET-6 Inter-Node Inventory Check
– ET-7 Acquisition Replication
– ET-8 File Integrity Check
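The event labels on this slide lend themselves to a small data model. The sketch below is only one possible representation: the event names come from the slide, while the `Event` fields and the `missing_events` helper are hypothetical and do not describe the Chronopolis metadata schema.

```python
from dataclasses import dataclass
from datetime import datetime
from enum import Enum
from typing import List


class EventType(Enum):
    """Event types as labelled on the slide, in numeric order."""
    ET_1 = "Service Level Agreement"
    ET_2 = "Acquisition Transfer"
    ET_3 = "Acquisition Validation"
    ET_4 = "Acquisition Registration to SRB"
    ET_5 = "Acquisition Registration into ACE"
    ET_6 = "Inter-Node Inventory Check"
    ET_7 = "Acquisition Replication"
    ET_8 = "File Integrity Check"


@dataclass(frozen=True)
class Event:
    """One recorded event (hypothetical fields)."""
    event_type: EventType
    collection: str
    node: str
    timestamp: datetime


def missing_events(log: List[Event], collection: str) -> List[EventType]:
    """Return the event types not yet recorded for a collection, in numeric order."""
    seen = {event.event_type for event in log if event.collection == collection}
    return [event_type for event_type in EventType if event_type not in seen]
```

A report built on `missing_events` would show at a glance which events have not yet been recorded for a given collection.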

29 Future directions
Update auditing procedures
Updated portal
Automation of collection ingest
New collections and storage nodes
Fully-fledged business model
TRAC certification

30 http://chronopolis.sdsc.edu
minor@sdsc.edu

