May 23, 2007 Archiving
ACE: A Novel Software Platform to Ensure the Integrity of Digital Archives
Sangchul Song and Joseph JaJa
Institute for Advanced Computer Science Studies
Department of Electrical and Computer Engineering
University of Maryland, College Park
Sponsored by Library of Congress and NSF

Main Threats to Integrity of Digital Archives
Hardware/media degradation
Hardware/software malfunction
Operational errors
Technology evolution
Object transformation (format obsolescence)
Infrequent access to most data
Evolution of cryptographic schemes
Security breaches, malicious alterations

Existing Methodologies
Core Techniques
–Replication: mirroring
–Coding techniques: parity checking (RAID), erasure codes
–Cryptographic one-way hashing: checksum
Techniques for Digital Archives
–Hashing only
–Replication + voting scheme
–Hashing + replication
–Digital signatures
–Time stamping (PKI vs. hash-linking)

ACE – Assumptions
Basic assumptions on the archive
–Each object has a persistent identifier.
–In the presence of multiple copies, one is designated as the master.
No other assumptions – the architecture can be centralized, distributed, or peer-to-peer; policies can be centralized, distributed, or federated.

ACE – Base Methodology
Three-tiered cryptographic information. Each tier is periodically audited separately according to policies set by managers.
–Integrity Token (IT): 1 IT/object, ~1KB
–Cryptographic Summary Information (CSI): 1 CSI/time window, or 1 CSI/(n) objects, ~100MB/year
–Witness: 1 witness/week, ~2-3KB/year
Ratios between tiers: k:1 (IT to CSI), l:1 (CSI to Witness)

Three-Tiered Cryptographic Information

ACE – System Architecture

ACE – Overview
Diagram labels: Client, object, Hash(obj), ACE-AM, ACE-IMS, Integrity Token, 3rd Party Auditor

ACE – Registration
1. A request containing the hash of the object is made to ACE.
2. When the aggregation round closes, the Aggregator builds an authentication tree.
3. A receipt is returned.
4~5. A new cryptographic summary is computed and the integrity token for each request is constructed.
6~8. Each object retrieves its integrity token.

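The aggregation step can be illustrated with a Merkle-style hash tree. This is a minimal sketch under stated assumptions, not ACE's actual implementation: it assumes SHA-256, a binary tree in which the last node is duplicated on odd-sized levels, the tree root serving as the round's cryptographic summary, and each integrity token carrying the sibling hashes on the path from its leaf to the root. The names `h` and `build_round` are hypothetical.

```python
import hashlib


def h(data: bytes) -> bytes:
    """SHA-256 digest (the hash function is an assumption of this sketch)."""
    return hashlib.sha256(data).digest()


def build_round(leaf_hashes):
    """Build a hash tree over the object hashes collected in one round.

    Returns (root, proofs). proofs[i] is the list of
    (sibling_hash, sibling_is_left) pairs leading from leaf i to the root.
    """
    if not leaf_hashes:
        raise ValueError("an aggregation round needs at least one request")
    proofs = [[] for _ in leaf_hashes]
    # positions[i] is the index of leaf i's ancestor on the current level
    positions = list(range(len(leaf_hashes)))
    level = list(leaf_hashes)
    while len(level) > 1:
        if len(level) % 2 == 1:
            level.append(level[-1])  # duplicate the last node on odd levels
        for i, pos in enumerate(positions):
            sibling = pos ^ 1  # neighbor within the same pair
            proofs[i].append((level[sibling], sibling < pos))
            positions[i] = pos // 2
        # hash each pair to form the next level up
        level = [h(level[j] + level[j + 1]) for j in range(0, len(level), 2)]
    return level[0], proofs
```

With this layout, a round of n requests yields one summary and n proofs of about log2(n) hashes each, so the token stays small however many objects share a round.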
ACE Witness Publication
Once a week, a witness is computed from the cryptographic summaries generated during that week.
The witness of the week is widely published on the Internet; it is currently posted to newsgroups at Google, Yahoo and MSN.
The witness is also stored on a CD-ROM.

ACE – Demo
Modify

ACE Audit
Diagram labels: Object, Integrity Token, Cryptographic Summary Information, Witness
1. Each digital object is audited locally using the integrity token, according to the policy set by the local manager.
2. The integrity management system periodically audits the integrity tokens according to its policies.
3. Cryptographic summaries are audited as necessary using the published witness values.

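The first audit step can be sketched as follows. This is an illustration only, assuming the integrity token stores a list of (sibling hash, sibling-is-left) pairs from a SHA-256 hash tree and that the trusted cryptographic summary is the tree root; the function name `audit_object` and the token layout are hypothetical, not ACE's actual format.

```python
import hashlib


def h(data: bytes) -> bytes:
    """SHA-256 digest (assumed hash function)."""
    return hashlib.sha256(data).digest()


def audit_object(obj_bytes: bytes, proof, summary: bytes) -> bool:
    """Recompute the object's hash, fold in the sibling hashes stored in
    the token, and compare the result with the cryptographic summary."""
    node = h(obj_bytes)
    for sibling, sibling_is_left in proof:
        node = h(sibling + node) if sibling_is_left else h(node + sibling)
    return node == summary


# Two objects registered in the same (hypothetical) round.
a, b = b"object A", b"object B"
summary = h(h(a) + h(b))
token_a = [(h(b), False)]  # A's sibling is B, placed on the right

print(audit_object(a, token_a, summary))            # True
print(audit_object(b"tampered", token_a, summary))  # False
```

The audit needs only the object, its token, and the summary, which is why it can run locally on whatever schedule the local manager sets.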
Auditing Cryptographic Summaries
The system collects all the summaries that share the same Time Frame ID and builds a validation witness.
The system retrieves the published witness for that Time Frame ID from the newsgroups.
The published witness is then compared to the validation witness.

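The comparison described above might look like the sketch below. The slides do not say how a witness is derived from the summaries, so this example assumes the witness is the SHA-256 hash of the week's fixed-length summaries concatenated in sequence order. The record layout `(time_frame_id, sequence_no, summary)` and the names `validation_witness` and `audit_summaries` are hypothetical.

```python
import hashlib
from collections import defaultdict


def validation_witness(summaries):
    """Assumed scheme: hash the ordered, fixed-length summaries together."""
    digest = hashlib.sha256()
    for summary in summaries:
        digest.update(summary)
    return digest.digest()


def audit_summaries(stored, published):
    """Check stored summaries against published witnesses.

    stored:    iterable of (time_frame_id, sequence_no, summary_bytes)
    published: dict mapping time_frame_id -> published witness bytes
    Returns a dict mapping time_frame_id -> True if the witnesses match.
    """
    by_frame = defaultdict(list)
    for frame_id, seq, summary in stored:
        by_frame[frame_id].append((seq, summary))
    results = {}
    for frame_id, items in by_frame.items():
        ordered = [summary for _, summary in sorted(items)]
        # A missing published witness yields None, so the check fails.
        results[frame_id] = validation_witness(ordered) == published.get(frame_id)
    return results
```

A mismatch for a time frame means at least one stored summary in that frame differs from what was published, so the integrity tokens that depend on it can no longer be trusted without further checks.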
ACE Update – Obsolete Hash Functions
Objects are registered again, with the request carrying information on the old integrity token (IT).
The new IT is constructed using this information.
Object integrity between the previous registration and the new registration can still be verified with the old IT, whereas the new IT covers the period from the new registration onward.

ACE Update – Object Transformation
The new object is registered again, and the registration request contains information on the old integrity token.
The new integrity token is constructed using this information.
With this information, a future audit can trace the current version back to the previous version.

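Both update cases rely on the new registration carrying information about the old token. One way to picture this is a back-linked chain of tokens, as in the sketch below. It is a simplified model under stated assumptions, not ACE's token format: the `Token` fields, the `register` and `history` helpers, and the choice of SHA-1 followed by SHA-256 are all hypothetical.

```python
import hashlib
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Token:
    object_id: str                      # persistent identifier of the object
    algorithm: str                      # hash function used at registration
    digest: bytes                       # hash of the object as registered
    previous: Optional["Token"] = None  # link to the token being superseded


def register(object_id, data, algorithm, previous=None):
    """Create a token; a re-registration passes the old token as `previous`."""
    digest = hashlib.new(algorithm, data).digest()
    return Token(object_id, algorithm, digest, previous)


def history(token):
    """Walk from the newest token back to the first registration."""
    chain = []
    while token is not None:
        chain.append(token)
        token = token.previous
    return chain


# Hypothetical example: a format migration combined with a hash upgrade.
old = register("obj-1", b"original TIFF bytes", "sha1")
new = register("obj-1", b"migrated JPEG 2000 bytes", "sha256", previous=old)
print([t.algorithm for t in history(new)])  # ['sha256', 'sha1']
```

Keeping the link inside the new token lets an auditor verify each period with the token that was in force at the time.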
ACE Performance
Preliminary performance evaluation
–Setup: Audits on the NARA EAP Image Collection, consisting of 126,548 files totaling over 1.1TB.
–Results: All files were audited in about 15 hours.
–Note 1: Most of the time was spent moving the data between separate machines.
–Note 2: Registration of the same collection took almost the same time.

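The reported figures imply a rough end-to-end rate, worked out below. The calculation treats "over 1.1TB" as exactly 1.1 decimal terabytes and "about 15 hours" as exactly 15 hours, so the results are approximations rather than measured values.

```python
total_bytes = 1.1e12   # "over 1.1TB", taken as 1.1 decimal TB (assumption)
files = 126_548
hours = 15             # "about 15 hours", taken as exact (assumption)

seconds = hours * 3600
throughput_mb_s = total_bytes / seconds / 1e6
files_per_s = files / seconds
avg_file_mb = total_bytes / files / 1e6

print(f"{throughput_mb_s:.1f} MB/s")    # 20.4 MB/s
print(f"{files_per_s:.2f} files/s")     # 2.34 files/s
print(f"{avg_file_mb:.1f} MB per file") # 8.7 MB per file
```

A rate of roughly 20 MB/s is consistent with Note 1: the time is dominated by moving data between machines rather than by hashing.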
ACE Summary
Third-party auditable
Cryptographically rigorous yet cost-effective
Update-aware
Highly interoperable
Scalable
High performance