SAN DIEGO SUPERCOMPUTER CENTER, UC SAN DIEGO LIBRARIES. NDIIPP PARTNERS MEETING 07.09.08. David Minor SDSC, Robert H. McDonald SDSC, Sangchul Song UMIACS, Bryan.

1 SAN DIEGO SUPERCOMPUTER CENTER, UC SAN DIEGO LIBRARIES, NDIIPP PARTNERS MEETING 07.09.08
  Chronopolis in Practice
  David Minor (SDSC), Robert H. McDonald (SDSC), Sangchul Song (UMIACS), Bryan Beecher (ICPSR), Justin Littman (LC)

2 Outline
  Current Chronopolis Implementation
  Accomplishments (2/08 – Present)
  Ingested Content
  Transmission
  Technologies for Ingest
    - ICPSR – SRB
    - CDL – BagIt
    - NCSU – BagIt
  Technologies for Integrity
    - Auditing Control Environment
  Questions

3 Chronopolis Implementation
  [Architecture diagram. Labels: SDSC Network, NCAR Network, Maryland (UMD) Network, ICPSR Network, UC Berkeley Network; CDL Server, ICPSR Server; SRB MCAT and SRB D-Broker at each node; storage: Sun 6140 62TB, Sun SAM-QFS, Apple Xsan, Tape Silos; Chronopolis Data 12-25TB, Chronopolis Data 12TB. Adapted from Bryan Banister (SDSC).]

4 Key Deliverables 07/08
  7.2 - A well-integrated network and data grid for content sharing among CDL and ICPSR supporting sustained high-capacity transfer rates.
  7.3 - An integrated set of monitoring tools for the Chronopolis Data Grid using the replication monitor, ACE, and INCA for the Library community.
  7.5 - A Dissemination Information Package (DIP) for content submitted by both ICPSR and CDL will be available for both ICPSR and CDL to retrieve their content from the Chronopolis gateway.
  7.7 - An ingested content collection from ICPSR of 12-15 TB
  7.8 - An ingested content collection from CDL of 25 TB

5 7.5 Deliverable Refinements
  Two Components Emerging
    - Component 1: DIP based on BagIt structure
    - Component 2: DIP that supports a transmission package to load into Fedora repository software

6 Accomplishments (2/08 – Present)
  NDIIPP Client Ingested Content
    - ICPSR – 5 TB (Staging)
    - CDL – 4 TB (Staging)
  Chronopolis Replicated Content
    - SDSC to UMIACS – 3 TB (Copy 2)
    - SDSC to NCAR (forthcoming)
  Transmission Speed – Ingest
    - ICPSR – Approx. 1 TB per day
    - CDL – BagIt tests using LC Python scripts (15 processes)
        City Bag – 46.22 Mb/sec – 498.96 GB per day
        State Bag – 42.88 Mb/sec – 463.10 GB per day
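The per-day figures on this slide follow from the per-second rates if "Mb" is read as megabits and GB as decimal gigabytes (1 GB = 1000 MB). Both readings are assumptions; the slide does not state its units. A minimal sketch of the conversion:

```python
def gb_per_day(megabits_per_sec: float) -> float:
    """Convert a sustained rate in megabits/sec to decimal gigabytes/day."""
    seconds_per_day = 24 * 60 * 60                       # 86,400
    megabytes_per_day = megabits_per_sec / 8 * seconds_per_day
    return megabytes_per_day / 1000                      # assumes 1 GB = 1000 MB


# Rates as reported on the slide.
for bag, rate in [("City Bag", 46.22), ("State Bag", 42.88)]:
    print(f"{bag}: {rate} Mb/sec is about {gb_per_day(rate):.2f} GB per day")
```

Under these assumptions the State Bag rate reproduces the slide's 463.10 GB per day. The City Bag rate comes out near 499.18 GB per day, slightly above the slide's 498.96, which is the value a rate of 46.20 Mb/sec would give.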

7 New Partners
  N.C. State
    - GIS Data @5 TB
    - Already working with BagIt format
  Scripps Institution of Oceanography
    - Data @2 TB
    - Already working with SRB

8 Technologies for Ingest/Replication
  SRB to SRB Connections
    - ICPSR – Client
    - Scripps – Client
    - UMIACS – Chronopolis Partner
    - NCAR – Chronopolis Partner
  BagIt Transfers
    - CDL
    - NC State

9 Transfer Methodology (ICPSR – Client)
  Synchronize collections of content with SDSC’s storage grid
    - Original scope was just our web-delivered content
        Compressed 400GB
        Tens of thousands of files
    - Since then we have copied our complete holdings
        Uncompressed 5000GB
        Millions of files

10 Transfer method
  SRB utilities are the base
    - Sput
    - Srsync
  Cannot use the utilities “out of the box”
    - Too many files
    - Too many timeouts
  Wrap the utilities with some simple shell script grouping

11 Example
  Metadata resides in Oracle; dump it nightly to SRB
    - Sput -fK /path/to/oracle/export s:/SDSC-chron/icpsr.umich/database
  Files reside elsewhere and there are LOTS
    - Wrap Sinit, Srsync and Sexit in a script, Ssend
    - Invoke via a mechanism like this: find /archive | xargs -n 3 -P 0 Ssend
    - Select a bunch of “just big enough” directories to feed into Ssend, and not too many at a time
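The grouping that `xargs -n 3` performs can be sketched in Python: split a long list of paths into small groups and hand each group to one Ssend call. This is an illustration only. The directory names are hypothetical, Ssend is the wrapper script named on the slide (its contents are not shown there), and the sketch builds the command lines without running them or reproducing the parallelism of `-P 0`.

```python
from itertools import islice
from typing import Iterable, Iterator, List


def batch_commands(paths: Iterable[str], per_call: int = 3,
                   command: str = "Ssend") -> Iterator[List[str]]:
    """Yield argv lists holding at most `per_call` paths each, like `xargs -n`."""
    it = iter(paths)
    while True:
        group = list(islice(it, per_call))
        if not group:
            return
        yield [command, *group]


# Hypothetical directory names, for illustration only.
dirs = [f"/archive/study{n}" for n in range(1, 8)]
for argv in batch_commands(dirs):
    print(" ".join(argv))
```

Keeping each call small limits how many files a single SRB session has to handle, which is the point of the "just big enough" advice on the slide.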

12 BagIt
  Motivating use cases:
    - Transfer of content internally and between preservation partners
    - Long-term storage of content
  Needs:
    - Minimally self-identifying and self-describing packages
    - Support for error detection and transfer optimization
  Characteristics:
    - Low overhead
    - Content agnostic
    - Supported by off-the-shelf tools (e.g., MD5Deep)

13 Informed by
    - LC's eDeposit Pilot Project
    - NDIIPP Archive and Ingest Handling Test (AIHT)
    - Tabata et al., “Enclose-and-Deposit Method,” IWAW ’05
  Documented at
    - www.ietf.org/internet-drafts/draft-kunze-bagit-01.txt
    - www.cdlib.org/inside/diglib/bagit/bagitspec.html

14 Basic bag:
    <base directory>/
      bagit.txt
      manifest-<algorithm>.txt
      [optional additional tag files]
      data/
        [content file hierarchy]
  Bag parts:
    - bagit.txt: Bag signature
    - manifest-<algorithm>.txt: List of content files and fixities
      Example, manifest-md5.txt:
        49afbd86a1ca9f34b677a3f09655eae9 data/27613-h/images/q172.png
        408ad21d50cef31da4df6d9ed81b01a7 data/27613-h/images/q172.txt
    - package-info.txt: Bag contents metadata (optional)
    - fetch.txt: Bag contents included by reference (optional)
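The manifest format above is enough for a receiver to detect transfer errors: recompute each file's checksum and compare it with the listed value. A minimal sketch, assuming an MD5 manifest with one "checksum path" pair per line; the function name and the throwaway bag are hypothetical and do not come from the LC scripts mentioned earlier.

```python
import hashlib
import tempfile
from pathlib import Path


def verify_md5_manifest(bag_dir: Path) -> list:
    """Return the paths from manifest-md5.txt that are missing or do not match."""
    failures = []
    manifest = bag_dir / "manifest-md5.txt"
    for line in manifest.read_text().splitlines():
        if not line.strip():
            continue                                  # skip blank lines
        expected, rel_path = line.split(maxsplit=1)
        target = bag_dir / rel_path
        if not target.is_file():
            failures.append(rel_path)                 # listed but not present
            continue
        actual = hashlib.md5(target.read_bytes()).hexdigest()
        if actual != expected.lower():
            failures.append(rel_path)                 # content changed in transit
    return failures


# Build a tiny throwaway bag (hypothetical content) and check it.
with tempfile.TemporaryDirectory() as tmp:
    bag = Path(tmp)
    (bag / "data").mkdir()
    (bag / "data" / "hello.txt").write_bytes(b"hello\n")
    digest = hashlib.md5(b"hello\n").hexdigest()
    (bag / "manifest-md5.txt").write_text(f"{digest} data/hello.txt\n")
    print(verify_md5_manifest(bag))  # empty list when every file matches
```

The sketch reads each file fully into memory for brevity; for multi-gigabyte content, hashing in fixed-size chunks would be the safer choice.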

15 UNIVERSITY of MARYLAND INSTITUTE for ADVANCED COMPUTER STUDIES
  ACE – Auditing Control Environment
    - Software to ensure the long-term integrity of digital objects.
    - Underpinnings are based on rigorous cryptographic techniques and on third-party integrity management and auditing.
    - Automatic regular audits based on policies set by the archive manager.
    - Scalable, cost-effective, and able to interoperate with any archiving architecture.

16 ACE – System Architecture

17 ACE Audit
    - Each digital object is periodically audited using the integrity token, according to the policy set by the local manager.
    - Cryptographic summaries are audited as necessary by the archive or an independent party using the published witness values.
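The two audit levels can be illustrated with a deliberately simplified sketch. It is not ACE's actual token format or protocol, neither of which is given on the slides. Here a "token" is assumed to be just a SHA-256 digest of the object, and a single hash over all tokens stands in for both the cryptographic summary and its published witness value. All function names are hypothetical.

```python
import hashlib


def digest(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def make_token(obj: bytes) -> str:
    """Simplified stand-in for an integrity token: the object's digest."""
    return digest(obj)


def summarize(tokens: list) -> str:
    """Simplified stand-in for a cryptographic summary over a set of tokens."""
    return digest("".join(sorted(tokens)).encode())


def audit_object(obj: bytes, token: str) -> bool:
    """Level 1: does the object still match its stored token?"""
    return make_token(obj) == token


def audit_summary(tokens: list, published: str) -> bool:
    """Level 2: do the stored tokens still match the published value?"""
    return summarize(tokens) == published


# Hypothetical objects, for illustration only.
objects = {"a.dat": b"first object", "b.dat": b"second object"}
tokens = {name: make_token(data) for name, data in objects.items()}
witness = summarize(list(tokens.values()))   # would be published externally

print(audit_object(objects["a.dat"], tokens["a.dat"]))   # intact object
print(audit_object(b"tampered", tokens["a.dat"]))        # altered object
print(audit_summary(list(tokens.values()), witness))     # tokens unchanged
```

The second check is what lets an independent party audit the archive: replacing an object together with its token would still leave the tokens inconsistent with the value published earlier.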

18 ACE Screen Shots
  [Screenshots: Adding a Collection; Auditing a Collection; Viewing an Error Report]
  [Labels: Status Pane (Overview); Last audit: successful; Action Pane (Collection Specific); Start Auditing; Edit Collection Location; Remove Collection; Browse Collection; View Events; View Error Report]

19 Q and A
