SAN DIEGO SUPERCOMPUTER CENTER | UC SAN DIEGO LIBRARIES | NDIIPP PARTNERS MEETING 07.09.08 David Minor SDSC Robert H. McDonald SDSC Sangchul Song UMIACS Bryan.

Presentation transcript:

Chronopolis in Practice
  David Minor, SDSC
  Robert H. McDonald, SDSC
  Sangchul Song, UMIACS
  Bryan Beecher, ICPSR
  Justin Littman, LC

Outline
  Current Chronopolis Implementation
  Accomplishments (2/08 – Present)
    Ingested Content
    Transmission
  Technologies for Ingest
    ICPSR – SRB
    CDL – BagIt
    NCSU – BagIt
  Technologies for Integrity
    Audit Control Environment
  Questions

Chronopolis Implementation
[Architecture diagram, adapted from Bryan Banister (SDSC). Labels: SDSC Network, NCAR Network, UMD (Maryland) Network, ICPSR Network, UC Berkeley Network; CDL Server, ICPSR Server; SRB D-Broker and SRB MCAT instances at each Chronopolis site; storage on Sun storage (TB), Sun SAM-QFS, Apple Xsan, and tape silos; Chronopolis Data 12-25TB and Chronopolis Data 12TB.]

Key Deliverables 07/
  A well-integrated network and data grid for content sharing among CDL and ICPSR, supporting sustained high-capacity transfer rates
  An integrated set of monitoring tools for the Chronopolis Data Grid, using the replication monitor, ACE, and INCA, for the library community
  A Dissemination Information Package (DIP) for content submitted by ICPSR and CDL, available so that both can retrieve their content from the Chronopolis gateway
  An ingested content collection from ICPSR of TB
  An ingested content collection from CDL of 25 TB

Deliverable Refinements
  Two Components Emerging
    Component 1: DIP based on BagIt structure
    Component 2: DIP that supports a transmission package to load into Fedora repository software

Accomplishments (2/08 – Present)
  NDIIPP Client Ingested Content
    ICPSR – 5 TB (Staging)
    CDL – 4 TB (Staging)
  Chronopolis Replicated Content
    SDSC to UMIACS – 3 TB (Copy 2)
    SDSC to NCAR (forthcoming)
  Transmission Speed (Ingest)
    ICPSR – Approx. 1 TB per day
    CDL – BagIt tests using LC Python scripts (15 processes)
      City Bag – Mb/sec – GB per day
      State Bag – Mb/sec – GB per day
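The ingest figures mix per-day volumes with link rates, so a small conversion helper makes them easier to compare. The sketch below is only a convenience for the reader. It assumes decimal units (1 TB = 10^12 bytes, 1 GB = 10^9 bytes, 1 Mb = 10^6 bits) and a transfer that runs for the full 24 hours, neither of which the slide states. The function names are illustrative.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def tb_per_day_to_mbps(tb_per_day: float) -> float:
    """Sustained megabits/second needed to move `tb_per_day` terabytes in 24 hours."""
    bits_per_day = tb_per_day * 1e12 * 8   # assumption: decimal terabytes
    return bits_per_day / SECONDS_PER_DAY / 1e6


def mbps_to_gb_per_day(mbps: float) -> float:
    """Gigabytes moved in 24 hours at a sustained rate of `mbps` megabits/second."""
    bytes_per_day = mbps * 1e6 / 8 * SECONDS_PER_DAY
    return bytes_per_day / 1e9             # assumption: decimal gigabytes


if __name__ == "__main__":
    # Roughly what "approx. 1 TB per day" means as a sustained link rate.
    print(f"{tb_per_day_to_mbps(1.0):.1f} Mb/s")
```

Under these assumptions, 1 TB per day corresponds to a sustained rate of a little under 93 Mb/s.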

New Partners
  N.C. State GIS
    TBs
    Already working with BagIt format
  Scripps Institution of Oceanography
    TBs
    Already working with SRB

Technologies for Ingest/Replication
  SRB to SRB Connections
    ICPSR – Client
    Scripps – Client
    UMIACS – Chronopolis Partner
    NCAR – Chronopolis Partner
  BagIt Transfers
    CDL
    NC State

Transfer Methodology (ICPSR – Client)
Synchronize collections of content with SDSC’s storage grid
– Original scope was just our web-delivered content
    Compressed 400GB
    Tens of thousands of files
– Since then we have copied our complete holdings
    Uncompressed 5000GB
    Millions of files

Transfer method
SRB utilities are the base
– Sput
– Srsync
Cannot use the utilities “out of the box”
– Too many files
– Too many timeouts
Wrap the utilities with some simple shell script grouping

Example
Metadata resides in Oracle; dump it nightly to SRB
– Sput -fK /path/to/oracle/export s:/SDSC-chron/icpsr.umich/database
Files reside elsewhere and there are LOTS
– Wrap Sinit, Srsync and Sexit in a script, Ssend
– Invoke via a mechanism like this: find /archive | xargs -n 3 -P 0 Ssend
– Select a bunch of “just big enough” directories to feed into Ssend, and not too many at a time
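The batching idea on this slide (feed a few directories at a time to a wrapper, and survive timeouts) can be sketched as follows. This is an illustration, not ICPSR's script: Ssend is the site-specific wrapper named on the slide, assumed here to be an executable on PATH that exits with status 0 on success. The retry count and the function names are hypothetical. Unlike `xargs -P 0`, the sketch runs batches one after another to stay short.

```python
import subprocess
from typing import Callable, Iterable, Iterator, List, Sequence


def chunked(items: Iterable[str], size: int) -> Iterator[List[str]]:
    """Yield successive lists of at most `size` items, like `xargs -n size`."""
    if size < 1:
        raise ValueError("size must be >= 1")
    batch: List[str] = []
    for item in items:
        batch.append(item)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch


def run_ssend(batch: Sequence[str]) -> bool:
    """Run the (assumed) Ssend wrapper on one batch; True if it exits with 0."""
    return subprocess.run(["Ssend", *batch]).returncode == 0


def send_all(
    dirs: Iterable[str],
    send: Callable[[Sequence[str]], bool] = run_ssend,
    batch_size: int = 3,
    max_attempts: int = 3,
) -> List[List[str]]:
    """Send directories in small batches, retrying each failed batch.

    Returns the batches that still failed after `max_attempts` tries.
    """
    failed: List[List[str]] = []
    for batch in chunked(dirs, batch_size):
        for _ in range(max_attempts):
            if send(batch):
                break
        else:
            failed.append(batch)  # every attempt failed (e.g. repeated timeouts)
    return failed
```

Passing `send` as a parameter keeps the batching and retry logic testable without an SRB installation; the returned list gives the operator the batches to re-run later.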

BagIt
Motivating use cases:
– Transfer of content internally and between preservation partners
– Long-term storage of content
Needs:
– Minimally self-identifying and self-describing packages
– Support for error detection and transfer optimization
Characteristics:
– Low overhead
– Content agnostic
– Supported by off-the-shelf tools (e.g., MD5Deep)

Informed by
– LC's eDeposit Pilot Project
– NDIIPP Archive and Ingest Handling Test (AIHT)
– Tabata et al., “Enclose-and-Deposit Method,” IWAW ’05
Documented at

Basic bag:
  <base directory>/
    bagit.txt
    manifest-<algorithm>.txt
    [optional additional tag files]
    data/
      [content file hierarchy]
Bag parts:
– bagit.txt: Bag signature
– manifest-<algorithm>.txt: List of content files and fixities
  Example, manifest-md5.txt:
    49afbd86a1ca9f34b677a3f09655eae9 data/27613-h/images/q172.png
    408ad21d50cef31da4df6d9ed81b01a7 data/27613-h/images/q172.txt
– package-info.txt: Bag contents metadata (optional)
– fetch.txt: Bag contents included by reference (optional)
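The manifest format above (one checksum and one relative path per line) is simple enough to check with a few lines of standard-library Python. The sketch below is a minimal illustration, not the LC tooling mentioned earlier: it assumes an MD5 manifest named manifest-md5.txt at the top of the bag, and it only checks the files the manifest lists. It does not validate bagit.txt, resolve fetch.txt, or look for unlisted files under data/. The function names are illustrative.

```python
import hashlib
from pathlib import Path
from typing import Dict, List


def file_md5(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hex MD5 of a file, read in chunks so large files do not fill memory."""
    digest = hashlib.md5()
    with path.open("rb") as handle:
        for block in iter(lambda: handle.read(chunk_size), b""):
            digest.update(block)
    return digest.hexdigest()


def read_manifest(manifest: Path) -> Dict[str, str]:
    """Parse '<checksum> <relative path>' lines into {path: checksum}."""
    entries: Dict[str, str] = {}
    for line in manifest.read_text(encoding="utf-8").splitlines():
        if not line.strip():
            continue
        checksum, relative_path = line.split(None, 1)
        entries[relative_path.strip()] = checksum.lower()
    return entries


def verify_bag(bag_dir: Path) -> List[str]:
    """Return a list of problems; an empty list means every listed file matched."""
    problems: List[str] = []
    manifest = read_manifest(bag_dir / "manifest-md5.txt")
    for relative_path, expected in manifest.items():
        target = bag_dir / relative_path
        if not target.is_file():
            problems.append(f"missing: {relative_path}")
        elif file_md5(target) != expected:
            problems.append(f"checksum mismatch: {relative_path}")
    return problems
```

Run against a received bag, an empty result from `verify_bag(Path("/path/to/bag"))` indicates that every file named in the manifest arrived with the expected fixity value.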

UNIVERSITY of MARYLAND INSTITUTE for ADVANCED COMPUTER STUDIES
ACE – Auditing Control Environment
  Software to ensure the long-term integrity of digital objects.
  Underpinnings are based on rigorous cryptographic techniques and on third-party integrity management and auditing.
  Automatic regular audits based on policies set by the archive manager.
  Scalable, cost-effective, and able to interoperate with any archiving architecture.

ACE – System Architecture
[Architecture diagram]

ACE Audit
– Each digital object is periodically audited using the integrity token, according to the policy set by the local manager.
– Cryptographic summaries are audited as necessary by the archive or an independent party using the published witness values.
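The transcript does not describe how ACE builds its integrity tokens or witness values. As a rough intuition for how a per-object token can be checked against a single published value, the sketch below uses a generic hash tree (Merkle tree). It borrows the slide's terms "token" and "witness" for readability only; the choice of SHA-256, the tree layout, and every function name are assumptions, and this should not be read as ACE's actual format.

```python
import hashlib
from typing import List, Tuple


def _h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def _levels(leaves: List[bytes]) -> List[List[bytes]]:
    """All tree levels, from the leaf digests up to the single root."""
    if not leaves:
        raise ValueError("need at least one leaf")
    levels = [list(leaves)]
    while len(levels[-1]) > 1:
        current = levels[-1]
        if len(current) % 2:
            current = current + [current[-1]]  # duplicate last node on odd levels
        levels.append(
            [_h(current[i] + current[i + 1]) for i in range(0, len(current), 2)]
        )
    return levels


def witness(object_digests: List[bytes]) -> bytes:
    """Single summary value (tree root) covering every object digest."""
    return _levels(object_digests)[-1][0]


def make_token(object_digests: List[bytes], index: int) -> List[Tuple[bool, bytes]]:
    """Sibling hashes linking one object digest to the witness.

    Each entry is (sibling_is_left, sibling_hash).
    """
    token: List[Tuple[bool, bytes]] = []
    for level in _levels(object_digests)[:-1]:
        if len(level) % 2:
            level = level + [level[-1]]
        sibling = index ^ 1
        token.append((sibling < index, level[sibling]))
        index //= 2
    return token


def audit(object_bytes: bytes, token: List[Tuple[bool, bytes]], witness_value: bytes) -> bool:
    """Recompute the path from the object's digest; True if it reaches the witness."""
    node = _h(object_bytes)
    for sibling_is_left, sibling in token:
        node = _h(sibling + node) if sibling_is_left else _h(node + sibling)
    return node == witness_value
```

In this sketch the archive keeps one small token per object, while the witness can be published or held by an independent party. A changed object, or a tampered token, no longer hashes up to the published value.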

ACE Screen Shots
[Screenshots: Adding a Collection, Auditing a Collection, Viewing an Error Report. Callouts: Status Pane (Overview); Action Pane (Collection Specific); "Last audit: successful". Actions shown: Start Auditing, Edit Collection Location, Remove Collection, Browse Collection, View Events, View Error Report.]

Q and A
