Digital Preservation Lifecycle Management Building a demonstration prototype for the preservation of large-scale multi-media collections Arcot Rajasekar.

Slides:



Advertisements
Similar presentations
Data Publishing Service Indiana University Stacy Kowalczyk April 9, 2010.
Advertisements

National Partnership for Advanced Computational Infrastructure San Diego Supercomputer Center Data Grids for Collection Federation Reagan W. Moore University.
GFS OGF-22 Global Resource Naming Developers: Reagan Moore Arcot Mike.
OGF-23 iRODS Metadata Grid File System Reagan Moore San Diego Supercomputer Center.
October 28, 2003Copyright MIT, 2003 METS repositories: DSpace MacKenzie Smith Associate Director for Technology MIT Libraries.
Mairéad Martin, Penn State University Commons Solutions Group Storage Workshop May 2010.
Data Grid: Storage Resource Broker Mike Smorul. SRB Overview Developed at San Diego Supercomputing Center. Provides the abstraction mechanisms needed.
NATIONAL PARTNERSHIP FOR ADVANCED COMPUTATIONAL INFRASTRUCTURE SAN DIEGO SUPERCOMPUTER CENTER Particle Physics Data Grid PPDG Data Handling System Reagan.
San Diego Supercomputer Center NARA Research Prototype Persistent Archive Building Preservation Environments with Data Grid Technology (NARA Research Prototype.
San Diego Supercomputer CenterNational Partnership for Advanced Computational Infrastructure1 Grid Based Solutions for Distributed Data Management Reagan.
Background Chronopolis Goals Data Grid supporting a Long-term Preservation Service Data Migration Data Migration to next generation technologies Trust.
A Very Brief Introduction to iRODS
INFSO-RI Enabling Grids for E-sciencE Grid & Data Preservation Boon Low System Development, EGEE Training National.
PREMIS in Thought: Data Center for LC Digital Holdings Ardys Kozbial, Arwen Hutt, David Minor February 11, 2008.
Chronopolis: Preserving Our Digital Heritage David Minor UC San Diego San Diego Supercomputer Center.
ADAPT An Approach to Digital Archiving and Preservation Technology Principal Investigator: Joseph JaJa Lead Programmers: Mike Smorul and Mike McGann Graduate.
Applying Data Grids to Support Distributed Data Management Storage Resource Broker Reagan W. Moore Ian Fisk Bing Zhu University of California, San Diego.
Brief Overview of Major Enhancements to PAWN. Producer – Archive Workflow Network (PAWN) Distributed and secure ingestion of digital objects into the.
Robust Tools for Archiving and Preserving Digital Data Joseph JaJa, Mike Smorul, and Mike McGann Institute for Advanced Computer Studies Department of.
PAWN: A Novel Ingestion Workflow Technology for Digital Preservation
Tools and Services for the Long Term Preservation and Access of Digital Archives Joseph JaJa, Mike Smorul, and Sangchul Song Institute for Advanced Computer.
PAWN: A Novel Ingestion Workflow Technology for Digital Preservation Mike Smorul, Joseph JaJa, Yang Wang, and Fritz McCall.
On Developing Data Grid Workflows using Storage Resource Broker (SRB) and Kepler Tim H. Wong - UC Davis Efrat Frank - SDSC Dr. Bertram Ludäscher - UC Davis.
Web-based Portal for Discovery, Retrieval and Visualization of Earth Science Datasets in Grid Environment Zhenping (Jane) Liu.
Digital Library Architecture and Technology
January, 23, 2006 Ilkay Altintas
DCC Conference, Glasgow November, Digital Archive Policies and Trusted Digital Repositories MacKenzie Smith, MIT Libraries Reagan Moore, San Diego.
National Partnership for Advanced Computational Infrastructure Digital Library Architecture Reagan Moore Chaitan Baru Amarnath Gupta George Kremenek Bertram.
San Diego Supercomputer CenterUniversity of California, San Diego Preservation Research Roadmap Reagan W. Moore San Diego Supercomputer Center
Information Management and Distributed Data Reagan W. Moore Wayne Schroeder Mike Wan Arcot Rajasekar Richard Marciano {moore, schroede, mwan, sekar,
CI Days: Planning Your Campus Cyberinfrastructure Strategy Russ Hobby, Internet2 Internet2 Member Meeting 9 October 2007.
Jan Storage Resource Broker Managing Distributed Data in a Grid A discussion of a paper published by a group of researchers at the San Diego Supercomputer.
Rule-Based Data Management Systems Reagan W. Moore Wayne Schroeder Mike Wan Arcot Rajasekar {moore, schroede, mwan, {moore, schroede, mwan,
A disaggregated model for preservation of E-Prints Gareth Knight SHERPA DP Project Arts and Humanities Data Service.
San Diego Supercomputer Center SDSC Storage Resource Broker Data Grid Automation Arun Jagatheesan et al., San Diego Supercomputer Center University of.
What is Cyberinfrastructure? Russ Hobby, Internet2 Clemson University CI Days 20 May 2008.
Production Data Grids SRB - iRODS Storage Resource Broker Reagan W. Moore
OAIS Rathachai Chawuthai Information Management CSIM / AIT Issued document 1.0.
Digital Preservation: Current Thinking Anne Gilliland-Swetland Department of Information Studies.
Rule-Based Preservation Systems Reagan W. Moore Wayne Schroeder Mike Wan Arcot Rajasekar Richard Marciano {moore, schroede, mwan, sekar,
National Partnership for Advanced Computational Infrastructure San Diego Supercomputer Center Persistent Management of Distributed Data Reagan W. Moore.
National Partnership for Advanced Computational Infrastructure San Diego Supercomputer Center Persistent Archive for the NSDL Reagan W. Moore Charlie Cowart.
Archival Workshop on Ingest, Identification, and Certification Standards Certification (Best Practices) Checklist Does the archive have a written plan.
Selene Dalecky March 20, 2007 FDsys: GPO’s Digital Content System.
OAIS Rathachai Chawuthai Information Management CSIM / AIT Issued document 1.0.
Cyberinfrastructure Overview Russ Hobby, Internet2 ECSU CI Days 4 January 2008.
Preserving Electronic Mailing Lists as Scholarly Resources: The H-Net Archives Lisa M. Schmidt
©MIT LKTR Workshop, Digital Archive Policies and Trusted Digital Repositories MacKenzie Smith, MIT Libraries Reagan Moore, San Diego Supercomputer.
NATIONAL PARTNERSHIP FOR ADVANCED COMPUTATIONAL INFRASTRUCTURE SAN DIEGO SUPERCOMPUTER CENTER Interlib Technology Integration Reagan.
National Archives and Records Administration1 Integrated Rules Ordered Data System (“IRODS”) Technology Research: Digital Preservation Technology in a.
Rights Management for Shared Collections Storage Resource Broker Reagan W. Moore
NARA Report: NARA Persistent Archives Prototype Bill Underwood GTRI, Atlanta CCSDS, MOIMS DAI / IPR WGs Toulouse, 2 Nov-5 Nov 2004.
Managing live digital content with DuraSpace services Bill Branan PASIG Spring 2015.
Building Preservation Environments with Data Grid Technology Reagan W. Moore Presenter: Praveen Namburi.
Collection-Based Persistent Archives Arcot Rajasekar, Richard Marciano, Reagan Moore San Diego Supercomputer Center Presented by: Preetham A Gowda.
OAIS (archive) Producer Management Consumer. Representation Information Data Object Information Object Interpreted using its Yields.
Preservation Data Services Persistent Archive Research Group Reagan W. Moore October 1, 2003.
OAIS (archive) OAIS (archive) Producer Management Consumer.
Building Preservation Environments from Federated Data Grids Reagan W. Moore San Diego Supercomputer Center Storage.
Data Grids, Digital Libraries and Persistent Archives: An Integrated Approach to Publishing, Sharing and Archiving Data. Written By: R. Moore, A. Rajasekar,
OAIS Producer (archive) Consumer Management
Collection Based Persistent Archives
Policy-Based Data Management integrated Rule Oriented Data System
Joseph JaJa, Mike Smorul, and Sangchul Song
Bentley Project Reel Digitization Bentley Historical Library t
Flexible Extensible Digital Object Repository Architecture
Flexible Extensible Digital Object Repository Architecture
Arcot Rajasekar Michael Wan Reagan Moore (sekar, mwan,
Technical Issues in Sustainability
Robin Dale RLG OAIS Functionality Robin Dale RLG
Presentation transcript:

Digital Preservation Lifecycle Management Building a demonstration prototype for the preservation of large-scale multi-media collections Arcot Rajasekar (PI) San Diego Supercomputer Center -- University of California, San Diego San Diego Supercomputer Center, Univ. of California, San Diego Arcot Rajasekar (PI) Richard Marciano Reagan Moore Chien-Yi Hou Francine Berman (co-PI) UCSD-TV, Univ. of California, San Diego Lynn Burstan (co-PI) Steve Anderson Mellisa McEwen Bee Bornheimer UCTV-Berkeley Harry Kreisler UCSD Libraries, Univ. of California, San Diego Brian Schottlaender (co-PI) Luc DeClerck Brad Westbrook Arwen Hutt Ardys Kozbial Chris Frymann Vivian Chu

Our Proposal Design and Development of a Prototype for Preserving Digital Video Collections Management of Authenticity, Integrity, and Infrastructure Independence Preservation Life-cycle meshing seamlessly with the content production Minimal impact to production life-cycle Workflow system that automates accession, description, organization and preservation of video and associated contents Metadata definition, extraction and ingestion Long-term retention and technology migration At risk Collection: Conversation with History video collection Video, audio, text transcripts, web-based material Databases of administrative and descriptive metadata Derived products Time-line: 1 year

Exemplar Collection Conversation with History - UCTV - from 1982 Hour-long interviews with internationally prominent individuals Institute of International Affairs, UC Berkeley Available in 15 million homes nationwide via UCTV 40 program segments annually Web-site for downloading older segments Among UCTVs most accessed on-line programs Programs used in educational material

Research Plan Define High-Level Life-cycle processes Develop Models for Multi-media Collection preservation Define Preservation Building Blocks – AIPs Integration of Video Content Generation with Preservation Life-cycle Issues in Security, Trust & Collection Management Long-term Preservation Models - Chronopolis

Preservation Processes Generation of a Globally Unique Identifier (GUID) for each interview session Retrieval of the original video session Retrieval of each of the segments associated with a video session Retrieval of the transmission scripts for each video segment Retrieval of the material published on the Web page for each segment Processing of each Web page to redirect internal URLs into handles within the preservation logical name space for digital entities Retrieval of the rights statement for each session Retrieval of the header associated with each video segment Retrieval of the trailer associated with each video segment Retrieval of the administrative, structural, and descriptive metadata stored in the Filemaker Pro database Retrieval of the annotations stored with the Web pages Specification of Preservation Metadata for AIP Creation of AIPs for the above material Creation of containers for physically aggregating material for storage in the preservation environment Storage of containers within the preservation environment Specification of preservation management metadata such as access controls, storage location, and replication

Tools Kepler Interactive Workflow Tool Used in multiple NSF projects – SEEK, GEON, etc Storage Resource Broker – Infrastructure Independence Collection Management Tool – Organization & Repurposing Uniform Access Abstraction with Access Control/Audit Trails Distributed, Heterogeneous Resource Access Metadata Catalog – Authenticity & Integrity Maintenance System, Structural, Descriptive, Preservation Metadata User-defined and Discipline Specific Metadata Extensible Schema Support Dspace Accession and Discovery Tool for Documents Web-Preservation Software Automatic Crawl of Web sites Auto-translation of links

Players & Roles UCTV Video generation and Broadcast Transcription Descriptive and production metadata generation UCSD-TV Webcast generation Multiple Formats Descriptive metadata generation UCSD Libraries Preservation Models and Processes Description, Metadata Definition Preservation metadata generation Accession and Arrangement SDSC Development of Life-cycle Management systems Storage Preservation Access

Project Management Plans Installation of data grid Access to material Migration of existing videos Registration of existing metadata Integration with Current Workflow Creation of archival form Validation of authenticity and integrity metadata Construction of Archival Information Package Preservation Loading into preservation collection Access mechanisms Controlled Access, Single-sign on and audit trails Workshop on SRB

Research and Development Metadata and Data Accession Human Interactions – Study of Methodology Accession Module Design and validation of preservation templates Producer-archive submission pipeline Authenticity metadata extraction Integrity metadata extraction Development of preservation workflow Mapping of templates to Kepler workflow actors Validation of workflows Extensions to the SRB data grid Administrative tools for applying preservation policies

Pre- Interview Interview Tran- scription Post Interview Metadata Analysis Schema Generation SIP/AIP Definitions Capture Scripts Metadata DB Capture Interview Metadata Capture Make SIPS Aggregate AIP/Verify Store/Replicate/ Preserve TV production Lifecycle Metadata Definition & Capture Workflow Persistent Archival Workflow Broadcast/ Transfer Metadata Validation

Conclusion Build upon a track record: SDSCs collection & storage management capabilities and in developing production software UCSD Library leadership role in Digital preservation for library collections Strong experience of UCTV and UCSD-TV in running content generation systems Have imported 1/8 of legacy videos into the environment (450 Gbytes) Total size of collection will be 220 videos (about 4 Tbytes) Use Proven Technologies: Kepler and workflows SRB, MCAT collection management systems We plan to deliver a prototype in one year