Digital Archive Policies and Trusted Digital Repositories. MacKenzie Smith, MIT Libraries; Reagan Moore, San Diego Supercomputer Center. DCC Conference, Glasgow, November 2006.

Presentation transcript:

Digital Archive Policies and Trusted Digital Repositories
MacKenzie Smith, MIT Libraries
Reagan Moore, San Diego Supercomputer Center

What is the Problem?
- Need to extract local collection management policies from software so that they are more discoverable and configurable
- Need to standardize ILM policies for sharing across systems within a preservation environment
- Need to define metadata to audit ILM operations and achieve trust in a scalable, automated way

[Figure: Preservation Environment. Labels: Preservation Properties, Preservation Control, Preservation Operations; Management Functions; Assessment Criteria, Management Policies, Capabilities; Preservation Environment; Persistent State, Rules, Services; Physical Infrastructure; Database, Rule Engine, Storage System]

Local Repository Policy/Rule Types
- Enterprise: specification of assertions
- Archive: a-periodic, deferred consistency rules
- Collection: periodic rules
- Item: periodic or atomic rules

Policy Framework
Based on the NARA/RLG TDR checklist categories:
- Organization, environment and legal policies
- Community and usability policies
- Process and procedure policies
- Technology and infrastructure policies

Policy Framework
Abstract policy (high-level)
- Example: the repository stipulates the number and location of copies of all digital objects.
- Number of copies to be made, the specific location(s), business rules, and the order of preference for using replicas.
- The repository has mechanisms in place to ensure that multiple copies of digital objects are synchronized.

Policy Framework
Concrete policy (local policy and metadata)
Example:
- Specific number of copies of digital objects
- Locations of copies of digital objects
- Order of preference for digital object copies
- Location of business rules for copies (e.g. contract with 3rd party archives for remote copies)
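The concrete policy values listed above could be captured in a small data structure. The following is only a sketch under assumed names: `ReplicationPolicy`, its fields, and the location labels are hypothetical and do not come from DSpace or iRODS.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReplicationPolicy:
    """Hypothetical container for the concrete values of the 'copies' policy."""
    copies: int                   # specific number of copies to keep
    locations: tuple              # approved locations, in order of preference
    business_rules_uri: str = ""  # where the contract or business rules are kept

    def is_satisfied_by(self, actual_locations):
        """True when enough copies exist and all of them sit in approved locations."""
        actual = set(actual_locations)
        return len(actual) >= self.copies and actual <= set(self.locations)

    def preferred(self, actual_locations):
        """Return the existing copy that ranks highest in the preference order."""
        for location in self.locations:
            if location in actual_locations:
                return location
        return None


# Illustrative values only.
policy = ReplicationPolicy(
    copies=2,
    locations=("local-disk", "remote-archive-a", "remote-archive-b"),
)
```

Keeping the preference order inside the same object as the copy count means a single record answers both "are there enough copies?" and "which copy should be read first?".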

Policy Encoding
- Looked at lots of schemas and approaches
- XACML, RuleML and BPEL: too limited
  - Single purpose (access control, rights management, workflow, etc.)
- Ponder and KAoS: too risky
  - Research projects that are no longer active
- Using Rei (N3) RDF ontology

Policy Exchange
- DSpace
  - DIPs based on METS (also looked at XFDU, IMS CP, others)
  - Encapsulates content files, metadata, provenance, and policies
- iRODS
  - Enforces policies based on local rules
  - Produces state information (metadata) that can be audited by the DSpace repository over time

Example Functional Requirements
The ERA list defines 854 key capabilities (functional requirements) needed for preservation. These can be loosely organized into categories related to:
- Management of disposition agreements describing record retention and disposition actions
- Accession, the formal acceptance of records into the data management system
- Arrangement, the organization of the records to preserve a required structure (implemented as a collection/sub-collection hierarchy)
- Description, the management of descriptive metadata as well as text indexing
- Preservation, the generation of Archival Information Packages
- Access, the generation of Dissemination Information Packages
- Subscription, the specification of services that a user picks for execution
- Notification, the delivery of notices on service execution results
- Queuing of large scale tasks through interaction with workflow systems
- System performance and failure reports. Of particular interest is the identification of all failures within the data management system and the recovery procedures that were invoked.
- Transformative migration, the ability to convert specified data formats to new standards. In this case, each new encoding format is managed as a version of the original record.
- Display transformation, the ability to reformat a file for presentation.
- Automated client specification, the ability to pick the appropriate client for each user.

Rule Definition
- Based on assessment criteria / preservation policies / preservation functional capabilities
- Implemented as rules controlling micro-services, with associated persistent state information

Case Study
- SRB/iRODS: virtualized storage environment
  - Provides 3rd party preservation services
  - Rules derived from local policy, preservation requirements
  - Provides metadata to allow monitoring for trust
- DSpace: institutional repository
  - Defines local collection management policies
  - Consumes 3rd party preservation services (e.g. iRODS)
  - Provides provenance/audit (History) to monitor trust

DSpace Event System
- Archivist defines TDR-level abstract policies
- System curator defines ILM events of interest, based on policies
  - e.g. ingest, modification, preservation migration, new edition, change in access rules, etc.
- System detects and acts on events, and records them in the local History (provenance audit)
  - e.g. iRODS deposit
  - History/provenance uses the ABC Harmony ontology for ILM (RDF)
- System curator monitors
  - iRODS state information
  - DSpace History subsystem (via standard RDF browsing tools)

iRODS Rule-based System
- Quantify the management policies
- Automate the application of the policies
- Track the outcomes from application of the policies
- First release of the software is this month

iRODS - infrastructure independence
Six logical name spaces are required to manage preservation properties:
- Records
- Persons
- Storage resources
- Rules
- Micro-services
- Persistent state information

Example Archivist Policies
- Authenticity
  - Are required provenance metadata provided with the record? (Submission requirement)
  - Is the chain of custody properly documented? (Management requirement)
- Integrity
  - Are the bits protected against natural disasters? (Management requirement for replication and distribution)
  - Are the bits preserved without corruption? (Future assertion)

Example Archivist Policies
- Infrastructure independence
  - Management of preservation properties independently of the choice of hardware and software infrastructure
- Management policies are needed for assertions about the properties of the records (authenticity and integrity) and the properties of the preservation environment (infrastructure independence)

Example of Complete Process of Rule Derivation from Preservation Criteria
- Assessment criteria
  - Integrity of records is preserved
- Management policy
  - Integrity will be verified every 6 months
- Preservation capabilities
  - Replication of records
  - Checksum on each record
  - Synchronization between replicas
  - Federation between archives
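The derivation chain above, from assessment criterion to management policy to capabilities, could be written down as plain data, with the six-month policy turned into a due-date check. This is a sketch only: the dictionary layout and the function name are hypothetical, and "6 months" is approximated as 183 days.

```python
from datetime import date, timedelta

# Hypothetical encoding of the derivation shown on the slide.
DERIVATION = {
    "assessment_criterion": "Integrity of records is preserved",
    "management_policy": "Integrity will be verified every 6 months",
    "capabilities": [
        "Replication of records",
        "Checksum on each record",
        "Synchronization between replicas",
        "Federation between archives",
    ],
}

# Assumption: "every 6 months" is approximated as 183 days.
VERIFY_INTERVAL = timedelta(days=183)


def verification_due(last_verified, today):
    """True if a record has never been verified or its last check is too old."""
    return last_verified is None or today - last_verified >= VERIFY_INTERVAL
```

For example, `verification_due(date(2006, 1, 1), date(2006, 11, 1))` is true because more than 183 days separate the two dates.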

Rule-based Preservation Policies
- Generated rules have the form event, condition, action, where the action is a set of micro-services or other rules
- Each micro-service corresponds to operations on a record at a remote storage location
- Each micro-service has a recovery procedure to handle remote system failure or unavailability
- Persistent state information is saved to track the outcome from applying the rule
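The event, condition, action structure described above can be illustrated with a toy rule engine. This is not the iRODS rule language or its API; `MicroService`, `Rule`, `apply_rule`, and the example rule are hypothetical names chosen for the sketch.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class MicroService:
    name: str
    operation: Callable[[dict], None]  # work performed on a record
    recovery: Callable[[dict], None]   # fallback when the operation fails


@dataclass
class Rule:
    name: str
    event: str
    condition: Callable[[dict], bool]
    actions: List[MicroService]


def apply_rule(rule: Rule, event: str, record: dict,
               state: List[Tuple[str, str, str]]) -> bool:
    """Run the rule if the event and condition match; log each outcome in `state`."""
    if event != rule.event or not rule.condition(record):
        return False
    for service in rule.actions:
        try:
            service.operation(record)
            outcome = "ok"
        except Exception:
            # The remote system failed or was unavailable: run the recovery procedure.
            service.recovery(record)
            outcome = "recovered"
        state.append((rule.name, service.name, outcome))
    return True


def unreachable_site(record):
    """Simulates a remote storage failure."""
    raise ConnectionError("remote storage unavailable")


rule = Rule(
    name="replicate-on-ingest",
    event="ingest",
    condition=lambda rec: rec.get("size", 0) > 0,
    actions=[
        MicroService("replicate", lambda rec: rec.update(copies=2), lambda rec: None),
        MicroService("remote-checksum", unreachable_site, lambda rec: rec.update(retry=True)),
    ],
)
state = []
record = {"id": "r1", "size": 10}
fired = apply_rule(rule, "ingest", record, state)
```

Here the `state` list stands in for persistent state information: it records one entry per micro-service, including the case where the recovery procedure had to run.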

Rule - validate record integrity
- Check permissions (requires archivist or proxy)
- Operations on the specified record
  - Access the remote site
  - Compute the checksum and compare it with the archived value
  - If the checksum is not correct
    - Access a replica, compute its checksum, and verify that it is correct
    - Replace the bad replica with a good replica
    - Update the audit list to track the replacement
  - Update persistent state to record the date of checksum verification
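The steps of this rule can be sketched in a few lines, with in-memory byte strings standing in for remote replicas. The function and variable names are hypothetical, and SHA-256 is an assumption, since the slides do not name a checksum algorithm.

```python
import hashlib
from datetime import date


def checksum(data: bytes) -> str:
    # Assumption: SHA-256; the slides do not specify the algorithm.
    return hashlib.sha256(data).hexdigest()


def validate_record(record_id, site, replicas, archived, role,
                    audit_log, state, today):
    """Validate the replica of `record_id` held at `site`, repairing it if needed."""
    # Check permissions (requires archivist or proxy).
    if role not in {"archivist", "proxy"}:
        raise PermissionError("integrity validation requires archivist or proxy")
    # Compute the checksum and compare it with the archived value.
    if checksum(replicas[site]) != archived:
        # Look for another replica whose checksum is correct.
        for other, data in replicas.items():
            if other != site and checksum(data) == archived:
                replicas[site] = data                       # replace the bad replica
                audit_log.append((record_id, site, other))  # track the replacement
                break
        else:
            raise RuntimeError("no good replica available for " + record_id)
    # Record the date of checksum verification in persistent state.
    state[record_id] = today


good = b"record contents"
replicas = {"site-a": b"record c0ntents", "site-b": good}  # site-a is corrupted
audit_log, state = [], {}
validate_record("rec-1", "site-a", replicas, checksum(good), "archivist",
                audit_log, state, date(2006, 11, 21))
```

After the call, the corrupted copy at `site-a` has been overwritten from `site-b`, the audit list holds one replacement entry, and the state dictionary holds the verification date.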

Additional implied Assessment Criteria
- Are there any orphaned records present in the archive with no preservation metadata?
- Are the replicas distributed across independent administrative domains on different types of storage systems?
- Is the observed error rate a factor of four lower than the validation rate?
- Have all records been validated within the required time period?
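Several of these criteria reduce to simple queries over the persistent state. The checks below are a sketch under assumed record and replica layouts; the field names are hypothetical, and the thresholds (at least two domains, at least two storage types, errors times four not exceeding validations) are one possible reading of the slide.

```python
def orphaned_records(records):
    """Ids of records that carry no preservation metadata."""
    return [r["id"] for r in records if not r.get("preservation_metadata")]


def replicas_well_distributed(replicas):
    """Assumed reading: at least two administrative domains and two storage types."""
    domains = {rep["domain"] for rep in replicas}
    storage_types = {rep["storage_type"] for rep in replicas}
    return len(domains) >= 2 and len(storage_types) >= 2


def error_rate_acceptable(errors_per_year, validations_per_year):
    """Assumed reading: the error rate is at least four times lower than the validation rate."""
    return errors_per_year * 4 <= validations_per_year


# Illustrative data only.
records = [
    {"id": "r1", "preservation_metadata": {"checksum": "abc"}},
    {"id": "r2", "preservation_metadata": {}},
    {"id": "r3"},
]
replicas = [
    {"domain": "site-a", "storage_type": "disk"},
    {"domain": "site-b", "storage_type": "tape"},
]
```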

Self-consistency and Closure
- For every required preservation attribute (authenticity and integrity), are there assessment criteria?
- For every assessment criterion, does there exist preservation metadata?
- Are the properties of the preservation environment also preserved?
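The first two closure questions can be phrased as a gap check over two mappings. This is a minimal sketch: the mapping layout, the function name, and the sample criteria and metadata names are hypothetical.

```python
def closure_gaps(required_attributes, criteria_by_attribute, metadata_by_criterion):
    """Return (attributes lacking criteria, criteria lacking preservation metadata)."""
    missing_criteria = [
        attr for attr in required_attributes
        if not criteria_by_attribute.get(attr)
    ]
    missing_metadata = [
        criterion
        for criteria in criteria_by_attribute.values()
        for criterion in criteria
        if not metadata_by_criterion.get(criterion)
    ]
    return missing_criteria, missing_metadata


# Illustrative data only.
required = ["authenticity", "integrity"]
criteria = {"integrity": ["checksum verified", "replicas synchronized"]}
metadata = {"checksum verified": ["last_checksum_date"]}
gaps = closure_gaps(required, criteria, metadata)
```

In this example `gaps` reports that authenticity has no assessment criteria and that the "replicas synchronized" criterion has no supporting metadata, so the policy set is not yet closed.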