Use of Policies to Enforce Collection Properties. Richard Marciano, Reagan Moore. University of North Carolina at Chapel Hill, Data Intensive Cyber Environments.

Presentation transcript:

Use of Policies to Enforce Collection Properties
Richard Marciano, Reagan Moore
University of North Carolina at Chapel Hill
Data Intensive Cyber Environments (DICE) Center
Renaissance Computing Institute

USE CASE (communication with Paul Watry, SHAMAN project):
… Multinational companies operating in different jurisprudence environments, where for example common law is different than public law. A single federated preservation system would need to execute different policies for different jurisprudence environments. A preservation system would need to be policy-driven…

An Environment with Heterogeneous Technologies
Fedora, DSpace, iRODS, Eprints, Handle System, dLibra, Greenstone

Sharing Data Across Repositories
Enabling inter-repository data management allows us to share data by connecting the repositories of:
• Different groups, projects
• Different institutions, locations
• Different disciplines
• Diverse types of data
• Diverse hardware, software infrastructure

Data Management Challenges
Data-driven research generates massive data collections
– Data sources are remote and distributed
– Collaborators are remote
– Wide variety of data types: observational data, experimental data, simulation data, real-time data, office products, web pages, multi-media
Collections contain millions of files
– Logical arrangement is needed for distributed data
– Discovery requires the addition of descriptive metadata
Long-term retention requires migration of output into a reference collection
– Automation of administrative functions is essential to minimize long-term labor support costs
– Creation of representation information for describing file context
– Validation of assessment criteria (authenticity, integrity)

To Manage Long-term Preservation
Define desired preservation properties
– Authenticity / Integrity / Chain of Custody / Original arrangement
– Life Cycle Data Requirements Guide
Implement preservation processes
– Appraisal / accession / arrangement / description / preservation / access
Manage preservation environment
– Minimize costs
– Validate assessment criteria to verify preservation properties

iRODS: integrated Rule-Oriented Data System
Middleware that provides functions to:
• manage distributed storage
• provide metadata support for digital preservation and search functions
• run distributed workflows that enforce system policies and harness distributed computing power
iRODS can be used for:
• building data grids
• building digital libraries
• building digital repositories

iRODS Business Rules
• Implement policies
• Verify enforcement (audit trails)
• Automate management of exploding data
• Let you handle petabytes in hundreds of millions of files
• Each Rule defines
  – Event, Condition, Action chains (micro-services, other Rules), Recovery chains
• Rule types
  – Atomic (immediate), Deferred, Periodic
• Rules are executed by the iRODS Rule Engine
  – Applied where the data is (server-side)
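The Event, Condition, Action chain, Recovery chain structure can be illustrated with a minimal Python sketch. This is not iRODS code or the iRODS rule language: the `Rule` class, the `fire` function, and the replica actions are hypothetical stand-ins for micro-services, and the rollback order (undo completed actions, newest first) is an assumption made for the illustration.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List, Tuple

Context = Dict[str, Any]
Action = Callable[[Context], None]


@dataclass
class Rule:
    """One Event-Condition-Action rule with a recovery step per action."""
    event: str
    condition: Callable[[Context], bool]
    # Each entry pairs an action with the recovery step that undoes it.
    chain: List[Tuple[Action, Action]] = field(default_factory=list)


def fire(rule: Rule, event: str, ctx: Context) -> bool:
    """Run the action chain; return True only if every action succeeds."""
    if event != rule.event or not rule.condition(ctx):
        return False
    completed: List[Action] = []  # recovery steps of actions that finished
    for action, recovery in rule.chain:
        try:
            action(ctx)
        except Exception:
            # Assumption: roll back completed actions, newest first.
            for undo in reversed(completed):
                undo(ctx)
            return False
        completed.append(recovery)
    return True


# Hypothetical actions standing in for micro-services.
def add_replica(ctx: Context) -> None:
    ctx["replicas"] += 1


def drop_replica(ctx: Context) -> None:
    ctx["replicas"] -= 1


def failing_register(ctx: Context) -> None:
    raise RuntimeError("catalog unavailable")


def no_recovery(ctx: Context) -> None:
    pass


rule = Rule(
    event="put",
    condition=lambda ctx: ctx["collection"] == "archive",
    chain=[(add_replica, drop_replica), (failing_register, no_recovery)],
)
state: Context = {"collection": "archive", "replicas": 1}
succeeded = fire(rule, "put", state)  # second action fails, first is undone
```

In this sketch the failed registration triggers the recovery chain, so the replica count returns to its starting value and the rule reports failure.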

In the Pledge project [Smith 2007], policies are defined as:
“A policy is typically a rule describing (or prescribing) the interactions of actions that take place within the archive, or a constraint determining when and by whom an action may be taken. For example, a policy could demand that every Item being submitted include an approved deposit license. Another policy might demand that every Bitstream in the asset store be checked for content integrity (i.e. checksum recomputed and compared with the checksum on record) at least once in every six months.”
An article in Communications of the ACM [Berman, 2008] identifies the need for repositories to incorporate mechanisms that implement and automate policies and regulations:
“The digital data generated by research, industry and governments over the next decade will be subject to increased regulation and evolving community formats, standards, and policies. This means the cyberinfrastructure (CI) developed to host and preserve it will need to incorporate mechanisms to enforce community policies and procedures like auditing, authentication, monitoring, and association of affiliated metadata. Emerging data CI and management environments and systems, including iRODS, LOCKSS, the Fedora Commons, and DSpace are beginning to develop and incorporate mechanisms that implement relevant policies and procedures. Over the next decade, the ability to automatically address the requirements of policy and regulation will be needed to ensure that our data CI empowers rather than limits us.”
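The integrity policy quoted from the Pledge project (recompute each Bitstream's checksum and compare it with the checksum on record at least once every six months) can be sketched in Python as follows. The `Bitstream` class and `audit` function are hypothetical names, SHA-256 is an assumed choice of checksum, and "six months" is approximated as 183 days; none of this is taken from the Pledge, DSpace, or iRODS code.

```python
import hashlib
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Dict, List, Optional

# Assumption: "six months" is approximated as 183 days.
MAX_AGE = timedelta(days=183)


@dataclass
class Bitstream:
    content: bytes
    recorded_checksum: str  # checksum on record, taken at ingest
    last_checked: Optional[datetime] = None


def checksum(data: bytes) -> str:
    """Assumed checksum algorithm for this sketch: SHA-256."""
    return hashlib.sha256(data).hexdigest()


def is_due(item: Bitstream, now: datetime) -> bool:
    """A bitstream is due if never checked or last checked MAX_AGE ago or more."""
    return item.last_checked is None or now - item.last_checked >= MAX_AGE


def audit(store: Dict[str, Bitstream], now: datetime) -> List[str]:
    """Recheck every bitstream that is due; return the names that failed."""
    failures: List[str] = []
    for name, item in sorted(store.items()):
        if not is_due(item, now):
            continue
        if checksum(item.content) != item.recorded_checksum:
            failures.append(name)
        item.last_checked = now  # record the check for the next cycle
    return failures
```

Running `audit` periodically (for example from a scheduled job) would keep every item within the six-month window; the returned list is what a repair or notification step would act on.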

Repositories can collaborate not only through the exchange of content and metadata but also through the enforcement of preservation policies across repository boundaries.

A first scenario, proposed for repository integration between LOCKSS and iRODS, would be: The same web content could be ingested into a LOCKSS box and an iRODS-enabled storage resource. LOCKSS policies could be implemented in iRODS as rules, such as the LOCKSS audit and repair protocol. Such an approach would allow both repositories to audit each other’s collections to ensure consistency, and would allow the repair of any missing or damaged content. This type of policy coupling would contribute to greater diversity in the implementation of LOCKSS network peers, and illustrates the value of policy-driven repository interoperability.

A second scenario, researched for DSpace and iRODS policy-level integration, is [Smith, 2006]:
“It is our assumption that archivists who manage digital archives work primarily at the policy level, however, preservation systems necessarily function at the rules level, which are specific to the particular capabilities required by the system, and the rules engine implemented by it. There is a need to standardize the policies and the protocol for transferring them between preservation environments. In this way, archivists can set policies and have the same policy mapped to multiple rules engines and enforced by multiple preservation environments... We believe that this will allow preservation environments to scale appropriately in the coming decades.”
“One approach in combining DSpace and iRODS would be to implement a policy repository in DSpace (which could be done through an RDF triple-store), where policies can be associated with objects such as Items, Collections, Communities, etc., and can be put into a Dissemination Information Package (DIP) which is sent to a policy-aware storage repository such as iRODS.”

A third scenario (the current proposal) is illustrated through Fedora and iRODS policy integration exercises and prototypes [Fedora/iRODS Integration, 2008]: Expressing entities of Fedora’s FOXML object model in iRODS, such as key attributes, relationships, and behaviors, would provide hooks for iRODS storage and policy enforcement through the iRODS rule repository and rule engine.
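The idea in the first scenario, two repositories auditing each other's copies and repairing missing or damaged content, can be sketched in Python. This is a deliberately simplified two-party comparison against a checksum manifest, not the LOCKSS audit and repair protocol itself; the `audit_and_repair` function, the in-memory repositories, and the manifest are all assumptions made for the illustration.

```python
import hashlib
from typing import Dict

Repo = Dict[str, bytes]  # hypothetical in-memory repository: name -> content


def digest(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def audit_and_repair(first: Repo, second: Repo,
                     manifest: Dict[str, str]) -> Dict[str, str]:
    """Compare both copies of each item with the checksum recorded at ingest.

    A missing or damaged copy is replaced from the other repository when
    that copy is intact. Returns a status per item.
    """
    report: Dict[str, str] = {}
    for name, expected in manifest.items():
        first_ok = name in first and digest(first[name]) == expected
        second_ok = name in second and digest(second[name]) == expected
        if first_ok and second_ok:
            report[name] = "ok"
        elif first_ok:
            second[name] = first[name]  # repair second from first
            report[name] = "repaired_second"
        elif second_ok:
            first[name] = second[name]  # repair first from second
            report[name] = "repaired_first"
        else:
            report[name] = "lost"  # no intact copy remains
    return report
```

With only two copies, an item damaged in both places cannot be recovered, which is one reason a larger and more diverse set of peers is attractive.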

Finding 1: The transfer of information packages between repositories is not sufficient to guarantee the integrity of the content. In addition, management policies from one repository need to be enforced in the repository where the content is replicated.
Finding 2: Enforcing behaviors and relationships as machine-actionable policies at a remote repository needs to be further researched, as well as other policy-based mechanisms for validating assessment criteria for successful repository integration.
Finding 3: Additional research is needed to explore the feasibility of repository-independent policy representations that lend themselves to interoperability, based upon tests of policy migration between archives.

What is the feasibility of repository interoperability at the policy level? Research questions to be addressed are:
Q1: Can a preservation environment be assembled from two existing repositories with differing management policies?
Q2: Can the policies of the federation be enforced across both repositories, ensuring consistent management of the archives?
Q3: Can policies be migrated between repositories, either by association of the policies with the storage repositories, or through control of repository procedures?
Q4: What fundamental mechanisms are needed within a repository to implement new policies?
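One way to picture a repository-independent policy representation (Finding 3, Q3, Q4) is a small serialized record that each repository maps onto its own enforcement mechanism. The Python sketch below is only an illustration under assumed names: the `Policy` fields, the JSON encoding, the `Repository` class, and the `min_replicas` policy are hypothetical and do not correspond to any format used by iRODS, DSpace, or Fedora.

```python
import json
from dataclasses import asdict, dataclass
from typing import Callable, Dict, List


@dataclass
class Policy:
    name: str                   # e.g. "min_replicas" (hypothetical)
    collection: str             # collection the policy applies to
    parameters: Dict[str, int]


def export_policy(policy: Policy) -> str:
    """Serialize a policy so it can travel with the content."""
    return json.dumps(asdict(policy), sort_keys=True)


def import_policy(text: str) -> Policy:
    return Policy(**json.loads(text))


# A check returns True when the collection state complies with the policy.
Check = Callable[[Policy, Dict[str, int]], bool]


def min_replicas_check(policy: Policy, state: Dict[str, int]) -> bool:
    return state["replicas"] >= policy.parameters["count"]


class Repository:
    """A repository that enforces only the policies it has a mechanism for."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.checks: Dict[str, Check] = {}
        self.policies: List[Policy] = []

    def support(self, policy_name: str, check: Check) -> None:
        self.checks[policy_name] = check

    def adopt(self, text: str) -> bool:
        """Accept a migrated policy only if a local mechanism exists (Q4)."""
        policy = import_policy(text)
        if policy.name not in self.checks:
            return False
        self.policies.append(policy)
        return True

    def violations(self, collection: str, state: Dict[str, int]) -> List[str]:
        return [p.name for p in self.policies
                if p.collection == collection
                and not self.checks[p.name](p, state)]
```

A repository that lacks a mechanism for a policy rejects it in `adopt`, which makes the gap visible instead of silently ignoring the policy.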

Use Cases
Carolina Digital Repository (CDR): Invoking iRODS service in Fedora
[Diagram labels: Fedora; User; Call a service; Invoke a rule (Event, Condition, Action chains, Recovery chains); iRODS Servers, each with an iRODS Rule Engine; iRODS Catalog]

Overview of iRODS Architecture: RENCI Data Grid
[Diagram labels: Staff (generate new visualization data); User (access and display content); iRODS Metadata Catalog; iRODS Data System. Sites: UNC Asheville, UNC Charlotte, Duke University, NC State University, ECU, UNC Chapel Hill, UNC Health Sciences Library, Europa Center]

Building a Shared Collection
Have collaborators at multiple sites, each with different administration policies, different types of storage systems, different naming conventions. Assemble a self-consistent, persistent, distributed shared collection.
[Diagram labels: DB; Chapel Hill; Duke; NCSU]

Use Cases
DCAPE: Distributed Custodial Archival Preservation Environments
– Build a distributed production preservation environment that meets the needs of archival repositories for trusted archival preservation services
– Develop preservation policies for state archives, university archives and cultural institutions
– Use iRODS to implement and deliver the resulting services

Overview of iRODS Architecture: Delivery of Preservation Services
Services can be invoked for automatic replication, generation of audit trails, notification of activity, ingestion of multiple files, format obsolescence, etc.
[Diagram labels: Archivist A (automatic replication service requested); Archivist B (validation service for a collection); NC State Archives; NC State Library; Getty Research Inst.; iRODS Metadata Catalog; iRODS Data System]

DCAPE: Distributed Custodial Preservation Center
Purpose: Build a distributed production preservation environment that meets the needs of archival repositories for trusted archival preservation services
Distributed partnership of 11 institutions, 33 people:
• States: California, Kansas, Michigan, Kentucky, North Carolina, New York
• Universities: Tufts University, West Virginia University, UNC (SILS/RENCI)
• Cultural entities: Getty Research Institute
• International partners: Carleton University (Geomatics and Cartographic Research Centre)
People:
– Richard Marciano, Professor, SILS
– Reagan Moore, Professor, SILS
– Chien-yi Hou, Research Associate, SILS
– John Gallagher, Dir. of Research Mgt. and Admin, RENCI
– Kelly Eubank, Electronic Records Archivist
– Druscie Simpson, IT Administrator
– David Minor, Programmer
– Ed Southern, GRB Admin
– Jennifer Ricker, Digital Librarian

Use Cases
NARA Transcontinental Persistent Archive Prototype
– Federate 7 independent iRODS data grids: each data grid manages its own resources and metadata catalog, and applies its own policies
– Use the iRODS federation mechanism to establish the policies under which data can be shared between the data grids
– Control the operations that a remote user is allowed to perform within your data grid
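The last point, controlling what a remote user may do inside a local data grid, can be pictured with a small Python sketch. It does not use the iRODS federation mechanism or its configuration; the `DataGrid` class, the zone names, and the operation names are hypothetical, and treating local users as always permitted is a simplifying assumption.

```python
from dataclasses import dataclass, field
from typing import Dict, Set


@dataclass
class DataGrid:
    """A data grid that sets its own policy for users from other grids."""
    zone: str
    # Operations that users from each remote zone may perform here.
    remote_policy: Dict[str, Set[str]] = field(default_factory=dict)

    def allow(self, remote_zone: str, *operations: str) -> None:
        self.remote_policy.setdefault(remote_zone, set()).update(operations)

    def permits(self, user_zone: str, operation: str) -> bool:
        if user_zone == self.zone:
            # Assumption: local users are governed by local policy elsewhere.
            return True
        return operation in self.remote_policy.get(user_zone, set())


# Hypothetical zones: grid_a lets grid_b users read and list, nothing more.
grid_a = DataGrid(zone="grid_a")
grid_a.allow("grid_b", "read", "list")
```

Because each grid holds its own `remote_policy`, federating two grids does not force them to agree on a single policy; each side decides independently what the other may do.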

Overview of iRODS Architecture: Preserving Electronic Records with iRODS
Archivists use iRODS in the preservation workflow. Archivists can use iRODS for preserving electronic records, from appraisal to access, with Rules enforcing trustworthy repository criteria with audits.
[Diagram labels: iRODS Metadata Catalog (includes audit trails); Data Archive (holds Electronic Records Collection); Dark Archive (secure backup); iRODS Data System; Electronic Engineering Drawings]

National Archives and Records Administration Transcontinental Persistent Archive Prototype
Federation of Seven Independent Data Grids
Extensible environment; can federate with additional research and education sites. Each data grid uses different vendor products.
[Diagram labels: U Md, UCSD, Georgia Tech, NARA II, NARA I, Rocket Center, U NC; MCAT catalogs at the sites]