Rule-Based Preservation Systems Reagan W. Moore Wayne Schroeder Mike Wan Arcot Rajasekar Richard Marciano {moore, schroede, mwan, sekar,


Presentation transcript:

Rule-Based Preservation Systems
Reagan W. Moore, Wayne Schroeder, Mike Wan, Arcot Rajasekar, Richard Marciano
{moore, schroede, mwan, sekar,

Data Grids
- SRB - Storage Resource Broker
  - Persistent naming of distributed data
  - Management of data stored in multiple types of storage systems
  - Organization of data as a shared collection with descriptive metadata, access controls, audit trails
- iRODS - integrated Rule-Oriented Data System
  - Rules control execution of remote micro-services
  - Manage persistent state information
  - Validate assertions about the collection
  - Automate execution of management policies

Extremely Successful
- Storage Resource Broker (SRB) manages 2 PBs of data in internationally shared collections
- Data collections for NSF, NARA, NASA, DOE, DOD, NIH, LC, NHPRC, IMLS
  - Astronomy: Data grid
  - Bio-informatics: Digital library
  - Earth Sciences: Data grid
  - Ecology: Collection
  - Education: Persistent archive
  - Engineering: Digital library
  - Environmental science: Data grid
  - High energy physics: Data grid
  - Humanities: Data grid
  - Medical community: Digital library
  - Oceanography: Real-time sensor data, persistent archive
  - Seismology: Digital library, real-time sensor data
- Goal has been generic infrastructure for distributed data

Data Grids
- Data virtualization
  - Provide the persistent, global identifiers needed to manage distributed data
  - Provide standard operations for interacting with heterogeneous storage systems
  - Map from storage protocols to preferred clients
- Trust virtualization
  - Manage authentication and authorization
  - Enable access controls on data, metadata, storage
- Federation
  - Controlled sharing of name spaces, files, and metadata between independent data grids
  - Data grid chaining / Central archives / Master-slave data grids / Peer-to-peer data grids
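
The data virtualization idea above can be sketched as a catalog that maps a persistent logical name to one or more physical replicas. This is a minimal illustration only: `LogicalCatalog`, `Replica`, the resource names, and the file name `image1.fits` are hypothetical and are not part of the SRB or iRODS APIs.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Replica:
    resource: str       # hypothetical storage resource name
    physical_path: str  # location inside that storage system


@dataclass
class LogicalCatalog:
    """Toy catalog: persistent logical names mapped to physical replicas."""
    entries: dict = field(default_factory=dict)

    def register(self, logical_name: str, replica: Replica) -> None:
        self.entries.setdefault(logical_name, []).append(replica)

    def resolve(self, logical_name: str) -> list:
        # Clients use only the logical name; storage details stay hidden.
        return list(self.entries.get(logical_name, []))

    def migrate(self, logical_name: str, old_resource: str, new_replica: Replica) -> None:
        # Swap a physical location without changing the logical name.
        self.entries[logical_name] = [
            new_replica if r.resource == old_resource else r
            for r in self.entries[logical_name]
        ]


catalog = LogicalCatalog()
name = "/tempZone/home/rods/nvo/image1.fits"
catalog.register(name, Replica("diskResc", "/vault/a/image1.fits"))
catalog.register(name, Replica("tapeResc", "/hpss/b/image1.fits"))

# Storage technology changes, but the logical name used by clients does not.
catalog.migrate(name, "tapeResc", Replica("newTapeResc", "/lto/c/image1.fits"))
resources = [r.resource for r in catalog.resolve(name)]
print(resources)
```

The point of the design is that migration touches only the catalog entry, so references held by users and collections stay valid.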

Production Data Grids: Observations
- Data grids manage shared collections that are distributed across multiple storage systems and institutions
- Data grids are responsible for providing recovery mechanisms for all errors that occur in the distributed environment
- The number of observed problems is proportional to the size of the collections
- Need to minimize labor costs by automating:
  - Application of management policies
  - Execution of administrative functions for error recovery
  - Validation of preservation assessment criteria

Observations of Production Data Grids
- Each community implements different management policies
- Need a mechanism to support the socialization of shared collections
  - Community-specific preservation objectives
  - Community-specific assertions about properties of the shared collection
  - Community-specific management policies

Collection Management iRODS - integrated Rule-Oriented Data System

Rule-based Data Management
- Map from management policies to rules controlling execution of remote micro-services
- Manage persistent state information for results of each micro-service execution
- Support an additional three logical name spaces
  - Rules
  - Micro-services
  - Persistent state information
- Together these constitute representation information for preservation environments
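
One way to picture the relationship between rules, micro-services, and persistent state is the sketch below. It is an assumption-laden toy, not the iRODS rule engine: `run_rule`, the step functions, and the in-memory `store` are hypothetical names, and the recovery behavior (undo completed steps in reverse order) is an illustrative choice.

```python
def run_rule(steps, state_log):
    """Run (name, action, recovery) steps in order.

    Every outcome is appended to state_log, standing in for persistent
    state information. On failure, completed steps are undone in reverse.
    """
    completed = []
    for name, action, recovery in steps:
        try:
            result = action()
        except Exception as exc:
            state_log.append((name, f"failed: {exc}"))
            for done_name, done_recovery in reversed(completed):
                done_recovery()
                state_log.append((done_name, "recovered"))
            return False
        state_log.append((name, result))
        completed.append((name, recovery))
    return True


store = {}  # hypothetical stand-in for a storage resource


def put_object():
    store["primary"] = b"record"
    return "stored"


def undo_put():
    store.pop("primary", None)


def replicate_fails():
    raise RuntimeError("resource offline")


log = []
ok = run_rule(
    [("put", put_object, undo_put), ("replicate", replicate_fails, lambda: None)],
    log,
)
print(ok, log, store)
```

Because every step writes to the log, a later rule can inspect that record to check an assertion about the collection.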

Example Rules
- Rule composed of four parts:
    Name | condition | micro-service set | recovery
- Rule to automate replication of data for a specific collection:
    acPostProcForPut | $objPath like /tempZone/home/rods/nvo/* | msiSysReplDataObj(nvoReplResc,null) | nop
- Rule types
  - Internal, administrative, user-defined
  - Atomic, deferred, periodic
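
The four-part rule layout can be illustrated with a small parser. This is a sketch under stated assumptions rather than the iRODS parser: `Rule`, `parse_rule`, `split_chain`, and `condition_holds` are hypothetical names, only the `variable like pattern` condition form is handled, and `##` is assumed as the separator between chained micro-services.

```python
from dataclasses import dataclass
from fnmatch import fnmatchcase


@dataclass
class Rule:
    name: str
    condition: str
    micro_services: list
    recovery: list


def split_chain(text: str) -> list:
    # Assumption: chained micro-services are separated by "##".
    return [item.strip() for item in text.split("##") if item.strip()]


def parse_rule(text: str) -> Rule:
    """Split 'name | condition | micro-service set | recovery' into fields."""
    parts = [p.strip() for p in text.split("|")]
    if len(parts) != 4:
        raise ValueError(f"expected 4 parts, got {len(parts)}")
    name, condition, services, recovery = parts
    return Rule(name, condition, split_chain(services), split_chain(recovery))


def condition_holds(condition: str, session: dict) -> bool:
    """Evaluate only the 'variable like glob-pattern' form; empty means always true."""
    if not condition:
        return True
    variable, operator, pattern = condition.split(None, 2)
    if operator != "like":
        raise ValueError(f"unsupported operator: {operator}")
    return fnmatchcase(session.get(variable, ""), pattern)


rule = parse_rule(
    "acPostProcForPut | $objPath like /tempZone/home/rods/nvo/* "
    "| msiSysReplDataObj(nvoReplResc,null) | nop"
)
inside = condition_holds(rule.condition, {"$objPath": "/tempZone/home/rods/nvo/image1.fits"})
outside = condition_holds(rule.condition, {"$objPath": "/tempZone/home/rods/other/image1.fits"})
print(rule.name, rule.micro_services, inside, outside)
```

With this split, the replication micro-service would fire only for objects put under the `nvo` collection.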

Management Virtualization
- Standard policies expressed as rules
- Integrity
  - Validation of checksums
  - Synchronization of replicas
  - Data distribution
  - Data retention
  - Access controls
- Authenticity
  - Chain of custody - audit trails
  - Required preservation metadata - templates
  - Generation of AIPs, DIPs
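
As an illustration of the integrity policy "validation of checksums", the sketch below compares each replica of a record against a stored checksum. The choice of SHA-256, the helper names `sha256_of` and `find_corrupt_replicas`, and the temporary replica files are all assumptions made for the example, not details from the slides.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Stream a file through SHA-256 so large replicas need little memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_corrupt_replicas(replicas, expected: str) -> list:
    """Return the replicas whose checksum differs from the registered value."""
    return [path for path in replicas if sha256_of(path) != expected]


with tempfile.TemporaryDirectory() as tmp:
    good = Path(tmp) / "replica_a.dat"
    bad = Path(tmp) / "replica_b.dat"
    good.write_bytes(b"archival record")
    bad.write_bytes(b"archival recor?")  # simulated bit rot

    # Checksum registered at ingest time (hypothetical persistent state).
    expected = hashlib.sha256(b"archival record").hexdigest()
    corrupt_names = [p.name for p in find_corrupt_replicas([good, bad], expected)]

print(corrupt_names)
```

A periodic rule could run such a check and trigger replica synchronization for any file it reports.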

New Capabilities
- Management capabilities
  - Rules to validate assessment criteria
  - Access controls on rules
  - Time-dependent access controls
  - Access controls on each micro-service
  - Redaction, access controls on structures in a file
  - Rule to parse audit trails, verify consistency of system
- Data grid evolution
  - Dynamic addition of new rules / micro-services / persistent state information
  - Rules to control migration from old management policies to new management policies
- Federation
  - Migration of rules and micro-services with data

Federation Between Data Grids
[Figure: two federated data grids, one holding Data Collection A and one holding Data Collection B. Each data grid maintains a logical resource name space, logical user name space, logical file name space, logical rule name space, logical micro-service name space, and logical persistent state. Both are reached through data access methods (Web Browser, DSpace, OAI-PMH).]

Digital Preservation
- Preservation is communication with the future
  - How do we migrate records onto new technology (information syntax, encoding format, storage infrastructure, access protocols)?
  - SRB - Storage Resource Broker data grid provides the interoperability mechanisms needed to manage multiple versions of technology
- Preservation manages communication from the past
  - What information do we need from the past to make assertions about preservation assessment criteria (authenticity, integrity, chain of custody)?
  - iRODS - integrated Rule-Oriented Data System

Theory of Digital Preservation
- Definition of the persistent name spaces
- Definition of the operations that are performed upon the persistent name spaces
- Characterization of the changes to the persistent state information associated with each persistent name space that occur for each operation
- Characterization of the transformations that are made to the records for each operation
- Demonstration that the set of operations is complete, enabling the decomposition of every preservation process onto the operation set
- Demonstration that the preservation management policies are complete, enabling the validation of all preservation assessment criteria
- Demonstration that the persistent state information is complete, enabling the validation of assessment criteria
- The assertion is then: if the operations are reversible, then a future preservation environment can recreate a record in its original form, maintain authenticity and integrity, support access, and display the record
- A corollary is that such a system would allow records to be migrated between independent implementations of preservation environments, while maintaining authenticity and integrity
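
The reversibility assertion can be made concrete with a toy example: if every transformation applied to a record is logged and has an inverse, replaying the inverses in reverse order recreates the original bytes. The two transformations used here (zlib compression and Base64 encoding) and the names `OPERATIONS`, `apply_operations`, and `recreate_original` are assumptions chosen for illustration, not operations named in the slides.

```python
import base64
import hashlib
import zlib

# Each operation is a (forward, inverse) pair of byte transformations.
OPERATIONS = {
    "compress": (zlib.compress, zlib.decompress),
    "base64": (base64.b64encode, base64.b64decode),
}


def apply_operations(record: bytes, names: list):
    """Apply operations in order and log them as persistent state."""
    history = []
    for op_name in names:
        forward, _ = OPERATIONS[op_name]
        record = forward(record)
        history.append(op_name)
    return record, history


def recreate_original(record: bytes, history: list) -> bytes:
    """Undo the logged operations in reverse order."""
    for op_name in reversed(history):
        _, inverse = OPERATIONS[op_name]
        record = inverse(record)
    return record


original = b"record as received from the producer"
original_checksum = hashlib.sha256(original).hexdigest()

stored, history = apply_operations(original, ["compress", "base64"])
recovered = recreate_original(stored, history)
integrity_ok = hashlib.sha256(recovered).hexdigest() == original_checksum
print(history, integrity_ok)
```

The logged history plays the role of persistent state information: without it, a future environment would not know which inverses to apply.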

For More Information Reagan W. Moore San Diego Supercomputer Center