Policy-Based Data Management integrated Rule Oriented Data System

Slides:



Advertisements
Similar presentations
National Partnership for Advanced Computational Infrastructure San Diego Supercomputer Center Data Grids for Collection Federation Reagan W. Moore University.
Advertisements

GFS OGF-22 Global Resource Naming Developers: Reagan Moore Arcot Mike.
OGF-23 iRODS Metadata Grid File System Reagan Moore San Diego Supercomputer Center.
© 2006 Open Grid Forum OGF19 Federated Identity Rule-based data management Wed 11:00 AM Mountain Laurel Thurs 11:00 AM Bellflower.
Data Management Systems Richard Marciano Reagan W. Moore Wayne Schroeder Arcot Rajasekar Mike Wan San Diego Supercomputer Center
Digital Preservation Lifecycle Management Building a demonstration prototype for the preservation of large-scale multi-media collections Arcot Rajasekar.
Presentations Introduction Case Studies:
Institutional Repositories It’s not Just the Technology New England Archivists Boston College March 11, 2006 Eliot Wilczek University Records Manager Tufts.
NATIONAL PARTNERSHIP FOR ADVANCED COMPUTATIONAL INFRASTRUCTURE SAN DIEGO SUPERCOMPUTER CENTER Particle Physics Data Grid PPDG Data Handling System Reagan.
San Diego Supercomputer Center NARA Research Prototype Persistent Archive Building Preservation Environments with Data Grid Technology (NARA Research Prototype.
A Very Brief Introduction to iRODS
Towards a Federated Infrastructure for the Preservation and Analysis Archival Data Chien-Yi HOU Richard MARCIANO {chienyi, School.
GGF-17 Astro Workshop Preservation Environment Working Group Officers: Bruce Barkstrom (NASA Langley) Reagan Moore (SDSC) Goals  Demonstrate.
Kevin L. Glick Electronic Records Archivist Manuscripts and Archives Yale University ECURE Arizona State University March 2, 2005 Fedora and the Preservation.
Robust Tools for Archiving and Preserving Digital Data Joseph JaJa, Mike Smorul, and Mike McGann Institute for Advanced Computer Studies Department of.
PAWN: A Novel Ingestion Workflow Technology for Digital Preservation
Tools and Services for the Long Term Preservation and Access of Digital Archives Joseph JaJa, Mike Smorul, and Sangchul Song Institute for Advanced Computer.
UMIACS PAWN, LPE, and GRASP data grids Mike Smorul.
PAWN: A Novel Ingestion Workflow Technology for Digital Preservation Mike Smorul, Joseph JaJa, Yang Wang, and Fritz McCall.
Web-based Portal for Discovery, Retrieval and Visualization of Earth Science Datasets in Grid Environment Zhenping (Jane) Liu.
System Design/Implementation and Support for Build 2 PDS Management Council Face-to-Face Mountain View, CA Nov 30 - Dec 1, 2011 Sean Hardman.
DCC Conference, Glasgow November, Digital Archive Policies and Trusted Digital Repositories MacKenzie Smith, MIT Libraries Reagan Moore, San Diego.
National Partnership for Advanced Computational Infrastructure Digital Library Architecture Reagan Moore Chaitan Baru Amarnath Gupta George Kremenek Bertram.
San Diego Supercomputer CenterUniversity of California, San Diego Preservation Research Roadmap Reagan W. Moore San Diego Supercomputer Center
Information Management and Distributed Data Reagan W. Moore Wayne Schroeder Mike Wan Arcot Rajasekar Richard Marciano {moore, schroede, mwan, sekar,
Working Group: Practical Policy Rainer Stotzka, Reagan Moore.
USING METADATA TO FACILITATE UNDERSTANDING AND CERTIFICATION ABOUT THE PRESERVATION PROPERTIES OF A PRESERVATION SYSTEM Jewel H. Ward, Hao Xu, Mike C.
Implementing an Integrated Digital Asset Management System: FEDORA and OAIS in Context Paul Bevan DAMS Implementation Manager
PERG OGF-22 Preservation Environments Research Group Organizers: Reagan Moore Richard Marciano
Rule-Based Data Management Systems Reagan W. Moore Wayne Schroeder Mike Wan Arcot Rajasekar {moore, schroede, mwan, {moore, schroede, mwan,
1 integrated Rule Oriented Data System Tutorial: iRODS Capabilities.
Richard MarcianoChien-Yi Hou Caryn Wojcik University of University of State of Michigan North Carolina North Carolina Records Management ServicesSALT DCAPE.
Production Data Grids SRB - iRODS Storage Resource Broker Reagan W. Moore
OAIS Rathachai Chawuthai Information Management CSIM / AIT Issued document 1.0.
1 Schema Registries Steven Hughes, Lou Reich, Dan Crichton NASA 21 October 2015.
Ocean Observatories Initiative Data Management (DM) Subsystem Overview Michael Meisinger September 29, 2009.
Working Group Practical Policy based on slides and latest documents from the PP WG chaired by Reagan Moore, Rainer Stotzka presented by Johannes Reetz.
Rule-Based Preservation Systems Reagan W. Moore Wayne Schroeder Mike Wan Arcot Rajasekar Richard Marciano {moore, schroede, mwan, sekar,
National Partnership for Advanced Computational Infrastructure San Diego Supercomputer Center Persistent Management of Distributed Data Reagan W. Moore.
Archival Workshop on Ingest, Identification, and Certification Standards Certification (Best Practices) Checklist Does the archive have a written plan.
Policy Based Data Management Data-Intensive Computing Distributed Collections Grid-Enabled Storage iRODS Reagan W. Moore 1.
Selene Dalecky March 20, 2007 FDsys: GPO’s Digital Content System.
OAIS Rathachai Chawuthai Information Management CSIM / AIT Issued document 1.0.
How to Implement an Institutional Repository: Part II A NASIG 2006 Pre-Conference May 4, 2006 Technical Issues.
Funded by: © AHDS Preservation in Institutional Repositories Preliminary conclusions of the SHERPA DP project Gareth Knight Digital Preservation Officer.
From SRB to IRODS: Policy Virtualization using Rule-Based Data Grids Reagan W. Moore Wayne Schroeder Arcot Rajasekar Mike Wan San Diego Supercomputer Center.
GGF-17 Preservation Environments Research Group Preservation Environment Working Group Officers: Bruce Barkstrom (NASA Langley) Reagan.
M-1 INGEST OVERVIEW Don Sawyer National Space Science Data Center NASA/GSFC October 13, 1999.
National Science Foundation Cooperative Agreement: OCI Reagan Moore, PI Mary Whitton, Project Manager.
©MIT LKTR Workshop, Digital Archive Policies and Trusted Digital Repositories MacKenzie Smith, MIT Libraries Reagan Moore, San Diego Supercomputer.
National Archives and Records Administration1 Integrated Rules Ordered Data System (“IRODS”) Technology Research: Digital Preservation Technology in a.
Rights Management for Shared Collections Storage Resource Broker Reagan W. Moore
Building Preservation Environments with Data Grid Technology Reagan W. Moore Presenter: Praveen Namburi.
Collection-Based Persistent Archives Arcot Rajasekar, Richard Marciano, Reagan Moore San Diego Supercomputer Center Presented by: Preetham A Gowda.
Use of Policies to Enforce Collection Properties Richard Marciano Reagan Moore University of North Chapel Hill Data Intensive Cyber Environments.
Preservation Data Services Persistent Archive Research Group Reagan W. Moore October 1, 2003.
Working Group: Data Foundations and Terminology (Practical Policy Considerations) Reagan Moore.
R2R ↔ NODC Steve Rutz NODC Observing Systems Team Leader May 12, 2011 Presented by L. Pikula, IODE OceanTeacher Course Data Management for Information.
Grid Services for Digital Archive Tao-Sheng Chen Academia Sinica Computing Centre
Data Grids, Digital Libraries and Persistent Archives: An Integrated Approach to Publishing, Sharing and Archiving Data. Written By: R. Moore, A. Rajasekar,
An Overview of iRODS Integrated Rule-Oriented Data System
OAIS Producer (archive) Consumer Management
Joseph JaJa, Mike Smorul, and Sangchul Song
Copyright © 2011 Pearson Education, Inc. Publishing as Pearson Addison-Wesley Chapter 2 Database System Concepts and Architecture.
Implementing an Institutional Repository: Part II
Arcot Rajasekar Michael Wan Reagan Moore (sekar, mwan,
Technical Issues in Sustainability
Robin Dale RLG OAIS Functionality Robin Dale RLG
Implementing an Institutional Repository: Part II
How to Implement an Institutional Repository: Part II
Presentation transcript:

Policy-Based Data Management integrated Rule Oriented Data System Reagan Moore rwmoore@renci.org Arcot Rajasekar sekar@diceresearch.org Mike Wan mwan@diceresearch.org

Preservation is a Stage in the Data Life Cycle Each data life cycle stage re-purposes the original collection Project Collection Private Local Policy Data Grid Shared Distribution Policy Data Processing Pipeline Analyzed Service Policy Digital Library Published Description Policy Reference Collection Preserved Representation Policy Federation Sustained Re-purposing Policy Stages correspond to addition of new policies for a broader community Virtualize the stages of the data life cycle through policy evolution Interoperability across data life cycle representations

Policy-based Preservation Environment Purpose - reason a preservation environment is assembled Properties - attributes needed to ensure the purpose Policies - control for ensuring maintenance of properties Procedures - functions that implement the policies State information - results of applying the procedures Assessment criteria - validation that state information conforms to the desired purpose Federation - controlled sharing of logical name spaces These are the necessary elements for a preservation environment 3

iRODS - Policy-based Data Management Turn policies into computer actionable rules Compose rules by chaining standard operations Standard operations (micro-services) executed at the remote storage location Manage state information as attributes on namespaces: Files / collections /users / resources / rules Validate assessment criteria Queries on state information, parsing of audit trails Automate administrative functions Minimize labor costs

Policy-based Preservation - Authenticity Purpose - Maintain authenticity of records Properties - Define template for required representation information Policies - Extract and register representation information for each file on ingestion Procedures - Parse record / XML file to extract metadata State information - Register representation information into metadata catalog Assessment criteria - Compare registered metadata with template defining required values A preservation environment should automate each of these steps 5 5

Assessment Criteria NARA Electronic Records Archive capabilities list 853 defined capabilities Mapped to 174 computer actionable rules Mapped to 212 state information attributes RLG/NARA Trusted Repository Audit Checklist Mapped to 105 computer actionable rules Included 66 rules specific to preservation ISO Mission Operations Information Management System repository audit checklist 106 policies for operation and control Mapped to 52 computer actionable rules

Examples of Assessment Criteria Specify a template that governs the representation information required for a specific record series content of a Submission Information Package (SIP) content of an Archival Information Package (AIP) number of replicas Verify compliance of SIP with specification compliance of AIP with specification compliance with required replica number integrity of the replicas

Preservation Communities NARA Transcontinental Persistent Archive Prototype Develop policies to automate preservation of selected digital holdings National Optical Astronomy Observatory Accession images from a telescope in Chile Carolina Digital Repository Preserve institutional collections

Federation of Seven Independent Data Grids National Archives and Records Administration Transcontinental Persistent Archive Prototype Federation of Seven Independent Data Grids NARA II MCAT Georgia Tech MCAT Rocket Center MCAT NARA I U NC U Md UCSD MCAT MCAT MCAT MCAT Extensible Environment, can federate with additional research and education sites. Each data grid can use different vendor products. Policy to coalesce authentic records from independent data grids. Choose whether write to central archive, or use soft links.

NOAO Zone Architecture Telescope Telescope Archive

Carolina Digital Repository Architecture: Web interface Fedora digital library middleware iRODS data grid Supports: Registration of file into iRODS Generation of FOXM Registration into Fedor Query through Fedor Synchronization of catalogs From Conceptualizing Policy-Driven Repository Interoperability (PoDRI) Using iRODS and Fedora (Pcolar, Davis, Zhu, Chassanoff, Hou, Marciano)

Preservation Concepts Preservation environments are inherently distributed and federated Mitigate risk of data loss Mitigate dependence on a single vendor Mitigate dependence on a single institution Management of technology evolution can be done through same mechanisms that support interoperability across heterogeneous storage systems At the point in time when add new technology, both the old and new technologies are present Migrate from old protocols to new protocols using data grids

Preservation Concepts (Cont.) Preservation requires management of communication with the future Need to migrate records to future technology Need procedural infrastructure independence to ensure can parse data formats in the future Preservation requires management of communication from the past Need to know what policies and procedures were applied by prior archivists Need to validate that policies were enforced Federation minimizes risk of data loss Deep archive implemented through rules that: turn on data staging, data versioning, replication turn off deletion, external write, external data grid access

Preservation Concepts (Cont.) Periodic verification of assessment criteria Check that required properties still hold These rules are in addition to the rules that enforce policies Compare values in metadata catalog with expected values Number of replicas, checksums of files, required metadata Verify relationships between files in storage and entries in metadata catalog Metadata record <----> files in storage Parse audit trails to track compliance over time Evaluate impact of changing preservation policy

Overview of iRODS Architecture User Can Search, Access, Add and Manage Data & Metadata iRODS Data System iRODS Metadata Catalog Track information iRODS Rule Engine Track policies iRODS Data Server Disk, Tape, etc. *Access data with Web-based Browser or iRODS GUI or Command Line clients.

Managing Properties of Records Namespaces Record (file name) Users Storage resources Rules State information User-defined metadata (provenance) System attributes Procedures Basic operations performed on data Store, retrieve, move, copy, replicate, parse, aggregate Extract metadata, checksum, synchronize, version

Migration of Procedures Map from actions requested by the access method to a standard set of Micro-services. Map the standard Micro-services to standard operations. Map the operations to protocol supported by the operating system. Access Interface Standard Micro-services Data Grid Standard Operations Storage Protocol Storage System

Format of an iRODS Rule Action | Condition | MS1, …, MSn | RMS1, …, RMSn Action Name of action to be performed Name known to the server and invoked by server Condition – condition under which the rule applies Micro-services - Chain of micro-services to be executed Recovery micro-service - If any micro service fails, recovery micro-service(s) executed to maintain transactional consistency Example of MS/RMS createFile(*F) removeFile(*F) ingestMetadata(*F,*M) rollback

iRODS - Distributed Operating System

iRODS is a "coordinated NSF/OCI-Nat'l Archives research activity" under the auspices of the President's NITRD Program and is identified as among the priorities underlying the President's 2009 Budget Supplement in the area of Human and Computer Interaction Information Management technology research. Reagan W. Moore rwmoore@renci.org http://irods.diceresearch.org NSF OCI-0848296 “NARA Transcontinental Persistent Archives Prototype” NSF SDCI-0721400 “Data Grids for Community Driven Applications”