PERG OGF-22 Preservation Environments Research Group Organizers: Reagan Moore Richard Marciano


PERG OGF-22 Preservation Environments Research Group
Organizers: Reagan Moore, Richard Marciano
Goals:
  - Analyze capabilities required by a preservation environment
  - Define rule-based preservation environment - iRODS
  - RLG/NARA assessment criteria for a Trusted Digital Repository
      CASPAR - representation information
      SHAMAN - migration micro-services
  - Demonstrate creation of a preservation environment based on data grid technology
  - Demonstrate creation of preservation rules controlling a preservation environment
Participants:
  - CASPAR - Cultural, Artistic and Scientific knowledge for Preservation, Access and Retrieval
  - SHAMAN - Sustaining Heritage Access through Multivalent ArchiviNg
  - NCRIS - National Collaborative Research Infrastructure Strategy
  - PLANETS - Preservation and Long-term Access through Networked Services
  - MIT - DSpace digital library
  - NARA Transcontinental Persistent Archive Prototype
  - U Md - Producer Archive Workflow Network
  - UK Digital Curation Centre
  - Taiwan National Archives

PERG OGF-22 Intellectual Property Policy
I acknowledge that participation in OGF22 is subject to the OGF Intellectual Property Policy.

Intellectual Property Notices
Note Well: All statements related to the activities of the OGF and addressed to the OGF are subject to all provisions of Section 17 of GFD-C.1, which grants to the OGF and its participants certain licenses and rights in such statements. Such statements include verbal statements in OGF meetings, as well as written and electronic communications made at any time or place, which are addressed to:
  - the OGF plenary session,
  - any OGF working group or portion thereof,
  - the GFSG, or any member thereof on behalf of the GFSG,
  - the GFAC, or any member thereof on behalf of the GFAC,
  - any OGF mailing list, including any working group or research group list, or any other list functioning under OGF auspices,
  - the GFD Editor or the GWD process.
Statements made outside of an OGF meeting, mailing list or other function, that are clearly not intended to be input to an OGF activity, group or function, are not subject to these provisions.

Excerpt from Section 17 of GFD-C.1
Where the GFSG knows of rights, or claimed rights, the OGF secretariat shall attempt to obtain from the claimant of such rights, a written assurance that upon approval by the GFSG of the relevant OGF document(s), any party will be able to obtain the right to implement, use and distribute the technology or works when implementing, using or distributing technology based upon the specific specification(s) under openly specified, reasonable, non-discriminatory terms. The working group or research group proposing the use of the technology with respect to which the proprietary rights are claimed may assist the OGF secretariat in this effort. The results of this procedure shall not affect advancement of document, except that the GFSG may defer approval where a delay may facilitate the obtaining of such assurances. The results will, however, be recorded by the OGF Secretariat, and made available. The GFSG may also direct that a summary of the results be included in any GFD published containing the specification.

OGF Intellectual Property Policies are adapted from the IETF Intellectual Property Policies that support the Internet Standards Process.

PERG OGF-22 Data Management Applications
Data grids
  - Share data - organize distributed data as a collection
Digital libraries
  - Publish data - support browsing and discovery
Persistent archives
  - Preserve data - manage technology evolution
Real-time sensor systems
  - Federate sensor data - integrate across sensor streams
Workflow systems
  - Analyze data - integrate client- & server-side workflows
Coalescence of requirements into generic infrastructure

PERG OGF-22 Generic Infrastructure
Data grids organize distributed data into shared collections
  - Persistent name spaces for files, users, storage
  - Collection attributes
  - Provenance, descriptive, system metadata
Data grids manage heterogeneous storage systems
  - Standard operations across file systems, tape archives, object ring buffers
  - Enable management of technology evolution
  - At the point in time when new technology is available, both the old and new systems can be integrated

PERG OGF-22 Preservation Requirements
Authenticity
  - Maintain information about provenance of data
  - Assertions made about the file at the time of ingestion
Integrity
  - Maintain information about the management of the data
  - Assertions made by the archivist
  - Access controls, audit trails, checksums, replication, synchronization, federation
Infrastructure independence
  - Management of properties of records independently of choice of storage system
Scalability
  - Management of large collections (billions of records, petabytes of data, thousands of attributes)

PERG OGF-22 National Archives and Records Administration Transcontinental Persistent Archive Prototype
Federation of Seven Independent Data Grids
  - Extensible environment; can federate with additional research and education sites.
  - Each data grid uses different vendor products.
[Diagram labels: U Md, SDSC MCAT, Georgia Tech MCAT, NARA II MCAT, NARA I MCAT, Rocket Center MCAT, U NC MCAT]

PERG OGF-22 Extremely Successful
Storage Resource Broker (SRB) manages 2 PB of data in internationally shared collections
Data collections for NSF, NARA, NASA, DOE, DOD, NIH, LC, NHPRC, IMLS; APAC, UK e-Science, IN2P3, KEK, …
  - Astronomy: Data grid
  - Bio-informatics: Digital library
  - Earth Sciences: Data grid
  - Ecology: Collection
  - Education: Persistent archive
  - Engineering: Digital library
  - Environmental science: Data grid
  - High energy physics: Data grid
  - Humanities: Data grid
  - Medical community: Digital library
  - Oceanography: Real-time sensor data, persistent archive
  - Seismology: Digital library, real-time sensor data
Goal has been generic infrastructure for distributed data


PERG OGF-22 Data Grid Evolution
Data grids
  - Management of preservation environment properties
  - Data and trust virtualization
  - Infrastructure independence
  - SRB - Storage Resource Broker
Rule-based data grids
  - Automation of management policies
  - Management virtualization
  - Open source software
  - iRODS - integrated Rule-Oriented Data System

PERG OGF-22 Using a Data Grid - Details
[Diagram: two iRODS Servers, each with a Rule Engine; a Metadata Catalog (DB); a Rule Base]
  1. User asks for data
  2. Data request goes to iRODS Server
  3. Server looks up information in catalog
  4. Catalog tells which iRODS server has data
  5. 1st server asks 2nd for data
  6. The 2nd iRODS server applies rules
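The request flow on this slide can be sketched in a few lines of Python. This is only an illustration of the sequence of steps, not iRODS code: the `Catalog` and `Server` classes, their methods, and the logical file name are all hypothetical, and the rule engine is reduced to a log entry.

```python
from dataclasses import dataclass, field


@dataclass
class Catalog:
    """Hypothetical metadata catalog: logical file name -> name of the server holding it."""
    locations: dict = field(default_factory=dict)

    def locate(self, logical_name):
        return self.locations[logical_name]


@dataclass
class Server:
    """Hypothetical data grid server with a stand-in rule engine."""
    name: str
    catalog: Catalog
    peers: dict = field(default_factory=dict)    # server name -> Server
    storage: dict = field(default_factory=dict)  # logical name -> bytes
    log: list = field(default_factory=list)

    def apply_rules(self, logical_name):
        # Stand-in for the rule engine: only records that rules were applied.
        self.log.append(f"rules applied for {logical_name}")

    def get(self, logical_name):
        # Steps 3-4: look up which server holds the data.
        owner = self.catalog.locate(logical_name)
        if owner == self.name:
            # Step 6: the server that holds the data applies its rules.
            self.apply_rules(logical_name)
            return self.storage[logical_name]
        # Step 5: the first server asks the second for the data.
        return self.peers[owner].get(logical_name)


catalog = Catalog({"/zone/a.txt": "s2"})
s2 = Server("s2", catalog, storage={"/zone/a.txt": b"hello"})
s1 = Server("s1", catalog, peers={"s2": s2})
data = s1.get("/zone/a.txt")  # steps 1-2: the user's request reaches the first server
```

In this sketch the user only ever talks to `s1`; the location of the data and the rules applied at `s2` stay hidden behind the logical name.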

PERG OGF-22 Requirements Driving Evolution
Observe that as the size of the shared collections grows, the administrative tasks can become onerous.
  - Data grids provide mechanisms to manage recovery from all errors that occur in the distributed environment
Need to minimize labor support through automation of administrative functions
  - File ingestion tasks
  - Verification of desired collection properties
  - Integrity checks and replica management

PERG OGF-22 Requirements Driving Evolution
Observe that each preservation environment has unique management policies
  - User administration
  - File retention & deletion
  - Time-dependent access controls
  - Data distribution and replication
  - File update (versions, backups)
  - Descriptive metadata

PERG OGF-22 Requirements Driving Evolution
Socialization of collections
  - The archivists have specific properties that they assert the collection will possess
      - Completeness
      - Authoritative sources
      - Authenticity
  - The creators of the records have their own criteria for the properties they expect
Socialization is the mapping from creator assertions to archivist expectations
  - Extract records from the environment in which they were created and migrate into the preservation environment
  - Extract records from the preservation environment and deliver to users of the archive
  - Maintain assertions about the records during both extraction processes

PERG OGF-22 Data Management iRODS - integrated Rule-Oriented Data System

PERG OGF-22 Rules
Rule classes
  - System enforced rules
  - Administrator controlled rules
  - User defined rules
Rule execution
  - Atomic rules - executed on each operation invoked by a client
  - Deferred rules - executed at a future time
  - Periodic rules - executed to validate assessment criteria and enforce desired properties (integrity)
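The difference between deferred and periodic execution can be illustrated with a small priority queue. This is a sketch under stated assumptions rather than the iRODS scheduler: `RuleQueue`, `defer`, and `run_due` are hypothetical names, time is a plain number supplied by the caller, and `period` is assumed to be greater than zero. An atomic rule needs no queue at all, since it runs inline with the client operation.

```python
import heapq
import itertools


class RuleQueue:
    """Hypothetical queue for deferred and periodic rules."""

    def __init__(self):
        self._heap = []
        self._order = itertools.count()  # tie-breaker so rule callables are never compared

    def defer(self, run_at, rule, period=None):
        # A rule with a period is re-queued after each run; without one it runs once.
        heapq.heappush(self._heap, (run_at, next(self._order), rule, period))

    def run_due(self, now):
        """Run every queued rule whose time has arrived and return the results."""
        results = []
        while self._heap and self._heap[0][0] <= now:
            run_at, _, rule, period = heapq.heappop(self._heap)
            results.append(rule())
            if period is not None:
                self.defer(run_at + period, rule, period)
        return results


queue = RuleQueue()
queue.defer(5, lambda: "migrate format")                # deferred: runs once at t=5
queue.defer(0, lambda: "verify checksums", period=10)   # periodic: runs at t=0, 10, 20, ...
at_0 = queue.run_due(0)
at_5 = queue.run_due(5)
at_10 = queue.run_due(10)
```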

PERG OGF-22 iRODS Rule Syntax
Event | Condition | Action-set | Recovery-set
  - Event - triggered by operation or queued rule
  - Condition - composed of tests on any attributes in the persistent state information
  - Action-set - composed from both micro-services and rules
  - Recovery-set - used to ensure transaction semantics and consistent state information
Executed by a rule engine installed at each storage location - server side workflows
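The four-part rule structure above can be modeled as follows. This is a minimal Python sketch, not the iRODS rule language or its engine: the `Rule` class, the `fire` function, and the example micro-services are hypothetical. It also assumes that each action is paired with one recovery step and that, on failure, the recovery for the failed action and for all completed actions run in reverse order.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Rule:
    event: str                                  # Event
    condition: Callable[[dict], bool]           # Condition: test on state information
    actions: List[Callable[[dict], None]]       # Action-set
    recoveries: List[Callable[[dict], None]]    # Recovery-set, one entry per action


def fire(rules, event, state):
    """Run the first rule matching the event and condition; return True on success."""
    for rule in rules:
        if rule.event != event or not rule.condition(state):
            continue
        done = 0
        try:
            for action in rule.actions:
                action(state)
                done += 1
            return True
        except Exception:
            # Assumed semantics: undo the failed step and every completed step,
            # most recent first, so the state information stays consistent.
            for recovery in reversed(rule.recoveries[:done + 1]):
                recovery(state)
            return False
    return False


# Hypothetical micro-services for an ingestion rule.
def register(state):
    state["registered"] = True


def unregister(state):
    state["registered"] = False


def replicate(state):
    if not state["replica_ok"]:
        raise RuntimeError("replica target unavailable")
    state["replicas"] = 2


def drop_replica(state):
    state["replicas"] = 1


ingest = Rule("put", lambda s: s["size"] > 0, [register, replicate], [unregister, drop_replica])

failed_state = {"size": 10, "replica_ok": False, "replicas": 1}
failed = fire([ingest], "put", failed_state)

ok_state = {"size": 10, "replica_ok": True, "replicas": 1}
succeeded = fire([ingest], "put", ok_state)
```

In the failing run the registration is rolled back, which is the transaction-like behavior the recovery-set is meant to provide.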

PERG OGF-22 Micro-Services
Challenge is that storage systems do not provide desired processes
  - Have “minimal” set of standard operations that are performed at the storage system
  - Have actions required by clients such as replication, metadata extraction, format migration
  - Create standard micro-services that aggregate storage operations into modules that can be used to implement desired processes.

PERG OGF-22 Data Virtualization
[Diagram layers, top to bottom: Access Interface; Standard Micro-services; Standard Operations; Storage Protocol; Storage System. The Data Grid spans the middle layers.]
  - Map from the actions requested by the access method to a standard set of micro-services.
  - The standard micro-services are mapped to the operations supported by the storage system.
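The two-level mapping can be sketched as micro-services written only against a small set of standard operations, so the same micro-service works on any storage system that has a driver. The `MemoryDriver` class and the `replicate` and `checksum` functions below are hypothetical illustrations, not iRODS interfaces; an in-memory dictionary stands in for a real storage system.

```python
import hashlib


class MemoryDriver:
    """Hypothetical storage driver exposing a minimal set of standard operations."""

    def __init__(self):
        self.files = {}

    def write(self, path, data):
        self.files[path] = data

    def read(self, path):
        return self.files[path]


def replicate(source, target, path):
    """Micro-service: aggregates a read and a write into a replication step."""
    target.write(path, source.read(path))


def checksum(driver, path):
    """Micro-service: computes a SHA-256 digest using only the standard read operation."""
    return hashlib.sha256(driver.read(path)).hexdigest()


disk = MemoryDriver()
tape = MemoryDriver()  # stands in for a different kind of storage system
disk.write("/archive/record1", b"record contents")
replicate(disk, tape, "/archive/record1")
replica_matches = checksum(disk, "/archive/record1") == checksum(tape, "/archive/record1")
```

Supporting a new storage system in this sketch means writing one more driver; the micro-services themselves do not change.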

PERG OGF-22 integrated Rule-Oriented Data System
[Architecture diagram. Labeled components: Client Interface; Admin Interface; Current State; Rule Invoker; Service Manager; Engine; Rule Base; Rule Confs; Metadata Persistent Repository; Resources; Micro Service Modules (Metadata-based Services); Micro Service Modules (Resource-based Services); Rule Modifier Module; Config Modifier Module; Metadata Modifier Module; Consistency Check Module (three instances)]

PERG OGF-22 Distributed Management System
[Diagram labels: Rule Engine; Data Transport; Metadata Catalog; Execution Control; Messaging System; Execution Engine; Virtualization; Server Side Workflow; Persistent State Information; Scheduling; Policy Management]

PERG OGF-22 Digital Preservation
Preservation community is defining the rules needed to assert trustworthiness of a digital repository
  - RLG/NARA - Trustworthy Repositories Audit & Certification: Criteria and Checklist. enceInputDocuments/trac.pdf
Defined 105 rules that are being implemented in iRODS

PERG OGF-22 RLG/NARA Assessment
Example TRAC assessment criteria
  90  Verify descriptive metadata and source against SIP template and set SIP compliance flag
  91  Verify descriptive metadata against semantic term list
  92  Verify status of metadata catalog backup (create a snapshot of metadata catalog)
  93  Verify consistency of preservation metadata after hardware change or error

PERG OGF-22 Classes of Assessment Criteria
Collection properties
  - List properties of associated name spaces
  - Verify properties
  - Compare properties with assertions
Collection operations
  - Transform file formats
  - Migrate data
  - Generate audit trails
Structured information
  - Parse audit trails to generate compliance reports
  - Apply templates to extract information
  - Apply templates to format state information
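As an illustration of the "parse audit trails to generate compliance reports" item, the sketch below summarizes audit records into pass/fail counts per check. The record layout (`timestamp|object|check|outcome`), the check names, and the `compliance_report` function are assumptions made for this example; they are not the iRODS audit trail format.

```python
from collections import Counter


def compliance_report(audit_lines):
    """Summarize 'timestamp|object|check|outcome' lines into per-check outcome counts."""
    report = {}
    for line in audit_lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines in the trail
        _timestamp, _obj, check, outcome = line.split("|")
        report.setdefault(check, Counter())[outcome] += 1
    return {check: dict(counts) for check, counts in report.items()}


# Hypothetical audit trail entries.
trail = [
    "2008-02-01T10:00|/archive/r1|checksum|pass",
    "2008-02-01T10:01|/archive/r2|checksum|fail",
    "2008-02-01T10:02|/archive/r1|replica_count|pass",
    "",
    "2008-02-02T10:00|/archive/r2|checksum|pass",
]
report = compliance_report(trail)
```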

PERG OGF-22 Which Comes First?
Specification of required provenance metadata
  - PREMIS - defines metadata that should be maintained about events associated with record
  - Definition of the procedures left to each preservation environment
Specification of required management policies
  - Define explicitly the management procedures
  - Derive the required state information needed to track outcomes
  - Implies provenance metadata is defined by management policies
  - Observe this leads to multiple classes of preservation metadata associated with each preserved name space

PERG OGF-22 Persistent State Information
User name space
  - Identity of archivists
  - Qualifications of archivists
Record (file) name space
  - Provenance metadata
  - Transformative migrations
  - Chain of custody (storage locations)
  - Integrity
  - Representation information (OAIS)
Storage resource name space
  - Archival properties
  - Error rates

PERG OGF-22 Persistent State Information
Representation information for preservation environment
Rule name space
  - Management policies that control operations within preservation environment
  - Versions of rules
  - Verification criteria
Micro-service name space
  - Management procedures that quantify operations on records
  - Versions of micro-services
  - Verification criteria
Persistent State name space
  - State information created by each version of a micro-service

PERG OGF-22 Preservation Requirements
  - What are your required preservation management policies?
  - What are your required preservation processes?
  - What are your required preservation assessment criteria?
  - What preservation systems are you using, and how can the preservation systems interoperate?
  - Can a set of records be migrated from your preservation environment into another system while maintaining authenticity, integrity, and chain of custody?

PERG OGF-22 Theory of Digital Preservation
Given the set of preservation policies
Given the set of preservation procedures
Given the set of persistent state information
Does the system have demonstrable closure and consistency properties?
  - Is the required persistent state information generated that is needed to make assertions about trustworthiness, authenticity, integrity?
  - Can assertions be made about the set of preservation procedures that have been applied to the records (no missing steps)?
  - Do the applied preservation procedures enforce all preservation policies?
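One way to read the closure question is as a set comparison: every policy should be enforced by some procedure, and every piece of state information a policy relies on should be produced by some procedure. The sketch below is an interpretation under that assumption, not a method from the slides; the `closure_gaps` function, the data layout, and the example policy and procedure names are hypothetical.

```python
def closure_gaps(policies, procedures):
    """Return (policies no procedure enforces, state attributes no procedure produces).

    policies:   policy name -> set of state attributes needed to assert the policy holds
    procedures: procedure name -> {"enforces": set of policy names,
                                   "produces": set of state attributes}
    """
    enforced = set().union(*(p["enforces"] for p in procedures.values()))
    produced = set().union(*(p["produces"] for p in procedures.values()))
    needed = {attr for attrs in policies.values() for attr in attrs}
    return sorted(set(policies) - enforced), sorted(needed - produced)


policies = {
    "integrity": {"checksum"},
    "chain_of_custody": {"storage_location"},
}
procedures = {
    "verify_checksum": {"enforces": {"integrity"}, "produces": {"checksum"}},
}
gaps_before = closure_gaps(policies, procedures)

# Adding a procedure that records storage locations closes the gap.
procedures["record_location"] = {
    "enforces": {"chain_of_custody"},
    "produces": {"storage_location"},
}
gaps_after = closure_gaps(policies, procedures)
```

Two empty lists mean that, within this simplified model, the procedures cover every policy and generate all the state information the policies depend on.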

PERG OGF-22 iRODS Application
NSF - SDCI grant “Adaptive Middleware for Community Shared Collections”
  - iRODS development, SRB maintenance
NARA - Transcontinental Persistent Archive Prototype
  - Trusted repository assessment criteria
NSF - Ocean Research Interactive Observatory Network (ORION)
  - Real-time sensor data stream management
NSF - Temporal Dynamics of Learning Center data grid
  - Management of Institution Research Board approval

PERG OGF-22 iRODS Development Status
Current release is version 1.0
  - January 23, 2008
  - International collaborations
      - SHAMAN - University of Liverpool
          - Sustaining Heritage Access through Multivalent ArchiviNg
      - CASPAR
          - Representation information, TRAC assessment criteria
      - UK e-Science data grid
      - IN2P3 (Lyon, France) data grid migration
      - DSpace policy management integration
      - Fedora user middleware integration
      - LStore distributed metadata catalog integration

PERG OGF-22 Planned Development
In progress:
  - GSI support
  - Audit trails - mechanisms to record and track iRODS persistent state changes
  - Structured information interface based on mounted collection driver (tar file)
  - GUI Browser (AJAX)
  - Driver for HPSS
  - Porting to additional versions of Unix/Linux (Ubuntu completed)
Planned:
  - Time-limited sessions via a one-way hash authentication
  - Python Client library
  - Driver for SAM-QFS
  - Porting to Windows
  - Support for MySQL as the metadata catalog
  - MCAT to ICAT migration tools
  - Extensible Metadata including Databases Access Interface
  - Zones/Federation
  - Cheshire / Multivalent Browser micro-service

PERG OGF-22 For More Information Reagan W. Moore San Diego Supercomputer Center