Preservation Research Roadmap
Reagan W. Moore
San Diego Supercomputer Center, University of California, San Diego

Preservation Environments
A preservation environment protects records from changes in the external world.
[Figure labels: External World; Preservation Environment; Records]

Preservation Research Roadmap
- Interpreting digital data
    - How to build generic format descriptions across both scientific data and office products, such that only the description is migrated to new syntax (persistent objects)
- Preservation environment management
    - How to build generic preservation management software that is more broadly used
- Interoperability
    - How to show that preservation environments can exchange records while preserving integrity and authenticity
    - How to exchange records between systems with different management policies

Research Agenda
- Generic infrastructure
- Infrastructure used for preservation should also support:
    - Digital libraries
    - Data grids
    - Real-time sensor systems
    - Workflow provenance systems
    - Cyberinfrastructure
- Minimizes risk that infrastructure will become obsolete
    - Includes development efforts from other projects

Scientific Data Format Virtualization
- Characterize the properties of a digital entity independently of the creation application (scientific data)
    - Describe the structures present within the bit stream: DFDL (Data Format Description Language)
    - Describe the relationships present between the structures
- Logical relationships
    - Semantic labels
- Temporal relationships
    - Mapping of time stamps to a coordinate system
- Structural relationships
    - Mapping of bytes to words to arrays
- Spatial relationships
    - Mapping of arrays to coordinate systems
    - Mapping of coordinate systems to geometry
- Functional relationships
    - Mapping of semantic labels to physical quantities, and the allowed compositions of the physical quantities
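The structural and spatial mappings on this slide can be illustrated with a minimal Python sketch. It is not DFDL and not taken from the presentation: the `GridDescription` class, its field names, and the sample values are all hypothetical, chosen only to show a description that lives apart from the bit stream and maps bytes to words, words to an array, and array indices to coordinates.

```python
import struct
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class GridDescription:
    """Hypothetical format description, stored separately from the bits."""
    byte_order: str               # struct byte-order prefix, e.g. ">" for big-endian
    word_code: str                # struct code for one word, e.g. "f" for float32
    rows: int                     # array shape (structural relationship)
    cols: int
    origin: Tuple[float, float]   # coordinate of element [0][0] (spatial relationship)
    spacing: Tuple[float, float]  # coordinate step per column and per row
    label: str                    # semantic label (logical relationship)
    unit: str                     # physical quantity (functional relationship)


def bytes_to_array(bits: bytes, desc: GridDescription) -> List[List[float]]:
    """Map bytes to words, then words to a row-major two-dimensional array."""
    count = desc.rows * desc.cols
    fmt = f"{desc.byte_order}{count}{desc.word_code}"
    if len(bits) != struct.calcsize(fmt):
        raise ValueError("bit stream length does not match the description")
    words = struct.unpack(fmt, bits)
    return [list(words[r * desc.cols:(r + 1) * desc.cols]) for r in range(desc.rows)]


def index_to_coordinate(desc: GridDescription, row: int, col: int) -> Tuple[float, float]:
    """Map an array index to a point in the described coordinate system."""
    x0, y0 = desc.origin
    dx, dy = desc.spacing
    return (x0 + col * dx, y0 + row * dy)


# Illustrative data only: six big-endian float32 words forming a 2 x 3 grid.
desc = GridDescription(">", "f", 2, 3, (10.0, 20.0), (0.5, 0.5),
                       "sea_surface_temperature", "kelvin")
bits = struct.pack(">6f", 1.0, 2.0, 3.0, 4.0, 5.0, 6.0)
grid = bytes_to_array(bits, desc)
```

In this sketch the bit stream is never rewritten; only the description would need to be re-expressed if the description syntax changed.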

Persistent Objects
- Keep the original bits unchanged
- Separate knowledge required for parsing from manipulation behaviors
- Migrate the knowledge representation onto new syntax over time
- For office products: Multivalent
    - Structure and relationships captured within a media adaptor
    - Behaviors (manipulations of the structures) based on the defined relationships
    - Can add new behaviors on the original structures
    - Or can restrict presentation to the original behaviors
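The separation between parsing knowledge and behaviors can be sketched in Python as follows. This is an illustration under stated assumptions, not the Multivalent API: `PersistentObject`, `text_adaptor`, `behaviors`, and `perform` are hypothetical names, and a plain UTF-8 text file stands in for an office document.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class PersistentObject:
    """The original bits; frozen so they are never modified in place."""
    bits: bytes


def text_adaptor(bits: bytes) -> List[str]:
    """Hypothetical media adaptor: holds only the knowledge needed to parse
    the bits into a structure (here, a list of lines)."""
    return bits.decode("utf-8").splitlines()


# Behaviors operate on the parsed structure, never on the raw bits.
behaviors: Dict[str, Callable[[List[str]], object]] = {
    "line_count": len,
    "first_line": lambda lines: lines[0] if lines else "",
}


def perform(obj: PersistentObject,
            adaptor: Callable[[bytes], List[str]],
            behavior_name: str) -> object:
    """Parse the unchanged bits with the adaptor, then apply one behavior."""
    return behaviors[behavior_name](adaptor(obj.bits))


# A new behavior can be added later without touching the bits or the adaptor.
behaviors["word_count"] = lambda lines: sum(len(line.split()) for line in lines)

document = PersistentObject(b"alpha beta\ngamma\n")
```

Restricting presentation to the original behaviors would correspond, in this sketch, to not registering anything beyond the initial entries in `behaviors`.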

Designated Community
Each designated community defines:
- Standard semantics
    - Astronomy community: Uniform Content Descriptors
- Standard encoding format
    - Astronomy community: FITS file
- Standard services
    - Manipulate standard format using standard semantics
    - Astronomy community: SIAP, Simple Image Access Protocol
- Can we build better representations for description of the community standards?
    - Can format virtualization simplify tasks for the designated community?

Preservation Environment
[Figure]

iRODS - integrated Rule-Oriented Data System
[Figure: architecture diagram. Labeled components: Resources; Client Interface; Admin Interface; Metadata Modifier Module; Config Modifier Module; Rule Modifier Module; Consistency Check Module (three instances); Confs; Rule Base; Metadata Persistent Repository; Rule Engine; Current State; Rule Invoker; Micro Service Modules (Resource-based Services); Micro Service Modules (Metadata-based Services); Service Manager]

iRODS - infrastructure independence
Six logical name spaces required to manage preservation properties:
- Records
- Persons
- Storage resources
- Rules
- Micro-services
- Persistent state information
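One way to picture how logical name spaces give infrastructure independence is a catalog that binds logical names to physical references. The Python sketch below is an assumption-laden illustration, not the iRODS catalog or its API: `NameSpace`, `LogicalCatalog`, and the sample paths are hypothetical; only the six name-space labels come from the slide.

```python
from enum import Enum
from typing import Dict


class NameSpace(Enum):
    """The six logical name spaces listed on the slide."""
    RECORDS = "records"
    PERSONS = "persons"
    STORAGE_RESOURCES = "storage resources"
    RULES = "rules"
    MICRO_SERVICES = "micro-services"
    PERSISTENT_STATE = "persistent state information"


class LogicalCatalog:
    """Hypothetical catalog: logical names stay fixed while the physical
    references behind them can be rebound as infrastructure changes."""

    def __init__(self) -> None:
        self._bindings: Dict[NameSpace, Dict[str, str]] = {ns: {} for ns in NameSpace}

    def bind(self, space: NameSpace, logical_name: str, physical_ref: str) -> None:
        self._bindings[space][logical_name] = physical_ref

    def resolve(self, space: NameSpace, logical_name: str) -> str:
        return self._bindings[space][logical_name]


catalog = LogicalCatalog()
# Illustrative names only: a record first stored on one system, then migrated.
catalog.bind(NameSpace.RECORDS, "/archive/report-001", "old-disk:/vol/a/17.dat")
catalog.bind(NameSpace.RECORDS, "/archive/report-001", "new-tape:/pool/3/17.dat")
```

After the second `bind`, clients that refer to `/archive/report-001` are unaffected by the move, which is the property the logical name spaces are meant to provide.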

Summary of Mapping ERA Capabilities to Management Rules
- Multiple systems need to be integrated:
    - PAWN submission pipeline: 34 operations
    - Cheshire indexing system: 13 operations
    - Kepler workflow: 53 operations
    - iRODS data management operations
    - Operations facility: the remaining capabilities
- The 597 operations are executed by 174 generic rules
- The analysis identified five types of metadata attributes:
    - Collection metadata: 11 attributes
    - File metadata attributes
    - User metadata: 38 attributes
    - Resource metadata: 9 attributes
    - Rule metadata: 32 attributes

Two Types of Rules
- Manage micro-services
    - Replicate, validate integrity, synchronize, manage disposition, …
    - Compare outcomes with expectations
- Manage structured information
    - Parse information from submission agreements, disposition agreements
    - Format information for dissemination information packages, archival information packages, error reporting
- Expect transformation to higher levels of granularity
    - Structured management policies
    - Structured micro-services (workflows)
    - Structured assertions
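The first rule type, managing micro-services and comparing outcomes with expectations, can be sketched in Python. This is not the iRODS rule language and does not use real iRODS micro-services: the functions `replicate`, `checksum`, `replicate_and_verify_rule`, and `validate_integrity_rule`, and the in-memory `storage` dictionary, are hypothetical stand-ins.

```python
import hashlib
from typing import Dict

Storage = Dict[str, Dict[str, bytes]]  # resource name -> record name -> bits


def replicate(storage: Storage, name: str, source: str, target: str) -> None:
    """Hypothetical micro-service: copy a record between storage resources."""
    storage[target][name] = storage[source][name]


def checksum(storage: Storage, name: str, resource: str) -> str:
    """Hypothetical micro-service: SHA-256 digest of the stored bits."""
    return hashlib.sha256(storage[resource][name]).hexdigest()


def replicate_and_verify_rule(storage: Storage, name: str,
                              source: str, target: str) -> dict:
    """Rule: chain micro-services, then compare the outcome (digest of the
    replica) with the expectation (digest of the source)."""
    expected = checksum(storage, name, source)
    replicate(storage, name, source, target)
    observed = checksum(storage, name, target)
    return {"record": name, "expected": expected,
            "observed": observed, "ok": expected == observed}


def validate_integrity_rule(storage: Storage, name: str,
                            resource: str, expected: str) -> bool:
    """Rule: re-check a stored record against a previously recorded digest."""
    return checksum(storage, name, resource) == expected


# Illustrative data only.
storage: Storage = {"disk": {"r1": b"record contents"}, "tape": {}}
report = replicate_and_verify_rule(storage, "r1", "disk", "tape")
```

The returned `report` dictionary is a simple stand-in for the persistent state information that such a rule would record for later consistency checks.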

More Information
SRB:
iRODS: