Building Preservation Environments Reagan W. Moore San Diego Supercomputer Center Storage Resource Broker.

Presentation transcript:

Building Preservation Environments Reagan W. Moore San Diego Supercomputer Center Storage Resource Broker

Topics Preservation environments Digital library technology Data grid technology Fundamental concepts / future research Data / information / knowledge Persistent objects Knowledge management

Preservation Archival processes through which a digital entity is extracted from its creation environment, and then supported in a preservation environment, while maintaining authenticity and integrity information. Extraction process requires insertion of support infrastructure underneath the digital material Goal is infrastructure independence, the ability to use any commercial storage system, database, or access mechanism

Preservation Communities InterPARES - diplomatics Preservation of records NARA Preservation of records from federal agencies State archives Preservation of submitted “collections” Continuum model Preservation of active data and records

Preservation What differentiates a preservation environment from a digital library?

Digital Libraries Support the community vocabulary Discovery and browse using community relevant terms Support the community data format Maintain information on the data format of each item Support the community access services Provide services that manipulate and display the community data format

Preservation Mandates Diplomatics Authenticity Integrity NARA Infrastructure independence Scalability State archives Automation of archival processes

InterPARES - Diplomatics
Authenticity - maintain links to metadata for:
  Date record is made
  Date record is transmitted
  Date record is received
  Date record is set aside [i.e. filed]
  Name of author (person or organization issuing the record)
  Name of addressee (person or organization for whom the record is intended)
  Name of writer (entity responsible for the articulation of the record's content)
  Name of originator (electronic address from which record is sent)
  Name of recipient(s) (person or organization to whom the record is sent)
  Name of creator (entity in whose archival fonds the record exists)
  Name of action or matter (the activity for which the record is created)
  Name of documentary form (e.g. report, memo)
  Identification of digital components
  Identification of attachments (e.g. digital signature)
  Archival bond (e.g. classification code)

InterPARES - Diplomatics Integrity - maintain links to metadata for Name(s) of the handling office / officer Name of office of primary responsibility for keeping the record Annotations or comments Actions carried out on the record Technical modifications due to transformative migration Validation

Preservation Approach Provide mechanisms to: Create archival context for the content Context is preservation metadata (provenance, administrative, descriptive, structural, behavioral) Content is the submitted digital entity Assert integrity - the consistency between the context and the content Track operations done on material and update context Assert authenticity - that the material represents the original site Track the chain of custody Manage technology evolution (encoding standard, storage repository, information repository, access methods)

Data Grids What is the difference between a preservation environment and a data grid?

Data Grids Manage shared collections that are distributed in space Location of item, access controls, checksums Implement infrastructure independence Standard operations for interacting with storage repositories Implement presentation independence Standard APIs to support porting of user interfaces

Preservation Environment Digital library infrastructure that supports Preservation metadata Arrangement and description of items Access mechanisms Data grid infrastructure that supports Shared collections that are migrated forward in time Management of technology evolution Administrative metadata providing status of records

Infrastructure Independence Storage Repository Storage location User name File name File context (creation date,…) Access constraints Data Access Methods (Web Browser, DSpace, OAI-PMH) Naming conventions provided by storage systems

Data Grids Provide a Level of Indirection for Each Naming Convention Storage Repository Storage location User name File name File context (creation date,…) Access constraints Data Grid Logical resource name space Logical user name space Logical file name space Logical context (metadata) Control/consistency constraints Data Collection Data Access Methods (C library, Unix, Web Browser) Data is organized as a shared collection
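The level of indirection described on this slide can be sketched as a small catalog that maps one logical file name to the physical names used by each storage repository. This is only an illustration: the class names, resource names, and paths are hypothetical and are not part of the SRB API.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    resource: str       # logical resource name (hypothetical)
    physical_path: str  # name the storage repository itself uses


@dataclass
class LogicalNameSpace:
    # logical file name -> list of replicas
    entries: dict = field(default_factory=dict)

    def register(self, logical_name, resource, physical_path):
        """Record one more physical copy under the same logical name."""
        self.entries.setdefault(logical_name, []).append(
            Replica(resource, physical_path)
        )

    def resolve(self, logical_name):
        """Return every known physical copy for a logical name."""
        return list(self.entries.get(logical_name, []))


catalog = LogicalNameSpace()
catalog.register("/zone/collection/report.pdf", "sdsc-archive", "/hpss/u12/a81f.dat")
catalog.register("/zone/collection/report.pdf", "umd-disk", "/export/grid/0042/report")
replicas = catalog.resolve("/zone/collection/report.pdf")
```

Users and applications only ever see the logical name; moving a file to a new storage system changes a catalog entry, not the name.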

Demonstration Logical file name space Distinguished user name space Shared collection Distributed data storage Replication as a file property Digital entities

Data Grids Provide two levels of indirection: Low level API used to interact with storage repositories Standard operations for manipulating files in a storage system Standard operations for manipulating a catalog stored in a database High level API used to support user interfaces Three basic APIs - "C" library call, Unix shell commands, Java class library Other interfaces are ported on top of the basic APIs.

Storage Resource Broker 3.3 (architecture diagram labels)
Application
Access mechanisms: C Library, Java; Unix Shell; Linux I/O; C++; NT Browser, Kepler Actors; DLL / Python, Perl, Windows; HTTP, DSpace, OpenDAP, GridFTP; OAI, WSDL, (WSRF)
Federation Management
Consistency & Metadata Management / Authorization, Authentication, Audit
Logical Name Space; Latency Management; Data Transport; Metadata Transport
Database Abstraction: Databases - DB2, Oracle, Sybase, Postgres, mySQL, Informix
Storage Repository Abstraction: Archives - Tape, Sam-QFS, DMF, HPSS, ADSM, UniTree, ADS; Databases - DB2, Oracle, Sybase, Postgres, mySQL, Informix; File Systems - Unix, NT, Mac OSX; ORB

Accessing Multiple Types of Storage Systems User Application Archive at SDSC Archive at NARA Archive at U Md

Standard Data Access Operations Common set of operations for interacting with every type of storage repository User Application Remote operations Unix file system Latency management Procedures Transformations Third party transfer Filtering Queries Collective operations Replication Fault tolerance Load leveling Archive at SDSC Archive at NARA Archive at U Md

Accessing Data at Multiple Sites User Application Each site has its own naming convention for files A data grid provides a uniform way to name and access the files across the sites Archive at SDSC Archive at NARA Archive at U Md

Building a Distributed Collection Archive at SDSC Data Grid Common naming convention and set of attributes for describing digital entities User Application Logical name space Location independent identifier Persistent identifier Collection owned data Authenticity metadata Access controls Audit trails Checksums Descriptive metadata Inter-realm authentication Single sign-on system Archive at NARA Archive at U Md

SRB server SRB agent SRB server Federated Server Architecture MCAT Read Application SRB agent Logical Name Or Attribute Condition 1.Logical-to-Physical mapping 2.Identification of Replicas 3.Access & Audit Control Peer-to-peer Brokering Server(s) Spawning Data Access Parallel Data Access R1 R2 5/6

Managing Access Authenticate users independently of storage systems Preservation environment owns the data Authorize data access independently of storage system ACLs on both data and metadata Maintain audit trails of all accesses Both read and write

Collection-owned Data Store data at remote storage system under data-grid ID Access data through data grid servers Track all operations on data and update state information User authenticates to a data grid server Access controls are checked for permissions Data grid servers authenticate messages from other servers Remote server authenticates to remote storage system Multiple authentication mechanisms GSI / challenge-response / tickets
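A minimal sketch of access control and auditing handled by the data grid rather than by the storage system: every request is checked against an ACL keyed by logical name and recorded in an audit trail whether or not it succeeds. The user names, logical names, and record layout are hypothetical.

```python
from datetime import datetime, timezone

# ACL held by the data grid, keyed by logical name (hypothetical entries)
acl = {
    "/zone/collection/record1": {"alice": {"read", "write"}, "bob": {"read"}},
}
audit_trail = []


def access(user, logical_name, operation):
    """Check permission for one operation and log the attempt."""
    allowed = operation in acl.get(logical_name, {}).get(user, set())
    audit_trail.append({
        "when": datetime.now(timezone.utc).isoformat(),
        "user": user,
        "object": logical_name,
        "operation": operation,
        "allowed": allowed,
    })
    return allowed


bob_can_read = access("bob", "/zone/collection/record1", "read")
bob_can_write = access("bob", "/zone/collection/record1", "write")
```

Denied attempts are logged as well, which matches the slide's requirement to audit both reads and writes.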

Provide Context for Data Properties of files Provenance - source Descriptive attributes Structure Organize properties as metadata in a collection hierarchy Define operations on file properties Manage state information - location, replicas, containers Separate context management from content management Maintain consistency of context as operations are done on content

Database Operations Standard interface to support Schema extension - user defined attributes Snowflake table creation SQL generation Import and export of XML files Bulk metadata load and unload Operations required to manage a catalog that resides in a database

National Archives and Records Administration - Research Prototype Persistent Archive
Federation of Three Independent Data Grids: NARA, U Md, SDSC (MCAT)
  Principal copy stored at NARA with complete metadata catalog
  Replicated copy at U Md for improved access, load balancing and disaster recovery
  Deep Archive at SDSC, no user access, but complete copy
Demonstrate preservation environment
  Authenticity
  Integrity
  Management of technology evolution
  Mitigation of risk of data loss
    Replication of data
    Federation of catalogs
  Management of preservation metadata
  Scalability: EAP collection, 350,000 files, 1.2 TB in size

Preservation Requirements Maintain authenticity and integrity of electronic records Authenticity - assertion of provenance of data Integrity - assertion of invariance of bits Manage risk of data loss Media corruption / System failures / Operational errors / Natural disaster / Malicious users Manage technology obsolescence Support migration of collection to new systems Bulk data operations
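Integrity as "invariance of bits" is commonly checked by comparing a checksum recorded at ingestion with checksums recomputed from each replica. The sketch below assumes SHA-256 and in-memory byte strings; the slides do not name a checksum algorithm, and the replica names are hypothetical.

```python
import hashlib


def checksum(data: bytes) -> str:
    """Return the hex digest used as the registered checksum (SHA-256 assumed)."""
    return hashlib.sha256(data).hexdigest()


def corrupted_replicas(registered: str, replicas: dict) -> list:
    """Return names of replicas whose current checksum differs from the registered one."""
    return sorted(
        name for name, data in replicas.items() if checksum(data) != registered
    )


original = b"electronic record, version 1"
registered = checksum(original)          # stored in the catalog at ingestion

replicas = {
    "nara-principal": original,
    "umd-disk": b"electronic record, version 1?",  # simulated media corruption
    "sdsc-deep-archive": original,
}
corrupted = corrupted_replicas(registered, replicas)
```

A replica flagged this way can be repaired from any copy that still matches the registered checksum, which is one reason replication and checksums are managed together.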

Replication How many replicas are enough?

Federation Data Grid Logical resource name space Logical user name space Logical file name space Logical context (metadata) Control/consistency constraints Data Collection B Data Access Methods (Web Browser, DSpace, OAI-PMH) Data Grid Logical resource name space Logical user name space Logical file name space Logical context (metadata) Control/consistency constraints Data Collection A Access controls and consistency constraints on cross registration of digital entities

Data Grid Zones Choose how name spaces will be shared Cross register storage resources May the other data grid write to my storage? Cross register user names Users are authenticated by their home zone Cross register files Can replicate files into another data grid Cross register metadata Can build a copy of the metadata catalog

Replicated Catalog Deep Archive Partial User-ID Sharing Partial Resource Sharing No Metadata Synch Hierarchical Zone Organization One Shared User-ID System Managed Replication Connection From Any Zone Complete Resource Sharing System Set Access Controls System Controlled Complete Synch Complete User-ID Sharing System Managed Replication System Set Access Controls System Controlled Partial Synch No Resource Sharing Super Administrator Zone Control System Controlled Complete Synch No User-ID Sharing Peer-to-Peer Data Grids Replication Data Grids Hierarchical Data Grids Occasional Interchange Free Floating Resource Interaction User and Data Replica Nomadic Snow Flake Master Slave Replicated Data Federation Environments Replication Constraints Consistency Constraints Access Constraints

Examples of Extensibility Storage Repository Driver evolution Initially supported Unix file system Added archival access - UniTree, HPSS Added FTP/HTTP Added database blob access Added database table interface Added Windows file system Added project archives - Dcache, Castor, ADS Added Object Ring Buffer, Datascope Adding GridFTP version 3.3 Database management evolution Postgres DB2 Oracle Informix Sybase mySQL (most difficult port - no locks, no views, limited SQL)

Examples of Extensibility The 3 fundamental APIs are C library, shell commands, Java Other access mechanisms are ported on top of these interfaces API evolution Initial access through C library, Unix shell command Added inQ Windows browser (C++ library) Added mySRB Web browser (C library and shell commands) Added Java (Jargon) Added Perl/Python load libraries (shell command) Added WSDL (Java) Added OAI-PMH, OpenDAP, DSpace digital library (Java) Added Kepler actors for dataflow access (Java) Adding GridFTP version 3.3 (C library )

Sites Using the SRB

Research Areas Characterization of data / information / knowledge Preservation architecture Knowledge management - dynamic application of preservation policies Persistent object - characterization of digital entities

Characterizing Knowledge Data - bits that comprise a digital entity Information - a semantic label that is applied to data Knowledge - relationships between semantic labels Metadata - the combination of the semantic label and the data The creation of a semantic label is driven by the application of a process / relationship Information is the result of applying knowledge relationships Information is the reification of knowledge

Knowledge Management Reify relationships to improve access performance Easier to query on metadata than to apply the original relationships Manage state information about the reification process - support for relationship changes Support levels of granularity for application of relationships - collective properties versus procedural properties Goal is to build a scalable knowledge management system

Preservation Strategies Emulation Migrate the display application onto new operating systems Equivalent to forcing use of candlelight to look at 16th century documents Transformative migration Migrate the encoding format to the new standard Migration period is expected to be 5-10 years Persistent object Characterize the encoding format Migrate the characterization forward in time

Persistent Objects Display Applications Digital Entities Characterize standard manipulation operations Characterize encoding format - data structure

Preservation Standards OAIS - Open Archival Information System Submission Information Package (SIP) Archival Information Package (AIP) Dissemination Information Package (DIP) Producer Archive Interface Abstract Methodology Standard (CCSDS Document R-1)

Containers SRB provides support for aggregation of files into a container AIP is the aggregation of both preservation context and the records into a container What is the appropriate form for a self-describing container?

Self-instantiating Archive
"Preservation of Digital Data with Self-Validating, Self-Instantiating Knowledge-Based Archives", B. Ludäscher, R. Marciano, R. Moore, SIGMOD Record, ACM, 30(3)
An archive consists of the application of archival processes to create the collection managed in the preservation environment
Instantiation corresponds to the application of the archival processes to the original data

Example Web Crawl National Science Digital Library maintains registry of URLs for education material at Cornell Crawl sites Recursion to a depth of 10 redirections Restriction to pages within initial site plus one level outside site Store material on processing platform 70,000 URLs - 2 million digital entities, 200 GB On average 30 files per URL, Each file with average size 100 kBytes
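The crawl totals on this slide follow from the per-URL averages. A quick check, assuming decimal units (1 GB = 1,000,000 kB):

```python
urls = 70_000
files_per_url = 30        # average from the slide
avg_file_kb = 100         # average from the slide

total_files = urls * files_per_url                 # about 2 million digital entities
total_gb = total_files * avg_file_kb / 1_000_000   # about 200 GB
```

The product gives 2.1 million files and 210 GB, consistent with the rounded "2 million digital entities, 200 GB" quoted above.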

Collection Requirements Provide containers for managing small files 26 million files, average size 100 kB Aggregate data in containers before storage Support web-based access to archived data Redirect web page internal HTTP links to data grid handles Support integrity Manage checksums on files
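Redirecting a page's internal links to data grid handles can be sketched as a rewrite of every `href` that points inside the crawled site. The host name, collection path, and the simple `href="..."` pattern are illustrative assumptions; a production crawler would use a real HTML parser.

```python
import re
from urllib.parse import urlparse


def rewrite_links(html, site_host, collection):
    """Replace links into site_host with logical names under collection."""
    def to_logical(match):
        parsed = urlparse(match.group(2))
        if parsed.netloc != site_host:
            return match.group(0)          # external link: leave untouched
        return f"{match.group(1)}{collection}{parsed.path}{match.group(3)}"

    return re.sub(r'(href=")([^"]+)(")', to_logical, html)


page = (
    '<a href="http://example.org/a/b.html">in</a> '
    '<a href="http://other.net/x">out</a>'
)
rewritten = rewrite_links(page, "example.org", "/nsdl/crawl/example.org")
```

After the rewrite, the archived page resolves its internal links through the logical name space instead of the live web site.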

Accessioning Web Sites Use OAI harvesting to extract URLs from the NSDL repository Crawl each URL and process each digital entity Replace internal URLs with data grid logical names Aggregate digital entities into containers (files) for storage. Archives store files that are 40 MBytes in size. Generate archival context Register digital entity into a data grid Use collection hierarchy to associate web crawl properties with each file (date, site, initial URL, …) Write processed files into a storage system managed by a data grid Replicate data on Grid Bricks and archival storage system Provide OAI interface for reporting validation results
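The aggregation step can be illustrated with a simple sequential packer that fills a container until the next file would exceed the target size, then starts a new one. The 40 MB capacity comes from the slide; the packing strategy, function name, and file names are assumptions, not the SRB container implementation.

```python
def pack_into_containers(files, capacity):
    """Group (name, size) pairs into containers no larger than capacity bytes."""
    containers, current, used = [], [], 0
    for name, size in files:
        # Start a new container when this file would overflow the current one.
        if current and used + size > capacity:
            containers.append(current)
            current, used = [], 0
        current.append(name)
        used += size
    if current:
        containers.append(current)
    return containers


CAPACITY = 40 * 1000 * 1000                        # 40 MB per container
files = [(f"page{i}.html", 100_000) for i in range(1000)]   # 1000 files of 100 kB
containers = pack_into_containers(files, CAPACITY)
```

With 100 kB files, each 40 MB container holds 400 digital entities, so the archive stores a few large files instead of hundreds of small ones.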

Persistent Archive Collections Build collections based on date crawled For each collection, use separate folder to hold digital entities associated with the original URL Typically 30 digital entities per URL Aggregate digital entities into containers before storage Preservation metadata maintained for each digital entity Administrative, descriptive, structural, behavioral

A Few Statistics on NSDL Content
SDSC Crawl (April 03, 4 Links Deep)
Response categories: received correctly; no data received; see other; forbidden; file not found; internal server error; application error; service temp. overloaded; WIMS User Error; Gone; unused; redirection w/out location
Summary rows: total digital entities; error percentage
Values (in slide order): 1,530,206; 51; 5; 311; 38,386; 946; 15; 8; 1; 1,569,932; 2.53%
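The reported error percentage can be reproduced from two of the figures, assuming the first value (1,530,206) is the "received correctly" count and 1,569,932 is the "total digital entities" count; that pairing is inferred from the order of the labels and values on the slide.

```python
received_correctly = 1_530_206   # assumed to pair with "received correctly"
total_entities = 1_569_932       # assumed to pair with "total digital entities"

errors = total_entities - received_correctly
error_pct = 100 * errors / total_entities
error_pct_rounded = round(error_pct, 2)
```

Under that assumption the result rounds to 2.53, matching the percentage on the slide.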

Encoding Formats Present in Archive CSS - Cascading Style Sheet ASP - Microsoft Active Server Page

Automated Processes: Categorizing the “Space” of all Descriptive Patterns Data-driven validation of descriptive metadata from NARA Archival Information Locator records Exhaustive examination of every metadata occurrence Automatic creation of an open-source relational database implementation Accumulation of all descriptive patterns Based on deriving “Descriptive Signatures” relying on regular expressions Creation of a Perl-based Validation Regular Expression Tool Refined regular expression to identify anomalies in the legacy metadata Annotated artifacts introduced by archival processes

A String Analysis Approach
Accumulate all occurrence strings at each level of description in the hierarchy, and derive a regular expression that characterizes all instances:
  Record Group OR Collection (total of ~550)
  Series
  File Unit
  Item OR ItemAV (audio-visual)
  ----
  Physical Occurrence
  Media Occurrence
  Object

A String Analysis Approach
Example - structural characterization:
At the Series level, possible patterns are (S=Series, I=Item, O=Object, F=FileUnit):
  SIOSIOOOOOO SIO SFFFF SIIIIII SIOIOOOOO SFIIII
An inferred regular expression is:
  S( F*(I+O+)* | I+ )*
Relational tables are derived from these regular expressions for each of the 9 levels
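The inferred Series-level expression can be checked against the observed patterns with Python's `re` module. The expression is taken from the slide with its spaces removed. The first pattern as extracted, "SIOSIOOOOOO", contains two S symbols and may be two patterns fused together, so it is left out here. The string "SOI" is a made-up counterexample.

```python
import re

# S( F*(I+O+)* | I+ )*  with whitespace removed
SERIES_PATTERN = re.compile(r"S(F*(I+O+)*|I+)*")

observed = ["SIO", "SFFFF", "SIIIIII", "SIOIOOOOO", "SFIIII"]
matches = {s: bool(SERIES_PATTERN.fullmatch(s)) for s in observed}

# An Object appearing before any Item is not covered by the expression.
anomaly_matches = bool(SERIES_PATTERN.fullmatch("SOI"))
```

A string that fails `fullmatch` is a candidate anomaly: a structure that the bulk of the Series records never exhibits.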

Metadata Validation Analyze each regular expression to identify the classes of anomalies Cases in which a subset of the objects have a unique characterization different from the majority of the objects Identify cases with incorrect metadata tags Identify cases with missing metadata or missing objects Identify changes in metadata definitions
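One way to surface "a subset of the objects with a unique characterization different from the majority" is to count how often each descriptive signature occurs and flag records whose signature is rare. This is a sketch only: the record identifiers, the 10% threshold, and the misspelled tag are hypothetical, and the original tool was Perl-based.

```python
from collections import Counter


def find_anomalies(signatures, threshold=0.1):
    """Return record ids whose signature is shared by fewer than threshold of records."""
    counts = Counter(signatures.values())
    total = len(signatures)
    rare = {sig for sig, n in counts.items() if n / total < threshold}
    return sorted(rid for rid, sig in signatures.items() if sig in rare)


# 19 records share one tag sequence; one record carries a misspelled tag.
signatures = {f"rec-{i:02d}": "TiMtldDate" for i in range(19)}
signatures["rec-19"] = "TiMtdlDate"

anomalies = find_anomalies(signatures)
```

Flagged records still need review by an archivist, since a rare signature can be a legitimate variant rather than an error.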

Regular Expressions
COLLECTION (2 characterizations):
  = "TiMtldColid(XcXs)*(Date)?(Ab)?(Tcsd(Tcsdq)?Tced(Tcedq)?)?(Tisd(Tisdq)?Tied(Tiedq)?)?"
  = "(Odonor)?(Pdonor)*(Daut(Ndad)?)?(FatFan)?Dcgsd"
RECORD GROUP (2 characterizations):
  = "TiMtldGrno(Date)?(Tcsd)?(Tcsdq)?(Tced)?(Tcedq)?Tisd(Tisdq)?Tied(Tiedq)?"
  = "(FatFan)*Dcgsd"
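As an illustration of how such a characterization can validate a record, the first COLLECTION expression can be applied to the concatenated tag sequence of a record. The expression below is the one from the slide with line-wrap spaces removed (an assumption about the original text), and the sample tag sequences are invented.

```python
import re

COLLECTION_1 = re.compile(
    r"TiMtldColid(XcXs)*(Date)?(Ab)?"
    r"(Tcsd(Tcsdq)?Tced(Tcedq)?)?"
    r"(Tisd(Tisdq)?Tied(Tiedq)?)?"
)

samples = {
    "TiMtldColidXcXsXcXsDateAb": None,  # optional fields present
    "TiMtldColidTcsdTced": None,        # Tcsd paired with Tced
    "TiColid": None,                    # missing Mtld
    "TiMtldColidTcsd": None,            # Tcsd without the matching Tced
}
for tags in samples:
    samples[tags] = bool(COLLECTION_1.fullmatch(tags))
```

The last two samples fail because the expression requires Mtld after Ti and requires Tced whenever Tcsd appears.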

Regular Expressions
SERIES (5 characterizations):
  = "Ti(Altti)*MtldS(Grno)?(Formerrg)*(Colid)?"
  = "(Acnum)*(Arra)?(Chn)?(Date)?(Funcu)?(Gen)*(Numb)?(Scale)?(Ab)?(Tran)?(Itn)?(Staff)?(Rctno)*(Dano)*(XcXs)+"
  = "((Tcsd)?(Tcsdq)?Tced(Tcedq)?)?(Tisd(Tisdq)?Tied(Tiedq)?)?"
  = "(Grt)*(Srt)*(OcontrOcontrtp)*(Orefer)*(Tgn)*(Lan)*(PcontrPcontrtp)*(Prefer)*(Subj)*((Ars)?(Sar)*(Arsn)?)"
  = "(Urrs)?(Surr)?(Urrn)?((Daut)?(Ndad)?)*(Fat(Fan)?)*(MpiMpt(Mpn)?)*(Taed)?(Tst)?(CrorgCrorgtp)?(CrindCrindtp)*(Dcgsd)"
--> SarSar: Decision to combine them

Regular Expressions
ITEM (4 characterizations):
  = "Ti(Altti)*Mtld(Grno)?(Formerrg)?(Colid)?(Acnum)?(Arra)?(Date)?(Gen)*(Ab)?(Staff)?(XcXs)+"
  = "((Tcsd)*(Tcsdq)?(Tced)*(Tcedq)?)?"
  = "(Tpd(Tpdq)?)*(Grt)+(Srt)*(OcontrOcontrtp)*(Orefer)*(Tgn)*(PcontrPcontrtp)*(Prefer)*(Subj)*(Ars)?(Sar)*(Arsn)?"
  = "Urrs(Surr)?(Urrn)?((Daut)?Ndad)*(MpiMpt(Mpn)?)*(CrorgCrorgtp)?Dcgsd"
--> SarSar: Decision to combine them
--> Tcsd/Tced:

Regular Expression
FILE UNIT (4 characterizations):
  = "Ti(Altti)?Mtld(Grno)?(Formerrg)?(Colid)?(Acnum)?(Arra)?(Gen)*(Ab)?(XcXs)+"
  = "((Tcsd)*(Tcsdq)?(Tced)*(Tcedq)?)?(Tisd(Tisdq)?Tied(Tiedq)?)?"
  = "(Grt)+(Srt)*(OcontrOcontrtp)?(Orefer)*(Tgn)*(PcontrPcontrtp)?(Prefer)*(Subj)*(Ars)?(Sar)*(Arsn)?"
  = "Urrs(Surr)?(Urrn)?((Daut)?Ndad)?(MpiMpt(Mpn)?)?(CrorgCrorgtp)?Dcgsd"
--> SarSar: Decision to combine them
--> Tcsd/Tced:

Lessons Learned Data-driven analysis of actual preservation metadata can be used to implement a new catalog on new technology Variant of self-instantiating archive, in which the preservation structure and catalogs are re-created

Preservation Archival processes through which a digital entity is extracted from its creation environment and migrated to a preservation environment, while maintaining authenticity and integrity information. Extraction process requires insertion of support infrastructure underneath the digital material, characterization of the authenticity and integrity, characterization of the digital encoding format, and characterization of the display operations Goal is infrastructure independence, the ability to use any commercial storage system, database, or access mechanism

For More Information Reagan W. Moore San Diego Supercomputer Center