US GPO AIP Independence Test CS 496A – Senior Design Team members: Antonio Castillo, Johnny Ng, Aram Weintraub, Tin-Shuk Wong Faculty advisor: Dr. Russ.

Presentation transcript:

US GPO AIP Independence Test
CS 496A – Senior Design
Team members: Antonio Castillo, Johnny Ng, Aram Weintraub, Tin-Shuk Wong
Faculty advisor: Dr. Russ Abbott
GPO contact: Kate Zwaard

Overview
• Background
  - US GPO
  - FDsys
  - Project objectives
  - A note on deliverables
• File formats (AIP)
  - METS, MODS, and PREMIS
• Hardware interface
• XML parsing
• Solution strategy
• Repositories
• Testing
• Conclusion

US GPO
• The United States Government Printing Office (GPO) is in charge of producing and archiving documents for every branch of the federal government.
• “The U.S. Government Printing Office (GPO) provides publishing & dissemination services for the official & authentic government publications to Congress, Federal agencies, Federal depository libraries, & the American public.” (

FDsys
• GPO is developing the Federal Digital System, a new content management system (CMS) designed to manage all of its digital data.
• “The U.S. Government Printing Office’s (GPO) Future Digital System (FDsys) will ingest, authenticate, preserve and provide access to digital content from all three branches of the U.S. Government. FDsys, which is in public beta testing, is intended to preserve digital content free from dependence on specific hardware or software.” (project description)

Project Objectives
• “The objective of this project is to test whether the AIPs in FDsys are truly independent of the surrounding content management system. The CSULA team aims to either confirm or reject the claim that, with help from resources commonly available to the digital curation community, an interested party could fully reconstruct the archive using only the content data.”

Project Objectives
• “GPO will supply a set of content data from its archival storage. This data will include content files, metadata files (in XML according to the standards referenced above), and METS binding files (in XML) that describe how all of the objects are related. The CSULA team will inspect the information and, using the METS standard, determine whether the information in XML is sufficient for a user to make sense of the data and ingest it to another repository. Because the data is stored in arbitrary folders, scripts would have to be written to assemble the content packages from the locations specified in the METS file.”

Project Objectives
• This project simulates FDsys breaking down due to some catastrophic attack or error.
• We are attempting to categorize and reconstruct a sample of data from FDsys outside the context of the actual CMS. The only references we have available, other than the actual files in the archive, are publicly defined standards.
• It is our hope that this project will help GPO improve the robustness of its file system.

A Note on Deliverables
• This is not a typical computer science design project because our aim is not to design software. Instead, we will be conducting scripted tests on real data and forming conclusions based on the results.
• Deliverables will most likely include:
  - a written report of our findings and recommendations
  - a reorganized version of the input data

AIP
• Archival Information Package: defines how digital objects and their associated metadata are packaged using XML-based files.
• METS (binding file)
• MODS
• PREMIS

METS
• Schema
• XML file format
• Seven major sections

METS Schema
• Five of the seven major sections:
  1) METS Header
  2) Descriptive Metadata
  3) Administrative Metadata
  4) File Section
  5) Structural Map
• The remaining two sections are Structural Links and Behavior.
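To make the section list concrete, here is a minimal Java sketch that reports which top-level METS sections appear in a binding file. The element names (`metsHdr`, `dmdSec`, `amdSec`, `fileSec`, `structMap`, `structLink`, `behaviorSec`) and the namespace URI come from the public METS schema. The class name `MetsSectionLister` and its method are hypothetical, introduced only for illustration, and we have not yet checked this against the GPO sample data.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper: lists the METS sections found directly under the root element.
class MetsSectionLister {
    static final String METS_NS = "http://www.loc.gov/METS/";

    static List<String> topLevelSections(String metsXml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true); // needed to read namespace URIs and local names
        Document doc = factory.newDocumentBuilder()
                .parse(new InputSource(new StringReader(metsXml)));

        List<String> sections = new ArrayList<>();
        NodeList children = doc.getDocumentElement().getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            // Keep only element children that belong to the METS namespace.
            if (child.getNodeType() == Node.ELEMENT_NODE
                    && METS_NS.equals(child.getNamespaceURI())) {
                sections.add(child.getLocalName());
            }
        }
        return sections;
    }
}
```

A file that contains a header, descriptive metadata, a file section, and a structural map would yield those four local names in document order.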

MODS
• A MODS file will be used to encode descriptive metadata.
• A MODS file can be used as an extension schema to METS.
• MODS consists of top-level elements that are mandatory, recommended, or optional.

MODS

PREMIS
• A PREMIS file will be used to encode preservation metadata.
• Preservation metadata consists of the following:
  - Provenance
  - Authenticity
  - Preservation activity
  - Technical environment
  - Rights management

PREMIS
• The PREMIS data model includes the following:
  - Intellectual Entity
  - Object Entity
  - Event Entity
  - Agent Entity
  - Rights Entity*
• Object, Event, and Agent Entities are described using mandatory and optional elements.

PREMIS

Hardware Interface
• PC
• External hard drive

XML Parsing
• As described above, all metadata is stored as XML files, so reading XML programmatically is integral to the project.
• We plan to use the Java programming language for our scripting needs.
  - Java API for XML Processing (JAXP): the standard Java library for handling XML
  - It provides several different representations of XML
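As an illustration of the JAXP approach, the sketch below parses an XML string into a DOM tree, one of the representations JAXP offers (SAX and StAX are the streaming alternatives). The class name `JaxpDomSketch` and the tiny sample document are made up for this example. Real AIP files would be read from disk instead.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

// Hypothetical example class: parses XML text into an in-memory DOM tree.
class JaxpDomSketch {
    static Document parse(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        // METS, MODS, and PREMIS all rely on XML namespaces.
        factory.setNamespaceAware(true);
        DocumentBuilder builder = factory.newDocumentBuilder();
        return builder.parse(
                new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
    }

    public static void main(String[] args) throws Exception {
        // Made-up sample input, not taken from the GPO data.
        String xml = "<mets xmlns=\"http://www.loc.gov/METS/\"><metsHdr/></mets>";
        Document doc = parse(xml);
        // Prints the local name of the root element.
        System.out.println(doc.getDocumentElement().getLocalName());
    }
}
```

DOM keeps the whole document in memory, which is convenient for small metadata files. For very large files a streaming parser may be the better choice.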

Solution Strategy
• The data submitted to us are AIPs, not SIPs (Submission Information Packages).
• Repository software cannot ingest AIPs, only SIPs.
• We must write scripts that parse the AIPs, construct SIPs from the arbitrary file structure, and then ingest those SIPs with repository software to create new AIPs.
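A first step in such a script could be collecting the file locations recorded in the METS file section. The sketch below assumes each `file` element carries an `ID` attribute and an `FLocat` child with an `xlink:href` attribute, as the METS schema allows. Whether the GPO AIPs record locations this way is an assumption we still need to verify. `MetsFileLocator` is a hypothetical name used only here.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper: maps each METS file ID to the location given in its FLocat.
class MetsFileLocator {
    static final String METS_NS = "http://www.loc.gov/METS/";
    static final String XLINK_NS = "http://www.w3.org/1999/xlink";

    static Map<String, String> fileLocations(String metsXml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        Document doc = factory.newDocumentBuilder()
                .parse(new InputSource(new StringReader(metsXml)));

        Map<String, String> locations = new LinkedHashMap<>(); // keeps document order
        NodeList flocats = doc.getElementsByTagNameNS(METS_NS, "FLocat");
        for (int i = 0; i < flocats.getLength(); i++) {
            Element flocat = (Element) flocats.item(i);
            // Assumption: the parent of FLocat is the <file> element carrying the ID.
            Element file = (Element) flocat.getParentNode();
            String href = flocat.getAttributeNS(XLINK_NS, "href");
            if (!href.isEmpty()) {
                locations.put(file.getAttribute("ID"), href);
            }
        }
        return locations;
    }
}
```

With the ID-to-location map in hand, a script could copy each referenced file out of its arbitrary folder into a package directory before attempting ingest.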

Repositories
• We have also looked into third-party repository software to help parse and organize data.
  - DSpace, Fedora Commons, EPrints
• Unfortunately, so far none of them seem ideal for the task.

Testing
• After parsing and organizing the data, it will be important to perform checks to ensure that the reconstruction is accurate. We may send a preliminary report to GPO for verification.
• The exact testing procedure is still undefined, as we haven’t had a chance to investigate the data in depth yet. Our goals should be clearer once we understand exactly what type of data we are dealing with.
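One check we might run, offered only as a possibility since the procedure is not yet defined, is comparing each content file against a recorded checksum. The METS schema lets a `file` element carry `CHECKSUM` and `CHECKSUMTYPE` attributes. Whether the GPO data populates them is an assumption. The class `ChecksumSketch` and its methods are hypothetical names for this sketch.

```java
import java.security.MessageDigest;

// Hypothetical helper: computes a digest and compares it to a recorded value.
class ChecksumSketch {
    // Returns the lowercase hexadecimal digest of the data, e.g. for "MD5" or "SHA-256".
    static String hexDigest(byte[] data, String algorithm) throws Exception {
        MessageDigest digest = MessageDigest.getInstance(algorithm);
        StringBuilder hex = new StringBuilder();
        for (byte b : digest.digest(data)) {
            hex.append(String.format("%02x", b & 0xff));
        }
        return hex.toString();
    }

    // True when the computed digest equals the expected value, ignoring letter case.
    static boolean matches(byte[] data, String algorithm, String expectedHex)
            throws Exception {
        return hexDigest(data, algorithm).equalsIgnoreCase(expectedHex.trim());
    }
}
```

A mismatch would suggest that a file was damaged or that the wrong file was matched to a METS entry during reconstruction.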

Conclusion
• Our thanks to Kate, Dr. Abbott, and Dr. Pamula for their support.