METS, MODS and PREMIS, Oh My! (and a little MIX and other schema too)


Integrating Digital Library Standards for Interoperability and Preservation
Thomas Habing, thabing@uiuc.edu
Grainger Engineering Library Information Center
University of Illinois at Urbana-Champaign
ALA Summer 2007

Presentation Outline
- Brief Background on our Project
- Hub and Spoke METS Profile
  - MODS for descriptive metadata
  - PREMIS for technical and provenance metadata
  - MIX (plus some others) for media-specific technical metadata
- Technical Implementation in Java
- Future Plans

Quick Project Background
NDIIPP ECHODEP (Exploring Collaborations to Harness Objects in a Digital Environment for Preservation), http://ndiipp.uiuc.edu/
- Repository Evaluation
- Tools development
  - Web harvesting and archiving (OCLC’s WAW)
  - ** Hub and Spoke interoperability and preservation architecture **
- Preservation Research: preserving the authenticity and semantic meaning of digital resources through time

Hub and Spoke
Repository Interoperability Architecture with a forward-looking emphasis on preservation metadata and activities

The Problem
- Plethora of repositories
  - Not just across institutions, but even within a single institution
- Overabundance of data sources
  - Web crawlers like Heritrix or OCLC's WAW, digitization and scanning services, individual authors, batch ingest from legacy systems
- Current integration solutions are local and ad hoc
- Enforcing centralized preservation policy is difficult

H&S for Interoperability
------------------------
There are currently many different digital repositories in widespread use, such as DSpace, Greenstone, Fedora, EPrints, and CONTENTdm, as well as OCLC's and CDL's digital archive services, to name just a few. There are also many different sources of input into these systems, such as web crawlers like Heritrix or OCLC's Web Archivist Workbench, or numerous digitization and scanning services. It is also not uncommon for several of these systems to be in use within a single institution, and if multiple institutions within a consortium want to share data, it is very likely that several of these systems will come into play. In short, the problem being addressed by the H&S is that almost none of these systems can interoperate beyond a rudimentary level, usually nothing more than an OAI data provider supplying simple Dublin Core metadata. Even if they have embraced some of the OAIS concepts (which few have), such as submission or dissemination information packages (SIPs and DIPs), their implementations of these concepts vary greatly. For example, a DIP produced by DSpace has nothing in common with a SIP which could be used by EPrints. Because of these problems, achieving any level of interoperability between these systems usually entails some level of custom software development, and any time a new repository is thrown into the mix some of that software development will need to be redone.

The H&S interoperability system seeks to address these issues with the development of a common METS-based profile, a standard programming API, and a series of scripts that use that API and METS profile for creating SIPs and DIPs which can be used across different repositories.

H&S for Preservation
--------------------
This problem is based on the same premise as the interoperability statement. There are many different repositories. Few of these repositories have any explicit support for preservation, either preservation metadata, such as PREMIS, or activities to support preservation, such as format migrations or checksum validations. For an institution with several of these systems deployed, which is common, something as simple as performing consistent backups to off-line storage like tape can be complicated by the fact that these systems all store the underlying data differently. There may be data stored in relational databases, XML databases, RDF triple stores, and different file systems, all of which must be backed up, usually using different backup techniques. It can be this complex even for a single system, much less for a combination of such systems.

The H&S preservation system will leverage the work done on interoperability. The METS profile will be treated as a common archival information package (AIP) with support for the PREMIS preservation metadata schema, and the programming API will be enhanced to support common preservation activities, such as generation and validation of technical and provenance metadata and format migrations, among others. The system will also include a mechanism for persisting AIPs, most likely using the JSR-170 standard for content repositories. This will provide a common preservation data store for all digital packages regardless of which other repository they might have been created in or which other repository is used for search and browse by end users. In other words, the H&S repository will be used for preservation activities which can benefit greatly from a common system, but other activities related to the digital objects, such as creation or access, can still occur in whatever repository is best suited for that activity.

A Solution
- A common METS-based profile
- A standard programming API
- A series of scripts that use the API and METS profile for creating Information Packages which can be ‘used’ across different repositories


To-Hub Spoke
From: Data Store / DIPs (image.jpg, metadata.xml)
To: Hub
- Extract format-specific technical metadata
- Generate/collect digital provenance metadata
- Embed links to digital items
- Model structure of the item
- Embed native metadata
- Transform/enrich native metadata

From-Hub Spoke
From: Hub (hubMets.xml)
To: SIPs (metadata.xml, hubMets.xml)
- Generate provenance metadata
- Transform hub metadata to repository-compatible metadata
- Assemble into packages for repository ingest
- Add the METS file as an item in the submission package

METS Profile
- The METS Profile is the ‘Hub’
- Two Registered Profiles
  - http://www.loc.gov/standards/mets/profiles/00000015.xml
  - http://www.loc.gov/standards/mets/profiles/00000016.xml
  - Also http://dli.grainger.uiuc.edu/echodep/METS/
- May be overlaid on top of, or inherited from, other profiles
- Primary Focus of Profiles
  - Digital preservation
  - Repository interoperability: minimally at the technical and descriptive metadata level, not at the structural level or file format level
  - Web captures
- Focus on preservation, not access: agnostic regarding file formats or structures

METS Profile in More Detail
- Descriptive Metadata
  - Primary DMD is MODS
  - Alternate DMD are encouraged
  - Provenance for DMD is required
- Technical Metadata
  - PREMIS object entities
  - MIX for images
  - Other metadata for other media types
- Digital Provenance
  - PREMIS events and agents

Simple Object Example
- http://gita.grainger.uiuc.edu/metsviz/grapher.htm
- http://dli.grainger.uiuc.edu/echodep/METS/junit/p1a1.xml

Descriptive Metadata for the Entire Package
- MODS as the primary descriptive metadata
  - The Aquifer MODS profile is used as the minimal requirement (see presentation by Sarah Shreeves)
- Other descriptive metadata schemas should be preserved as alternative dmdSec elements
- Transformations of descriptive metadata must be recorded in digiprovMD sections using PREMIS event and agent elements
- Individual files may have their own dmdSec elements; these are considered outside the scope of our profile. However, we encourage the use of relatedItem elements in the primary MODS for this purpose.

Technical Metadata for Files
- A techMD section wrapping a PREMIS object element is required for each file or bit-stream
- Minimal required elements: fixity, size, formatDesignation
- creatingApplication and software are encouraged, especially for MIME types starting with ‘application/…’
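To make the minimal requirement concrete, here is a small Java sketch that derives size and fixity for a file's bytes and writes them, with a caller-supplied format name, into a PREMIS-like fragment. The class and method names are hypothetical and are not part of the Hub and Spoke API; the fragment omits namespaces and identifiers, and its element names follow the PREMIS semantic units named on the slide rather than the profile's exact markup.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper for illustration only; not part of the Hub and Spoke API.
class PremisObjectSketch {

    /** Hex-encoded message digest of the bytes, e.g. with algorithm "SHA-1" or "MD5". */
    static String digestHex(byte[] data, String algorithm) {
        try {
            MessageDigest md = MessageDigest.getInstance(algorithm);
            StringBuilder hex = new StringBuilder();
            for (byte b : md.digest(data)) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalArgumentException("Unknown digest algorithm: " + algorithm, e);
        }
    }

    /**
     * Builds an un-namespaced, PREMIS-like fragment with the three minimal values:
     * fixity, size and format designation. Assumption: formatName is supplied by
     * the caller (a tool such as JHOVE would normally determine it) and contains
     * no XML special characters.
     */
    static String objectCharacteristics(byte[] data, String formatName) {
        return "<objectCharacteristics>\n"
             + "  <compositionLevel>0</compositionLevel>\n"
             + "  <fixity>\n"
             + "    <messageDigestAlgorithm>SHA-1</messageDigestAlgorithm>\n"
             + "    <messageDigest>" + digestHex(data, "SHA-1") + "</messageDigest>\n"
             + "  </fixity>\n"
             + "  <size>" + data.length + "</size>\n"
             + "  <format>\n"
             + "    <formatDesignation>\n"
             + "      <formatName>" + formatName + "</formatName>\n"
             + "    </formatDesignation>\n"
             + "  </format>\n"
             + "</objectCharacteristics>";
    }

    public static void main(String[] args) {
        byte[] sample = "abc".getBytes(StandardCharsets.UTF_8);
        System.out.println(objectCharacteristics(sample, "text/plain"));
    }
}
```

In a real spoke the bytes would come from the file being packaged, and the fragment would sit inside a PREMIS object wrapped in a techMD section.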

Technical Metadata for Files
- Alternative technical metadata schemas for different media types are encouraged:
  - MIX for images: http://www.loc.gov/standards/mix/mix.xsd
  - textMD for text: http://dlib.nyu.edu/METS/textmd.xsd
  - AUDIOMD for audio: http://lcweb2.loc.gov/mets/Schemas/AMD.xsd
  - VIDEOMD for video: http://lcweb2.loc.gov/mets/Schemas/VMD.xsd
- Where possible we are using JHOVE to derive all of these; the profile also allows raw JHOVE output to be used in techMD (http://hul.harvard.edu/jhove/)
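The pairing of media types and schemas listed above can be sketched as a simple lookup keyed on the MIME type prefix. The class and method names are hypothetical, and matching on the top-level MIME type is an assumption made for illustration; the slide does not say how the project's code selects a schema.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Optional;

// Hypothetical lookup for illustration only; not part of the Hub and Spoke API.
class TechMdSchemaSketch {

    // MIME type prefix -> media-specific schema location, as listed on the slide.
    private static final Map<String, String> SCHEMA_BY_PREFIX = new LinkedHashMap<>();
    static {
        SCHEMA_BY_PREFIX.put("image/", "http://www.loc.gov/standards/mix/mix.xsd");
        SCHEMA_BY_PREFIX.put("text/", "http://dlib.nyu.edu/METS/textmd.xsd");
        SCHEMA_BY_PREFIX.put("audio/", "http://lcweb2.loc.gov/mets/Schemas/AMD.xsd");
        SCHEMA_BY_PREFIX.put("video/", "http://lcweb2.loc.gov/mets/Schemas/VMD.xsd");
    }

    /** Returns the media-specific schema for a MIME type, or empty if none is listed. */
    static Optional<String> schemaFor(String mimeType) {
        if (mimeType == null) {
            return Optional.empty();
        }
        String type = mimeType.trim().toLowerCase(Locale.ROOT);
        for (Map.Entry<String, String> entry : SCHEMA_BY_PREFIX.entrySet()) {
            if (type.startsWith(entry.getKey())) {
                return Optional.of(entry.getValue());
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        System.out.println(schemaFor("image/tiff").orElse("no media-specific schema"));
        System.out.println(schemaFor("application/pdf").orElse("no media-specific schema"));
    }
}
```

Types with no listed schema, such as ‘application/…’ types, return an empty result here; for those the PREMIS object element would still carry the core technical metadata.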

Technical Metadata for Representations
- Technical metadata can also be associated with representations
- There is a special required techMD called the ‘primary representation’ that corresponds to the entire METS file.
  - Used mostly for alternate identifiers for the file, but may also be used to record other technical metadata about the whole METS document
- Each structural map may also have representation technical metadata.

Digital Provenance
- Recorded for all non-trivial changes to:
  - Descriptive Metadata (must): Creation, Transformation, Modification, Deletion
  - Files and Bitstreams (should): Events from PREMIS data dictionary
  - Structural Maps (may)
- PREMIS event and optional associated agents are wrapped in a digiprovMD
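As an illustration of wrapping a PREMIS event in a digiprovMD, the following Java sketch builds the nesting with the standard DOM API. The class and method names, the PREMIS 1.x namespace URI, the MDTYPE value, and the identifier scheme are all assumptions made for this sketch; the registered profiles remain the authority on the exact markup.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Hypothetical builder for illustration only; not part of the Hub and Spoke API.
class DigiprovSketch {
    static final String METS_NS = "http://www.loc.gov/METS/";
    // Assumption: PREMIS 1.x namespace; later PREMIS versions use different URIs.
    static final String PREMIS_NS = "http://www.loc.gov/standards/premis/v1";

    static Document newDocument() {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            return factory.newDocumentBuilder().newDocument();
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Appends a namespaced child element, with optional text content, to parent. */
    static Element child(Element parent, String ns, String qname, String text) {
        Element e = parent.getOwnerDocument().createElementNS(ns, qname);
        if (text != null) {
            e.setTextContent(text);
        }
        parent.appendChild(e);
        return e;
    }

    /** Builds digiprovMD > mdWrap > xmlData > premis:event for a single event. */
    static Element digiprovForEvent(Document doc, String id, String eventType,
                                    String dateTime, String detail) {
        Element digiprov = doc.createElementNS(METS_NS, "mets:digiprovMD");
        digiprov.setAttribute("ID", id);
        Element wrap = child(digiprov, METS_NS, "mets:mdWrap", null);
        wrap.setAttribute("MDTYPE", "PREMIS:EVENT"); // assumption: MDTYPE value
        Element xmlData = child(wrap, METS_NS, "mets:xmlData", null);
        Element event = child(xmlData, PREMIS_NS, "premis:event", null);
        Element eventId = child(event, PREMIS_NS, "premis:eventIdentifier", null);
        child(eventId, PREMIS_NS, "premis:eventIdentifierType", "LOCAL");
        child(eventId, PREMIS_NS, "premis:eventIdentifierValue", id + "-event");
        child(event, PREMIS_NS, "premis:eventType", eventType);
        child(event, PREMIS_NS, "premis:eventDateTime", dateTime);
        child(event, PREMIS_NS, "premis:eventDetail", detail);
        return digiprov;
    }

    public static void main(String[] args) throws Exception {
        Document doc = newDocument();
        Element section = digiprovForEvent(doc, "DP1", "transformation",
                "2007-06-21T09:00:00", "Native descriptive metadata transformed to MODS");
        Transformer t = TransformerFactory.newInstance().newTransformer();
        t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        t.setOutputProperty(OutputKeys.INDENT, "yes");
        t.transform(new DOMSource(section), new StreamResult(System.out));
    }
}
```

An associated agent would be added in the same way, either inside the same digiprovMD or in its own section.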

Using PREMIS in METS
- All linking via ID & IDREF-type attributes, not identifier elements
- Embedding
  - Object in techMD
  - Event in digiprovMD
  - Rights in rightsMD
  - Agent in digiprovMD or rightsMD
- All Files at a Composition level of 0
  - No packaging, compression, or encryption
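The ID and IDREF-style linking described above can be followed with a few lines of DOM code. This is a sketch under stated assumptions: the sample document is deliberately trimmed and not schema-valid, the class and method names are hypothetical, and because no schema is loaded the IDs are indexed by hand rather than through getElementById.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical resolver for illustration only; not part of the Hub and Spoke API.
class IdRefLinkSketch {

    // Trimmed sample: a file whose ADMID points at a techMD and a digiprovMD.
    static final String SAMPLE =
          "<mets xmlns='http://www.loc.gov/METS/'>"
        + "<amdSec><techMD ID='TMD1'/><digiprovMD ID='DP1'/></amdSec>"
        + "<fileSec><fileGrp><file ID='F1' ADMID='TMD1 DP1'/></fileGrp></fileSec>"
        + "</mets>";

    static Document parse(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            return factory.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse METS sample", e);
        }
    }

    /** Indexes every element that carries an ID attribute by that ID value. */
    static Map<String, Element> indexById(Document doc) {
        Map<String, Element> byId = new HashMap<>();
        NodeList all = doc.getElementsByTagName("*");
        for (int i = 0; i < all.getLength(); i++) {
            Element e = (Element) all.item(i);
            if (e.hasAttribute("ID")) {
                byId.put(e.getAttribute("ID"), e);
            }
        }
        return byId;
    }

    /** Resolves the whitespace-separated ADMID references of a file element. */
    static List<Element> resolveAdmid(Element file, Map<String, Element> byId) {
        List<Element> targets = new ArrayList<>();
        String admid = file.getAttribute("ADMID").trim();
        if (admid.isEmpty()) {
            return targets;
        }
        for (String ref : admid.split("\\s+")) {
            Element target = byId.get(ref);
            if (target != null) {
                targets.add(target);
            }
        }
        return targets;
    }

    public static void main(String[] args) {
        Map<String, Element> byId = indexById(parse(SAMPLE));
        for (Element target : resolveAdmid(byId.get("F1"), byId)) {
            System.out.println(target.getLocalName() + " " + target.getAttribute("ID"));
        }
    }
}
```

Linking through attributes keeps the relationships at the METS level, so a consumer can traverse them without parsing identifier elements inside the embedded PREMIS.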

Profile for Web Captures
- Inherits almost everything from base profile
- Adds rules for the primary structural map
- Adds rules for referencing ARC files and their constituents from the fileSec
  - ARC is used by Internet Archive, Heritrix web crawler, and OCLC’s WAW
  - http://www.archive.org/web/researcher/ArcFileFormat.php

Challenges in Developing the Profile
- How to deal with overlaps between the various schemas
  - Properties that occur in multiple places: METS attributes, PREMIS elements, MODS elements, MIX elements
  - Differences in how to tie sections together: ID and IDREFS, embedded identifiers, or nested XML elements
- Which METS sections should embed the various PREMIS entities

Java Implementation
- Partially complete and in progress
- Base-level API, plus support for DSpace and, to a lesser degree, Fedora
- Open source
  - Javadocs: http://dli.grainger.uiuc.edu/echodep/HnS/JavaDocs/
  - Source code: http://sourceforge.net/projects/echodep

Technical Architecture (Java)

Future Plans
- Add support for other repositories such as CONTENTdm and EPrints
- Develop additional sub-profiles
- Transformations/adaptations to/from other METS profiles
- Continue to improve the documentation and program code

Questions?