HathiTrust: A Shared Digital Repository. Use of PREMIS for Internet Archive AIPs. September 22, 2010.

Presentation transcript:

Overview
University of Michigan and University of California worked together to develop ingest processes for Internet Archive content.
IA materials did not match previously developed standards for HathiTrust materials.
Solutions were developed to transform IA materials into HathiTrust-compatible AIPs.
This presentation discusses our use of PREMIS events to document processes and transformations.

HathiTrust Overview
Launched in 2008 by CIC and University of California system libraries to archive and share digital collections.
Partnership is open to institutions worldwide.
Currently: nearly 30 partners; 6.6 million digital volumes; 1.3 million public domain; 247 terabytes.

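Event identifier: capture1 (type: Internet Archive)
Event type: capture
Date/time: T19:50:13
Detail: Initial capture of item
Linking agent: Internet Archive (type: AgentID, role: Executor)
Linking agent: scribe7.la.archive.org (type: tool, role: image capture)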
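
The fields on this slide correspond to a PREMIS event record. As a minimal sketch of how such a record could be serialized, the Python below builds one event using PREMIS 2.x element names. The namespace URI, the `PREMIS` prefix, and the helper names `q` and `build_event` are assumptions made for illustration; the transcript does not show the original XML, and the date portion of the timestamp does not survive in it, so the value is passed through unchanged.

```python
import xml.etree.ElementTree as ET

# Assumption: PREMIS 2.x namespace; the slides do not state the schema version.
PREMIS_NS = "info:lc/xmlschema/premis-v2"
ET.register_namespace("PREMIS", PREMIS_NS)


def q(name):
    """Return a namespace-qualified tag name (hypothetical helper)."""
    return f"{{{PREMIS_NS}}}{name}"


def build_event(id_type, id_value, event_type, date_time, detail, agents):
    """Build one PREMIS event element; agents is a list of (type, value, role)."""
    event = ET.Element(q("event"))
    ident = ET.SubElement(event, q("eventIdentifier"))
    ET.SubElement(ident, q("eventIdentifierType")).text = id_type
    ET.SubElement(ident, q("eventIdentifierValue")).text = id_value
    ET.SubElement(event, q("eventType")).text = event_type
    ET.SubElement(event, q("eventDateTime")).text = date_time
    ET.SubElement(event, q("eventDetail")).text = detail
    for agent_type, agent_value, role in agents:
        agent = ET.SubElement(event, q("linkingAgentIdentifier"))
        ET.SubElement(agent, q("linkingAgentIdentifierType")).text = agent_type
        ET.SubElement(agent, q("linkingAgentIdentifierValue")).text = agent_value
        ET.SubElement(agent, q("linkingAgentRole")).text = role
    return event


# Values taken from the slide; the timestamp is kept exactly as transcribed.
capture_event = build_event(
    "Internet Archive", "capture1", "capture", "T19:50:13",
    "Initial capture of item",
    [("AgentID", "Internet Archive", "Executor"),
     ("tool", "scribe7.la.archive.org", "image capture")],
)
xml_text = ET.tostring(capture_event, encoding="unicode")
print(xml_text)
```

The same helper would cover the UM events that follow, since they differ only in field values and in the optional outcome information, which this sketch omits.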

Event identifier: fixity check1 (type: UM)
Event type: fixity check
Date/time: T16:34:02
Detail: Calculation of md5 hash values for downloaded IA files, comparison with pre-download md5 values
Outcome: warning
Outcome detail: files failed checksum validation
  arcanacaelestiah03swed_files.xml
  arcanacaelestiah03swed_meta.xml
  …
Linking agent: UM (type: AgentID, role: Executor)
Linking agent: md5sum (type: tool, role: software)
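
The fixity check described here compares md5 values computed after download with values recorded before download. A rough sketch of that comparison follows. The function names, the result dictionary, and the choice to report a missing file as a failure are assumptions for illustration; the slides name `md5sum` as the tool actually used, not this code.

```python
import hashlib
import tempfile
from pathlib import Path


def md5_of(path, chunk_size=1 << 20):
    """Compute the md5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def fixity_check(directory, expected):
    """Compare files in directory with expected {filename: md5 hex} values.

    Assumption: any mismatch or missing file yields a "warning" outcome,
    mirroring the outcome wording on the slide.
    """
    failed = []
    for name, recorded in sorted(expected.items()):
        path = Path(directory) / name
        if not path.is_file() or md5_of(path) != recorded.lower():
            failed.append(name)
    if failed:
        return {"outcome": "warning",
                "detail": "files failed checksum validation",
                "files": failed}
    return {"outcome": "pass", "detail": "", "files": []}


# Small demonstration with throwaway files (hypothetical names and contents).
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "a.xml").write_bytes(b"unchanged")
    (Path(tmp) / "b.xml").write_bytes(b"changed after download")
    recorded = {
        "a.xml": hashlib.md5(b"unchanged").hexdigest(),
        "b.xml": hashlib.md5(b"original").hexdigest(),
    }
    result = fixity_check(tmp, recorded)
print(result)
```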

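Event identifier: package inspection1 (type: UM)
Event type: package inspection
Date/time: T16:34:01
Detail: Inspection of IA download package for missing files
Outcome: pass
Linking agent: UM (type: AgentID, role: Executor)
Linking agent: ingest_ia_volumes.pl (type: tool, role: software)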

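Event identifier: mod1_image_header (type: UM)
Event type: image header modification
Date/time: T16:34:29
Detail: Image header modification to HathiTrust conventions
Linking agent: UM (type: AgentID, role: Executor)
Linking agent: ingest_ia_volumes.pl (type: tool, role: software)
…
Linking agent: exiftool (type: tool, role: software)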

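Event identifier: mod2_file_rename (type: UM)
Event type: file rename
Date/time: T16:34:03
Detail: File renaming to HathiTrust conventions
Linking agent: UM (type: AgentID, role: Executor)
Linking agent: ingest_ia_volumes.pl (type: tool, role: software)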

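Event identifier: mod3_ocr_split (type: UM)
Event type: ocr split
Date/time: T16:34:05
Detail: Splitting of IA XML OCR into one plain text OCR file and one XML file (with coordinates) per page
Linking agent: UM (type: AgentID, role: Executor)
Linking agent: ingest_ia_volumes.pl (type: tool, role: software)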
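
To illustrate the idea of the OCR split, the sketch below takes a single volume-level XML document and produces, for each page, one plain-text string and one XML fragment that keeps the coordinates. The input layout (`book`, `page`, `word`, and a `coords` attribute), the sample text, and the function name `split_ocr` are hypothetical; the slides do not show the structure of the IA OCR file or how `ingest_ia_volumes.pl` performs the split.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified stand-in for a volume-level OCR file.
SAMPLE_OCR = """<book>
  <page id="1"><word coords="10,20,50,40">first</word><word coords="60,20,90,40">page</word></page>
  <page id="2"><word coords="10,20,70,40">second</word><word coords="80,20,110,40">page</word></page>
</book>"""


def split_ocr(xml_text):
    """Return a list of (plain_text, page_xml) tuples, one per page."""
    root = ET.fromstring(xml_text)
    pages = []
    for page in root.iter("page"):
        # Plain-text OCR: the words of the page joined by spaces.
        plain = " ".join(word.text or "" for word in page.iter("word"))
        # Coordinate OCR: the page element serialized on its own.
        page_xml = ET.tostring(page, encoding="unicode").strip()
        pages.append((plain, page_xml))
    return pages


pages = split_ocr(SAMPLE_OCR)
for number, (plain, page_xml) in enumerate(pages, start=1):
    print(number, plain)
```

In a real workflow each tuple would be written to two files per page; file naming is left out here because the transcript does not spell out the HathiTrust naming conventions.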

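Event identifier: mod4_ia_mets_creation (type: UM)
Event type: ia mets creation
Date/time: T16:34:30
Detail: Creation of IA METS file
Linking agent: UM (type: AgentID, role: Executor)
Linking agent: ingest_ia_volumes.pl (type: tool, role: software)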

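Event identifier: message digest calculation1 (type: UM)
Event type: message digest calculation
Date/time: T16:34:30
Detail: Calculation of page-level md5 checksums
Linking agent: UM (type: AgentID, role: Executor)
Linking agent: md5sum (type: tool, role: software)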

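Event identifier: validation1 (type: UM)
Event type: validation
Date/time: T16:34:30
Detail: IA METS validation
Linking agent: UM (type: AgentID, role: Executor)
Linking agent: Xerces-C (type: tool, role: software)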

identifier uc2.ark:/13960/t2p55qw6d 1 file count 1584 page count 528

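Event identifier: transformation1 (type: UM)
Event type: transformation
Date/time: T16:34:30
Detail: Transformation of files for ingest: mod1-mod4 in IA METS
Linking agent: UM (type: AgentID, role: Executor)
Linking agent: ingest_ia_volumes.pl (type: tool, role: software)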

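Event identifier: page feature mapping1 (type: UM)
Event type: page feature mapping
Date/time: T16:35:48
Detail: Map original page feature tags to HathiTrust
Linking agent: UM (type: AgentID, role: Executor)
Linking agent: GROOVE (type: tool, role: software)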

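Event identifier: fixity check1 (type: UM)
Event type: fixity check
Date/time: T16:34:39
Detail: Validation of page-level md5 checksums
Outcome: pass
Linking agent: UM (type: AgentID, role: Executor)
Linking agent: md5sum (type: tool, role: software)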

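Event identifier: ingestion1 (type: UM)
Event type: ingestion
Date/time: T16:35:48
Detail: Ingestion of object package into repository
Linking agent: UM (type: AgentID, role: Executor)
Linking agent: GROOVE (type: tool, role: software)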

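Event identifier: validation1 (type: UM)
Event type: validation
Date/time: T16:35:18
Detail: Validation of object components
Linking agent: UM (type: AgentID, role: Executor)
Linking agent: GROOVE (type: tool, role: software)
Linking agent: jhove1.5 (type: tool, role: software)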
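
The slides do not present the UM events in time order. A small sketch of how the recorded times could be used to reconstruct the processing sequence is given below. It assumes that all UM events happened on the same day, since only the time portion of each timestamp survives in the transcript; the list and variable names are illustrative.

```python
from datetime import datetime

# (time, event identifier) pairs for the UM events, in slide order.
UM_EVENTS = [
    ("T16:34:02", "fixity check1"),
    ("T16:34:01", "package inspection1"),
    ("T16:34:29", "mod1_image_header"),
    ("T16:34:03", "mod2_file_rename"),
    ("T16:34:05", "mod3_ocr_split"),
    ("T16:34:30", "mod4_ia_mets_creation"),
    ("T16:34:30", "message digest calculation1"),
    ("T16:34:30", "validation1"),
    ("T16:34:30", "transformation1"),
    ("T16:35:48", "page feature mapping1"),
    ("T16:34:39", "fixity check1"),
    ("T16:35:48", "ingestion1"),
    ("T16:35:18", "validation1"),
]


def parse_time(stamp):
    """Parse a time-only stamp such as 'T16:34:01' (date part is absent)."""
    return datetime.strptime(stamp, "T%H:%M:%S")


# sorted() is stable, so events sharing a timestamp keep their slide order.
timeline = sorted(UM_EVENTS, key=lambda item: parse_time(item[0]))
elapsed = (parse_time(timeline[-1][0]) - parse_time(timeline[0][0])).total_seconds()

for stamp, identifier in timeline:
    print(stamp, identifier)
print("elapsed seconds:", elapsed)
```

Sorted this way, package inspection comes first and ingestion last, with the four mod events and the IA METS work falling in between.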