Published by Arnold Henry. Modified over 9 years ago.
1
HathiTrust: A Shared Digital Repository
Use of PREMIS for Internet Archive AIPs
September 22, 2010
2
Overview
University of Michigan and University of California worked together to develop ingest processes for Internet Archive (IA) content
IA materials did not match previously developed standards for HathiTrust materials
Solutions were developed to transform IA materials into HathiTrust-compatible AIPs
Discuss our use of PREMIS events to document processes and transformations
3
HathiTrust Overview
Launched in 2008 by CIC and University of California system libraries to archive and share digital collections
Partnership is open to institutions worldwide
Currently:
  Nearly 30 partners
  6.6 million digital volumes
  1.3 million public domain
  247 terabytes
4
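Event identifier: Internet Archive capture1
Event type: capture
Event date/time: 2008-08-04T19:50:13
Event detail: Initial capture of item
Linking agent: AgentID Internet Archive, role Executor
Linking agent: tool scribe7.la.archive.org, role image capture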
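The fields on this slide read like the contents of a PREMIS event element with its tags stripped. As a rough illustration, the sketch below serializes the same values back into XML. It assumes PREMIS 2.x element names and namespace, and the helper `build_event` is hypothetical; it is not HathiTrust's ingest code.

```python
import xml.etree.ElementTree as ET

PREMIS_NS = "info:lc/xmlschema/premis-v2"  # assumption: PREMIS 2.x namespace
ET.register_namespace("PREMIS", PREMIS_NS)


def _sub(parent, name, text=None):
    """Append a namespaced child element and optionally set its text."""
    el = ET.SubElement(parent, f"{{{PREMIS_NS}}}{name}")
    if text is not None:
        el.text = text
    return el


def build_event(id_type, id_value, event_type, date_time, detail,
                agents, outcome=None):
    """Build one PREMIS event (hypothetical helper).

    agents is a list of (identifier type, identifier value, role) tuples.
    """
    event = ET.Element(f"{{{PREMIS_NS}}}event")
    ident = _sub(event, "eventIdentifier")
    _sub(ident, "eventIdentifierType", id_type)
    _sub(ident, "eventIdentifierValue", id_value)
    _sub(event, "eventType", event_type)
    _sub(event, "eventDateTime", date_time)
    _sub(event, "eventDetail", detail)
    if outcome is not None:
        info = _sub(event, "eventOutcomeInformation")
        _sub(info, "eventOutcome", outcome)
    for a_type, a_value, a_role in agents:
        agent = _sub(event, "linkingAgentIdentifier")
        _sub(agent, "linkingAgentIdentifierType", a_type)
        _sub(agent, "linkingAgentIdentifierValue", a_value)
        _sub(agent, "linkingAgentRole", a_role)
    return event


# Values copied from the capture event on this slide.
capture = build_event(
    "Internet Archive", "capture1", "capture", "2008-08-04T19:50:13",
    "Initial capture of item",
    [("AgentID", "Internet Archive", "Executor"),
     ("tool", "scribe7.la.archive.org", "image capture")],
)
print(ET.tostring(capture, encoding="unicode"))
```

Passing the agents as a list keeps the helper usable for events that link a second tool, as some later slides do.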
5
Event identifier: UM fixity check1
Event type: fixity check
Event date/time: 2010-04-27T16:34:02
Event detail: Calculation of md5 hash values for downloaded IA files, comparison with pre-download md5 values
Event outcome: warning
Event outcome detail: files failed checksum validation
  arcanacaelestiah03swed_files.xml
  arcanacaelestiah03swed_meta.xml
  ….
6
…
Linking agent: AgentID UM, role Executor
Linking agent: tool md5sum, role software
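The fixity check described above compares MD5 values computed after download with values known before download. A minimal Python sketch of that comparison follows. The function names and the shape of the `expected` mapping are assumptions for illustration; the slides name md5sum as the tool actually used.

```python
import hashlib
from pathlib import Path


def md5_of(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def fixity_check(directory, expected):
    """Compare files in directory against expected {name: md5 hex} values.

    Returns (outcome, failed). The outcome strings "pass" and "warning"
    mirror the outcomes shown on the slides; treating a missing file as a
    failure is an assumption of this sketch.
    """
    failed = []
    for name, want in sorted(expected.items()):
        path = Path(directory) / name
        if not path.is_file() or md5_of(path) != want.lower():
            failed.append(name)
    return ("warning" if failed else "pass"), failed
```

The list of failed names returned here corresponds to the file names recorded in the event outcome detail.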
7
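Event identifier: UM package inspection1
Event type: package inspection
Event date/time: 2010-04-27T16:34:01
Event detail: Inspection of IA download package for missing files
Event outcome: pass
Linking agent: AgentID UM, role Executor
Linking agent: tool ingest_ia_volumes.pl, role software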
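A package inspection of this kind can be sketched as a check that each expected file name is present in the download directory. The suffix list below is illustrative only: the two suffixes are taken from the file names on the fixity-check slide, the full set required by ingest_ia_volumes.pl is not given in the slides, and the "fail" outcome string is an assumption.

```python
from pathlib import Path

# Assumption: illustrative suffixes only, not the real required set.
REQUIRED_SUFFIXES = ("_files.xml", "_meta.xml")


def inspect_package(directory, ia_id, required_suffixes=REQUIRED_SUFFIXES):
    """Return (outcome, missing) for a downloaded package directory.

    ia_id is the item identifier used as the file name prefix.
    """
    present = {p.name for p in Path(directory).iterdir() if p.is_file()}
    missing = [ia_id + s for s in required_suffixes if ia_id + s not in present]
    return ("pass" if not missing else "fail"), missing
```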
8
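Event identifier: UM mod1_image_header
Event type: image header modification
Event date/time: 2010-04-27T16:34:29
Event detail: Image header modification to HathiTrust conventions
Linking agent: AgentID UM, role Executor
Linking agent: tool ingest_ia_volumes.pl, role software
…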
9
Linking agent: tool exiftool, role software
10
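Event identifier: UM mod2_file_rename
Event type: file rename
Event date/time: 2010-04-27T16:34:03
Event detail: File renaming to HathiTrust conventions
Linking agent: AgentID UM, role Executor
Linking agent: tool ingest_ia_volumes.pl, role software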
11
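Event identifier: UM mod3_ocr_split
Event type: ocr split
Event date/time: 2010-04-27T16:34:05
Event detail: Splitting of IA XML OCR into one plain text OCR file and one XML file (with coordinates) per page
Linking agent: AgentID UM, role Executor
Linking agent: tool ingest_ia_volumes.pl, role software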
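To make the OCR split concrete, here is a small Python sketch that turns one volume-level OCR document into a plain-text string and an XML fragment per page. It assumes a DjVu-style layout in which each page is an `OBJECT` element containing `LINE` and `WORD` elements, with coordinates in a `coords` attribute. The slides do not describe the input format, so the element names and the sample data are assumptions.

```python
import xml.etree.ElementTree as ET


def split_ocr(xml_text):
    """Return a list of (plain_text, page_xml) pairs, one per page element."""
    root = ET.fromstring(xml_text)
    pages = []
    for page in root.iter("OBJECT"):  # assumption: one OBJECT per page
        lines = []
        for line in page.iter("LINE"):
            words = [w.text or "" for w in line.iter("WORD")]
            lines.append(" ".join(words))
        # The XML fragment keeps the coords attributes; the text drops them.
        pages.append(("\n".join(lines), ET.tostring(page, encoding="unicode")))
    return pages


# Hypothetical two-page sample, not taken from a real IA file.
sample = """<DjVuXML><BODY>
<OBJECT><LINE><WORD coords="1,2,3,4">Arcana</WORD><WORD coords="5,2,9,4">Caelestia</WORD></LINE></OBJECT>
<OBJECT><LINE><WORD coords="1,2,3,4">Page</WORD><WORD coords="5,2,9,4">two</WORD></LINE></OBJECT>
</BODY></DjVuXML>"""

pages = split_ocr(sample)
for number, (text, page_xml) in enumerate(pages, start=1):
    print(number, repr(text), len(page_xml))
```

Writing each pair to disk under HathiTrust's file naming conventions is left out, since the slides do not show those conventions.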
12
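Event identifier: UM mod4_ia_mets_creation
Event type: ia mets creation
Event date/time: 2010-04-27T16:34:30
Event detail: Creation of IA METS file
Linking agent: AgentID UM, role Executor
Linking agent: tool ingest_ia_volumes.pl, role software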
13
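Event identifier: UM message digest calculation1
Event type: message digest calculation
Event date/time: 2010-04-27T16:34:30
Event detail: Calculation of page-level md5 checksums
Linking agent: AgentID UM, role Executor
Linking agent: tool md5sum, role software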
14
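Event identifier: UM validation1
Event type: validation
Event date/time: 2010-04-27T16:34:30
Event detail: IA METS validation
Linking agent: AgentID UM, role Executor
Linking agent: tool Xerces-C, role software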
15
identifier: uc2.ark:/13960/t2p55qw6d
1
file count: 1584
page count: 528
16
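Event identifier: UM transformation1
Event type: transformation
Event date/time: 2010-04-27T16:34:30
Event detail: Transformation of files for ingest: mod1-mod4 in IA METS
Linking agent: AgentID UM, role Executor
Linking agent: tool ingest_ia_volumes.pl, role software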
17
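Event identifier: UM page feature mapping1
Event type: page feature mapping
Event date/time: 2010-04-27T16:35:48
Event detail: Map original page feature tags to HathiTrust
Linking agent: AgentID UM, role Executor
Linking agent: tool GROOVE, role software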
18
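Event identifier: UM fixity check1
Event type: fixity check
Event date/time: 2010-04-27T16:34:39
Event detail: Validation of page-level md5 checksums
Event outcome: pass
Linking agent: AgentID UM, role Executor
Linking agent: tool md5sum, role software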
19
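Event identifier: UM ingestion1
Event type: ingestion
Event date/time: 2010-04-27T16:35:48
Event detail: Ingestion of object package into repository
Linking agent: AgentID UM, role Executor
Linking agent: tool GROOVE, role software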
20
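Event identifier: UM validation1
Event type: validation
Event date/time: 2010-04-27T16:35:18
Event detail: Validation of object components
Linking agent: AgentID UM, role Executor
Linking agent: tool GROOVE, role software
Linking agent: tool jhove1.5, role software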