Bringing Order to Chaos: Preparation and Organization for Long-Term Access University of Alabama Libraries Jody L. DeRidder Image courtesy of Life Magazine
Software Changes... File Formats change... Question: What would it take to reconstruct YOUR digital library in another software system, from scratch? Athley, Jake “Understanding the Digital Asset Life Cycle.” Widen Enterprises. ONLINE IS NOT ENOUGH!!...and sometimes, we run out of money!
Where's the TIFF? No tiff?? Reference to archival file missing in OAI exports Not valid XML!
Again, Where's the TIFF? Page-level metadata AND reference to archival file missing in CONTENTdm XML exports ALSO. … Tab-Delimited text export is your only hope of reconstruction. No tiff??
Identification…?? 32 different file naming schemes, each with anomalies that did not fit the collection’s own pattern 10 possible fields in which to find an identifier: Many metadata files had NO identifiers or ones which did NOT match the filename Sometimes CONTENTdm changed the archival filename on upload…
File storage is a lot like a basement closet... Image courtesy of Teemo, Master of Clowning Image courtesy of Life Magazine What happens when it's time to move???
Bringing Order to Chaos 1) Identification 2) Consistency 3) Organization 4) Documentation University of Alabama Libraries Holder ID: u0003 Collection ID: Item ID: Sequence ID: 0005 Archival File: u0003_ _ _0005.tif
u0003_ _ is the first digitized item in the MSS 1980 collection HOLDER ID COLLECTION ID
(Unambiguous) Identification u0003_ _ _0004.tif …depends on US!!! (not the software) Tuscaloosa Service Men's Center Scrapbook, MSS 1604, William Stanley Hoole Special Collections Library, University of Alabama.
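A minimal sketch of how a filename built this way could be split back into its identifiers. The segment widths are not given on the slides, so the pattern below accepts any run of digits, and the sample filename is hypothetical rather than a real University of Alabama identifier.

```python
import re
from typing import NamedTuple, Optional

# Assumed pattern: holder_collection_item[_sequence].ext
# Digit counts are an assumption; the slides do not state them.
FILENAME_RE = re.compile(
    r"^(?P<holder>[a-z]\d+)_(?P<collection>\d+)_(?P<item>\d+)"
    r"(?:_(?P<sequence>\d+))?\.(?P<ext>[A-Za-z0-9]+)$"
)


class FileId(NamedTuple):
    holder: str
    collection: str
    item: str
    sequence: Optional[str]  # None for item-level files
    ext: str


def parse_filename(name: str) -> Optional[FileId]:
    """Return the identifier segments, or None if the name breaks the pattern."""
    match = FILENAME_RE.match(name)
    if match is None:
        return None
    return FileId(**match.groupdict())


if __name__ == "__main__":
    # Hypothetical filename, for illustration only.
    print(parse_filename("u0003_0000001_0000002_0005.tif"))
    print(parse_filename("scan 12 final.tif"))
```

A name that returns None is one of the anomalies worth fixing before upload, while it is still cheap to fix.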
Consistency : Hugh Davis Farm Journals Voyages dans l’Amerique Septentrionale Jesse Griffin Letter, 1813 September Nehemiah Denton papers, F.H. Petrie Letters, : George S Smith Diary Confederate Imprints Sheet music S. R. Norton Letters, : S. D. Cabaniss Papers Joe Wheeler Josiah and Amelia Gorgas Family Papers : Roland Harper Railroad Timetables Central Iron and Coal Daphne Cunningham Diary Eugene Allen Smith
collection linking
CONSISTENCY! In merging collections, you discover all the different metadata variations you have… Item Identifier Filename Identifier Title Other Title Cover Title First Line of Text First Line of Chorus Masthead Title Series Title Special Issue Title from Plate Subject(s) Description Biographic and Historical Note Scope and Content Transcript URL Provenance Funding Information Abstract Creator(s) Arranger(s) Author(s) Composer(s) Conductor(s) Diariest(s) Etcher(s) Instrumentalist(s) Interviewee(s) Lyricist(s) Photographer(s) Sender(s) Vocalist(s) Work(s) Publisher Digital Publisher Donor(s) Funder(s) Contributor(s) Editor(s) Interviewer(s) Performer(s) Recipient(s) Date(s) Date of Photograph Performance Date Date ISO Type(s) Genre(s) Format Album Number Bibliographic Citation Box Number Call Number Collection Number Container Number Folder Number Plate Number Photograph Number Source Language(s) Relation Published In Digital Collection Repository Repository Collections Is Referenced By Mode of Access Coverage Location Performance Location Place of Publication Recipient Location Sender Location States Served Rights Terms Audience Sorting Number Staff Notes Transcript Object File Name
Item Identifier:item:TEXT:SMALL:BLANK:BLANK:NOSEARCH:HIDE:NOVOCAB:BLANK
Filename:filena:TEXT:SMALL:BLANK:BLANK:SEARCH:NOHIDE:VOCAB:identi
Identifier:identi:TEXT:SMALL:BLANK:BLANK:SEARCH:HIDE:VOCAB:identi
Title:title:TEXT:SMALL:BLANK:BLANK:SEARCH:NOHIDE:NOVOCAB:title
Other Title:other:TEXT:SMALL:BLANK:BLANK:SEARCH:NOHIDE:NOVOCAB:titlea
Cover Title:cover:TEXT:SMALL:BLANK:BLANK:SEARCH:NOHIDE:NOVOCAB:titlea
First Line of Text:first:TEXT:SMALL:BLANK:BLANK:SEARCH:NOHIDE:NOVOCAB:titlea
First Line of Chorus:firsta:TEXT:SMALL:BLANK:BLANK:SEARCH:NOHIDE:NOVOCAB:titlea
Masthead Title:masthe:TEXT:SMALL:BLANK:BLANK:SEARCH:NOHIDE:NOVOCAB:titlea
Series Title:series:TEXT:SMALL:BLANK:BLANK:SEARCH:NOHIDE:NOVOCAB:titlea
Collection directory in /contentdbs: image, supp, index (index contains etc/config.txt)
Configure it once... Then copy the config file to the other directories.
cp coll1/index/etc/config.txt coll2/index/etc/config.txt
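With many collections, the copy step can be scripted. The sketch below assumes the layout shown on the slide ({collection}/index/etc/config.txt under a common root); the function name and the choice to skip collections that lack an etc directory are illustrative assumptions.

```python
import shutil
from pathlib import Path
from typing import List


def copy_config(contentdbs: Path, source_coll: str, target_colls: List[str]) -> List[Path]:
    """Copy one collection's config.txt into other collection directories.

    Collections without an existing index/etc directory are skipped rather
    than created (an assumption: the directories are made by the software).
    """
    src = contentdbs / source_coll / "index" / "etc" / "config.txt"
    written = []
    for coll in target_colls:
        etc_dir = contentdbs / coll / "index" / "etc"
        if not etc_dir.is_dir():
            continue
        dest = etc_dir / "config.txt"
        shutil.copy2(src, dest)  # copy2 also preserves timestamps
        written.append(dest)
    return written


if __name__ == "__main__":
    # Hypothetical root and collection names.
    for path in copy_config(Path("/contentdbs"), "coll1", ["coll2", "coll3"]):
        print("updated", path)
```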
Capturing ALL the metadata on EVERY level for preservation
<mods>
6th grade class picture
Ebsco Industries
Funder Funder
Still Image
Photographs
early 1900s
1 photograph : gelatin developing-out paper, black and white ; 5 x 7 in. on mount 5 x 7 in.
Jeff Coleman with his 6th grade classmates at Seth Mellew elementary school
The digitization of this collection was funded by a gift from EBSCO Industries.
u0001_ _
United States--Alabama--Sumter County--Livingston
Coleman, Jefferson Jackson
Seth Mellew Elementary School
Archivists Utility translates spreadsheet rows to MODS XML
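The spreadsheet-to-MODS step might look roughly like the sketch below. This is not the Archivists Utility itself; the column names (Title, Identifier, Description) and the mapping to MODS elements are assumptions chosen for illustration, and the identifier in the sample row is hypothetical.

```python
import csv
import io
import xml.etree.ElementTree as ET
from typing import Dict, List

MODS_NS = "http://www.loc.gov/mods/v3"
ET.register_namespace("mods", MODS_NS)


def q(tag: str) -> str:
    """Qualify a tag name with the MODS namespace."""
    return f"{{{MODS_NS}}}{tag}"


def row_to_mods(row: Dict[str, str]) -> ET.Element:
    """Build one MODS record from one spreadsheet row (assumed column names)."""
    mods = ET.Element(q("mods"))
    if row.get("Title"):
        title_info = ET.SubElement(mods, q("titleInfo"))
        ET.SubElement(title_info, q("title")).text = row["Title"]
    if row.get("Identifier"):
        ET.SubElement(mods, q("identifier"), {"type": "local"}).text = row["Identifier"]
    if row.get("Description"):
        ET.SubElement(mods, q("abstract")).text = row["Description"]
    return mods


def spreadsheet_to_mods(tsv_text: str) -> List[ET.Element]:
    """Translate every row of a tab-delimited export into a MODS element."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    return [row_to_mods(row) for row in reader]


if __name__ == "__main__":
    sample = (
        "Title\tIdentifier\tDescription\n"
        "6th grade class picture\texample_0001\tClass photograph\n"
    )
    for record in spreadsheet_to_mods(sample):
        print(ET.tostring(record, encoding="unicode"))
```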
Organization starts with the working area! Before… And after!
A Collection Folder in the Working Area
Collection folders are named for the collection identifier.
Allowed subfolders include: Admin, Metadata, Scans, Transcripts.
Compound objects have their own subfolders for pages, named for the item.
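A rule this simple can be checked by a script before anything leaves the working area. A minimal sketch, assuming the four subfolder names above are the complete allowed set and that only the top level of the collection folder is checked:

```python
from pathlib import Path
from typing import List

# Taken from the slide; treating it as the complete list is an assumption.
ALLOWED_SUBFOLDERS = {"Admin", "Metadata", "Scans", "Transcripts"}


def check_collection_folder(folder: Path) -> List[str]:
    """Return the names of top-level subfolders that are not on the allowed list."""
    return sorted(
        entry.name
        for entry in folder.iterdir()
        if entry.is_dir() and entry.name not in ALLOWED_SUBFOLDERS
    )


if __name__ == "__main__":
    # Hypothetical working-area path.
    unexpected = check_collection_folder(Path("working/u0003_0000001"))
    print("unexpected subfolders:", unexpected or "none")
```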
Consistency and organization are cost-saving... and they let you AUTOMATE your work.
An Example of the Lowest-Cost Model: The Alabama Digital Preservation Network Lots of Copies Keeps Stuff Safe!!
Simple, Clear Hierarchical Organization: Holder ID, Collection ID, Item ID, Sequence ID
Identification, Organization and Consistency Each segment of numbers (Holder ID, Collection ID, Item ID, Sequence ID) is used in the directory structure.
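Because the filename carries every identifier, the storage location can be computed rather than chosen by hand. The sketch below assumes one directory level each for holder, collection and item, with the file stored inside the item directory; the root path and the sample filename are hypothetical.

```python
from pathlib import PurePosixPath


def storage_path(filename: str, root: str = "/archive") -> PurePosixPath:
    """Derive the storage location of a file from the segments of its name."""
    stem = filename.rsplit(".", 1)[0]
    segments = stem.split("_")
    if len(segments) < 3 or not all(segments):
        raise ValueError(f"not a holder_collection_item name: {filename!r}")
    holder, collection, item = segments[:3]
    # Assumed layout: root/holder/collection/item/filename
    return PurePosixPath(root, holder, collection, item, filename)


if __name__ == "__main__":
    print(storage_path("u0003_0000001_0000002_0005.tif"))
```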
Automated file storage and creation of LOCKSS Manifests: … a VERY good thing! Organization and Consistency Pay Off
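Manifest creation is the kind of task that falls out of consistent organization. The sketch below builds a bare HTML manifest page from a list of URLs. The page layout is an assumption, and the permission sentence should be checked against current LOCKSS documentation before use; this is not the script used for the Alabama Digital Preservation Network.

```python
from html import escape
from typing import List

# Permission wording as commonly published by LOCKSS; verify before relying on it.
PERMISSION = (
    "LOCKSS system has permission to collect, preserve, and serve this Archival Unit"
)


def build_manifest(title: str, urls: List[str]) -> str:
    """Return a minimal HTML manifest page linking to each URL in an archival unit."""
    links = "\n".join(
        f'    <li><a href="{escape(url, quote=True)}">{escape(url)}</a></li>'
        for url in urls
    )
    return (
        "<html>\n"
        f"<head><title>{escape(title)}</title></head>\n"
        "<body>\n"
        f"  <h1>{escape(title)}</h1>\n"
        "  <ul>\n"
        f"{links}\n"
        "  </ul>\n"
        f"  <p>{PERMISSION}</p>\n"
        "</body>\n"
        "</html>\n"
    )


if __name__ == "__main__":
    # Hypothetical URL for illustration.
    print(build_manifest("Example collection", ["http://example.org/archive/u0003/"]))
```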
DOCUMENTATION
Documentation is a wonderful thing… it helps your digital content survive … well into the future.
How do you know if your file has been altered? Can you verify that this is the unchanged original? (it’s not that hard) Tuscaloosa Service Men's Center Scrapbook, MSS 1604, William Stanley Hoole Special Collections Library, University of Alabama.
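One common way to answer this question is a checksum recorded when the file is created and recomputed later. The slides do not name an algorithm; SHA-256 is assumed here, and the function names are illustrative.

```python
import hashlib
from pathlib import Path
from typing import Union


def file_checksum(path: Union[str, Path], algorithm: str = "sha256") -> str:
    """Compute a hex digest of a file, reading it in chunks to limit memory use."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(1024 * 1024), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_unchanged(path: Union[str, Path], recorded: str, algorithm: str = "sha256") -> bool:
    """True if the file still matches the checksum recorded earlier."""
    return file_checksum(path, algorithm) == recorded.strip().lower()


if __name__ == "__main__":
    import sys

    # Usage: python checksum.py FILE RECORDED_DIGEST
    print("unchanged" if is_unchanged(sys.argv[1], sys.argv[2]) else "ALTERED")
```

Storing the recorded digest alongside the file, at the same level of the hierarchy, keeps the check possible even without the original software.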
Get a CONTENTdm Standard XML Export
California Digital Library 7Train Software
CDL METS Descriptive Metadata is in the dmdSec
California Digital Library 7train on CONTENTdm Standard XML Export… NO Item-level information beyond the title… but LOOK! You get the OCR!
File System LIVE Links …for web delivery NOT intended for preservation. What good is this in 50 years??
Matching it all up!! Identification is a wonderful thing.
/contentdbs/{coll}/index/description/desc.all
/contentdbs/{coll}/supp/{dmrecord number}
/contentdbs/{coll}/image/
Where’s my JPEG? Where’s my metadata? (then look up the parent dmrecord number in desc.all)
Holder ID: u Collection ID: Item ID: Sequence ID: Sub-Page: File: u0003_ _ _0002_004.tif Metadata and Documentation stored at the applicable level METS documents how files relate to one another in a hierarchical structure… which we already have!!!
Dropping the Technical Metadata in… where it belongs Makes METS creation a Piece of Cake! (and redundant!)
Output → XML Output →
MIX: Metadata for Images in XML
AudioMD: Audio Technical Metadata
Don’t forget to add the namespace at the top! xmlns:mix= xmlns:audioMD="
METS has 5 sections:
Descriptive Metadata section: dmdSec
Administrative Metadata section: amdSec
File Group section: fileSec
Structural Map: structMap
Behavior: behaviorSec
So where does this technical information GO??
Put it here! (in the amdSec)
Refer to it here! (in the ADMID attribute of the file element)
<mets:file ID="FID1" MIMETYPE="image/tiff" SEQ="1" CREATED=" T00:00:00" ADMID=" MIX1" GROUPID="GID1">
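A minimal sketch of that relationship: the MIX block is wrapped in a techMD inside the amdSec, and the file element points back to it through ADMID. The MIX namespace version (v20) is an assumption, the MIX element is an empty placeholder, and the result is a skeleton rather than a complete, validated METS document.

```python
import xml.etree.ElementTree as ET

METS_NS = "http://www.loc.gov/METS/"
MIX_NS = "http://www.loc.gov/mix/v20"  # assumption: MIX 2.0; match your tool's output
ET.register_namespace("mets", METS_NS)
ET.register_namespace("mix", MIX_NS)


def m(tag: str) -> str:
    """Qualify a tag name with the METS namespace."""
    return f"{{{METS_NS}}}{tag}"


def build_mets(mix_element: ET.Element, file_id: str = "FID1", tech_id: str = "MIX1") -> ET.Element:
    """Build a skeleton METS document linking one file to its technical metadata."""
    mets = ET.Element(m("mets"))

    # Put it here: technical metadata lives in amdSec/techMD.
    amd_sec = ET.SubElement(mets, m("amdSec"))
    tech_md = ET.SubElement(amd_sec, m("techMD"), {"ID": tech_id})
    md_wrap = ET.SubElement(tech_md, m("mdWrap"), {"MDTYPE": "NISOIMG"})
    ET.SubElement(md_wrap, m("xmlData")).append(mix_element)

    # Refer to it here: the file's ADMID matches the techMD ID.
    file_sec = ET.SubElement(mets, m("fileSec"))
    file_grp = ET.SubElement(file_sec, m("fileGrp"))
    ET.SubElement(
        file_grp, m("file"),
        {"ID": file_id, "MIMETYPE": "image/tiff", "ADMID": tech_id},
    )

    struct_map = ET.SubElement(mets, m("structMap"))
    div = ET.SubElement(struct_map, m("div"))
    ET.SubElement(div, m("fptr"), {"FILEID": file_id})
    return mets


if __name__ == "__main__":
    placeholder_mix = ET.Element(f"{{{MIX_NS}}}mix")  # real MIX output would go here
    print(ET.tostring(build_mets(placeholder_mix), encoding="unicode"))
```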
What’s confusing about this? Simple, Clear, Low Cost, Scalable. That’s a good thing.
Bringing Content Up to the Level Of the WEB!!! Greater Usability and Access == Longer Life (it’s free!)
convert foo.tiff -strip -density 96 -resample 96x96 -filter Cubic -resize 2048x2048 foo_2048.jpg
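The same ImageMagick command can be run over a whole folder of archival TIFFs. In this sketch the -filter setting is placed before -resize, since ImageMagick applies a setting only to the operators that follow it. The function names are illustrative, and on ImageMagick 7 the executable is typically magick rather than convert.

```python
import subprocess
from pathlib import Path
from typing import List


def convert_command(src: Path, size: int = 2048) -> List[str]:
    """Build the ImageMagick argument list for one web derivative."""
    dest = src.with_name(f"{src.stem}_{size}.jpg")
    return [
        "convert", str(src),
        "-strip",
        "-density", "96",
        "-resample", "96x96",
        "-filter", "Cubic",            # setting: must precede the resize it affects
        "-resize", f"{size}x{size}",
        str(dest),
    ]


def make_derivatives(folder: Path, size: int = 2048) -> None:
    """Create a JPEG derivative next to every .tif/.tiff file in a folder."""
    for src in sorted(folder.iterdir()):
        if src.suffix.lower() in {".tif", ".tiff"}:
            subprocess.run(convert_command(src, size), check=True)


if __name__ == "__main__":
    # Hypothetical folder name; requires ImageMagick on the PATH.
    make_derivatives(Path("Scans"))
```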
ACCESS! Via Acumen (also free!)
Bringing Order to Chaos 1) Identification 2) Consistency 3) Organization 4) Documentation University of Alabama Libraries Holder ID: u0003 Collection ID: Item ID: Sequence ID: 0005 Archival File: u0003_ _ _0005.tif Jody L. DeRidder jodyderidder.com