METS Case Study: The NYU Digital Library Team
METS Opening Day, 27 October 2003
Leslie Myrick
Projects at NYU Using METS
- EAD Finding Aid Project
- Tokyo Tribunal Proceedings
- Afghanistan Digital Library
- CRL Political Web Archiving Project
- DRAM *
- Hemispheric Institute *
- REPO History Sign Project *
Why METS? (1)
METS was formulated to serve as:
- a Submission Information Package (SIP)
- an Archival Information Package (AIP)
- a Dissemination Information Package (DIP)
Why METS? (2)
In other words, it is a:
- Transfer Syntax
- Archival Syntax
- Functional Syntax
METS and Complex Digital Objects
- Finding aid + images with multiple scans/versions
- Page turners for photo albums, documents and books: Edisto Album, Tokyo Tribunal brief, Afghanistan Digital Library
- Multimedia/time-based media navigators: Hemispheric Institute; SMIL viewer
- Web site navigator: CRL Political Communications Web Archiving Project
Using METS as a SIP
- Berol Collection finding aid: in negotiations with the RLG Cultural Materials Project
- METS will be bundled with the objects and the EAD
METS as a Functional Syntax
- METS is designed not only for transfer and archival management, but also for providing access to an object and navigating it
- METS + XSLT can create dynamic interfaces with links to resources and their metadata
- METS can be loaded into Oracle, indexed and searched using context-aware queries
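The page-turner idea can be shown by resolving each structMap `fptr` against the fileSec to produce an ordered list of page links. The sketch below uses Python's ElementTree as a rough stand-in for that XSLT. It is not NYU's stylesheet, and the sample document, IDs and file names are hypothetical.

```python
import xml.etree.ElementTree as ET

METS_NS = "http://www.loc.gov/METS/"
XLINK_HREF = "{http://www.w3.org/1999/xlink}href"
NS = {"mets": METS_NS}

# Hypothetical two-page object: a fileSec listing images, a structMap ordering them.
SAMPLE_METS = """<mets:mets xmlns:mets="http://www.loc.gov/METS/"
    xmlns:xlink="http://www.w3.org/1999/xlink" OBJID="album1">
  <mets:fileSec>
    <mets:fileGrp USE="reference">
      <mets:file ID="f1" MIMETYPE="image/jpeg">
        <mets:FLocat LOCTYPE="URL" xlink:href="img/page001.jpg"/>
      </mets:file>
      <mets:file ID="f2" MIMETYPE="image/jpeg">
        <mets:FLocat LOCTYPE="URL" xlink:href="img/page002.jpg"/>
      </mets:file>
    </mets:fileGrp>
  </mets:fileSec>
  <mets:structMap TYPE="physical">
    <mets:div TYPE="album" LABEL="Album">
      <mets:div TYPE="page" LABEL="Page 1"><mets:fptr FILEID="f1"/></mets:div>
      <mets:div TYPE="page" LABEL="Page 2"><mets:fptr FILEID="f2"/></mets:div>
    </mets:div>
  </mets:structMap>
</mets:mets>"""


def page_sequence(mets_xml):
    """Return (label, href) pairs in structMap document order."""
    root = ET.fromstring(mets_xml)
    # Resolve each file ID to the URL in its first FLocat.
    hrefs = {}
    for f in root.iterfind(".//mets:file", NS):
        loc = f.find("mets:FLocat", NS)
        if loc is not None:
            hrefs[f.get("ID")] = loc.get(XLINK_HREF)
    # Walk the first structMap depth-first; divs that point at files become pages.
    pages = []
    struct_map = root.find("mets:structMap", NS)
    for div in struct_map.iter(f"{{{METS_NS}}}div"):
        for fptr in div.findall("mets:fptr", NS):
            pages.append((div.get("LABEL"), hrefs.get(fptr.get("FILEID"))))
    return pages


# Emit simple navigation links, as a page-turner template might.
for label, href in page_sequence(SAMPLE_METS):
    print(f'<a href="{href}">{label}</a>')
```

This sketch relies on document order and ignores the `ORDER` attribute. A real viewer would also choose among fileGrps, for example thumbnail versus reference images.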
METS Plays Well with Others
We have:
- EAD finding aids pointing to METS
- METS pointing to finding aids and MARCXML records
- METS pointing to and manipulating TEI
METS and Extensions at NYU
- MODS and DC for descriptive metadata
- MIX for image technical metadata
- textMD for text technical metadata
- LC A/V prototype + smptetechMD + AES for audio/video
- Missing links: an overall preservation schema plugin (PREMIS); a rights metadata schema
Ingredients (So Far)
- Perl
- MySQL and some Oracle
- Tomcat servlets and JSP
- Saxon and XT for XSLT
Tools for Creation
- zeroDB database: input via an interface, as well as batch loading of metadata extracted by scripts (e.g. ImageMagick identify, arcscraper.pl)
- Outputs METS using Perl DBI
Tools for Dissemination
- Page turners
- Multimedia viewers
- Thumbnail browsers
Typical METS Creation Workflow
1. ImageMagick extraction of image metadata
2. Database input (batch and manual entry) of descriptive and technical metadata
3. Generation of METS using Perl DBI against MySQL
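Step 3 of the workflow, generating METS from database rows, can be sketched as follows. This is a minimal sketch in which Python's `sqlite3` stands in for Perl DBI against MySQL. The `files` table and its columns (`object_id`, `file_id`, `href`, `mimetype`, `seq`) are hypothetical, not the zeroDB schema.

```python
import sqlite3
import xml.etree.ElementTree as ET

METS_NS = "http://www.loc.gov/METS/"
XLINK_NS = "http://www.w3.org/1999/xlink"
ET.register_namespace("mets", METS_NS)
ET.register_namespace("xlink", XLINK_NS)


def m(tag):
    """Qualify a tag name with the METS namespace (Clark notation)."""
    return f"{{{METS_NS}}}{tag}"


def build_mets(conn, object_id):
    """Generate a minimal METS document for one object from the hypothetical files table."""
    rows = conn.execute(
        "SELECT file_id, href, mimetype, seq FROM files "
        "WHERE object_id = ? ORDER BY seq",
        (object_id,),
    ).fetchall()
    root = ET.Element(m("mets"), {"OBJID": object_id})
    file_grp = ET.SubElement(ET.SubElement(root, m("fileSec")), m("fileGrp"))
    struct_map = ET.SubElement(root, m("structMap"), {"TYPE": "physical"})
    top_div = ET.SubElement(struct_map, m("div"), {"LABEL": object_id})
    for file_id, href, mimetype, seq in rows:
        # One <file> per row, located by URL ...
        f = ET.SubElement(file_grp, m("file"), {"ID": file_id, "MIMETYPE": mimetype})
        ET.SubElement(f, m("FLocat"), {"LOCTYPE": "URL", f"{{{XLINK_NS}}}href": href})
        # ... and one ordered <div> in the structMap pointing back at it.
        div = ET.SubElement(top_div, m("div"), {"TYPE": "page", "ORDER": str(seq)})
        ET.SubElement(div, m("fptr"), {"FILEID": file_id})
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    demo = sqlite3.connect(":memory:")
    demo.execute("CREATE TABLE files (object_id TEXT, file_id TEXT, href TEXT, mimetype TEXT, seq INTEGER)")
    demo.executemany(
        "INSERT INTO files VALUES (?, ?, ?, ?, ?)",
        [("taqw", "f002", "taqw-fr002s.jpg", "image/jpeg", 2),
         ("taqw", "f001", "taqw-fr001s.jpg", "image/jpeg", 1)],
    )
    print(build_mets(demo, "taqw"))
```

The descriptive (dmdSec) and technical (amdSec) sections would be added from further tables in the same way. They are omitted here to keep the sketch short.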
ImageMagick Verbose Dump
Image: taqw_001s.jpg
  Format: JPEG (Joint Photographic Experts Group JFIF format)
  Geometry: 625x886
  Class: DirectClass
  Type: true color
  Depth: 8 bits-per-pixel component
  Colors:
  Profile-color: 552 bytes
  Profile-iptc: 5636 bytes
  unknown: êëÿ
  Resolution: 100x100 pixels/inch
  Filesize: 210kb
  Interlace: None
  Background Color: white
  Border Color: #dfdfdf
  Matte Color: grey74
  Iterations: 0
  Compression: JPEG
  signature: 8c37d0b82374d8eaa6b4d6b062699a9b8d7d86f2ba1d4e320fd
  Tainted: False
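A verbose dump like this is easy to turn into fields for batch loading. The minimal sketch below splits the "Key: value" lines and pulls out a few values a technical-metadata record such as MIX would need. The output keys (`width`, `bit_depth`, and so on) are hypothetical names, not MIX element names. Field labels also vary between ImageMagick releases.

```python
SAMPLE_VERBOSE = """Image: taqw_001s.jpg
  Format: JPEG (Joint Photographic Experts Group JFIF format)
  Geometry: 625x886
  Class: DirectClass
  Depth: 8 bits-per-pixel component
  Resolution: 100x100 pixels/inch
  Filesize: 210kb
  Compression: JPEG"""


def parse_identify_verbose(text):
    """Split each 'Key: value' line of `identify -verbose` output into a dict."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.strip().partition(":")
        if sep and key:
            fields[key.strip()] = value.strip()
    return fields


def technical_summary(fields):
    """Pull out a few values a technical-metadata record would need (hypothetical keys)."""
    geometry = fields["Geometry"].split("+")[0]  # some releases print e.g. 625x886+0+0
    width, height = (int(v) for v in geometry.split("x"))
    summary = {
        "format": fields["Format"].split()[0],
        "width": width,
        "height": height,
        # Accepts both "8 bits-per-pixel component" and "8-bit".
        "bit_depth": int(fields["Depth"].split()[0].split("-")[0]),
    }
    resolution = fields.get("Resolution", "").split()
    if resolution:
        x_res, y_res = (float(v) for v in resolution[0].split("x"))
        summary.update(x_resolution=x_res, y_resolution=y_res,
                       resolution_unit=" ".join(resolution[1:]))
    return summary


print(technical_summary(parse_identify_verbose(SAMPLE_VERBOSE)))
```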
ImageMagick Non-Verbose Dump
taqw-fr001.tif TIFF 6500x6817 DirectClass 8-bit 126mb 4.3u 0:06
taqw-fr001s.jpg[1] JPEG 625x886 DirectClass 8-bit 191kb 0.0u 0:01
taqw-fr001t.jpg[2] JPEG 100x142 DirectClass 8-bit 9954b 0.0u 0:01
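The one-line form is convenient for whole batches of master, screen and thumbnail files. Below is a minimal regex sketch, assuming exactly the column layout shown above: name with optional `[n]` scene index, format, geometry, class, depth, size. Newer ImageMagick releases print different columns, so lines that don't match return `None`. Sizes are kept as strings because the unit convention (1000 vs 1024) is not stated.

```python
import re

SAMPLE = """taqw-fr001.tif TIFF 6500x6817 DirectClass 8-bit 126mb 4.3u 0:06
taqw-fr001s.jpg[1] JPEG 625x886 DirectClass 8-bit 191kb 0.0u 0:01
taqw-fr001t.jpg[2] JPEG 100x142 DirectClass 8-bit 9954b 0.0u 0:01"""

# filename[optional scene index] FORMAT WIDTHxHEIGHT class N-bit size ...
LINE_RE = re.compile(
    r"^(?P<file>\S+?)(?:\[\d+\])?\s+(?P<format>\S+)\s+(?P<width>\d+)x(?P<height>\d+)"
    r"\s+(?P<cls>\S+)\s+(?P<depth>\d+)-bit\s+(?P<size>\S+)"
)


def parse_identify_line(line):
    """Parse one line of non-verbose `identify` output, or return None."""
    match = LINE_RE.match(line.strip())
    if match is None:
        return None
    row = match.groupdict()
    for key in ("width", "height", "depth"):
        row[key] = int(row[key])
    return row


rows = [parse_identify_line(line) for line in SAMPLE.splitlines()]
for row in rows:
    print(row)
```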
Extracting METS from a DB
doWebArchive.cgi:
- MODS for the homepage; DC for individual pages
- MIX for image technical metadata
- textMD for web page technical metadata
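The descriptive side of this split can be sketched as a dmdSec whose `mdWrap` carries MODS for a homepage and Dublin Core for other pages. This is a hypothetical Python helper, not doWebArchive.cgi itself, and it wraps only a title.

```python
import xml.etree.ElementTree as ET

METS_NS = "http://www.loc.gov/METS/"
MODS_NS = "http://www.loc.gov/mods/v3"
DC_NS = "http://purl.org/dc/elements/1.1/"


def dmd_sec(dmd_id, title, is_homepage):
    """Hypothetical helper: wrap a title in MODS (homepage) or DC (other pages) inside a dmdSec."""
    sec = ET.Element(f"{{{METS_NS}}}dmdSec", {"ID": dmd_id})
    mdtype = "MODS" if is_homepage else "DC"
    wrap = ET.SubElement(sec, f"{{{METS_NS}}}mdWrap", {"MDTYPE": mdtype})
    xml_data = ET.SubElement(wrap, f"{{{METS_NS}}}xmlData")
    if is_homepage:
        mods = ET.SubElement(xml_data, f"{{{MODS_NS}}}mods")
        title_info = ET.SubElement(mods, f"{{{MODS_NS}}}titleInfo")
        ET.SubElement(title_info, f"{{{MODS_NS}}}title").text = title
    else:
        ET.SubElement(xml_data, f"{{{DC_NS}}}title").text = title
    return sec


print(ET.tostring(dmd_sec("dmd001", "Site homepage", True), encoding="unicode"))
```

Using the richer MODS record only at the site level keeps the per-page records light. That seems to be the intent of the split on the slide.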
METS for Discovery
- Load METS files into Oracle as CLOBs
- Create an Oracle Intermedia index: XML-aware full-text search
- Example: CRL political web archiving project
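What "XML-aware" buys is the ability to restrict a search term to one part of the METS document. The sketch below imitates that with ElementTree and plain substring matching over the text of a chosen section (the dmdSec by default). It does not use Oracle, and it is not how an Intermedia index works internally. The sample documents are hypothetical.

```python
import xml.etree.ElementTree as ET

METS = "{http://www.loc.gov/METS/}"

# Hypothetical METS documents for two archived sites.
SAMPLE_DOCS = [
    '<mets:mets xmlns:mets="http://www.loc.gov/METS/" OBJID="site1">'
    '<mets:dmdSec ID="d1"><mets:mdWrap MDTYPE="DC"><mets:xmlData>'
    "<title>Zapatista communiques</title>"
    "</mets:xmlData></mets:mdWrap></mets:dmdSec></mets:mets>",
    '<mets:mets xmlns:mets="http://www.loc.gov/METS/" '
    'xmlns:xlink="http://www.w3.org/1999/xlink" OBJID="site2">'
    '<mets:dmdSec ID="d1"><mets:mdWrap MDTYPE="DC"><mets:xmlData>'
    "<title>NGO news</title></mets:xmlData></mets:mdWrap></mets:dmdSec>"
    '<mets:fileSec><mets:fileGrp><mets:file ID="f1">'
    '<mets:FLocat LOCTYPE="URL" xlink:href="http://example.org/zapatista.html"/>'
    "</mets:file></mets:fileGrp></mets:fileSec></mets:mets>",
]


def search_within(docs, term, section="dmdSec"):
    """Return OBJIDs of METS docs whose <section> text contains term (case-insensitive)."""
    term = term.lower()
    hits = []
    for xml_text in docs:
        root = ET.fromstring(xml_text)
        for sec in root.iter(METS + section):
            # itertext() covers element text only, not attribute values such as hrefs.
            if term in " ".join(sec.itertext()).lower():
                hits.append(root.get("OBJID"))
                break
    return hits


print(search_within(SAMPLE_DOCS, "zapatista"))
```

Here "zapatista" matches only site1. In site2 the word occurs in a file URL, which lies outside the searched section.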
CRL Political Web Archive
- Collaboration among Stanford, Cornell, Texas, NYU and the Internet Archive (IA), under the aegis of CRL and Mellon
- Regions: Sub-Saharan Africa, Southeast Asia, Latin America, Western Europe
- Testbed: 400 URLs; websites of radical groups and NGOs
- Internet Archive .arc files
.arc File
- 100 MB aggregate of harvested files, along with the HTTP headers and a crawler-generated header for each file
- Fine as a simple SIP, but basically unmanageable as an AIP or DIP
- At present, content is accessed by using byte offsets to grab it from the aggregate file
- Searchable only by URL (Wayback Machine)
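Byte-offset access works like this: seek to the offset, read the record's header line, then read exactly the declared number of content bytes. The minimal sketch below assumes uncompressed ARC version 1 records whose header line is "URL IP-address archive-date content-type length". Real .arc files are usually gzipped record by record and begin with a `filedesc://` version block, and this sketch handles neither. The helper `arc_record` and the sample data are hypothetical.

```python
import io


def read_arc_record(stream, offset):
    """Read one record from an uncompressed ARC (v1) file at a known byte offset."""
    stream.seek(offset)
    header = stream.readline().decode("latin-1").rstrip("\r\n")
    url, ip, date, content_type, length = header.split()
    return {
        "url": url,
        "ip": ip,
        "date": date,
        "content_type": content_type,
        # Content = HTTP response headers plus payload, exactly `length` bytes.
        "content": stream.read(int(length)),
    }


def arc_record(url, content):
    """Hypothetical helper: build an ARC-style record (header line, content, newline)."""
    header = f"{url} 192.0.2.1 20030901120000 text/html {len(content)}\n"
    return header.encode("latin-1") + content + b"\n"


first = arc_record("http://example.org/", b"HTTP/1.0 200 OK\r\n\r\n<html>home</html>")
second = arc_record("http://example.org/about.html", b"HTTP/1.0 200 OK\r\n\r\n<html>about</html>")
archive = io.BytesIO(first + second)

# In practice an index table would supply the offset; here it is computed directly.
record = read_arc_record(archive, len(first))
print(record["url"], record["date"])
```

This is why the format is workable as a SIP. Without a separate index, though, there is nothing to search except the URL in each header.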
Automated Extraction of Metadata from Text-Based Objects (e.g. Web Pages)
- arcscraper.pl: descriptive and technical metadata for each object
- datscraper.pl: checksums, titles and links from each object
- makeLinkTable.pl: creates link-to-object relationships
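A link table of the kind makeLinkTable.pl produces can be sketched as (source page, target URL) rows, with relative links resolved against the page URL. The sketch uses Python's standard `html.parser`. The Perl script's actual logic and output format are not shown in the slides. The set of tags and attributes collected here (`LINK_ATTRS`) and the sample pages are assumptions.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

# Assumed set of tags/attributes that count as links to other objects.
LINK_ATTRS = {"a": "href", "img": "src", "link": "href", "script": "src", "frame": "src"}


class LinkCollector(HTMLParser):
    """Collect absolute URLs of linked objects from one HTML page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        wanted = LINK_ATTRS.get(tag)
        if wanted is None:
            return
        for name, value in attrs:
            if name == wanted and value:
                self.links.append(urljoin(self.base_url, value))


def make_link_table(pages):
    """pages: dict of URL -> HTML text. Returns sorted (source, target) rows."""
    rows = set()
    for url, html in pages.items():
        collector = LinkCollector(url)
        collector.feed(html)
        collector.close()
        rows.update((url, target) for target in collector.links)
    return sorted(rows)


PAGES = {
    "http://example.org/index.html": '<a href="about.html">About</a> <img src="/img/logo.gif">',
    "http://example.org/about.html": '<a href="index.html">Home</a>',
}

for source, target in make_link_table(PAGES):
    print(source, "->", target)
```

Keeping the rows in a separate table, rather than inside each METS file, lets a web site navigator follow links between archived objects by looking them up in one place.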
Go to Videotape
The Future?
- Persistent identifiers
- Preservation metadata schema
- Java development
- Move from Oracle to Cheshire II