Automating the Audit: Updates from the Metadata Upgrade Project at the University of Houston Libraries
Andrew Weidner, Metadata Librarian
Santi Thompson, Head of Digital Repository Services
Annie Wu, Head of Metadata and Digitization Services
CONTENTdm Southern Users Group Meeting, Columbia, SC, 2014
Presentation Outline
1. UHDL Background
2. Metadata Upgrade Project Overview
3. Subject Authority App
4. Name Authority App
5. Questions
UHDL Background
Est
CONTENTdm 6
65 digital collections
54,000+ images
Rationales for Metadata Upgrade
Improve access to digital collection data in Primo
Make our metadata interoperable with other systems
Enhance discoverability of digital objects
Improve display and retrieval of our digital metadata through our new UHDL interface
More Robust, Reliable and Retrievable Metadata!
Metadata Upgrade Project Overview
Metadata Upgrade
Data Analysis
Methodology
Recommendations
Metadata Upgrade Strategies
New Data Dictionary
Methodology – Data Collection
Focus group interviews
Fields inspection and review
Benchmarking
Recommendations: Focus Groups
Standardize Data
  o Controlled vocabularies
  o Mapping to DC elements
  o Collection names
Integrate Data
  o Facets in Summon Discovery tool and in new UHDL interface
Recommendations: Field Inspection & Benchmarking
Correct inaccurate, incomplete, or missing data
Add new metadata fields to better describe and preserve items
Implement templates for titles for consistency
Standardize mapping to DC elements
Upgrade Strategies – Phase One
Standardize and update field labels in the digital library
Standardize and update mapping used to link fields to DC elements
Add new fields to the digital library
Phase One in CDM
Utilize CDM Administration page to:
  o Add, delete, and edit fields
  o Standardize mapping to DC elements
Track all work in Metadata Unit Wiki
Upgrade Strategies – Phase Two
Collection Level Metadata Upgrade:
Standardize and update collection names for archival and digital content
Add and/or edit collection level fields
Additional activities:
Link content in the digital library with the library catalog or finding aid, and vice-versa
Phase Two in CDM
Import existing collection into Project Client
Populate collection-level fields, including:
  o Language
  o Format (IMT)
  o Repository
Upgrade Strategies – Phase Three
Item Level Metadata Revision:
Apply templates to titles
Add and/or edit item level fields
  o add ISO date to Date Digital
  o add type terms to Type DCMI field
  o standardize the controlled vocabulary terms to their source
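The "add ISO date to Date Digital" step can be illustrated with a short Python sketch. The slides do not say which legacy date formats appear in the field, so the list of input formats below is an assumption, and the function name to_iso_date is hypothetical.

```python
from datetime import datetime

# Assumed legacy formats; the deck does not list the formats actually found.
ASSUMED_FORMATS = ("%Y-%m-%d", "%m/%d/%Y", "%B %d, %Y")

def to_iso_date(value, formats=ASSUMED_FORMATS):
    """Return value as an ISO 8601 date (YYYY-MM-DD), or None if unparseable."""
    value = value.strip()
    for fmt in formats:
        try:
            return datetime.strptime(value, fmt).date().isoformat()
        except ValueError:
            continue  # try the next candidate format
    return None  # leave unrecognized values for manual review

print(to_iso_date("3/14/2012"))
```

Returning None for unrecognized values, rather than guessing, keeps ambiguous dates visible for a cataloger to resolve by hand.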
Phase Three in CDM
We intended for the final process to:
Utilize automation/batch processing whenever possible
Not burden the existing system or server capabilities
Complement the existing digitization and digital collection publication workflow
Phase Three Challenges
Limitations of existing interfaces
  o Refining and reconciling tools are not integrated into the project client
  o Must edit item-by-item, which is time-consuming and resource-intensive
Limitations of tab-delimited process
  o Exporting metadata, editing tab-delimited files, and re-uploading content can strain server and software
  o It may also impact the redesigned DL interface
Limitations on staff time
Phase Three Solutions
Sought feedback at CONTENTdm Users Group Meeting 2013
Decided to work from project client
Developing tools outside of the project client to automate key areas of the upgrade process
Upgrade Tools: Subject App
AutoHotkey
Automates mapping from various vocabularies to LCSH
Linked Data
Upgrade Tools: Name App
Large collections with legacy metadata
All names were entered in the LCNAF field and formatted to look like LCNAF names
App moves names to the correct authority
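The idea of moving names out of a single LCNAF field and into the correct authority can be sketched in Python. This is not the app's AutoHotkey code: the function name route_names, the lookup structure, the "local" fallback, and the semicolon-delimited field format are all assumptions for illustration.

```python
from collections import defaultdict

def route_names(lcnaf_value, authority_lookup, default="local"):
    """Group the names in one field value by the authority they belong to.

    authority_lookup maps a name as entered to (vocabulary, authorized form).
    Names missing from the lookup fall back to the assumed default vocabulary.
    """
    routed = defaultdict(list)
    for name in (n.strip() for n in lcnaf_value.split(";")):
        if not name:
            continue  # skip empty segments from stray delimiters
        vocabulary, authorized = authority_lookup.get(name, (default, name))
        routed[vocabulary].append(authorized)
    # One delimited string per destination field
    return {vocab: "; ".join(names) for vocab, names in routed.items()}

# Made-up sample names, not taken from the UHDL collections
lookup = {
    "Doe, Jane, 1900-1980": ("lcnaf", "Doe, Jane, 1900-1980"),
    "Smith, John": ("ulan", "Smith, John"),
}
result = route_names("Doe, Jane, 1900-1980; Smith, John; Roe, Richard", lookup)
```

Here result holds one entry per destination vocabulary, so each value can be pasted into its own field.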
Vocabulary
Match Name
Authorized Form
Mapping Function
Check if line belongs to the right vocabulary
If so, store the match name and the authorized form
Add to the authorized list if it's a positive match
Return authorized list for pasting into CDM field
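The mapping function steps can be sketched in Python as follows. The app's own code is not reproduced in these slides, so this is only an illustration under stated assumptions: each mapping line is taken to hold three tab-separated columns (vocabulary, match name, authorized form), field values are taken to be semicolon-delimited, and the name map_to_authorized and the sample terms are hypothetical.

```python
def map_to_authorized(field_value, vocabulary, mapping_lines, sep="; "):
    """Return the authorized forms for the terms found in field_value."""
    terms = [t.strip() for t in field_value.split(";") if t.strip()]
    authorized = []
    for line in mapping_lines:
        parts = line.rstrip("\n").split("\t")
        if len(parts) != 3:
            continue  # skip malformed lines
        vocab, match_name, authorized_form = (p.strip() for p in parts)
        # Check if the line belongs to the right vocabulary
        if vocab != vocabulary:
            continue
        # Add to the authorized list if it is a positive match
        if match_name in terms and authorized_form not in authorized:
            authorized.append(authorized_form)
    # Return the authorized list as one string for pasting into a field
    return sep.join(authorized)

# Made-up mapping lines for illustration only
lines = [
    "tgm\tAeroplanes\tAirplanes",
    "aat\tAeroplanes\tAircraft",
    "tgm\tDepots\tRailroad stations",
]
mapped = map_to_authorized("Aeroplanes; Depots; Unknown", "tgm", lines)
```

Terms with no match in the chosen vocabulary are simply left out of the result, so they would need a separate review pass.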
Upgrade Tools: Open Refine
Data Facets and Comparisons
Data Manipulation
Data Reconciliation
Barriers and Challenges
Aligning upgrade workflow and output with requirements and capabilities of CONTENTdm
Juggling parallel processes of metadata upgrade and creation
Benefits
Best Practices
Interoperable data
Enhanced retrieval of digital data
New and better standard for future metadata creation
Linked Data
Resources
Thompson, Santi and Annie Wu. “Metadata Overhaul: Upgrading Metadata in the University of Houston Digital Library.” Journal of Digital Media Management Vol. 2, No. 2: 137–147.
Weidner, Andrew and Daniel Alemneh. “Workflow Tools for Digital Curation.” Code4Lib Journal Issue 20.
Andrew Weidner Santi Thompson Annie Wu