Experience Talks: Post-Digitization Quality Control Strategies and Tools
Mark Phillips – August 5, 2010
Repurposing NDNP content.
Or...
What we learned when we started to eat our own dog food.
We had two years of happy content delivery
Then we got to the point where we had to add it to our system.
We had a slightly different model in our system
So we had to write our own ingest tools
Which wasn't hard because we had a nice “robust” specification
But we ran into some interesting 'stuff'
Our process for reusing NDNP content.
LC accepts the batch and ships it back
We move it to the local network
And start to add additional metadata
What do we add?
Our model for the Portal to Texas History (and all of our digital collections)
Has us adding a bit more metadata for each issue.
Really it is most of the data in the title record pushed down to the issue level
We add the following fields:
title.serial
subject.lcsh
subject.untl-bs
description.content
description.physical
coverage.placeName
coverage.era
publisher
creator.editor
contributor.(various)
language
identifiers.(various)
We create a metadata template for each group of “like” issues.
We create a new template any time something significant changes.
Then we automate the addition of date, volume, issue, edition, pagination from the METS/MODS
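The template-plus-automation step above can be sketched roughly as follows. This is a minimal illustration, not our actual ingest tool: the MODS fragment is a simplified stand-in for an NDNP issue record, the template values are made up, and the names `issue_fields` and `build_record` are hypothetical. Pagination is left out because it depends on how pages are counted in the METS file.

```python
import xml.etree.ElementTree as ET

MODS_NS = {"mods": "http://www.loc.gov/mods/v3"}

# Simplified, hypothetical MODS fragment for a single issue.
SAMPLE_MODS = """<mods:mods xmlns:mods="http://www.loc.gov/mods/v3">
  <mods:relatedItem type="host">
    <mods:part>
      <mods:detail type="volume"><mods:number>12</mods:number></mods:detail>
      <mods:detail type="issue"><mods:number>34</mods:number></mods:detail>
      <mods:detail type="edition"><mods:number>1</mods:number></mods:detail>
    </mods:part>
  </mods:relatedItem>
  <mods:originInfo>
    <mods:dateIssued encoding="iso8601">1905-03-17</mods:dateIssued>
  </mods:originInfo>
</mods:mods>"""


def issue_fields(mods_xml):
    """Pull date, volume, issue and edition out of a MODS record."""
    root = ET.fromstring(mods_xml)
    fields = {}
    date = root.find(".//mods:originInfo/mods:dateIssued", MODS_NS)
    if date is not None and date.text:
        fields["date"] = date.text.strip()
    for detail in root.findall(".//mods:part/mods:detail", MODS_NS):
        kind = detail.get("type")
        number = detail.find("mods:number", MODS_NS)
        if kind in ("volume", "issue", "edition") and number is not None and number.text:
            fields[kind] = number.text.strip()
    return fields


def build_record(template, mods_xml):
    """Copy the shared template, then layer the issue-specific fields on top."""
    record = dict(template)  # leave the template itself untouched
    record.update(issue_fields(mods_xml))
    return record


# Hypothetical template shared by a group of "like" issues.
TEMPLATE = {"title.serial": "Example Weekly (hypothetical)", "language": "eng"}
record = build_record(TEMPLATE, SAMPLE_MODS)
```

The template holds everything that stays constant across the group, so only the per-issue values need to come from the batch files.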
I run a few scripts to check groupings for consistency.
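A consistency check of this kind might look something like the sketch below. It is an assumption about what such a script could test, not a copy of the scripts I run: it sorts a group of issues by date and flags repeated dates, volume numbers that go backwards, and issue numbers that skip. The function name `check_group` and the sample data are hypothetical, and a repeated date can be perfectly legitimate when a paper has multiple editions.

```python
def check_group(issues):
    """Return (date, problem) pairs for a group of issue records.

    Each record is a dict with 'date' (ISO string), 'volume' and 'issue' (ints).
    """
    problems = []
    seen_dates = set()
    previous = None
    for current in sorted(issues, key=lambda rec: rec["date"]):
        if current["date"] in seen_dates:
            problems.append((current["date"], "duplicate date"))
        seen_dates.add(current["date"])
        if previous is not None:
            if current["volume"] < previous["volume"]:
                problems.append((current["date"], "volume decreased"))
            elif (current["volume"] == previous["volume"]
                  and current["issue"] != previous["issue"] + 1):
                problems.append((current["date"], "issue number out of sequence"))
        previous = current
    return problems


# Made-up sample group with two deliberate oddities.
sample_issues = [
    {"date": "1905-03-03", "volume": 12, "issue": 32},
    {"date": "1905-03-10", "volume": 12, "issue": 33},
    {"date": "1905-03-17", "volume": 12, "issue": 35},
    {"date": "1905-03-24", "volume": 11, "issue": 36},
]
for when, what in check_group(sample_issues):
    print(when, what)
```

Anything flagged is only a prompt to go look at the issue; printers made numbering mistakes too, so a flag is not automatically a metadata error.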
Then I create submission packages for each issue.
Issues get added to our repository, where they are treated like any other content.
We also shove the full batch into our repository as a digital object.
We sometimes catch things that were missed in previous QC
Mainly because it is a different view of the data
Looking at the same data in different ways is a very good QC tool
Lists, simple graphs, or timelines
Reveal subtle patterns that “could” be problems
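As one example of a "different view", here is a rough sketch of a text timeline that counts issues per month and prints empty months too, so gaps stand out. The helper names `month_range` and `text_timeline` and the sample dates are hypothetical; a real view would read the dates from the batch.

```python
from collections import Counter


def month_range(first, last):
    """Yield every 'YYYY-MM' from first to last, inclusive."""
    year, month = int(first[:4]), int(first[5:7])
    end = (int(last[:4]), int(last[5:7]))
    while (year, month) <= end:
        yield f"{year:04d}-{month:02d}"
        month += 1
        if month > 12:
            year, month = year + 1, 1


def text_timeline(dates):
    """Build one line per month, with one '#' per issue dated in that month."""
    counts = Counter(d[:7] for d in dates)
    if not counts:
        return []
    months = sorted(counts)
    return [
        f"{month} {'#' * counts[month]}".rstrip()
        for month in month_range(months[0], months[-1])
    ]


# Made-up issue dates for a weekly title with a missing month.
sample_dates = ["1905-01-06", "1905-01-13", "1905-03-03"]
for line in text_timeline(sample_dates):
    print(line)
```

A month with no marks is not necessarily a problem (the paper may not have published, or the film may be missing), but it is exactly the kind of pattern that is easy to miss when reviewing issues one at a time.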
Bring it back around Mark...
Oh yeah, QC of metadata...
One of the things we've wanted to experiment with is looking at a batch through different views of the data.
And up until recently that has required more effort than we were willing to put toward the project.
We are very interested in using the Chronicling America application as a framework for understanding these NDNP batches more easily.
Creating views of batches that we could share with other awardees.
Hopefully improving quality along the way.