UNT Libraries TRAIL Processing Mark Phillips April 26, 2016


There are currently two processes for digitizing TRAIL technical reports.

The bulk of the content goes to the University of Michigan to be scanned by Google and included in HathiTrust.

Some formats are not sent to UMich:

Reports with foldouts

Reports with Maps

Reports with other random parts

Microforms

Microfiche

Microcard

These items are scanned at the UNT Libraries in the Digital Projects Unit (DPU).

The workflow

UNT receives boxes of new items from Arizona for scanning.

These arrive at the DPU and are processed by Lee Fulton and his students.

All reports come to UNT with a unique identifier assigned to them, for example metadc303203.

We remove the bindings from items that have been donated for destructive scanning.

Loaned items, which can't be cut, are set aside for a different workflow.

Lee and his students scan every page of an item, including all foldouts and oddly shaped pages.

Scanning resolutions: 600 DPI bitonal, 400 DPI grayscale, and 400 DPI color.

All files are saved as uncompressed TIFFs.
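The capture targets above can be expressed as a small lookup table, which makes it easy to check a scan against them. This is a minimal sketch: `CAPTURE_DPI` and `meets_capture_spec` are hypothetical names, and the `compression` string is assumed to come from whatever tool reads the TIFF headers.

```python
# Hypothetical capture targets from the slide: required resolution per color mode.
CAPTURE_DPI = {
    "bitonal": 600,
    "grayscale": 400,
    "color": 400,
}


def meets_capture_spec(mode: str, dpi: int, compression: str) -> bool:
    """Return True if a scan has the target DPI for its mode and is uncompressed."""
    if mode not in CAPTURE_DPI:
        raise ValueError(f"unknown color mode: {mode!r}")
    # Masters are kept as uncompressed TIFFs, so any compression fails the check.
    return dpi == CAPTURE_DPI[mode] and compression == "none"
```

For example, `meets_capture_spec("grayscale", 400, "none")` returns `True`, while a 600 DPI bitonal scan saved with LZW compression would not pass.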

They align the pagination with the sequence of files:

0001.tif = Front Cover
0002.tif = Front Inside Cover
0003.tif = blank
0004.tif = blank
0005.tif = title page
0006.tif = blank
0007.tif = Page 1
0008.tif = Page 2

This is done so you can “jump to page 4, not image sequence 4.”
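The point of aligning pagination with file order is that a printed page label can be resolved to an image file. A minimal sketch using the sample mapping from the slide; `SEQUENCE`, `image_for_page`, and `filename` are hypothetical names, not part of UNT's software.

```python
# Hypothetical per-report mapping from image sequence to printed page label,
# copied from the example on the slide.
SEQUENCE = [
    (1, "Front Cover"),
    (2, "Front Inside Cover"),
    (3, "blank"),
    (4, "blank"),
    (5, "title page"),
    (6, "blank"),
    (7, "Page 1"),
    (8, "Page 2"),
]


def image_for_page(sequence, label):
    """Return the image sequence number whose page label matches `label`."""
    for seq, page_label in sequence:
        if page_label == label:
            return seq
    raise KeyError(label)


def filename(seq: int) -> str:
    # Four-digit, zero-padded names as in the plain example (0001.tif).
    return f"{seq:04d}.tif"
```

Asking for "Page 1" returns image 7, so `filename(image_for_page(SEQUENCE, "Page 1"))` gives `"0007.tif"`: the reader asks for a page, and the system finds the right image.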

000100fc.tif = Front Cover
000200fi.tif = Front Inside Cover
tif = blank
tif = blank
000500tp.tif = title page
tif = blank
tif = Page 1
tif = Page 2

We use a local naming convention called “magicknumbers” for this.

This convention also helps with quality control (QC) of the items.

Each report is verified to have all of the pages scanned.
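The magicknumbers filenames that survive on the slide (000100fc.tif, 000200fi.tif, 000500tp.tif) suggest a layout of a four-digit image sequence, two more digits, and a two-letter page-type code (fc, fi, tp). That reading is an assumption. The slides don't explain the middle two digits, so this sketch just carries them through. It then runs one plausible QC check: sequence numbers should run from 1 with no gaps or duplicates. `parse_magicknumber` and `missing_sequences` are hypothetical names.

```python
import re
from typing import Iterable, List, NamedTuple, Optional

# Assumed layout, inferred from the sample names: 4-digit image sequence,
# 2 digits whose meaning the slides don't state, optional 2-letter page type.
MAGIC_RE = re.compile(r"^(\d{4})(\d{2})([a-z]{2})?\.tif$")


class MagicName(NamedTuple):
    sequence: int
    middle: str
    page_type: Optional[str]


def parse_magicknumber(name: str) -> MagicName:
    """Split a filename such as 000100fc.tif into its assumed parts."""
    m = MAGIC_RE.match(name)
    if m is None:
        raise ValueError(f"not a magicknumbers filename: {name!r}")
    seq, middle, page_type = m.groups()
    return MagicName(int(seq), middle, page_type)


def missing_sequences(names: Iterable[str]) -> List[int]:
    """Return sequence numbers absent from 1..max, a hint that pages were skipped."""
    seqs = [parse_magicknumber(n).sequence for n in names]
    if len(seqs) != len(set(seqs)):
        raise ValueError("duplicate sequence numbers")
    if not seqs:
        return []
    return sorted(set(range(1, max(seqs) + 1)) - set(seqs))
```

For example, `missing_sequences(["000100fc.tif", "000200fi.tif", "000500tp.tif"])` returns `[3, 4]`, flagging two images to look for. A gap only shows that a file is missing. Whether UNT's actual QC works this way is not stated in the slides.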

A descriptive metadata record in the UNTL metadata schema is created partly from the MARC record supplied by Arizona and then augmented with additional information.
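Building the record starts with a crosswalk from MARC fields to descriptive elements, followed by manual additions. A minimal sketch, assuming the MARC record has already been parsed into a simple dictionary. The MARC 21 tags are standard (245 title, 100/110 main entry, 260 publication, 088 report number), but the target element names are only an illustrative guess at UNTL fields. `CROSSWALK` and `marc_to_untl` are hypothetical.

```python
from typing import Dict, List

# Hypothetical, simplified MARC record: tag -> list of fields, each a dict of subfields.
MarcRecord = Dict[str, List[Dict[str, str]]]

# (MARC tag, subfield code) -> target element. Element names are illustrative.
CROSSWALK = {
    ("245", "a"): "title",
    ("100", "a"): "creator",
    ("110", "a"): "creator",
    ("260", "b"): "publisher",
    ("260", "c"): "date",
    ("088", "a"): "identifier",
}


def marc_to_untl(marc: MarcRecord, extra: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Build a draft descriptive record from MARC, then merge in extra values."""
    record: Dict[str, List[str]] = {}
    for (tag, code), element in CROSSWALK.items():
        for field in marc.get(tag, []):
            value = field.get(code)
            if value:
                # Trim ISBD punctuation commonly left at the end of MARC subfields.
                record.setdefault(element, []).append(value.rstrip(" /:;,."))
    # Augment with information that is not in the supplied MARC record.
    for element, values in extra.items():
        record.setdefault(element, []).extend(values)
    return record
```

The crosswalk only produces a draft. The augmentation step described on the slide is represented here by the `extra` argument.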

PrimeOCR by Prime Recognition
Each TIFF image is processed with PrimeOCR, Optical Character Recognition (OCR) software from Prime Recognition.

We use the RAW output because it records the coordinates of each word, which we need later for highlighting.
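Word coordinates are what make hit highlighting possible. The slides don't show PrimeOCR's RAW format, so this sketch assumes its output has already been parsed into words with pixel bounding boxes. `Word`, `highlight_boxes`, and `scale_box` are hypothetical names. The scaling step assumes the viewer draws highlights on a web-scale image that is a proportionally smaller copy of the master.

```python
from typing import List, NamedTuple, Tuple

Box = Tuple[int, int, int, int]  # left, top, right, bottom in image pixels


class Word(NamedTuple):
    text: str
    box: Box


def highlight_boxes(words: List[Word], query: str) -> List[Box]:
    """Return the boxes of every word matching `query`, ignoring case and edge punctuation."""
    q = query.casefold()
    return [w.box for w in words if w.text.strip(".,;:()\"'").casefold() == q]


def scale_box(box: Box, master_width: int, web_width: int) -> Box:
    """Rescale a box from the master TIFF to a smaller web-scale image of the same page."""
    f = web_width / master_width
    left, top, right, bottom = box
    return (round(left * f), round(top * f), round(right * f), round(bottom * f))
```

A search for "water" returns one box per occurrence on the page. Each box is scaled to the displayed image before it is drawn.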

A searchable PDF is created for each page along with the OCR output.

These PDFs are combined into a single PDF document for the report.

A finished report looks like this on disk:

metadc303234
    01_tif
        000100fc.tif
        000200fi.tif
        tif
        tif
        ...
    02_pdf
        metadc pdf
    metadata.xml
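Before a batch is ingested, each report directory can be checked against the expected layout. A minimal sketch, assuming `metadata.xml` sits at the top level of the report directory and `02_pdf` holds exactly one combined PDF. Neither point is fully clear from the flattened slide. `check_package` is a hypothetical name.

```python
from pathlib import Path
from typing import List


def check_package(root: Path) -> List[str]:
    """Return a list of problems found in a finished report directory."""
    problems = []
    tif_dir = root / "01_tif"
    pdf_dir = root / "02_pdf"
    # glob() on a missing directory simply yields nothing.
    if not any(tif_dir.glob("*.tif")):
        problems.append("01_tif has no .tif files")
    if len(list(pdf_dir.glob("*.pdf"))) != 1:
        problems.append("02_pdf should hold exactly one combined PDF")
    if not (root / "metadata.xml").is_file():
        problems.append("metadata.xml is missing")
    return problems
```

An empty list means the directory matches the assumed layout. Any other result names what to fix before ingest.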

Reports are ingested into the UNT Libraries Digital Infrastructure in batches.

Web-scale images, the PDF, and the metadata are added to the TRAIL collection in the UNT Digital Library.

Master files are added to the Coda Repository for preservation.

Physical reports are inventoried and then discarded once they have been verified to be online.

Loaned reports are returned to Arizona or the loaning institution.
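At the end of the workflow, what happens to each physical report depends on two things: whether it was donated or loaned, and whether it has been verified online. A minimal sketch with hypothetical names. It assumes loaned reports, like donated ones, are held until they are verified online, which the slides don't state explicitly.

```python
from enum import Enum


class Acquisition(Enum):
    DONATED = "donated"
    LOANED = "loaned"


def disposition(acquisition: Acquisition, verified_online: bool) -> str:
    """Decide what happens to a physical report after ingest."""
    if not verified_online:
        # Assumption: nothing leaves the unit until the report is confirmed online.
        return "hold"
    if acquisition is Acquisition.LOANED:
        # Loaned reports go back to Arizona or the loaning institution.
        return "return to lender"
    # Donated reports are inventoried and then discarded.
    return "inventory and discard"
```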

UNT Digital Library

2015 TRAC Self-Audit

In 2015 the UNT Libraries completed a self-audit using the Trustworthy Repositories Audit & Certification (TRAC) framework.

Full documentation for the self-audit is available via the UNT Libraries website. It includes:

Policies related to preservation, access, user feedback, and usage

Content and partnership agreements

Detailed workflows and technical documentation

https://www.library.unt

UNT Libraries continues to value its partnership with TRAIL and looks forward to opportunities to expand our work to provide access to these resources for users around the world.

Thank you. Mark Phillips http://digital.library.unt.edu/
