Published by Scott Miller. Modified over 6 years ago.
1
UNT Libraries TRAIL Processing
Mark Phillips
April 26, 2016
2
There are currently two processes for digitizing technical reports with TRAIL
3
The bulk of the content goes to the University of Michigan for scanning by Google and inclusion in HathiTrust
4
There are some formats that are not sent to UMich
5
Reports with foldouts
6
Reports with Maps
7
Reports with other random parts
8
Microforms
9
Microfiche
10
Microcard
11
Maps
12
These items are scanned at the UNT Libraries in the Digital Projects Unit
13
The workflow
14
UNT receives boxes of new items from Arizona for scanning.
15
These arrive at the DPU and are processed by Lee Fulton and his students.
16
All reports come to UNT with a unique identifier assigned to them.
Example: metadc303203
17
We remove the binding from items that have been donated for destructive scanning
18
Loaned items that can't be cut are set aside for a different workflow
19
Lee and his students scan all of the pages of an item and all foldouts and oddly shaped pages
20
600 DPI bitonal
400 DPI grayscale
400 DPI color
21
All uncompressed TIFF files
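Because every master is kept as an uncompressed TIFF, file size follows directly from page size, resolution, and bit depth. The sketch below estimates pixel-data size for the three capture settings. It assumes a US Letter page (8.5 x 11 in), 1 bit per pixel for bitonal, 8 for grayscale, and 24 for color, with each row padded to a whole byte; the page size and bit depths are illustrative assumptions, not figures from the presentation, and TIFF header overhead is ignored.

```python
# Rough size estimate for one uncompressed scan.
# Assumptions (not from the presentation): US Letter page (8.5 x 11 in),
# 1 bit/pixel bitonal, 8 bits/pixel grayscale, 24 bits/pixel color.
# Each row is padded to a whole byte; TIFF header overhead is ignored.

BITS_PER_PIXEL = {"bitonal": 1, "grayscale": 8, "color": 24}


def uncompressed_bytes(width_in: float, height_in: float, dpi: int, mode: str) -> int:
    """Return the approximate pixel-data size in bytes for one scanned page."""
    width_px = round(width_in * dpi)
    height_px = round(height_in * dpi)
    bytes_per_row = (width_px * BITS_PER_PIXEL[mode] + 7) // 8  # round up per row
    return bytes_per_row * height_px


if __name__ == "__main__":
    for dpi, mode in [(600, "bitonal"), (400, "grayscale"), (400, "color")]:
        size = uncompressed_bytes(8.5, 11, dpi, mode)
        print(f"{dpi} DPI {mode}: {size / 1_000_000:.1f} MB")
```

Under those assumptions the script prints about 4.2 MB, 15.0 MB, and 44.9 MB per page, so the capture mode has a large effect on storage per report.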
22
They align the pagination with the sequence of files
23
0001.tif = Front Cover
0002.tif = Front Inside Cover
0003.tif = blank
0004.tif = blank
0005.tif = title page
0006.tif = blank
0007.tif = Page 1
0008.tif = Page 2
24
This is done so you can “jump to page 4, not image sequence 4”
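If each image's position in the sequence is recorded together with the printed page it carries, a viewer can translate a requested page number into an image. The sketch below uses the example sequence from the previous slide; the list-of-tuples structure and the `build_page_index` / `image_for_page` helpers are hypothetical illustrations, not part of UNT's software.

```python
# Hypothetical sketch: map printed page labels to image sequence numbers so a
# viewer can "jump to page 4" rather than to the fourth image. The sample data
# mirrors the example on the slide; the data structure is an assumption.

from typing import Dict, List, Optional, Tuple

PAGE_PREFIX = "Page "

SAMPLE_SEQUENCE: List[Tuple[str, str]] = [
    ("0001.tif", "Front Cover"),
    ("0002.tif", "Front Inside Cover"),
    ("0003.tif", "blank"),
    ("0004.tif", "blank"),
    ("0005.tif", "title page"),
    ("0006.tif", "blank"),
    ("0007.tif", "Page 1"),
    ("0008.tif", "Page 2"),
]


def build_page_index(sequence: List[Tuple[str, str]]) -> Dict[str, int]:
    """Return {printed page label: 1-based image position} for numbered pages."""
    index: Dict[str, int] = {}
    for position, (_filename, label) in enumerate(sequence, start=1):
        if label.startswith(PAGE_PREFIX):
            index[label[len(PAGE_PREFIX):]] = position
    return index


def image_for_page(sequence: List[Tuple[str, str]], page: str) -> Optional[str]:
    """Return the filename holding printed page `page`, or None if absent."""
    position = build_page_index(sequence).get(page)
    return None if position is None else sequence[position - 1][0]
```

With the sample data, `image_for_page(SAMPLE_SEQUENCE, "1")` returns `"0007.tif"`, and a page not in the sample, such as `"4"`, returns `None`.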
25
000100fc.tif = Front Cover
000200fi.tif = Front Inside Cover
tif = blank
tif = blank
000500tp.tif = title page
tif = blank
tif = Page 1
tif = Page 2
26
We use a local naming convention called “magicknumbers” for this.
27
This naming also helps with quality control (QC) of the items.
28
Each report is verified to have all of the pages scanned.
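Part of that verification can be automated by checking that the image sequence has no gaps. The sketch below assumes only that the first four digits of each "magicknumbers" filename give the image sequence (0001, 0002, ...), as in 000100fc.tif and 000200fi.tif; how the remaining characters encode page type is not spelled out on the slides, so the sketch treats them as an opaque suffix. The `missing_sequences` helper is hypothetical.

```python
# Hypothetical QC sketch. Assumption: the first four digits of each
# "magicknumbers" filename give the image sequence (0001, 0002, ...), as in
# 000100fc.tif; the rest of the name is treated as an opaque suffix.

import re
from typing import Iterable, List

SEQUENCE_RE = re.compile(r"^(\d{4})\w*\.tif$")


def missing_sequences(filenames: Iterable[str]) -> List[int]:
    """Return sequence numbers absent between 1 and the highest one found."""
    seen = set()
    for name in filenames:
        match = SEQUENCE_RE.match(name)
        if match is None:
            raise ValueError(f"unexpected filename: {name}")
        seen.add(int(match.group(1)))
    if not seen:
        return []
    return [n for n in range(1, max(seen) + 1) if n not in seen]
```

Given only 000100fc.tif, 000200fi.tif, and 000500tp.tif, it reports `[3, 4]`. A gap check cannot detect a missing final page, so it complements, rather than replaces, a page-by-page comparison against the physical report.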
29
A descriptive metadata record using the UNTL metadata schema is created partly from the supplied MARC record from Arizona and augmented with additional information.
30
Each TIFF image is processed with Optical Character Recognition (OCR) software: PrimeOCR by Prime Recognition.
31
The RAW output is used so we have the coordinates of the words for highlighting later
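Keeping word coordinates is what makes hit highlighting possible: a search term is matched against the OCR text and the matching word boxes are drawn over the page image. The sketch below shows that idea with a hypothetical `Word` record holding pixel coordinates; the field names and layout are assumptions for illustration and do not describe PrimeOCR's actual RAW output format.

```python
# Hypothetical sketch: the Word record layout is an assumption for
# illustration, not PrimeOCR's actual RAW format.

from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Word:
    text: str
    left: int    # pixel coordinates on the page image
    top: int
    right: int
    bottom: int


def highlight_boxes(words: List[Word], query: str) -> List[Tuple[int, int, int, int]]:
    """Return (left, top, right, bottom) boxes for words matching `query`.

    Matching is case-insensitive and ignores surrounding punctuation.
    """
    needle = query.casefold()
    return [
        (w.left, w.top, w.right, w.bottom)
        for w in words
        if w.text.casefold().strip(".,;:()\"'") == needle
    ]
```

A viewer would then scale these boxes from scan resolution to the web-scale image it is displaying before drawing them.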
32
A searchable PDF is created for each page along with the OCR output.
33
These PDFs are combined into a single PDF document for the report.
34
A finished report looks like this on disk:
35
metadc303234
    01_tif
        000100fc.tif
        000200fi.tif
        tif
        tif
        ...
    02_pdf
        metadc pdf
    metadata.xml
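Before batch ingest, a package can be checked against this layout automatically. The sketch below is a hypothetical pre-ingest check; it assumes metadata.xml sits at the package root and only verifies that 01_tif/ and 02_pdf/ exist and contain .tif and .pdf files respectively. UNT's actual ingest validation may check much more.

```python
# Hypothetical sketch: checks that a report directory has the layout shown on
# the slide. Assumptions: metadata.xml sits at the package root, and the only
# other checks are that 01_tif/ holds .tif files and 02_pdf/ holds .pdf files.

from pathlib import Path
from typing import List


def package_problems(package: Path) -> List[str]:
    """Return human-readable layout problems (an empty list if none found)."""
    problems: List[str] = []
    if not (package / "metadata.xml").is_file():
        problems.append("missing metadata.xml")
    for folder, suffix in (("01_tif", ".tif"), ("02_pdf", ".pdf")):
        directory = package / folder
        if not directory.is_dir():
            problems.append(f"missing {folder}/")
        elif not any(p.suffix == suffix for p in directory.iterdir()):
            problems.append(f"no {suffix} files in {folder}/")
    return problems
```

Returning a list of problems rather than raising on the first one lets a batch report every defect in a package at once.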
36
Reports are ingested into the UNT Libraries Digital Infrastructure in batches
39
Web-scale images, the PDF, and metadata are added to the TRAIL collection in the UNT Digital Library
40
Master files are added to the Coda Repository for preservation.
41
Once a report is online and verified, the physical copy is inventoried and discarded.
42
Loaned reports are returned to Arizona or the loaning institution.
43
UNT Digital Library
62
2015 TRAC Self-Audit
63
In 2015 the UNT Libraries completed a self-audit using the Trusted Repository Audit & Certification (TRAC) framework
64
Full documentation for the self-audit is available via the UNT Libraries Website
It includes:
Policies related to preservation, access, user feedback, and usage
Content and partnership agreements
Detailed workflows and technical documentation
66
https://www.library.unt
67
UNT Libraries continues to value its partnership with TRAIL and looks forward to opportunities to expand our work to provide access to these resources for users around the world.
68
Thank you.
Mark Phillips
http://digital.library.unt.edu/