UNT Libraries TRAIL Processing Mark Phillips April 26, 2016

1 UNT Libraries TRAIL Processing Mark Phillips April 26, 2016

2 There are currently two processes for digitizing technical reports with TRAIL

3 The bulk of content goes to the University of Michigan for scanning by Google and inclusion in HathiTrust

4 There are some formats that are not sent to UMich

5 Reports with foldouts

6 Reports with Maps

7 Reports with other random parts

8 Microforms

9 Microfiche

10 Microcard

11 Maps

12 These items are scanned at the UNT Libraries in the Digital Projects Unit

13 The workflow

14 UNT receives boxes of new items from Arizona for scanning.

15 These arrive at the DPU and are processed by Lee Fulton and his students.

16 All reports come to UNT with a unique identifier assigned to them.

17 We remove the binding for the items that have been donated for destructive scanning

18 Loaned items that can't be cut are set aside for a different workflow

19 Lee and his students scan all of the pages of an item and all foldouts and oddly shaped pages

20 600 DPI bitonal, 400 DPI grayscale, 400 DPI color

21 All uncompressed TIFF files
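
As a rough sense of what these settings mean for storage, the sketch below estimates the uncompressed size of one page image at each setting. The 8.5 x 11 inch page size and the bit depths (1 bit for bitonal, 8 for grayscale, 24 for color) are assumptions for illustration, not figures from the slides, and TIFF header overhead is ignored.

```python
import math

# Assumed page size in inches (US Letter); not stated in the presentation.
PAGE_WIDTH_IN = 8.5
PAGE_HEIGHT_IN = 11.0

# (dpi, bits per pixel) per scanning mode. The DPI values come from the
# slide; the bit depths are assumed typical values.
SCAN_MODES = {
    "bitonal": (600, 1),
    "grayscale": (400, 8),
    "color": (400, 24),
}


def uncompressed_bytes(dpi, bits_per_pixel,
                       width_in=PAGE_WIDTH_IN, height_in=PAGE_HEIGHT_IN):
    """Estimate raw pixel data size for one page, ignoring file headers."""
    width_px = round(width_in * dpi)
    height_px = round(height_in * dpi)
    # Each pixel row is padded to a whole number of bytes.
    row_bytes = math.ceil(width_px * bits_per_pixel / 8)
    return row_bytes * height_px


if __name__ == "__main__":
    for mode, (dpi, bits) in SCAN_MODES.items():
        size_mb = uncompressed_bytes(dpi, bits) / 1_000_000
        print(f"{mode}: {dpi} DPI, about {size_mb:.1f} MB per page")
```

Under these assumptions a color page is roughly ten times the size of a bitonal one, which is one reason the scanning mode matters when planning storage for master files.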

22 They align the pagination with the sequence of files

23 0001.tif = Front Cover
0002.tif = Front Inside Cover
0003.tif = blank
0004.tif = blank
0005.tif = title page
0006.tif = blank
0007.tif = Page 1
0008.tif = Page 2

24 This is done so you can “jump to page 4, not image sequence 4”

25 000100fc.tif = Front Cover
000200fi.tif = Front Inside Cover
tif = blank
tif = blank
000500tp.tif = title page
tif = blank
tif = Page 1
tif = Page 2

26 We use a local naming convention called “magicknumbers” for this.
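
The slides show only a few magicknumber filenames, so the exact rules are not stated here. The following is a minimal sketch, assuming the first four characters are the image sequence and the last four carry a page label, and that fc, fi, and tp stand for front cover, front inside cover, and title page as in the example above. The function name, the pattern, and the handling of any other label are hypothetical.

```python
import re
from typing import NamedTuple

# Two-letter codes visible in the slide example. Any other label is
# returned uninterpreted rather than guessed at.
KNOWN_LABELS = {
    "fc": "Front Cover",
    "fi": "Front Inside Cover",
    "tp": "title page",
}


class PageFile(NamedTuple):
    sequence: int      # position of the image within the report
    label: str         # raw four characters following the sequence
    description: str   # meaning of a recognised code, else ""


# Assumed shape: four digits, four more characters, then ".tif".
_PATTERN = re.compile(r"^(\d{4})([0-9a-z]{4})\.tif$")


def parse_magicknumber(filename: str) -> PageFile:
    """Split a filename into image sequence and page label."""
    match = _PATTERN.match(filename)
    if match is None:
        raise ValueError(f"not an 8-character magicknumber name: {filename!r}")
    sequence = int(match.group(1))
    label = match.group(2)
    # Assumption: a recognised two-letter code sits at the end of the label.
    description = KNOWN_LABELS.get(label[2:], "")
    return PageFile(sequence, label, description)
```

For example, `parse_magicknumber("000100fc.tif")` yields sequence 1 with the description "Front Cover", while a plain sequential name such as `0001.tif` is rejected.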

27 Which also helps with the QC of the items.

28 Each report is verified to have all of the pages scanned.
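
One way sequence-prefixed filenames could support this check is by looking for gaps in the image sequence. The sketch below is an assumption about how such a check might work, not a description of UNT's actual QC tooling; it assumes every filename begins with a four-digit sequence number, as in the examples above.

```python
def missing_sequences(filenames):
    """Return sequence numbers absent between 1 and the highest one seen."""
    seen = set()
    for name in filenames:
        prefix = name[:4]
        if len(prefix) != 4 or not prefix.isdigit():
            raise ValueError(f"no four-digit sequence prefix: {name!r}")
        seen.add(int(prefix))
    if not seen:
        return []
    return [n for n in range(1, max(seen) + 1) if n not in seen]
```

A check like this only catches images missing from the set of files. Pages that were never scanned at the end of a report would still need to be caught by comparing against the physical item.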

29 A descriptive metadata record using the UNTL metadata schema is created partly from the supplied MARC record from Arizona and augmented with additional information.

30 Each TIFF image is processed with Optical Character Recognition (OCR) software: PrimeOCR by Prime Recognition.

31 The RAW output is used so we have the coordinates of the words for highlighting later
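
To illustrate why word coordinates matter, here is a minimal sketch of turning OCR word boxes into highlight rectangles on a smaller display image. The `WordBox` structure, the pixel-coordinate convention, and the function name are hypothetical; the format of the PrimeOCR raw output is not described in the slides.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WordBox:
    """A recognised word and its bounding box in master-image pixels."""
    text: str
    left: int
    top: int
    right: int
    bottom: int


def highlight_boxes(words, query, master_width, display_width):
    """Return (left, top, right, bottom) boxes for words matching query,
    scaled from master-image pixels to display-image pixels."""
    scale = display_width / master_width
    wanted = query.casefold()
    hits = []
    for word in words:
        if word.text.casefold() == wanted:
            hits.append((
                round(word.left * scale),
                round(word.top * scale),
                round(word.right * scale),
                round(word.bottom * scale),
            ))
    return hits
```

Because the boxes are stored against the master image, the same OCR output can be reused for a display image of any size by changing only the scale factor.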

32 A searchable PDF is created for each page along with the OCR output.

33 These PDFs are combined into a single PDF document for the report.

34 A finished report looks like this on disk.

35 metadc303234
    01_tif
        000100fc.tif
        000200fi.tif
        tif
        tif
        ...
    02_pdf
        metadc pdf
    metadata.xml
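
A layout like the one in the slide above lends itself to a simple pre-ingest check. The sketch below is hypothetical and not part of the UNT workflow as described; it assumes `01_tif` and `02_pdf` are subdirectories of the report directory and that `metadata.xml` sits at the top level of that directory.

```python
from pathlib import Path


def check_report_layout(report_dir):
    """Return a list of problems found in a report directory (empty if none)."""
    report_dir = Path(report_dir)
    problems = []

    tif_dir = report_dir / "01_tif"
    if not tif_dir.is_dir():
        problems.append("missing 01_tif directory")
    elif not any(tif_dir.glob("*.tif")):
        problems.append("01_tif contains no .tif files")

    pdf_dir = report_dir / "02_pdf"
    if not pdf_dir.is_dir():
        problems.append("missing 02_pdf directory")
    elif not any(pdf_dir.glob("*.pdf")):
        problems.append("02_pdf contains no .pdf files")

    # Assumed location: metadata.xml directly inside the report directory.
    if not (report_dir / "metadata.xml").is_file():
        problems.append("missing metadata.xml")

    return problems
```

Returning a list of problems rather than raising on the first one lets a batch run report everything wrong with a directory in a single pass.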

36 Reports are ingested into the UNT Libraries Digital Infrastructure in batches



39 Web-scale images, PDFs, and metadata are added to the TRAIL collection in the UNT Digital Library

40 Master files are added to the Coda Repository for preservation.

41 Once a report is verified to be online, the physical copy is inventoried and discarded.

42 Loaned reports are returned to Arizona or the loaning institution.

43 UNT Digital Library



















62 2015 TRAC Self-Audit

63 In 2015 the UNT Libraries completed a self-audit using the Trustworthy Repositories Audit & Certification (TRAC) framework

64 Full documentation for the self-audit is available via the UNT Libraries website. It includes policies related to preservation, access, user feedback, and usage; content and partnership agreements; and detailed workflows and technical documentation.


66 https://www. library. unt

67 UNT Libraries continues to value the partnership we have with TRAIL and looks forward to opportunities to expand our work to provide access to these resources for users around the world.

68 Thank you. Mark Phillips
