Overwhelmed by Large-scale Digitization Projects
  Xiaocan (Lucy) Wang, Digital Repository Librarian
  Eric Holt, University Archivist
  Cunningham Memorial Library, Indiana State University
Agenda
  Project background
  Implementation
  Equipment
  Software choices
  Process
  Ingestion
  Workflow
  Outcome
  Lessons learned
  Conclusion
Project Background Indiana State University
Project Background ETD (electronic theses and dissertations) ETD Digital Initiative 2010 and onward Access
Project Background (cont.)
  RTD (retrospective theses and dissertations)
  Number: 3,802
  Where: Archives + Library basement
  Condition: most in usable condition, but…
  Access
Project Background (cont.)
  Purposes
    Centralize: ETD & RTD
    Improve access, search and retrieval
    Support teaching, learning and research
    Improve preservation
Project Background (cont.) Consideration Format Copyright Privacy
Equipment Bookdrive DIY
Disclosure
  I am not, and have not previously been, an employee of the corporations whose products I discuss
  I am not compensated for my comments or opinions
  An older software version is being used
Capture New Book window
Capture in action
Batch entry
Irfanview
GIMP
  Open source equivalent to Photoshop
  Batch processing requires an additional plugin
  Supervisor unfamiliarity
Photoshop
  Can record an action to perform batch processing
  Graphical interface while setting up the recorded action
Changing DPI
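The arithmetic behind a DPI change can be sketched in a few lines of Python. This is an illustration only, not the tool settings used in the project: the function names and the 2550 x 3300 pixel example page are assumptions. It shows that changing only the DPI value rescales the implied print size, while keeping the same print size at a new DPI requires resampling to new pixel dimensions.

```python
def print_size_inches(width_px, height_px, dpi):
    """Physical size (in inches) implied by pixel dimensions at a given DPI."""
    return (width_px / dpi, height_px / dpi)


def resampled_pixels(width_px, height_px, old_dpi, new_dpi):
    """Pixel dimensions needed to keep the same physical size at new_dpi."""
    scale = new_dpi / old_dpi
    return (round(width_px * scale), round(height_px * scale))


# Hypothetical letter-size page captured at 300 DPI (assumed values).
page = (2550, 3300)
print(print_size_inches(*page, dpi=300))   # (8.5, 11.0)
print(print_size_inches(*page, dpi=150))   # same pixels, larger implied print size
print(resampled_pixels(*page, 300, 150))   # pixels for the same print size at 150 DPI
```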
Color Grayscale B/W
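What the three modes mean for a single pixel can be sketched as follows. This is a minimal illustration, not the conversion performed by the software shown in these slides: the function names, the Rec. 601 luma weights, and the threshold of 128 are all assumptions chosen for the example.

```python
def to_grayscale(rgb):
    """Convert one (R, G, B) pixel, 0-255 per channel, to a single gray level."""
    r, g, b = rgb
    # Rec. 601 luma weights, one common choice for grayscale conversion.
    return round(0.299 * r + 0.587 * g + 0.114 * b)


def to_black_and_white(gray, threshold=128):
    """Reduce a gray level to pure black (0) or pure white (255)."""
    return 255 if gray >= threshold else 0


# Hypothetical sample pixels: blank paper, black ink, a red annotation.
pixels = [(255, 255, 255), (0, 0, 0), (200, 30, 30)]
for p in pixels:
    g = to_grayscale(p)
    print(p, "->", g, "->", to_black_and_white(g))
```

In this sketch the red annotation ends up as plain black in B/W, which is why pages with meaningful color may need to stay in color or grayscale.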
PDF Compression
  All items being converted are compressed
  Some formats compress better than others
  Compression artifacts can also become visible
Original image of page is visible; searchable text layer is hidden
First Review All pages present? All text legible? No shadows covering text? Page in focus? Essential color elements retained?
PDF/A
  Copy saved to Archives server
  Only accessible to staff
Final Review and Cleanup
  Review metadata
  Correct if necessary
  Approve and publish
  Remove original camera images, processed images, and extra copies of the PDF
Workflow Imaging original theses or dissertations
Workflow (cont.) Processing image files
Workflow (cont.) Converting to PDF/A
Workflow (cont.) Publishing on ISU IR
Outcomes
  Volumes finished: 848
  Average volume size: 96 pages
  Average student time: 1.3 hours
  Average supervisor time: 5-10 minutes
  Average file size: 5.5 MB
  Total disk space: 4.6 GB
  Approximate cost: $15-18
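The outcome figures above can be cross-checked with a little arithmetic. The sketch below uses only numbers from these slides (including the 3,802 RTD total from the project background); the variable names are illustrative, and because the averages are rounded, the totals are estimates rather than reported results.

```python
volumes_done = 848
avg_pages = 96
avg_student_hours = 1.3
avg_file_mb = 5.5
total_rtd = 3802  # RTD count given in the project background

total_pages = volumes_done * avg_pages                  # pages imaged so far
total_student_hours = volumes_done * avg_student_hours  # student labor so far
estimated_disk_gb = volumes_done * avg_file_mb / 1000   # decimal GB (assumed unit)
remaining = total_rtd - volumes_done                    # volumes still to digitize

print(total_pages)
print(round(total_student_hours, 1))
print(round(estimated_disk_gb, 2))
print(remaining)
```

The disk estimate of roughly 4.7 GB is in line with the reported 4.6 GB total, so the average file size and the total disk space agree with each other.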
Worth It?
  Centralize
  Improve access
    Via digital repository
    Search engines
    Digital repository registries
    WorldCat
Worth It? (cont.)
  Support teaching, learning and research
  Improve preservation strategies
    Multiple digital copies
    Backup
    Bitstream preservation
    Distributed preservation network via MetaArchive Cooperative
Lessons learned
  Control quality: monochrome and grayscale
  Supervise students
  Add MARC 856 field
  Secure continued funds
Conclusion
  Complex
  Various issues
    In-house vs. outsourcing
    Funding
    Technical standards
    Quality control
    Format selection
    Metadata
    Delivery
    Preservation
    Rights management
    Workflow development
Contact info
  Xiaocan (Lucy) Wang: Xiaocan.wang@indstate.edu
  Eric Holt: Eric.Holt@indstate.edu