PIALA 2010, UH Manoa Hamilton Library. Chronicling America and the National Digital Newspaper Program: Technical Aspects

Presentation transcript:

Chronicling America and the National Digital Newspaper Program: Technical Aspects
- Part 1: Newspapers and Microfilm
  - Challenges
  - USNP
- Part 2: Technical Details
  - Image views
  - Text searching
  - Indexing
- Part 3: Managing a newspaper digitization project

Challenges
- Newspapers are a difficult medium
- Never meant to last; made for daily use and disposal
- Pages crumble and acid corrodes the materials
- Tracking serial publications over time
- Patron demand increased, storage space grew scarce, binding costs rose

Microfilm
- Adopted in the 1920s as a standard
- Turns newspapers from a storage nightmare into a relatively easy medium to handle
- Libraries had to decide what to do with the hard copy
  - Keep in holdings?
  - Deaccession?

United States Newspaper Program (USNP)
- Began in 1982
- Funded by the National Endowment for the Humanities, managed by the Library of Congress
- University of Hawai’i, with the Hawaiian Historical Society, Hawai’i State Archives and State Library, contributed for Hawai’i
- As of the mid-2000s, the USNP had received over $54 million in NEH support and non-federal contributions of approx. $19.6 million
- Bibliographic records for over 140,000 newspaper titles; access to 70 million pages of newsprint on microfilm

USNP
- Goal: locate, catalog, and microfilm newspapers
- Hawai’i microfilmed 260,000 pages and cataloged 476 titles
- Program ended in 2007

USNP Preservation Microfilming Guidelines
- Optimum legibility: image orientation and reduction ratios chosen to fill the frame and obtain the greatest degree of legibility in public use copies
- Quality: each roll of first-generation film shall be inspected frame by frame by both the filming agency and the project for density and resolution, and to determine that the film is free of emulsion scratches, abrasions, fingerprints, spots, fog, and other defects

USNP Preservation Microfilming Guidelines
- Density
  - No fewer than five readings at the start, middle and end of each reel, taken with a transmission densitometer calibrated daily
  - Maximum density (Dmax) measurements taken on an exposed image area with no words or graphics
  - Background densities no lower than 0.80 and no higher than 1.20; lower densities preferred for older pages and to facilitate production of reader-printer and enlargement prints
  - Base-plus-fog density (Dmin) on the master negative shall not exceed 0.10
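
The density limits above lend themselves to a simple automated check of the readings recorded for each reel. The following is a minimal sketch, not part of the USNP guidelines: the function name `check_reel_density` and the sample readings are hypothetical, and it assumes the "five readings" rule is applied as a minimum count per reel.

```python
# Thresholds quoted on the slide (USNP preservation microfilming guidelines).
BACKGROUND_MIN = 0.80
BACKGROUND_MAX = 1.20
DMIN_MAX = 0.10
MIN_READINGS = 5  # assumption: treated here as a minimum count per reel


def check_reel_density(background_readings, dmin):
    """Return a list of problems; an empty list means the reel passes."""
    problems = []
    if len(background_readings) < MIN_READINGS:
        problems.append(
            f"only {len(background_readings)} readings; "
            f"at least {MIN_READINGS} required"
        )
    for i, value in enumerate(background_readings, start=1):
        if not BACKGROUND_MIN <= value <= BACKGROUND_MAX:
            problems.append(
                f"reading {i} = {value:.2f} is outside "
                f"{BACKGROUND_MIN:.2f}-{BACKGROUND_MAX:.2f}"
            )
    if dmin > DMIN_MAX:
        problems.append(f"Dmin {dmin:.2f} exceeds {DMIN_MAX:.2f}")
    return problems


# Hypothetical readings for two reels.
good_reel = check_reel_density([0.95, 1.02, 0.88, 1.10, 0.99], dmin=0.06)
bad_reel = check_reel_density([0.72, 1.02, 1.31], dmin=0.12)

print(good_reel)
for problem in bad_reel:
    print(problem)
```

Returning a list of problems rather than a single pass/fail flag keeps the output usable as a note in the density spreadsheet.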

National Endowment for the Humanities and Library of Congress created NDNP
- No single US collection of newspapers
- Every institution focuses on particular themes relating to its collecting plans
- Thousands of volumes of newspapers spread across the country
- Enhance access to newspapers, building on the foundation of the United States Newspaper Program

NDNP Overview
- 2-year awards to state projects, renewable
- Digitize 100,000 pages of microfilmed newspaper
- Newspapers picked must date from between 1836 and 1922
- Historical essays on each newspaper
- Collation and quality control on all papers

NDNP Goals
- 20-year span with phased, sustainable development of a 30 million page database
- Establish technical conversion specs and practices for efficient basic discovery and access
- Develop production tools to ensure good digital objects that can be managed and preserved long-term
- Provide public access to and take preservation responsibility for the digitized newspapers
- Create a national resource of historically significant newspapers from all the states and U.S. territories

NDNP Microfilm-related Challenges
- Where are the master reels?
- Copyright issues (who filmed the newspapers and owns the master microfilm?)
- Technical specifications (poorly filmed, low density readings, etc.)
- Microfilm standards applied vary widely

No universally accepted metadata standard for historical newspapers
- Online historical newspapers produced by the public or private sector existed as discrete systems, with metadata structures not designed for interoperability
- Titles, issues, pages and reels all need to be represented as different yet related classes of objects

NDNP Digital Deliverables
- Images scanned at dpi
- Three formats:
  - Grayscale, uncompressed TIFF 6.0 images
  - Compressed JPEG2000 images
  - PDF image with hidden text
- Accompanying structural and technical metadata
- OCR text for all pages

NDNP Scanning specifications
- De-skew images with a skew of greater than 3 degrees
- Crop to visible edge of page
- Capture grayscale preservation microfilm targets

NDNP OCR specifications
- Conform to ALTO XML schema
  - ALTO (Analyzed Layout and Text Object) is an XML (Extensible Markup Language) Schema that details technical metadata for describing the layout and content of physical text resources
- Bounding box coordinate data
  - Each column is sectioned and coordinates are used to place words
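
To make the bounding box idea concrete, here is a minimal sketch that reads word coordinates out of an ALTO-style fragment. The sample XML is a simplified, namespace-free illustration written for this example (real ALTO files declare a namespace and carry more structure), and the helper name `word_boxes` is hypothetical. The `String` element and its `CONTENT`, `HPOS`, `VPOS`, `WIDTH` and `HEIGHT` attributes are the ALTO fields that carry a word and its position.

```python
import xml.etree.ElementTree as ET

# Simplified, namespace-free ALTO-style fragment (illustrative only).
SAMPLE_ALTO = """
<alto>
  <Layout>
    <Page ID="P1" WIDTH="5000" HEIGHT="7000">
      <PrintSpace>
        <TextBlock ID="TB1">
          <TextLine>
            <String CONTENT="HONOLULU" HPOS="410" VPOS="220" WIDTH="620" HEIGHT="90"/>
            <String CONTENT="HARBOR" HPOS="1060" VPOS="222" WIDTH="480" HEIGHT="88"/>
          </TextLine>
        </TextBlock>
      </PrintSpace>
    </Page>
  </Layout>
</alto>
"""


def word_boxes(alto_xml):
    """Yield (word, left, top, right, bottom) for every String element."""
    root = ET.fromstring(alto_xml)
    for s in root.iter("String"):
        left = float(s.get("HPOS"))
        top = float(s.get("VPOS"))
        right = left + float(s.get("WIDTH"))
        bottom = top + float(s.get("HEIGHT"))
        yield (s.get("CONTENT"), left, top, right, bottom)


boxes = list(word_boxes(SAMPLE_ALTO))
for box in boxes:
    print(box)
```

A viewer can use boxes like these to highlight a search hit on the page image.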

NDNP Metadata requirements
- METS (Metadata Encoding and Transmission Standard) format records preservation metadata
- Structural metadata to relate pages to title, date, and edition; sequence pages within issue or section; and to identify image and OCR files
- Technical metadata to support the functions of the Library of Congress repository
(Metadata is information about information)

XML Rules
- Single, unique root element
- Matching open/close tags
- Consistent capitalization
- Correctly nested elements (no overlapping elements)
- Attribute values enclosed in quotes
- No repeating attributes in an element
- Provides an international, vendor-independent standard for describing information
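
Each of the rules above can be demonstrated with any standard XML parser, which rejects a document that breaks one of them. A minimal sketch using Python's built-in `xml.etree.ElementTree` follows; the helper name `is_well_formed` and the sample snippets are made up for illustration.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text):
    """Return (True, "") if the text parses, else (False, parser message)."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError as err:
        return False, str(err)
    return True, ""


# One passing document and one violation per rule on the slide.
examples = {
    "ok": '<issue date="1900-01-01"><page n="1"/></issue>',
    "two roots": "<issue/><issue/>",
    "case mismatch": "<Issue></issue>",
    "overlap": "<a><b></a></b>",
    "unquoted attribute": "<page n=1/>",
    "repeated attribute": '<page n="1" n="2"/>',
}

results = {name: is_well_formed(text)[0] for name, text in examples.items()}
for name, ok in results.items():
    print(f"{name}: {'well-formed' if ok else 'rejected'}")
```

This checks well-formedness only. Validating against a schema such as METS or ALTO is a separate step that needs a schema-aware tool.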

Family of XML data standards includes:
- METS: Metadata Encoding and Transmission Standard
- MODS: Metadata Object Description Schema
- PREMIS: PREservation Metadata Implementation Strategies
- EAD: Encoded Archival Description

METS (Metadata Encoding and Transmission Standard)
- XML Schema for the purpose of creating XML files that define:
  - the hierarchical structure of digital library objects (images, text files, etc.)
  - the names and locations of the files
  - the associated metadata (e.g., MODS)

Metadata Object Description Schema (MODS)
- An XML Schema designed for expressing bibliographic data
- (Think of it as an alternative to the MARC format)

Sections of a METS file
- METS header (document talks about itself)
- Descriptive metadata (MODS, etc.)
- Administrative metadata (copyright info., etc.)
- File section (names and locations of files)
- Structural map (relationships of the parts)
- Linking information
- Binding executables/actions to object
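
The seven sections listed above correspond to seven top-level elements in the METS schema. The sketch below builds an empty skeleton with those elements in order. It is only an outline for orientation, not a schema-valid METS record and not an NDNP template: a real file needs required attributes and content inside each section. The function name `build_skeleton` is hypothetical.

```python
import xml.etree.ElementTree as ET

METS_NS = "http://www.loc.gov/METS/"
ET.register_namespace("mets", METS_NS)

# (METS element name, slide description), in document order.
SECTIONS = [
    ("metsHdr", "METS header"),
    ("dmdSec", "Descriptive metadata"),
    ("amdSec", "Administrative metadata"),
    ("fileSec", "File section"),
    ("structMap", "Structural map"),
    ("structLink", "Linking information"),
    ("behaviorSec", "Binding executables/actions to object"),
]


def build_skeleton():
    """Return a <mets:mets> element with one empty child per section."""
    root = ET.Element(f"{{{METS_NS}}}mets")
    for tag, _description in SECTIONS:
        ET.SubElement(root, f"{{{METS_NS}}}{tag}")
    return root


skeleton = build_skeleton()
child_tags = [child.tag.split("}")[1] for child in skeleton]
print(ET.tostring(skeleton, encoding="unicode"))
```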

Title METS
- Combines bibliographic and holdings data in a single title record, converted from MARC to MARC XML format
- Digitized titles will have additional data (descriptive essays, more precise geographic coverage data), which is put in a Metadata Object Description Schema (MODS) object within the larger METS document

Issue and Reel METS
- Issue METS
  - Issue data
  - Page data
- Reel METS
  - Reel data
  - Target data

WHY?
- XML structure used by software for creation of multiple outputs: HTML/XHTML for Web display; PDF for printing
- Ease of editing (single records or batches of records)
- Ability to validate data
- Ease of data management and publishing
- Interoperability
  - Repository submission and OAI harvesting

- Geographic metadata
- Title metadata
- Date metadata
All that coding pays off for the user when SEARCHING

Keyword searching
- OCR/OWR does not yield article “transcriptions”; text OCR’d from images of newspapers is used for searching purposes
- Several options
  - ANY of the words
  - ALL of the words
  - EXACT PHRASE
  - Proximity search: look for words within 5, 10, 50 or 100 words of one another
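
The four search options above can be illustrated on a single page of OCR text. This is a toy sketch to show what each option means, not how Chronicling America implements search (a production system would use a search index). All function names and the sample sentence are made up for this example.

```python
import re


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def match_any(tokens, words):
    return any(w in tokens for w in words)


def match_all(tokens, words):
    return all(w in tokens for w in words)


def match_phrase(tokens, phrase_words):
    """True if phrase_words occur consecutively, in order."""
    n = len(phrase_words)
    return any(tokens[i:i + n] == phrase_words
               for i in range(len(tokens) - n + 1))


def match_within(tokens, a, b, distance):
    """True if words a and b occur within `distance` words of each other."""
    pos_a = [i for i, t in enumerate(tokens) if t == a]
    pos_b = [i for i, t in enumerate(tokens) if t == b]
    return any(abs(i - j) <= distance for i in pos_a for j in pos_b)


page_text = ("The steamer arrived at Honolulu harbor on Tuesday; "
             "sugar prices rose again in the island markets.")
tokens = tokenize(page_text)

print(match_any(tokens, ["volcano", "sugar"]))
print(match_all(tokens, ["volcano", "sugar"]))
print(match_phrase(tokens, ["honolulu", "harbor"]))
print(match_within(tokens, "steamer", "sugar", 5))
print(match_within(tokens, "steamer", "sugar", 10))
```

Because the text comes from OCR, a misread word simply fails to match, which is why OCR accuracy matters so much for searching.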

Page thumbnail view
- Click on thumbnail or description of page to view larger version

Page view
- A different format can be selected with one click

Browse Issues
- A calendar view indicating which issues have been digitized
- Can change which year you’re viewing
- Browse First Pages

From Microfilm to Digital Images: Managing a Newspaper Conversion Project
Project Management

NDNP & University of Hawai’i
- UH’s first grant began in July 2008, running until June 2010
- Grant renewed: July 2010-June 2012
- Utilizing the microfilm created under the USNP
- Excellent quality microfilm (in theory)
- Fewer problems with cataloging/description and with acquiring 2N duplicates (in theory)

Project Management
- Request for Proposals (RFP)
  - Include all LC technical specifications
- Position description(s)
  - Coordinator, students
- Hiring and training

Project components
- Microfilm identification and duplication
- Digitization
- Metadata creation & validation

Microfilm selection
- Choose what is important to your institution(s), if possible
- Copyright
  - Reels created by or for your institution
  - For reels by ProQuest, etc., you may have to ask for permission and pay much higher duplication fees
- Decide: complete runs of a few titles, or many short/incomplete runs of a lot of titles

Vendors
- iArchives
  - Leaders in the field
  - Lots of experience
- OCLC/BSLW (Backstage Library Works)
- Apex/Covantage
- Northern Micrographics (NMT)
- Local or national microfilm duplication companies

Equipment
- GB External Hard Drives (Western Digital MyBooks) and Pelican cases
- 1 PC with double monitor
- Software: Library of Congress’ Digital Viewer and Validator (DVV)
- Densitometer
- Microfilm reader/scanner

Our Stuff
- Densitometer
- Pelican cases
- Microfilm scanner
- PC with 2 monitors & portable HDs (red)

Staffing
- Project Coordinator, Quality Control Technician
- Graduate students
- Advisory Board
- Subject/history/newspaper specialists

Metadata Collection
- Density readings
- Recorded onto a spreadsheet

Preparing the Microfilm: Metadata
- Data from OCLC MARC record & local holdings

Preparing the Microfilm: Collation
- Review use copy of reel
  - Missing issues or pages
  - Duplicate issues or pages
  - Mutilated pages
  - Other abnormalities (e.g., pages out of order, incorrect dates)
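
Once issue dates from a reel have been typed into a spreadsheet, some of the collation checks above can be assisted by a short script. The sketch below is a hypothetical helper, not part of the project's actual workflow. It assumes a paper published every day in the date range; a weekly or a paper with no Sunday edition would pass a different `publishes_on` rule. Mutilated pages still need a human eye.

```python
from collections import Counter
from datetime import date, timedelta


def collate(issue_dates, start, end, publishes_on=lambda d: True):
    """Compare filmed issue dates (in reel order) with the expected run.

    Returns (missing, duplicates, out_of_order) as lists of dates.
    """
    counts = Counter(issue_dates)

    expected = []
    day = start
    while day <= end:
        if publishes_on(day):
            expected.append(day)
        day += timedelta(days=1)

    missing = [d for d in expected if counts[d] == 0]
    duplicates = sorted(d for d, n in counts.items() if n > 1)
    # An issue is out of order if it is dated earlier than the one before it.
    out_of_order = [b for a, b in zip(issue_dates, issue_dates[1:]) if b < a]
    return missing, duplicates, out_of_order


# Hypothetical reel: 4 Jan is absent, 3 Jan was filmed twice,
# and 5 Jan appears after 6 Jan.
filmed = [date(1900, 1, n) for n in (1, 2, 3, 3, 6, 5)]
missing, duplicates, out_of_order = collate(
    filmed, start=date(1900, 1, 1), end=date(1900, 1, 6)
)
print("missing:", missing)
print("duplicates:", duplicates)
print("out of order:", out_of_order)
```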

Preparing the Microfilm: Collation
- Review use copy, record data on spreadsheet

[Figure: iArchives Digitization Workflow. Components shown: Film Scanning; Split, De-Skew, Crop; Image Processing; OCR Framework; Post Process; Image Metadata; Page/Reel Metadata; Customer Deliverables; Workflow Manager DB; Shared Storage (NAS); Automated Processing Cloud; QC checkpoints throughout. Key: automatic process (image processing, OCR, ...); manual process (image + page metadata); quality control.]

Scan QC

Split, Crop & DeSkew

iArchives OWR Framework
- 2,000,000-word dictionary
- 2,000,000-name dictionary
- 3 leading OCR software programs

How does OWR™ work?
[Figure: a text image of the word "apple" is read by OCR Engine 1, OCR Engine 2 and OCR Engine 3. Candidate words with predicted accuracy: apple (99%), epple (73%), opple (88%). The dictionary choice, "apple", is the output.]
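
The slide does not spell out how OWR combines the three engines, so the following is only a sketch of the general idea it illustrates: prefer candidates found in a dictionary, then take the highest predicted accuracy. The function `choose_word`, the tiny dictionary and the selection rule are all assumptions for illustration, not iArchives' actual method.

```python
# Stand-in for the large word and name dictionaries mentioned on the slide.
DICTIONARY = {"apple", "harbor", "sugar"}


def choose_word(candidates, dictionary):
    """Pick one word from (text, predicted_accuracy) pairs, one per engine.

    Assumed rule: dictionary words win over non-dictionary words;
    ties are broken by the highest predicted accuracy.
    """
    in_dictionary = [c for c in candidates if c[0].lower() in dictionary]
    pool = in_dictionary or candidates
    return max(pool, key=lambda c: c[1])[0]


# The example from the slide.
engine_outputs = [("apple", 0.99), ("epple", 0.73), ("opple", 0.88)]
chosen = choose_word(engine_outputs, DICTIONARY)
print(chosen)

# A dictionary word can win even with a lower predicted accuracy.
rescued = choose_word([("epple", 0.95), ("apple", 0.60)], DICTIONARY)
print(rescued)

# With no dictionary match, fall back to the most confident engine.
fallback = choose_word([("kapiolani", 0.70), ("kapio1ani", 0.55)], DICTIONARY)
print(fallback)
```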

Post-vendor validation
- Once the hard drive is returned, we verify/validate the batch using the DVV program
- Verification compares the metadata listed in the master XML file with the metadata found in the issue XML files for correctness
- Validation is done if a new master XML file needs to be created. It creates checksums for each file and records them in the subsequent metadata
- Copy contents of hard drive onto our server
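
The checksum step can be pictured with a short script: compute a digest for every file in a batch, then compare digests before and after a copy to spot files that changed. This is a minimal sketch of the general technique, not what the DVV does internally. The slide does not name the checksum algorithm, so SHA-1 is an assumption here, and the function names and file names are hypothetical.

```python
import hashlib
import tempfile
from pathlib import Path


def file_checksum(path, algorithm="sha1"):
    """Return the hex digest of a file, read in 1 MiB chunks."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(batch_dir):
    """Map each file's path (relative to the batch) to its checksum."""
    batch_dir = Path(batch_dir)
    return {
        p.relative_to(batch_dir).as_posix(): file_checksum(p)
        for p in sorted(batch_dir.rglob("*"))
        if p.is_file()
    }


def changed_files(old, new):
    """Names whose checksum differs, or that exist in only one manifest."""
    return sorted(n for n in old.keys() | new.keys() if old.get(n) != new.get(n))


# Demonstration on a throwaway directory with made-up file names.
with tempfile.TemporaryDirectory() as tmp:
    batch = Path(tmp)
    (batch / "0001.xml").write_text("<alto/>")
    (batch / "0001.tif").write_bytes(b"fake image bytes")
    before = build_manifest(batch)

    # Simulate corruption of one file during transfer.
    (batch / "0001.tif").write_bytes(b"fake image bytez")
    after = build_manifest(batch)

    damaged = changed_files(before, after)

print(damaged)
```

Running the same comparison after copying the hard drive contents to the server is a cheap way to confirm the copy is complete and intact.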

Quality Control
- Image quality
  - Too dark? Too light? Skewed?
- Correct image?
  - Compare digitized image to microfilmed image
  - No Missing Issue/Page tags
- Review metadata
  - Dates
  - LCCN #
  - Locations

Thumbnail View
- Can use DVV or any graphics program

Quality Control: LC Digital Viewer and Validator (DVV)

Metadata Viewer

OCR

Headers

Title Essays
- words
- Describes newspaper’s history
  - Date of establishment
  - Editors
  - Type of news reported
  - Political viewpoint
  - Where is the paper today?
- Published to Chronicling America

Links
- Chronicling America:
- Library of Congress:
- National Endowment for the Humanities:
- Hawai’i Newspapers: a union list
- Using and to Create XML Standards-based Digital Library Applications ts-mods-morgan-ala07/

Thank You! Mahalo! Kinisou Chapur!
- Questions? Comments?