Digital Text and Libraries
Michael Popham

DOI Meeting, Oxford, June 2006

Ranganathan's laws of library science
1. Books are for use
2. Every reader his book
3. Every book its reader
4. Save the time of the reader
5. A library is a growing organism
(Ranganathan, 1931)

Libraries and digital texts
…as purchasers of digital texts
– from publishers, aggregator services
…as producers of digital texts
– digitized from analogue originals, analogue surrogates
…as custodians of digital texts
– purchased and licensed material
– institutional repositories, digital assets created in-house
– acquired e-MSS and personal digital collections

Libraries and digital texts – the challenges
…as purchasers of digital texts
– we have to work with what we're sold/what's available
…as producers of digital texts
– we have to work with what we've got
…as custodians of digital texts
– we have to work with what we're given

Thomas Bodley's Vision
Bodleian founded 1602
Universal library
Bodley's Republic of Letters
Legal deposit privilege since
% of Bodleian readers not members of Oxford University

Bodleian Library
400 staff
Budget of £14m (€20.5m)
Stock 8 million items
45,000 registered users
120 miles (192 km) of shelving
123,000 monograph items and 194,000 serial items added each year

Oxford University Library Services
> 660 staff (600 FTE)
40 libraries, including the Bodleian
Budget > £25m (€37m)
Total bookstock: 11 million items
156 miles (250 km) of shelving, including repository space

The Digital Library at Oxford
1960s Machine-readable texts for scholarly purposes
1976 Oxford Text Archive founded
1980s Networked databases and CD-ROMs
1990s Libraries on the web, e-journals etc.
Oxford Digital Library (ODL)
2005 ELISO (Electronic Library & Information Service)
Google/Oxford partnership

An affecting and sublime! scene, or, : The great captain going to head his armies
Oxford-Google Project: what to digitize?
Direct discussions with Google since 2003
Win/win situation for both parties
Extensive collection of out-of-copyright (and mostly out-of-print) material identified
– Oxford differs from other partners in this aspect of our agreement
– Decision made to begin with the 19th-century material
– Looking at approximately 1+ million items

Overview of workflow
[Flowchart. Boxes shown: Selection; Suitable for digitization? (Y/N); Reshelve; Fast-track; Slow-track; Digitize; Generate deliverables; Store outputs; Update OULS OPAC; QA (Y/N); Update Google.print index]

Outputs and outcomes
Large raw colour images from digitization process
Per volume, OULS receives:
– JPEG2000 (probably), and TIFFs
– Uncorrected OCR
Audit of production process
There are quality control processes at Google & Oxford
Deliverable images (to be hosted by Google in the first instance) linked to OPAC records
Ongoing software/hardware developments to improve the process

Challenges that lie ahead…
Building the local infrastructure to manage and deliver the Oxford Digital Copy of the data
Investigating ways to exploit the data, e.g.:
– Correcting OCR files, adding further markup
– (Re-)structuring the data – moving beyond a simple search and page-turning presentation
– Completing/extending volumes and collections
– Automatic collation, authorship attribution, stylistic analysis… and many, many more(?!)
Raising the bar for what is possible, and end users' expectations about what we can deliver

Feel the Fear…
©opyright and IPR
Threat to (Scholarly) e-Publishers
Proliferating plagiarism
Encouraging poor research
Scope creep, scalability, data deluge
Preservation and access

Useful links