The Oxford-Google Digitization Project*
Michael Popham
Oxford Digital Library
* Rules of commercial confidentiality apply to this presentation!

FIBS 25/01/07
The Lawyers’ Vision (non-attributable)
Google and Oxford plan to digitize 1-1.5M books as part of the Google Books Library Project
The project will take at least 3 years to complete and involve approximately 35 digitization workstations running in 2 shifts
Files will be created as TIFFs and JPEGs and delivered as PNGs or PDFs… etc.
Fortunately both Oxford and Google like to make information accessible…

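The headline figures above imply a required throughput per workstation, which a quick back-of-the-envelope calculation makes concrete. The sketch below is illustrative only: the 250 working days per year is an assumption that does not come from the presentation, and the function name is hypothetical. Because the project is expected to take at least 3 years, the results are upper bounds on the rate each workstation would need to sustain.

```python
def books_per_station_shift(total_books, years, stations,
                            shifts_per_day, working_days_per_year):
    """Average number of books one workstation must handle in one shift."""
    station_shifts = years * working_days_per_year * stations * shifts_per_day
    return total_books / station_shifts


# ASSUMPTION: 250 working days per year (not stated in the slides).
WORKING_DAYS = 250

# Figures from the slide: 1-1.5M books, 3 years, 35 workstations, 2 shifts.
low = books_per_station_shift(1_000_000, 3, 35, 2, WORKING_DAYS)
high = books_per_station_shift(1_500_000, 3, 35, 2, WORKING_DAYS)

print(f"{low:.1f} to {high:.1f} books per workstation per shift")
# prints "19.0 to 28.6 books per workstation per shift"
```

Under that assumption, each workstation would need to process roughly 19 to 29 volumes per shift; a longer project duration or more working days would lower the figure.
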
Thomas Bodley’s Vision
Bodleian founded 1602
Universal library
Bodley’s “Republic of Letters”
Legal deposit privilege since
% of Bodleian readers not members of Oxford University

Oxford University Library Services
> 660 staff (600 FTE)
40 libraries, including the Bodleian
Budget > £25m (€37m)
Total bookstock: 11 million items
156 miles (250 km) of shelving, including repository space

The “Digital Library” at Oxford
1960s Machine-readable texts for scholarly purposes
1976 Oxford Text Archive founded
1980s Networked databases and CD-ROMs
1990s Libraries on the web, e-journals etc.
Oxford Digital Library (ODL)
2005 ELISO (Electronic Library & Information Service)
Google/Oxford partnership
2006 FENIX e-prints/e-theses institutional repository

Some Oxford digitization projects
Toyota City Imaging Project (1993)
Specialized Research Collections in the Humanities (NFF) and eLib projects ( )
  – John Johnson Collection
  – Broadside Ballads
  – Early manuscripts in Oxford
Oxford Digital Library (2001 onwards)
  – Scoping study ( )
  – ODL Development Fund (Mellon Foundation)
  – Three production phases

Why partner with Google?
The synergy between missions:
  – Bodley’s “Republic of Letters”
  – Google’s “To organize the world’s information and make it universally accessible and useful”
Emphasis is on access, not conservation
  – Oxford University Library Services: opening up our closed stacks
  – Google: “…the next generation of the card catalog”
Bring more Oxford-held content into the digital landscape, making it available for scholarly and public benefit
Builds on the work of the Oxford Digital Library (ODL)

What to digitize?
Direct discussions with Google since 2003
Win/win situation for both parties
Extensive collection of out-of-copyright (and mostly out-of-print) material identified
  – Oxford differs from other partners in this aspect of our agreement
  – Decision made to begin with the 19th century material
  – Looking at approximately 1 million or more items

Overview of workflow
[Flowchart. Boxes: Selection; Suitable for digitization?; Reshelve; Fast-track; Slow-track; Digitize; Generate deliverables; Store outputs; Update OULS OPAC; QA; Update Google.print index. Decision branches are labelled Y/N.]

Approach
OULS staff work closely with Google staff
  – e.g. training on how to handle the material
Each component of the workflow must be comfortable for both parties
A large and complex logistical operation that must not compromise the service to our users

Outputs and outcomes
Large raw colour images from digitization process
Per volume, OULS receives:
  – JPEG2000 (probably), and TIFFs
  – Uncorrected OCR
Audit of production process
There are quality control processes at Google & Oxford
Deliverable images (to be hosted by Google in the first instance) linked to OPAC records
Ongoing software/hardware developments to improve the process

Challenges that lie ahead…
Building the local infrastructure to manage and deliver the Oxford Digital Copy of the data
Investigating ways to exploit the data, e.g.:
  – Correcting OCR files, adding markup
  – (Re-)structuring the data – moving beyond a simple search and page-turning presentation
  – Completing/extending volumes and collections
  – Automatic collation, authorship attribution, stylistic analysis
  … and many, many more(?!)
Raising the bar of what is possible, and end-users’ expectations about what we can deliver

Feel the Fear…
©opyright and IPR
Threat to (Scholarly) e-Publishers
Proliferating plagiarism
Encouraging poor research
Scope creep, scalability, data deluge
Preservation and access

What libraries are you working with?
We're currently working with the University of Michigan, Harvard University, Stanford University, The New York Public Library, Oxford University, the Universidad Complutense de Madrid, the University of Virginia, the University of Wisconsin-Madison and the University of California to include their collections in Google Book Search and, like a card catalog, show you basic information about the books and in some cases a few snippets - sentences of your search term in context. If a book is determined to be in the public domain, we'll show you the full text of the book - that is, you can page through the book from start to finish.

Useful links