Published by Willis Garrett. Modified over 9 years ago.
1
The Oxford-Google Digitization Project*
Michael Popham
Oxford Digital Library
* Rules of commercial confidentiality apply to this presentation!
2
WISER – 4th June 2008
The Lawyers’ Vision (non-attributable)
Google and Oxford plan to digitize 1-1.5M books as part of the Google Books Library Project
The project will take at least 3 years to complete and involve approximately 35 digitization workstations running in 2 shifts
Files will be created as TIFFs and JPEGs and delivered as PNG or PDFs… etc.
As far as possible, both OULS and Google like to make information accessible…
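As a rough sanity check on these figures, the sketch below works out how many books each workstation would need to process per shift to meet the target. The 250 working days per year is an assumption made here for illustration and is not stated in the presentation; the function name is hypothetical.

```python
def books_per_station_shift(total_books, years=3, stations=35,
                            shifts_per_day=2, working_days_per_year=250):
    """Average books one workstation must handle in one shift.

    years, stations and shifts_per_day come from the slide;
    working_days_per_year is an assumed value.
    """
    station_shifts = years * working_days_per_year * stations * shifts_per_day
    return total_books / station_shifts


low = books_per_station_shift(1_000_000)
high = books_per_station_shift(1_500_000)
print(f"{low:.1f} to {high:.1f} books per workstation per shift")
```

Under these assumptions the target works out to roughly 19 to 29 books per workstation per shift; a longer project or more working days would lower that figure.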
3
Why partner with Google?
The synergy between missions:
– Bodley’s “Republic of Letters”
– Google’s “To organize the world’s information and make it universally accessible and useful”
Emphasis is on access not conservation
– Oxford University Library Services: opening-up our closed stacks
– Google: “…the next generation of the card catalog”
Bring more Oxford-held content into the digital landscape, making it available for scholarly and public benefit
Builds on the work of the Oxford Digital Library (ODL)
4
The “Digital Library” at Oxford
1960s Machine-readable texts for scholarly purposes
1976 Oxford Text Archive founded
1980s Networked databases and CD-ROMs
1990s Libraries on the web, e-journals etc.
2001 Oxford Digital Library (ODL)
2005 Google/Oxford partnership
2006 ORA (Oxford University Research Archive) e-prints/e-theses institutional repository
2008 New LMS hybrid library service
5
Some Oxford digitization projects
Toyota City Imaging Project (1993)
Specialized Research Collections in the Humanities (NFF) and eLib projects (1995-1998)
– John Johnson Collection
– Broadside Ballads
– Early manuscripts in Oxford
Oxford Digital Library (2001 onwards)
– Scoping study (1998-99)
– ODL Development Fund (Mellon Foundation 2002-2005)
– Three production phases
20
What to digitize?
Direct discussions with Google since 2003
Mutual benefits for both parties
Extensive holdings of out-of-copyright (and mostly out-of-print) material identified
– Oxford differs from most other partners in this aspect of our agreement (Michigan vs Harvard)
– Decision made to begin with the 19th century material
– Scope = approximately 1+ million items
21
Overview of workflow (1)
[Flowchart. Boxes: Selection; Suitable for digitization?; Reshelve; Digitize; Generate deliverables; Store outputs; Update OULS OPAC; QA; Update Google Books index; ODC. Two decision points, each with Y and N branches.]
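One plausible reading of this flowchart can be sketched in code. The branch targets are assumptions, since the diagram's arrows did not survive: here an unsuitable item is reshelved without being digitized, a scan that fails QA is digitized again up to a hypothetical retry limit, and an item is reshelved once its outputs are stored. All function and parameter names are illustrative.

```python
def process_item(suitable, qa_results, max_attempts=3):
    """Return the workflow steps one item passes through.

    suitable     -- outcome of the "Suitable for digitization?" decision
    qa_results   -- booleans, one per scan attempt (retry on failure is assumed)
    max_attempts -- hypothetical limit on re-digitization
    """
    steps = ["Selection"]
    if not suitable:
        steps.append("Reshelve")  # N branch: assumed to go straight back to the stacks
        return steps
    for attempt, passed in enumerate(qa_results, start=1):
        steps += ["Digitize", "QA"]
        if passed:
            steps += ["Generate deliverables", "Store outputs",
                      "Update OULS OPAC", "Update Google Books index",
                      "Reshelve"]
            return steps
        if attempt >= max_attempts:
            break  # give up after the assumed retry limit
    steps.append("Reshelve")
    return steps


print(process_item(True, [False, True]))
```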
22
Overview of workflow (2)
[Two-column diagram, columns headed OULS and Google. Steps in order: Retrieve catalogue records; Survey items; Pick items; Bibliographic Evaluation; Metadata checks; Digitization; Quality Assurance; OCR and index; Receive and Reshelve items; Update catalogue records; Mount in books.google.com; Retrieve Oxford Digital Copy; Preserve/reprocess master files.]
23
Approach
OULS staff work closely with Google staff
– e.g. training on how to handle the material
Each component of the workflow must be comfortable for both parties
– Identify, survey, pick, track, reshelve, update OPAC…
A large and complex logistical operation that must not compromise the service to our users
– or other parts of OULS(!)
24
Outputs and outcomes
Large raw colour images from digitization process
Per volume, OULS receives:
– JPEG2000 page images
– Uncorrected OCR (per page)
– Report on scanning process
Quality Control checks at Google (and Oxford)
Deliverable images
– hosted by Google in the first instance
– linked to OPAC records
Ongoing software/hardware developments to improve the process and outputs
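The per-volume delivery listed on this slide could be modelled as a small record with a completeness check. This is a sketch only: the class, the field names, the file names, and the rule that every page image should have matching OCR text are assumptions, not a description of the actual OULS or Google systems.

```python
from dataclasses import dataclass, field


@dataclass
class VolumeDelivery:
    """Hypothetical record of what OULS receives for one volume."""
    volume_id: str
    page_images: list = field(default_factory=list)  # JPEG2000 file names
    ocr_pages: dict = field(default_factory=dict)    # file name -> uncorrected OCR text
    scan_report: str = ""                            # report on the scanning process

    def missing_ocr(self):
        """Page images that have no OCR text attached."""
        return [name for name in self.page_images if name not in self.ocr_pages]

    def is_complete(self):
        """True when images, per-page OCR and a scan report are all present."""
        return bool(self.page_images) and not self.missing_ocr() and bool(self.scan_report)


delivery = VolumeDelivery(
    volume_id="example-0001",
    page_images=["0001.jp2", "0002.jp2"],
    ocr_pages={"0001.jp2": "Title page text"},
    scan_report="2 pages scanned",
)
print(delivery.missing_ocr())  # ['0002.jp2']
```

A check of this kind is one way the Oxford-side quality control mentioned on the slide might flag incomplete deliveries before records are linked from the OPAC.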
25
Challenges that lie ahead…
Building the local infrastructure to manage and deliver the Oxford Digital Copy of the data
Investigating ways to exploit the data, e.g.:
– Correcting OCR files, adding additional markup
– (Re-)structuring the data
– Moving beyond a simple search and page-turning presentation
– Completing/extending volumes and collections
– Automatic collation, authorship attribution, stylistic analysis… and many, many more(?!)
Raising the bar of what is possible, and end-users’ expectations about what we can deliver
26
Feel the Fear…
©opyright and IPR
Threat to (Scholarly) e-Publishers
Proliferating plagiarism
Encouraging poor research
Scope creep, scalability, data deluge
(Digital) preservation and access
– Sun Center of Excellence
– ODL DAMS
51
Useful links
http://books.google.com/
http://books.google.com/googlebooks/library.html
http://www.bodley.ox.ac.uk/google/