The Oxford-Google Digitization Project* Michael Popham Oxford Digital Library * Rules of commercial confidentiality apply to this presentation!

Similar presentations
Digital Text and Libraries Michael Popham. DOI Meeting, Oxford, June 2006 Ranganathan’s laws of library science 1. Books are for use 2. Every reader his.

Beyond the Google Book: the Future of the Digital Library Cory Snavely Library IT Core Services manager University of Michigan April 20, 2010.
HATHI TRUST A Shared Digital Repository Building A Future By Preserving Our Past The Preservation Infrastructure of HathiTrust Digital Library Jeremy York.
CURRENT ISSUES Current contents Over 3,000 items open access, 42% reports and working papers, 21% journal articles, 21% conference items, 7% book chapters,
" OPEN ACCESS INITIATIVE IN ONE OF THE PALESTINIAN UNIVERSITIES: BIRZEIT UNIVERSITY" Prepared by Mrs. Diana Sayej-Naser Library Director Birzeit University.
HathiTrust Digital Library: Enrich Your Research and Scholarship Doreen Bradley Chris Powell University Library May 2011.
JISC Collections 04 September 2014 | Presentation to PRATT-SILS MA Summer School | Slide 1 JISC Collections.
DIGITAL HUMANITIES SUMMER SCHOOL 2011 DIGITAL LIBRARY TECHNOLOGIES AND BEST PRACTICE, PART 1: DECONSTRUCTING DIGITAL LIBRARIES Christine Madsen R&D Project.
HATHITRUST A Shared Digital Repository HathiTrust current work, challenges, and opportunities for public libraries Creating a Blueprint for a National.
October 24, 2006, Merit Technical Staff Meeting: The Google Project at the University of Michigan Perry Willett, Head, Digital Library Production Service.
Large-scale collaborative digitisation 19th Century Pamphlets Online Mar-2007 – Feb-2009 Grant Young Project Manager, 19th Century.
DRS 2 one in a series of periodic updates Harvard University Library Andrea Goethals October 21, 2009 DRS = Digital Repository Service.
Connected Histories Sources for Building British History, Funded under the JISC eContent Capital Programme for 18 months Partners:  Prof. Tim.
The Oxford Google Digitization Project Frances Boyle.
John Ockerbloom, Dec. 6, 2002: Supporting learning at the library Towards integrating LMS and digital library technology at Penn John Mark Ockerbloom CNI.
PAWN: Producer-Archive Workflow Network University of Maryland Institute for Advanced Computer Studies Joseph JaJa, Mike Smorul, Mike McGann.
JSTOR & OCR - A Case Study Kiffany Francis. What is JSTOR? “JSTOR is a not-for- profit organization with a dual mission to create and maintain a trusted.
Digital Partnerships at San Francisco Public Library: So Many Suitors, So Little Time.
Elizabeth Newbold and Samantha Tillett GL8 New Orleans, December 2006
Massively Digitizing UC Library Collections Google, Microsoft, and More Learning in Retirement Libraries – The Intersection of Tradition and Innovation.
Partnership agreement between Complutense University and Google Books Manuela Palafox Parejo Servicio Edición Digital y Web Biblioteca de la Universidad.
Ebooks: digitizing our print collections Sian Meikle University of Toronto Libraries.
Digital Library Architecture and Technology
NEWSPLAN – The Way ahead Ed King, Head of Newspaper Collections, British Library NEWSPLAN LIEM Regional Council 2 October 2008.
Scholars Portal Project Ontario Council of University Libraries Scholars Portal in 2007 A Progress Report Leslie Weir Université d’Ottawa - University.
HathiTrust – How To By Dr. Rob McGeachin 20th Annual AgNIC Meeting May 7, 2015.
HATHITRUST A Shared Digital Repository HathiTrust: Putting Research in Context HTRC UnCamp September 10, 2012 John Wilkin, Executive Director, HathiTrust.
SobekCM’s Community Ecosystems & Socio-Technical Practices Presented by Mark V. Sullivan June 10th, 2014 Sobek image created by Jeff Dahl and is shared.
Social Science Data and ETDs: Issues and Challenges Joan Cheverie Georgetown University Myron Gutmann ICPSR – University of Michigan Austin McLean ProQuest.
Web Capture team Office of strategic initiatives February 27, 2006 Selecting Content from the Web: Challenges and Experiences of the Library of Congress.
OCLC Online Computer Library Center CONTENTdm ® Digital Collection Management Software Ron Gardner, OCLC Digital Services Consultant ICOLC Meeting April.
The Fundamentals of Preserving Knowledge Assets Pacific Neighborhood Consortium 2010 Catherine Quinlan, Dean of the USC Libraries USC's Dual Approach.
Digitization of the Federal Depository Library Program Judith C. Russell Superintendent of Documents & Managing Director, Information Dissemination “Electronic.
HathiTrust Digital Library. Overview ›Began in 2008 ›Large scale digital preservation repository ›Partnership of major research libraries ›Focus on both.
Google Books, UMI and Other Intriguing Trends in Digital Publishing Joe Wible Hopkins Marine Station of Stanford University October 9, 2006.
From Concept to Reality: An overview of the University of Wisconsin Digital Collections Melissa Mclimans.
The Legislative Library of Ontario’s Ontario Documents Repository Road to Partnership.
Technology Choices for the JSTOR Online Archive Presented by Chang Feng Department of Computer Engineering and Computer Science, University of Missouri-Columbia,
Breana McCracken University of Illinois at Urbana-Champaign HathiTrust and Copyright Future Implications - Strong precedent for libraries to continue to.
University of California Mass Digitization Projects Update Users Council Annual Meeting May 8, 2008 Heather Christenson, Mass Digitization Project Mgr,
HATHITRUST A Shared Digital Repository HathiTrust and TRAC DigitalPreservation 2012 July 25, 2012 Jeremy York, Project Librarian, HathiTrust.
Google Confidential Daniel Clancy Engineering Director, Google Print 18-July-05.
Digitizing Aloha: Using Information Technology to Preserve and Present the History and Culture of Hawai'i Bob Schwarzwalder Assistant University Librarian,
National and University Library Zagreb Digitisation Activities.
Library Repositories and the Documentation of Rights Leslie Johnston, University of Virginia Library NISO Workshop on Rights Expression May 19, 2005.
ILS Futures. Background Changes 95/96 to 06/07 –Stacks circ= 179,996 to 160,970 ILS’s are no longer the center of the library universe. To most users,
Annual Meeting 2004 CrossRef Publishers International Linking Association, Inc Charles Hotel, Cambridge, MA November 9th, 2004.
Digitising Special Collections Public-Private Partnerships at the KB and abroad Marieke van Delft, KB, Keeper of Early Printed Collections / Project Leader.
UNIZULU INSTITUTIONAL REPOSITORY GATEWAY TO LOCAL CONTENT.
Enterprise Content Management
INTELLECTUAL RIGHTS AND HISTORIC CORPORA Mark Sandler University of Michigan ICOLC, March, 2003.
The Oxford-Google mass-digitisation project: How, why and what? An EDUCAUSE Webcast by Reg Carr (University of Oxford) 15 June 2005.
National Library of the Czech Republic as End-User of the Research Networks Adolf Knoll deputy director
Collecting History: Profiles in Science Alexa T. McCray National Library of Medicine Bethesda, MD Stanford University August 21, 1999.
HATHITRUST A Shared Digital Repository Institution Uses of HathiTrust Jeremy York University of Maine May 24, 2013.
April 2, For Today Review of Google Research Techniques and Software to me (10 points) Google Scholar Google Books Google Search.
Discovering Value : Discovery Services and ERM Systems Together Nancy Fleck Michigan State University Ted Fons Innovative Interfaces.
WISER: Workshops in Information Skills and Electronic Resources Katherine Melling OULS Subject Consultant (French & Italian)
DAEDALUS - An ePrints Case Study William J Nixon Service Development Susan Ashworth Advocacy.
HATHITRUST A Shared Digital Repository HathiTrust Large Digital Libraries: Beyond Google Books Modern Language Association January 5, 2012 Jeremy York,
Leveraging the Expertise of our Staff and the Information Resources We Manage MIT Libraries Visiting Committee April 13, 2005.
HathiTrust: A valuable and visionary Partnership.
Fedora Commons Overview and Background Sandy Payette, Executive Director UK Fedora Training London January 22-23, 2009.
CENTRAL/WESTERN MASSACHUSETTS AUTOMATED RESOURCE SHARING Digitization GOALS & THEIR LOGISTICS Michael J. Bennett Digital Initiatives Librarian C/WMARS,
Access for user self- sufficiency: making rich local content intuitively available Catalog Transformed: From Traditional to Emerging Models of Use Program.
HathiTrust Digital Library Interface and Services
Pre-Course Assignment
Moving on: Repository Services after the RAE
Christopher C. Brown Reference Librarian
Presentation transcript:

FIBS 25/01/07 The Lawyers’ Vision (non-attributable)
Google and Oxford plan to digitize 1–1.5M books as part of the Google Books Library Project.
The project will take at least 3 years to complete and involve approximately 35 digitization workstations running in 2 shifts.
Files will be created as TIFFs and JPEGs and delivered as PNGs or PDFs… etc.
Fortunately, both Oxford and Google like to make information accessible…
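The slide's headline figures imply a rough per-workstation workload. The sketch below is an illustrative back-of-envelope calculation and does not come from the presentation. It assumes 250 working days per year, a hypothetical value the slide does not give. Because the project is to take "at least 3 years", the rates it prints are upper bounds.

```python
# Back-of-envelope throughput implied by the slide's figures.
BOOKS_LOW, BOOKS_HIGH = 1_000_000, 1_500_000
YEARS = 3                    # "at least 3 years", so results are upper bounds
WORKSTATIONS = 35
SHIFTS_PER_DAY = 2
WORKING_DAYS_PER_YEAR = 250  # HYPOTHETICAL: not stated in the slides


def books_per_station_shift(total_books: int) -> float:
    """Average number of volumes each workstation must scan per shift."""
    station_shifts = YEARS * WORKING_DAYS_PER_YEAR * WORKSTATIONS * SHIFTS_PER_DAY
    return total_books / station_shifts


for total in (BOOKS_LOW, BOOKS_HIGH):
    print(f"{total:,} books -> {books_per_station_shift(total):.1f} per workstation-shift")
```

Under these assumptions the plan works out to roughly 19 to 29 volumes per workstation per shift. A longer project or more working days would lower that figure proportionally.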

Thomas Bodley’s Vision
Bodleian founded 1602
Universal library
Bodley’s “Republic of Letters”
Legal deposit privilege since
% of Bodleian readers not members of Oxford University

Oxford University Library Services
More than 660 staff (600 FTE)
40 libraries, including the Bodleian
Budget over £25m (€37m)
Total bookstock: 11 million items
156 miles (250 km) of shelving, including repository space

The “Digital Library” at Oxford
1960s: Machine-readable texts for scholarly purposes
1976: Oxford Text Archive founded
1980s: Networked databases and CD-ROMs
1990s: Libraries on the web, e-journals, etc.
Oxford Digital Library (ODL)
2005: ELISO (Electronic Library & Information Service); Google/Oxford partnership
2006: FENIX e-prints/e-theses institutional repository

Some Oxford digitization projects
Toyota City Imaging Project (1993)
Specialized Research Collections in the Humanities (NFF) and eLib projects
– John Johnson Collection
– Broadside Ballads
– Early manuscripts in Oxford
Oxford Digital Library (2001 onwards)
– Scoping study
– ODL Development Fund (Mellon Foundation)
– Three production phases

Why partner with Google?
The synergy between missions:
– Bodley’s “Republic of Letters”
– Google’s “To organize the world’s information and make it universally accessible and useful”
Emphasis is on access, not conservation:
– Oxford University Library Services: opening up our closed stacks
– Google: “…the next generation of the card catalog”
Bring more Oxford-held content into the digital landscape, making it available for scholarly and public benefit.
Builds on the work of the Oxford Digital Library (ODL).

What to digitize?
Direct discussions with Google since 2003
Win/win situation for both parties
Extensive collection of out-of-copyright (and mostly out-of-print) material identified
– Oxford differs from other partners in this aspect of our agreement
– Decision made to begin with the 19th-century material
– Looking at approximately 1+ million items

Overview of workflow
[Flowchart. Steps shown: Selection; Suitable for digitization? (Y/N); Reshelve; Fast-track / Slow-track; Digitize; QA (Y/N); Generate deliverables; Store outputs; Update OULS OPAC; Update Google.print index]
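The flowchart's routing can be sketched as a small Python function. This is a hypothetical reading of the diagram and does not represent the project's actual system, because the transcript preserves only the step labels. Several details are assumptions:

* the `fragile` flag that chooses between fast-track and slow-track,
* the retry limit after a failed QA check,
* reshelving every volume at the end.

```python
from dataclasses import dataclass, field
from typing import Callable, List

MAX_QA_ATTEMPTS = 3  # HYPOTHETICAL retry limit; not stated in the slides


@dataclass
class Volume:
    shelfmark: str
    suitable: bool          # outcome of "Suitable for digitization?"
    fragile: bool = False   # HYPOTHETICAL criterion for fast- vs slow-track
    log: List[str] = field(default_factory=list)


def process(vol: Volume, qa_passes: Callable[[Volume], bool]) -> List[str]:
    """Route one volume through the step labels shown on the workflow slide."""
    vol.log.append("selection")
    if not vol.suitable:
        vol.log.append("reshelve")
        return vol.log
    vol.log.append("slow-track" if vol.fragile else "fast-track")
    for _ in range(MAX_QA_ATTEMPTS):
        vol.log.append("digitize")
        if qa_passes(vol):
            vol.log.append("qa:pass")
            vol.log.extend([
                "generate deliverables",
                "store outputs",
                "update OULS OPAC",
                "update Google.print index",
                "reshelve",  # assumption: digitized books also return to the shelf
            ])
            return vol.log
        vol.log.append("qa:fail")
    vol.log.append("reshelve")  # assumption: give up after repeated QA failures
    return vol.log


# Usage: the first scan fails QA, the second passes ("hypothetical-001" is made up).
qa_results = iter([False, True])
print(process(Volume("hypothetical-001", suitable=True), lambda v: next(qa_results)))
```

Passing QA as a callable keeps the routing logic separate from whatever inspection is actually done, so the sketch can be exercised with canned results.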

Approach
OULS staff work closely with Google staff
– e.g. training on how to handle the material
Each component of the workflow must be comfortable for both parties
A large and complex logistical operation that must not compromise the service to our users

Outputs and outcomes
Large raw colour images from the digitization process
Per volume, OULS receives:
– JPEG2000 (probably) and TIFFs
– Uncorrected OCR
Audit of production process
There are quality control processes at Google & Oxford
Deliverable images (to be hosted by Google in the first instance) linked to OPAC records
Ongoing software/hardware developments to improve the process
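OULS receives image files for each volume: TIFFs and, probably, JPEG 2000. A receiving library might check that each delivered file really is in the format it claims before storing it. The sketch below identifies files by their leading magic bytes, which are standard signatures for these formats. The check itself is an illustrative assumption and is not a step described in the presentation. The PNG, JPEG and PDF entries cover the other formats mentioned in the lawyers' summary.

```python
from pathlib import Path

# Leading "magic bytes" for the file formats named in the slides.
SIGNATURES = [
    (b"\x00\x00\x00\x0cjP  \r\n\x87\n", "JPEG 2000 (JP2 file)"),
    (b"\xff\x4f\xff\x51", "JPEG 2000 (raw codestream)"),
    (b"II*\x00", "TIFF (little-endian)"),
    (b"MM\x00*", "TIFF (big-endian)"),
    (b"\x89PNG\r\n\x1a\n", "PNG"),
    (b"\xff\xd8\xff", "JPEG"),
    (b"%PDF-", "PDF"),
]


def sniff(header: bytes) -> str:
    """Name the format whose signature starts `header`, or 'unknown'."""
    for magic, name in SIGNATURES:
        if header.startswith(magic):
            return name
    return "unknown"


def sniff_file(path: Path) -> str:
    """Read just enough of a file (12 bytes, the JP2 signature length) to sniff it."""
    with path.open("rb") as fh:
        return sniff(fh.read(12))
```

For example, `sniff_file(Path("vol0001/00000001.tif"))` on a hypothetical delivered page should report a TIFF variant. A mismatch between the extension and the sniffed format would flag the file for review.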

Challenges that lie ahead…
Building the local infrastructure to manage and deliver the Oxford Digital Copy of the data
Investigating ways to exploit the data, e.g.:
– Correcting OCR files, adding markup
– (Re-)structuring the data
– Moving beyond a simple search and page-turning presentation
– Completing/extending volumes and collections
– Automatic collation, authorship attribution, stylistic analysis… and many, many more(?!)
Raising the bar of what is possible, and end-users’ expectations about what we can deliver
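At its simplest, automatic collation is a word-level comparison of two witnesses, for example two OCR readings of the same page. The sketch below uses Python's standard `difflib.SequenceMatcher` as a general-purpose stand-in. It is not a tool named in the presentation. Real scholarly collation would also need to normalize spelling, punctuation and line-end hyphenation before comparing.

```python
import difflib
from typing import List, Tuple


def collate(witness_a: str, witness_b: str) -> List[Tuple[str, str, str]]:
    """List word-level variants between two readings of the same text.

    Each entry is (operation, words in A, words in B), where operation is
    'replace', 'delete' or 'insert' as reported by difflib's opcodes.
    """
    a, b = witness_a.split(), witness_b.split()
    matcher = difflib.SequenceMatcher(a=a, b=b, autojunk=False)
    variants = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            variants.append((tag, " ".join(a[i1:i2]), " ".join(b[j1:j2])))
    return variants
```

For instance, `collate("the Republic of Letters", "the Republic of Lettcrs")` returns `[("replace", "Letters", "Lettcrs")]`, which isolates a typical OCR confusion of "e" and "c". Comparing whole words rather than characters keeps the variant list readable. Character-level diffs inside each replaced pair could be added afterwards.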

Feel the Fear….
©opyright and IPR
Threat to (scholarly) e-publishers
Proliferating plagiarism
Encouraging poor research
Scope creep, scalability, data deluge
Preservation and access

What libraries are you working with?
We're currently working with the University of Michigan, Harvard University, Stanford University, The New York Public Library, Oxford University, the Universidad Complutense de Madrid, the University of Virginia, the University of Wisconsin-Madison and the University of California to include their collections in Google Book Search and, like a card catalog, show you basic information about the books and in some cases a few snippets (sentences of your search term in context). If a book is determined to be in the public domain, we'll show you the full text of the book; that is, you can page through the book from start to finish.

Useful links