Re-envisioning (and Re-purposing) Collections: Mass Digitization, Google, and the HathiTrust. Ivy Anderson, CDL. CDL Users Council Meeting, April 10, 2009.

I have always imagined that paradise will be a kind of library - Jorge Luis Borges

Diderot’s Encyclopédie,

UC Holdings By Format (Adjusted for Duplication)

Usage of Library Materials at UC (2007)

…and Along Came Google
Google Library Project
–2005: The ‘Google Five:’ Harvard, Oxford, New York Public Library, Stanford, University of Michigan
–2009: 22 library partners in 5 countries
Google Publisher Partner Program

…and the Open Content Alliance
October 2005
–Founders: Internet Archive, University of California, U of Toronto…
–Large-scale digitization of out-of-copyright works only
–A project of the Internet Archive

…and Microsoft Out-of-Copyright Works Only

UC Mass Digitization Projects
Timeline dates: October 2005; August 2006; March; July 2008
Timeline events: Founding Member of Open Content Alliance; UC Joins Google Library Project; Microsoft Digitization Agreement

So: Two Projects, One Goal
Goal: Mass digitization of library book collections
Google
–In-copyright and out-of-copyright works
–All languages
–Available via Google search engine and Google Book Search
Internet Archive / Open Content Alliance
–Out-of-copyright works only
–Primarily English language, some Romance languages now
–Available via the Internet Archive and Open Library websites to any and all search engines
–Library and grant-funded

Why Are They Doing It?
Google’s vision:
–To put all the world’s information online
–To gain marketshare and competitive advantage for their search (and online advertising) services
–It’s all about Search
Internet Archive: To put the world’s information online, for free, forever
–It’s all about the public good

Why are we doing it?
Improve discovery
–Indexing the full text of every book and making that full text available via Google and other search engines makes our books easier to find by placing them where the users are.
Fulfill our public service mission
–Many books of enduring general interest that are in the public domain – including classic works of literature but also more unique items such as early histories of the settlement of California and the West – can now be read by anyone, anywhere, anytime.
Preserve and protect our collections
–In earthquake and fire-prone California, digitizing books in our collections may also help protect the university from catastrophic loss should disaster someday strike our libraries.
Enhance student and faculty research
–Scholars can trace the evolution of ideas and perform other sophisticated textual analysis more easily when the full text is indexed and searchable by computer, opening scholarship in new ways.
Support collection management
–By making our collections more available digitally, we can explore more efficient and effective ways to manage our print collections.

Internet Archive: UC Contributors
Northern Regional Library Facility (NRLF)
Southern Regional Library Facility (SRLF)
UC Berkeley, Bancroft Library
UCLA
UC Davis

Google Project: UC Contributors Northern Regional Library Facility (NRLF) + UC Berkeley Systems UC Santa Cruz UC San Diego

CDL’s role, on behalf of UC
Liaison with partners
Planning & coordination
Funding
Stewardship of digital content
New services

Campuses Provide the Books

The Book Digitization Process A world of barcodes, logistics, loading docks, packing materials, and scanning machines!

Digital files
Images
OCR - Text
OCR - Page coordinates
Metadata

Reasons books might get rejected (images)

What subjects are being digitized?
Cookbooks
Children’s books
American history
Humanities
Science
East Asian & Pacific Rim collections

Languages

Where can you access the books?
Google Book Search:
Internet Archive: _libraries
Melvyl and WorldCat Local
–Via Google API
And eventually…
–Google Institutional Subscription
–HathiTrust

What does the future hold?
Additional access via the Google Settlement
Access and Preservation via HathiTrust

Google Settlement: Key Facts
In October 2008, Google settled a class action lawsuit brought by organizations representing authors and publishers, who claimed that Google’s library scanning program violated their copyrights.
–Google has always claimed that this was fair use and legitimate under copyright law.
The Settlement must be approved by the courts in order to become effective.
–At this time we do not know if the court will approve the Settlement, although we hope to know more after a court hearing in June.
At this time, everything we say about the effect of the Settlement should be considered preliminary and provisional.
UC will continue to digitize books from its collections with Google regardless of whether the Settlement goes forward.

Benefits of the Settlement
Public Access terminals in public libraries across the country that will allow the general public to find and read books that are out of print or in the public domain
An Institutional Subscription that will allow UC students and faculty and other academic libraries to access the full text of millions of out of print books digitized from libraries around the world
–Books in the institutional subscription will have persistent links for use in electronic course reserves, course reading lists, etc.
A Research Corpus that will support advanced computational research on the full text of millions of books that Google has digitized
Services for visually-impaired users to read and access all of the volumes Google has scanned

Existing Google services will also remain available
Google Book Search will continue to make the full text of all books searchable
–“Find in a library” pointers will lead users to the copies in our libraries
–More books in GBS will be enabled for full text viewing, and still more will have ‘preview’ mode enabled, for better browsability
UC will receive copies of all of the books scanned from our collections
–Use of the digital files will depend on the copyright status of the book
–At a minimum, we will be able to use the digital files to replace missing or deteriorated copies in our collection when needed
–These copies will be stored in the HathiTrust shared repository
Books in the public domain can be used and downloaded freely by scholars and the general public
–Libraries can share their copies with other academic institutions for scholarship and research

What the Settlement won’t allow
There are a few things we won’t be able to do with our own digital copies of the Google books
We cannot:
–Use in-copyright books for interlibrary loan or e-reserves (reserve links will be possible from the institutional subscription)
–Allow full text viewing of in-copyright works (this will be possible through the institutional subscription)
–Allow access via 3rd-party search engines and automated crawlers

Is the Settlement a good thing or a bad thing?
The Google Settlement is not without controversy. Some people are concerned that it will:
–Give Google a monopoly over book digitization and suppress competition
–Allow Google to charge high prices for subscriptions
–Create an artificial market for orphan works, preventing orphan works legislation from being passed that might lead to more open sharing of those works
Orphan works = works still under copyright whose copyright owners cannot be identified or located
On balance, UC supports the Settlement. While the Settlement is not perfect, UC believes that the Google Library Partner Program and the Google Settlement will result in greatly improved access to the millions of books residing in research library collections, both for libraries and the general public.

But we’re not just banking on Google…

Currently Digitized
2,790,739 volumes
976,758,650 pages
104 terabytes
33 miles
2,267 tons
434,390 volumes (~16% of total) in the public domain
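
The figures on the "Currently Digitized" slide can be cross-checked with a little arithmetic. The sketch below is not part of the original presentation; the variable names are illustrative, and it assumes decimal units (1 terabyte = 10^12 bytes, 1 megabyte = 10^6 bytes). It suggests that the public domain share is a rounded figure and that the page count may be an estimate based on a fixed number of pages per volume.

```python
# Figures copied from the "Currently Digitized" slide.
volumes = 2_790_739
pages = 976_758_650
public_domain_volumes = 434_390
terabytes = 104

# Share of digitized volumes that are in the public domain.
public_domain_share = public_domain_volumes / volumes

# Average pages per volume implied by the slide's totals.
pages_per_volume = pages / volumes

# Assumption: decimal units (1 TB = 10**12 bytes, 1 MB = 10**6 bytes).
megabytes_per_volume = terabytes * 10**12 / volumes / 10**6

print(f"Public domain share: {public_domain_share:.1%}")
print(f"Pages per volume:    {pages_per_volume:.1f}")
print(f"Storage per volume:  {megabytes_per_volume:.1f} MB")
```

The page total divides evenly by the volume count, which is what one would expect if the total was derived from an assumed average rather than counted directly.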

What is the HathiTrust?
A shared digital repository for mass digitized books formed in October 2008
Members:
–University of Michigan
–Indiana University
–University of California
–CIC Libraries (Committee on Institutional Cooperation) – “Big Ten+” schools
–University of Virginia
–More institutions may join in future
Where are the digitized files stored?
–Servers at the University of Michigan and Indiana University
–Additional mirror sites may be developed in future

Why is UC participating?
Economy of scale
–Storing mass digitized books is expensive – many terabytes of data
Stewardship and preservation of UC resources
–Will bring our Google and Internet Archive books together in one preservation repository under UC control - we can’t leave it all to Google and other third parties
Access to our own books
–UC will be able to link to full text in HathiTrust from Melvyl and WorldCat Local and build its own access interfaces via the HathiTrust API
Aggregate multiple library collections for greater research impact
–With UC, nearly 5 million books and counting
–¾ million books in the public domain
–HathiTrust will support shared access and search mechanisms across all partner content to the extent possible
Experiment with large-scale search, text mining, and other specialized services developed with academic users in mind
–Google and Internet Archive are building services for the general user
–Research libraries will build services optimized for academic users

When will all this be available?
UC Google books will be ingested into HathiTrust over the next several months
–UC Internet Archive books will follow
CDL is beginning to investigate access mechanisms in concert with Michigan and other HathiTrust partners
–Planning discussions are underway with OCLC for a HathiTrust catalog based on WorldCat Local
–APIs will allow UC to add links to books via WorldCat Local
–“Collection builder” functionality will allow librarians and individual end users to create and share specific themed collections
–More advanced search and text mining to follow
Building robust services will take time

What about our beloved print collections??

Current Picture: UC Collections

Future Picture: UC Collections

Print is not going away!
Some books will always have artifactual value
But…Mass Digitization:
–Creates collection management opportunities
    Can we store print more economically and deliver it to users on demand?
    Can digital surrogates allow us to reduce duplication among our physical collections?
    Can we develop shared print repositories with other research institutions to mirror our shared electronic repositories?
    Can we optimize the use of our valuable library space through better print collection management?
–Will allow us to better understand our users’ needs for print vs. digital collections
–Will help us shape the library of the 21st Century for the 21st Century user