Re-envisioning (and Re-purposing) Collections: Mass Digitization, Google, and the HathiTrust Ivy Anderson CDL CDL Users Council Meeting April 10, 2009.


1 Re-envisioning (and Re-purposing) Collections: Mass Digitization, Google, and the HathiTrust Ivy Anderson CDL CDL Users Council Meeting April 10, 2009

2 I have always imagined that paradise will be a kind of library - Jorge Luis Borges

3

4

5 Diderot’s Encyclopédie, 1751 - 1772

6

7

8

9 UC Holdings By Format (Adjusted for Duplication)

10 Usage of Library Materials at UC (2007)

11

12

13

14

15 …and Along Came Google
Google Library Project
–2005: The ‘Google Five’: Harvard, Oxford, New York Public Library, Stanford, University of Michigan
–2009: 22 library partners in 5 countries
Google Publisher Partner Program

16 …and the Open Content Alliance
October 2005
–Founders: Internet Archive, University of California, U of Toronto…
–Large-scale digitization of out-of-copyright works only
–A project of the Internet Archive

17 …and Microsoft Out-of-Copyright Works Only

18 UC Mass Digitization Projects
October 2005: Founding Member of Open Content Alliance
August 2006: UC Joins Google Library Project
March 2007 - July 2008: Microsoft Digitization Agreement

19 So: Two Projects, One Goal
Goal: Mass digitization of library book collections
Google
–In-copyright and out-of-copyright works
–All languages
–Available via Google search engine and Google Book Search
Internet Archive / Open Content Alliance
–Out-of-copyright works only
–Primarily English language, some Romance languages now
–Available via the Internet Archive and Open Library websites to any and all search engines
–Library and grant-funded

20 Why Are They Doing It?
Google’s vision:
–To put all the world’s information online
–To gain market share and competitive advantage for their search (and online advertising) services
–It’s all about Search
Internet Archive:
–To put the world’s information online, for free, forever
–It’s all about the public good

21 Why are we doing it?
Improve discovery
–Indexing the full text of every book and making that full text available via Google and other search engines makes our books easier to find by placing them where the users are.
Fulfill our public service mission
–Many books of enduring general interest that are in the public domain – including classic works of literature but also rarer items such as early histories of the settlement of California and the West – can now be read by anyone, anywhere, anytime.
Preserve and protect our collections
–In earthquake- and fire-prone California, digitizing books in our collections may also help protect the university from catastrophic loss should disaster someday strike our libraries.
Enhance student and faculty research
–Scholars can trace the evolution of ideas and perform other sophisticated textual analysis more easily when the full text is indexed and searchable by computer, opening scholarship in new ways.
Support collection management
–By making our collections more available digitally, we can explore more efficient and effective ways to manage our print collections.

22 Internet Archive: UC Contributors Northern Regional Library Facility (NRLF) Southern Regional Library Facility (SRLF) UC Berkeley, Bancroft Library UCLA UC Davis

23 Google Project: UC Contributors Northern Regional Library Facility (NRLF) + UC Berkeley Systems UC Santa Cruz UC San Diego

24 CDL’s role, on behalf of UC
Liaison with partners
Planning & coordination
Funding
Stewardship of digital content
New services

25 Campuses Provide the Books

26 The Book Digitization Process A world of barcodes, logistics, loading docks, packing materials, and scanning machines!

27

28

29 Digital files
Images
OCR - Text
OCR - Page coordinates
Metadata

30

31

32

33 Reasons books might get rejected (images)

34 What subjects are being digitized?
Cookbooks
Children’s books
American history
Humanities
Science
East Asian & Pacific Rim collections

35 Languages

36 Where can you access the books?
Google Book Search: http://books.google.com/
Internet Archive: http://www.archive.org/details/university_of_california_libraries
Melvyl and WorldCat Local
–Via Google API
And eventually…
–Google Institutional Subscription
–HathiTrust

37

38

39

40

41

42

43

44 What does the future hold? Additional access via the Google Settlement Access and Preservation via HathiTrust

45 Google Settlement: Key Facts
In October 2008, Google settled a class action lawsuit brought by organizations representing authors and publishers, who claimed that Google’s library scanning program violated their copyrights.
Google has always maintained that the scanning was fair use and legitimate under copyright law.
The Settlement must be approved by the courts in order to become effective.
At this time we do not know if the court will approve the Settlement, although we hope to know more after a court hearing in June.
Everything we say about the effect of the Settlement should therefore be considered preliminary and provisional.
UC will continue to digitize books from its collections with Google regardless of whether the Settlement goes forward.

46 Benefits of the Settlement
Public Access terminals in public libraries across the country that will allow the general public to find and read books that are out of print or in the public domain
An Institutional Subscription that will allow UC students and faculty and other academic libraries to access the full text of millions of out of print books digitized from libraries around the world.
–Books in the institutional subscription will have persistent links for use in electronic course reserves, course reading lists, etc.
A Research Corpus that will support advanced computational research on the full text of millions of books that Google has digitized
Services for visually-impaired users to read and access all of the volumes Google has scanned

47 Existing Google services will also remain available
Google Book Search will continue to make the full text of all books searchable
–“Find in a library” pointers will lead users to the copies in our libraries
–More books in GBS will be enabled for full text viewing, and still more will have ‘preview’ mode enabled, for better browsability
UC will receive copies of all of the books scanned from our collections
–Use of the digital files will depend on the copyright status of the book
–At a minimum, we will be able to use the digital files to replace missing or deteriorated copies in our collection when needed
–These copies will be stored in the HathiTrust shared repository
Books in the public domain can be used and downloaded freely by scholars and the general public
–Libraries can share their copies with other academic institutions for scholarship and research

48 What the Settlement won’t allow
There are a few things we won’t be able to do with our own digital copies of the Google books
We cannot:
–Use in-copyright books for interlibrary loan or e-reserves (reserve links will be possible from the institutional subscription)
–Allow full text viewing of in-copyright works (this will be possible through the institutional subscription)
–Allow access via 3rd-party search engines and automated crawlers

49 Is the Settlement a good thing or a bad thing?
The Google Settlement is not without controversy. Some people are concerned that it will:
–Give Google a monopoly over book digitization and suppress competition
–Allow Google to charge high prices for subscriptions
–Create an artificial market for orphan works, preventing orphan works legislation from being passed that might lead to more open sharing of those works
Orphan works = works still under copyright whose copyright owners cannot be identified or located
On balance, UC supports the Settlement. While not perfect, UC believes that the Google Library Partner Program and the Google Settlement will result in greatly improved access to the millions of books residing in research library collections, both for libraries and the general public.

50 But we’re not just banking on Google…

51 Currently Digitized
2,790,739 volumes
976,758,650 pages
104 terabytes
33 miles
2,267 tons
434,390 volumes (~16% of total) in the public domain
http://www.hathitrust.org
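The figures on this slide can be cross-checked with a few lines of arithmetic. The sketch below is illustrative only; the variable names are not from the presentation. It confirms the "~16%" public domain share and shows that the page count equals exactly 350 pages per volume, which suggests (an inference, not something the slide states) that pages were estimated from a fixed per-volume average rather than counted.

```python
# Figures copied from the "Currently Digitized" slide.
volumes = 2_790_739
pages = 976_758_650
public_domain = 434_390

# Share of volumes in the public domain (the slide rounds this to ~16%).
pd_share = public_domain / volumes

# Average pages per volume implied by the slide's totals.
pages_per_volume = pages / volumes

print(f"Public domain share: {pd_share:.1%}")
print(f"Pages per volume: {pages_per_volume}")
```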

52 What is the HathiTrust?
A shared digital repository for mass digitized books, formed in October 2008
Members:
–University of Michigan
–Indiana University
–University of California
–CIC Libraries (Committee on Institutional Cooperation) – “Big Ten+” schools
–University of Virginia
–More institutions may join in the future
Where are the digitized files stored?
–Servers at the University of Michigan and Indiana University
–Additional mirror sites may be developed in the future

53

54

55 Why is UC participating?
Economy of scale
–Storing mass digitized books is expensive – many terabytes of data
Stewardship and preservation of UC resources
–Will bring our Google and Internet Archive books together in one preservation repository under UC control - we can’t leave it all to Google and other third parties
Access to our own books
–UC will be able to link to full text in HathiTrust from Melvyl and WorldCat Local and build its own access interfaces via the HathiTrust API
Aggregate multiple library collections for greater research impact
–With UC, nearly 5 million books and counting
–¾ million books in the public domain
–HathiTrust will support shared access and search mechanisms across all partner content to the extent possible
Experiment with large-scale search, text mining, and other specialized services developed with academic users in mind
–Google and Internet Archive are building services for the general user
–Research libraries will build services optimized for academic users

56 When will all this be available?
UC Google books will be ingested into HathiTrust over the next several months
–UC Internet Archive books will follow
CDL is beginning to investigate access mechanisms in concert with Michigan and other HathiTrust partners
–Planning discussions are underway with OCLC for a HathiTrust catalog based on WorldCat Local
–APIs will allow UC to add links to books via WorldCat Local
–“Collection builder” functionality will allow librarians and individual end users to create and share specific themed collections
–More advanced search and text mining to follow
Building robust services will take time

57 What about our beloved print collections??

58 Current Picture: UC Collections

59 Future Picture: UC Collections

60 Print is not going away!
Some books will always have artifactual value
But…Mass Digitization:
–Creates collection management opportunities
  Can we store print more economically and deliver it to users on demand?
  Can digital surrogates allow us to reduce duplication among our physical collections?
  Can we develop shared print repositories with other research institutions to mirror our shared electronic repositories?
  Can we optimize the use of our valuable library space through better print collection management?
–Will allow us to better understand our users’ needs for print vs. digital collections
–Will help us shape the library of the 21st Century for the 21st Century user

61

