Published by Claire West. Modified over 9 years ago.
1
Ebooks: digitizing our print collections
Sian Meikle
University of Toronto Libraries
2
Digitizing our print collections
About mass digitization
  Who is digitizing?
  What is getting digitized?
  What is the output?
  Digitization issues
Integration of print-to-digital content
  Choices for discovery and access
  Issues for delivery
3
Partners: 11 libraries; several publishers
University of California
National Library of Catalonia
University Complutense of Madrid
Harvard University
University of Michigan
New York Public Library
Oxford University
Stanford University
University of Texas at Austin
University of Virginia
University of Wisconsin at Madison
4
Scanning: in-copyright & out-of-copyright
In-copyright:
  Searching: all content fully indexed
  Display: snippets can be viewed; full text can be bought or located in a library
Out-of-copyright:
  PDFs can be downloaded for personal use, bought, or located in a library
5
Google output
Metadata
Scanned images: TIFFs
Access derivatives: JPEGs
Image-based PDFs (one per page or one per book)
Uncorrected OCR
6
Partners:
Internet Archive
29 libraries
Publishers: O’Reilly Media
Industrial partners: MSN, HP Labs, Adobe, Xerox
7
Open Content Alliance library partners
Boston Public Library
Boston Library Consortium
Columbia University
Emory University
European Archive
Indiana University
Johns Hopkins University Libraries
McMaster University
Memorial University of Newfoundland
Missouri Botanical Garden
MSN
National Archives (United Kingdom)
National Library of Australia
Rice University
Tufts University
San Francisco Public Library
Simon Fraser University
Smithsonian Libraries
University of Alberta
University of British Columbia
University of California
University of Chicago Library
University of Georgia
University of Illinois Urbana-Champaign
University of North Carolina
University of Ottawa
University of Pittsburgh
University of Texas
University of Toronto
University of Virginia
Washington University
York University
8
Scanning: Out-of-copyright and copyright-cleared material
Searching:
  Search full content via MSN Live Book Search
  Metadata via Internet Archive
Use: All scans & derivatives can be downloaded
9
Open Content Alliance output
Metadata
Scanned images: TIFFs/J2Ks/CR2s
Access derivatives: JPEGs
DjVu (viewer, requires plugin)
Flip book (viewer, does not require plugin)
Image-based PDFs (one per book)
Uncorrected OCR integrated into the PDFs
10
Mass digitization content
900,000 pre-1923 titles
60% are unique
40% have more than one manifestation
Data courtesy Microsoft, 2007
11
Comparison: scanned & born-digital
Scanned from print                   | Born-digital
Page images                          | E-text
Search uncorrected OCR               | Search text
TOC, title page, index are marked    | Can be highly segmented, linked
Literature, history, …               | STM, social sciences, reference, …
12
Mass digitization: some Q&A
Duplication
  Q: How do we guard against duplication?
  A: It might be cheaper just to scan duplicates.
Omissions
  Q: What about fold-outs, uncut pages, tightly bound books, print running into margins…?
  A: Mass digitization works because it is efficient. A parallel process should handle exception cases.
13
Mass digitization at U of Toronto
Not scanned: 2,400 (8%)
Scanned: 32,000 (92%)
14
Method 1: Union digital repository
Internet Archive (OCA)
E-books integrated with non-book content
User contributions (content, reviews)
Other sites can point to this content
18
Method 2: Full-text search repository
MSN Live Books and Google Books
Both offer cross-book & intra-book searching
Google’s goal is to index
MSN is developing a reading environment
19
Google
23
MSN Live Book Search
26
Why load it locally?
Safekeeping
  Lots of copies keep stuff safe!
Discovery
  Integration with licensed books
  Integration with non-book content
  Local subject specialization
27
Method 3: Local load
University of Michigan
E-books linked from OPAC
Rights system decides who can view:
  Nobody
  University of Michigan
  United States
  World
In-book searching: OCR, one-at-a-time
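The four-level rights decision listed on this slide can be illustrated with a small sketch. The slide does not describe how the University of Michigan system is implemented, so everything here is an assumption: the names `Audience` and `can_view` are hypothetical, and so is the rule that the "United States" level checks location only.

```python
from enum import Enum


class Audience(Enum):
    """Who may view a scanned book (the four levels named on the slide)."""
    NOBODY = "nobody"
    UNIVERSITY = "university"        # University of Michigan users only
    UNITED_STATES = "united_states"  # readers located in the United States
    WORLD = "world"


def can_view(audience: Audience, is_um_user: bool, in_us: bool) -> bool:
    """Return True if this reader may open the full book.

    Assumption: each level is checked independently; the real rights
    system may combine affiliation and location differently.
    """
    if audience is Audience.WORLD:
        return True
    if audience is Audience.UNITED_STATES:
        return in_us
    if audience is Audience.UNIVERSITY:
        return is_um_user
    return False  # Audience.NOBODY


if __name__ == "__main__":
    # A reader in the US with no university affiliation.
    for level in Audience:
        print(level.value, can_view(level, is_um_user=False, in_us=True))
```

Keeping the decision in one function means the OPAC link and the e-book viewer could ask the same question and get the same answer.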
28
MBooks at University of Michigan
Download and validation: local data mover GROOVE (Perl & MySQL)
Data integrity: MD5 fixity checks on JPEGs, TIFFs, UTF-8
Quality assurance: GROOVE samples 20-page chunks for students to check with ACDSee
Problems referred to Google for later correction
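A fixity check of the kind mentioned on this slide might look like the sketch below. It assumes the check is an MD5 checksum comparison against a manifest delivered with the files; the slide does not show how GROOVE does this (it is described as Perl and MySQL), and the function names `md5_of` and `failed_fixity` and the manifest layout are hypothetical.

```python
import hashlib
from pathlib import Path


def md5_of(path, chunk_size=1 << 20):
    """Return the MD5 hex digest of a file, read in chunks to limit memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def failed_fixity(manifest, root):
    """Return the sorted names of files that are missing or whose checksum differs.

    manifest: dict mapping a relative file name to its expected MD5 hex digest
              (an assumed layout, not the format Google actually delivers).
    root:     directory holding the downloaded page images and OCR text.
    """
    root = Path(root)
    failures = []
    for name, expected in manifest.items():
        path = root / name
        if not path.is_file() or md5_of(path) != expected.lower():
            failures.append(name)
    return sorted(failures)
```

An empty list from `failed_fixity` would mean every listed file arrived intact; anything else could be queued for another download before quality assurance begins.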
29
University of Michigan: OPAC link
31
University of Michigan: e-book display
32
University of Michigan: search in e-book
33
How do people read?
Intentional reading
  Attentive, sustained, linear reading of text
  Heavily influenced by printed-book culture
  Dominant in classical and scholarly literature
Functional reading
  Manipulating different content types
  Web browsing, text database searching
  Most screen reading is functional
[Diagram labels: Intentional, Functional]
Hillesund, T., & Noring, J. E. (2006)
34
How do people know what they’ve read?
"[A] strong relationship…exists between the sensory motor representation of the user and his/her treatment of the information content of the paper book or e-book… Because an electronic book is functionally closer to a computer than a traditional book […] it does not provide the external indicators to memory that the classical book does…"
Morineau et al., 2005
35
Delivering the book to the user
User tasks
Printed books
  Make use copy
  Make discovery surrogate
  Search surrogates, choose candidates
  Examine candidates
  Browse more candidates
  Choose material
Online books
  Make discovery surrogate
  Search surrogates, choose candidates
  Examine candidates
  ???
  Choose material
  Make use copy
36
Implications for mass digitization
Support production of good print copies for use
Target TOC and index for indexing & correction
Provide granular linking
Provide browse functions
38
References
Christianson, M., & Aucoin, M. (2005). Electronic or print books: Which are used? Library Collections, Acquisitions, and Technical Services, 29(1), 71-81.
Hillesund, T., & Noring, J. E. (2006). Digital libraries and the need for a universal digital publication format. JEP: the Journal of Electronic Publishing, 9(2).
Levine-Clark, M. (2006). Electronic book usage: A survey at the University of Denver. portal: Libraries and the Academy, 6(3), 285-299.
Morineau, T., Blanche, C., Tobin, L., & Gueguen, N. (2005). The emergence of the contextual role of the e-book in cognitive processes through an ecological and functional analysis. International Journal of Human-Computer Studies, 62(3), 329-348.
Su, S. (2005). Desirable search features of web-based scholarly e-book systems. Electronic Library, 23(1), 64-71.
39
Mass digitization archives
Google Books: http://books.google.com/
Internet Archive: http://archive.org/
MSN Live Book Search: http://books.live.com/
University of Michigan: http://mirlyn.lib.umich.edu/