Published by Cody Watkins. Modified over 10 years ago.
1
HathiTrust: A Shared Digital Repository
HathiTrust Overview
Julie Bobay, Heather Christenson, and John Wilkin
April 12, 2011
2
HathiTrust Overview
Our organization and how it functions
Our HathiTrust collection
Perspectives on HathiTrust and public services
Leveraging HathiTrust data
How HathiTrust can make a difference
How to find out more
3
Universal Library Common Goal Single Entity, Many Partners HathiTrust
4
Current Partners
Arizona State University
Baylor University
California Digital Library
Columbia University
Cornell University
Dartmouth College
Duke University
Emory University
Harvard University Library
Indiana University
Johns Hopkins University
Library of Congress
Massachusetts Institute of Technology
Michigan State University
New York University
New York Public Library
North Carolina Central University
North Carolina State University
Northwestern University
The Ohio State University
The Pennsylvania State University
Princeton University
Purdue University
Stanford University
Texas A&M University
Universidad Complutense de Madrid
University of California Berkeley
University of California Davis
University of California Irvine
University of California Los Angeles
University of California Merced
University of California Riverside
University of California San Diego
University of California San Francisco
University of California Santa Barbara
University of California Santa Cruz
The University of Chicago
University of Illinois
University of Illinois at Chicago
The University of Iowa
University of Maryland
University of Michigan
University of Minnesota
The University of North Carolina at Chapel Hill
University of Pennsylvania
University of Pittsburgh
University of Utah
University of Virginia
University of Washington
University of Wisconsin-Madison
Utah State University
Yale University Library
5
Governance HathiTrust Executive Committee Strategic Advisory Board Budget/Finances Decision-making Guidance on Policy, Planning
6
Executive Committee
Paul Courant, University Librarian and Dean of Libraries, UM
Laine Farley, Executive Director, CDL
John King, Vice Provost for Academic Information, UM
Paula Kaufman, University Librarian and Dean of Libraries, UI
Brian Schottlaender, University Librarian, UCSD
Ed Van Gemert, Deputy Director of Libraries, UW-Madison (ex officio)
Brenda Johnson, Dean of Libraries, IU
Brad Wheeler, Chief Information Officer, IU
John Wilkin, Executive Director of HathiTrust and Associate University Librarian, LIT, UM
7
Strategic Advisory Board
Ed Van Gemert (Chair), Deputy Director of Libraries, UW-Madison
John Butler, Associate University Librarian for Information Technology, U Minn
Patricia Cruse, Director, Preservation, CDL
Bernie Hurley, Director, Library Technologies, UC Berkeley
R. Bruce Miller, University Librarian, UC Merced
Sarah Pritchard, University Librarian, Northwestern
Paul Soderdahl, Director, LIT, U Iowa
John Wilkin, Executive Director, HathiTrust (ex officio)
Robert Wolven, Columbia University
8
Working Groups Appointed by Strategic Advisory Board and Executive Committee Both operational and strategically-focused groups Collections, Communications, Discovery Interface, Full-text Search, Usability, User Support Now 40+ people across the country Expertise from across the partnership
9
Staff Staff/Expertise – highly integrated – Project managers, IT and communications staff, copyright experts, administrators – Working groups Shared development space
10
HathiTrust Functional Framework
e-Commerce: Print on Demand
Content Ingest: Transformation; Validation
Content Access: PageTurner; Collection Builder; Large-scale Search; Bibliographic Catalog; Research Center; APIs
Quality Assurance: Quality Review; Content Certification
User Services: Usability; User support (helpdesk)
Outreach: Project website; Monthly newsletter; Papers and presentations; Communication with potential partners; Surveys, general inquiries
Repository evaluation and audit (e.g., DRAMBORA, TRAC)
Legal: Risk management (use of materials); Partner agreements; Advocacy
Governance: Budget, Finances; Decision-making; Policy; Planning
Enterprise Management: Communication and Coordination with partner institutions; Project management
Repository Administration: Hardware configuration and maintenance; Web and application server configuration and maintenance; Security; Permissions; Logging
Repository Administration: Data management (content storage, backup, integrity checks, deletion); Hardware selection and replacement; Content and Metadata specifications; Disaster Recovery; Processes for ensuring content integrity
Rights Management: Copyright determination; Copyright review; Copyright information management (database); Rightsholder permissions
Bibliographic Data Management: Entity description (record-level); Object identification (item-level); Data availability
Collection Development: Digital Expansion beyond books and journals (born-digital, images and maps, audio); Selection of content (for non-Google volume ingest and pilot projects); Print Cloud Library (effect of digital on print); Financial contributions of partners
11
What work is there? Usage Reporting Quality Copyright Review Specifications Metadata Development Environment Other?
12
Basic Infrastructure Costs
13
Cost Model 1 Economies of scale keep costs low – $0.149/volume/year for Google-digitized – $0.489/volume/year for IA-digitized – $0.154/volume/year for all content Advantages not fully known until you jump in
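To make the per-volume rates concrete, here is a minimal arithmetic sketch. It assumes that under Cost Model 1 a partner's annual cost is simply the number of volumes it has deposited multiplied by the rate for the digitization source; the slide does not spell this out. The volume counts and the names `RATE_PER_VOLUME` and `annual_cost` are hypothetical, chosen only for illustration.

```python
from decimal import Decimal

# Rates quoted on the slide, in US dollars per volume per year.
RATE_PER_VOLUME = {
    "google": Decimal("0.149"),            # Google-digitized volumes
    "internet_archive": Decimal("0.489"),  # IA-digitized volumes
}


def annual_cost(volume_counts):
    """Return the yearly cost for a mapping of source name -> volume count.

    Assumption: cost scales linearly with deposited volumes.
    """
    total = Decimal("0")
    for source, count in volume_counts.items():
        total += RATE_PER_VOLUME[source] * count
    return total


# Hypothetical partner: 100,000 Google-digitized and 5,000 IA-digitized volumes.
example = annual_cost({"google": 100_000, "internet_archive": 5_000})
print(f"Estimated annual cost: ${example:,.2f}")
```

`Decimal` is used instead of floats so that currency amounts are not affected by binary rounding.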
14
A global change in the library environment June 2010 Median duplication: 31% June 2009 Median duplication: 19% Academic print book collection already substantially duplicated in mass digitized book corpus
15
Digitized Books in Shared Repositories ~75% of mass digitized corpus is backed up in one or more shared print repositories ~3.5M titles ~2.5M
16
Cost Model 2
For public domain volumes: (PD*X*C)/N
For a given in-copyright volume: IC = (C*X)/H
Share in costs of curation
Share in uses of relevant materials
Voice in future directions
Free riders?
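The two Cost Model 2 formulas can be sketched in a few lines of code. The slide does not define its symbols, so the reading used here is an assumption: PD is the number of public domain volumes in the repository, C the cost per volume per year, X an overhead multiplier, N the number of partners, and H the number of partners that hold a given in-copyright volume in print. All function names and example figures are hypothetical.

```python
def public_domain_fee(pd_volumes, overhead, cost_per_volume, partners):
    """(PD * X * C) / N: public domain costs split evenly across partners.

    Assumed symbol meanings: PD = public domain volume count, X = overhead
    multiplier, C = cost per volume per year, N = number of partners.
    """
    if partners <= 0:
        raise ValueError("partners must be positive")
    return pd_volumes * overhead * cost_per_volume / partners


def in_copyright_fee(holder_counts, overhead, cost_per_volume):
    """Sum of (C * X) / H over the in-copyright volumes a partner holds.

    holder_counts has one entry per volume: the assumed number of partners
    (H) that hold that volume in print, including this partner.
    """
    if any(h <= 0 for h in holder_counts):
        raise ValueError("every volume needs at least one holder")
    return sum(cost_per_volume * overhead / h for h in holder_counts)


def partner_fee(pd_volumes, partners, holder_counts, overhead, cost_per_volume):
    """Total yearly fee for one partner: public domain share plus in-copyright share."""
    return (public_domain_fee(pd_volumes, overhead, cost_per_volume, partners)
            + in_copyright_fee(holder_counts, overhead, cost_per_volume))


# Hypothetical inputs: 2,000,000 public domain volumes, 50 partners, and three
# in-copyright volumes held by 1, 2 and 4 partners respectively.
fee = partner_fee(2_000_000, 50, [1, 2, 4], overhead=1.0, cost_per_volume=0.15)
print(f"Estimated annual fee: ${fee:,.2f}")
```

Under this reading, widely held in-copyright volumes cost each holder less, which is why the model depends on the print holdings database described on the following slides.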
17
Sustaining a common resource Costs go down Quality of services increases – Realize in an aggregated collection something you don't get through distributed search or federation
18
Cost Model 2: Timeline & Requirements Timeline: – Implement in 2013 – Accept new partners now with costs based on overlap calculations Requirements: – Print holdings database – Update mechanisms – Manual remediation
19
Print Holdings Database Print holdings database will also benefit – De-duplication Compromises user experience, obscures collection development needs – Management of print volumes Information to withdraw volumes (journals) – Legal uses of copyright materials Section 108, 121, ADA uses will depend on knowledge of which institutions own(ed) which materials
20
Questions?
21
Our HathiTrust Collection
22
Content Distribution
8,234,081 – Total volumes
2,102,033 – Public Domain
4,527,381 – Book titles
202,649 – Serial titles
* As of March 5, 2011
23
Language Distribution (1) * As of March 5, 2011 The top 10 languages make up ~86% of all content
24
Language Distribution (2) The next 40 languages make up ~13% of total * As of March 5, 2011
25
Dates * As of March 5, 2011
26
Originating Institution * As of March 5, 2011
27
Content over time * As of March 5, 2011
28
Content Growth
29
Collection Development and Management Collections Committee Appropriate principles for duplicate volumes Print management proposal Prioritization of collection development activities Process for decision-making and prioritization for new content types Recommendations for tools and services Prioritization of copyright review and rights-clearing processes
30
What about quality? Validation upon ingest Gating on metrics from Google Updated versions from Google Proactive work by Google library partners IMLS grant to develop framework and methodology for validating content in large-scale digital repositories Crowd sourcing in our future?
31
Questions?
32
Perspectives on HathiTrust and public services
33
HathiTrust and Reference HathiTrust: like Google and licensed databases – very large, rich repositories of content, with services supporting their use Reference librarians – are intermediaries between all these resources and researchers who use them
34
HathiTrust as a Reference Source HathiTrust is CONSTANTLY changing Requirement that's not new to reference librarians, but greatly increased: Stay engaged. Read updates. Use it.
35
HathiTrust is DIFFERENT We are THE PRODUCERS of this resource – HathiTrust is OUR COLLECTION – New role - not recipient/grader/purchaser – WE build this resource Close engagement of sort we have not experienced before
36
HathiTrust and Google Books Fact: content in HathiTrust, by the numbers, is currently largely a subset of Google Books That's how we started BUT It's just the start
37
HathiTrust stands on its own - Content HathiTrust content has been curated over time by librarians – Mirrors collections of large research libraries – Focus on quality Expanding Non-Google content – Public Domain: Copyright Review Management System – Content from non-Google sources Internet Archive, image collections, government publications
38
Copyright Review Management System – IMLS Grant awarded to University of Michigan 2008 to determine copyright status of books published in US between 1923 and 1963 – Wisconsin, Minnesota and Indiana each devote 1 FTE to this effort for Phase 3, 2010-2011 – As of March, 2011, over 125,000 volumes reviewed; 54% opened up in HathiTrust
39
HathiTrust stands on its own - Functionality HathiTrust supports scholarship Proper metadata User interface designed for scholarly work Services for people with visual impairments Large-scale text mining
40
HathiTrust stands on its own - Services Collection builder Member services (via Shibboleth logins) – download full PDFs – create permanent collections
41
How do people use HathiTrust? Of course, to read public domain books and journals But much more
42
Use stories I now go to HathiTrust as my first destination for in-depth reference questions. Fantastic searchable corpus; good metadata; content and functionality designed for scholarly needs. Indiana University librarian
43
Use stories (2) Complete Works of Voltaire (52-volume set published in late 19th century) – scholar needed all volumes to do scholarly referencing from home – all in HathiTrust presented together under a single MARC record
44
Use stories (3) Open Folklore – a new way to use HathiTrust – Portal that provides access to open access published and unpublished folklore literature – Indiana University's Folklore Collection first CIC Collection of Distinction in Google – HathiTrust – the corner store in the shopping mall of digital repositories – Anchor for whole set of services and initiatives, including journal liberation projects http://www.openfolklore.org
45
Questions?
46
Leveraging HathiTrust data
47
A bibliographic metadata moment Bib data for each digital volume must be present in HathiTrust in order for volumes to be ingested Depositors make bib data available to UM to be loaded into HathiTrust bibliographic management system Info in the submitted bib records is used to make an initial rights determination about each volume The bib record acts as a manifest for the digital content that is then ingested A snapshot in time of the bib data associated with an object is also stored in the preservation metadata
48
HathiTrust makes our data available Goal is to extend possibilities for development of local services and other uses Bibliographic API Data API OAI feed of public domain Hathifiles 120,000 public domain texts for computational research
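As an illustration of how a local service might start using the Bibliographic API, the sketch below only builds a lookup URL and makes no network call. The base URL, the `brief`/`full` detail levels, and the identifier types are assumptions based on the publicly documented API and should be checked against the current HathiTrust documentation; the function name `bib_api_url` and the sample OCLC number are hypothetical.

```python
from urllib.parse import quote

# Assumed base URL and identifier types; verify against current HathiTrust docs.
BASE_URL = "https://catalog.hathitrust.org/api/volumes"
ID_TYPES = {"oclc", "lccn", "issn", "isbn", "htid", "recordnumber"}
DETAIL_LEVELS = {"brief", "full"}


def bib_api_url(id_type, value, detail="brief"):
    """Build a JSON lookup URL for one identifier (no request is sent)."""
    if id_type not in ID_TYPES:
        raise ValueError(f"unsupported identifier type: {id_type!r}")
    if detail not in DETAIL_LEVELS:
        raise ValueError(f"unsupported detail level: {detail!r}")
    # Percent-encode the identifier so it is safe inside a URL path segment.
    return f"{BASE_URL}/{detail}/{id_type}/{quote(str(value), safe='')}.json"


# Hypothetical lookup by OCLC number.
print(bib_api_url("oclc", "424023"))
```

A catalog or link resolver could request such a URL for each local record and link to any public domain volumes listed in the response.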
49
Some examples of use Catalogs UM loaded every record Chicago links to public domain volumes owned in print OCLC loaded records into WorldCat Link resolvers UC created SFX target Vendors H.W. Wilson databases linked to public domain volumes Needed: A guide with examples of how partners have used the data!
50
Future Directions (1) Locally-digitized partner content Usage reporting Coordinate digital and print resources (holdings database) Computational Research Quality Strategies for openness Collaborative Development Extending Services through Shibboleth Non-book, non-journal content
51
Future Directions (2) Born-digital content (Publishing) New Bibliographic Management Compliance with TRAC Grant projects OCLC Catalog 3-year review Improvements to Large-scale Search Improvements to PageTurner Ingest Reporting
52
How can HathiTrust make a difference? Digital Curation – Drive costs down – Reduce bibliographic indeterminacy – Make meaningful decisions about formats and quality – Increase discoverability – Consolidate development talent – Improve strength of archiving Print Curation – Means to associate our print holdings – Coordinated record-keeping Subsidiary benefits – Quantify problems – Collective attention to solving shared problems
53
How to find out more
Web site About section: http://www.hathitrust.org/about
Twitter: http://twitter.com/hathitrust
RSS: http://www.hathitrust.org/updates_rss
Monthly newsletter: http://www.hathitrust.org/updates
Contact us: hathitrust-info@umich.edu
Soon: Facebook, blog