HathiTrust: A Shared Digital Repository. Unpacking HathiTrust's New Cost Model. Jeremy York, Project Librarian, HathiTrust. SUNY, July 15, 2011.
About
Partnership: Arizona State University, Boston University, Baylor University, California Digital Library, Columbia University, Cornell University, Dartmouth College, Duke University, Emory University, Harvard University Library, Indiana University, Johns Hopkins University, Lafayette College, Library of Congress, Massachusetts Institute of Technology, Michigan State University, New York University, New York Public Library, North Carolina Central University, North Carolina State University, Northwestern University, The Ohio State University, The Pennsylvania State University, Princeton University, Purdue University, Stanford University, Texas A&M University, Universidad Complutense de Madrid, University of California (Berkeley, Davis, Irvine, Los Angeles, Merced, Riverside, San Diego, San Francisco, Santa Barbara, Santa Cruz), The University of Chicago, University of Florida, University of Illinois, University of Illinois at Chicago, The University of Iowa, University of Maryland, University of Michigan, University of Minnesota, The University of North Carolina at Chapel Hill, University of Notre Dame, University of Pennsylvania, University of Pittsburgh, University of Utah, University of Virginia, University of Washington, University of Wisconsin-Madison, Utah State University, Yale University Library.
Digital Repository: Launched 2008. Initial focus on digitized book and journal content. Light archive: as accessible as possible within the bounds of law.
Statistics: 8,980,200 volumes; 4,679,248 book titles; 214,155 serial titles; 2,450,522 public domain volumes.
The Name: The meaning behind the name. Hathi (hah-tee), Hindi for elephant: big, strong; never forgets, wise; secure; trustworthy.
Mission: To contribute to the common good by collecting, organizing, preserving, communicating, and sharing the record of human knowledge.
Goals: Comprehensive collection; preservation…with access; shared strategies (collection management and development, preservation, copyright, efficient user services); openness.
Governance
HathiTrust governance structure: the Executive Committee (budget/finances, decision-making) and the Strategic Advisory Board (guidance on policy, planning).
Executive Committee: Paul Courant, University Librarian and Dean of Libraries, UM; Laine Farley, Executive Director, CDL; John King, Vice Provost for Academic Information, UM; Paula Kaufman, University Librarian and Dean of Libraries, UI; Brian Schottlaender, University Librarian, UCSD; Ed Van Gemert, Deputy Director of Libraries, UW-Madison (ex officio); Brenda Johnson, Dean of Libraries, IU; Brad Wheeler, Chief Information Officer, IU; John Wilkin, Executive Director of HathiTrust and Associate University Librarian, LIT, UM.
Strategic Advisory Board: Ed Van Gemert (Chair), Deputy Director of Libraries, University of Wisconsin-Madison; John Butler, AUL for Information Technology, University of Minnesota; Patricia Cruse, Director, Preservation, CDL; Todd Grappone, AUL for Digital Initiatives & IT, UCLA; Julia Kochi, Director, Digital Library and Collections, UC San Francisco; Sarah Pritchard, University Librarian, Northwestern University; Paul Soderdahl, Director, LIT, University of Iowa; John Wilkin, Executive Director, HathiTrust (ex officio); Robert Wolven, Columbia University.
Constitutional Convention: October 2011. Delegates from each institution and consortium, each carrying a number of votes determined by a formula approved by the Executive Committee. 3-year review. Proposals: print management; ballot proposals.
Partnership
Who can become a partner? Institutions worldwide; libraries with print holdings.
What are the benefits? (1) Cost-effective long-term preservation and access services for digitized content. Commitments on digital content facilitate decisions about digitization efforts and print collection management. For those with content: immediate long-term preservation, bibliographic and full-text search, and collection building. With content or not: full viewing and downloading capabilities for public domain materials and for materials for which we have received permissions.
What are the benefits? (2) Specialized access to public domain and in-copyright materials for users with print disabilities. Other lawful uses of in-copyright materials, such as Section 108 uses (print replacement copies, digital access to applicable works) and access to orphan works. HathiTrust encourages participation in initiatives and resources geared toward shared collection development and management (e.g., copyright review work, print holdings database, de-duplication, collaboration with other organizations and initiatives), participation in governance and collaborative initiatives, and defining future directions of the shared library.
What's involved? A contract (Sustaining or Content-Contributing); yearly fees; a commitment in 5-year periods; Shibboleth; print holdings.
Costs: Base funding from partner institutions; basic infrastructure costs; commitments in 5-year periods.
How much does it cost? (1)
How much does it cost? (2) $0.149 per volume per year for Google-digitized content; $0.489 per volume per year for IA-digitized (Internet Archive) content; $0.154 per volume per year for all content; $3.40 per GB.
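The per-volume rates above turn into an annual infrastructure cost by simple multiplication. A minimal sketch, assuming each rate scales linearly with the number of volumes; the dictionary keys and the function name are hypothetical labels, not part of any HathiTrust system.

```python
# Annual infrastructure rates quoted on the slide (USD per volume per year).
# The keys are hypothetical labels chosen for this sketch.
RATES_PER_VOLUME = {
    "google": 0.149,  # Google-digitized volumes
    "ia": 0.489,      # Internet Archive-digitized volumes
    "all": 0.154,     # rate quoted for all content
}


def annual_infrastructure_cost(volumes: int, source: str = "all") -> float:
    """Return the yearly cost in USD for `volumes` volumes at the chosen rate.

    Assumes the cost scales linearly with the volume count.
    """
    if volumes < 0:
        raise ValueError("volume count must be non-negative")
    return volumes * RATES_PER_VOLUME[source]
```

For instance, 1,000,000 Google-digitized volumes come to about $149,000 per year at the quoted rate.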
HathiTrust Functional Framework
e-Commerce: Print on Demand
Content Ingest: Transformation; Validation
Content Access: PageTurner; Collection Builder; Large-scale Search; Bibliographic Catalog; Research Center; APIs
Quality Assurance: Quality Review; Content Certification
User Services: Usability; User support (helpdesk)
Outreach: Project website; Monthly newsletter; Papers and presentations; Communication with potential partners; Surveys, general inquiries; Repository evaluation and audit (e.g., DRAMBORA, TRAC)
Legal: Risk management (use of materials); Partner agreements; Advocacy
Governance: Budget, finances; Decision-making; Policy; Planning
Enterprise Management: Communication and coordination with partner institutions; Project management
Repository Administration: Hardware configuration and maintenance; Web and application server configuration and maintenance; Security; Permissions; Logging; Data management (content storage, backup, integrity checks, deletion); Hardware selection and replacement; Content and metadata specifications; Disaster recovery; Processes for ensuring content integrity
Rights Management: Copyright determination; Copyright review; Copyright information management (database); Rightsholder permissions
Bibliographic Data Management: Entity description (record-level); Object identification (item-level); Data availability
Collection Development: Digital (expansion beyond books and journals: born-digital, images and maps, audio; selection of content for non-Google volume ingest and pilot projects); Print (Cloud Library: effect of digital on print); Financial contributions of partners
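How does it work? (1) Sustaining membership is the base. Pricing model for all partners beginning 2013, based on the overlap of HathiTrust volumes with institutions' print holdings. Partners share in infrastructure costs for public domain volumes: (PD*C*X)/N. Partners share in infrastructure costs for in-copyright volumes based on holdings; for a given in-copyright volume: IC = (C*X)/H. Here PD is the number of public domain volumes, C the cost per volume per year, X the multiplier, N the number of partners, and H the number of partners holding that volume.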
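The two sharing rules can be sketched as functions. This assumes the symbol meanings implied by the worked example later in the deck (PD, C, X, N, H as defined above) and that a partner's in-copyright share is the sum of (C*X)/H over every in-copyright volume it holds. The function names and the input shape (one holder count per held volume) are hypothetical.

```python
from typing import Iterable


def pd_share(pd_volumes: int, cost_per_volume: float,
             multiplier: float, num_partners: int) -> float:
    """Partner's share of public domain infrastructure cost: (PD*C*X)/N."""
    return pd_volumes * cost_per_volume * multiplier / num_partners


def ic_share(holder_counts: Iterable[int], cost_per_volume: float,
             multiplier: float) -> float:
    """Partner's share of in-copyright infrastructure cost.

    Sums (C*X)/H over the in-copyright volumes this partner holds, where
    each entry of `holder_counts` is H for one volume (this partner included).
    """
    total = 0.0
    for holders in holder_counts:
        if holders < 1:
            raise ValueError("a held volume must have at least one holder")
        total += cost_per_volume * multiplier / holders
    return total
```

Under this reading, a sole holder (H = 1) carries the full C*X for that volume, while widely held volumes are split thinly; a partner holding three volumes with 1, 2, and 4 holders pays C*X*(1 + 1/2 + 1/4).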
How does it work? (2) The main factors in costs are the amount of content, the number of partners, and a flexible multiplier (X) designed to pay for programmatic activities. Because shared costs are divided among more partners as the partnership grows, these factors tend to result in lower costs and more benefits over time.
Example. Factors: 1,000,000 PD volumes; 3,000,000 IC volumes; $0.154 per volume; multiplier of 1.5; 60 partners; assume that on average 12 institutions hold each IC volume. Costs: PD = (1,000,000 * 0.154 * 1.5) / 60 = $3,850; IC = (3,000,000 * 0.154 * 1.5) / 12 = $57,750; Total = $61,600.
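The example's arithmetic can be reproduced directly. For the IC line to agree with IC = (C*X)/H per volume, the example implicitly treats the partner as holding all 3,000,000 in-copyright volumes, each shared by 12 holders on average; the variable names below are hypothetical.

```python
# Inputs from the example slide.
pd_volumes = 1_000_000
ic_volumes = 3_000_000   # implicitly all held by this partner
cost_per_volume = 0.154  # C, USD per volume per year
multiplier = 1.5         # X
partners = 60            # N
avg_holders = 12         # H, average holders per in-copyright volume

pd_cost = pd_volumes * cost_per_volume * multiplier / partners
ic_cost = ic_volumes * cost_per_volume * multiplier / avg_holders
total = pd_cost + ic_cost

print(f"PD = ${pd_cost:,.0f}")   # PD = $3,850
print(f"IC = ${ic_cost:,.0f}")   # IC = $57,750
print(f"Total = ${total:,.0f}")  # Total = $61,600
```

Because N divides the public domain term, growth lowers each partner's share: with 90 partners instead of 60, the same PD term would fall to about $2,567.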
How does it work? (3) To support these calculations, we need a print holdings database (2013), update mechanisms, and manual remediation. The analysis will also support expanded legal uses of materials (for users who have print disabilities and for orphan works), facilitate collaborative collection development and management operations, and benefit de-duplication efforts.
Print Holdings Database: Volumes institutions own or have owned. Only print volumes (not microform, etc.). Fields: OCLC number [required]; bib record ID [required]; condition (e.g., brittle) [optional]; holding status (e.g., current holding, withdrawn, missing) [optional].
Percent Overlap: average = 37.4%.
Questions: Why not get the information from OCLC? Is it necessary to declare all volumes held, or could an institution choose not to declare some? Are the print holdings data currently provided by institutions taken as an indication of the volumes institutions are declaring they have access to?
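A record in the print holdings database might be modeled as below. This is a sketch only: the field names, the use of strings for identifiers, and the validation rule (reject records missing either required identifier) are assumptions; the slide specifies only which fields are required and which are optional, not a file format or schema.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class PrintHolding:
    """One print volume an institution owns or has owned (hypothetical schema)."""

    oclc_number: str                      # required
    bib_record_id: str                    # required (local bibliographic record ID)
    condition: Optional[str] = None       # optional, e.g. "brittle"
    holding_status: Optional[str] = None  # optional, e.g. "current", "withdrawn", "missing"

    def __post_init__(self) -> None:
        # Reject records that lack either required identifier.
        if not self.oclc_number.strip():
            raise ValueError("OCLC number is required")
        if not self.bib_record_id.strip():
            raise ValueError("bib record ID is required")
```

Keeping the optional fields as `None` by default lets an institution submit the two required identifiers first and add condition or status data later.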
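Percent overlap can be read as the share of an institution's print holdings that match HathiTrust volumes. A minimal sketch, assuming matching on OCLC numbers and measuring against the institution's own holdings; the slide does not say which denominator or matching rule produced the 37.4% average. Comparing sets of OCLC numbers also counts distinct titles rather than individual volumes of multi-volume sets, which is a simplification. The function names are hypothetical.

```python
from statistics import mean


def percent_overlap(institution_oclc: set[str], hathitrust_oclc: set[str]) -> float:
    """Percent of an institution's OCLC numbers that also appear in HathiTrust."""
    if not institution_oclc:
        return 0.0
    matched = institution_oclc & hathitrust_oclc
    return 100.0 * len(matched) / len(institution_oclc)


def average_overlap(holdings_by_institution: dict[str, set[str]],
                    hathitrust_oclc: set[str]) -> float:
    """Unweighted mean of per-institution percent overlap."""
    return mean(percent_overlap(oclcs, hathitrust_oclc)
                for oclcs in holdings_by_institution.values())
```

An unweighted mean treats small and large collections equally; weighting by collection size would give a different average.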
What are we doing currently? Basing yearly fees on estimates, using the infrastructure costs of anticipated content, estimated partnership growth, and institutions' total volume counts.
SUNY Costs: SUNY University Centers (Albany, Binghamton, Buffalo, Stony Brook, plus Upstate and Downstate Medical libraries): 11,049,952 volumes. All SUNY (based on 16,000,000 titles): 27 institutions total; 20,800,000 volumes.
SUNY costs (2): Estimate using 9,500,000 volumes at the end of 2011; 60 partners (for University Centers and Medical libraries); 87 partners (for all SUNY libraries); multiplier of 1.5.
SUNY costs (3): University Centers
– Public domain: (total PD cost * 1.5 / #partners) * 6 = $70,
– In copyright: % of holdings (partner holdings / total holdings) * total IC cost * 1.5 = $67,
– Total = $138,
– Prorated from August 1 = $58,072.21
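The prorated figure here appears consistent with charging for the 153 days from August 1 through December 31, 2011, out of a 365-day year; a 5/12 monthly proration would not match the total shown. The slides do not state the method, so the day-based rule below is an inference, and the function name is hypothetical.

```python
from datetime import date


def prorate_from(annual_fee: float, start: date) -> float:
    """Prorate an annual fee for the days from `start` through Dec 31 (inclusive).

    Assumed day-based rule: fee * days covered / days in that calendar year.
    """
    year_end = date(start.year, 12, 31)
    days_covered = (year_end - start).days + 1
    days_in_year = (year_end - date(start.year, 1, 1)).days + 1
    return annual_fee * days_covered / days_in_year
```

For example, a hypothetical $100,000 annual fee starting August 1, 2011 prorates to about $41,918 (153/365 of the year).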
SUNY costs (4): All SUNY
– Public domain: (total PD cost * 1.5 / 87) * 27 = $220,
– In copyright: % of holdings (partner holdings / total holdings) * total IC cost * 1.5 = $127,
– Total = $347,
– Prorated from August 1 = $145,556.69
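Both SUNY estimates follow the same two-part pattern: the public domain term is the per-partner share of the multiplied PD cost times the number of member libraries in the group, and the in-copyright term scales the multiplied IC total by the group's fraction of all holdings. A sketch of that pattern; the function and parameter names are hypothetical, and the totals the slides leave unstated (total PD cost, total IC cost, total holdings) stay as parameters rather than being filled in.

```python
def consortium_estimate(total_pd_cost: float, total_ic_cost: float,
                        num_partners: int, members: int,
                        group_holdings: int, total_holdings: int,
                        multiplier: float = 1.5) -> dict[str, float]:
    """Interim yearly fee estimate (USD) for a group of libraries."""
    # Public domain: per-partner share of the multiplied PD cost, times members.
    pd = total_pd_cost * multiplier / num_partners * members
    # In copyright: group's fraction of all holdings times the multiplied IC cost.
    ic = (group_holdings / total_holdings) * total_ic_cost * multiplier
    return {"public_domain": pd, "in_copyright": ic, "total": pd + ic}
```

For the University Centers the slides use num_partners=60, members=6, and group_holdings=11,049,952; for all SUNY, num_partners=87, members=27, and group_holdings=20,800,000.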
Sustaining vs. Content-Contributing: Sustaining membership does not exclude contributing content. If a partner contributes content, its costs are covered up to the amount it would pay as a Sustaining partner, barring additional costs that might be needed to accommodate the content (e.g., specialized load routines, generation of OCR). Above that amount, the partner pays the per-GB cost ($3.40).
Summary: Partners share in the costs of sustaining a common resource, share in uses of relevant materials, and have a voice in future directions. Costs to institutions go down and quality of services increases: an aggregated collection offers something we don't get through distributed search or federation. Free riders?
Changing Library Landscape: The landscape is changing rapidly. Libraries are making these decisions, but they are more and more collective decisions. We can no longer afford to do work separately that could be done collaboratively.
HathiTrust overall benefits to libraries. Digital curation: drive costs down; reduce bibliographic indeterminacy; make meaningful decisions about formats and quality; increase discoverability and use; consolidate development talent; improve strength of archiving. Print curation: a means to associate our print holdings; coordinated record-keeping. Subsidiary benefits: quantify problems; collective attention to solving shared problems.
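One possible reading of the Content-Contributing rule, sketched below: the per-GB cost of contributed content is absorbed up to the amount of the Sustaining fee, only the excess is billed at $3.40 per GB, and any accommodation costs (load routines, OCR) are billed separately. This interpretation, the function name, and the `accommodation_costs` parameter are assumptions; the slide gives no formula.

```python
PER_GB_RATE = 3.40  # USD per GB, from the slide


def content_contributor_extra(content_gb: float, sustaining_fee: float,
                              accommodation_costs: float = 0.0) -> float:
    """Additional charge on top of the Sustaining fee (assumed interpretation).

    accommodation_costs: hypothetical input for extra work such as
    specialized load routines or OCR generation, not covered by the fee.
    """
    content_cost = content_gb * PER_GB_RATE
    # Only the portion of the per-GB cost above the Sustaining fee is billed.
    excess = max(0.0, content_cost - sustaining_fee)
    return excess + accommodation_costs
```

Under this reading, a contributor whose content costs stay below its Sustaining fee pays nothing extra beyond any accommodation work.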
How to find out more Web site About section: Twitter: Monthly newsletter: RSS: Contact us:
Thank you very much!