HathiTrust Digital Library Interface and Services Angelina Zaytsev Collection Services Librarian azaytsev@hathitrust.org
Agenda
  Collection overview
  Interface overview
  Other services
  If time permits:
    Programs
    Working groups & committees
    Governance & partnership
Collection Overview
HathiTrust Collections: Oct 2016
  14.7 million total items
  7.4 million book titles
  405,000 serial titles
  767,000 US federal government documents
  5.7 million items open (public domain & CC licenses)
These volumes have been contributed by over 40 different institutions, primarily located in North America.
6 April 2016
For more information... You can click through to see the results for all of these categories! https://www.hathitrust.org/statistics_visualizations
What kind of content formats?
  Scanned from book-like materials
    Image formats: TIFFs and JPEG2000s
    Plain-text OCR
    PDFs are generated on the fly and delivered to users (NOT stored in the repository)
  Some:
    Born-digital PDFs (and maybe EPUBs soon!)
    Photos
Where does content come from? - Digitization source
Type                                  Characteristics
Google                                94.8% of the collection
                                      Download restrictions
                                      Primarily scanned in black and white with some color pages
                                      Large-scale mass digitization = quality can vary
Internet Archive                      3.7% of the collection
                                      No download restrictions
                                      Scanned in full color (as a result, file sizes are 2.5 times larger than Google content)
Locally digitized & vendor services   1.48% of the collection
                                      Various restrictions may apply
                                      Typically small scale, "boutique" digitization = high quality (with some exceptions)
Where does content come from? - Top 10 contributing libraries
Institution                                   Volumes
University of Michigan                        4,714,231
University of California                      3,835,563
Harvard University                            841,969
Cornell University                            585,190
University of Wisconsin - Madison             561,945
Indiana University                            530,763
University of Illinois at Urbana-Champaign    528,545
University of Minnesota                       503,057
The University of Texas                       460,139
Pennsylvania State University                 390,345
Special Collections
  Universidad Complutense de Madrid (UCM): Latin, Spanish and French documents from the 1500s-1800s
  Keio University: 92,000+ Japanese and some Chinese language materials
  Islamic Manuscripts from University of Michigan: 8th-20th century CE manuscripts, 1,795 titles in Arabic, Persian, and Ottoman Turkish, collaborative cataloging project
  Benson Latin American Collection, University of Texas at Austin: 460,000 volumes related to Latin American culture and history
  Minnesota Digital Library & Minnesota Historical Society: 60,000 photos related to Minnesota history
  US Fed Gov Docs: 766,000+ documents and growing!
Notes: UCM is one of the oldest universities in the world, around since 1293. Keio is the oldest university in Japan.
Access is determined by several factors:
  Copyright status of the item
    Derived from:
      Bibliographic metadata (including for US fed gov docs)
      Manual copyright review
      Permissions agreements
  Geographic location of the user
    In the United States vs. outside the United States
  Member affiliation
    Yes/no?
  Digitization source and/or contributing institution
    Any restrictions imposed by these entities?
Type of work | Search (bibliographic and full-text) | Text and Data Mining | Viewable* | Full-PDF download | Print disabilities* | Preservation uses (Section 108)*
Public domain worldwide: Worldwide; Partners only if 3rd-party restrictions, if not, worldwide; N/A
Public domain (US) – Non-US works published between 1873 and 1923. Available within the United States; When accessed from within the United States; Partners in the US if 3rd-party restrictions, if not, anyone in the US; Partners in the US, partners worldwide where laws permit
Works that rights holders have opened access to in HathiTrust: Worldwide unless license forbids it; Worldwide (if digitized by Google, full-PDF only available if opened with CC license)
Works that are in-copyright or of undetermined status: Forthcoming; Not available
* Note: Access to in-copyright works is subject to conditions listed in HathiTrust's policies on Access and Use.
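As a rough illustration of how copyright status and user location combine for viewing, here is a minimal Python sketch of one possible reading of the "Viewable" column. The `Rights` categories and the `is_viewable` function are hypothetical names invented for this example. It deliberately ignores member affiliation, digitization-source restrictions, license terms, and the print-disability and Section 108 cases, all of which affect real access decisions under HathiTrust's Access and Use policies.

```python
from enum import Enum


class Rights(Enum):
    """Hypothetical labels for the four 'Type of work' rows."""
    PD_WORLDWIDE = "public domain worldwide"
    PD_US_ONLY = "public domain (US)"
    OPENED_BY_RIGHTS_HOLDER = "opened by rights holder"
    IN_COPYRIGHT_OR_UNDETERMINED = "in-copyright or undetermined"


def is_viewable(rights: Rights, user_in_us: bool) -> bool:
    """Simplified reading of the 'Viewable' column (an assumption, not policy)."""
    if rights in (Rights.PD_WORLDWIDE, Rights.OPENED_BY_RIGHTS_HOLDER):
        # Treated here as viewable from anywhere.
        return True
    if rights is Rights.PD_US_ONLY:
        # Viewable only when accessed from within the United States.
        return user_in_us
    # In-copyright or undetermined status: no full view in this sketch.
    return False


if __name__ == "__main__":
    for rights in Rights:
        print(rights.value, "| in US:", is_viewable(rights, True),
              "| outside US:", is_viewable(rights, False))
```

The point of the sketch is only that the same volume can be viewable for one user and not for another, depending on where the request comes from.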
Interface Overview
Full-text search Catalog search Pageturner Collection Builder
Other services
  Get bib records in bulk: https://www.hathitrust.org/data
  Get datasets: https://www.hathitrust.org/datasets
  Get high resolution image files: https://www.hathitrust.org/data_api
  Get bib records for known identifiers: https://www.hathitrust.org/bib_api
  Get some data about all HT content: https://www.hathitrust.org/hathifiles
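To make "bib records for known identifiers" concrete, here is a minimal Python sketch of a Bib API lookup. The endpoint pattern (`https://catalog.hathitrust.org/api/volumes/...`) and the response field names (`items`, `htid`, `usRightsString`) are assumptions recalled from the public Bib API and should be checked against https://www.hathitrust.org/bib_api before use. The helper names and the sample record are made up for illustration, and no network request is made.

```python
# Assumed base URL for single-identifier Bib API lookups; verify against the docs.
BIB_API_BASE = "https://catalog.hathitrust.org/api/volumes"


def bib_api_url(id_type: str, identifier: str, detail: str = "brief") -> str:
    """Build a lookup URL for one identifier (for example an OCLC number)."""
    return f"{BIB_API_BASE}/{detail}/{id_type}/{identifier}.json"


def summarize_items(response: dict) -> list:
    """Return (volume id, US rights string) pairs from a parsed JSON response."""
    return [
        (item.get("htid", ""), item.get("usRightsString", ""))
        for item in response.get("items", [])
    ]


# Made-up response in the assumed shape, standing in for a real API reply.
sample_response = {
    "records": {},
    "items": [
        {"htid": "mdp.39015012345678", "usRightsString": "Full view"},
    ],
}

if __name__ == "__main__":
    print(bib_api_url("oclc", "424023"))
    print(summarize_items(sample_response))
```

In practice the URL would be fetched with an HTTP client and the JSON body passed to `summarize_items`. For many identifiers at once, the bulk bib records or the hathifiles listed above are likely a better fit.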
For help See https://www.hathitrust.org/help Contact feedback@issues.hathitrust.org
Questions?
BUT... HathiTrust is more than just a library!
HathiTrust Research Center
  Goal: to build a secure environment and provide services to support data mining and text analysis
  Portal: https://analytics.hathitrust.org/
  Soon: analysis against the copyrighted content in HT
  Advanced Collaborative Support (ACS): mini-grants where awardees get staff time, not money
  HathiTrust+Bookworm: visualize word trends
  Extracted Features dataset: data derived from the content of the volumes
US Federal Documents Program
  Build a Registry of all known US fed gov docs
  Collect a complete corpus of all known US fed gov docs
  https://www.hathitrust.org/usgovdocs
Shared Print Program
  Build a shared print monograph program across the membership in order to reduce collective costs of maintaining print collections
  Goal: secure retention commitments for all monographs in HathiTrust
  https://www.hathitrust.org/shared_print_program
Copyright Review Management System
  Volunteers from HT member libraries undertake manual copyright review of certain categories of materials
  To date, has focused on the following categories:
    Monographs published in Australia, the United Kingdom and Canada from 1876-1945
    Monographs published in the United States from 1923-1977
  https://www.hathitrust.org/copyright-review
Members participate in other groups and committees
  User Support Working Group: https://www.hathitrust.org/wg_user-support_charge
  Collections Committee: https://www.hathitrust.org/collections-committee-charge
  Metadata Policy, Strategy, Use and Sharing Advisory Group (MUSAG): https://www.hathitrust.org/wg_musag_charge
  HathiTrust Quality Assurance and Standards Working Group: https://www.hathitrust.org/qaswg_charge