Published by Janice Garrett. Modified over 7 years ago.
1
HathiTrust Digital Library Interface and Services
Angelina Zaytsev, Collection Services Librarian
2
Agenda: Collection overview; Interface overview; Other services
If time permits: Programs; Working groups & committees; Governance & partnership
3
Collection Overview
4
HathiTrust Collections: Oct 2016
14.7 million total items
7.4 million book titles
405,000 serial titles
767,000 US federal government documents
5.7 million items open (public domain & CC licenses)
These volumes have been contributed by over 40 different institutions and come primarily from institutions located in North America.
6 April 2016
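The slide gives a total and an "open" count but not the open share. A quick sketch of that arithmetic, using only the rounded figures above (so the result is approximate):

```python
# Rounded figures from the "HathiTrust Collections: Oct 2016" slide.
TOTAL_ITEMS = 14_700_000
OPEN_ITEMS = 5_700_000  # public domain & CC-licensed items


def percent(part: int, whole: int) -> float:
    """Return `part` as a percentage of `whole`, rounded to one decimal place."""
    return round(100 * part / whole, 1)


open_share = percent(OPEN_ITEMS, TOTAL_ITEMS)
remaining_items = TOTAL_ITEMS - OPEN_ITEMS

print(f"Open: {open_share}% ({OPEN_ITEMS:,} of {TOTAL_ITEMS:,} items)")
print(f"Remaining items: {remaining_items:,}")
```

Roughly 39% of items were open, leaving about 9 million items with some access restriction.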
8
For more information... You can click through to see the results for all of these categories!
9
What kinds of content formats?
Scanned from book-like materials: image files (TIFFs and JPEG2000s) and plain-text OCR
PDFs are generated on the fly and delivered to users (NOT stored in the repository)
Some born-digital PDFs (and maybe EPUBs soon!)
Photos
10
Where does content come from? - Digitization source
Google: 94.8% of the collection; download restrictions; primarily scanned in black and white with some color pages; large-scale mass digitization, so quality can vary.
Internet Archive: 3.7% of the collection; no download restrictions; scanned in full color (as a result, file sizes are 2.5 times larger than Google content).
Locally digitized & vendor services: 1.48% of the collection; various restrictions may apply; typically small-scale, “boutique” digitization, so quality is high (with some exceptions).
11
Where does content come from? - Top 10 contributing libraries
Institution: volumes
University of Michigan: 4,714,231
University of California: 3,835,563
Harvard University: 841,969
Cornell University: 585,190
University of Wisconsin - Madison: 561,945
Indiana University: 530,763
University of Illinois at Urbana-Champaign: 528,545
University of Minnesota: 503,057
The University of Texas: 460,139
Pennsylvania State University: 390,345
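The table lists counts but no totals. This sketch sums the ten figures and gives each library's share of that top-ten total (not of the whole collection, since the slide does not say when these counts were taken):

```python
# Volume counts copied from the "Top 10 contributing libraries" slide.
TOP_10 = {
    "University of Michigan": 4_714_231,
    "University of California": 3_835_563,
    "Harvard University": 841_969,
    "Cornell University": 585_190,
    "University of Wisconsin - Madison": 561_945,
    "Indiana University": 530_763,
    "University of Illinois at Urbana-Champaign": 528_545,
    "University of Minnesota": 503_057,
    "The University of Texas": 460_139,
    "Pennsylvania State University": 390_345,
}

total = sum(TOP_10.values())

# Each library's share of the top-ten total (not of the whole collection).
for name, volumes in TOP_10.items():
    print(f"{name:<44}{volumes:>12,}{100 * volumes / total:>7.1f}%")
print(f"{'Top-10 total':<44}{total:>12,}")
```

The ten libraries account for 12,951,747 volumes, and Michigan and California together supply about two-thirds of that.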
12
Special Collections
Universidad Complutense de Madrid: Latin, Spanish and French documents from the …s
Keio University: 92,000+ Japanese and some Chinese language materials
Islamic Manuscripts from the University of Michigan: 8th-20th century CE manuscripts, 1,795 titles in Arabic, Persian, and Ottoman Turkish; collaborative cataloging project
Benson Latin American Collection, University of Texas at Austin: 460,000 volumes related to Latin American culture and history
Minnesota Digital Library & Minnesota Historical Society: 60,000 photos related to Minnesota history
US Fed Gov Docs: 766,000+ documents and growing!
Notes: UCM is one of the oldest universities in the world, around since 1293. Keio is the oldest university in Japan.
13
Access is determined by several factors:
Copyright status of the item, derived from: bibliographic metadata (including for US federal government documents); manual copyright review; permissions agreements
Geographic location of the user: in the United States vs. outside the United States
Member affiliation: yes/no?
Digitization source and/or contributing institution: any restrictions imposed by these entities?
14
Access by type of work. Uses covered: search (bibliographic and full-text); text and data mining; viewable*; full-PDF download; print disabilities*; preservation uses (Section 108)*.
Public domain worldwide: available worldwide. Full-PDF download: partners only if there are 3rd-party restrictions; if not, worldwide. Print disabilities and preservation uses: N/A.
Public domain (US), i.e. non-US works published between 1873 and 1923: available within the United States, when accessed from within the United States. Full-PDF download: partners in the US if there are 3rd-party restrictions; if not, anyone in the US.
Works that rights holders have opened access to in HathiTrust: worldwide unless the license forbids it. Full-PDF download: worldwide (if digitized by Google, full-PDF is only available if opened with a CC license).
Works that are in copyright or of undetermined status: text and data mining forthcoming; viewing and full-PDF download not available.
Print disabilities* and preservation uses (Section 108)* for works other than those in the public domain worldwide: partners in the US; partners worldwide where laws permit.
* Note: Access to in-copyright works is subject to conditions listed in HathiTrust’s policies on Access and Use.
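The factors and table above amount to a small decision procedure. Here is a minimal sketch of the "viewable" and "full-PDF download" rules, based on one reading of the flattened slide table. The `Rights` labels and every `Request` field are hypothetical. This is not HathiTrust's actual rights system, and it ignores search, text and data mining, print-disability and Section 108 uses.

```python
from dataclasses import dataclass
from enum import Enum


class Rights(Enum):
    """Hypothetical labels for the four "type of work" rows of the access table."""
    PD_WORLDWIDE = "public domain worldwide"
    PD_US = "public domain (US only)"
    OPENED_BY_RIGHTS_HOLDER = "opened by rights holder"
    IN_COPYRIGHT_OR_UNDETERMINED = "in copyright or undetermined"


@dataclass(frozen=True)
class Request:
    """One access request; every field name here is illustrative."""
    rights: Rights
    user_in_us: bool
    is_partner: bool = False                # member affiliation
    third_party_restrictions: bool = False  # e.g. imposed by the digitizer
    digitized_by_google: bool = False
    cc_licensed: bool = False
    license_forbids_viewing: bool = False


def can_view(req: Request) -> bool:
    """Full view, ignoring print-disability and Section 108 uses."""
    if req.rights is Rights.PD_WORLDWIDE:
        return True
    if req.rights is Rights.PD_US:
        return req.user_in_us
    if req.rights is Rights.OPENED_BY_RIGHTS_HOLDER:
        return not req.license_forbids_viewing
    return False  # in copyright or undetermined: not available


def can_download_full_pdf(req: Request) -> bool:
    """Full-PDF download, following the table's full-PDF rules."""
    if req.rights is Rights.PD_WORLDWIDE:
        # Partners only if there are 3rd-party restrictions; otherwise anyone.
        return req.is_partner or not req.third_party_restrictions
    if req.rights is Rights.PD_US:
        # Same rule, but only for users in the US.
        return req.user_in_us and (req.is_partner or not req.third_party_restrictions)
    if req.rights is Rights.OPENED_BY_RIGHTS_HOLDER:
        # Google-digitized volumes: only if opened with a CC license.
        return req.cc_licensed or not req.digitized_by_google
    return False


if __name__ == "__main__":
    abroad = Request(Rights.PD_US, user_in_us=False)
    print(can_view(abroad), can_download_full_pdf(abroad))  # False False
```

Keeping each factor as an explicit field makes it easy to see which of the four factors each rule actually depends on.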
15
Interface Overview
16
Full-text search; Catalog search; Pageturner; Collection Builder
17
Other services: get bib records in bulk; get datasets; get high-resolution image files; get bib records for known identifiers; get some data about all HT content
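Two of these services lend themselves to scripting. The sketch below (a) tallies a bulk metadata file by access and rights values and (b) builds a lookup URL for a known identifier. Both rest on assumptions that should be checked against HathiTrust's current documentation. The assumed file layout is tab-delimited with no header, with htid, access and rights as the first three columns. The assumed URL pattern is the Bib API "brief" pattern shown. The helper names are hypothetical, and no network request is made.

```python
import csv
import io
from collections import Counter
from urllib.parse import quote

# Assumption: a hathifile is tab-delimited text with no header row, whose first
# three columns are the volume ID (htid), access ("allow"/"deny") and rights code.
ACCESS_COL, RIGHTS_COL = 1, 2


def summarize_hathifile(lines):
    """Count volumes by access value and by rights code."""
    access, rights = Counter(), Counter()
    for row in csv.reader(lines, delimiter="\t", quoting=csv.QUOTE_NONE):
        if len(row) <= RIGHTS_COL:
            continue  # skip blank or truncated lines
        access[row[ACCESS_COL]] += 1
        rights[row[RIGHTS_COL]] += 1
    return access, rights


# Assumption: URL pattern of the Bib API "brief" lookup; check the current docs.
BIB_API = "https://catalog.hathitrust.org/api/volumes/brief/{id_type}/{value}.json"
ID_TYPES = {"oclc", "lccn", "issn", "isbn", "htid", "recordnumber"}


def bib_api_url(id_type: str, value: str) -> str:
    """Build a lookup URL for one known identifier (no request is sent)."""
    if id_type not in ID_TYPES:
        raise ValueError(f"unsupported identifier type: {id_type!r}")
    return BIB_API.format(id_type=id_type, value=quote(value, safe=""))


if __name__ == "__main__":
    # Made-up sample rows, truncated to the first three columns.
    sample = "mdp.001\tallow\tpd\nmdp.002\tdeny\tic\nuc1.003\tallow\tpdus\n"
    access, rights = summarize_hathifile(io.StringIO(sample))
    print(access)  # Counter({'allow': 2, 'deny': 1})
    print(bib_api_url("oclc", "424023"))
```

Reading the file as a stream of lines keeps memory flat, which matters if the bulk file covers the whole collection.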
18
For help, see https://www.hathitrust.org/help
Contact
19
Questions?
20
BUT... HathiTrust is more than just a library!
21
HathiTrust Research Center
Goal: to build a secure environment and provide services to support data mining and text analysis
Portal (soon: analysis against the copyrighted content in HT)
Advanced Collaborative Support (ACS): mini-grants where awardees get staff time, not money
HathiTrust+Bookworm: visualize word trends
Extracted Features dataset: page-level data (such as token counts) about the content
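A word-trend view like HathiTrust+Bookworm plots how often a word occurs per year, relative to all words that year. The sketch below shows that normalization. The input format is a hypothetical simplification, one token-count mapping per volume with its publication year. It is not the actual Extracted Features file layout, and this is not how Bookworm itself is implemented.

```python
from collections import defaultdict


def word_trend(volumes, word):
    """Relative frequency of `word` per publication year.

    `volumes` yields (year, token_counts) pairs, where token_counts maps each
    token in one volume to its count (a hypothetical, simplified input).
    """
    hits = defaultdict(int)    # occurrences of `word` per year
    totals = defaultdict(int)  # all tokens per year
    target = word.lower()
    for year, counts in volumes:
        for token, n in counts.items():
            totals[year] += n
            if token.lower() == target:
                hits[year] += n
    return {year: hits[year] / totals[year] for year in sorted(totals) if totals[year]}


if __name__ == "__main__":
    sample = [
        (1850, {"railroad": 2, "the": 98}),
        (1850, {"the": 100}),
        (1900, {"Railroad": 5, "the": 45}),
    ]
    print(word_trend(sample, "railroad"))  # {1850: 0.01, 1900: 0.1}
```

Dividing by each year's total token count keeps years with many digitized volumes from dominating the trend.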
22
US Federal Documents Program
Build a registry of all known US federal government documents; collect a complete corpus of all known US federal government documents
23
Shared Print Program
Build a shared print monograph program across the membership in order to reduce the collective costs of maintaining print collections
Goal: secure retention commitments for all monographs in HathiTrust
24
Copyright Review Management System
Volunteers from HT member libraries undertake manual copyright review of certain categories of materials. To date, the work has focused on the following categories: monographs published in Australia, the United Kingdom and Canada from …; monographs published in the United States from …
25
Members participate in other groups and committees
User Support Working Group; Collections Committee; Metadata Policy, Strategy, Use and Sharing Advisory Group (MUSAG); HathiTrust Quality Assurance and Standards Working Group