1
Building the Universal Library: Introducing HathiTrust
Patricia A. Steele, Indiana University Libraries
John Price Wilkin, University of Michigan Libraries
December 8, 2008
2
The Vision
Universal Digital Library
Common Goal
Single Entity but Partnership of Many Libraries
3
The Reasons
Google Digitization Project
Collective agreement with CIC announced in June 2007
U of Michigan and U of Wisconsin projects already underway
4
The Reasons
Librarians value preservation
How to ensure digital files are preserved?
5
The Reasons
Librarians value access
How to create a comprehensive and coherent body of materials?
Librarians believe in cooperation
How do you achieve a common goal?
6
The Beginning
In 2007, CIC agreed to establish a shared digital repository
University of Michigan and Indiana University were the initial leaders of this effort
7
The Beginning
CIC Shared Digital Repository
HathiTrust
8
The Name
The name…
hathitrust.org
hathi.org
olifant.org
silverback.org
kingkong.org
toomai.org
9
The Name
The meaning behind the name
Hathi (hah-tee): Hindi for elephant
Big, strong
Never forgets, wise
Secure
Trustworthy
10
Banking Analogy
11
The Logo
12
The Partners
When announced in October 2008, full partners included:
University of California system
University of Virginia
CIC (Committee on Institutional Cooperation):
  University of Chicago
  University of Illinois
  Indiana University
  University of Iowa
  University of Michigan
  Michigan State University
  University of Minnesota
  Northwestern University
  Ohio State University
  Pennsylvania State University
  Purdue University
  University of Wisconsin-Madison
13
The Differences
The Universal Bookstore vs. The Universal Library
14
Sorting the Issues
Cost Model
Partners are charged a one-time start-up fee based on the number of volumes added to the repository, in addition to an annual fee for the curation of those volumes.
15
Sorting the Issues
Governance
HathiTrust Operational Advisory Board
Executive Management Group
Strategic Advisory Board
16
Sorting the Issues
Impact of Google settlement
Full access to materials
More quickly than a court win would have permitted (content locked up for years)
17
HathiTrust Architecture
Storage in Ann Arbor and Indianapolis
Encrypted backup to second Ann Arbor location
Inbound validation, standards-based object storage and related metadata
Rights database for rights metadata
Online catalog as source and storage for descriptive metadata
18
Page image and metadata repository
Objectives:
A guiding principle: store archival images, create deliverables on demand
Incorporate TDR-specific practices
Simple filesystem layout using Pairtree structure
One directory per volume, all files inside a zip with an associated METS file
Use of a namespace allows for conflicting identifiers
Namespaces for institutions and, if needed, types of identifiers within the institution
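The Pairtree layout named on this slide can be sketched roughly as follows. This is only a minimal sketch, assuming the identifier-cleaning rules of the Pairtree specification; the `obj` root, the `pairtree_root` placement under each namespace, the `.zip` and `.mets.xml` file names, and the sample namespace and identifier are illustrative assumptions, not details taken from the slides.

```python
from pathlib import PurePosixPath

# Characters that Pairtree hex-encodes as ^hh (along with anything
# outside visible ASCII) before the single-character substitutions.
_HEX_ENCODE = set('"*+,<=>?\\^|')
# Single-character substitutions applied after hex-encoding.
_SUBSTITUTE = str.maketrans({"/": "=", ":": "+", ".": ","})


def clean_id(identifier: str) -> str:
    """Make an identifier safe to use as a file or directory name."""
    encoded = []
    for byte in identifier.encode("utf-8"):
        char = chr(byte)
        if byte < 0x21 or byte > 0x7E or char in _HEX_ENCODE:
            encoded.append(f"^{byte:02x}")
        else:
            encoded.append(char)
    return "".join(encoded).translate(_SUBSTITUTE)


def volume_dir(namespace: str, identifier: str, root: str = "obj") -> PurePosixPath:
    """Return the one-directory-per-volume path (hypothetical layout)."""
    cleaned = clean_id(identifier)
    # Pairtree splits the cleaned identifier into two-character segments.
    pairs = [cleaned[i:i + 2] for i in range(0, len(cleaned), 2)]
    return PurePosixPath(root, namespace, "pairtree_root", *pairs, cleaned)


def volume_files(namespace: str, identifier: str) -> tuple:
    """Return assumed (zip, METS) paths inside the volume directory."""
    directory = volume_dir(namespace, identifier)
    cleaned = clean_id(identifier)
    return directory / f"{cleaned}.zip", directory / f"{cleaned}.mets.xml"
```

Because the namespace sits above the Pairtree root, two institutions can hold the same local identifier without colliding, which is the point the slide makes about namespaces.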
19
Rights database, pt. 1
What information to store?
Considered complexity and maintenance
Considered using MARC directly
Needed to accommodate both bib record-derived rights and manual overrides
Approach: examine bib record, determine authoritative copyright status, store rights attribute, source, reason, and timestamp
Stored in MySQL
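The approach on this slide, where a bib-derived status can be superseded by a manual override, might look something like the sketch below. It is an illustration under stated assumptions: the `RightsRecord` class, the `determine_rights` function, and the volume identifier format are hypothetical, and the logic that derives a status from the bib record is not described in the slides, so it is taken here as an input.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass(frozen=True)
class RightsRecord:
    """The four stored fields named on the slide, plus the volume key."""
    volume_id: str
    attribute: str       # e.g. "pd"
    reason: str          # "bib" (bib-derived) or "man" (manual override)
    source: str          # e.g. "google"
    timestamp: datetime


def determine_rights(
    volume_id: str,
    bib_attribute: str,
    source: str,
    manual_override: Optional[str] = None,
    now: Optional[datetime] = None,
) -> RightsRecord:
    """Pick the authoritative attribute; a manual override wins if present."""
    stamp = now or datetime.now(timezone.utc)
    if manual_override is not None:
        return RightsRecord(volume_id, manual_override, "man", source, stamp)
    return RightsRecord(volume_id, bib_attribute, "bib", source, stamp)
```

Recording the reason alongside the attribute keeps it visible whether a given status came from the catalog record or from a person's decision.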
20
Rights database, pt. 2
Each rights attribute must have a reason.
  bib: bibliographically derived
  man: manual access control override
  ddd: due diligence documented
Typical rights attributes in use
  pd: public domain
  pdus: public domain for US viewers*
  inc: in copyright
  nobody (override): no access
Source (e.g., 'google')
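The rule that every rights attribute must carry a reason can be enforced at the storage layer. The sketch below uses Python's built-in `sqlite3` purely as a stand-in for the MySQL database mentioned in the slides; the table name, column names, and helper functions are assumptions made for illustration, and the code lists are copied from this slide.

```python
import sqlite3


def open_rights_db() -> sqlite3.Connection:
    """Create an in-memory table that rejects rows lacking a valid reason."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """
        CREATE TABLE rights (
            volume_id TEXT PRIMARY KEY,
            attribute TEXT NOT NULL
                CHECK (attribute IN ('pd', 'pdus', 'inc', 'nobody')),
            reason    TEXT NOT NULL
                CHECK (reason IN ('bib', 'man', 'ddd')),
            source    TEXT NOT NULL,
            updated   TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP
        )
        """
    )
    return conn


def set_rights(conn, volume_id, attribute, reason, source):
    """Insert or update one row; invalid codes raise sqlite3.IntegrityError."""
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT OR REPLACE INTO rights (volume_id, attribute, reason, source) "
            "VALUES (?, ?, ?, ?)",
            (volume_id, attribute, reason, source),
        )
```

For example, `set_rights(conn, "mdp.39015012345678", "pd", "bib", "google")` succeeds, while passing `None` as the reason is rejected by the `NOT NULL` constraint. The volume identifier shown is made up.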
21
Pageturner: page image retrieval
[Diagram. Labels: rights database, GeoIP database, library catalog metadata, METS, archival page image, online page image, XML, XSLT, HTML, browser]
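One plausible reading of how the rights database and the GeoIP database combine when a page is requested is sketched below. The decision rules are inferred from the attribute descriptions in the rights slides and are not confirmed by the source; `may_view`, `serve_page`, and the `lookup_country` callable are hypothetical names, and no real GeoIP library is used.

```python
from typing import Callable


def may_view(attribute: str, viewer_country: str) -> bool:
    """Decide whether a page image may be shown to this viewer."""
    if attribute == "pd":
        return True                       # public domain everywhere
    if attribute == "pdus":
        return viewer_country.upper() == "US"   # public domain for US viewers
    return False                          # "inc", "nobody", or unrecognised


def serve_page(attribute: str, ip_address: str,
               lookup_country: Callable[[str], str]) -> str:
    """Return a status string; lookup_country stands in for a GeoIP query."""
    country = lookup_country(ip_address)
    return "page image" if may_view(attribute, country) else "access denied"
```

Denying by default for any unrecognised attribute is a deliberate choice in this sketch, so that a data error cannot open up access.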
22
HathiTrust and TRAC
Automatic validation in GROOVE
Check barcode check digit using Luhn algorithm
Fixity check on JPG, TIFF, UTF-8 using MD5
Well-formedness and embedded metadata check on JPG, TIFF, UTF-8 using JHOVE
Various completeness cross-checks
Failures retried, admin will eventually intervene
Periodic fixity checks using MD5
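Two of the checks listed here, the Luhn check digit and the MD5 fixity check, are standard algorithms and can be sketched briefly. This is not GROOVE's code: the function names are hypothetical, and the sketch assumes the barcode is a plain digit string whose final digit is the check digit under the standard Luhn scheme.

```python
import hashlib


def luhn_valid(barcode: str) -> bool:
    """Return True if the digit string passes the standard Luhn check."""
    if not (barcode.isascii() and barcode.isdigit()):
        return False
    total = 0
    # Walk from the rightmost digit; double every second digit.
    for position, char in enumerate(reversed(barcode)):
        digit = int(char)
        if position % 2 == 1:
            digit *= 2
            if digit > 9:
                digit -= 9
        total += digit
    return total % 10 == 0


def md5_hex(path: str, chunk_size: int = 1 << 20) -> str:
    """Compute the MD5 digest of a file without loading it all at once."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def fixity_ok(path: str, recorded_md5: str) -> bool:
    """Compare a file's current MD5 with the checksum recorded at ingest."""
    return md5_hex(path) == recorded_md5.lower()
```

The same `fixity_ok` comparison would serve both the inbound check and the periodic re-checks, since each compares a fresh digest against the stored one.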
23
OAIS Reference Model
[Diagram. Labels: Page Turner; HathiTrust API; MARC record extensions; GeoIP DB; CNRI Handles; [Solr]; MARC record extensions (Aleph); Rights DB; GROOVE (JHOVE); Google; [OCA]; In-house Conversion; GRIN; Internal Data Loading; METS object; PNG; OCR; PDF; METS/PREMIS object; TIFF G4/JPEG2000; OCR; MD5 checksums; Isilon Site Replication; TSM; MD5 checksum validation]
24
METS Object
Why METS?
Can serve as an Archival Information Package and a Dissemination Information Package
Designed to record the relationship between pieces of complex digital objects
Can be created automatically as texts are loaded or reloaded
25
METS Object
What's there?
metsHdr with an ID and CREATEDATE
dmdSec with a URL
Two techMD referencing notes files
Two fileGrps (images and OCR)
Physical structMap tying together the files with any metadata (pg. numbers or features)
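To make the listed parts concrete, here is a rough sketch that assembles a METS skeleton with the same sections using Python's standard `xml.etree.ElementTree`. It is not HathiTrust's actual METS profile: the ID values, file names, MIME types, `USE` labels, and the `build_mets` function itself are assumptions chosen for illustration.

```python
import xml.etree.ElementTree as ET
from datetime import datetime

METS = "http://www.loc.gov/METS/"
XLINK = "http://www.w3.org/1999/xlink"
ET.register_namespace("METS", METS)
ET.register_namespace("xlink", XLINK)
HREF = f"{{{XLINK}}}href"


def m(tag: str) -> str:
    """Qualify a tag name with the METS namespace."""
    return f"{{{METS}}}{tag}"


def build_mets(volume_id, catalog_url, pages, created: datetime) -> str:
    """pages is a list of (image_file, ocr_file, page_label) tuples."""
    root = ET.Element(m("mets"), {"OBJID": volume_id})
    ET.SubElement(root, m("metsHdr"),
                  {"ID": "HDR1", "CREATEDATE": created.isoformat()})
    # dmdSec pointing at the descriptive record by URL.
    dmd = ET.SubElement(root, m("dmdSec"), {"ID": "DMD1"})
    ET.SubElement(dmd, m("mdRef"),
                  {"LOCTYPE": "URL", "MDTYPE": "MARC", HREF: catalog_url})
    # Two techMD sections, each referencing a notes file (names assumed).
    amd = ET.SubElement(root, m("amdSec"))
    for number, notes in enumerate(("notes1.txt", "notes2.txt"), start=1):
        tech = ET.SubElement(amd, m("techMD"), {"ID": f"TMD{number}"})
        ET.SubElement(tech, m("mdRef"),
                      {"LOCTYPE": "OTHER", "OTHERLOCTYPE": "SYSTEM",
                       "MDTYPE": "OTHER", HREF: notes})
    # Two file groups: page images and OCR text.
    file_sec = ET.SubElement(root, m("fileSec"))
    images = ET.SubElement(file_sec, m("fileGrp"), {"ID": "FG1", "USE": "image"})
    ocr = ET.SubElement(file_sec, m("fileGrp"), {"ID": "FG2", "USE": "ocr"})
    # Physical structMap: one div per page, pointing at both files.
    struct = ET.SubElement(root, m("structMap"), {"TYPE": "physical"})
    volume = ET.SubElement(struct, m("div"), {"TYPE": "volume"})
    for seq, (image_file, ocr_file, label) in enumerate(pages, start=1):
        page = ET.SubElement(volume, m("div"),
                             {"TYPE": "page", "ORDER": str(seq), "ORDERLABEL": label})
        for group, prefix, name, mime in ((images, "IMG", image_file, "image/tiff"),
                                          (ocr, "OCR", ocr_file, "text/plain")):
            file_id = f"{prefix}{seq:08d}"
            entry = ET.SubElement(group, m("file"), {"ID": file_id, "MIMETYPE": mime})
            ET.SubElement(entry, m("FLocat"),
                          {"LOCTYPE": "OTHER", "OTHERLOCTYPE": "SYSTEM", HREF: name})
            ET.SubElement(page, m("fptr"), {"FILEID": file_id})
    return ET.tostring(root, encoding="unicode")
```

Because the whole document is generated from a page list, it can be rebuilt whenever a volume is loaded or reloaded.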
26
HathiTrust Services
Preservation of digital surrogate
Access (within bounds of law and settlement)
  Viewing
  Redistribution
  Services for print-disabled users
  Section 108
  Non-consumptive research
27
HathiTrust Branding
28
Legal Status of the Books
Outside of the Settlement
  Public domain content digitized by libraries unconstrained
  Libraries continue to do preservation-related work with in-copyright works (Sec. 108)
Settlement
  LDC or cooperative LDC (HathiTrust)
  Services for print-disabled users
  Non-consumptive research
  Section 108 uses
  General discovery
  Sharing of public domain
29
HathiTrust Future
Expansion of partnership
New services
Revision of governance
Refinement of content
30
Contacts, etc.
http://www.HathiTrust.org (see sitemap)
Patricia Steele
John Wilkin
31
Digital library for the future