HATHITRUST A Shared Digital Repository HathiTrust Infrastructure and Information Organization November 7, 2011 Jeremy York Project Librarian, HathiTrust.

Presentation transcript:

HATHITRUST A Shared Digital Repository HathiTrust Infrastructure and Information Organization November 7, 2011 Jeremy York Project Librarian, HathiTrust

Partnership
Arizona State University, Baylor University, Boston University, California Digital Library, Columbia University, Cornell University, Dartmouth College, Duke University, Emory University, Getty Research Institute, Harvard University Library, Indiana University, Johns Hopkins University, Lafayette College, Library of Congress, Massachusetts Institute of Technology, McGill University, Michigan State University, New York Public Library, New York University, North Carolina Central University, North Carolina State University, Northwestern University, The Ohio State University, The Pennsylvania State University, Princeton University, Purdue University, Stanford University, Texas A&M University, Universidad Complutense de Madrid, University of Arizona, University of Calgary, University of California (Berkeley, Davis, Irvine, Los Angeles, Merced, Riverside, San Diego, San Francisco, Santa Barbara, Santa Cruz), The University of Chicago, University of Connecticut, University of Florida, University of Illinois, University of Illinois at Chicago, The University of Iowa, University of Maryland, University of Miami, University of Michigan, University of Minnesota, University of Missouri, University of Nebraska-Lincoln, The University of North Carolina at Chapel Hill, University of Notre Dame, University of Pennsylvania, University of Pittsburgh, University of Utah, University of Virginia, University of Washington, University of Wisconsin-Madison, Utah State University, Yale University Library

Digital Repository
– Launched 2008
– Initial focus on digitized book and journal content
– “Light” archive: as accessible as possible within the bounds of law

The Name
The meaning behind the name
– Hathi (hah-tee): Hindi for elephant
– Big, strong
– Never forgets, wise
– Secure
– Trustworthy

Content
– 9,728,814 Total volumes
– 2,654,979 “Public domain”
– 5,164,532 Book titles
– 256,874 Serial titles
* As of November 5, 2011

Mission To contribute to the common good by collecting, organizing, preserving, communicating, and sharing the record of human knowledge

Collections and Collaboration
Comprehensive collection
– Preservation…with Access
Shared strategies
– Collection management, development
– Copyright
– Preservation (digital and print)
– Bibliographic Indeterminacy
– Discovery / Use
– Efficient user services
Public Good

– Descriptive headings added (hidden from GUI with CSS)
– Info about SSD service & link to accessibility page
– Images used for style are in CSS, so no need to use alt tags
– Skip navigation link
– Access keys for navigating pages with keyboard
– Added labels & descriptive titles to forms & ToC table

Access Matrix
Columns: Type of work; Search (Bib and Full text); View; Full-PDF download; Print on Demand; Print disabilities; Section 108 (preservation uses)
– Public domain worldwide: World; World if no restrictions, Partners if restrictions; World; Partners worldwide; N/A
– Public domain in the US: World; US; US if no restrictions, US partners if restrictions; US; US Partners; N/A
– Open Access (+Creative Commons): World; World if no restrictions; World with permission; Partners worldwide if no restrictions; N/A
– In copyright (and undetermined): World; Not available; Partners US and worldwide, where applicable

Technical Infrastructure

Repository Philosophy/Design
– OAIS/TRAC
– Consistency
– Standardization
– Simplicity (in design, not function)
– Practicality
– Sustainability

Content
Largely uniform in technical characteristics
4 formats
– ITU G4 TIFF
– JPEG2000
– JPEG
– Unicode (with and without coordinates)

Object Package
Diagram labels: images, text, bib data, Source METS, HT METS, Zip

Ingest
Bibliographic Data
– Must be present prior to content ingest
– MARCXML, as complete as possible
Content
– Pre-ingest
– Ingest

Ingest (2)
Diagram labels: Bibliographic data, Content, Pre-ingest, SIP, Backend servers, GROOVE, Validation, METS creation, Package creation, Handle creation
Diagram annotations:
– Evaluation
– Determination of standards
– Modification / Transformation
– Ensure conformance
– Barcode
– Fixity
– Consistency
– Well-formedness
– Prepare archival package
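The "Fixity" annotation on the validation step can be illustrated with a checksum comparison. The following is a minimal sketch, not the GROOVE implementation: the choice of MD5, the manifest layout (`{filename: expected_md5}`), and the function names are all assumptions made for illustration.

```python
from __future__ import annotations

import hashlib
from pathlib import Path


def md5_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def check_fixity(package_dir: Path, manifest: dict[str, str]) -> dict[str, list[str]]:
    """Compare files in package_dir against a hypothetical checksum manifest.

    Returns file names grouped as "missing", "mismatch", or "ok".
    """
    report: dict[str, list[str]] = {"missing": [], "mismatch": [], "ok": []}
    for name, expected in sorted(manifest.items()):
        target = package_dir / name
        if not target.is_file():
            report["missing"].append(name)       # file listed but absent
        elif md5_of(target) != expected.lower():
            report["mismatch"].append(name)      # checksum differs
        else:
            report["ok"].append(name)
    return report
```

A package would pass this check only if both the `missing` and `mismatch` lists come back empty.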

Archival Storage
– Reliability: ensure integrity
– Redundancy: in single and multiple sites
– Scalability: including ease of management
– Accessibility: for repository processes and services
– Platform-independence: for data/object management

Media & Architecture
– Sites: Michigan, Indiana
– Archival Storage: Isilon Systems
– Tape Backup
– Load balancing and failover
– Ingest at Michigan, replicated to Indiana
– Replacement on 3-4 year cycle

Architecture & Management
Diagram labels: images, text, bib data, Source METS, HT METS
Example path: ../uc1/pairtree_root/b3/54/34/86/b b zip b mets.xml
Example ids: wu mdp uc2.ark:/1390/t miua.aaj
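The example path follows the Pairtree convention of splitting an identifier into two-character directory segments under `pairtree_root`. The sketch below shows that mapping under stated assumptions: the identifier `b3543486` is inferred from the segments visible in the slide's path (the full identifier and file names are truncated in the transcript), naming the final directory after the cleaned identifier is an assumption, only the three single-character Pairtree substitutions are applied (the hex-encoding step is omitted), and both function names are hypothetical.

```python
from __future__ import annotations

from pathlib import PurePosixPath

# Single-character substitutions from Pairtree identifier cleaning:
# "/" -> "=", ":" -> "+", "." -> ","
_PAIRTREE_SUBSTITUTIONS = str.maketrans({"/": "=", ":": "+", ".": ","})


def split_id(full_id: str) -> tuple[str, str]:
    """Split 'namespace.localid' at the first dot (e.g. 'uc1.b3543486')."""
    namespace, _, local_id = full_id.partition(".")
    return namespace, local_id


def pairtree_path(namespace: str, local_id: str) -> PurePosixPath:
    """Build namespace/pairtree_root/<2-char segments>/<cleaned id>."""
    cleaned = local_id.translate(_PAIRTREE_SUBSTITUTIONS)
    # Split the cleaned identifier into two-character segments.
    segments = [cleaned[i:i + 2] for i in range(0, len(cleaned), 2)]
    return PurePosixPath(namespace, "pairtree_root", *segments, cleaned)


namespace, local_id = split_id("uc1.b3543486")  # assumed example identifier
print(pairtree_path(namespace, local_id))
# prints: uc1/pairtree_root/b3/54/34/86/b3543486
```

Because the directory path is derived from the identifier alone, an object can be located on disk without consulting a database, which fits the platform-independence goal stated for archival storage.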

Data Management
Components: Rights Determination, Rights Database, Bibliographic Management System, Copyright Review Management System, Holdings Database
Functions:
– Inventory
– Loading and updating records
– Duplicate detection and collation
– Solr indexes behind VuFind catalog
– Source of information for Access services
– Rights determination (automated and support for manual review)

Rights Database
– System of precedence
– 15 attributes
– 15 reason codes
Bibliographic (automatic)
Manual
1. Conformance with formalities
2. Contractual agreements
3. Access control overrides
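A "system of precedence" can be sketched as picking, for each volume, the determination whose basis ranks highest. The ordering used here (automatic bibliographic determinations lowest, then the three manual categories in the order listed on the slide) is an assumption, as are the class names, the attribute labels, and the volume identifiers; the slide does not state how the actual Rights Database resolves conflicts.

```python
from dataclasses import dataclass
from enum import IntEnum


class Basis(IntEnum):
    """Assumed precedence order: a higher value overrides a lower one."""
    BIBLIOGRAPHIC = 0      # automatic, from bibliographic data
    FORMALITIES = 1        # manual: conformance with formalities
    CONTRACT = 2           # manual: contractual agreements
    ACCESS_OVERRIDE = 3    # manual: access control overrides


@dataclass(frozen=True)
class Determination:
    volume_id: str   # hypothetical identifier
    attribute: str   # hypothetical label, e.g. "public-domain"
    basis: Basis


def effective_rights(determinations):
    """Return {volume_id: Determination} keeping the highest-precedence basis."""
    winners = {}
    for det in determinations:
        current = winners.get(det.volume_id)
        if current is None or det.basis > current.basis:
            winners[det.volume_id] = det
    return winners


rows = [
    Determination("vol-1", "in-copyright", Basis.BIBLIOGRAPHIC),
    Determination("vol-1", "public-domain", Basis.FORMALITIES),
    Determination("vol-2", "in-copyright", Basis.BIBLIOGRAPHIC),
]
for volume_id, det in effective_rights(rows).items():
    print(volume_id, det.attribute, det.basis.name)
```

In this sketch a manual review result for `vol-1` replaces the automatic bibliographic determination, while `vol-2` keeps its automatic value.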

Print Holdings Database
Volumes institutions own or have owned
– For monographic holdings
  - Only print volumes (not microform, etc.)
  - OCLC number [required]
  - Bib record ID [required]
  - Enumeration/chronology, if available
  - Condition (e.g., brittle) [optional]
  - Holding Status (e.g., current holding, withdrawn, missing, etc.) [optional]
– For serial holdings
  - OCLC number [required]
  - Bib record ID [required]
  - ISSN, if available
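The required and optional fields listed above lend themselves to a simple record check before loading. This is a sketch only: the field keys (`oclc`, `bib_id`, `enum_chron`, `condition`, `status`, `issn`), the `type` discriminator, and the dictionary-based record shape are hypothetical and not taken from the actual holdings submission format.

```python
# Hypothetical field keys mirroring the slide's required/optional lists.
REQUIRED = {
    "monograph": ("oclc", "bib_id"),
    "serial": ("oclc", "bib_id"),
}
OPTIONAL = {
    "monograph": ("enum_chron", "condition", "status"),
    "serial": ("issn",),
}


def validate_holding(record: dict) -> list:
    """Return a list of problems found in one holdings record (empty if none)."""
    kind = record.get("type")
    if kind not in REQUIRED:
        return [f"unknown holding type: {kind!r}"]
    problems = []
    for field in REQUIRED[kind]:
        value = record.get(field)
        if value is None or not str(value).strip():
            problems.append(f"missing required field: {field}")
    allowed = {"type", *REQUIRED[kind], *OPTIONAL[kind]}
    for field in sorted(set(record) - allowed):
        problems.append(f"unexpected field: {field}")
    return problems


print(validate_holding({"type": "serial", "oclc": "12345", "bib_id": ""}))
# prints: ['missing required field: bib_id']
```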

Access
Architecture diagram labels: Access; Data Management; Archival Storage (Michigan, Indiana); Rights Database; Holdings Database; Rights Determination; Bibliographic Management; Tab-delimited Metadata files; Full text Index; VuFind Index; Bibliographic Catalog; Bibliographic API; OAI sets; Full text Search application; PageTurner; Data API; Collection Builder

Content
(Architecture diagram repeated, with the same labels as on the Access slide.)

Search and Aggregation
(Architecture diagram repeated, with the same labels as on the Access slide.)

Metadata
(Architecture diagram repeated, with the same labels as on the Access slide.)

Object Package
Diagram labels: images, text, bib data, Source METS, HT METS, Zip

METS Object
Why METS?
– Can serve as an Archival Information Package and a Dissemination Information Package
– Designed to record the relationship between pieces of complex digital objects
– Can be created automatically as texts are loaded or reloaded
– Preservation actions (PREMIS)

Metadata
Details and specifications at repository level
– Object specifications / Validation criteria
– Page-tagging
Variations at object level
– Files missing
– Non-valid files
– Incorrect file checksums

HathiTrust METS
Contains regularized information that applies generally to items across the repository, is not specific to a particular source, and has a current or near-term use. This information is fundamentally valuable for understanding or using the preserved object in preservation activities after deposit, or in the access and display environments, including the APIs.

Source METS
Contains information that may be valuable for preservation or archaeology, but that is subjective (descriptive, e.g., bibliographic data, page-tags), idiosyncratic, or without a clear use or application. The information could be used to enhance knowledge about the core files, but is not fundamentally valuable for understanding or using the preserved object in the repository. It is a “parking lot” for information we are receiving that may be useful in the future. The desire not to touch things after they enter the repository might result in information that could have been included in the Source METS being stored in other ways (e.g., in-repository fixity checks).

HathiTrust METS (2)
What’s there?
– 2 dmdSecs: MARCXML and mdRef
– amdSec containing one techMD with PREMIS metadata
– fileSec with 4 fileGrps (zip, images, OCR, hOCR)
– Physical structMap tying together files with metadata (pg. numbers and features)
– METS Creation (Google) | Example
– METS Creation (IA) | Example
– HathiTrust METS Profile
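The fileSec structure described above can be inspected with a standard XML parser. The sketch below uses Python's `xml.etree.ElementTree` and the METS namespace `http://www.loc.gov/METS/`; the inline sample document, its `USE` values, and its file IDs are invented for illustration and are not copied from an actual HathiTrust METS file.

```python
import xml.etree.ElementTree as ET

METS_NS = {"mets": "http://www.loc.gov/METS/"}

# Hypothetical, heavily trimmed METS document with four file groups.
SAMPLE_METS = """<mets:mets xmlns:mets="http://www.loc.gov/METS/">
  <mets:fileSec>
    <mets:fileGrp USE="zip"><mets:file ID="ZIP1"/></mets:fileGrp>
    <mets:fileGrp USE="images">
      <mets:file ID="IMG1"/><mets:file ID="IMG2"/>
    </mets:fileGrp>
    <mets:fileGrp USE="OCR">
      <mets:file ID="OCR1"/><mets:file ID="OCR2"/>
    </mets:fileGrp>
    <mets:fileGrp USE="hOCR">
      <mets:file ID="HTML1"/><mets:file ID="HTML2"/>
    </mets:fileGrp>
  </mets:fileSec>
</mets:mets>"""


def file_groups(mets_xml: str) -> dict:
    """Map each fileGrp's USE attribute to the list of file IDs it contains."""
    root = ET.fromstring(mets_xml)
    groups = {}
    for grp in root.findall("mets:fileSec/mets:fileGrp", METS_NS):
        use = grp.get("USE", "")
        groups[use] = [f.get("ID") for f in grp.findall("mets:file", METS_NS)]
    return groups


for use, ids in file_groups(SAMPLE_METS).items():
    print(use, ids)
```

A consistency check along the lines of the repository's validation criteria could compare the counts of image and OCR files returned here, though whether the repository does so in this way is not stated in the slides.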

Source METS (2)
What’s there?
– dmdSecs
– amdSec
– fileSec (coordOCR, OCR, images…)
– Physical structMap tying together files with metadata (pg. numbers and features)
Source METS example (Google)
Source METS example (IA)
Source METS Creation

Vocabularies
– PREMIS
– Pagetag mapping

Pagetag Mapping (Google)

Pagetag Mapping (IA)

Pagetag Mapping (DLPS)

Change Management
PREMIS 2.1 “uplift”
Add
– Reading order
– Explicitly record page insertions
– Deletion PREMIS event
– PREMIS event to mark move to PREMIS 2.1
– Reference to Source METS
– Scheme to identify "version" of METS files
– Preservation levels (e.g., for PDF/A and PDF)
– New method of coding PDFs in the METS
Remove
– MARC metadata (pending approval of UC)
– References to pagedata and notes.txt
PREMIS 2.1 example

How to find out more
– Website “About” section
– Twitter
– Monthly newsletter (RSS)
– Contact us

Thank you!