HathiTrust Digital Library


HathiTrust Digital Library: Cooperation for Preservation

Outline
  About HathiTrust
    Mission & Goals
  Background
  What we do
    Services
  How we do it
    Governance
    Partnership & Resources
    Technology
  Future Directions

About

What is HathiTrust
  Shared Digital Repository
  Launched 2008 by 25 institutions (now 26)
  Initial focus on digitized book and journal content
  Expanding to non-book/non-journal, born digital
  "Light" archive
  Collaboration
  Preservation and access
  Print collections
  Local services
  Public Good

Background

History
  Michigan Digitization Project, 2004
  "…U of M shall have the right to use the U of M Digital Copy, in whole or in part at U of M's sole discretion, as part of services offered in cooperation with partner research libraries such as the institutions in the Digital Library Federation…"

History
  Collective Agreement with CIC
  Announced in June 2007
  CIC agreed to establish a shared digital repository

History
  CIC Shared Digital Repository → HathiTrust

The Partners
  When announced in October 2008, partners included:
  University of California system
  University of Virginia
  CIC (Committee on Institutional Cooperation)
    University of Chicago
    University of Illinois
    Indiana University
    University of Iowa
    University of Michigan
    Michigan State University
    University of Minnesota
    Northwestern University
    Ohio State University
    Pennsylvania State University
    Purdue University
    University of Wisconsin-Madison
  Columbia University

The Name
  The meaning behind the name
  Hathi (hah-tee): Hindi for elephant
    Big, strong
    Never forgets, wise
    Secure
    Trustworthy

Content Distribution
  As of February 1:
  5,323,716 volumes in total
  764,481 volumes in the public domain

Content Growth

What we do

Services
  Long-term preservation
    Bit-level preservation and migration
  Access
    Viewing
    Redistribution
    Print disabilities
    Section 108
  Rights management
    Rights database
    Copyright review
  Publish virtual collections
    Collection Builder
  Availability of data
    Metadata files
    Bib API
    Data API
  Google ingest
    Inbound validation
    Fixity checks
  Bibliographic search
    Temporary catalog
    Version 1 permanent catalog (April 2010)
  Full-text search (November 2009)
  Print on Demand
    UM public domain
    UM Press

How we do it

Governance
  Functions: Budget/Finances, Decision-making, Policy, Planning
  Bodies: Executive Committee, Strategic Advisory Board

Executive Committee
  Paul Courant, University Librarian and Dean of Libraries, UM
  Laine Farley, Executive Director, CDL
  John King, Vice Provost for Academic Information, UM
  Paula Kaufman, University Librarian and Dean of Libraries, UI
  Brian Schottlaender, University Librarian, UCSD
  Ed Van Gemert, Director of Libraries, UW - Madison
  Brenda Johnson, Dean of Libraries, IU
  Brad Wheeler, Chief Information Officer, IU
  John Wilkin, Executive Director of HathiTrust and Associate University Librarian, LIT, UM

Strategic Advisory Board
  Ed Van Gemert (Chair), Director of Libraries, UW - Madison
  John Butler, Associate University Librarian for Information Technology, U Minn
  Patricia Cruse, Director, Preservation, CDL
  Bernie Hurley, Director, Library Technologies, UC Berkeley
  R. Bruce Miller, University Librarian, UC - Merced
  Sarah Pritchard, University Librarian, Northwestern
  Paul Soderdahl, Director, LIT, U Iowa
  John Wilkin, Executive Director, HathiTrust (ex officio)

Partnership & Resources (1)
  Funded for an initial 5 years with base funding from partners
  Budget: separately held within UMich budget system, managed by the Executive Committee
  Cost Model: per-GB cost of storage per year, with a one-time fee on new content to build a capital fund
  Review in 3rd year of each 5-year period
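The cost model on this slide reduces to simple arithmetic. The following is a minimal sketch only: the slide gives no dollar figures, so the rates, the function names, and the choice to bill new content for storage in the year it arrives are all assumptions made for illustration.

```python
def annual_storage_cost(stored_gb: float, rate_per_gb_year: float) -> float:
    """Recurring charge: GB held during the year times the per-GB yearly rate."""
    return stored_gb * rate_per_gb_year


def capital_fund_fee(new_gb: float, one_time_fee_per_gb: float) -> float:
    """One-time charge on newly deposited content, paid into the capital fund."""
    return new_gb * one_time_fee_per_gb


def charge_for_year(existing_gb: float, new_gb: float,
                    rate_per_gb_year: float, one_time_fee_per_gb: float) -> float:
    """Total for one year. Assumption: new content is also billed for storage
    in the year it is deposited."""
    storage = annual_storage_cost(existing_gb + new_gb, rate_per_gb_year)
    return storage + capital_fund_fee(new_gb, one_time_fee_per_gb)


if __name__ == "__main__":
    # Hypothetical numbers: 1000 GB already stored, 200 GB added this year.
    print(charge_for_year(1000.0, 200.0, rate_per_gb_year=0.5,
                          one_time_fee_per_gb=0.25))
```

Separating the recurring charge from the one-time fee mirrors the two parts of the model named on the slide.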

Partnership & Resources (2)
  Staff/Expertise: highly integrated
    Project managers, IT and communications staff, copyright experts, administrators (UM, Indiana and UC taking the lead)
    Working groups
    UM recently hired a Digital Preservation Librarian
  Shared development space

HathiTrust Functional Framework
  Governance
    Budget, Finances
    Decision-making
    Policy
    Planning
  Enterprise Management
    Communication and coordination with partner institutions
    Project management
  Repository Administration
    Hardware configuration and maintenance
    Web and application server configuration and maintenance
    Security
    Permissions
    Logging
    Data management (content storage, backup, integrity checks, deletion)
    Hardware selection and replacement
    Content and metadata specifications
    Disaster recovery
    Processes for ensuring content integrity
  Rights Management
    Copyright determination
    Copyright review
    Copyright information management (database)
    Rightsholder permissions
  Bibliographic Data Management
    Entity description (record-level)
    Object identification (item-level)
    Data availability
  Collection Development
    Digital
      Expansion beyond books and journals (born-digital, images and maps, audio)
      Selection of content (for non-Google volume ingest and pilot projects)
    Print
      Cloud Library (effect of digital on print)
  e-Commerce
    Print on Demand
  Content Ingest
    Transformation
    Validation
  Content Access
    PageTurner
    Collection Builder
    Large-scale Search
    Bibliographic Catalog
    Research Center
    APIs
  Quality Assurance
    Quality review
    Content certification
  User Services
    Usability
    User support (helpdesk)
  Outreach
    Project website
    Monthly newsletter
    Papers and presentations
    Communication with potential partners
    Surveys, general inquiries
  Repository evaluation and audit (e.g., DRAMBORA, TRAC)
  Legal
    Risk management (use of materials)
    Partner agreements
    Advocacy
  Financial contributions of partners

Partnership & Resources (3)
  Toward a Cloud Library
  CLIR, Mellon Foundation
  OCLC Research, NYU, HathiTrust, ReCAP Libraries
  Objective: Characterize the near-term opportunity for externalizing management of academic research collections leveraging capacity of large-scale shared print and digital repositories*
  Outcomes: opportunity and risk assessment based on aggregate collection analysis; draft service agreement enabling generic consumer library to selectively outsource preservation and access of low-use research collections to large-scale print and digital repositories
  *From the RLG Partner Update January 7, 2010

Partnership & Resources (4)
  CRL TRAC Audit
    Portico and HathiTrust assessments timely
    "Certification will augment CRL's strategic archiving of print, and support a responsible transition to electronic-only formats where appropriate."
  Work with UC to design shared print journal archiving effort
    "With this hybrid strategy CRL hopes to enable its community to accelerate the shift to electronic-only resources in a careful and responsible manner."
  * http://www.crl.edu/archiving-preservation/digital-archives/certification-and-assessment-digital-repositories

Partnership & Resources (5)
  New cost model
  Based on benefits to institutions
    Public Domain
    In-copyright
    Volumes "held"

Partnership & Resources (6)
  Timeline:
    Implement in 2013
    Accept new partners now with costs based on overlap calculations
  Requirements:
    Print holdings database
    Update mechanisms
    Manual remediation
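An overlap calculation of the kind mentioned above could be sketched as follows. This is an illustration under stated assumptions, not the HathiTrust method: the identifiers, the `"pd"`/`"ic"` status codes, and the function name are hypothetical, and the slides do not say how overlap counts translate into fees.

```python
from collections import Counter


def overlap_counts(print_holdings, repository_rights):
    """Count a partner's print holdings that also exist in the repository,
    grouped by rights status.

    print_holdings: iterable of volume identifiers the partner holds in print.
    repository_rights: dict mapping identifier -> rights status
        (hypothetical codes: "pd" = public domain, "ic" = in-copyright).
    """
    counts = Counter()
    for volume_id in set(print_holdings):  # de-duplicate the holdings list
        status = repository_rights.get(volume_id)
        if status is not None:  # held in print AND present in the repository
            counts[status] += 1
    return counts


if __name__ == "__main__":
    holdings = ["a", "b", "c", "z"]  # "z" is not in the repository
    rights = {"a": "pd", "b": "ic", "c": "ic", "d": "pd"}
    print(dict(overlap_counts(holdings, rights)))
```

In practice, matching print records to digital volumes is rarely an exact identifier lookup, which is presumably why the slide lists update mechanisms and manual remediation among the requirements.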

Technology - OAIS
  [Diagram of the technical flow. Labels: Page Turner; HathiTrust API; MARC record extensions; GeoIP DB; CNRI Handles; [Solr]; MARC record extensions (Aleph); Rights DB; GROOVE (JHOVE); Google; [OCA]; In-house Conversion; GRIN; Internal Data Loading; METS object; PNG; OCR; PDF; METS/PREMIS object; TIFF G4/JPEG2000; OCR; MD5 checksums; Isilon; Site Replication; TSM; MD5 checksum validation]

Technology – Architecture
  Inbound validation, standards-based object storage and related metadata
  Storage in Ann Arbor and Indianapolis
  Encrypted backup to 3rd location
  Rights database for rights metadata
  Online catalog as source and storage for descriptive metadata

Technology - Ingest
  Automatic validation in GROOVE
    Check barcode check digit using Luhn algorithm
    Fixity check on JPEG2000, TIFF, UTF-8 using MD5
    Well-formedness and embedded metadata check on JPEG2000, TIFF, UTF-8 using JHOVE
  Creation of METS and PREMIS
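Two of the ingest checks named above are easy to sketch. The code below is not GROOVE; it is a minimal illustration that assumes the barcode is a plain digit string whose last digit is a standard Luhn check digit, and that an expected MD5 value arrives alongside each file. The function names are hypothetical.

```python
import hashlib


def luhn_valid(number: str) -> bool:
    """Return True if the digit string passes the standard Luhn check.

    Working from the rightmost digit, every second digit is doubled
    (subtracting 9 when the result exceeds 9); the sum must be divisible by 10.
    """
    if not number or any(ch not in "0123456789" for ch in number):
        return False
    total = 0
    for position, ch in enumerate(reversed(number)):
        digit = int(ch)
        if position % 2 == 1:  # every second digit, counting from the right
            digit *= 2
            if digit > 9:
                digit -= 9
        total += digit
    return total % 10 == 0


def md5_of_file(path: str, chunk_size: int = 1024 * 1024) -> str:
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def fixity_ok(path: str, expected_md5: str) -> bool:
    """Fixity check: the stored file must still match its recorded checksum."""
    return md5_of_file(path) == expected_md5.strip().lower()
```

Reading in chunks keeps memory use flat for large page images. MD5 here detects accidental corruption, which is the purpose of a fixity check; it is not a defense against deliberate tampering.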

Technology - Repository
  Isilon storage
  Simple filesystem layout
    One directory per volume, holding a zip file and a METS file
  Use of a namespace lets identifiers that would otherwise conflict coexist
    Namespaces for institutions and, if needed, types of identifiers within the institution
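The layout described above (one directory per volume, with a namespace in front of the identifier) might look like the sketch below. The directory scheme, file names, and function name are assumptions for illustration; the slide does not give the actual paths.

```python
from pathlib import PurePosixPath


def volume_paths(root: str, namespace: str, local_id: str) -> dict:
    """Build hypothetical paths for one volume.

    The namespace is its own path component, so two institutions can use
    the same local identifier without colliding.
    """
    volume_dir = PurePosixPath(root) / namespace / local_id
    return {
        "dir": volume_dir,
        "zip": volume_dir / f"{local_id}.zip",        # page images and OCR
        "mets": volume_dir / f"{local_id}.mets.xml",  # METS file for the volume
    }


if __name__ == "__main__":
    # Same local identifier under two hypothetical namespaces.
    print(volume_paths("/repo", "nsA", "12345")["zip"])
    print(volume_paths("/repo", "nsB", "12345")["zip"])
```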

Technology – METS Object
  Why METS?
    Can serve as Archival Information Package and a Dissemination Information Package
    Designed to record the relationship between pieces of complex digital objects
    Can be created automatically as texts are loaded or reloaded
    Preservation actions (PREMIS)

Technology – METS Object
  What's there?
    metsHdr with an ID and CREATEDATE
    2 dmdSecs: MARCXML and mdRef
    amdSec containing one techMD with PREMIS metadata
    fileSec with 4 fileGrps (zip, images, OCR, hOCR)
    Physical structMap tying together files with metadata (page numbers and features)
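The element inventory above can be made concrete with an empty skeleton. This is a sketch of the overall shape only, built with Python's standard library; the ID values, the `USE` labels, and the object identifier are placeholders, and the result is not a schema-valid or HathiTrust-conformant METS file.

```python
import xml.etree.ElementTree as ET

METS_NS = "http://www.loc.gov/METS/"
ET.register_namespace("METS", METS_NS)


def q(tag: str) -> str:
    """Qualify a tag name with the METS namespace."""
    return f"{{{METS_NS}}}{tag}"


def build_mets_skeleton(object_id: str, created: str) -> ET.Element:
    """Build an empty METS tree with the sections listed on the slide."""
    root = ET.Element(q("mets"), {"OBJID": object_id})
    ET.SubElement(root, q("metsHdr"), {"ID": "HDR1", "CREATEDATE": created})

    # Two descriptive metadata sections: one embedded, one by reference.
    dmd_embedded = ET.SubElement(root, q("dmdSec"), {"ID": "DMD1"})
    ET.SubElement(dmd_embedded, q("mdWrap"), {"MDTYPE": "MARC"})
    dmd_reference = ET.SubElement(root, q("dmdSec"), {"ID": "DMD2"})
    ET.SubElement(dmd_reference, q("mdRef"), {"LOCTYPE": "OTHER", "MDTYPE": "MARC"})

    # One administrative section holding a single techMD with PREMIS.
    amd = ET.SubElement(root, q("amdSec"))
    tech = ET.SubElement(amd, q("techMD"), {"ID": "TECH1"})
    ET.SubElement(tech, q("mdWrap"), {"MDTYPE": "PREMIS"})

    # Four file groups; the USE labels are placeholders.
    file_sec = ET.SubElement(root, q("fileSec"))
    for use in ("zip", "image", "ocr", "hocr"):
        ET.SubElement(file_sec, q("fileGrp"), {"USE": use})

    # Physical structure map that would tie pages to their files.
    struct_map = ET.SubElement(root, q("structMap"), {"TYPE": "physical"})
    ET.SubElement(struct_map, q("div"), {"TYPE": "volume"})
    return root


if __name__ == "__main__":
    tree = build_mets_skeleton("ns.12345", "2010-02-01T00:00:00")
    print(ET.tostring(tree, encoding="unicode"))
```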

Future Directions

Future Directions (1)
  3-year review
  SAB
  OCLC catalog
  Quality
  De-duplication
  TRAC compliance
  Current and ongoing areas
    Shibboleth
      Full-PDF
      Collection Builder
      Section 108
      Users with print disabilities
    Non-Google print content
      IA-digitized
      Locally digitized
    Non-book/non-journal
      Audio pilot
      Images (maps)
      Born-digital
    Beginning to investigate ePub as a delivery format
    Openness
      Data API

Future Directions (2)
  Collaborative Development
  PageTurner
  Advanced search
  Search facets
  Collection Builder
  Fixity checking
  Isilon software
  June 2010
  Large-scale Search
  CB Integration
  Index optimizing
  New hardware
  Ingest reporting
  Wisconsin
  Bibliographic management
  University of California
  Content validation
  Grant projects
  NSF EAGER
  Mellon
  Quality
  Usage reporting
  Partner Institutions
  Holdings database
  Data mining tools
  Research Center
  Data distribution
  Tools such as SEASR

Links
  Catalog, Full-text search, and Collection Builder: http://catalog.hathitrust.org
  METS and PREMIS implementation: http://www.hathitrust.org/preservation
  Technical profile: http://www.hathitrust.org/technology
  Technical flow diagram: http://www.hathitrust.org/documents/HathiTrust-PASIG-200910.pdf and http://www.hathitrust.org/documents/HathiTrust-PASIG-notes-200910.pdf
  Rights management: http://www.hathitrust.org/rights_management
  TRAC: http://www.hathitrust.org/accountability

Thank You!
  hathitrust-info@umich.edu
  jjyork@umich.edu
  http://www.hathitrust.org