HATHI TRUST A Shared Digital Repository HathiTrust Open Webinar Jeremy York Project Librarian, HathiTrust May 3 and 5, 2011.

HathiTrust Unless otherwise noted, these slides and their contents are licensed under a Creative Commons Attribution Unported License.

Presentation transcript:

Outline Overview Mission and Goals Content Services Governance, how the partnership operates Partnership Changing Library Landscape

About

Current Partners: Arizona State University; Baylor University; California Digital Library; Columbia University; Cornell University; Dartmouth College; Duke University; Emory University; Harvard University Library; Indiana University; Johns Hopkins University; Library of Congress; Massachusetts Institute of Technology; Michigan State University; New York University; New York Public Library; North Carolina Central University; North Carolina State University; Northwestern University; The Ohio State University; The Pennsylvania State University; Princeton University; Purdue University; Stanford University; Texas A&M University; Universidad Complutense de Madrid; University of California (Berkeley, Davis, Irvine, Los Angeles, Merced, Riverside, San Diego, San Francisco, Santa Barbara, Santa Cruz); The University of Chicago; University of Illinois; University of Illinois at Chicago; The University of Iowa; University of Maryland; University of Michigan; University of Minnesota; The University of North Carolina at Chapel Hill; University of Pennsylvania; University of Pittsburgh; University of Utah; University of Virginia; University of Washington; University of Wisconsin-Madison; Utah State University; Yale University Library. HathiTrust Community

Mission To contribute to the common good by collecting, organizing, preserving, communicating, and sharing the record of human knowledge Mission and Goals

Universal Library Common Goal Single Entity, Many Partners HathiTrust

Goals Comprehensive collection Preservation…with Access Shared strategies – Collection management, development – Preservation – Copyright – Efficient user services Openness Mission and Goals

Content

What is in HathiTrust? 8,625,158 total volumes; 2,297,041 public domain; 4,722,664 book titles; 209,930 serial titles * As of May 1, 2011

Content Sources * As of May 1, 2011

Content Distribution * As of May 1, 2011

Dates * As of May 1, 2011 Statistics and Visualizations

Breakdown of HathiTrust book corpus by publication date Bibliographic Indeterminacy and the Scale of Problems and Opportunities of "Rights" in Digital Collection Building – 2/2011

Breakdown of HathiTrust book corpus by publication date

Language Distribution (1) The top 10 languages make up ~86% of all content Statistics and Visualizations * As of May 1, 2011

Language Distribution (2) The next 40 languages make up ~13% of total Statistics and Visualizations * As of May 1, 2011

Content over time * As of May 1, 2011

Content Growth

A global change in the library environment June 2010 Median duplication: 31% June 2009 Median duplication: 19% Academic print book collection already substantially duplicated in mass digitized book corpus

Digitized Books in Shared Repositories ~75% of mass digitized corpus is backed up in one or more shared print repositories ~3.5M titles ~2.5M

Services

Services (1) Ingest – Book and Journal content Google Internet Archive In-house, other vendor digitization – Images, Audio, Born digital (coming soon…) Two parts – Bibliographic Data – Content Getting Content Into HathiTrust | Building a Future by Preserving our Past

Services (2) Long-term preservation – Bit-level, migration – Standard and open formats (ITU G4 TIFF, JPEG2000, JPG, Unicode) – Validation, integrity, redundancy – OAIS How reliable is it? – DRAMBORA, TRAC Preservation | Technology | TRAC

Technology - OAIS: GRIN, Internal Data Loading; Google, Internet Archive, In-house Conversion; MARC record extensions (Aleph), Rights DB; Page Turner, HathiTrust API, OAI, GeoIP DB, CNRI Handles, [Solr]; METS/PREMIS object, TIFF G4/JPEG2000, OCR, MD5 checksums; METS object, PNG, OCR, PDF; Isilon Site Replication, TSM, MD5 checksum validation; GROOVE (JHOVE). Technology
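
The preservation slides name MD5 checksums and MD5 checksum validation among the repository's integrity checks. As a minimal sketch of what such a fixity check can look like (this is not HathiTrust's code; the function names `md5_of_file` and `validate_fixity`, and the idea of comparing against a checksum string recorded at ingest, are assumptions made for illustration):

```python
import hashlib
from pathlib import Path


def md5_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with path.open("rb") as handle:
        # Read fixed-size chunks so large page images do not need to fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def validate_fixity(path: Path, recorded_md5: str) -> bool:
    """Compare a file's current MD5 with the checksum recorded earlier (hypothetical input)."""
    return md5_of_file(path) == recorded_md5.strip().lower()
```

A repository would typically run a check like this on a schedule and after every copy between storage sites, flagging any file whose digest no longer matches the recorded value.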

Quality Partner Digitization Google Digitization Quality work / Volume certification Quality

Services (3) Preservation…with Access – As part of preservation, service to partners, and as public good – Discovery Bibliographic (temporary catalog, OCLC/HathiTrust catalog) Full-text – Reading Interface optimized for users with print disabilities – Collections Searching, Reading, and Building Collections

Type of work: Search – Bib and Full text; View; Full-PDF download; Print on Demand; Print disabilities; Section 108 (preservation uses)
Public domain worldwide: World; World if no restrictions, Partners if restrictions; World; Partners worldwide; N/A
Public domain in the US: World; US; US if no restrictions, US partners if restrictions; US; US Partners; N/A
Open Access (+Creative Commons): World; World if no restrictions; World with permission; Partners worldwide if no restrictions; N/A
In copyright (and undetermined): World; Not available; Partners US and worldwide, where applicable
Access Matrix

Services (4) Rights Management – Rights Database – Copyright review IMLS Grant awarded to University of Michigan 2008 to determine copyright status of books published in US between 1923 and staff members, 4 institutions – Indiana University – University of Michigan – University of Minnesota – University of Wisconsin 125k reviewed through CRMS 67,000 (54%) in public domain Copyright

Copyright status of books published pre-1923 and US works published

Services (5) Data Availability – Tab-delimited inventory files – Bibliographic API – Data API – OAI feed of public domain – SFX target – Summon Hathifiles | Data Distribution and APIs

Services (6) Collaborative Development Environment – Active repository development Support for Computational Research – Datasets 120,000-volume set Google-digitized public domain – Protocol-based access – Research Center Datasets

How Different from Google? Preservation Content Collective work Uses of materials Own trajectory Partnership – Not just about digital content or repository – Address challenges – Fulfill mission – Provide services for our communities

Governance and Work

Governance HathiTrust Executive Committee Strategic Advisory Board Budget/Finances Decision-making Guidance on Policy, Planning Governance

Executive Committee: Paul Courant, University Librarian and Dean of Libraries, UM; Laine Farley, Executive Director, CDL; John King, Vice Provost for Academic Information, UM; Paula Kaufman, University Librarian and Dean of Libraries, UI; Brian Schottlaender, University Librarian, UCSD; Ed Van Gemert, Deputy Director of Libraries, UW – Madison (ex officio); Brenda Johnson, Dean of Libraries, IU; Brad Wheeler, Chief Information Officer, IU; John Wilkin, Executive Director of HathiTrust and Associate University Librarian, LIT, UM. Executive Committee

Strategic Advisory Board: Ed Van Gemert (Chair), Deputy Director of Libraries, UW - Madison; John Butler, Associate University Librarian for Information Technology, U Minn; Patricia Cruse, Director, Preservation, CDL; Bernie Hurley, Director, Library Technologies, UC Berkeley; R. Bruce Miller, University Librarian, UC - Merced; Sarah Pritchard, University Librarian, Northwestern; Paul Soderdahl, Director, LIT, U Iowa; John Wilkin, Executive Director, HathiTrust (ex officio); Robert Wolven, Columbia University. Strategic Advisory Board

Constitutional Convention October 2011 Delegates from each institution and consortium – Carry certain number of votes determined according to formula approved by Executive Committee 3-year review Proposals – Print management – Ballot proposals

How does work get done? Collective work – e.g., working groups – Perform the work of the partnership – Now 40+ people across partner institutions Distributed work – Driven by needs of institutions – able to leverage across the partnership – Projects, e.g. grant work, ingest specifications, page-turner, bibliographic data management Leverage expertise across institutions Working Groups and Committees | Projects

Working Groups (1) Operational focus – Appointed by Executive Director in coordination with Executive Committee – Current Usability User Support Communications – Previous Development Environment Storage Research Center

Working Groups (2) Planning or Exploratory focus – Appointed by Strategic Advisory Board – Recommendations reviewed by SAB and XCom; may call for subsequent implementation Collections Committee Surrogates Quality, Ingest, and Error rate Discovery

How is work prioritized? Initial functional objectives Collective processes – Working groups and committees Functional Objectives | Working Groups and Committees

e-Commerce Print on Demand Content Ingest Transformation Validation Content Access PageTurner Collection Builder Large-scale Search Bibliographic Catalog Research Center APIs Quality Assurance Quality Review Content Certification User Services Usability User support (helpdesk) Outreach Project website Monthly newsletter Papers and presentations Communication with potential partners Surveys, general inquiries Repository evaluation and audit (e.g., DRAMBORA, TRAC) Legal Risk management (use of materials) Partner agreements Advocacy Governance Budget, Finances Decision-making Policy Planning Enterprise Management Communication and Coordination with partner institutions Project management Repository Administration Hardware configuration and maintenance Web and application server configuration and maintenance Security Permissions Logging Repository Administration Data management (content storage, backup, integrity checks, deletion) Hardware selection and replacement Content and Metadata specifications Disaster Recovery Processes for ensuring content integrity Rights Management Copyright determination Copyright review Copyright information management (database) Rightsholder permissions Bibliographic Data Management Entity description (record-level) Object identification (item-level) Data availability Collection Development Digital Expansion beyond books and journals (born-digital, images and maps, audio) Selection of content (for non-Google volume ingest and pilot projects) Print Cloud Library (effect of digital on print) Financial contributions of partners HathiTrust Functional Framework Functional Framework

Partnership

Who can become a partner? – Institutions worldwide – Libraries with print holdings Eligibility and Agreements

What are the benefits? (1) Cost-effective long-term preservation and access services for digitized content – Commitments on digital content facilitate decisions about digitization efforts and print collection management For those with content, immediately offering long-term preservation, bibliographic and full-text search, collection-building With content or not, full viewing and downloading capabilities for public domain materials and materials for which we have received permissions Features and Benefits | New Cost Model FAQ

What are the benefits? (2) Specialized access to public domain and in-copyright materials for users with print disabilities Other lawful uses of in copyright materials such as Section 108 uses (print replacement copies, digital access to applicable works) HathiTrust encourages participation in initiatives and resources geared toward – Shared collection development and management (e.g., copyright review work, print holdings database, de-duplication, collaboration with other organizations and initiatives) – Participation in governance and collaborative initiatives – Defining future directions of the shared library.

What's involved? Contract – Sustaining – Content-Contributing Yearly fees Commitment – 5-year periods Shibboleth Print Holdings

How much does it cost? (1) Cost

How much does it cost? (2) $0.149/volume/year for Google-digitized $0.489/volume/year for IA-digitized $0.154/volume/year for all content $3.40 per GB
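
To make the per-volume rates on this slide concrete, here is a small arithmetic sketch. The volume counts are invented, and treating the rates as simple multipliers on a partner's volume counts is an assumption for illustration, not the official fee calculation.

```python
from decimal import Decimal

# Rates quoted on the slide, in dollars per volume per year.
GOOGLE_RATE = Decimal("0.149")
IA_RATE = Decimal("0.489")


def yearly_volume_cost(google_volumes: int, ia_volumes: int) -> Decimal:
    """Multiply each (hypothetical) volume count by its quoted per-volume rate."""
    return GOOGLE_RATE * google_volumes + IA_RATE * ia_volumes


# Hypothetical partner: 50,000 Google-digitized and 2,000 IA-digitized volumes.
example_cost = yearly_volume_cost(50_000, 2_000)
print(f"Estimated yearly cost: ${example_cost:,.2f}")
```

Decimal is used instead of floats so that the currency arithmetic stays exact.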

How does it work? (1) Sustaining membership is base – Pricing model for all partners beginning 2013 – Based on overlap of HathiTrust volumes with institutions' print holdings – Share in infrastructure costs for public domain volumes: (PD*X*C)/N – Share in infrastructure costs for in copyright volumes based on holdings For a given in-copyright volume: IC=(C*X)/H
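
The two formulas on this slide can be sketched in code. The slide does not define its symbols, so the reading used here is an assumption: PD is the number of public domain volumes, C the infrastructure cost per volume, X the flexible multiplier, N the number of partners, and H the number of partners holding a given in-copyright volume in print. All function and parameter names are hypothetical.

```python
from typing import Iterable


def public_domain_share(pd_volumes: int, cost_per_volume: float,
                        multiplier: float, partners: int) -> float:
    """One partner's share of public domain costs: (PD * X * C) / N."""
    if partners <= 0:
        raise ValueError("partners must be positive")
    return pd_volumes * multiplier * cost_per_volume / partners


def in_copyright_share(cost_per_volume: float, multiplier: float,
                       holders: int) -> float:
    """One partner's share for a single in-copyright volume: IC = (C * X) / H."""
    if holders <= 0:
        raise ValueError("holders must be positive")
    return cost_per_volume * multiplier / holders


def sustaining_fee_estimate(pd_volumes: int, holders_per_held_volume: Iterable[int],
                            cost_per_volume: float, multiplier: float,
                            partners: int) -> float:
    """Public domain share plus the IC share for each in-copyright volume the partner holds."""
    ic_total = sum(in_copyright_share(cost_per_volume, multiplier, h)
                   for h in holders_per_held_volume)
    return public_domain_share(pd_volumes, cost_per_volume, multiplier, partners) + ic_total
```

Under this reading, a partner's public domain share shrinks as N grows, and the share for each in-copyright volume shrinks as more partners hold that volume.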

How does it work? (2) Main factors in costs are – Amount of content – Number of partners – Also a flexible multiplier designed to pay for programmatic activities Tend to result in lower costs and more benefits over time

How does it work? (3) In order to support these calculations – Need print holdings database (2013) – Update mechanisms – Manual remediation Using estimates currently – Based on infrastructure costs of anticipated content – Estimated partnership growth – Institution total volume counts Cost

How does it work? (4) Does not exclude contribution of content If contribute content, costs covered up to amount that would be paid as Sustaining partner – Barring additional costs that might be needed to accommodate content (e.g., specialized load routines, generation of OCR) Above that, pay per-GB cost ($3.40)
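
One possible reading of this rule, sketched as code: contributed content is costed at the per-GB rate, the Sustaining fee covers that cost up to its own amount, and only the excess is billed. This interpretation, the function name, and the example figures are assumptions; the additional costs mentioned on the slide (specialized load routines, OCR generation) are not modeled.

```python
from decimal import Decimal

PER_GB_RATE = Decimal("3.40")  # per-GB cost quoted on the slide


def extra_content_charge(content_gb: Decimal, sustaining_fee: Decimal) -> Decimal:
    """Return the amount owed beyond the Sustaining fee for contributed content."""
    content_cost = content_gb * PER_GB_RATE
    # Content cost is covered up to the Sustaining fee; only the excess is charged.
    return max(Decimal("0"), content_cost - sustaining_fee)


# Hypothetical partner paying a $5,000 Sustaining fee.
print(extra_content_charge(Decimal("1000"), Decimal("5000")))
print(extra_content_charge(Decimal("2000"), Decimal("5000")))
```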

How does it work? (5) Partners share in costs of sustaining common resource Share in uses of relevant materials Voice in future directions Costs to institutions go down Quality of services increases – Realize in aggregated collection something you don't get through distributed search or federation Free riders?

Changing Library Landscape Rapidly changing landscape Libraries are making these decisions but they are more and more collective decisions We cannot afford anymore to do work separately that could be done collaboratively

HathiTrust overall benefits to libraries Digital Curation – Drive costs down – Reduce bibliographic indeterminacy – Make meaningful decisions about formats and quality – Increase discoverability, use – Consolidate development talent – Improve strength of archiving Print Curation – Means to associate our print holdings – Coordinated record-keeping Subsidiary benefits – Quantify problems – Collective attention to solving shared problems

How to find out more Web site About section: Twitter: Monthly newsletter: RSS: Contact us: Soon: Facebook, blog

Thank you very much