HATHI TRUST A Shared Digital Repository HathiTrust Overview Julie Bobay, Heather Christenson, and John Wilkin April 12, 2011.

HathiTrust Unless otherwise noted, these slides and their contents are licensed under a Creative Commons Attribution Unported License.

HathiTrust Overview Our organization and how it functions Our HathiTrust collection Perspectives on HathiTrust and public services Leveraging HathiTrust data How HathiTrust can make a difference How to find out more

Universal Library Common Goal Single Entity, Many Partners HathiTrust

Current Partners
Arizona State University
Baylor University
California Digital Library
Columbia University
Cornell University
Dartmouth College
Duke University
Emory University
Harvard University Library
Indiana University
Johns Hopkins University
Library of Congress
Massachusetts Institute of Technology
Michigan State University
New York University
New York Public Library
North Carolina Central University
North Carolina State University
Northwestern University
The Ohio State University
The Pennsylvania State University
Princeton University
Purdue University
Stanford University
Texas A&M University
Universidad Complutense de Madrid
University of California Berkeley
University of California Davis
University of California Irvine
University of California Los Angeles
University of California Merced
University of California Riverside
University of California San Diego
University of California San Francisco
University of California Santa Barbara
University of California Santa Cruz
The University of Chicago
University of Illinois
University of Illinois at Chicago
The University of Iowa
University of Maryland
University of Michigan
University of Minnesota
The University of North Carolina at Chapel Hill
University of Pennsylvania
University of Pittsburgh
University of Utah
University of Virginia
University of Washington
University of Wisconsin-Madison
Utah State University
Yale University Library

Governance HathiTrust Executive Committee Strategic Advisory Board Budget/Finances Decision-making Guidance on Policy, Planning

Executive Committee
Paul Courant, University Librarian and Dean of Libraries, UM
Laine Farley, Executive Director, CDL
John King, Vice Provost for Academic Information, UM
Paula Kaufman, University Librarian and Dean of Libraries, UI
Brian Schottlaender, University Librarian, UCSD
Ed Van Gemert, Deputy Director of Libraries, UW-Madison (ex officio)
Brenda Johnson, Dean of Libraries, IU
Brad Wheeler, Chief Information Officer, IU
John Wilkin, Executive Director of HathiTrust and Associate University Librarian, LIT, UM

Strategic Advisory Board
Ed Van Gemert (Chair), Deputy Director of Libraries, UW-Madison
John Butler, Associate University Librarian for Information Technology, U Minn
Patricia Cruse, Director, Preservation, CDL
Bernie Hurley, Director, Library Technologies, UC Berkeley
R. Bruce Miller, University Librarian, UC Merced
Sarah Pritchard, University Librarian, Northwestern
Paul Soderdahl, Director, LIT, U Iowa
John Wilkin, Executive Director, HathiTrust (ex officio)
Robert Wolven, Columbia University

Working Groups Appointed by Strategic Advisory Board and Executive Committee Both operational and strategically-focused groups Collections, Communications, Discovery Interface, Full-text Search, Usability, User Support Now 40+ people across the country Expertise from across the partnership

Staff Staff/Expertise – highly integrated – Project managers, IT and communications staff, copyright experts, administrators – Working groups Shared development space

HathiTrust Functional Framework
e-Commerce: Print on Demand
Content Ingest: Transformation; Validation
Content Access: PageTurner; Collection Builder; Large-scale Search; Bibliographic Catalog; Research Center; APIs
Quality Assurance: Quality Review; Content Certification
User Services: Usability; User support (helpdesk)
Outreach: Project website; Monthly newsletter; Papers and presentations; Communication with potential partners; Surveys, general inquiries; Repository evaluation and audit (e.g., DRAMBORA, TRAC)
Legal: Risk management (use of materials); Partner agreements; Advocacy
Governance: Budget, Finances; Decision-making; Policy; Planning
Enterprise Management: Communication and Coordination with partner institutions; Project management
Repository Administration: Hardware configuration and maintenance; Web and application server configuration and maintenance; Security; Permissions; Logging
Repository Administration: Data management (content storage, backup, integrity checks, deletion); Hardware selection and replacement; Content and Metadata specifications; Disaster Recovery; Processes for ensuring content integrity
Rights Management: Copyright determination; Copyright review; Copyright information management (database); Rightsholder permissions
Bibliographic Data Management: Entity description (record-level); Object identification (item-level); Data availability
Collection Development: Digital Expansion beyond books and journals (born-digital, images and maps, audio); Selection of content (for non-Google volume ingest and pilot projects); Print Cloud Library (effect of digital on print); Financial contributions of partners

What work is there? Usage Reporting Quality Copyright Review Specifications Metadata Development Environment Other?

Basic Infrastructure Costs

Cost Model 1 Economies of scale keep costs low – $0.149/volume/year for Google-digitized – $0.489/volume/year for IA-digitized – $0.154/volume/year for all content Advantages not fully known until you jump in
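
The per-volume rates on this slide allow a quick back-of-the-envelope check. The Python sketch below estimates a yearly cost from a volume count and backs out the share of IA-digitized volumes implied by the all-content rate. It assumes the all-content rate is a simple volume-weighted blend of the two source rates, which the slide does not state. The function names are illustrative only.

```python
# Rates quoted on the slide, in US dollars per volume per year.
GOOGLE_RATE = 0.149
IA_RATE = 0.489
BLENDED_RATE = 0.154


def annual_cost(volumes: int, rate: float = BLENDED_RATE) -> float:
    """Estimated yearly cost of keeping `volumes` volumes at `rate`."""
    return volumes * rate


def implied_ia_share(google: float = GOOGLE_RATE,
                     ia: float = IA_RATE,
                     blended: float = BLENDED_RATE) -> float:
    """Fraction of IA-digitized volumes implied by the blended rate.

    Assumption of this sketch (not stated on the slide):
        blended = (1 - s) * google + s * ia
    Solving for s gives the expression below.
    """
    return (blended - google) / (ia - google)


if __name__ == "__main__":
    print(f"100,000 volumes: ${annual_cost(100_000):,.2f} per year")
    print(f"Implied IA share: {implied_ia_share():.1%}")
```

Under that blending assumption, the all-content rate sits very close to the Google rate, which is consistent with the corpus being overwhelmingly Google-digitized.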

A global change in the library environment June 2010 Median duplication: 31% June 2009 Median duplication: 19% Academic print book collection already substantially duplicated in mass digitized book corpus

Digitized Books in Shared Repositories ~75% of mass digitized corpus is backed up in one or more shared print repositories ~3.5M titles ~2.5M

Cost Model 2 For public domain volumes: (PD*X*C)/N For a given in-copyright volume: IC = (C*X)/H Share in costs of curation Share in uses of relevant materials Voice in future directions Free riders?
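
The two formulas can be worked through in a few lines of Python. The slide does not define its symbols, so this sketch assumes the following reading: PD is the number of public domain volumes, C the cost per volume, X an overhead multiplier, N the number of partners, and H the number of partners holding a given in-copyright volume in print. Summing the two parts into one partner total is also an assumption, and all input numbers are hypothetical.

```python
from typing import Iterable


def public_domain_fee(pd_volumes: int, cost_per_volume: float,
                      overhead: float, partners: int) -> float:
    """(PD * X * C) / N: public domain costs split evenly across partners."""
    if partners <= 0:
        raise ValueError("partners must be positive")
    return pd_volumes * overhead * cost_per_volume / partners


def in_copyright_fee(cost_per_volume: float, overhead: float,
                     holders: int) -> float:
    """IC = (C * X) / H: one volume's cost split among its print holders."""
    if holders <= 0:
        raise ValueError("holders must be positive")
    return cost_per_volume * overhead / holders


def partner_total(pd_volumes: int, cost_per_volume: float, overhead: float,
                  partners: int, holder_counts: Iterable[int]) -> float:
    """Assumed total for one partner: public domain share plus the IC share
    of every in-copyright volume that partner holds in print.

    `holder_counts` has one entry per such volume: the number of partners
    (including this one) that hold it.
    """
    pd_part = public_domain_fee(pd_volumes, cost_per_volume, overhead, partners)
    ic_part = sum(in_copyright_fee(cost_per_volume, overhead, h)
                  for h in holder_counts)
    return pd_part + ic_part


if __name__ == "__main__":
    # Hypothetical inputs, chosen only to make the arithmetic easy to follow.
    total = partner_total(pd_volumes=1000, cost_per_volume=0.20, overhead=1.5,
                          partners=50, holder_counts=[1, 2, 3])
    print(f"Partner total: ${total:.2f}")
```

Read this way, a widely held in-copyright volume costs each holder less than a uniquely held one, while public domain costs are shared by everyone regardless of holdings.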

Sustaining common resource Costs go down Quality of services increases – An aggregated collection offers something you don't get through distributed search or federation

Cost Model 2: Timeline & Requirements Timeline: – Implement in 2013 – Accept new partners now with costs based on overlap calculations Requirements: – Print holdings database – Update mechanisms – Manual remediation

Print Holdings Database Print holdings database will also benefit – De-duplication Compromises user experience, obscures collection development needs – Management of print volumes Information to withdraw volumes (journals) – Legal uses of copyright materials Section 108, 121, ADA uses will depend on knowledge of which institutions own(ed) which materials

Questions?

Our HathiTrust Collection

Content Distribution (as of March 5, 2011)
Total volumes: 8,234,081
Public Domain: 2,102,033
Book titles: 4,527,381
Serial titles: 202,649

Language Distribution (1) * As of March 5, 2011 The top 10 languages make up ~86% of all content

Language Distribution (2) The next 40 languages make up ~13% of total * As of March 5, 2011

Dates * As of March 5, 2011

Originating Institution * As of March 5, 2011

Content over time * As of March 5, 2011

Content Growth

Collection Development and Management Collections Committee Appropriate principles for duplicate volumes Print management proposal Prioritization of collection development activities Process for decision-making and prioritization for new content types Recommendations for tools and services Prioritization of copyright review and rights-clearing processes

What about quality? Validation upon ingest Gating on metrics from Google Updated versions from Google Proactive work by Google library partners IMLS grant to develop framework and methodology for validating content in large-scale digital repositories Crowd sourcing in our future?

Questions?

Perspectives on HathiTrust and public services

HathiTrust and Reference HathiTrust: like Google and licensed databases – very large, rich repositories of content, with services supporting their use Reference librarians – are intermediaries between all these resources and researchers who use them

HathiTrust as a Reference Source HathiTrust is CONSTANTLY changing A requirement that's not new to reference librarians, but greatly increased: Stay engaged. Read updates. Use it.

HathiTrust is DIFFERENT We are THE PRODUCERS of this resource – HathiTrust is OUR COLLECTION – New role - not recipient/grader/purchaser – WE build this resource Close engagement of sort we have not experienced before

HathiTrust and Google Books Fact: content in HathiTrust, by the numbers, is currently largely a subset of Google Books That's how we started BUT It's just the start

HathiTrust stands on its own - Content HathiTrust content has been curated over time by librarians – Mirrors collections of large research libraries – Focus on quality Expanding Non-Google content – Public Domain: Copyright Review Management System – Content from non-Google sources Internet Archive, image collections, government publications

Copyright Review Management System – IMLS grant awarded to University of Michigan in 2008 to determine the copyright status of books published in the US between 1923 and 1963 – Wisconsin, Minnesota and Indiana each devote 1 FTE to this effort for Phase 3, – As of March 2011, over 125,000 volumes reviewed; 54% opened up in HathiTrust

HathiTrust stands on its own - Functionality HathiTrust supports scholarship Proper metadata User interface designed for scholarly work Services for people with visual impairments Large-scale text mining

HathiTrust stands on its own - Services Collection builder Member services (via Shibboleth logons) – download full PDFs – create permanent collections

How do people use HathiTrust? Of course, to read public domain books and journals But much more

Use stories I now go to HathiTrust as my first destination for in-depth reference questions. Fantastic searchable corpus; good metadata; content and functionality designed for scholarly needs. Indiana University librarian

Use stories (2) Complete Works of Voltaire (52-volume set published in late 19th century) – scholar needed all volumes to do scholarly referencing from home – all in HathiTrust presented together under a single MARC record

Use stories (3) Open Folklore – a new way to use HathiTrust – Portal that provides access to open access published and unpublished folklore literature – Indiana University's Folklore Collection first CIC Collection of Distinction in Google – HathiTrust – the corner store in the shopping mall of digital repositories – Anchor for whole set of services and initiatives, including journal liberation projects

Questions?

Leveraging HathiTrust data

A bibliographic metadata moment Bib data for each digital volume must be present in HathiTrust in order for volumes to be ingested Depositors make bib data available to UM to be loaded into HathiTrust bibliographic management system Info in the submitted bib records is used to make an initial rights determination about each volume The bib record acts as a manifest for the digital content that is then ingested A snapshot in time of the bib data associated with an object is also stored in the preservation metadata
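
The steps on this slide can be sketched as a small Python model: a volume is ingested only if a bib record exists for it, the record drives an initial rights determination, and a snapshot of the record is stored with the volume. Every name here is hypothetical, and the rights rule is a placeholder loosely motivated by the 1923 to 1963 review window mentioned in the Copyright Review Management System slide. It is not HathiTrust's actual logic.

```python
from dataclasses import asdict, dataclass
from typing import Dict, Optional


@dataclass
class BibRecord:
    """Hypothetical, heavily simplified stand-in for a submitted bib record."""
    record_id: str
    title: str
    pub_year: Optional[int]
    pub_country: str


@dataclass
class IngestedVolume:
    volume_id: str
    rights: str
    bib_snapshot: dict  # copy of the bib data as it was at ingest time


def initial_rights(record: BibRecord) -> str:
    """Placeholder rule: US imprints before 1923 open, everything else closed."""
    if (record.pub_country == "US"
            and record.pub_year is not None
            and record.pub_year < 1923):
        return "public-domain"
    return "in-copyright"


def ingest(volume_id: str, bib_index: Dict[str, BibRecord]) -> IngestedVolume:
    """Ingest a volume only if its bib record is present (the manifest check)."""
    record = bib_index.get(volume_id)
    if record is None:
        raise LookupError(f"no bib record for {volume_id}; cannot ingest")
    # asdict() returns a new dict, so later edits to the record
    # do not alter the stored snapshot.
    return IngestedVolume(volume_id, initial_rights(record), asdict(record))


if __name__ == "__main__":
    index = {"vol-1": BibRecord("rec-1", "An Example Title", 1900, "US")}
    print(ingest("vol-1", index))
```

Storing the snapshot as a plain copy keeps the preservation metadata stable even if the catalog record is corrected later.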

HathiTrust makes our data available Goal is to extend possibilities for development of local services and other uses Bibliographic API Data API OAI feed of public domain Hathifiles 120,000 public domain texts for computational research
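
As one illustration of local use, the Python sketch below builds a Bibliographic API request URL and picks out fully viewable items from a response. The URL pattern follows the publicly documented form of the API as best understood here. The response field names (`items`, `htid`, `usRightsString`) and the sample payload are assumptions for illustration, so check them against the current API documentation before relying on them. The sketch makes no network calls.

```python
import json
from typing import List

# Assumed base URL and identifier types; verify against current documentation.
BASE_URL = "https://catalog.hathitrust.org/api/volumes"
ID_TYPES = {"oclc", "lccn", "issn", "isbn", "htid", "recordnumber"}


def bib_api_url(id_type: str, identifier: str, detail: str = "brief") -> str:
    """Build a single-identifier Bibliographic API URL (JSON response)."""
    if id_type not in ID_TYPES:
        raise ValueError(f"unsupported identifier type: {id_type}")
    if detail not in {"brief", "full"}:
        raise ValueError("detail must be 'brief' or 'full'")
    return f"{BASE_URL}/{detail}/{id_type}/{identifier}.json"


def full_view_items(response_text: str) -> List[str]:
    """Return the volume ids of items marked as fully viewable in the US."""
    data = json.loads(response_text)
    return [item["htid"]
            for item in data.get("items", [])
            if item.get("usRightsString") == "Full view"]


# Hand-written sample in the assumed response shape; the ids are made up.
SAMPLE_RESPONSE = json.dumps({
    "records": {},
    "items": [
        {"htid": "example.001", "usRightsString": "Full view"},
        {"htid": "example.002", "usRightsString": "Limited (search-only)"},
    ],
})

if __name__ == "__main__":
    print(bib_api_url("oclc", "12345"))
    print(full_view_items(SAMPLE_RESPONSE))
```

A local catalog could use a lookup like this to decide whether to show a "full text available" link next to a print record.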

Some examples of use Catalogs UM loaded every record Chicago links to public domain volumes owned in print OCLC loaded records into WorldCat Link resolvers UC created SFX target Vendors H.W. Wilson databases linked to public domain volumes Needed: A guide with examples of how partners have used the data!

Future Directions (1) Locally-digitized partner content Usage reporting Coordinate digital and print resources (holdings database) Computational Research Quality Strategies for openness Collaborative Development Extending Services through Shibboleth Non-book, non-journal content

Future Directions (2) Born-digital content (Publishing) New Bibliographic Management Compliance with TRAC Grant projects OCLC Catalog 3-year review Improvements to Large-scale Search Improvements to PageTurner Ingest Reporting

How can HathiTrust make a difference? Digital Curation – Drive costs down – Reduce bibliographic indeterminacy – Make meaningful decisions about formats and quality – Increase discoverability – Consolidate development talent – Improve strength of archiving Print Curation – Means to associate our print holdings – Coordinated record-keeping Subsidiary benefits – Quantify problems – Collective attention to solving shared problems

How to find out more Web site About section: Twitter: RSS: Monthly newsletter: Contact us: Soon: Facebook, blog